Below is a list of websites with information relevant to my Master's project.
For a more dynamic collection of websites that I find useful, try my del.icio.us links page. Note that the
del.icio.us page also contains links that are unrelated to this work.
| Site | Affiliation | Description |
| The Enron Email Corpus | MIT/Stanford/Carnegie Mellon | This website hosts the Enron Email Corpus, a collection of email data from about 150 users, mostly senior management of Enron, organized into folders. The data represents the only known substantial corpus of 'real' email (as opposed to synthesized or specifically elicited corpora) that is publicly available. |
| Professor William W Cohen's site | Carnegie Mellon University | Contains references to a number of annotated email corpora relevant to my proposed topic. Also links to a number of pieces of software used for statistical classification, machine learning and approximate matching. |
| MinorThird | SourceForge Project | MinorThird is an open-source collection of Java classes for storing, categorizing and annotating text, and for learning to extract entities. It offers a toolkit of learning methods which are tightly integrated with other tools for annotating text, both manually and programmatically. It also offers tools for visualizing both the training data and the performance of the various classifiers. |
| Weka | University of Waikato, New Zealand | Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. |