:: Welcome to SGI ::


Getting information off the Internet is like taking a drink from a fire hydrant

Mitch Kapor, Founder of Lotus Development Corporation


Welcome to SGI, a site with an eclectic mix of information on topics related to Andrew Lampert's work and research in natural language processing, speech processing, language technology, data mining, information retrieval and delivery, programming languages, user modelling and a host of related fields.

Currently, this site contains:


Recent Language Technology Thoughtlets


iPhone iOS4 adds Event / Date detection in Email

Published: Thu, 01 Jul 2010 02:59:14 +0000

A quick note about a new feature in the email client on the iPhone in the latest iOS4 release. When you receive an email with a date or time mentioned in it, Apple’s email client automatically detects the date and presents it as an underlined hyperlink. Clicking the date then creates an event in your [...]


First Clinton Administration Email Released

Published: Fri, 25 Jun 2010 13:55:39 +0000

Way back in 2008 at the AAAI Workshop on Enhanced Email, Mark Dredze mooted that emails from the Clinton era would at some stage be released to the public. Happily, just days ago, the William J. Clinton Presidential Library and Museum began releasing email and other records from the US Clinton Administration. The first release [...]


The Failed Wikileaks Auction of Venezuelan Diplomatic Email Messages

Published: Wed, 13 Jan 2010 05:32:36 +0000

I was recently contacted by Stefan Mey, who interviewed Julian Assange. Assange, an Australian, is the spokesperson of Wikileaks. The interview makes for interesting reading. In discussing how Wikileaks is financed, Mey elicits some interesting comments on the controversial auction of Venezuelan government email that I’ve previously covered on this blog. Back in September 2008, [...]


New Enron Email Corpus release with attachments

Published: Wed, 25 Nov 2009 11:13:51 +0000

Exciting news: a new version of the Enron email corpus, including both the email messages and attachments, is now publicly available. Recently, an organisation called EDRM (Electronic Discovery Reference Model) has made a version of the Enron email corpus available for download that includes attachments, which were missing from the widely used [...]


Email Zoning: Finding Signal amongst the Textual Noise of Email Messages

Published: Mon, 10 Aug 2009 05:46:06 +0000

In the early days of email, widely used conventions for indicating quoted reply content and email signatures made it easy to segment email messages into their functional parts. Today, the explosion of different email formats and styles, coupled with the ad hoc ways in which people vary the structure and layout of their messages, means that [...]
