Way back in 2008 at the AAAI Workshop on Enhanced Email, Mark Dredze mooted that emails from the Clinton era would at some stage be released to the public. Happily, just days ago, the William J. Clinton Presidential Library and Museum began releasing email and other records from the US Clinton Administration.
The first release has focused on messages and documents authored by or sent to Elena Kagan, who served in two positions during the Clinton Administration. She was Associate White House Counsel from 1995 to 1996, and Deputy Assistant to the President for Domestic Policy and Deputy Director of the Domestic Policy Council from 1997 to 1999. To date, the released records include email created and received by Elena Kagan, along with 114 messages deemed to be part of the Federal side of the Clinton White House. These messages also include forwards, reply chains, and attachments. The attached documents include notes, memoranda, articles, reports, executive orders, bills, and directives.
Released emails are arranged into categories called “buckets”; within each bucket, messages are arranged by creation date. Emails were stored in these “buckets” by the Automated Records Management System (ARMS), which was used during the Clinton Administration to capture email from Lotus Notes. The ARMS databases hold attachments created in proprietary software, which were converted to hexadecimal code (hex-code or hex-dump). Where this hexadecimal code appears in an email message, archivists have converted it back to readable text. Converted attachments have been placed behind the created or received email they belong to.
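To give a feel for what turning a hex-dump back into text involves, here is a minimal Python sketch. It is only an illustration: the release does not document the actual ARMS dump format, so I am assuming the simplest case of plain pairs of hexadecimal digits, possibly broken up by spaces and line breaks, that encode Latin-1 text. The function name `decode_hex_dump` is my own invention, and real attachments saved from proprietary software would almost certainly need more work than this.

```python
def decode_hex_dump(hex_text: str, encoding: str = "latin-1") -> str:
    """Turn a string of hex digit pairs back into readable text.

    Assumption: the dump is nothing more than hex digits, optionally
    separated by whitespace. The real ARMS format may well differ.
    """
    # Remove spaces and line breaks so only the hex digits remain.
    cleaned = "".join(hex_text.split())
    # bytes.fromhex raises ValueError on stray characters or an odd digit count.
    raw = bytes.fromhex(cleaned)
    # Latin-1 maps every byte to a character, so decoding cannot fail;
    # the choice of encoding is itself an assumption.
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    print(decode_hex_dump("4d 65 6d 6f\n20 74 6f"))
```

Anything that is not valid hex raises a `ValueError` rather than being silently skipped, which seems the safer default when the input format is a guess.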
The emails released so far span a wide range of topics, including Amtrak, campaign finance reform, gaming/gambling (especially as it relates to Native Americans), timber, regulatory reform, welfare, and domestic policy topics such as AIDS, budget appropriations, education, health, labor, race, and tobacco.
Sadly, as with many email releases, the messages are rendered as PDF files rather than in their native digital form. The files I have examined, however, have been OCR’d, so the message text is at least searchable, and presumably extractable. One obvious question is why the emails needed to be OCR’d at all, when ARMS presumably stored them as electronic text to begin with. As Tom Lee notes, it appears that these are re-digitised versions of data dumped out of ARMS.
To download or take a look at the data yourself, head on over to the Clinton Presidential Library. Alternatively, the Sunlight Foundation has put up a familiar inbox-style view of the data for more convenient browsing.