Open Source Email is Big Business
It’s been a big day for email. Apparently Yahoo has agreed to purchase Zimbra, the open source messaging and collaboration company, for (US)$350 million. Opinions vary on whether it’s a smart move by Yahoo or a gross over-valuation of Zimbra.
Zimbra is a company that I’ve been monitoring since their public launch back in 2005. At the time they noted (as many have) that email is broken.
“From overflowing inboxes to the nuisance of organizing correspondence, to the cost of managing storage, viruses, availability, retention and legal discovery and compliance, dealing with corporate e-mail has become a nightmare.”
Zimbra have worked to address these problems with their Collaboration Suite software (ZCS), which is notable because much of its source code is released under an open-source licence.
Regardless of your take on Yahoo’s purchase, what’s interesting is that this deal clearly shows the value the market places on an alternative to Exchange/Outlook and Lotus Notes, even an open-source one. I read this as recognition that there’s plenty of room left for innovation in the email and collaboration space.
MediaDefender Email Corpus: 6600 email messages released
The internet is buzzing with conversations about the huge email leak from MediaDefender, a company which makes its living selling services and software to prevent illegal content sharing in peer-to-peer networks. I was made aware of this hugely exciting opportunity thanks to the excellent Death By Email blog, which provides a good summary of the unfolding drama.
Given its business, MediaDefender is of course not a popular company within the file-sharing community. It thus shouldn’t be surprising that people have been very eager to jump on the more than 6600 company email messages from MediaDefender employees and begin dissecting their content. The emails appear to date from the period between April 2007 and September 2007.
According to Ars-Technica, the email was leaked to the public by a group that calls itself MediaDefender-Defenders. In a text file distributed with the email data, the group claims that MediaDefender employee Jay Mairs forwarded all of his company emails to a Gmail account, from which the data was leaked. “A special thanks to Jay Maris, for circumventing there entire email-security by forwarding all your emails to your gmail account, and using the really highly secure password: blahbob”.
The group’s motivation for releasing the email is also made clear: “By releasing these emails we hope to secure the privacy and personal integrity of all peer-to-peer users. The emails contains information about the various tactics and technical solutions for tracking p2p users, and disrupt p2p services. So here it is; we hope this is enough to create a viable defense to the tactics used by these companies …”
As someone whose first use of BitTorrent was to download this email corpus, my interest in the data is purely academic: is this another corpus we could use for email research? Conveniently, the MediaDefender email data is released in mbox format, which is a welcome change from the image-based PDF files (created by scanning printed email messages!) that have been released in recent US court cases. Because the messages are in mbox format, all of the header information is intact, which makes the data ideal for research purposes.
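To illustrate why mbox is so much friendlier than scanned PDFs, here is a minimal sketch of reading headers straight out of an mbox file with Python’s standard-library `mailbox` module. It assumes the corpus is an ordinary mbox file; the file name and the `summarise_mbox` helper are hypothetical, and the three sample messages are invented for illustration, not taken from the MediaDefender data.

```python
import mailbox
import os
import tempfile
from collections import Counter

# Hypothetical sample data in mbox layout: each message begins with a
# "From " separator line, followed by its headers, a blank line and a body.
# These messages are made up; they are NOT from any real corpus.
SAMPLE_MBOX = (
    "From alice@example.com Mon Apr  2 10:00:00 2007\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: First message\n"
    "Date: Mon, 02 Apr 2007 10:00:00 +0000\n"
    "\n"
    "Hello Bob.\n"
    "\n"
    "From bob@example.com Tue Apr  3 11:00:00 2007\n"
    "From: bob@example.com\n"
    "To: alice@example.com\n"
    "Subject: Re: First message\n"
    "Date: Tue, 03 Apr 2007 11:00:00 +0000\n"
    "\n"
    "Hi Alice.\n"
    "\n"
    "From alice@example.com Wed Apr  4 09:30:00 2007\n"
    "From: alice@example.com\n"
    "To: carol@example.com\n"
    "Subject: Another message\n"
    "Date: Wed, 04 Apr 2007 09:30:00 +0000\n"
    "\n"
    "Hello Carol.\n"
)


def summarise_mbox(path):
    """Hypothetical helper: count messages and tally each From: header.

    Only headers are inspected; message bodies are ignored.
    """
    box = mailbox.mbox(path, create=False)  # raise if the file is missing
    senders = Counter()
    total = 0
    try:
        for message in box:
            total += 1
            # Header fields survive intact in mbox, unlike in scanned PDFs.
            senders[message.get("From", "(no From header)")] += 1
    finally:
        box.close()
    return total, senders


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "w", encoding="ascii", newline="\n") as fh:
            fh.write(SAMPLE_MBOX)
        total, senders = summarise_mbox(path)
        print(total, "messages")
        for sender, n in senders.most_common():
            print(sender, n)
```

The same loop could just as easily collect `Date`, `To` or `Subject` values, which is exactly the kind of metadata that is lost when printed email is scanned to PDF.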
The (insurmountable?) problem with using this data for research is, of course, the fact that the email was not legally obtained. So, is there any way we could get ethics approval for publishing experiments using this data? It seems very doubtful to me, but I’d be curious to hear your thoughts.