Business users receive 10 times more email than personal users?
Friday March 30th 2007, 9:05 pm
Filed under: email, language technology, research, technology
Posted by: Andrew Lampert

This afternoon I came across an interesting quote about Yahoo! Mail in a Sydney Morning Herald article. Here’s the relevant extract from the article:

According to Yahoo engineer [and Group Vice President of Engineering] David Nakayama, in 1997 its total storage capacity for mail accounts was just 200GB. Today, he said that amount of space was consumed by just 10 minutes worth of inbound email.

I thought it would be interesting to do some calculations about email usage based on these numbers, and compare these results with previous numbers I’ve posted about email statistics within CSIRO.

In order to turn David Nakayama’s 200GB figure into something useful, we need some idea of the number of users that Yahoo! Mail has. If we believe numbers quoted by TechCrunch, based on Comscore Media Metrix statistics, then this number is something like 250 million users.

Ok, next we need to get some sort of handle on the size of an average email message. My first thought was to look to the Enron Email Corpus. The raw version of the corpus that is available from CMU contains 517,431 email messages (including duplicates) and takes up about 1.34 Gbytes of disk space. So the average email size in the Enron Email Corpus turns out to be approximately 2.7 Kbytes. Of course, we should remember that this number underestimates the true average size, since the messages in the Enron Corpus have had all email attachments removed. Because a smaller average size means more messages for the same number of bytes, calculations using the Enron average should still give us an approximate upper bound on the number of email messages being processed at Yahoo! Mail.
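The average size above is a single division, sketched below. The function name and the `base` parameter are mine, not from any library. I have assumed that the 1.34 Gbytes figure is in 1024-based units, since that is what reproduces 2.7 Kbytes. With 1000-based units the average comes out closer to 2.6 Kbytes.

```python
# Figures quoted above for the raw CMU copy of the Enron Email Corpus.
MESSAGES = 517_431
CORPUS_GBYTES = 1.34


def average_message_kbytes(total_gbytes, message_count, base=1024):
    """Average message size in Kbytes.

    `base` is an assumption about the units: 1024 for binary
    Gbytes/Kbytes, 1000 for decimal ones.
    """
    total_kbytes = total_gbytes * base * base  # Gbytes -> Kbytes
    return total_kbytes / message_count


binary = average_message_kbytes(CORPUS_GBYTES, MESSAGES)           # 1024-based
decimal = average_message_kbytes(CORPUS_GBYTES, MESSAGES, 1000)    # 1000-based
print(f"{binary:.2f} Kbytes (binary), {decimal:.2f} Kbytes (decimal)")
```

Either way, the estimate stays in the 2.6 to 2.7 Kbytes range, so the choice of units does not change the conclusions that follow.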

At 2.7 Kbytes per message, every Terabyte of data transferred represents approximately 370 million email messages. At 200 Gbytes every 10 minutes, or 1200 Gbytes per hour, Yahoo! Mail is processing roughly 28.8 Terabytes of email per day, which with the Enron numbers, equates to 10.66 billion email messages per day. Based on 250 million users, that’s roughly 42.6 email messages per user per day. That seems like a plausible figure. As we’ve already noted, however, the average email size of 2.7 Kbytes from the Enron data is just one data point, and almost certainly under-estimates the true average email size.
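For anyone who wants to replay the chain of steps from inbound rate to messages per user, here is a minimal sketch. The constant and function names are my own. It assumes decimal units (1 Gbyte = 10^9 bytes, 1 Kbyte = 1000 bytes), which is what the 370 million messages per Terabyte figure implies.

```python
GBYTES_PER_10_MIN = 200      # inbound rate quoted in the article
USERS = 250_000_000          # TechCrunch / Comscore figure
BYTES_PER_GBYTE = 10**9      # assumption: decimal Gbytes


def messages_per_user_per_day(avg_message_bytes):
    """Estimate daily inbound messages per account for a given average size."""
    # Six 10-minute slots per hour, 24 hours per day.
    bytes_per_day = GBYTES_PER_10_MIN * 6 * 24 * BYTES_PER_GBYTE
    messages_per_day = bytes_per_day / avg_message_bytes
    return messages_per_day / USERS


# Enron-based average of 2.7 Kbytes, taken as 2700 bytes.
print(round(messages_per_user_per_day(2700), 1))
```

Run without intermediate rounding, this lands a fraction above the 42.6 quoted in the text, because the text multiplies by the rounded 370 million messages per Terabyte.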

It turns out that some of the spam processing companies have looked at this problem too. Just recently, SoftScan released email statistics suggesting that the average size of a spam email message was now 11.76 Kbytes. (This number is apparently increasing, due to the growing number of image spam messages.) Presumably, a very large proportion of mail processed by Yahoo! Mail is actually spam (especially if the numbers are anything like the spam statistics for CSIRO), so numbers based on average spam email size are probably quite a realistic approximation.

At 11.76 Kbytes per message, every Terabyte of data transferred represents approximately 85 million email messages. In this case, our 28.8 Terabytes of daily Yahoo! Mail equates to roughly 2.45 billion email messages, or roughly 9.8 email messages per user per day. Now, of course not all Yahoo! Mail accounts are active, so the volume of email received by each active user is probably somewhat higher than this, but this gives us a reasonable lower bound on email volume per user.
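The two average sizes can be put side by side with a few lines of code. This is only a sketch with names of my own choosing, and it again assumes decimal Terabytes and Kbytes.

```python
TBYTES_PER_DAY = 28.8        # daily inbound volume derived above
BYTES_PER_TBYTE = 10**12     # assumption: decimal Terabytes
BYTES_PER_KBYTE = 1000       # assumption: decimal Kbytes

# Average message sizes in Kbytes from the two sources discussed above.
AVG_SIZES_KBYTES = {"Enron (no attachments)": 2.7, "SoftScan spam": 11.76}


def messages_per_tbyte(avg_kbytes):
    """Number of messages of the given average size that fit in one Terabyte."""
    return BYTES_PER_TBYTE / (avg_kbytes * BYTES_PER_KBYTE)


for label, size in AVG_SIZES_KBYTES.items():
    per_tbyte = messages_per_tbyte(size)
    per_day = per_tbyte * TBYTES_PER_DAY
    print(f"{label}: {per_tbyte / 1e6:.0f} million per Terabyte, "
          f"{per_day / 1e9:.2f} billion per day")
```

The larger the assumed average message, the fewer messages the same 28.8 Terabytes can hold, which is why the spam-based size gives the lower estimate and the Enron-based size the upper one.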

So, the average Yahoo! Mail user probably receives somewhere between 10 and 43 messages per day (including spam – hopefully a significant amount of which wouldn’t actually reach users’ inboxes).

Why is this interesting? Well, we see a pretty stark contrast when we compare these numbers to those from inside a company. In my previous post, I calculated that the average number of incoming emails per user per day in CSIRO is upwards of 400 (including spam). That’s roughly an order of magnitude greater than even the upper Yahoo! estimate, and about 40 times the lower one.
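To put a number on the contrast, the sketch below divides the CSIRO figure by the two Yahoo! estimates. The variable names are mine, and 400 is treated as a floor since the earlier figure was "upwards of 400".

```python
CSIRO_PER_USER = 400                 # "upwards of 400", so treated as a minimum
YAHOO_LOW, YAHOO_HIGH = 9.8, 42.6    # per-user estimates derived above

# Dividing by the higher Yahoo! estimate gives the smallest ratio.
smallest_ratio = CSIRO_PER_USER / YAHOO_HIGH
largest_ratio = CSIRO_PER_USER / YAHOO_LOW

print(f"CSIRO users receive roughly {smallest_ratio:.1f}x to "
      f"{largest_ratio:.1f}x as many messages per day")
```

Since the CSIRO number is a minimum, both ratios are best read as lower limits under these assumptions.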

I’m very curious whether these numbers are representative of a more general trend in email volumes for personal email users (who presumably dominate the Yahoo! Mail figures) and business users. Does anyone else have any additional email usage figures they can share that might shed light on this?

Finally, it’s also interesting to quickly consider what these numbers mean in terms of network bandwidth required for running Yahoo! Mail. Some simple back-of-the-envelope calculations tell us that 200 Gbytes/10 minutes equates to roughly 2.7 Gigabits/second in network traffic. And that’s just for email traffic. It should be clear why there is such large-scale infrastructure investment from companies like Google, Yahoo! and Microsoft, and that’s not even considering the requirements for search and other applications (crawling, processing queries, serving video data, replicating copies of the internet across data centres etc.).
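The bandwidth conversion is just bytes to bits divided by seconds. Here is a small sketch, with a function name of my own, that tries both decimal and binary interpretations of "Gbyte", since the article does not say which one it means.

```python
def gigabits_per_second(gbytes, seconds, bytes_per_gbyte=10**9):
    """Sustained rate in Gigabits/second (10^9 bits) for `gbytes` moved in `seconds`."""
    bits = gbytes * bytes_per_gbyte * 8
    return bits / seconds / 1e9


decimal_rate = gigabits_per_second(200, 10 * 60)          # 1 Gbyte = 10^9 bytes
binary_rate = gigabits_per_second(200, 10 * 60, 2**30)    # 1 Gbyte = 2^30 bytes

print(f"{decimal_rate:.2f} to {binary_rate:.2f} Gigabits/second")
```

This is an average over the 10-minute window and counts only message payload, so peak traffic and protocol overhead would presumably push the real requirement higher.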

