Filed under: email,information delivery,language technology,research,technology
Posted by: Andrew Lampert
Brij Singh at MessageDance has posted an interesting motivation for applying sentiment analysis to incoming email. He asks whether the sentiment evoked by incoming email results in cognitive turnover for knowledge workers, thus disrupting their productivity.
Brij thinks that the application of sentiment analysis to email could help address this mental wandering for knowledge-based employees:
I think it’s high time for companies to invest in sentiment classification and routing toxic emails to platform where immediate impact on employee productivity is less. Can carefully controlled social platform enable this process?
Having attended a research presentation by Mary Gardiner on sentiment classification just yesterday, I find it interesting to consider the possibilities and practicalities of applying sentiment classification techniques to email.
One unsupervised technique, pioneered by Turney and Littman, uses pointwise mutual information (PMI) and word co-occurrence counts from a search engine to estimate the valence of each word in a text. Turney and Littman used the NEAR operator in AltaVista to count how often each word in the text to be classified (in our case, each word from an incoming email message) co-occurs with each word from a set of words with known positive or negative valence. Co-occurrence with the known-positive words contributes to the positive sentiment of the unclassified word, while co-occurrence with the known-negative words contributes to its negative sentiment. These counts are then normalised and combined to determine the overall valence of each word in the unclassified text. The technique, though simple, worked surprisingly well (80% classification accuracy at the word level), much better than many more complex techniques.
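As a rough sketch of how such counts might be combined, the snippet below sums PMI-style log-ratios over a handful of seed words. The seed lists, the `near_hits` and `hits` lookups, the smoothing constant and the toy counts are all assumptions invented for illustration; they are not Turney and Littman's data or code.

```python
import math

# Illustrative seed words only; a real system would choose these carefully.
POSITIVE_SEEDS = ("good", "excellent")
NEGATIVE_SEEDS = ("bad", "poor")


def so_pmi(word, near_hits, hits,
           pos=POSITIVE_SEEDS, neg=NEGATIVE_SEEDS, smoothing=0.01):
    """Return a valence score: above 0 leans positive, below 0 leans negative.

    near_hits(word, seed) -> count of texts where word occurs near seed
    hits(seed)            -> count of texts containing seed
    Both callables are hypothetical stand-ins for a search engine or index.
    """
    if len(pos) != len(neg):
        # With equal-sized seed sets, the terms that depend only on `word`
        # and on the corpus size cancel out of the PMI difference.
        raise ValueError("this sketch assumes equally many seeds per class")
    score = 0.0
    for seed in pos:
        score += math.log2((near_hits(word, seed) + smoothing)
                           / (hits(seed) + smoothing))
    for seed in neg:
        score -= math.log2((near_hits(word, seed) + smoothing)
                           / (hits(seed) + smoothing))
    return score


# Made-up counts, purely to exercise the function.
TOY_HITS = {"good": 1000, "excellent": 500, "bad": 800, "poor": 400}
TOY_NEAR = {
    ("wonderful", "good"): 50, ("wonderful", "excellent"): 40,
    ("wonderful", "bad"): 2, ("wonderful", "poor"): 1,
    ("dreadful", "good"): 3, ("dreadful", "excellent"): 1,
    ("dreadful", "bad"): 60, ("dreadful", "poor"): 30,
}


def toy_near(word, seed):
    return TOY_NEAR.get((word, seed), 0)


def toy_hits(seed):
    return TOY_HITS.get(seed, 0)


print(so_pmi("wonderful", toy_near, toy_hits) > 0)  # leans positive
print(so_pmi("dreadful", toy_near, toy_hits) < 0)   # leans negative
```

The small smoothing constant simply avoids taking the logarithm of zero when a word never co-occurs with a seed.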
Ignoring the sad reality that the NEAR operator is no longer available in AltaVista queries (and that no other search engine offers an operator with similar functionality in its public query interface), it’s interesting to think about whether such a technique could be usefully applied to email. I don’t know if people have addressed how to move from word-level classification up to message-level sentiment classification, but it doesn’t seem to be an insurmountable problem.
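One naive way to move from word scores to a message score, offered purely as an assumption rather than an established method, is to average the valence of the words we have scores for. In the sketch below, `word_scores`, the tokenising pattern and the `threshold` parameter are all hypothetical.

```python
import re
from statistics import mean


def message_orientation(text, word_scores, threshold=0.0):
    """Label a message by averaging the valence of its known words.

    word_scores maps a lowercase word to a valence score (hypothetical input,
    for example the output of a word-level classifier).
    Returns a (label, average) pair.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    scored = [word_scores[t] for t in tokens if t in word_scores]
    if not scored:
        # Nothing we recognise: make no claim either way.
        return "neutral", 0.0
    average = mean(scored)
    if average > threshold:
        label = "positive"
    elif average < -threshold:
        label = "negative"
    else:
        label = "neutral"
    return label, average


# Made-up word scores, purely for illustration.
scores = {"thanks": 2.0, "great": 3.0, "unacceptable": -4.0, "late": -1.0}
print(message_orientation("Thanks, great work!", scores))
print(message_orientation("This is unacceptable and late.", scores))
```

Averaging ignores negation, sarcasm and word order, so it is at best a starting point for experiments.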
More of an issue for email is whether people would be happy for the entire text of their email messages to be sent in clear text to a single search provider. Depending on the volume and nature of data on a user’s own machine, perhaps we could use the desktop search interface to approximate Turney and Littman’s technique without passing sensitive email data out onto the network? Of course, there’s a big difference in the scale of the corpus being used to generate the co-occurrence counts in this case: AltaVista, at the time of the experiment, claimed to be indexing around 100 billion words. My desktop search index claims to contain about 1.5 million items (email messages, documents, visited web pages etc.). While that’s not going to get us to 100 billion words, it might be enough to get some credible results.
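To make the local-corpus idea concrete, here is a minimal sketch that counts NEAR-style co-occurrences over documents held in memory, so nothing leaves the machine. The tokeniser, the `window` size and the function names are assumptions; a real desktop search index would expose its own query interface instead.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN.findall(text.lower())


def build_counts(documents, seeds, window=10):
    """Count document-level hits and windowed co-occurrences with seed words.

    Returns (doc_hits, near_hits):
      doc_hits[word]          -> number of documents containing word
      near_hits[(word, seed)] -> number of documents in which word appears
                                 within `window` tokens of seed, either side
    """
    seeds = set(seeds)
    doc_hits = Counter()
    near_hits = Counter()
    for text in documents:
        tokens = tokenize(text)
        doc_hits.update(set(tokens))  # count each word once per document
        pairs = set()
        for i, tok in enumerate(tokens):
            if tok not in seeds:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.add((tokens[j], tok))
        near_hits.update(pairs)  # count each pair once per document
    return doc_hits, near_hits


docs = [
    "The service was excellent and the staff were wonderful",
    "A dreadful, bad experience",
    "wonderful",
]
doc_hits, near_hits = build_counts(docs, seeds={"excellent", "bad"})
print(doc_hits["wonderful"], near_hits[("wonderful", "excellent")])
```

Counting per document rather than per occurrence mirrors the way a search engine reports matching pages, but either choice would be defensible for an experiment at this scale.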
4 Comments so far
Turney and Littman also ran their experiments on 10 million words and got about 61% accuracy at the word level using PMI and 65% using LSA. But LSA is fairly computationally expensive.
However, that’s word-level classification and you want to classify documents. Classification of entire documents gives better results (more features are available), so your desktop might provide enough data (although supervised training, as always, would be better).
Comment by Mary 01.23.08 @ 7:39 am
Well, have a look at http://www.jane16.com; it’s an online sentiment analysis engine (downloadable), and you’ve just given me the idea of using it for email sentiment analysis as well.
The area is a bit more complex because there are several types of sentiment (product, financial, emotional, political, plus I am sure many more).
If only Gmail (which I use) had some sort of plugin interface, it would be great to try it out.
Comment by 123kuko321 03.10.09 @ 8:02 am
Thanks for the pointer to Jane16 – looks really interesting, and excellent to see some open source tools available in the sentiment analysis space. It’s definitely on my list of tools to download and explore.
I agree there are lots of different domains for sentiment analysis, as you mention, and also different scales of conceiving positivity, negativity and neutrality. I wonder how well Jane16 might work on, say, business email (rather than email in general)? Or helpdesk email? I’m just wondering whether you can restrict the domain somewhat to get more consistent results. I’d be really interested to hear about any results if you do try running your tools over email data.
Thanks for dropping by. I look forward to hearing more about your work.
Cheers,
Andrew
It looks like Jane16 isn’t available for download any more. I’d be very interested in any other open source sentiment analysis tools.
–Trindaz on Fedang #sentiment-analysis
Comment by Dave 10.03.10 @ 12:16 am
