Posted August 24, 2009 10:01 am with 16 comments

The New York Times takes a look at the developments in automated sentiment analysis. I was intrigued by one company’s analogy:

“This is a canary in a coal mine for us,” said John Whelan, StubHub’s director of customer service.

Which is about where I stand on automated sentiment analysis: it’s a blunt tool at best that’s still many years away from fulfilling its potential.

I’m not entirely against sentiment analysis–70% accuracy is better than 0%–but I continue to be concerned that businesses are lulled into a false sense of security by it. After all, would you walk into a coal mine with a bird that has a 30% chance of getting it wrong about dangerous gas levels? I know I wouldn’t.

The problem is that most sentiment analysis algorithms rely on us using simple terms to express our sentiment about a product or service. If it were as easy as identifying “I love BestBuy” or “I hate the iPhone” then we could all build a database of keywords and sentiment analysis would be 100% accurate. Unfortunately, the English language–or any language for that matter–isn’t that simple.
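To make that limitation concrete, here is a minimal sketch of the keyword-database approach (the tiny lexicon and example sentences are invented for illustration). It handles the simple "I love X" / "I hate Y" cases, but a fixed word list misreads context-dependent terms:

```python
# A crude keyword-lexicon sentiment scorer; the lexicon is invented for illustration.
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "terrible", "sinful"}

def keyword_sentiment(text: str) -> str:
    """Count lexicon hits and return an overall label; crude on purpose."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(keyword_sentiment("I love BestBuy"))     # positive
print(keyword_sentiment("I hate the iPhone"))  # negative
# Context defeats the lookup: "sinful" is praise when applied to
# chocolate cake, but a fixed lexicon calls it negative.
print(keyword_sentiment("This chocolate cake is sinful!"))  # negative
```

A real system would need, at minimum, negation handling and context-specific lexicons, which is exactly where the simple database of keywords breaks down.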

“Sentiments are very different from conventional facts,” said Seth Grimes, the founder of the suburban Maryland consulting firm Alta Plana, who points to the many cultural factors and linguistic nuances that make it difficult to turn a string of written text into a simple pro or con sentiment. “ ‘Sinful’ is a good thing when applied to chocolate cake,” he said.

I recently spoke to a very large technology firm that had tried just about every social media measurement tool available, and it expressed dissatisfaction with the accuracy of such automated guesswork.

Still, it’s not all bad news. Once coal mining companies realized the importance of canaries, they started developing better gas detection technologies. The same will happen with sentiment analysis–it will get better. It has to! There are now hundreds of millions of us willing to share our sentiment online, and thousands of companies starting to listen in to these conversations. Those two factors will lead to better accuracy in automated sentiment analysis.

Until that day comes, I stand by my assertion that the most accurate sentiment analysis continues to be of the human variety. Only you can determine–with 100% accuracy–if a blog post about your company is to the benefit or detriment of your reputation. Until technology improves, we’ll have to continue acting as our own canaries. 😉

  • Simon Heseltine

    …and it’s not just the language and semantics that are an issue. What one company may deem a neutral comment, another may decide is positive. For example, if one of their staff gets quoted in a mundane industry article, for a large company that’s most likely no big deal; for a small company it could be viewed as a positive, since their employee has been identified as an industry expert.

    Therefore I absolutely agree with you, Andy: no matter how much automation you build in, you’re going to need a human eye to make a final determination in most cases.

  • Hi Andy,
    It is true that automated sentiment analysis is not 100% accurate – and, obviously, neither is human analysis – so I think there is a smart and useful way to integrate some form of automation into the process to help with both efficiency and accuracy.

    I agree that things are getting better – and cheaper – and faster – so time will tell where this all leads us because you are correct, there’s a whole lot of sentiment out there these days that companies need to know about.

  • I think of those tools like a metal detector: sure, it beeps when it finds something, but it’s still up to you to expend the energy to dig in up to your elbows and find the nugget.
    Phil Buckley’s last blog post: Does Your Business Need A Social Media Czar?

  • The key is not to create and offer tools for automated sentiment analysis, but rather to create and offer tools that aid human analysis. And to speak honestly about both. Very few vendors are doing either.

  • Joe, GREAT point, buddy. If we need tools to help humans work more quickly, do you think a day will come when using Mechanical Turk allows companies to scale the review of commentary quickly, with those workers just bucketing content to be reviewed by the social media strategists in the US? Heck, if Jott can do it with transcription, maybe these tools can outsource the bucketing to India at a cost-effective clip as well?
    Wil Reynolds’s last blog post: The Dos & Don’ts of Google Base: Avoid the Headaches & Keep Making Money

  • Hi Andy,

    Thanks for the post. I agree that sentiment can be semantically tricky and that something is better than nothing. What is even more important in this age is that having all of these new data research capabilities forces companies to be user/customer centric. As you know, reputation may be one of the most valuable assets a company can have these days, especially if you’re in the business of serving customers – which is everyone, last time I checked.

    Beyond reputation, I think it’s important for companies to understand how to measure their efforts by using social media monitoring tools. For instance, how are revenue, reach, and return on operations being measured in the various tools on the market? Are they at all? I’d love to see one that does.

    Phil, I was thinking the same thing this morning re: metal detectors. It still takes a human to dig in the dirt, and it always should, since we are using these tools to connect with and influence people’s perceptions, as Joe alludes to. It’s supposed to be authentic, right? =)
    Jason Cronkhite’s last blog post: How Do You Measure Social Media?

  • Thanks everyone for your thoughts. Very interesting, please keep them coming! 🙂

  • Sentiment analysis certainly is a dead end. Even the most advanced Natural Language Processing (NLP) technology, such as the PARC engine that was just sold to Microsoft (for lots of money), would have a hard time dealing with the unstructured content we see online (especially on Twitter).

    And why bother? It’s just one step on a much longer staircase of online media metrics:

    Volume (how many) > Meaning (what is it) > Intent (why is it) > Results (why should I care) > P&L (how much)

    A good guideline for developers working with computing is to automate the things that machines can do better than people, and leave everything else to people. By the time machines can do sentiment analysis in a sensible way, we’ll be well up the stairs and it won’t even matter anymore.

    Have fun!
    Mikko Kotila

  • Hi Andy,

    I agree to some extent with what you are saying in your post.
    You ask: “would you walk into a coal mine with a bird that has a 30% chance of getting it wrong about dangerous gas levels? I know I wouldn’t.” But given that humans interpret information with only about 80% accuracy, the gap between humans and the “computer” is not that big.

    For a machine to understand and interpret text (a tweet, blog post, article, or comment about a product) it needs deep semantic knowledge, like a human, which is why Saplo bases its technology on leading (academically accepted) research into how the human brain processes, interprets, and recalls information. In other words, Saplo imitates how the human brain interprets information.

    This is not to say that we can leave it all to the computer. Joe Hall (above) said it nicely “The key is to not create and offer tools for automated sentiment analysis, but rather to create and offer tools that aid human analysis.”

    Peter Larsson

    PS. Saplo has documented results, in comparison with human analysts (80% accuracy), showing 80–90% accuracy.

  • @Peter – I agree that human analysis can be on par with computers, but not when the human is determining the sentiment for their OWN brand. I can tell, with 100% accuracy, if a review of Marketing Pilgrim is positive or negative–because only I know if it helps or hurts me.

  • Hi Andy:

    First let me congratulate the NYT for noticing that we are doing this. When we started up MotiveQuest about 6 years ago – BuzzMetrics and Cymphony were already up and running.

    Second, @Joe has it right above. Use computers for what they are good at (pattern recognition) and use humans to define the patterns the computers look for.

    Our software tools are designed so that the strategist adjusts the linguistic model for every context and project. “Sick” (obviously) doesn’t mean the same thing in online gaming as it does in osteoporosis. This makes the tools quite complex to master – but much more effective at generating accurate, relevant, and useful sentiment scores.

    @Mikko has a great point with this comment:

    Volume (how many) > Meaning (what is it) > Intent (why is it) > Results (why should I care) > P&L (how much)

    Our work is predicated on getting to the why and then measuring real-world results. Otherwise you are just counting mentions and sentiment, which in and of themselves are meaningless.

    Tom O’Brien
    MotiveQuest LLC
    Tom O’Brien’s last blog post: Advocates are more important than influencers

  • My feeling about many of the sentiment analysis tools out there is that the sentiment categories are too broad in scope. For example, positive/negative/neutral categories are good for an overview, but what about more specific terms like “enthusiastic” or “vitriolic”, which would be more telling?

    My own company, Adaptive Semantics, confronted this particular issue in developing our first product: a community moderation system for online publishers. Initially we were focused on weeding out abusive content and making publish/review/delete recommendations. Not satisfied with the results from that version, we opted for a much more specific set of terms (critical of publication, discriminatory, violent threats, congenial, informative, etc.). Now our clients get a much more nuanced sense of their user-generated content, the opinions their communities have about particular subjects, and their biases in regard to them.

    Having been on this sentiment analysis “journey”, I have to say that training an algorithm to recognize specific terms is far easier, and more rewarding in terms of results, than training on broad categories.

    @Simon In regards to customization: this is also something I’ve noticed is lacking in most sentiment analysis tools. In dealing with our own clients we find differences in their community and editorial standards (the same goes for the PR standards of corporations). You have to be able to customize the tool to these standards in order to achieve the best results.

    As for accuracy, we’re at 90% and climbing – a testament to what we call principled training.

  • Andy,

    Interesting post. I think Tom from MotiveQuest and Christine from Lexalytics did a nice job of articulating things. I would agree that at this stage in the evolution of the space, automated sentiment is certainly not 100% accurate, but it certainly aids in the dissection of data for action by brands that have a lot of it to sift through. Human analysis is also very important, because there is an interpretive layer of insight that varies depending on what brands may be trying to evoke in consumers. With our technology we have seen accuracy in the 65 to 85+% range, but also variations and unique differences by vertical, affinity, B2C, and B2B. Sentiment is a very important piece of the puzzle for folks interacting in the social space, and I am certain that we will continue to see technology improvements, but human reasoning and interpretation won’t be disappearing anytime soon either.


  • Hi Andy,

    Thanks for writing this post; we have to agree with you on certain points and were very interested in the NYT piece. While automated analysis has its benefits, it is not yet much of an option for an international monitoring system, or for anyone who wants to monitor their brand outside of English-speaking countries.

    While there may be certain software programs that have developed an impressive sentiment analysis accuracy rate, monitoring in countries like China and Russia, as we do, cannot be left to an automated program.

    Humans are the most accurate for working out the intricacies of a language and determining its sentiment as related to a product, brand, or company. While we are developing certain automated functions, we ultimately use native speakers of the countries in which we monitor.

    Phil made a great point about putting in the elbow grease to find out where the pearls are hiding. We would like to make everything automated, wouldn’t we? Would we?
    Thanks again, Andy, it was great to see all of the responses.


  • Pingback: Big brands, small ideas | b r a n t s

  • Pingback: Sentiment140 – can it be trusted? | JOUR2722