Twitter Predictions: the future is just 140 characters away

August 1st, 2011 by

Consider Twitter: 200 million users, 7 languages, 1 billion tweets a week. Since it was founded in 2006, the site has been hyped, ridiculed, subpoenaed (by the U.S. government, no less) and accused of starting revolutions. From Lady Gaga to the Pope, everybody tweets.

For a Web 2.0 giant, Twitter’s functionality is surprisingly basic. But (like Ikea furniture and Forrest Gump) Twitter’s simplicity is its strength. Everyday tweeters create a vast, searchable dataset of thoughts and opinions – a 140-character global mood meter. Far from being mere “cyber-babble”, our collective tweets contain valuable information: what movies are popular, who we plan to vote for, how a whole nation feels. Believe it or not, there’s growing evidence that if you ask the right questions, Twitter can even predict the future.

Movies, moods and markets
A 2010 study by HP Labs demonstrated that the “tweet-rate” of pre-released movies can accurately predict future ticket sales. Basically, the more mentions a movie gets on Twitter (positive or negative) the bigger the box-office success. Similar techniques have been used to successfully predict election outcomes in the UK and US, and (here’s the really crucial example) X-Factor and American Idol winners.
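To make the tweet-rate idea concrete, here is a minimal sketch of how mentions per hour might be related to ticket sales with a straight-line fit. It is only an illustration, not the HP Labs model: the function names and every number in it are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: pre-release tweets per hour for four past movies,
# and their opening-weekend takings in millions of dollars.
tweets_per_hour = [10.0, 20.0, 30.0, 40.0]
opening_gross_musd = [5.0, 9.0, 13.0, 17.0]

slope, intercept = fit_line(tweets_per_hour, opening_gross_musd)


def predict_gross(rate):
    """Predict opening gross (in $M) for a new movie from its tweet rate."""
    return slope * rate + intercept


print(round(predict_gross(50.0), 2))
```

A real analysis would need far more movies and would have to check how well the line actually fits before trusting any forecast.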

In a more sophisticated Twitter experiment, Dr Johan Bollen of Indiana University used mood-profiling software to analyze the actual content (rather than just the volume) of millions of tweets. He found that the “Twitter mood” of America closely corresponded to national events. On Thanksgiving, tweeters were unusually happy; just before the presidential election, they were unusually anxious. Bollen also found a strange (and potentially very lucrative) link between certain Twitter moods and US stock market prices: after a calm Twitter day, share prices rose; after an anxious day, they fell. Dr Bollen is reported to have licensed his Twitter-prediction method to a London hedge fund (and so will presumably be retiring to a Caribbean island very soon).

Meaning overload
So far, Twitter experimenters have relied on relatively simple language-processing software to extract data from tweets. To get deeper insights, you need to do deeper analysis. The trouble is that human language is notoriously difficult for machines to interpret. Give humans 140 characters and we insist on making jokes, using sarcasm, and loading statements with double meanings. Of course, my professional instinct is to suggest crowdsourcing as a solution, but even I have to admit that using microworkers to analyze millions of tweets per day is (just a little) impractical. But the crowd could still play a role: microworkers could provide deeper analysis of key groups of tweets, or double-check machine accuracy by re-analyzing random tweet samples. Plus, crowd-generated feedback could be used to train and improve language-processing software.
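The double-checking idea can be sketched in a few lines: pull a random sample of machine-labelled tweets, have microworkers label the same tweets, and measure how often the two agree. This is a rough sketch under simple assumptions; the function names, tweet IDs and labels are all hypothetical.

```python
import random


def sample_for_audit(tweet_ids, k, seed=0):
    """Pick up to k tweet IDs at random to send to human reviewers."""
    rng = random.Random(seed)
    return rng.sample(tweet_ids, min(k, len(tweet_ids)))


def agreement_rate(machine_labels, crowd_labels):
    """Share of crowd-reviewed tweets where the machine gave the same label."""
    shared = [t for t in crowd_labels if t in machine_labels]
    if not shared:
        return None  # nothing to compare
    matches = sum(1 for t in shared if machine_labels[t] == crowd_labels[t])
    return matches / len(shared)


# Hypothetical labels keyed by tweet ID.
machine = {1: "positive", 2: "negative", 3: "positive", 4: "negative"}
crowd = {1: "positive", 2: "positive", 3: "positive", 4: "negative"}

audit_ids = sample_for_audit(list(machine), k=2)
print(audit_ids, agreement_rate(machine, crowd))
```

The tweets where the crowd and the machine disagree are the natural candidates to feed back as training examples.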

With a little imagination, the possibilities of “Twitter mining” are endless, from advertisers tracing product recommendations in tweets to politicians pre-testing reactions (positive, negative, incurably cynical) to government policies. One prediction I’m prepared to make is that the future will hold a lot more research into the predictive powers of social networking. In the meantime, I guess we all just keep on tweeting.


  • http://twitter.com/david_tinker David Tinker

    At BrandsEye we use crowd sourcing to rate sentiment etc. for brand mentions in tweets and other content. We use the human verified data to train our machine learning algorithms as you suggest. It is working out very well so far.

