Kaggle: crowdsourcing genius or statistical anomaly?

November 14th, 2011

In 2009 Hal Varian, Google’s chief economist, famously claimed that “the sexy job in the next ten years will be statistician”. It still sounds pretty improbable, right?

I mean, when did you last see a data analyst fighting off screaming groupies? Statistics isn’t even a cool branch of maths.

Real “pure” mathematicians wrestle with the fundamental mysteries of the universe. Statisticians wrestle with pie charts.

Sexiness aside, Hal’s point was (probably) that in our information-driven world, statisticians are a precious, in-demand resource. Without them, companies like Google would drown in a tsunami of data. Recently, Silicon Valley funders proved just how much they appreciate the humble number cruncher, investing over $11 million in Australian data analytics crowdsourcing start-up Kaggle.

Kaggle is a classic “brain-based” crowd competition site. Organizations post statistical problems and Kaggle’s crowd of “the world’s best data analysts” competes to solve them. As well as prize money, Kaggle boasts various gamified incentives such as a real-time leaderboard and “kudos point” rankings. Founded only in 2010, the company has had an impressive first year. Completed competitions include: working with NASA on dark matter (okay, I admit that’s pretty cool), improving the World Chess Federation’s official rating system (still reasonably cool) and accurately predicting the outcome of the Eurovision Song Contest (totally uncool, but very profitable).

Founder Anthony Goldbloom aims to grow Kaggle into a “buzzing hive” able to support “hundreds or thousands of data scientists relying on Kaggle for their full-time incomes”. It’s an ambitious step up from other, older science competition sites (InnoCentive is a classic example), which tend to market crowdsourcing as a rewarding hobby rather than a career choice. Professionalizing will be a major challenge. Can Goldbloom really guarantee enough competitions to support thousands of workers? At the time of writing, Kaggle has over 17,000 data analysts and only 12 active competitions. Okay, there are a couple of big prizes up for grabs, but I certainly wouldn’t want to depend on Kaggle for a regular income.

Investors clearly see potential in Kaggle. Will the company “buck the trend” and manage to convert all the money and media-hype into statistically significant growth? Perhaps we should get the Kaggle crowd to calculate the probabilities.