Is that a hard drive in your pocket?
May 12th, 2010 by Ville Miettinen
There are many subjects you should avoid on a first date. Along with “diseases I have had” and financial problems that mean you won’t be able to pay for dinner, it’s better not to brag about how big your hard drive is. Yet for those of us who were young and interested in computers in the 1980s – when a 20MB hard disk was more desirable than Demi Moore – this can be tricky: there is something awe-inspiring about new data storage technology.
But even amongst those who claim they are not aroused by the smooth, sleek finish of a pocket-sized 1.5TB hard disk, it seems many are excited by the opportunities presented by the data it stores. It doesn’t matter who you are: big data is a big deal. With enough raw data, today’s algorithms can identify business trends, prevent diseases and fight crime. As The Economist recently said in a special report, data is “becoming the new raw material of business: an economic input almost on a par with capital and labor”.
According to International Data Corp, 1,200 exabytes (that’s 1.2 billion terabytes) of information will be created this year. That volume increases tenfold every five years. Driving this explosion is the growth of the middle classes around the world, and the proliferation of technology such as mobile phones and digital cameras: there are now 4.6 billion mobile phone subscriptions and 1-2 billion people using the net around the world.
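For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes decimal units (1 exabyte = 1,000,000 terabytes) and, purely for illustration, that the tenfold-in-five-years growth compounds at a constant yearly rate; the function names are made up for this example and the projection is not an IDC forecast.

```python
# Figures quoted in the post (attributed there to IDC).
EXABYTES_2010 = 1_200
TERABYTES_PER_EXABYTE = 1_000_000  # assumption: decimal (SI) units


def exabytes_to_terabytes(exabytes: float) -> float:
    """Convert exabytes to terabytes using decimal units."""
    return exabytes * TERABYTES_PER_EXABYTE


def annual_growth_factor(multiple: float = 10.0, years: float = 5.0) -> float:
    """Constant yearly factor that compounds to `multiple` over `years`."""
    return multiple ** (1.0 / years)


def projected_exabytes(years_ahead: float) -> float:
    """Illustrative projection, assuming the growth rate stays constant."""
    return EXABYTES_2010 * annual_growth_factor() ** years_ahead


if __name__ == "__main__":
    print(f"{exabytes_to_terabytes(EXABYTES_2010):,.0f} TB created this year")
    print(f"{(annual_growth_factor() - 1) * 100:.0f}% growth per year")
    print(f"{projected_exabytes(5):,.0f} EB per year five years on")
```

Under those assumptions, tenfold growth every five years works out to roughly 58% a year, which is why the storage problem never seems to get any smaller.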
Along with finding somewhere safe to store all this information, the key challenge is making sense of it all. Not only is there an enormous quantity, but only 5% of it is structured in a standard format of words and numbers that a computer can easily recognize. The rest is things like photos and phone calls.
Even amongst the data that is structured, differing format and metadata standards make it hard for the data to flow from one place to another. As Dataspore noted, this means, for example, that the SEC is unable to compare its financial data with that of its European equivalent, because they lack common formats and labels (such as XBRL). In this case a common language would have helped regulators to spot irregularities and potentially avoid the disaster that Bernie Madoff created.
As you would imagine, ever faster computers and improved algorithms are crucial for managing this data. But there is still a huge amount of data that computers are unable to recognize. One way to filter and label this information is crowdsourcing. A good example came after the Haiti earthquake, when volunteers logged on to Haiti.com to translate, verify and generally help make sense of the vast amount of information pouring in. Samasource currently pays hundreds of people from the world’s poorest countries to complete various tasks, including filtering information, while The Extraordinaries uses volunteers to help with tasks such as tagging photos for museums.
For some time companies have been aware of the opportunities presented by this tidal wave of data – if it can be properly harnessed. Its power to change the world is immensely valuable. (Google’s access to vast amounts of data exhaust helps explain its eye-popping $170Bn valuation.) Perhaps this is why Google’s chief economist Hal Varian has said that the sexy job in the next ten years will be statisticians: those with the ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it. If true, one day soon talking about the size of your hard drive won’t be such a first-date disaster. Although still rather dull, at least it might indicate that you’ll be able to pay for dinner.