Distributed work and data security: can the crowd keep a secret?

May 23rd, 2011 by

In these post-WikiLeaks days, many people (and governments) might argue the only way to keep data confidential is to keep it offline. After all, the web was designed to link and share information. Online, as Sun Microsystems co-founder Scott McNealy once tactfully remarked, “You have zero privacy. Get over it.”

The trouble with the “McNealy philosophy” is that the web is now the place where millions of people go to work, as well as play. Even the most innocent, open, non-evil companies generally have some data – such as personal information, research and development strategy, secret Santa lists – they need to keep secure.

Distributed work platforms face a unique and particularly knotty data security dilemma. If your business model relies on distributing client data among a vast (often anonymous) crowd of strangers, how do you make sure that any confidential information stays on the QT?

Crowd control
Six years after the launch of Mechanical Turk, crowd labor platforms have become pretty good at extracting high-quality, consistent results from workers. But in some ways, confidentiality is even more crucial than accuracy. A bad data set and you’ll probably have to rerun some tasks. Bad data security and you might end up several million dollars worse off (as Google Buzz recently learnt the hard way).

Many of the basic “bread and butter” tasks of distributed work involve potentially sensitive data. Think of phone transcription, handwriting recognition, email address searches and SMS translation. There’s also the growing area of crowdsourced market research and product testing. What if someone in the crowd reveals your killer new mobile app (Moderately Irritated Frogs) to the folks at Rovio?

C.S. confidential
It’s up to individual distributed work platforms to figure out how to deal with data security. One option is simply to do nothing. This sophisticated strategy is employed by the granddaddy of crowd labor, Mechanical Turk. To quote from their participation agreement:

“submission of any…materials is at your own risk, none of Amazon Mechanical Turk, its Affiliates, Requesters or Providers has any obligations (including without limitation obligations of confidentiality) with respect to such materials.”
In other words, it ain’t our fault if you’re dumb enough to spill your secrets here. Unsurprisingly, you don’t see much that would interest Julian Assange in Mechanical Turk requests.

Most service providers are more hands-on. Editing and translation platform Serv.io guarantees that: “All Servio workers sign an agreement that prohibits them from using information for any purpose other than carrying out their Servio tasks.” Even more hardcore, crowdsourced transcription company CastingWords operate a strict one-strike-and-you’re-out policy: “workers understand that the work is confidential, and that they will never work for us again if they release it.”

Here at Microtask we prefer to opt for prevention rather than cure. Our strategy is to break material down into tasks so small that confidentiality ceases to be an issue. Individual workers never get hold of enough puzzle pieces to see the big picture.
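For the curious, here is a minimal sketch of the puzzle-piece idea in Python. It is an illustration only, not a description of Microtask's actual system: the function names, the three-word fragment size and the one-fragment-per-worker cap are all assumptions made up for this example.

```python
import random
from collections import defaultdict


def split_into_fragments(text, words_per_fragment=3):
    """Cut text into consecutive fragments of a few words each.

    The fragment size is a hypothetical parameter, not a real platform setting.
    """
    words = text.split()
    return [
        " ".join(words[i:i + words_per_fragment])
        for i in range(0, len(words), words_per_fragment)
    ]


def assign_fragments(fragments, workers, max_per_worker=1, seed=None):
    """Hand each fragment to a random worker, capping how many fragments
    of the same document any single worker can receive."""
    if len(workers) * max_per_worker < len(fragments):
        raise ValueError("not enough workers to respect the per-worker cap")
    rng = random.Random(seed)
    # Give every worker max_per_worker "slots", shuffle them, then deal them out.
    slots = [w for w in workers for _ in range(max_per_worker)]
    rng.shuffle(slots)
    assignment = defaultdict(list)
    for index, (fragment, worker) in enumerate(zip(fragments, slots)):
        # Keep the index so the client (and only the client) can reassemble.
        assignment[worker].append((index, fragment))
    return dict(assignment)


if __name__ == "__main__":
    pieces = split_into_fragments(
        "the secret launch date is the first of june next year"
    )
    crowd = ["worker-1", "worker-2", "worker-3", "worker-4", "worker-5"]
    for worker, work in assign_fragments(pieces, crowd, seed=42).items():
        print(worker, work)
```

With a cap of one, no worker ever sees more than three words of the original sentence. A real platform would need more than this (for example, guarding against workers who compare notes), so treat it as a picture of the principle rather than a security design.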

The tasks are out there
As the crowd labor sector develops, projects are becoming bigger, more customized and more complex. Some clients have data requiring different levels of confidentiality. Leaving microtasks aside, the most economical solution for such clients will be platforms that can categorize data according to sensitivity and treat it accordingly. For example, tasks rated highly classified (X-tasks perhaps) could be restricted to pre-selected, traceable workers who have signed legally binding non-disclosure agreements. Really ultra-confidential tasks (like transcribing the CIA’s Roswell archives or KFC’s secret recipe) could even be passed back to employees within the client company (who are later killed, I suppose).
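To make the tiered approach a little more concrete, here is a rough Python sketch of routing tasks by sensitivity. The three levels, the `Worker` fields and the `eligible_workers` function are hypothetical names invented for this example; no existing platform's API is being described.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity levels a client might attach to a task."""
    OPEN = 0          # anyone in the crowd may work on it
    X_TASK = 1        # pre-selected, traceable workers under NDA only
    CLIENT_ONLY = 2   # passed back to the client's own employees


@dataclass(frozen=True)
class Worker:
    name: str
    signed_nda: bool = False       # has signed a non-disclosure agreement
    client_employee: bool = False  # works inside the client company


def eligible_workers(level, workers):
    """Return the workers allowed to see a task of the given sensitivity."""
    if level == Sensitivity.CLIENT_ONLY:
        return [w for w in workers if w.client_employee]
    if level == Sensitivity.X_TASK:
        return [w for w in workers if w.signed_nda]
    return list(workers)


if __name__ == "__main__":
    crowd = [
        Worker("anonymous-turker"),
        Worker("vetted-transcriber", signed_nda=True),
        Worker("colonel-sanders", signed_nda=True, client_employee=True),
    ]
    for level in Sensitivity:
        names = [w.name for w in eligible_workers(level, crowd)]
        print(level.name, names)
```

The point of the sketch is only that the filter runs before any data leaves the platform: a worker who is not eligible never receives the task in the first place.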

Data security will be a key issue for the crowdsourcing industry. It’s a simple equation: the more companies trust us to keep their information confidential, the more work we’ll get (and vice versa). If anyone has any thoughts or juicy crowdsourcing stories to share, as always we’d love to hear from you. Please check your confidentiality terms first.