VizWiz: what the crowd sees is what you get
June 9th, 2011 by Ville Miettinen
For most people, the phrase “I just couldn’t live without my iPhone/Android/BlackBerry” is just a figure of speech. We might love (or, in the case of Apple fanboys, worship) our gadgets, but most of us could probably still function without 24-hour access to touch-screens, wifi, and Plants vs. Zombies. For many disabled people, however, access to technology can be literally life-changing.
Over the past couple of decades, there’s been a quiet revolution in “disabled access technology”: unsexy, practical applications that help people get online and get on with their lives. There are now dozens of purpose-built accessibility apps that incorporate assistive technologies such as OCR and speech recognition. These are impressive advances, but human-computer interaction expert Jeffrey Bigham believes that, with the help of the crowd, things are about to get much, much better.
As Professor Bigham points out, purely automated technologies often struggle to cope with the “infinite variability” of reality. Here at Microtask, we’re very familiar with this phenomenon. The whole Digitalkoot project (did I mention we now have almost 50,000 users?) is founded on the fact that OCR software can’t reliably read old or handwritten documents. Similarly, speech-recognition software, which converts sound to text, often fails to understand strong accents and multiple voices (ideal if you just happen to be a lonely elocution expert). Bigham’s solution is to back up the “fragile” technology with help from online workers (or, as he calls them, “always-available human-powered services technology”).
To test the theory, Bigham and his team created VizWiz, an iPhone app that “enables blind people to recruit remote sighted workers to help them with visual problems in nearly real-time.” It works like this: users take a picture of whatever they need to identify, record a spoken question, and send both to the crowd. Questions can be anything: What flavor are these noodles? Do these socks match? Is that guy at the bar still sitting on his own? VizWiz workers examine the photo and send back an answer. Simple.
Researchers trialled VizWiz with a group of blind users. The results were generally positive, with participants “uniformly excited” about the potential of the system. The average response time was 67 seconds, at a cost of $0.07 per question. A second trial (VizWiz 2.0), with better photo software and a larger pool of workers, cut the average time to 27 seconds (although at an increased cost).
Killer app or stop-gap?
VizWiz is clearly an idea with potential, but is it a long-term solution or just a quick fix while we wait for AI to catch up with the crowd? Are crowd-computer collaborations doomed to be the MiniDisc (as opposed to the MP3) of disabled access tech? Already, fully automated apps like LookTel Money Reader offer cheap solutions to specific visual problems. Plus, VizWiz is still very much an academic research project. Will fresh-faced Team Bigham be able to compete in the big bad commercial world?
Reading about VizWiz I was struck by how much users liked the concept of working with a human crowd. Several participants even thought the system should allow greater interaction between users and workers. The “humanity” of VizWiz might be its greatest asset. Unlike AI, human crowds can answer complex or even subjective questions on pretty much any subject. With a bit of imagination (and some serious investment) VizWiz could be developed from a simple identification tool into a unique visually-aided Q&A service. Now that really would be something worth seeing.