See hear: crowdsourced subtitles for everyday life

August 9th, 2012 by

As anyone who has watched a few movies with unofficial subtitles will know, quality control can be a bit of an issue. But even in the world of terrible, inaccurate subtitles, nothing comes close to the awful poetry that is Star War the Third Gathers: Backstroke of the West (better known as Star Wars Episode III: Revenge of the Sith). The homemade subtitles that accompany this pirated version of George Lucas’ blockbuster are so wildly mangled that the film has achieved cult status. (To give you a taste, in this version “Obi Wan Kenobi” is ingeniously translated to “Ratio the Tile.”)

Thankfully, the mastermind behind Backstroke of the West has absolutely nothing to do with an incredible piece of subtitling software in development at the University of Rochester.

I see what you mean

The software was created to provide deaf people with real-time subtitles for everyday life. Currently, deaf people are forced to rely on expensive professional transcribers, who are only available at certain times. The Rochester research team, led by Walter Lasecki, instead relies on Mechanical Turk users to provide live transcription for a fraction of the cost, day or night.

As well as helping deaf people navigate the noisy world, this crowdsourced transcription system is perfect for travelers who don’t speak the local language. The researchers plan to launch a smartphone app called Legion:Scribe this October, which will help you pay too much for souvenirs in almost any language you can think of.

The possibilities are truly exciting. The app could transform your smartphone into a Star Trek-style Universal Translator, letting you talk to absolutely anyone. If the researchers manage to hit their goal of a near-instant transcription, you could even watch a play, lecture or movie, no matter where you are in the world.

But what prevents you from getting nonsense like Backstroke of the West? Star Wars Episode III came out so mangled in this translation (apart from the input of the lizard-being who replaced George Lucas in 1989) partly because it’s the work of just one person. Apparently this person translated what he heard into Chinese, then fed it through a machine translator to get the hilariously mangled English subs.

Listen and earn

Legion:Scribe breaks up the stream of sound into bite-size chunks, and members of the crowd transcribe each chunk for a small fee. Just as we’ve found at Microtask, the researchers noted that smaller tasks led to improved accuracy, and that having multiple members transcribe each chunk lets the work be verified automatically. As Mechanical Turk does not have such a system in place (unlike some crowd computing companies we could mention), the researchers had to create their own algorithm, which is currently in the beta-testing phase.
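
To make the chunk-and-verify idea concrete, here is a minimal Python sketch. It is not the Rochester team's algorithm, whose details aren't covered here. It simply assumes fixed-length chunks and a majority vote among workers, and the function names (split_into_chunks, verify_chunk) and the min_agreement threshold are made up for illustration.

```python
from collections import Counter


def split_into_chunks(duration_s, chunk_s):
    """Cover a stream of duration_s seconds with (start, end) chunks.

    Assumption: fixed-length, non-overlapping chunks. The last chunk
    may be shorter than chunk_s.
    """
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't split votes."""
    return " ".join(text.lower().split())


def verify_chunk(transcripts, min_agreement=2):
    """Return the most common transcript if enough workers agree, else None.

    Assumption: simple majority voting on whole-chunk text. A real system
    would more likely align and merge partial transcripts word by word.
    """
    if not transcripts:
        return None
    counts = Counter(normalize(t) for t in transcripts)
    best, votes = counts.most_common(1)[0]
    return best if votes >= min_agreement else None


if __name__ == "__main__":
    # Three hypothetical workers transcribe the same chunk.
    answers = ["Hello there", "hello  there", "yellow hair"]
    print(split_into_chunks(25.0, 10.0))
    print(verify_chunk(answers))
```

A chunk where no two workers agree comes back as None, which is the cue to send it out to the crowd again rather than show a guess to the user.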

The beta version of Legion:Scribe is 74% accurate when transcribing ordinary conversation, compared with an average of 88.5% for professional transcribers. This 14.5-point difference is significant, but the researchers aim to tweak the software and close the gap before the final version hits the app store.

The Legion:Scribe project is an inspiring illustration of what human computing can achieve. By combining innovative research with the existing power of the crowd, the Rochester research team isn’t just helping millions of deaf people to communicate. They may also demolish the spoken language barrier entirely.

Like the Descriptive Camera, which I discussed back in May, the Legion:Scribe app has gone from idea to reality at an astonishing speed. The fact that it already generates reasonably accurate results demonstrates how far our understanding of distributed labor has developed in the last few years. Let’s just hope that it won’t prevent entertaining disasters like Backstroke of the West from popping up every now and then!

  • jandriene

    I have to say, the comment about “transcribers” having an 88.5 percent accuracy rate is befuddling… “realtime broadcast captioners,” as they are termed, write live television broadcasts with an accuracy rating of at least 98 percent. Anything less than that is being done by “voice to text” systems, which cannot compare to a real, live, intelligent captioner!