SpeakerText: serious about subtitles
March 23rd, 2011 by Ville Miettinen
The 1980s was a strange decade. An era of big business, big hair and really big mobile phones. It was also the golden age of the VHS video player.
Thanks to a cutting edge piece of technology called the rewind button, we could enjoy classic scenes from Star Wars, The Karate Kid and Crocodile Dundee again and again (although watching the Death Star explode too many times ruined the tape).
Fast-forward thirty years and video technology is a lot more sophisticated (as is my taste in movies, well, most of the time). On the surface, the internet seems like the perfect video host: you can stream, download, enjoy live TV, and risk losing brain cells on YouTube. But one part of the modern movie experience is still stuck in the VHS era. The only way to search inside an online video is to sit down and watch it through.
Off the radar
The internet was originally designed for text documents (that’s what happens when you put academics in charge of technology). Search engines still only recognize text: video content is invisible.
Most providers solve this problem by SEO-ing the web pages that contain videos (“Search engine optimization” is internet-jargon for sucking up to Google). A more sophisticated solution is video transcription: converting the spoken contents of a video into text and embedding it into the web page.
There are two “traditional” video transcription methods. Option one: get someone (for example a nice lady from California) to watch the whole video and type it out – accurate but expensive and slow. Option two: use machine transcription – cheap but often unintentionally hilarious.
Crowdsourcing startup SpeakerText has proposed a third way (cue drum roll and dry ice). The company claims to “combine artificial and human intelligence to offer low-cost, high-quality video transcription”.
SpeakerText breaks the transcription process into a series of crowd- and machine-based tasks. First, speech-recognition software splits a video into 10-second chunks. These chunks are then (no prizes for guessing the next step) transcribed by Mechanical Turk workers. Finally, editors stick the whole thing back together to form a (hopefully) readable text document.
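For the programmers in the audience, the assembly line above can be sketched in a few lines of Python. This is my own illustration, not SpeakerText's actual code: the chunking, the Turk task, and the editor's stitch are all stand-ins.

```python
# A rough sketch of the chunk -> transcribe -> stitch pipeline.
# Function names and the placeholder transcripts are invented for
# illustration; the real system hands each chunk to a human worker.

def split_into_chunks(duration_seconds, chunk_length=10):
    """Return (start, end) ranges covering the whole video."""
    chunks = []
    start = 0
    while start < duration_seconds:
        end = min(start + chunk_length, duration_seconds)
        chunks.append((start, end))
        start = end
    return chunks

def transcribe_chunk(chunk):
    # Stand-in for a Mechanical Turk task: a worker listens to
    # this 10-second clip and types out what they hear.
    start, end = chunk
    return f"[transcript of {start}-{end}s]"

def stitch(transcripts):
    # Stand-in for the editor who joins the pieces into one
    # readable document.
    return " ".join(transcripts)

chunks = split_into_chunks(35)
document = stitch(transcribe_chunk(c) for c in chunks)
```

The interesting design choice is the 10-second chunk: small enough that a worker can transcribe it quickly and cheaply, but it means someone downstream has to smooth over the seams.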
Search for the content inside yourself
The SpeakerText interface has some neat functionality. All transcriptions are time stamped. You can click on any sentence in the “SpeakerBar” display and the video will start playing from that point. Also, if you copy and paste a quote from the SpeakerBar, the quote becomes a link that starts playing the video at the time where the quote appeared. Hours of fun, huh?
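The quote-to-link trick is simple once every sentence carries a timestamp. Here's a guess at the mechanics in Python; the URL format is an assumption of mine, not SpeakerText's actual scheme.

```python
# Hypothetical sketch of the "copy a quote, get a deep link" feature.
# The ?t= query parameter is an invented convention for "start
# playback at this many seconds in".

def quote_link(video_url, sentence, timestamp_seconds):
    # The pasted quote becomes the text plus a link that starts
    # the video where the quote appeared.
    return f'"{sentence}" ({video_url}?t={timestamp_seconds})'

print(quote_link("http://example.com/talk", "Hello world", 83))
```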
CEO Matt Mireles (whose previous jobs include paramedic, firefighter and journalist) has admitted that when he started the company in 2008, he had no idea what he was doing. Three years, an office in Mountain View and $600,000 of angel funding later, he seems to be getting the hang of things.
The stampede of funders for SpeakerText shows that video publishers are desperate to get their content on Google’s radar. While it’s easy to get caught up in the hot-new-startup hype, success brings its own problems (just ask Barack Obama). SpeakerText is a small, inexperienced company. Will it be able to scale up in response to demand? How about building for mobile devices or providing transcription in other languages? I guess we’ll have to wait and see.
Working the line
Over the past few months we’ve covered a whole bunch of companies that use the “assembly-line” model of distributed work. From ServioTranslate to Mybossisarobot, the crowd-machine workflow is definitely in fashion.