Crowdsourcing and machine translation: the start of a beautiful friendship

Machine-aided translation is one of those things people love to hate. Despite the best efforts of enthusiasts like myself, the majority of computer users still believe that machines are useless translators.

The whole area of machine translation has a terrible image problem. There are endless jokes and “true” stories about computer translation failures. Some of these are very funny (like the machine that apparently translated the English saying “out of sight, out of mind” into “invisible idiot” in Russian). However, with a little crowdsourcing help, I suspect the machines may have the last laugh.

In defense of machines
Sometimes I almost feel there is a conspiracy against computers. Take the entry on Machine-aided translation in the Finnish version of Wikipedia. Far from being neutral, the article’s author seems determined to rubbish all machine translation. He offers readers a Finnish translation of the following English text, done using Google’s translator:

“William Shakespeare (baptized 26 April 1564 – 23 April 1616) was an English poet and playwright, widely regarded as the greatest writer in the English language and the world’s preeminent dramatist.”

“William Shakespeare (kastettu 26 huhtikuu 1564 – 23 huhtikuu 1616) oli Englanti runoilija ja näytelmäkirjailija, laajalti pidetään suurin kirjailija, että Englanti kielen ja maailman preeminent näytelmäkirjailija.”

For readers who do not understand Finnish, let me explain: this is a lousy translation. Of course it is! The Wikipedia author wanted the translation to be lousy just so he could prove his point. People like this guy treat computers like electronic slaves. Instead of learning how machine translation actually works, they just bash the keys, then yell “I told you so” when the machine (quite understandably) fails to deliver.

I prefer to view computers as partners and collaborators. In this spirit, I politely ask Google Translate (I call her GT) to do the work she is best at, and help her with the rest.

When I collaborate with GT, I first convert the Finnish text into what I call Googlish: a simplified version of the Finnish language which GT understands well. The variant of Googlish I use is one I have constructed specially for translating from Finnish into English.

Softly, I whisper: “Please, GT, translate the Finnish national song ‘Maamme’ (Our Land). I will convert the Finnish lyrics into Googlish, then you do the translation, and finally I’ll brush up your text a little bit.”

Here is an extract from our result:

Our country is poor and will remain so,
if it’s gold you want.
A stranger walks by us proud,
but this is the land we love,
its forests, its mountains and its reefs,
they to us are dear.

Dear GT, thank you. It is an honor to be your collaborator and friend!

Inviting the crowd
I’m sure you can see the “crowdsourcing potential” of this human/computer approach. I just ask a native Finnish crowd to do the pre-editing phase (Finnish to Googlish); then, post-translation, I ask a native English-speaking crowd to do the final brush-up. These two crowds can certainly do the work much faster and cheaper than I can (and probably also considerably better).
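For readers who like to see a workflow written down, here is a minimal sketch of those three stages in Python. Everything in it is my own illustration, not a real service: `run_pipeline`, `TranslationResult` and the three `toy_*` functions are hypothetical names, and the toy functions only stand in for the Finnish crowd, the machine translator and the English-speaking crowd.

```python
from dataclasses import dataclass
from typing import Callable

# Each stage takes a text and returns a text.
Step = Callable[[str], str]


@dataclass
class TranslationResult:
    """Keeps every intermediate text so each stage can be reviewed later."""
    source: str
    pre_edited: str
    machine_draft: str
    final: str


def run_pipeline(source: str, pre_edit: Step, machine_translate: Step,
                 post_edit: Step) -> TranslationResult:
    """Run the three stages in order: crowd pre-edit, machine, crowd post-edit."""
    pre_edited = pre_edit(source)                   # Finnish -> "Googlish"
    machine_draft = machine_translate(pre_edited)   # "Googlish" -> rough English
    final = post_edit(machine_draft)                # rough English -> polished English
    return TranslationResult(source, pre_edited, machine_draft, final)


# Toy stand-ins (assumptions for illustration only). In practice each one
# would hand the text to a crowd task or to a translation service.
def toy_pre_edit(text: str) -> str:
    return text.strip().lower()


def toy_machine_translate(text: str) -> str:
    return f"[draft] {text}"


def toy_post_edit(text: str) -> str:
    return text.replace("[draft] ", "").capitalize()


result = run_pipeline("  Example Finnish sentence  ",
                      toy_pre_edit, toy_machine_translate, toy_post_edit)
print(result.final)
```

Because each stage is just a function from text to text, either crowd (or the machine) can be swapped out without touching the other two, and the stored intermediate texts show where a bad translation went wrong.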

Language is a skill that took us humans hundreds of thousands of years to develop. Given that computers have only been “evolving” for a few decades, their language skills are really very impressive. I’m convinced that machine-aided translation has enormous potential to help people understand and communicate better. Just as long as we also learn to understand and communicate a bit better with our machines.

     I like this idea, and obviously there is a lot of potential for this mix of people and computer services. In my group, a colleague worked on "Human Provided Services" and how to integrate them into workflows. I'm studying how to build a simple grammar for coordinating tasks in crowds with Tweetflows. We have created some prototypes and are in the process of creating a (hopefully sound) "scientific" foundation for the latter.

    Machine translation may not be perfect, but anyone who has ever dealt with translation of any sort knows that human translation isn't perfect either. There was a keynote given a few years ago about the challenges facing machine translation. I think crowdsourcing could definitely have a positive effect.

    Nice post!

    The need for different types and levels of translation is definitely not shrinking. There will certainly be a market for all kinds of translation, from pure machine translation to traditional translation (without translation memories or other productivity aids), and everything in between. There can (and will) be translation and editing by professionals and the crowd, and any combination of these and machine translation. In the discussion it sometimes seems to be forgotten that with new tools you can’t directly apply old processes; the whole approach has to change. Believing that machine or crowd translation is produced in the same way as professional human translation is what leads to bad results, not the machines or crowds as such.