Language Lessons: translating the global conversation

February 21st, 2011 by

Over the years, computers have come a long way from being humble adding machines. These days hardware is lightweight and good looking, while some software is so sophisticated it practically comes with Michelin stars. Despite all this progress, one area where machines have traditionally struggled is language translation (remember the bad old days of Alta Vista’s Babel Fish?).

The problem is that human languages are too complex to be easily broken down into computer-proof algorithms. However, as we’ve discussed before, machine translation is now starting to evolve. Google, in particular, has pioneered new techniques that work by using cloud computing to trawl through and analyze huge numbers of multilingual documents online.

On the web, English remains dominant, and businesses still have to speak English to survive (except in Denmark, according to our Travelling Salesman). However, the development of web 2.0, mobile internet and broadband means people are spending more and more personal time online. And, as life gets uploaded, so do languages. Our friends at the social network Xiha Life have even added a “Translate button” to their site interface. Using machine translation, the button allows members (who come from over 200 countries) to switch between languages in real time.

Facing the crowd
While machine translation is slowly improving, crowdsourced translation is booming. Take Facebook for example: in 2008 the social network launched Translations, an open community where users translate, review and verify new language versions of the site. There have been some teething troubles but, two years on, Facebook is now available in 64 languages and counting. Similar crowdsourcing methods have been used by Twitter and Wikipedia (as well as by smaller, but equally useful, sites like Italian subs addicted).

But the “community translation method” isn’t always popular. In 2009, business network LinkedIn tried pretty much the same thing as Facebook. LinkedIn asked members who were listed as professional translators to help render the site into more languages. When users realized the work was unpaid, many refused and some even said they felt insulted.

I guess the lesson is to pitch to the right crowd. Facebook users are there to socialize, have fun, and spam you with endless friend requests. Translations became another way of getting involved and meeting new people. LinkedIn, on the other hand, provides a service for business people. It’s a useful networking tool, but not necessarily a site users want to spend all hours of the day on or personally help develop.

A crowded market
The big names of web 2.0 may choose to translate internally, but crowdsourcing translation startups are also on the rise. There’s a long list of companies offering everything from document translation to software localization.

Take servioTranslate, part of crowdsourcing heavyweight CloudCrowd. ServioTranslate focuses on cheap (6.7 cents a word) and speedy document translation. The process runs like an assembly line: text is divided up, put through machine translation and then distributed to the crowd (via a Facebook app) for error checking and reassembly. In contrast, at German startup toLingo the mantra is quality. ToLingo boasts a pre-checked database of 6000 translators, guaranteed native speakers and “triple checks” on documents. One potential issue with both startups is that users have to upload text or documents individually. Compared to the cutting-edge services on offer elsewhere, this seems old-fashioned and inflexible – the Servio uploader, for example, will only accept MS Word documents.
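
For the technically curious, the assembly-line idea described above (divide the text, machine-translate each piece, have the crowd check it, reassemble) can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not how servioTranslate actually works: the function names, the sentence-based chunking and the two stand-in steps `fake_machine_translate` and `fake_crowd_review` are all hypothetical placeholders for a real translation engine and a real crowd.

```python
import re
from typing import Callable, List


def split_into_chunks(text: str, max_sentences: int = 2) -> List[str]:
    """Divide text into chunks of at most `max_sentences` whole sentences."""
    # Split after '.', '!' or '?' followed by whitespace; drop empty pieces.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


def translate_document(
    text: str,
    machine_translate: Callable[[str], str],
    crowd_review: Callable[[str, str], str],
    max_sentences: int = 2,
) -> str:
    """Run the assembly line: split, machine-translate, review, reassemble."""
    chunks = split_into_chunks(text, max_sentences)
    # Step 1: every chunk gets a rough machine draft.
    drafts = [machine_translate(chunk) for chunk in chunks]
    # Step 2: a reviewer sees the source chunk and its draft, returns a fix.
    checked = [crowd_review(src, draft) for src, draft in zip(chunks, drafts)]
    # Step 3: put the checked chunks back together in their original order.
    return " ".join(checked)


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_machine_translate(chunk: str) -> str:
    return chunk.upper()  # pretend upper-casing is "translation"


def fake_crowd_review(source: str, draft: str) -> str:
    return draft.strip()  # pretend the reviewer only tidies whitespace


if __name__ == "__main__":
    sample = "Hello there. How are you? See you soon!"
    print(translate_document(sample, fake_machine_translate, fake_crowd_review))
```

Passing the translation and review steps in as plain functions keeps the pipeline itself independent of any particular engine or crowd platform, which is roughly the division of labour the assembly-line model implies.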

It’s too early to tell which startup will emerge as leader of the multi-lingual pack. The race is on to create a service with the perfect combination of high speed, high quality and low cost. Of course, the only way to test if the crowd can really beat computers at translation is to try them out. If anyone’s had experience, good or bad, with crowdsourced translation, we’d love to hear from you. Comments in your language of choice.

  • Jani Penttinen

    Right on the money, Ville! With machine translation and crowdsourced translation growing rapidly in terms of both quality and quantity, the translation industry is going through probably the biggest changes in its history. I think what the first wave of new translation startups is doing is slightly wrong: offering the old-fashioned manual service with a crowdsourced back end.

    You definitely want to focus on providing a great API and letting other companies build their services on top of that, fully automated. We chose MyGengo as the translation backend for our service because of the APIs they provide – we want to provide fully automated, near-real-time translation of social media feeds to our customers and there’s no way to do that manually.

  • Tadej Gregorcic

    Speaking of crowd-sourced translation, it is important not to underestimate the speed advantage professionally translated corpora have over generally crowd-sourced ones.

    To elaborate: Google has used passers-by to correct incorrect Google Translate assumptions for a while now. If, when using their statistical translation system, you see an error, you are able to suggest a correction that helps the engine learn.

    However, the quality of these corrections is always questionable and while it may slowly approach 100% with n->inf, it may take a while for e.g. Slovenian or any other minor language translations to start making sense.

    Google did introduce a tool that may accelerate this considerably, though – I just don’t know if they realize it.

    The Google Translator Toolkit allows professional translators to do the same thing they have been doing with tools like Trados (re-using translation memory to avoid duplicate work), but this time with a global and free corpus.

    The changes professionals make to the automatic suggestions are usually of much higher quality than what random visitors can offer.

    Having worked in the translation industry, though, I find that Google is not making their offering compelling enough and that this opportunity will somehow go to waste.

    Selling the idea of a global (perhaps open) translation memory (a.k.a. corpus) to professional translators may well be the key to approaching a linguistic globalization of the web with more determination.

  • Anders Schepelern

    I guess it all boils down to the quality of text you are ready to accept as a client. Using the crowd you can often source translators ready to do acceptable work for you at a fixed low price and, by offering the service of including machine translations, you can considerably cut down the time your translators spend.

    What I’m particularly interested in here is the process that ensures quality, readability and flow (i.e. coherence) of the text, which is something I’d question whether you get from a machine or a layman translator. The reasons for the machine missing out are evident. The reason for the layman doing the same is a simple question of proficiency. So with the machine+layman approach, for a text of 100+ words you’ll end up with a mid-level to OK version of what could have been.

    Enter the copy-editor. Copy-editing is the process of improving text – in this case text that has been passed through both a machine and a (layman) translator. The beautiful thing about copy-editing is that it’s both cheaper and faster than translation (professional level or amateur alike), but really imperative to the quality of the end product. So in terms of the foreseeable future, where everyone really can be a translator to and from two to three languages, ensuring that text is of a professional native-tongue quality is a job that, in my humble opinion, belongs to the professional editor.

    Of course, I’m probably mildly biased in this matter, since I run a professional, human copy-editing service.

  • Tommaso De Benetti

    There is one problem, though: copy-editing from a professional is usually even more expensive than brand new copywriting from the same person.

  • Anders Schepelern

    Not in my experience. The suggested hourly rate for a professional copy-editor is EUR 24 (USD 34) with a standard output of 1400-1600 words per hour. The hourly rate for a translator is virtually the same, with an output of 400-600 words per hour. As for copywriting done by an editor/writer with deep knowledge of the subject matter, you will pay anywhere from EUR 20 (USD 28) to EUR 40 (USD 56) per 400 words – while running the risk of writers not being the greatest editors of their own work (as in this case).