Marketing Pilgrim's "Search Marketing" Channel


All Your Translations Are Belong to Google



Google’s statistical machine translation initiative aims to make huge advances in document translation. Instead of relying on humans to translate documents, Google is taking a different approach. According to Reuters:

Instead, they feed documents humans have already translated into two languages and then rely on computers to discern patterns for future translations.

While linguistic experts are impressed…

“Some people that are in machine translations for a long time and then see our Arabic-English output, then they say, that’s amazing, that’s a breakthrough,”

The average web user is less so…

“And then other people who have never seen what machine translation was … they read through the sentence and they say, the first mistake here in line five — it doesn’t seem to work because there is a mistake there.”

The ultimate goal is to be “good enough” at translation. Certainly better than AYB (“All your base are belong to us”). ;-)

  • Jordan McCollum

    Let me just speak for the linguistic experts that taught my linguistics classes:

    Machine translation is several (software) generations away from human translation. Being better than previous iterations doesn’t mean a whole lot yet (literally).

    When I machine-translate things, I don’t see a mistake in line five; I see an article with precious few phrases that make sense. The highest score in the study (Google’s, for Arabic-to-English translation) was 51%.

    Of course, that was higher than a human-aided computer translation (43%).

    When reading translated sites, I fear that even the parts that do make sense in English are far from what the original message meant.

  • Andy Beard (http://andybeard.eu/)

    I have helped develop and market machine translation software, or more specifically CAT (computer-aided translation) software, which exists because fully automated modes never give truly satisfactory results.

    If you are translating into your primary language using CAT and are familiar with the subject, the quality of the translation can often be better than that of a professional translator working from their mother tongue into a second language.

    I didn’t always use the software my company developed; often we used professional translators for game product manuals. I always had to make corrections unless we were using a native English speaker who also happened to know Polish and the subject matter (computer games).

  • Greg Scowen (http://www.fusability.com)

    Interesting little piece, Andy, possibly because I speak a few languages and my wife is a degree-qualified translator from Switzerland.

    You mention preferring CAT to translators who are translating into their second language. In a way, the suggestion that the quality is sometimes better doesn’t surprise me. No truly professional translator will work into a second language; it is just inappropriate unless they have a partnership with someone whose first language it is.

    My wife frequently turns down work into English (her mother tongue is German) for this reason, unless I am available to sit in on the job.

    As Jordan commented, machine translation does have a very long way to go. I have even thought about looking into it as a research topic, but decided the task is just too big. It needs a lot more than little old me, and besides… I find usability studies of websites much more interesting.

    I would love to hear more about this CAT software you were involved with though.

  • Greg Scowen (http://www.fusability.com)

    Oh heck… look at me. Two Andys addressed in one post. Sorry, guys, I need to open my eyes.

  • Jonix (http://software.allaboutthese.org)

    Well, Google’s translations from English to Portuguese are far from perfect, very, very far.

  • Jordan McCollum

    Portuguese is still being translated the old way; the study that Andy blogged about is looking at a new method that they were only testing with Chinese and Arabic.

    So for now we’ll have to settle for the old way, which is certainly not as good as the new method they’ve been testing.