MOUNTAIN VIEW, California (Reuters) – In Google Inc.’s

Google’s approach, called statistical machine translation, differs from past efforts in that it forgoes language experts who program grammatical rules and dictionaries into computers. Instead, Google’s team feeds the computers documents that humans have already translated into two languages and relies on the machines to discern patterns for future translations.

While the quality is not perfect, it is an improvement on previous efforts at machine translation, said Franz Och, 35, a German who heads Google’s translation effort at its Mountain View headquarters south of San Francisco.

“Some people that are in machine translations for a long time and then see our Arabic-English output, then they say, that’s amazing, that’s a breakthrough,” said Och. “And then other people who have never seen what machine translation was … they read through the sentence and they say, the first mistake here in line five, it doesn’t seem to work because there is a mistake there.”

But for some tasks, a mostly correct translation may be good enough.

Speaking over lunch this week in a Google cafeteria famed for offering free, healthy food, Och showed a translation of an Arabic Web news site into easily digestible English.

Two Google workers speaking Russian at a nearby table said, however, that a translation of a news site from English into their native tongue was understandable but a bit awkward.

FEEDING THE MACHINE

Och, who speaks German, English and some Italian, feeds hundreds of millions of words of parallel text, such as matching Arabic and English documents, into the computer, using United Nations and European Union documents as key sources. Languages without a large body of translated text, such as some African languages, face greater obstacles.

“The more data we feed into the system, the better it gets,” said Och, who moved to the United States from Germany in 2002.

The program applies statistical analysis, an approach he hopes will avoid diplomatic faux pas, such as when Russian leader Vladimir Putin’s translator miffed then-German Chancellor Gerhard Schroeder by calling him the German “Fuehrer.” The word is verboten in that context because of its association with Adolf Hitler.

“I would hope that the language model would say, well, Fuehrer Gerhard Schroeder is … very rare but Bundeskanzler Gerhard Schroeder is probably 100 times more frequent than Fuehrer and then it would make the right decision,” Och said.

The center of Google’s effort looks surprisingly modest. Och shares a spartan office with two others on his team, with little clutter other than a shelf of linguistics books above his desk. That’s because the heavy lifting is done by machines.

So far, Google is offering its own statistical machine translations of Arabic, Chinese and Russian to and from English at http://www.google.com/language_tools. Third-party software on the site gives access to German and other languages, Och said.

“So far, the focus is let’s make it really, really good,” Och said. “As part of a general Google philosophy, once it’s really useful and it has impact, then there will be found ways how to make money out of it.”

Miles Osborne, a professor at the University of Edinburgh who spent a sabbatical last year working on the Google project, praises Google’s effort but sees limitations.

“The best systems (e.g. Google) can be very good indeed for language pairs such as Arabic-English,” he said.

But he added that software will not overtake humans at expert translation, as it has at chess, and that it should be used to understand documents rather than to polish them.

“It may also be useful when deciding whether to pay a human to do a good job: you could imagine looking at Japanese patent documents and seeing if they are relevant, for example,” he said.

Google chairman Eric Schmidt also sees broad political consequences in a world with easy translation.

“What happens when we have 100 languages in simultaneous translation? Google and other companies are working on statistical machine translation so that we can on demand translate everything all the time,” he told a conference earlier this year.

“Many, many societies have operated in language-defined communities where they really don’t understand and are not particularly sympathetic to other peoples’ views because of the barrier of language. We’re about to have that breakthrough and it is a huge thing.”
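Och’s Fuehrer/Bundeskanzler example describes a language model choosing between candidate translations by how often each phrasing appears in real text. The sketch below illustrates that idea with a simple add-one-smoothed bigram model. It is only a toy under stated assumptions: the corpus is hypothetical, the functions `train`, `log_prob` and `pick_best` are illustrative names, and none of it describes Google’s actual system.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for the large monolingual text a real
# language model is trained on; "fuehrer gerhard schroeder" never appears here.
CORPUS = [
    "bundeskanzler gerhard schroeder sprach heute",
    "bundeskanzler gerhard schroeder reiste nach paris",
    "der bundeskanzler gerhard schroeder sagte",
]


def train(sentences):
    """Count word pairs (bigrams) and how often each word starts a pair."""
    contexts = Counter()
    bigrams = Counter()
    vocab = set()
    for sentence in sentences:
        words = sentence.lower().split()
        vocab.update(words)
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(candidate, contexts, bigrams, vocab_size):
    """Log-probability of a phrase under an add-one-smoothed bigram model."""
    words = candidate.lower().split()
    total = 0.0
    for w1, w2 in zip(words, words[1:]):
        # Add-one (Laplace) smoothing gives unseen pairs a small, nonzero probability.
        total += math.log((bigrams[(w1, w2)] + 1) / (contexts[w1] + vocab_size))
    return total


def pick_best(candidates, contexts, bigrams, vocab_size):
    """Return the candidate phrasing the language model finds most likely."""
    return max(candidates, key=lambda c: log_prob(c, contexts, bigrams, vocab_size))


contexts, bigrams, vocab_size = train(CORPUS)
candidates = ["Fuehrer Gerhard Schroeder", "Bundeskanzler Gerhard Schroeder"]
print(pick_best(candidates, contexts, bigrams, vocab_size))
# prints: Bundeskanzler Gerhard Schroeder
```

Because both candidates share the pair “gerhard schroeder,” the decision rests on the first pair alone: “bundeskanzler gerhard” is frequent in the toy corpus, while “fuehrer gerhard” is unseen and receives only the small smoothed probability.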