Help! My computer is all grown up. It can translate!

Within just a few years, we have had to grow accustomed to the fact that computers can use artificial intelligence (AI) to do things that we once trusted only humans to do. Today’s machines can beat us not only at chess, but also at Jeopardy and, most recently, at Go, a Chinese board game that was long considered impossible for machines to master. They prevail at Go by using neural networks, the same technology used in neural machine translation (NMT).

How can machines do this? The answer is: data! A computer that has “seen” and analyzed thousands of cancer precursors is, in this respect, more “knowledgeable” than a dermatologist whose experience is limited to a few hundred observations. However, doctors possess resources to which machines do not have access, such as human intuition and knowledge of their patients’ life circumstances. “Added value” is gained by skillfully connecting man and machine.

The use of translation memories (TMs) for routine translation has been common practice in the translation industry for quite a while. TMs themselves are not intelligent; much like parrots, they only reproduce previously stored translations of phrases or sentences. Humans are needed to polish the translation and ensure consistency, as well as to apply their knowledge of style, target audience, context, etc.
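The “parrot” behavior of a TM can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two stored segments, the function name tm_lookup, the 0.75 threshold, and the use of Python’s difflib for fuzzy matching. Commercial TM tools use their own matching algorithms.

```python
from difflib import SequenceMatcher

# Hypothetical toy translation memory: source segment -> stored translation.
TM = {
    "Press the green button.": "Drücken Sie die grüne Taste.",
    "Close the cover.": "Schließen Sie die Abdeckung.",
}

def tm_lookup(segment, memory=TM, threshold=0.75):
    """Return (stored_source, stored_translation, score) for the best match,
    or None if no stored segment is similar enough."""
    best = None
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and lower for partial matches.
        score = SequenceMatcher(None, segment, source).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None

if __name__ == "__main__":
    # Fuzzy match: the TM returns the stored translation unchanged ("grüne"),
    # so a human still has to adapt it to the new source text ("red").
    print(tm_lookup("Press the red button."))
    # Nothing similar has been stored, so the TM has nothing to offer.
    print(tm_lookup("Why does the tourist take three photos?"))
```

The sketch never produces anything that was not stored beforehand, which is exactly why the polishing step is left to a human.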

Statistical machine translation: from phrase-based to neural systems

Data-driven machine translation (MT) methods go a step further: based on large amounts of previously “seen” translations, they attempt to generate brand-new output sentences for input they have never encountered.

In the past, statistical machine translation (SMT) systems were phrase-based. The most popular implementation is probably the Moses system. The central module, which was built like a TM, assigned probabilities to all possible translations of partial phrases. Further modules performed various pre- and post-translation operations, such as rearranging sentence fragments and creating complete sentences from partial translations. To do this, the various modules had to be trained separately using data in one or two languages, which led to considerable losses, since the left hand was often ignorant of the right, so to speak.
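A greatly simplified sketch of the central phrase-table idea might look as follows. The phrase table, its probabilities, and the function name are invented for illustration; a real system such as Moses learns its tables from data and adds reordering and language-model components that are left out here.

```python
# Hypothetical phrase table: source phrase -> {candidate translation: probability}.
# All entries and probabilities are made up for this illustration.
PHRASE_TABLE = {
    "warum": {"why": 0.9, "what for": 0.1},
    "macht": {"does": 0.6, "makes": 0.3, "takes": 0.1},
    "der tourist": {"the tourist": 0.95, "the traveller": 0.05},
    "drei fotos": {"three photos": 0.8, "three pictures": 0.2},
}

def translate_phrase_based(sentence, table=PHRASE_TABLE, max_len=3):
    """Greedily cover the sentence with the longest known phrases and pick
    the most probable translation of each phrase in isolation."""
    words = sentence.lower().rstrip("?.!").split()
    output, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in table:
                candidates = table[phrase]
                output.append(max(candidates, key=candidates.get))
                i += n
                break
        else:
            # Unknown word: copy it through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)

if __name__ == "__main__":
    print(translate_phrase_based("Warum macht der Tourist drei Fotos?"))
```

Because every phrase is translated on its own, nothing in this sketch checks whether the sentence as a whole still contains a main verb. That is the kind of loss that arises when separately trained pieces do not see the full sentence.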

The newest class of SMT systems, known as neural MT (NMT) systems, is based on a recently rediscovered type of machine learning in which artificial neurons are organized into layers. The systems are loosely modeled on the brain, which relies on networks of neurons to think and remember. The beauty of this approach is that the systems function end-to-end and thus do not require any intermediary modules. The systems are given source texts and their translations, and then take care of the rest themselves. Each translation decision is made in the context of the entire sentence.

So how good are the NMT systems?

The switch to NMT systems has brought about a major advance in quality. Below is a simple example of a sentence translated by Google Translate before and directly after the switch to neural technology.

Source (German):	Warum macht der Tourist drei Fotos?
Reference:	Why does the tourist take three photos?
Google Translate (phrase-based):	Why does the tourist three photos?
Google Translate (neural):	Why does the tourist make three photos?

(https://www.blog.google/products/translate/found-translation-more-accurate-fluent-sentences-google-translate/)

The old phrase-based systems had difficulty with the verb, which occupies a different position in German and English sentences. As in the above example, the verb was often simply not translated. Since neural systems have a view of the entire sentence, this is no longer a problem. Even if the selected verb is suboptimal, the sentence is at least complete and understandable. A test performed half a year later using the above example shows that Google now delivers a perfect translation.

Nowadays, neural systems use letter groupings, instead of words or phrases, as the basis for their statistical models. In rare cases, the system will insert words for no apparent reason, at least at first glance. The example below is taken from a neural machine translation system used by a project partner:

Source (German):	Die Arbeiter müssten in den sauren Apfel beißen.
Reference:	The workers would have to bite the bullet.
Google Translate (phrase-based):	The workers would have to bite the bullet.
NMT:	The workers would have to bite into the clean apple.

The mistranslation is most likely due to the similarity between the German words “sauer” (sour) and “sauber” (clean), which share most of their letter groupings and are therefore processed in a similar way. We assume that these are just growing pains of the system that will later disappear.
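How much two words overlap at the level of letter groupings can be sketched in a few lines. The sketch below uses plain character trigrams and a Jaccard overlap as a stand-in; both are assumptions chosen for simplicity. Production NMT systems typically use learned subword units (for example, byte-pair encoding), and we do not know which segmentation the partner system used.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word, with < and > marking
    the word boundaries."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def overlap(a, b, n=3):
    """Jaccard overlap of the character n-gram sets of two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)

if __name__ == "__main__":
    # Shared trigrams: "<sa", "sau" and "er>".
    print(sorted(char_ngrams("sauer") & char_ngrams("sauber")))
    print(overlap("sauer", "sauber"))  # 3 shared out of 8 distinct trigrams
    print(overlap("sauer", "apfel"))   # no shared trigrams
```

A model that builds its representations from such pieces sees “sauer” and “sauber” as close neighbors, which makes a confusion between the two plausible, although it does not prove that this is what happened here.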

Until now, it was common in research to measure translation quality by automatically comparing the system output to a reference translation. The so-called BLEU (“bilingual evaluation understudy”) score is the primary measure used. Put briefly, this method only counts words and word sequences that are identical in the machine translation and the reference translation. It yields numerical values that are not only the subject of ever more criticism in the research community, but that also tell users nothing about how good the system really is or what mistakes it is prone to make.
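The counting step at the core of such scores can be sketched as follows. This is not full BLEU: the real metric combines precisions for word sequences of length one to four, adds a brevity penalty, and is computed over a whole test set. The function names and the whitespace tokenization are assumptions made for this illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count all word sequences of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(hypothesis, reference, n):
    """Share of the hypothesis n-grams that also occur in the reference.
    Each n-gram is credited at most as often as it appears in the reference."""
    hyp = ngrams(hypothesis.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total

if __name__ == "__main__":
    reference = "Why does the tourist take three photos?"
    outputs = {
        "phrase-based": "Why does the tourist three photos?",
        "neural": "Why does the tourist make three photos?",
    }
    for name, text in outputs.items():
        print(name,
              clipped_precision(text, reference, 1),
              clipped_precision(text, reference, 2))
```

On the tourist example from above, this sketch gives the verbless phrase-based output a single-word precision of 6/6 and the complete neural output only 6/7, and 4/5 versus 4/6 for word pairs. Full BLEU would penalize the shorter output for its length, but the numbers still illustrate why counting identical words says little about which errors a system makes.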

In order to obtain a clear picture of the errors made by the systems, DFKI has worked with forward-thinking language service providers such as text&form, as well as with industry associations, to develop various methods that allow language experts to provide detailed feedback. These methods were used in several studies that compared different approaches to machine translation.

Are NMT systems suitable for use in the industry?

Our research has led us to draw several conclusions. First, there has been a tremendous improvement in quality from phrase-based to neural MT, which is now on par with the best rule-based machine translation (RBMT) systems. However, we also find cases where highly fluent NMT translations “mask” errors, making them more difficult for proofreaders to spot.

A separate comparison we performed illuminates the differences between a domain-specific Moses system used by a language service provider (LSP) and a non-specific NMT system. Strikingly, the NMT system outperformed the Moses system in all criteria except for terminology (the NMT was not domain-trained) and tag handling (the NMT did not perform any explicit tag handling). Based on our findings from this study, we see no reason to stick to the “traditional” phrase-based technology. Even if NMT cannot be used in every case, a level of quality has been attained that increases productivity in many translation scenarios. Post-editing, i.e., editing by a professional linguist, is still definitely needed, as is data collection to help improve the engines.

Five myths about neural machine translation to demystify at the next cocktail party

NMT works like the human brain.
Neural networks are inspired by the neurons in human brains, but the resemblance ends there. Science is far from knowing how the human brain works, let alone being able to reconstruct it in a machine.

NMT systems prove that computers now emulate defining elements of humankind: creativity and intelligence.
Systems are creative in the same sense that parrots are creative. They might be able to come up with a nice rhyme while imitating us, but they are not able to crack a joke, use a new metaphor, or create some pun for a marketing campaign. That is still the province of humans.

We have reached a glass ceiling for MT quality improvement.
It takes two to tango. Technology can get better if it is used and experts provide feedback. We have sketched several methods to help establish productive communication.

Machine translation will cost jobs.
Although we can only speculate, the same forecast was made, and proved wrong, when personal computers entered the workplace in the 1980s and 1990s. Many jobs have changed and will continue to change. Audiovisual translators no longer receive VHS tapes (notoriously late) or type subtitles using typewriters. More efficient translation will help meet the increasing demand created by globalization and migration.

RBMT is outdated.
The advantages of RBMT systems include good control of stylistic aspects, terminology, etc. We are convinced that smart hybrid systems are better than any individual MT technology. The approach of using RBMT systems to generate training corpora for NMT systems should definitely continue to be pursued. A lot of research is still needed in this area.

Conclusions

It is an exciting time both for R&D in the areas of artificial intelligence and natural language processing and for business development in digitized and globalized markets.

DFKI is a non-profit, public-private partnership. Our mission is human-centric AI, meaning that our research aims to improve our lives, workplaces, medical treatment, etc., and to address societal challenges. We hope that our contributions in the area of quality translation help people communicate better.

ABOUT THE AUTHOR

Aljoscha Burchardt is Lab Manager at the Language Technology Lab of the German Research Center for Artificial Intelligence (DFKI GmbH). He is an expert in Artificial Intelligence and Language Technology. His interests include the evaluation of (machine) translation quality and the inclusion of language professionals in the MT R&D workflow. Burchardt is co-developer of the MQM framework for measuring translation quality and has a background in semantic language technology. After finishing his PhD in Computational Linguistics at Saarland University, he coordinated the Center of Research Excellence “E-Learning 2.0” at Technische Universität Darmstadt.