Machine translation (MT), despite having existed for over 50 years, has encountered many setbacks in fulfilling its original promise. These setbacks contrast sharply with the success of MT’s younger sibling, translation memory (TM) technology, which has enjoyed mainstream adoption since its inception some 25 years ago. But change is afoot!
Despite their success, translation memory systems don’t have linguistic intelligence; they don’t understand the language. They simply store and reproduce what’s already been written by humans, which may not be right or true. But their practical functionality is well developed, and the technology is a good match for iterative product generations that frequently contain redundant formulations. TM systems can also be effectively integrated into overall workflows.
Where TM systems cannot be effectively employed, however, is to quickly and affordably translate large amounts of new content, especially when a rough, informal idea of it will suffice for the purpose at hand. There is a mountain of material that needs to be translated for corporate communications, knowledge bases, or general information in multinational companies, or even smaller companies that do business abroad. Research has shown that customers will accept gisted translations of certain content types when the alternative is no translation at all. No one knows quite how high this mountain is, but consensus exists that traditional translation processes are too expensive and too slow to be useful in tackling it.
What can be done? The need for a solution is very real.
In the 1950s, the key players in the field of machine translation (often abbreviated MT, not to be confused with TM!) motivated themselves by saying: in five years, we’ll have done it! Unfortunately, they repeated that promise every five years thereafter. Then, during the 1960s, support for the technology was gradually withdrawn as consensus grew that MT was unfeasible. Nevertheless, it lived on in the niches of universities and research organizations and, since the 1980s, it has experienced a genuine revival.
This has to do with the fact that almost all content is now available electronically. This has coincided with the development of statistical machine translation systems that require large amounts of electronically formatted text. What’s more, there is now a growing range of potential applications for this technology.
The expectations of MT are high and, although many of the actual translation results are astonishingly good, some remain humorously inaccurate. In many cases, the results just can’t keep up with the wishful thinking.
Transparent, manageable processes are an essential aspect of today’s MT technology. What makes translations good or bad – how can one get the results one wants?
Statistical machine translation (SMT) systems such as Google Translate, Bing Translator, or Moses are hard-pressed to answer this kind of question. They work on the basis of large amounts of text, probabilities, and – of course – statistics, not linguistic methods.
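To make the statistical idea concrete, here is a minimal sketch of how an SMT system chooses among candidate translations: from a “phrase table” of probabilities learned from bilingual text, it simply picks the most probable target phrase. The words and probabilities below are invented for illustration, not data from any real system.

```python
# Toy phrase table: source word -> list of (target phrase, probability),
# as might be estimated from a large bilingual corpus. Values are invented.
phrase_table = {
    "bank": [("Bank", 0.7), ("Ufer", 0.3)],          # financial vs. river bank
    "statement": [("Kontoauszug", 0.6), ("Aussage", 0.4)],
}

def best_translation(word):
    """Return the most probable target phrase for a source word."""
    # Unknown words simply pass through unchanged, as SMT systems often do.
    candidates = phrase_table.get(word, [(word, 1.0)])
    return max(candidates, key=lambda pair: pair[1])[0]

print(best_translation("bank"))     # picks "Bank", the higher-probability option
print(best_translation("weather"))  # unknown word passes through unchanged
```

The system has no notion of what a “bank” is; it only knows which rendering was statistically more frequent in its training text – which is exactly why it cannot explain its own choices.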
Rule-based (RBMT) systems, on the other hand, such as those from Lucy Software, Systran, or open-source projects, analyze the text from a linguistic perspective. They know both the source language and the target language and understand the logical transfer relationships between the two. They can evaluate themselves and even predict when something will go wrong. But these programs naturally have a hard time with source texts that are full of grammatical errors.
A particularly interesting aspect of rule-based systems is that they learn from specific terminology rather than from large amounts of text. This means that integrating such systems intelligently into corporate terminology processes can generate important synergies. Businesses that have already compiled the required terminology can teach it to the systems, thus improving the accuracy of the output.
Furthermore, the rule-based MT system can report missing terminology to simplify the terminology management workflows.
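The terminology loop described above can be sketched in a few lines: known terms are translated from the corporate dictionary, and anything not found is collected so it can be reported back to terminology management. The dictionary entries and function names here are hypothetical examples, not the interface of any actual RBMT product.

```python
# Hypothetical corporate terminology list: approved source -> target terms.
terminology = {
    "release notes": "Versionshinweise",
    "service pack": "Service Pack",  # approved "do not translate" term
}

def translate_terms(terms):
    """Translate known terms; collect unknown ones for the terminologist."""
    translated, missing = [], []
    for term in terms:
        if term in terminology:
            translated.append(terminology[term])
        else:
            missing.append(term)  # candidate for the terminology workflow
    return translated, missing

out, gaps = translate_terms(["release notes", "hotfix"])
print(out)   # approved translations found in the dictionary
print(gaps)  # "hotfix" is reported back as missing terminology
```

This is the synergy the article describes: every translation run doubles as a terminology audit, feeding gaps back into the corporate terminology process.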
Once you have a system trained to meet your specific requirements, you can use the same basic system in a wide variety of contexts without much additional effort, whether in an information portal or directly within the translation process. In fact, machine translation can hardly be seen as a direct competitor of traditional, established translation processes. However, it can be a major asset when it is intelligently integrated into established workflows.
This redefines the nature of certain tasks – the translator becomes an expert post-editor, and the terminologist needs a few more linguistic skills than previously. But it lays the foundation for greater data throughput and faster implementation which, in turn, can also substantially reduce costs.
The topic is now so complex that companies can hardly manage it on their own: they’re dependent on the quality of source texts, they need to connect various systems logically, and they need technological expertise that wasn’t really necessary until now. That’s why cooperation with a competent partner is essential to the success of our efforts in the field of machine translation.
Please contact us if you would like more information.
Dipl. Ing. Horst Liebscher is Director of Technology & Innovation at text&form in Berlin, Germany. For many years now, he has specialized in incorporating innovative language technologies into translation workflows. Since the mid-nineties, Liebscher has dedicated himself and his expertise to the language and localization industry. His insight into every aspect of the translation business supports productive application of machine translation by pragmatically closing the gap between humans and computers.