
{"id":16002,"date":"2020-02-11T14:24:39","date_gmt":"2020-02-11T13:24:39","guid":{"rendered":"https:\/\/www.textform.com\/?p=16002"},"modified":"2020-02-11T16:28:01","modified_gmt":"2020-02-11T15:28:01","slug":"bridging-nmt-part-1","status":"publish","type":"post","link":"https:\/\/www.textform.com\/en\/mt-blog\/bridging-nmt-part-1\/","title":{"rendered":"Bridging &#038; NMT, Part 1"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-16003 size-full\" src=\"https:\/\/www.textform.com\/tf2019\/wp-content\/uploads\/2020\/02\/flawed_perfection.png\" alt=\"\" width=\"2000\" height=\"1021\" srcset=\"https:\/\/www.textform.com\/tf2019\/wp-content\/uploads\/2020\/02\/flawed_perfection.png 2000w, https:\/\/www.textform.com\/tf2019\/wp-content\/uploads\/2020\/02\/flawed_perfection-300x153.png 300w, https:\/\/www.textform.com\/tf2019\/wp-content\/uploads\/2020\/02\/flawed_perfection-1024x523.png 1024w, https:\/\/www.textform.com\/tf2019\/wp-content\/uploads\/2020\/02\/flawed_perfection-768x392.png 768w\" sizes=\"(max-width: 2000px) 100vw, 2000px\" \/><\/p>\n<p>Both highly praised and frowned upon, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Machine_translation\">machine translation<\/a> is a hot topic in the translation industry. It causes fear of drastic changes while promising increased productivity and quality. Whichever scenario materializes, it will certainly change the industry in the near future. As in so many other industries, the potential conflict resulting from such scenarios rests on the supposed superiority of machines over humans. That\u2019s why it\u2019s worth taking a look at the details to understand why machines have not yet prevailed.<\/p>\n<p>&nbsp;<\/p>\n<h2>A Closer Look<\/h2>\n<p>Neural Machine Translation (NMT) still has some obstacles to overcome, despite its amazingly good results. 
We will use an example that is commonplace at <a href=\"https:\/\/www.textform.com\/en\/\">text&amp;form<\/a> to shed light on two of these obstacles. They may seem marginal at first, but it is details like these that separate a human-like translation from a typical machine translation. First, read the following (admittedly very technical) sentence:<\/p>\n<p><em>&#8220;For instance, while a healthcare monitoring service might suffice with single cellular network coverage, a trucking fleet might require more than one mobile network footprint.&#8221;<\/em><\/p>\n<p>Some context first: the text is about so-called machine-to-machine communication (such as the healthcare monitoring service or the trucking fleet mentioned above) and the SIM cards it requires (paraphrased as \u201c<a href=\"https:\/\/en.wikipedia.org\/wiki\/Cellular_network\">cellular network<\/a>\u201d and \u201cmobile network footprint\u201d), which enable wireless information exchange. More precisely, the sentence highlights how the requirements for the mobile network depend on the scope of the machine-to-machine application. At least, that\u2019s how I interpret this sentence, and so does our auditor. Several major NMT providers, however, interpret it differently when translating it into German. Their output typically reads as follows:<\/p>\n<p><em>&#8220;W\u00e4hrend z.B. ein Gesundheits\u00fcberwachungssystem mit einem einzigen Mobilfunknetz ausreichen k\u00f6nnte, k\u00f6nnte eine LKW-Flotte mehr als ein Mobilfunknetz ben\u00f6tigen.&#8221;<\/em><\/p>\n<p>Don\u2019t worry if you don\u2019t speak German. 
Translating this sentence back into English illustrates the problem perfectly well:<\/p>\n<p><em>&#8220;For example, while a healthcare monitoring system might be enough with a single cellular network, a trucking fleet might require more than one mobile network footprint.&#8221;<\/em><\/p>\n<p>By the way, the translation has been deliberately modified so that the results cannot be traced back to a specific NMT provider. The key point, however, is always the same: in the first clause, it is the healthcare monitoring system that <em>is enough<\/em>, although the sufficiency in question refers to the cellular network. The sentence doesn\u2019t become completely incomprehensible, but I doubt that any auditor would turn a blind eye to a case like that.<\/p>\n<p>&nbsp;<\/p>\n<h2>Flawed Perfection<\/h2>\n<p>The reason lies in the design of these <a href=\"https:\/\/en.wikipedia.org\/wiki\/Neural_machine_translation\">NMT<\/a> engines. Our &#8220;test subjects&#8221; derive their ability to translate from text corpora. These bilingual texts are tremendous in size and linguistically flawless. Feeding such large quantities of data to an engine enables it to learn what a specific word means, or at least what it corresponds to in another language. Thus all of our test subjects have learned that in German, the verb <em>to suffice<\/em> always translates to <em>ausreichen<\/em> or one of its close synonyms. We cannot blame our test subjects or their developers: the engines learned, and the developers taught, to the best of their knowledge. What our digital competitors are not aware of is non-standard language use, in this case <em>to suffice with<\/em> in the sense of <em>to make do with<\/em>. The German verb <em>ausreichen<\/em> doesn\u2019t reflect this meaning; neither do its synonyms. It\u2019s not surprising that our test subjects don\u2019t know this usage of <em>to suffice with<\/em>, since it\u2019s non-standard and generally rare. 
And yet the author of the text chose this construction. Although it can be found in a freely accessible translation memory associated with one of our test subjects, it is simply not common enough for the engine to consider that translation. In other words, the design of these NMT engines is too perfect to translate such informal expressions correctly.<\/p>\n<p>At this point, you may have questioned the usage, or even the existence, of this construction, and you could hardly be blamed for that. Ultimately, however, there are two rather sobering facts to consider: first, language works in mysterious ways; second, biological and digital translators alike have to work with what\u2019s on the (metaphorical) table.<\/p>\n<p>To sum up: informal language will remain NMT\u2019s Achilles&#8217; heel unless we find the right balance between standard and informal training data. That said, this problem is not limited to machines; humans are far from perfect themselves. 
In the next installment of this article, we will therefore look at a very human workaround for this problem: bridging.<\/p>\n<p><em>to be continued<\/em><\/p>\n<hr \/>\n<p><em>About the Author<\/em><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-15706\" src=\"https:\/\/www.textform.com\/tf2019\/wp-content\/uploads\/2020\/01\/daniel_nad_project_manager_textform-300x300.png\" alt=\"Daniel Nad, Project Manager at text&amp;form\" width=\"250\" height=\"250\" srcset=\"https:\/\/www.textform.com\/tf2019\/wp-content\/uploads\/2020\/01\/daniel_nad_project_manager_textform-300x300.png 300w, https:\/\/www.textform.com\/tf2019\/wp-content\/uploads\/2020\/01\/daniel_nad_project_manager_textform-150x150.png 150w, https:\/\/www.textform.com\/tf2019\/wp-content\/uploads\/2020\/01\/daniel_nad_project_manager_textform-768x768.png 768w, https:\/\/www.textform.com\/tf2019\/wp-content\/uploads\/2020\/01\/daniel_nad_project_manager_textform-266x266.png 266w, https:\/\/www.textform.com\/tf2019\/wp-content\/uploads\/2020\/01\/daniel_nad_project_manager_textform-500x500.png 500w, https:\/\/www.textform.com\/tf2019\/wp-content\/uploads\/2020\/01\/daniel_nad_project_manager_textform.png 1000w\" sizes=\"(max-width: 250px) 100vw, 250px\" \/><\/p>\n<div>Organized sounds, manifested in abstract character strings with thousands of mutually unintelligible variants: human languages are fascinating. At least according to our author Daniel Nad. 
As a passionate linguist and PM at <a href=\"https:\/\/www.textform.com\/en\/about-us\/our-team\/\">text&amp;form<\/a>, he experiences language every day and enjoys sharing his passion with others.<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Both highly praised and frowned upon, machine translation is a hot topic in the translation industry. It causes fear of drastic changes while promising increased productivity and quality. Whichever scenario materializes, it will certainly change the industry in the near future. 
As in so many other industries, the potential conflict resulting from such scenarios is [&hellip;]<\/p>\n","protected":false},"author":22,"featured_media":13349,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[797,737],"tags":[799,798,382,388,486,354,800,589],"_links":{"self":[{"href":"https:\/\/www.textform.com\/en\/wp-json\/wp\/v2\/posts\/16002"}],"collection":[{"href":"https:\/\/www.textform.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.textform.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.textform.com\/en\/wp-json\/wp\/v2\/users\/22"}],"replies":[{"embeddable":true,"href":"https:\/\/www.textform.com\/en\/wp-json\/wp\/v2\/comments?post=16002"}],"version-history":[{"count":0,"href":"https:\/\/www.textform.com\/en\/wp-json\/wp\/v2\/posts\/16002\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.textform.com\/en\/wp-json\/wp\/v2\/media\/13349"}],"wp:attachment":[{"href":"https:\/\/www.textform.com\/en\/wp-json\/wp\/v2\/media?parent=16002"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.textform.com\/en\/wp-json\/wp\/v2\/categories?post=16002"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.textform.com\/en\/wp-json\/wp\/v2\/tags?post=16002"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}