A Reality-Hype Distinction

Fake news tells us that neural machine translation (NMT) is “Nearly Indistinguishable From Human Translation,” but honest people are working hard to catch it with its hand in the cookie jar and to distinguish reality from the hype.

Terence Lewis is a veteran translator, lexicographer, author, and creator of the Trasy Dutch-English machine translation program and the Dutch-to-English NMT service MyDutchPal. He openly shares NMT’s pros and cons in an article written by Gábor Ugray, “We wanted a Frankenstein translator and ended up with a bilingual chatbot.” Gábor is Head of Innovation at Kilgray Translation Technologies and also works to spread the truth about NMT. Kirti Vashee, a long-time machine translation pundit, likewise does his best to share a balanced perspective on his blog, eMpTy Pages.

Arle Lommel, a senior analyst for Common Sense Advisory (CSA Research), wrote a great article about NMT, published on pages 52 to 54 of Multilingual Magazine. His article, “Why zero-shot translation may be the most important MT development in localization,” reviews an unexpected anomaly in neural network technology that Google calls zero-shot translation. It promises to extend language technology to under-served minority language pairings.

Zero-shot Translation Anomaly

Digging deeper into Arle’s article, I agree that zero-shot translation is a really promising capability. It is also one of several neural network anomalies that researchers have discovered by surprise. Here’s a consolidated list of the anomalies I’ve found in the media:

Maybe these anomalies all trace to the same thing in neural networks’ design; maybe not. Each of them holds a promise of something really cool to come. For today, NMT’s results are great for tourists ordering a coffee in Cambodia, but NMT’s anomalies pose serious problems for professionals who try to use NMT for their work. Researchers don’t yet understand these anomalies, and it will take some time for them to control this technology well enough to deliver predictable professional results.

Digging Deeper

So, I decided to rewind the clock to academic research published before the hype started. That took me to this Workshop on Learning Technologies (European Committee for Standardization) evaluation paper published in October 2016: “Neural versus Phrase-Based Machine Translation Quality: a Case Study.” Note that the phrase-based machine translation (PBMT) systems in their tests include several varieties of statistical machine translation (SMT) methods for the English-German language pair. I’ll quote the article’s summary of findings:

  1. NMT generates outputs that considerably lower the overall post-edit effort with respect to the best PBMT system (-26%);
  2. NMT outperforms PBMT systems on all sentence lengths, although its performance degrades faster with the input length than its competitors;
  3. NMT seems to have an edge especially on lexically rich texts;
  4. NMT output contains less morphology errors (-19%), less lexical errors (-17%), and substantially less word order errors (-50%) than its closest competitor for each error type;
  5. concerning word order, NMT shows an impressive improvement in the placement of verbs (-70% errors).

NMT Alternative

The workshop’s NMT started with a 26% improvement in “overall post-edit effort.” There’s no doubt in my mind that it will continue to improve. The question is, why does SMT used as a personalized translation engine outperform NMT, while NMT outperforms big-data SMT?

A football player learns new playing rules and achieves different results as he transitions from little league to professional, but every league has football games. Likewise, SMT’s playing rules and the results it achieves change as we apply the technology in different use cases. Here’s an eye-opening demonstration.

From the Workshop on Learning Technologies:

System          BLEU   Evaluation Set (en-de)
Standard PBMT   25.8   collection of TED Talks short speeches
NMT             31.1   collection of TED Talks short speeches

The table above shows the NMT system’s BLEU score improving by roughly 21% (5.3 points) over the big-data PBMT baseline on the workshop’s evaluation set. Compare that to the 109% improvement from Google’s NMT to a Slate™ Desktop user’s SMT engine in the table below.

System          BLEU   Evaluation Set (en-it)
Google NMT      33.1   Representative sample of Isabella’s work
Isabella’s TMs  69.3   Representative sample of Isabella’s work

These are the Slate™ Desktop results that Isabella Massardo reported in her blog article “Who Is A Translator’s New Best Friend?” For her review, she converted her translation memories into a personalized translation engine and used it on work for the same client.
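The percentages behind this comparison are simple to reproduce. The short Python sketch below recomputes the relative BLEU gains from the four scores in the two tables; the helper name `relative_improvement` and the variable names are my own, introduced only for illustration. Note that the en-de figure depends on the denominator: the gain is about 20.5% when measured against the PBMT baseline, and about 17% when the same 5.3-point gap is expressed as a share of the NMT score.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the percentage gain of `improved` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# BLEU scores copied from the two tables above.
ted_pbmt, ted_nmt = 25.8, 31.1              # en-de, TED Talks set
google_nmt, personalized_smt = 33.1, 69.3   # en-it, Isabella's sample

# Gain measured against the lower (baseline) score in each table.
ted_gain = relative_improvement(ted_pbmt, ted_nmt)
isabella_gain = relative_improvement(google_nmt, personalized_smt)

# The same en-de gap expressed as a share of the NMT score instead.
ted_gap_vs_nmt = (ted_nmt - ted_pbmt) / ted_nmt * 100.0

print(f"NMT over PBMT (en-de):              {ted_gain:.1f}%")
print(f"Personalized SMT over Google NMT:   {isabella_gain:.1f}%")
print(f"en-de gap as a share of NMT score:  {ted_gap_vs_nmt:.1f}%")
```

Using the baseline as the denominator for both tables keeps the two percentages on the same footing.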

In January 2017, Google CEO Sundar Pichai was referring to NMT when he told Google’s investors, “We have improved our translation ability more in one single year than all our improvements over the last 10 years combined.” Clearly, big-data NMT is a significant improvement over online big-data SMT services, but it is not a revolutionary improvement for our language industry.

When a translator uses a personalized translation engine as intended, the engine will outperform big-data engines. We need to continue to explore these new approaches to augment and enhance human translation.