Incremental Training

The Slate Desktop strategy for incremental training.

Some readers might ask, "What is incremental training?" This primer explains incremental training and why Slate Desktop™ does not use it.

When starting with a new corpus to build an engine, the machine learning tools study the corpus and learn the vocabularies and patterns between the two languages. The machine learning session stops only after it has studied the whole corpus.
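
Conceptually, a full training session looks something like the toy sketch below. This is only an illustration in Python, not the actual code in Slate Desktop™ or Moses; the tokenization and co-occurrence counting are crude stand-ins for the much richer statistics a real toolkit collects.

```python
from collections import Counter

def train_from_scratch(segment_pairs):
    """Toy 'training session': visit every segment pair in the corpus once."""
    src_vocab, tgt_vocab = set(), set()
    cooccurrence = Counter()              # crude stand-in for word-alignment statistics
    for src, tgt in segment_pairs:        # the session studies the *whole* corpus
        src_tokens, tgt_tokens = src.split(), tgt.split()
        src_vocab.update(src_tokens)
        tgt_vocab.update(tgt_tokens)
        for s in src_tokens:
            for t in tgt_tokens:
                cooccurrence[(s, t)] += 1
    return src_vocab, tgt_vocab, cooccurrence

corpus = [("das haus ist klein", "the house is small"),
          ("das haus ist alt", "the house is old")]
model = train_from_scratch(corpus)
```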

MT researchers learned that training with more in-domain segment pairs improves an engine's quality. By post-editing an engine's output and adding those post-edits to the training corpus, translators grew the corpus, and engines improved after re-training with the larger corpus. Post-editing seemed like the perfect way to improve an engine's output. The feedback loop was born, but post-editing was slow.
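
The feedback loop itself is simple to picture. The sketch below (the file names and language pair are hypothetical) just appends post-edited segment pairs to the training corpus so a later re-training run can use them.

```python
def add_post_edits(post_edits, src_path="corpus.de", tgt_path="corpus.en"):
    """Append post-edited segment pairs to the (hypothetical) corpus files."""
    with open(src_path, "a", encoding="utf-8") as src_f, \
         open(tgt_path, "a", encoding="utf-8") as tgt_f:
        for source, post_edited_target in post_edits:
            src_f.write(source + "\n")
            tgt_f.write(post_edited_target + "\n")
    # a full re-training run with the enlarged corpus happens separately

# add_post_edits([("das haus ist neu", "the house is new")])
```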

It takes time to post-edit enough segments to enlarge the corpus enough for measurable improvements. As a quick fix, researchers added corpora from almost any source, and training corpora grew to multi-millions of segment pairs. Those huge corpora created another problem: training an engine with millions of segment pairs requires sessions that run for weeks or sometimes months.

Researchers developed a technique known as incremental training to shorten that time. Incremental training reuses the results of a completed machine learning session, picks up machine learning with only the new segments, and finally adds the new results to the older completed session. Incremental training shortens sessions from months to weeks, or from weeks to days.
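
Building on the toy `train_from_scratch` sketch above, the idea can be pictured like this: keep the statistics from the completed session, process only the new segment pairs, and merge the two. Real incremental word alignment is far more involved; this only shows the reuse-and-merge shape.

```python
def train_incrementally(previous_model, new_segment_pairs):
    """Reuse a completed session's statistics and fold in only the new pairs."""
    src_vocab, tgt_vocab, cooccurrence = previous_model
    new_src, new_tgt, new_counts = train_from_scratch(new_segment_pairs)
    src_vocab |= new_src                  # only the new data is studied...
    tgt_vocab |= new_tgt
    cooccurrence.update(new_counts)       # ...and merged into the old results
    return src_vocab, tgt_vocab, cooccurrence

updated_model = train_incrementally(model, [("das haus ist neu", "the house is new")])
```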

Incremental training reduces the time needed to update the knowledge base, called the translation model (TM), but it does not update the style guide, called the language model (LM). That means improvements in word order from the updated corpus never make it into the engine. So, although shorter times are good, time is only part of the story.
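
To make the asymmetry concrete, here is a hypothetical summary of which engine artifacts an incremental update touches. The artifact names are illustrative, not Slate Desktop™ internals.

```python
# Hypothetical summary of what a typical incremental update rebuilds.
artifacts_after_incremental_update = {
    "translation model (TM)": "rebuilt from the merged statistics",
    "language model (LM)":    "reused unchanged - new target-side word order never reaches it",
}
for artifact, status in artifacts_after_incremental_update.items():
    print(f"{artifact}: {status}")
```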

The Pitfalls of Incremental Training

Philipp Koehn, PhD, created the Moses SMT Toolkit. In 2017, he cautioned Moses users that incremental training causes lower-quality translations.

There are various versions of incremental training, but full re-training from scratch will give better results, since incremental word alignment will not be as good as full word alignment.

Philipp Koehn, Moses Support list – March 31, 2017

The End-to-End Training Process

Before we can accurately assess the value of incremental training, we need to account for its impact on the entire end-to-end process, keeping in mind Philipp Koehn's advice about its impact on quality.

Here are some hypothetical times (hh:mm) for three high-level training steps with a 70,000-segment training corpus (about three years of a translator's work) plus 10,000 new segments (a few months of work). Here's a legend:

  • 70,000 – times to train an engine with the original corpus.
  • +10,000 – times to incrementally train an engine by adding only the new segments.
  • 80,000 – times to train an engine with a new corpus of the original plus the new segments.
| Train Translation Model (“TM”) | 70,000 | +10,000 | 80,000 |
|---|---|---|---|
| Convert translation memories to a “Parallel Corpus” | 0:15 | | 0:15 |
| Extract Tuning Set | 0:05 | | 0:05 |
| Train TM with the “Parallel Corpus” | 2:45 | | 2:50 |
| Prepare incremental segments | | 0:05 | |
| Extract Tuning Set | | 0:05 | |
| Train TM starting at new increment | | 0:20 | |
| TRAIN TM TIMES (hh:mm) | 3:05 | 0:30 | 3:10 |

| Train Language Model (“LM”) | 70,000 | +10,000 | 80,000 |
|---|---|---|---|
| Convert target segments into “Monolingual Corpus” | 0:05 | | 0:05 |
| Merge other sources into the “Monolingual Corpus” | 0:05 | | 0:05 |
| Prepare incremental “Monolingual Corpus” | | 0:10 | |
| Train LM with the “Monolingual Corpus” | 0:25 | 0:25 | 0:25 |
| TRAIN LM TIMES (hh:mm) | 0:35 | 0:35 | 0:35 |

| Tune SMT Model (TM & LM work together) | 70,000 | +10,000 | 80,000 |
|---|---|---|---|
| Binarize TM and LM for virtual memory operation | 0:15 | 0:15 | 0:15 |
| Merge initial and incremental tuning sets | | 0:05 | |
| Tune SMT Model where TM and LM work together | 2:00 | 2:00 | 2:00 |
| TUNING TIMES (hh:mm) | 2:15 | 2:20 | 2:15 |

Putting Them Together

| GRAND TOTAL | 70,000 | +10,000 | 80,000 |
|---|---|---|---|
| TRAIN TM TIMES | 3:05 | 0:30 | 3:10 |
| TRAIN LM TIMES | 0:35 | 0:35 | 0:35 |
| TUNING TIMES | 2:15 | 2:20 | 2:15 |
| TOTAL TIMES (hh:mm) | 5:55 | 3:25 | 6:00 |
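
The totals can be re-derived from the step times above. The short Python sketch below just does the hh:mm arithmetic and computes the saving of the incremental (+10,000) run against a full re-training with 80,000 segments.

```python
def minutes(hhmm):
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)

def as_hhmm(total):
    return f"{total // 60}:{total % 60:02d}"

# step times (TM, LM, tuning) taken from the tables above
columns = {
    "70,000":  ["3:05", "0:35", "2:15"],
    "+10,000": ["0:30", "0:35", "2:20"],
    "80,000":  ["3:10", "0:35", "2:15"],
}
totals = {name: sum(minutes(step) for step in steps) for name, steps in columns.items()}
for name, total in totals.items():
    print(name, as_hhmm(total))                      # 5:55, 3:25, 6:00

saving = 1 - totals["+10,000"] / totals["80,000"]
print(f"saved vs. full re-training: {saving:.0%}")   # roughly 43%
```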

Conclusion

In this hypothetical example, incremental training reduced the engine build time by roughly 43%. It's easy to see why researchers valued incremental training when working with a mega-corpus: weeks of training time shrank to days.

But what about customers who use translation memories with 70,000 to 150,000 segments? Their build process runs about 6 hours. Reducing that to about three and a half hours has little impact on their workload when they build engines overnight while they're sleeping.

Then we consider Philipp Koehn's advice that incremental training degrades translation quality. We must ask: is a roughly 43% time savings worthwhile at the expense of quality?

That’s why we chose not to implement incremental training in Slate Desktop™. Slate Desktop™ follows Philipp Koehn’s guidance to maintain the best possible translation quality.

Instead, Slate Desktop™ has a Base on engine… button. It is a reliable and convenient feature with a point-and-click user experience. After you add new TMs to your inventory, the button uses an existing engine as a template to build a new engine. As a result, your new training corpus includes the new TMs and produces a better-quality engine.

Interim Engine Updates

Slate Desktop™ supports enforcing terminology translations with terminology.tab files (see Translating Terminology). These terminology.tab files are the best way to improve an engine's quality between update builds every 3 to 4 months.
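
For readers who keep their glossaries in a spreadsheet, a small conversion script can prepare the file. The format assumed below, one source term and its translation per line separated by a tab, is an assumption for illustration; the Translating Terminology documentation describes the format Slate Desktop™ actually expects.

```python
import csv

def glossary_csv_to_terminology_tab(csv_path, tab_path="terminology.tab"):
    """Write a two-column CSV glossary as tab-separated term pairs (assumed format)."""
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(tab_path, "w", encoding="utf-8") as dst:
        for row in csv.reader(src):
            source_term, target_term = row[0], row[1]   # expects at least two columns
            dst.write(f"{source_term}\t{target_term}\n")

# glossary_csv_to_terminology_tab("glossary.csv")
```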