Translators' Experiences

29.8% Better Than Google NMT For English-to-Russian Patents

This English-to-Russian patent translator uses a Slate Desktop personalized SMT that’s 29.8% more productive than Google NMT.

29.8% Better Than Google NMT For French-to-English Medical

This French-to-English medical goods translator uses a Slate Desktop personalized SMT that’s 29.8% more productive than Google NMT.

26.8% Better Than Google NMT For English-to-Gaelic EU Legislation

This English-to-Gaelic EU Legislation translator uses a Slate Desktop personalized SMT that’s 26.8% more productive than Google NMT.

18.7% Better Than Google NMT For German-to-English Patents

This German-to-English patent translator uses a Slate Desktop personalized SMT that’s 18.7% more productive than Google NMT.

26.9% Better Than Google NMT For English-to-French EU Legislation

This English-to-French EU Legislation translator uses a Slate Desktop personalized SMT that’s 26.9% more productive than Google NMT.

28.8% Better Than Google NMT For English-to-Croatian EU Legislation

This English-to-Croatian EU Legislation translator uses a Slate Desktop personalized SMT that’s 28.8% more productive than Google NMT.

49.7% Better Than Google NMT For English-to-Italian IT Software

This English-to-Italian IT Software translator uses a Slate Desktop personalized SMT that’s 49.7% more productive than Google NMT.

47.4% Better Than Google NMT For Spanish-to-English IT Security

This Spanish-to-English IT Security translator uses a Slate Desktop personalized SMT that’s 47.4% more productive than Google NMT.

30.5% Better Than Google NMT For English-to-Croatian EU Legislation

This English-to-Croatian EU Legislation translator uses a Slate Desktop personalized SMT that’s 30.5% more productive than Google NMT.

28.7% Better Than Google NMT For English-to-Spanish Medical

This English-to-Spanish medical translator uses a Slate Desktop personalized SMT that’s 28.7% more productive than Google NMT.

26.6% Better Than Google NMT For Swedish-to-English EU Legislation

This Swedish-to-English EU Legislation translator uses a Slate Desktop personalized SMT that’s 26.6% more productive than Google NMT.

37.8% Better Than Google NMT For Spanish-to-English Medical

This Spanish-to-English medical translator uses a Slate Desktop personalized SMT that’s 37.8% more productive than Google NMT.

28.8% Better Than Google NMT For English-to-Spanish Insurance

This English-to-Spanish insurance translator uses a Slate Desktop personalized SMT that’s 28.8% more productive than Google NMT.

35.6% Better Than Google NMT For English-to-Italian Securities

This English-to-Italian securities translator uses a Slate Desktop personalized SMT that’s 35.6% more productive than Google NMT.

29.1% Better Than Google NMT For English-to-Czech EU Legislation

This English-to-Czech EU Legislation translator uses a Slate Desktop personalized SMT that’s 29.1% more productive than Google NMT.

30.8% Better Than Google NMT For Swedish-to-English Contracts

This Swedish-to-English Contracts translator uses a Slate Desktop personalized SMT that’s 30.8% more productive than Google NMT.

30.8% Better Than Google NMT For English-to-Italian Regulatory

This English-to-Italian regulatory translator uses a Slate Desktop personalized SMT that’s 30.8% more productive than Google NMT.

37.2% Better Than Google NMT For English-to-Italian Patents

This English-to-Italian patents translator uses a Slate Desktop personalized SMT that’s 37.2% more productive than Google NMT.

21.7% Better Than Google NMT For English-to-Italian Finance

This English-to-Italian finance translator uses a Slate Desktop personalized SMT that’s 21.7% more productive than Google NMT.

26.6% Better Than Google NMT For English-to-Italian Consumer Goods

This English-to-Italian consumer goods translator uses a Slate Desktop personalized SMT that’s 26.6% more productive than Google NMT.

40.4% Better Than Google NMT For English-to-Dutch IT Software

This English-to-Dutch IT software translator uses a Slate Desktop personalized SMT that’s 40.4% more productive than Google NMT.

21.0% Better Than Google NMT For German-to-English Contracts

This German-to-English contracts translator uses a Slate Desktop personalized SMT that’s 21.0% more productive than Google NMT.

33.6% Better Than Google NMT For German-to-Bulgarian IT Software

This German-to-Bulgarian IT Software translator uses a Slate Desktop personalized SMT that’s 33.6% more productive than Google NMT.

20.3% Better Than Google NMT For English-to-Polish Medical

This English-to-Polish medical translator uses a Slate Desktop personalized SMT that’s 20.3% more productive than Google NMT.

18.8% Better Than Google NMT For French-to-English Business

This French-to-English business translator uses a Slate Desktop personalized SMT that’s 18.8% more productive than Google NMT.

17.9% Better Than Google NMT For English-to-Italian Contracts

This English-to-Italian contracts translator uses a Slate Desktop personalized SMT that’s 17.9% more productive than Google NMT.

19.6% Better Than Google NMT For English-to-Arabic Medical

This English-to-Arabic medical translator uses a Slate Desktop personalized SMT that’s 19.6% more productive than Google NMT.

21.2% Better Than Google NMT For English-to-Italian IT Security

This English-to-Italian IT security translator uses a Slate Desktop personalized SMT that’s 21.2% more productive than Google NMT.

18.3% Better Than Google NMT For English-to-French Finance

This English-to-French finance translator uses a Slate Desktop personalized SMT that’s 18.3% more productive than Google NMT.

49.1% Better Than Google NMT For English-to-German Automotive

This English-to-German automotive translator uses a Slate Desktop personalized SMT that’s 49.1% more productive than Google NMT.

52.0% Better Than Google NMT For English-to-German Finance

This English-to-German finance translator uses a Slate Desktop personalized SMT that’s 52.0% more productive than Google NMT.

Glossary of Benchmark Score Terms


Our customer experience studies use these terms.

representative set

A set of segment pairs that is representative of the translator’s daily work. Segment pairs are randomly selected and removed from the translator’s translation memories, and the set is not edited or manipulated to prioritize any particular kind of segment.

source language

The source (authored) language 2-letter code of the evaluation set (e.g. de, en, es, fr, sv).

target language

The target (translated) language 2-letter code of the evaluation set (e.g. ar, bg, cs, de, en, es, fr, ga, hr, it, nl, pl, ru).

subject domain

The subject or industry covered in the evaluation set.

estimated corpus size

The estimated number of segment pairs in the training corpus (all TMs) used to create the Slate Desktop engine.

segments per representative set

The number of segments in the representative set, sampled so that the set provides a 95% confidence level relative to the total corpus size.
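The glossary does not state which sampling formula the studies use; a common choice for sizing a sample at a given confidence level is Cochran’s formula with a finite-population correction. A minimal sketch, assuming 95% confidence (z = 1.96), a 5% margin of error, and maximum variability (p = 0.5), all of which are illustrative assumptions:

```python
import math

def sample_size(population: int, z: float = 1.96,
                margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's sample size with finite-population correction.

    z=1.96 corresponds to a 95% confidence level. The margin and p
    values are assumptions -- the actual studies may differ.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population sample size
    n = n0 / (1 + (n0 - 1) / population)        # correct for a finite corpus
    return math.ceil(n)

# Segments to sample from a corpus of 10,000 segment pairs:
print(sample_size(10_000))
```

Note how the required sample grows only slowly with corpus size, which is why even large training corpora need only a few hundred representative segments.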

words per source segment

The average number of words per source segment in the representative set. A low number indicates the representative set likely has a disproportionately high number of short segments (terminology or glossary entries).

words per target segment

The average number of words per target segment in the representative set. A low number indicates the representative set likely has a disproportionately high number of short segments (terminology or glossary entries).

words per MT segment

The average number of words per segment generated by the respective MT system. Indicates how closely the MT system matches the number of words per human translated segment.

BLEU score

The BLEU score is a “likeness” match (similar to TM fuzzy match) between all of the MT segments and human reference translations in a set. Higher scores are better and 100 is an exact MT match.

To create the cumulative score, the BLEU algorithm first scores the “likeness” between each MT segment and its reference segment, based on the preponderance of the same words appearing in the same order. A score of 0 means no likeness; a score of 100 means the MT segment exactly matches its reference segment (see exact MT match below).

The algorithm then consolidates all segment scores into a cumulative BLEU score representing the “likeness” of the entire set. This cumulative BLEU score is conceptually similar to an average of all segment BLEU scores in the set, but computationally it is different.
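The per-segment “likeness” idea can be sketched as follows. This is a simplified illustration, not the real BLEU implementation: actual BLEU uses up to 4-grams and aggregates n-gram counts across the whole set before computing precisions, which is why the cumulative score is not a simple average.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Simplified sentence-level BLEU on a 0-100 scale: geometric mean of
    clipped n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c, r = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(count, r[g]) for g, count in c.items())
        precisions.append(overlap / max(sum(c.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * geo * bp

print(simple_bleu("the cat sat on the mat",
                  "the cat sat on the mat"))  # exact match -> 100.0
```

A candidate that shares some but not all words and word order with its reference scores between 0 and 100, matching the “likeness” intuition above.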

exact MT match (count)

An exact MT match is an MT segment that exactly matches its reference human translated segment, i.e. a BLEU score of 100 and an edit-distance (Levenshtein) score of 0.

The count is the number of such segments in the representative set. These segments represent pure cognitive effort for the translator, who only needs to identify them as correct, with no mechanical work such as typing or dictation to edit them.

exact MT match (percent)

The percentage of exact MT match segments in a set, or the average percentage across all representative sets in the summary. A higher percentage represents less work for the translator. Across all customer experience studies, Google NMT scores range from 1.8% to 11.2%; Slate Desktop scores range from 20.0% to 53.7%.

words per exact MT match (count)

The average number of words per segment generated by the respective MT system for only the exact MT match segments. MT technologies have a reputation for performing poorly on long sentences. The difference (delta) between the words per MT segment and words per exact MT match scores shows how much the MT system’s quality degrades with long segments. A smaller delta is good and indicates the MT system performs better with longer segments.

filtered BLEU score (no exact MT matches)

A BLEU score recalculated after removing the exact MT match segments from a set.

Imagine a set of 10 sentences with BLEU scores (100, 100, 100, 100, 90, 65, 70, 80, 75, 40). The cumulative BLEU score (similar to an average) is 73. A high cumulative BLEU score is considered good, but it poorly represents the amount of editing work for a translator. Therefore, we divide scoring into two systems.

First, we report the percentage of segments that require zero editing, i.e. the exact MT match (percent) value above. In this case, 4 of the 10 sentences (40%) require no editing. Clearly, higher percentages are better.

Then, we remove these segments and recalculate the filtered BLEU score using only the 6 segments that require editing. In this case, the BLEU drops from 73 (for 10 sentences) to 70 (for 6 sentences).

The filtered BLEU score is always equal to or lower than the cumulative BLEU score. This score represents the necessary editing work. Although higher BLEU scores are good, you also need to consider the delta between the cumulative BLEU score and the filtered BLEU score.

A small delta with a low percentage of exact MT match segments signals virtually every segment represents editing work for the translator.

A small delta with a high percentage of exact MT match segments signals the engine will likely serve the translator well.

A large delta with many exact MT match segments will result in a lower filtered BLEU score. This signals more editing work for a smaller number of segments. Therefore, it is not as serious as a low cumulative BLEU score.
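The two-part scoring described above can be sketched with the example segment scores. This uses a plain arithmetic mean for the filtered score as a simplification; as noted earlier, the real cumulative BLEU is computationally different from an average.

```python
# Per-segment BLEU scores from the worked example above.
scores = [100, 100, 100, 100, 90, 65, 70, 80, 75, 40]

exact = [s for s in scores if s == 100]      # exact MT matches (no editing)
editable = [s for s in scores if s < 100]    # segments requiring edits

exact_pct = 100 * len(exact) / len(scores)   # exact MT match (percent)
filtered = sum(editable) / len(editable)     # filtered score (mean, simplified)

print(exact_pct)  # 40.0 -- 4 of 10 segments need no editing
print(filtered)   # 70.0 -- average score of the 6 segments that need edits
```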

segments requiring edit (count)

The difference between the segments per representative set and the exact MT match (count), i.e. the complement of the exact MT match (count). A higher number indicates more work.

character edits per segment (Levenstein)

The average edit-distance (Levenshtein) score per segment requiring editing. The edit-distance (Levenshtein) score is the number of character edits needed to transform an MT segment into its reference segment. Therefore, this number represents the average number of character edits per segment to “fix” the MT segment. Higher scores indicate more edit work is required. A score of zero (0) means the segment is an exact MT match and no edit work is required.
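The Levenshtein distance used here is the classic dynamic-programming algorithm; a minimal sketch follows (the studies’ actual tooling may compute it differently, e.g. with word-level rather than character-level edits):

```python
def levenshtein(a: str, b: str) -> int:
    """Character edit distance: minimum number of insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))           # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute ca -> cb
        prev = curr
    return prev[len(b)]

print(levenshtein("kitten", "sitting"))  # 3 character edits
print(levenshtein("same", "same"))       # 0 -> an exact MT match
```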

total character edits

The sum of edit-distance (Levenshtein) scores across all segments in a set. A higher number indicates more edit work is required.