A Slate Rocks customer (a translator) created a translation engine with his or her translation memories (TMs) using a personal computer. This page describes the engine and compares the translator’s Slate Desktop experience to an experience using Google’s new-and-improved neural machine translation (NMT) technology. You can read about thirty (30) more customer experiences by downloading the full report, Study of Machine Translated Segment Pairs.
Slate Desktop Engine Details
The customer started with translation memories in the language pair and industry of his or her work. The total number of segments is shown as the estimated corpus size in the table below. Slate Desktop cleaned the TMs, prepared a training corpus and built the engine. Note that these processes typically run overnight. During that processing, Slate Desktop also extracted a representative set consisting of randomly selected segments from the training corpus.
|estimated corpus size||140,000|
|segments per representative set||2,361|
|words per source segment||26|
|words per target segment||24|
Benchmark Score Comparison
The segment pairs in the representative set are representative of the translator’s daily work. By focusing on one translator’s experience, these scores indicate a level of work reduction this customer will likely experience in his or her daily work using the respective MT system (Google or Slate Desktop) with 95% confidence.
|benchmark score||Google||Slate Desktop|
|words per MT segment||23||24|
|exact MT match (count)||118||763|
|exact MT match (percent)||5.0%||32.3%|
|words per exact MT match (count)||5||15|
|filtered BLEU score (no exact MT matches)||38.64||71.83|
|segments requiring edit (count)||2,243||1,598|
|character edits per segment||49||32|
|total character edits||109,907||51,136|
These scores indicate this customer using Slate Desktop will likely spend significantly less time editing MT suggestions than if he or she were using Google for this work. This is because Slate Desktop creates engines with the customer’s translation memories and optimizes them to predict how the customer translates. Google, on the other hand, optimizes its NMT service for millions of customers with countless demands.
Google’s three (3) longest exact MT matches
The exact MT match (count) in the Benchmark Scores table (above) is the number of segments that Google NMT successfully matched to the translator’s actual work, i.e. Google successfully predicted the translator’s actions. The three segments in this table are the longest exact MT match segments. This translator can expect these kinds of Google NMT results while translating these kinds of projects.
|en||hr (Google and translator)|
|In the event of mobility between Member States, Regulation (EU) No 1231/2010 of the European Parliament and of the Council applies.||U slučaju mobilnosti između država članica primjenjuje se Uredba (EU) br. 1231/2010 Europskog parlamenta i Vijeća.|
|If either Party considers that the other Party has failed to fulfil an obligation under this Agreement, it may take appropriate measures.||Ako jedna od stranaka smatra da druga stranka nije ispunila obvezu prema ovom Sporazumu, može poduzeti odgovarajuće mjere.|
|Furthermore, diseases and species that are important today may be marginalised in the future.||Nadalje, bolesti i vrste koje su danas važne mogu biti marginalizirane u budućnosti.|
Slate’s three (3) longest exact MT matches
The exact MT match (count) in the Benchmark Scores table (above) is the number of segments that this translator’s Slate Desktop engine successfully matched to the translator’s actual work, i.e. Slate Desktop successfully predicted the translator’s actions. The three segments in this table are the longest exact MT match segments. This translator can expect these kinds of Slate Desktop results while translating these kinds of projects.
|en||hr (Slate and translator)|
|Where, on completion of the procedure set out in Article 40(3) and (4), objections are raised against a measure taken by a Member State, or where the Commission considers a national measure to be contrary to Union legislation, the Commission shall without delay enter into consultation with the Member States and the relevant economic operator or operators and shall evaluate the national measure.||Ako se nakon završetka postupka iz članka 40. stavaka 3. i 4. podnesu prigovori na mjeru koju je poduzela država članica ili ako Komisija smatra da je nacionalna mjera u suprotnosti sa zakonodavstvom Unije, Komisija bez odgode započinje savjetovanje s državama članicama i relevantnim gospodarskim subjektom ili subjektima te ocjenjuje nacionalnu mjeru.|
|Value according to Article 418 of the CRR — transferable assets representing claims on or guaranteed by: the central government of a Member State, a region with fiscal autonomy to raise and collect taxes, or of a third country in the domestic currency of the central or regional government, if the institution incurs a liquidity risk in that Member State or third country that it covers by holding those liquid assets||Vrijednost u skladu s člankom 418. CRR-a – prenosiva imovina koja predstavlja potraživanja od ili za koju jamče: središnja država države članice, regije s fiskalnom autonomijom koje uvode i ubiru poreze ili treće države u domaćoj valuti središnje države ili jedinice područne (regionalne) samouprave, ako je institucija izložena likvidnosnom riziku u toj državi članici ili trećoj zemlji koji pokriva držanjem navedene likvidne imovine|
|The regulatory authority may decide that transmission-connected demand facilities, transmission-connected distribution facilities, distribution systems and demand units for which a request for a derogation has been filed pursuant to Articles 52 or 53 do not need to comply with the requirements of this Regulation from which a derogation has been sought from the day of filing the request until the regulatory authority’s decision is issued.||Regulatorno tijelo može odlučiti da postrojenja kupca priključena na prijenosni sustav, distribucijska postrojenja priključena na prijenosni sustav, distribucijski sustavi i elementi postrojenja kupca za koje je podnesen zahtjev za odstupanje u skladu s člankom 52. ili 53. ne trebaju biti u skladu sa zahtjevima iz ove Uredbe od kojih se traži odstupanje od dana podnošenja zahtjeva do izdavanja odluke regulatornog tijela.|
Glossary of Benchmark Score Terms
Our customer experience studies use these terms.
representative set
A set of segment pairs that are representative of the translator’s daily work. Segment pairs are randomly selected and removed from a translator’s translation memories. The set is not edited or manipulated to prioritize any particular kind of segment.
The source (authored) language 2-letter code of the evaluation set (e.g. de, en, es, fr, sv).
The target (translated) language 2-letter code of the evaluation set (e.g. ar, bg, cs, de, en, es, fr, ga, hr, it, nl, pl, ru).
The subject or industry covered in the evaluation set.
estimated corpus size
The estimated number of segment pairs in the training corpus (all TMs) used to create the Slate Desktop engine.
segments per representative set
The number of segments in the representative set. The set size is chosen to provide a 95% confidence level relative to the total corpus size.
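The source does not say how the set size is derived from the corpus size. As a rough sketch only, the standard Cochran sample-size formula with a finite population correction can be used to relate the two. The 2% margin of error and the 0.5 proportion below are assumptions, not values stated in the study. With those assumptions, a corpus of 140,000 segments gives a set of 2,361 segments, which matches the table above.

```python
import math

def sample_size(population, z=1.96, margin=0.02, p=0.5):
    """Cochran's sample-size formula with finite population correction.

    z=1.96 corresponds to a 95% confidence level. The margin of error
    (0.02) and proportion (0.5) are assumptions, not values from the study.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # size for an unlimited population
    n = n0 / (1 + (n0 - 1) / population)        # shrink for a finite corpus
    return math.ceil(n)

print(sample_size(140_000))  # 2361
```

A larger corpus changes the result only slightly, because the correction term shrinks as the population grows.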
words per source segment
The average number of words per source segment in the representative set. A low number indicates the representative set likely has a disproportionately high number of short segments (terminology or glossary entries).
words per target segment
The average number of words per target segment in the representative set. A low number indicates the representative set likely has a disproportionately high number of short segments (terminology or glossary entries).
words per MT segment
The average number of words per segment generated by the respective MT system. This score indicates how closely the MT system matches the number of words per human-translated segment.
The BLEU score is a “likeness” match (similar to TM fuzzy match) between all of the MT segments and human reference translations in a set. Higher scores are better and 100 is an exact MT match.
To create the cumulative score, the BLEU algorithm first scores “likeness” between an MT segment and its reference segment. “Likeness” is based on preponderance of the same words in the same order. A score of zero “0” means no likeness. A score of “100” means the MT segment exactly matches its reference segment (below).
The algorithm then consolidates all segment scores into a cumulative BLEU score representing the “likeness” of the entire set. This cumulative BLEU score is conceptually similar to an average of all segment BLEU scores in the set, but computationally it is different.
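The difference between a cumulative score and an average of segment scores can be illustrated with a simplified BLEU. The sketch below is an unsmoothed BLEU with whitespace tokenization and a single reference per segment. It is an illustration only and is not necessarily how the study computed its scores. The function names and the two sample segment pairs are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(pairs, max_n=4):
    """Unsmoothed BLEU (0-100) over a list of (mt, reference) string pairs."""
    matched = [0] * max_n
    total = [0] * max_n
    mt_len = ref_len = 0
    for mt, ref in pairs:
        mt_tok, ref_tok = mt.split(), ref.split()
        mt_len += len(mt_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            mt_ng, ref_ng = ngrams(mt_tok, n), ngrams(ref_tok, n)
            matched[n - 1] += sum((mt_ng & ref_ng).values())  # clipped matches
            total[n - 1] += sum(mt_ng.values())
    if mt_len == 0 or min(matched) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity = 1.0 if mt_len >= ref_len else math.exp(1 - ref_len / mt_len)
    return 100 * brevity * math.exp(log_precision)

# Hypothetical segment pairs: the first is an exact match, the second is not.
pairs = [
    ("the cat sat on the mat", "the cat sat on the mat"),
    ("the quick brown fox jumps over it", "the quick brown fox jumped over it"),
]
segment_scores = [bleu([pair]) for pair in pairs]
mean_of_segments = sum(segment_scores) / len(segment_scores)
corpus_score = bleu(pairs)  # n-gram counts are pooled before scoring
print(segment_scores, mean_of_segments, corpus_score)
```

Because the n-gram counts are pooled across all segments before the precisions are computed, the cumulative score for the set is close to, but not the same as, the mean of the two segment scores.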
exact MT match (count)
An exact MT match segment exactly matches the reference human-translated segment, i.e. it has a BLEU score of 100 and an edit-distance (Levenshtein) score of 0.
The exact MT match (count) is the number of MT segments that exactly match their respective reference segments in the representative set. Segments in this category represent pure cognitive effort for the translator to identify them as correct, without need for mechanical work such as typing or dictation to edit them.
exact MT match (percent)
The percentage of exact MT match segments in a set or the average percent across all representative sets in the summary. A high percentage score represents less work for a translator. Based on all customer experience studies, Google NMT scores range from 1.8% to 11.2%. Slate Desktop scores range from 20.0% to 53.7%.
words per exact MT match (count)
The average number of words per segment generated by the respective MT system for only the exact MT match segments. MT technologies have the reputation of performing poorly for long sentences. The difference (delta) between the words per MT segment and words per exact MT match scores shows the amount of degradation the MT system suffers with long segments. A smaller delta is good and indicates the MT system performs better with longer segments.
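As a small worked example, the delta can be computed from the Benchmark Score Comparison table. The sketch assumes the first value column in that table is Google and the second is Slate Desktop, which is consistent with the score ranges quoted in this glossary. The dictionary keys are hypothetical names.

```python
# Values from the Benchmark Score Comparison table.
# Assumption: the first value column is Google, the second is Slate Desktop.
scores = {
    "Google": {"words_per_mt_segment": 23, "words_per_exact_match": 5},
    "Slate Desktop": {"words_per_mt_segment": 24, "words_per_exact_match": 15},
}

# Delta between the average MT segment length and the average exact-match length.
deltas = {
    system: s["words_per_mt_segment"] - s["words_per_exact_match"]
    for system, s in scores.items()
}
print(deltas)  # {'Google': 18, 'Slate Desktop': 9}
```

Under that assumption, the Slate Desktop delta is half the Google delta, meaning its exact matches are closer in length to its typical segment.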
filtered BLEU score (no exact MT matches)
Imagine a set of 10 sentences with BLEU scores (100, 100, 100, 100, 90, 65, 70, 80, 75, 40). The cumulative BLEU score (similar to an average) is 73. A high cumulative BLEU score is considered good but it poorly represents the amount of editing work for a translator. Therefore, we divide scoring into two systems.
First, we report the percentage of segments that require zero editing, i.e. the exact MT match (percent) value above. In this case, 4 of the 10 sentences (40%) require no editing. Clearly, higher percents are better.
Then, we remove these segments and recalculate the filtered BLEU score using only the 6 segments that require editing. In this case, the BLEU drops from 73 (for 10 sentences) to 70 (for 6 sentences).
The filtered BLEU score is always equal to or lower than the cumulative BLEU score. This score represents the necessary editing work. Although higher BLEU scores are good, you also need to consider the delta between the cumulative BLEU score and the filtered BLEU score.
A small delta with a low percentage of exact MT match segments signals virtually every segment represents editing work for the translator.
A small delta with a high percentage of exact MT match segments signals the engine will likely serve the translator well.
A large delta with many exact MT match segments will result in a lower filtered BLEU score. This signals more editing work for a smaller number of segments. Therefore, it is not as serious as a low cumulative BLEU score.
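The two-step scoring in the ten-sentence example above can be sketched as follows. A plain mean is used here only as a stand-in for the filtered subset, where it agrees with the 70 in the text. The cumulative figure of 73 comes from the BLEU computation itself and is not reproduced by this sketch. The variable names are hypothetical.

```python
# Segment-level BLEU scores from the ten-sentence example.
scores = [100, 100, 100, 100, 90, 65, 70, 80, 75, 40]

# Step 1: count the segments that need no editing (exact MT matches).
exact_matches = [s for s in scores if s == 100]
exact_match_percent = 100 * len(exact_matches) / len(scores)

# Step 2: drop the exact matches and rescore only what must be edited.
needs_edit = [s for s in scores if s < 100]
filtered_mean = sum(needs_edit) / len(needs_edit)  # plain mean as a stand-in

print(exact_match_percent)  # 40.0
print(len(needs_edit))      # 6
print(filtered_mean)        # 70.0
```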
segments requiring edit (count)
The difference between the segments per representative set and the exact MT match (count), i.e. the complement of the exact MT match (count). A higher number indicates more work.
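The relationship between these counts can be checked against the Benchmark Score Comparison table. This is a minimal sketch that assumes the first value column in that table is Google and the second is Slate Desktop. The variable names are hypothetical.

```python
set_size = 2361  # segments per representative set

# exact MT match (count); assumption: first table column is Google.
exact_matches = {"Google": 118, "Slate Desktop": 763}

for system, count in exact_matches.items():
    requiring_edit = set_size - count            # segments requiring edit (count)
    percent = round(100 * count / set_size, 1)   # exact MT match (percent)
    print(system, requiring_edit, percent)

# Google 2243 5.0
# Slate Desktop 1598 32.3
```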
character edits per segment (Levenshtein)
The average edit-distance (Levenshtein) score per segment requiring editing. The edit-distance (Levenshtein) score represents the number of character edits that are needed to transform an MT segment into the reference segment. Therefore, this number represents the average number of character edits per segment to “fix” the MT segment. Higher scores indicate more edit work is required. A score of zero (0) means the segment is an exact MT match and no edit work is required.
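For readers unfamiliar with the measure, here is a minimal sketch of the character-level Levenshtein distance using the standard dynamic-programming approach. It counts single-character insertions, deletions and substitutions. The study's own tooling is not described in the source, so this is an illustration rather than the implementation that produced the scores.

```python
def levenshtein(mt, reference):
    """Number of character edits needed to turn `mt` into `reference`."""
    previous = list(range(len(reference) + 1))  # distances from the empty prefix
    for i, mt_char in enumerate(mt, 1):
        current = [i]
        for j, ref_char in enumerate(reference, 1):
            current.append(min(
                previous[j] + 1,                           # delete from mt
                current[j - 1] + 1,                        # insert into mt
                previous[j - 1] + (mt_char != ref_char),   # substitute or keep
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("mjere", "mjere"))     # 0, an exact MT match
```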
total character edits
The edit-distance (Levenshtein) score for a set. A higher number indicates more edit work is required.