1. The Training Corpus

What is a training corpus and how does it differ from translation memories? Learn to select the best resources and use best practices to convert them to a training corpus.

1.1 Build Corpus

Instructions to build a training corpus inventory from translation memories (and other sources).

1.2 Remove Corpus

Instructions to remove training corpus files from your inventory.

1.3 Manage Corpus

Instructions to manage, update and remove training corpus files in your inventory.

1.4 Language Codes with TMX and XLIFF

Instructions to use language codes and country attributes to identify which segments to extract from TMX and XLIFF files for the training corpus.
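As a rough illustration of the idea, the sketch below selects segment pairs from a TMX document by matching the full language-plus-country code on each `<tuv>` element. It is not Slate Desktop™'s implementation: the function name `extract_pairs`, the sample data, and the exact-match rule are assumptions made for this example, and it relies only on Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample data, invented for this sketch.
SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en-US"><seg>Hello</seg></tuv><tuv xml:lang="fr-FR"><seg>Bonjour</seg></tuv></tu>
<tu><tuv xml:lang="en-GB"><seg>Colour</seg></tuv><tuv xml:lang="fr-FR"><seg>Couleur</seg></tuv></tu>
</body></tmx>"""


def extract_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs whose codes match exactly.

    Assumption: codes are compared case-insensitively and in full, so
    "en-US" does not match "en-GB".
    """
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Older TMX files may use a plain "lang" attribute instead.
            code = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                segs[code] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs
```

Calling `extract_pairs(SAMPLE_TMX, "en-US", "fr-FR")` keeps only the first translation unit, because the second one is tagged `en-GB`. Matching on the language part alone (for example, everything starting with `en`) would be an equally plausible rule, depending on how the corpus should be scoped.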

1.5 Curating TMX Translation Units

Instructions to extract and label translation units from TMX files to your inventory according to <prop> tags.
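A minimal sketch of labeling by `<prop>` tags follows. It groups translation units by the value of one `<prop>` element, so each group could become a separately labeled corpus file. This is an illustration under stated assumptions, not the product's method: the function name `group_by_prop`, the `x-domain` property type, the `"unlabeled"` fallback, and the sample data are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data; "x-domain" is an invented property type.
SAMPLE_TMX_PROPS = """<tmx version="1.4"><header/><body>
<tu><prop type="x-domain">legal</prop>
<tuv xml:lang="en"><seg>Contract</seg></tuv><tuv xml:lang="fr"><seg>Contrat</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="fr"><seg>Bonjour</seg></tuv></tu>
</body></tmx>"""


def group_by_prop(tmx_text, prop_type):
    """Group each unit's segments under the value of its matching <prop>.

    Units with no matching <prop> (or an empty one) go under "unlabeled".
    Only <prop> elements that are direct children of <tu> are considered.
    """
    root = ET.fromstring(tmx_text)
    groups = defaultdict(list)
    for tu in root.iter("tu"):
        label = "unlabeled"
        for prop in tu.findall("prop"):
            if prop.get("type") == prop_type:
                label = (prop.text or "").strip() or "unlabeled"
                break
        # Collect the text of every <seg> in document order.
        segs = tuple("".join(s.itertext()).strip() for s in tu.iter("seg"))
        groups[label].append(segs)
    return dict(groups)
```

With the sample above, `group_by_prop(SAMPLE_TMX_PROPS, "x-domain")` puts the first unit under `"legal"` and the second under `"unlabeled"`.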

Strategies

Data Science Behind the Scenes

What Slate Desktop™ does to translation memories to convert them to a training corpus.

When Can We Blame The Data?

This article describes a data cleaning challenge with a TMX file that the European Union published as “clean” for machine translation purposes.

Working With Huge Corpora

The first Slate Desktop™ support ticket included this comment: “build a test engine based on one large TM (as an easy start)…” I remembered a brilliant computational linguist’s comment. Kenneth Heafield created a critically important…

The Ideal Corpus

An “ideal” corpus is a proportionate, balanced and representative subset of the real world.