Slate™ Corpus


Slate™ Corpus is a specialized product for those who need to clean translation memories and convert them into training corpora without building custom machine translation models. If you don’t know what any of this means, you can skip this product.

Slate™ Corpus is one of the applications in the Slate™ Desktop suite. It’s the application that organizes translation memories by client, subject matter and project type in an inventory. It also cleans, prepares and converts the translation memories into the training corpora that build translation models.

The Slate™ Desktop and Slate™ Desktop Pro suites include this application. The stand-alone version is for customers who want to organize or clean translation memories on another computer in addition to their Slate™ Desktop computer.


These features and functions come with Slate™ Corpus.

All Languages

Supports all languages, including new languages as they are released in maintenance updates. With 34 languages, it supports a total of 1,122 language pairs.
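
The pair count follows from treating each translation direction as its own pair, so 34 languages give 34 × 33 pairs. A quick check in Python, assuming ordered (source, target) pairs; the function name is only illustrative:

```python
from itertools import permutations


def count_language_pairs(num_languages: int) -> int:
    """Count ordered (source, target) pairs among distinct languages."""
    # permutations(..., 2) yields every ordered pair of two different items.
    return sum(1 for _ in permutations(range(num_languages), 2))


print(count_language_pairs(34))  # 1122
```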

Privacy & Confidentiality

Slate™ Corpus runs on your PC. It needs no Internet connection, and it doesn’t log your activities the way online subscription services do. You’re fully in control of confidential work.

Build Engines (Models)

Organize Translation Memories

Tools that organize an inventory of translation memories by client, subject matter and project type.

Create Training Corpora

Tools that clean, prepare and convert translation memories into training corpora to build translation engines.
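
To give a feel for what cleaning a translation memory can involve, here is a minimal sketch of a few common steps: whitespace normalization, dropping empty segments, dropping pairs with an extreme length mismatch, and removing exact duplicates. This is not Slate™ Corpus's actual pipeline. The function name, the `max_length_ratio` parameter and its default of 3.0 are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple

Segment = Tuple[str, str]  # (source text, target text)


def clean_segments(
    segments: Iterable[Segment], max_length_ratio: float = 3.0
) -> Iterator[Segment]:
    """Yield cleaned, de-duplicated segment pairs (illustrative only)."""
    seen = set()
    for source, target in segments:
        # Collapse runs of whitespace and trim both ends.
        source = " ".join(source.split())
        target = " ".join(target.split())
        # Skip pairs where either side is empty.
        if not source or not target:
            continue
        # Skip pairs whose character lengths differ too much (assumed threshold).
        longer = max(len(source), len(target))
        shorter = min(len(source), len(target))
        if longer / shorter > max_length_ratio:
            continue
        # Skip exact duplicates of a pair already emitted.
        if (source, target) in seen:
            continue
        seen.add((source, target))
        yield source, target


raw = [
    ("Hello  world", "Bonjour le monde"),
    ("Hello world", "Bonjour le monde"),
    ("", "vide"),
    ("OK", "Ceci est une très longue traduction"),
]
print(list(clean_segments(raw)))  # [('Hello world', 'Bonjour le monde')]
```

A character-length ratio is a crude filter. It is least reliable for language pairs with very different writing systems, where a per-pair threshold would be more sensible.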

Support Utilities

Sample Translation Memories

Sample translation memories and other files to help you practice and learn.

Scripting Automation

Tools that automate repetitive and complicated Slate™ Desktop tasks so you can process large projects efficiently. They run from a command line terminal or integrate into your third-party applications.


You need to provide the following before working with Slate™ Corpus.

Hardware System Requirements
  • Intel Core i3 (i7 recommended) or AMD Athlon 64 CPU (4-core x86-64, 2.4 GHz or faster)
  • 4 GB of RAM (8 GB recommended)
  • 2 GB of free hard-disk space for installation
  • 250 GB (or more) of additional free space on a high-performance drive is required after installation
Windows Operating System (Option)
  • Windows 7 64-bit Edition with Service Pack 1
  • Windows 8 or 8.1 64-bit Edition
  • Windows 10 64-bit Edition
  • 32-bit Editions not supported
Linux Operating System (Option)
  • Ubuntu 16.04 or newer (other Debian-based on request)
  • CentOS/RHEL-based (other RPM-based on request)
MacOS Operating System (Option)
  • To be determined, currently unsupported
Translation Memories

Personalized engine:

  • 70,000 to 150,000 sentence segments
  • One full-time translator’s work for 3 to 4 years

Customized engine:

  • 200,000 to 500,000 sentence segments
  • Support a team of translators
  • There’s no upper limit to the number of segments
  • Too many segments can introduce variety that degrades performance
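
As a rough aid, the segment ranges above can be turned into a small check on the size of your translation memory. This is a sketch only: the function name and the returned labels are hypothetical, and the boundaries simply restate the figures listed above.

```python
def engine_tier(segment_count: int) -> str:
    """Map a segment count onto the ranges listed above (labels are illustrative)."""
    if segment_count < 70_000:
        return "below the personalized range"
    if segment_count <= 150_000:
        return "personalized"
    if segment_count < 200_000:
        return "between the personalized and customized ranges"
    if segment_count <= 500_000:
        return "customized"
    # No hard upper limit applies, but added variety can degrade performance.
    return "above the customized range"


print(engine_tier(120_000))  # personalized
print(engine_tier(350_000))  # customized
```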

File Types

Slate™ Desktop reads and writes these standards-based localization file types:

  • Text files with UTF-8 character encoding, Linux or Windows new line separators
  • Tab-delimited files are specialized text files (as above) with exactly one tab per line. Text left of the tab is the source language. Text right of the tab is the target language.
  • TMX – translation memory exchange up to version 1.4b
  • XLIFF – XML Localization Interchange File Format version 1.2 (.xlf, .xliff, .sdlxliff, .mxliff, .mqxliff)
  • Gettext .po and .mo files
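
The tab-delimited layout described above is simple enough to read with a few lines of Python. The sketch below assumes UTF-8 text, accepts Linux or Windows line endings, and rejects any line that does not contain exactly one tab. The function name is illustrative and not part of Slate™ Desktop.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_tab_delimited(stream: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs from a tab-delimited text stream."""
    for line_number, line in enumerate(stream, start=1):
        # Strip Linux (\n) or Windows (\r\n) line endings.
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        if line.count("\t") != 1:
            raise ValueError(f"line {line_number}: expected exactly one tab")
        source, target = line.split("\t")
        yield source, target


# In-memory sample that mixes Windows and Linux line endings.
sample = io.StringIO("Hello\tBonjour\r\nThank you\tMerci\n")
pairs = list(read_tab_delimited(sample))
print(pairs)  # [('Hello', 'Bonjour'), ('Thank you', 'Merci')]
```

For a file on disk, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to the same function, where `path` is whatever file you want to read.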

You can also work with file types supported by your computer-assisted translation (CAT) tool, such as .docx, .xlsx, etc.

The installer installs and manages the following required dependencies.

Perl Scripting Runtime

Perl 64-bit version 5.28 or newer is a free open source scripting runtime environment.

Python Scripting Runtime

Python 64-bit version 3.7.2 or newer is a free open source scripting runtime environment. Required dependency libraries include pip, pywin32, six, numpy, nltk, lxml, regex, polib, jieba, PyArabic, tinysegmenter3, hazm and wxPython.


License and support are included with your purchase of Slate™ Corpus.

End User License Agreement

A perpetual, royalty-free end-user-license agreement (EULA) to use on your machines. No subscriptions or hidden fees.

Multiple Platforms

Install and activate on any supported operating system. Today’s support includes MS Windows and Linux. MacOS is planned.

Single Activation

Install, activate and work on one machine. Build engines and work on the same computer.

Maintenance Updates

Maintenance updates are published occasionally with new languages, enhanced features and bug fixes.

Technical Support

Access to priority technical support via our online support portal during the period between major version updates.

Open Source

Slate™ Corpus distributes these components under their respective open source licenses.

Slate Toolkit Language Tokenizers

The language tokenizer scripts from the Slate™ Toolkit support data cleaning, conversion and processing.

Social Sharing Discounts

A win! win! win! You get 10% off at checkout. Your colleagues learn about Slate. We have lower advertising expenses. Thank you!