Slate Corpus™

Slate Corpus™ is the corpus preparation tool in the Slate Desktop™ suite, packaged as a stand-alone application. Use it to convert translation memories to training corpora for any machine translation system.

Slate Corpus™ extracts segments from localization files like TMX and XLIFF and organizes them in an inventory by client, subject matter and project type. It also cleans, prepares and converts the translation memory segments into the training corpora that build translation engines.
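
To make the extraction step concrete, here is a minimal sketch of pulling source/target segment pairs out of a TMX file with Python's standard library. It is not Slate Corpus™ code: the function name `extract_segments` and the sample data are hypothetical, and the sketch assumes exact language-code matches (so `en` does not match `en-US`).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def extract_segments(tmx_text, source_lang, target_lang):
    """Return (source, target) text pairs from TMX markup (hypothetical helper)."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):  # one translation unit per <tu>
        texts = {}
        for tuv in tu.findall("tuv"):  # one language variant per <tuv>
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                texts[lang] = "".join(seg.itertext()).strip()
        src = texts.get(source_lang.lower())
        tgt = texts.get(target_lang.lower())
        if src and tgt:  # skip units missing either side
            pairs.append((src, tgt))
    return pairs

# Illustrative sample, not a real translation memory.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu><tuv xml:lang="en"><seg>Hello world.</seg></tuv>
        <tuv xml:lang="fr"><seg>Bonjour le monde.</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Thank you.</seg></tuv>
        <tuv xml:lang="fr"><seg>Merci.</seg></tuv></tu>
  </body>
</tmx>"""

for source, target in extract_segments(SAMPLE_TMX, "en", "fr"):
    print(source, "->", target)
```

A real translation memory would also need cleaning (duplicates, empty segments, mismatched language codes) before it is useful as training data.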

Slate Corpus™ is built into the Slate Desktop™ and Slate Desktop Pro™ suites. This stand-alone application is for customers who want to organize or clean translation memories on a PC separate from the Slate Desktop™ PC.

These features and functions come with Slate Corpus™.

All Languages

Support for all languages, plus new languages as they are released in maintenance updates. Your translation memories and Slate Desktop™ can create engines that translate between any two of the 34 supported languages. That’s 1,122 language pair combinations.
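
The figure of 1,122 follows from counting directional source-to-target pairs among 34 languages (34 × 33). The short sketch below checks that arithmetic. The language codes are placeholders, not the actual supported language list.

```python
from itertools import permutations

# Assumption: 34 supported languages; placeholder codes stand in for the real ones.
languages = [f"lang{i:02d}" for i in range(1, 35)]

# A language pair is directional: (source, target) differs from (target, source).
pairs = list(permutations(languages, 2))

print(len(pairs))  # 34 * 33 = 1122
```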

Privacy & Confidentiality

Slate Corpus™ runs on your PC. It needs no Internet connection, and it doesn’t log your activities the way online subscription services do. You’re fully in control of confidential work.

Build Engines (Models)

Organize Translation Memories

Tools that organize an inventory of translation memories by client, subject matter and project type. More tools prepare translation memories as training corpora to build translation engines.

Support Utilities

Sample Translation Memories

Sample translation memories and other files to help you practice and learn.

Scripting Automation

Tools that automate repetitive and complicated Slate Desktop™ tasks so you can process large projects efficiently. Run them from a command line terminal or integrate them into your third-party applications.

You need to provide the following before working with Slate Corpus™.

Hardware System Requirements

  • Intel Core i3 (i7 recommended) or AMD Athlon 64 CPU (4-core x86-64, 2.4 GHz or faster)
  • 4 GB of RAM (8 GB recommended)
  • 2 GB of free hard drive space for installation
  • 250 GB (or more) free space on a high-performance drive is ideal after installation

Windows Operating System (Option)

  • Windows 7 64-bit Edition with Service Pack 1
  • Windows 8 or 8.1 64-bit Edition
  • Windows 10 64-bit Edition
  • 32-bit Editions not supported

Linux Operating System (Option)

  • Linux, x86_64 kernel version 3.2+
  • Ubuntu 16.04 or newer (other Debian-based on request)
  • CentOS/RHEL-based (other RPM-based on request)

MacOS Operating System (Option)

  • To be determined, currently unsupported

Translation Memories (corpus)

Personalized engines

  • 70,000 to 150,000 sentence segments
  • One full-time translator’s work for 3 to 4 years

Customized engines

  • 200,000 to 500,000 sentence segments
  • Support a team of translators
  • No upper limit on the number of segments
  • Too many segments risk degrading the engine

File Types

Slate Desktop™ reads and writes these standards-based localization file types:

  • Text files with UTF-8 character encoding, Linux or Windows new line separators
  • Tab-delimited files are specialized text files (as above) with exactly one tab per line. Text left of the tab is the source language. Text right of the tab is the target language.
  • TMX – translation memory exchange up to version 1.4b
  • XLIFF – XML Localization Interchange File Format version 1.2 (.xlf, .xliff, .sdlxliff, .mxliff, .mqxliff)
  • Gettext .po and .mo files

You can work with other file types, such as .docx and .xlsx, through your computer-assisted translation (CAT) tool.
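
The tab-delimited layout listed above is simple enough to read in a few lines. The sketch below is one way to parse it, assuming UTF-8 text that has already been read into a string. The function name `parse_tab_delimited` is hypothetical, and rejecting lines without exactly one tab is a choice made for this example, not documented Slate Corpus™ behavior.

```python
def parse_tab_delimited(text):
    """Split tab-delimited text into (source, target) pairs (hypothetical helper)."""
    pairs = []
    # Normalize Windows (CRLF) line endings so both separator styles are accepted.
    lines = text.replace("\r\n", "\n").split("\n")
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines, including the one after a trailing newline
        tabs = line.count("\t")
        if tabs != 1:
            raise ValueError(f"line {line_no}: expected exactly one tab, found {tabs}")
        source, target = line.split("\t")
        pairs.append((source.strip(), target.strip()))
    return pairs

# Illustrative sample mixing Windows and Linux line endings.
sample = "Hello world.\tBonjour le monde.\r\nThank you.\tMerci.\n"
print(parse_tab_delimited(sample))
```

When reading from disk, opening the file with `open(path, encoding="utf-8", newline="")` keeps the original line endings intact so the function can normalize them itself.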


The installer installs and manages the following required dependencies.

Perl Scripting Runtime

Perl 64-bit version 5.28 or newer is a free open source scripting runtime environment.

Python Scripting Runtime

Python 64-bit version 3.7.2 or newer is a free open source scripting runtime environment. Required dependency libraries include: pip, pywin32, six, numpy, nltk, lxml, regex, polib, jieba, PyArabic, tinysegmenter3, hazm, wxPython

License and support are included with your purchase of Slate Corpus™.

End User License Agreement

A one-time payment, royalty-free end-user license agreement (EULA) to use the software on your machine in perpetuity without subscriptions or usage fees.

Multiple Platforms

Install and activate on any supported operating system. Today’s support includes MS Windows and Linux. MacOS is planned.

Single Activation

Install, activate and work on one machine. Build engines and work on the same computer.

Maintenance Updates

Maintenance updates are published occasionally with new languages, enhanced features and bug fixes.

Technical Support

Access to priority technical support during the period between major version updates via our online support portal, https://www.slate.rocks/support/.

Slate Corpus™ distributes these components under their respective open source licenses.

Slate Toolkit™ Language Tokenizers

Language tokenizers from Slate Toolkit™ and other open source licensed utilities to tokenize text. That is, utilities that insert spaces between words, punctuation and symbols.
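
As a rough illustration of what "inserting spaces between words, punctuation and symbols" means, here is a minimal regex-based sketch. It is not the Slate Toolkit™ tokenizer: real tokenizers apply language-specific rules (contractions, abbreviations, scripts without spaces), and the names `TOKEN_PATTERN` and `tokenize` are hypothetical.

```python
import re

# Matches either a run of word characters or one non-space, non-word character.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Return text with single spaces between words, punctuation and symbols."""
    return " ".join(TOKEN_PATTERN.findall(text))

print(tokenize("Hello, world! It costs $5."))
# Hello , world ! It costs $ 5 .
```

This naive rule also splits apostrophes and decimal points, which is one reason production tokenizers are language-aware.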

