Slate™ Toolkit

Details

If you’re looking for a ready-made open source package to build machine translation solutions, you’ve found it. Slate™ Toolkit adds phrase-based SMT model support to our commercial Windows and Linux products. To our knowledge, this package is the world’s only complete collection of 64-bit Windows-compatible phrase-based SMT utilities.

Getting Started

To work with this package, you must be familiar with Moses and MGIZA++ or be willing to invest significant time to learn the tools. You’ll find the best learning resources on the Moses website at http://statmt.org/moses/

Lengthy command lines from the various open source utilities can be error-prone. Therefore, this toolkit includes Windows .cmd and Bash .sh shell scripts, plus a very small sample training corpus. These do not constitute a production-ready environment. Rather, they demonstrate the open source command lines for essential steps that prepare corpora, train & tune models and translate text.

The demo-all script is the best place to start. Just run it in place.

Outputs from upstream scripts become the inputs to downstream scripts. Therefore, the script names are numbered in the order to follow when you run them individually. The order is also referenced in the demo-all script.
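The run order implied by the numbered script names can be sketched as follows. This is a minimal illustration and not part of the toolkit: the file names in the example are hypothetical, and it assumes that each individual script name begins with its step number.

```python
import re
from typing import List


def order_scripts(names: List[str]) -> List[str]:
    """Return script names sorted by their leading step number.

    Assumes each step script starts with digits (for example "2-train.sh").
    Names without a numeric prefix, such as a demo-all script, are skipped.
    """
    numbered = []
    for name in names:
        match = re.match(r"(\d+)", name)
        if match:
            numbered.append((int(match.group(1)), name))
    # Sort numerically so that step 10 comes after step 2.
    return [name for _, name in sorted(numbered)]


# Hypothetical file names, used only to illustrate the ordering.
example = ["10-translate.sh", "2-train.sh", "1-prepare.sh", "demo-all.sh"]
print(order_scripts(example))
```

Sorting on the integer value rather than the raw string matters once there are ten or more steps, because a plain text sort would place "10" before "2".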

Supported SMT Utilities

Support for the open source utilities maintains cross-platform functionality for the phrase-based mode of statistical machine translation. Factored phrase-based, hierarchical and other SMT modes might work, but they are not tested or supported.

Supported SMT Features

Support for phrase dictionary, lexical reordering, language modeling and other advanced features is maintained through the supported utilities.

Unsupported Utilities & Features

Open source utilities and features that are not used in our commercial products are either missing or untested, and therefore unsupported. This collection distributes all utilities under their respective open source licenses without warranty. The README file has more details.

Features

These features and functions come with Slate™ Toolkit.

Languages

Support for 29 languages in any combination. With your TMs, you can create engines that translate between any of the 812 possible language pairs.
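The figure of 812 follows from counting ordered (source, target) pairs, since each translation direction is a separate engine. A short sketch of the arithmetic, using placeholder language codes because the 29 languages are not listed here:

```python
from itertools import permutations

LANGUAGE_COUNT = 29  # taken from the text above

# Placeholder codes; the actual 29 languages are not named in this section.
languages = [f"lang{i:02d}" for i in range(LANGUAGE_COUNT)]

# Each ordered (source, target) pair with source != target is one direction.
pairs = list(permutations(languages, 2))

print(len(pairs))  # 29 * 28 = 812
```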

Sample Translation Memories

Sample translation memories and other files to help you practice and learn.

Software Development Tools

Slate™ Toolkit is the kernel that provides phrase-based statistical machine translation functionality in our proprietary applications. A skilled software engineer can use it to create their own programs with features like these:

  • Organize Translation Memories
  • Build Customized Engines
  • Evaluate Engines
  • Deploy Engines
  • Pre-Translate Files
  • Plugins Connect to CAT Tools
  • Forced Terminology
  • Terminology On-The-Fly
  • Weighted Updates
  • Backup & Restore

Requirements

You need to provide the following before working with Slate™ Toolkit.

Hardware System Requirements

  • Intel Core i3 (i7 recommended) or AMD Athlon 64 CPU (4-core x86-64, 2.4 GHz or faster)
  • 4 GB of RAM (8 GB recommended)
  • 2 GB of free hard-disk space for installation
  • 250 GB (or more) of additional free space on a high-performance drive is required after installation

Windows Operating System (Option)

  • Windows 7 64-bit Edition with Service Pack 1
  • Windows 8 or 8.1 64-bit Edition
  • Windows 10 64-bit Edition
  • 32-bit Editions not supported

Linux Operating System (Option)

  • Ubuntu 16.04 or newer (other Debian-based on request)
  • CentOS/RHEL-based (other RPM-based on request)

MacOS Operating System (Option)

  • To be determined, currently unsupported

Translation Memories

Personalized engine:

  • 70,000 to 150,000 sentence segments
  • One full-time translator’s work for 3 to 4 years

Customized engine:

  • 200,000 to 500,000 sentence segments
  • Support a team of translators
  • There’s no upper limit to the number of segments
  • Too many segments can introduce variety that degrades performance

File Types

Slate™ Toolkit can use these file types:

  • Text files with UTF-8 character encoding and Linux or Windows newline separators
  • Tab-delimited files are specialized text files (as above) with exactly one tab per line. Text to the left of the tab is the source language. Text to the right of the tab is the target language.
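The tab-delimited layout described above can be read with a few lines of Python. This is a minimal sketch, not the toolkit's own loader: the function names are hypothetical, and skipping lines that do not contain exactly one tab is an assumption about how malformed lines might be handled.

```python
from typing import Iterable, List, Tuple


def parse_segments(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split tab-delimited lines into (source, target) segment pairs."""
    pairs = []
    for line in lines:
        # Strip Linux ("\n") or Windows ("\r\n") line endings.
        line = line.rstrip("\r\n")
        # Assumption: lines without exactly one tab are skipped.
        if line.count("\t") != 1:
            continue
        source, target = line.split("\t")
        pairs.append((source, target))
    return pairs


def read_tab_delimited(path: str) -> List[Tuple[str, str]]:
    """Read segment pairs from a UTF-8 encoded tab-delimited text file."""
    with open(path, encoding="utf-8") as handle:
        return parse_segments(handle)


sample = [
    "Hello world\tBonjour le monde\r\n",
    "Good night\tBonne nuit\n",
    "no tab on this line\n",
]
print(parse_segments(sample))
```

The length of the returned list gives the segment count, which can be compared against the translation memory sizes listed earlier.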

The installer installs and manages the following required dependencies.

GNU Utilities

The GNU Utilities are essential open source utilities that the Slate™ Toolkit needs to create models. The MS Windows package installs and updates them but we do not maintain them.

Perl Scripting Runtime

Perl is a free open source scripting runtime environment. The toolkit uses the 64-bit edition, version 5.28 or newer.

Python Scripting Runtime

Python is a free open source scripting runtime environment. The toolkit uses the 64-bit edition, version 3.7.2 or newer. Required dependency libraries include: pip, pywin32, six, numpy, nltk, lxml, regex, polib, jieba, PyArabic, tinysegmenter3, hazm, wxPython
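Although the installer manages these dependencies, it can be useful to confirm that a given interpreter meets the stated requirement. The sketch below is an illustration only. It assumes the minimum version is 3.7.2, and the function name is hypothetical.

```python
import struct
import sys

# Assumed minimum version, based on the requirement stated above.
MIN_VERSION = (3, 7, 2)


def python_runtime_ok() -> bool:
    """Return True if this interpreter is 64-bit and new enough."""
    # The size of a C pointer is 8 bytes on a 64-bit interpreter.
    is_64_bit = struct.calcsize("P") * 8 == 64
    is_new_enough = sys.version_info[:3] >= MIN_VERSION
    return is_64_bit and is_new_enough


print(python_runtime_ok())
```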

License

A license and support are included with your purchase of Slate™ Toolkit.

Open Source Licenses

Open Source software components from many different projects written by many authors and contributors are distributed under their respective open source licenses. In turn, they can have sub-components from different sources that could be licensed differently.

Multiple Platforms

Install and activate on any supported operating system. Today’s support includes MS Windows and Linux. MacOS support is planned.

Free Open Source

You may redistribute this package and its utilities freely under their respective open source licenses. The price is a packaging and distribution fee.

Open Source

Slate™ Toolkit distributes these components under their respective open source licenses.

Slate Demo

Shell scripts (Windows .cmd files and Bash .sh files) and a sample training corpus to demonstrate native Moses and MGIZA++ utilities.

MGIZA++

MGIZA++ is an essential open source utility that the Slate™ Toolkit uses to create statistical machine translation models.

Moses Decoder

The decoder is the utility that converts source to target text using statistical machine translation models created by the toolkit.

Moses Toolkit

The toolkit is a collection of open source utilities, derived from the original Moses open source project, that create SMT models.

GNU Utilities

The GNU Utilities are essential open source utilities that the Slate™ Toolkit needs to create models. The MS Windows package installs and updates them but we do not maintain them.

Social Sharing Discounts

A win! win! win! You get 10% off at checkout. Your colleagues learn about Slate. We have lower advertising expenses. Thank you!