Slate™ Toolkit Edition Statistical Machine Translation Utilities for Windows and Linux

Slate™ Toolkit Edition is a collection of free open source software for phrase-based statistical machine translation (SMT) on 64-bit Microsoft Windows and Ubuntu Linux operating systems. It has all the open source components needed to generate and use SMT models.

Specifications

Slate™ Toolkit Edition consists of command-line utilities for 64-bit Windows and Linux operating systems. It helps if you are familiar with the Moses open source project. If you don’t know what that means, check out Slate™ Starter Edition.

Features Table

Platforms: MS Windows, Linux

Components (and what each is used for):

  • Perl Runtime (engine management)
  • Python Runtime (everything)
  • GNU Utilities (engine management)
  • MGIZA++ (engine management)
  • Moses Decoder (connectors)
  • Moses Toolkit (engine management)
  • Moses Demo (education)
  • MS Windows Operating Systems
  • Ubuntu Linux Operating Systems

README

We created Slate™ Toolkit Edition to manage phrase-based SMT models in our other Slate™ products on various operating systems. Utilities that do not support this goal are either missing or untested and not supported.

We distribute these utilities without warranty or support. You may redistribute each under its respective open source license.

Hardware System Requirements

  • Intel Core i3 (i7 recommended) or AMD Athlon 64 CPU (4-core x86-64, 2.4 GHz or faster)
  • 4 GB of RAM (8 GB recommended)
  • 2 GB of available hard-disk space for installation
  • 250 GB (or more) of additional free space on a high-performance drive is required after installation

Windows Operating System Requirements

  • Microsoft Windows XP Professional x64 Edition with Service Pack 3
  • Microsoft Windows 7 64-bit Edition with Service Pack 1
  • Microsoft Windows 8 or 8.1 64-bit Edition
  • Microsoft Windows 10 64-bit Edition
  • Microsoft Windows Server 2003 R2 x64 Edition
  • Microsoft Windows Server 2008 x64 Edition
  • Microsoft Windows Server 2012 x64 Edition

Scripting Runtimes

Slate™ Toolkit Edition for Windows installs scripting runtimes and updates the system %PATH%.

  • Perl 64-bit version 5.26 or newer. We install Strawberry Perl.
  • Python 64-bit version 2.7. We install a version from Python.org.
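
To confirm that the runtimes are reachable through the updated %PATH%, a check along the following lines may help. This is a minimal sketch, not part of the toolkit: the function name find_runtimes is hypothetical, and the executable names "perl" and "python" are assumptions about how the installed runtimes are named on your system.

```python
import shutil


def find_runtimes(names=("perl", "python")):
    """Map each runtime name to its full path on PATH, or None if not found.

    The default names are assumptions; adjust them if your system uses
    different executable names.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_runtimes().items():
        print(name, "->", location if location else "not found on PATH")
```

A runtime reported as "not found on PATH" suggests opening a new command prompt, since changes to %PATH% are usually only visible to newly started shells.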

Additional Utilities

Slate™ Toolkit Edition for Windows includes these utilities.

  • sort.exe (GNU coreutils version 7.6)
  • split.exe (GNU coreutils version 5.3.0)
  • libiconv2.dll and libintl3.dll (GNU coreutils version 5.3.0)
  • gzip.exe (version 1.3.12, also copied as gunzip.exe and bzcat.exe)

These are neither maintained nor supported by us, but we will update them when needed.

Linux Operating System Requirements

  • Ubuntu Linux x86_64 kernel version 3.2+
  • GNU standard library and command-line utilities.

Scripting Runtimes

The Linux package works “out of the box” on standard Ubuntu 12.04 or newer systems.

On Red Hat-based systems you may need to install Perl and/or Perl’s Date::Format package:

yum install perl perl-TimeDate

Other Linux systems may work and we welcome your feedback about your experiences.

Features

Supported Utilities

We maintain the following utilities, make the installer, and support their cross-platform functionality for phrase-based SMT. They may work with other SMT modes, such as factored phrase-based or hierarchical, but we do not test these modes. We would like to hear about your experiences.

From MGIZA++:

  • mgiza(.exe)
  • mkcls(.exe)
  • snt2cooc(.exe)
  • merge_alignment.py

From Moses:

  • build_binary(.exe)
  • consolidate(.exe)
  • evaluator(.exe)
  • extract(.exe)
  • extractor(.exe)
  • lexical-reordering-score(.exe)
  • lmplz(.exe)
  • mert(.exe)
  • moses(.exe)
  • processLexicalTable(.exe)
  • processPhraseTable(.exe)
  • query(.exe)
  • score(.exe)
  • symal(.exe)
  • extract-parallel.perl
  • filter-model-given-input.pl
  • filter-rule-table.py
  • flexibility_score.py
  • giza2bal.pl
  • LexicalTranslationModel.pm
  • mert-moses.pl
  • moses_sim_pe.py
  • reduce_combine.pl
  • score-parallel.perl
  • train-model.perl

If you need a particular utility that’s not listed here to run cross-platform (e.g. on Windows), please let us know. We may be able to add it to Slate™ Toolkit Edition.
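
The “(.exe)” notation in the lists above means that a compiled utility carries the .exe suffix on Windows and no suffix on Linux, while the Perl and Python scripts keep the same name on both. A script that calls these utilities on either platform could resolve the name as sketched below. The helper platform_name is hypothetical and is not part of the toolkit; it assumes that any name without a file extension is a compiled utility.

```python
import os
import sys


def platform_name(utility, platform=None):
    """Return the file name of a utility on the given platform.

    `platform` follows the values of sys.platform ("win32", "linux", ...).
    Names that already have an extension (.pl, .perl, .py, .pm) are
    treated as scripts and returned unchanged.
    """
    platform = sys.platform if platform is None else platform
    _, extension = os.path.splitext(utility)
    if platform.startswith("win") and not extension:
        return utility + ".exe"
    return utility
```

For example, platform_name("mgiza", "win32") gives "mgiza.exe", while platform_name("train-model.perl", "win32") leaves the script name as it is.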

Supported Features

We support the following features with the utilities listed above. Features not listed here are not supported. See the “Unsupported Features” section below.

  • PhraseDictionaryMemory
  • PhraseDictionaryBinary
  • LexicalReordering (memory)
  • LexicalReordering (binary)
  • KenLM (all modes)
  • max-kenlm-order=12
  • with-xmlrpc-c (support for -xml-input)
  • cmph support

Unsupported Features

Utilities included in Slate™ Toolkit Edition but not on the list above may work, but we do not support them. This includes, for example:

  • clean-corpus-n.perl
  • detokenizer.perl
  • lowercase.perl
  • snt2cooc.pl
  • tokenizer.perl

Slate™ Toolkit Edition does not include all of the utilities found in Moses and MGIZA++. We do not support utilities that are not listed above. For example, these are neither included nor supported:

  • BerkeleyAligner
  • PhraseDictionaryOnDisk
  • PhraseDictionaryCompact
  • LexicalReordering (compact)
  • IRSTLM
  • RandLM
  • SRILM
  • hierarchical models
  • suffix arrays
  • bilingual language models

This list may change as we update Slate™ Toolkit Edition. Please contact us if a feature you need is missing.

Getting started

To work with this package, you should be familiar with Moses and MGIZA++.

The command lines for the various utilities can be long and error-prone, so we have included shell scripts (Windows .cmd files and Bash .sh files) that demonstrate them. The scripts take you through the paces on a sample corpus, from training to translation. They are meant for you to read and use as examples.

You can run the whole thing by executing the demo-all script. Just run the script in-place.

The outputs to some scripts become the inputs to others. Therefore, when you run the scripts individually, please follow the order as referenced in the demo-all script.

Caveats

This software was originally written and maintained for academic use on Unix-like systems. You will notice this in many places, and there are a few things you can do to protect yourself from problems.

Naming of files and folders

Unix and Windows deal with locations of files and folders in different ways:

  • Windows paths use drive letters and backslashes; Unix paths use slashes to indicate where a file is. Slate™ Toolkit Edition generally supports each system’s native style, but if you run into glitches, please let us know.
  • In Windows, “a.txt” and “A.txt” are the same file; in Unix they are different. Avoid names that can be confused in this way, and make sure you capitalize all names consistently. The software may not always realize that the two are the same name on your system.
  • Unix software often uses whitespace to separate one filename from another. Files or folders are allowed to have whitespace in them, but it often brings out bugs in software. Avoid whitespace in file and folder path names.
  • Handling of non-ASCII characters can differ between individual computers depending on configuration, and likewise often triggers software bugs.
  • Many punctuation marks can have special meanings on different systems, such as colons, quotes and apostrophes, equals signs, dollar signs, percentage signs, asterisks, tildes, and so on. Avoid these marks. Keep it simple! When in doubt, use dashes and/or underscores.

For trouble-free use, we recommend that you use only files and folders with names consisting exclusively of ASCII letters (a-z), digits (0-9), dots, and dashes or underscores (-, _). With your help and patience we hope to improve the user experience over time.
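
The recommendation above can be checked before training starts. The following is a minimal sketch under stated assumptions: the function name unsafe_components is hypothetical, and it accepts upper-case as well as lower-case ASCII letters, which is an interpretation of the rule rather than something the toolkit enforces.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# Safe characters per the recommendation: ASCII letters, digits, dots,
# dashes, and underscores. Accepting upper-case letters is an assumption.
_SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def unsafe_components(path, windows=False):
    """Return the file and folder names in `path` that break the rule."""
    p = PureWindowsPath(path) if windows else PurePosixPath(path)
    # Skip the root or drive part ("/" or "C:\\"); it is not a name.
    parts = p.parts[1:] if p.anchor else p.parts
    return [part for part in parts if not _SAFE_NAME.fullmatch(part)]
```

An empty list means every name in the path follows the rule. A path such as /data/my corpus/train.en returns ["my corpus"] because of the space.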

Line endings

On Windows systems, a line of text ends in a fixed sequence of two characters: carriage return and line feed, also written as “\r\n”. Unix systems use just the line feed, or “\n”.

This may confuse some tools when dealing with files that were not written with your system’s native line endings. Windows Notepad may show all contents of a Unix text file as a single, long line. Other software may start each new line right below where the last one ended instead of returning to the starting column. A few Unix tools may interpret the carriage returns as “jump back to the beginning of the line and erase all text that was previously displayed.”

Slate™ Toolkit Edition accepts input files with either style of line endings. It generally creates output files with platform-specific line endings, but at times it creates Unix-style files on Windows systems. This may not be perfect and with your feedback we hope to improve it over time.
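
If another tool in your workflow is sensitive to line endings, you can inspect or convert a file yourself. The sketch below is illustrative only and not part of the toolkit; the function names are hypothetical. It works on raw bytes and assumes an ASCII-compatible encoding such as UTF-8, so it is not suitable for UTF-16 files.

```python
def detect_line_endings(data: bytes) -> str:
    """Classify the line endings in `data` as 'crlf', 'lf', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")
    # Line feeds that are not part of a carriage return + line feed pair.
    lone_lf = data.count(b"\n") - crlf
    if crlf and lone_lf:
        return "mixed"
    if crlf:
        return "crlf"
    if lone_lf:
        return "lf"
    return "none"


def to_unix(data: bytes) -> bytes:
    """Convert Windows-style line endings to Unix-style."""
    return data.replace(b"\r\n", b"\n")


def to_windows(data: bytes) -> bytes:
    """Convert to Windows-style line endings without doubling existing ones."""
    return to_unix(data).replace(b"\n", b"\r\n")
```

Read and write the files in binary mode (open(path, "rb") and open(path, "wb")) so that Python does not translate the line endings itself.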

Contact Information

Slate Rocks
Web: https://slate.rocks/
Email: [email protected]
