Training Data – Conversion and Corpus Preparation

This presentation and screencast describe the training data format required by the Moses SMT system and show how to convert data into that format. They also show how to align text from translated documents and how to convert TMX files to obtain more data for SMT training.
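As a rough illustration of the TMX conversion step, the sketch below turns a TMX document into the sentence-aligned layout that Moses training expects: two plain-text files with one segment per line, where line N of one file is the translation of line N of the other. This is not the tooling used in the video. The function names `tmx_to_pairs` and `write_parallel`, the sample TMX content, and the exact-match handling of language codes are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Simplified, made-up TMX content used only for illustration.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello world.</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour le monde.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Press <bpt i="1"/>OK<ept i="1"/> to
 continue.</seg></tuv>
      <tuv xml:lang="fr"><seg>Appuyez sur OK pour continuer.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Untranslated.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs found in a TMX document string.

    Assumption: language codes are matched exactly (case-insensitively),
    so "en" does not match "en-US".
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() drops inline tags but keeps their surrounding text;
                # split/join collapses newlines so each segment fits on one line.
                segs[lang] = " ".join("".join(seg.itertext()).split())
        src = segs.get(src_lang.lower())
        tgt = segs.get(tgt_lang.lower())
        # Keep only units that have both sides, so the files stay aligned.
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


def write_parallel(pairs, stem, src_lang, tgt_lang):
    """Write pairs to <stem>.<src_lang> and <stem>.<tgt_lang>, one segment per line."""
    with open(f"{stem}.{src_lang}", "w", encoding="utf-8") as fs, \
         open(f"{stem}.{tgt_lang}", "w", encoding="utf-8") as ft:
        for src, tgt in pairs:
            fs.write(src + "\n")
            ft.write(tgt + "\n")


if __name__ == "__main__":
    for src, tgt in tmx_to_pairs(SAMPLE_TMX, "en", "fr"):
        print(src, "|||", tgt)
```

Calling `write_parallel(pairs, "corpus", "en", "fr")` would produce `corpus.en` and `corpus.fr` with the same number of lines. Translation units that lack one of the two languages are skipped rather than written as empty lines, since a mismatch in line counts breaks the alignment between the two files.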

This video shows the steps required to use Slate Toolkit™. It was recorded on a Linux machine, but the same steps work in a Command Prompt or PowerShell terminal on MS Windows. Slate Desktop™’s user-friendly graphical user interface replaces the terminal commands.

The presenter discusses “best practices” as of 2014, when the video was recorded. Slate Desktop™ uses newer best practices based on lessons learned since then.

Published on Jul 7, 2014