The Moses Decoder is one of many open-source components that help make Slate™ possible. These instructional videos cover the basic concepts: learn the academic fundamentals of statistical machine translation (SMT), then watch the Slate videos to see how Slate automates the more meticulous Moses tasks.
This presentation provides a brief overview of the history of machine translation and the approaches developed along the way. It then focuses on statistical machine translation: its different flavors, the process of training an SMT system from training data, and the decoding process that performs the actual translations.
Training data is the essential ingredient for statistical MT systems. This presentation describes parallel and monolingual data, where to obtain it, and how to combine and select data to achieve the highest quality MT output.
This presentation and screencast describe the training data format required by the Moses SMT system and show how to convert data into this format. They also show how to align text from translated documents and how to convert TMX files to source more data for SMT training.
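Moses expects a parallel corpus as two plain-text files (e.g. `corpus.en` and `corpus.de`), one sentence per line, where line *n* of one file is the translation of line *n* of the other. As a rough illustration of the TMX conversion step (not the tool used in the screencast), here is a minimal Python sketch that extracts aligned segments from a TMX document; the function name and behavior are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_to_parallel(tmx_text, src_lang, tgt_lang):
    """Extract aligned sentence pairs from a TMX document.

    Returns two equal-length lists of sentences, one pair per
    translation unit (<tu>) that contains both requested languages.
    Writing each list to its own file, one sentence per line, yields
    the two-file parallel format Moses trains on.
    """
    root = ET.fromstring(tmx_text)
    src, tgt = [], []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None and seg.text:
                # Collapse internal whitespace so each segment stays on one line.
                segs[lang.lower()] = " ".join(seg.text.split())
        if src_lang in segs and tgt_lang in segs:
            src.append(segs[src_lang])
            tgt.append(segs[tgt_lang])
    return src, tgt
```

Real TMX files can carry inline markup inside `<seg>` elements (formatting tags, placeholders), which a production converter would need to strip or flatten; this sketch handles only plain-text segments.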
Once data is converted into the right format, it needs to be tokenized and cleaned before it can be used to train an SMT system. This presentation explains tokenization and word segmentation for East Asian languages, and outlines the data cleaning options used by many MT vendors. It provides guidance on which cleaning steps to apply, and how, to obtain the best-quality MT system. For some languages it is beneficial to add linguistic information to the SMT system; this is also described.
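One standard cleaning step is filtering out sentence pairs that are empty, overlong, or badly mismatched in length, as Moses' bundled `clean-corpus-n.perl` script does. Here is a minimal Python sketch of that kind of filter; the function name and thresholds are illustrative assumptions, not necessarily the script's defaults:

```python
def clean_parallel(src_sents, tgt_sents, min_len=1, max_len=80, max_ratio=9.0):
    """Filter a parallel corpus for SMT training.

    Drops a sentence pair when either side has fewer than min_len or
    more than max_len tokens, or when the source/target token-count
    ratio exceeds max_ratio (a sign of misalignment). Overlong
    sentences slow word alignment sharply, which is why a length cap
    is applied before training.
    """
    clean_src, clean_tgt = [], []
    for s, t in zip(src_sents, tgt_sents):
        ns, nt = len(s.split()), len(t.split())
        if ns < min_len or nt < min_len:
            continue  # empty (or too short) on one side
        if ns > max_len or nt > max_len:
            continue  # too long for efficient word alignment
        if ns / nt > max_ratio or nt / ns > max_ratio:
            continue  # lengths too mismatched to be a real translation pair
        clean_src.append(s)
        clean_tgt.append(t)
    return clean_src, clean_tgt
```

The filter assumes the corpus is already tokenized, so token counts come from simple whitespace splitting.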
This presentation gives an overview of the Moses machine translation system, its associated components, and the requirements for obtaining and running the system. It also describes the history of Moses and the larger open-source Moses ecosystem, including the development process, support, and opportunities to contribute.
This screencast shows how to train a small Moses SMT system with the training data prepared in earlier screencasts, how to tune the trained system using a tuning set, and finally how to perform translations with it.