Guidelines to prepare your translation memories. You can start even before installing Slate Desktop™.
Slate Desktop™ imports TMX (translation memory exchange) files, not the CAT tool’s translation memory files. Therefore, you need to convert your CAT translation memories to TMX files. Use your CAT or other TM tool to export native translation memories to TMX files. Here are some Dos and Don’ts to get the best results when creating the TMX files.
- Do: Normalize inconsistent metadata before converting the segments to TMX.
For example, some segments may have the subject “IT” and others “Information Technology”, or you may have spelled clients’ or translators’ names differently over time. Use your CAT tool to convert the metadata to consistent values. Slate Desktop™’s Curating TMX translation units process can also do this, but your CAT tool might be easier to work with.
- Do: Configure the TM tool to preserve metadata in the TMX file, especially the values you normalized.
- Do: Replace substitution variables/placeholders with meaningful content.
- Do: Be patient. Large translation memories can take a long time.
- Do: Export your translation units by metadata to create a larger number of smaller TMX files.
- Do: Despite the don’ts below, if you insist on your own preparation processing, make sure your intention is to remove or correct formatting errors or to reduce the sentences to their core semantic components.
- Do: Take the time to curate/organize the translation memories you import using meaningful labels. This is a better investment of your time than manual cleaning. It opens the opportunity to re-mix and prioritize translation memories to influence an engine’s performance.
- Don’t: Don’t try to guess which translation units you should include. The ideal export includes everything you’ve ever translated.
- Don’t: Don’t spend time with XBench or other tools to clean or improve your segments. Slate Desktop™’s processing is optimized to convert your TMX files into a machine translation training corpus. Separate articles describe these processes.
- Don’t: Don’t over-process. It is possible to over-process your TMs.
For example, it might seem like a good idea to normalize all source language double quotes from flat (") to curly (”), but there’s a down-side: the engine won’t learn how to translate flat double quotes and creates randomized results when it encounters them. Instead, it’s best to normalize the target language half of your TM and leave the natural variations in the source language.
- Don’t: Don’t use big-mama translation memories as a monolithic unit. Curate them, and your engine will almost always give you better linguistic performance. It’s not guaranteed, but you can only test and experiment with various mixes after you have a fully curated inventory.
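Two of the dos above (normalizing inconsistent metadata, then exporting smaller TMX files per metadata value) can be sketched in a short script. This is a minimal illustration, not part of Slate Desktop™: the `subject` property name and the canonical-value mapping are assumptions you would adapt to your own translation memories, and your CAT tool may do the same job more conveniently.

```python
import xml.etree.ElementTree as ET
from copy import deepcopy

# Map inconsistent metadata spellings to one canonical value.
# These keys and values are assumptions -- adjust them to your own TMX files.
CANONICAL_SUBJECT = {"IT": "Information Technology", "I.T.": "Information Technology"}

def normalize_and_split(tmx_path):
    """Normalize the 'subject' <prop> of each <tu>, then write one smaller
    TMX file per normalized subject value."""
    tree = ET.parse(tmx_path)
    root = tree.getroot()
    groups = {}  # canonical subject -> list of <tu> elements
    for tu in root.find("body").findall("tu"):
        subject = "Unspecified"
        for prop in tu.findall("prop"):
            if prop.get("type") == "subject":
                prop.text = CANONICAL_SUBJECT.get(prop.text, prop.text)
                subject = prop.text
        groups.setdefault(subject, []).append(tu)
    for subject, tus in groups.items():
        out_root = deepcopy(root)  # keep the original <header> intact
        out_body = out_root.find("body")
        for old in list(out_body):
            out_body.remove(old)
        out_body.extend(deepcopy(tu) for tu in tus)
        out_name = subject.replace(" ", "_") + ".tmx"
        ET.ElementTree(out_root).write(out_name, encoding="utf-8", xml_declaration=True)
```

Because each output file keeps the original header and only a subset of translation units, the smaller files remain valid TMX that you can label, re-mix, and prioritize during curation.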
This is not an exhaustive list of dos and don’ts. More information about working with translation memories is in the chapter The Training Corpus.
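The over-processing caveat above (normalize only the target half, leave the source half’s natural variation intact) can be sketched as follows. This is a minimal sketch, assuming a TMX 1.4 layout with `xml:lang` attributes on each `<tuv>`; the target language code (`fr`) and the direction of normalization (curly quotes to flat) are illustrative choices, not Slate Desktop™ behavior.

```python
import xml.etree.ElementTree as ET

# TMX 1.4 marks each <tuv> with xml:lang; older versions used a plain lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def normalize_target_quotes(tmx_path, target_lang="fr"):
    """Replace curly double quotes with flat ones in target segments only,
    leaving the source half of each translation unit untouched."""
    tree = ET.parse(tmx_path)
    for tuv in tree.getroot().iter("tuv"):
        lang = tuv.get(XML_LANG) or tuv.get("lang")
        if lang and lang.lower().startswith(target_lang):
            seg = tuv.find("seg")
            if seg is not None and seg.text:
                seg.text = seg.text.replace("\u201c", '"').replace("\u201d", '"')
    tree.write(tmx_path, encoding="utf-8", xml_declaration=True)
```

Keyed on language rather than position, the function skips source-language segments entirely, so the engine still sees both flat and curly quotes on the source side during training.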