Machine Translation Customization vs. Machine Translation Training

When to use each approach to enhance Machine Translation output

Companies are turning to Machine Translation (MT) more than ever, and we expect adoption to keep growing. You can attribute this trend to the technology’s increasingly predictable results and to intense market pressure to produce more content quickly in many languages within the same or an even smaller budget. MT delivers translations with a speed and cost efficiency that human translators cannot match, but companies must also address quality issues. To succeed in increasingly digital markets, they must provide personalized multilingual content that is domain-specific, hits a specific tone, and maintains a consistent brand voice across all channels.

How can you get the most out of your MT initiatives to better achieve those goals? There are two methods to bolster Machine Translation’s effectiveness: Machine Translation customization and Machine Translation training. While each approach can improve MT output quality and reduce the need for post-editing, companies cannot use MT customization and MT training interchangeably.

Read on to learn how the methods work, their differences, and how to select the right approach as dictated by your use case.

Why Can’t a Company Solely Rely on Generic MT?

Generic, untrained Machine Translation engines, such as Google NMT, Bing NMT, Amazon, DeepL, or Yandex, typically deliver the desired results for general, straightforward content. But their output can also fall short.

Why? A generic engine often struggles with highly specialized content, for example in the life sciences or legal industries, and with the terminology of those domains. It cannot reliably choose the correct sense of a word that has more than one meaning. And it cannot preserve your unique brand voice or determine whether formal or informal language will best connect with your audience.

MT customization and MT training address these deficits to achieve better translation output when you have specific requirements the generic engines won’t meet.

What Is MT Customization?

MT customization is an adaptation of a pre-existing Machine Translation engine with a translation glossary and Do Not Translate (DNT) list to improve the accuracy of machine-generated translations. (A translation glossary is a collection of terms that are important to a company, together with their translations. A DNT list is a collection of terms a company does not want to translate.)

To customize an engine, you upload a list of these source terms with their translations before the engine executes its work. The list instructs the MT engine how to translate the terms or intentionally prevents their translation. This intervention improves the engine’s suggestions and enables the company to maintain its brand name, adhere to terminology, and achieve regional variations. Superior translations reduce the need for post-editing.

MT customization is generally easier to execute than MT training, though there are some caveats when implementing this approach. While uploading terms into a Machine Translation system is a straightforward process, it can be challenging to select the correct terms. The success of MT customization is highly dependent on the MT expert’s skill level and ability to manage input and output normalization rules, DNT lists, and glossaries, which all work to improve output. Inexperienced authors can inadvertently cause the MT to make poor suggestions and negatively impact overall quality.

What Is MT Training?

MT training is a process that involves building and training an MT engine by using extensive bilingual data from corpora and Translation Memories (previously translated content) to improve the accuracy of machine-generated translations.

It works by feeding the generic MT engine with company-specific bilingual corpora. The engine accepts inputs via various exports, frequently in a Translation Memory (TM) format. In addition to providing the previously approved translation, the Translation Memory delivers valuable metadata, such as when the sentence was translated, by whom, and whether it is an exact match or a less precise fuzzy match. This data enables the engine to learn what a company expects in the translation. Instead of making a generic translation suggestion based on what it believes the source should be translated to, it generates a customized output based on the corpora.

MT training enables a company to fine-tune output to achieve a specific brand voice or style because the trained engine produces more consistent translations. For example, you can override the formal tone that generic MT engines produce by default and achieve an informal tone instead. As with MT customization, a company will achieve desired results with less post-editing since the engine is more apt to generate accurate translations with fewer errors.

During MT training, a company provides the engine with as much knowledge as possible; high-quality segments will yield better-quality output. Successful MT training requires a minimum of 15K unique, high-quality bilingual segments that are free from inconsistencies and duplicated source-translation pairs. If a company does not meet these minimum requirements, the training will likely have little or no effect on the output.
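The data requirements above can be checked before any training is commissioned. The following is a minimal sketch in Python, not a description of any vendor's pipeline. It assumes the Translation Memory has already been exported as a list of (source, target) string pairs. The function names are hypothetical, and dropping every source sentence that has conflicting translations is one possible policy; a reviewer could instead pick the preferred translation.

```python
from collections import defaultdict

# Minimum number of unique segments quoted in the article.
MIN_UNIQUE_SEGMENTS = 15_000


def clean_segments(pairs):
    """Return unique (source, target) pairs with inconsistent sources removed.

    Assumption: a source sentence with more than one distinct translation
    is treated as inconsistent and dropped entirely.
    """
    targets_by_source = defaultdict(set)
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if source and target:  # skip empty segments
            targets_by_source[source].add(target)
    # Using a set per source removes exact duplicates automatically.
    return [
        (source, next(iter(targets)))
        for source, targets in targets_by_source.items()
        if len(targets) == 1
    ]


def ready_for_training(pairs, minimum=MIN_UNIQUE_SEGMENTS):
    """Check whether enough clean, unique segments remain after cleaning."""
    return len(clean_segments(pairs)) >= minimum
```

Running a check like this first shows whether the corpus clears the threshold, or whether customization is the more realistic option.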

What Is the Difference Between MT Customization and MT Training?

While the two approaches work to enhance MT output and reduce post-editing, the similarities stop there. They are not interchangeable.

The methods differ in the following way: MT customization tailors a pre-existing MT engine with glossaries and Do Not Translate (DNT) lists, while MT training builds and trains the engine from scratch through lots of bilingual data from corpora and Translation Memories.

Customization is more versatile than MT training and will generate suggestions that meet most companies’ requirements. It carries a one-off cost for updating the profile that goes into the MT engine, plus some additional costs to maintain a glossary over time.

MT training is most appropriate for sophisticated companies with highly specialized content and complex use cases. When implementing MT training, there are costs associated with the first training and potential costs for additional training, which may be considered over time if the MT performance monitoring indicates room for improvement.

When Should My Company Consider Using MT Training vs. MT Customization?

Does your company need to translate scientific material or highly technical manuals? Do you need to preserve your unique brand voice? The answers to these questions can dictate whether it is best to use MT customization or MT training.

When to use MT customization

There are two important use cases for MT customization. Use it when you need to achieve the following:

  • Accurate translations of terminology
  • Regional variations, such as English (United States) vs. English (United Kingdom), when you have insufficient data for training

MT customization is a good choice for technological and detail-oriented content since it is critical to translate terminology correctly for this type of content. MT customization is the preferred approach when you lack enough data for MT training to be effective.

When to use MT training

There are two important use cases for MT training. Use it when you need to achieve the following:

  • A specific brand voice, tone, or style while reducing the need for post-editing
  • Regional variations of a target language, such as French (Switzerland) vs. French (France), when you have sufficient data for training

MT training is a good choice when translating marketing and creative content since specific brand voice, tone, and style are essential elements of this type of content. However, be sure you possess enough data to train the engines successfully.

A hybrid approach

At times, a hybrid approach produces the best results. For instance, MT may generate better suggestions when companies augment MT training with some customization.

Lionbridge enables its customers to implement a hybrid approach with ease. Customers can customize their MT via Lionbridge’s enterprise MT solution, Smart MT™ Portal, while at the same time opting to buy professional training services from Lionbridge’s skilled teams. When working with these teams, companies typically approach MT more holistically and often use a combination of MT training and MT customization for the best output. Testing helps them understand what is yielding the best results and drives a tailored MT approach.

MT Customization vs. MT Training: Which Strategy Is Better?

Selecting the best approach to enhance MT output depends on your situation. As you explore options, it may be tempting to consider MT training as the first and only method to get the most out of your MT. Or you may be intrigued by the hype around continuous training. Here are some things to keep in mind as you investigate your options.

Avoid pitfall #1: The offering of MT training as the sole solution

MT training can be a highly effective tool to achieve improved MT output, but only when it addresses identified and targeted concerns.

With the increased use of MT, many providers make MT training their go-to solution to try to provide value to their customers. However, this approach can backfire in some instances. Some companies that have solely used training with hopes of better MT output have subsequently sought Lionbridge’s services, expressing disappointment with the training after conducting a cost-benefit analysis. They were not impressed with the engine’s generated suggestions and sought a more cost-effective solution. Why were they dissatisfied? Simply put, there were better approaches based on their specific circumstances.

Innovative MT providers, like Lionbridge, use MT training when appropriate but heavily rely on customization to achieve desired MT results at a lower cost than MT training.

Avoid pitfall #2: The hype around continuous training during MT training

As you investigate MT solutions, you may find providers promoting the concept of continuously trained engines after individual projects are completed. Be wary of such claims. Continuous training is only possible if you deal with bespoke engines that require constant updating.

We want to underscore that MT training will only be successful if an individual project has at least 15K unique segments to train the engine. When companies do not have enough data, they may instead use project content to update customization features, a practice that is often loosely called “training.”

The bottom line

Customization is a more versatile tool than MT training. It will generate MT suggestions that will meet most companies’ requirements. With customization, you can sufficiently improve MT suggestions to maintain your brand name and adhere to terminology, thus reducing a post-editor’s work to verify these items. The one-time cost to update the profile that goes to the MT engine, together with some ongoing costs to maintain a glossary over time, is typically lower than the costs associated with MT training.

What Are the Best Practices of MT Customization?

When implementing MT customization, be sure to follow best practices.

Input and output normalization rules

Put a library of input and output normalization rules in place for the most-used languages to control the input to MT and enhance its output. These rules will enable you to meet your specific requirements.

For instance, an output normalization rule may replace double quotes [“...”] with les guillemets [« … »] in French translations, because French-speaking readers expect to see les guillemets instead of double quotes. Companies may apply input and output normalization rules to enable similar modifications that address regional variants of a language, such as French (Belgium), French (Canada), French (Africa), and so forth.
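The guillemet rule described above could be sketched as a small post-processing step applied to the text an engine returns. This is an illustrative Python example only. The function name is hypothetical, and the non-breaking spaces inside the guillemets follow a common French typographic convention that your style guide may or may not require.

```python
import re

# Matches a span wrapped in straight or curly double quotes.
_DOUBLE_QUOTED = re.compile(r'["“]([^"“”]*)["”]')


def french_guillemets(text):
    """Replace double-quoted spans with « ... » in already-translated French text.

    Assumption: a non-breaking space (U+00A0) is placed inside each guillemet.
    """
    return _DOUBLE_QUOTED.sub(
        lambda m: "«\u00a0" + m.group(1).strip() + "\u00a0»", text
    )


if __name__ == "__main__":
    print(french_guillemets('Il a dit "bonjour" hier.'))
```

A library of such rules, one set per language variant, can be run in sequence on every MT suggestion.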

Do Not Translate lists and rules

Create a list of terms you do not want to be translated and an input normalization rule that replaces any identified Do Not Translate (DNT) term with a token before the text goes to the engine. This action makes the term invisible to the engine and prevents it from being translated. After the translation has been processed and the MT suggestion is returned, apply an output normalization rule that replaces each token with the original DNT term.
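The token round trip described above might look like the following Python sketch. It is an illustration under stated assumptions, not Lionbridge's implementation: the function names and the `__DNT0__` token format are hypothetical, and a real engine may need a token format that it is known to pass through unchanged.

```python
import re


def mask_dnt(text, dnt_terms):
    """Replace each DNT term with a placeholder token before translation.

    Returns the masked text and a token-to-term mapping for later restoration.
    """
    mapping = {}
    # Longest terms first so a longer term wins over a shorter one it contains.
    ordered = sorted(set(dnt_terms), key=len, reverse=True)
    if not ordered:
        return text, mapping
    alternatives = "|".join(re.escape(term) for term in ordered)
    # Lookarounds avoid matching a term inside a longer word.
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")

    def _swap(match):
        token = f"__DNT{len(mapping)}__"  # hypothetical token format
        mapping[token] = match.group(0)
        return token

    return pattern.sub(_swap, text), mapping


def unmask_dnt(text, mapping):
    """Put the original DNT terms back into the translated text."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text
```

In use, `mask_dnt` runs on the source text, the masked text goes to the engine, and `unmask_dnt` runs on the returned suggestion with the same mapping.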

Glossary preparation

Prepare your glossary carefully to promote accurate, consistent translations. Consider the key factors shown in Table 1 when deciding whether to include a term in your glossary.

General Guidelines When Compiling a Glossary

| Consideration | What to ask | Should the term be included in the glossary?* |
|---|---|---|
| Frequency | How often does the term appear in the source text? | If the term occurs infrequently, don’t include it. |
| Ambiguity | Does the term have multiple meanings, or can it easily be confused with other words? | If the term is ambiguous, include it. (Note: Be sure alternate meanings of the term rarely appear in the source text.) |
| Specialized terminology | Is the term specific to a particular domain or subject area? | If yes, include it. |
| Consistency | Has the term been translated consistently in the past? | If yes, don’t include it. |
| Importance | How important is the term to the text’s overall meaning? | If it is central to the meaning of the text, include it. |
| Complexity | Is the term complex, and will it be difficult for the Machine Translation system to translate it accurately? | If yes, include it. |

Table 1. Factors to consider when creating a glossary.

*There may be exceptions to these general guidelines.

Do’s and don’ts

We also recommend the following do’s and don’ts during glossary creation:

  • Don’t include generic terms — such as single words, verbs, and adjectives — that don’t work well with MT and may negatively impact general quality, sentence construction, agreement, and word order
  • Don’t split long terms
  • Don’t include conflicting terms
  • Don’t include duplicate entries
  • Do use only one entry per source term
  • Do use multiword expressions
  • Do use specific product names
  • Do use DNT terms
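Several of these checks can be automated before a glossary is uploaded. The sketch below, in Python, flags duplicate entries, conflicting entries, and single-word terms. It assumes the glossary is a list of (source, target) pairs, and the function name and issue labels are hypothetical. Single-word hits are only prompts for review, since a specific product name or DNT term may legitimately be one word. Comparing source terms case-insensitively is also an assumption.

```python
from collections import defaultdict


def lint_glossary(entries):
    """Return (issue, source_term) tuples for a list of (source, target) entries."""
    targets = defaultdict(list)
    for source, target in entries:
        # Assumption: source terms are compared case-insensitively.
        targets[source.strip().lower()].append(target.strip())

    issues = []
    for source, found in targets.items():
        if len(set(found)) > 1:
            issues.append(("conflict", source))      # same term, different targets
        elif len(found) > 1:
            issues.append(("duplicate", source))     # same entry listed twice
        if len(source.split()) == 1:
            issues.append(("single-word", source))   # review: may be too generic
    return issues
```

A glossary that returns an empty list has passed these checks, but the judgment calls in the list above still need a human reviewer.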

How Does Lionbridge Approach MT Customization and MT Training?

Lionbridge’s Smart MT Portal makes it easy for our customers to implement MT customization, and our technology allows customization to work across multiple MT engines simultaneously. You compile your MT glossaries and DNT lists and upload these terms; they are then used for every MT engine. The technology enables you to avoid engine lock-in and change engines anytime for the best results.

Additionally, it’s easy to supplement our MT technology with relevant services by our MT experts. When engaged, we help companies identify the most effective MT strategy and how to execute that strategy best.

Whether you are just beginning to explore MT use, want to improve existing MT efforts through customization, or find that MT training has become a viable approach because your content volume has grown, we have a solution to meet your needs.

How Do Machine Translation Training and Machine Translation Customization Compare With One Another?

Compare MT training and MT customization at a glance in Table 2 to see which method is appropriate for your content.

Machine Translation Customization vs. Machine Translation Training

| | MT Customization | MT Training |
|---|---|---|
| What it is and how it works | An adaptation of a pre-existing Machine Translation engine with a glossary and Do Not Translate (DNT) list to improve the accuracy of machine-generated translations | The building and training of an MT engine by using extensive bilingual data from corpora and Translation Memories (TMs) to improve the accuracy of machine-generated translations |
| What it does | Improves MT’s suggestions for more accurate output and reduces the need for post-editing | Improves MT’s suggestions for more accurate output and reduces the need for post-editing |
| Specific benefits | Enables companies to adhere to their brand name and terminology and achieve regional variations | Enables companies to attain a specific brand voice, tone, and style and achieve regional variations |
| The risks of using it | The MT could make poor suggestions and negatively impact overall quality when executed improperly | MT training may fail to impact output if there is not enough quality data to train the engine; the MT could generate poor suggestions and negatively impact overall quality if inexperienced authors overuse terminology |
| When to use it | Ideal for technological and detail-oriented content and any content that requires: (1) accurate translations of terminology; (2) regional variation, but you lack sufficient data for MT training | Ideal for highly specialized content, marketing and creative content, and any content that requires: (1) a specific brand voice, tone, or style; (2) regional variation, and you have enough data for MT training |
| Success factors | An experienced MT expert who can successfully manage input and output normalization rules, glossaries, and DNT | A minimum of 15K unique segments to adequately train the engine |
| Cost considerations | There is a one-time cost to update the profile that goes into the MT engine and some ongoing costs to maintain a glossary over time; costs are relatively inexpensive when factoring in the potential benefits and are typically lower than MT training costs | There are costs associated with the first training and potential costs for additional training, which may be considered over time if the MT performance monitoring indicates room for improvement; MT training can be worth the investment in certain cases when factoring in the potential benefits |

Table 2. A comparison between MT customization and MT training

Get in touch

If you’d like to further explore how we can help you fully leverage Machine Translation, contact us today.

Thomas McCarthy with Janette Mandell