

MT Engine Analysis

Lionbridge Machine Translation experts examine top MT engine performance and share insights into the latest industry trends. 

How seriously are big tech companies taking Machine Translation (MT)? What are they doing to try to break away from the pack? Which engines perform best in any given month or any given language? These are some of the questions Lionbridge MT specialists set out to answer each month. Arm yourself with knowledge to make wise MT investments.

Executive summaries for each month:

January 2023 — Translation quality comparison between ChatGPT and the major MT engines

November 2022 — Microsoft MT improvement

October 2022 — MT and language formality

September 2022 — Using terminology for enhanced MT quality

August 2022 — Overcoming catastrophic errors during MT

July 2022 — Language ranking for MT

June 2022 — Accurately analyzing MT quality

May 2022 — Amazon and Yandex performance in May

April 2022 — Yandex performance in April

March 2022 — Custom MT comparative evaluations

February 2022 — The future of Neural Machine Translation (NMT) 

January 2022 — MT engine performance in January

December 2021 — Lionbridge adds Yandex MT to the MT Quality Tracker comparative check

November 2021 — Bing Translator makes improvements

October 2021 — How Amazon’s MT engine is progressing

September 2021 — Amazon makes improvements to MT quality

August 2021 — Top tech companies and their MT engine development

January 2023

Would Large Language Models (LLMs) be a good alternative to a Neural Machine Translation (NMT) paradigm for Machine Translation (MT)? To find out, we compared the translation performance of ChatGPT, OpenAI’s latest version of its GPT-3 family of LLMs, to the five major MT engines we use in our MT Quality Tracking.

As expected, specialized NMT engines translate better than ChatGPT. But surprisingly, ChatGPT does a respectable job. As shown in Figure 1, ChatGPT performed almost as well as the specialized engines.

We calculated the quality level based on the inverse edit distance using multiple references for the English-to-Spanish language pair. The edit distance measures the number of edits a human must make to the MT output for the resulting translation to be as good as a human translation. For our calculation, we compared the raw MT output against 10 different human translations — multiple references — instead of just one human translation. The inverse edit distance means the higher the resulting number, the better the quality.
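The metric described above can be sketched in a few lines. This is an illustrative, word-level implementation; the tokenization and the normalization choice (dividing by the length of the longer segment) are our assumptions, not Lionbridge's published formula.

```python
def edit_distance(a: str, b: str) -> int:
    """Word-level Levenshtein distance between two segments."""
    s, t = a.split(), b.split()
    prev = list(range(len(t) + 1))
    for i, ws in enumerate(s, 1):
        curr = [i]
        for j, wt in enumerate(t, 1):
            cost = 0 if ws == wt else 1
            curr.append(min(prev[j] + 1,          # delete ws
                            curr[j - 1] + 1,      # insert wt
                            prev[j - 1] + cost))  # substitute ws -> wt
        prev = curr
    return prev[-1]

def inverse_edit_distance(mt_output: str, references: list[str]) -> float:
    """Score in [0, 1]: higher is better. With multiple references,
    the MT output is scored against the closest human translation."""
    best = min(
        edit_distance(mt_output, ref)
        / max(len(mt_output.split()), len(ref.split()))
        for ref in references
    )
    return 1.0 - best
```

Scoring against 10 references instead of one simply widens the set over which the minimum is taken, so a valid alternative wording is not penalized for differing from a single "gold" translation.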

Figure 1. Comparison of automated translation quality between ChatGPT and the major Machine Translation engines based on the inverse edit distance using multiple references for the English-to-Spanish language pair.

These results are remarkable because the generic model has been trained to do Natural Language Processing (NLP) tasks and has not been specifically trained to execute translations. ChatGPT’s performance is similar to the quality level MT engines produced two or three years ago.

Given the evolution of LLMs — based on the public’s attention and the significant investments tech companies are making in this technology — we may soon see whether ChatGPT overtakes MT engines or whether MT will start adopting a new LLM paradigm. MT may use LLMs as a base but then fine-tune the technology specifically for Machine Translation. This would be similar to what OpenAI and other LLM companies are doing to improve their generic models for specific use cases, such as making it possible for the machines to communicate with humans in a conversational manner. Specialization improves accuracy on the tasks being performed.

One great thing about these generic Large Language Models is that they can do many different things and offer outstanding quality in most of their tasks. For example, DeepMind’s GATO, another general intelligence model, has been tested in more than 600 tasks, with State-of-the-Art (SOTA) results in 400 of them.

Two development lines will continue to exist — generic models, such as GPT, Megatron, and GATO, and specialized models for specific purposes based on those generic models. The generic models are important for advancing Artificial General Intelligence (AGI) and possibly advancing even more impressive developments in the longer term. Specialized models will have practical uses in the short run for specific areas. One of the remarkable things about LLMs is that both lines can progress and work in parallel.

We are intrigued by what the future holds. We will continue to evaluate LLMs and publish the results so you can stay up to date on this exciting evolution. Read our blogs to delve deeper into ChatGPT’s translation performance and to learn more about ChatGPT and localization and why it’s probably a game-changer.


    —Rafa Moral, Lionbridge Vice President, Innovation


November 2022

We’ve seen a nice overall improvement in Microsoft’s Machine Translation (MT) results during October 11-November 1. With this recent quality increase by Bing Translator, the main MT engines are producing very similar results. As such, they face a tight battle for the top leadership position.

The major MT engines have not shown interesting improvements for months. Let’s hope this development from Microsoft breaks that trend and is the start of forthcoming progress by these engines. 

We went beyond our usual measure of single-reference translations and confirmed the Microsoft improvement results with a second tracking that encompassed multiple references. In this MT evaluation, we used 10 reference translations completed by humans — the gold standard — instead of just one translation to get a more precise Edit Distance metric that considers multiple possible correct translations in the final results.

As we reach the end of the year, we note that 2022 has had very flat MT results. We observed little change; this Microsoft Bing MT development may be the most notable advancement of the whole year. As commented on earlier in the year, the current MT paradigm may be reaching a plateau. We look forward to seeing what 2023 holds for Machine Translation.


    —Rafa Moral, Lionbridge Vice President, Innovation


October 2022

This month, we want to bring your attention to language formality and how difficult — but not impossible — it is to get it right when using Machine Translation (MT).

Machine Translation (MT) engines can produce incorrect and inconsistent formality. Why? MT models typically return a single translation for each input segment. When the input segment is ambiguous, the model must choose a translation among several valid options, regardless of the target audience. Letting the model choose between different valid options may result in inconsistent translations or translations that have an incorrect level of formality.

It is especially challenging to get the correct output when the source language has fewer formality levels than the target language. For instance, languages like French have well-defined formal modes — tu vs. vous — while English does not.

While most MT systems do not support language formality or gender parameters, we are seeing progress. At present, DeepL (API) and Amazon (console and SDK) offer features that control formality. Lionbridge’s Smairt MT™, an enterprise-grade Machine Translation solution, allows linguistic rules to be applied to the target text to produce Machine Translations with the desired style or formality.
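Target-side formality rules of the kind described here can be illustrated with a toy rewrite pass. The rule table and function below are hypothetical examples for French, not Lionbridge's actual Smairt MT rules; real rules must also handle capitalization, verb agreement, and context.

```python
import re

# Ordered informal -> formal rewrite rules for French (illustrative only).
# Multi-word patterns come first so "tu as" is not half-rewritten by "tu".
FORMAL_RULES_FR = [
    (r"\btu as\b", "vous avez"),
    (r"\btu es\b", "vous êtes"),
    (r"\btu\b", "vous"),
    (r"\bton\b", "votre"),
]

def enforce_formality(target_text: str) -> str:
    """Apply the rewrite rules in order to the MT output."""
    for pattern, replacement in FORMAL_RULES_FR:
        target_text = re.sub(pattern, replacement, target_text)
    return target_text
```

The same mechanism, run with the rule table inverted, would push the output toward the informal register instead.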

It’s critical to effectively translate your source to meet the needs of your target audiences, which includes addressing formal and informal language in your MT output. Translations that come across as “off” or — even worse — as rude can put you at risk of alienating your audiences.

Read our blog to learn more about Machine Translation and formal vs. informal language.


    —Yolanda Martin, Lionbridge MT Specialist 


September 2022

It can be advantageous to use Machine Translation (MT), but you must proceed with caution. Generic MT engines can produce erroneous translations and, from a terminological point of view, are especially prone to undesired results in specialized domains. The impact can be particularly harmful in the medical and legal fields. But there are things you can do to enhance MT output.

Using terminology can enable you to improve the quality of MT and achieve accurate, consistent translations.

It’s imperative to train customized MT systems with domain-specific bilingual texts that include specialized terminology. Still, accurate translations cannot be guaranteed when engines are trained with specialized texts if the terminology is not used consistently. Research in this area proposes to inject linguistic information into Neural Machine Translation (NMT) systems. The implementation of manual or semi-automatic annotation depends on available resources, such as glossaries, and constraints, such as time, cost, and availability of human annotators.

Lionbridge’s Smairt MT™ allows the application of linguistic rules to the source and target text, as well as the enforcement of terminology based on Do Not Translate (DNT) and glossary lists added to a specific profile. We help our customers create and maintain glossaries, which are regularly refined to include new, relevant terms and retire obsolete terminology. When glossaries are created once in Smairt MT, they can then be used for all the MT engines, saving time and money.

Using glossaries for MT projects is not as simple as it may seem. Glossaries, if used inappropriately, can negatively affect the overall quality of Machine Translation. The best way to follow terminology in MT is through MT training. The combination of trained MT engines, glossary customization, and the identification of preprocessing and post-processing rules ensures MT output contains proper terminology and is similar in style to the customer's documentation.
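A common way to implement DNT enforcement, and plausibly one ingredient of a profile-based setup like the one described, is a placeholder round trip: mask protected terms before the engine call and restore them afterward. This is a generic sketch of the approach, not Smairt MT's implementation.

```python
def protect_dnt(source: str, dnt_terms: list[str]):
    """Replace Do-Not-Translate terms with opaque placeholder tokens
    before the text is sent to the MT engine."""
    mapping = {}
    for i, term in enumerate(dnt_terms):
        if term in source:
            token = f"__DNT{i}__"
            source = source.replace(term, token)
            mapping[token] = term
    return source, mapping

def restore_dnt(target: str, mapping: dict) -> str:
    """Put the protected terms back into the MT output."""
    for token, term in mapping.items():
        target = target.replace(token, term)
    return target
```

Because the placeholders carry no linguistic content, the engine has no opportunity to translate, inflect, or drop the protected terms.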

Read our blog for more insight into using terminology to enhance MT output.


    —Yolanda Martin, Lionbridge MT Specialist 


August 2022

As companies increasingly rely on Machine Translation (MT) as a standard practice, employees will need to prevent the dissemination of catastrophic errors.

Catastrophic errors are more problematic than standard MT errors, which pertain to error typology related to linguistic features, such as spelling, grammar, or punctuation. Catastrophic errors transcend linguistics and occur when engine output dangerously deviates from the intended message. Resulting misinformation or misunderstandings have the potential to cause companies reputational, financial, or legal problems and may lead to adverse public safety or health consequences. It is essential to find ways to identify these errors and stop them from compromising your communications.

Lionbridge administers specific automated quality checks in translated texts to detect critical errors while preserving MT speed and reducing the need for human intervention.

These automated methods detect:

  • Opposite meanings between original and translated texts
  • Offensive, profane, or highly sensitive words
  • Incorrect translations of proper names of individuals and organizations that are also common words
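The three checks above can be caricatured in a few lines. The word lists and the negation heuristic below are toy assumptions for illustration only; production checks rely on much richer resources and models.

```python
PROFANITY = {"maldito"}                 # toy sensitive-word list
NEGATORS_EN = {"not", "no", "never"}    # toy English negation markers
NEGATORS_ES = {"no", "nunca", "jamás"}  # toy Spanish negation markers

def catastrophic_flags(source_en: str, target_es: str, proper_names=()):
    """Return a list of catastrophic-error flags for an EN->ES segment."""
    flags = []
    tgt_words = set(target_es.lower().split())
    # 1. Offensive, profane, or highly sensitive words in the output
    if tgt_words & PROFANITY:
        flags.append("sensitive-word")
    # 2. Crude opposite-meaning proxy: negation on one side only
    src_neg = bool(set(source_en.lower().split()) & NEGATORS_EN)
    tgt_neg = bool(tgt_words & NEGATORS_ES)
    if src_neg != tgt_neg:
        flags.append("possible-polarity-flip")
    # 3. Proper names must survive translation untouched
    for name in proper_names:
        if name in source_en and name not in target_es:
            flags.append(f"missing-name:{name}")
    return flags
```

Even heuristics this crude catch the most dangerous case, a dropped negation in a medical instruction, which is exactly the kind of error that justifies automated gating before publication.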

Companies will be better protected from catastrophic errors when computer scientists improve existing MT technology to prevent these translation errors. Until such time, we can use automated technology to identify potential issues, revise problematic sentences, and promote accuracy during the translation process.

Read our blog for a more in-depth examination of catastrophic errors during Machine Translation.


    —Luis Javier Santiago, MT Group Leader, and Rafa Moral, Lionbridge Vice President, Innovation 


July 2022

Google NMT, Bing NMT, Amazon, DeepL, Yandex — which engine is best? Last month’s data — and the current general trend — show major engines perform similarly. That’s why it’s worthwhile to consider additional factors when developing your MT strategy, such as the ease with which MT engines translate specific language pairs.

Identifying how challenging it is for engines to handle specific language pairs will help you allocate your budget when planning translation costs across languages. For instance, you’ll need to allocate more effort to achieve high-quality translations when dealing with complex language pairs. Having insight into language complexity can help support your business decisions.

Ranking languages by translatability is not a straightforward process; however, we can use different metrics for evaluation. Edit Distance, which is the number of changes a post-editor makes to ensure the final text has a human quality, can provide a sense of MT complexity and translatability (machine-translatability or m-translatability) for each language pair.

Most Romance languages, such as Portuguese, Spanish, French, and Italian, require fewer changes to reach high-quality levels when translated from English. We identified these target languages as the easiest for machines to handle, and they took the first four spots in our m-translatability ranking. Hungarian and Finnish — two Uralic languages — are more complex languages; they placed last in our ranking, taking the 27th and 28th spots. Estonian, another language in the same family, is also among the more complex languages. These results — based on millions of sentences processed by Lionbridge — underscore the importance of language families in MT results.

While cross-language comparison has limitations, the ranking can provide some interesting insights to better manage multilingual projects. Read our blog to see the Lionbridge language ranking table in its entirety.


    —Rafa Moral, Lionbridge Vice President, Innovation 


June 2022

In June, we observed a tiny improvement in Russian translations by Yandex’s MT engine and a tiny dip in translation results by Microsoft Bing’s MT engine. Are these noteworthy changes or insignificant, spurious outcomes? To find out, we analyzed the results differently.

Instead of using a single gold standard that measures the distance from the MT translation to one “perfect” human translation, we used multiple reference translations. We compared each translation made by machines to 10 translations by professional translators. When we took this approach, the small fluctuations in translation quality by Yandex and Microsoft Bing in June disappeared. As such, we can conclude that there were no changes to the MT translation quality. June results are flat.

Sometimes data and its graphical representations may be misleading. This often happens when there are small deltas among different measurements. It’s good practice to use more than one approach to evaluate data to interpret results accurately.

We project little movement in MT engine quality in the coming months. We will use this section to provide analysis and general MT observations. Next month, look for comparisons among MT language pairs. We’ll explore whether it is possible to use data to classify languages and language families by MT complexity and determine whether machines can translate some language pairs easier than others.


    —Rafa Moral, Lionbridge Vice President, Innovation 


May 2022

It has primarily been another static month for the MT engines.

We’ve noticed Amazon has made an incremental improvement in the way its engine handles the English-Spanish pair. It is now the leading engine in this language pair. Amazon also made minor strides in the other languages, but smaller than its improvements to the English-Spanish pair. We speculate these advancements stem from some generic setting changes alongside targeted work on the English-Spanish pair. Enhancements appear to affect the treatment of some special characters and strings with measurement expressions.

For the second month in a row, Yandex made minor improvements. Interestingly, these improvements also affect Spanish.

As we’ve previously noted, there have been no significant changes. All the engines perform similarly. In the coming months, we will analyze some specific MT areas and provide general observations. Of course, we will also track major developments.


    —Rafa Moral, Lionbridge Vice President, Innovation 


April 2022

After several months of flat MT engine performance, Yandex has made some progress, particularly with its German engine.

In one detailed analysis, we saw advancements in Yandex engines' handling of sentences with punctuation characters — such as question marks, exclamation points, parentheses, and slashes — and units of measurement. These developments may result from some fine-tuning to the MT settings rather than improvements in the models. However, we also saw improvement in our tracking of rare terms, so Yandex’s progress may also be due to some refinements of the models or more data training.

Around this time last year, several MT engines showed some improvements that we found interesting. Is there a time pattern involved with these advancements? Will we see something this year like what we observed in 2021? We’re tracking the MT performance of these engines, and we'll report our findings in the next month or so.

Generally, there is increased interest in MT engine evaluation. Today, nearly everyone agrees that MT is a mature technology. People recognize the technology’s usefulness for almost any translation case — with or without human intervention, including hybrid approaches. But MT users are still struggling to find the right way to evaluate, measure, and improve MT results.


    —Rafa Moral, Lionbridge Vice President, Innovation 


March 2022

If you’ve been following these pages, you’re familiar with our generic MT comparative evaluations. Each month, we identify which MT engines are performing best for given language pairs and track engine improvements. In March, the performance of the different MT engines was flat. It’s a trend we’ve been noticing for some time already. As we commented last month, it may indicate that a new MT paradigm is needed.

While we share generic results, companies are increasingly pursuing custom MT comparative evaluations. Unlike the generic version, these evaluations take a company’s specific needs into consideration when determining the most advantageous MT engines.

When a company wants to start using MT or improve the way it currently uses MT, it is critical to identify which MT engines will work best. When we execute custom evaluations, we take a similar approach to the one demonstrated on this page, but we make recommendations based on a company’s content type and language pair requirements.

While custom MT comparative evaluations have been available for years, there’s greater demand for them. We attribute this trend to the important role MT plays in helping companies succeed in a digital marketplace.


    —Rafa Moral, Lionbridge Vice President, Innovation 


February 2022

Google’s MT engine showed a tiny improvement during January and February of 2022, while the other engines we track remained stagnant. These observations may lead us to start asking some pointed questions. Is the Neural Machine Translation (NMT) paradigm reaching a plateau? Is a new paradigm shift needed given the engines’ inability to make significant strides? We observed similar trends when NMT replaced Statistical MT.

At the end of the Statistical MT era, there was virtually no change to MT quality output. In addition, the quality output of different MT engines converged. We see similar trends. While NMT may not be imminently replaced, if we believe in exponential growth and accelerating returns theories — and consider Rule-based MT’s 30-year run and Statistical MT’s decade-long prominence and note NMT is now in its sixth year — a new paradigm shift may not be so far away.


    —Rafa Moral, Lionbridge Vice President, Innovation 


January 2022

During January, the main Machine Translation (MT) engines did not show significant changes in their performance. 

Google demonstrated small, incremental improvements across some languages and domains. The performance of most of the other engines has been flat. Microsoft had improvements over the last few months, but performance plateaued in January. Overall, the quality of Google Translate continues to lead in general-purpose MT technology.  

In December, we added a fifth MT engine to our tracker. By monitoring Yandex, we can better analyze the MT quality of the Russian language.


    —Rafa Moral, Lionbridge Vice President, Innovation 


December 2021

In December, we added Yandex MT to our MT Quality Tracker comparative check.

According to our test sets, so far, Yandex:

  • Performs better than MS Bing, similarly to Google, and not as well as Amazon and DeepL for Russian.
  • Performs similarly to Amazon and MS Bing for German.
  • Does not perform as well as the main MT engines for the other language pairs we track.
  • Works well when addressing sentences that are longer than 50 words.

In other observations, MS Bing improved its output noticeably during the last months of 2021. In particular, translations into Chinese improved. Amazon has also made some strides. As we start the new year, Google is taking the baton and improving its output. Specifically, translations into Spanish, Russian, and German have improved. Yandex’s line has been flat during the five weeks we have been tracking it.


    —Rafa Moral, Lionbridge Vice President, Innovation 


November 2021

After a few weeks of experimentation and fluctuation in overall performance, it’s clear that Microsoft NLP Engineers are on to something. Bing Translator has shown overall improvements during the past few weeks and improvements for Chinese in particular, making this MT engine last month’s big winner. Bing Translator has closed some gaps in most areas, even surpassing the performance of some of its competitors. Bing Translator remains one of the most trainable engines, and its enhancements position it to be a good choice when building customized models that are specific to your content.


    —Jordi Macias, Lionbridge Vice President, Language Excellence


October 2021

Amazon’s Machine Translation (MT) engines continued to evolve positively during the month of October, building upon what they started doing about a month ago. These continued enhancements are the second set of incremental improvements we’ve seen in the last few months.  

As a reminder, here are some of the areas where Amazon’s MT engines have continued to evolve over the past couple of months:

  • They are putting out a more informal style than before
  • They treat units of measurement differently
  • Both imperial and metric measurements are now consistently put out
  • Imperial measurements now appear before metric measurements
  • Numbers that correspond to measurements are now translated correctly
  • "Euro" is now spelled out and replaces the currency symbol €


    —Jordi Macias, Lionbridge Vice President, Language Excellence


September 2021

September has proven to be a good month for Amazon’s Machine Translation (MT) engines. First, the company improved its MT quality output for the German and Russian languages. Then, we saw spikes for the Spanish and Chinese language pairs. These enhancements are the second set of incremental improvements we’ve seen in the last few months.

Here are some more changes to the Amazon MT engines:

  • They are putting out a more informal style than before 
  • They treat units of measurement differently 
  • Both imperial and metric measurements are now consistently put out 
  • Imperial measurements now appear before metric measurements 
  • Numbers that correspond to measurements are now translated correctly 
  • "Euro" is now spelled out and replaces the currency symbol € 


    —Yolanda Martin, Lionbridge MT Specialist 


August 2021

All the big technology companies have developed their own MT engines, including Microsoft, Google, Amazon, Facebook, and now Apple. Many other big players in markets outside of the U.S. are also competing in the space. Clearly, big tech companies believe that MT and Natural Language Processing (NLP) are must-have tools for today’s interconnected, global world.

Watch this space as Lionbridge follows the competition. We’ll identify the best MT engine options based on a company’s specific needs, taking its desired language pair and content type into account.

We expect the MT/NLP race to accelerate with so many top tech companies investing in this space. There’s no doubt that Apple—with its attention to detail and quality—will drive other companies to step up their game.


    —Rafa Moral, Lionbridge Vice President, Innovation