Language Ranking Based on Machine Translatability for More Effective MT

Benefit by knowing which languages are easier for machines to translate before deploying Machine Translation

As companies face increasing pressure to translate more content faster, Machine Translation (MT) has become an essential part of the solution. It is worthwhile to compare the performance of the major MT engines (Google NMT, Bing NMT, Amazon, DeepL, and Yandex) to determine which one best meets your needs. In fact, we analyze MT engine performance monthly via our MT Tracker, which is the longest-standing measurement of major MT engines. But it’s important to evaluate further, especially since our analysis indicates that the engines currently perform similarly.

To get the most out of MT, also consider examining the ease with which MT engines translate specific language pairs, otherwise known as the machine translatability or m-translatability of languages. To help you compare languages, we’ve ranked the m-translatability of the top 28 target languages from English in Table 1.

Why Examine the M-translatability of Language Pairs?

Identifying the m-translatability of language pairs will help you allocate your budget when planning translation costs across languages, as you will have a better idea of which language pairs will require more effort to translate. 

Having insight into language complexity can help support your business decisions and help you answer the following questions:

  • Should a larger share of the budget be used to post-edit language pairs that are more complex?
  • Will light post-editing, or focused post-editing that targets only critical areas of the content, be sufficient for some languages when the budget is tight? For which languages should I use these methods?
  • Should my company add language ranking to business and cultural factors when considering how to best allocate its budget, particularly for low-budget projects? If a culture is accepting of a lower quality level, should my company translate into a language that has a low m-translatability ranking?

How Is M-translatability Calculated?

Figuring out the m-translatability of languages is not a straightforward process. The challenges differ among languages, and what counts as good performance for one language may be considered inadequate for another. Still, some metrics can be used for evaluation.

For example, edit distance, the number of changes a post-editor makes to bring the final text up to human quality, can provide a sense of language complexity and help us determine the m-translatability of each language pair, even though this metric is not typically used for cross-language comparisons.
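
As a rough illustration of the edit-distance idea, the sketch below computes a word-level Levenshtein distance between raw MT output and its post-edited version, then divides it by the length of the longer segment. The article does not say how the metric is tokenized or normalized, so the function names, the whitespace tokenization, and the normalization are assumptions made for illustration only.

```python
def edit_distance(source_tokens, target_tokens):
    """Levenshtein distance between two token sequences.

    Counts the minimum number of insertions, deletions, and
    substitutions needed to turn source_tokens into target_tokens.
    """
    # prev[j] holds the distance between the tokens processed so far
    # and the first j tokens of target_tokens.
    prev = list(range(len(target_tokens) + 1))
    for i, s in enumerate(source_tokens, start=1):
        curr = [i]
        for j, t in enumerate(target_tokens, start=1):
            cost = 0 if s == t else 1
            curr.append(min(
                prev[j] + 1,         # delete a token from the MT output
                curr[j - 1] + 1,     # insert a token from the post-edit
                prev[j - 1] + cost,  # keep or substitute a token
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(mt_output, post_edited):
    """Word-level edit distance scaled to the range 0.0 to 1.0.

    Assumption: whitespace tokenization and division by the longer
    segment; a production metric may tokenize and scale differently.
    """
    mt_tokens = mt_output.split()
    pe_tokens = post_edited.split()
    longest = max(len(mt_tokens), len(pe_tokens))
    if longest == 0:
        return 0.0
    return edit_distance(mt_tokens, pe_tokens) / longest
```

A score near 0 means the post-editor changed almost nothing, while a score near 1 means most of the segment was rewritten. Note that whitespace tokenization does not suit languages written without spaces, such as Chinese, Japanese, or Thai, which is one reason cross-language comparisons need care.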

Lionbridge’s M-translatability Findings: How Languages Ranked and Why

Our m-translatability ranking of 28 target languages is based on millions of sentences Lionbridge has processed. 
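
One plausible way to turn per-sentence scores into a ranking is to average them per target language and sort the languages, lowest average first. This is only a sketch: the article does not describe the aggregation method, and `rank_by_mean_score` and the `lang_a`-style labels are hypothetical placeholders, not Lionbridge data.

```python
from statistics import mean


def rank_by_mean_score(scores_by_language):
    """Return language labels ordered from lowest to highest mean score.

    scores_by_language maps a language label to a list of per-sentence
    scores (for example, normalized edit distances). Languages with no
    scores are skipped because they have no meaningful average.
    """
    averages = {
        lang: mean(scores)
        for lang, scores in scores_by_language.items()
        if scores
    }
    # A lower mean edit effort suggests the language is easier for MT.
    return sorted(averages, key=averages.get)


if __name__ == "__main__":
    # Placeholder values for illustration only; not measured results.
    toy_scores = {"lang_a": [0.3, 0.5], "lang_b": [0.1, 0.2]}
    print(rank_by_mean_score(toy_scores))
```

Averaging is the simplest choice here; a median or a length-weighted mean would be equally reasonable assumptions.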

The findings suggest there is a correlation between complexity and language families.  

The Romance Languages

Most Romance languages, such as Portuguese, Spanish, French and Italian, require fewer changes to reach high-quality levels when translated from English. We identified these target languages as the easiest for machines to handle, and they took the first four spots in our m-translatability ranking. 

Notably, Romanian, the other Romance language on the list, placed further down the ranking, in the tenth spot. This result for the less-translated Romance language is likely due to the smaller bilingual corpus available for training MT engines and to Romanian’s grammatical complexity, which has some similarities to Latin.

Simplified Chinese

Simplified Chinese, a language very different from English, placed fifth on our list, immediately following the top four Romance languages. We attribute this high placement to frequent updates and improvements to MT for this language pair during the past five years, as we have seen in our continuous MT tracking over this period. MT companies are investing more heavily in this language pair to generate better performance because of its high business interest.

Complex Languages

Hungarian and Finnish — two Uralic languages — are more complex languages; they placed last in our ranking, taking the 27th and 28th spots. Estonian, a complex language within the same family, placed 24th on the list. 

Korean scored near the bottom of the list; it placed 25th in our ranking.

What Can We Take Away From M-translatability?

While language comparison has limitations, our ranking and the correlation between complexity and language families provide interesting insights that can help you better manage your multilingual projects.

Table 1

Language M-translatability Ranking

Rank Language (from English)
1 Portuguese
2 Spanish
3 French
4 Italian
5 Chinese (Simplified)
6 Dutch
7 Danish
8 Japanese
9 Greek
10 Romanian
11 Thai
12 Norwegian
13 German
14 Swedish
15 Turkish
16 Slovak
17 Hebrew
18 Latvian
19 Polish
20 Chinese (Traditional)
21 Lithuanian
22 Czech
23 Arabic
24 Estonian
25 Korean
26 Russian
27 Hungarian
28 Finnish

Get in touch

If you’d like to learn more about how Lionbridge can help you develop an effective MT strategy to meet your translation needs, contact us today.

Rafa Moral with Janette Mandell