Using Terminology To Improve Machine Translation

Executing top-notch terminology management can help you enhance the quality of your Machine Translation output

Last updated: October 28, 2022 11:41PM

As professionals and the general public increasingly use generic, freely available Machine Translation (MT) systems, it’s essential to be aware that these engines can produce flawed translations. Poor-quality output or catastrophic errors can seriously harm businesses. Fortunately, there are ways to improve MT quality, including effective MT terminology management.

What Challenge Does Terminology Present for Machine Translation?

Due to the complex relationships between concepts and their representations through terminology units, terminology remains one of the biggest challenges for Machine Translation.

Using free MT systems for specialized domains is especially likely to produce undesirable results from a terminological point of view. The impact can be particularly harmful in the medical and legal fields.

Training an MT system on a domain-specific corpus can prevent poor results to some degree, but generic MT systems cannot guarantee consistent or accurate terminology.

Among other factors, the quality of an MT engine's translations depends on the quality of its bilingual training corpus. An accurate translation of terminology can therefore only be expected if the corpus includes the relevant terms on both the source and the target side.

Neural Machine Translation (NMT) systems are based on probability distributions over terms, so the presence of terms in the corpus is necessary but not sufficient to ensure high-quality translations. Terms must also appear frequently enough for the decoder to produce the correct equivalents. If a particular term is too rare, it will not receive enough weight to be considered a candidate equivalent, and it may not be translated accurately.
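To make the frequency point concrete, here is a minimal Python sketch that counts how often each glossary term and its approved equivalent appear together in a parallel corpus and flags the under-represented ones. The function names, the `MIN_PAIR_COUNT` threshold, and the naive case-insensitive substring matching are assumptions for illustration; a real coverage check would tokenize and handle inflected forms.

```python
from collections import Counter

# Hypothetical threshold: term pairs seen fewer times than this are flagged.
MIN_PAIR_COUNT = 3


def term_pair_counts(corpus, glossary):
    """Count segment pairs in which a glossary source term and its approved
    target term both appear (naive case-insensitive substring matching)."""
    counts = Counter()
    for source, target in corpus:
        src, tgt = source.lower(), target.lower()
        for src_term, tgt_term in glossary.items():
            if src_term.lower() in src and tgt_term.lower() in tgt:
                counts[src_term] += 1
    return counts


def under_represented(corpus, glossary, min_count=MIN_PAIR_COUNT):
    """Return glossary source terms whose pair count falls below min_count."""
    counts = term_pair_counts(corpus, glossary)
    return sorted(term for term in glossary if counts[term] < min_count)


corpus = [
    ("El fósforo es esencial para las plantas.", "Phosphorus is essential for plants."),
    ("Encendió un fósforo.", "He lit a match."),
]
glossary = {"fósforo": "phosphorus", "nitrógeno": "nitrogen"}
print(under_represented(corpus, glossary, min_count=1))  # ['nitrógeno']
```

Note that the second segment does not count toward "fósforo", because its target uses the everyday sense (match) rather than the approved equivalent.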


How Does Generic MT Training Lead To Faulty Translations?

Generic MT systems are often trained on large corpora of varied content. As a result, the most frequent equivalent of a term may come from a different domain than the text being translated, which can cause the term to be translated inaccurately into the target language.

For example, the Spanish term fósforo can be translated as match (the object used to light a fire) or as phosphorus (a chemical element). A generic MT engine may struggle to tell which sense is intended, and the translation may contain an error.
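The following Python toy model is a deliberate simplification: an NMT decoder does not look up words in a count table. It only illustrates why a purely frequency-driven choice over a mixed corpus favors the everyday sense, and how a domain glossary overrides it. The counts and the chemistry glossary are hypothetical, not measured from any real engine.

```python
from collections import Counter

# Hypothetical alignment counts from a mixed, general-purpose corpus.
# The numbers are illustrative only.
mixed_corpus_counts = {
    "fósforo": Counter({"match": 120, "phosphorus": 45}),
}

# Hypothetical chemistry glossary that pins the domain-specific equivalent.
chemistry_glossary = {"fósforo": "phosphorus"}


def most_frequent_equivalent(term, counts):
    """Frequency-only choice: pick the equivalent seen most often overall."""
    return counts[term].most_common(1)[0][0]


def domain_equivalent(term, counts, glossary):
    """Prefer an approved glossary entry; otherwise fall back to frequency."""
    if term in glossary:
        return glossary[term]
    return most_frequent_equivalent(term, counts)


print(most_frequent_equivalent("fósforo", mixed_corpus_counts))  # match
print(domain_equivalent("fósforo", mixed_corpus_counts, chemistry_glossary))  # phosphorus
```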

The solution to this problem is to train customized MT systems with domain-specific bilingual texts that include specialized terminology.

Even then, accurate translations cannot be guaranteed if the specialized training texts do not use terminology consistently.
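One way to catch inconsistent terminology before training is to group every observed rendering of each source term and flag terms rendered in more than one way. The Python sketch below assumes the (source term, target rendering) pairs have already been extracted from the corpus, for example by a word aligner or a term-extraction tool, which is out of scope here; the function name and sample pairs are hypothetical.

```python
from collections import defaultdict


def inconsistent_terms(observations):
    """Group observed (source term, target rendering) pairs and return the
    source terms that were rendered in more than one way."""
    renderings = defaultdict(set)
    for source_term, target_term in observations:
        renderings[source_term.lower()].add(target_term.lower())
    return {
        term: sorted(targets)
        for term, targets in renderings.items()
        if len(targets) > 1
    }


# Hypothetical term pairs extracted from a specialized training corpus.
observations = [
    ("fósforo", "phosphorus"),
    ("fósforo", "phosphor"),
    ("nitrógeno", "nitrogen"),
    ("Nitrógeno", "Nitrogen"),
]
print(inconsistent_terms(observations))  # {'fósforo': ['phosphor', 'phosphorus']}
```

Case differences are ignored here, so "Nitrógeno"/"Nitrogen" is not reported; whether that is desirable depends on the project's style rules.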

What Market Solutions Exist?

Research in this area proposes injecting linguistic information into NMT systems through annotation methods.

The implementation of manual or semi-automatic annotation depends on available resources, such as glossaries, and constraints, such as time, cost, and availability of human annotators.
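As an illustration of semi-automatic annotation driven by a glossary, the Python sketch below inserts the approved target term right after each matching source term, one inline-annotation style explored in the research literature, so that a model trained on data annotated the same way can learn to copy it. The marker tokens `<term>`, `<trans>`, and `</term>` and the function name are hypothetical; real systems define their own markup or use separate annotation streams.

```python
import re


def annotate_source(sentence, glossary):
    """Inline-annotate glossary matches: after each matched source term,
    insert its approved target term between hypothetical marker tokens."""
    if not glossary:
        return sentence
    # Longest terms first, so multi-word entries win over their sub-terms.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    by_lower = {t.lower(): t for t in glossary}

    def mark(match):
        found = match.group(0)
        target = glossary[by_lower[found.lower()]]
        return f"<term> {found} <trans> {target} </term>"

    # A single combined pattern avoids re-matching text inside earlier annotations.
    return pattern.sub(mark, sentence)


glossary = {"ácido fosfórico": "phosphoric acid", "ácido": "acid"}
print(annotate_source("El ácido fosfórico es un ácido.", glossary))
# El <term> ácido fosfórico <trans> phosphoric acid </term> es un <term> ácido <trans> acid </term>.
```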

What Solution Does Lionbridge Offer?

Lionbridge’s Smart MT™ allows the application of linguistic rules to the source and target text, as well as the enforcement of terminology based on Do Not Translate (DNT) and glossary lists added to a specific profile.
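A common, generic way to enforce a DNT list is to mask the protected terms before MT and restore them afterwards. The Python sketch below illustrates that idea; it is not a description of how Smart MT works internally. The `__DNTn__` placeholder format is an assumption, and in practice the format must be one the target engine leaves intact. Matching here is case-sensitive and ignores word boundaries.

```python
import re

PLACEHOLDER = re.compile(r"__DNT\d+__")


def mask_dnt(text, dnt_terms):
    """Replace each occurrence of a DNT term with a numbered placeholder
    and return the masked text plus a placeholder -> term mapping."""
    if not dnt_terms:
        return text, {}
    # Longest terms first so a longer DNT entry is not split by a shorter one.
    terms = sorted(dnt_terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    mapping = {}

    def to_placeholder(match):
        placeholder = f"__DNT{len(mapping)}__"
        mapping[placeholder] = match.group(0)
        return placeholder

    return pattern.sub(to_placeholder, text), mapping


def unmask_dnt(text, mapping):
    """Restore the original DNT terms in the MT output."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


masked, mapping = mask_dnt("Lionbridge Smart MT supports glossaries.", ["Smart MT", "Lionbridge"])
print(masked)  # __DNT0__ __DNT1__ supports glossaries.
# Pretend the engine translated the masked text and left the placeholders intact.
print(unmask_dnt("__DNT0__ __DNT1__ admite glosarios.", mapping))
# Lionbridge Smart MT admite glosarios.
```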

We help our customers create and maintain glossaries, which are regularly refined to include new, relevant terms and retire obsolete terminology. When glossaries are created once in Smart MT, they can then be used for all the MT engines, saving time and money.

How Can You Best Use Glossaries for MT Projects?

Using glossaries for MT projects is not as simple as it may seem: glossaries used inappropriately can lower the overall quality of Machine Translation. The most reliable way to make MT follow terminology is through MT training.

The combination of trained MT engines, glossary customization, and carefully identified preprocessing and post-processing rules ensures that MT output contains the proper terminology and matches the style of the customer's documentation.


What Terminology Management Features Should You Look For in a Machine Translation Solution?

When you assess terminology management features in a Machine Translation solution, look for the offering’s ability to:

Manage glossaries
Manage Do Not Translate (DNT) lists
Manage suggested and approved translations
Bulk upload terms and sentences through glossary and Translation Memory (TM) import
Create domain-specific or product-specific MT engine profiles and automate content routing between those engines
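The last capability, routing content between engine profiles, can be sketched in Python as follows. Explicit domain metadata wins; otherwise the profile whose keywords overlap most with the text is chosen, with a generic fallback. The profile names, keyword lists, and the keyword-overlap heuristic are all hypothetical; a real solution might route on project metadata, content type, or a trained classifier instead.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineProfile:
    name: str
    domain: str
    keywords: frozenset


# Hypothetical profiles; names and keyword lists are illustrative only.
PROFILES = (
    EngineProfile("medical-en-es", "medical", frozenset({"dosage", "patient", "clinical"})),
    EngineProfile("legal-en-es", "legal", frozenset({"contract", "liability", "clause"})),
)
DEFAULT_PROFILE = EngineProfile("generic-en-es", "generic", frozenset())


def route(text, metadata=None, profiles=PROFILES, default=DEFAULT_PROFILE):
    """Pick an engine profile: explicit domain metadata wins; otherwise choose
    the profile whose keywords overlap most with the text, else the default."""
    if metadata and "domain" in metadata:
        for profile in profiles:
            if profile.domain == metadata["domain"]:
                return profile
    words = set(re.findall(r"\w+", text.lower()))
    best = max(profiles, key=lambda p: len(p.keywords & words))
    return best if best.keywords & words else default


print(route("Check the patient's dosage.").name)        # medical-en-es
print(route("Read this clause of the contract.").name)  # legal-en-es
print(route("Hello world").name)                        # generic-en-es
print(route("Hello world", {"domain": "legal"}).name)   # legal-en-es
```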

Together, these capabilities will promote a higher-quality translation output.

How Does Lionbridge’s Smart MT Solution Work?

Smart MT works with various third-party MT systems via a connector. Think of it as an “MT harness” that can:

Connect to external leading MT providers, such as Microsoft, Google, Amazon, DeepL, and Yandex.
Manage terminology: glossary or Do Not Translate (DNT) terms can be added and updated on the fly so that terminology is preserved and appears exactly as it should in the output.
Apply linguistic rules, allowing users to modify source text or the resulting MT output to address known issues and improve MT quality.
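The harness idea can be sketched in Python as a provider-agnostic interface wrapped by source-side and target-side rules. `MTProvider`, `EchoProvider`, and `Harness` are hypothetical names, not Smart MT's API, and the regex rules are illustrative; a real adapter would call a vendor's SDK inside `translate`.

```python
import re
from typing import Iterable, Protocol, Tuple

Rule = Tuple[str, str]  # (regex pattern, replacement)


class MTProvider(Protocol):
    """Hypothetical connector interface that each MT provider adapter implements."""

    def translate(self, text: str, source_lang: str, target_lang: str) -> str: ...


class EchoProvider:
    """Stand-in engine for testing: returns the text unchanged."""

    def translate(self, text: str, source_lang: str, target_lang: str) -> str:
        return text


class Harness:
    """Apply source-side rules, call the provider, then apply target-side rules."""

    def __init__(self, provider: MTProvider,
                 source_rules: Iterable[Rule] = (), target_rules: Iterable[Rule] = ()):
        self.provider = provider
        self.source_rules = [(re.compile(p), r) for p, r in source_rules]
        self.target_rules = [(re.compile(p), r) for p, r in target_rules]

    @staticmethod
    def _apply(rules, text: str) -> str:
        for pattern, replacement in rules:
            text = pattern.sub(replacement, text)
        return text

    def translate(self, text: str, source_lang: str, target_lang: str) -> str:
        prepared = self._apply(self.source_rules, text)
        output = self.provider.translate(prepared, source_lang, target_lang)
        return self._apply(self.target_rules, output)


harness = Harness(
    EchoProvider(),
    source_rules=[(r"\s{2,}", " ")],          # collapse runs of whitespace
    target_rules=[(r"\be-mail\b", "email")],  # enforce a preferred spelling
)
print(harness.translate("Send  the e-mail.", "en", "en"))  # Send the email.
```

Keeping the rules outside the provider adapter means the same rule set can be reused when switching between engines.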

Lionbridge also has other automations that identify inconsistencies between the terminology in customer glossaries and how that terminology is used in the training corpus and in the MT output. These automations find the cases in which the training corpus or, later, the MT output does not follow the approved terminology, and correct them.
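A basic glossary-compliance check of this kind can be sketched in Python: for each segment pair, report glossary entries whose source term occurs in the source but whose approved target term is missing from the target. This is a hypothetical illustration, not Lionbridge's automation; it applies equally to training-corpus pairs and to MT output, and its exact whole-word matching would miss inflected forms that a production check must handle.

```python
import re


def _contains(term, text):
    """Whole-word, case-insensitive match of a term in a text."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def glossary_violations(source, target, glossary):
    """Return (source term, approved target term) pairs where the source term
    occurs in the source segment but the approved target term is missing."""
    return [
        (src_term, tgt_term)
        for src_term, tgt_term in glossary.items()
        if _contains(src_term, source) and not _contains(tgt_term, target)
    ]


glossary = {"fósforo": "phosphorus"}
print(glossary_violations("El fósforo es esencial.", "The match is essential.", glossary))
# [('fósforo', 'phosphorus')]
print(glossary_violations("El fósforo es esencial.", "Phosphorus is essential.", glossary))
# []
```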

If MT output does not include the required terminology, we suggest using a glossary with DNTs, product names, and key domain-specific or brand terminology.

What Are Some Additional Tips for Glossary Creation and Usage?

To ensure the desired Machine Translation outcome, we recommend you consider the following guidelines as you create your glossaries:

Only include terms that can be translated the same way in every instance of the source term. This is often true of specialized terminology, client-approved vocabulary, and technical terms.
Use only one translation per source term in your glossary, even when multiple translations exist.
Use mostly noun phrases. This works best with multi-word terms, industry-specific terms, or customer-specific product names.
Avoid general or common terms, since many glossary entries matching within a single sentence may reduce translation quality.
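Two of these guidelines are mechanical enough to check automatically. The Python sketch below flags source terms that have more than one translation and entries that appear in a stoplist of general words. The stoplist, the (source, target) row format, and the function name are assumptions; the noun-phrase and systematic-use guidelines still need human review.

```python
from collections import defaultdict

# Hypothetical stoplist of general terms that rarely belong in an MT glossary.
COMMON_WORDS = {"data", "file", "page", "system", "user"}


def lint_glossary(rows, common_words=COMMON_WORDS):
    """Check (source, target) glossary rows against two of the guidelines:
    one translation per source term, and no general or common terms."""
    translations = defaultdict(set)
    for source, target in rows:
        translations[source.strip().lower()].add(target.strip())
    warnings = []
    for source, targets in sorted(translations.items()):
        if len(targets) > 1:
            warnings.append(f"{source!r}: multiple translations {sorted(targets)}")
        if source in common_words:
            warnings.append(f"{source!r}: general term, consider removing")
    return warnings


rows = [
    ("Smart MT", "Smart MT"),
    ("file", "archivo"),
    ("glossary", "glosario"),
    ("glossary", "glosario de términos"),
]
for warning in lint_glossary(rows):
    print(warning)
```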