Companies are turning to Machine Translation (MT) more than ever, and we expect adoption to keep growing. You can attribute this trend to the technology’s increasingly predictable results and to intense market pressure to produce more content quickly in many languages, within the same or an even smaller budget. MT technology delivers translations with a speed and cost efficiency that human translators cannot match, but companies must also address quality issues. To succeed in increasingly digital markets, they must provide personalized multilingual content that is domain-specific, hits a specific tone, and maintains a consistent brand voice across all channels.
How can you get the most out of your MT initiatives to better achieve those goals? There are two methods to bolster Machine Translation’s effectiveness: Machine Translation customization and Machine Translation training. While each approach can improve MT output quality and reduce the need for post-editing, companies cannot use MT customization and MT training interchangeably.
Read on to learn how the methods work, their differences, and how to select the right approach as dictated by your use case.
Companies typically get the desired results when they run general, straightforward content through generic, untrained Machine Translation engines, such as Google NMT, Bing NMT, Amazon, DeepL, or Yandex. But the output can also fall short.
Why? A generic engine is frequently unable to translate highly specialized content, such as life sciences or legal material, and the terminology associated with those domains. It cannot reliably choose the correct sense of a word that has more than one meaning. And it cannot preserve your unique brand voice or determine when formal or informal language is needed to best connect with your audience.
MT customization and MT training address these deficits to achieve better translation output when you have specific requirements the generic engines won’t meet.
MT customization is an adaptation of a pre-existing Machine Translation engine with a translation glossary and Do Not Translate (DNT) list to improve the accuracy of machine-generated translations. (A translation glossary is a collection of terms that are important to a company, together with their translations. A DNT list is a collection of terms a company does not want to translate.)
To customize an engine, you upload a list of these source terms with their translations before the engine executes its work. The list instructs the MT engine how to translate the terms, or intentionally prevents their translation. This intervention improves the engine’s suggestions and enables the company to maintain its brand name, adhere to terminology, and achieve regional variations. Superior translations reduce the need for post-editing.
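As a rough illustration of what such a term list holds, the sketch below models a glossary and a DNT list as plain Python data and checks whether an MT suggestion respects them. The names `GLOSSARY`, `DNT_TERMS`, and `check_terms`, along with the sample terms, are hypothetical; real MT engines and portals define their own upload formats, which this example does not attempt to reproduce.

```python
# Hypothetical glossary: source term -> required target term (English -> French).
GLOSSARY = {
    "dashboard": "tableau de bord",
    "post-editing": "post-édition",
}

# Hypothetical Do Not Translate list: terms that must appear unchanged.
DNT_TERMS = ["Acme Cloud"]


def check_terms(source, translation):
    """Return a list of terminology problems found in one MT suggestion."""
    problems = []
    src_lower = source.lower()
    tgt_lower = translation.lower()
    # A glossary term in the source must show up as its approved translation.
    for src_term, tgt_term in GLOSSARY.items():
        if src_term.lower() in src_lower and tgt_term.lower() not in tgt_lower:
            problems.append(f"glossary term not used: {src_term} -> {tgt_term}")
    # A DNT term in the source must survive untouched in the translation.
    for term in DNT_TERMS:
        if term in source and term not in translation:
            problems.append(f"DNT term altered: {term}")
    return problems
```

A check like this only does simple substring matching, so it ignores inflected forms; it is meant to show the shape of the data, not to replace a post-editor's review.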
MT customization is generally easier to execute than MT training, though there are some caveats when implementing this approach. While uploading terms into a Machine Translation system is a straightforward process, it can be challenging to select the correct terms. The success of MT customization is highly dependent on the MT expert’s skill level and ability to manage input and output normalization rules, DNT lists, and glossaries, which all work to improve output. Inexperienced authors can inadvertently cause the MT to make poor suggestions and negatively impact overall quality.
MT training is a process that involves building and training an MT engine by using extensive bilingual data from corpora and Translation Memories (previously translated content) to improve the accuracy of machine-generated translations.
MT training works by feeding the generic MT engine company-specific bilingual corpora. The engine accepts inputs via various exports, frequently in a Translation Memory (TM) format. In addition to providing the previously approved translation, the Translation Memory delivers valuable metadata, such as when the sentence was translated, by whom, and whether it is an exact match or a less-than-precise, fuzzy match. This data enables the engine to learn what a company expects in the translation. Instead of making a generic translation suggestion based on what it believes the source should be translated to, it generates a customized output based on the corpora.
MT training enables a company to fine-tune output to achieve a specific brand voice or style because the trained engine produces more consistent translations. For example, you can override the formal tone that generic MT engines produce by default and achieve an informal tone instead. As with MT customization, a company will achieve the desired results with less post-editing since the engine is more apt to generate accurate translations with fewer errors.
During MT training, a company provides the engine with as much knowledge as possible; high-quality segments will yield better-quality output. Successful MT training requires a minimum of 15K unique bilingual segments that are of high quality and free from inconsistencies and duplicate source translations. If a company does not meet these minimum requirements, the training will likely have little or no impact on the output.
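To make the idea of a Translation Memory export more concrete, here is a minimal sketch that reads segment pairs and their metadata from TMX, one common TM exchange format. The article does not say which format any particular engine accepts, so TMX is an assumption here, as are the sample content and the function name `read_segments`. Match type (exact versus fuzzy) is not covered by this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny, made-up TMX document with one translation unit.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu creationdate="20230115T093000Z" creationid="translator_a">
      <tuv xml:lang="en"><seg>Save your changes.</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrez vos modifications.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_segments(tmx_text, src_lang="en", tgt_lang="fr"):
    """Yield one dict per translation unit that has both languages."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                texts[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src_lang in texts and tgt_lang in texts:
            yield {
                "source": texts[src_lang],
                "target": texts[tgt_lang],
                "created": tu.get("creationdate"),
                "created_by": tu.get("creationid"),
            }
```

Calling `list(read_segments(SAMPLE_TMX))` returns the single English-French pair along with who created it and when.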
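Before committing to training, it can help to measure a corpus against these requirements. The sketch below is one possible pre-check, not a description of any vendor's pipeline: it drops empty and duplicate pairs, sets aside source segments that have conflicting translations, and compares what remains with the 15K figure cited above. The function names and the decision to exclude every inconsistent source are assumptions.

```python
from collections import defaultdict

MIN_UNIQUE_SEGMENTS = 15_000  # threshold cited in the article


def clean_corpus(pairs):
    """Split (source, target) pairs into unique pairs and inconsistent sources."""
    targets_by_source = defaultdict(set)
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if source and target:  # skip pairs with an empty side
            targets_by_source[source].add(target)
    # Exactly one target per source: a clean, deduplicated pair.
    unique = [(s, next(iter(t))) for s, t in targets_by_source.items() if len(t) == 1]
    # More than one target for the same source: needs human review.
    inconsistent = {s: sorted(t) for s, t in targets_by_source.items() if len(t) > 1}
    return unique, inconsistent


def enough_for_training(unique_pairs, minimum=MIN_UNIQUE_SEGMENTS):
    """Return True when the cleaned corpus meets the minimum segment count."""
    return len(unique_pairs) >= minimum
```

If `enough_for_training` returns `False`, the article's guidance suggests MT customization is likely the better fit.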
While the two approaches work to enhance MT output and reduce post-editing, the similarities stop there. They are not interchangeable.
The key difference is this: MT customization tailors a pre-existing MT engine with glossaries and Do Not Translate (DNT) lists, while MT training builds and trains the engine from scratch using large volumes of bilingual data from corpora and Translation Memories.
Customization is more versatile than MT training and will generate suggestions that meet most companies’ requirements. However, a one-off cost is associated with customization, which involves updating the profile that goes into the MT engine. There are some additional costs to maintain a glossary over time.
MT training is most appropriate for sophisticated companies with highly specialized content and complex use cases. When implementing MT training, there are costs associated with the first training and potential costs for additional training, which may be considered over time if the MT performance monitoring indicates room for improvement.
Does your company need to translate scientific material or highly technical manuals? Do you need to preserve your unique brand voice? The answers to these questions can dictate whether it is best to use MT customization or MT training.
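There are two important use cases for MT customization. Use it when you need to achieve the following:
- Accurate translations of terminology
- Regional variation, but you lack sufficient data for MT training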
MT customization is a good choice for technological and detail-oriented content since it is critical to translate terminology correctly for this type of content. MT customization is the preferred approach when you lack enough data for MT training to be effective.
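There are two important use cases for MT training. Use it when you need to achieve the following:
- A specific brand voice, tone, or style
- Regional variation, and you have enough data for MT training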
MT training is a good choice when translating marketing and creative content since specific brand voice, tone, and style are essential elements of this type of content. However, be sure you possess enough data to train the engines successfully.
At times, a hybrid approach produces the best results. For instance, MT may generate better suggestions when companies augment MT training with some customization.
Lionbridge enables its customers to implement a hybrid approach with ease. Customers can customize their MT via Lionbridge’s enterprise MT solution, Smart MT™ Portal, while also opting to buy professional training services from Lionbridge’s skilled teams. When working with these teams, companies typically approach MT more holistically and often use a combination of MT training and MT customization for the best output. Various tests help them understand what is yielding the best results and drive a tailored MT approach.
Selecting the best approach to enhance MT output depends on your situation. As you explore options, it may be tempting to consider MT training as the first and only method to get the most out of your MT. Or you may be intrigued by the hype around continuous training. Here are some things to keep in mind as you investigate your options.
MT training can be a highly effective tool to achieve improved MT output, but only when it addresses identified and targeted concerns.
With the increased use of MT, many providers make MT training their go-to solution to try to provide value to their customers. However, this approach can backfire in some instances. Some companies that have solely used training with hopes of better MT output have subsequently sought Lionbridge’s services, expressing disappointment with the training after conducting a cost-benefit analysis. They were not impressed with the engine’s generated suggestions and sought a more cost-effective solution. Why were they dissatisfied? Simply put, there were better approaches based on their specific circumstances.
Innovative MT providers, like Lionbridge, use MT training when appropriate but heavily rely on customization to achieve desired MT results at a lower cost than MT training.
As you investigate MT solutions, you may find providers promoting the concept of continuously trained engines after individual projects are completed. Be wary of such claims. Continuous training is only possible if you deal with bespoke engines that require constant updating.
We want to underscore that MT training will only be successful if an individual project has at least 15K unique segments to train the engine. When companies do not have enough data, they may instead use project content to update customization features, a practice that is often labeled “training” even though it is not.
Customization is a more versatile tool than MT training. It will generate MT suggestions that will meet most companies’ requirements. With customization, you can sufficiently improve MT suggestions to maintain your brand name and adhere to terminology, thus reducing a post editor’s work to verify these items. A one-time cost to update the profile that goes to the MT engine and some ongoing costs to maintain a glossary over time are typically less expensive than the costs associated with MT training.
When implementing MT customization, be sure to follow best practices.
Put a library of input and output normalization rules in place for the most-used languages to control the input to MT and enhance its output. These rules will enable you to meet your specific requirements.
For instance, an output normalization rule may replace double quotes [“...”] with les guillemets [« … »] in French translations. This rule enhances the output because French-speaking readers expect to see les guillemets instead of double quotes. Companies may apply input and output normalization rules to enable similar modifications that address regional language variations for parent languages, such as French (Belgium), French (Canada), French (Africa), and so forth.
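The guillemet rule described above can be sketched as a small post-processing step applied to the text an engine returns. This is an illustrative regular expression, not a rule taken from any product; the function name `french_quotes` is hypothetical, and the no-break space inside the guillemets is a typographic convention that can vary by locale.

```python
import re

# Matches a pair of straight or curly double quotes and captures the quoted text.
_QUOTED = re.compile(r'["“]\s*([^"“”]*?)\s*["”]')


def french_quotes(text):
    """Replace double-quoted spans with guillemets and no-break spaces."""
    # \u00a0 is a no-break space, so the guillemets never wrap onto their own line.
    return _QUOTED.sub("«\u00a0\\1\u00a0»", text)
```

For example, `french_quotes('Cliquez sur "Enregistrer".')` wraps `Enregistrer` in guillemets and leaves the rest of the sentence untouched. Nested or unbalanced quotes are not handled by this sketch.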
Create a list of terms you do not want to be translated and a rule that replaces any identified Do Not Translate (DNT) term with a token before it goes to the engine. This action will make the term invisible to the engine and prevent it from being translated. After the translation has been processed and the MT suggestion is returned, set the output normalization rule to replace the token with the DNT term.
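The token round trip described above can be sketched in a few lines. The token format (`__DNT0__`), the function names, and the sample product names are all hypothetical, and the sketch assumes the engine returns the tokens unchanged, which depends on the engine in use.

```python
def mask_dnt(text, dnt_terms):
    """Input rule: replace each DNT term with a token; return text and token map."""
    mapping = {}
    # Longest terms first so a short term never splits a longer one.
    for i, term in enumerate(sorted(dnt_terms, key=len, reverse=True)):
        if term in text:
            token = f"__DNT{i}__"
            text = text.replace(term, token)
            mapping[token] = term
    return text, mapping


def unmask_dnt(text, mapping):
    """Output rule: put the original DNT terms back in place of the tokens."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text
```

In use, the masked text is what goes to the engine, and `unmask_dnt` runs on the suggestion that comes back:

```python
masked, mapping = mask_dnt("Sign in to Acme Cloud Sync with Acme Cloud.",
                           ["Acme Cloud", "Acme Cloud Sync"])
# masked is now "Sign in to __DNT0__ with __DNT1__."
suggestion = "Connectez-vous à __DNT0__ avec __DNT1__."  # stand-in for engine output
restored = unmask_dnt(suggestion, mapping)
```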
Prepare your glossary carefully to promote accurate, consistent translations. Consider the key factors shown in Table 1 when deciding whether to include a term in your glossary.
Consideration | What to ask | Should the term be included in the glossary?* |
---|---|---|
Frequency | How often does the term appear in the source text? | If the term occurs infrequently, don’t include it. |
Ambiguity | Does the term have multiple meanings, or can it easily be confused with other words? | If the term is ambiguous, include it. (Note: Be sure alternate meanings of the term rarely appear in the source text.) |
Specialized terminology | Is the term specific to a particular domain or subject area? | If yes, include it. |
Consistency | Has the term been translated consistently in the past? | If yes, don’t include it. |
Importance | How important is the term to the text’s overall meaning? | If it is central to the meaning of the text, include it. |
Complexity | Is the term complex, and will it be difficult for the Machine Translation system to translate it accurately? | If yes, include it. |
Table 1. Factors to consider when creating a glossary.
*There may be exceptions to these general guidelines.
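One way to apply the factors in Table 1 consistently across a long candidate list is to encode them as a simple rule. The sketch below is only an interpretation: the table does not say how to weigh the factors against each other, so checking the two exclusion criteria first is an assumption, as are the `TermProfile` fields and the `min_occurrences` threshold.

```python
from dataclasses import dataclass


@dataclass
class TermProfile:
    """Answers to the Table 1 questions for one candidate term."""
    occurrences: int                  # Frequency
    ambiguous: bool = False           # Ambiguity
    domain_specific: bool = False     # Specialized terminology
    already_consistent: bool = False  # Consistency
    central_to_meaning: bool = False  # Importance
    complex_for_mt: bool = False      # Complexity


def should_include(term, min_occurrences=5):
    """Return True when the term looks like a good glossary candidate."""
    # Assumption: the two "don't include" criteria take priority.
    if term.occurrences < min_occurrences or term.already_consistent:
        return False
    # Any single "include" criterion is enough.
    return any([term.ambiguous, term.domain_specific,
                term.central_to_meaning, term.complex_for_mt])
```

Because the table notes that exceptions exist, the result is best treated as a first pass for a terminologist to review.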
We also recommend the following do’s and don’ts during glossary creation:
Lionbridge’s Smart MT Portal makes it easy for our customers to implement MT customization, and our technology allows customization to work across multiple MT engines simultaneously. You compile your MT glossaries and DNT lists and upload these terms; they are then used for every MT engine. The technology enables you to avoid engine lock-in and change engines anytime for the best results.
Additionally, it’s easy to supplement our MT technology with relevant services by our MT experts. When engaged, we help companies identify the most effective MT strategy and how to execute that strategy best.
Whether you are just beginning to explore MT, want to improve existing MT efforts through customization, or find that MT training has become viable as your content volume grows, we have a solution to meet your needs.
Compare MT training and MT customization at a glance in Table 2 to see which method is appropriate for your content.
Aspect | MT Customization | MT Training |
---|---|---|
What it is and how it works | An adaptation of a pre-existing Machine Translation engine with a glossary and Do Not Translate (DNT) list to improve the accuracy of machine-generated translations | The building and training of an MT engine by using extensive bilingual data from corpora and Translation Memories (TMs) to improve the accuracy of machine-generated translations |
What it does | Improves MT’s suggestions for more accurate output and reduces the need for post-editing | Improves MT’s suggestions for more accurate output and reduces the need for post-editing |
Specific benefits | Enables companies to adhere to their brand name and terminology and achieve regional variations | Enables companies to attain a specific brand voice, tone, and style and achieve regional variations |
The risks of using it | The MT could make poor suggestions and negatively impact overall quality when executed improperly | MT training may fail to impact output if there is not enough quality data to train the engine; the MT could generate poor suggestions and negatively impact overall quality if inexperienced authors overuse terminology |
When to use it | Ideal for technological and detail-oriented content and any content that requires: (1) accurate translations of terminology; (2) regional variation, but you lack sufficient data for MT training | Ideal for highly specialized content, marketing and creative content, and any content that requires: (1) a specific brand voice, tone, or style; (2) regional variation, and you have enough data for MT training |
Success factors | An experienced MT expert who can successfully manage input and output normalization rules, glossaries, and DNT lists | A minimum of 15K unique segments to adequately train the engine |
Cost considerations | There is a one-time cost to update the profile that goes into the MT engine and some ongoing costs to maintain a glossary over time; costs are relatively inexpensive when factoring in the potential benefits and are typically lower than MT training costs | There are costs associated with the first training and potential costs for additional training, which may be considered over time if the MT performance monitoring indicates room for improvement; MT training can be worth the investment in certain cases when factoring in the potential benefits |
Table 2. A comparison between MT customization and MT training
If you’d like to further explore how we can help you fully leverage Machine Translation, contact us today.