Embracing Generative AI is crucial to succeeding, especially when your competitors are already applying it to their workflows, translation, or content creation and optimization. One critical element of using Generative AI is self-correction. Unfortunately, Large Language Models (LLMs) can deliver inaccurate output for a few reasons. The data an LLM is trained on may include problematic or simply wrong information, and AI tools will also sometimes "hallucinate," or make up information.

To address issues in AI output, it's possible to build "self-correction" measures into the initial set of prompts. (Some experts also call this "self-critique" or "self-refine.") Multiple studies have tested methods that require LLMs to review their output and refine their responses before delivering them. Read on to learn some of the techniques people are using to implement self-correction in their AI solutions (or asking their AI solutions provider to implement for them). This post also covers the limitations of AI self-correction.
These are four ways people are currently implementing AI self-correction:
1. An accuracy-focused prompt: Sometimes, adding a prompt emphasizing accuracy into the series of prompts is effective. Here is a popular one posted on X:
“You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF. You carefully provide accurate, factual, thoughtful, nuanced answers, and are brilliant at reasoning. If you think there might not be a correct answer, you say so.”
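As a minimal sketch of how such a prompt might be wired in (assuming the OpenAI Python SDK; the model name and user question below are placeholders), the accuracy-focused text is typically supplied as the system message so it frames every request:

```python
# Minimal sketch: supplying the accuracy-focused prompt as a system message.
# Assumes the OpenAI Python SDK (openai >= 1.0) with OPENAI_API_KEY set in the
# environment; the model name and user question are placeholders.
from openai import OpenAI

ACCURACY_PROMPT = (
    "You are an autoregressive language model that has been fine-tuned with "
    "instruction-tuning and RLHF. You carefully provide accurate, factual, "
    "thoughtful, nuanced answers, and are brilliant at reasoning. If you think "
    "there might not be a correct answer, you say so."
)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder -- use whichever chat model you have access to
    messages=[
        {"role": "system", "content": ACCURACY_PROMPT},
        {"role": "user", "content": "Summarize the main risks of machine-translating regulated content."},
    ],
)
print(response.choices[0].message.content)
```

Because the instruction sits in the system message, it applies to every turn of the conversation rather than to a single question.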
2. Turning AI tools into an expert: One way to preempt inaccuracies is to turn your AI tool into an expert that's less likely to make mistakes. Many users and AI service providers, including a group of GitHub developers, are creating prompts that command AI tools to act like experts. Notably, the best expert personas are the ones with the most detail about following best practices, provided those practices are widely accepted. With commands that are too general, the AI tool may begin to hallucinate content. For example, saying, "You're an excellent career counselor," isn't enough. The prompts should include guidance on the best practices career counselors generally follow. It's also wise to test the series of prompts on a task you already know the answer to; this will help you determine where the expert persona prompts need optimizing. Sometimes, it may even make sense to develop multiple iterations of an expert persona prompt, depending on the type of task. The GitHub developers compiled a list of prompts for 15 expert personas they used to turn AI into an expert assistant (see the sketch after the list below). Though they aren't the only ones to do this, their list is notably comprehensive.
1. Career Counselor
2. Interviewer for a specific position
3. English Pronunciation Helper
4. Advertiser
5. Social Media Manager
6. AI Writing Tutor for Students
7. Accountant
8. Web Design Consultant
9. UX/UI Developer
10. IT Architect
11. Cyber Security Specialist
12. Machine Learning Engineer
13. IT Expert
14. Generator of Excel formulas
15. Personal Chef
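As a hedged illustration of the approach described above (the persona text, the expected keywords, and the `call_llm` helper are all hypothetical; connect the helper to whichever LLM client you use), a detailed persona prompt spells out the best practices the expert should follow, and a known-answer task sanity-checks it:

```python
# Sketch of an expert persona prompt plus a known-answer sanity check.
# CAREER_COUNSELOR_PERSONA, EXPECTED_KEYWORDS, and call_llm() are hypothetical;
# wire call_llm() to the LLM client you actually use.

CAREER_COUNSELOR_PERSONA = """You are an experienced career counselor.
Follow these widely accepted best practices:
- Ask about the person's skills, interests, and constraints before advising.
- Recommend concrete next steps (courses, certifications, informational interviews).
- Explain the reasoning behind each recommendation; never invent statistics.
- If you are unsure, say so rather than guessing."""

EXPECTED_KEYWORDS = {"skills", "interests"}  # what a good known answer should mention


def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical wrapper around your LLM provider's chat API."""
    raise NotImplementedError("Connect this to your LLM client.")


def sanity_check_persona() -> None:
    """Run the persona on a task whose answer you already know,
    to see where the prompt still needs tightening."""
    known_question = "What is a sensible first step before changing careers?"
    answer = call_llm(CAREER_COUNSELOR_PERSONA, known_question).lower()
    missing = {kw for kw in EXPECTED_KEYWORDS if kw not in answer}
    if missing:
        print(f"Persona prompt may need more guidance; answer omitted: {missing}")
    else:
        print("Persona prompt passed the known-answer check.")
```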
3. Adding "pre-hoc" or "post-hoc" prompting: It's possible to add prompts that modify the style of AI output. Perhaps the style needs to be more formal or informal, or targeted towards highly educated audiences or audiences with a high school-level education. A prompt added before the output is generated is a "pre-hoc prompt"; a prompt added after the output is generated is a "post-hoc prompt." Per a recent research project from Google's DeepMind, the best results occur when pre-hoc and post-hoc prompting are equally strong.
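The difference between the two might look like the following sketch, where `call_llm` is a hypothetical single-turn helper and the style rule is just an example:

```python
# Sketch contrasting pre-hoc and post-hoc style prompting.
# STYLE_RULE is just an example, and call_llm() is a hypothetical helper.

STYLE_RULE = "Write in a formal register suitable for a highly educated audience."


def call_llm(prompt: str) -> str:
    """Hypothetical single-turn wrapper around your LLM provider's API."""
    raise NotImplementedError("Connect this to your LLM client.")


def pre_hoc(task: str) -> str:
    # The style instruction is stated up front, before anything is generated.
    return call_llm(f"{STYLE_RULE}\n\nTask: {task}")


def post_hoc(task: str) -> str:
    # The output is generated first, then a second prompt asks for a revision.
    draft = call_llm(f"Task: {task}")
    return call_llm(f"Revise the text below so it follows this rule: {STYLE_RULE}\n\n{draft}")
```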
4. Using prompts to address biases: If LLMs aren't trained on the right data, their output may reflect the biases of the millions of people who spew hateful content on the Internet. Per a recent study by the Anthropic AI lab, it's possible to use Reinforcement Learning from Human Feedback (RLHF) to train an LLM to produce output without (or with less) racism, ageism, misogyny, etc. This can include instructions in the AI's "constitution" that tell it to consider general ethical principles your team decides upon. Part of this process is adding a line into prompts that preempts the LLM from relying on harmful stereotypes or philosophies. In some cases, AI tools have even been shown to begin "positively discriminating" in their output, exceeding expectations.
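At the prompting level (as opposed to RLHF or constitutional training, which happen when the model is built), that extra line can be as simple as the following sketch; the wording of the principles is a placeholder your team would replace with its own:

```python
# Sketch: prepending team-defined principles to every prompt.
# The wording of TEAM_PRINCIPLES is a placeholder; RLHF and constitutional
# training themselves happen at the model-training stage and are not shown here.

TEAM_PRINCIPLES = (
    "Do not rely on stereotypes about gender, age, race, or nationality. "
    "If a request assumes a stereotype, answer without reproducing it."
)


def with_principles(user_prompt: str) -> str:
    """Wrap a user prompt with the team's standing ethical instructions."""
    return f"{TEAM_PRINCIPLES}\n\n{user_prompt}"
```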
While AI self-correction measures can be powerful, studies also show they still have limitations. The same Google DeepMind study found that LLMs sometimes actually perform worse with self-correction measures. Even in cases where they don't impair performance, self-correction isn't consistently effective for every series of AI prompts, especially where external sources (a calculator, code executor, knowledge base, etc.) aren't used. For best results, self-correction measures need access to a benchmarked data set with basic truths built in. With these references, the AI tool knows when to stop its reasoning process, thus avoiding overcorrecting its output. Of course, the researchers noted that some tasks are too complex to give an AI tool these kinds of references.
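One way to picture this is a correction loop that stops as soon as an external reference confirms the answer; in the sketch below, `call_llm` and `check_against_benchmark` are hypothetical stand-ins for your LLM client and whatever ground-truth oracle (calculator, test suite, labeled data set) you have:

```python
# Sketch of a self-correction loop that stops against a ground-truth check.
# call_llm() and check_against_benchmark() are hypothetical stand-ins for your
# LLM client and your external reference (calculator, test suite, data set).

MAX_ROUNDS = 3


def call_llm(prompt: str) -> str:
    raise NotImplementedError("Connect this to your LLM client.")


def check_against_benchmark(answer: str) -> bool:
    """Return True when the answer matches a known truth in your benchmark."""
    raise NotImplementedError("Plug in your calculator, test suite, or data set.")


def self_correct(task: str) -> str:
    answer = call_llm(task)
    for _ in range(MAX_ROUNDS):
        if check_against_benchmark(answer):
            break  # stopping here keeps the model from over-correcting
        answer = call_llm(
            f"Your previous answer was judged incorrect.\n"
            f"Task: {task}\nPrevious answer: {answer}\nProvide a corrected answer."
        )
    return answer
```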
The same study also found that another limitation of AI self-correction occurs when multi-agent LLM applications are used. The LLM is asked to perform multiple tasks as different "agents," or actors. For example:
- The LLM generates code as one agent, then checks that code as another agent.
- The LLM performs a debate, with one agent taking each side.
The problem occurs because the multiple agents use a form of majority voting to decide which answer is correct, thus creating a kind of echo chamber or “self-consistency,” rather than true accuracy.
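A stripped-down sketch of that voting step shows why: agreement among agents only measures consistency, not correctness (`call_llm` is again a hypothetical helper):

```python
# Sketch of multi-agent majority voting, showing why agreement measures
# self-consistency rather than accuracy. call_llm() is a hypothetical helper.
from collections import Counter

N_AGENTS = 5


def call_llm(prompt: str) -> str:
    raise NotImplementedError("Connect this to your LLM client.")


def majority_vote(task: str) -> str:
    answers = [call_llm(task) for _ in range(N_AGENTS)]
    winner, _votes = Counter(answers).most_common(1)[0]
    # If every agent shares the same blind spot, the "winner" is simply the
    # most repeated answer -- an echo chamber -- not necessarily the right one.
    return winner
```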
The limitations of AI self-correction underscore how essential a human in the loop is. AI tools can enhance translation efficiency, but they often need human intervention at some point. Perhaps a human must develop the best series of prompts, check an initial sample, or review output at the end. Self-correction measures may assist with the entire process, but can’t replace a human in the loop.
To that end, it’s vital to work with AI consulting experts, like the ones at Lionbridge, who can help address the AI trust gap. They should:
- Minimize the risk of untrustworthy or low-quality content/output
- Ensure security of data from cyber-attacks or any other kind of compromise
- Be creative and help develop new, engaging, and original content or output
- Check and correct for accuracy, especially for complex material that requires intensive education or expertise
- Never try to sell you unnecessary technology, solutions, or subscriptions
- Share the entire process and invite input, feedback, and customization throughout
Interested in learning how to utilize AI to automate content creation, website content optimization, or other language services? Lionbridge’s dedicated team of AI experts is ready to help. Let’s get in touch.