Combining Human and AI Translation | A Practical Hybrid Approach

One of the questions I am asked most often is whether AI or machine translation can eliminate the need for human translators. The answer is yes and no: how far you can reduce your reliance on people depends on the type of content involved.

The translation industry has been using machine translation for a long time, and indeed the technology is as old as the computing industry (see Systran). Today’s large language models are direct descendants of neural machine translation platforms that were introduced in the past decade. So people working in the translation industry are well acquainted with MT and its uses.

LLMs are only as good as the material used to train them, and that is the crux of the issue with machine translation as well. To train a translation engine, you need aligned texts: source documents paired with direct translations into the other language. This type of material is hard to come by and varies wildly in quality, and for some language pairs it isn’t available at all or is polluted by AI-generated text. As a result, translation quality varies with the type of content being translated and the languages involved. That said, machine translation is a lot better today than it was a few years ago.

Translation For Publication Vs Translation For Comprehension

Machine translation is super useful when you need to obtain information in your language. For example, a Japanese user might use MT to translate a Help Center article. They won’t expect the translation to be perfect. As long as it is reasonably accurate, their questions will be answered even if the translation is awkward. The translation widget embedded in Google Chrome often does a good job with this.

However, if you are translating content to be published on your website, users’ quality expectations will be higher. The problem isn’t that machine translation will make glaring mistakes; it’s that it can produce text that is awkward to read and can get factual details wrong (which matters for things like instructions and help center material). This reflects poorly on your brand and makes for a poor user experience. One thing you can do is add a disclaimer like “Translation by Google Translate”. That will set the user’s expectations, but you should still have human translators review your high-visibility, high-impact content.

Human In The Loop Machine Translation

A good compromise is to leverage people to correct and fine-tune machine translations. We used this type of workflow at Lyft and Notion, and were able to reduce unit costs while also delivering translations quickly. The real benefit of machine translation is speed, as it returns results in near real time. What we did at both companies was implement a fix-forward workflow in which machine translations were used as placeholders until people had time to review and edit them. These post-edits would then overwrite the machine translations.
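
A fix-forward workflow like this can be sketched as a small translation store, assuming each translated string is keyed by a string ID and locale and records whether it came from MT or a human. The class and method names below (`FixForwardStore`, `add_machine_translation`, `add_post_edit`) are hypothetical, not the API of any particular TMS. The key rule is that a human post-edit always overwrites the MT placeholder, and a later MT run never overwrites a post-edit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Translation:
    text: str
    origin: str  # "mt" for a machine placeholder, "human" for a post-edit


class FixForwardStore:
    """Hypothetical store of translations keyed by (string_id, locale)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Translation] = {}

    def add_machine_translation(self, string_id: str, locale: str, text: str) -> None:
        key = (string_id, locale)
        existing = self._entries.get(key)
        # An MT result may fill a gap or refresh an older MT placeholder,
        # but it must never clobber a human post-edit.
        if existing is None or existing.origin == "mt":
            self._entries[key] = Translation(text, "mt")

    def add_post_edit(self, string_id: str, locale: str, text: str) -> None:
        # Human post-edits always win and overwrite whatever is there.
        self._entries[(string_id, locale)] = Translation(text, "human")

    def get(self, string_id: str, locale: str) -> Optional[str]:
        entry = self._entries.get((string_id, locale))
        return entry.text if entry is not None else None

    def pending_review(self, locale: str) -> List[str]:
        # String IDs in this locale that are still served from an MT placeholder.
        return sorted(
            string_id
            for (string_id, entry_locale), entry in self._entries.items()
            if entry_locale == locale and entry.origin == "mt"
        )
```

`pending_review` doubles as the reviewers’ work list: anything still marked `mt` is a placeholder waiting for a post-edit, while users already see a usable translation.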

For low-visibility content, you can use a workflow called AIPE (AI + post-edit), where human reviewers approve or post-edit machine translations. This typically costs around 10 cents per word, sometimes significantly less, depending on the languages involved. It is a good approach for things like your help center back catalog, transcripts, etc. Alternatively, you can machine translate without review and post-edit only if an asset becomes more visible. This is an increasingly common practice and costs even less, at a fraction of a cent per word.

See: Choosing Language Service Providers (LSPs)
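
To see how the per-word rates for these workflows add up, here is a back-of-the-envelope estimate. The $0.10 per word AIPE rate is the typical figure quoted above; the $0.002 per word rate for unreviewed MT is a hypothetical stand-in for “a fraction of a cent”, and real rates vary by language pair and vendor.

```python
# Illustrative per-word rates in US dollars. The AIPE figure is the typical rate
# quoted in the text; the MT-only figure is a hypothetical "fraction of a cent".
RATES_PER_WORD = {
    "aipe": 0.10,
    "mt_only": 0.002,
}


def estimate_cost(words: int, languages: int, workflow: str) -> float:
    """Estimated cost of translating `words` source words into `languages` target languages."""
    if workflow not in RATES_PER_WORD:
        raise ValueError(f"unknown workflow: {workflow}")
    return round(words * languages * RATES_PER_WORD[workflow], 2)
```

Under these assumed rates, a 200,000-word back catalog translated into five languages comes to about $100,000 with AIPE versus about $2,000 with unreviewed MT, which is why machine translating first and post-editing only the assets that gain visibility is attractive.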

For high-visibility content, you’ll probably also want language leads to take a second pass through the output of the AIPE process. They know your product and brand, and are better able to capture your brand voice, jargon, etc.

For high-impact content like marketing copy, signup flows, etc., you may find that you are better off not using MT at all. High-end translators and copywriters often find that machine translations slow them down, and that it is quicker to write from scratch than to overhaul a machine translation.
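
The rules of thumb in the last few paragraphs amount to a simple decision function. This is a sketch under assumed inputs: the boolean flags `high_visibility` and `high_impact`, and the `review_budget` switch between AIPE and unreviewed MT, are hypothetical stand-ins for however you actually classify content.

```python
from enum import Enum


class Workflow(Enum):
    MT_ONLY = "machine translation, no review"
    AIPE = "AI + post-edit"
    AIPE_PLUS_LEAD = "AI + post-edit, then a language lead pass"
    HUMAN_ONLY = "human translation or copywriting, no MT"


def choose_workflow(high_visibility: bool, high_impact: bool, review_budget: bool = True) -> Workflow:
    """Hypothetical mapping from content attributes to a translation workflow."""
    if high_impact:
        # Marketing copy, signup flows: MT tends to slow skilled writers down.
        return Workflow.HUMAN_ONLY
    if high_visibility:
        # AIPE output gets a second pass from someone who knows the brand voice.
        return Workflow.AIPE_PLUS_LEAD
    # Long tail: reviewed AIPE, or raw MT that is post-edited later if it gains visibility.
    return Workflow.AIPE if review_budget else Workflow.MT_ONLY
```

Checking `high_impact` first means a signup flow never falls through to an MT-based workflow, even though it is also highly visible.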

A word about AI hallucinations. LLM-based translation systems can hallucinate: they generate output that reads like a fluent translation but doesn’t faithfully convey what the source says. It is a bit like recognizing which language someone is speaking without understanding exactly what they are saying. In effect, the system is telling you what you want to hear, not necessarily what is correct. A native speaker can spot and fix problems like this, which might otherwise go unnoticed. This is also why AI translation doesn’t reduce unit costs as much as you might think: someone has to find the defects that need fixing, and that takes time.

The adage that you get what you pay for definitely applies here.
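
Fluent-but-wrong output is hard to catch automatically, but some defects leave mechanical traces. The sketch below is a hypothetical helper, not a feature of any particular tool: it flags translations whose digit sequences or `{name}`-style placeholders differ from the source. It assumes that placeholder syntax and that source and target use the same digit characters, it can raise false positives (for example "1000" versus "1 000"), and it will not catch a hallucination that keeps the numbers intact, which is exactly why native-speaker review is still needed.

```python
import re
from collections import Counter
from typing import List

# Runs of digits: "1,000" and "1.000" both yield ["1", "000"],
# so locale-specific thousands separators compare equal.
DIGIT_RUNS = re.compile(r"\d+")
# Hypothetical placeholder format ({name}); adjust for your own string syntax.
PLACEHOLDERS = re.compile(r"\{[^{}]+\}")


def flag_suspect_translation(source: str, target: str) -> List[str]:
    """Return the mechanical mismatches found between a source string and its translation."""
    issues = []
    if Counter(DIGIT_RUNS.findall(source)) != Counter(DIGIT_RUNS.findall(target)):
        issues.append("numbers differ")
    if Counter(PLACEHOLDERS.findall(source)) != Counter(PLACEHOLDERS.findall(target)):
        issues.append("placeholders differ")
    return issues
```

An empty list means only that these particular checks passed, not that the translation is correct.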

Leveraging Your Translation Management System To Get The Most Out Of AI

The leading TMS platforms make it easy to include any number of automated translation services in your translation workflows. More importantly, they continuously measure the quality of these services and route translation requests to the best-performing ones.

This is important because it means you don’t have to be an expert in AI translation; you can rely on the platform to sense which services are doing well. Since this is a moving target, requests that go to one provider this month could go to another next month.

These systems use a measure called edit distance as a proxy for quality: roughly, how much a human reviewer had to change the machine output to reach the approved translation. If reviewers need to post-edit a provider’s translations heavily, that dings the provider’s quality score. By tracking this, the TMS gets a view of how providers compare across languages and can automatically adjust its routing rules to favor the current winners. This becomes even more important if you operate in many languages, because different providers typically do best in different languages.
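
Here is a minimal sketch of that idea, assuming you have records of each provider’s raw output alongside the approved human version. It uses character-level Levenshtein distance normalized by length; commercial TMS platforms may use different measures (word-level distance, for example) and more elaborate routing logic, and the record format and function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance: insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(mt_output: str, approved: str) -> float:
    """0.0 means the reviewer changed nothing; 1.0 is the maximum possible change."""
    longest = max(len(mt_output), len(approved))
    return levenshtein(mt_output, approved) / longest if longest else 0.0


def best_provider_by_language(records: List[Tuple[str, str, str, str]]) -> Dict[str, str]:
    """Hypothetical records: (language, provider, mt_output, approved_text). Lowest average wins."""
    distances: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for language, provider, mt_output, approved in records:
        distances[(language, provider)].append(normalized_edit_distance(mt_output, approved))

    best: Dict[str, Tuple[float, str]] = {}
    for (language, provider), scores in distances.items():
        average = sum(scores) / len(scores)
        if language not in best or average < best[language][0]:
            best[language] = (average, provider)
    return {language: provider for language, (_, provider) in best.items()}
```

A lower average normalized distance means reviewers changed less, so that provider wins for that language. Rerunning the calculation on fresh post-edit data is what lets the routing shift from month to month.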

Leveraging Translation Memory

All TMS platforms provide translation memory (TM), although it doesn’t get as much attention as it once did. A TM is essentially a database of previous translations that have been approved by reviewers.

TM is used mainly when the system encounters a phrase or sentence that has already been translated. Since that translation has already been approved, it can be applied automatically when there is an exact match.
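
At its simplest, an exact-match lookup is a dictionary keyed by source segment and target locale. The sketch below uses hypothetical class and function names: it applies TM hits first and returns the remaining segments, which would then go to MT or a human translator. Real TMS platforms also support fuzzy matches, which are not shown here.

```python
from typing import Dict, List, Optional, Tuple


class TranslationMemory:
    """Hypothetical exact-match TM keyed by (source segment, target locale)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], str] = {}

    def add_approved(self, source: str, locale: str, translation: str) -> None:
        # Only reviewer-approved translations should be stored.
        self._entries[(source, locale)] = translation

    def exact_match(self, source: str, locale: str) -> Optional[str]:
        # Exact means exact: case and punctuation must match the stored segment.
        return self._entries.get((source, locale))


def apply_tm(tm: TranslationMemory, segments: List[str], locale: str) -> Tuple[Dict[str, str], List[str]]:
    """Split segments into TM hits (reused as-is) and misses (sent on to MT or a translator)."""
    hits: Dict[str, str] = {}
    misses: List[str] = []
    for segment in segments:
        match = tm.exact_match(segment, locale)
        if match is None:
            misses.append(segment)
        else:
            hits[segment] = match
    return hits, misses
```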

TM was primarily used as a cost-saving measure before AI translation was widely available, and LSPs would typically pass savings on to customers for TM matches. This matters less now that AI is routinely used for first-pass translation.

Using Visibility Metrics To Decide How To Use Automation

Another thing you can do is use instrumentation to measure the relative visibility of in-app prompts (UI strings). You can do the same thing with page-level analytics for content that lives in your CMS. Most companies have a long tail of content that is infrequently accessed, so you can route that content to an automated translation workflow while routing high-visibility content to a machine-human or human-only workflow.

Here it helps if your CMS supports adding tags to content modules. You write a script that pulls analytics and then tags the top 10% of articles as P1, those in the 10-25% band as P2, and the rest as P3 (the thresholds here are arbitrary, but you get the idea). Then, if an asset jumps up in visibility, you can request retranslation of that module with a higher-cost workflow. This is a good way to reduce unit costs while keeping quality high, because it focuses human effort on high-visibility content.
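
The tagging step of that script might look like the following sketch. It assumes you have already pulled a mapping of article IDs to page views from your analytics tool; fetching that data and writing tags back to your CMS depend on your stack and are left out. The function name is hypothetical, and the 10% and 25% cutoffs are the arbitrary thresholds from the text, exposed as parameters.

```python
from typing import Dict


def assign_priority_tags(
    page_views: Dict[str, int],
    p1_cutoff: float = 0.10,
    p2_cutoff: float = 0.25,
) -> Dict[str, str]:
    """Hypothetical helper: tag articles P1/P2/P3 by page-view rank (cutoffs are fractions)."""
    ranked = sorted(page_views, key=lambda article_id: page_views[article_id], reverse=True)
    total = len(ranked)
    tags: Dict[str, str] = {}
    for rank, article_id in enumerate(ranked):
        share_above = rank / total  # fraction of articles ranked above this one
        if share_above < p1_cutoff:
            tags[article_id] = "P1"
        elif share_above < p2_cutoff:
            tags[article_id] = "P2"
        else:
            tags[article_id] = "P3"
    return tags
```

With 20 articles, for example, the top 2 are tagged P1, the next 3 P2, and the remaining 15 P3. Rerunning the script periodically is what surfaces assets that have jumped in visibility and need a higher-cost workflow.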

For more about in-app instrumentation, be sure to check out:

Analytics: Ranking Strings By Visibility

How MT / AI Translations Affect SEO

You should also take into account the effect of MT and AI translation on search ranking. Google and other search engines can detect machine-translated text and use it as a quality signal.

Because of this, high-visibility content and important assets like signup funnels should be reviewed and edited by humans. Lower-quality automatic translations will undermine your search authority. That said, it is OK to use MT / AI translation for low-visibility assets such as your long tail of help center articles.

AI-Based Linguistic QA

One area where we are seeing a lot of experimentation is AI-based QA. Here an AI model is asked to evaluate a translation and even to identify specific issues. The output from this step is then used to score the translation and to flag translations that need human review. If nothing else, this type of process can be used to prioritize which translations human reviewers look at first (and which obviously good translations they can skip). AI post-editing is also pretty good for cleaning up translations, fixing grammar problems, mismatched gender, etc.

A good heuristic for prioritizing human review is something like the following:

High-visibility, high-risk content goes to the front of the line
High-visibility content that has been flagged for review goes next
Lower-visibility content that has been flagged for review comes after that
Everything else as time permits
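
That heuristic, combined with a score from an AI-based QA step, can be expressed as a sort key. This is a sketch under stated assumptions: the `ReviewItem` fields, the 0-100 `qa_score` scale, and the `FLAG_THRESHOLD` of 80 are all hypothetical, since each QA tool reports scores differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReviewItem:
    string_id: str
    high_visibility: bool
    high_risk: bool
    qa_score: float  # hypothetical 0-100 score from an AI-based QA step


FLAG_THRESHOLD = 80.0  # hypothetical: scores below this are flagged for human review


def review_tier(item: ReviewItem) -> int:
    """Lower tier numbers are reviewed first, mirroring the four-step heuristic."""
    flagged = item.qa_score < FLAG_THRESHOLD
    if item.high_visibility and item.high_risk:
        return 0  # front of the line
    if item.high_visibility and flagged:
        return 1
    if flagged:
        return 2  # lower-visibility but flagged
    return 3      # everything else, as time permits


def review_queue(items: List[ReviewItem]) -> List[ReviewItem]:
    # Within a tier, lower QA scores (more likely to contain defects) come first.
    return sorted(items, key=lambda item: (review_tier(item), item.qa_score))
```

Sorting on the tuple `(tier, qa_score)` keeps the heuristic’s ordering between tiers while putting the most suspect translations at the top of each tier.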

Custom AI | Machine Translation Engines

If you have a sufficiently large amount of material (at least several hundred thousand words of source material), you can improve the quality of machine translation by training an MT engine on your human translations and post-edits. A custom MT engine will be aware of your terminology, jargon and writing patterns in a way that a general-purpose engine like Google Translate will not. Caveat: if your translations are mostly machine-generated or loosely reviewed, the custom-trained engine will not improve quality (garbage in → garbage out).

Using AI / MT To Translate Dynamic and User-Generated Content

Another situation where AI / MT translation is useful, and may be the only economically viable option, is translating dynamic or user-generated content. A travel booking site, for example, may have thousands of permutations of room listings. Even if this material is templated, there is still a huge volume of material to deal with. Machine translation can keep up with this volume, whereas a human translation process, even with a huge staff, would get bogged down. This is a case where you might be willing to accept lower translation accuracy in exchange for rapid delivery and low unit costs.

NOTE: This is a topic in its own right, and we will be publishing an article about this specific use case. For a discussion of the architectural fundamentals, see Building For Runtime Localization.

The Big Picture

Machine translation’s major benefit is speed, and it works nicely when used in a fix-forward process. It also enables you to reduce translation unit costs, but not as dramatically as you might expect.

A common trap I see companies fall into is to treat localization as a cost center when it is one of the best growth levers available to them. Several of the companies I worked with generated well over 50% of their revenue outside of the United States. This would not have been possible without localization, especially in markets where the rate of English proficiency is low, like Japan and Latin America. By all means, reduce unit costs where you can, but localization is just one part of expanding internationally.
