Translating At Scale

User Generated Content, E-Commerce & Other Scenarios

Many of the articles in this knowledge base deal with scenarios around software localization and document translation where there is a finite amount of material that can be handled via manual or semi-automated translation processes. How do you deal with applications where the volume of material to be translated is orders of magnitude larger?

To give a specific example, let’s look at a hotel booking service. The inventory of bookable rooms is massive, easily in the millions, spanning properties across the globe, and because the service is international in scale, all of these assets need to be visible in dozens of languages.

Where do you even start with this, and how do you build a scalable platform that is tied directly to measurable ROI?

Why Scale Changes Everything

While there are a few parts of your service that can be localized conventionally (sign up page, checkout, etc), the vast majority of what you serve to users is dynamically generated and requires a runtime translation infrastructure.

The problem isn’t just volume, it’s the combination of volume, velocity, and variability. Room inventory changes daily. User reviews arrive in dozens of languages. Property descriptions are updated by owners who may write in any language. Each of these requires a different architectural response.

It is also important to treat ROI as a basic parameter in deciding whether and how to localize assets based on their revenue potential, something which few if any localization platforms account for beyond hand-wavy assumptions.

If you haven’t already, be sure to read Building For Runtime Localization. I also recommend reading the blogs from localization teams at companies like Airbnb and Booking.com, as they often write about what they have built.

You may have already built part of this, in which case pay attention to the ROI based routing and prioritization sections of this article.

The bad news here is that nothing like this exists in an off-the-shelf form. At this scale, you’ll need to build this around your specific needs and infrastructure. I discuss the org design considerations later on in this piece.

Static vs Templated vs Freeform Content

Building on the hotel booking site example, you will have a combination of static, templated and freeform content.

Static content in a service like this is a tiny percentage of overall content, but it tends to be high-visibility, high-impact content that appears everywhere. Examples include brand elements, page headers, the sign-up path, the checkout path, etc.

Room descriptions are an example of templated content. They contain structured data such as the number of beds, room size, and other characteristics and amenities, and they follow predictable patterns: they are automatically generated with placeholders for interpolated values. These can be translated once and rendered at runtime with the values filled in.

Property descriptions and user feedback are examples of freeform content that is authored by a person, like the property owner, or by an AI agent, which is increasingly common. This type of content, by definition, has to be translated at runtime and, with the exception of high-visibility content, will probably be handled entirely via automation.

The key insight here is that you have different types of content, each of which is best served by different workflows.

Static content → conventional TMS-centric localization, message catalogs, etc., with direct human oversight.

Templated content → database-driven translation via a runtime pipeline, backed by a TMS with translation memory and other features. This will also involve a high degree of automation, with human oversight for high-visibility content.

Freeform content → runtime translation with direct calls to MT / AI services, bypassing the TMS entirely, although translations for previously encountered content will live in a persistent data store with a frontend cache or CDN. Example: the first time a block of content is encountered, it is served from an MT engine or reverts to the source language, while subsequent renders are served from a data store of translations.

Tying this back to a previous article, Building For Runtime Localization, the use of a displayMessage() function that wraps lower level localization libraries allows you to decide on the fly which translation pipeline to use depending on the content being served.
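As a rough sketch of what that routing wrapper can look like, here is a minimal Python version. The function names, content-type enum, and lookup helpers are hypothetical placeholders, not a real API:

```python
from enum import Enum

class ContentType(Enum):
    STATIC = "static"        # message catalogs, human reviewed
    TEMPLATED = "templated"  # database-driven runtime pipeline
    FREEFORM = "freeform"    # direct MT/AI with a persistent cache

# Hypothetical lower-level lookups; each returns a translated string
# or None when no translation exists yet.
def catalog_lookup(key, locale):
    catalog = {"checkout.title": {"fr": "Paiement"}}
    return catalog.get(key, {}).get(locale)

def datastore_lookup(key, locale):
    return None  # stand-in for the cache + datastore layer

def display_message(key, locale, content_type, source_text):
    """Route the lookup to the pipeline appropriate for the content
    type, falling back to the source text when nothing is available."""
    if content_type is ContentType.STATIC:
        translated = catalog_lookup(key, locale)
    else:
        # Templated and freeform content both resolve through the
        # runtime datastore; a freeform miss would also enqueue an MT job.
        translated = datastore_lookup(key, locale)
    return translated if translated is not None else source_text
```

The point of the wrapper is that the caller never needs to know which pipeline served the string; the routing decision stays in one place.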

AI generated content poses a challenge for translation because it is synthetic and there is no human author who can verify the accuracy of the content.

Another thing to consider which I will discuss later is language detection. User reviews may be written in many languages, so you will need to detect which language is used prior to routing it for translation.

Database Driven Translation Model

When translating at scale, the static message catalog model breaks entirely. The right architecture is a database-driven translation store where:

  • Each translatable entity has a unique identifier (typically a hash)
  • Translations are stored in a translation table keyed by entity_id and locale
  • At render time, the application queries a cache or CDN, falls back to the data store (refreshing the cache), and finally falls back to the source language if no translation exists.
  • A background process continuously identifies untranslated or stale content and submits it to the translation pipeline (e.g. the hash changes with updates or the record is flagged as stale, something I will discuss more later on).
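A minimal sketch of this render-time lookup chain, using in-memory dicts as stand-ins for the real cache, translation table, and job queue (all names here are illustrative):

```python
import hashlib

# In-memory stand-ins for the real cache, translation table, and job queue.
cache = {}         # (entity_id, locale) -> translated text
translations = {}  # (entity_id, locale) -> translated text
pending = []       # consumed by the background translation process

def entity_id(source_text):
    # Content-addressed ID: the hash changes whenever the source changes,
    # which doubles as stale-translation detection.
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()

def render_text(source_text, locale):
    key = (entity_id(source_text), locale)
    if key in cache:                       # 1. cache / CDN
        return cache[key]
    if key in translations:                # 2. data store, refresh the cache
        cache[key] = translations[key]
        return translations[key]
    pending.append(key)                    # 3. queue for the background pipeline
    return source_text                     #    and fall back to the source language

# Seed one translation the way the background process would.
translations[(entity_id("Sea view"), "de")] = "Meerblick"
```

In production the cache would be a CDN or Redis layer and the pending queue a real message queue, but the fallback order is the same.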

There is a product decision hiding in here, and that is whether it is acceptable to fall back to the source language if no translation exists. This will happen when a user renders a view containing content that was just added or updated.

The choice here is whether to request a translation from an MT engine on the fly or fall back to the source language. It is generally best to accept reversion to source, because even well-optimized MT engines like Google Translate have a latency of about 100ms, which can really slow down renders, especially if multiple serial calls are made (e.g. 20 room descriptions × 100ms = 2 seconds).

The right way to think about this is that you are not translating strings, but at this stage are translating a mix of structured and freeform data that lives in a persistent but constantly updated data store.

Another decision you will need to make is whether any of this data lives within a TMS at all. It probably doesn’t for several reasons.

  • TMS platforms are usually priced based on the amount of content hosted there. This gets expensive quickly because of the amount of content involved.
  • TMS platforms are not designed for runtime localization, so you will need to build the cache + datastore layer described above.
  • TMS platforms are designed around message catalog and document translation and don’t deal with structured data very well. It usually makes more sense to develop tools that are optimized for tasks like translating room description templates.
  • The datastore also serves as a form of translation memory for frequently re-used content, even if it wasn’t explicitly designed for that purpose.

I usually advise clients not to build their own TMS, but at this scale you will run into the limitations of these platforms, even if cost isn’t a factor. This is what you will build instead.

Prioritization, Optimization & ROI

Prioritization

Prioritization is essential in a runtime system like this because human oversight is a scarce resource that needs to be allocated wisely. Fortunately, systems like this tend to be very well instrumented, which gives you signals that inform automated prioritization. These include:

  • Asset visibility : how visible an asset is relative to other assets. This enables you to route high-visibility assets to workflows with more human oversight.
  • Asset revenue : a booking service will track how much revenue an asset generates, which is an even better signal than visibility (e.g. a boutique hotel that gets tons of reservations because of its location). This should be broken out by the user’s preferred language.
  • Asset revenue potential : you can forecast future revenue based on the number of people looking at a property, so if you see a spike in views, you can estimate the revenue it implies, also broken out by language.
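One way to combine these signals is a simple weighted score used to order the translation and review queue. This is a hypothetical sketch; the weights and field names are placeholders you would tune against real data:

```python
def priority_score(views, trailing_revenue, forecast_revenue,
                   w_views=0.01, w_trailing=1.0, w_forecast=0.5):
    """Blend visibility, trailing revenue, and forecast revenue into a
    single sortable score. The weights are placeholders to be tuned."""
    return (views * w_views
            + trailing_revenue * w_trailing
            + forecast_revenue * w_forecast)

# Per-asset signals pulled from instrumentation, per target language.
assets = [
    {"id": "hotel-001", "views": 12000, "trailing": 8500.0, "forecast": 4000.0},
    {"id": "hotel-002", "views": 300,   "trailing": 150.0,  "forecast": 90.0},
]
queue = sorted(assets, reverse=True,
               key=lambda a: priority_score(a["views"], a["trailing"], a["forecast"]))
```

A linear blend is a reasonable starting point; the important part is that the ordering is computed from measured signals rather than set by hand.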

The next thing to do is to forecast the cost of localizing assets into target languages.

A note on detecting the user’s preferred language(s). There are two signals to watch here. One is from account settings, where it makes sense to have a preferred language picker. The other is the browser’s Accept-Language header, for users who are not logged in. The stated preference is the better signal of the two.

Cost Estimation

The next step is to estimate how much it will cost to localize an asset into the supported languages. Note that in an automatically optimized system, this means you may treat target languages differently depending on revenue and ROI.

For fully automated translation workflows, what you want to do is measure cost in dollars per megabyte. For example, Google Translate costs $20 per million characters, so it maps almost directly onto this metric. LLMs are more expensive, typically about an order of magnitude more than Google Translate and DeepL. Multiply the asset size by the unit cost and you have the cost to translate it into any given language.

For human-assisted translation workflows, you’ll need to work out the added per-megabyte cost of reviewing automated translations. The best way to do this is to divide the overall spend on a language by the amount of content processed. You should also be able to automate this, or at least make it relatively easy to do. Note that if you are building a new system you won’t have this historical data, but you can use contract rates from your LSP as a baseline and update as you collect data.
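Putting the two cost models into code, a sketch might look like this. The $20 per million characters figure is Google Translate’s published list price; the function names and the "reviewed" workflow label are illustrative:

```python
# Google Cloud Translation list price, in USD per character.
MT_COST_PER_CHAR = 20.0 / 1_000_000  # $20 per million characters

def mt_cost(text):
    """Cost of unsupervised machine translation for one asset."""
    return len(text) * MT_COST_PER_CHAR

def review_cost_per_char(total_spend, chars_processed):
    """Derive the human-review unit cost from historical spend.
    With no history yet, seed this from LSP contract rates."""
    return total_spend / chars_processed

def asset_cost(text, workflow, review_rate=0.0):
    """'mt' = unsupervised MT; 'reviewed' = MT plus human review."""
    cost = mt_cost(text)
    if workflow == "reviewed":
        cost += len(text) * review_rate
    return cost
```

Run this per asset and per target language, and you have the cost side of the ROI calculation.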

You now have the cost and revenue information needed to make an automated, ROI based decision on which translation workflow to use for each target language (and whether to translate at all).

ROI and Optimization

Return on investment is baked into this system at a granular level. Most localization platforms don’t account for this at all: ROI-based decisions are made outside the platform and are usually high-level, fixed rules like “In Spain, support English, Spanish and Catalan” that do not change in response to actual usage.

Since we are talking about a hotel booking service as an example, let’s think a bit about the level at which it makes sense to make an ROI-based decision. Each hotel has many assets grouped under it, including property descriptions, room descriptions and so on. It probably doesn’t make sense to make ROI decisions on sub-elements like individual descriptions, because whether and how to localize a property is an all-or-nothing decision (for example, it doesn’t make sense to localize some room descriptions but not others at the same property).

The heuristic you’ll use goes like this:

  1. Calculate the cost of localizing all translatable property assets into each target language and workflow.
  2. Then calculate the forecast or trailing revenue (revenue captured by the service, not the retail price paid) for each target language over the estimated lifetime of the translatable assets. One year is a reasonable baseline because room inventory turns over, properties go off-market, and descriptions get updated, so the translation value depreciates.
  3. Then for each language and workflow compare forecast or trailing revenue to the cost of each language / workflow combo. If the revenue is less than the cost, don’t localize property assets. If the revenue is greater than the cost, use the translation workflow that revenue supports.
  4. Repeat periodically to detect properties that have risen above these thresholds so they can be translated via an upgraded path.
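The heuristic above reduces to a small decision function. This is an illustrative sketch, assuming per-workflow costs have already been computed for a given property/language pair:

```python
def choose_workflow(cost_by_workflow, forecast_revenue):
    """Pick the most expensive workflow the forecast revenue supports,
    or None when no workflow is ROI-positive for this language.

    cost_by_workflow: e.g. {"mt": 0.40, "reviewed": 6.00, "premium": 45.00}
    """
    affordable = {wf: cost for wf, cost in cost_by_workflow.items()
                  if cost < forecast_revenue}
    if not affordable:
        return None  # step 3: revenue below cost, don't localize
    # Spend up to the revenue ceiling: richer workflows buy more quality.
    return max(affordable, key=affordable.get)
```

Running this per property and per language on a schedule gives you step 4 for free: properties that cross a threshold get upgraded automatically on the next sweep.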

You’ve done something really important here, and that is to tie localization spend to measurable ROI. This does a couple of things for you. First, you’ll automatically send high revenue assets to premium localization pathways knowing that you’ll get a positive return by spending more. Second, you’ll protect yourself from spending money on localizing assets that don’t benefit from it.

Let’s imagine two properties. One is a boutique hotel located a few blocks from the Eiffel Tower, which draws visitors from all over the world. The other is a cute inn that is in a remote village that draws visitors mostly from cities nearby.

The first property will benefit from extensive localization into most languages, while the second might only need to be localized in a few. The ROI based routing will handle this automatically. This also enables you to show ROI based reports and dashboards that demonstrate the value of continuous localization without getting into operational details executives don’t care about.

User Generated Content

User reviews are the hardest problem in this space. They are high-volume, unpredictable in quality, and important to conversion. A practical approach is to do the following:

  • Use template translation for structured data like ratings grouped by experience or amenity (cleanliness, location, friendliness, etc).
  • Before translating a review, you’ll need to detect the source language. Most MT platforms provide this as part of the translation call, so it’s typically handled automatically.
  • Always translate structured ratings data into all supported languages. Treat freeform review text as optional or on demand.
  • Machine-translate reviews into a property’s supported languages, or optionally give the user an option to display a machine translation on demand. The latter reduces costs because most users will not request translations (this is what Meta / Facebook does in feed translation).
  • For high-revenue properties, the ROI-based routing system from the previous section will automatically elevate user reviews to premium translation workflows. You don’t need to make this decision manually.
  • It’s worth filtering reviews for minimum length and quality signals before submitting them to translation workflows, since translating spam or one-word reviews wastes budget.
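The last point, pre-filtering reviews, can be as simple as a couple of cheap checks. This is a hypothetical sketch; the thresholds are placeholders to tune:

```python
def should_translate_review(text, min_chars=20, min_letter_ratio=0.5):
    """Cheap pre-filters applied before spending translation budget."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return False  # one-word or empty reviews
    letters = sum(ch.isalpha() for ch in stripped)
    if letters / len(stripped) < min_letter_ratio:
        return False  # mostly emoji, punctuation, or spam-like noise
    return True
```

Filters like this run in microseconds, so they pay for themselves the first time they keep a spam review out of a paid translation call.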

Here the concern is to avoid unnecessary spending on localizing content that may or may not be driving conversion. The easy win here is to display structured data ahead of freeform content. Most users will want a quick way to predict quality, and are not going to want to read a long essay about one customer’s experience.

Inbound Language Detection

It is impossible to predict what language a user is going to write in based on their browser or app settings. All those tell you is which language the user prefers, but if they are bilingual or multilingual they might write in any of the languages they are proficient in.

To deal with this, you need to use language detection libraries or APIs whenever a user creates content. These services return the detected languages along with confidence scores. They are very reliable with long-form content, but can be uncertain when given short strings and can guess wrong in the case of closely related languages, such as Catalan/Spanish, Danish/Norwegian or Indonesian/Malay.

Meta’s fastText open source library is widely used for language detection. langdetect, a Python library based on Google’s language-detection implementation, is another example. Machine translation services also provide language detection endpoints. You probably want to use libraries for production-scale use because the cost of API calls adds up quickly.

A good heuristic to follow is to label texts that contain multiple languages or have low confidence scores and exclude them from translation. For excluded content, display the source text with a note that translation is unavailable, rather than hiding the content entirely.
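The exclusion heuristic might look like the sketch below. The detect_language function is a stand-in for a real detector such as fastText or langdetect, which return language/confidence candidates; the confidence floor and ambiguity rule are illustrative:

```python
# Stand-in for a real detector (fastText, langdetect, or an MT API),
# which returns (language, confidence) candidates for a text.
def detect_language(text):
    if "merci" in text.lower():
        return [("fr", 0.98)]
    return [("en", 0.55), ("nl", 0.40)]  # ambiguous short string

CONFIDENCE_FLOOR = 0.90  # illustrative threshold

def route_for_translation(text):
    """Return the detected source language, or None to exclude the text
    (display the source with a 'translation unavailable' note)."""
    candidates = detect_language(text)
    lang, confidence = candidates[0]
    ambiguous = len(candidates) > 1 and candidates[1][1] > 0.20
    if confidence < CONFIDENCE_FLOOR or ambiguous:
        return None
    return lang
```

Returning None rather than guessing is the safer default: a wrong source-language guess produces a confidently wrong translation.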

Stale Translation Detection

At scale, content changes faster than translation pipelines can keep up. A hotel’s description might be updated weekly. The architectural requirement is a mechanism to detect when source content has changed enough to warrant re-translation. Methods you can use include:

  • Timestamp comparison → if the source content’s updated_at is newer than the translation’s created_at, mark it as stale
  • Hash comparison → generate a hash based on the source content; if the hash changes mark as stale
  • Change event triggers → if the content system emits change events, subscribe to those and invalidate translations in response to events. Event based systems can fail at scale, so this should be backed up by periodic timestamp or hash change sweeps.
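Combining the first two methods, a cheap timestamp pre-check followed by a definitive hash comparison, might look like this sketch (the translation record fields are hypothetical):

```python
import hashlib

def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def is_stale(source_text, source_updated_at, translation):
    """translation: record with 'created_at' and 'source_hash' fields."""
    # Cheap timestamp pre-check: nothing touched since the translation.
    if source_updated_at <= translation["created_at"]:
        return False
    # The source was touched; compare hashes to confirm a real change
    # (metadata-only edits can bump updated_at without changing the text).
    return content_hash(source_text) != translation["source_hash"]

record = {"created_at": 100, "source_hash": content_hash("Sea view room")}
```

Storing the source hash alongside each translation makes this check a single row lookup during the periodic sweep.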

There is a question hiding in here, which is “When are changes significant enough to warrant re-translation?”

A lot of the edits made to source content are going to be minor edits that don’t affect meaning. A good rule of thumb is to use an edit distance as a proxy for this. For example, ignore changes whose edit distance is less than 4 characters. This will prevent the addition of an apostrophe from triggering retranslation. For short strings, an absolute character threshold works well. For long-form content like property descriptions, consider a percentage-based threshold, for example, changes affecting less than 1% of the total content length.
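A sketch of that thresholding, using a plain Levenshtein distance with both the absolute floor and the percentage floor (the specific threshold values are the rules of thumb from above):

```python
def edit_distance(a, b):
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def needs_retranslation(old, new, min_chars=4, min_fraction=0.01):
    """Absolute floor for short strings, percentage floor for long ones."""
    distance = edit_distance(old, new)
    return distance >= max(min_chars, len(old) * min_fraction)
```

An added apostrophe is one edit, well under the floor, so it never triggers retranslation; a rewritten room description clears both thresholds.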

When small changes are detected, don’t discard translations because you can use them as a seed for retranslation, similar to translation memory. This can reduce the time and cost of AI translation.

Another option would be to run changes through an AI that is asked if the edit has a significantly different meaning. That seems like overkill, at least for now.

Clarification : We are not talking about document level retranslation here, but rather about strings, sentences and paragraphs that are assembled into page level views. So changing one character in a single component of a page will not trigger retranslation of every object in the page.

Quality Signals

When you can’t review everything, you need automated quality metrics and proxies. These include:

  • MT confidence scores : most MT services provide these metrics, so if a translation comes back with a low confidence score it can be routed for review.
  • Edit distance : the more heavily edited a translation is, the more assistance the MT or AI needs for that content type. Note that this is a trailing indicator, but it can be used to drive high level routing decisions.
  • User engagement signals : an asset that is getting lots of views but has low conversion in a language may have a translation issue. This can be a weak signal due to confounding factors that have nothing to do with translation.
  • User feedback : incorporating language accessibility questions into your feedback communication is a good way to get the user’s overall impression of translation quality. Asking for micro-feedback about individual assets is distracting, but a short question like “How was your experience with our service in Japanese?” is a good way to do this.

With all that said, translation quality is highly subjective. What is good enough for a casual buyer might appear to be riddled with errors to a professional translator.

Translations are kind of like wine. A three dollar bottle of wine is probably going to taste awful. A thirty dollar bottle is going to be pretty good. Unless you are a serious wine connoisseur you probably won’t be able to tell the difference between a thirty dollar bottle and a three hundred dollar bottle.

Mapping this back to translation:

$3 wine → unsupervised MT / AI translation

$30 wine → human reviewed MT / AI translation

$300 wine → premium human translation or transcreation

Org Design Considerations

As I mentioned earlier, there are no off-the-shelf systems that do this, so you will need to build out a localization team, including a localization engineering function to build and support this platform. As I discussed in Building Out A Localization Team, localization at most companies involves a lean team that manages vendors and outsourced resources. That is not the case here. While you need localization leaders who are good at program management, managing vendors, and related tasks, this is an engineering-heavy operation that requires principal / staff level personnel who have experience building and maintaining systems like these.

The good news is that AI-assisted coding tools will reduce the needed headcount, since a lot of the grunt work in building this infrastructure can be delegated to agents overseen by mid-level and senior engineers. Because of that, this is within reach of smaller companies, whereas before only companies like Airbnb and Expedia could do it. That said, the system architecture and buildout require people whose experience and domain knowledge are quite rare (translation: not cheap).

A detailed staffing plan is beyond the scope of this article because every company’s situation is different, but I wanted to emphasize that this requires serious infrastructure with a dedicated team supporting it. That’s not something you can outsource. One decision that will determine the outcome of others is where the team is based.

RTO (Return To Office) & Localization Considerations

A few thoughts on return to office mandates and localization. RTO makes sense for some teams, but localization is not and never was one of them. Localization has been remote first for over twenty years. The people who work in the field are bilingual or multilingual and are often digital nomads. They do not want to be dragged into an office in a different city or country, probably in a drab office park somewhere. If you are looking for an experienced Castilian Spanish language lead, you are better off hiring them where they are than trying to find them in a specific location.

The one thing that will wreck a localization team or prevent you from forming one in the first place is forced RTO.

What I recommend instead is to host team gatherings several times a year, some onsite at the company headquarters, and some offsite. When I was at Notion, we would usually have one of our meetups at LocWorld, the leading conference and expo for localization professionals, since many of us attended anyway. It was easy to tack on a few days for the team meetup.

“The temperature of your chair is not a proxy for productivity or creativity.”

Conclusions & Takeaways

This is not a process or workflow that you manage by hand. It is infrastructure that runs continuously and adapts to real world conditions.

Not all content is equal. This makes intuitive sense, but at scale you need objective metrics to assess content based on its visibility, revenue potential and other criteria.

ROI based routing is critical to sustainability. Without it, you are at risk of spending too much on localization or not enough. Implemented well, the system will automatically steer itself by optimizing for ROI.

Quality is a spectrum, not binary. It is possible to use direct and proxy metrics to auto-steer toward “good enough” translations that are optimized for quality and cost.

Building the team is the hardest part. This pattern is achievable and can be applied to many business cases, but you will need people with rare skills to build and maintain this.

You may have already built part of this. If you have already built a dynamic translation pipeline, the ROI based prioritization and routing can be added onto that.

Related Reading

Building Airbnb’s Internationalization Platform (2020) – authored by one of the architects of their system, this article details their specific requirements and implementation. A great read if you’re looking for a detailed example of this.

Building For Runtime Localization – this article discusses how to build a platform for runtime localization, where dynamic content lives outside of static message catalogs, and the focus is on runtime performance considerations.

Analytics : Prioritizing Content Based On Visibility – this article describes how to use instrumentation to drive automated decisions to route to different translation workflows to optimize for speed, cost and quality.

Combining AI & Human Translation – most translation workflows involve some degree of automation and the extent of automation depends on what needs to be prioritized (speed, cost or quality). This article discusses how to implement this and the tradeoffs involved.

Selecting A TMS – while companies operating at this scale will have outgrown a conventional TMS, it is important to understand what translation management systems do and don’t do.

Localizing AI Based Services – if you are serving content at this scale, you are almost certainly using AI to manage how users interact with it. This article discusses some of the issues to be aware of when the boundary between your user interface and AI is blurred.