Dynamic Translation Pipeline: Runtime Localization for Modern Applications

Traditional localization systems assume content is known ahead of time. That model breaks down in modern applications where content is generated dynamically at runtime. In these systems, localization must operate as infrastructure: resolving, generating, and improving translations in real time.

This approach reduces upfront localization cost, improves coverage of long-tail content, and allows translation quality to scale with actual user demand.

Message catalogs made sense when software shipped infrequently or on physical media. The problem is that they are a poor fit for an era in which more and more software is built on AI-first platforms that generate content at runtime. Important caveat: message catalogs still make sense for static content such as UI strings, and for mobile clients where connectivity and latency are concerns.

How do you deal with content that lives outside of your code base (for example in a database or external API)?

In order to localize this content, you will need to build a dynamic translation pipeline that operates at runtime.

A dynamic translation pipeline is a localization architecture that retrieves and generates translations at runtime, using APIs, caching layers, and translation services instead of precompiled language files. This model embraces eventual consistency: translations improve over time as the system observes real usage, rather than treating all content equally.

The Basic Design Pattern

First, start with a function; let’s call it t(text, context).

This function will contain the logic to fetch a translation from a web service or cache, or initiate a request for translation if this text has never been encountered before.

A typical implementation will work something like the following.

Generate a hash from the input text and context. This hash serves as the identifier, or key, for the text and its translations. Alternatively, you can provide a manually defined key name if you don’t want to rely on auto-generated values.
Check the local or in-memory cache and return a translation if one exists. You can also look in a localization message catalog first (think of it as a form of local cache), then proceed to the dynamic pipeline. This lets you use static and dynamic translation tooling through the same function call.
If no translation is found in the cache, query an API endpoint to see if the translation service has a translation. If so, return the translation and update the cache.
If no translation is available, the web service kicks off a translation request, and the function returns the source text in the meantime.
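The lookup order above can be sketched in Python as follows. This is a minimal, in-process illustration built on several assumptions. The remote translation service is represented by a hypothetical `fetch_remote` callable that returns a translation or `None`, and the service is expected to queue the string when it has none. The key is a SHA-256 hash of the context and text. The static message catalog is a plain dictionary checked before the cache.

```python
import hashlib
from typing import Callable, Dict, Optional

# Hypothetical remote service signature: (key, text, context, locale) -> translation or None
RemoteFetch = Callable[[str, str, str, str], Optional[str]]


def make_key(text: str, context: str) -> str:
    """Derive a stable key from the source text and its context message."""
    return hashlib.sha256(f"{context}\x00{text}".encode("utf-8")).hexdigest()


class Translator:
    def __init__(self, locale: str, fetch_remote: RemoteFetch,
                 catalog: Optional[Dict[str, str]] = None):
        self.locale = locale
        self.fetch_remote = fetch_remote
        self.catalog = catalog or {}      # static message catalog, checked first
        self.cache: Dict[str, str] = {}   # local in-memory cache

    def t(self, text: str, context: str = "", key: Optional[str] = None) -> str:
        key = key or make_key(text, context)  # manual key name, or auto-generated hash
        if key in self.catalog:
            return self.catalog[key]
        if key in self.cache:
            return self.cache[key]
        translation = self.fetch_remote(key, text, context, self.locale)
        if translation is None:
            # Not translated yet: the service queues it, so fall back to the source text
            return text
        self.cache[key] = translation
        return translation
```

A real client would probably also cache misses briefly, so that an untranslated string does not trigger a remote call on every render.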

Server Side Implementation

You’ll need to build a simple REST API endpoint that answers translation requests from clients. This service will check a database to see if it has a translation for the requested string. This is a simple database lookup, though you’ll probably want to add in-memory caching or memcached for performance.

If there is not yet a translation for the requested string, the request handler will do the following:

Create a new database record with the source text, context message and hashcode or keyname
Optionally fetch a machine translation and save that to the database as a placeholder translation. This can be replaced with a human edited translation when someone has time to review and edit the machine translation.

It is a good idea to have a counter that tracks the number of requests for a string, so that you avoid translating strings that are only requested once. A common mistake developers make is to insert interpolated values (timestamps, counts, names) into a message. Because every interpolated variant is a new source string, this can flood your translation pipeline. If the request count is under a certain threshold, the string won’t be queued for human translation.
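Here is a minimal sketch of the request handler logic described above. The web framework is left out, and an in-memory dictionary stands in for the database. The `machine_translate` callable, the `Record` fields, and the threshold of 5 requests are illustrative assumptions, not any specific product's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

HUMAN_REVIEW_THRESHOLD = 5  # assumed value; tune for your traffic


@dataclass
class Record:
    source_text: str
    context: str
    translation: Optional[str] = None
    request_count: int = 0
    queued_for_human: bool = False


class TranslationService:
    def __init__(self, machine_translate: Optional[Callable[[str, str], str]] = None):
        self.db: Dict[Tuple[str, str], Record] = {}  # (keyname, locale) -> record
        self.machine_translate = machine_translate    # hypothetical MT wrapper

    def handle_request(self, keyname: str, locale: str,
                       text: str, context: str) -> Optional[str]:
        record = self.db.get((keyname, locale))
        if record is None:
            # First sighting: create the record, optionally with an MT placeholder
            record = Record(source_text=text, context=context)
            if self.machine_translate is not None:
                record.translation = self.machine_translate(text, locale)
            self.db[(keyname, locale)] = record
        record.request_count += 1
        # Only strings requested often enough are worth human translation
        if record.request_count > HUMAN_REVIEW_THRESHOLD and not record.queued_for_human:
            record.queued_for_human = True  # e.g. include it in the next TMS upload
        return record.translation
```

A `None` return tells the client to fall back to the source text.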

The database schema for the message catalog will look something like this:

Keyname
Service Or Module Name
Locale
Source Text
Context Message
Number Of Recent Requests
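One way to express this schema is sketched below with Python's built-in `sqlite3` module. The column names are assumptions chosen to match the queries in this article (`Keyname`, `Locale`, `RequestCount`, `Text`). Two things are additions not listed above: the `Text` column, which holds the translation for each locale, and the composite primary key.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS Messages (
    Keyname      TEXT NOT NULL,
    Module       TEXT NOT NULL DEFAULT '',   -- service or module name
    Locale       TEXT NOT NULL,
    SourceText   TEXT NOT NULL,
    Context      TEXT NOT NULL DEFAULT '',   -- context message for translators
    Text         TEXT NOT NULL DEFAULT '',   -- translation ('' until one exists)
    RequestCount INTEGER NOT NULL DEFAULT 0, -- number of recent requests
    PRIMARY KEY (Keyname, Locale)
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the message database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn
```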

TMS Integration

The bad news is that this does not exist off the shelf, so you will need to build this service yourself. Most TMS providers have web APIs for managing files and projects, but do not provide a highly scalable web service for translation request and delivery. There are some JavaScript/CDN proxy services like LocalizeJS and Transifex, but they are better suited for “skinning” web assets and are not designed to localize backend systems. One of the things I have been exploring is an open source project to extend existing localization libraries to support this.

Because most TMS providers do not support dynamic translation, you will need to build a service layer that, from the TMS’s point of view, looks like an ordinary project using resource files. For a comprehensive overview of popular TMS platforms, be sure to read Translation Management Systems (Which One Is Best For You).

The next thing you will need to do is to build a cron job that uploads and downloads message catalogs to your translation management system.

Uploading Source Language Content

This cron job will run a db query and write out the results to a text file or JSON catalog. The query will be something like this.

SELECT * FROM Messages
  WHERE Locale = 'en-US' AND RequestCount > 5
  ORDER BY Keyname

Write this out to an intermediate file, and then upload that file to the TMS’s Files API. The TMS will take care of detecting new or updated strings, so this bot can be a simple fire-and-forget process.
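Here is a sketch of the upload job. It assumes SQLite and a `Messages` table with `Keyname`, `SourceText`, `Context`, `Locale` and `RequestCount` columns, matching the query above. It writes a JSON catalog; the `{"string", "context"}` entry layout is an assumption, so use whichever file format your TMS imports. The actual upload depends on your TMS's Files API, so it appears only as a hypothetical, commented-out `upload_to_tms` call.

```python
import json
import sqlite3

MIN_REQUESTS = 5  # same threshold as the query above


def export_source_catalog(conn: sqlite3.Connection, path: str, locale: str = "en-US") -> int:
    """Write frequently requested source strings to a JSON catalog; return the count."""
    rows = conn.execute(
        "SELECT Keyname, SourceText, Context FROM Messages "
        "WHERE Locale = ? AND RequestCount > ? ORDER BY Keyname",
        (locale, MIN_REQUESTS),
    ).fetchall()
    # Key name -> source text plus a translator-facing context note
    catalog = {key: {"string": text, "context": ctx} for key, text, ctx in rows}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(catalog, f, ensure_ascii=False, indent=2)
    # upload_to_tms(path)  # hypothetical: call your TMS's Files API here
    return len(catalog)
```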

Downloading Translations

This cron job will fetch the translated message catalogs from the TMS and write the translations to the database (and refresh the cache if it is front ended by memcached or similar).
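A matching sketch of the download job is below. It rests on four assumptions:

- The TMS has returned a translated catalog for one locale as a flat `{keyname: translation}` JSON object. Real export formats vary.
- The `Messages` table has a `Text` column for translations.
- A row already exists for each key and locale.
- The cache is a dict-like stand-in for a memcached client, keyed as `locale:keyname`.

```python
import json
import sqlite3
from typing import MutableMapping


def import_translations(conn: sqlite3.Connection, catalog_json: str, locale: str,
                        cache: MutableMapping[str, str]) -> int:
    """Write translations from a TMS export to the database and refresh the cache."""
    translations = json.loads(catalog_json)  # assumed shape: {keyname: translation}
    updated = 0
    for keyname, text in translations.items():
        # Assumes a row already exists per (keyname, locale), created on first request
        cur = conn.execute(
            "UPDATE Messages SET Text = ? WHERE Keyname = ? AND Locale = ?",
            (text, keyname, locale),
        )
        if cur.rowcount:
            cache[f"{locale}:{keyname}"] = text  # refresh instead of waiting for expiry
            updated += cur.rowcount
    conn.commit()
    return updated
```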

Example Workflow

A user loads a page in Spanish
The system calls t("Checkout", "This is a cart checkout button")
Cache miss
DB miss
Machine translation returned immediately
Request count increments
After threshold reached, string is queued for human translation
Translation is delivered asynchronously and cached
Future requests return high-quality translation with near-zero latency

Key Benefits

Less dependence on message catalogs : source texts can be defined or referenced inline. Developers no longer need to context switch between code and a JSON file that contains strings. Adding or updating in-app content is as simple as typing t("Hello World", "Hi there")
Static and dynamic messages : in-app messages can be defined statically, as string literals in code, or dynamically, passed in via variables (for example, text loaded from a database). This is not possible with message catalogs.
Backward compatibility with message catalogs : the t() function can be designed to read from a message catalog first and fall back to the dynamic pipeline. This can be automatic or driven by a dynamic = true|false parameter.
Continuous translation delivery : translations no longer need to be merged into your codebase and are delivered “out of band”. This is important because translations, especially human-authored or human-edited translations, flow into the system unpredictably. Edits become visible sooner than if they had to be merged into your repo.
Reliability : translations never touch your codebase, so the risk of a translation update causing a software failure is greatly reduced. In the worst case, a message regresses to English when no translation is available.
Instrumentation : this is powerful because you can measure the relative visibility of messages based on how often they are requested. This enables automated, data-driven decisions about which translation workflow to use on a message-by-message basis. For example, you can route high-visibility messages to more intensive review, while sending low-visibility messages to more automated, AI-heavy workflows.
QA coverage : QA logic can be incorporated into the translation function, which lets you catch problems with source content, such as missing context, before they reach translation.

The only real downside to this approach is that no vendor I know of provides an out of the box solution, so you’ll need to spend some time building a web service to front end whichever TMS you decide on. Hopefully this will be addressed via an open source project.

Common Failure Modes

Cache failure

The main issue to plan for is what happens if the caching layer fails or is reset and needs to be repopulated. There are several ways to deal with this. A simple and reliable solution is to also cache translations locally, so that if the network cache fails, the t() function can load from the local copy. Translation updates might not appear until the network cache is restored, but that’s okay since most translation updates are low-frequency events.
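Here is a minimal sketch of that fallback. It assumes the network cache is any object with a `get` method that may raise when the cache is unreachable, a stand-in for a memcached client. It also assumes the local copy is a JSON snapshot on disk. The class name and file layout are hypothetical.

```python
import json
import os
from typing import Dict, Optional


class FallbackCache:
    """Read from the network cache; fall back to a local JSON snapshot if it fails."""

    def __init__(self, network_cache, snapshot_path: str):
        self.network_cache = network_cache
        self.snapshot_path = snapshot_path
        self.local: Dict[str, str] = {}
        if os.path.exists(snapshot_path):
            with open(snapshot_path, encoding="utf-8") as f:
                self.local = json.load(f)

    def get(self, key: str) -> Optional[str]:
        try:
            value = self.network_cache.get(key)
        except Exception:
            # Network cache is down: serve the (possibly stale) local copy
            return self.local.get(key)
        if value is None:
            # Cache miss, e.g. the cache was reset and is being repopulated
            return self.local.get(key)
        self.local[key] = value  # keep the local copy current
        return value

    def save_snapshot(self) -> None:
        """Persist the local copy so it survives process restarts."""
        with open(self.snapshot_path, "w", encoding="utf-8") as f:
            json.dump(self.local, f, ensure_ascii=False)
```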

One other thing you should do is implement a cron job that periodically refreshes the cache. This way you don’t need to wait for client requests to trigger cache updates. This can run asynchronously in the background.
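The refresh job can be as simple as copying every translation for a locale from the database into the cache. This sketch assumes a SQLite `Messages` table with `Keyname`, `Locale` and `Text` columns. It also assumes a `locale:keyname` cache key format and a dict-like cache standing in for memcached. You would schedule it with cron or a similar scheduler.

```python
import sqlite3
from typing import MutableMapping


def warm_cache(conn: sqlite3.Connection, cache: MutableMapping[str, str], locale: str) -> int:
    """Push all non-empty translations for a locale into the cache; return how many."""
    rows = conn.execute(
        "SELECT Keyname, Text FROM Messages WHERE Locale = ? AND Text <> ''",
        (locale,),
    )
    count = 0
    for keyname, text in rows:
        cache[f"{locale}:{keyname}"] = text
        count += 1
    return count
```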

Translation pipeline failure

This could be a technical failure or a human failure. Whatever the cause, it means that translation updates are not being delivered. This is an insidious failure because it does not cause an overt malfunction. Translations just stop being delivered and, over time, get out of sync with the source language.

A good solution for this is to use a pseudo-locale for automated QA testing. In this approach, translations to the test locale are prepended with metadata such as the last cache update, last DB update, etc. Doing so will enable you to detect if something is blocking translation delivery.
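Here is a sketch of what a pseudo-locale entry and an automated staleness check might look like. The bracketed `cache=... db=...` metadata format and the 24-hour alert threshold are assumptions for illustration. A QA test would render a few strings in the pseudo-locale and fail if delivery looks stale.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_STALENESS = timedelta(hours=24)  # assumed alert threshold


def pseudo_translate(source: str, cache_updated: datetime, db_updated: datetime) -> str:
    """Prefix the source text with delivery metadata for the QA pseudo-locale."""
    return f"[cache={cache_updated.isoformat()} db={db_updated.isoformat()}] {source}"


def delivery_is_stale(rendered: str, now: Optional[datetime] = None) -> bool:
    """Parse the metadata back out and flag it if either timestamp is too old."""
    now = now or datetime.now(timezone.utc)
    meta = rendered[1:rendered.index("]")]
    fields = dict(part.split("=", 1) for part in meta.split())
    oldest = min(datetime.fromisoformat(fields["cache"]),
                 datetime.fromisoformat(fields["db"]))
    return now - oldest > MAX_STALENESS
```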

Duplicate key names

This is also an issue with message catalogs and mostly occurs when using short keynames like “home” that could have multiple uses and contexts. The best way to deal with this is to auto-generate keynames so they include the source string and service area or module, and are suffixed with a hash that is generated from the source string, context message and service or module name. That will basically eliminate duplicates, while still enabling reviewers to search for messages via keyname lookups (which is common in QA workflows).
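Here is a sketch of that naming scheme. The 32-character slug limit, the `.` separators, and the 8-character SHA-256 suffix are arbitrary choices for illustration.

```python
import hashlib
import re


def make_keyname(module: str, source: str, context: str, slug_len: int = 32) -> str:
    """Build '<module>.<slug of source>.<8-char hash>' so keys stay readable but unique."""
    # Human-readable part: lowercase source with non-alphanumeric runs collapsed to '_'
    slug = re.sub(r"[^a-z0-9]+", "_", source.lower()).strip("_")[:slug_len]
    # Disambiguating part: hash of source, context and module
    digest = hashlib.sha256(f"{module}\x00{source}\x00{context}".encode("utf-8")).hexdigest()[:8]
    return f"{module}.{slug}.{digest}"
```

Two uses of "Home" with different context messages get different keys, and both still start with `checkout.home.`, which keeps them searchable.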

Runtime Performance Considerations

If you are rendering static content such as a UI menu and value low latency, it’s best to use message catalogs. As noted earlier, the t() function can be designed to use both static and dynamic translation methods.

The big issue to be aware of is latency. A dynamic translation service will generally have higher latency, which can be mitigated as follows.

Cache locally in memory : to the extent possible, cache translations locally with a long time to live (TTL). The only time the client should need to call the web service is when it encounters a new source text or when a cache entry has expired. Translations returned from local cache have sub-millisecond latency, while translations returned from the web service take considerably longer (10 to 100 ms).
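Here is a minimal in-memory TTL cache, sketched in Python. The one-hour default TTL is an arbitrary assumption. The clock is injectable so that expiry can be tested without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process cache whose entries expire `ttl` seconds after they are set."""

    def __init__(self, ttl: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: caller should go back to the web service
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (self.clock() + self.ttl, value)
```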

Accept a low volume of regressions to English : this is a product decision as well as an engineering decision. Companies that run dynamic pipelines like this generally embrace a “fix forward” approach where it is better to accept a small number of regressions than to allow localization completeness to be a blocker (localization is never complete).

Use machine translation instead of LLMs for real-time requests : machine translation services like Google Translate are optimized for speed and have roughly 100 ms latency, whereas LLM-based services are considerably slower. If possible, fetch machine translations asynchronously.
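One way to fetch machine translations asynchronously is to return the source text immediately and hand the MT call to a background thread pool. This sketch assumes a hypothetical `machine_translate(text, locale)` wrapper around whichever MT service you use. A dict stands in for the translation store.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Tuple


class AsyncMT:
    """Serve the source text immediately; fetch machine translation in the background."""

    def __init__(self, machine_translate: Callable[[str, str], str], workers: int = 4):
        self.machine_translate = machine_translate  # hypothetical MT wrapper
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.lock = threading.Lock()
        self.store: Dict[Tuple[str, str], str] = {}       # (text, locale) -> translation
        self.pending: Dict[Tuple[str, str], Future] = {}  # in-flight MT requests

    def translate(self, text: str, locale: str) -> str:
        key = (text, locale)
        with self.lock:
            if key in self.store:
                return self.store[key]
            if key not in self.pending:
                # Don't block the current request on MT; schedule it instead
                self.pending[key] = self.pool.submit(self._fetch, key)
        return text  # fall back to the source text until MT completes

    def _fetch(self, key: Tuple[str, str]) -> None:
        text, locale = key
        try:
            result = self.machine_translate(text, locale)  # slow call, made outside the lock
            with self.lock:
                self.store[key] = result
        finally:
            with self.lock:
                self.pending.pop(key, None)  # allow a retry if the MT call failed
```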

Render machine translations within the client : this is particularly relevant for user generated content. Facebook and X both do this to translate user messages, either automatically or on request. This way the MT request happens within the client and doesn’t affect the time needed to serve the page.

Done right, this gives you the best of both worlds: message catalogs for static content or content that requires minimal latency, and the dynamic pipeline for everything else.

Enhancements And Issues To Be Aware Of

The design pattern described in this article is a pretty basic implementation and while it will get the job done, there are a few things you can add to this (and also a few issues to be aware of).

Analytics : Tracking Visibility

One of the cool things you can do is track how many times a particular string is requested, and therefore score strings in terms of relative visibility. This allows you to automatically treat high visibility strings differently than low visibility strings. For example, you might decide that until a translation for a string is requested N times, machine translation is good enough.

Another thing you can do is tag high visibility strings, so that human translators and reviewers can be directed to work on those before working on lower visibility content. Higher visibility generally equates to higher value and lower risk tolerance, so you can route this content for more intensive review and oversight.

In practice, thresholds are typically set based on request volume over a rolling time window (e.g. 10 requests per hour triggers machine translation, 100+ per hour triggers human review).
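Here is a sketch of rolling-window thresholding using the example numbers above: 10 requests per hour for machine translation, 100 or more for human review. The tier names and the `VisibilityTracker` class are hypothetical. A production system would keep these counts in a shared store rather than in process memory.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

WINDOW_SECONDS = 3600
MT_THRESHOLD = 10      # requests per window before machine translation
HUMAN_THRESHOLD = 100  # requests per window before human review


class VisibilityTracker:
    def __init__(self, clock: Callable[[], float] = time.time):
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, keyname: str) -> str:
        """Log one request and return the workflow tier for this string."""
        now = self.clock()
        q = self.hits[keyname]
        q.append(now)
        while q and q[0] <= now - WINDOW_SECONDS:
            q.popleft()  # drop requests that fell out of the rolling window
        if len(q) >= HUMAN_THRESHOLD:
            return "human_review"
        if len(q) >= MT_THRESHOLD:
            return "machine_translation"
        return "none"
```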

See Analytics : Ranking Texts By Visibility for more detail.

Bundles : Grouping Translations Into Packages

One thing you can do to increase performance is to deliver translations in bundles. If you are tracking requests by service area or module, the web service and caching layer can deliver all translations for a given product or feature in one response. If your product has a large surface area and many modules, this reduces repetitive per-string queries and keeps payloads small, since each client only fetches the bundles for the modules it uses.
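Here is a sketch of bundling. It assumes each translation row carries a module name. The `(module, keyname, translation)` tuple shape and the one-JSON-document-per-module output are illustrative choices.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_bundles(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, str]:
    """Group (module, keyname, translation) rows into one JSON bundle per module."""
    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for module, keyname, translation in rows:
        grouped[module][keyname] = translation
    # Sorted keys give byte-stable output, which plays well with caches and CDNs
    return {module: json.dumps(strings, ensure_ascii=False, sort_keys=True)
            for module, strings in grouped.items()}
```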

CDN versus caching

One thing to think about is whether to use a CDN or a caching service like memcached. Either can deliver translated assets to clients, most likely as JSON packages. Good arguments can be made for either approach; which you choose is a matter of preference and engineering guidelines.

Feature Releases : Tracking Service Areas

You can add a parameter to the requests and database schema to track which part of your application a request is coming from. This is helpful if your service contains many sub-modules.

Efficiency : Purging Unused Strings

Static message catalogs are like “roach motels”. Strings check in, but they never check out. If you have a complex application with a high rate of change, these can become quite bloated which impairs performance and also results in unnecessary translations and expense.

The fix is a simple rule that purges strings from the database that have not been requested in N days. The TMS typically maintains a translation memory, so if a purged string is later added back, its previous translations will be recalled.
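Here is a sketch of the purge rule with SQLite. It assumes a `LastRequested` column holding a Unix timestamp that is updated on every request; this column is not in the schema listed earlier. The 90-day cutoff is chosen only for illustration.

```python
import sqlite3
import time
from typing import Optional

PURGE_AFTER_DAYS = 90  # assumed retention window


def purge_unused(conn: sqlite3.Connection, now: Optional[float] = None) -> int:
    """Delete messages not requested within the retention window; return rows removed."""
    now = time.time() if now is None else now
    cutoff = now - PURGE_AFTER_DAYS * 86400
    cur = conn.execute("DELETE FROM Messages WHERE LastRequested < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```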

QA : AI Analysis Of Source Content

Another thing you can do in the background is run a cron job that examines source messages and context messages for potential issues. The cron job asks an AI agent whether it sees syntax or structural issues with the source content. Unclear or poorly worded source content is a common cause of poor translations, whether by AI or by humans. Run this analysis whenever new source content is introduced.

Thresholding : Interpolated Values

One of the main issues to be aware of is that sooner or later, a developer is going to insert interpolated values into requests.

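from datetime import datetime
def render_translation(text, context=""):  # hypothetical stand-in for t(); returns the source text
    return text
# What the developer did: every call sends a new, unique string for translation
print(render_translation("This request was made at " + str(datetime.now()) + "!", "TS"))
# What the developer should have done: translate a stable template, then interpolate
print(render_translation("This request was made at {ts}!", "TS").replace("{ts}", str(datetime.now())))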

The problem is that every time this code runs, it inserts a new string into your message catalog, which will flood your translation queue. This is why you should threshold requests: if it does happen, it will not result in an excessive number of translation requests. The basic idea is that a message must be requested more than once before it triggers a translation request.

In practice, thresholds are tuned based on request volume over a rolling time window. For example, low-volume strings may only receive machine translation, while high-frequency strings are escalated for human review.

One other thing you can do is to examine the strings being passed into the function to see if they contain:

dates, times or datetimes
non-zero numeric values
common names (via dictionary lookup)
URLs
Evidence of string concatenation (e.g. “You have ” + str(value) + ” widgets in your account”)

You can run these checks at runtime or build time, flag suspicious messages, and compile them into a report.
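Here is a sketch of a heuristic checker for some of the patterns listed above. The regular expressions are deliberately rough assumptions and will produce false positives. The common-names check is left out because it needs a name dictionary. Concatenation is detected by scanning source lines at build time, since a concatenated string looks like any other string once it reaches t().

```python
import re
from typing import List

# Rough runtime heuristics (assumptions); tune for your content
RUNTIME_PATTERNS = {
    "datetime": re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}:\d{2}(?::\d{2})?\b"),
    "number": re.compile(r"\b\d*[1-9]\d*\b"),  # any digit run containing a non-zero digit
    "url": re.compile(r"https?://\S+|www\.\S+"),
}
# Build-time check on source code: a string literal joined to something with "+"
CONCAT_IN_SOURCE = re.compile(r"""["']\s*\+|\+\s*["']""")


def flag_message(text: str) -> List[str]:
    """Return the names of suspicious patterns found in a message passed to t()."""
    return [name for name, pattern in RUNTIME_PATTERNS.items() if pattern.search(text)]


def flag_source_line(line: str) -> bool:
    """True if a line of source code appears to build a message by concatenation."""
    return bool(CONCAT_IN_SOURCE.search(line))
```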

You’ll also want to monitor the translation server logs to alert you when it starts seeing a large number of new translation requests.

Security Threats

One issue to be aware of is that the web service needs to be secured to prevent bad actors from abusing it. The biggest risk is that it will be flooded with bogus translation requests. In most cases, there is no reason for it to be accessible outside your network, since the majority of use cases involve server-side implementations.

Operational Metrics

Key metrics to monitor include cache hit rate, translation latency, translation queue backlog, and cost per translated string.

Latency : Machine Translation Performance

If you decide to use machine translation to obtain placeholder translations, be aware that there can be significant latency, especially with LLM-based services. Neural machine translation engines, such as Google Translate, tend to be faster but are still slow compared with a cache lookup.

If you anticipate that your service will get a lot of requests for new content at once, it is probably a good idea to handle this asynchronously and let the first requests fall back to English. One way to do this is to have a bot that frequently runs a query like:

SELECT * FROM Messages
  WHERE Locale = 'es-ES' AND Text = ''
  AND RequestCount > 5
  ORDER BY Keyname

This will pick up recently requested and untranslated strings. It can then iterate through the query results and call the machine translation API for each one, update the db record, and move on. In most cases, this will be fast enough that users will rarely see regressions to English.
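Here is a sketch of that bot. It assumes SQLite, the `Messages` columns used in the query above plus `SourceText`, and a hypothetical `machine_translate(text, locale)` wrapper around your MT service. Untranslated rows are matched on both `''` and `NULL` in case the column is nullable.

```python
import sqlite3
from typing import Callable


def backfill_machine_translations(conn: sqlite3.Connection, locale: str,
                                  machine_translate: Callable[[str, str], str],
                                  min_requests: int = 5) -> int:
    """Fill in MT placeholders for requested-but-untranslated strings; return count."""
    rows = conn.execute(
        "SELECT Keyname, SourceText FROM Messages "
        "WHERE Locale = ? AND (Text = '' OR Text IS NULL) AND RequestCount > ? "
        "ORDER BY Keyname",
        (locale, min_requests),
    ).fetchall()  # materialize the results before updating the same table
    for keyname, source in rows:
        translation = machine_translate(source, locale)
        conn.execute(
            "UPDATE Messages SET Text = ? WHERE Keyname = ? AND Locale = ?",
            (translation, keyname, locale),
        )
        conn.commit()  # commit per row so progress survives a crash mid-batch
    return len(rows)
```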

Optimizing For Performance Versus Translation Delivery

One of the tradeoffs you’ll need to make is whether to optimize for performance (long cache TTL) or rapid translation delivery (short cache TTL). If the content you are sending through this pipeline does not change very often, you can use a long TTL for client and server caches. On the other hand, if the content changes frequently, use a shorter TTL. In most cases, a fairly long TTL, such as several hours, is a good tradeoff between runtime performance and translation delivery. Where you set it will depend on which matters more for your use case.

UI strings and static content, especially for mobile apps, should be kept in embedded message catalogs that are defined in code. There isn’t really a good reason to send these “over the air”, as doing so can lead to long application load times and other performance issues when connectivity is limited.

When NOT to use a dynamic translation pipeline

While there are many use cases where dynamic localization is a superior approach, there are a handful where it is not recommended.

Static assets

If you have a static website or asset with a very low rate of change, this approach is probably overkill. You should be fine using your TMS’s connector for whichever platform hosts that content.

Mobile apps

Static content and UI elements in mobile apps do not lend themselves to this approach due to connectivity and latency concerns. Dynamic content that is served to these apps can and should flow through a dynamic translation pipeline. At Lyft, for example, the static UI for mobile apps was pretty small, but we served all sorts of dynamic content to the Lyft app such as coupons, driver promotions, pickup instructions, etc. All of this flowed through a dynamic pipeline.

Conclusions

Runtime localization replaces batch-oriented localization with a real-time, demand-driven system.

Teams that adopt runtime localization turn it from a blocking dependency into a self-improving system.

Localization is no longer a release step. It is part of the runtime.

Building For Runtime Localization