Virtually every company I worked for made the mistake of not following internationalization best practices early on. Even if you don’t anticipate making your app or service available in other regions today, it is important to follow a few simple practices to avoid incurring tech debt that is time consuming and expensive to retire later on. Fundamentally this is about separation of concerns, a staple of modern software development. Just as you don’t want to hard wire business logic into your presentation layer, the presentation layer itself should be designed to support more than one language.
Build Wrapper Functions To Render User Facing Prompts
Almost every company I worked for hard-coded user facing strings in English, instead of passing them through a function. The problem with this is that by the time you get around to launching additional languages, you may have thousands of strings scattered throughout your code base. Cleaning this up later is expensive, tedious and time consuming (and nobody wants to do this work because it is so boring).
Create a dummy function like the one below and pass every user facing prompt through it. All it does is receive the text to be displayed along with context (a message describing how it is used), and then return that text. Notice that I included a simple test to verify the context message is present (AI and human translators greatly benefit from information about the context in which strings are used, and developers habitually forget to provide this). I also included an optional url parameter, so translators can click through to an in-context view of the prompt, which is hugely helpful for QA. This function is also a good place to insert QA logic so you can set up better automated testing.
import random

# Placeholders for illustration: in a real app, session_locale comes from the
# user's session and log_string_event() writes to your analytics pipeline.
session_locale = "en_US"

def log_string_event(text, keyname, locale, tag):
    pass

def t(text, context, keyname="", url="", tag="", debug=False):
    if type(text) is not str:
        raise Exception("text must be a string value")
    if type(keyname) is not str or len(keyname) < 1:
        raise Exception("A keyname is required, and should be in the following format platform.module|feature.camelcasedabbreviation")
    if type(context) is not str:
        raise Exception("context must be a string value")
    if len(context) < 1:
        raise Exception("You must provide context for how this string is used")
    if type(tag) is not str:
        raise Exception("tag must be a string value")
    #
    # future logic to hook into the translation pipeline goes here
    #
    #
    # logic to log every 1 in 1000 views (for visibility ranking)
    #
    num = random.randint(0, 999)
    if num < 1:
        log_string_event(text, keyname, session_locale, tag)
    if debug:
        return keyname + ':' + text
    else:
        return text

print(t("Hello World!", "Display a Hello World greeting to the user.", keyname="web.home.HelloWorld"))
A quick comment on keynames. Keynames make it easy for translators and reviewers to find the specific instance of a prompt that needs to be re-translated or edited. For example, you may have many variants of the prompt “Home”. If QA testers can identify a prompt by keyname, this will make it easier for reviewers to find and fix things in the translation system.
I generally recommend that keynames use a dotted format like {platform}.{featurearea}.{stringname}, where the string name is a camelcased version of a string or abbreviation that uniquely identifies the string. Be sure not to use duplicate keynames, and run a test at build time to detect them.
One way to do this is to design the function so it collects this information automatically, and auto generates the key name in a way that prevents accidental duplication.
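As a rough illustration of that idea, here is a minimal sketch. The helper names make_keyname() and register_keyname() are my own inventions for this example, and the registry only detects duplicates within a single process, so treat it as a starting point rather than a finished design.

```python
import re

_registered_keynames = set()  # keynames seen so far in this process


def make_keyname(platform, feature, text, max_words=4):
    """Build a keyname such as 'web.home.HelloWorld' from the prompt text."""
    words = re.findall(r"[A-Za-z0-9]+", text)[:max_words]
    if not words:
        raise ValueError("text must contain at least one letter or digit")
    string_name = "".join(w.capitalize() for w in words)
    return f"{platform}.{feature}.{string_name}"


def register_keyname(keyname):
    """Record a keyname, raising if it was already used elsewhere."""
    if keyname in _registered_keynames:
        raise ValueError(f"Duplicate keyname: {keyname}")
    _registered_keynames.add(keyname)
    return keyname


print(register_keyname(make_keyname("web", "home", "Hello World!")))
```

Two different prompts that start with the same words would collide under this scheme, which is exactly the case the registry is there to flag.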
Just doing this eliminates a huge source of tech debt. You don’t need to worry about the specifics of what your translation pipeline will look like now. You can decide on those details and update this wrapper function to implement that later.
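To give a sense of how small that later update can be, here is one possible shape for the translation hook. This is only a sketch: the CATALOG dictionary, the translate() helper and the locale codes are assumptions made up for this example, and a real pipeline would load translations from your translation system instead.

```python
# Hypothetical in-memory catalog: {locale: {keyname: translated text}}.
CATALOG = {
    "es_MX": {"web.home.HelloWorld": "¡Hola, mundo!"},
}


def translate(text, keyname, locale, catalog=CATALOG):
    """Return the translation for keyname in locale, or the source text."""
    return catalog.get(locale, {}).get(keyname, text)


print(translate("Hello World!", "web.home.HelloWorld", "es_MX"))  # ¡Hola, mundo!
print(translate("Hello World!", "web.home.HelloWorld", "fr_FR"))  # Hello World!
```

Falling back to the source text means a missing translation degrades to English instead of breaking the page.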
🚖 At Lyft, by the time the company decided to add Spanish and several other languages, it had accrued years of tech debt that took over a year and well over a million dollars of engineering time to retire. Meanwhile, Uber was already operating in over a dozen languages and was killing us in the Spanish speaking market in the US. This all could have been avoided had the company followed global ready coding practices early on.
What About AI Based Services?
If your service is built around AI, you’ll have an additional set of considerations to think about because the product itself is linguistic. I talk about this in more depth at
Build Wrapper Functions To Render Dates, Numbers and Currency Amounts
In a similar vein, you should build dummy functions to render dates, times, numbers and currency amounts. The formatting of each varies by locale, so you don’t want to bake US English assumptions into your code. Take dates, for example. July 1st, 2023 will typically be formatted as 1 July 2023 in Great Britain. The good news is there are libraries like Intl that handle all of this for every locale imaginable.
from datetime import date

# session_locale, prettyDate() and intl_date_format() are placeholders for
# your own session state, custom formatter and i18n library call.
def displayDate(d, date_format):
    # check the input is a valid date object
    if not isinstance(d, date):
        raise TypeError("d must be a date value")
    # if the session_locale is US English use a custom formatter, otherwise use
    # a generic formatter
    if session_locale == "en_US":
        output = prettyDate(d, date_format, session_locale)
    else:
        output = intl_date_format(d, date_format, session_locale)
    return str(output)
If the programming language you are using has a mature internationalization library, such as Intl for JavaScript, you can just use that. This problem has been solved many times over, so you don’t need to re-invent the wheel. That said, it is a good idea to wrap that with your own function, so that you can override the default formats with your own rules (for example to prettify dates and times per your design guidelines in specific cases, then fall back to whatever the i18n library generates).
Don’t Concatenate Strings, Use Message Templates Instead
Generating sentences on the fly is a big no-no because not all languages have the same subject-verb-object word order that English does. A dynamically generated sentence that makes sense in English will look completely jumbled in a language like German.
# DO NOT DO THIS
msg = "You have " + str(count) + " widgets in your account."

# DO THIS
msg = "Your account balance: {count}".replace("{count}", str(count))
The best practice is to use message templates with interpolated values. This way translators can reorder the sentence to conform with the rules for the target language. If you are merging numeric values into a message, you’ll probably want to use the ICU message format for this because each language handles pluralization differently.
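Here is a small sketch of why named placeholders matter. The TEMPLATES table, the render() helper and the German wording are illustrative assumptions, not output from a real translation system. Note how the German template puts the courier before the date, something string concatenation could not accommodate.

```python
# Templates keyed by locale. The German text is illustrative only.
TEMPLATES = {
    "en_US": "Delivered on {date} by {courier}.",
    "de_DE": "Von {courier} am {date} zugestellt.",
}


def render(locale, **values):
    """Fill the named placeholders in the template for the given locale."""
    return TEMPLATES[locale].format(**values)


print(render("en_US", date="July 1", courier="DHL"))
print(render("de_DE", date="1. Juli", courier="DHL"))
```

Because the code only supplies values by name, the translator is free to move the placeholders around without any code change.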
Beware Of Pluralization and Gender
As noted above, languages handle plural values differently, something else you don’t want to deal with in code. The ICU message format deals with this well. So use that for messages with numeric interpolated values.
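To show what the ICU format is saving you from, here is a hand-rolled sketch of plural selection for just two languages. An ICU message such as {count, plural, one {# file} other {# files}} expresses the same idea declaratively. The functions and message table below are assumptions written for this example and are a simplified stand-in, not a replacement for an ICU library.

```python
def plural_category(locale, n):
    """Return the plural category for a non-negative integer n.

    Hand-written rules for two locales only; a real ICU library
    covers every locale and also handles decimals.
    """
    if locale == "en":
        return "one" if n == 1 else "other"
    if locale == "pl":
        if n == 1:
            return "one"
        if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
            return "few"
        return "many"
    raise ValueError(f"no plural rules for locale {locale!r}")


# Illustrative messages, one template per plural category.
MESSAGES = {
    "en": {"one": "{n} file", "other": "{n} files"},
    "pl": {"one": "{n} plik", "few": "{n} pliki", "many": "{n} plików"},
}


def format_count(locale, n):
    """Pick the template matching n's plural category and fill it in."""
    template = MESSAGES[locale][plural_category(locale, n)]
    return template.format(n=n)


for n in (1, 2, 5, 22):
    print(format_count("en", n), "|", format_count("pl", n))
```

English needs two forms here and Polish needs three, which is why an if count == 1 check in application code does not scale beyond English.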
You also need to be aware of gender. For example, in Spanish nouns and adjectives must share the same gender (masculine or feminine). The word for red is rojo (masc) or roja (fem), and which form you use depends on the gender of the noun it is modifying. While a phrase with a mismatched gender is still perfectly understandable, it looks bad and lazy to native speakers. The same goes for numeric values.
For this reason, you should take care when designing message templates that use interpolated values. I am not saying you shouldn’t use interpolated values, just be mindful of how rules vary by language and try to avoid generating sentences on the fly.
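One way to stay mindful of gender is to keep a complete message variant per gender and pick the variant based on the noun being interpolated, which is roughly what the ICU select directive does. The sketch below assumes a made-up READY_ES table and ITEMS_ES lookup, and the Spanish strings are illustrative.

```python
# Illustrative Spanish variants of "Your {item} is ready."
READY_ES = {
    "m": "Tu {item} está listo.",
    "f": "Tu {item} está lista.",
}

# Each interpolated noun carries its grammatical gender.
ITEMS_ES = {
    "order": ("pedido", "m"),
    "invoice": ("factura", "f"),
}


def ready_message_es(item_key):
    """Choose the full-sentence variant that agrees with the noun's gender."""
    noun, gender = ITEMS_ES[item_key]
    return READY_ES[gender].format(item=noun)


print(ready_message_es("order"))    # Tu pedido está listo.
print(ready_message_es("invoice"))  # Tu factura está lista.
```

The translator writes whole sentences for each case, so no grammar rules end up in application code.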
Currency Support
Just as you don’t want to hard wire English into your interface, you should prepare to support multiple currencies and payment methods. This is pretty easy to do with modern payment platforms like Stripe, as they support major currencies and popular payment methods (alternate payment methods are important when you offer service in developing countries).
There is no need to overthink this, but you should keep pricing in a database or config file that allows different pricing by currency. This is important because in some regions you may want to adjust prices to reflect differences in purchasing power. This is especially relevant for consumer and SaaS services.
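A minimal sketch of such a price table might look like the following. The plan name, currencies and amounts are invented for illustration, and get_price() is a hypothetical helper. Amounts are stored in each currency's smallest unit, and each currency has its own price instead of being converted from USD on the fly.

```python
# Hypothetical price table: amounts in the currency's smallest unit,
# set per currency rather than converted from USD at run time.
PRICES = {
    "pro_monthly": {"USD": 2000, "EUR": 1900, "INR": 49900},
}
DEFAULT_CURRENCY = "USD"


def get_price(plan, currency):
    """Return (amount, currency), falling back to the default currency."""
    plan_prices = PRICES[plan]
    if currency in plan_prices:
        return plan_prices[currency], currency
    return plan_prices[DEFAULT_CURRENCY], DEFAULT_CURRENCY


print(get_price("pro_monthly", "INR"))  # (49900, 'INR')
print(get_price("pro_monthly", "JPY"))  # (2000, 'USD')
```

Returning the currency alongside the amount keeps the caller from assuming the requested currency was actually available.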
Use Linters To Enforce Best Practices
Since you are probably setting up a suite of QA tests to run as new code is checked in, it is a good idea to set up tests to enforce localization/internationalization best practices. These same practices are also good for English content (for example, providing context on how strings are used will help UI/UX staff to copyedit and fine tune them). This will prevent tech debt from sneaking into your code base.
Things to test when developers check in code include:
Missing or suspiciously short context and translator instructions
Empty strings
Strings which do not conform to ICU formatting (for strings that contain ICU directives)
Duplicate keynames (although the wrapper function can be written to generate keynames automatically to avoid accidental dupes)
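The checks above can be sketched as a single function that runs at check-in. Everything here is an assumption for illustration: the entry format, the lint_strings() name and the length threshold are mine, and the brace count is only a rough stand-in for real ICU validation.

```python
from collections import Counter

MIN_CONTEXT_LENGTH = 15  # assumed threshold for "suspiciously short"


def lint_strings(entries):
    """Return a list of problems found in a list of string entries.

    Each entry is a dict with 'keyname', 'text' and 'context' keys.
    """
    problems = []
    counts = Counter(e["keyname"] for e in entries)
    for e in entries:
        key = e["keyname"]
        if not e["text"].strip():
            problems.append(f"{key}: empty string")
        if len(e["context"].strip()) < MIN_CONTEXT_LENGTH:
            problems.append(f"{key}: missing or very short context")
        # Rough stand-in for ICU validation: braces must balance.
        if e["text"].count("{") != e["text"].count("}"):
            problems.append(f"{key}: unbalanced braces")
    for key, n in counts.items():
        if n > 1:
            problems.append(f"{key}: keyname used {n} times")
    return problems
```

Failing the build whenever the returned list is non-empty is usually enough to keep these problems from accumulating.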
Choose A Content Management System That Is Global Ready
Most startups will settle on a CMS for their web and help center properties long before they consider expanding into new regions and languages. This early decision is often driven by short term needs, and so considerations around things like multi-lingual support are often overlooked. As a result, they later find themselves boxed in by a CMS that can’t grow with them.
There are two important features to consider when choosing a CMS.
One, does it support multiple locales? A surprising number of CMS platforms do not support more than one language per instance. If it is a monolingual CMS, you will be screwed when you need to add languages. None of the workarounds for dealing with this are very attractive. It is also important to ask what TMS (translation management systems) it integrates with.
Two, does it have a page based architecture, or is it based on reusable content modules that can be combined (like LEGO blocks)? The latter gives you a lot of flexibility, not just for localization (for example, building country specific websites), but also for making content easier to manage. For example, if you have a callout that is reused throughout your help center, you can edit that in one place, and the changes are reflected throughout the site. With a page based site, you have to make those edits in every page that uses that callout (tedious and error prone).
One of my favorite CMS platforms that meets both criteria is Contentful. It is not the cheapest platform, and it does require more technical work to support, but it will grow with you, and it is well integrated into other systems including translation management systems.
Hire Bilingual EPD Staff
While you generally don’t want bilingual EPD (engineering, product and design) staff to spend time on translation, they can help to assess how well your product is working in their language. This is especially important now that most new companies are building around AI. AI services are highly sensitive to training data, which varies widely in terms of quantity and quality by language. As a result, a service that works well in English may struggle in other languages. Bilingual staff are often passionate about accessibility and can make the difference between a lazily translated app and something that feels like it was natively built.
Accessibility
👉 In the course of doing this, you will also be laying much of the foundation for accessibility for disabled users.
Wrap Up
Just following these simple guidelines will save you months of time and associated expense when you are ready to start adding languages. For complex applications, this can easily add up to over a million dollars in engineering time, and greatly delay time to market for new languages.
This also involves little or no extra effort for your developers, and it will help UI and UX staff polish your app or service in your home language. There is really no reason not to do this, even if you never plan to launch outside of English.