Back to articles list
- 9 minutes read

How to Design a Localization-Ready System

In this era of globalization, companies – including software developers – are always interested in expanding to new markets. This often means localizing their products for different areas. In this article, we’ll explain a few approaches to designing your data model for localization – specifically, for managing content in multiple languages.

What Is Localization?

Localization is the process of adapting a product to various markets. It is a prominent factor in achieving maximum market share in terms of product sales. When localization is done correctly, users will feel that the product was produced for their language, culture, and needs.

In places where English is not a common first language, surveys have proved that local languages are much preferred for a software product. This MediaPost article contains some interesting figures on the need for languages other than English when it comes to international sales.

What Can You Lose When You Don’t Localize?

Let’s consider an unfortunate example of non-localization. For customers’ convenience, an e-commerce portal allowed customers to bundle individual purchases into a group of four. Unfortunately, the pronunciation of the word “four” in Japanese sounds like their word for “death.” Japanese people typically avoid all things associated with this number because it is considered unlucky. Needless to say, sales of those products didn’t go well in the Japanese market.

So, in answer to the above question, you could lose a whole lot if you don’t carefully plan for your target market.

Content Translation as Localization

When we think of localization, content translation is often the very first thought that comes to our minds. The second thought is “We need a robust and efficient database model to store translated content in multiple languages”.

Suppose we are asked to design a data model for a multilingual e-commerce application. We’d need to store text fields like product_name and the description of products in various languages. We’d also need to store text fields in other tables, such as the customer table, in all these languages.

To understand how to design our data model with content translation in mind, we’ll examine different approaches by using these two tables. Each of these approaches has pros and cons. The approaches demonstrated below can be extended to other tables in your data model. Of course, the exact approach you choose will be based on your own requirements.

Approach 1 – Adding Separate Language Columns for Each Intended Field

This is the simplest approach in terms of development. It can be implemented by adding one language column for each field.




Pros

  • It is very easy to implement.
  • There is no complexity in writing SQL to fetch the underlying data in any language. Suppose I want to write a query to retrieve product and customer details for a particular order in the French language. The code would look like this:

    Select p.product_name_FR, p.description_FR, p.price, 
           c.name_FR, c.address_FR, c.contact_name 
    from order_line o, product p, customer c
    	Where o.product_id = p.id and o.customer_id = c.id
    	And id = ;
    

Cons

  • There is no scalability: every time a new language is added, dozens of additional columns need to be added across tables.
  • It can get time-consuming, especially if many fields need to have multiple languages.
  • Developers end up writing the query shown below to manage all existing languages. So, all SQLs in your application must change when a new language is introduced. (Note: @in_language signifies the current language of the application; this parameter comes from the app’s front end, while the back end is retrieving records.)

    SELECT CASE @in_language 
                  WHEN ‘FR’ THEN p.product_name_FR
                  WHEN ‘DE’ THEN p.product_name_DE
                  DEFAULT THEN p.product_name_EN,
               p.price,
              CASE @in_language 
                  WHEN ‘FR’ THEN c.name_FR
                  WHEN ‘DE’ THEN c.name_DE
                  DEFAULT THEN c.name_EN,
              c.contact_name
    FROM order_line o, product p, customer c
    	WHERE o.product_id = p.id AND o.customer_id = c.id
    	AND id = ;
    

Approach 2 – Creating a Separate Table for Translated Text

In this approach, a separate table is used to store translated text; in the example below, we’ve called this table translation. It contains one column for each language. Values that have been translated from field values into all applicable languages are stored as records. Instead of storing actual translated text in underlying tables, we store its reference to the translation table. This implementation allows us to make a common repository of translated text without making too many changes to the existing data model.




Pros

  • It’s a good approach if localization is to be implemented on an existing data model.
  • A single additional column in the translation table is the only change needed when a new language is introduced.
  • When the original text is the same across tables, there is no redundant translated text. For example: suppose a customer name and product name are identical. In this case, only one record will be inserted into translation table, and the same record is referred in both the customer and product tables.

Cons

  • It still requires a change in the data model.
  • There can be unnecessary NULLs in the table. If 1,000 fields are needed for only one supported language, you end up creating 1,000 records – and a lot of NULLs.
  • The complexity of writing SQL increases as number of joins increases. And when there are a lot of records in the translation table, retrieval times are slower.

    Let’s write a SQL for the language-management problem statement again:

    SELECT CASE @in_language 
                  WHEN ‘FR’ THEN tp.text_FR
                  WHEN ‘DE’ THEN tp.text_DE
                  DEFAULT THEN p.product_name_EN,
               p.price,
              CASE @in_language 
                  WHEN ‘FR’ THEN tc.text_FR
                  WHEN ‘DE’ THEN tc.text_DE
                  DEFAULT THEN c.name_EN,
              c.contact_name
    FROM order_line o, product p, customer c, translation tp, translation tc
    	WHERE o.product_id = p.id AND o.customer_id = c.id
    	AND p.name_translation_id = tp.id
    	AND c.name_translation_id = tc.id
    	AND id = ;
    
    

A Variant on the Translation Table Approach

To get better performance when translated text is being retrieved, we can split the translation table into multiple tables. We can group the records based on domain, i.e. one table for customer, another for product, etc.




Approach 3 – A Translation Table with Rows for Each Language

This implementation is quite similar to the second approach, but it stores the values for translated text in rows rather than columns. Here, the translation table id remains the same for a field value in any translated language. A composite primary key {id, language_id} is stored in the translation table, and it stores the same translation_id for each field against multiple language_id.




Pros

  • This implementation allows developers to write data-retrieval SQLs quite easily.
  • It’s also easy to write OEM for this model.
  • No data model changes are needed when you add a new language; just insert the records for the new language.
  • There is no unnecessary memory consumption when a set of fields are not applicable to a language.
  • The complexity of data-retrieval SQLs is reduced. Look at the example below:

    SELECT tp.text,
            p.price,
           tc.text,
            c.contact_name
    FROM order_line o, product p, customer c, translation tp, 
         translation tc, language l
    	WHERE o.product_id = p.id AND o.customer_id = c.id
    	AND p.name_translation_id = tp.id
    	AND c.name_translation_id = tc.id
                 AND tp.language_id = l.id
                 AND tc.language_id = l.id
                 AND l.name = @in_language
    	AND id = ;
    
    

Cons

  • A relatively high number of joins is required to retrieve translated data.
  • In time, a huge number of records may potentially be stored in the translation table. Since there is only one translation table, all translated text will be dumped into it. When you have millions of fields to be translated and your application supports a large number of languages, then querying one table for a translation would be a time-consuming activity. However, you can split the translation table either based on languages or subject areas, as we pointed out at the conclusion of the second approach.

What If We Combine Approaches 2 and 3?

Specifically, we’ll combine the variant of Approach 2 – splitting the translation table – with the idea of using rows in a table. We can easily overcome the downside of having huge records in the translation table by creating multiple tables based on a domain, like we did in the Approach 2 variant. If there are a considerable number of records in the domain-based translation table, we could further split the tables based on individual fields.




Approach #4 – Creating Entity Layers for Translated Fields and Non-Translated Fields

In this solution, entity tables which contain one or more translated fields are split into two layers: one for translated fields, and another for non-translated fields. This way, we create separate layers for each.




Pros

  • There is no need to join translation tables if only non-translated fields are concerned. Therefore, non-translated fields get better performance.
  • A relatively fewer number of joins are required to retrieve translated texts.
  • It’s easy to write OEM.
  • SQLs for retrieving translated text are simple.
  • This is a proven approach for incorporating multiple languages across entities.

To demonstrate the point, here is a sample query that will retrieve translated text:

SELECT pt.product_name, pt.description, p.price
FROM order_line o, product p, product_translation pt, language l
	WHERE o.product_id = p.id AND 
	AND p.id = pt.product_non_trans_id
	AND pt.language_id = l.id
               AND l.name = @in_language
	AND id = ;

Localization – Going Beyond Content Translation

Localization is not just translating your application content to another language. There are cultural and functional parameters that need attention as well. For example, a date value is formatted as ‘MM/DD/YYYY’ in North America, but in most of Asia it is written as ‘DD/MM/YYYY’.

Another example has to do with displaying names on the application header. In the US, calling someone by their first name is acceptable or even preferred; in this culture, the customer’s first name is displayed on the header as soon as the customer logs in. In Japan, though, things are reversed: calling someone by their first name is impolite or even rather insulting. Localization would take this into account and avoid the use of first names for the application’s Japanese audience.

Localization parameters may need to be stored at the back end of the application. This is very much the case when you have to design some program logic around localization. We can probably categorize all such parameters into cultural and functional categories, and build the data model around them.

One More Thought About Localization

Localization is necessary when a global brand penetrates into local markets. In order to localize an application, look at the big picture. Localization is more than just technically perfect performance. It also means having mastery of local cultures, behaviors, and even ways of thinking and living.

What else can be done to make an application locally adaptable? Let us know your thoughts in the comments.

go to top