While some software systems are used by a limited number of users speaking the same language, most organizations need to unify and centralize their applications to be used by people speaking different languages all around the world. Multilanguage databases present an additional level of difficulty in designing and implementing data models. In this article, we suggest some approaches for dealing with this challenge.
What Information Do We Need to Store in Multiple Languages?
On the surface, all string information may seem plausible for translation into multiple languages. However, this is usually not the case. Customer-related information like
Address may be translated, but that may not be a good idea.
Take a business customer in the United Kingdom named “Riverside Trucks” with an office at “123 Upper Castle Road.” You do not want a Spanish-speaking user to print and send a letter to “Camiones Orilla” located at “123 Calle Castillo Superior.” Royal Mail (the UK’s postal service) will not find it! You probably want to translate just the columns containing descriptive information, not proper names.
When designing a system to handle translations, it is not always known in advance exactly which columns are translatable and which ones do not require translations. Choosing a flexible approach saves a lot of time in design and development. Take a look at the article “How to Design a Localization-Ready System” to see some examples.
What Approaches Do We Consider?
In this article, we describe three approaches to multilanguage database design. We start with the simplest one that is not as flexible then move on to consider other options, explaining the pros and cons for each.
Both the syntax and the database models (available on Vertabelo web-based data modeler) used in this article are for SQL Server. However, they are easily adapted to any database engine.
Approach 1: Creating Additional Columns to Hold Translated Contents
This is the simplest approach to implement, albeit not a very flexible one. It consists of adding a column for each column and language we need to use in our system, as shown in the following Vertabelo diagram:
While this may seem like a very simple solution, it has some drawbacks. We explain below.
Con: Code Complexity
This approach makes the code more complex. It requires us to either write a different query for each language or use a
CASE construct to retrieve the translation for the appropriate language based on the user configuration. See the following code for instance:
SELECT ProductID, CASE @Language WHEN ‘ES’ THEN ProductName_ES WHEN ‘DE’ THEN ProductName_DE WHEN ‘FR’ THEN ProductName_FR ELSE ProductName END AS ProductName, CASE @Language WHEN ‘ES’ THEN ProductDescription_ES WHEN ‘DE’ THEN ProductDescription_DE WHEN ‘FR’ THEN ProductDescription_FR ELSE ProductDescription END AS ProductDescription, Price, Weight, ProductCategoryID FROM Product WHERE …
Note: In the example, we use the variable @Language to keep the language we want to use. You may consider using SESSION_CONTEXT() (or Application Context in Oracle) to set and read the language for each user.
Con: Lack of Flexibility
This approach lacks flexibility. If we need to implement a new language, we need to modify our data model by adding a column for the new language for each translatable column in our system. We also need to create a new language query for each table (or edit the existing one by adding a new
CASE WHEN clause that uses the new language for each translatable column).
Con: Challenges in Handling Unknown Information
Imagine this: a user adds a product but does not know how to translate it and leaves the translated columns empty. Users speaking those languages see
NULL or blank information in the columns that may be required.
Approach 2: Isolating Translatable Columns in a Separate Table
This approach groups the columns in a table into translatable and non-translatable columns. Non-translatable columns stay in the original table. In contrast, the translatable ones are in a separate table, with a foreign key to the original table and a language indicator. See below:
The diagram shows the original tables (with no translatable data) in white. The tables holding the translations are in light blue, and the master table holding the language information is in yellow.
This has huge flexibility advantages compared to using multiple columns as discussed earlier. This method does not require changing the data model when a new language is needed. Also, the syntax to query the information is simpler:
SELECT p.ProductID, pt.ProductName, pt.ProductDescription, p.Price, p.Weight, p.ProductCategoryID FROM Product p LEFT JOIN ProductTranslation pt ON pt.ProductID = p.ProductID AND pt.LanguageID = @Language WHERE …
However, there are some cons still, as we discuss below.
Con: Challenges When Additional Columns Need to Be Translated
If we need to convert a non-translatable column into a translatable one (or vice versa), we need to modify our data model, moving the column from one table to the other. This usually implies greater costs once the system is implemented and in use.
Con: Challenges in Handling Unknown Information
Like the first approach, this approach has challenges when dealing with unknown information. Again, if a user adds a product but does not know how to translate it and leaves the translated columns empty, users speaking those languages see
NULL or blank information in columns that may be required. Also, the query requires a
LEFT JOIN in case the translation for the language of the current user has not yet been created so that the non-translatable data is still shown.
Approach 3: Adding a Translation Subsystem
Translation can be considered as a feature that is completely independent of the data model requiring translation. In an ideal system, we may enable or disable translation for any column without requiring modification to the data model. Unfortunately, this is easier said than done.
We present a method that has no impact on the existing data model and is completely flexible. Although it adds complexity at the time of querying the data, it does not require any additional change to the data model except for a few tables. This may be a great choice if you need to add the capability to hold translations to an existing data model.
Let’s take a look at the model and see how it works:
The first thing to notice is that the original data model has no changes at all. Also, there is no direct relationship between that model and the translation subsystem.
The translation subsystem includes a small data dictionary with the tables and columns requiring translation. This data dictionary can be modified by just adding/removing rows without altering the data model. Translations are stored in a separate table, with each value identified by the following 3 columns:
ColumnID: Uniquely identifies the column (and the table) we are translating.
KeyID: Stores the ID (primary key) of the specific row we are translating.
LanguageID: Identifies the language of the translation.
This design allows data to be entered and stored in the original tables, adding translations only if and when required. The translated information is used when retrieving data, keeping the original data (in the original language) untouched.
Data can be queried using a more complex syntax than the examples above. It requires an additional
JOIN for each translatable column as shown below:
SELECT p.ProductID, ISNULL(t1.TranslationValue, p.ProductName) AS ProductName, ISNULL(t2.TranslationValue, p.ProductDescription) AS ProductDescription, p.Price, p.Weight, p.ProductCategoryID FROM Product p LEFT JOIN Translation t1 ON t1.ColumnID = <
> AND t1.Key = p.ProductID AND t1.LanguageID = @Language LEFT JOIN Translation t2 ON t2.ColumnID = < > AND t2.Key = p.ProductID AND t2.LanguageID = @Language WHERE …;
<<ProductName_ColumnID>>” and “
<<ProductDescription_ColumnID>>” must be replaced with the IDs of the columns to be translated as stored in the
ColumnInformation table. Consider generating translation views for each table requiring translation to hide the complexity of the JOINs for the end users. You can even automate this step with a script that generates each view. This script can query the data dictionary of the database to pick the tables and columns and add the translation logic for the columns that exist in the
Extra Tip #1
You can also simplify the syntax. Replace each JOIN with a call to a function that handles (and hides) the translation aspect, as shown below:
SELECT p.ProductID, ISNULL(fn_translate(‘Product’,‘ProductName’,ProductID), p.ProductName) AS ProductName, ISNULL(fn_translate(‘Product’,‘ProductDescription’,ProductID), p.ProductDescription) AS ProductName, p.Price, p.Weight, p.ProductCategoryID FROM Product p WHERE …;
The function may read the desired language from the context, or you may add it as an additional parameter. In this example, the function uses the table and column names provided as parameters plus the row key (also provided as a parameter) to search for and return the desired translation.
Calling a function does imply an additional impact on performance due to context switching between SQL and procedural language. However, it may be a simpler solution for databases or tables where the amount of data being translated allows it.
Both examples – the one with a JOIN and the one with a function – use the ISNULL() SQL Server function. So, when the translation to the desired language does not exist, it still displays the original value stored in the ProductName and ProductDescription columns instead of blanks or NULL.
The third approach is usually the best for implementing bigger designs. It allows for flexibility both at design and once the system is in use. However, there are specific considerations that may make the other approaches useful. Regardless of your choice, consider the following to save time both at design and at development/implementation.
Add a Layer of Abstraction
As mentioned earlier, consider creating views that take care of the translation logic, e.g., selecting one column among multiple translation columns or joining to specific rows. This keeps specific implementation details hidden from programmers. They simply use these views rather than having to construct complex SQL sentences each time they need to access a table with translatable information.
Use Context to Filter Data
As mentioned in the first example, consider using context functions available in most database engines. Use them to store user language information once logged into the system then filter the results automatically in the views that take care of the translation.
Modern systems may have hundreds and even thousands of tables. Take time to automate generating the translation views rather than writing the queries one by one. It may take you some time to arrive at a script that works, but then you can always create new views or recreate existing ones in less than a second!