A Few Considerations Before Internationalizing a Rails Application

In today's era of globalization, an application should be built so that it can reach users across the world. Serving user groups in different geographical regions means supporting many different languages within the same application, that is, supporting internationalization (I18n) or multilingualization (M17n).

For a long time this was complicated and difficult because of the limitations of existing character encodings. In the late 1980s, several organizations began working on the creation of a global character set.

A global character set should have the following capabilities:

●      Support multilingual users and organizations

●      Conform to international standards

●      Enable worldwide interchange of data

This global character set has been developed, is in wide use, and is called Unicode.

Unicode is a universal encoded character set that enables information from any language to be stored using a single character set. Unicode assigns a unique code point, conventionally written in hexadecimal (for example, U+0041 for "A"), to every character, regardless of the platform, program, or language.
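To make the idea of one code value per character concrete, here is a minimal Ruby sketch. The helper name `describe_char` is invented for illustration; `String#ord`, `String#encode`, and `String#bytesize` are standard Ruby methods. It prints each character's code point and the number of bytes it occupies in UTF-8.

```ruby
# Hypothetical helper: report the Unicode code point of a single character
# and how many bytes it takes when encoded as UTF-8.
def describe_char(char)
  code_point = char.ord # integer code point of the character
  {
    char: char,
    code_point: format("U+%04X", code_point), # conventional hex notation
    utf8_bytes: char.encode("UTF-8").bytesize
  }
end

# Latin "A", "é" (U+00E9), Devanagari "अ" (U+0905), and the kanji "日" (U+65E5)
["A", "\u00E9", "\u0905", "\u65E5"].each do |c|
  info = describe_char(c)
  puts "#{info[:char]}  #{info[:code_point]}  #{info[:utf8_bytes]} byte(s) in UTF-8"
end
```

The code point is the same everywhere; only the number of bytes depends on the encoding chosen to store it.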

I18n support in an application can be full or partial. A Ruby on Rails application can be partially internationalized so that the application's interface is translated into the user's preferred language. This is very easy to do with the Rails I18n API: http://guides.rubyonrails.org/i18n.html

All the static content can be localized very easily, giving users the experience of an application in their preferred language.

However, when it comes to storing data in multiple languages in the database, there are several encoding factors to consider.
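As a rough illustration of how static content is localized, the sketch below uses only Ruby's standard `yaml` library. The locale data and the `translate` helper are hypothetical stand-ins, not the Rails API; the data is merely shaped like the files Rails keeps under `config/locales/`.

```ruby
require "yaml"

# Hypothetical locale data, shaped like Rails' config/locales/*.yml files.
LOCALES = YAML.safe_load(<<~LOCALE_YAML)
  en:
    greeting: "Welcome"
    nav:
      home: "Home"
  hi:
    greeting: "स्वागत है"
    nav:
      home: "मुखपृष्ठ"
LOCALE_YAML

# Hypothetical helper: look up a dotted key for a locale,
# falling back to English when the locale has no translation.
def translate(key, locale:, fallback: "en")
  path = key.split(".")
  LOCALES.dig(locale.to_s, *path) || LOCALES.dig(fallback, *path)
end

puts translate("greeting", locale: :hi)
puts translate("nav.home", locale: :en)
puts translate("greeting", locale: :fr) # no French data, falls back to English
```

In a real Rails application the lookup itself is done by `I18n.t` (or the `t` view helper), so only the locale files need to be written.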

The first and most critical factor is the set of languages to be supported. Based on those languages, we can choose the character set and the database. Several character sets support Unicode, but the range of characters they cover varies. Taking Oracle's character set names as an example, UTF8 covers fewer characters than AL32UTF8 or AL16UTF16: UTF8 does not properly support supplementary characters (those outside the Basic Multilingual Plane).

Of these, AL32UTF8 is generally considered the most efficient choice and covers all languages, including East Asian languages such as Chinese and Japanese and Indian languages such as Hindi, Marathi, and Kannada.

Next, we need to decide on the database. Almost all current databases support Unicode, though they differ in the character sets they offer. The database we choose therefore also determines which Unicode character set we can use.
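One way to judge whether supplementary characters matter for a given data set is to scan sample text for code points above U+FFFF. The Ruby sketch below does that; the helper name `supplementary_chars` and the sample strings are made up for illustration.

```ruby
# Hypothetical helper: return the characters in +text+ that lie outside the
# Basic Multilingual Plane (code points above U+FFFF), i.e. the
# supplementary characters.
def supplementary_chars(text)
  text.each_char.select { |char| char.ord > 0xFFFF }
end

# Hindi, Japanese, a rare CJK ideograph (U+2000B), and an emoji (U+1F600)
samples = ["नमस्ते", "日本語", "\u{2000B}", "\u{1F600}"]

samples.each do |sample|
  extras = supplementary_chars(sample)
  status = extras.empty? ? "BMP only" : "needs supplementary character support"
  puts "#{sample}: #{status}"
end
```

If such a scan of representative data finds any supplementary characters, the chosen character set must be one that can store them.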

There are two cases for the database. If we are building a new application, we need to create the database with a character set that supports the required languages.

If we have an existing application with live data, we need to migrate the database character set.

There are two ways to migrate a database character set. We can either alter the character set directly with a query, or create another database with a Unicode character set and copy the data from the old database into it.

Altering the character set directly carries the risk of losing data that is incompatible with the new character set.

Copying the database is the safer approach, because the original database remains as a backup.

Either way, we always need to run some routines to check for data that was replaced or lost during the migration. If we are migrating to a character set that is a superset of the old one, there will be almost no loss of data. With these points planned for, internationalization can be completed successfully.
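A check routine of this kind can be sketched in plain Ruby as a round trip: convert each value to the target encoding and back, then compare it with the original. The helper name `lossy_values` and the sample rows are hypothetical, and a real check would read values from the database rather than from an array.

```ruby
# Hypothetical helper: return the values that cannot be converted to the
# +target+ encoding and back without loss.
def lossy_values(values, target:)
  values.reject do |value|
    begin
      # Round trip: convert to the target encoding and back, then compare.
      value.encode(target).encode(value.encoding) == value
    rescue EncodingError
      false # conversion is undefined or invalid: this value would be lost
    end
  end
end

# "Résumé" written with U+00E9, plus Hindi and Japanese samples
rows = ["R\u00E9sum\u00E9", "नमस्ते", "日本語"]

# Migrating to a superset character set: nothing is reported as lossy.
p lossy_values(rows, target: "UTF-8")

# Migrating to a narrower character set: the Hindi and Japanese rows are reported.
p lossy_values(rows, target: "ISO-8859-1")
```

The same comparison shows why moving to a superset is the low-risk direction: every character of the old set has a counterpart in the new one.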
