Supporting multiple languages in your application – a simple gettext step-by-step example

First, let’s create a small example program that prints a couple of placeholder messages, and save it in a file named “test_i18n.c”:

#include <libintl.h>
#include <locale.h>
#include <stdio.h>

#define PACKAGE "test_i18n"
#define LOCALEDIR "./"
#define _( str ) gettext( str )


int main( void )
{
  setlocale( LC_ALL, "" );
  bindtextdomain( PACKAGE, LOCALEDIR );
  textdomain( PACKAGE );

  printf( "msg: \"%s\".\n", _( "starting a new test" ) );
  printf( "msg: \"%s\".\n", gettext( "testing i18n" ) );

  return 0;
}

If you decide to translate your application with GNU gettext, the code you need to include in each of your projects is almost always the same: a) The two header files from lines 1-2. The latter, locale.h, declares setlocale and the LC_ALL category, which are used to set all locale categories of the program from the environment. b) The three function calls from lines 12-14. Conveniently, when compiling with gcc on a glibc-based system you do not need to link against a separate libintl library, because the implementation is already part of glibc. So you can compile this example with:

$ gcc -o test_i18n test_i18n.c

Once compiled, the program can be tested even though no translation exists yet. The output of the example should be:

$ ./test_i18n 
msg: "starting a new test".
msg: "testing i18n".
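The three setup calls in the example ignore their return values. As a minimal sketch, assuming you want a diagnostic when something goes wrong, they could be wrapped in a helper like the one below. The name init_i18n is hypothetical and not part of gettext; setlocale, bindtextdomain and textdomain are the same calls used above.

```c
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper (not part of gettext): performs the three setup
   calls from the example and reports failures.
   Returns 0 on success, -1 if the text domain could not be set up. */
int init_i18n(const char *package, const char *localedir)
{
    /* NULL means the locale named by the environment is not available;
       the program then simply keeps running in the default "C" locale. */
    if (setlocale(LC_ALL, "") == NULL)
        fprintf(stderr, "warning: requested locale is not available\n");

    /* Tell gettext where the message catalogs of this package live. */
    if (bindtextdomain(package, localedir) == NULL) {
        perror("bindtextdomain");
        return -1;
    }

    /* Select the package as the default domain for gettext() calls. */
    if (textdomain(package) == NULL) {
        perror("textdomain");
        return -1;
    }
    return 0;
}
```

In the example program, the three calls in main could then be replaced by a single init_i18n( PACKAGE, LOCALEDIR ). A missing catalog is not reported here, because gettext silently falls back to the original string in that case.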

At this stage we can begin translating our example into other languages. For this purpose we use a tool called “xgettext”, which scans the source code and extracts all the strings that need to be translated, namely the arguments of the function “gettext”. Additionally, one can pass an option like “--keyword=_” to xgettext, naming the abbreviation used in place of the gettext function, which was introduced in line 7 of the test_i18n.c file. Here is an example of how to use it:

$ xgettext --package-name test_i18n --package-version 0.1 --default-domain test_i18n --keyword=_ --output=test_i18n.pot test_i18n.c

All the extracted strings are saved in a file with the ending .pot, a so-called portable object template. Based on this template we can create translation files by copying it and translating the strings. Before doing so, it’s important to switch to the UTF-8 encoding. This is easily done by replacing CHARSET with UTF-8 in “test_i18n.pot”:

$ sed --in-place test_i18n.pot --expression='s/CHARSET/UTF-8/'

Afterwards the pot file should look similar to this:

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: test_i18n 0.1\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2012-03-01 21:24+0100\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: test_i18n.c:16
msgid "starting a new test"
msgstr ""

#: test_i18n.c:17
msgid "testing i18n"
msgstr ""

Now we are ready to create a translation (the test_german.po file) for the German locale (de_DE) by invoking the “msginit” tool and pointing it to the above-mentioned template file as input.

$ msginit --no-translator --locale de_DE --output-file test_german.po --input test_i18n.pot

After providing the appropriate translation for each empty message string (msgstr "") the test_german.po file should look similar to the following:

# German translations for test_i18n package
# German messages for test_i18n.
# Copyright (C) 2012 THE test_i18n'S COPYRIGHT HOLDER
# This file is distributed under the same license as the test_i18n package.
# Automatically generated, 2012.
#
msgid ""
msgstr ""
"Project-Id-Version: test_i18n 0.1\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2012-03-01 21:24+0100\n"
"PO-Revision-Date: 2012-03-01 21:24+0100\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: de\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

#: test_i18n.c:16
msgid "starting a new test"
msgstr "Man startet einen neuen Test"

#: test_i18n.c:17
msgid "testing i18n"
msgstr "Teste I16g"

In contrast to a real application, where the translation data is installed in a system-wide location (typically under /usr/share/locale), we will store the compiled binary translation data locally. This requires the following directory structure:

$ mkdir --parents ./de_DE.utf8/LC_MESSAGES

We already chose this location by defining the locale directory as our working directory “./” in line 6 of the C source code and passing it to the function “bindtextdomain” in line 13.
Now it’s time to create an MO (machine object) file, a binary message catalog compiled from the textual translation, which the application can load efficiently at run time.

$ msgfmt --check --verbose --output-file ./de_DE.utf8/LC_MESSAGES/test_i18n.mo test_german.po

Note that the name of the binary output file must match the package name defined in line 5 (test_i18n), followed by the extension .mo.
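If you want to verify that msgfmt really produced a binary catalog (and that you did not copy the textual .po file there by accident), you can look at the first four bytes of the file. The sketch below relies on the GNU MO magic number 0x950412de, which may be stored in either byte order; the function name looks_like_mo_file is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the file starts with the GNU MO
   magic number (in either byte order), 0 if it does not, and -1 if
   the file cannot be opened. */
int looks_like_mo_file(const char *path)
{
    unsigned char b[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    size_t n = fread(b, 1, sizeof b, f);
    fclose(f);
    if (n != sizeof b)
        return 0;

    /* Assemble the first four bytes as a little-endian 32-bit value. */
    uint32_t magic = (uint32_t)b[0]
                   | (uint32_t)b[1] << 8
                   | (uint32_t)b[2] << 16
                   | (uint32_t)b[3] << 24;

    return magic == 0x950412deu || magic == 0xde120495u;
}
```

For the catalog created above, looks_like_mo_file( "./de_DE.utf8/LC_MESSAGES/test_i18n.mo" ) should return 1. This only checks the header; it does not validate the rest of the file.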
Congratulations, that’s it! Before testing it, let’s add support for one more language, for example Russian. This only requires repeating the steps described above, starting with:

$ msginit --no-translator --locale ru_RU --output-file test_russian.po --input test_i18n.pot

And obtaining the resulting translation file:

# Russian translations for test_i18n package.
# Copyright (C) 2012 THE test_i18n'S COPYRIGHT HOLDER
# This file is distributed under the same license as the test_i18n package.
# Automatically generated, 2012.
#
msgid ""
msgstr ""
"Project-Id-Version: test_i18n 0.1\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2012-03-01 21:24+0100\n"
"PO-Revision-Date: 2012-03-01 21:24+0100\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: ru\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n"
"%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n"

#: test_i18n.c:16
msgid "starting a new test"
msgstr "начинается новый тест"

#: test_i18n.c:17
msgid "testing i18n"
msgstr "тестируется и12я"
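Both .po files carry a Plural-Forms header generated by msginit. The example has no plural messages, so the header is not used here, but it becomes relevant as soon as a program calls ngettext. The expression is written in C syntax, so it can be copied into a small function to see which plural form it selects for a given count. The function names below are made up for this illustration.

```c
/* Plural-Forms expression from test_german.po (nplurals=2):
   form 0 for exactly one item, form 1 for everything else. */
int german_plural_index(unsigned long n)
{
    return n != 1;
}

/* Plural-Forms expression from test_russian.po (nplurals=3),
   copied from the header with only whitespace added. */
int russian_plural_index(unsigned long n)
{
    return n % 10 == 1 && n % 100 != 11 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}
```

For Russian, the counts 1 and 21 select form 0, the counts 2 and 22 select form 1, and the counts 5, 11 and 112 select form 2. A translator therefore has to provide three msgstr variants per plural message in the Russian file, but only two in the German one.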

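Once the Russian catalog has been compiled in the same way as the German one (into ./ru_RU.utf8/LC_MESSAGES/test_i18n.mo), it’s finally time to test everything. To simulate a change of the locale language locally, one can override the variable “LANG”, as shown below: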

$ LANG="de_DE.utf8" ./test_i18n
msg: "Man startet einen neuen Test".
msg: "Teste I16g".

$ LANG="ru_RU.utf8" ./test_i18n
msg: "начинается новый тест".
msg: "тестируется и12я".
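Overriding LANG only has an effect if no higher-priority variable is set in your shell. For the message category, POSIX gives LC_ALL precedence over LC_MESSAGES, which in turn takes precedence over LANG, and empty values count as unset. The following sketch mirrors that rule; the function names are hypothetical. GNU gettext additionally consults the LANGUAGE variable, which is not modelled here.

```c
#include <stdlib.h>

/* Hypothetical helper: picks the first non-empty value in the order
   LC_ALL, LC_MESSAGES, LANG and falls back to "C" if none is set. */
const char *pick_messages_locale(const char *lc_all,
                                 const char *lc_messages,
                                 const char *lang)
{
    if (lc_all != NULL && *lc_all != '\0')
        return lc_all;
    if (lc_messages != NULL && *lc_messages != '\0')
        return lc_messages;
    if (lang != NULL && *lang != '\0')
        return lang;
    return "C";
}

/* Applies the same rule to the current environment. */
const char *messages_locale_from_env(void)
{
    return pick_messages_locale(getenv("LC_ALL"),
                                getenv("LC_MESSAGES"),
                                getenv("LANG"));
}
```

If your environment already exports LC_ALL or LC_MESSAGES, the LANG override above is ignored, and setting LC_ALL="de_DE.utf8" on the command line is the more robust way to run the test.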

What if something went wrong and the displayed strings are not translated? The first thing to check is the list of locales available on the system:

$ locale -a
C
POSIX
de_DE
de_DE.iso88591
de_DE.iso885915@euro
de_DE.utf8
de_DE@euro
deutsch
...

If you wish to extend the locale list, one possible way is to modify the “/etc/locale.gen” file and run “locale-gen” to generate the missing locales. If the locale is available, you can check whether the application is able to locate the translation files by using the “strace” tool (on recent glibc versions the lookups may appear as openat calls, in which case trace=open,openat is needed):

$ LANG="de_DE.utf8" strace -e trace=open ./test_i18n
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/lib64/libc.so.6", O_RDONLY)      = 3
open("/usr/lib64/locale/locale-archive", O_RDONLY) = 3
open("/usr/share/locale/locale.alias", O_RDONLY) = 3
open("/home/dev/test-i18n/.//de_DE.utf8/LC_MESSAGES/test_i18n.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/dev/test-i18n/.//de_DE/LC_MESSAGES/test_i18n.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/dev/test-i18n/.//de.utf8/LC_MESSAGES/test_i18n.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/dev/test-i18n/.//de/LC_MESSAGES/test_i18n.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
msg: "starting a new test".
msg: "testing i18n".

As shown above, something was mixed up and the file “test_i18n.mo” could not be found in any of the searched directories. Fixing this results in a fully working example:

$ LANG="de_DE.utf8" strace -e trace=open ./test_i18n
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/lib64/libc.so.6", O_RDONLY)      = 3
open("/usr/lib64/locale/locale-archive", O_RDONLY) = 3
open("/usr/share/locale/locale.alias", O_RDONLY) = 3
open("/home/dev/test-i18n/.//de_DE.utf8/LC_MESSAGES/test_i18n.mo", O_RDONLY) = 3
open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3
msg: "Man startet einen neuen Test".
msg: "Teste I16g".
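The failing trace shows the order in which the catalog is searched: first the full locale name, then the name without the codeset, then the language with the codeset, and finally the language alone. The sketch below reproduces only this observed order for a locale of the form ll_CC.codeset. It is not glibc's actual lookup code, and the function name mo_candidate is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the idx-th candidate path (0..3) for a
   locale of the form "ll_CC.codeset" into out, in the order seen in
   the strace output. Returns 0 on success, -1 on bad input or if the
   buffer is too small. */
int mo_candidate(char *out, size_t size, const char *localedir,
                 const char *locale, const char *domain, int idx)
{
    const char *underscore = strchr(locale, '_');
    const char *dot = strchr(locale, '.');

    if (underscore == NULL || dot == NULL || dot < underscore ||
        idx < 0 || idx > 3)
        return -1;

    int lang_len = (int)(underscore - locale);
    /* Candidates 0 and 1 keep the territory ("_DE"), 2 and 3 drop it. */
    int terr_len = (idx < 2) ? (int)(dot - underscore) : 0;
    /* Candidates 0 and 2 keep the codeset (".utf8"), 1 and 3 drop it. */
    const char *codeset = (idx % 2 == 0) ? dot : "";

    int n = snprintf(out, size, "%s/%.*s%.*s%s/LC_MESSAGES/%s.mo",
                     localedir, lang_len, locale,
                     terr_len, underscore, codeset, domain);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Calling it with the directory ".", the locale "de_DE.utf8" and the domain "test_i18n" for idx 0 to 3 yields ./de_DE.utf8/LC_MESSAGES/test_i18n.mo, ./de_DE/LC_MESSAGES/test_i18n.mo, ./de.utf8/LC_MESSAGES/test_i18n.mo and ./de/LC_MESSAGES/test_i18n.mo. Placing the .mo file in any of these directories is enough for the translation to be found.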

One thought on “Supporting multiple languages in your application – a simple gettext step-by-step example”

  1. Finally a complete working gettext example. For some reason my version does not look into ./fr_FR.utf8 but simply in ./fr.
    Thanks a lot!
