Bests practices for localized texts in C++ cross-platform applications?

https://stackoverflow.com/questions/403747

03-07-2019
|

Question

In the current C++ standard (C++03), there are too few specifications about text localization and that makes the C++ developer's life harder than usual when working with localized texts (certainly the C++0x standard will help here later).

Assuming the following scenario (which is from real PC-Mac game development cases):

responsive (real time) application: the application has to minimize non-responsive times to "not noticeable", so speed of execution is important.
localized texts: displayed texts are localized in more than two languages, potentially more - don't expect a fixed number of languages, should be easily extensible.
language defined at runtime: the texts should not be compiled in the application (nor having one application per language), you get the chosen language information at application launch - which implies some kind of text loading.
cross-platform: the application is be coded with cross-platform in mind (Windows - Linux/Ubuntu - Mac/OSX) so the localized text system have to be cross platform too.
stand-alone application: the application provides all that is necessary to run it; it won't use any environment library or require the user to install anything other than the OS (like most games for example).

What are the best practices to manage localized texts in C++ in this kind of application?

I looked into this last year that and the only things I'm sure of are that you should use std::wstring or std::basic_string<ABigEnoughType> to manipulate the texts in the application. I stopped my research because I was working more on the "text display" problem (in the case of real-time 3D), but I guess there are some best practices to manage localized texts in raw C++ beyond just that and "use Unicode".

So, all best-practices, suggestions and information (cross-platform makes it hard I think) are welcome!

Solution

At a small Video Game Company, Black Lantern Studios, I was the Lead developer for a game called Lionel Trains DS. We localized into English, Spanish, French, and German. We knew all the languages up front, so including them at compile time was the only option. (They are burned to a ROM, you see)

I can give you information on some of the things we did. Our strings were loaded into an array at startup based on the language selection of the player. Each individual language went into a separate file with all the strings in the same order. String 1 was always the title of the game, string 2 always the first menu option, and so on. We keyed the arrays off of an enum, as integer indexing is very fast, and in games, speed is everything. ( The solution linked in one of the other answers uses string lookups, which I would tend to avoid.) When displaying the strings, we used a printf() type function to replace markers with values. "Train 3 is departing city 1."

Now for some of the pitfalls.

1) Between languages, phrase order is completely different. "Train 3 is departing city 1." translated to German and back ends up being "From City 1, Train 3 is departing". If you are using something like printf() and your string is "Train %d is departing city %d." the German will end up saying "From City 3, Train 1 is departing." which is completely wrong. We solved this by forcing the translation to retain the same word order, but we ended up with some pretty broken German. Were I to do it again, I would write a function that takes the string and a zero-based array of the values to put in it. Then I would use markers like %0 and %1, basically embedding the array index into the string. Update: @Jonathan Leffler pointed out that a POSIX-compliant printf() supports using %2$s type markers where the 2$ portion instructs the printf() to fill that marker with the second additional parameter. That would be quite handy, so long as it is fast enough. A custom solution may still be faster, so you'll want to make sure and test both.

2) Languages vary greatly in length. What was 30 characters in English came out sometimes to as much as 110 characters in German. This meant it often would not fit the screens we were putting it on. This is probably less of a concern for PC/Mac games, but if you are doing any work where the text must fit in a defined box, you will want to consider this. To solve this issue, we stripped as many adjectives from our text as possible for other languages. This shortened the sentence, but preserved the meaning, if loosing a bit of the flavor. I later designed an application that we could use which would contain the font and the box size and allow the translators to make their own modifications to get the text fit into the box. Not sure if they ever implemented it. You might also consider having scrolling areas of text, if you have this problem.

3) As far as cross platform goes, we wrote pretty much pure C++ for our Localization system. We wrote custom encoded binary files to load, and a custom program to convert from a CSV of language text into a .h with the enum and file to language map, and a .lang for each language. The most platform specific thing we used was the fonts and the printf() function, but you will have something suitable for wherever you are developing, or could write your own if needed.

OTHER TIPS

GNU Gettext does it all.

I strongly disagree with the accepted answer. First, the part about using static array lookups to speed up the text lookups merely shows that the person doing the optimization is very inexperienced - Calculating the layout for said text and rendering said text uses 2-4 orders of magnitude more time than a hash lookup. If anyone wanted to implement their own language library it should never be based on static arrays.

But that isn't really relevant, because writing your own language library to use in your own game is even worse than pointless premature optimization. There are some extremely good reasons to never write your own localization library:

Planning the time to use a localization library is much easier then planning the time to write a localization library. Localization libraries exist, they work, and many people have used them.
Localization is tricky, so you will get things wrong. Every language adds a new quirk, which means whenever you add a new language to your own homegrown localization library you will need to change code again to account for the quirks. Did you know that some languages have more than 2 plural forms, depending on the number of items in question? More than 2 genders (more than 10, even)? Also, the number and date formats vary a lot between different in many languages.
When your application becomes successful you will want add support for more languages. Languages nobody on your team speaks fluently. Hiring someone to write a translation will be considerably cheaper if they already know the tools they are working with.

A very well known and complete localization library is GNU Gettext, which uses the GPL, and should therefore be avoided for commercial work. You can instead use the boost library boost.locale which works with Gettext files, and is free to use and modify for commercial and non-commercial projects of any kind.

There won't be any additional features in the C++0x standard, as far as I can tell. I suspect the Committee considers this a matter for third-party libraries.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow