Question

I have just started developing a small web app for Arabic speakers, and I will need to ask a co-worker (not a programmer) to help me translate all of the labels and documentation into Arabic at some point. Ideally, labels would allow arbitrary punctuation and line breaks, but we can set some rules to keep things practical.

My first thought is to have a translations directory, where each label is a set of text files that are named after the label string's parameter name, with a suffix to represent the language. Is there a standard way of doing this that is roughly as flexible and user-friendly as what I have in mind?

Example:

translations/
    submit.en-ca
    submit.ar
    cancel.en-ca
    cancel.ar
    instructions.en-ca
    instructions.ar

Solution 3

It turns out that translation is a fair bit more complex than I realized. Plurals don't work quite the same way in Arabic as they do in English, every noun has a gender, and some translations are supposed to be different depending on whether the speaker is male or female. What a mess.
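To give a feel for the plural problem, here is a minimal sketch of the plural rule that gettext catalogs commonly declare for Arabic (six forms, versus two for English). The function name arabic_plural_index is made up for this illustration; in a real project the rule would live in the catalog's Plural-Forms header rather than in your own code.

```python
def arabic_plural_index(n: int) -> int:
    """Return which of the six Arabic plural forms applies to the count n.

    Mirrors the commonly used gettext rule:
    n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3
         : n%100>=11 ? 4 : 5
    """
    if n == 0:
        return 0  # "zero" form
    if n == 1:
        return 1  # singular
    if n == 2:
        return 2  # dual
    if 3 <= n % 100 <= 10:
        return 3  # "few" form (3..10, 103..110, ...)
    if 11 <= n % 100 <= 99:
        return 4  # "many" form (11..99, 111..199, ...)
    return 5      # everything else (100, 101, 102, 200, ...)


def english_plural_index(n: int) -> int:
    """English only distinguishes one from not-one."""
    return 0 if n == 1 else 1
```

So a label such as "N files" needs up to six Arabic strings, which is something a one-file-per-label scheme does not handle on its own.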

After a bit more research, I discovered GNU gettext and what appears to be a de facto standard file type: .po files. The cross-platform Poedit looks like a tool that I can expect professional translators to be familiar with, and my hopefully helpful co-worker should be able to run it on her Mac.

There are JavaScript libraries for working with gettext translations, Jed for example, and a gulp module (gulp-po-json) to convert .po files to JSON at build time.
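For anyone who has not seen a .po file, the sketch below shows roughly what one looks like and what a build-time conversion to JSON amounts to. It is a deliberately simplified illustration in Python, not what gulp-po-json does internally: the sample catalog and the names parse_simple_po and po_to_json are invented here, and the parser only understands single-line msgid/msgstr pairs (no plurals, contexts, multi-line strings, or escape sequences). For real work, use an existing .po library.

```python
import json
import re

# A tiny hand-written catalog; the first entry (empty msgid) is the header.
SAMPLE_PO = '''\
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#: hypothetical source reference
msgid "Submit"
msgstr "إرسال"

msgid "Cancel"
msgstr "إلغاء"
'''

ENTRY_LINE = re.compile(r'^(msgid|msgstr)\s+"(.*)"$')


def parse_simple_po(text: str) -> dict:
    """Collect single-line msgid/msgstr pairs into a dict."""
    entries = {}
    current_id = None
    for raw in text.splitlines():
        match = ENTRY_LINE.match(raw.strip())
        if not match:
            continue  # comments, blank lines, header continuation lines
        keyword, value = match.groups()
        if keyword == "msgid":
            current_id = value
        elif current_id:  # an empty msgid is the header entry, so skip it
            entries[current_id] = value
            current_id = None
    return entries


def po_to_json(text: str) -> str:
    """Serialize the parsed catalog, keeping Arabic text readable."""
    return json.dumps(parse_simple_po(text), ensure_ascii=False, indent=2)
```

Note that gettext conventionally uses the source-language string itself as the key, which is different from the label-name keys in the original question.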

OTHER TIPS

I have used CSV. It works well because:

  • Translators can conveniently edit short phrases using a spreadsheet app such as Excel or Google Sheets

  • CSV is easy to translate back and forth into whatever format I must use or decide to use internally in the application

I have no idea if CSV is the simplest format to use. Also, I suspect that the simplest format to use for editing and the simplest format to use in the application might not be the same.

You have some flexibility in how you structure your CSV file, but a basic approach is to use three columns with

  • Column 1 == the label name / key

  • Column 2 == the English phrase

  • Column 3 == an empty cell where the translator will put their translated phrase
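As a rough sketch of that layout and of converting it into something an app can use, the Python below reads a three-column CSV into one lookup table per language. The column headers (key, en, ar), the sample rows, and the function name load_translations are assumptions made for the example; the fallback to English for untranslated cells is one possible policy, not a requirement.

```python
import csv
import io

# Hypothetical file contents; the last row has not been translated yet.
SAMPLE_CSV = """key,en,ar
submit,Submit,إرسال
cancel,Cancel,إلغاء
instructions,"Fill in the form, then press Submit.",
"""


def load_translations(csv_text: str) -> dict:
    """Build {"en": {key: phrase}, "ar": {key: phrase}} from the CSV text."""
    tables = {"en": {}, "ar": {}}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row["key"]
        tables["en"][key] = row["en"]
        # Fall back to the English phrase while the translator's cell is empty.
        tables["ar"][key] = row["ar"] or row["en"]
    return tables
```

Quoting takes care of commas inside a phrase, and quoted cells can also contain line breaks, which covers the "arbitrary punctuation and line breaks" wish reasonably well.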

You should probably learn from a successful example, like Microsoft ResX, the .NET way. It is XML, so it's easy to manage with code or by hand. The XML schema is very simple, not crazy at all.

http://msdn.microsoft.com/en-us/library/xbx3z216.aspx

I'd also avoid editing the files by hand (especially if they require encoding) and instead give your co-worker a friendly editor for whatever format you decide on. That is why I'd stick with a standard format like this one and reuse one of the editors out there.

For me, .resx is workable, but you can also use the more traditional format, .resources.

A quick Google search turns up some free resx editors. Lutz Roeder has one that I would expect to be of decent quality (given that he wrote Reflector):

http://www.lutzroeder.com/dotnet/

Or this one, which looks oriented towards translators as end users:

http://sourceforge.net/projects/resx/

Or use Visual Studio Express (free for your co-worker), or hack one up yourself using the schema (Microsoft ResX Schema v2.0)

The main thing you'll be manipulating will be data elements. You can store text or images or other objects in a resx.

<data name="navbarImageList.ImageStream" 
             mimetype="application/x-microsoft.net.object.binary.base64">
  <value>
    AAEAAAD/////AQAAAAAAAAAMAgAAAFdTeXN0ZW0uV2luZG93cy5Gb3JtcywgVmVyc2lvbj00LjAuMC4w
    ...
  </value>
</data>
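For plain text labels, the data element is simpler: just a name and a value. If you do end up reading a resx outside .NET, something along these lines would pull out the text entries. This is a sketch using Python's standard XML parser; the sample document and the function name read_text_entries are invented for illustration, and it assumes that entries carrying a mimetype or type attribute are non-text resources to be skipped.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down resx: one text label and one binary resource
# (its base64 payload is truncated here).
SAMPLE_RESX = """<root>
  <data name="submit" xml:space="preserve">
    <value>إرسال</value>
  </data>
  <data name="navbarImageList.ImageStream"
        mimetype="application/x-microsoft.net.object.binary.base64">
    <value>AAEAAAD</value>
  </data>
</root>
"""


def read_text_entries(resx_text: str) -> dict:
    """Return {name: text} for data elements that hold plain strings."""
    root = ET.fromstring(resx_text)
    entries = {}
    for data in root.findall("data"):
        # Assumption: typed or mimetype-tagged entries are images/objects.
        if data.get("mimetype") or data.get("type"):
            continue
        value = data.find("value")
        text = value.text if value is not None and value.text else ""
        entries[data.get("name")] = text
    return entries
```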

I also recommend creating coarser-grained resource files than one per label: one per form, screen, or page, or even one per app if the app is not that large. That will be friendlier to manage.

Licensed under: CC-BY-SA with attribution