Question

I have been developing a .NET string formatting library to assist with localization of an application. It's called SmartFormat and is open-source on GitHub.

One of the issues it tries to address is Grammatical Numbers. This is also known as "singular and plural forms" or "conditional formatting", and here's a snippet of what it looks like in English:

var message = "There {0:is|are} {0} {0:item|items} remaining";

// You can use the Smart.Format method just like using String.Format:
var output = Smart.Format(CultureInfo.CurrentUICulture, message, items.Count);

The English rule, as I'm sure you know, is that there are 2 forms (singular and plural) that can apply to nouns, verbs, and adjectives. If the quantity is 1 then singular is used, otherwise the plural is used.

I am now trying to "broaden my horizons" by implementing the correct rules for other languages! I have come to understand that some languages can have up to 4 plural forms, and it takes some logic to determine the correct form. I would like to expand my code to accommodate multiple languages. For example, I've heard that Russian, Polish, and Turkish have rules quite different from English, so they might be a great starting point.

However, I only speak English and Spanish, so how can I determine the correct grammatical rules for many common languages?

Edit: I also would like to know some good non-English "test phrases" for my unit tests here: What are some good non-English phrases with singular and plural forms that can be used to test an internationalization and localization library?

Solution

Different languages definitely have different pluralization rules. Arabic and Polish are especially interesting, as both have quite a few plural forms.

If you want to learn more about these rules, see the Unicode Common Locale Data Repository (CLDR), specifically its Language Plural Rules.

There is quite a lot of interesting information there; unfortunately, some of it is wrong. I hope the plural forms are correct (at least for Polish they are, as far as I could tell :) ).
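
To give a feel for what those plural rules look like once turned into code, here is a rough C# sketch for English, Polish and Arabic. It covers non-negative integer counts only (the real CLDR rules also deal with fractions), and the class and method names are made up for illustration; they are not part of SmartFormat or of CLDR itself.

```csharp
// Sketch of CLDR-style plural categories for non-negative integers.
// PluralCategorySketch and its methods are hypothetical names.
public static class PluralCategorySketch
{
    // English: "one" for exactly 1, "other" for everything else.
    public static string English(long n) => n == 1 ? "one" : "other";

    // Polish: one (1), few (2-4, 22-24, ... but not 12-14), many (all other integers).
    public static string Polish(long n)
    {
        if (n == 1) return "one";
        long mod10 = n % 10, mod100 = n % 100;
        if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14)) return "few";
        return "many";
    }

    // Arabic: zero, one, two, few (n % 100 in 3..10), many (n % 100 in 11..99), other.
    public static string Arabic(long n)
    {
        if (n == 0) return "zero";
        if (n == 1) return "one";
        if (n == 2) return "two";
        long mod100 = n % 100;
        if (mod100 >= 3 && mod100 <= 10) return "few";
        if (mod100 >= 11) return "many";
        return "other"; // e.g. 100, 101, 102, 200
    }
}
```

For example, PluralCategorySketch.Polish(22) gives "few" while PluralCategorySketch.Polish(12) gives "many", which is exactly the kind of case a simple "1 vs. everything else" check gets wrong.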

OTHER TIPS

It would help if the question body included a sample of the rules you're using. What format do they take?

Anyway, in your example:

var message = "There {0:is|are} {0} {0:item|items} remaining";

you seem to assume that both choice segments are selected by the same single rule, and that the two choices correspond directly; that is, one rule picks either (is, item) or (are, items).

This assumption does not necessarily hold for other languages. Take, for example, the fictitious language English-ez (I find examples in foreign languages irritating, so to keep things easy for the reader I'm borrowing from Arabic but simplifying a lot). The rules for this language are as follows:

The first selection segment is the same as normal English:

is: count=1
are: count=0, count=2..infinity

The second selection segment has a different rule from normal English, assume the following simple rule:

item: count=1
item-da: count=2 # this language has a special dual form.
items: count=0, count=3..infinity 

Now the single rule solution would not be adequate - we can suggest a different form:

var message = "There {0:is|are@rule1} {0} {0:item|item-da|items@rule2} remaining";

This solution might have problems in other situations, but we are discussing the example you provided.
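
As a minimal sketch of the idea, the two English-ez rules can be written as two separate selector functions, one per segment. Everything here is hypothetical (the class name, Rule1, Rule2, and the way the message is assembled); it only shows why the two segments cannot share a single rule.

```csharp
// Sketch for the fictitious English-ez language described above.
// All names are hypothetical and only illustrate per-segment rules.
public static class EnglishEzSketch
{
    // rule1: index 0 ("is") for count = 1, index 1 ("are") otherwise.
    static int Rule1(int count) => count == 1 ? 0 : 1;

    // rule2: index 0 ("item") for 1, index 1 ("item-da") for 2,
    // index 2 ("items") for 0 and for 3 and above.
    static int Rule2(int count) => count == 1 ? 0 : count == 2 ? 1 : 2;

    public static string Remaining(int count)
    {
        string[] verbs = { "is", "are" };
        string[] nouns = { "item", "item-da", "items" };
        // Each segment is resolved by its own rule.
        return $"There {verbs[Rule1(count)]} {count} {nouns[Rule2(count)]} remaining";
    }
}
```

With count = 2 the verb segment picks its second form while the noun segment picks its dual form, so Remaining(2) produces "There are 2 item-da remaining".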

Check gettext (which selects a whole message based on a single variable) and ICU (which selects a whole message at multiple levels, i.e. based on multiple variables).

The approach you have taken might work in most cases in English and Spanish, but it most likely fails in many other languages. The problem is that you have only one pattern that tries to cover all grammatical numbers.

var message = "There {0:is|are} {0} {0:item|items} remaining";

You need one pattern for each grammatical number. Here I have combined two patterns into a single multi-pattern string.

var message = PluralFormat("one;There is {0} item remaining;other;There are {0} items remaining", count);

English uses two grammatical numbers: singular and plural. Here, one starts the singular pattern and other starts the plural pattern.

When translated, for example, to Finnish, which has the same number of grammatical numbers, you would use

"one;{0} kappale jäljellä;other;{0} kappaletta jäljellä"

However, Japanese uses only one grammatical number, so a Japanese translation would only use other. Polish uses three grammatical numbers, so it would contain one, few and many.

Secondly, you need the proper rules to choose the right pattern among multiple patterns. The Unicode Consortium's CLDR contains the rules in an XML file.
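
To make the two steps concrete, here is a small sketch that splits a multi-pattern string into its category/pattern pairs and then picks a pattern with a caller-supplied rule. The MultiPatternSketch class and its methods are hypothetical (this is not the API of any particular library), and it assumes the patterns themselves never contain a semicolon.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: MultiPatternSketch is a made-up name, not a real library API.
public static class MultiPatternSketch
{
    // Splits "one;pattern;other;pattern" into a category -> pattern map.
    // Assumes patterns do not contain ';'.
    public static Dictionary<string, string> Parse(string multiPattern)
    {
        string[] parts = multiPattern.Split(';');
        if (parts.Length % 2 != 0)
            throw new FormatException("Expected category;pattern pairs.");

        var patterns = new Dictionary<string, string>();
        for (int i = 0; i < parts.Length; i += 2)
            patterns[parts[i]] = parts[i + 1];
        return patterns;
    }

    // categoryOf is the language-specific rule (for example, one derived from CLDR)
    // that maps a count to a category name such as "one", "few" or "other".
    public static string Format(string multiPattern, long count, Func<long, string> categoryOf)
    {
        var patterns = Parse(multiPattern);
        // Fall back to "other" when the chosen category has no pattern.
        if (!patterns.TryGetValue(categoryOf(count), out string pattern))
            pattern = patterns["other"];
        return string.Format(pattern, count);
    }
}
```

Passing the English rule n => n == 1 ? "one" : "other" together with the multi-pattern string above yields "There is 1 item remaining" for 1 and "There are 3 items remaining" for 3. Keeping the rule as a separate delegate means the translated string and the language rule can vary independently.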

I have implemented an open source library that uses CLDR rules (converted from XML into C# code and included in the library) and multi-pattern strings to support both grammatical numbers and grammatical genders.

https://github.com/jaska45/I18N

Using this library, your sample turns into

var message = MultiPattern.Format("one;There is {0} item remaining;other;There are {0} items remaining", count);
Licensed under: CC-BY-SA with attribution