Question

NOTE: I accepted an answer because it is helpful and I don't expect to get another one any time soon, but the question is still not completely answered so I may award a bounty to anyone who does. I guess what I'm looking for is a kind of flowchart which decides whether a given font supports a given language.

I am trying to put together a set of fonts. I need to know which fonts can be used for which languages.

I have a rough knowledge of character sets (Latin, Cyrillic, Arabic) but not enough to classify, for example, Polish diacritics into the scheme of things.

I guess there are two ways to approach my problem:

  1. Have a test character set for each language.
  2. Have a cheat-sheet which says "such-and-such language requires Latin Extended B" and a tech trick to check whether a given font contains those glyphs.
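
As a rough sketch of option 1, the check can be reduced to set arithmetic: read each font's supported code points once, then compare every language's test string against that set. The test strings and the language codes below are illustrative assumptions, not complete alphabets. The "font" here is a made-up set of code points; for a real font file the set could be obtained, for example, from the keys of `TTFont(path).getBestCmap()` in the third-party fontTools library.

```python
# Hypothetical test strings: letters each language needs beyond plain ASCII.
# These are illustrative samples, not complete or authoritative alphabets.
TEST_CHARS = {
    "pl": "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ",
    "de": "äöüßÄÖÜ",
    "uk": "ґєіїҐЄІЇ",
}


def missing_chars(font_codepoints, sample):
    """Return the characters of `sample` that the font has no mapping for."""
    return {ch for ch in sample if ord(ch) not in font_codepoints}


def supported_languages(font_codepoints, test_chars=TEST_CHARS):
    """Return language codes whose whole test string is covered by the font."""
    return sorted(
        lang
        for lang, sample in test_chars.items()
        if not missing_chars(font_codepoints, sample)
    )


# A made-up "font" that covers only U+0020..U+00FF
# (Basic Latin plus Latin-1 Supplement).
latin1_font = set(range(0x20, 0x100))

print(supported_languages(latin1_font))
print(sorted(missing_chars(latin1_font, TEST_CHARS["pl"])))
```

A code point being present only shows that the font has some glyph for it. For scripts that need shaping, such as Arabic or the Indic scripts, this is a necessary condition rather than proof that text will render well.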

I don't have good resources for option 2. I'm looking for a labour-saving solution. The eventual number of fonts and languages is unknown at this point and I don't want an O(M*N) task. I will probably have to perform option 1 as a verification step, but I want to reduce the search space first.

Can anybody show me how to group languages by character set?
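
One possible way to approach the grouping is to derive the Unicode blocks that each language's sample text touches and bucket languages by that set. The sketch below assumes a small hand-written table of block ranges (only a handful of blocks are listed) and hypothetical sample strings; the function names are made up for illustration.

```python
from collections import defaultdict

# A few Unicode block ranges (inclusive). Only a small subset is listed here;
# the full list is published in the Unicode Character Database (Blocks.txt).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0600, 0x06FF, "Arabic"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
]


def block_of(ch):
    """Return the name of the block containing `ch`, or "Other" if unlisted."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def blocks_needed(sample):
    """Return the set of blocks touched by the characters in `sample`."""
    return frozenset(block_of(ch) for ch in sample)


def group_by_blocks(samples):
    """Group language codes that need exactly the same set of blocks."""
    groups = defaultdict(list)
    for lang, sample in samples.items():
        groups[blocks_needed(sample)].append(lang)
    return dict(groups)


# Hypothetical samples: the non-ASCII letters of each language.
SAMPLES = {
    "pl": "ąćęłńóśźż",
    "cs": "áčďéěíňóřšťúůýž",
    "ru": "бгджзийлпфцчшщъыьэюя",
    "uk": "ґєії",
}

for blocks, langs in group_by_blocks(SAMPLES).items():
    print(sorted(blocks), langs)
```

With these samples, the Polish diacritics fall into Latin-1 Supplement (ó) and Latin Extended-A (the rest), the same pair of blocks as Czech. Block-level grouping is coarse, though: Russian and Ukrainian both land in "Cyrillic", yet a font drawn only for Russian may still lack ґ, which is why a per-language character check remains useful as a verification step.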

Are there any gotchas I should know about?


Solution

Nice question, though it's a bit generic.

Requirement number 0 is, as @Paweł Dylda says, to use UTF-8 for everything. If you already understand that it's a must these days, then it's fine; for some reason a lot of people still don't.

Another meta-tip is to have your application know very clearly which language it is displaying. For example, in HTML, use the lang and dir attributes everywhere. If it's not HTML, keep some kind of global setting that tells the application that it's displaying language X, the preferred font for which is Y. You may also need a clear separation between the language of the user interface and the language of the content. (To see an example of that, go to English Wikipedia, open an account, then go to the preferences and choose French as your language - and you'll see articles in English and menus in French. Quite a lot of people find this comfortable, and it's not hard at all to implement.)
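
A minimal sketch of that idea, assuming a hand-maintained table that maps each language code to a text direction and a preferred font. All names here (`LanguageProfile`, `DisplayContext`, `profile_for`, `html_attrs`) are hypothetical, and the font assignments are placeholder choices rather than recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageProfile:
    lang: str        # language code, as used in the HTML lang attribute
    direction: str   # "ltr" or "rtl", as used in the HTML dir attribute
    font: str        # preferred font family for this language


DEFAULT_FONT = "Gentium"  # placeholder fallback font

# Placeholder table; the font choices are assumptions for illustration.
PROFILES = {
    "en": LanguageProfile("en", "ltr", "Gentium"),
    "fr": LanguageProfile("fr", "ltr", "Gentium"),
    "ur": LanguageProfile("ur", "rtl", "Nafees"),
}


def profile_for(lang):
    """Look up a language, falling back to a left-to-right default profile."""
    return PROFILES.get(lang, LanguageProfile(lang, "ltr", DEFAULT_FONT))


@dataclass
class DisplayContext:
    """Keeps the interface language separate from the content language."""
    ui_lang: str
    content_lang: str

    def ui_profile(self):
        return profile_for(self.ui_lang)

    def content_profile(self):
        return profile_for(self.content_lang)


def html_attrs(profile):
    """Render the lang and dir attributes for an HTML element."""
    return f'lang="{profile.lang}" dir="{profile.direction}"'


# French menus around Urdu content.
ctx = DisplayContext(ui_lang="fr", content_lang="ur")
print(html_attrs(ctx.ui_profile()), ctx.ui_profile().font)
print(html_attrs(ctx.content_profile()), ctx.content_profile().font)
```

Keeping the two languages as separate fields means the menus and the content can each get their own lang, dir and font without affecting the other.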

Then you need to understand which languages you are targeting. If you want to target all of them, that's really great, though it may be challenging.

For Latin, you most likely don't need to break languages up into groups like "Western", "Eastern European", "Southern European", "Turkish", "Vietnamese" etc. Common web browsers and word processors do this, but the approach is very outdated. You can find a good font that covers German, French, Polish, Turkish, and even Vietnamese and African languages, which use a lot of diacritics. Try SIL's Gentium and Doulos, and also GNU FreeFont. They are all free.

Theoretically, the same could be said for the Arabic script, but apparently, Arabic, Persian and Urdu have somewhat different requirements, even though they all use the same writing system. In general, you may have to use a generic Arabic font for the Arabic language, and provide a different font for Urdu (for example, Nafees). Here, too, test with native speakers.

For Cyrillic, do your best to use a font that includes not just Russian letters, but also Ukrainian, Serbian and Kazakh ones, because the countries where these languages are spoken require good support for them. That's the bare minimum for Cyrillic, but you would do yourself and a lot of other people a favor by finding a font that also supports other languages of Russia such as Sakha and Abkhazian. GNU FreeFont may serve you here, too, but yet again - test it with people.

The languages of India are a big challenge - there are a lot of writing systems and fonts there. The good news is that Linux distributions such as Fedora and Ubuntu include fonts for most of them, and they are free for reuse in other applications. The Lohit family covers most languages of India; take a look also at Meera and Rachana for Malayalam.

I don't know much about South East Asian languages like Thai, Burmese, Khmer and Lao, but do try to support these, as well. Most operating systems since Windows NT 4 support Thai well, but support for the other languages is very patchy, so assume that the OS doesn't help you here.

My last tip is to take a look at the jquery.webfonts library, and the corresponding MediaWiki extension "Universal Language Selector" (a.k.a. ULS):

  * https://github.com/wikimedia/jquery.webfonts
  * https://www.mediawiki.org/wiki/Extension:UniversalLanguageSelector

It offers portable technology for easy addition of webfonts to your web application. If you aren't developing a web application, or cannot use the library for other reasons, you can still take the fonts that are found in the repository there - they cover a lot of languages and they are all free.

(Disclaimer: I am involved in developing these libraries.)

OTHER TIPS

I found the Cyberbit font, which covers a lot of languages.

Bitstream Cyberbit is a professionally-designed large Unicode font which provides coverage of many major scripts, including Latin, extended Latin, Greek, Russian, Hebrew, Arabic, Thai, Japanese (Hiragana, Katakana, and Kanji), Korean, and Chinese Hanzi (ideographs).

Here's the link: cyberbit font

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow