Finding UnicodeBlock set for a given Locale

https://stackoverflow.com/questions/7216942

14-01-2021
Question

I'm trying to figure out how to get the set of Character.UnicodeBlock values for a given Locale, since different languages need different characters.

What I want is a String containing every character needed to write a specific language. I can then use this String to precompute a set of OpenGL textures from a TrueType font file, so that I can easily render text in any language.

Precaching every single character and having around 1000000 textures is of course not an option.

Does anyone have an idea? Or does anyone see a flaw in this procedure?

Solution

It's not as simple as that. Text in most European languages can be written with a simple set of precomposed Unicode characters, but for many more complex scripts you need to handle combining characters. This starts fairly easily with combining accents for Western alphabets, progresses through Arabic letters that are context-sensitive (they have different shapes depending on whether they are first, last, or in the middle of a word), and ends with the utter madness that is found in many Indic scripts.
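To make the precomposed versus combining distinction concrete, here is a minimal Java sketch using the standard `java.text.Normalizer`. The class and helper names (`ComposedVsCombining`, `decompose`, `compose`, `codePoints`) are made up for this illustration. It only shows the easy end of the problem, Western accents, not contextual shaping or Indic scripts.

```java
import java.text.Normalizer;

class ComposedVsCombining {
    // Split precomposed characters into base letter + combining marks (NFD).
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Merge base letter + combining marks into precomposed characters where one exists (NFC).
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Number of Unicode code points in s (not UTF-16 chars).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";            // "é" as a single code point
        String decomposed = decompose(precomposed); // "e" followed by U+0301 COMBINING ACUTE ACCENT

        System.out.println(codePoints(precomposed)); // 1
        System.out.println(codePoints(decomposed));  // 2

        // "x" with an acute accent has no precomposed form, so NFC cannot
        // collapse it: a renderer must position the accent glyph itself.
        String xAcute = compose("x\u0301");
        System.out.println(codePoints(xAcute));      // 2
    }
}
```

A texture cache with one entry per character covers the first case but not the last one: the accent has to be placed over whatever base letter precedes it.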

The Unicode Standard has chapters about the intricacies involved in rendering the various scripts it can encode. Just sample, for example, the description of Tibetan early in chapter 10, and if that doesn't scare you away, flip back to Devanagari in chapter 9. You will quickly drop your ambition of being able to "write text in any language". Doing so correctly requires specialized rendering software, written by experts deeply familiar with the scripts in question.
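For the limited case mentioned earlier, languages that can be written with precomposed characters only, a rough sketch of building the character String from a list of blocks might look like the following. Java has no built-in mapping from a Locale to Character.UnicodeBlock values, so the block list here is a hand-picked assumption, and `BlockCharacters` and `charactersIn` are hypothetical names. This does not address any of the rendering issues described above.

```java
import java.lang.Character.UnicodeBlock;
import java.util.Arrays;
import java.util.List;

class BlockCharacters {
    // Collect every defined, non-control code point that belongs to one of the given blocks.
    static String charactersIn(List<UnicodeBlock> blocks) {
        StringBuilder sb = new StringBuilder();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (!Character.isDefined(cp) || Character.isISOControl(cp)) {
                continue; // skip unassigned code points and control characters
            }
            UnicodeBlock block = UnicodeBlock.of(cp); // null if cp is in no block
            if (block != null && blocks.contains(block)) {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: a hand-maintained block list for French, not derived from a Locale.
        // LATIN_EXTENDED_A is included because "œ" (U+0153) lives there.
        List<UnicodeBlock> french = Arrays.asList(
                UnicodeBlock.BASIC_LATIN,
                UnicodeBlock.LATIN_1_SUPPLEMENT,
                UnicodeBlock.LATIN_EXTENDED_A);

        String chars = charactersIn(french);
        System.out.println(chars.codePointCount(0, chars.length()) + " code points to pre-render");
    }
}
```

Even here the block list is a coarse approximation: it pulls in many characters the language never uses, and it is easy to miss a block that holds one required letter.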

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow