How can I determine font and layout information for a unicode character?

https://stackoverflow.com/questions/22994848

01-07-2023

Question

I want to render Unicode characters in an application, and I have a rough idea of how to do that for standard Latin characters with FreeType. However, for other languages that have different layouts and shaping, I'm not sure how to go from a set of characters in a UTF-8 encoded string to:

1. Picking a suitable font to display the characters
2. Picking the right layout for the characters (LTR, RTL, TTB)

Is this data contained in the Unicode characters themselves? (I'm not sure how else applications like web browsers would figure out how to render text.)

For a given Unicode character, how can I determine points 1 and 2? FreeType has some great documentation and talks quite a bit about using different layouts, but I didn't see how you would go about extracting that information from the characters themselves.

I also took a quick look at HarfBuzz but couldn't really find any documentation. There's an example floating around that shows how to set up and use HarfBuzz to lay out some languages with FreeType rendering the glyphs, but the example explicitly passes layout, font and language information to HarfBuzz.

What do you do when you don't know those things in advance?

This is for a mobile application, and ideally the libs/solutions used would have a permissive license.

Solution

A Unicode code point encodes only the character itself; it carries no information about which font to use, how to lay the text out, or anything else. For layout, Unicode provides a number of data files, such as UnicodeData.txt, which you can download and use; among other properties, that file lists each character's bidirectional class, which indicates whether the character is left-to-right or right-to-left. As for fonts, each font should provide descriptor files of some sort, with things like the width, height and depth of each character; these files can also be used to determine which characters the font supports.
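
As a rough illustration of looking up layout information from the Unicode data, here is a minimal Python sketch. It relies on the standard `unicodedata` module, which exposes properties derived from UnicodeData.txt. The helper names `base_direction` and `describe` are made up for this example, and the "first strong character" rule is a simplified heuristic, not the full Unicode Bidirectional Algorithm. It says nothing about font selection or vertical (TTB) layout.

```python
import unicodedata

# Strong bidirectional classes: "L" is left-to-right,
# "R" (e.g. Hebrew) and "AL" (Arabic letters) are right-to-left.
RTL_CLASSES = {"R", "AL"}


def base_direction(text):
    """Guess the base direction from the first strongly directional character.

    Hypothetical helper: a simplified first-strong heuristic only.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in RTL_CLASSES:
            return "rtl"
    # Assumption: fall back to left-to-right when no strong character exists.
    return "ltr"


def describe(ch):
    """Return (code point, character name, bidirectional class) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.bidirectional(ch),
    )


if __name__ == "__main__":
    for sample in ["hello", "\u05e9\u05dc\u05d5\u05dd", "123 \u0645\u0631\u062d\u0628\u0627"]:
        print(base_direction(sample), [describe(c) for c in sample[:2]])
```

Digits and spaces have weak or neutral classes, so the sketch skips them and keeps scanning until it finds a strong character. In a real application a shaping library would normally be given the direction found this way, rather than having it hard-coded.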

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow