How can one find the Unicode codepoints that a font has glyphs for, on a Debian-based system?

https://stackoverflow.com/questions/15896493

02-04-2022

Question

From a scripting language (Python or Ruby, say) on a Debian-based system, I would like to find either one of:

1. All the Unicode codepoints that a particular font has glyphs for
2. All the fonts that have glyphs for a particular Unicode codepoint

(Obviously either 1 or 2 can be derived from the other, so whichever is easier would be great.) I have done this in the past by running:

fc-list : file charset

... and parsing the output at the end of each line, based on this code from fontconfig, but it seems to me that there ought to be a much simpler way of doing this.
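For reference, the parsing approach described above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions: it assumes a fontconfig version that prints each line as "<file>: charset=<hex ranges>" (for example "20-7e a0-17f"), which may not hold for older releases that used a different charset encoding. The function names parse_charset, parse_fc_list, fonts_for and run_fc_list are hypothetical helpers, not part of fontconfig.

```python
import subprocess


def parse_charset(spec):
    """Expand a string such as '20-7e a0 131' into a set of integer code points."""
    codepoints = set()
    for token in spec.split():
        start, _, end = token.partition("-")
        first = int(start, 16)
        last = int(end, 16) if end else first  # a bare token is a single code point
        codepoints.update(range(first, last + 1))
    return codepoints


def parse_fc_list(output):
    """Map each font file to its code points (assumed line format: 'path: charset=...')."""
    coverage = {}
    for line in output.splitlines():
        path, sep, spec = line.partition(": charset=")
        if sep:  # skip lines that do not match the assumed format
            coverage[path] = parse_charset(spec)
    return coverage


def fonts_for(coverage, codepoint):
    """Answer item 2 from item 1: which font files cover the given code point?"""
    return sorted(path for path, cps in coverage.items() if codepoint in cps)


def run_fc_list():
    """Run the command from the question and return its text output."""
    result = subprocess.run(
        ["fc-list", ":", "file", "charset"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Example use on a system with fontconfig installed:
#   coverage = parse_fc_list(run_fc_list())
#   print(fonts_for(coverage, 0x2603))
```

Keeping the parsing separate from the subprocess call means the parser can be checked against a captured sample line before it is trusted on a whole system's fonts.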

(I'm not completely sure this is the right StackExchange site for this question, but I am looking for an answer that can be used programmatically.)

Solution

I would try any of the FreeType 2 language bindings. Here's a Perl solution to list the Unicode code points of a font using Font::FreeType:

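use strict;
use warnings;
use Font::FreeType;
# Print each code point in the font's character map as hex, one per line.
Font::FreeType->new->face('DejaVuSans.ttf')->foreach_char(sub {
    printf("%04X\n", $_->char_code);
});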
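Since the question mentions Python, the same iteration might look like this with the freetype-py binding. This is only a sketch: it assumes freetype-py is installed and that its Face.get_chars() yields (character code, glyph index) pairs, possibly ending with a pair whose glyph index is 0. The helper names face_codepoints and load_codepoints are hypothetical.

```python
def face_codepoints(face):
    """Return the sorted code points of a face object exposing get_chars().

    Pairs with glyph index 0 are skipped: index 0 is the "missing glyph",
    and the iteration is assumed to end with such a pair.
    """
    return sorted(
        charcode
        for charcode, glyph_index in face.get_chars()
        if glyph_index != 0
    )


def load_codepoints(path):
    """Open a font file with freetype-py and list the code points it maps."""
    import freetype  # third-party binding, imported lazily (assumed installed)

    return face_codepoints(freetype.Face(path))


# Example use (the path is an assumption about where the font lives):
#   for cp in load_codepoints("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"):
#       print(f"{cp:04X}")
```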

OTHER TIPS

I've recently listed the mapping from Unicode codepoints to glyphs in a TTF using TTX/FontTools. That tool is written in Python, so it matches the Python tag in your post. The command

ttx -t cmap foo.ttf

will generate an XML file foo.ttx which describes that mapping, for various environments and encodings. See e.g. this reference for a description of what the platform and encoding identifiers actually mean. I assume that the package can be used as a library as well as a command line tool, but I have no experience there.
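On the library question, FontTools can be imported from Python, so the XML step can be skipped. The following is a minimal sketch, assuming the fonttools package is installed; it uses TTFont.getBestCmap(), which returns a dictionary from code point to glyph name for the font's preferred Unicode cmap subtable. The helper names font_codepoints and to_ranges are hypothetical.

```python
def font_codepoints(path):
    """Return the sorted code points mapped by the font's preferred Unicode cmap."""
    from fontTools.ttLib import TTFont  # third-party, imported lazily

    font = TTFont(path)
    try:
        cmap = font.getBestCmap() or {}  # None when the font has no Unicode cmap
        return sorted(cmap)
    finally:
        font.close()


def to_ranges(codepoints):
    """Collapse code points into inclusive (start, end) runs for compact display."""
    ranges = []
    for cp in sorted(set(codepoints)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)  # extend the current run
        else:
            ranges.append((cp, cp))  # start a new run
    return ranges


# Example use (foo.ttf is the placeholder file name from the answer):
#   for start, end in to_ranges(font_codepoints("foo.ttf")):
#       print(f"{start:04X}-{end:04X}")
```

getBestCmap() picks one subtable, so a font whose platform-specific subtables differ may map more code points than this reports; iterating over font["cmap"].tables would show each subtable separately.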

Licensed under: CC-BY-SA with attribution
