Question

Where does glibc get its database of Unicode attributes for functions such as wcwidth()? I'd like to correct a few errant entries, but I can't find where this information lives in the source distribution.

If it matters, I'm primarily interested in this under Debian or Ubuntu Linux.


Solution

I'm only poking around myself, so I'm not absolutely sure, but the table you are looking for appears to be in the following location relative to the glibc root:

localedata/locales/i18n

This appears to be the Unicode (version 5) locale. It contains the following, which is where I believe you need to make your changes:

% ENCLOSED ALPHANUMERICS/
   <U24D0>..<U24E9>;/
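
To see exactly which code points an entry like this covers, the range can be expanded with a few lines of Python. This is only a sketch: the helper name expand_entry is made up for illustration, and it handles just the two forms visible here (a single <Uxxxx> symbol and a <Uxxxx>..<Uxxxx> range), not any other syntax the locale files may use.

```python
import re

# Matches "<U24D0>" or "<U24D0>..<U24E9>"; the digits are hexadecimal.
_ENTRY = re.compile(r"<U([0-9A-Fa-f]+)>(?:\.\.<U([0-9A-Fa-f]+)>)?")


def expand_entry(line):
    """Return the code points named on one line of a locale definition file.

    Hypothetical helper: only plain symbols and simple ".." ranges are handled.
    """
    code_points = []
    for match in _ENTRY.finditer(line):
        first = int(match.group(1), 16)
        # A lone symbol is treated as a range of length one.
        last = int(match.group(2), 16) if match.group(2) else first
        code_points.extend(range(first, last + 1))
    return code_points


if __name__ == "__main__":
    points = expand_entry("   <U24D0>..<U24E9>;/")
    print(len(points), hex(points[0]), hex(points[-1]))
```

For the excerpt shown, this gives 26 code points, U+24D0 through U+24E9.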

In case you're wondering, the function ctype_output (in ld-ctype.c) calls allocate_arrays, which calls wcwidth_table_init. The function wcwidth_table_init is generated by 3level.h (which also generates other tables that follow the same template). This is the chain I followed to track down the files in localedata/locales.

Like I said, I'm not 100% sure that this is the right table, but I thought I'd share what I had found.
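
One way to check whether an edit actually took effect (after rebuilding and installing the locale) is to ask the C library directly what wcwidth() returns for the character in question. The sketch below does this from Python through ctypes. It assumes a Linux system where ctypes.CDLL(None) exposes the C library's symbols, and the function name libc_wcwidth is just an illustrative wrapper, not part of glibc.

```python
import ctypes
import locale


def libc_wcwidth(char):
    """Ask the running C library for the column width of one character."""
    # Assumption: on Linux, CDLL(None) gives access to the already-loaded libc.
    libc = ctypes.CDLL(None)
    libc.wcwidth.argtypes = [ctypes.c_wchar]
    libc.wcwidth.restype = ctypes.c_int
    return libc.wcwidth(char)


if __name__ == "__main__":
    # wcwidth() consults LC_CTYPE, so pick up the locale from the environment.
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass  # fall back to whatever locale is already active
    for ch in ("A", "\u24d0"):
        print("U+%04X -> %d" % (ord(ch), libc_wcwidth(ch)))
```

Running it before and after a change, under the same locale, should show whether the width reported for the edited code point moved.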

OTHER TIPS

It looks like the data is generated by localedata/gen-unicode-ctype.c (apparently run manually) from the Unicode data files published at http://unicode.org/Public/UNIDATA/. Thanks to Naaff for pointing me in the right direction!
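
For anyone comparing the generated tables against the upstream data: UnicodeData.txt in that directory is a semicolon-separated file with one record per line, where the first field is the code point in hex, the second is the character name and the third is the general category. A minimal sketch of reading one record follows. The function name is made up, and this is not a description of how gen-unicode-ctype.c itself is written.

```python
def parse_unicode_data_line(line):
    """Split one UnicodeData.txt record into (code point, name, category)."""
    fields = line.rstrip("\n").split(";")
    return int(fields[0], 16), fields[1], fields[2]


if __name__ == "__main__":
    # Sample record in the UnicodeData.txt format, for U+24D0.
    sample = "24D0;CIRCLED LATIN SMALL LETTER A;So;0;L;<circle> 0061;;;;N;;;24B6;;24B6"
    print(parse_unicode_data_line(sample))
```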

I believe that it's defined in the locale definition file. See this page for more information about locales. glibc includes a bunch of locale definitions in localedata/locales, although none of them seem to have any width information.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow