Charset conversion with ICU or iconv

https://stackoverflow.com/questions/8198860

04-03-2021

Question

In my CGI library, I use a converter to convert from an IANA-registered charset to the platform's native wide Unicode (UTF-16 or UTF-32, depending on the platform). With ICU, are all the charsets and aliases listed at http://www.iana.org/assignments/character-sets accepted as input to ucnv_open, or is a manual cross-reference table needed, as with iconv, where I basically map each alias to the corresponding iconv encoding name? Nice and simple as iconv is, it requires a table that maps each preferred MIME name and alias to one of iconv's built-in names (including mapping the ISO-8859-x-E / ISO-8859-x-I bidi variants to their "standard" ISO-8859-x form).

Alternatively, is there a way to force all form input to arrive as ISO-8859-x or UTF-8, to reduce the amount of conversion work needed?

See RFC 1556 (Handling of Bi-directional Texts in MIME).
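The iconv approach described in the question can be sketched roughly as follows: a small table rewrites IANA aliases that the local iconv may not recognise before calling iconv_open, and the input is then converted to wchar_t. This is a minimal sketch assuming glibc or GNU libiconv, where "WCHAR_T" names the platform's wide encoding. The table entries are illustrative only, and map_charset and to_wide are hypothetical helpers, not part of any library. The RFC 1556 bidi names use the same byte-to-character mapping as their base charset (the -I/-E suffix only signals implicit or explicit directionality), which is why they can be folded onto ISO-8859-6 and ISO-8859-8. This sketch does not answer the ICU part of the question: ucnv_open resolves names through ICU's own alias table, so check which IANA names it covers rather than relying on a table like this one.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <wchar.h>

/* Hypothetical alias table: IANA names the local iconv might not know,
 * mapped to names it does. Entries are examples, not a complete list. */
static const struct { const char *iana; const char *iconv_name; } alias_map[] = {
    { "ISO-8859-6-E", "ISO-8859-6" },
    { "ISO-8859-6-I", "ISO-8859-6" },
    { "ISO-8859-8-E", "ISO-8859-8" },
    { "ISO-8859-8-I", "ISO-8859-8" },
    { "csISOLatin1",  "ISO-8859-1" },
};

/* Charset names are case-insensitive; unknown names pass through as-is. */
static const char *map_charset(const char *name)
{
    for (size_t i = 0; i < sizeof alias_map / sizeof alias_map[0]; i++)
        if (strcasecmp(name, alias_map[i].iana) == 0)
            return alias_map[i].iconv_name;
    return name;
}

/* Convert inlen bytes in `charset` to a malloc'd, NUL-terminated wchar_t
 * string. Returns NULL on an unknown charset, bad input or out of memory. */
static wchar_t *to_wide(const char *charset, const char *in, size_t inlen)
{
    iconv_t cd = iconv_open("WCHAR_T", map_charset(charset));
    if (cd == (iconv_t)-1)
        return NULL;

    size_t cap = inlen + 1;                /* capacity in wchar_t units */
    wchar_t *buf = malloc(cap * sizeof *buf);
    char *inp = (char *)in;                /* glibc's iconv takes char ** */
    size_t inleft = inlen;
    size_t used = 0;                       /* output bytes written so far */

    while (buf) {
        char *outp = (char *)buf + used;
        /* Always keep one wchar_t free for the terminating NUL. */
        size_t outleft = cap * sizeof *buf - used - sizeof *buf;
        size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
        if (r != (size_t)-1)
            r = iconv(cd, NULL, NULL, &outp, &outleft); /* flush shift state */
        used = (size_t)(outp - (char *)buf);
        if (r != (size_t)-1)
            break;                         /* all input converted */
        if (errno != E2BIG) {              /* EILSEQ or EINVAL: bad input */
            free(buf);
            buf = NULL;
            break;
        }
        cap *= 2;                          /* output full: grow and retry */
        wchar_t *grown = realloc(buf, cap * sizeof *buf);
        if (!grown)
            free(buf);
        buf = grown;
    }
    iconv_close(cd);
    if (buf)
        buf[used / sizeof *buf] = L'\0';
    return buf;
}
```

With this in place, a request declaring `charset=ISO-8859-8-I` is decoded exactly like plain ISO-8859-8, while names missing from the table are handed to iconv unchanged and simply fail at iconv_open if iconv does not know them either.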

Solution

After RTFMing the HTML spec: setting accept-charset="..." on the form does the job, so this conversion layer is unnecessary overhead.
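For a C CGI program, applying this answer could look roughly like the sketch below: the page is served as UTF-8 and the form carries accept-charset="UTF-8", so the browser is asked to submit the fields as UTF-8. The helper name emit_form, the field name q, and the form layout are hypothetical, and `action` is assumed to be a trusted, already-escaped URL. As far as I know, the current HTML Living Standard only permits "UTF-8" as the value of accept-charset, so UTF-8 is the safe choice here.

```c
#include <stdio.h>

/* Hypothetical CGI helper: writes the response header and a minimal form
 * into buf. Declaring UTF-8 in the Content-Type header, the <meta> tag and
 * accept-charset asks the browser to submit the form fields as UTF-8.
 * Returns the snprintf result; a value >= n means the output was cut off. */
static int emit_form(char *buf, size_t n, const char *action)
{
    return snprintf(buf, n,
        "Content-Type: text/html; charset=UTF-8\n"
        "\n"
        "<!DOCTYPE html>\n"
        "<html><head><meta charset=\"UTF-8\"><title>Form</title></head><body>\n"
        "<form method=\"post\" action=\"%s\" accept-charset=\"UTF-8\">\n"
        "  <input name=\"q\">\n"
        "  <input type=\"submit\">\n"
        "</form>\n"
        "</body></html>\n",
        action);
}
```

With the input pinned to UTF-8, the server side needs only one conversion (for example iconv_open("WCHAR_T", "UTF-8") on glibc). It should still reject invalid byte sequences, since nothing stops a client from posting arbitrary bytes.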

Licensed under: CC-BY-SA with attribution
