Question

I have the following piece of code, whose accompanying comment says it converts any character greater than 0x7F to UTF-8. I have two questions about this code:

if ((const unsigned char)c > 0x7F)
{
    Buffer[0] = 0xC0 | ((unsigned char)c >> 6);
    Buffer[1] = 0x80 | ((unsigned char)c & 0x3F);
    return Buffer;
}
1. How does this code work?
2. Does the current Windows code page I am using have any effect on the character placed in Buffer?

Solution

For starters, the code doesn't work in general. By coincidence, it works if the encoding of char (or unsigned char) is ISO-8859-1, because ISO-8859-1 has the same code points as the first 256 Unicode code points. But ISO-8859-1 has largely been superseded by ISO-8859-15, so it probably won't do what you want. (Try it with 0xA4, for example, which is the Euro sign in ISO-8859-15: the code will give you a completely different character.)
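
To see the failure concretely, here is a minimal sketch (the choice of byte value 0xA4 and the hex printout are only for demonstration). Applying the same bit manipulation to 0xA4 always produces the UTF-8 encoding of U+00A4 (the currency sign), which is wrong if the byte was meant as the ISO-8859-15 Euro sign (U+20AC, UTF-8 0xE2 0x82 0xAC):

#include <cstdio>

int main()
{
    unsigned char c = 0xA4;         // Euro sign in ISO-8859-15, currency sign in ISO-8859-1
    unsigned char buffer[2];
    buffer[0] = 0xC0 | (c >> 6);    // 0xC2
    buffer[1] = 0x80 | (c & 0x3F);  // 0xA4
    // Prints "C2 A4", the UTF-8 encoding of U+00A4 (currency sign),
    // not "E2 82 AC", the UTF-8 encoding of U+20AC (Euro sign).
    std::printf("%02X %02X\n", buffer[0], buffer[1]);
    return 0;
}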

There are two correct ways to do this conversion, both of which depend on knowing the encoding of the byte being entered (which means you may need several versions of the code, depending on the encoding). The simplest is to have an array of 256 strings, one per character, and index into it; in that case, you don't need the if at all. The other is to translate the byte into a Unicode code point (32-bit UTF-32), and then translate that into UTF-8, which can require more than two bytes for some characters: the Euro character is U+20AC, which encodes as 0xE2, 0x82, 0xAC. A sketch of that second step follows.
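
This is a minimal sketch of the code-point-to-UTF-8 step, assuming the input byte has already been mapped to a Unicode code point; the function name encodeUtf8 and the std::string return type are my own choices, not part of the original answer:

#include <string>

// Encode a single Unicode code point (UTF-32) as UTF-8.
// Values above U+10FFFF are not valid Unicode and are ignored here.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {      // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x110000) {     // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

For ISO-8859-15 input, the byte 0xA4 must first be mapped to U+20AC; encodeUtf8(0x20AC) then yields the three bytes 0xE2 0x82 0xAC. The mapping from bytes to code points is exactly the part that depends on the source encoding (or, on Windows, the code page), which is why a per-encoding table or conversion is unavoidable.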

EDIT:

For a good introduction to UTF-8, see http://www.cl.cam.ac.uk/~mgk25/unicode.html. The title says it is for Unix/Linux, but there is very little, if any, system-specific information in it (and such information is clearly marked).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow