Question

I would like to print a string to the screen regardless of its encoding (UTF-8, UTF-16, or UTF-32). The string is held in a char array, so printing must not stop at null bytes but carry on writing to stdout; this rules out printf and friends.

char text[] = { 0x00, 0x55, 0x00, 0x6E, 0x00, 0x69, 0x00, 0x63, 0x00, 0x6F, 0x00, 0x64, 0x00, 0x65 };

fwrite( text, sizeof(char), sizeof(text), stdout );

To this end I've chosen the above solution, which lets me print all UTF encoding formats. I understand that certain terminals will not display the characters correctly, but that is not my concern, as it's a configurable option outside of the application.

My application has a setting for which message catalogue to load (en_EN.UTF-8, etc.); however, I want to avoid doing string conversion in the code based on the currently selected locale.

Could I please get a review on this approach before I let it go live?


Solution

You can't do that. When you deal with text, the encoding matters a great deal: a terminal interprets the bytes it receives in one encoding, so you must convert your text to that encoding.
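As a rough illustration of what such a conversion involves, here is a minimal sketch that turns the big-endian UTF-16 bytes from the question into UTF-8. It assumes the terminal expects UTF-8, which the question does not state, and the function names utf8_encode and utf16be_to_utf8 are hypothetical. A production program would more likely rely on an existing conversion library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
 * Returns the number of bytes written (1 to 4). */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: convert UTF-16BE bytes to UTF-8.
 * Returns the number of bytes written, or (size_t)-1 on malformed
 * input or when fewer than 4 bytes of output space remain before
 * a code point is encoded (a deliberately conservative check). */
size_t utf16be_to_utf8(const unsigned char *in, size_t in_len,
                       unsigned char *out, size_t out_cap)
{
    size_t i = 0, n = 0;

    assert(in != NULL && out != NULL);
    if (in_len % 2 != 0)
        return (size_t)-1;              /* UTF-16 comes in 2-byte units */

    while (i < in_len) {
        uint32_t cp = ((uint32_t)in[i] << 8) | in[i + 1];
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) {         /* high surrogate */
            uint32_t lo;
            if (i + 2 > in_len)
                return (size_t)-1;                  /* truncated pair */
            lo = ((uint32_t)in[i] << 8) | in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;                  /* not a low surrogate */
            i += 2;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                      /* stray low surrogate */
        }

        if (out_cap - n < 4)
            return (size_t)-1;                      /* output too small */
        n += utf8_encode(cp, out + n);
    }
    return n;
}
```

On success, the converted buffer could then be written with fwrite( out, 1, n, stdout ), much as in the question, except that the bytes now match what a UTF-8 terminal expects.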

It is also a bad idea to keep this data in a char array; you should use a byte array instead:

  • If byte is not already defined in some header, define (or typedef) it as unsigned char. Plain char can be signed or unsigned depending on the compiler, so values of 0x80 and above may turn negative and surprise you.
  • It is more readable, because it makes the intent clear. When I see byte, I expect a bunch of bytes. When I see char, I expect plain text (which is obviously not what you have here).
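To make the signedness point concrete, here is a small sketch of a hex dump routine built on a byte typedef. The function name bytes_to_hex is hypothetical. Because byte is unsigned, a value such as 0xE9 stays at 233 and is a safe table index after shifting; with a signed plain char the same bit pattern would be negative and the table lookup would go out of bounds.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;   /* always in the range 0..255 */

/* Hypothetical helper: write src as uppercase hex digits into dst,
 * followed by a terminating NUL. Returns the number of hex digits
 * written, or 0 if dst is too small (dst is left untouched then). */
size_t bytes_to_hex(const byte *src, size_t len, char *dst, size_t dst_cap)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i;

    assert(src != NULL && dst != NULL);
    if (dst_cap < 2 * len + 1)
        return 0;

    for (i = 0; i < len; i++) {
        /* src[i] is never negative, so both indexes are in 0..15 */
        dst[2 * i]     = digits[src[i] >> 4];
        dst[2 * i + 1] = digits[src[i] & 0x0F];
    }
    dst[2 * len] = '\0';
    return 2 * len;
}
```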

OTHER TIPS

What if you define the char array in big-endian byte order and the terminal expects little-endian, or vice versa? I also think you can't avoid conversion when going from a char array to UTF, if only because of endianness. It is also reasonable to define some types:

#include <stdint.h>  /* fixed-width types: short and int do not guarantee 16 and 32 bits */

typedef uint8_t  utf8char;
typedef uint16_t utf16char;
typedef uint32_t utf32char;

And

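/* Prefixed names: BIG_ENDIAN and LITTLE_ENDIAN are already macros in some system headers. */
typedef enum {
   CHAR_BIG_ENDIAN,
   CHAR_LITTLE_ENDIAN
} CHAR_ENDIANNESS;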

That way the conversion to UTF becomes more transparent, debugging gets easier, and the code is easier to maintain.
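As a sketch of how such types might be used, the following reads UTF-16 code units from a byte buffer with the byte order stated explicitly. The function names read_utf16_unit and decode_utf16 are hypothetical, and the CHAR_ prefix on the enumerators is an assumption made here to avoid clashing with the BIG_ENDIAN and LITTLE_ENDIAN macros that some system headers define.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t utf16char;

typedef enum {
    CHAR_BIG_ENDIAN,
    CHAR_LITTLE_ENDIAN
} CHAR_ENDIANNESS;

/* Hypothetical helper: assemble one UTF-16 code unit from two bytes
 * in the stated byte order, independent of the host's endianness. */
utf16char read_utf16_unit(const unsigned char *p, CHAR_ENDIANNESS order)
{
    assert(p != NULL);
    if (order == CHAR_BIG_ENDIAN)
        return (utf16char)((p[0] << 8) | p[1]);
    return (utf16char)((p[1] << 8) | p[0]);
}

/* Hypothetical helper: decode len bytes into code units.
 * Returns the number of units written, or 0 if len is odd
 * or dst cannot hold len / 2 units. */
size_t decode_utf16(const unsigned char *src, size_t len,
                    CHAR_ENDIANNESS order,
                    utf16char *dst, size_t dst_cap)
{
    size_t i, units = len / 2;

    assert(src != NULL && dst != NULL);
    if (len % 2 != 0 || units > dst_cap)
        return 0;

    for (i = 0; i < units; i++)
        dst[i] = read_utf16_unit(src + 2 * i, order);
    return units;
}
```

The same two bytes 0x00 0x55 decode to U+0055 ("U") when read as big-endian and to U+5500 when read as little-endian, which is why the byte order has to be known before anything is printed.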

Licensed under: CC-BY-SA with attribution