Question

Background:

I'm working on the legacy code of a web application and am currently converting some of the ASCII parts of the code to Unicode. I've run into the following bug in the logger: it seems that string literals are either created incorrectly or corrupted somewhere along the way.

For example, take the following string: "%s::%s - Started with success." In memory it looks like this:

02AF9BFC  25 00 73 00 3A 00 3A 00  %.s.:.:.
02AF9C04  25 00 73 00 20 00 2D 00  %.s. .-.
02AF9C0C  20 00 53 00 74 00 61 00   .S.t.a.
02AF9C14  72 00 74 00 65 00 64 00  r.t.e.d.
02AF9C1C  20 00 77 00 69 00 74 00   .w.i.t.
02AF9C24  68 00 20 00 73 00 75 00  h. .s.u.
02AF9C2C  63 00 63 00 65 00 73 00  c.c.e.s.
02AF9C34  73 00 2E 00 00 00 00 00  s.......
02AF9C3C  00 00 00 00 00 00 00 00  ........

In the log, the string looks like this: -_S_t_a_r_t_e_d_ _w_i_t_h _s_u_c_c_e_s_s, where spaces are shown as usual and the NUL character is shown as _ (the _ is only an example; different text editors display it differently).

I do use the _T macro, which makes the string literal Unicode, from what I learned here.

Why do I get the byte 0 prefix?


Solution

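In Microsoft's terminology, "Unicode" means UTF-16, i.e. each character is represented by either one or two 16-bit code units. When an ASCII character is converted to UTF-16, it is represented as a single code unit with the high byte zero and the low byte containing the ASCII character. Windows stores these code units in little-endian order, so in memory the ASCII byte comes first and the zero byte follows it, which is exactly the 25 00 73 00 pattern in your dump. The strings are not corrupted; the logger is writing the UTF-16 bytes as they are, and your editor is reading them as single-byte text.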
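To see where the zero bytes come from, here is a minimal portable sketch. The helper name hex_dump_utf16le is hypothetical (it is not part of any library), and it uses char16_t rather than the Windows TCHAR/wchar_t types so that it compiles anywhere. It lists each code unit low byte first, which is the layout assumed for x86 Windows.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: lists the bytes of a UTF-16 string in little-endian
// order (low byte first), the layout assumed here for x86 Windows memory.
std::string hex_dump_utf16le(const std::u16string& text) {
    std::string out;
    char buf[8];
    for (char16_t unit : text) {
        unsigned lo = unit & 0xFF;         // the ASCII value for ASCII characters
        unsigned hi = (unit >> 8) & 0xFF;  // always zero for ASCII characters
        std::snprintf(buf, sizeof buf, "%02X %02X", lo, hi);
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling hex_dump_utf16le(u"%s::") gives "25 00 73 00 3A 00 3A 00", the same bytes as the first row of the memory dump in the question.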

If you want your log file to be readable as ASCII, you need to convert your text to UTF-8 when writing it out. Otherwise, make sure that all text in the log file is UTF-16 and use a log file reader that understands UTF-16, but note that you'll waste up to 50% of the space if most of your text is ASCII (since every second byte will be 0).
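As a rough illustration of the UTF-8 option, the sketch below converts UTF-16 code units to UTF-8 bytes by hand. The function name utf16_to_utf8 is hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch, not something your logger requires. On Windows you would more likely call WideCharToMultiByte with CP_UTF8 on the wchar_t buffer; the portable version here is only meant to show what that conversion does.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 code units to UTF-8 bytes.
// Unpaired surrogates become U+FFFD (an assumption of this sketch).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the surrogate pair into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     // stray low surrogate
            cp = 0xFFFD;
        }
        if (cp < 0x80) {                               // ASCII: one byte, no zero padding
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                       // two bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                     // three bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                       // four bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For the string from the question, utf16_to_utf8(u"%s::%s - Started with success.") yields 30 plain ASCII bytes instead of the 60 bytes of the UTF-16 form, so the log line reads normally in any ASCII-aware editor.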

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow