Trying to show typical encoding/decoding error between MacRoman and Latin1

Question

“Latin1” is a vague term and may refer to ISO Latin 1 (ISO 8859-1) or to Windows Latin 1 (windows-1252). The difference is that in ISO Latin 1, bytes 0x80 to 0x9F are designated as control characters (rarely used), whereas in Windows Latin 1, most of them are defined as graphic characters (punctuation and some non-ASCII Latin letters) and the remaining five (0x81, 0x8D, 0x8F, 0x90, 0x9D) are left undefined.
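To see the difference between the two variants concretely, here is a small sketch using Python's standard codecs, assuming "latin-1" stands for ISO 8859-1 and "cp1252" for Windows-1252. The helper decode_or_none and the sample byte values are illustrative choices, not anything from the original code.

```python
# Compare how the two "Latin1" variants decode bytes in the 0x80-0x9F range.
# Assumption: Python's "latin-1" codec = ISO 8859-1, "cp1252" codec = Windows-1252.

def decode_or_none(byte_value: int, encoding: str):
    """Decode a single byte; return None if the codec leaves it undefined."""
    try:
        return bytes([byte_value]).decode(encoding)
    except UnicodeDecodeError:
        return None

for b in (0x80, 0x81, 0x8E, 0x93, 0x9D):
    iso = decode_or_none(b, "latin-1")   # latin-1 maps every byte to U+0000-U+00FF
    win = decode_or_none(b, "cp1252")    # cp1252 may be undefined for some bytes
    win_desc = "undefined" if win is None else f"{win!r} (U+{ord(win):04X})"
    print(f"0x{b:02X}: latin-1 -> U+{ord(iso):04X}, cp1252 -> {win_desc}")

# Collect the bytes in 0x80-0x9F that Windows-1252 leaves undefined.
undefined_in_cp1252 = [b for b in range(0x80, 0xA0) if decode_or_none(b, "cp1252") is None]
print([hex(b) for b in undefined_in_cp1252])
```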

When you take, for example, the letter “é” and encode it as Latin1 (in either variant), you get the byte 0xE9. If you then interpret this byte as MacRoman, as you seem to be doing, you get the character “È”. That’s why you get “condamnÈ”.
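A minimal reproduction of that symptom in Python, assuming the original word was “condamné” and that the text was encoded as Latin1 and then decoded as MacRoman (Python's codec name for MacRoman is "mac_roman"):

```python
# Reproduce the "condamnÈ" symptom: encode as Latin1, then decode as MacRoman.
original = "condamné"  # assumed original word

latin1_bytes = original.encode("latin-1")   # "é" becomes the single byte 0xE9
# Windows-1252 produces the same byte for "é", so either Latin1 variant behaves alike here.
same_in_cp1252 = original.encode("cp1252") == latin1_bytes

garbled = latin1_bytes.decode("mac_roman")  # MacRoman maps byte 0xE9 to "È"
print(latin1_bytes)
print(garbled)
```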

But if you take the letter “é” encoded as MacRoman, it’s the byte 0x8E. When this byte is interpreted as Latin1 data, the two Latin1 variants differ. In ISO Latin 1, it is the control character SINGLE SHIFT TWO (U+008E); in Windows Latin 1, it’s “Ž” LATIN CAPITAL LETTER Z WITH CARON (U+017D). Evidently, your code treats Latin1 as ISO Latin 1. Since most programs assign no meaning to U+008E, it is typically ignored when rendering, but in this case it is apparently displayed as a space.
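The reverse direction can be checked the same way. This sketch again assumes Python's "latin-1" and "cp1252" codecs as stand-ins for ISO Latin 1 and Windows Latin 1:

```python
# The reverse direction: a MacRoman-encoded "é" read back as Latin1.
mac_bytes = "é".encode("mac_roman")      # MacRoman "é" is the byte 0x8E

as_iso = mac_bytes.decode("latin-1")     # ISO 8859-1: C1 control SINGLE SHIFT TWO
as_windows = mac_bytes.decode("cp1252")  # Windows-1252: LATIN CAPITAL LETTER Z WITH CARON

print(mac_bytes)
print(f"ISO Latin 1:     U+{ord(as_iso):04X}")
print(f"Windows Latin 1: U+{ord(as_windows):04X} {as_windows}")
```

Only the ISO variant yields the invisible control character; decoding as cp1252 would have produced a visible (but still wrong) “Ž”.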

The other cases are similar: MacRoman “à” is 0x88 and MacRoman “ê” is 0x90, both of which fall into the control-character area of ISO 8859-1.
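A quick check of all three letters mentioned above, as a sketch assuming Python's "mac_roman" and "latin-1" codecs; the variable names are illustrative:

```python
# For each letter, find its MacRoman byte and what ISO 8859-1 makes of that byte.
letters = "éàê"
mac_bytes = {ch: ch.encode("mac_roman")[0] for ch in letters}  # one byte each in MacRoman

for ch, b in mac_bytes.items():
    as_iso = bytes([b]).decode("latin-1")
    in_c1 = 0x80 <= b <= 0x9F  # ISO 8859-1 control-character area
    print(f"{ch}: MacRoman 0x{b:02X} -> latin-1 U+{ord(as_iso):04X}, control character: {in_c1}")
```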