Question

I'm running doxygen (1.5.8) on a C# project in Visual Studio 2008 on a Windows XP machine. The generated LaTeX code contains some illegal character sequences. It is always the same sequence: "ï»¿" (a Latin i with a diaeresis, something like the binary shift operator, and a Spanish opening question mark). I've seen it happen in the context "using {\bf System}", but there may be others.

The generated LaTeX file reads:

    \begin{CompactItemize}
    \item 
    ï»¿using {\bf System}
    \end{CompactItemize}

While the source is simply:

    using System;
    using System.Collections.Generic;
    using System.Linq;

Is this some strange Windows beginning-of-file character? It seems to appear only before the using System; directive, which is the first line of each file.


Solution



Those are the three bytes of the UTF-8 encoding of U+FEFF, the BYTE ORDER MARK (EF BB BF), displayed as if they were ISO-8859-1. The BOM is intended for use as the first code point in UTF-16 files, where it signals the byte order. UTF-8 has no byte-order ambiguity, and the Unicode standard neither requires nor recommends a BOM there, but unfortunately there are some very stupid tools that produce it by default. And if you are creating files by concatenating bits of text from other files, you can even end up with BOMs in the middle of your document.
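The byte-level explanation above can be checked with a few lines of Python. This is only a sketch to illustrate the mis-decoding; the variable names are made up for the example.

```python
# U+FEFF (BYTE ORDER MARK) encoded as UTF-8 is the three bytes EF BB BF.
bom_bytes = "\ufeff".encode("utf-8")

# Reading those same bytes as ISO-8859-1 turns each byte into one character:
# 0xEF -> i with diaeresis, 0xBB -> right guillemet, 0xBF -> inverted question mark.
misread = bom_bytes.decode("iso-8859-1")

print(bom_bytes)       # the raw bytes
print(ascii(misread))  # escaped form, safe to print on any console
```

The three characters in `misread` are exactly the sequence described in the question.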

Find the editor that is saving files as “UTF-8 with BOM” and burn it.

ETA re updated question:

    using {\bf System}

While the source is simply:

    using System;

Check that source in a hex editor for a hidden faux-BOM before the ‘using’.
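If there are many source files, checking each one by hand in a hex editor gets tedious. As a rough alternative, a small Python script can look for the three BOM bytes at the start of each file. The function names and the `*.cs` pattern are assumptions for this sketch, not part of doxygen or Visual Studio.

```python
import codecs
from pathlib import Path


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def find_files_with_bom(root, pattern="*.cs"):
    """List files under root matching pattern whose first bytes are a UTF-8 BOM."""
    found = []
    for path in sorted(Path(root).rglob(pattern)):
        with open(path, "rb") as handle:
            # Only the first three bytes are needed for the check.
            if has_utf8_bom(handle.read(3)):
                found.append(path)
    return found
```

Calling `find_files_with_bom(".")` from the project directory would list the affected files. Rewriting a file with `strip_utf8_bom` removes the mark, but the editor may add it back on the next save unless its encoding setting is changed.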

OTHER TIPS

It looks like a character encoding problem to me.

That three-character sequence is the Unicode byte-order mark U+FEFF encoded in UTF-8 (bytes EF BB BF), although I'm not sure why the byte-order mark would show up in the middle of your documentation. That could be significant, or it could just be a coincidence.

Licensed under: CC-BY-SA with attribution