Question

My question is in the context of Code::Blocks and its tweaked version of MinGW, and Notepad++.

I want to be able to include Unicode literals in my source, and I can, as long as I save the file as UTF-8 without a BOM.

This works fine, up to a point, but it BOMs out (bad pun) whenever I reopen the file: without the BOM, the editor (not surprisingly) has the unnerving side effect of displaying the Unicode in its ANSI form. :(

Those very useful and yet very annoying three bytes have to be there while I edit, and then they have to go at compile time!

It sounds easy enough: just preprocess the source file(s) and discard the first three bytes if they are a UTF-8 BOM...

I'm certainly not going to act as the preprocessor (by removing the BOM manually) each time I compile, so I've even resorted to putting these literals in BOM-less #include files. That is problematic from several perspectives, not the least of which is that it is a pain in the proverbial, and I can't "see" the literals without a lot of juggling.

Is there some way I can tap into the toolchain with a custom preprocessor? Or, if I have missed some obvious solution, I'd very much appreciate hearing about it.


Solution 2

I've fossicked around a bit more and worked out a tentative solution. I'm not completely happy with it, because it involves modifying the source file, whereas I was actually looking for a piped solution; but it seems that g++.exe only accepts command-line arguments (please correct me if I'm wrong).

My "solution" is a bit rough-and-ready, but it works, and it is certainly better (for me) than any other viable solution I've come across (of which there are none!). It does require you to pay attention to your editor's "File has been externally modified" message box if the file is open, but since the editor's copy still has the BOM, that is somewhat of a moot point.

It is a simple command-line hack. I'd prefer a more integrated option, but here is this one (and it works):

In Code::Blocks, go to: Settings -> Compiler and Debugger -> Other settings -> [Advanced options] -> Command line macro:

Make these modifications to the command line. They must all be on a single line, but for clarity I've separated them out:

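cmd /c DropTheBOM.exe $file
& $compiler $options $includes -c $file -o $object
& MakeTheBOM.exe $file

On a single line, that is:

cmd /c DropTheBOM.exe $file & $compiler $options $includes -c $file -o $object & MakeTheBOM.exe $file

The middle part is the original compile command; use whatever your compiler's command line macro already contains. Write your own utilities, or try here: http://code.google.com/p/utf-bom-utils/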

PS: #include files are not stripped of their BOM (if they have one). A simple BOM y/n switch for the routine that #includes these files would solve this quite simply... but it is only a Windows problem, so maybe that's why it hasn't been catered for... or has it? Does anyone know?

OTHER TIPS

You might want to consider externalising all your string literals to a separate file anyway and using a loadLit() function (or similar) to get them at runtime.

This will allow you to have a single file (with a BOM) containing all your string literals and will make your life a lot easier if you ever have to internationalise your application.

We do that with our stuff, but keep in mind that our class 1 programs have to be i18n'ed for 21 different locales, so we save a lot of work by doing it this way :-) Your mileage may vary.

Licensed under: CC-BY-SA with attribution