Question

I have been developing a parser that takes JavaScript as input and creates a compressed version of that JavaScript as output.

I found initially that the parser failed when attempting to read the input JavaScript. I believe this has something to do with the fact that Visual Studio 2008 saves its files by default as UTF-8. And when doing so, VS includes a couple of hidden characters at the start of the UTF-8 file.

As a workaround, I used Visual Studio to save the file as code page 1252. After doing so, my parser was able to read the input JavaScript.

Note that I need to use special European characters that include accents.

So, here are my questions:

  1. Should I use code page 1252 or UTF-8?
  2. Why does Visual Studio save files as UTF-8 by default?
  3. If I choose to save files as 1252 will that lead to problems?
  4. It appears to me that Eclipse saves files as code page 1252 by default. Does that sound right?

Solution

UTF-8 is the better option, as it can encode every Unicode character, while with 1252 you might find that characters you need are missing (even for European languages).

Apparently, VS2008 saves UTF-8 with a byte order mark (BOM). It should be possible to switch that off, have the parser recognize it, or strip the BOM somewhere in between.
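As a rough illustration of the last two options, here is a minimal Python sketch. The question does not say what language the parser is written in, so Python and the helper names `strip_utf8_bom` and `read_source` are assumptions made purely for illustration. The UTF-8 BOM is the three bytes EF BB BF, available in the standard library as `codecs.BOM_UTF8`.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def read_source(data: bytes) -> str:
    """Decode UTF-8 input, accepting files saved with or without a BOM."""
    # The "utf-8-sig" codec drops a leading BOM when decoding
    # and behaves like plain "utf-8" when there is none.
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + "var café = 1;".encode("utf-8")
    print(strip_utf8_bom(with_bom))  # raw bytes, BOM removed
    print(read_source(with_bom))     # decoded text, BOM removed
```

Either approach leaves files without a BOM untouched, so the parser could accept both kinds of input.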

OTHER TIPS

A UTF-8 file may start with a byte order mark (BOM) signature, which some editors, and evidently some libraries, don't understand: http://en.wikipedia.org/wiki/Byte-order_mark

If you can get around it, UTF-8 is by all means the preferred choice today. Try stripping the first bytes (the BOM) before giving the JS code to the parser, or look for an option in your IDE to save without it.

1252 doesn't cause this issue and you won't have problems with it, but you'll be publishing your web content in an outdated format. I wouldn't do it today. There was a lot of encoding mess on the web in the past, with ISO vs. Windows code pages for different languages...

Use UTF-8. 1252 does not cover the whole of Europe, so for some countries (Central Europe) you would have to use 1250 or, more correctly, ISO 8859-2. So the only real option is UTF-8.
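To make the coverage point concrete, here is a small Python sketch that checks whether a string can be represented in a given encoding. The helper name `fits_in` and the sample words are illustrative assumptions; the codec names are the ones Python's standard library uses.

```python
def fits_in(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    samples = ["café", "Straße", "Łódź"]
    for word in samples:
        for enc in ("cp1252", "cp1250", "iso8859_2", "utf-8"):
            print(f"{word!r} in {enc}: {fits_in(word, enc)}")
```

Western European words such as "café" and "Straße" fit in cp1252, while the Polish "Łódź" does not; it needs cp1250, ISO 8859-2, or UTF-8. UTF-8 accepts all three.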

Will using 1252 cause issues?

That depends on the countries your app needs to work in.

Off the top of my head, 1252 (or the closely related ISO 8859-1) will work in

  • UK
  • Germany
  • Switzerland
  • Austria
  • Italy
  • France
  • Netherlands
  • Iceland
  • Spain

Wikipedia has a more comprehensive list: http://en.wikipedia.org/wiki/ISO/IEC_8859-1

So you can use CP 1252 if your app is only used in the mentioned countries/languages.

The BOM was at the start of the file. IMHO you should use UTF-8; it's very widely used nowadays.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow