Question

The picture below explains all:

(Screenshot: http://img133.imageshack.us/img133/4206/accentar9.png)

The variable textInput comes from File.ReadAllText(path); characters such as ', é and è do not display in the console. When I run my unit test, everything is fine and I can see them. Why?


Solution 3

I do not know why it works with NUnit, but I opened the file in Notepad++ and saw that its format was ANSI. I converted it to UTF-8 and now it works.

I am still wondering why it worked with NUnit and not in the console, but at least it works now.

Update: I do not understand why the question and this answer were downvoted. The question is still valid: why can't I read an ANSI file in a console application when I can in NUnit?

OTHER TIPS

The .NET classes (System.IO.StreamReader and the like) use UTF-8 as the default encoding. If you want to read a different encoding, you have to pass it explicitly to the appropriate constructor overload.

Also note that there's not one single encoding called “ANSI”. You're probably referring to the Windows codepage 1252 aka “Western European”. Notice that this is different from the Windows default encoding in other countries. This is relevant when you try to use System.Text.Encoding.Default because this actually differs from system to system.

/EDIT: It seems you misunderstood both my answer and my comment:

  1. The problem in your code is that you need to tell .NET what encoding you're using.
  2. The other remark, saying that “ANSI” may refer to different encodings, didn't have anything to do with your problem. It was just a “by the way” remark to prevent misunderstandings (well, that one backfired).

So, finally: The solution to your problem should be the following code:

string text = System.IO.File.ReadAllText("path", System.Text.Encoding.GetEncoding(1252));

The important part here is the usage of an appropriate System.Text.Encoding instance.

However, this assumes that your encoding is indeed Windows-1252 (but I believe that's what Notepad++ means by “ANSI”). I have no idea why your text gets displayed correctly when read by NUnit. I suppose that NUnit either has some kind of autodiscovery for text encodings or that NUnit uses some weird defaults (i.e. not UTF-8).
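To make the difference concrete, here is a minimal sketch that writes the Windows-1252 bytes for "é è" to a temporary file and reads them back twice: once with the default (UTF-8) and once with an explicit encoding. The class and method names are made up for illustration. It assumes a modern .NET runtime (.NET Core 3.0 or later), where code page 1252 is only available after registering CodePagesEncodingProvider; on the classic .NET Framework that registration line should not be needed.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class; not part of the original question's code.
static class AnsiReadDemo
{
    // Bytes 0xE9 and 0xE8 are "é" and "è" in Windows-1252; 0x20 is a space.
    public static readonly byte[] SampleBytes = { 0xE9, 0x20, 0xE8 };

    // Writes the bytes to a temp file and reads them with the default encoding (UTF-8).
    public static string ReadWithDefault(byte[] bytes)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, bytes);
            return File.ReadAllText(path);
        }
        finally
        {
            File.Delete(path);
        }
    }

    // Same, but tells .NET explicitly that the file is Windows-1252.
    public static string ReadAs1252(byte[] bytes)
    {
        // Assumption: needed on .NET Core / .NET 5+; can be dropped on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding windows1252 = Encoding.GetEncoding(1252);

        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, bytes);
            return File.ReadAllText(path, windows1252);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        string wrong = ReadWithDefault(SampleBytes);
        string right = ReadAs1252(SampleBytes);

        // The bytes are not valid UTF-8, so the accented letters are lost
        // and replaced by U+FFFD (the replacement character).
        Console.WriteLine("Default read lost the accents: " + wrong.Contains("\uFFFD"));
        Console.WriteLine("Explicit 1252 read kept them:  " + (right == "\u00E9 \u00E8"));
    }
}
```

If the file turns out to be in some other code page, only the argument to GetEncoding changes; the rest stays the same.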

Oh, and by the way: “ANSI” really refers to the “American National Standards Institute”. There are a lot of completely different standards that have “ANSI” as part of their names. For example, C++ is (among others) also an ANSI standard.

Only in some contexts is it (imprecisely) used to refer to the Windows encodings. But even there, as I've tried to explain, it usually doesn't refer to a specific encoding but rather to a class of encodings that Windows uses as defaults for different countries. One of these is Windows-1252.

Try setting your console session's output code page using the chcp command. The code pages supported by Windows are here, here, and here. Remember, fundamentally the console is pretty simple: it displays Unicode or DBCS characters by using a code page to determine the glyph that will be displayed.
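As a rough sketch of the same idea from inside the program rather than via chcp, the Console.OutputEncoding property can be read and set. This is an alternative to chcp, not what the answer above literally proposes, and the class and helper names are made up for illustration. Whether the accented characters then actually render also depends on the console font, which this sketch does not check.

```csharp
using System;
using System.Text;

// Hypothetical demo class; not part of the original question's code.
static class ConsoleCodePageDemo
{
    // Formats an encoding as "name (code page)" for display.
    public static string Describe(Encoding encoding)
    {
        return encoding.WebName + " (" + encoding.CodePage + ")";
    }

    public static void Main()
    {
        // Shows which code page the console session currently uses for output.
        Console.WriteLine("Before: " + Describe(Console.OutputEncoding));

        // Switches console output to UTF-8 (code page 65001).
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine("After:  " + Describe(Console.OutputEncoding));

        // Accented characters written as escapes so the source file's own
        // encoding does not matter.
        Console.WriteLine("\u00E9 \u00E8");
    }
}
```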

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow