Question

I have the following code:

string input = "ç";
string normalized = input.Normalize(NormalizationForm.FormD);
char[] chars = normalized.ToCharArray();

I build this code with Visual Studio 2010, .NET 4, on 64-bit Windows 7.

I run it in a unit test project (platform: Any CPU) in the following contexts and check the contents of chars:

  • Visual Studio unit tests: chars contains { 231 }.
  • ReSharper: chars contains { 231 }.
  • NCrunch: chars contains { 99, 807 }.

In the MSDN documentation, I could not find any information explaining these different behaviors.

So why do I get different behaviors? To me, the NCrunch behavior is the expected one, and I would expect the same from the others.

Edit: I switched back to .NET 3.5 and still have the same issue.


Solution

The String.Normalize(NormalizationForm) documentation says that the

binary representation is in the normalization form specified by the normalizationForm parameter.

which means you'd be using FormD normalization in all cases, so CurrentCulture and the like should not really matter.
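For what it's worth, the canonical FormD decomposition of U+00E7 ("ç") is U+0063 ('c', 99) followed by U+0327 (COMBINING CEDILLA, 807), which is exactly what NCrunch reports. A minimal sketch of that expectation (not your original test), using an escaped literal so that source file encoding cannot interfere:

using System;
using System.Text;

class FormDExpectation
{
    static void Main()
    {
        // U+00E7 written as an escape, so the source file encoding cannot change it.
        char[] chars = "\u00E7".Normalize(NormalizationForm.FormD).ToCharArray();

        foreach (char c in chars)
            Console.WriteLine((int)c); // prints 99, then 807
    }
}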

The only thing I can think of that could change, then, is the "ç" character itself. That character is interpreted according to whatever character encoding is assumed or configured for the Visual Studio source code files. In short, I think NCrunch is assuming a different source file encoding than the others.
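As a rough illustration of the kind of mix-up meant here (an assumption about what it could look like, not a claim about what NCrunch actually does): if the UTF-8 bytes for "ç" are decoded with a different code page, such as Windows-1252, the resulting string no longer contains U+00E7 at all, so normalization starts from a different input:

using System;
using System.Text;

class SourceEncodingIllustration
{
    static void Main()
    {
        // "ç" (U+00E7) encoded as UTF-8 is the two-byte sequence C3 A7.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("\u00E7");

        // Decoded back as UTF-8: one char, 231.
        string readAsUtf8 = Encoding.UTF8.GetString(utf8Bytes);

        // Decoded as Windows-1252 instead: two chars, 195 ('Ã') and 167 ('§').
        string readAsAnsi = Encoding.GetEncoding(1252).GetString(utf8Bytes);

        Console.WriteLine(readAsUtf8.Length); // 1
        Console.WriteLine(readAsAnsi.Length); // 2
    }
}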

Based on a quick search of the NCrunch forum, there is a mention of a UTF-8 -> UTF-16 conversion, so I would check that.
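If you want to verify which of these is happening, a quick diagnostic (assuming you can drop a snippet like this into the failing test or a console app) is to dump the code units of the literal both before and after normalization:

using System;
using System.Text;

class NormalizeDiagnostic
{
    static void Main()
    {
        string input = "ç"; // the same literal as in the failing test

        // Code units before normalization; a single 231 means the literal
        // survived source file decoding and compilation intact.
        foreach (char c in input)
            Console.Write((int)c + " ");
        Console.WriteLine();

        // Code units after FormD; 231 should decompose into 99 807.
        foreach (char c in input.Normalize(NormalizationForm.FormD))
            Console.Write((int)c + " ");
        Console.WriteLine();
    }
}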

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow