Problem

I have a program that reads text and processes it with a number of functions. The text should be readable regardless of the file's encoding. However, when I import a file saved in an Extended ASCII encoding, any characters above 127 are ignored, and I can't find out how to overcome this. Files saved as UTF-8 and Unicode are read fine. I've tried converting the strings to UTF-8, but the letters in question still come up as question-mark-like shapes. I can see that the values are correct: 0xBF for û, but they aren't being interpreted as that character.

Can anyone help me here? I haven't done much work with this sort of thing before. I'm working in C#, if that helps.

My current code for converting looks like this:

System.Text.UTF8Encoding u = new System.Text.UTF8Encoding();
byte[] asciiBytes = Encoding.UTF8.GetBytes(sd);
sd = u.GetString(asciiBytes);

Where sd is the string. When I import this string, I do not specify the text encoding:

string input = File.ReadAllText(fname);
...
parser(input);

Solution

"I can see that the values are correct: 0xBF for û"

That is not the UTF-8 encoding for û; that would be the two-byte sequence 0xC3 0xBB. Clearly you guessed the file encoding wrong. In Windows code page 1252, which is common in Western Europe and the Americas (and in the UK, your country of residence), that character is encoded as 0xFB. Did you reverse the digits?
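To make the byte values concrete, here is a minimal sketch that prints how û is encoded under UTF-8 and under code page 1252, and what happens when a lone 0xFB byte is decoded as UTF-8. The class name EncodingBytesDemo and the helper Hex are made up for illustration. The RegisterProvider call assumes .NET Core 3.0 or later, where code page 1252 is not available until the code pages provider is registered; on .NET Framework that line should be unnecessary.

```csharp
using System;
using System.Text;

public class EncodingBytesDemo
{
    // Hypothetical helper: the bytes of s under the given encoding, as hex.
    public static string Hex(Encoding encoding, string s)
    {
        return BitConverter.ToString(encoding.GetBytes(s));
    }

    public static void Main()
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where legacy
        // code pages such as 1252 must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding(1252);

        string s = "\u00FB"; // û

        // UTF-8 needs two bytes for U+00FB.
        Console.WriteLine(Hex(Encoding.UTF8, s)); // C3-BB
        // Windows-1252 stores the same character as a single byte.
        Console.WriteLine(Hex(cp1252, s));        // FB

        byte[] fileBytes = { 0xFB };

        // A lone 0xFB is not valid UTF-8, so the decoder substitutes the
        // replacement character U+FFFD (the "question-mark-like shape").
        string wrong = Encoding.UTF8.GetString(fileBytes);
        Console.WriteLine(wrong.IndexOf('\uFFFD') >= 0);     // True

        // Decoding the same byte as code page 1252 recovers û.
        Console.WriteLine(cp1252.GetString(fileBytes) == s); // True
    }
}
```

This would also explain why converting the string to UTF-8 after reading does not help: by then the original byte has already been replaced during decoding.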

Use Encoding.Default instead, by passing it as the second argument to File.ReadAllText.
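As a rough sketch of what that looks like in code: File.ReadAllText has an overload that takes an Encoding. One caveat is that Encoding.Default is the system ANSI code page on .NET Framework, but on .NET Core and .NET 5+ it is always UTF-8, so the example below names code page 1252 explicitly instead. That choice assumes the file really is Windows-1252. The class name ReadAnsiFileDemo, the helper ReadWindows1252, and the temporary test file are made up for illustration, and the RegisterProvider call assumes .NET Core 3.0 or later.

```csharp
using System;
using System.IO;
using System.Text;

public class ReadAnsiFileDemo
{
    // Hypothetical helper: read a file that is assumed to be Windows-1252.
    public static string ReadWindows1252(string path)
    {
        Encoding cp1252 = Encoding.GetEncoding(1252);
        // The second argument tells ReadAllText how to decode the bytes
        // when the file has no byte order mark.
        return File.ReadAllText(path, cp1252);
    }

    public static void Main()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where code page 1252 must
        // be registered first. On .NET Framework this should be unnecessary.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Write a small test file containing "sûr" as Windows-1252 bytes.
        string fname = Path.GetTempFileName();
        File.WriteAllBytes(fname, new byte[] { 0x73, 0xFB, 0x72 });

        string input = ReadWindows1252(fname);
        Console.WriteLine(input == "s\u00FBr"); // True

        File.Delete(fname);
    }
}
```

On .NET Framework, File.ReadAllText(fname, Encoding.Default) should give the same result on a machine whose ANSI code page is 1252.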

License: CC-BY-SA with attribution
Not affiliated with StackOverflow