FileStream Seek/ReadByte seems to reverse byte order of file

https://stackoverflow.com/questions/19054453

29-06-2022

Question

I don't understand the results I'm getting from the hacked-about code below. Can someone explain? It only happens when reading a Unicode (UTF-16) encoded text file.

FileStream fs = File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);

// read from start
byte[] lne = new byte[100];
int actual = fs.Read(lne, 0, lne.Length);
string line = Encoding.Unicode.GetString(lne, 0, actual); // ok readable stuff as expected
string line1 = Encoding.BigEndianUnicode.GetString(lne, 0, actual); // fail as expected

// move down into the file
fs.Seek(-150, SeekOrigin.End);
fs.ReadByte(); // take this out, works ok!

lne = new byte[100];
actual = fs.Read(lne, 0, lne.Length);
line = Encoding.Unicode.GetString(lne, 0, actual); // fail non readable stuff - NOT EXPECTED
line1 = Encoding.BigEndianUnicode.GetString(lne, 0, actual); // SUCCESS, readable - huh!

Obviously the code isn't "real world"; it's just a breakdown of what my real code is doing.

After the first Encoding.Unicode.GetString I can see good readable data in the variable 'line', and garbage in 'line1', as expected.

After the second Encoding.Unicode.GetString I see complete garbage in 'line' (Japanese or Chinese characters, I can't tell which), but 'line1' now contains readable data that has come from the file.

If I take out the ReadByte everything works as expected.

Does anyone have any idea why this is happening?

TIA.

Solution 2

In UTF-16 (what .NET calls Encoding.Unicode) each character of ASCII text takes 2 bytes, low byte first, so an ASCII-only string looks like:

0x41, 0, 0x42, 0, 0x43, 0 ...  // {ASCII code for A}, 0,...

So if you read the bytes in the opposite order (BigEndianUnicode) you get nonsense characters. The string above is read as 0x4100, 0x4200, 0x4300 ... instead of 0x0041, 0x0042, 0x0043 ...
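The byte-order effect can be reproduced without a file. The following is a minimal sketch, assuming ASCII-only text and no byte order mark; the class and helper names (`EndianDemo`, `ToHex`) are made up for illustration.

```csharp
using System;
using System.Text;

static class EndianDemo
{
    // Formats each UTF-16 code unit of a string as four hex digits.
    public static string ToHex(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Encoding.Unicode is UTF-16 little-endian; GetBytes adds no BOM.
        byte[] bytes = Encoding.Unicode.GetBytes("ABC");

        string little = Encoding.Unicode.GetString(bytes);
        string big = Encoding.BigEndianUnicode.GetString(bytes);

        Console.WriteLine(BitConverter.ToString(bytes)); // 41-00-42-00-43-00
        Console.WriteLine(ToHex(little));                // 0041 0042 0043
        Console.WriteLine(ToHex(big));                   // 4100 4200 4300
    }
}
```

The code units 0x4100, 0x4200 and 0x4300 fall in a CJK block, which is why the wrong byte order shows up as Chinese-looking characters rather than as an error.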

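Something similar happens when you start reading at an odd offset (your seek-from-end-of-file code followed by ReadByte). The bytes of ASCII text then look like:

0, 0x41, 0, 0x42, 0, 0x43 ...

which Encoding.Unicode reads as 0x4100, 0x4200, 0x4300... while BigEndianUnicode happens to pair them up as 0x0041, 0x0042, 0x0043... and so looks readable.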

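In your code the Seek lands on a character boundary, and ReadByte then consumes the first byte of a character, so the following Read starts in the middle of it. Take the ReadByte out (or skip an even number of bytes) and you are reading from the beginning of a character again, not the middle of it, and the sequence becomes a valid ASCII-only Unicode string (with a potentially invalid last character):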

0x41, 0, 0x42, 0, 0x43,...
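The one-byte shift can be sketched with a MemoryStream standing in for the file. This assumes UTF-16 little-endian content with an even length, so that an even seek offset from the end is a character boundary; `AlignmentDemo` and `ReadTail` are hypothetical names, not part of the original code.

```csharp
using System;
using System.IO;
using System.Text;

static class AlignmentDemo
{
    // Seeks 8 bytes back from the end, optionally consumes one byte,
    // then decodes the next 6 bytes with the given encoding.
    public static string ReadTail(byte[] data, bool skipOneByte, Encoding enc)
    {
        using (var ms = new MemoryStream(data))
        {
            ms.Seek(-8, SeekOrigin.End);
            if (skipOneByte) ms.ReadByte(); // position becomes odd
            byte[] buf = new byte[6];
            int actual = ms.Read(buf, 0, buf.Length);
            return enc.GetString(buf, 0, actual);
        }
    }

    public static void Main()
    {
        // Stand-in for the file contents: 8 characters, 16 bytes, no BOM.
        byte[] data = Encoding.Unicode.GetBytes("ABCDEFGH");

        // Aligned read: starts on the first byte of 'E'.
        Console.WriteLine(ReadTail(data, false, Encoding.Unicode)); // EFG

        // Misaligned read: starts on the high (zero) byte of 'E'.
        string wrong = ReadTail(data, true, Encoding.Unicode);
        Console.WriteLine(wrong == "\u4600\u4700\u4800"); // True

        // The same misaligned bytes happen to pair up as big-endian text.
        Console.WriteLine(ReadTail(data, true, Encoding.BigEndianUnicode)); // FGH
    }
}
```

If the text is known to start at an even offset (no byte order mark, or a two-byte one), one possible guard is to check `(stream.Position & 1) != 0` after seeking and step back one byte before decoding.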

Other tips

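If you move to the end of the stream minus 100 bytes, read a byte (which takes you to the end of the stream minus 99 bytes), and then try to read 100 bytes, you are asking for one byte more than the stream has left. Read does not fail in that case; it returns 99, which is why the count in 'actual' must be passed to GetString. With the -150 offset shown in the question, 149 bytes remain after the ReadByte, so the 100-byte read fits.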
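A short sketch of how Read behaves near the end of a stream, using a 200-byte MemoryStream as a stand-in for the file (an assumption; `ShortReadDemo` and `ReadNearEnd` are illustrative names):

```csharp
using System;
using System.IO;

static class ShortReadDemo
{
    // Seeks seekBack bytes before the end, consumes one byte,
    // then asks for 100 bytes and returns how many were delivered.
    public static int ReadNearEnd(int seekBack)
    {
        using (var ms = new MemoryStream(new byte[200]))
        {
            ms.Seek(-seekBack, SeekOrigin.End);
            ms.ReadByte();
            byte[] buf = new byte[100];
            return ms.Read(buf, 0, buf.Length);
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadNearEnd(100)); // 99: only 99 bytes were left
        Console.WriteLine(ReadNearEnd(150)); // 100: 149 bytes were left
    }
}
```

An odd byte count such as 99 also leaves half a character at the end of the buffer, which the decoder cannot turn into a valid character.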

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow