Question

I have a large file (around 400 GB) that I need to copy into another file using a FileStream, skipping the first 128 bytes. I have the following code, but it is not working properly: when I check the file sizes after the copy has finished, File B is missing a lot more than 128 bytes. What am I doing wrong?

private void SplitUnwantedHeader(string file1, string file2)
    {
        FileStream fr = new FileStream(file1, FileMode.Open, FileAccess.Read);
        FileStream fw = new FileStream(file2, FileMode.Create, FileAccess.Write);

        byte[] fByte = new byte[65534];
        long headerToSplit = 128;
        int bytesRead = 0;

        try
        {
            fr.Position = headerToSplit;
            do
            {
                bytesRead = fr.Read(fByte, 0, fByte.Length);
                fw.Write(fByte, 0, fByte.Length - (int)headerToSplit);
            } while (bytesRead != 0);
        }
        catch (Exception ex)
        {
            UpdateStatusBarMessage.ShowStatusMessage(ex.Message);
        }
        finally
        {
            fw.Close();
            fr.Close();
        }
    }

Thanks.


Solution

The line

 fw.Write(fByte, 0, fByte.Length - (int)headerToSplit);

is wrong when used in a loop like this. On every iteration it writes the buffer size minus 128 bytes, regardless of how many bytes were actually read. Instead, the code should write exactly bytesRead bytes on each iteration:

 fw.Write(fByte, 0, bytesRead);

Apply the 128-byte offset only once, before entering the copy-everything-else loop. The loop can also be replaced with Stream.CopyTo (available since .NET Framework 4, and inherited by FileStream), and using blocks can tidy up resource management.

That is, consider:

using (var fr = new FileStream(file1, FileMode.Open, FileAccess.Read))
using (var fw = new FileStream(file2, FileMode.Create, FileAccess.Write)) {
    fr.Position = 128; // or fr.Seek(128, SeekOrigin.Begin);
    fr.CopyTo(fw, 65534);
}
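
To confirm the result the same way the question does (by comparing file sizes), the snippet above can be wrapped in a small helper along these lines. This is only a sketch: the class name HeaderSkipper and the methods CopyWithoutHeader and SizesMatch are made up for illustration; FileStream, Stream.CopyTo and FileInfo.Length are the only framework APIs involved.

```csharp
using System;
using System.IO;

static class HeaderSkipper
{
    // Hypothetical helper: copies everything after the first headerLength bytes.
    public static void CopyWithoutHeader(string source, string destination, long headerLength)
    {
        using (var fr = new FileStream(source, FileMode.Open, FileAccess.Read))
        using (var fw = new FileStream(destination, FileMode.Create, FileAccess.Write))
        {
            fr.Position = headerLength; // skip the header once, before copying
            fr.CopyTo(fw, 65534);       // same buffer size as in the question
        }
    }

    // Hypothetical check: true when the destination is exactly headerLength bytes
    // shorter than the source (or empty, if the source is shorter than the header).
    public static bool SizesMatch(string source, string destination, long headerLength)
    {
        long expected = Math.Max(0, new FileInfo(source).Length - headerLength);
        return new FileInfo(destination).Length == expected;
    }
}
```

Calling CopyWithoutHeader(file1, file2, 128) and then SizesMatch(file1, file2, 128) should report true if the copy behaved as intended.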

OTHER TIPS

There are two things wrong with the code:

  • Instead of skipping the first 128 bytes of the first block, it is skipping the last 128 bytes of every block.
  • It is ignoring the bytesRead value when writing, so it may be writing data from the buffer that was never read into the buffer. The number of bytes read can be less than the number of bytes requested, even when you are not at the end of the file.

The code is a mix between skipping the header before the loop and skipping the header inside the loop. You should do one, not both.
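
To get a feel for how much data the original loop loses, the arithmetic can be sketched as below. It rests on two assumptions that are not in the question: every Read fills the buffer except possibly the last one (which is not guaranteed), and the name LossEstimate.BytesWrittenByOriginalLoop is purely illustrative. With a 65534-byte buffer, a 400 GB file takes roughly 6 million iterations, each dropping 128 bytes, so the shortfall is on the order of 780 MB rather than 128 bytes.

```csharp
using System;

static class LossEstimate
{
    // Estimates how many bytes the question's loop writes. Each iteration writes
    // bufferSize - header bytes, no matter how many bytes Read returned.
    public static long BytesWrittenByOriginalLoop(long fileLength, int bufferSize, int header)
    {
        long remaining = fileLength - header;                          // fr.Position = header skips these once
        long readsWithData = (remaining + bufferSize - 1) / bufferSize; // reads that return at least one byte
        long iterations = readsWithData + 1;                           // do/while also writes when Read returns 0
        return iterations * (bufferSize - header);
    }
}
```

Comparing the result with fileLength - header gives the estimated number of missing bytes. For small files the estimate can even exceed the correct size, because the final iteration writes stale buffer contents after Read has returned 0.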

You can check how much data you have in the buffer compared to how much you should skip, and update the number of bytes to skip so that it's zero once you are beyond the header:

do {
  bytesRead = fr.Read(fByte, 0, fByte.Length);
  if (bytesRead > headerToSplit) {
    fw.Write(fByte, (int)headerToSplit, bytesRead - (int)headerToSplit);
    headerToSplit = 0;
  } else {
    headerToSplit -= bytesRead;
  }
} while (bytesRead != 0);

Or if you are skipping the header before the loop, just write all the data that you have in the buffer:

fr.Position = headerToSplit;
do {
  bytesRead = fr.Read(fByte, 0, fByte.Length);
  fw.Write(fByte, 0, bytesRead);
} while (bytesRead != 0);
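
Put together as a complete method, the second variant might look like the sketch below. The class and method names (HeaderLoopCopy.SkipHeader) and the returned byte count are assumptions added for illustration. A while loop is used instead of do/while so that nothing is written once Read returns 0.

```csharp
using System.IO;

static class HeaderLoopCopy
{
    // Hypothetical helper: skips headerLength bytes, copies the rest,
    // and returns the number of bytes written.
    public static long SkipHeader(string source, string destination, long headerLength)
    {
        long total = 0;
        byte[] buffer = new byte[65534];

        using (var fr = new FileStream(source, FileMode.Open, FileAccess.Read))
        using (var fw = new FileStream(destination, FileMode.Create, FileAccess.Write))
        {
            fr.Position = headerLength; // skip the header once, before the loop

            int bytesRead;
            while ((bytesRead = fr.Read(buffer, 0, buffer.Length)) > 0)
            {
                fw.Write(buffer, 0, bytesRead); // write only what was actually read
                total += bytesRead;
            }
        }
        return total;
    }
}
```

The returned total should equal the source length minus headerLength, which gives a quick sanity check without reopening the output file.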
Licensed under: CC-BY-SA with attribution