Question

I'm working with large files, starting at 10 GB. I load parts of the file into memory for processing. The following code works fine for smaller files (700 MB):

    byte[] byteArr = new byte[layerPixelCount];
    using (FileStream fs = File.OpenRead(recFileName))
    {
        using (BinaryReader br = new BinaryReader(fs))
        {
            fs.Seek(offset, SeekOrigin.Begin);

            for (int i = 0; i < byteArr.Length; i++)
            {
                byteArr[i] = (byte)(br.ReadUInt16() / 256);
            }
        }
    }

With a 10 GB file, the first call to this function succeeds, but the Seek() in the second call throws an IOException:

An attempt was made to move the file pointer before the beginning of the file.

The numbers are:

fs.Length = 11998628352

offset = 4252580352

byteArr.Length = 7746048

I assumed that the GC hadn't collected the closed fs reference before the second call, so I tried

    GC.Collect();
    GC.WaitForPendingFinalizers();

but no luck.

Any help is appreciated.

Solution

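I'm guessing a signed 32-bit integer is rolling over to a negative value, most likely offset. 4252580352 is larger than int.MaxValue (2147483647), so if it is stored or computed as an int it wraps around to a negative number, and Seek() then tries to move the file pointer before the beginning of the file. Try declaring offset (and, to be safe, the indexer i) as long.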

// offset is now long; any arithmetic that produces it must also be done in long
long offset = 4252580352;

byte[] byteArr = new byte[layerPixelCount];
using (FileStream fs = File.OpenRead(recFileName))
{
    using (BinaryReader br = new BinaryReader(fs))
    {
        fs.Seek(offset, SeekOrigin.Begin);

        for (long i = 0; i < byteArr.Length; i++)
        {
            byteArr[i] = (byte)(br.ReadUInt16() / 256);
        }
    }
}
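
To illustrate the rollover, here is a minimal sketch. How the asker actually computes offset isn't shown, so the multiplication and the names index, OffsetAsInt and OffsetAsLong are assumptions made for illustration. The figures in the question do happen to factor as 549 × 7746048 = 4252580352, which is what the sketch uses.

```csharp
using System;

public static class OffsetOverflowDemo
{
    // Hypothetical buggy version: the product is computed in 32-bit
    // arithmetic, so it silently wraps around once it exceeds int.MaxValue.
    public static int OffsetAsInt(int index, int layerPixelCount)
    {
        return unchecked(index * layerPixelCount);
    }

    // Hypothetical fixed version: casting one operand to long makes the
    // whole multiplication 64-bit, so the result stays positive.
    public static long OffsetAsLong(int index, int layerPixelCount)
    {
        return (long)index * layerPixelCount;
    }

    public static void Main()
    {
        int layerPixelCount = 7746048; // byteArr.Length from the question
        int index = 549;               // assumed multiplier, see the note above

        Console.WriteLine(OffsetAsInt(index, layerPixelCount));  // -42386944
        Console.WriteLine(OffsetAsLong(index, layerPixelCount)); // 4252580352
    }
}
```

A negative value like -42386944 passed to Seek() with SeekOrigin.Begin would produce exactly the "before the beginning of the file" error. Note that declaring the variable as long is not enough if the expression assigned to it is still evaluated with int operands.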

OTHER TIPS

The code below works with large files beyond 4 GB. The key point is that the offset passed to Seek is a long: a 64-bit long can address positions beyond the 2 GB limit of a signed 32-bit int. In this example, the file is first processed in whole chunks of roughly 1 GB (1,024,000,000 bytes); once those are done, the remaining bytes (less than one chunk) are processed. I use this code to calculate the CRC of files larger than 4 GB (using https://crc32c.machinezoo.com/ for the CRC-32C calculation in this example).

private uint Crc32CAlgorithmBigCrc(string fileName)
{
    uint hash = 0;
    byte[] buffer = null;
    FileInfo fileInfo = new FileInfo(fileName);
    long fileLength = fileInfo.Length;
    int blockSize = 1024000000;
    // Keep the block count in a long and use the remainder operator:
    // blocks * blockSize computed as int overflows for files over 2 GB.
    long blocks = fileLength / blockSize;
    int restBytes = (int)(fileLength % blockSize);
    long offsetFile = 0;
    Crc32CAlgorithm Crc32CAlgorithm = new Crc32CAlgorithm();
    bool firstBlock = true;
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        using (BinaryReader br = new BinaryReader(fs))
        {
            // Process all whole blocks first.
            while (blocks > 0)
            {
                blocks -= 1;
                fs.Seek(offsetFile, SeekOrigin.Begin);
                buffer = br.ReadBytes(blockSize);
                if (firstBlock)
                {
                    firstBlock = false;
                    hash = Crc32CAlgorithm.Compute(buffer);
                }
                else
                {
                    // Continue from the running hash, not from the first block's hash.
                    hash = Crc32CAlgorithm.Append(hash, buffer);
                }
                offsetFile += blockSize;
            }
            // Then process the remaining bytes (fewer than blockSize).
            if (restBytes > 0)
            {
                fs.Seek(offsetFile, SeekOrigin.Begin);
                buffer = br.ReadBytes(restBytes);
                hash = Crc32CAlgorithm.Append(hash, buffer);
            }
            buffer = null;
        }
    }
    return hash;
}
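
As a variation on the chunked loop above, the following is a rough sketch of the same idea using one reusable buffer and sequential Stream.Read calls, so no Seek is needed and the byte count is tracked as a long. The names ChunkedReader, ForEachChunk and onChunk are made up for this illustration; the callback stands in for whatever per-chunk work is needed (a CRC update, for instance).

```csharp
using System;
using System.IO;

public static class ChunkedReader
{
    // Reads the file sequentially in chunks of at most bufferSize bytes.
    // Each chunk is handed to onChunk together with the number of valid
    // bytes in the buffer. Returns the total number of bytes read.
    public static long ForEachChunk(string path, int bufferSize, Action<byte[], int> onChunk)
    {
        byte[] buffer = new byte[bufferSize]; // allocated once and reused
        long total = 0;                       // long, so it can exceed 2 GB

        using (FileStream fs = File.OpenRead(path))
        {
            int read;
            // Read returns 0 only at end of stream; it may return fewer
            // bytes than requested, so always use the returned count.
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                onChunk(buffer, read);
                total += read;
            }
        }
        return total;
    }
}
```

Because the buffer is reused, the callback should only look at the first count bytes it is given. Plugging a CRC in would require an overload that accepts an offset and a length, which is worth checking in the library's documentation before relying on it.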
Licensed under: CC-BY-SA with attribution