What does the FileStream.Read return value mean? How do I read data in chunks and process it?

https://stackoverflow.com/questions/5075924

03-12-2019

Question

I'm quite new to C#, so please bear with me. I'm using FileStream to read fixed-size pieces of data into a small array, process the data, then read again, and so on to the end of the file.

I thought about using something like this:

            byte[] data = new byte[30];
            int numBytesToRead = (int)fStream.Length;
            int offset = 0;

            //reading
            while (numBytesToRead > 0)
            {
                fStream.Read(data, offset, 30);
                offset += 30;
                numBytesToRead -= 30;

                //do something with the data
            }

But I checked the documentation and its examples, and they state that the return value of the above Read method is:

"Type: System.Int32 The total number of bytes read into the buffer. This might be less than the number of bytes requested if that number of bytes are not currently available, or zero if the end of the stream is reached."

What does it mean that the bytes are not currently available? Can this really happen when reading small amounts of data, or only with large amounts? If only with large amounts, approximately how large, because I'll also be reading in bigger chunks in some other places? If this can happen at any time, how should I change my code so that it still executes efficiently?

Thank you for your time and answers.

Solution

The Read method returns the number of bytes actually read, which may be less than the number of bytes requested. Normally when you read a file you will get all the bytes that you ask for (unless you reach the end of the file); however, you can't count on it always being that way.

It's possible that the system will distinguish between data that is immediately available and data that needs time to be retrieved, so that it returns the currently available data right away, starts reading more data in the background, and expects you to request the rest in another call. AFAIK it doesn't do this currently, but it's a reasonable future scenario.

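You should capture the result of the Read method and use it to determine how much data you got. You shouldn't pass offset as the second argument: that argument is the position in the buffer where the data is stored, not a position in the file, so with a 30-byte buffer the second call would already point past the end of the array and you could never read a file larger than the buffer. Alternatively, you can declare an array big enough to hold the entire stream; in that case reading into the location of offset is correct.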

You should also handle the situation where the Read method returns zero, which means that there is no more data to read. This normally doesn't happen until you reach the end of the file, but if it happened earlier, your loop would never terminate.

byte[] data = new byte[30];
long numBytesToRead = fStream.Length;
long offset = 0; // position in the file, not in the buffer

//reading
while (numBytesToRead > 0) {
  // always store at index 0; Read may return fewer bytes than requested
  int len = fStream.Read(data, 0, data.Length);
  if (len == 0) {
    // error: unexpected end of file; stop instead of looping forever
    throw new EndOfStreamException();
  }
  offset += len;
  numBytesToRead -= len;
  //do something with the data (len bytes)
}
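If the processing step needs complete fixed-size records (30 bytes each in the question), one option is to keep calling Read until the buffer is full or the stream ends. The following is only a minimal sketch of that idea; the names ChunkReader and FillBuffer are made up for illustration and are not part of the framework.

```csharp
using System;
using System.IO;

static class ChunkReader
{
    // Hypothetical helper: calls Read repeatedly until the buffer is full
    // or the stream ends. Returns the number of bytes actually stored.
    public static int FillBuffer(Stream stream, byte[] buffer)
    {
        int filled = 0;
        while (filled < buffer.Length)
        {
            // Here the second argument really is a buffer position:
            // new bytes are appended after the ones already stored.
            int len = stream.Read(buffer, filled, buffer.Length - filled);
            if (len == 0)
            {
                break; // end of stream
            }
            filled += len;
        }
        return filled;
    }
}
```

A caller would loop while FillBuffer returns more than zero; a result smaller than the buffer length then means a trailing partial record at the end of the file. Newer .NET versions (7 and later) also provide Stream.ReadExactly for the same purpose.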

OTHER TIPS

With a file, Read returns fewer bytes than requested when you ask for more than is left in the file. This happens in the following two scenarios:

You try to read more bytes than the total length of the file
You are too close to the end of the file to be able to read the number of bytes you request

Additionally, Stream has descendants for network-bound connections as well, and in those cases it is not always easy to know how many bytes will be available and when.

The way to process a binary file in chunks is like this:

byte[] buffer = new byte[BUFFER_SIZE];
int inBuffer;
while ((inBuffer = stream.Read(buffer, 0, buffer.Length)) > 0)
{
    // here you have "inBuffer" number of bytes in the buffer
}
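To check that this loop copes with short reads, it can be run against a stream that deliberately returns less than requested. The sketch below is an illustration only: TrickleStream, ChunkDemo and SumBytes are hypothetical names, and the wrapper merely imitates a source whose data arrives in small pieces.

```csharp
using System;
using System.IO;

// Hypothetical wrapper that hands out at most maxPerRead bytes per call,
// imitating a stream whose data arrives in small pieces.
sealed class TrickleStream : Stream
{
    private readonly Stream inner;
    private readonly int maxPerRead;

    public TrickleStream(Stream inner, int maxPerRead)
    {
        this.inner = inner;
        this.maxPerRead = maxPerRead;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Never ask the inner stream for more than maxPerRead bytes.
        return inner.Read(buffer, offset, Math.Min(count, maxPerRead));
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

static class ChunkDemo
{
    // Sums every byte of the stream, touching only the bytes each Read reported.
    public static long SumBytes(Stream stream, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long sum = 0;
        int inBuffer;
        while ((inBuffer = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < inBuffer; i++)
            {
                sum += buffer[i];
            }
        }
        return sum;
    }
}
```

Because the loop only looks at the first inBuffer bytes after each call, SumBytes gives the same result whether the data comes straight from a MemoryStream or through the wrapper a few bytes at a time.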

Bytes currently not available only applies to non-FileStream Streams such as the one found in HttpWebRequest.

FileStream.Read could return 1 byte, in theory. You should still be able to process packets this small.

But it will never return 0 unless there is a problem like SMB connection lost, file deleted, anti virus, or it hits the end of the file.

There are better ways to read files. If you're dealing with a text file, consider using System.IO.StreamReader instead, as it handles different text encoding, line breaks, and more.
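As a rough illustration of the StreamReader route, the sketch below counts the non-empty lines of a text stream. TextDemo and CountNonEmptyLines are hypothetical names, and UTF-8 is an assumption about the file's encoding.

```csharp
using System;
using System.IO;
using System.Text;

static class TextDemo
{
    // Counts non-empty lines; StreamReader does the decoding and line splitting.
    public static int CountNonEmptyLines(Stream stream)
    {
        // Assumption: the text is UTF-8. Disposing the reader also closes the stream.
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            int count = 0;
            string line;
            // ReadLine returns null once the end of the stream is reached.
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length > 0)
                {
                    count++;
                }
            }
            return count;
        }
    }
}
```

There is no byte counting here at all: ReadLine accepts both "\n" and "\r\n" line endings and hides the partial reads that happen underneath.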

Also be aware that the maximum size of a buffer is about 2 GB, so don't do new byte[fileStream.Length] for a file that may be large.

FileStream derives from Stream. Stream is a very generic class, and the description of Read comes from that generic class. A stream can also be a network stream, for example, and there data might not be currently available because it has not been sent yet. For a FileStream you can assume that you will get three types of return values:

return value == count of bytes to be read (last parameter of Read): You are in the middle of the file
return value < count && return value > 0: You might be at the end of the file or the rest of the stream is just not currently available.
return value == 0: You already read all content. Nothing more to read.
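The three cases can be observed with a small experiment. The sketch below uses a MemoryStream in place of a file so that it runs without touching the disk; ReturnValueDemo and ReadSizes are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ReturnValueDemo
{
    // Records what Read returns on each call until it reports 0.
    public static List<int> ReadSizes(Stream stream, int chunkSize)
    {
        var sizes = new List<int>();
        byte[] buffer = new byte[chunkSize];
        int len;
        do
        {
            len = stream.Read(buffer, 0, buffer.Length);
            sizes.Add(len);
        } while (len > 0);
        return sizes;
    }
}
```

For a 10-byte MemoryStream and a chunk size of 4, the recorded values are 4, 4, 2, 0: two full reads in the middle of the data, one short read at the end, and a final zero once everything has been consumed.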

Licensed under: CC-BY-SA with attribution
