Question

I'm trying to split a byte stream into chunks of increasing size.

The source stream contains an unknown number of bytes and is expensive to read. The output of the enumerator should be byte arrays of increasing size, starting at 8KB up to 1MB.

This would be trivial if I could read the whole stream into an array and slice out the relevant pieces. However, since the stream may be very large, reading it all at once is infeasible. Also, while performance is not the main concern, it is important to keep system load very low.

While implementing this I noticed that it's relatively difficult to keep the code short and maintainable. There are a few stream related issues to keep in mind, too (for instance, Stream.Read might not fill the buffer even though it succeeded).
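To illustrate the Stream.Read caveat mentioned above, here is a minimal sketch of a helper that keeps reading until the requested number of bytes has arrived or the stream ends. The names StreamHelpers and ReadFully are hypothetical and not part of the framework. Newer .NET versions also ship Stream.ReadAtLeast, which may cover the same need.

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
    // Hypothetical helper: reads until `count` bytes are in the buffer
    // or the stream ends. Returns the number of bytes actually read,
    // which is less than `count` only when the end of the stream was hit.
    public static int ReadFully(Stream stream, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            // Read may return fewer bytes than requested without being at the end.
            int read = stream.Read(buffer, offset + total, count - total);
            if (read == 0)
            {
                break; // 0 is the only reliable end-of-stream signal
            }
            total += read;
        }
        return total;
    }
}
```

A caller can then treat a return value smaller than the requested count as "this was the last chunk".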

I did not find any existing classes that help for my case, nor could I find something close on the net. How would you implement such a class?


Solution

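// Requires: using System.Collections.Generic; using System.IO;
public IEnumerable<BufferWrapper> getBytes(Stream stream)
{
    List<int> bufferSizes = new List<int>() { 8192, 65536, 220160, 1048576 };
    int count = 0;
    int bufferSizePosition = 0;
    byte[] buffer = new byte[bufferSizes[0]];
    bool done = false;
    while (!done)
    {
        BufferWrapper nextResult = new BufferWrapper();
        // Read may return fewer bytes than requested; bytesRead tells the
        // consumer how much of the buffer is valid.
        nextResult.bytesRead = stream.Read(buffer, 0, buffer.Length);
        // The same array is reused until the size changes, so the consumer
        // must finish with it before asking for the next item.
        nextResult.buffer = buffer;
        done = nextResult.bytesRead == 0;
        if (!done)
        {
            yield return nextResult;
            count++;
            // Move to the next size after more than 10 reads. Stop at the
            // last entry so the index never runs past the end of the list.
            if (count > 10 && bufferSizePosition < bufferSizes.Count - 1)
            {
                count = 0;
                bufferSizePosition++;
                buffer = new byte[bufferSizes[bufferSizePosition]];
            }
        }
    }
}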

public class BufferWrapper
{
    public byte[] buffer { get; set; }
    public int bytesRead { get; set; }
}

Obviously, the logic for when to move up in buffer size, and how to choose that size, could be altered.

Someone could also probably find a better way of handling the last buffer to be sent, as this isn't the most efficient way.
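As one example of altering the size logic, the fixed list could be replaced by a doubling schedule. The following is only a sketch; BlockSizePolicy and Doubling are hypothetical names, and the 8 KB and 1 MB limits come from the question.

```csharp
using System;
using System.Collections.Generic;

public static class BlockSizePolicy
{
    // Hypothetical policy: yields `initial`, then doubles on each step
    // until doubling again would exceed `maximum`, and repeats that last
    // size from then on. The sequence never ends, so the caller decides
    // when to stop pulling sizes.
    public static IEnumerable<int> Doubling(int initial, int maximum)
    {
        int size = initial;
        while (true)
        {
            yield return size;
            if (size <= maximum / 2)
            {
                size *= 2;
            }
        }
    }
}
```

With an initial size of 8192 and a maximum of 1048576, the schedule reaches the maximum on the eighth item and stays there.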

OTHER TIPS

For reference, here is the implementation I currently use, which already includes the improvements suggested in the answer by @Servy:

private const int InitialBlockSize = 8 * 1024;
private const int MaximumBlockSize = 1024 * 1024;

private Stream _Stream;
private int _Size = InitialBlockSize;

public byte[] Current
{
    get;
    private set;
}

public bool MoveNext ()
{
    if (_Size < 0) {
        return false;
    }

    var buf = new byte[_Size];
    int count = 0;

    while (count < _Size) {
        int read = _Stream.Read (buf, count, _Size - count);

        if (read == 0) {
            break;
        }

        count += read;
    }

    if (count == _Size) {
        Current = buf;
        if (_Size <= MaximumBlockSize / 2) {
            _Size *= 2;
        }
    }
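    else {
        // Short read: the stream ended before the block was filled.
        _Size = -1;
        if (count == 0) {
            // Nothing left at all, so do not report an empty final block.
            return false;
        }
        Current = new byte[count];
        Array.Copy (buf, Current, count);
    }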

    return true;
}
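
The same idea can also be written as an iterator method, which avoids hand-written enumerator state. This is a sketch under the question's assumptions (8 KB start, 1 MB cap, doubling growth); ChunkReader and ReadChunks are hypothetical names, not an existing API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ChunkReader
{
    // Hypothetical iterator: yields completely filled blocks of growing
    // size, followed by one shorter block if the stream ends mid-block.
    public static IEnumerable<byte[]> ReadChunks(
        Stream stream, int initialSize = 8 * 1024, int maximumSize = 1024 * 1024)
    {
        int size = initialSize;
        while (true)
        {
            var buffer = new byte[size];
            int count = 0;
            // Keep reading until the block is full or the stream ends.
            while (count < size)
            {
                int read = stream.Read(buffer, count, size - count);
                if (read == 0) break;
                count += read;
            }
            if (count == 0) yield break; // nothing left, no empty block
            if (count < size)
            {
                // Final partial block: copy into an array of the exact length.
                var last = new byte[count];
                Array.Copy(buffer, last, count);
                yield return last;
                yield break;
            }
            yield return buffer;
            if (size <= maximumSize / 2) size *= 2;
        }
    }
}
```

A consumer can then write foreach (var chunk in ChunkReader.ReadChunks(stream)) and use chunk.Length directly. Each block is a fresh array, which costs allocations but keeps earlier chunks valid after the next read.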
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow