How is one supposed to deal with the intermediate buffer of DataReader class?

https://softwareengineering.stackexchange.com/questions/297763

10-10-2020

Question

Summary

I am developing a WAV file format reader under WinRT, and for this I need to read arbitrary numbers of structs made up of fundamental types such as int, uint, float and so on.

In desktop development one would rely on BinaryReader. In WinRT it has been replaced by DataReader, which works asynchronously.

Problem

I cannot grasp how to use this new class, since an intermediate buffer must now be filled using LoadAsync() prior to calling reading methods such as ReadInt32().

In contrast, with the old BinaryReader there was no notion of having to fill an intermediate buffer prior to reading primitives from the source.

Every example I have seen on the web is 'naive' in the sense that it reads the entire source stream into memory, but in my case a WAV file is in the range of hundreds of megabytes and possibly gigabytes.

I have sketched the following helper methods, which pre-fill the intermediate buffer with only what's needed and basically free me from systematically calling LoadAsync every time before reading something from the stream:

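using System;
using System.Threading.Tasks;
using Windows.Storage.Streams;

internal static class DataReaderExtensions
{
    // Loads `length` bytes into the reader's buffer, then decodes them as a string.
    // ReadString takes a count of code units; with UTF-8 encoding one code unit is one byte.
    public static async Task<string> ReadStringAsync(this DataReader reader, uint length)
    {
        await LoadAsync(reader, length);
        return reader.ReadString(length);
    }

    // Fills the intermediate buffer and fails if the stream ends before `length` bytes arrive.
    private static async Task LoadAsync(DataReader reader, uint length)
    {
        var loaded = await reader.LoadAsync(length);
        if (loaded < length)
            throw new InvalidOperationException("Unexpected end of stream.");
    }
}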

But I'm not entirely sure whether it is the way to go when using DataReader.

Question

How is one supposed to pre-fill the intermediate buffer in my case?

Should one load only the needed amount, as shown above?
Or should one load a constant size (e.g. 65536 bytes), keep track of the read position, and pre-fetch more on larger requests (basically wrapping a DataReader in a helper class)?

EDIT

Looking at the BinaryReader source code, there doesn't seem to be any kind of magic behind the scenes, i.e. bytes are fetched on demand. So for my case, even if it sounds a bit silly to read primitives asynchronously, I guess it's the simplest and safest way to do it. The alternative means wrapping a DataReader, tracking the read position and handling an intermediate buffer, and deriving from DataReader is not possible because public WinRT types must be sealed. I'm not sure that is worth it for the outcome.

Unfortunately the sources of the WinMD assemblies are unavailable. It would have been pretty interesting to see how they do it at Microsoft, since extension methods like these let the newer types be used much like the older ones.

Solution

"Should one load only the needed amount, as shown above?"

You should load into the buffer all that you can feasibly expect to process with the code that follows. In the DataReader documentation example, they read the entire stream into the buffer, because they are going to process it all immediately.

The reason for the buffer is that IO is usually slow. The amount of data you specify is loaded into the memory buffer up front with asynchronous IO, so you can subsequently read it without waiting for IO on every read. That is good for performance: batching IO into fewer, larger requests helps on many devices (e.g. mechanical hard drives). Your code's execution is suspended (due to async/await) until the IO finishes, so no thread is blocked while waiting.
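As a rough sketch of that batching, assuming the file starts with the standard 12-byte RIFF header ("RIFF", a 32-bit size, "WAVE"), one LoadAsync call can be followed by several synchronous reads. WavHeaderSketch and ReadRiffHeaderAsync are hypothetical names made up for this illustration; they are not part of WinRT or of the code in the question.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Storage.Streams;

internal static class WavHeaderSketch
{
    // Hypothetical helper: reads the 12-byte RIFF header with a single asynchronous load
    // and returns the RIFF chunk size stored in it.
    public static async Task<uint> ReadRiffHeaderAsync(DataReader reader)
    {
        const uint HeaderSize = 12; // "RIFF" + uint32 size + "WAVE"

        // WAV stores integers little-endian; set this explicitly rather than
        // relying on whatever the reader's default byte order is.
        reader.ByteOrder = ByteOrder.LittleEndian;
        // With UTF-8, the 4 code units requested below correspond to 4 bytes of ASCII.
        reader.UnicodeEncoding = Windows.Storage.Streams.UnicodeEncoding.Utf8;

        // One asynchronous IO request fills the intermediate buffer...
        uint loaded = await reader.LoadAsync(HeaderSize);
        if (loaded < HeaderSize)
            throw new InvalidOperationException("Stream is too short to hold a RIFF header.");

        // ...and the reads below are served from that buffer, without further IO.
        string riffId = reader.ReadString(4);
        uint riffSize = reader.ReadUInt32();
        string waveId = reader.ReadString(4);

        if (riffId != "RIFF" || waveId != "WAVE")
            throw new InvalidOperationException("Not a RIFF/WAVE stream.");

        return riffSize;
    }
}
```

The same pattern should apply to any fixed-size struct: load its size once, then read its fields one after another.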

"Or should one load a constant size (e.g. 65536 bytes), keep track of the read position, and pre-fetch more on larger requests (basically wrapping a DataReader in a helper class)?"

Sometimes the data will be too large to load into memory all at once. .NET itself limits a single object to 2 GB (well, sort of). So if the data you are reading is close to 2 GB, you will definitely want to keep track of the stream's read position and read only part of the file into the buffer. Once you reach the end of the buffer, fill it again from the next read position and continue processing, repeating as necessary until you have processed the whole file.
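A minimal sketch of that refill loop might look like the following. It assumes a fixed chunk size of 65536 bytes (the figure from the question) and a caller-supplied callback; ChunkedReadSketch, ProcessInChunksAsync and processChunk are hypothetical names used only for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Storage.Streams;

internal static class ChunkedReadSketch
{
    // Assumed chunk size; tune it for the device and the data being read.
    private const uint ChunkSize = 65536;

    // Hypothetical helper: streams the rest of the reader's data through
    // processChunk one chunk at a time and returns the number of bytes processed.
    public static async Task<ulong> ProcessInChunksAsync(
        DataReader reader, Action<byte[]> processChunk)
    {
        ulong totalBytes = 0;

        while (true)
        {
            // Refill the intermediate buffer with up to ChunkSize more bytes.
            await reader.LoadAsync(ChunkSize);

            // Whatever is buffered and not yet consumed is available to read now.
            uint available = reader.UnconsumedBufferLength;
            if (available == 0)
                break; // nothing left: end of stream

            // Drain the buffer so the next iteration starts from an empty one.
            var chunk = new byte[available];
            reader.ReadBytes(chunk);
            processChunk(chunk);

            totalBytes += available;
        }

        return totalBytes;
    }
}
```

For audio data, a chunk size that is a multiple of the WAV block alignment keeps sample frames from straddling two chunks. The reader's lifetime is left to the caller here.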

Licensed under: CC-BY-SA with attribution
