Question

I need to read small sequences of data from a 3.7 GB file. The positions I need to read are not adjacent, but I can order the IO so that the file is read from beginning to end.

The file is stored on an iSCSI SAN, which should be capable of handling/optimizing queued IO.

The question is: how can I request all the data/positions I need in a single one-shot operation? Is that even possible? I don't think async IO is an option because the reads are very small (20-200 bytes).

Currently the code looks like this:

using (var fileStream = new FileStream(dataStorePath, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    for (int i = 0; i < internalIds.Count(); i++)
    {
        fileStream.Position = seekPositions[i].SeekPosition;
        ... = Serializer.DeserializeWithLengthPrefix<...>(fileStream, PrefixStyle.Base128);
    }
    ...
}

I'm looking for ways to improve this I/O because I'm getting somewhat sub-par read performance. All the seek times from moving the head seem to be adding up.


Solution

Have you run Process Monitor (from Microsoft Sysinternals) on this to see what file I/O actually happens?

I'm not sure what the problem is, but I'll take a guess. If you're reading from a SAN, I would think each disk access turns into a network request under the hood. The first read sends a seek request, reads and buffers the data, and then the Serializer constructs the objects. By the time your second request is sent, the SAN's disks have kept spinning, so you have to wait for the data to rotate back into place.

Have you tried multithreading? I'm curious about the performance if you set up a Queue of the file sections you need to process, in sequential order, spin up some threads, have each one open the file separately (FileShare.Read so they can all access the file at once), and then let them start grabbing work from the Queue. Output the results into another collection. If the order of the output matters, sort it by the original order in which you queued the sections.
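A minimal sketch of that idea, under some assumptions: each section is treated as a fixed number of raw bytes (a stand-in for `Serializer.DeserializeWithLengthPrefix`, which the real code would call instead), and the names `ParallelSectionReader` and `ReadSections` are hypothetical. Instead of sorting the output afterwards, each result is written into a slot indexed by its original queue position, which gives the same ordering without the extra sort.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: reads `length` bytes at each offset using several workers.
public static class ParallelSectionReader
{
    public static byte[][] ReadSections(string path, IReadOnlyList<long> offsets, int length, int workerCount)
    {
        // Work items are queued in file order, so workers pull them roughly sequentially.
        var work = new ConcurrentQueue<int>();
        for (int i = 0; i < offsets.Count; i++)
            work.Enqueue(i);

        // One slot per request: output order matches input order without sorting.
        var results = new byte[offsets.Count][];

        var workers = new Task[workerCount];
        for (int w = 0; w < workerCount; w++)
        {
            workers[w] = Task.Run(() =>
            {
                // Each worker opens its own stream, so no Position is shared between threads.
                using var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
                while (work.TryDequeue(out int i))
                {
                    fs.Position = offsets[i];
                    var buffer = new byte[length];
                    int read = 0;
                    while (read < length)
                    {
                        int n = fs.Read(buffer, read, length - read);
                        if (n == 0) break; // reached end of file
                        read += n;
                    }
                    if (read < length) Array.Resize(ref buffer, read);
                    results[i] = buffer;
                }
            });
        }

        Task.WaitAll(workers);
        return results;
    }
}
```

Because several requests are in flight at once, the SAN gets a chance to queue and reorder them, which is the effect the multithreading suggestion is after; the best `workerCount` would have to be found by measurement.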

--- EDIT ---

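Have you tried the ReadFileScatter API? Here's a P/Invoke signature from pinvoke.net. Note that ReadFileScatter reads one contiguous range of the file into several page-sized memory buffers, and the file must be opened with FILE_FLAG_NO_BUFFERING and FILE_FLAG_OVERLAPPED.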

OTHER TIPS

Use a single background thread as a disk proxy. Send all your read operations to it, and have it sort and merge the reads. If two or more regions are close together, read the full sector(s) containing them and take sub-sections of the data. Return the data asynchronously.
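The sort-and-merge part of that proxy might look roughly like this. It is a sketch under stated assumptions: requests carry a hypothetical `Id`, `Offset` and `Length` (the names `ReadRequest`, `CoalescingReader` and `maxGap` are illustrative, not from any library), ids are unique, and regions are merged whenever the gap between them is at most `maxGap` bytes rather than being aligned to real sector boundaries. The background thread and asynchronous hand-off are left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical request type: which bytes a caller wants, tagged with an id.
public readonly struct ReadRequest
{
    public ReadRequest(int id, long offset, int length) { Id = id; Offset = offset; Length = length; }
    public int Id { get; }
    public long Offset { get; }
    public int Length { get; }
}

public static class CoalescingReader
{
    // Sorts requests by offset, merges neighbours whose gap is <= maxGap into one
    // larger read, then slices each request's bytes out of the merged buffer.
    public static Dictionary<int, byte[]> ReadCoalesced(Stream stream, IEnumerable<ReadRequest> requests, long maxGap)
    {
        var sorted = requests.OrderBy(r => r.Offset).ToList();
        var results = new Dictionary<int, byte[]>();
        int start = 0;
        while (start < sorted.Count)
        {
            long regionStart = sorted[start].Offset;
            long regionEnd = regionStart + sorted[start].Length;
            int end = start + 1;
            // Grow the region while the next request starts close enough.
            while (end < sorted.Count && sorted[end].Offset - regionEnd <= maxGap)
            {
                regionEnd = Math.Max(regionEnd, sorted[end].Offset + sorted[end].Length);
                end++;
            }

            // One seek and one (possibly looped) read for the whole merged region.
            var region = new byte[regionEnd - regionStart];
            stream.Position = regionStart;
            int filled = 0;
            while (filled < region.Length)
            {
                int n = stream.Read(region, filled, region.Length - filled);
                if (n == 0) break; // end of stream
                filled += n;
            }

            // Hand each request its own sub-section of the merged buffer.
            for (int i = start; i < end; i++)
            {
                var r = sorted[i];
                int from = (int)(r.Offset - regionStart);
                int count = Math.Max(0, Math.Min(r.Length, filled - from));
                var slice = new byte[count];
                Array.Copy(region, from, slice, 0, count);
                results[r.Id] = slice;
            }
            start = end;
        }
        return results;
    }
}
```

The `maxGap` threshold trades extra bytes transferred for fewer seeks; with 20-200 byte records, merging across gaps of a few KB is likely cheaper than a separate seek, but that would need measuring on the actual SAN.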

Just for the record:

In POSIX environments you could request multiple areas of a file with one (sys-)call using lio_listio (POSIX AIO); readv/preadv, by contrast, read a single contiguous range into multiple buffers. Another option in a POSIX environment would be asynchronous IO via aio_read.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow