Data alignment when socket recv() output is written to a file using overlapped I/O with FILE_FLAG_NO_BUFFERING

StackOverflow https://stackoverflow.com/questions/23510289

Question

I'm writing a C++ program that simply receives data from another computer and writes it to an SSD RAID at high throughput (about 100 MB/s, since the link is Gigabit Ethernet).

I have set up two overlapped I/O operations: one receives from Ethernet and the other writes to the SSD.

When the receive is done, it posts a message to the writer.

I use FILE_FLAG_NO_BUFFERING when creating the file on disk.

On the sender side, I am also using overlapped I/O to send the data.

I am stuck on this problem: the byte count returned by rv = recv() is not aligned to what the disk requires (maybe a multiple of 4096?).

What should I do?

Solution

recv and unbuffered writes are not really very compatible with each other. It is possible to get that working, but it will take a little extra work.

When doing unbuffered writes, both the start address of your buffer and the amount to write must be multiples of the sector size (see MSDN). Aligning the buffer is trivial, but dealing with the fact that recv can return pretty much any amount of data (up to the amount you ask for, but in theory it could be just 1 byte) is a bit of work.
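
To make the two requirements concrete, here is a minimal sketch of the checks involved. The helper names (is_pow2, is_aligned, round_down, ok_for_unbuffered) are made up for illustration and are not part of any Windows API; the sector size is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers, not a real API. `sector` must be a power of two.
inline bool is_pow2(std::size_t v) { return v != 0 && (v & (v - 1)) == 0; }

// True if the address p is a multiple of `sector`.
inline bool is_aligned(const void* p, std::size_t sector) {
    assert(is_pow2(sector));
    return (reinterpret_cast<std::uintptr_t>(p) & (sector - 1)) == 0;
}

// Largest multiple of `sector` that is less than or equal to n.
inline std::size_t round_down(std::size_t n, std::size_t sector) {
    assert(is_pow2(sector));
    return n & ~(sector - 1);
}

// True if (buf, len) satisfies both conditions described above:
// aligned start address and a length that is a multiple of the sector size.
inline bool ok_for_unbuffered(const void* buf, std::size_t len, std::size_t sector) {
    return is_aligned(buf, sector) && len % sector == 0;
}
```

A recv result of, say, 5000 bytes with a 4096-byte sector would give round_down(5000, 4096) == 4096 writable bytes, leaving 904 bytes to carry over.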

Another problem is that while it is pretty much guaranteed that the sector size is a power of two (hard disks with non-power-of-two sectors did exist in the 1990s, but the controller hid that fact), you do not know what it is. And even if you did know, it might be different on the next computer. It might be 512 or 1024 or something else.

How to handle this? Most programmers resort to simply using a function that allocates complete memory pages, such as VirtualAlloc, or an anonymous memory mapping. Since these operate on pages, they are necessarily page-size aligned, which (usually) means 4096 bytes [1].
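
A minimal sketch of such an allocation follows. The names alloc_io_buffer, free_io_buffer, and kGranularity are hypothetical, and the 64 KiB granularity is an assumption taken from the 0xffff mask used in the pseudo-code further down. The non-Windows branch is only a stand-in so the snippet builds elsewhere; it is not something the answer relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#ifdef _WIN32
#  include <windows.h>
#else
#  include <cstdlib>
#endif

// Assumed granularity for both buffer start and write size: 64 KiB.
constexpr std::size_t kGranularity = 0x10000;

// True if p is a multiple of kGranularity.
inline bool is_granularity_aligned(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & (kGranularity - 1)) == 0;
}

// Allocates at least `size` bytes, rounded up to a multiple of kGranularity.
// Returns nullptr on failure.
inline void* alloc_io_buffer(std::size_t size) {
    if (size == 0) size = 1;
    const std::size_t rounded = (size + kGranularity - 1) & ~(kGranularity - 1);
#ifdef _WIN32
    // With a null base address, the region starts on an allocation-granularity boundary.
    return VirtualAlloc(nullptr, rounded, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
#else
    // Stand-in for other platforms (C++17): explicit alignment request.
    return std::aligned_alloc(kGranularity, rounded);
#endif
}

// Releases a buffer obtained from alloc_io_buffer.
inline void free_io_buffer(void* p) {
#ifdef _WIN32
    if (p) VirtualFree(p, 0, MEM_RELEASE);
#else
    std::free(p);
#endif
}
```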

Since the amount of data to write must also be a multiple of the sector size (but the amount of data received probably isn't), you have to round down, do a partial write, and save the rest for the next write.
Again, the problem is that you don't know the sector size, so the best thing you can do is round down to the same granularity that you're using for the buffer start (anything else would be nonsensical). In other words, you conceptually have to do something like this:

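while (rv < 0x10000)                    // less than one 64 KiB block buffered: not enough yet
    receive_more_and_append();          // appends to buf and increases rv

num_write = rv & ~0xffff;               // round down to a multiple of 64 KiB
rv -= num_write;                        // bytes left over after the aligned part
memcpy(other_buf, buf + num_write, rv); // carry the remainder over to the next buffer
WriteFileEx(...);                       // write num_write bytes starting at buf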
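
For readers who want something they can compile, the same idea can be sketched in portable C++. Everything here is illustrative: AlignedStager and its Sink callback are made-up names, and the callback is a synchronous stand-in for the actual unbuffered write. With a real overlapped write the buffer must not be touched until the write completes, which is why the pseudo-code above copies the remainder into a second buffer instead of shifting it in place as this sketch does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <utility>

// Hypothetical staging buffer: accepts chunks of any size (as recv delivers
// them) and passes on only whole multiples of `gran` bytes.
class AlignedStager {
public:
    // Stand-in for the unbuffered write; called with (data, size).
    using Sink = std::function<void(const char*, std::size_t)>;

    // buf must be aligned to gran; gran must be a power of two;
    // cap must be a non-zero multiple of gran.
    AlignedStager(char* buf, std::size_t cap, std::size_t gran, Sink sink)
        : buf_(buf), cap_(cap), gran_(gran), sink_(std::move(sink)) {
        assert(gran != 0 && (gran & (gran - 1)) == 0);
        assert(cap != 0 && cap % gran == 0);
    }

    // Appends n bytes and flushes aligned blocks as soon as they are available.
    void append(const char* data, std::size_t n) {
        while (n > 0) {
            const std::size_t take = std::min(n, cap_ - used_);
            std::memcpy(buf_ + used_, data, take);
            used_ += take;
            data += take;
            n -= take;
            flush_aligned();
        }
    }

    // Bytes received but not yet written (always less than gran after append).
    std::size_t pending() const { return used_; }

private:
    void flush_aligned() {
        const std::size_t num_write = used_ & ~(gran_ - 1);  // round down
        if (num_write == 0) return;                          // not enough yet
        sink_(buf_, num_write);                              // synchronous stand-in
        used_ -= num_write;
        std::memmove(buf_, buf_ + num_write, used_);         // keep the remainder
    }

    char* buf_;
    std::size_t cap_;
    std::size_t gran_;
    Sink sink_;
    std::size_t used_ = 0;
};
```

Whatever is left in pending() when the stream ends is smaller than one block and still needs separate handling, which this sketch leaves out.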


[1] That is only half the truth, since Windows has a minimum allocation granularity of 64 kB. You can't allocate something smaller than 64 kB, and it can't be aligned to less than 64 kB. So in fact, you are good for sectors up to 64 kB, which is bigger than anything you are likely to ever encounter, realistically.
Also, as a small nitpick, Itanium has 8k pages, not 4k -- but that is no problem, it's actually better.

Licensed under: CC-BY-SA with attribution