سؤال

I'm trying to optimizie an I/O bound C++ Win32 application. What it actually does is something very similar to recurse a folder and compute a cryptographic hash of every file it finds. It's a single threading application using memory mapped files, so as it's easy to imagine it doesn't seem to use much CPU since most of the times the main thread is put to sleep waiting for I/O to complete. I'm thinking about a couple of solutions, but I'm not sure about it so I'd like to have your opinions.

  1. I could spawn many threads ( having a fixed size pool of workers to keep memory usage under certain threshold ), but honestly I don't know if this can make the situation better, each thread is gonna be put to sleep just like my main thread in the current implementation, plus the scheduler would "waste" lot of computation power to switch contexts.
  2. I was thinking about I/O completion ports ( single thread? multi? ), but that would mean to abandon the memory mapped files ( am I wrong ? ) and use standard file operations. If this is the case, could you please provide me some example code on how to use IOCP to read and elaborate a given list of files without putting the reading thread to sleep ?

Any other idea/suggestion/etc would be really appreciated :)

Thanks.

هل كانت مفيدة؟

المحلول

Before parallelizing anything, always ask yourself first: Does the added complexity justify the performance gained? In order to answer that question with minimal effort, just test how much % of max read throughput you already have. That is, test your current read throughput and then test max throughput. Don't use the theoretical max for this computation. Then, think about how much complexity and how many possible issues are introduced in even the simplest approach to gain the last few %.

As already mentioned in the comments, the greatest performance gain here is probably achieved by pipelining (i.e. overlapping computation and I/O). And the easiest way to implement that is with asynchronous reads. This thread lists multiple ways to implement asnychronous file I/O in C++.

If you don't need portability, just use the Windows OVERLAPPED API. Boost ASIO does not seem to make File I/O very easy (yet). I could not find any good examples.

Note that, depending on your system configuration, you have to launch multiple threads to fully saturate I/O bandwidth (especially if files of that folder actually reside on multiple disks, which is possible). Even if you only read from one device, you might fair (slightly) better with multiple threads to mitigate OS overhead.

نصائح أخرى

My experience indicates that memory mapping isn't particularly fast, so that would probably be the first thing I'd abandon.

Threading (explicitly or via IOCPs) probably won't do much good either, unless the target system has lots of disk drives, and can split things up so different threads are operating on different physical drives.

Once you've given up on memory mapping and do explicit I/O, you probably want to use FILE_FLAG_NO_BUFFERING and read relatively large blocks (say, a few megabytes at a time). Do check the alignment requirements on your block of memory though--they're a little tricky (or maybe "tedious" would be a better word to describe them). Also note that this only works for reads that are multiples of the disk's sector size, so in a typical case you need to open the file twice, once with FILE_FLAG_NO_BUFFERING to read the bulk of the data, then again without that flag to read the "tail" of the file.

Although it only copies a file (rather than processing the contents), and it's probably pure C, not C++, perhaps some demo code will be of at least a little help:

int do_copy(char const *in, char const *out) {

    HANDLE infile;
    HANDLE outfile;
    char *buffer;
    DWORD read, written;
    DWORD junk=0;
    unsigned long little_tail;
    unsigned long big_tail;
    unsigned __int64 total_copied = 0;
    unsigned __int64 total_size = 0;
    BY_HANDLE_FILE_INFORMATION file_info;

#define size (1024 * 8192)

    buffer = VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE);
    if ( NULL == buffer)
        return 0;

    infile = CreateFile(in, 
        GENERIC_READ, 
        FILE_SHARE_READ,
        NULL,
        OPEN_ALWAYS, 
        FILE_FLAG_NO_BUFFERING, 
        NULL);

    GetFileInformationByHandle(infile, &file_info);
    total_size = (unsigned __int64)file_info.nFileSizeHigh << 32 | (unsigned __int64)file_info.nFileSizeLow / 100;

    outfile = CreateFile(out, 
        GENERIC_WRITE, 
        FILE_SHARE_READ,
        NULL,
        CREATE_ALWAYS, 
        FILE_FLAG_NO_BUFFERING, 
        NULL);

    if ((infile == HNULL) || (outfile == HNULL))
        return 0;

    while (ReadFile(infile, buffer, size, &read, NULL) && read == size) {
        WriteFile(outfile, buffer, read, &written, NULL);
        total_copied += written;
        fprintf(stderr, "\rcopied: %lu %%", (unsigned long)(total_copied / total_size));
    }

    little_tail = read % 4096;
    big_tail = read - little_tail;

    WriteFile(outfile, buffer, big_tail, &written, NULL);

    CloseHandle(infile);
    CloseHandle(outfile);

    outfile = CreateFile(out, 
        GENERIC_WRITE, 
        0,
        NULL,
        OPEN_EXISTING,
        FILE_FLAG_SEQUENTIAL_SCAN, 
        NULL);
    fprintf(stderr, "\rcopied: 100 %%\n");

    SetFilePointer(outfile, 0, &junk, FILE_END);
    WriteFile(outfile, buffer+big_tail, little_tail, &written, NULL);
    CloseHandle(outfile);
    VirtualFree(buffer, size, MEM_RELEASE);
    return 1;
}
مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top