Question

I have written a program (using FFTW) to perform Fourier transforms of some data files written in OpenFOAM.

The program first finds the paths to each data file (501 files in my current example), then splits the paths between threads, so that thread0 gets paths 0->61, thread1 gets 62->123, and so on, and then processes the remaining files serially at the end.

I have implemented timers throughout the code to try to find the bottleneck: run serially, each file takes around 3.5s, but 8 files processed in parallel take around 21s (a reduction from the 28s that 8 × 3.5s would take serially, but not by much).

The problematic section of my code is below:

if (DIAG_timers) {readTimer = timerNow();}
for (yindex=0; yindex<ycells; yindex++)
{
    for (xindex=0; xindex<xcells; xindex++)
    {
        getline(alphaFile, alphaStringValue);
        convertToNumber(alphaStringValue, alphaValue[xindex][yindex]);
    }
}
if (DIAG_timers) {endTimerP(readTimer, tid, "reading value and converting", false);}

Here, timerNow() returns the clock value, and endTimerP calculates the elapsed time in ms. (The remaining arguments relate to running in a parallel thread, to avoid printing 8 lines for each loop, plus a description of what the timer measures.)

convertToNumber takes the value in alphaStringValue and converts it to a double, which is then stored in the alphaValue array.

alphaFile is a std::ifstream object, and alphaStringValue is a std::string which stores the text on each line.

The files to be read are approximately 40MB each (just over 5,120,000 lines, each containing a single value between 0 and 1, mostly exactly 0 or 1), and I have 16GB of RAM, so copying all the files to memory would certainly be possible, since only 8 (1 per thread) should be open at once. I am unsure whether mmap would do this better? Several threads on stackoverflow argue about the merits of mmap vs more straightforward read operations, in particular for sequential access, so I don't know if that would be beneficial.
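For reference, the kind of thing I had in mind is reading each whole file into a buffer in one go and then parsing from memory, roughly like this sketch (not in my current code; slurpFile is just an illustrative name):

#include <fstream>
#include <vector>
#include <stdexcept>

// Read an entire file into memory with a single read; parsing would then
// walk the buffer instead of calling getline on the stream.
std::vector<char> slurpFile(const char* path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        throw std::runtime_error("cannot open file");
    std::streamsize size = in.tellg();
    in.seekg(0, std::ios::beg);
    std::vector<char> buffer(static_cast<std::size_t>(size));
    if (!in.read(&buffer[0], size))
        throw std::runtime_error("read failed");
    return buffer;
}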

I tried surrounding the code block with a mutex so that only one thread could run the block at once, in case reading multiple files was leading to slow IO via vaguely random access, but that just reduced the process to roughly serial speed.

Any suggestions allowing me to run this section more quickly, possibly via copying the file, or indeed anything else, would be appreciated.

Edit:

template<class T> inline void convertToNumber(std::string const& s, T &result)
{
    std::istringstream i(s);
    T x;
    if (!(i >> x))
        throw BadConversion("convertToNumber(\"" + s + "\")");
    result = x;
}

turns out to have been the slow section. I assume this is due to the creation of 5 million stringstreams per file, followed by the testing of 5 million if conditions. Replacing it with TonyD's suggestion presumably removes the possibility of catching an error, but saves a vast number of operations that are (at least in this controlled case) unnecessary.
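For the record, TonyD's exact suggestion isn't reproduced here; the replacement I ended up with is along these lines (a sketch of a stringstream-free conversion; the caveat is that strtod silently stops at the first invalid character instead of throwing):

#include <cstdlib>
#include <string>

// Stringstream-free conversion: no stream construction per value, but
// also no error reporting.
inline void convertToNumberFast(std::string const& s, double &result)
{
    result = std::strtod(s.c_str(), 0);
}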

Was it helpful?

Solution

The files to be read are approximately 40MB each (just over 5,120,000 lines, each containing a single value between 0 and 1, mostly exactly 0 or 1), and I have 16GB of RAM, so copying all the files to memory would certainly be possible,

Yes. But loading them there will still count towards your process' wall clock time unless they were already read by another process shortly before.

since only 8 (1 per thread) should be open at once.

Since any files that were not loaded into memory before the process started will have to be loaded, and that loading will count towards the process's wall clock time, it does not matter how many are open at once. Any that are not cached will slow down the process.

I am unsure if mmap would do this better?

No, it wouldn't. mmap is faster, but only because it saves the copy from kernel buffer to application buffer and some system call overhead (with read you make a kernel entry each time you refill your buffer, while with mmap pages brought in by read-ahead won't cause further page faults). But it will not save you the time needed to read the files from disk if they are not already cached.

mmap does not load anything into memory. The kernel loads data from disk into internal buffers, the page cache. read copies the data from there to your application buffer, while mmap exposes parts of the page cache directly in your address space. In either case the data are fetched on first access and remain there until the memory manager drops them to reuse the memory. The page cache is global, so if one process causes some data to be cached, the next process will get them faster. But if it's the first access after a long time, the data will have to be read from disk, and this affects read and mmap exactly the same way.

Since parallelizing the process didn't improve the time much, it seems the majority of the time is spent on actual I/O. So you can optimize a bit more, and mmap can help, but don't expect much. The only way to improve I/O time is to get a faster disk.


You should be able to ask the system how much time was spent on the CPU and how much was spent waiting for data (I/O) using getrusage(2) (call it at the end of each thread to get data for that thread). That way you can confirm how much time was spent on I/O.
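As a rough sketch of that (assuming Linux, where RUSAGE_THREAD gives per-thread figures; it may require _GNU_SOURCE), something like the following at the end of each thread reports CPU time; wall time minus user+system time approximates time spent blocked on I/O:

#include <sys/resource.h>
#include <cstdio>

// Call at the end of each worker thread (RUSAGE_THREAD is Linux-specific).
void reportThreadCpuTime(int tid)
{
    struct rusage ru;
    if (getrusage(RUSAGE_THREAD, &ru) == 0)
    {
        double user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
        double sys  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
        std::printf("thread %d: user %.3fs, system %.3fs\n", tid, user, sys);
    }
}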

Other tips

mmap is certainly the most efficient way to get large amounts of data into memory. The main benefit here is that there is no extra copying involved.

It does, however, make the code slightly more complex, since you can't directly use the file I/O functions on an mmap'd region (and the main benefit is somewhat lost if you use the "m" mode of the stdio functions, as you then still get at least one copy). From past experiments of mine, mmap beats all other file-reading variants by some margin. How much depends on what proportion of the overall time is spent waiting for the disk versus actually processing the file content. A minimal example is sketched below.
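A minimal POSIX sketch of mapping a file for sequential parsing (error handling kept short; mapFile is just an illustrative name, not code from the question):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <stdexcept>

// Map a whole file read-only; the caller parses the bytes in [data, data+size)
// and calls munmap(data, size) when finished.
const char* mapFile(const char* path, std::size_t &size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        throw std::runtime_error("open failed");
    struct stat st;
    if (fstat(fd, &st) < 0)
    {
        close(fd);
        throw std::runtime_error("fstat failed");
    }
    size = st.st_size;
    void* p = mmap(0, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                              // the mapping survives the close
    if (p == MAP_FAILED)
        throw std::runtime_error("mmap failed");
    madvise(p, size, MADV_SEQUENTIAL);      // hint: sequential scan
    return static_cast<const char*>(p);
}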

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow