Question

I'm currently performing a large set of numerical simulations (C++/MPI). For every simulation I change a parameter and obtain a final value. Please note that each simulation is a parallel simulation itself, executed separately using MPI. What would be the most efficient way to save these data in a binary file avoiding any kind of simultaneous writing and overlapping?


Solution

It depends on the file writing pattern you have in your program:

  • If you write rarely (compared to the amount of computation you do), you could just protect the file-writing code with a mutex.
  • If writing happens more often, you could use a separate file for each thread and then merge them into the final file.
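As a rough illustration of both options, here is a minimal sketch. The Record layout, the class LockedResultFile, and the helpers write_shared, merge_files, and read_records are hypothetical names made up for this example, not part of any library. It assumes the workers are threads inside one process: a std::mutex does not protect against other processes, so separately launched executables would need the one-file-per-run variant.

```cpp
#include <cassert>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical fixed-size record: the swept parameter and the final value.
struct Record {
    double parameter;
    double value;
};

// First option: one shared file, with every append serialized by a mutex.
class LockedResultFile {
public:
    explicit LockedResultFile(const std::string& path)
        : out_(path, std::ios::binary | std::ios::trunc) {
        assert(out_.is_open());
    }
    void append(const Record& r) {
        std::lock_guard<std::mutex> lock(mutex_);
        out_.write(reinterpret_cast<const char*>(&r), sizeof r);
    }
private:
    std::mutex mutex_;
    std::ofstream out_;
};

// Starts n_threads workers that each append per_thread records to one file.
void write_shared(const std::string& path, int n_threads, int per_thread) {
    LockedResultFile file(path);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&file, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                file.append(Record{static_cast<double>(t), static_cast<double>(i)});
            }
        });
    }
    for (std::thread& w : workers) w.join();
}  // the stream is flushed and closed when `file` goes out of scope

// Second option: each worker writes its own file without locking; afterwards
// the parts are concatenated, in the order given, into the final file.
void merge_files(const std::vector<std::string>& parts, const std::string& final_path) {
    std::ofstream out(final_path, std::ios::binary | std::ios::trunc);
    char buffer[4096];
    for (const std::string& part : parts) {
        std::ifstream in(part, std::ios::binary);
        while (in.read(buffer, sizeof buffer) || in.gcount() > 0) {
            out.write(buffer, in.gcount());
        }
    }
}

// Reads raw records back (only valid on the same platform and struct layout).
std::vector<Record> read_records(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<Record> records;
    Record r;
    while (in.read(reinterpret_cast<char*>(&r), sizeof r)) records.push_back(r);
    return records;
}
```

With the mutex, records from different threads end up interleaved in arrival order, so each record should carry its parameter. Writing raw structs is the simplest binary format but is not portable across machines with different endianness or padding.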

You could also use a queue of data to be written: the computation threads would be producers, and a single consumer thread would write the data to disk. You would probably need some kind of queue size control in case the writing consumer cannot keep up with the producers.

The queue scheme is also nice because it separates computation from I/O, which improves modularity.
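The queue scheme could look roughly like the sketch below. QueuedWriter and Record are hypothetical names for this example. The capacity argument is one possible form of queue size control: push() blocks while the queue is full, so producers slow down instead of exhausting memory.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical fixed-size record: the swept parameter and the final value.
struct Record {
    double parameter;
    double value;
};

class QueuedWriter {
public:
    QueuedWriter(const std::string& path, std::size_t capacity)
        : out_(path, std::ios::binary | std::ios::trunc),
          capacity_(capacity),
          writer_([this] { drain(); }) {
        assert(capacity_ > 0);
    }
    ~QueuedWriter() { close(); }

    // Called by producers; blocks while the queue is full (back-pressure).
    void push(const Record& r) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push_back(r);
        not_empty_.notify_one();
    }

    // Call after all producers are done: drains the queue and closes the file.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        not_empty_.notify_one();
        if (writer_.joinable()) writer_.join();
        if (out_.is_open()) out_.close();
    }

private:
    // Body of the single consumer thread, the only code that touches out_
    // while the writer is running.
    void drain() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            not_empty_.wait(lock, [this] { return done_ || !queue_.empty(); });
            if (queue_.empty()) return;  // done_ is set and nothing is left
            Record r = queue_.front();
            queue_.pop_front();
            lock.unlock();               // do the disk I/O outside the lock
            not_full_.notify_one();
            out_.write(reinterpret_cast<const char*>(&r), sizeof r);
            lock.lock();
        }
    }

    std::ofstream out_;
    std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::condition_variable not_full_;
    std::deque<Record> queue_;
    bool done_ = false;
    std::thread writer_;  // declared last so it starts after the other members exist
};
```

Computation threads only ever call push(), so the file format and the I/O strategy can change without touching the simulation code. Records pushed after close() are not written in this sketch.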

OTHER TIPS

Instead of writing the file from each executable, you might want to keep the data in memory and save the sorted, aggregated results after all tasks have finished.
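A minimal sketch of that idea, assuming a driver has already collected one (parameter, value) pair per simulation in a vector. Record, save_sorted, load_results, and the file layout (a 64-bit record count followed by the raw records) are assumptions chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical fixed-size record: the swept parameter and the final value.
struct Record {
    double parameter;
    double value;
};

// Sorts the collected results by parameter and writes them in a single pass:
// a 64-bit record count followed by the raw records.
bool save_sorted(std::vector<Record> results, const std::string& path) {
    assert(!path.empty());
    std::sort(results.begin(), results.end(),
              [](const Record& a, const Record& b) { return a.parameter < b.parameter; });
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    const std::uint64_t count = results.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    for (const Record& r : results) {
        out.write(reinterpret_cast<const char*>(&r), sizeof r);
    }
    out.close();
    return !out.fail();
}

// Reads the file back; returns an empty vector if the file is missing or truncated.
std::vector<Record> load_results(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::uint64_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof count)) return {};
    std::vector<Record> results;
    Record r;
    for (std::uint64_t i = 0; i < count; ++i) {
        if (!in.read(reinterpret_cast<char*>(&r), sizeof r)) return {};
        results.push_back(r);
    }
    return results;
}
```

Because only one writer ever touches the file, no locking is needed. The trade-off is that results held only in memory are lost if the driver crashes before the final write.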

There are frameworks that help you run tasks in parallel and aggregate the results. I would suggest the LeoTask framework: https://github.com/mleoking/LeoTask

It not only does the job but also provides many additional useful features. For example, it can recover and continue running your tasks after a power outage without losing the results already calculated.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow