Question

I'm running my finite-difference program on a scientific cluster at my school. The program uses Open MPI to parallelize the code.

When I time the program in serial I get:

real    78m40.592s
user    78m34.920s
sys     0m0.999s

When I run it with 8 mpi processors I get:

real    12m45.929s
user    101m9.271s
sys     0m29.735s

When I run it with 16 mpi processors I get:

real    4m46.936s
user    37m30.000s
sys     0m1.150s

So my question is: if the user time is the total CPU time, then why are the user times so different from each other for different numbers of processors?

Thanks,

Anthony G.


Solution

In serial, your code runs in 78m40s, and real and user are almost identical.

When you run with 8 processes, which I assume all run on the same machine (node), the total CPU time is 101m9s. That is much larger than the serial CPU time, so I would guess you encountered either overloading of the node (more processes than free cores) or memory contention. Since you are using 8 cores, the wall-clock time is roughly 101m9s / 8 ≈ 12m45s. You could rerun that test and observe what happens.

When you run with 16 processes, which I assume are dispatched across two nodes, the real time is 4m46s, which is approximately 78m40s / 16. But the user time reported is only the cumulative CPU time of the processes running on the same node as mpirun; the time command has no way of knowing about the MPI processes running on the other node. That is why 37m30s is approximately 78m40s / 2.
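The arithmetic above can be sketched as follows. This is an illustrative calculation using the numbers from the question (not the poster's actual code), under the assumption that the ~78m40s of serial CPU time is split evenly across two 8-core nodes:

```python
# Illustrative check of the reasoning above, using the timings from the post.
def to_seconds(m, s):
    """Convert minutes and seconds to seconds."""
    return 60 * m + s

serial_cpu = to_seconds(78, 40.592)   # total work, measured in the serial run

# 16 processes spread over 2 nodes: `time` wraps mpirun, so it only
# accounts for the 8 processes on the local node.
procs, nodes = 16, 2
local_procs = procs // nodes

expected_real = serial_cpu / procs    # ideal wall-clock time
expected_user = serial_cpu / nodes    # CPU time visible to `time` on one node

print(f"expected real ~ {expected_real / 60:.1f} min")  # ~4.9 min, close to 4m46s
print(f"expected user ~ {expected_user / 60:.1f} min")  # ~39.3 min, close to 37m30s
```

The small gap between the predicted ~39.3 min and the observed 37m30s is consistent with ordinary run-to-run noise.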

OTHER TIPS

There are usually two different notions of time on a computer system.

  1. Wall-clock time (let's call it T): This is the time that goes by on your watch, while your program is executing.
  2. CPU time (let's call it C): This is the cumulative time all the CPUs working on your program have spent executing your code.

For an ideal parallel code running on P CPUs, T = C/P. That means that if you run the code on eight CPUs, it finishes eight times faster: the work has been distributed across eight CPUs, each of which executes for C/P seconds.

In reality, there is often overhead in the execution. With MPI, you have communication overhead. This usually causes a situation where T > C/P. The larger T becomes relative to C/P, the less efficient the parallel code is.
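As a rough sketch (illustrative only, using the timings from the question), you can quantify that overhead as a parallel efficiency E = T_serial / (P · T_parallel), which equals 1 in the ideal case T = C/P:

```python
# Parallel efficiency from the question's timings (illustrative; the runs may
# have used different nodes, and the 8-process run looks overloaded).
def efficiency(t_serial, p, t_parallel):
    """Fraction of ideal speedup achieved: 1.0 means T == T_serial / P."""
    return t_serial / (p * t_parallel)

t_serial = 78 * 60 + 40.592           # real time of the serial run, in seconds

e8 = efficiency(t_serial, 8, 12 * 60 + 45.929)
e16 = efficiency(t_serial, 16, 4 * 60 + 46.936)

print(f"8 processes:  efficiency ~ {e8:.2f}")   # ~0.77
print(f"16 processes: efficiency ~ {e16:.2f}")  # ~1.03 (nominally superlinear;
                                                #  timings like these are noisy)
```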

An operating system like Linux can tell you more than just the wall-clock time. It usually reports user and sys time. User time is the CPU time (not exactly, but reasonably close for now) that the application spends in your own code. Sys time is the time spent in the Linux kernel on behalf of your program.
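As a minimal sketch of the same distinction (assuming a single-process, CPU-bound workload), Python's standard library exposes both clocks: time.perf_counter() tracks wall-clock time, while time.process_time() tracks the process's combined user + sys CPU time:

```python
import time

def burn_cpu(n=2_000_000):
    """CPU-bound busy work so user time accumulates."""
    total = 0
    for i in range(n):
        total += i * i
    return total

wall_start, cpu_start = time.perf_counter(), time.process_time()
burn_cpu()
wall = time.perf_counter() - wall_start   # analogous to `real`
cpu = time.process_time() - cpu_start     # analogous to `user` + `sys`

print(f"wall: {wall:.3f}s  cpu: {cpu:.3f}s")
# In a serial run the two are close; with multiple threads or processes
# the cumulative CPU time can exceed the wall-clock time.
```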

Cheers, -michael

Licensed under: CC-BY-SA with attribution