In serial, your code runs in 78m40s, and the real and user times are almost identical.
When you run with 8 processes, which I assume are all running on the same machine (node), the total CPU time is 101m9. That is much larger than in the serial run, so I would guess that you either overloaded the node or ran short of memory. But since you are using 8 cores, the wall clock time is only about 101m9 / 8, i.e. roughly 12m45. You could rerun that test and observe what happens.
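As a quick sanity check of that arithmetic, here is a small Python sketch. The variable and function names are mine, and it assumes the CPU time is spread evenly over the 8 cores, which is only a guess about your run.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair, as printed by `time`, to seconds."""
    return minutes * 60 + seconds

serial_cpu = to_seconds(78, 40)    # serial run: 78m40
parallel_cpu = to_seconds(101, 9)  # 8 processes: 101m9 of total CPU time
n_procs = 8

# Extra CPU time used by the 8-process run relative to the serial run.
overhead = parallel_cpu / serial_cpu - 1

# Wall clock time if that CPU time is spread evenly over the 8 cores (assumption).
wall = parallel_cpu / n_procs

print(f"CPU overhead: {overhead:.0%}")                     # CPU overhead: 29%
print(f"wall clock: {int(wall // 60)}m{wall % 60:.0f}s")   # wall clock: 12m39s
```

The exact division gives about 12m39s, in line with the roughly 12m45 quoted above, and the run burns about 29% more CPU time than the serial one.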
When you run with 16 processes, which I assume are dispatched on two nodes, the real time is 4m46, which is approximately 78m40 / 16. But the user time is the cumulative CPU time of only the processes running on the same node as mpirun; the time command has no way of knowing about MPI processes running on other nodes. That is why it reports 37m30, which is approximately 78m40 / 2.
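The reasoning for the two-node case can be sketched as a tiny model in Python. This is only an illustration: `predicted_times` and `fmt` are hypothetical helpers, and the model assumes perfect scaling and an even split of 8 processes per node, neither of which I can confirm for your cluster.

```python
def predicted_times(serial_seconds, n_procs, n_nodes):
    """Return (real, user) in seconds as `time mpirun` would roughly report them.

    Assumes perfect scaling and processes spread evenly over the nodes.
    Only the processes on mpirun's own node count toward the user time.
    """
    per_process = serial_seconds / n_procs   # CPU time used by each process
    local_procs = n_procs // n_nodes         # processes sharing mpirun's node
    real = per_process                       # all processes run concurrently
    user = per_process * local_procs         # CPU time visible to `time`
    return real, user

def fmt(seconds):
    """Format seconds in the XmYYs style used by `time`."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m{secs:02d}s"

serial = 78 * 60 + 40  # serial run: 78m40
real, user = predicted_times(serial, n_procs=16, n_nodes=2)
print(fmt(real), fmt(user))  # 4m55s 39m20s
```

The model predicts about 4m55 of real time and 39m20 of user time, reasonably close to the measured 4m46 and 37m30.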