Question

I'm running my finite-difference program on a scientific cluster at my school. The program uses Open MPI to parallelize the code.

When I time the program in serial I get:

real    78m40.592s
user    78m34.920s
sys     0m0.999s

When I run it with 8 mpi processors I get:

real    12m45.929s
user    101m9.271s
sys     0m29.735s

When I run it with 16 mpi processors I get:

real    4m46.936s
user    37m30.000s
sys     0m1.150s

So my question is: if the user time is the total CPU time, then why are the user times so different from each other for different numbers of processors?

Thanks,

Anthony G.


Solution

In serial, your code runs in 78m40s, and real and user are almost identical.

When you run with 8 processes, which I assume all run on the same machine (node), the total CPU time is 101m9s. That is much larger than the serial CPU time, so I would guess you encountered either overloading of the node (more processes than free cores) or memory contention. Since you are using 8 cores, the wall-clock time is roughly 101m9s / 8 ≈ 12m45s. You could rerun that test and observe what happens.

When you run with 16 processes, which I assume are dispatched across two nodes, the real time is 4m46s, which is approximately 78m40s / 16. But the user time reported is only the cumulative CPU time of the processes running on the same node as mpirun; the time command has no way of knowing about the MPI processes running on the other node. That is why 37m30s is approximately 78m40s / 2.
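The arithmetic above can be sketched as follows. This is an illustrative calculation using the numbers from the question (not the poster's actual code), under the assumption that the ~78m40s of serial CPU time is split evenly across two 8-core nodes:

```python
# Illustrative check of the reasoning above, using the timings from the post.
def to_seconds(m, s):
    """Convert minutes and seconds to seconds."""
    return 60 * m + s

serial_cpu = to_seconds(78, 40.592)   # total work, measured in the serial run

# 16 processes spread over 2 nodes: `time` wraps mpirun, so it only
# accounts for the 8 processes on the local node.
procs, nodes = 16, 2
local_procs = procs // nodes

expected_real = serial_cpu / procs    # ideal wall-clock time
expected_user = serial_cpu / nodes    # CPU time visible to `time` on one node

print(f"expected real ~ {expected_real / 60:.1f} min")  # ~4.9 min, close to 4m46s
print(f"expected user ~ {expected_user / 60:.1f} min")  # ~39.3 min, close to 37m30s
```

The small gap between the predicted ~39.3 min and the observed 37m30s is consistent with ordinary run-to-run noise.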

OTHER TIPS

There are usually two different notions of time on a computer system.

  1. Wall-clock time (let's call it T): This is the time that goes by on your watch, while your program is executing.
  2. CPU time (let's call it C): This is the cumulative time all the CPUs working on your program have spent executing your code.

For an ideal parallel code running on P CPUs, T = C/P. That means that if you run the code on eight CPUs, it finishes eight times faster: the work has been distributed across eight CPUs, each of which executes for C/P seconds.

In reality, there is often overhead in the execution. With MPI, you have communication overhead. This usually causes a situation where T > C/P. The larger T becomes relative to C/P, the less efficient the parallel code is.
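As a rough sketch (illustrative only, using the timings from the question), you can quantify that overhead as a parallel efficiency E = T_serial / (P · T_parallel), which equals 1 in the ideal case T = C/P:

```python
# Parallel efficiency from the question's timings (illustrative; the runs may
# have used different nodes, and the 8-process run looks overloaded).
def efficiency(t_serial, p, t_parallel):
    """Fraction of ideal speedup achieved: 1.0 means T == T_serial / P."""
    return t_serial / (p * t_parallel)

t_serial = 78 * 60 + 40.592           # real time of the serial run, in seconds

e8 = efficiency(t_serial, 8, 12 * 60 + 45.929)
e16 = efficiency(t_serial, 16, 4 * 60 + 46.936)

print(f"8 processes:  efficiency ~ {e8:.2f}")   # ~0.77
print(f"16 processes: efficiency ~ {e16:.2f}")  # ~1.03 (nominally superlinear;
                                                #  timings like these are noisy)
```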

An operating system like Linux can tell you more than just the wall-clock time. It usually reports user and sys time. User time is the CPU time (not exactly, but reasonably close for now) that the application spends in your own code. Sys time is the time spent in the Linux kernel on behalf of your program.
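As a minimal sketch of the same distinction (assuming a single-process, CPU-bound workload), Python's standard library exposes both clocks: time.perf_counter() tracks wall-clock time, while time.process_time() tracks the process's combined user + sys CPU time:

```python
import time

def burn_cpu(n=2_000_000):
    """CPU-bound busy work so user time accumulates."""
    total = 0
    for i in range(n):
        total += i * i
    return total

wall_start, cpu_start = time.perf_counter(), time.process_time()
burn_cpu()
wall = time.perf_counter() - wall_start   # analogous to `real`
cpu = time.process_time() - cpu_start     # analogous to `user` + `sys`

print(f"wall: {wall:.3f}s  cpu: {cpu:.3f}s")
# In a serial run the two are close; with multiple threads or processes
# the cumulative CPU time can exceed the wall-clock time.
```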

Cheers, -michael

Licensed under: CC-BY-SA with attribution