Is there a point using MPI instead of OpenMP when all processors share the memory?

Question

If you never intend to scale your application beyond a single shared-memory node, then OpenMP parallelisation might be relatively easier to implement than MPI parallelisation. Relatively, because the apparent simplicity of OpenMP is very misleading. To get the full performance out of modern shared-memory machines, one should maximise data locality and use lots of private data, effectively treating the machines as distributed-memory systems. Also, the most prevalent errors in shared-memory programming are data races, and those can at times be very hard to debug, even when armed with special thread-checker tools. Data races are virtually absent in MPI programming since processes do not share data.
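To make the point about private data and data races concrete, here is a minimal sketch in C. The function name `sum_array` is made up for illustration. Without the `reduction` clause every thread would update the shared `total` at the same time, which is exactly the kind of data race described above; with it, each thread works on its own private copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums `n` integers in parallel.
   The reduction clause gives every thread a private copy of `total`
   and combines the copies once the loop is done, so no two threads
   ever write to the same variable. */
long sum_array(const int *values, size_t n)
{
    long total = 0;

    assert(values != NULL || n == 0);

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        total += values[i];
    }
    return total;
}
```

Call it from your own `main` and compile with OpenMP enabled (e.g. `-fopenmp` with GCC). If OpenMP is not enabled the pragma is ignored and the loop simply runs serially, with the same result.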

That said, even when MPI processes communicate through shared memory, that is still slower than directly accessing shared memory from the threads of a single process. Also, some algorithms require global data, which takes more memory with MPI because each process has to hold its own copy of that data. This can be remedied in MPI-3.0 by using shared-memory windows with one-sided operations, but that is somewhat cumbersome (though portable). There are also research efforts to reduce the intra-node communication overhead as much as possible, and some are very successful.