How do MPI implementations (OpenMPI, MPICH) handle security/authentication

https://stackoverflow.com/questions/6346873

27-10-2019
|

Question

How do OpenMPI and MPICH handle security when I send MPI messages between the processes over TCP/IP sockets?

In particular, how do they prevent other users of the same network from connecting to a listening socket and sending fake MPI messages?

The specific scenario is the following:

The administrators are trusted. Untrusted users do not have a physical access to any hardware or network. Untrusted users do not have root access.
However, untrusted users can run their own programs in the cluster; the cluster nodes are typical Linux boxes. In particular, untrusted users can open TCP connections from any machine to any other machine in the cluster and send arbitrary messages.

Solution

J Teller's right; MPI doesn't really do this, and it shouldn't. That's a design decision based on the use case of MPI.

MPI users are the sorts of people who pay lots of money for interconnects with sub-microseconds latency. The overhead of some sort of cryptographic signing of messages would be completely unacceptable for this community.

And it wouldn't really help at any rate. The way MPI is used is as a message transport interface within a controlled environment - nodes in a limited-access cluster, or maybe machines in a compute lab. If a malicious user gains enough control of one of these nodes to interfere with MPI communications, there are far easier ways to disrupt the communication than sniffing packets, figuring out what stage of the computation is underway, and doing some kind of man-in-the-middle attack. One could just alter the memory of the running job, or more easily, simply overwrite the results on the shared file system. (notice simply sending forged MPI messages might well be noticed, as the "real" messages would pile up, using resources and possibly crashing the job; similarly, intercepting messages without relaying them would almost certainly result in deadlock).

These arguments don't apply so strongly to distributed computing, of course, say BOINC-style: but MPI isn't well suited for that sort of use anyway.

Nothing of course stops an MPI user who does have this sort of security requirement from simply sending a pgp-style signature along with every message and incorporating that into their code; but a mechanism for doing that is not part of MPI per se, and that's certainly the right decision.

OTHER TIPS

I'm not quite an expert on this, but the basic answer is that MPI generally doesn't handle security. It relies on the underlying OS to provide the security level you're describing.

For my mpi distribution, this is built in is using the mpd daemon (the daemon that launches mpi processes). mpdboot sets up a ring of mpd daemons on the cluster (1 per node), in a sane way. Once that ring is setup, and if you trust the mpd daemons, then you're all set. Mpd will make sure that only processes you own connect to your mpi processes.

However, I don't quite understand the "sane way" the mpd ring is setup. In my distribution however, mpdboot is a python script, so it's possible to take a look at it and see if it's secure enough for you. It is probably safe enough if the cluster you're running on is access controlled.

I'm all with @Jonathan Dursi that securing MPI communication contributes little to the security of a well-configured cluster but (a) management may insist, (b) for some reason you want to run MPI over an untrusted network, and (c) it's fun to solve the problem.

As for the untrusted network situation, I used IPSec to protect the MPI communication network on a cluster before. Rather than using IPSec's automatic configuration scripts that require certificates, I loaded symmetric encryption keys via a shell script that I generated automatically with a python script.

To isolate MPI communication of individual jobs on the cluster, it should be possible to extend this approach to load newly generated IPSec keys in the prolog of each job, rather than only once at system startup. In this setup, jobs must not share nodes. Furthermore, it will be easier to set this up with a dedicated network for MPI as ongoing connections (ssh, the job manager etc.) must be left intact and you probably don't want to include the head node(s) which usually the job management system needs to connect to to the job's IPSec network as the attacker could be sitting there.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow