Running on multiple cores using MPI

https://stackoverflow.com/questions/6060836

15-11-2019

Question

I currently use the following command to submit MPI jobs: mpirun -np <number of processors> <filename>

My understanding is that the above command lets me submit to 4 independent processors that communicate via MPI. However, in our setup each processor has 4 cores that sit unused. My questions are the following:

1. Is it possible to submit a job to run on multiple cores on the same node, or on several nodes, from the MPI run command line? If so, how?
2. Does the above require any special comments within the code? From reading some of the literature, I understand that the communication time between cores may differ from that between processors, so it takes some thought about how the problem is distributed... but beyond that issue, what else does one need to estimate?
3. Finally, is there a limit on how much data is transferred? Is there a limit on how much data the bus can send/receive? Is there a limitation on the cache?

Thanks!

Solution

So 1 is a question about launching processes, and 2+3 are questions about, basically, performance tuning. Performance tuning can involve substantial work on the underlying code, but you won't need to modify a line of code to do any of this.

What I understand from your first question is that you want to modify the distribution of the MPI processes launched. Doing this is necessarily outside the standard, because it's OS and platform dependent; so each MPI implementation will have a different way to do this. Recent versions of OpenMPI and MPICH2 allow you to specify where the processes end up, so you can specify two processes per socket, etc.
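
As a rough illustration of what such a launch line can look like, here is a small sketch that assembles an Open MPI style command. The option spellings (`--map-by ppr:N:socket`, `--bind-to`, `--report-bindings`) are the ones used by Open MPI's `mpirun`; other implementations and other versions spell these differently, so check `mpirun --help` on your system. The program name `./my_mpi_program` and the helper `build_mpirun_command` are hypothetical.

```python
import shlex


def build_mpirun_command(num_procs, program, procs_per_socket=None,
                         bind_to=None, report_bindings=False):
    """Assemble an Open MPI style mpirun argument list (illustrative only)."""
    cmd = ["mpirun", "-np", str(num_procs)]
    if procs_per_socket is not None:
        # "ppr" means "processes per resource": N processes on every socket.
        cmd += ["--map-by", f"ppr:{procs_per_socket}:socket"]
    if bind_to is not None:
        # Pin each process to one hardware unit, e.g. "core" or "socket".
        cmd += ["--bind-to", bind_to]
    if report_bindings:
        # Ask mpirun to print where each rank actually landed.
        cmd.append("--report-bindings")
    cmd.append(program)
    return cmd


# Hypothetical job: 16 ranks, two per socket, each pinned to a core.
command = build_mpirun_command(16, "./my_mpi_program", procs_per_socket=2,
                               bind_to="core", report_bindings=True)
print(shlex.join(command))
```

Printing the bindings is a cheap way to confirm that the placement you asked for is the placement you got before you start timing anything.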

You do not need to modify the code for this to work, but there are performance issues depending on core distributions. It's hard to say much about this in general, because it depends on your communication patterns, but yes, the "closer" the processes are, the faster the communications will be, by and large.
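
To get a feel for why "closer" tends to mean faster, a common back-of-the-envelope model treats the time for one message as a fixed latency plus the message size divided by the bandwidth. The sketch below uses that model; the latency and bandwidth numbers are made-up placeholders rather than measurements of any real machine, so substitute values from a ping-pong benchmark on your own cluster.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time for one message: startup latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Placeholder link parameters (latency in seconds, bandwidth in bytes/second).
# These are assumptions for illustration, not measured values.
LINKS = {
    "same node": (0.5e-6, 10e9),
    "different nodes": (2.0e-6, 1e9),
}

for size in (8, 1_000_000):
    for name, (latency, bandwidth) in LINKS.items():
        t = transfer_time(size, latency, bandwidth)
        print(f"{size:>9} bytes, {name:<15}: {t * 1e6:10.2f} microseconds")
```

Small messages are dominated by the latency term and large ones by the bandwidth term, which is why the right placement depends on your communication pattern.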

There's no specified limit to the total volume of data that goes back and forth between MPI tasks, but yes, there are bandwidth limits (and there are limits per message). The cache size is whatever it is.
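
One per-message limit that may be relevant here: in the classic C bindings, the element count passed to calls such as MPI_Send is an int, so a single call can describe at most INT_MAX elements. A usual workaround is to send a very large buffer in pieces. The sketch below only computes the (offset, count) pairs for such a split; it assumes a 32-bit int, and `chunk_ranges` is a hypothetical helper, not part of any MPI library.

```python
INT_MAX = 2**31 - 1  # assumes a 32-bit C int, as on typical platforms


def chunk_ranges(total_elements, max_per_message=INT_MAX):
    """Split total_elements into (offset, count) pairs, each count <= max_per_message."""
    if total_elements < 0 or max_per_message <= 0:
        raise ValueError("total_elements must be >= 0 and max_per_message > 0")
    ranges = []
    offset = 0
    while offset < total_elements:
        count = min(max_per_message, total_elements - offset)
        ranges.append((offset, count))
        offset += count
    return ranges


# Hypothetical buffer of five billion elements: too many for one int count.
for offset, count in chunk_ranges(5_000_000_000):
    print(f"send {count} elements starting at offset {offset}")
```

Each pair would then correspond to one send/receive call on the matching slice of the buffer.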

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow