Running on multiple cores using MPI

https://stackoverflow.com/questions/6060836

15-11-2019

Question

I currently use the following command to submit MPI jobs: mpirun -np <number of processors> <filename>

My understanding is that the above command lets me submit to 4 independent processors that communicate via MPI. However, in our setup each processor has 4 cores, which go unused. The questions I have are the following:

1. Is it possible to submit a job to run on multiple cores on the same node, or on several nodes, from the mpirun command line? If so, how?
2. Does the above require any special comments or setup within the code? I understand from reading some literature that the communication time between cores can differ from that between processors, so it requires some thought about how the problem is distributed... but beyond that, what else does one need to account for?
3. Finally, is there a limit on how much data is transferred? Is there a limit on how much data the bus can send/receive? Is there a limitation on the cache?

Thanks!

Solution

So 1 is a question about launching processes, and 2+3 are questions about, basically, performance tuning. Performance tuning in general can involve substantial work on the underlying code, but nothing you ask about here requires modifying a single line of it.

What I understand from your first question is that you want to change how the launched MPI processes are distributed across the hardware. Doing this is necessarily outside the MPI standard, because it is OS and platform dependent, so each MPI implementation has its own way of doing it. Recent versions of OpenMPI and MPICH2 let you specify where the processes end up, so you can ask for two processes per socket, etc.
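One way to see where the launcher actually put each process is to have every process report its host and the cores it is allowed to run on. The following is a minimal sketch, not part of the original answer. It assumes the launcher exports the rank in `OMPI_COMM_WORLD_RANK` (Open MPI) or `PMI_RANK` (MPICH with Hydra), and it relies on `os.sched_getaffinity`, which Python provides only on some platforms such as Linux.

```python
import os
import socket


def placement_report():
    """Describe where this process is running.

    The rank is read from environment variables set by the launcher:
    OMPI_COMM_WORLD_RANK (Open MPI) or PMI_RANK (MPICH/Hydra).
    If neither is set, the rank is reported as None.
    """
    rank = os.environ.get("OMPI_COMM_WORLD_RANK") or os.environ.get("PMI_RANK")

    # The affinity mask lists the cores this process may be scheduled on.
    # sched_getaffinity is not available on every platform, so fall back to None.
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))
    else:
        cpus = None

    return {
        "rank": int(rank) if rank is not None else None,
        "host": socket.gethostname(),
        "allowed_cpus": cpus,
    }


if __name__ == "__main__":
    info = placement_report()
    print(f"rank={info['rank']} host={info['host']} cpus={info['allowed_cpus']}")
```

Saved under a hypothetical name such as `placement.py`, it could be launched with Open MPI as `mpirun -np 8 --bind-to core --report-bindings python3 placement.py`. The script does not call MPI itself; it only inspects what the launcher set up. The binding flags shown are Open MPI's, and other implementations spell them differently, so check the `mpirun` or `mpiexec` manual page of the one you have installed.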

You do not need to modify the code for this to work, but performance will vary with how the processes are distributed across cores. It's hard to say much about this in general, because it depends on your communication patterns, but yes, the "closer" two processes are placed, the faster communication between them will be, by and large.
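To reason about why the communication pattern matters, a common first-order model treats the time to send one message as a fixed latency plus the message size divided by the bandwidth. The sketch below applies that model to two kinds of link. The latency and bandwidth figures are made-up placeholders chosen only to illustrate the shape of the trade-off, not measurements of any machine; real values have to come from benchmarking your own system.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimate one point-to-point transfer as latency + size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# PLACEHOLDER figures (latency in seconds, bandwidth in bytes per second).
# These are illustrative assumptions, not measured data.
LINKS = {
    "same node": (0.5e-6, 10e9),
    "different nodes": (2.0e-6, 1e9),
}

if __name__ == "__main__":
    for size in (8, 1024, 1024 * 1024):
        for name, (latency, bandwidth) in LINKS.items():
            t = transfer_time(size, latency, bandwidth)
            print(f"{size:>8} bytes over {name:<15}: {t * 1e6:.2f} microseconds")
```

In this model, many small messages are dominated by latency and a few large ones by bandwidth, which is one reason the best placement depends on how your processes talk to each other.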

There's no specified limit on the total volume of data that goes back and forth between MPI tasks, but yes, there are bandwidth limits (and there are limits on the size of a single message). The cache size is whatever the hardware provides; MPI doesn't change it.
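One concrete per-message limit comes from the classic C bindings, where the element count passed to calls such as `MPI_Send` is a C `int`, so a single call can describe at most `INT_MAX` elements of the given datatype (MPI 4.0 added large-count variants such as `MPI_Send_c`). A usual workaround is to split a large buffer into several messages. The helper below is a hypothetical sketch of just the index arithmetic; the name `chunk_ranges` is not from any MPI library, and `INT_MAX` is assumed to be 2**31 - 1, as on common platforms.

```python
INT_MAX = 2**31 - 1  # assumed value of a C int's maximum on common platforms


def chunk_ranges(total_count, max_count=INT_MAX):
    """Yield (start, length) pairs that cover total_count elements.

    Each length is at most max_count, so every chunk could be passed
    as the count argument of one send or receive call.
    """
    if total_count < 0 or max_count <= 0:
        raise ValueError("total_count must be >= 0 and max_count must be > 0")
    start = 0
    while start < total_count:
        length = min(max_count, total_count - start)
        yield start, length
        start += length


if __name__ == "__main__":
    # A buffer of 3 billion elements does not fit in one int-counted message.
    for start, length in chunk_ranges(3 * 10**9):
        print(f"send elements [{start}, {start + length})")
```

The sender and receiver would both need to loop over the same ranges so that each partial send is matched by a receive of the same length.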

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow