Optimizing n-body on a GPU cluster with OpenACC

https://stackoverflow.com/questions/21208857

openacc

29-09-2022

Question

We are trying to provide a generic n-body algorithm that runs across multiple nodes. Each node has 2 GPUs and 1 CPU.

We want to run the n-body calculation only on the GPUs, using OpenACC. After doing some research on OpenACC, I am unsure how to spread the calculation across multiple GPUs.

Is it possible to use 2 GPUs with only one thread and OpenACC? If not, what would be a suitable approach: using OpenMP to drive both GPUs on one node and communicating with other nodes via MPI?

Solution

The OpenACC runtime library provides routines (acc_set_device_num(), acc_get_device_num()) to select which accelerator device is targeted by a particular thread, but driving multiple devices simultaneously from a single thread is not convenient. Instead, either OpenMP or MPI can be used, with one thread or process per device.

For example (lifting from here) a basic framework for OpenMP might be:

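#include <openacc.h>
#include <omp.h>

int main(void)
{
  /* One OpenMP thread per GPU on the node (2 here). */
  #pragma omp parallel num_threads(2)
  {
    int i = omp_get_thread_num();
    /* Bind this thread to GPU i; its OpenACC regions then target that GPU. */
    acc_set_device_num( i, acc_device_nvidia );
    #pragma acc data copy...
    {
    }
  }
  return 0;
}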

The same can be done with MPI, for example with one rank per GPU. In either case, MPI would typically handle the communication between nodes.
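As a rough sketch of the MPI variant, the code below shows one way a rank might pick its GPU and how the bodies could be laid out for a collective exchange. None of it comes from the original answer: the helper names, the USE_MPI build macro and the idea of running one rank per GPU are assumptions for illustration. The MPI-specific part is guarded by that hypothetical macro so the pure helpers can be compiled and checked without an MPI installation.

```c
#include <assert.h>
#ifdef USE_MPI   /* hypothetical macro: define it when building with mpicc */
#include <mpi.h>
#endif
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper: GPU index for a rank, given its rank within the node. */
static int device_for_local_rank(int local_rank, int gpus_per_node)
{
    assert(gpus_per_node > 0 && local_rank >= 0);
    return local_rank % gpus_per_node;
}

/* Hypothetical helper: per-rank counts and displacements, in doubles
   (3 per body), in the form MPI_Allgatherv expects. */
static void gather_layout(int n, int num_ranks, int *counts, int *displs)
{
    assert(num_ranks > 0);
    int base = n / num_ranks, extra = n % num_ranks, offset = 0;
    for (int r = 0; r < num_ranks; ++r) {
        counts[r] = 3 * (base + (r < extra ? 1 : 0));
        displs[r] = offset;
        offset += counts[r];
    }
}

#ifdef USE_MPI
/* Bind the calling rank to one GPU of its node; returns the device index. */
static int bind_rank_to_gpu(int gpus_per_node)
{
    MPI_Comm node_comm;
    int local_rank;
    /* Ranks that share memory (i.e. live on the same node) form node_comm. */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &local_rank);
    MPI_Comm_free(&node_comm);
    int dev = device_for_local_rank(local_rank, gpus_per_node);
#ifdef _OPENACC
    acc_set_device_num(dev, acc_device_nvidia);
#endif
    return dev;
}

/* Each rank has updated its own bodies in place inside pos; share them all. */
static void exchange_positions(double *pos, const int *counts, const int *displs)
{
    MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                   pos, counts, displs, MPI_DOUBLE, MPI_COMM_WORLD);
}
#endif
```

With two ranks launched per node, bind_rank_to_gpu(2) would give each rank its own GPU. After every time step, exchange_positions would make the updated positions visible to all ranks, since each body's acceleration depends on every other body.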

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow