Question

I'm trying to parallelise Dijkstra's Algorithm using OpenMP but the serial version runs x40 times faster. I might be missing a concept or doing something wrong. I'm new to parallelism and OpenMP. Can you please help out? Thanks.

long* prev; // array of pointers to preceding vertices
long* dist; // array of distances from the source to each vertex
long* visited; // visited vertices, 0 if not visited, 1 otherwise
long** vSetDistance; // distance between i and j

void dijkstraAlgorithm(
      long * prev, 
      long * dist, 
      long * visited, 
      long ** vSetDistance) 
   {
   int i, j, min;

   // Initialization: set every distance to INFINITY until we discover a path
   for (i = 1; i <= numVertices; i++) 
      {
      prev[i] = -1;
      visited[i] = 0;
      dist[i] = INFINITY;
      }

   // The distance from the source to the source is defined to be zero
   dist[sourceVertex] = 0; 

      {
      for (j = 1; j <= numVertices; j++)
         {
         min = -1;

#pragma omp parallel default(none) private(i, j) \
   shared(min, visited, dist, prev, vSetDistance, numVertices)

            {
            /* This loop corresponds to sending out the explorers walking the paths,
             * where the step of picking "the vertex, v, with the shortest path to s"
             * corresponds to an explorer arriving at an unexplored vertex */

#pragma omp for

            for (i = 1; i <= numVertices; i++)
#pragma omp critical
               {
               if (!visited[i] && ((min == -1) || (dist[i] <= dist[min])))
                  min = i;
               }

            visited[min] = 1; // visited = true

            // relaxation
#pragma omp for
            for (i = 1; i <= numVertices; i++) 
               {
               if (vSetDistance[min][i]) 
                  {
                  if ((dist[min] + vSetDistance[min][i]) < dist[i]) 
                     {
                     dist[i] = dist[min] + vSetDistance[min][i];
                     prev[i] = min;
                     }
                  }
               }
            }
         }
      }
   }
Was it helpful?

Solution

Parallelization isn't always a free ticket to higher performance. I see two things that could be causing the slowdown.

  1. The critical section is probably spending a lot of time dealing with synchronization. I'm not entirely familiar with how those sections are implemented in OpenMP but my first guess would be that they use mutexes to lock access to that section. Mutexes aren't super cheap to lock/unlock, and that operation is much more expensive than the operations you want to perform. In addition, since the loop is entirely in a critical section, all but one of the threads will just be waiting around for the thread in the critical section to finish. Essentially, that loop will still be done in a serial fashion with the added overhead of synchronization.

  2. There may not be enough vertices to benefit from parallelization. Again, starting threads isn't free, and the overhead may be significantly larger than the time gained. This becomes more and more pronounced as the number of vertices becomes smaller.

My guess is that the first problem is where most of the slowdown occurs. The easiest way to mitigate this problem is to simply do it in a serial fashion. Second, you could try having each thread find the minimum only in its own section, and them compare those in serial after the parallel portion.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top