Question

Most people in scientific computing use OpenMP as a quasi-standard when it comes to shared memory parallelization.

Is there any reason (other than readability) to use OpenMP over pthreads? The latter seems more basic and I suspect it could be faster and easier to optimize.

Solution

It basically boils down to what level of control you want over your parallelization. OpenMP is great if all you want to do is add a few #pragma statements and have a parallel version of your code quite quickly. If you want to do really interesting things with MIMD coding or complex queueing, you can still do all this with OpenMP, but it is probably a lot more straightforward to use threading in that case. OpenMP also has similar advantages in portability in that a lot of compilers for different platforms support it now, as with pthreads.

So you're absolutely correct - if you need fine-tuned control over your parallelization, use pthreads. If you want to parallelize with as little work as possible, use OpenMP.

Whichever way you decide to go, good luck!

OTHER TIPS

One other reason: OpenMP is task-based, while pthreads is thread-based. This means OpenMP will, by default, allocate as many threads as there are cores, so you get a solution that scales with the hardware. Doing that with raw threads is not such an easy task.

A second point: OpenMP provides reduction features. When you need to compute partial results in threads and then combine them, a single clause does it. With raw threads you have to do all of that work yourself.

Just think about your requirements and try to work out whether OpenMP is enough for you; if it is, you will save a lot of time.

OpenMP requires a compiler that supports it, and works through pragmas. The advantage to this is that when compiling without OpenMP support (e.g. with PCC or Clang/LLVM as of now), the code will still compile. Also, have a look at what Charles Leiserson wrote about DIY multithreading.

Pthreads is a POSIX standard (IEEE POSIX 1003.1c) for libraries, while OpenMP specifications are to be implemented on compilers; that being said, there are a variety of pthread implementations (e.g. OpenBSD rthreads, NPTL), and a number of compilers that support OpenMP (e.g. GCC with the -fopenmp flag, MSVC++ 2008).

Pthreads are only effective for parallelization when multiple processors are available, and only when the code is optimized for the number of processors available. Code written for OpenMP is more easily scalable as a result. You can also mix code that compiles with OpenMP with code using pthreads.

Your question is similar to asking "Should I program in C or assembly?", with C being OpenMP and assembly being pthreads.

With pthreads you can do much better parallelisation, better meaning very tightly adjusted to your algorithm and hardware. This will be a lot of work though.

With pthreads it is also much easier to produce poorly parallelised code.

Is there any reason (other than readability) to use OpenMP over pthreads?

Mike kind of touched upon this:

OpenMP also has similar advantages in portability in that a lot of compilers for different platforms support it now, as with pthreads

Crypto++ is cross-platform, meaning it runs on Windows, Linux, OS X and the BSDs. It uses OpenMP for threading support in places where the operation can be expensive, like modular exponentiation and modular multiplication (and where concurrent operation can be performed).

Windows does not support pthreads, but modern Windows compilers do support OpenMP. So if you want portability to the non-*nix's, then OpenMP is often a good choice.


And as Mike also pointed out:

OpenMP is great if all you want to do is add a few #pragma statements and have a parallel version of your code quite quickly.

Below is an example of Crypto++ precomputing some values used in Rabin-Williams signatures using Tweaked Roots as described by Bernstein in RSA signatures and Rabin-Williams signatures...:

void InvertibleRWFunction::Precompute(unsigned int /*unused*/)
{
    ModularArithmetic modp(m_p), modq(m_q);

    #pragma omp parallel sections
    {
        #pragma omp section
            m_pre_2_9p = modp.Exponentiate(2, (9 * m_p - 11)/8);
        #pragma omp section
            m_pre_2_3q = modq.Exponentiate(2, (3 * m_q - 5)/8);
        #pragma omp section
            m_pre_q_p = modp.Exponentiate(m_q, m_p - 2);
    }
}

It fits with Mike's observation: fine-grained control and synchronization were not really needed. Parallelization was used to speed up execution, and the synchronization came at no cost in the source code.

And if OpenMP is not available, the code reduces to:

m_pre_2_9p = modp.Exponentiate(2, (9 * m_p - 11)/8);
m_pre_2_3q = modq.Exponentiate(2, (3 * m_q - 5)/8);
m_pre_q_p = modp.Exponentiate(m_q, m_p - 2);

OpenMP is ideal when you need to perform the same task in parallel (that is, on multiple data), in the style of a SIMD (single instruction, multiple data) machine.

Pthreads is needed when you want to perform quite different tasks in parallel, such as reading data in one thread and interacting with the user in another.

See this page:

http://berenger.eu/blog/c-cpp-openmp-vs-pthread-openmp-or-posix-thread/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow