Question

What is the best way to use OpenMP for a matrix-vector product? Would the for directive suffice (and if so, where should I place it? I assume the outer loop would be more efficient), or would I also need schedule, etc.?

Also, how could I take advantage of different algorithms to compute this m-v product as efficiently as possible?

Thanks


Solution

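The first step you should take is the obvious one: wrap the outermost loop in a parallel for directive, as you assume. Each iteration of the outer loop computes one element of the result independently, so the rows can be shared among threads without any synchronisation, and the cost of starting the parallel region is paid once rather than once per row. It's always worth experimenting a bit to gather evidence for your (and my) assumptions, but if you were only allowed to make one change, that would be the one to make.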
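A minimal sketch of that change in C, assuming the matrix is stored row-major in a flat array (matvec_outer and matvec_inner are hypothetical names). The first function parallelises the outer loop; the second parallelises the inner loop with a reduction, purely for comparison. schedule(static) is spelled out because every row costs the same amount of work, so an even split is a sensible starting point; if row costs were very uneven, schedule(dynamic) would be worth trying. Without OpenMP enabled (e.g. -fopenmp in GCC) the pragmas are ignored and both functions run serially.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x, where A is n_rows x n_cols stored row-major (assumption). */

/* Outer-loop version: each thread gets a contiguous block of whole rows,
 * so no two threads ever write the same y[i]. */
void matvec_outer(const double *A, const double *x, double *y,
                  size_t n_rows, size_t n_cols)
{
    assert(n_rows == 0 || (A != NULL && y != NULL));
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;              /* private: declared inside the loop */
        for (size_t j = 0; j < n_cols; ++j)
            sum += A[i * n_cols + j] * x[j];
        y[i] = sum;
    }
}

/* Inner-loop version, for comparison: every row opens a new parallel
 * region and combines partial sums with a reduction, so the threading
 * overhead is paid n_rows times. The reduction may also reorder the
 * floating-point additions, so results can differ in the last bits. */
void matvec_inner(const double *A, const double *x, double *y,
                  size_t n_rows, size_t n_cols)
{
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum) schedule(static)
        for (size_t j = 0; j < n_cols; ++j)
            sum += A[i * n_cols + j] * x[j];
        y[i] = sum;
    }
}
```

Timing both on your own matrix sizes (for example with omp_get_wtime()) is exactly the kind of evidence worth collecting before settling on one.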

I don't know much about cache-oblivious algorithms, but I understand that they generally work by recursively dividing a problem into sub-problems. That doesn't fit naturally with parallel for directives. You could probably implement such an algorithm with OpenMP's tasks, but I suspect the overhead of doing so would outweigh any gain in execution speed for an m-v product of any reasonable dimensions.
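For comparison, here is roughly what a task-based approach could look like: a C sketch that recursively halves the range of rows and hands each half to an OpenMP task, stopping at a cutoff. It is only a simple divide-and-conquer over rows, not a full cache-oblivious algorithm (which would typically also split the columns), and ROW_CUTOFF, matvec_rows and matvec_tasks are hypothetical names. Every split costs a task creation and a taskwait, which is the overhead the argument above is about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff: ranges with this many rows or fewer run serially. */
#define ROW_CUTOFF 64

/* Computes y[lo..hi) from rows lo..hi-1 of A, stored row-major (assumption). */
static void matvec_rows(const double *A, const double *x, double *y,
                        size_t lo, size_t hi, size_t n_cols)
{
    if (hi - lo <= ROW_CUTOFF) {
        for (size_t i = lo; i < hi; ++i) {
            double sum = 0.0;
            for (size_t j = 0; j < n_cols; ++j)
                sum += A[i * n_cols + j] * x[j];
            y[i] = sum;
        }
        return;
    }
    size_t mid = lo + (hi - lo) / 2;
    /* The two halves write disjoint parts of y, so they can run concurrently.
     * The arguments are firstprivate in each task by default. */
    #pragma omp task
    matvec_rows(A, x, y, lo, mid, n_cols);
    #pragma omp task
    matvec_rows(A, x, y, mid, hi, n_cols);
    /* Wait for both halves before returning to the caller. */
    #pragma omp taskwait
}

void matvec_tasks(const double *A, const double *x, double *y,
                  size_t n_rows, size_t n_cols)
{
    assert(n_rows == 0 || (A != NULL && y != NULL));
    /* One thread starts the recursion; the whole team executes the tasks. */
    #pragma omp parallel
    #pragma omp single
    matvec_rows(A, x, y, 0, n_rows, n_cols);
}
```

Tuning ROW_CUTOFF trades parallelism against task overhead; too small a value creates many tiny tasks that each do very little arithmetic.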

(If you demonstrate the falsity of this argument on m-v products of size N, I will retort 'N's not a reasonable dimension'. As ever with these performance questions, evidence trumps argument every time.)

Finally, depending on your compiler and the libraries available to you, you may not need to use OpenMP for m-v calculations at all: you might find that auto-parallelisation works efficiently, or that you already have a library implementation which multi-threads this sort of computation.

Licensed under: CC-BY-SA with attribution