Converting GSL linear algebra for use in ScaLAPACK or other parallel matrix library

StackOverflow https://stackoverflow.com/questions/9386113


Problem

I have a code that is deeply embedded with GNU Scientific Library (GSL) matrix arithmetic. The main computation in this code is solving a large system of linear equations, which takes a very long time in serial with GSL and BLAS functions. Is there a way to parallelize this computation, or to convert the code to use an already parallel library such as ScaLAPACK?


Solution

If your matrix is sparse, i.e. it contains a lot of zero entries, then you can use one of the many sparse matrix algebra packages without too much trouble. Unfortunately, this will require you to store your matrices in a sparse format which, to my knowledge, GSL does not do. Once you have your matrix stored in a sparse format, you should be able to handle large systems without too much trouble, even in serial applications.
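As a rough, hypothetical sketch of that conversion step (not part of the original answer), you could scan the dense gsl_matrix and collect its nonzero entries into triplet (coordinate) arrays, which most sparse packages accept as input. The drop tolerance and the worst-case array sizing below are just assumptions for the example.

```c
#include <stdlib.h>
#include <math.h>
#include <gsl/gsl_matrix.h>

/* Collect the nonzero entries of a dense gsl_matrix into triplet
 * (coordinate) form: row index Ti[k], column index Tj[k], value Tx[k].
 * Returns the number of nonzeros found.  The caller owns the arrays. */
static int gsl_to_triplet(const gsl_matrix *A, double tol,
                          int **Ti, int **Tj, double **Tx)
{
    size_t n = A->size1 * A->size2;
    int *ti = malloc(n * sizeof *ti);      /* worst case: fully dense */
    int *tj = malloc(n * sizeof *tj);
    double *tx = malloc(n * sizeof *tx);
    int nz = 0;

    for (size_t i = 0; i < A->size1; i++) {
        for (size_t j = 0; j < A->size2; j++) {
            double v = gsl_matrix_get(A, i, j);
            if (fabs(v) > tol) {
                ti[nz] = (int) i;
                tj[nz] = (int) j;
                tx[nz] = v;
                nz++;
            }
        }
    }
    *Ti = ti;  *Tj = tj;  *Tx = tx;
    return nz;
}
```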

I suggest using UMFPACK, because it requires the least amount of work to adopt: it takes plain arrays and doesn't require you to put your data into its own structures.
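For example, a minimal sketch of a solve with UMFPACK's double/int interface might look like the following: convert the triplet arrays to compressed-column form, factorize, and solve. The wrapper function name is made up for illustration, and error checking on the UMFPACK status codes is mostly omitted.

```c
#include <stdlib.h>
#include <umfpack.h>   /* SuiteSparse; may live under suitesparse/ on some systems */

/* Solve A x = b where A is given in triplet form with nz entries.
 * n is the matrix dimension; b and x are dense vectors of length n. */
static int solve_with_umfpack(int n, int nz,
                              const int *Ti, const int *Tj, const double *Tx,
                              const double *b, double *x)
{
    int *Ap = malloc((n + 1) * sizeof *Ap);
    int *Ai = malloc(nz * sizeof *Ai);
    double *Ax = malloc(nz * sizeof *Ax);
    void *Symbolic, *Numeric;
    int status;

    /* triplet -> compressed sparse column */
    status = umfpack_di_triplet_to_col(n, n, nz, Ti, Tj, Tx,
                                       Ap, Ai, Ax, NULL);
    if (status != UMFPACK_OK) return status;

    /* symbolic and numeric factorization, then the actual solve */
    umfpack_di_symbolic(n, n, Ap, Ai, Ax, &Symbolic, NULL, NULL);
    umfpack_di_numeric(Ap, Ai, Ax, Symbolic, &Numeric, NULL, NULL);
    umfpack_di_solve(UMFPACK_A, Ap, Ai, Ax, x, b, Numeric, NULL, NULL);

    umfpack_di_free_symbolic(&Symbolic);
    umfpack_di_free_numeric(&Numeric);
    free(Ap); free(Ai); free(Ax);
    return UMFPACK_OK;
}
```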

A note on parallelism: if your code is currently serial, moving to a parallel solver is NOT trivial. It may be simple to drop in a multi-threaded package, but I don't have much experience with threaded programs. Additionally, truly parallel (distributed-memory) direct solvers are not all that efficient, since each processor needs its own copy of the full matrix, and it is better to use iterative methods.

A little more detail would be helpful: How long is a long time? Do you need the inverse for some reason, or are you just solving a system of equations?

Other tips

Have you tried Intel MKL? It includes its own parallel versions of the BLAS functions, and the last time I tried them they were pretty darn fast. It would be easier to answer if you gave some information on the size of the matrix, but as long as you are running x64 with many CPUs/cores and plenty of RAM, the size doesn't really matter.
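One low-effort conversion path (my sketch, not part of the original answer): a gsl_matrix keeps its entries row-major in a contiguous buffer when tda == size2, so you can hand that buffer to a threaded LAPACK such as MKL's and call LAPACKE_dgesv in place of gsl_linalg_LU_decomp/gsl_linalg_LU_solve. A minimal sketch under those assumptions:

```c
#include <stdlib.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>
#include <mkl_lapacke.h>   /* or <lapacke.h> with another LAPACK build */

/* Solve A x = b with a threaded LAPACK, overwriting b with the solution.
 * Assumes A->tda == A->size2 and that b has stride 1, so the raw buffers
 * can be passed straight through.  dgesv overwrites A with its LU factors. */
static int solve_with_lapacke(gsl_matrix *A, gsl_vector *b)
{
    lapack_int n = (lapack_int) A->size1;
    lapack_int *ipiv = malloc(n * sizeof *ipiv);

    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, n, 1,
                                    A->data, n,      /* n x n system matrix */
                                    ipiv,
                                    b->data, 1);     /* single right-hand side */
    free(ipiv);
    return (int) info;   /* 0 on success */
}
```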

Another option is NVIDIA CUDA. Its interface (cuBLAS) is similar to BLAS, but in my experience it is actually slower than MKL, though still faster than serial code. It might be that I tried it on an old card, but you need at least 200 GPU stream units for it to be useful.
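To illustrate how close that interface is to standard BLAS, here is a hypothetical sketch of a double-precision matrix multiply through cuBLAS (column-major storage assumed, error checking omitted); it is not from the original answer.

```c
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* C = A * B for n x n column-major matrices held in host memory. */
static void gpu_dgemm(int n, const double *A, const double *B, double *C)
{
    cublasHandle_t handle;
    double *dA, *dB, *dC;
    const double alpha = 1.0, beta = 0.0;
    size_t bytes = (size_t) n * n * sizeof(double);

    cublasCreate(&handle);
    cudaMalloc((void **) &dA, bytes);
    cudaMalloc((void **) &dB, bytes);
    cudaMalloc((void **) &dC, bytes);

    /* copy operands to the device, multiply, copy the result back */
    cublasSetMatrix(n, n, sizeof(double), A, n, dA, n);
    cublasSetMatrix(n, n, sizeof(double), B, n, dB, n);
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);
    cublasGetMatrix(n, n, sizeof(double), dC, n, C, n);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cublasDestroy(handle);
}
```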

EDIT: Matrices of those sizes are beyond my experience.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow