Question

Currently I am implementing some image processing algorithms using OpenCL. Basically, my algorithm requires solving a linear system of equations for each pixel. Each system is independent of the others, so a parallel implementation is natural.

I have looked at several BLAS packages such as ViennaCL and AMD APPML, but they all seem to follow the same usage pattern: the host calls BLAS subroutines, which are then executed on the OpenCL device.

What I need is a BLAS library that could be called inside an OpenCL kernel so that I can solve many linear systems in parallel.

I found this similar question on the AMD forums.

Thanks


Solution

It's not possible. clBLAS routines are host-side C functions: each one enqueues a series of kernel launches, and some of the 'solve' routines launch quite complicated kernels, so they cannot be called from inside your own OpenCL kernel. The routines take cl_mem buffers and command queues as arguments, so if your buffer is already on the device, clBLAS operates on it directly. It does not accept host buffers or manage host-to-device transfers.
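Since clBLAS cannot be called from device code, one common workaround (not part of clBLAS, and not something the answer above prescribes) is to hand-write a small fixed-size solver that each work-item runs on its own pixel's system. The sketch below assumes every system is a small dense 3x3 float system and uses Gaussian elimination with partial pivoting; the names `N`, `solve_small_system` and `batch_solve` are hypothetical. It is plain C so it can be compiled and tested on the host; the loop in `batch_solve` stands in for the NDRange, and in an actual kernel its body would run once per work-item with `i = get_global_id(0)` and the array arguments declared `__global`.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical fixed system size; a small compile-time N keeps all storage
   in private memory when this code is moved into an OpenCL kernel. */
#define N 3

/* Solve A x = b for one N x N system by Gaussian elimination with partial
   pivoting. A and b are overwritten. Returns 0 on success, -1 if a pivot is
   (nearly) zero. Uses only plain C and fabs, which is also an OpenCL C
   built-in, so the body can be pasted into a .cl file. */
static int solve_small_system(float A[N][N], float b[N], float x[N])
{
    for (int k = 0; k < N; ++k) {
        /* Partial pivoting: pick the row with the largest |A[r][k]|. */
        int p = k;
        for (int r = k + 1; r < N; ++r)
            if (fabs(A[r][k]) > fabs(A[p][k])) p = r;
        /* Tiny absolute threshold; real code may want a scaled tolerance. */
        if (fabs(A[p][k]) < 1e-12f) return -1;
        if (p != k) {
            for (int c = 0; c < N; ++c) {
                float t = A[k][c]; A[k][c] = A[p][c]; A[p][c] = t;
            }
            float t = b[k]; b[k] = b[p]; b[p] = t;
        }
        /* Eliminate the entries below the pivot. */
        for (int r = k + 1; r < N; ++r) {
            float f = A[r][k] / A[k][k];
            for (int c = k; c < N; ++c) A[r][c] -= f * A[k][c];
            b[r] -= f * b[k];
        }
    }
    /* Back substitution on the upper-triangular system. */
    for (int k = N - 1; k >= 0; --k) {
        float s = b[k];
        for (int c = k + 1; c < N; ++c) s -= A[k][c] * x[c];
        x[k] = s / A[k][k];
    }
    return 0;
}

/* Host-side stand-in for the NDRange: iteration i plays the role of one
   work-item. Inputs are `count` consecutive row-major N*N matrices in As and
   N-vectors in bs; solutions go to xs, per-system status codes to status. */
static void batch_solve(const float *As, const float *bs, float *xs,
                        int *status, int count)
{
    assert(As && bs && xs && status);
    for (int i = 0; i < count; ++i) {
        float A[N][N], b[N];          /* private copies, as in a kernel */
        for (int r = 0; r < N; ++r) {
            b[r] = bs[i * N + r];
            for (int c = 0; c < N; ++c)
                A[r][c] = As[i * N * N + r * N + c];
        }
        status[i] = solve_small_system(A, b, &xs[i * N]);
    }
}
```

Keeping each system entirely in private memory and avoiding any library calls is what makes this pattern workable inside a kernel; for larger systems the per-work-item register pressure grows quickly, so this approach is best suited to the small systems typical of per-pixel problems.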

If you want to see which kernels are generated and launched, uncomment this line https://github.com/clMathLibraries/clBLAS/blob/master/src/library/blas/generic/common.c#L461 and rebuild clBLAS. It will dump every kernel that is called.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow