Problem

This question uses scikits.cuda [1] from the Python command line, but it could equivalently be attempted in pure C/CUDA (which I haven't tried).

I'm attempting to create a CUFFT plan for 1D complex-to-complex transforms that will be applied to many inputs (so lots of batches). With a Tesla C2050, I do the following:

import scikits.cuda.fft as cufft
import numpy as np
p = cufft.Plan((64*1024,), np.complex64, np.complex64, batch=100)
p = cufft.Plan((64*1024,), np.complex64, np.complex64, batch=1000)
p = cufft.Plan((64*1024,), np.complex64, np.complex64, batch=10000) # !!!

The last plan fails with a cufftAllocFailed exception. If I reduce the transform size (from 64K), I can get a batch of 10,000 to work, but my application needs 64K-point transforms.

My question is: is this a hard limit in CUFFT? And if so, where in the CUDA [2] or CUFFT [3] documentation are the limits on transform size versus batch size (versus dimensionality?) specified?

[1] http://scikits.appspot.com/cuda
[2] http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
[3] http://docs.nvidia.com/cuda/pdf/CUDA_CUFFT_Users_Guide.pdf


Solution

There's a hard limit of roughly 2^27 elements in a plan. At 64K points per transform, a batch of 1,000 is about 2^26 elements and fits under that limit, while a batch of 10,000 is roughly 2^29 elements and exceeds it, which is why that plan fails.
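
The answer leaves the workaround implicit. The sketch below is my own untested suggestion (not part of the original answer): keep a single batch=1000 plan, which the question already showed works on the C2050, and run the 10,000 transforms in chunks. The array names, the zero-filled placeholder input, and the chunk size of 1,000 are illustrative assumptions.

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import scikits.cuda.fft as cufft

n = 64 * 1024          # transform length from the question
total_batch = 10000    # total number of transforms required
chunk = 1000           # 64K * 1000 is about 2^26 elements, under the ~2^27 limit

# One reusable plan sized for a single chunk of transforms.
plan = cufft.Plan((n,), np.complex64, np.complex64, batch=chunk)

# Placeholder for the real input: total_batch rows of length n
# (about 5 GB of host memory at complex64; replace with your actual data).
x = np.zeros((total_batch, n), dtype=np.complex64)
y = np.empty_like(x)

for start in range(0, total_batch, chunk):
    x_gpu = gpuarray.to_gpu(x[start:start + chunk])
    y_gpu = gpuarray.empty_like(x_gpu)
    cufft.fft(x_gpu, y_gpu, plan)        # batched forward C2C transform on this chunk
    y[start:start + chunk] = y_gpu.get()

Each chunk of 1,000 transforms is about 0.5 GB per complex64 array, so the input, output, and CUFFT workspace should fit comfortably within the C2050's 3 GB of device memory.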

License: CC-BY-SA with attribution
Not affiliated with Stack Overflow