Is it possible to call cuFFT library functions in a device function?

Question 1

Despite the introduction of dynamic parallelism on Kepler (cc 3.5) cards, cuFFT remains a host API and there is currently no way of creating or executing FFT operations in device code using cuFFT.
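To illustrate what "host API" means in practice, here is a minimal sketch of calling cuFFT from host code on data that already lives in device memory. The helper name `forward_fft_batched` and the choice of an in-place, single-precision C2C transform are assumptions made for the example, not something from the question. Only the plan creation and launch happen on the host; the samples themselves are never copied back.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: runs `batch` independent forward C2C FFTs of length `n`,
// in place, on a buffer that is already in device memory.
// Returns true if planning, execution and synchronization all succeeded.
bool forward_fft_batched(cufftComplex* d_data, int n, int batch) {
    cufftHandle plan;
    // cuFFT plans are created on the host; this cannot be done inside a kernel.
    if (cufftPlan1d(&plan, n, CUFFT_C2C, batch) != CUFFT_SUCCESS) {
        return false;
    }
    // The host call enqueues GPU work; d_data is a device pointer.
    cufftResult r = cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);
    cufftDestroy(plan);
    return r == CUFFT_SUCCESS && cudaDeviceSynchronize() == cudaSuccess;
}
```

In a real program you would probably create the plan once and reuse it across calls rather than rebuilding it each time.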

Question 2

There is NO way to call the cuFFT APIs from a GPU kernel; you must call them from the host. If you want to run an FFT without a DEVICE -> HOST -> DEVICE round trip before continuing your processing, the only solution is to write a kernel that performs the FFT in a device function. I'm actually doing this myself, because I need to run several FFTs in parallel without passing the data back to the HOST. If you find or have another solution, let me know. There are plenty of examples on the web of how to implement an FFT, for instance: https://hackage.haskell.org/package/pure-fft-0.2.0/docs/Numeric-FFT.html
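A rough sketch of what such a hand-written device-side FFT could look like: one thread block computes one small radix-2 FFT in shared memory, so many FFTs run in parallel across blocks and the kernel can keep working on the result afterwards. Everything here is an assumption for illustration (the names `block_fft_radix2`, `fft_then_continue`, `run_block_ffts`, the fixed length `N = 8`, single precision); it is a plain textbook Cooley-Tukey loop, not tuned code and not how cuFFT does it internally.

```cuda
#include <cuda_runtime.h>
#include <cuComplex.h>

constexpr int N = 8;       // FFT length; assumed to be a power of two
constexpr int LOG2_N = 3;  // log2(N), used for the bit-reversal permutation

// In-place radix-2 decimation-in-time FFT on N points in shared memory.
// Expects the input in bit-reversed order and exactly N/2 threads per block,
// all of which must call this function (it contains __syncthreads()).
__device__ void block_fft_radix2(cuFloatComplex* s) {
    const int t = threadIdx.x;  // one butterfly per thread and stage
    for (int len = 2; len <= N; len <<= 1) {
        const int half = len >> 1;
        const int k = t % half;               // position inside the group
        const int i = (t / half) * len + k;   // upper butterfly input
        const int j = i + half;               // lower butterfly input
        float sn, cs;
        sincosf(-2.0f * 3.14159265358979f * k / len, &sn, &cs);
        const cuFloatComplex w = make_cuFloatComplex(cs, sn);  // twiddle factor
        const cuFloatComplex u = s[i];
        const cuFloatComplex v = cuCmulf(w, s[j]);
        s[i] = cuCaddf(u, v);
        s[j] = cuCsubf(u, v);
        __syncthreads();  // finish this stage before the next one reads
    }
}

// One block per FFT: load (bit-reversed), transform, then carry on on-device.
__global__ void fft_then_continue(cuFloatComplex* data) {
    __shared__ cuFloatComplex s[N];
    cuFloatComplex* x = data + blockIdx.x * N;
    for (int idx = threadIdx.x; idx < N; idx += blockDim.x) {
        s[__brev(idx) >> (32 - LOG2_N)] = x[idx];
    }
    __syncthreads();
    block_fft_radix2(s);
    // ... further device-side processing of the spectrum could go here ...
    for (int idx = threadIdx.x; idx < N; idx += blockDim.x) {
        x[idx] = s[idx];
    }
}

// Hypothetical host launcher: `batch` FFTs, one per block, N/2 threads each.
bool run_block_ffts(cuFloatComplex* d_data, int batch) {
    fft_then_continue<<<batch, N / 2>>>(d_data);
    return cudaGetLastError() == cudaSuccess &&
           cudaDeviceSynchronize() == cudaSuccess;
}
```

For larger or non-power-of-two sizes a simple loop like this will likely be much slower than a library, so it is mainly worth it when the FFTs are small and the round trip through the host is the bottleneck.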

Question 3

I already answered this in the duplicate thread, "Is there a method of FFT that will run inside CUDA Kernel?". In short, since CUDA 11.0 there is cuFFTDx (cuFFT Device Extensions), which allows you to do exactly that.

Link to my answer there: https://stackoverflow.com/a/72403181/6924585.
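For a rough idea of what this looks like, here is a sketch modelled on the introductory example from the cuFFTDx documentation as I remember it; please check it against the version you have installed. It assumes the cuFFTDx headers are on the include path, and the concrete choices (size 128, `ElementsPerThread<8>`, `SM<800>`, the names `block_fft_kernel` and `run_cufftdx_fft`) are assumptions for illustration. The `SM<>` value has to match the architecture you compile for.

```cuda
#include <cuda_runtime.h>
#include <cufftdx.hpp>

using namespace cufftdx;

// Compile-time description of the FFT: 128-point forward C2C, float,
// one FFT per block, executed cooperatively by a thread block.
// SM<800> is an assumption; set it to your target architecture.
using FFT = decltype(Size<128>() + Precision<float>() + Type<fft_type::c2c>() +
                     Direction<fft_direction::forward>() + FFTsPerBlock<1>() +
                     ElementsPerThread<8>() + SM<800>() + Block());
using complex_type = typename FFT::value_type;

__global__ void block_fft_kernel(complex_type* data) {
    complex_type thread_data[FFT::storage_size];  // per-thread registers

    // Which FFT this thread works on, and where its data starts.
    const unsigned int local_fft_id = threadIdx.y;
    const unsigned int global_fft_id = blockIdx.x * FFT::ffts_per_block + local_fft_id;
    const unsigned int offset = size_of<FFT>::value * global_fft_id;
    const unsigned int stride = FFT::stride;

    // Load from global memory into registers.
    unsigned int index = offset + threadIdx.x;
    for (unsigned int i = 0; i < FFT::elements_per_thread; i++) {
        if ((i * stride + threadIdx.x) < size_of<FFT>::value) {
            thread_data[i] = data[index];
            index += stride;
        }
    }

    // FFT::shared_memory_size bytes are passed at launch.
    extern __shared__ __align__(alignof(float4)) complex_type shared_mem[];

    // The FFT runs here, inside the kernel; no host call involved.
    FFT().execute(thread_data, shared_mem);

    // ... further device-side processing could go here ...

    // Store the result back to global memory.
    index = offset + threadIdx.x;
    for (unsigned int i = 0; i < FFT::elements_per_thread; i++) {
        if ((i * stride + threadIdx.x) < size_of<FFT>::value) {
            data[index] = thread_data[i];
            index += stride;
        }
    }
}

// Hypothetical host launcher for a single 128-point FFT on device data.
bool run_cufftdx_fft(complex_type* d_data) {
    block_fft_kernel<<<1, FFT::block_dim, FFT::shared_memory_size>>>(d_data);
    return cudaGetLastError() == cudaSuccess &&
           cudaDeviceSynchronize() == cudaSuccess;
}
```

The block dimensions and shared memory size come from the FFT description itself, so the launch configuration stays in sync if you change the size or the elements per thread.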