OpenCV gpu::dft distorted image after inverse transform

Question

It took me several more hours, but I eventually solved the problem. There are two options:

1) Real-to-complex (CV_32FC1 -> CV_32FC2) forward and complex-to-real (CV_32FC2 -> CV_32FC1) inverse
The forward transform produces a narrower spectrum matrix (newWidth = oldWidth/2 + 1, as explained in the documentation). It is not the packed CCS matrix that the non-GPU dft produces. It is a complex matrix that exploits the symmetry of the frequency spectrum of a real input. Any filter can still be applied to it, and the filtering needs roughly half as many multiplications as in the second case. In this case the following flags should be set:

forward -> 0
inverse -> DFT_INVERSE | DFT_REAL_OUTPUT | DFT_SCALE

This worked great for me. Remember to declare the GpuMat objects beforehand with the proper types (CV_32FC1 or CV_32FC2).
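To see why the narrower spectrum loses no information, here is a minimal standalone sketch (plain C++, no OpenCV) for a single image row. It uses a naive DFT, and the helper names halfSpectrumWidth, naiveDft and expandHalfSpectrum are hypothetical, made up for this illustration; it is not how gpu::dft works internally. It only shows that for real input the bins beyond width/2 are complex conjugates of earlier bins, so width/2 + 1 columns are enough.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Number of spectrum columns needed for a row of `width` real samples.
inline std::size_t halfSpectrumWidth(std::size_t width) {
    return width / 2 + 1;
}

// Naive O(N^2) forward DFT of one real row (unscaled). Illustration only.
std::vector<std::complex<double> > naiveDft(const std::vector<double>& row) {
    assert(!row.empty());
    const std::size_t n = row.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double> > spectrum(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle =
                -2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            sum += row[t] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        spectrum[k] = sum;
    }
    return spectrum;
}

// Rebuild all n bins from the first n/2 + 1 bins using X[n-k] = conj(X[k]),
// which holds whenever the input samples are real.
std::vector<std::complex<double> > expandHalfSpectrum(
        const std::vector<std::complex<double> >& half, std::size_t n) {
    assert(half.size() == halfSpectrumWidth(n));
    std::vector<std::complex<double> > full(n);
    for (std::size_t k = 0; k < half.size(); ++k)
        full[k] = half[k];
    for (std::size_t k = half.size(); k < n; ++k)
        full[k] = std::conj(half[n - k]);
    return full;
}
```

For a 512-pixel-wide image, halfSpectrumWidth(512) gives 257 columns. Keeping only the first halfSpectrumWidth(n) bins of naiveDft(row) and passing them to expandHalfSpectrum reproduces the full spectrum, which is why a filter applied to the half spectrum touches only about half as many values.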

2) Complex-to-complex (CV_32FC2 -> CV_32FC2) forward and complex-to-complex (CV_32FC2 -> CV_32FC2) inverse
The forward DFT produces a full-size spectrum (CV_32FC2). In this case the flags are:

forward -> 0
inverse -> DFT_INVERSE

The result of the inverse transform is a complex matrix (CV_32FC2), so you need to split it and take the desired result from channel 0. The data then needs to be scaled explicitly:

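#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>  // OpenCV 2.4 gpu module

using namespace cv;

int main()
{
    // Load the image as grayscale and convert it to float.
    Mat lenaAfter;
    Mat lena = imread("C:/Users/Fundespa/Desktop/lena.jpg", CV_LOAD_IMAGE_GRAYSCALE);
    lena.convertTo(lena, CV_32F, 1);

    // Build a two-channel complex image: real part = image, imaginary part = 0.
    std::vector<Mat> planes;
    planes.push_back(lena);
    planes.push_back(Mat::zeros(lena.size(), CV_32FC1));
    merge(planes, lena);

    gpu::Stream stream;                 // stream on which the GPU calls are queued
    std::vector<gpu::GpuMat> splitter;  // receives the real and imaginary planes

    // All three GPU matrices are complex (CV_32FC2) and match the image size.
    gpu::GpuMat lenaGPU(lena.size(), CV_32FC2);
    gpu::GpuMat lenaSpectrum(lena.size(), CV_32FC2);
    gpu::GpuMat lenaOut(lena.size(), CV_32FC2);
    lenaGPU.upload(lena);

    // Forward and inverse complex-to-complex DFT.
    gpu::dft(lenaGPU, lenaSpectrum, lenaGPU.size(), 0, stream);
    gpu::dft(lenaSpectrum, lenaOut, lenaGPU.size(), DFT_INVERSE, stream);

    // Channel 0 holds the real part, i.e. the reconstructed image.
    gpu::split(lenaOut, splitter, stream);
    stream.waitForCompletion();
    splitter[0].download(lenaAfter);

    // Scale explicitly: map the maximum value to 255 for display.
    double minVal, maxVal;
    minMaxIdx(lenaAfter, &minVal, &maxVal);
    lenaAfter.convertTo(lenaAfter, CV_8U, 255.0 / maxVal);

    namedWindow("lena after", 1);
    imshow("lena after", lenaAfter);
    waitKey(1000);
    return 0;
}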
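Why the explicit scaling is needed can be seen in a small standalone sketch (plain C++, no OpenCV) on a single row. It uses a naive DFT, and the names Cplx, naiveDftUnscaled and realPartScaled are hypothetical, introduced only for this illustration. An unscaled forward transform followed by an unscaled inverse returns the input multiplied by the number of elements, which is the factor DFT_SCALE divides out. The code above instead normalises by the maximum value, which is enough for display; dividing by the element count recovers the original values.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> Cplx;

// Naive O(N^2) DFT without any scaling.
// sign = -1 gives the forward transform, sign = +1 the inverse.
std::vector<Cplx> naiveDftUnscaled(const std::vector<Cplx>& in, int sign) {
    assert(sign == 1 || sign == -1);
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<Cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        Cplx sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * 2.0 * pi *
                static_cast<double>(k * t) / static_cast<double>(n);
            sum += in[t] * Cplx(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Keep channel 0 (the real part) and divide by the element count.
std::vector<double> realPartScaled(const std::vector<Cplx>& in) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i].real() / static_cast<double>(in.size());
    return out;
}
```

With an input row of 10, 20, 30, 40 (imaginary parts zero), the unscaled round trip gives real parts of about 40, 80, 120, 160, and realPartScaled brings them back to 10, 20, 30, 40. For a full image the divisor would be the total number of pixels rather than the row length.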

As simple as that! I have no idea why I didn't come across this earlier. I decided to post it anyway, as someone out there might have the same problem or need some guidance.