Can the following loop be vectorized?

https://stackoverflow.com/questions/2311227

22-09-2019
|

Question

I have a for-loop which performs the following function:

Take a M by 8 matrix and:

Split it into blocks of size 512 elements (meaning X by 8 of the matrix == 512, and the number of elements can be 128,256,512,1024,2048)
Reshape the block into 1 by 512 (Number of elements) matrix.
Take the last 1/4 of the matrix and put it in front,
e.g. Data = [Data(1,385:512),Data(1,1:384)];

The following is my code:

for i = 1 : NumOfBlock  
    if i == 1  
        Header = tempHeader(1:RowNeeded,:);  
        Header = reshape(Header,1,BlockSize); %BS  
        Header = [Header(1,385:512),Header(1,1:384)]; %CP  
        Data = tempData(1:RowNeeded,:);  
        Data = reshape(Data,1,BlockSize); %BS  
        Data = [Data(1,385:512),Data(1,1:384)]; %CP  
        start = RowNeeded + 1;  
        end1 = RowNeeded * 2;  
    else  
        temp = tempData(start:end1,:);  
        temp = reshape(temp,1,BlockSize); %BS  
        temp = [temp(1,385:512),temp(1,1:384)]; %CP  
        Data = [Data, temp];  
    end

    if i <= 127 & i > 1
        temp = tempHeader(start:end1,:);
        temp = reshape(temp,1,BlockSize); %BS
        temp = [temp(1,385:512),temp(1,1:384)]; %CP
        Header = [Header, temp];
    end

    start = end1 + 1;
    end1=end1 + RowNeeded;  
end

Running this loop with 5 million element will take more than 1 hour. I need it to be as fast as possible (in sec). Is this loop able to be vectorized?

Solution 4

Once again I would like to thanks Amro for giving me an idea on how to solve my question. Sorry for not making myself clear in the question.

Here is my solution to my problem:

%#BS CDMA, Block size 128,512,1024,2048  
  BlockSize = 512;  
  RowNeeded = BlockSize / 8;  
  TotalRows = size(tempData);  
  TotalRows = TotalRows(1,1);  
  NumOfBlock = TotalRows / RowNeeded;  
  CPSize = BlockSize / 4;  

%#spilt into blocks  
  Header = reshape(tempHeader',[RowNeeded,8, 128]);  
  Data = reshape(tempData',[RowNeeded,8, NumOfBlock]);  
  clear tempData tempHeader;  

%#block spread & cyclic prefix  
    K = zeros([1,BlockSize,128],'single');  
    L = zeros([1,BlockSize,NumOfBlock],'single');  
       for i = 1:NumOfBlock  
           if i <= 128  
              K(:,:,i) = reshape(Header(:,:,i),[1,BlockSize]);  
              K(:,:,i) = [K((CPSize*3)+1:BlockSize),K(1:CPSize*3)];
           end  
           L(:,:,i) = reshape(Data(:,:,i),[1,BlockSize]);  
           L(:,:,i) = [L((CPSize*3)+1:BlockSize),L(1:CPSize*3)];
        end

OTHER TIPS

Based on your function description, here's what I came up with:

M = 320;           %# M must be divisble by (numberOfElements/8)
A = rand(M,8);     %# input matrix

num = 512;         %# numberOfElements
rows = num/8;      %# rows needed

%# equivalent to taking the last 1/4 and putting it in front
A = [A(:,7:8) A(:,1:6)];

%# break the matrix in blocks of size (x-by-8==512) into the third dimension
B = permute(reshape(A',[8 rows M/rows]),[2 1 3]);

%'# linearize everything
B = B(:);

this diagram might help in understanding the above:

Vectorizing may or may not help. What will help is knowing where the bottleneck is. Use the profiler as outlined here:

http://blogs.mathworks.com/videos/2006/10/19/profiler-to-find-code-bottlenecks/

It would be nice if you'd tell what you are trying to do (my guess is some simulation in dynamical systems, but it's hard to tell).

yes, of course it can be vectorized: each of your blocks is actually four sub blocks; using your (extremely non standard) indices:

1...128, 129...256, 257...384, 385...512

Every kernel/thread/what-ever-you-call-it of the vectorization should do the following:

i = threadIdx is between 0 and 127 temp = data[1 + i] data[1 + i] = data[385+i] data[385 + i] = data[257+i] data[257 + i] = data[129+i] data[129 + i] = temp

You should of course also parallelize on blocks, not only vectorize.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow