Can the following loop be vectorized?
-
22-09-2019 - |
Question
I have a for-loop which performs the following function:
Take a M by 8 matrix and:
- Split it into blocks of size 512 elements (meaning X by 8 of the matrix == 512, and the number of elements can be 128,256,512,1024,2048)
- Reshape the block into 1 by 512 (Number of elements) matrix.
- Take the last 1/4 of the matrix and put it in front,
e.g.Data = [Data(1,385:512),Data(1,1:384)];
The following is my code:
for i = 1 : NumOfBlock
if i == 1
Header = tempHeader(1:RowNeeded,:);
Header = reshape(Header,1,BlockSize); %BS
Header = [Header(1,385:512),Header(1,1:384)]; %CP
Data = tempData(1:RowNeeded,:);
Data = reshape(Data,1,BlockSize); %BS
Data = [Data(1,385:512),Data(1,1:384)]; %CP
start = RowNeeded + 1;
end1 = RowNeeded * 2;
else
temp = tempData(start:end1,:);
temp = reshape(temp,1,BlockSize); %BS
temp = [temp(1,385:512),temp(1,1:384)]; %CP
Data = [Data, temp];
end
if i <= 127 & i > 1
temp = tempHeader(start:end1,:);
temp = reshape(temp,1,BlockSize); %BS
temp = [temp(1,385:512),temp(1,1:384)]; %CP
Header = [Header, temp];
end
start = end1 + 1;
end1=end1 + RowNeeded;
end
Running this loop with 5 million element will take more than 1 hour. I need it to be as fast as possible (in sec). Is this loop able to be vectorized?
Solution 4
Once again I would like to thanks Amro for giving me an idea on how to solve my question. Sorry for not making myself clear in the question.
Here is my solution to my problem:
%#BS CDMA, Block size 128,512,1024,2048
BlockSize = 512;
RowNeeded = BlockSize / 8;
TotalRows = size(tempData);
TotalRows = TotalRows(1,1);
NumOfBlock = TotalRows / RowNeeded;
CPSize = BlockSize / 4;
%#spilt into blocks
Header = reshape(tempHeader',[RowNeeded,8, 128]);
Data = reshape(tempData',[RowNeeded,8, NumOfBlock]);
clear tempData tempHeader;
%#block spread & cyclic prefix
K = zeros([1,BlockSize,128],'single');
L = zeros([1,BlockSize,NumOfBlock],'single');
for i = 1:NumOfBlock
if i <= 128
K(:,:,i) = reshape(Header(:,:,i),[1,BlockSize]);
K(:,:,i) = [K((CPSize*3)+1:BlockSize),K(1:CPSize*3)];
end
L(:,:,i) = reshape(Data(:,:,i),[1,BlockSize]);
L(:,:,i) = [L((CPSize*3)+1:BlockSize),L(1:CPSize*3)];
end
OTHER TIPS
Based on your function description, here's what I came up with:
M = 320; %# M must be divisble by (numberOfElements/8)
A = rand(M,8); %# input matrix
num = 512; %# numberOfElements
rows = num/8; %# rows needed
%# equivalent to taking the last 1/4 and putting it in front
A = [A(:,7:8) A(:,1:6)];
%# break the matrix in blocks of size (x-by-8==512) into the third dimension
B = permute(reshape(A',[8 rows M/rows]),[2 1 3]);
%'# linearize everything
B = B(:);
this diagram might help in understanding the above:
Vectorizing may or may not help. What will help is knowing where the bottleneck is. Use the profiler as outlined here:
http://blogs.mathworks.com/videos/2006/10/19/profiler-to-find-code-bottlenecks/
It would be nice if you'd tell what you are trying to do (my guess is some simulation in dynamical systems, but it's hard to tell).
yes, of course it can be vectorized: each of your blocks is actually four sub blocks; using your (extremely non standard) indices:
1...128, 129...256, 257...384, 385...512
Every kernel/thread/what-ever-you-call-it of the vectorization should do the following:
i = threadIdx is between 0 and 127 temp = data[1 + i] data[1 + i] = data[385+i] data[385 + i] = data[257+i] data[257 + i] = data[129+i] data[129 + i] = temp
You should of course also parallelize on blocks, not only vectorize.