题
我有一个用于环它执行的下列功能:
采取M8矩阵,并且:
- 它分割成块的大小512元(含义X8矩阵==512,而元件的数目可128,256,512,1024,2048)
- 改造的框入1 512(数量元素)的矩阵。
- 采取的最后1/4的矩阵,把它放在前面,
例如Data = [Data(1,385:512),Data(1,1:384)];
以下是我的代码:
for i = 1 : NumOfBlock
if i == 1
Header = tempHeader(1:RowNeeded,:);
Header = reshape(Header,1,BlockSize); %BS
Header = [Header(1,385:512),Header(1,1:384)]; %CP
Data = tempData(1:RowNeeded,:);
Data = reshape(Data,1,BlockSize); %BS
Data = [Data(1,385:512),Data(1,1:384)]; %CP
start = RowNeeded + 1;
end1 = RowNeeded * 2;
else
temp = tempData(start:end1,:);
temp = reshape(temp,1,BlockSize); %BS
temp = [temp(1,385:512),temp(1,1:384)]; %CP
Data = [Data, temp];
end
if i <= 127 & i > 1
temp = tempHeader(start:end1,:);
temp = reshape(temp,1,BlockSize); %BS
temp = [temp(1,385:512),temp(1,1:384)]; %CP
Header = [Header, temp];
end
start = end1 + 1;
end1=end1 + RowNeeded;
end
运行这种循环有5万元将超过1小时。我需要尽可能快(在sec)。这是循环能够量化?
解决方案 4
再一次我要谢谢Amro给我一个想法就如何解决我的问题。对不起对不会让自己清楚在这个问题。
这是我解决我的问题:
%#BS CDMA, Block size 128,512,1024,2048
BlockSize = 512;
RowNeeded = BlockSize / 8;
TotalRows = size(tempData);
TotalRows = TotalRows(1,1);
NumOfBlock = TotalRows / RowNeeded;
CPSize = BlockSize / 4;
%#spilt into blocks
Header = reshape(tempHeader',[RowNeeded,8, 128]);
Data = reshape(tempData',[RowNeeded,8, NumOfBlock]);
clear tempData tempHeader;
%#block spread & cyclic prefix
K = zeros([1,BlockSize,128],'single');
L = zeros([1,BlockSize,NumOfBlock],'single');
for i = 1:NumOfBlock
if i <= 128
K(:,:,i) = reshape(Header(:,:,i),[1,BlockSize]);
K(:,:,i) = [K((CPSize*3)+1:BlockSize),K(1:CPSize*3)];
end
L(:,:,i) = reshape(Data(:,:,i),[1,BlockSize]);
L(:,:,i) = [L((CPSize*3)+1:BlockSize),L(1:CPSize*3)];
end
其他提示
根据你的描述,这就是我想出了:
M = 320; %# M must be divisble by (numberOfElements/8)
A = rand(M,8); %# input matrix
num = 512; %# numberOfElements
rows = num/8; %# rows needed
%# equivalent to taking the last 1/4 and putting it in front
A = [A(:,7:8) A(:,1:6)];
%# break the matrix in blocks of size (x-by-8==512) into the third dimension
B = permute(reshape(A',[8 rows M/rows]),[2 1 3]);
%'# linearize everything
B = B(:);
该图可能有助于了解上述:
矢量可能或不可能的帮助。什么会有帮助的是知道 哪里 瓶颈。使用探查所概述的在这里:
http://blogs.mathworks.com/videos/2006/10/19/profiler-to-find-code-bottlenecks/
这将是好的,如果你会告诉你们试图要做到(我的猜测是,一些模拟在动态系统,但是很难告诉).
是的,当然它可以量化:你的每一个街区实际上是四个子块;用你的(非常非标准)指数:
1...128, 129...256, 257...384, 385...512
每一个内线是什么-没有-你叫它矢量化应该做到以下几点:
i=threadIdx之间0和127 temp=data[1+i] 数据[1+i]=data[385+i] 数据[385+i]=data[257+i] 数据[257+i]=data[129+i] 数据[129+i]=温度
你应该当然还并行块,不仅矢量.
不隶属于 StackOverflow