Question

By default, all built-in functions for computing correlation or covariance return a matrix. I am trying to write an efficient function that will compute the correlation between a seed region and various other regions, but I do not need the correlations between the other regions. I assume that computing the full correlation matrix would therefore be inefficient.

I could instead compute the correlation matrix between each region and the seed region, pick one of the off-diagonal entries and store it, but I feel that looping in this situation is also inefficient.

To be more concrete, each point in my 3-dimensional space has a time dimension. I am attempting to compute the mean correlation between a given point and all points in space within a given radius. I want to repeat this procedure hundreds of thousands of times, for many different radius lengths, and so on, so I would like for this to be as efficient as possible.

So, what is the best way to compute the correlation between a single vector and several others, without computing correlations that I will just ignore?

Thank you, Chris

EDIT: Here is my code now...

function [corrMap] = TIME_meanCorrMap(A,radius)
% Even though the variable is "radius", we work with cubes for simplicity...
% So, the radius is the distance (in voxels) from the center of the cube to an edge.
denom = ((radius*2+1)^3)-1;  % voxels in the cube, excluding the seed itself
dim = size(A);
corrMap = zeros(dim(1:3));
for x = radius+1:dim(1)-radius
    rx = [x-radius : x+radius];
    for y = radius+1:dim(2)-radius
        ry = [y-radius : y+radius];
        for z = radius+1:dim(3)-radius
            rz = [z-radius : z+radius];
            corrCoeffs = zeros(1,denom);
            seed = A(x,y,z,:);
            i=0;
            for xx = rx
                for yy = ry
                    for zz = rz
                        if ~all([x y z] == [xx yy zz])
                            i = i + 1;
                            temp = corrcoef(seed,A(xx,yy,zz,:));
                            corrCoeffs(i) = temp(1,2);
                        end
                    end
                end
            end
            corrMap(x,y,z) = mean(corrCoeffs);
        end
    end
end

EDIT: Here are some more timings to supplement the accepted answer. Using bsxfun() to normalize, and matrix multiplication to compute correlations:

tic; for i=1:10000                                                                
    x=rand(100);
    xz = bsxfun(@rdivide,bsxfun(@minus,x,mean(x)),std(x));
    cc = xz(:,2:end)' * xz(:,1) ./ 99;
end; toc
Elapsed time is 6.928251 seconds.

Using zscore() to normalize, matrix multiplication to compute correlations:

tic; for i=1:10000                                    
    x=rand(100);                                          
    xz = zscore(x);                                       
    cc = xz(:,2:end)' * xz(:,1) ./ 99;
end; toc
Elapsed time is 7.040677 seconds.

Using bsxfun() to normalize, and corr() to compute correlations:

tic; for i=1:10000                                    
    x=rand(100);
    xz = bsxfun(@rdivide,bsxfun(@minus,x,mean(x)),std(x));
    cc = corr(x(:,1),x(:,2:end));
end; toc
Elapsed time is 11.385707 seconds.

Solution

It is certainly possible to improve upon the for loop that you are currently using. The correlation computations can be parallelized using matrix multiplications if you have sufficient RAM. However, this requires you to unwrap your 4-dimensional data matrix A into a different shape. Most likely you are dealing with 3-dimensional voxelwise fMRI data, in which case you will have to reshape from an [x y z time] matrix to an [index time] matrix. I will assume you can deal with that reshaping. Once you have your seed timecourse [Time by 1] and your target timecourses [Time by NumTargets] ready, you can perform some much more efficient computations.
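For readers who want a starting point for that reshaping, here is a minimal sketch. The sizes, the variable names X and seedIdx, and the choice of seed voxel are assumptions made for illustration, and A is random data standing in for the real [x y z time] array.

```matlab
% Hypothetical sizes for illustration; A stands in for the [x y z time] data.
dims = [4 5 6];                 % spatial dimensions (x, y, z)
T = 50;                         % number of time samples
A = randn([dims T]);

% Collapse the three spatial dimensions into one voxel index, then
% transpose so that each column is one voxel's timecourse: [T by numVoxels].
X = reshape(A, [], T)';

% Linear index of a seed voxel at subscripts (2,3,4). sub2ind uses the same
% column-major ordering as reshape, so this selects the matching column.
seedIdx = sub2ind(dims, 2, 3, 4);
seed = X(:, seedIdx);                            % [T by 1]
targets = X(:, [1:seedIdx-1, seedIdx+1:end]);    % all other voxels, [T by N]
```

In practice the targets would be only the voxels inside the cube around the seed rather than every other voxel; the same sub2ind approach can be used to build that list of columns.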

A quick way to efficiently compute the desired correlation is using the corr function in MATLAB. This function will accept 2 matrix arguments and it will quite efficiently compute all pairwise correlations between the columns of argument 1 and the columns of argument 2, e.g.

T = 200; %time samples
N = 20;  %number of other voxels

seed = randn(T,1);     %data from seed voxel
targets = randn(T,N);  %data from target voxels

%here is the for loop method
tic
for n = 1:N
   tmp = corrcoef(seed, targets(:,n));
   tmpcc = tmp(1,2);
end
looptime = toc;

%here is the parallel method
tic
cc = corr(seed, targets);
matrixtime = toc;

On my machine, the parallel operation in corr is faster than the loop method by a factor proportional to T*N.

It is possible to go a little faster than the corr function if you are willing to perform the underlying matrix operations yourself, and in any case it is worth knowing what they are. The correlation between two vectors is essentially a normalized dot product, so using the conventions above you can compute the correlations in the following way:

zseed = zscore(seed);  %normalize the seed timecourse by z-scoring
ztargets= zscore(targets);  %normalize the target timecourses by z-scoring
ztargets = ztargets';      %flip columns and rows for convenience
cc2 = ztargets*zseed./(T-1);    %compute many dot products with one matrix multiplication
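Written out, the identity behind that matrix product is the following. The symbols x, y, z_x and z_y are notation introduced here for illustration, and s denotes the sample standard deviation with the T-1 normalization, which is the default for both std and zscore.

```latex
r_{xy}
  = \frac{1}{T-1} \sum_{t=1}^{T}
    \frac{x_t - \bar{x}}{s_x} \cdot \frac{y_t - \bar{y}}{s_y}
  = \frac{z_x^{\top} z_y}{T-1},
\qquad
s_x = \sqrt{\frac{1}{T-1} \sum_{t=1}^{T} (x_t - \bar{x})^2}
```

Stacking the z-scored targets as rows of a matrix and multiplying by the z-scored seed evaluates this sum for every target at once.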

The code above is essentially what the corr function does, which is why it is much faster than the loop. Note that most of the operation time is spent in the zscore operations, and you can improve on the performance of the corr function if you compute the z-scores efficiently using the bsxfun command. For now, I hope this gives you some direction on how to compute the correlation between a seed timecourse and many target timecourses without having to loop through and compute each one separately.
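As a rough sketch of the bsxfun variant mentioned above, the z-scoring can be written with explicit mean and std calls. The name cc3 is an assumption introduced here, the data are random placeholders, and whether this actually beats zscore or corr will depend on the MATLAB version and the data size.

```matlab
T = 200;  N = 20;               % same illustrative sizes as above
seed = randn(T,1);
targets = randn(T,N);

% z-score each column: subtract the column mean, divide by the column
% standard deviation (std normalizes by T-1 by default, matching zscore).
zseed = (seed - mean(seed)) ./ std(seed);
ztargets = bsxfun(@rdivide, bsxfun(@minus, targets, mean(targets)), std(targets));

% One matrix-vector product yields all N correlations at once: [N by 1].
cc3 = (ztargets' * zseed) ./ (T-1);
```

Note that a timecourse with zero variance (for example, a voxel outside the brain mask) has a standard deviation of zero and will produce NaN here, so such columns may need to be excluded beforehand.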

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow