improve growing cell array in matlab

https://stackoverflow.com/questions/23354227

11-07-2023
|

Question

I would like to know how to efficiently run this part of my progam. Basically for a background of what I aim to do, I want to cluster points. There are around 10,000 points. These points have computed "forces" between them that is stored in matrix F. So the force between point 1 and point 2 is F(1,2). I would then like to "cluster points" with sufficient F acting on them (Force setting/threshold), that is 2 points with sufficient F between them belong to the same cluster.

I have a code as seen below. A cell array CLUSTER was made to contain the cluster assignments. So CLUSTER{i} is the cell containing the ith cluster with clustered points in it.

However, for some F settings, the implementation takes forever. I have read about preallocation and parfor (parfor can't be done since there is dependency in iterations). But does preallocation of cell arrays mean that individual cells are not preallocated with memory? Is there any other way around this? Profiling tells me that ismember has the biggest share in computing. I hope to improve the code with your suggestions. Thanks a lot!

CLUSTER = {};

for fi = 1:srow 
    for fj = 1:scol 
        if fj > fi % to eliminate redundancy, diagonal mirror elements of F !check on this 
            if F(fi,fj) >= 2000 % Force setting    
                if( (~ismember(1,cellfun(@(x)ismember(fi,x),CLUSTER))) && (~ismember(1,cellfun(@(x)ismember(fj,x),CLUSTER))) ) % fi & fj are not in CLUSTER
                    CLUSTER{end+1} = [fi fj];
                    end
                %%if( (ismember(1,cellfun(@(x)ismember(fi,x),CLUSTER))) && (ismember(1,cellfun(@(x)ismember(fj,x),CLUSTER))) ) % fi & fj are in CLUSTER   
                    %do nothing since lfi and fj are in CLUSTER    
                %%end    
                if( (ismember(1,cellfun(@(x)ismember(fi,x),CLUSTER))) && (~ismember(1,cellfun(@(x)ismember(fj,x),CLUSTER))) ) % fi in CLUSTER, fj not in CLUSTER
                    c = find(cellfun(@(x)ismember(fi,x),CLUSTER));
                    CLUSTER{c} = [CLUSTER{c} fj];
                end        
                if( (~ismember(1,cellfun(@(x)ismember(fi,x),CLUSTER))) && (ismember(1,cellfun(@(x)ismember(fj,x),CLUSTER))) ) % fi not in CLUSTER, fj in CLUSTER
                    c = find(cellfun(@(x)ismember(fj,x),CLUSTER));
                    CLUSTER{c} = [CLUSTER{c} fi];
                end
            end
        end
    end
end

Solution

I am not sure if this is the best way, but my thought is that this should be possible to do by some kind of update algorithm. Begin with looking at node 1. Find all the nodes clustered with node 1 as,

clusteredNbr = find(F(1,:)>=2000); (a)

This will give you the node number for all nodes that are clustered with node 1. Then find all nodes that is clustered with clusteredNbr, by repeting (a) with for all new nodes. These nodes can then be added to the same vector as the old nodes in clusteredNbr. You may have some duplicates here, but they can be removed later. Then check for new unique nodes in this resulting vector. If there are any, repeat (a) for these. Continue until you find all nodes in the first cluster. Then you know that any of the nodes in the first cluster is not clustered with the others.

Repeat this process for the next cluster and so on. The gain with this is that you use find instead of ismember and that the operation can be vectorized, which allows you to run only one for loop.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow