문제

I have an array in Matlab. I numbered every entry in array with natural number. So I formed equivalence relation in array.

For example,

array   = [1 2 3 5 6 7]
classes = [1 2 1 1 3 3].

I want to get cell array: i-th cell array's position is connected with i-th entry of initial array and shows, which elements are in the one class with this entry. For the example above, I would get:

{[1 3 5], [2], [1 3 5], [1 3 5], [6 7], [6 7]}

It can be done easily with for-loop, but is there any other solution? It will be good if it works faster than O(n^2), where n is the size of initial array.

Edit. Problem will be solved, if I know the approach to split sorted array into cells with indeces of equal elements by O(n).

array  = [1 1 1 2 3 3]
groups = {[1 2 3], [4], [5 6]}
도움이 되었습니까?

해결책 2

I don't know how to do this without for-loops entirely, but you can use a combination of sort, diff, and find to organize and partition the equivalence class identifiers. That'll give you a mostly vectorized solution, where the M-code level for-loop is O(n) where n is the number of classes, not the length of the whole input array. This should be pretty fast in practice.

Here's a rough example using some index munging. Be careful; there's probably an off-by-one edge case bug in there somewhere since I just banged this out.

function [eqVals,eqIx] = equivsets(a,x)
%EQUIVSETS Find indexes of equivalent values

[b,ix] = sort(x);
ixEdges = find(diff(b)); % identifies partitions between equiv classes
ix2 = [0 ixEdges numel(ix)];
eqVals = cell([1 numel(ix2)-1]);
eqIx = cell([1 numel(ix2)-1]);
% Map back to original input indexes and values
for i = 1:numel(ix2)-1
    eqIx{i} = ix((ix2(i)+1):ix2(i+1));
    eqVals{i} = a(eqIx{i}); 
end

I included the indexes in the output because they're often more useful than the values themselves. You'd call it like this.

% Get indexes of occurrences of each class
equivs = equivsets(array, classes)
% You can expand that to get equivalences for each input element
equivsByValue = equivs(classes)

It's a lot more efficient to build the lists for each class first and then expand them out to match the input indexes. Not only do you have to do the work just once, but when you use the b = a(ix) to expand a small cell array to a larger one, Matlab's copy-on-write optimization will end up reusing the memory for the underlying numeric mxArrays so you get a more compact representation in memory.

This transformation pops up a lot when working with unique() or databases. For decision support systems and data warehouse style things I've worked with, it happens all over the place. I wish it were built in to Matlab. (And maybe it's been added to one of the db or timeseries toolboxes in recent years; I'm a few versions behind.)

Realistically, if performance of this is critical for your code, you might also look at dropping down to Java or C MEX functions and implementing it there. But if your data sets are low cardinality - that is, have a small number of classes/distinct values, like numel(unique(classes)) / numel(array) tends to be less than 0.1 or so - the M-code implementation will probably be just fine.

다른 팁

Not sure about complexity, but accumarray with cell output is useful for splitting up the array based on unique values of the classes:

data = sortrows([classes; array].',1) %' stable w.r.t. array
arrayPieces = accumarray(data(:,1),data(:,2)',[],@(x){x.'})
classElements = arrayPieces(classes).'

Regarding sorted array splitting into cells of indeces:

>> array  = [1 1 1 2 3 3]
>> arrayinds = accumarray(array',1:numel(array),[],@(x){x'})' %' transpose for rows
arrayinds = 
    [1x3 double]    [4]    [1x2 double]
>> arrayinds{:}
ans =
     1     2     3
ans =
     4
ans =
     5     6

For the second question:

array  = [1 1 1 2 3 3]; %// example data
  1. Use diff to find the end of each run of equal values, and from that build the groups:

    ind = [0 find(diff([array NaN])~=0)];
    groups = arrayfun(@(n) ind(n)+1:ind(n+1), 1:numel(ind)-1, 'uni', 0);
    
  2. Same approach using unique:

    [~, ind] = unique(array);
    ind = [0 ind];
    groups = arrayfun(@(n) ind(n)+1:ind(n+1), 1:numel(ind)-1, 'uni', 0);
    

I haven't tested if the complexity is O(n), though.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top