How to programmatically generate a dataset object from the Cartesian product (aka "cross-join") of multiple one-dimensional cell arrays?

StackOverflow https://stackoverflow.com/questions/21355790

Question

I have n cell arrays c1, c2, …, cn, with dimensions L1 × 1, L2 × 1, …, Ln × 1, respectively. (FWIW, all elements within a given cell array are of a single class, but this class may differ from one array to another.)

I want to produce a dataset object representing the Cartesian product (aka "cross-join") of these n cell arrays.

I'm looking for a programmatic way to do this that will work for any n.

To be clear about what I mean by "Cartesian product" (or "cross-join"): I want to produce a dataset object containing n columns and L1 × L2 × … × Ln rows, one row for each possible combination of an entry from c1, an entry from c2, …, an entry from c(n-1), and an entry from cn. (It's OK to assume that none of c1, c2, …, cn contains duplicate entries. IOW, one may assume that every ci is equal to unique(ci).)


An example where n = 3 is given below; the desired result is the dataset object factors. (Of course, the names of factors's columns represent an additional parameter. Also, in this example, all the cell arrays contain strings, but, as already mentioned, in general, the different arrays will contain entries of different classes.)

>> c1

c1 = 

    'even'
    'odd'

>> c2

c2 = 

    'green'
    'red'
    'yellow'

>> c3

c3 = 

    'clubs'
    'diamonds'
    'hearts'
    'spades'

>> factors

factors = 

    Parity        TrafficLight    Suit          
    'even'        'red'           'spades'      
    'even'        'red'           'hearts'      
    'even'        'red'           'diamonds'    
    'even'        'red'           'clubs'       
    'even'        'yellow'        'spades'      
    'even'        'yellow'        'hearts'      
    'even'        'yellow'        'diamonds'    
    'even'        'yellow'        'clubs'       
    'even'        'green'         'spades'      
    'even'        'green'         'hearts'      
    'even'        'green'         'diamonds'    
    'even'        'green'         'clubs'       
    'odd'         'red'           'spades'      
    'odd'         'red'           'hearts'      
    'odd'         'red'           'diamonds'    
    'odd'         'red'           'clubs'       
    'odd'         'yellow'        'spades'      
    'odd'         'yellow'        'hearts'      
    'odd'         'yellow'        'diamonds'    
    'odd'         'yellow'        'clubs'       
    'odd'         'green'         'spades'      
    'odd'         'green'         'hearts'      
    'odd'         'green'         'diamonds'    
    'odd'         'green'         'clubs'       

Solution

This works for

  • arbitrary number of cell arrays, n;
  • arbitrary size of each cell array;
  • arbitrary type of each cell's contents.

It makes use of cellfun, arrayfun and comma-separated lists. The Cartesian product is computed on indices (not on actual elements) using ndgrid, with fliplr to yield the order you want (first column varies slowest, last column varies fastest).
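To illustrate the index step in isolation, here is a minimal sketch for two sets of sizes 2 and 3. The sizes and the variable name idx are assumptions chosen only for illustration; the ndgrid and fliplr calls mirror the description above.

```matlab
% Index-only Cartesian product for two sets of sizes 2 and 3 (illustrative sizes).
nums = [2 3];
vec = fliplr(arrayfun(@(n) 1:n, nums, 'UniformOutput', false)); % {1:3, 1:2}
inds = cell(1, numel(nums));
[inds{:}] = ndgrid(vec{:});   % inds{1} varies along dim 1, inds{2} along dim 2
inds = fliplr(inds);          % put the grids back in the original set order
% Flatten each grid column-major and place the columns side by side.
idx = cell2mat(cellfun(@(g) g(:), inds, 'UniformOutput', false));
disp(idx)
```

Each row of idx holds one combination of indices: the first column (indexing the first set) varies slowest and the last column varies fastest, so the rows are 1 1, 1 2, 1 3, 2 1, 2 2, 2 3.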

The code below returns its result as a cell array with n columns. If you need it in the form of a dataset, define appropriate column names and use cell2dataset to convert.

c1 = {'even','odd'}; %// example data
c2 = {'green','red','yellow'};
c3 = {'clubs','diamonds','hearts','spades'};
sets = {c1, c2, c3}; %// can have an arbitrary number of c's

num = numel(sets);
nums = cellfun(@(c) numel(c), sets);
inds = cell(1,num);
vec = fliplr(arrayfun(@(n) 1:n, nums, 'uni', 0));
[inds{:}] = ndgrid(vec{:});
inds = fliplr(inds);
factors = arrayfun(@(n) {sets{n}{inds{n}}},1:num, 'uni', 0);
factors = cat(1, factors{:}).';

Result:

>> factors
factors = 
    'even'    'green'     'clubs'   
    'even'    'green'     'diamonds'
    'even'    'green'     'hearts'  
    'even'    'green'     'spades'  
    'even'    'red'       'clubs'   
    'even'    'red'       'diamonds'
    'even'    'red'       'hearts'  
    'even'    'red'       'spades'  
    'even'    'yellow'    'clubs'   
    'even'    'yellow'    'diamonds'
    'even'    'yellow'    'hearts'  
    'even'    'yellow'    'spades'  
    'odd'     'green'     'clubs'   
    'odd'     'green'     'diamonds'
    'odd'     'green'     'hearts'  
    'odd'     'green'     'spades'  
    'odd'     'red'       'clubs'   
    'odd'     'red'       'diamonds'
    'odd'     'red'       'hearts'  
    'odd'     'red'       'spades'  
    'odd'     'yellow'    'clubs'   
    'odd'     'yellow'    'diamonds'
    'odd'     'yellow'    'hearts'  
    'odd'     'yellow'    'spades' 
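The conversion to a dataset mentioned above could look like the following sketch. It assumes the Statistics Toolbox (which provides dataset and cell2dataset) is available, and it uses a small hand-written cell array in place of the full result so that it stands on its own. The variable name ds is hypothetical; the column names are borrowed from the question's example.

```matlab
% Small stand-in for the cell array of combinations (illustrative data only).
factors = {'even', 'green'; ...
           'even', 'red';   ...
           'odd',  'green'; ...
           'odd',  'red'};
names = {'Parity', 'TrafficLight'};   % one name per column

% By default cell2dataset reads the variable names from the first row,
% so the names are prepended as a header row before converting.
ds = cell2dataset([names; factors]);
disp(ds)
```

Prepending the names as a header row relies on cell2dataset's default of reading variable names from the first row of its input.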

Other suggestions

This was fun to think about - here's what I came up with:

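function product = setjoin(sets, names)
% SETJOIN  Cartesian product of the cell arrays in SETS, returned as a
% dataset whose columns are named by NAMES.
product = {};
nrows = 1;
for curset = sets(:)'
    curset = curset{1}(:);           % unwrap the cell and force a column
    n = length(curset);
    setidx = repmat(1:n, nrows, 1);  % each index repeated nrows times
    setidx = setidx(:);              % MATLAB cannot index a call result directly
    product = [repmat(product, n, 1) curset(setidx)];
    nrows = nrows * n;
end
product = cell2dataset([names(:)'; product]);  % first row supplies the column names
end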

where sets is a cell array of cell arrays {c1, c2, ..., cn} and names is a cell array of strings. As is, it's a bit hacky: this method of coercing things into row/column vectors where required is concise but isn't necessarily obvious, especially in generating setidx. Hopefully it gives you an idea to build upon.

Licensed under: CC-BY-SA with attribution