How can I create combinations of several lists without hardcoding loops?

https://stackoverflow.com/questions/1442965

22-07-2019
|

Question

I have data that looks like this:

    my @homopol = (
                   ["T","C","CC","G"],  # part1
                   ["T","TT","C","G","A"], #part2
                   ["C","CCC","G"], #part3 ...upto part K=~50
                  );


    my @prob = ([1.00,0.63,0.002,1.00,0.83],
                [0.72,0.03,1.00, 0.85,1.00],
                [1.00,0.97,0.02]);


   # Note also that the dimension of @homopol is always exactly the same with @prob.
   # Although number of elements can differ from 'part' to 'part'.

What I want to do is to

Generate all combinations of elements in part1 through out partK
Find the product of the corresponding elements in @prob.

Hence at the end we hope to get this output:

T-T-C  1 x 0.72 x 1 = 0.720
T-T-CCC     1 x 0.72 x 0.97 = 0.698
T-T-G  1 x 0.72 x 0.02 = 0.014
...
G-G-G  1 x 0.85 x 0.02 = 0.017
G-A-C  1 x 1 x 1 = 1.000
G-A-CCC     1 x 1 x 0.97 = 0.970
G-A-G  1 x 1 x 0.02 = 0.020

The problem is that the following code of mine does that by hardcoding the loops. Since the number of parts of @homopol is can be varied and large (e.g. ~K=50), we need a flexible and compact way to get the same result. Is there any? I was thinking to use Algorithm::Loops, but not sure how to achieve that.

use strict;
use Data::Dumper;
use Carp;


my @homopol = (["T","C","CC","G"],
               ["T","TT","C","G","A"],
               ["C","CCC","G"]);


my @prob = ([1.00,0.63,0.002,1.00,0.83],
            [0.72,0.03,1.00, 0.85,1.00],
            [1.00,0.97,0.02]);



my $i_of_part1 = -1;
foreach my $base_part1 ( @{ $homopol[0] } ) {
    $i_of_part1++;
    my $probpart1 = $prob[0]->[$i_of_part1];

    my $i_of_part2 =-1;
    foreach my $base_part2 ( @{ $homopol[1] } ) {
        $i_of_part2++;
        my $probpart2 = $prob[1]->[$i_of_part2];

        my $i_of_part3 = -1;
        foreach my $base_part3 ( @{ $homopol[2] } ) {
            $i_of_part3++;
            my $probpart3 = $prob[2]->[$i_of_part3];

            my $nstr = $base_part1."".$base_part2."".$base_part3;
            my $prob_prod = sprintf("%.3f",$probpart1 * $probpart2 *$probpart3);

            print "$base_part1-$base_part2-$base_part3 \t";
            print "$probpart1 x $probpart2 x $probpart3 = $prob_prod\n";

        }
    }
}

Solution

I would recommend Set::CrossProduct, which will create an iterator to yield the cross product of all of your sets. Because it uses an iterator, it does not need to generate every combination in advance; rather, it yields each one on demand.

use strict;
use warnings;
use Set::CrossProduct;

my @homopol = (
    [qw(T C CC G)],
    [qw(T TT C G A)],
    [qw(C CCC G)], 
);

my @prob = (
    [1.00,0.63,0.002,1.00],
    [0.72,0.03,1.00, 0.85,1.00],
    [1.00,0.97,0.02],
);

# Prepare by storing the data in a list of lists of pairs.
my @combined;
for my $i (0 .. $#homopol){
    push @combined, [];
    push @{$combined[-1]}, [$homopol[$i][$_], $prob[$i][$_]]
        for 0 .. @{$homopol[$i]} - 1;
};

my $iterator = Set::CrossProduct->new([ @combined ]);
while( my $tuple = $iterator->get ){
    my @h = map { $_->[0] } @$tuple;
    my @p = map { $_->[1] } @$tuple;
    my $product = 1;
    $product *= $_ for @p;
    print join('-', @h), ' ', join(' x ', @p), ' = ', $product, "\n";
}

OTHER TIPS

A solution using Algorithm::Loops without changing the input data would look something like:

use Algorithm::Loops;

# Turns ([a, b, c], [d, e], ...) into ([0, 1, 2], [0, 1], ...)
my @lists_of_indices = map { [ 0 .. @$_ ] } @homopol;

NestedLoops( [ @lists_of_indices ], sub {
  my @indices = @_;
  my $prob_prod = 1; # Multiplicative identity
  my @base_string;
  my @prob_string;
  for my $n (0 .. $#indices) {
    push @base_string, $hompol[$n][ $indices[$n] ];
    push @prob_string, sprintf("%.3f", $prob[$n][ $indices[$n] ]);
    $prob_prod *= $prob[$n][ $indices[$n] ];
  }
  print join "-", @base_string; print "\t";
  print join "x", @prob_string; print " = ";
  printf "%.3f\n", $prob_prod;
});

But I think that you could actually make the code clearer by changing the structure to one more like

[ 
  { T => 1.00, C => 0.63, CC => 0.002, G => 0.83 },
  { T => 0.72, TT => 0.03, ... },
  ...
]

because without the parallel data structures you can simply iterate over the available base sequences, instead of iterating over indices and then looking up those indices in two different places.

Why don't you use recursion? Pass the depth as a parameter and let the function call itself with depth+1 inside the loop.

you could do it by creating an array of indicies the same length as the @homopol array (N say), to keep track of which combination you are looking at. In fact this array is just like a number in base N, with the elements being the digits. Iterate in the same way as you would write down consectutive numbers in base N, e.g (0 0 0 ... 0), (0 0 0 ... 1), ...,(0 0 0 ... N-1), (0 0 0 ... 1 0), ....

Approach 1: Calculation from indices

Compute the product of lengths in homopol (length1 * length2 * ... * lengthN). Then, iterate i from zero to the product. Now, the indices you want are i % length1, (i / length1)%length2, (i / length1 / length2) % length3, ...

Approach 2: Recursion

I got beaten to it, see nikie's answer. :-)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow