Question

I´m looking for an efficient algorithm that will find reverse cartesian products.

Mathematically, given $S \subseteq T^n$, I want to express $S$ as a union of sets $A_{i,1} \times A_{i,2} \times \dots \times A_{i,n}$, for $i=1,2,\dots,k$; and I want to minimize the number $k$. Is there an efficient algorithm to solve this problem?

This may be easier to explain by example. Calculating the cartesian product (cross product) is trivial, e.g. {1,2,3,4,5} x {1,2,3,4,5} x {1,2,3,4,5} results in:

    1 1 1
    1 1 2
    1 1 3
    .
    .
    .
    5 5 3
    5 5 4
    5 5 5

All 125 (5 x 5 x 5) rows are created.

Now let's say you remove some rows from this result, i.e. you are left with these 14 rows as a starting point:

    1 2 3
    1 2 4
    1 2 5
    5 2 3
    5 2 4
    2 2 5
    3 2 5
    1 3 3
    1 3 4
    1 3 5
    5 3 3
    5 3 4
    2 3 5
    3 3 5

Now, how do I find a cartesian product that if calculated can generate only those 14 rows back? (order of rows is irrelevant). It can no longer be represented by only one cartesian product {1,2,3,4,5} x {1,2,3,4,5} x {1,2,3,4,5} since e.g. 5 5 5 is not in the list.

One way to now represent the rows as cartesian products is this answer:

    {1} x {2,3} x {3,4,5}
    union
    {5} x {2,3} x {3,4}
    union
    {2,3} x {2,3} x {5}

But the above merge of rows is not optimal although it is a valid solution, we can do better and write it in only two cartesian products:

    {1,5} x {2,3} x {3,4}
    union
    {1,2,3} x {2,3} x {5}

This result is better since 2 < 3 and I want to find the smallest possible number of cartesian products that still represents all the rows. There is no way to reduce (merge) this any further given the 14 rows so this is the best answer in this case.

Now if the starting point would have been all the 125 rows (as in example above) the optimal answer would have been one: {1,2,3,4,5} x {1,2,3,4,5} x {1,2,3,4,5}.

Some known facts about the input:

  • All rows are unique, no duplicates exists in the list.
  • Row are of same length, fixed with either 3,4,5,6,7,8 in length.
  • Numbers in a row range from 1 to 20
  • There could be millions of rows as a starting point but typically around 100k, hence speed is of importance rather than low memory consumption.

One possible idea is that rows can be merged if n-1 numbers (or group of numbers in the slot) are equal, e.g.,

    1 2 4
    1 2 5
    1 2 6
    1 3 4
    1 3 5
    1 3 6

can merge into

    1 2 456
    and
    1 3 456

which further can be merged to:

    1 23 456

and written as:

    {1} x {2,3} x {4,5,6}

My question is similar to https://stackoverflow.com/q/20890802/781723, but that one is limited to triples ($n=3$); I am interested in generalizing, where rows can be longer than 3. I also found https://stackoverflow.com/q/46084505/781723, but that has no answer.

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top