Partitioning a superset and getting the list of original sets for each partition

https://stackoverflow.com/questions/14004945

11-12-2021
|

Question

Introduction

While trying to do some cathegorization on nodes in a graph (which will be rendered differenty), I find myself confronted with the following problem:

The Problem

Given a superset of elements S = {0, 1, ... M} and a number n of non-disjoint subsets T_i thereof, with 0 <= i < n, what is the best algorithm to find out the partition of the set S called P?

P = S is the union of all disjoint partitions P_j of the original superset S, with 0 <= j < M, such that for all elements x in P_j, every x has the same list of "parents" among the "original" sets T_i.

Example

S = [1, 2, 3, 4, 5, 6,   8, 9]

T_1 = [1, 4]
T_2 = [2, 3]
T_3 = [1, 3, 4]

So all P_js would be:

P_1 = [1, 4] # all elements x have the same list of "parents": T_1, T_3
P_2 = [2] # all elements x have the same list of "parents": T_2
P_3 = [3] # all elements x have the same list of "parents": T_2, T_3
P_4 = [5, 6, 8, 9] #  all elements x have the same list of "parents": S (so they're not in any of the P_j

Questions

What are good functions/classes in the python packages to compute all P_js and the list of their "parents", ideally restricted to numpy and scipy? Perhaps there's already a function which does just that
What is the best algorithm to find those partitions P_js and for each one, the list of "parents"? Let's note T_0 = S

I think the brute force approach would be to generate all 2-combinations of T sets and split them in at most 3 disjoint sets, which would be added back to the pool of T sets and then repeat the process until all resulting Ts are disjoint, and thus we've arrived at our answer - the set of P sets. A little problematic could be caching all the "parents" on the way there.

I suspect a dynamic programming approach could be used to optimize the algorithm.

Note: I would have loved to write the math parts in latex (via MathJax), but unfortunately this is not activated :-(

Solution

The following should be linear time (in the number of the elements in the Ts).

from collections import defaultdict

S = [1, 2, 3, 4, 5, 6,   8, 9]

T_1 = [1, 4]
T_2 = [2, 3]
T_3 = [1, 3, 4]

Ts = [S, T_1, T_2, T_3]

parents = defaultdict(int)
for i, T in enumerate(Ts):
    for elem in T:
        parents[elem] += 2 ** i

children = defaultdict(list)
for elem, p in parents.items():
    children[p].append(elem)

print(list(children.values()))

Result:

[[5, 6, 8, 9], [1, 4], [2], [3]]

OTHER TIPS

The way I'd do this is to construct an M × n boolean array In where In(i, j) = S_i ∈ T_j. You can construct that in O(Σ_j|T_j|), provided you can map an element of S onto its integer index in O(1), by scanning all of the sets T and marking the corresponding bit in In.

You can then read the "signature" of each element i directly from In by concatenating row i into a binary number of n bits. The signature is precisely the equivalence relationship of the partition you are seeking.

By the way, I'm in total agreement with you about Math markup. Perhaps it's time to mount a new campaign.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow