Algorithm/approximation for combined independent set/hamming distance

https://stackoverflow.com/questions/3513723

29-09-2019
|

Question

Input: Graph G Output: several independent sets, so that the membership of a node to all independent sets is unique. A node therefore has no connections to any node in its own set. Here is an example path.

Since clarification was called for here another rephrasal:

Divide a given graph into sets so that

i can tell a node from all others by its membership in sets e.g. if node i is present only in set A no other node should be present in set A only

if node j is present in set A and B then no other node should be present in set A and B only. if the membership of nodes is coded by a bit pattern, then these bit patterns have hamming distance at least one
if two nodes are adjacent in the graph, they should not be present in the same set, hence be an independent set

Example: B has no adjacent nodes D=>A, A=>D

Solution:

A B /
/ B D

A has bit pattern 10 and no adjacent node in its set. B has bit pattern 11 and no adjacent node, D has 01 therefore all nodes have hamming distance at least 1 an no adjacent nodes => correct

Wrong, because D and A are connected:

A D B
/ D B

A has bit pattern 10 and D in its set, they are adjacent. B has bit pattern 11 and no adjacent node, D has 11 as has B, so there are two errors in this solution and therefore it is not accepted.

Of course this should be extended to more Sets as the number of Nodes in the Graph increases, since you need at least log(n) sets.

I already wrote a transformation into MAX-SAT, to use a sat-solver for this. but the number of clauses is just to big. A more direct approach would be nice. So far I have an approximation, but I would like an exact solution or at least a better approximation.

I have tried an approach where I used a particle swarm to optimize from an arbitrary solution towards a better one. However the running time is pretty awful and the results are far from great. I am looking for a dynamic algorithm or something, however i cannot fathom how to divide and conquer this problem.

Solution

Not a complete answer, and I don't know how useful it will be to you. But here goes:

The hamming distance strikes me as a red herring. Your problem statement says it must be at least 1 but it could be 1000. It suffices to say the bit encoding for each node's set memberships is unique.

Your problem statement doesn't spell it out, but your solution above suggests every node must be a member of at least 1 set. ie. a bit encoding of all 0's is not allowed for any node's set memberships.

Ignoring connected nodes for a moment, disjoint nodes are easy: Simply number them sequentially with an unused bit encoding. Save those for last.

Your example above uses directed edges, but again, that strikes me as a red herring. If A cannot be in the same set as D because A=>D, D cannot be in the same set as A regardless whether D=>A.

You mention needing at least log(N) sets. You will also have at most N sets. A fully connected graph (with (N^2-N)/2 undirected edges) will require N sets each containing a single node.

In fact, if your graph contains a fully connected simplex of M dimensions (M in 1..N-1) with M+1 vertices and (M^2+M)/2 undirected edges, you will require at least M+1 sets.

In your example above, you have one such simplex (M=1) with 2 vertices {A,D} and 1 (undirected) edge {(A,D)}.

It would seem that your problem boils down to finding the largest fully connected simplexes in your graph. Stated differently, you have a routing problem: How many dimensions do you need to route your edges so none cross? It doesn't sound like a very scalable problem.

The first large simplex found is easy. Every vertex node gets a new set with its own bit.

The disjoint nodes are easy. Once the connected nodes are dealt with, simply number the disjoint nodes sequentially skipping any previously used bit patterns. From your example above, since A and D take 01 and 10, the next available bit pattern for B is 11.

The tricky part then becomes how to fold any remaining simplexes as much as possible into the existing range before creating any new sets with new bits. When folding, one must use 2 or more bits (sets) for each node, and the bits (sets) must not intersect with the bits (sets) for any adjacent node.

Consider what happens to your example above when one adds another node, C, to the example:

If C connects directly to both A and D, then the initial problem becomes finding the 2-simplex with 3 vertices {A,C,D} and 3 edges {(A,c),(A,D),(C,D)}. Once A, C and D take the bit patterns 001, 010 and 100, the lowest available bit pattern for disjoint B is 011.

If, on the other hand, C connects directly A or D but not both, the graph has two 1-simplexes. Supposing we find the 1-simplex with vertices {A,D} first giving them the bit patterns 01 and 10, the problem then becomes how to fold C into that range. The only bit pattern with at least 2 bits is 11, but that intersects with whichever node C connects to so we have to create a new set and put C in it. At this point, the solution is similar to the one above.

If C is disjoint, either B or C will get the bit pattern 11 and the remaining one will need a new set and get the bit pattern 100.

Suppose C connects to B but not to A or D. Again, the graph has two 1-simplexes but this time disjoint. Suppose {A,D} is found first as above giving A and D the bit patterns 10 and 01. We can fold B or C into the existing range. The only available bit pattern in the range is 11 and either B or C could get that pattern as neither is adjacent to A or D. Once 11 is used, no bit patterns with 2 or more bits set remain, and we will have to create a new set for the remaining node giving it the bit pattern 100.

Suppose C connects to all 3 A, B and D. In this case, the graph has a 2-simplex with 3 vertexes {A,C,D} and a 1-simplex with 2 vertexes {B, C}. Proceeding as above, after processing the largest simplex, A, C and D will have bit patterns 001, 010, 100. For folding B into this range, the available bit patterns with 2 or more bits set are: 011, 101, 110 and 111. All of these except 101 intersect with C so B would get the bit pattern 101.

The question then becomes: How efficiently can you find the largest fully-connected simplexes?

If finding the largest fully connected simplex is too expensive, one could put an approximate upper bound on potential fully connected simplexes by finding maximal minimums in terms of connections:

Sweep through the edges updating the vertices with a count of the connecting edges.
For each connected node, create an array of Cn counts initially zero where Cn is the count of edges connected to the node n.
Sweep through the edges again, for the connected nodes n1 and n2, increment the count in n1 corresponding to Cn2 and vice versa. If Cn2 > Cn1, update the last count in the n1 array and vice versa.
Sweep through the connected nodes again, calculating an upper bound on the largest simplex each node could be a part of. You could build a pigeon-hole array with a list of vertices for each upper bound as you sweep through the nodes.
Work through the pigeon-hole array from largest to smallest extracting and folding nodes into unique sets.

If your nodes are in a set N and your edges in a set E, the complexity will be: O(|N|+|E|+O(Step 5))

If the above approximation suffices, the question becomes: How efficiently can you fold nodes into existing ranges given the requirements?

OTHER TIPS

This maybe not the answer you might expect, but I can't find a place to add a comment. So I type it directly here. I can't fully understand your question. Or does it need specific knowledge to understand? What is this independent set? As I know a node in an independent set from a directed graph have a two way path to any other node in this set. Is your notion the same?

If this problem is like what I assume, independent sets can be found by this algorithm: 1. do depth-first search on the directed graph, records the time of tree rooted by this node is traversed. 2. then reverse all the edges in this graph 3. do depth-frist search again on the modified graph. The algorihtm is precisely explained by book "introduction to alogrithm"

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow