minimum set of binary vectors to get full coverage

https://stackoverflow.com/questions/19961303

30-07-2022
|

質問

I need to find an efficient algorithm that finds an optimal set of binary vectors in such a way that every index has a bit that is set in at least one vector in the set.

Amusing motivation: I want to break into a castle and steal its treasure. In order to do it, i have to unlock 4 doors: the blue door, the red one, the yellow and the green. There are 3 dwarfs and each one of them have a different set of keys. dwarf 1 has: blue key & red key. dwarf 2 has: red key & yellow key. dwarf 3 has: blue key & green key. I want to kill minimum amount of dwarfs in order to get into the castle. so in this case, it's obvious that we should kill only dwarf 2 and dwarf 3 to get all the keys and that there is no need to kill dwarf 1.

In general, we have n binary vectors of size m (n dwarfs and m doors): a0,a1....a(n-1). We need a set of vectors such that we have at least one key for every index in the vector. If a1[5]=1, then we know that the 6th bit in the 2nd vector is set. meaning dwarf #2 has key #6. The example in the begining is actually like that: a0 (1,1,0,0) a1 (0,1,1,0) a2 (1,0,0,1) so we choose vectors: a1,a2 to get full coverage with minimum vectors.

The brute force naive algorithm is to try every option and then return the solution with minimum amount of vectors, but since there are n vectors, there are 2^n options.

Next i thought of a greedy algorithm that takes each time the vector with most keys. But here is a counter example that proves this method wrong:

a0 (1,1,1,0,0,0) --greedy and wrong choise--
a1 (1,0,0,1,0,0)
a2 (0,1,0,0,1,0)
a3 (0,0,1,0,0,1)

with this algorithm, we will choose all 4 vectors, but the optimal solution has only 3 vectors {a1,a2,a3}.

Now, i thought of another greedy algorithm, we choose the vector with the most rare key (and in case of a tie, we search the next rare key and so on), but then again i found a counter example:

a0 (1,1,1,0,0,0) --greedy and wrong choise--
a1 (1,0,0,1,0,0)
a2 (0,1,0,0,1,0)
a3 (0,0,1,0,0,1)
a4 (0,0,0,1,0,0)
a5 (0,0,0,0,1,0)
a6 (0,0,0,0,0,1)

in this example all the indexes have the same "rarity" (2), so we end up choosing a0 although a sulotion with a0 (such as {a0,a1,a2,a3}) has at least 4 vectors and the optimal solution has only 3 {a1,a2,a3}.

I think that maybe if i would eliminate vectors that are subsets of another vector (for instance a6 is a subset of a3) then this algorithm may work, but even so, checking this would cost n!.

I hope you could help me find an efficient algorithm or help me prove that such algorithm is not existed.

解決

This problem is called Set Cover Problem and it is NP-Hard.

You may find this short video helpful: discrete optimization (you have to be logged into your coursera account). It comes from great course on Discrete Optimization -- the archive of the class is still opened.

(Your reasoning about pruning space-search is quite sound: you can just immediately remove all vectors that are dominated by another vector and you have to take a vector if no other vector has the same key.)

ライセンス： CC-BY-SA と帰属

所属していません StackOverflow