質問

I am trying to prove the hardness of the following problem. This problem is from google hashcode, qualification-round, 2020.

Hier is a brief description of the problem. Given a list or libraries and a list of books. Each book $b$ has a number of points $P_b$. We want to scan a subset of the books in a way that maximizes the total points of the books we scan in $n$ days. We have to choose the books from the libraries. Each library $l$ has a list of books $B_l$, the number of books we can scan per day in this library $m_l$ and the number of days needed to register the library (do the paper work) $d_l$. We can scan books in parallel from many libraries at once but the registration should happen sequentially and we cannot register two libraries at the same time. Moreover, the same book can be fond at many libraries and we have to decide what books should be chosen from which libraries and in which order. We can scan as many books per day as we want (respecting the constraints provided by libraries).

Here is an example: We have two libraries and three books. Assume we have 3 days in total. The books have 40, 40, 40 and 30 points respectively. The first library has the first 3 books. The second library has the first and the forth book. Assume each needs one day to register and can provide one book per day. We can choose the first library at first. In the first day we register this library. In the second day we can scan the second book while registering the second library. in the third day we can scan the third book in the first library and the first book in the second library. This totals in the first three books and a total of 120 points. Note that if we scan the first book in the first library or we start by the second library we would have had to scan the 30-point book instead a 40-point one.


Here is what I have tried so far. On one hand, having the correct order of libraries in hand, we can solve the problem in polynomial time, since we can calculate how many books each library can achieve in total (total number of days minus prefix sums of library-registration-time for libraries we have chosen so far all multiplied by the number of books this library can provide per day). Using these values we can turn the problem into a weighted bipartite b-matching problem where each library is connected to the books it has with edge-weights equal to the points of each book. Each library has capacity the total number of books it can provide. This problem can be reduced to maximum-weight bipartite matching and solved in polynomial time. However, finding a correct order of libraries seems to me as a hard scheduling problem.

On the other hand, by assuming $B_l$ are pairwise disjoint over all libraries $l$, which means each library has its "own" set of books, and assuming each library can provide exactly one book per day, and each book has exactly one point, we turn the problem into a scheduling problem of determining an order of the libraries where each library has registration time and number of points such that it can give one point per day after registration and registrations happen sequentially. This problem seems similar to knapsack problem since we can solve it using dynamic programming over the number of the given days where $d_{i, j, k}$ is the most number of points we can achieve from the $i$th to the $j$th library in $k$ days where all libraries in the range are registered on the $k$th days and we are free to register a new library. The problem seems similar to the knap-sack probkem. However, I can't find a proper reduction since the number of points we get depends heavily on the the order and that in this special case the number of days is polynomial in the size of the input and hence, psudo polynomial algorithm is a polynomial time algorithm.


Do you think this problem is NP-hard? I appreciate all hints, thoughts and ideas about hardness and/or polynomial solutions :)

Disclaimer. The question is just out of curiosity. The competition is already over and an $O(\sqrt n m)$ matching algorithm is already too slow for the suggested bounds on a typical computer so I am just asking from a theoretical point of view.

役に立ちましたか?

解決

The problem is NP-hard, at least for a particular simplified configuration.

Assume that each $m_l$ is effectively infinite - we can scan a particular libraries' books all on the day we get access. Let all $d_l = 1$ - each library takes one day to get access to, meaning we get access to exactly $n$ libraries. Now if $P_b = 1$ the thing that maximizes our score is the number of unique books we scan. The order in which we choose to get access to libraries does not matter, but our selection of which does.

Now if there was an efficient algorithm to solve this problem you could solve the maximum coverage problem efficiently as well, meaning your problem is NP-hard.

ライセンス: CC-BY-SA帰属
所属していません cs.stackexchange
scroll top