Question

I was recently faced with a HackerRank interview test where I couldn't solve the following problem correctly. I'd rather not name the exact problem for privacy reasons, but I can say that it doesn't show up anywhere on Google under this name.

Problem

We want to organize an event with as many performers as possible. We are given a list of possible performer arrival times and performance durations. Our function should return the maximum number of performers that can be selected without conflicts.

Example

N = 5
arrivals = [1, 3, 3, 5, 7] (the times at which the performers arrive)
durations = [2, 2, 1, 2, 1] (the time each performance takes)
solution = 4

So at time 1 we can accept the performer, who will finish at time 3. At time 3 we have two conflicting choices, and it doesn't matter which one we choose. Times 5 and 7 are not in conflict either.

My solution

  1. Made tuples of the arrivals and durations by zipping them. Sorted the tuples first by arrival time, then by duration.
  2. Outer for loop for i from start to finish. Initialize a new selection set in each iteration.
  3. Inner for loop for j from i to finish. Check whether j can be added to the set without a conflict. If the set is longer than the current maximum, save it.

Python source
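The original source isn't shown here, but a sketch of the approach described in the steps above (the function name is mine) might look like this. Note that this is the flawed O(n²) version that only passed 7/13 test cases:

```python
def max_performers_bruteforce(arrivals, durations):
    # Pair each arrival with its duration and sort by arrival, then duration.
    perfs = sorted(zip(arrivals, durations))
    best = 0
    for i in range(len(perfs)):
        # Start a fresh selection from performer i.
        chosen = []
        for j in range(i, len(perfs)):
            start, dur = perfs[j]
            # Accept j if it starts no earlier than the last chosen performance ends.
            if not chosen or start >= chosen[-1][0] + chosen[-1][1]:
                chosen.append((start, dur))
        best = max(best, len(chosen))
    return best

print(max_performers_bruteforce([1, 3, 3, 5, 7], [2, 2, 1, 2, 1]))  # 4
```

It fails, for example, on arrivals = [1, 2, 3, 5], durations = [1, 10, 1, 1]: starting from the first performer it greedily accepts the long (2, 10) performance and answers 2, while the optimum of 3 requires skipping a performance in the middle rather than a prefix.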

Questions

  1. Are there better ways or standard algorithms to solve this problem?
  2. I know there are some edge cases my algorithm doesn't handle. HackerRank evaluated my algorithm, and it only passed 7/13 test cases. What edge cases could I be missing?
  3. Is this a search problem or an optimization problem?

Solution

This is the classic Interval Scheduling Maximization problem, which can be solved in O(n log n) time with a simple greedy algorithm. The trick is to sort performances by end time, not start time.

Here is the description of the solution found on the Wikipedia page I linked:


Several algorithms, that may look promising at first sight, actually do not find the optimal solution:

  • Selecting the intervals that start earliest is not an optimal solution, because if the earliest interval happens to be very long, accepting it would make us reject many other shorter requests.

  • Selecting the shortest intervals or selecting intervals with the fewest conflicts is also not optimal.

Greedy polynomial solution

The following greedy algorithm does find the optimal solution:

Select the interval, x, with the earliest finishing time.
Remove x, and all intervals intersecting x, from the set of candidate intervals.
Repeat until the set of candidate intervals is empty.

Whenever we select an interval at step 1, we may have to remove many intervals in step 2. However, all these intervals necessarily cross the finishing time of x, and thus they all cross each other. Hence, at most 1 of these intervals can be in the optimal solution. Hence, for every interval in the optimal solution, there is an interval in the greedy solution. This proves that the greedy algorithm indeed finds an optimal solution.

A more formal explanation is given by a Charging argument.

The greedy algorithm can be executed in time O(n log n), where n is the number of tasks, using a preprocessing step in which the tasks are sorted by their finishing times.
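The quoted greedy algorithm can be sketched in a few lines of Python (the function name is mine), applied to the original problem by converting each performer into an interval [arrival, arrival + duration]:

```python
def max_performers_greedy(arrivals, durations):
    # Build (end, start) pairs so that sorting orders intervals by finishing time.
    intervals = sorted((a + d, a) for a, d in zip(arrivals, durations))
    count = 0
    last_end = float("-inf")
    for end, start in intervals:
        # Accept the interval if it starts no earlier than the last accepted end.
        if start >= last_end:
            count += 1
            last_end = end
    return count

print(max_performers_greedy([1, 3, 3, 5, 7], [2, 2, 1, 2, 1]))  # 4
```

Removing the conflicting intervals from the candidate set (step 2 of the quoted algorithm) is implicit here: any interval crossing the last accepted finishing time simply fails the `start >= last_end` check and is skipped.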


OTHER TIPS

Look at maximum interval scheduling.

Assign an interval $(s_i, t_i)$ to each performer, for $i=1,\dots,N$, meaning that the performance starts at time $s_i$ and has a duration of $t_i - s_i$.

Assume for now that all the intervals are sorted by their end time. Just for simplicity assume also that all times are distinct (this assumption is unnecessary). Your problem can then be solved in $O(N)$ time.

Consider $(s_1, t_1)$ and let $I$ be the set of intervals that intersect it (excluding $(s_1, t_1)$ itself).

It is easy to see that all the intervals in $I$ must start before $t_1$ (as otherwise they wouldn't intersect $(s_1, t_1)$) and end after $t_1$ (otherwise $t_1$ wouldn't be the minimum end time). This means that all intervals in $I \cup \{ (s_1, t_1) \}$ intersect with each other, and hence any solution can contain at most one of them.

Let $S$ be a solution (i.e., a subset of pairwise non-intersecting intervals). It is always possible to convert $S$ into a new solution $S'$ that contains $(s_1, t_1)$ and such that $|S'| \ge |S|$.

If $S \cap I = \emptyset$ then we are trivially done since we can pick $S' = S \cup \{ (s_1, t_1) \}$ (notice that this also includes the case in which $(s_1, t_1) \in S$).

If $S \cap I \neq \emptyset$, let $(s_i, t_i)$ be the unique interval in $S \cap I$. Any interval $(s_j, t_j) \in S \setminus \{ (s_i, t_i) \}$ is disjoint from $(s_i, t_i)$ and, by minimality of $t_1$, satisfies $t_j > t_1$; hence it must satisfy either $s_j > t_i > t_1$ or $s_i > t_j > t_1$. The latter case is actually impossible, since $(s_i, t_i) \in I$ implies $s_i < t_1$. In other words, if we replace $(s_i, t_i)$ with $(s_1, t_1)$ we still obtain a feasible solution. Therefore we pick $S' = S \cup \{(s_1, t_1)\} \setminus \{ (s_i, t_i) \}$.

The bottom line is that there is always an optimal solution that contains $(s_1, t_1)$. Therefore you can always add $(s_1, t_1)$ to your solution, delete all intervals in $I \cup \{ (s_1, t_1) \}$, and look for the optimal solution among the remaining intervals. This amounts to running the greedy algorithm that considers the intervals in order of end time and adds every interval that fits.

From your instance it seems that the starting times are already sorted. In this case you can use the trick of "reversing time", i.e., swapping and negating the starting and ending times by changing $(s_i, t_i)$ into $(-t_i, -s_i)$. This allows you to obtain an $O(N)$-time algorithm by skipping the initial sorting step, which would otherwise require $\Omega(N \log N)$ time.
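Assuming the arrivals are already sorted in ascending order, the reversed-time trick might be sketched as follows (names are mine). Traversing the list backwards yields the reversed intervals $(-t_i, -s_i)$ already sorted by their new end times $-s_i$, so the greedy scan needs no sort:

```python
def max_performers_sorted_arrivals(arrivals, durations):
    # Assumes arrivals are sorted ascending. Reverse time: (s, t) -> (-t, -s).
    # Walking the original list backwards visits the reversed intervals
    # in order of increasing end time (-s), so no sorting is needed: O(N).
    count = 0
    last_end = float("-inf")
    for s, d in zip(reversed(arrivals), reversed(durations)):
        start, end = -(s + d), -s  # the reversed-time interval
        if start >= last_end:
            count += 1
            last_end = end
    return count

print(max_performers_sorted_arrivals([1, 3, 3, 5, 7], [2, 2, 1, 2, 1]))  # 4
```

In reversed time, picking the earliest finishing interval corresponds to picking the latest starting interval in original time, which is optimal by a symmetric version of the argument above.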

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange