Find all intervals [s,e] where s,e in N_0 and s < e that overlap with another given interval

https://stackoverflow.com/questions/18804045

28-06-2022
|

Question

I have an algorithmic problem where I would like to see if it can be solved in better than O(n):

I have given a table T of n elements where each element is a tuple (s_i, e_i) with s_i, e_i in N and s_i < e_i, i.e. each tuple is some kind of an interval. I have to find all intervals that overlap with a given interval [t0, t1] with t0, t1 in N and t0 < t1. Further, I have available two sorted lists S and E, containing the s values, or e values respectively, together with the index i pointing to the respective entry in T. The lists are sorted by s values, or e values respectively. (Let's assume both, s and e values, are unique.)

Problem:

We have to find each interval/tuple (s_i, e_i) in T where s_i <= t1 and e_i >= t0.

My thoughts so far:

We can exclude some elements by either applying one of the interval boundaries, i.e. searching t1 in S or t0 in E. This gives us a list L of remaining elements:

L <- {e in E | e >= t0} or L <- {s in S | s <= t1}

However, there is no lower bound on the number of elements in L, no matter which search we perform. Further, we have to check every element in L if s <= t1, or e >= t0 respectively depending on which search we performed before.

The complexity for this solution is O(n).

However, let's say that k is the maximum number of elements overlapping with interval [t0, t1]. If we assume k << n, then the complexity is O(n/2) since we can exclude at least n/2 elements by choosing the appropriate search for L. Still O(n/2) is in O(n).

Can you think of a better approach to solve this problem?

(Please feel free to improve this question. Maybe it's not very clear. Thanks)

Solution 2

The solution can be found at cs.stackexchange.com: The general problem cannot be solved in less than O(n). Interval trees provide O(log n) + O(k) complexity for searching intervals that overlap with a certain other interval, given that k is the number of overlapping intervals.

OTHER TIPS

If you don't know anything about k, the number of intervals in the answer, you can't beat O(N), as the result can have N intervals in it.

If you know k is a lot smaller than N, you may do something better. Using a binary search, you can find the last i0 that s_i<t0 and the first i1 that s_i1>t1.

Then you find the last j0 that e_j0<t0 and the first j1 that e_j1>t1.

The results are between max(i0,j0) and min(i1, j1). So you have O(logN) + O(k).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow