Question

Sorry for the lack of clarity in the question description.

I have an array of length $N$ composed of $K$ linear subarrays. We define a linear subarray as a contiguous subarray $[l,r]$ of the array where $A[i]-A[i-1] = C$, a constant, for all $l < i \le r$. (Note: $C$ may be different for different subarrays; the elements of the array are integers.) Please note that the linear subarrays are not disjoint: any pair of adjacent linear subarrays overlaps in exactly one element. For example, [1,3,5,4,3,2] has two linear subarrays: [1,3,5] and [5,4,3,2]. [1,3,5,1,2,3] would have three: [1,3,5], [5,1], [1,2,3].
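For concreteness, the decomposition into maximal linear subarrays (which the problem statement assumes is already given) can be computed in one pass; a minimal sketch in Python, where subarrays are returned as $(l, r)$ index pairs:

```python
def linear_subarrays(A):
    """Split A into maximal linear subarrays, returned as (l, r) index pairs.

    Adjacent subarrays share exactly one element, as in the question.
    Purely illustrative: the question assumes this decomposition is given.
    """
    n = len(A)
    if n <= 1:
        return [(0, n - 1)] if n else []
    pieces = []
    l = 0
    for i in range(2, n):
        # A new subarray starts where the common difference changes.
        if A[i] - A[i - 1] != A[i - 1] - A[i - 2]:
            pieces.append((l, i - 1))
            l = i - 1  # adjacent subarrays overlap in one element
    pieces.append((l, n - 1))
    return pieces

print(linear_subarrays([1, 3, 5, 4, 3, 2]))  # [(0, 2), (2, 5)]
print(linear_subarrays([1, 3, 5, 1, 2, 3]))  # [(0, 2), (2, 3), (3, 5)]
```

This matches the two examples above: [1,3,5]/[5,4,3,2] and [1,3,5]/[5,1]/[1,2,3].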

I wish to answer multiple queries of the form: given a value $V$, find the maximal element of the array that is less than $V$, in $o(K)$ and $o(N)$ time per query, with $o(K^2)$ and $o(N)$ preprocessing. (Assume that the array is already stored in terms of the $K$ linear subarrays, ordered, e.g. by the constant $C$, the length, and the starting and ending points of each subarray. So, one does not need to derive the linear subarrays in the first place.) Otherwise, a proof (formal or not) that it is not possible to do so would be appreciated.

Of course, a balanced binary search tree (BBST) or simply sorting achieves the purpose, but it requires $O(N \log N)$ preprocessing, which is too much. Checking the largest valid value within each subarray takes $O(K)$ per query, which is again too much. Would it be possible to somehow combine both of these, perhaps?
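For reference, the $O(K)$-per-query baseline can be sketched as follows. Within a single arithmetic progression, the largest element below $V$ is computable in $O(1)$, so a scan over all $K$ subarrays answers a query; the (first value, common difference, length) encoding of a subarray is an assumption of this sketch:

```python
def best_below(first, c, length, v):
    """Largest element < v in the arithmetic progression
    first, first + c, ..., first + (length - 1) * c, or None.

    The (first value, common difference, length) encoding of a
    linear subarray is an assumption for this sketch.
    """
    last = first + (length - 1) * c
    lo, hi = min(first, last), max(first, last)
    if lo >= v:
        return None  # every element is >= v
    if hi < v:
        return hi    # every element is < v
    # lo < v <= hi implies c != 0; step up from lo to just below v.
    steps = (v - 1 - lo) // abs(c)
    return lo + steps * abs(c)

def query(subarrays, v):
    """O(K) per query: check every linear subarray."""
    cands = (best_below(f, c, n, v) for f, c, n in subarrays)
    cands = [x for x in cands if x is not None]
    return max(cands) if cands else None

# [1, 3, 5, 4, 3, 2] as two overlapping linear subarrays:
arr = [(1, 2, 3), (5, -1, 4)]
print(query(arr, 5))  # 4
```

The open question is precisely whether this per-subarray scan can be avoided.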

Randomised algorithms are okay as long as they always return the correct answer and are fast enough in expectation, though deterministic algorithms are preferred.

Thanks for any and all responses. I was wondering whether there is any research on this question; it does not seem like too obscure a topic, but unfortunately my searching has not turned up anything.

EDIT: A method that seems useful.

Here was my line of thinking after I asked the question; I wonder if this would help somehow. It uses the idea of modulo as well. Initialise $V=0$, and store each linear subarray as a pair $(L,R)$, where $L$ is the minimal value of the subarray and $R$ is the maximal value. When we are given a query for $V$, we somehow exclude subarrays where $L>V$ or $R<V$ (perhaps by using multiple dimensions?). A supplementary data structure stores, for each remaining subarray $i$, the minimal possible difference to $V$, which is something like $(L - V) \bmod c[i]$. So essentially, we now need to be able to run a range add on this data structure, but if the value of any element becomes $<0$ or $\ge c[i]$ it needs to wrap around (e.g. an element becoming $-1$ with $c[i]=5$ would be reset to $4$; an element becoming $6$ with the same $c[i]$ would be reset to $1$); and also run range minimum queries.

If such a data structure can be made, the problem is solved. Range add and range minimum query can be easily done with a segment tree and lazy propagation, as can the exclusion of certain elements; the trouble is the modulo wrap-around.


Solution

It is not possible to guarantee finding the maximal value that is less than a queried value $V$ in $o(K)$ time per query with $o(N)$-time preprocessing.

This can be seen easily in the following extreme case. Let $A$ be any array of $N$ integers, which is composed of the following $K=N-1$ linear subarrays.

  • the subarray $A[0], A[1]$ with $C=A[1]-A[0]$.
  • the subarray $A[1], A[2]$ with $C=A[2]-A[1]$.
  • $\cdots$
  • the subarray $A[N-2], A[N-1]$ with $C=A[N-1]-A[N-2]$.

With $o(N)$ time preprocessing and $o(K)=o(N)$ time per query, when $N$ is large enough the algorithm cannot even read some of the numbers in $A$ (in fact, most of them). So, for some queried value, the algorithm cannot know whether a number just below it appears in $A$, and an adversary can choose the unread entries so that its answer is wrong.
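To make the extreme case concrete, here is a small Python sketch that encodes an arbitrary array in exactly this way, as $K=N-1$ overlapping length-2 linear subarrays; the (first value, common difference, length) triple encoding is an assumption for illustration:

```python
def as_length2_subarrays(A):
    """Encode A as K = N - 1 linear subarrays of length 2.

    Each subarray is a (first value, common difference C, length)
    triple; adjacent subarrays share one element of A.
    The triple encoding is an assumption for illustration.
    """
    return [(A[i], A[i + 1] - A[i], 2) for i in range(len(A) - 1)]

A = [7, 2, 9, 4, 4, 11]
pieces = as_length2_subarrays(A)
print(len(pieces))  # 5, i.e. K = N - 1
print(pieces[0])    # (7, -5, 2), encoding the subarray [7, 2]
```

In this encoding, $o(K)$ per query is the same as $o(N)$ per query, which is why the lower bound bites.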

(The explanation above could be made rigorous, for example, with a formal adversary argument in a well-defined model of computation.)


It looks like the more interesting question to ask is whether there is an algorithm with $o(N\log N)$ preprocessing and $o(K)$ per query, or whether there is an algorithm with $o(N)$ preprocessing and $o(K)$ per query given $K=o(N)$. That sounds like another question, anyway.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange