Question

I have the following problem: I have a sorted sequence of $N$ integers (assume they are monotonically increasing). I want to check whether there is any subsequence of length $\ge N/4$ such that consecutive elements of the subsequence all differ by the same value.

For example, in the sequence [3,4,5,8,12] there are two such subsequences: [3,4,5] (the difference is 1) and [4,8,12] (the difference is 4). Thus, the length of the longest such subsequence is 3 for this example. Since $3 \ge 5/4$, the answer is yes: there is a subsequence of length $\ge N/4$ with the desired property.
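
To make the definition concrete, here is a minimal brute-force sketch in Python. It is only a reference check for small inputs, not an efficient solution, and it assumes the sequence is strictly increasing (no repeated values). The function name `longest_progression` is made up for this illustration.

```python
def longest_progression(seq):
    """Length of the longest subsequence of seq with a constant difference.

    Assumes seq is sorted and strictly increasing. Quadratic in the number
    of pairs, so only suitable for small inputs.
    """
    values = set(seq)
    n = len(seq)
    best = min(n, 2)  # any 0, 1 or 2 elements trivially qualify
    for i in range(n):
        for j in range(i + 1, n):
            d = seq[j] - seq[i]
            if seq[i] - d in values:
                continue  # seq[i] is not the first element of this chain
            # Walk the chain seq[i], seq[j], seq[j] + d, ... while it stays in the set.
            length, x = 2, seq[j] + d
            while x in values:
                length += 1
                x += d
            best = max(best, length)
    return best
```

For the example sequence this reports 3, and the check `4 * longest_progression(seq) >= len(seq)` then corresponds to the "length $\ge N/4$" condition.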

In my real-life situation, the sequence is of length $N\approx 10^6$, and the elements are all 9-digit numbers. Is there an efficient algorithm to solve this problem?


My naive approach was to build the matrix of absolute differences between all pairs of numbers:

$$ \left( \begin{array}{ccccc} 0 & 1 & 2 & 5 & 9 \\ 1 & 0 & 1 & 4 & 8 \\ 2 & 1 & 0 & 3 & 7 \\ 5 & 4 & 3 & 0 & 4 \\ 9 & 8 & 7 & 4 & 0 \end{array} \right) $$

Then I focus on the top-right part and count the number of occurrences of each difference:

$$ \begin{array}{l} |\text{diff-by-1}| = 2 \Rightarrow \text{3 numbers differ by 1} \\ |\text{diff-by-4}| = 2 \Rightarrow \text{3 numbers differ by 4} \end{array} $$

This is very simple and very inefficient. It requires a lot of comparisons and does not scale at all: its running time is $\Theta(N^2)$. In my real-life scenario the sequence is about $10^6$ elements long, so this is too slow.
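
The counting step described above can be sketched as follows. This is an illustrative Python version of the naive idea, with a hypothetical helper name `difference_counts`. It also exposes a caveat: a difference that occurs $c$ times yields a progression of $c+1$ elements only when the pairs chain end to end, so the counts are a hint rather than a proof.

```python
from collections import Counter
from itertools import combinations


def difference_counts(seq):
    """Count how often each difference occurs over all pairs of a sorted sequence.

    This visits every entry above the diagonal of the difference matrix,
    so it takes Theta(N^2) time.
    """
    # combinations() yields pairs in input order, so b - a is positive
    # for a strictly increasing sequence.
    return Counter(b - a for a, b in combinations(seq, 2))


counts = difference_counts([3, 4, 5, 8, 12])
# counts[1] and counts[4] are both 2, matching the matrix above.

# Caveat: in [1, 2, 5, 6] the difference 1 also occurs twice, (1, 2) and (5, 6),
# yet no three elements of that sequence form a progression.
caveat = difference_counts([1, 2, 5, 6])
```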

To give you a wider picture, as maybe there is a much better (probabilistic) approach to this problem: after the longest subsequence is found, I want to compute a simple ratio:

$$ r:=\frac{\text{longest subsequence length}}{\text{sequence length}} $$

and if $r$ is greater than some fixed value I want to raise an alarm (or do whatever I have to do ;-)).

Thanks for any help, references, pointers, etc.

BTW: here are things that I was/am looking at:

Update: I thought a little more about it and started from the end: instead of computing all differences between numbers (the top-right corner of the matrix), I can derive a small value $k$ from the "fixed value" I mentioned at the end of the original question. For instance, if I am going to raise the alarm when 25% of all numbers are in some sequence, I only need to focus on small "triangles" in the matrix, and the number of computations required is much smaller. When I add some sampling, it should be simple enough to implement at scale.
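
One way to make the "small triangles" idea precise is sketched below in Python. This is my own illustration under stated assumptions, not necessarily the algorithm referred to elsewhere in this post. The reasoning: if a progression of `need` elements exists, its elements sit at increasing positions in the sorted sequence, and the `need - 1` position gaps between neighbours sum to at most $N-1$. More than half of those gaps must therefore be at most $k = \lfloor 2(N-1)/(\text{need}-1) \rfloor$, so the true difference shows up frequently among pairs that are at most $k$ positions apart. The names `has_progression_of_length`, `longest_chain` and `need` are hypothetical, and the sequence is assumed strictly increasing.

```python
from collections import Counter


def longest_chain(seq, d):
    """Longest run x, x + d, x + 2d, ... contained in the sorted distinct seq."""
    length = {}
    best = 0
    for x in seq:  # ascending order, so x - d has already been processed
        length[x] = length.get(x - d, 0) + 1
        best = max(best, length[x])
    return best


def has_progression_of_length(seq, need):
    """True if seq contains a constant-difference subsequence of >= need elements."""
    n = len(seq)
    if need > n:
        return False
    if need <= 2:
        return True  # any one or two elements qualify
    # Width of the band next to the diagonal of the difference matrix.
    k = (2 * (n - 1)) // (need - 1)
    counts = Counter()
    for i in range(n):
        for j in range(i + 1, min(i + k, n - 1) + 1):
            counts[seq[j] - seq[i]] += 1
    for d, c in counts.items():
        # The true difference occurs more than (need - 1) / 2 times in the band;
        # rarer candidates are skipped, the rest are verified exactly.
        if 2 * c >= need - 1 and longest_chain(seq, d) >= need:
            return True
    return False
```

With $N \approx 10^6$ and `need` equal to $N/4$, $k$ is about 8, so roughly $8 \cdot 10^6$ pairs are examined instead of about $5 \cdot 10^{11}$. Each surviving candidate difference still costs one $O(N)$ pass to verify, and at most about $2Nk/(\text{need}-1)$ candidates can survive the frequency filter.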

Update 2: I implemented @D.W.'s algorithm; a sample run is below:

    11:51:06 ~$ time nodejs progression.js 
    L: 694000000,694000002,694000006,694000007,694000009,694000010,
        694000013,694000015,694000018,694000019,694000021,694000022,694000023,
    694000026,694000028,694000030,694000034,694000036,694000038,694000040,
    694000043,694000045,694000046,694000048,694000051,694000053,694000055,
    694000057,694000060,694000061,694000063,694000067,694000069,694000072,
    694000074,694000076,694000077,694000079,694000080,694000082,694000083,
    694000084,694000086,694000090,694000091,694000093,694000095,694000099,
    694000102,694000103,694000105,694000108,694000109,694000113,694000116,
    694000118,694000122,694000125,694000128,694000131,694000134,694000137,
    694000141,694000143,694000145,694000148,694000152,694000153,694000154,
    694000157,694000160,694000162,694000163,694000166,694000170,694000173,
    694000174,694000177,694000179,694000180,694000181,694000184,694000185,
    694000187,694000189,694000193,694000194,694000198,694000200,694000203,
    694000207,694000211,694000215,694000219,694000222,694000226,694000228,
    694000232,694000235,694000236
    N: 100
    P: 0.1
    L: 10 (min)
    D: 26 (max)
    [ 9, 18, 27, 36, 45, 54, 63, 72, 81, 90 ]
    Found progression of 10 elements, difference: 16 starts: 694000045, ends: 694000189.

    real    0m0.065s
    user    0m0.052s
    sys 0m0.004s
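
A note on reading the log: a progression of $L$ elements with common difference $d$ spans $(L-1)\,d$, so $d$ cannot exceed $(\max - \min)/(L-1)$. The short Python check below, with a made-up helper name `max_common_difference`, shows that this bound appears consistent with the `D: 26 (max)` line for the run above, and that the reported start, end and difference agree with each other. It is an interpretation of the output, since the program's source is not shown.

```python
def max_common_difference(lo, hi, length):
    """Largest difference d for which `length` terms fit between lo and hi."""
    return (hi - lo) // (length - 1)


# Values copied from the sample run: first and last list elements, minimum length 10.
bound = max_common_difference(694000000, 694000236, 10)

# Reported progression: 10 elements, difference 16, from 694000045 to 694000189.
start, end, d, length = 694000045, 694000189, 16, 10
consistent = start + (length - 1) * d == end and d <= bound
```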

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange