Find pairs in an array such that a%b = k , where k is a given integer

https://stackoverflow.com/questions/12732939

05-07-2021
|

Question

Here is an interesting programming puzzle I came across . Given an array of positive integers, and a number K. We need to find pairs(a,b) from the array such that a % b = K.

I have a naive O(n^2) solution to this where we can check for all pairs such that a%b=k. Works but inefficient. We can certainly do better than this can't we ? Any efficient algorithms for the same? Oh and it's NOT homework.

La solution

Sort your array and binary search or keep a hash table with the count of each value in your array.

For a number x, we can find the largest y such that x mod y = K as y = x - K. Binary search for this y or look it up in your hash and increment your count accordingly.

Now, this isn't necessarily the only value that will work. For example, 8 mod 6 = 8 mod 3 = 2. We have:

x mod y = K => x = q*y + K =>
            => x = q(x - K) + K =>
            => x = 1(x - K) + K =>
            => x = 2(x - K)/2 + K =>
            => ...

This means you will have to test all divisors of y as well. You can find the divisors in O(sqrt y), giving you a total complexity of O(n log n sqrt(max_value)) if using binary search and O(n sqrt(max_value)) with a hash table (recommended especially if your numbers aren't very large).

Autres conseils

Treat the problem as having two separate arrays as input: one for the a numbers and a % b = K and one for the b numbers. I am going to assume that everything is >= 0.

First of all, you can discard any b <= K.

Now think of every number in b as generating a sequence K, K + b, K + 2b, K + 3b... You can record this using a pair of numbers (pos, b), where pos is incremented by b at each stage. Start with pos = 0.

Hold these sequences in a priority queue, so you can find the smallest pos value at any given time. Sort the array of a numbers - in fact you could do this ahead of time and discard any duplicates.

For each a number
  While the smallest pos in the priority queue is <= a
    Add the smallest multiple of b to it to make it >= a
    If it is == a, you have a match
    Update the stored value of pos for that sequence, re-ordering the priority queue

At worst, you end up comparing every number with every other number, which is the same as the simple solution, but with priority queue and sorting overhead. However, large values of b may remain unexamined in the priority queue while several a numbers pass through, in which case this does better - and if there are a lot of numbers to process and they are all different, some of them must be large.

This answer mentions the main points of an algorithm (called DL because it uses “divisor lists” ) and gives details via a program, called amodb.py.

Let B be the input array, containing N positive integers. Without much loss of generality, suppose B[i] > K for all i and that B is in ascending order. (Note that x%B[i] < K if B[i] < K; and where B[i] = K, one can report pairs (B[i], B[j]) for all j>i. If B is not sorted initially, charge a cost of O(N log N) to sort it.)

In algorithm DL and program amodb.py, A is an array with K pre-subtracted from the input array elements. Ie, A[i] = B[i] - K. Note that if a%b == K, then for some j we have a = b*j + K or a-K = b*j. That is, a%b == K iff a-K is a multiple of b. Moreover, if a-K = b*j and p is any factor of b, then p is a factor of a-K.

Let the prime numbers from 2 to 97 be called “small factors”. When N numbers are uniformly randomly selected from some interval [X,Y], on the order of N/ln(Y) of the numbers will have no small factors; a similar number will have a greatest small factor of 2; and declining proportions will have successively larger greatest small factors. For example, on the average about N/97 will be divisible by 97, about N/89-N/(89*97) by 89 but not 97, etc. Generally, when members of B are random, lists of members with certain greatest small factors or with no small factors are sub-O(N/ln(Y)) in length.

Given a list Bd containing members of B divisible by largest small factor p, DL tests each element of Bd against elements of list Ad, those elements of A divisible by p. But given a list Bp for elements of B without small factors, DL tests each of Bp's elements against all elements of A. Example: If N=25, p=13, Bd=[18967, 23231], and Ad=[12779, 162383], then DL tests if any of 12779%18967, 162383%18967, 12779%23231, 162383%23231 are zero. Note that it is possible to cut the number of tests in half in this example (and many others) by noticing 12779<18967, but amodb.py does not include that optimization.

DL makes J different lists for J different factors; in one version of amodb.py, J=25 and the factor set is primes less than 100. A larger value of J would increase the O(N*J) time to initialize divisor lists, but would slightly decrease the O(N*len(Bp)) time to process list Bp against elements of A. See results below. Time to process other lists is O((N/logY)*(N/logY)*J), which is in sharp contrast to the O(n*sqrt(Y)) complexity for a previous answer's method.

Shown next is output from two program runs. In each set, the first Found line is from a naïve O(N*N) test, and the second is from DL. (Note, both DL and the naïve method would run faster if too-small A values were progressively removed.) The time ratio in the last line of the first test shows a disappointingly low speedup ratio of 3.9 for DL vs naïve method. For that run, factors included only the 25 primes less than 100. For the second run, with better speedup of ~ 4.4, factors included numbers 2 through 13 and primes up to 100.

  $ python amodb.py 
  N:  10000   K:  59685 X:  100000   Y:  1000000
  Found 208  matches in 21.854 seconds
  Found 208  matches in 5.598 seconds
  21.854 / 5.598 = 3.904

  $ python amodb.py
  N:  10000   K:  97881 X:  100000   Y:  1000000
  Found 207  matches in 21.234 seconds
  Found 207  matches in 4.851 seconds
  21.234 / 4.851 = 4.377

Program amodb.py:

import random, time
factors = [2,3,4,5,6,7,8,9,10,11,12,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
X, N = 100000, 10000
Y, K = 10*X, random.randint(X/2,X)
print "N: ", N, "  K: ", K, "X: ", X, "  Y: ", Y
B = sorted([random.randint(X,Y) for i in range(N)])
NP = len(factors);  NP1 = NP+1
A, Az, Bz = [], [[] for i in range(NP1)], [[] for i in range(NP1)]
t0 = time.time()
for b in B:
    a, aj, bj = b-K, -1, -1
    A.append(a)                 # Add a to A 
    for j,p in enumerate(factors):
        if a % p == 0:
            aj = j
            Az[aj].append(a)
        if b % p == 0:
            bj = j
    Bz[bj].append(b)
Bp = Bz.pop()   # Get not-factored B-values list into Bp
di = time.time() - t0; t0 = time.time()
c = 0
for a in A:
    for b in B:
        if a%b == 0:
            c += 1
dq =  round(time.time() - t0, 3); t0 = time.time()
c=0
for i,Bd in enumerate(Bz):
    Ad = Az[i]
    for b in Bd:
        for ak in Ad:
            if ak % b == 0:
                c += 1
for b in Bp:
    for ak in A:
        if ak % b == 0:
            c += 1
dr = round(di + time.time() - t0, 3)
print "Found", c, " matches in", dq, "seconds"
print "Found", c, " matches in", dr, "seconds"
print dq, "/", dr, "=", round(dq/dr, 3)

Licencié sous: CC-BY-SA avec attribution

Non affilié à StackOverflow