How does Pollard's rho algorithm work?

https://cs.stackexchange.com/questions/121205

29-09-2020
|

Question

I am trying to understand how does Pollard's rho algorithm actually work, but I just can not wrap my head around it. I already read its section in the CLRS book and also on internet but still can not understand its structure or analysis. This is a java implementation of the pseudocode from the CLRS book along with euclid gcd algorithm:

public static void pollardRho(int n) {
    Random rand = new Random();
    int i = 1;
    int x0 = rand.nextInt(n);
    int y = x0;
    int k = 2;
    while (true) {
        i++;
        int x = (x0 * x0 - 1) % n;
        int d = gcd(y - x, n);
        if (d != 1 && d != n) {
            System.out.println(d);
        }
        if (i == k) {
            y = x;
            k *= 2;
        }
        x0 = x;
    }
}

public static int gcd(int a, int b) {
    // fixes the issue with java modulo operator % returning negative 
    // results based on the fact that gcd(a, b) = gcd(|a|, |b|)
    if (a < 0) a = -a;
    if (b < 0) b = -b;

    while (b != 0) {
        int tmp = b;
        b = (a % b);
        a = tmp;
    }
    return a;
}

Why does it choose $x = (x_0^2 - 1) \mod n$?
What does $y$ actually represent and why is it chosen to be equal to $\{x_1,x_2,x_4,x_8,x_{16},...\}$?
Why does it compute $\text{GCD}(y-x,n)$ and how does $d$ turns out to be a factor of $n$?
And why is the expected running time is $O(n^{1/4})$ arithmetic operations and $O(2^{\beta/4} \beta^2)$ bit operations assuming that $n$ is $\beta$ bits long?

I understand that if there exists a non-trivial square-root of $x^2 \equiv 1 \pmod{n}$ then $n$ is a composite and $x$ is a factor, but $y - x$ is not a square root of $n$ is it?

Solution

The idea behind Pollard $\rho$ is that if you take any function $f : [0, n - 1] \to [0, n - 1]$, the iteration $x_{k + 1} = f(x_k)$ must fall into a cycle eventually. Take now $f$ as a polynomial, and consider it modulo $n = p_1 p_2 \dotsm p_r$, where the $p_i$ are primes:

$\begin{equation*} x_{k + 1} = f(x_k) \bmod n = f(x_k) \bmod p_1 p_2 \dotsm p_r \end{equation*}$

Thus it repeats the same iteration structure modulo each of the primes into which $n$ factors.

We don't know anything about the cycles, but it is easy to see that if you go with $x_0 = x'_0$ and:

$\begin{align*} x_{k + 1} &= f(x_k) \\ x'_{k + 1} &= f(f(x'_k)) \end{align*}$

(i.e., $x'$ advances twice as fast) eventually $x'_k$ and $x_k$ will span one (or more) cycles (see Floyd's cycle detection algorithm for details), thus in our case, $x'_k \equiv x_k \mod{p_i}$, and $\gcd(x'_k, x_k)$ will be a factor of $n$, hopefully a non-trivial one.

Any polynomial works, but we want irreducible ones (no non-trivial factors, detecting those is not the point of the exercise). Linear polynomials don't give factors, next simplest to compute is a quadratic one, but just $x^2$ doesn't work either (reducible), so take $x^2 + 1$ for simplicity. Remember the idea here is to work with very large numbers, few and simple arithmetic operations are a distinct bonus. The analysis of the algorithm (e.g. in Knuth's "Seminumerical algorithms") models $f(x) \bmod p$ as a random function, which is close enough to explain the overall characteristics of the algorithm.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange