How does Pollard's rho algorithm work?
-
29-09-2020 - |
Question
I am trying to understand how does Pollard's rho
algorithm actually work, but I just can not wrap my head around it. I already read its section in the CLRS book and also on internet but still can not understand its structure or analysis.
This is a java implementation of the pseudocode from the CLRS book along with euclid
gcd
algorithm:
public static void pollardRho(int n) {
Random rand = new Random();
int i = 1;
int x0 = rand.nextInt(n);
int y = x0;
int k = 2;
while (true) {
i++;
int x = (x0 * x0 - 1) % n;
int d = gcd(y - x, n);
if (d != 1 && d != n) {
System.out.println(d);
}
if (i == k) {
y = x;
k *= 2;
}
x0 = x;
}
}
public static int gcd(int a, int b) {
// fixes the issue with java modulo operator % returning negative
// results based on the fact that gcd(a, b) = gcd(|a|, |b|)
if (a < 0) a = -a;
if (b < 0) b = -b;
while (b != 0) {
int tmp = b;
b = (a % b);
a = tmp;
}
return a;
}
- Why does it choose $x = (x_0^2 - 1) \mod n$?
- What does $y$ actually represent and why is it chosen to be equal to $\{x_1,x_2,x_4,x_8,x_{16},...\}$?
- Why does it compute $\text{GCD}(y-x,n)$ and how does $d$ turns out to be a factor of $n$?
- And why is the expected running time is $O(n^{1/4})$ arithmetic operations and $O(2^{\beta/4} \beta^2)$ bit operations assuming that $n$ is $\beta$ bits long?
I understand that if there exists a non-trivial square-root of $x^2 \equiv 1 \pmod{n}$ then $n$ is a composite and $x$ is a factor, but $y - x$ is not a square root of $n$ is it?
Solution
The idea behind Pollard $\rho$ is that if you take any function $f : [0, n - 1] \to [0, n - 1]$, the iteration $x_{k + 1} = f(x_k)$ must fall into a cycle eventually. Take now $f$ as a polynomial, and consider it modulo $n = p_1 p_2 \dotsm p_r$, where the $p_i$ are primes:
$\begin{equation*} x_{k + 1} = f(x_k) \bmod n = f(x_k) \bmod p_1 p_2 \dotsm p_r \end{equation*}$
Thus it repeats the same iteration structure modulo each of the primes into which $n$ factors.
We don't know anything about the cycles, but it is easy to see that if you go with $x_0 = x'_0$ and:
$\begin{align*} x_{k + 1} &= f(x_k) \\ x'_{k + 1} &= f(f(x'_k)) \end{align*}$
(i.e., $x'$ advances twice as fast) eventually $x'_k$ and $x_k$ will span one (or more) cycles (see Floyd's cycle detection algorithm for details), thus in our case, $x'_k \equiv x_k \mod{p_i}$, and $\gcd(x'_k, x_k)$ will be a factor of $n$, hopefully a non-trivial one.
Any polynomial works, but we want irreducible ones (no non-trivial factors, detecting those is not the point of the exercise). Linear polynomials don't give factors, next simplest to compute is a quadratic one, but just $x^2$ doesn't work either (reducible), so take $x^2 + 1$ for simplicity. Remember the idea here is to work with very large numbers, few and simple arithmetic operations are a distinct bonus. The analysis of the algorithm (e.g. in Knuth's "Seminumerical algorithms") models $f(x) \bmod p$ as a random function, which is close enough to explain the overall characteristics of the algorithm.