Question

Some time ago I was pretty interested in GAs and I studied them quite a bit. I used the C++ GAlib library to write some programs and I was quite amazed by their ability to solve otherwise difficult-to-compute problems in a matter of seconds. They seemed like a great brute-forcing technique that works really smartly and adapts.

I was reading a book by Michalewicz, if I remember the name correctly, and it all seemed to be based on the Schema Theorem proved by Holland.

I've also heard that they cannot really be used to approach problems like factoring RSA private keys.

Could anybody explain why this is the case?

Solution

Genetic algorithms are not smart at all; they are very greedy optimizer algorithms. They all work around the same idea: you have a group of points ('a population of individuals'), and you transform that group into another one with stochastic operators, with a bias in the direction of best improvement ('mutation + crossover + selection'). Repeat until it converges or you are tired of it; nothing smart there.
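
To make that loop concrete, here is a minimal, generic sketch (in Python for brevity, nothing to do with GAlib) applying selection, crossover and mutation to the toy "OneMax" problem of maximising the number of 1-bits in a string; all names and parameter values are made up for illustration.

import random

# A bare-bones GA for the toy "OneMax" problem: maximise the number of 1-bits.
LENGTH, POP, GENS, MUT = 32, 40, 60, 0.02

def fitness(bits):
    return sum(bits)

def crossover(a, b):
    cut = random.randrange(1, LENGTH)          # one-point crossover
    return a[:cut] + b[cut:]

def mutate(bits):
    return [1 - b if random.random() < MUT else b for b in bits]

population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP // 2]            # selection: bias toward the best
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    population = parents + children            # next generation; repeat until done

print(max(fitness(ind) for ind in population), "out of", LENGTH)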

For a genetic algorithm to work, a new population of points should perform close to the previous population of points: a small perturbation should create a small change. If, after a small perturbation of a point, you obtain a point that represents a solution with completely different performance, then the algorithm is nothing better than random search, which is usually a poor optimization algorithm. In the RSA case, if your points are directly the numbers, the answer is either YES or NO, and flipping a single bit switches between them... Thus a genetic algorithm is no better than random search if you represent the RSA problem without much thinking, as in "let's encode search points as the bits of the numbers".
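
Here is a small sketch of that failure mode (toy numbers, purely illustrative): with the naive "bits of the number" encoding, all you can reward is exact divisibility, and flipping one bit of a true factor destroys it, so selection has no gradient to follow.

N = 8051                       # toy semiprime, 83 * 97 (stand-in for an RSA modulus)

def fitness(candidate):
    # The only signal available is divides / does not divide: 1 or 0.
    return 1 if candidate != 0 and N % candidate == 0 else 0

p = 83                         # a true factor, fitness 1
for bit in range(6):
    neighbour = p ^ (1 << bit)             # perturb the encoding by one bit
    print(bit, neighbour, fitness(neighbour))
# All six single-bit neighbours of 83 score 0: a tiny perturbation gives a
# completely different outcome, which is exactly the random-search regime.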

OTHER TIPS

I would say it is because factorisation of keys is not an optimisation problem, but an exact problem. This distinction is not very accurate, so here are the details. Genetic algorithms are great for solving problems where there are minima (local/global), but there aren't any in the factorisation problem. Genetic algorithms, like DCA or simulated annealing, need a measure of "how close am I to the solution", but you can't define one for this problem.

For an example of a problem where genetic algorithms do well, there is the hill-climbing problem.

GAs are based on fitness evaluation of candidate solutions.

You basically have a fitness function that takes in a candidate solution as input and gives you back a scalar telling you how good that candidate is. You then go on and allow the best individuals of a given generation to mate with higher probability than the rest, so that the offspring will be (hopefully) more 'fit' overall, and so on.
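
As a rough sketch of what "mate with higher probability" usually means in practice, here is fitness-proportional (roulette-wheel) selection in Python; the function names and the toy fitness are made up for illustration.

import random

def select_parent(population, fitness):
    # Fitness-proportional ("roulette wheel") selection: fitter candidates
    # are picked to mate with proportionally higher probability.
    scores = [fitness(ind) for ind in population]
    total = sum(scores)
    if total == 0:
        return random.choice(population)   # no signal at all: fall back to uniform choice
    pick = random.uniform(0, total)
    running = 0.0
    for ind, score in zip(population, scores):
        running += score
        if running >= pick:
            return ind
    return population[-1]

# Toy usage: candidates are small integers, fitness peaks at 7.
pop = [random.randint(0, 15) for _ in range(10)]
parent = select_parent(pop, lambda x: max(0, 50 - (x - 7) ** 2))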

There is no way to evaluate fitness (how good is a candidate solution compared to the rest) in the RSA factorization scenario, so that's why you can't use them.

GAs are not brute-forcing; they're just a search algorithm. Each GA essentially looks like this:

candidates = seed_value;
while (!good_enough(best_of(candidates))) {
    candidates = compute_next_generation(candidates);
}

Where good_enough and best_of are defined in terms of a fitness function. A fitness function says how well a given candidate solves the problem. That seems to be the core issue here: how would you write a fitness function for factorization? For example 20 = 2*10 or 4*5. The tuples (2,10) and (4,5) are clearly winners, but what about the others? How “fit” is (1,9) or (3,4)?
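
One naive attempt, just to make the problem visible: score a pair by how close its product is to N. The numbers below are purely illustrative, and the punchline is that the score says nothing about how close a pair is to an actual factorisation.

N = 20

def naive_fitness(pair):
    a, b = pair
    return -abs(N - a * b)     # 0 means an exact factorisation; more negative is "worse"

for pair in [(2, 10), (4, 5), (1, 9), (3, 4)]:
    print(pair, naive_fitness(pair))
# (2, 10) -> 0, (4, 5) -> 0, (1, 9) -> -11, (3, 4) -> -8
# (3, 4) "scores better" than (1, 9), but that ranking carries no information about
# which candidate is nearer to a true factor pair, so the GA has nothing useful to
# climb, especially once N is RSA-sized.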

Indirectly, you can use a genetic algorithm to factor an integer N. Dixon's integer factorization method uses equations involving powers of the first k primes, modulo N. These products of powers of small primes are called "smooth". If we are using the first k=4 primes - {2,3,5,7} - 42=2x3x7 is smooth and 11 is not (for lack of a better term, 11 is "rough"). Dixon's method requires an invertible k x k matrix consisting of the exponents that define these smooth numbers. For more on Dixon's method see https://en.wikipedia.org/wiki/Dixon%27s_factorization_method.
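
A quick sketch of that smoothness test relative to the first k = 4 primes (helper names are made up; this is just the trial division the paragraph describes):

def rough_part(n, primes=(2, 3, 5, 7)):
    # Divide out every factor drawn from the small primes; whatever is left
    # over is the "rough" part. A rough part of 1 means n was smooth.
    for p in primes:
        while n % p == 0:
            n //= p
    return n

def is_smooth(n, primes=(2, 3, 5, 7)):
    return rough_part(n, primes) == 1

print(is_smooth(42), is_smooth(11))   # True, False: 42 = 2*3*7, 11 has no small factor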

Now, back to the original question: There is a genetic algorithm for finding equations for Dixon's method.

  1. Let r be the inverse of a smooth number mod N - so r is a rough number
  2. Let s be smooth
  3. Generate random solutions of rx = sy mod N. These solutions [x,y] are the population for the genetic algorithm. Each x, y has a smooth component and a rough component. For example, suppose x = 369 = 9 x 41. Then (assuming 41 is not small enough to count as smooth), the rough part of x is 41 and the smooth part is 9. (A small sketch of this setup follows the list.)
  4. Choose pairs of solutions - "parents" - to combine into linear combinations with ever smaller rough parts.
  5. The algorithm terminates when a pair [x,y] is found with rough parts [1,1], [1,-1],[-1,1] or [-1,-1]. This yields an equation for Dixon's method, because rx=sy mod N and r is the only rough number left: x and y are smooth, and s started off smooth. But even 1/r mod N is smooth, so it's all smooth!
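
Here is a minimal sketch of steps 1-3 (toy modulus, and made-up choices of the smooth s and of the smooth number m whose inverse is r): picking y at random and solving for x keeps the congruence exact, and each coordinate is then split into its smooth and rough parts.

import random

def modinv(a, n):
    return pow(a, -1, n)               # modular inverse (Python 3.8+)

def rough_part(n, primes=(2, 3, 5, 7)):
    for p in primes:
        while n % p == 0:
            n //= p
    return n

N, s, m = 8051, 12, 35                 # toy values: s and m are smooth, r = 1/m mod N is rough
r = modinv(m, N)
population = []
for _ in range(5):
    y = random.randrange(1, N)
    x = (s * y * m) % N                # x = s*y*r^-1 mod N, so r*x = s*y (mod N) holds exactly
    assert (r * x - s * y) % N == 0
    population.append((x, y))
    print(f"x={x} (rough part {rough_part(x)}), y={y} (rough part {rough_part(y)})")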

Every time you combine two pairs - say [v,w] and [x,y] - the smooth parts of the four numbers are obliterated, except for the factors the smooth parts of v and x share, and the factors the smooth parts of w and y share. So we choose parents that share smooth parts to the greatest possible extent. To make this precise, write

g = gcd(smooth part of v, smooth part of x)

h = gcd(smooth part of w, smooth part of y)

[v,w], [x,y] = [g v/g, h w/h], [g x/g, h y/h].

The hard-won smooth factors g and h will be preserved into the next generation, but the smooth parts of v/g, w/h, x/g and y/h will be sacrificed in order to combine [v,w] and [x,y]. So we choose parents for which v/g, w/h, x/g and y/h have the smallest smooth parts. In this way we really do drive down the rough parts of our solutions to rx = sy mod N from one generation to the next.
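
A sketch of that pairing rule under the same assumptions (first four primes as the smooth base; helper names are made up): compute g and h from the smooth parts, and prefer parent pairs that waste the least smooth material outside g and h.

from math import gcd

def split(n, primes=(2, 3, 5, 7)):
    # Returns (rough part, smooth part) of n with respect to the small primes.
    smooth = 1
    for p in primes:
        while n % p == 0:
            n //= p
            smooth *= p
    return n, smooth

def pairing_cost(vw, xy):
    v, w = vw
    x, y = xy
    g = gcd(split(v)[1], split(x)[1])      # smooth factors shared by v and x: preserved
    h = gcd(split(w)[1], split(y)[1])      # smooth factors shared by w and y: preserved
    lost = 1
    for t in (v // g, w // h, x // g, y // h):
        lost *= split(t)[1]                # smooth material the combination destroys
    return lost                            # choose the parent pair with the smallest cost

print(pairing_cost((360, 55), (90, 77)))   # toy pairs, purely illustrative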

On further thought the best way to make your way towards smooth coefficients x, y in the lattice ax = by mod N is with regression, not a genetic algorithm.

Two regressions are performed, one with response vector R0 consisting of x-values from randomly chosen solutions of ax = by mod N; and the other with response vector R1 consisting of y-values from the same solutions. Both regressions use the same explanatory matrix X. In X are columns consisting of the remainders of the x-values modulo smooth divisors, and other columns consisting of the remainders of the y-values modulo other smooth divisors.

The best choice of smooth divisors is the one that minimizes the errors from each regression:

E0 = R0 - X (inverse of (X-transpose)(X)) (X-transpose) (R0)

E1 = R1 - X (inverse of (X-transpose)(X)) (X-transpose) (R1)
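
As a rough numerical sketch of just these two residual computations (NumPy, with toy random data standing in for genuine solutions of ax = by mod N, and an arbitrary guess at the smooth divisors), the explanatory matrix is built from remainders and E0, E1 are the ordinary least-squares residuals:

import numpy as np

rng = np.random.default_rng(0)
xs = rng.integers(1, 10_000, size=40).astype(float)   # stand-ins for x-values of solutions
ys = rng.integers(1, 10_000, size=40).astype(float)   # stand-ins for y-values of solutions

# Columns: x-values modulo some smooth divisors, y-values modulo other smooth divisors.
X = np.column_stack([xs % d for d in (8, 9, 25)] + [ys % d for d in (16, 27)])

def ols_residual(R, X):
    # E = R - X (X^T X)^{-1} X^T R, computed via least squares for numerical stability.
    beta, *_ = np.linalg.lstsq(X, R, rcond=None)
    return R - X @ beta

E0 = ols_residual(xs, X)
E1 = ols_residual(ys, X)
print(np.linalg.norm(E0), np.linalg.norm(E1))   # smaller errors = better choice of divisors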

What follows are row operations to annihilate X. Then apply a result z of these row operations (a row vector with zX = 0) to the x- and y-values from the original solutions from which X was formed.

z R0 = z R0 - 0
     = z R0 - zX (inverse of (X-transpose)(X)) (X-transpose) (R0)
     = z E0 

Similarly, z R1 = z E1

Three properties are now combined in z R0 and z R1:

  • They are multiples of large smooth numbers, because z annihilates remainders modulo smooth numbers.
  • They are relatively small, since E0 and E1 are small.
  • Like any linear combination of solutions to ax = by mod N, z R0 and z R1 are themselves solutions to that equation.

A relatively small multiple of a large smooth number might just be the smooth number itself. Having a smooth solution of ax = by mod N yields an input to Dixon's method.

Two optimizations make this particularly fast:

  • There is no need to guess all the smooth numbers and columns of X at once. You can run regressions continuously, adding one column to X at a time and choosing columns that reduce E0 and E1 the most. At no time will any two smooth numbers with a common factor be selected.
  • You can also start with a lot of random solutions of ax = by mod N, and remove the ones with the largest errors between selections of new columns for X.
Licensed under: CC-BY-SA with attribution