Question

I'm new to Haskell and tinkering around with the Euler Project problems. My solution for problem #3 is far too slow. At first I tried this:

-- Problem 3
-- The prime factors of 13195 are 5, 7, 13 and 29.
-- What is the largest prime factor of the number 600851475143 ?

problem3 = max [ x | x <- [1..n], (mod n x) == 0, n /= x]
    where n = 600851475143

Then I changed it to return all x and not just the largest one.

problem3 = [ x | x <- [1..n], (mod n x) == 0, n /= x]
        where n = 600851475143

After 30 minutes, the list is still being processed and the output looks like this

[1,71,839,1471,6857,59569,104441,486847,1234169,5753023,10086647,87625999,408464633,716151937

Why is it so slow? Am I doing something terribly wrong or is it normal for this sort of task?

Was it helpful?

Solution

With your solution, there are about 600 billion possible numbers. As noted by delnan, making every check of the number quicker is not going to make much difference, we must limit the number of candidates.

Your solution does not seem to be correct either. 59569 = 71 * 839 isn't it? The question only asks for prime factors. Notice that 71 and 839 is in your list so you are doing something right. In fact, you are trying to find all factors.

I think the most dramatic effect you get simply by dividing away the factor before continuing.

euler3 = go 2 600851475143
  where
    go cand num
      | cand == num           = [num]
      | cand `isFactorOf` num = cand : go cand       (num `div` cand)
      | otherwise             =        go (cand + 1) num

isFactorOf a b = b `mod` a == 0

This may seem like an obvious optimization but it relies on the fact that if both a and b divides c and a is coprime to b then a divides c/b.

If you want to do more, the common "Only check until the square root" trick has been mentioned here. The same trick can be applied to this problem, but the performance gain does not show, unfortunately, on this instance:

euler3 = go 2 600851475143
  where
    go cand num
      | cand*cand > num       = [num]
      | cand `isFactorOf` num = cand : go cand       (num `div` cand)
      | otherwise             =        go (cand + 1) num

isFactorOf a b = b `mod` a == 0

Here, when a candidate is larger than the square root of the remaining number (num), we know that num must be a prime and therefore a prime factor of the original number (600851475143).

It is possible to remove even more candidates by only considering prime numbers, but this is slightly more advanced because you need to make a reasonably performant way of generating primes. See this page for ways of doing that.

OTHER TIPS

It's doing a lot of work! (It's also going to give you the wrong answer, but that's a separate issue!)

There are a few very quick ways you could speed it up by thinking about the problem a little first:

  • You are applying your function over all numbers 1..n, and checking each one of them to ensure it isn't n. Instead, you could just go over all numbers 1..n-1 and skip out n different checks (small though they are).
  • The answer is odd, so you can very quickly filter out any even numbers by going from 1..(n-1)/2 and checking for 2x instead of x.
  • If you think about it, all factors occur in pairs, so you can in fact just search from 1..sqrt(n) (or 1..sqrt(n)/2 if you ignore even numbers) and output pairs of numbers in each step.

Not related to the performance of this function, but it's worth noting that what you've implemented here will find all of the factors of a number, whereas what you want is only the largest prime factor. So either you have to test each of your divisors for primality (which is going to be slow, again) or you can implement the two in one step. You probably want to look at 'sieves', the most simple being the Sieve of Eratosthenes, and how you can implement them.

A complete factorization of a number can take a long time for big numbers. For Project Euler problems, a brute force solution (which this is) is usually not enough to find the answer in your lifetime.

Hint: you do not need to find all prime factors, just the biggest one.

TL;DR: The two things you were doing non-optimally, are: not stopping at the square root, and not dividing out each smallest factor, as they are found.


Here's a little derivation of the (2nd) factorization code shown in the answer by HaskellElephant. We start with your code:

f1 n = [ x | x <- [2..n], rem n x == 0]
n3 = 600851475143

Prelude> f1 n3
[71,839,1471,6857,59569,104441,486847Interrupted.

So it doesn't finish in any reasonable amount of time, and some of the numbers it produces are not prime... But instead of adding primality check to the list comprehension, let's notice that 71 is prime. The first number produced by f1 n is the smallest divisor of n, and thus it is prime. If it weren't, we'd find its smallest divisor first - a contradiction.

So, we can divide it out, and continue searching for the prime factors of newly reduced number:

f2 n = tail $ iterate (\(_,m)-> (\f->(f, quot m f)) . head $ f1 m) (1,n) 

Prelude> f2 n3
[(71,8462696833),(839,10086647),(1471,6857),(6857,1),(*** Exception: Prelude.hea
d: empty list

(the error, because f1 1 == []). We're done! (6857 is the answer, here...). Let's wrap it up:

takeUntil p xs = foldr (\x r -> if p x then [x] else x:r) [] xs
pfactors1 n = map fst . takeUntil ((==1).snd) . f2 $ n   -- prime factors of n

Trying out our newly minted solution,

Prelude> map pfactors1 [n3..]
[[71,839,1471,6857],[2,2,2,3,3,1259Interrupted.

suddenly we hit a new inefficiency wall, on numbers without small divisors. But if n = a*b and 1 < a <= b, then a*a <= a*b == n and so it is enough to test only until the square root of a number, to find its smallest divisor.

f12 n = [ x | x <- takeWhile ((<= n).(^2)) [2..n], rem n x == 0] ++ [n]
f22 n = tail $ iterate (\(_,m)-> (\f->(f, quot m f)) . head $ f12 m) (1,n) 
pfactors2 n = map fst . takeUntil ((==1).snd) . f22 $ n

What couldn't finish in half an hour now finishes in under one second (on a typical performant box):

Prelude> f12 n3
[71,839,1471,6857,59569,104441,486847,600851475143]

All the divisors above sqrt n3 were not needed at all. We unconditionally add n itself as the last divisor in f12 so it is able to handle prime numbers:

Prelude> f12 (n3+6)
[600851475149]

Since n3 / sqrt n3 = sqrt n3 ~= 775146, your original attempt at f1 n3 should have taken about a week to finish. That's how important this optimization is, of stopping at the square root.

Prelude> f22 n3
[(71,8462696833),(839,10086647),(1471,6857),(6857,1),(1,1),(1,1),(1,1),(1,1),(1,
1),(1,1),(1,1),(1,1),(1,1),(1,1),(1,1),(1,1),(1,1),(1,1)Interrupted

We've apparently traded the "Prelude.head: empty list" error for a non-terminating - but productive - behavior.


Lastly, we break f22 up in two parts and fuse them each into the other functions, for a somewhat simplified code. Also, we won't start over anew, as f12 does, searching for the smallest divisor from 2 all the time, anymore:

-- smallest factor of n, starting from d. directly jump from sqrt n to n.
smf (d,n) = head $ [ (x, quot n x) | x <- takeWhile ((<=n).(^2)) [d..]
                                   , rem n x == 0] ++ [(n,1)]

pfactors n = map fst . takeUntil ((==1).snd) . tail . iterate smf $ (2,n)

This expresses guarded (co)recursion through a higher-order function iterate, and is functionally equivalent to that code mentioned above. The following now runs smoothly, and we're even able to find a pair of twin primes as a bonus there:

Prelude Saga> map pfactors [n3..]
[[71,839,1471,6857],[2,2,2,3,3,1259,6628403],[5,120170295029],[2,13,37,227,27514
79],[3,7,7,11,163,2279657],[2,2,41,3663728507],[600851475149],[2,3,5,5,19,31,680
0809],[600851475151],[2,2,2,2,37553217197],[3,3,3,211,105468049],[2,7,11161,3845
351],[5,67,881,2035853],[2,2,3Interrupted.

Here is my solution for Euler Project #3. It takes only 1.22 sec on my Macbook Air.

First we should find all factors of the given number. But we know, that even numbers can't be prime numbers (except number 2). So, to solve Euler Project #3 we need not all, but only odd factors:

    getOddFactors num = [ x | x <- [3,5..num], num `rem` x == 0 ]

But we can optimize this function. If we plan to find a factor of num greater than sqrt num, we should have another factor which is less than sqrt num - and these possible factors we have found already. Hence, we can limit our list of possible factors by sqrt num:

    getOddFactors num = [ x | x <- [3, 5..(floor.sqrt.fromIntegral) num], 
                          num `rem` x == 0 ]

Next we want to know which of our odd factors of num are prime numbers:

    isPrime number = [ x | x <- [3..(floor.sqrt.fromIntegral) number], 
                       number `rem` x == 0] == []

Next we can filter odd factors of num with the function isPrime to find all prime factors of num. But to use laziness of Haskell to optimize our solution, we apply function filter isPrime to the reversed list of odd factors of the num. As soon as our function finds the first value which is prime number, Haskell stops computations and returns solution:

    largestPrimeFactor = head . filter isPrime . reverse . getOddDivisors

Hence, the solution is:

    ghci> largestPrimeFactor 600851475143
    6857
    (1.22 secs, 110646064 bytes)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top