Question

In the book "The Haskell Road to Logic, Maths and Programming", the authors present two alternative ways of finding the least divisor k of a number n with k > 1, claiming that the second version is much faster than the first. I have trouble understanding why (I am a beginner).

Here is the first version (page 10):

ld :: Integer -> Integer -- finds the smallest divisor of n which is > 1
ld n = ldf 2 n

ldf :: Integer -> Integer -> Integer
ldf k n | n `rem` k == 0 = k
        | k ^ 2 > n      = n
        | otherwise      = ldf (k + 1) n

If I understand this correctly, the ld function basically ends up iterating through all integers in the interval [2..sqrt(n)] and stops as soon as one of them divides n, returning it as the result.
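That reading can be spelled out as a list comprehension. The following is my own sketch, not from the book, and ldList is a made-up name:

```haskell
-- Sketch (not from the book): the same search as ld, written as a list
-- comprehension over the candidates 2..sqrt(n).  If no candidate divides
-- n, the fallback n is returned, meaning n is prime.
ldList :: Integer -> Integer
ldList n = head ([k | k <- takeWhile (\k -> k * k <= n) [2 ..]
                    , n `rem` k == 0] ++ [n])
```

For example, ldList 91 tries the candidates 2..9 and returns 7, and ldList 97 falls through to 97.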

The second version, which the authors claim to be much faster, goes like this (page 23):

ldp :: Integer -> Integer -- finds the smallest divisor of n which is > 1
ldp n = ldpf allPrimes n

ldpf :: [Integer] -> Integer -> Integer
ldpf (p:ps) n | rem n p == 0 = p
              | p ^ 2 > n    = n
              | otherwise    = ldpf ps n

allPrimes :: [Integer]
allPrimes = 2 : filter isPrime [3..]

isPrime :: Integer -> Bool
isPrime n | n < 1     = error "Not a positive integer"
          | n == 1    = False
          | otherwise = ldp n == n

The authors claim that this version is faster because it iterates only through the list of primes within the interval 2..sqrt(n), instead of iterating through all numbers in that range.
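One way to make that claim concrete is to count how many unsuccessful trial divisions each version performs before stopping. This is my own sketch (trialsLd and trialsLdp are made-up names); the ldp machinery is copied from the book so the block is self-contained:

```haskell
-- The book's second version, needed by the counters below.
ldp :: Integer -> Integer
ldp n = ldpf allPrimes n

ldpf :: [Integer] -> Integer -> Integer
ldpf (p:ps) n | rem n p == 0 = p
              | p ^ 2 > n    = n
              | otherwise    = ldpf ps n

allPrimes :: [Integer]
allPrimes = 2 : filter isPrime [3 ..]

isPrime :: Integer -> Bool
isPrime n | n < 1     = error "Not a positive integer"
          | n == 1    = False
          | otherwise = ldp n == n

-- Unsuccessful trial divisions before each version stops (my own helpers):
-- ld tries every k in 2..sqrt(n), ldp only the primes in that range.
trialsLd, trialsLdp :: Integer -> Int
trialsLd  n = length (takeWhile (\k -> k * k <= n && n `rem` k /= 0) [2 ..])
trialsLdp n = length (takeWhile (\p -> p * p <= n && n `rem` p /= 0) allPrimes)
```

For the prime n = 9973, trialsLd counts the 98 candidates 2..99, while trialsLdp counts only the 25 primes below 100.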

However, this argument doesn't convince me: the recursive function ldpf consumes the numbers from the list of primes allPrimes one by one, and that list is generated by filtering the list of all integers.

So unless I am missing something, this second version also ends up iterating through all numbers within the interval 2..sqrt(n), but for each number it first checks whether it is prime (a relatively expensive operation) and, if so, whether it divides n (a relatively cheap one).

I would say that just checking whether k divides n for each k should be faster. Where is the flaw in my reasoning?


Solution

The main advantage of the second solution is that you compute the list of primes allPrimes only once. Thanks to lazy evaluation, each call computes just the primes it needs, or reuses the ones that have already been computed. So the expensive part is computed only once and then simply reused.

For computing the least divisor of just a single number, the first version is indeed more efficient. But try running ldp and ld for all numbers between, say, 1 and 100000 and you'll see the difference.
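That experiment can be set up like this. The sketch below copies both versions from the book; sumLd and sumLdp are my own names, and for actual timings you'd use :set +s in GHCi or a benchmarking library:

```haskell
-- The book's first version.
ld :: Integer -> Integer
ld n = ldf 2 n

ldf :: Integer -> Integer -> Integer
ldf k n | n `rem` k == 0 = k
        | k ^ 2 > n      = n
        | otherwise      = ldf (k + 1) n

-- The book's second version.
ldp :: Integer -> Integer
ldp n = ldpf allPrimes n

ldpf :: [Integer] -> Integer -> Integer
ldpf (p:ps) n | rem n p == 0 = p
              | p ^ 2 > n    = n
              | otherwise    = ldpf ps n

allPrimes :: [Integer]
allPrimes = 2 : filter isPrime [3 ..]

isPrime :: Integer -> Bool
isPrime n | n < 1     = error "Not a positive integer"
          | n == 1    = False
          | otherwise = ldp n == n

-- Sum the least divisors of many numbers with each version.  Because
-- allPrimes is a top-level value, every ldp call after the first reuses
-- the primes computed so far; every ld call starts from scratch.
sumLd, sumLdp :: Integer -> Integer
sumLd  m = sum (map ld  [2 .. m])
sumLdp m = sum (map ldp [2 .. m])
```

Both sums agree on the result; the point of the exercise is the difference in running time as m grows.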

OTHER TIPS

I don't know Haskell, so without proper measurements of both versions I can only assume the claims are right. In that case the possible reasons are:

1. primes are precomputed in some array

  • then isPrime is not time-expensive

2. primes are computed on the fly (with memoization)

  • at first, the first version will be faster
  • but continued use will make the second version faster (especially for bigger n)

3. primes are computed on the fly (without memoization)

  • as DeadMG mentioned
  • compiler optimizations + CPU caching can sometimes mimic a memoization effect, up to a point
  • in that case version 2 will be faster overall
  • but if you keep using bigger and bigger n alongside small ones
  • then after reaching the cache-invalidation point it will slow down even more than version 1

4. this is just a guess

  • is it possible for your compiler to convert recursion
  • like a single-integer iteration through a list or range
  • into a loop? (most recursions of this type can be converted to a loop anyway)
  • that would explain it all:
  • no recursive-call overhead
  • no heap thrashing
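On the last point: ldf is in fact already tail recursive, so GHC compiles it to a loop. A loop-style way of writing the same function (my own sketch; ldLoop and go are made-up names) looks like this:

```haskell
-- Loop-style equivalent of the book's ldf: the local helper go plays
-- the role of the loop variable k, and the tail call go (k + 1) is
-- compiled to a jump, not a stack-growing call.
ldLoop :: Integer -> Integer
ldLoop n = go 2
  where
    go k | n `rem` k == 0 = k
         | k * k > n      = n
         | otherwise      = go (k + 1)
```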

[Note]

  • as I wrote above, I am not a Haskell user, so treat this answer accordingly

As I understand this, the division operation isn't as expensive as you might think. Take the divisor 2: it filters out half of the candidates for allPrimes with a check equivalent to "shift right by 1 bit", which is about as cheap as an operation gets, while the first algorithm performs a relatively expensive true division by every candidate. Say the candidate divisor is 1956: it is filtered out of allPrimes by the very first test at almost no cost (it is divisible by 2), whereas dividing, say, 2^4253 - 1 by 1956 is pointless from the start, since 2^4253 - 1 is not divisible by 2. For really big numbers, divisions take a lot of time, and at least half of them (or 5/6, counting the divisors 2 and 3) are wasted.

Also, allPrimes is a cached list, so checking whether the next integer is a prime to be included in allPrimes divides only by already-verified primes, which keeps the primality test from being extremely expensive even when the number really is prime. Combined, this gives the second method its advantage.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow