Domanda

I've been doing project Euler problems to learn Haskell.

I've have some bumps on the way but managed to get to problem 14. The question is, which starting number under 1 000 000 produces the longest Collatz chain (numbers are allowed to go above one million after the chain starts).

I've tried a couple of solutions but none of the worked. I wanted to do a reverse. Starting from 1 and terminating when the number gets above one million but that obviously doesn't work since the terms can go higher than one million.

I've tried memoizing the normal algorithm but again, too large numbers, to much memoization.

I've read that the most obvious solution should work for this but for some reason, my solution takes over 10 seconds to get the maximum up to 20 000. Let alone 1 million.

This is the code I'm using at the moment:

reg_collatz 1 = 1
reg_collatz n
        | even n        = 1 + reg_collatz (n `div` 2)
        | otherwise     = 1 + reg_collatz (n * 3 + 1)

solution = foldl1 (\a n -> max a (reg_collatz n)) [1..20000]

Any help is very welcome.

È stato utile?

Soluzione

The answer is simple: don’t memoise numbers above one million, but do that with numbers below.

module Main where

import qualified Data.Map as M
import Data.List
import Data.Ord

main = print $ fst $ maximumBy (comparing snd) $ M.toList ansMap

ansMap :: M.Map Integer Int
ansMap = M.fromAscList [(i, collatz i) | i <- [1..1000000]]
  where collatz 1 = 0
        collatz x = if x' <= 1000000 then 1 + ansMap M.! x'
                                     else 1 + collatz x'
          where x' = if even x then x `div` 2 else x*3 + 1

Altri suggerimenti

This is obv waaay late but I thought I'd post anyways for future readers' benefit (I imagine OP is long done with this problem).

TL;DR: I think we probably want to use the Data.Vector package for this problem (and similar types of problems).

Longer version:

According to the Haskell docs, a Map (from Data.Map) has O(log N) access time whereas a Vector (from Data.Vector) has O(1) access; we can see the difference in the results below: the vector implementation runs ~3x faster. (Both are way better than lists which have O(N) access time.)

A couple of benchmarks are included below. The tests were intentionally not run one after another so as to prevent any cache-based optimization.

A couple of observations:

  • The largest absolute improvement (from the code in the original post) was due to the addition of type signatures; without being explicitly told that the data was of type Int, Haskell's type system was inferring that the data was of type Integer (which is obv bigger and slower)

  • A bit counterintuitive but, results are virtually indistinguishable between foldl1' and foldl1. (I double checked the code and ran these a few times just to make sure.)

  • Vector and Array (and, to a certain extent, Map) allow for decent improvement primarily as a result of memoization. (Note that OP's solution is likely a lot faster than a list-based solution that tried to use memoization given lists' O(N) access time.)

Here are a couple of benchmarks (all compiled using O2):

                                                       Probably want to look
                                                         at these numbers        
                                                                |             
                                                                V         
Data.Vector                     0.35s user 0.10s system 97% cpu 0.468 total
Data.Array (Haskell.org)        0.31s user 0.21s system 98% cpu 0.530 total
Data.Map (above answer)         1.31s user 0.46s system 99% cpu 1.772 total
Control.Parallel (Haskell.org)  1.75s user 0.05s system 99% cpu 1.799 total
OP (`Int` type sigs + foldl')   3.60s user 0.06s system 99% cpu 3.674 total
OP (`Int` type sigs)            3.53s user 0.16s system 99% cpu 3.709 total
OP (verbatim)                   3.36s user 4.77s system 99% cpu 8.146 total

Source of figures from Haskell.org: https://www.haskell.org/haskellwiki/Euler_problems/11_to_20#Problem_14


The Data.Vector implementation used to generate the above results:

import Data.Vector ( Vector, fromList, maxIndex, (!) )

main :: IO ()
main = putStrLn $ show $ largestCollatz 1000000

largestCollatz :: Int -> Int
largestCollatz n = maxIndex vector
  where 
    vector :: Vector Int               
    vector = fromList $ 0 : 1 : [collatz x x 0 | x <- [2..n]]

    collatz m i c =
      case i < m of
        True  -> c + vector ! i
        False -> let j = if even i then i `div` 2 else 3*i + 1
                 in collatz m j (c+1)
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top