Project Euler No. 14 Haskell

Question 1

The easiest optimization (as you have identified) is memoization. You have attempted create a memoization system yourself, however have come across issues on how to store the memoized values. There are solutions to doing this in a maintainable way, such as using a State monad or a STArray. However, there is a much simpler solution to your problem - use haskell's existing memoization. Haskell by default remembers constant values, so if you create a value that stores the collatz values, it will be automatically memoized!

A simple example of this is the following fibonacci definition:

fib :: Int -> Integer
fib n = fibValues !! n where
  fibValues = 1 : 1 : zipWith (+) fibValues (tail fibValues)

The fibValues is a [Integer], and as it is just a constant value, it is memoized. However, that doesn't mean it is all memoized at once, since as it is an infinte list, this would never finish. Instead, the values are only calculated when needed, as haskell is lazy.

So if you do something similar with your problem, you will get memoization without a lot of the work. However, using a list like above won't work well in your solution. This is because the collatz algorithm uses many different values to get the result for a given number, so the container used will require random access to be efficient. The obvious choice is an array.

collatzMemoized :: Array Integer Int

Next, we need to fill up the array with the correct values. I'll write this function pretending a collatz function exists that calculates the collatz value for any n. Also, note that arrays are fixed size, so a value needs to be used to determine the maximum number to memoize. I'll use a million, but any value can be used (it is a memory/speed tradeoff).

collatzMemoized = listArray (1, maxNumberToMemoize) $ map collatz [1..maxNumberToMemoize] where
  maxNumberToMemroize = 1000000

That is pretty straightforward, the listArray is given bounds, and the a list of all the collatz values in that range is given to it. Remember that this won't calculate all the collatz values straight away, as the values are lazy.

Now, the collatz function can be written. The most important part is to only check the collatzMemoized array if the number being checked is within its bounds:

collatz :: Integer -> Int
collatz 1 = 1
collatz n
  | inRange (bounds collatzMemoized) nextValue = 1 + collatzMemoized ! nextValue
  | otherwise = 1 + collatz nextValue
  where
    nextValue = case n of
      1 -> 1
      n | even n -> n `div` 2
        | otherwise -> 3 * n + 1

In ghci, you can now see the effectiveness of the memoization. Try collatz 200000. It will take about 2 seconds to finish. However, if you run it again, it will complete instantly.

Finally, the solution can be found:

maxCollatzUpTo :: Integer -> (Integer, Int)
maxCollatzUpTo n = maximumBy (compare `on` snd) $ zip [1..n] (map collatz [1..n]) where

and then printed:

main = print $ maxCollatzUpTo 1000000

If you run main, the result will be printed in about 10 seconds.

Now, a small problem with this approach is it uses a lot of stack space. It will work fine in ghci (which seems to use be more flexible with regards to stack space). However, if you compile it and try to run the executable, it will crash (with a stack space overflow). So to run the program, you have to specify more when you compile it. This can be done by adding -with-rtsopts='K64m' to the compile options. This increases the stack to 64mb.

Now the program can be compiled and ran:

> ghc -O3 --make -with-rtsopts='-K6m' problem.hs

Running ./problem will give the result in less than a second.

Question 2

You are going about memoization the hard way, trying to write an imperative program in Haskell. Borrowing from David Eisenstat's solution, we'll solve it as j_random_hacker suggested:

collatzLength :: Integer -> Integer
collatzLength n
    | n == 1 = 1
    | even n = 1 + collatzLength (n `div` 2)
    | otherwise = 1 + collatzLength (3*n + 1)

The dynamic programming solution for this is to replace the recursion with looking things up in a table. Let's make a function where we can replace the recursive call:

collatzLengthDef :: (Integer -> Integer) -> Integer -> Integer
collatzLengthDef r n
    | n == 1 = 1
    | even n = 1 + r (n `div` 2)
    | otherwise = 1 + r (3*n + 1)

Now we could define the recursive algorithm as

collatzLength :: Integer -> Integer
collatzLength = collatzLengthDef collatzLength

Now we could also make a tabled version of this (it takes a number for the table size, and returns a collatzLength function that is calculated using a table of that size):

-- A utility function that makes memoizing things easier
buildTable :: (Ix i) => (i, i) -> (i -> e) -> Array i e
buildTable bounds f = array $ map (\x -> (x, f x)) $ range bounds

collatzLengthTabled :: Integer -> Integer -> Integer
collatzLengthTabled n = collatzLengthTableLookup
    where
        bounds = (1, n)
        table = buildTable bounds (collatzLengthDef collatzLengthTableLookup)
        collatzLengthTableLookup =
            \x -> Case inRange bounds x of
                True -> table ! x
                _ -> (collatzLengthDef collatzLengthTableLookup) x

This works by defining the collatzLength to be a table lookup, with the table being the definition of the function, but with recursive calls replaced by table lookup. The table lookup function checks to see if the argument to the function is in the range that is tabled, and falls back on the definition of the function. We can even make this work for tabling any function like this:

tableRange :: (Ix a) => (a, a) -> ((a -> b) -> a -> b) -> a -> b
tableRange bounds definition = tableLookup
    where
        table = buildTable bounds (definition tableLookup)
        tableLookup =
            \x -> Case inRange bounds x of
                True -> table ! x
                _ -> (definition tableLookup) x

collatzLengthTabled n = tableRange (1, n) collatzLengthDef

You just need to make sure that you

let memoized = collatzLengthTabled 10000000
    ... memoized ...

So that only one table is built in memory.

Question 3

I remember finding memoisation of dynamic programming algorithms very counterintuitive in Haskell, and it's been a while since I've done it, but hopefully the following trick works for you.

But first, I don't quite understand your current DP scheme, though I suspect it may be quite inefficient as it seems like it will need to update many entries for each answer. (a) I don't know how to do this in Haskell, and (b) you don't need to do this to solve the problem efficiently ;-)

I suggest the following approach instead: first build an ordinary recursive function that computes the right answer for an input number. (Hint: it will have a signature like collatzLength :: Int -> Int.) When you have this function working, just replace its definition with the definition of an array whose elements are defined lazily with the array function using an association list, and replace all recursive calls to the function to array lookups (e.g. collatzLength 42 would become collatzLength ! 42). This will automagically populate the array in the necessary order! So your "top-level" collatzLength object will now actually be an array, rather than a function.

As I suggested above, I would use an array instead of a map datatype to hold the DP table, since you will need to store values for all integer indices from 1 up to 1,000,000.

Question 4

I don't have a Haskell compiler handy, so I apologize for any broken code.

Without memoization, there's a function

collatzLength :: Integer -> Integer
collatzLength n
    | n == 1 = 1
    | even n = 1 + collatzLength (n `div` 2)
    | otherwise = 1 + collatzLength (3*n + 1)

With memoization, the type signature is

memoCL :: Map Integer Integer -> Integer -> (Map Integer Integer, Integer)

since memoCL receives a table as input and gives the updated table as output. What memoCL needs to do is intercept the return of the recursive call with a let form and insert the new result.

-- table must have an initial entry for 1

memoCL table n = case Map.lookup n table of
    Just m -> (table, m)
    Nothing -> let (table', m) = memoCL table (collatzStep n) in (Map.insert n (1 + m) table', 1 + m)

collatzStep :: Integer -> Integer
collatzStep n = if even n then n `div` 2 else 3*n + 1

At some point you'll get sick of the above idiom. Then it's time for monads.

Question 5

I eventually modify the {-WRONG-} part to do what it should with a call to mark x (b, n) [] xs table where

        mark :: Integer -> (Bool, Integer) -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
        mark crtElem (b, n) list xs table
            | b == False    = mark n (findElem n table) (crtElem:list) xs table
            | otherwise = continueWith n list xs table

        continueWith :: Integer -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
        continueWith _   []     xs table    = solve' xs table
        continueWith cnt (y:ys) xs table    = continueWith (cnt - 1) ys xs (Map.insert y (True, cnt - 1) table)

        findElem :: Integer -> Map Integer (Bool, Integer) -> (Bool, Integer)
        findElem n table = 
            case Map.lookup n table of
                Nothing     -> (False, 0)
                Just (b, nr)    -> (b, nr)

But it seams that there are better (and far less verbose) answers than this 1

Question 6

Maybe you might find interesting how I solved the problem. Its is pretty functional though it might be not the most efficient thing on earth :)

You can find the code here: https://github.com/fmancinelli/project-euler/blob/master/haskell/project-euler/Problem014.hs

P.S.: Disclaimer: I was doing Project Euler exercises in order to learn Haskell, so the quality of the solution could be debatable.

Question 7

Since we are studying recursion schemes, here's one for you.

Let's consider functor N(A,B,X)=A+B*X, which is a stream of Bs with the last element being A.

{-# LANGUAGE DeriveFunctor
           , TypeFamilies
           , TupleSections #-}

import Data.Functor.Foldable
import qualified Data.Map as M
import Data.List
import Data.Function
import Data.Int

data N a b x = Z a | S b x deriving (Functor)

This stream is handy for several kinds of iterations. For one, we can use it to represent a chain of Ints in a Collatz sequence:

type instance Base Int64 = N Int Int64

instance Foldable Int64 where
  project 1 = Z 1
  project x | odd x = S x $ 3*x+1
  project x = S x $ x `div` 2

This is just a algebra, not a initial one, because the transformation is not a isomorphism (same chain of Ints is part of a chain for 2*x and (x-1)/3), but this is sufficient to represent the fixpoint Base Int64 Int64.

With this definition, cata is going to feed the chain to the algebra given to it, and you can use it to construct a memo Map of integers to the chain length. Finally, anamorphism can use it to generate a stream of solutions to the problem of different sizes:

problems = ana (uncurry $ cata . phi) (M.empty, 1) where
    phi :: M.Map Int64 Int -> 
           Base Int64 (Prim [(Int64, Int)] (M.Map Int64 Int, Int64)) ->
           Prim [(Int64, Int)] (M.Map Int64 Int, Int64)
    phi m (Z v) = found m 1 v
    phi m (S x ~(Cons (_, v') (m', _))) = maybe (notFound m' x v') (found m x) $
                                          M.lookup x m

The ~ before (Cons ...) means lazy pattern matching. We don't touch the pattern until the values are needed. If not for lazy pattern matching, it would always construct the whole chain, and using the map would be useless. With lazy pattern matching we only construct the values v' and m' if the chain length for x was not in the map.

Helper functions construct the stream of (Int, chain length) pairs:

    found m x v = Cons (x, v) (m, x+1)
    notFound m x v = Cons (x, 1+v) (M.insert x (1+v) m, x+1)

Now just take the first 999999 problems, and figure out the one that has the longest chain:

main = print $ maximumBy (compare `on` snd) $ take 999999 problems

This works slower than array-based solution, because Map lookup is logarithmic of map size, but this solution is not fixed size. Still, it finishes in about 5 seconds.