Optimizing longest Collatz chain in Haskell

Question 1

The answer is simple: don’t memoise numbers above one million, but do that with numbers below.

module Main where

import qualified Data.Map as M
import Data.List
import Data.Ord

main = print $ fst $ maximumBy (comparing snd) $ M.toList ansMap

ansMap :: M.Map Integer Int
ansMap = M.fromAscList [(i, collatz i) | i <- [1..1000000]]
  where collatz 1 = 0
        collatz x = if x' <= 1000000 then 1 + ansMap M.! x'
                                     else 1 + collatz x'
          where x' = if even x then x `div` 2 else x*3 + 1

Question 2

This is obv waaay late but I thought I'd post anyways for future readers' benefit (I imagine OP is long done with this problem).

TL;DR: I think we probably want to use the Data.Vector package for this problem (and similar types of problems).

Longer version:

According to the Haskell docs, a Map (from Data.Map) has O(log N) access time whereas a Vector (from Data.Vector) has O(1) access; we can see the difference in the results below: the vector implementation runs ~3x faster. (Both are way better than lists which have O(N) access time.)

A couple of benchmarks are included below. The tests were intentionally not run one after another so as to prevent any cache-based optimization.

A couple of observations:

The largest absolute improvement (from the code in the original post) was due to the addition of type signatures; without being explicitly told that the data was of type Int, Haskell's type system was inferring that the data was of type Integer (which is obv bigger and slower)
A bit counterintuitive but, results are virtually indistinguishable between foldl1' and foldl1. (I double checked the code and ran these a few times just to make sure.)
Vector and Array (and, to a certain extent, Map) allow for decent improvement primarily as a result of memoization. (Note that OP's solution is likely a lot faster than a list-based solution that tried to use memoization given lists' O(N) access time.)

Here are a couple of benchmarks (all compiled using O2):

                                                       Probably want to look
                                                         at these numbers        
                                                                |             
                                                                V         
Data.Vector                     0.35s user 0.10s system 97% cpu 0.468 total
Data.Array (Haskell.org)        0.31s user 0.21s system 98% cpu 0.530 total
Data.Map (above answer)         1.31s user 0.46s system 99% cpu 1.772 total
Control.Parallel (Haskell.org)  1.75s user 0.05s system 99% cpu 1.799 total
OP (`Int` type sigs + foldl')   3.60s user 0.06s system 99% cpu 3.674 total
OP (`Int` type sigs)            3.53s user 0.16s system 99% cpu 3.709 total
OP (verbatim)                   3.36s user 4.77s system 99% cpu 8.146 total

Source of figures from Haskell.org: https://www.haskell.org/haskellwiki/Euler_problems/11_to_20#Problem_14

The Data.Vector implementation used to generate the above results:

import Data.Vector ( Vector, fromList, maxIndex, (!) )

main :: IO ()
main = putStrLn $ show $ largestCollatz 1000000

largestCollatz :: Int -> Int
largestCollatz n = maxIndex vector
  where 
    vector :: Vector Int               
    vector = fromList $ 0 : 1 : [collatz x x 0 | x <- [2..n]]

    collatz m i c =
      case i < m of
        True  -> c + vector ! i
        False -> let j = if even i then i `div` 2 else 3*i + 1
                 in collatz m j (c+1)