Pergunta

I want to challenge GHC compiler, so I wrote this code (the detail of the code is actually unimportant, only to show that some hard work has to be done to get each element of this infinite list):

hardwork :: [Int]
hardwork = step1 where
    -- start with value 1
    step1 = step2 1 
    -- force the hard work being done
    step2 i = step3 i $! step4 i
    -- construct the current result, start next 
    step3 i result = result : step2 (i+1)
    -- start the work with n=0 and j=1
    step4 i = step5 i 0 1
    -- if n=1048576, finish work and return j
    step5 _ 1048576 j = j
    -- otherwise increase n, and modify j
    step5 i n j = step5 i (n+1) ((j * i) `mod` 1048575)

Now I use the cleave function described in the Haskellwiki

cleave :: [a] -> ([a],[a])
cleave = step1 where
    step1 xs = (odds xs, evens xs)
    odds [] = []
    odds [x] = [x]
    odds (x:_:xs) = x : odds xs
    evens [] = []
    evens [x] = []
    evens (_:x:xs) = x : evens xs

and the main function

main :: IO ()
main = do
    print (take 5 (fst $ cleave hardwork), take 4 (snd $ cleave hardwork))

Like expected, it prints out values slowly, since it has to do very hard work to get the results. However what surprising me is, once the fist list being printed out, the second list was calculated immediately.

This is a surprise because, since the two occurrence of cleave hardwork seem to be unrelated in the code, and we are accessing the different part of them, it looks like a naive implementation will do the hard work again to get the second list. However GHC seems to be cleverer than I think.

My question is: how did they manage to do so? What is the magic behind this? More precisely, how the runtime figure out some requested value has already been evaluated (even if they were never being accessed)? Are there any cost for this kind of bookkeeping?

BTW, to ensure I am doing the right thing in the right way, I used a un-sugared, step by step style to define hardwork. There are of cause other ways to implement it but if it uses any sugars the behaviour may depend on the detail how the code being de-sugared by the compiler. Also, this step-by-step style makes paper evaluation by manually substitute expressions easier.

EDIT

So according to the answers, I rewrote hardwork to make it not a CAF (this is a more generic way to do so than the answer suggested anyway):

hardwork :: a -> [Int]
hardwork = step1 where
    step1 _ = step2 1
    ...

Now it results in the main running slow on both part of the result. But if I replace main with

print (take 5 $ fst value, take 6 $ snd value) where value = cleave hardwork()

It works the same way the first version do. So it looks like a prove of what the accepted answer said.

Foi útil?

Solução

hardwork is a constant, defined at the top level of your program, so once it is computed once, its results are saved (just as if you had started main with let hardwork = ... in ...). If you wanted to compute it twice, you could define it as a function, and either ignore the first argument or use it as the seed, for example by changing the first few lines of hardwork to

hardwork :: Int -> [Int]
hardwork seed = step1 where
    step1 = step2 seed

Then if you call hardwork 1 twice, the same list will be recomputed each time.

Outras dicas

hardwork is a CAF, not a function, so its computation result doesn't get GC'd between accesses.

How to make a CAF not a CAF in Haskell?

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top