Space leak with redundant use of seq in GHC interpreter
-
11-06-2021 - |
Question
I type this code into the interpreter and memory is rapidly consumed:
last [1..10^7] `seq` ()
I can't see why this needs more than O(1) space. If i do just (which should be the same, because Show forces weak head normal form, so seq is redundant?):
last [1..10^7]
...it works fine.
I'm unable to reproduce this situation outside the interpreter.
What's going on here?
Here are some test cases: http://hpaste.org/69234
Things to note:
- By running in the interpreter, I load wtf.hs without compiling it, and type
wtf<n>
in ghci. - By compiling, I do
ghc --make wtf.hs && ./wtf
. last
can be substituted for asum
with strict accumulator or a function that finds the max element in the list, and the space leak still happens- I haven't seen this behaviour when using
$!
instead ofseq
. - I tried adding a dummy
()
parameter because I thought maybe this is a CAF problem. Changes nothing. - It's probably not a problem with functions on
Enum
, because I can reproduce the behaviour withwtf5
and later, which don't useEnum
at all. - It's probably not a problem with
Num
,Int
, orInteger
, because I can reproduce the behaviour without them inwtf14
andwtf16
.
I tried reproducing the problem with Peano arithmetic to take lists and integers out of the equation (fetching the at the end of of 10^9), but ran into other sharing/space leak problems I don't understand when trying to implement *
.
No correct solution
OTHER TIPS
We need to look at the instance of enumFromTo
for Integer, and last:
last [] = errorEmptyList "last"
last (x:xs) = last' x xs
where last' y [] = y
last' _ (y:ys) = last' y ys
It is defined in GHC.Enum as:
enumFrom x = enumDeltaInteger x 1
enumFromThen x y = enumDeltaInteger x (y-x)
enumFromTo x lim = enumDeltaToInteger x 1 lim
where
enumDeltaInteger :: Integer -> Integer -> [Integer]
enumDeltaInteger x d = x `seq` (x : enumDeltaInteger (x+d) d)
-- strict accumulator, so
-- head (drop 1000000 [1 .. ]
-- works
and
enumDeltaToInteger :: Integer -> Integer -> Integer -> [Integer]
enumDeltaToInteger x delta lim
| delta >= 0 = up_list x delta lim
| otherwise = dn_list x delta lim
up_list :: Integer -> Integer -> Integer -> [Integer]
up_list x0 delta lim = go (x0 :: Integer)
where
go x | x > lim = []
| otherwise = x : go (x+delta)
last
is fully lazy, as expected.
For the Integer Enum class, we have a strict accumulator (explicitly) for enumFrom
. In the bounded case (e.g. [1..n]
), it calls enumDeltaToInteger
and then into up_list
, which uses a worker to unfold a list until its limit is reached.
But up_list
is strict in x
in the go
helper (see the comparison against lim
).
When run in GHCi none of this is optimized, yielding naive calls to enumFromTo, before returning ()
.
let
it_ax6 :: ()
it_ax6 =
case last
@ GHC.Integer.Type.Integer
(GHC.Enum.enumFromTo
@ GHC.Integer.Type.Integer
GHC.Num.$fEnumInteger
(GHC.Integer.smallInteger 1)
(GHC.Real.^
@ GHC.Integer.Type.Integer
@ GHC.Integer.Type.Integer
GHC.Num.$fNumInteger
GHC.Real.$fIntegralInteger
(GHC.Integer.smallInteger 10)
(GHC.Integer.smallInteger 7)))
of _ -> GHC.Unit.()
in
GHC.Base.thenIO
@ ()
@ [()]
(System.IO.print @ () GHC.Show.$fShow() it_ax6)
(GHC.Base.returnIO
@ [()] (GHC.Types.: @ () it_ax6 (GHC.Types.[] @ ())))
So, why are we retaining the list in the seq
case, and not in the regular case? The regular case runs nicely in constrant space, relying on the laziness of enumFromTo
for Integer
and for last
. The GHCi core for that case looks like:
let {
it_aKj :: GHC.Integer.Type.Integer
[LclId,
Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=0, Value=False,
ConLike=False, Cheap=False, Expandable=False,
Guidance=IF_ARGS [] 170 0}]
it_aKj =
GHC.List.last
@ GHC.Integer.Type.Integer
(GHC.Enum.enumFromTo
@ GHC.Integer.Type.Integer
GHC.Num.$fEnumInteger
(GHC.Integer.smallInteger 1)
(GHC.Real.^
@ GHC.Integer.Type.Integer
@ GHC.Integer.Type.Integer
GHC.Num.$fNumInteger
GHC.Real.$fIntegralInteger
(GHC.Integer.smallInteger 10)
(GHC.Integer.smallInteger 7))) } in
GHC.Base.thenIO
@ ()
@ [()]
(System.IO.print
@ GHC.Integer.Type.Integer GHC.Num.$fShowInteger it_aKj)
(GHC.Base.returnIO
@ [()]
(GHC.Types.:
@ ()
(it_aKj
`cast` (UnsafeCo GHC.Integer.Type.Integer ()
:: GHC.Integer.Type.Integer ~ ()))
(GHC.Types.[] @ ())))
So these are almost identical, with the differences being:
- in the
seq
version,last (enumFromTo ..)
is forced inside acase
. - in the regular version, it is a lazy
let
. - in the
seq
version, the value is computed then discarded, yielding a()
-- nothing looks at the result - in the regular case, it is inspected and shown.
What is odd is that there's nothing magic about:
let x = case last (enumFromTo 1 n) of _ -> ()
that makes it retain values.
As we see, the implementation of up_list
is strict in its accumulator (since it compares against lim
, and the list is unfolded lazily -- so last
should be able to consume it in constant space). Writing the expression by hand confirms this.
Doing a heap profile of the ghci execution shows the entire list being retained:
which tells us at least that it isn't a chain of thunks, but rather, the entire list is being built strictly and held on to, until being discarded.
So the mystery is: what is holding onto the list argument to last
in ghci, and not in ghc?
I suspect some internal (or subtle) detail of ghci now -- I think this is worth a ghci ticket.
I think @n.m is right. Nothing is forcing the value in the list, so the 1+1+1+1+... thunk eventually kills space.
I will whip up a quick test.