Looking at the core from 7.6.1, with -O2
and -dsuppress-uniques
, the function that does the work, Main.main_$spoly_$wa
is structurally (almost) identical whether I use int
or Int64
as the index type. Since the core is long and complicated, here's the diff
output:
$ diff Int_core.dump-simpl Int64_core.dump-simpl
11,12c11,12
< (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int)
< (Main.C (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int))
---
> (Data.Array.Base.STUArray s GHC.Int.Int64 GHC.Types.Int)
> (Main.C (Data.Array.Base.STUArray s GHC.Int.Int64 GHC.Types.Int))
26,27c26,27
< (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int)
< (Main.C (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int)))
---
> (Data.Array.Base.STUArray s GHC.Int.Int64 GHC.Types.Int)
> (Main.C (Data.Array.Base.STUArray s GHC.Int.Int64 GHC.Types.Int)))
Different index types, these are of course different.
33,40d32
< l :: GHC.Types.Int
< [LclId]
< l = GHC.Types.I# sc } in
< let {
< u :: GHC.Types.Int
< [LclId]
< u = GHC.Types.I# sc1 } in
< let {
For index type Int
, GHC produces somewhat more informative errors for out-of-bounds indices, for that it needs the lower and upper bound of the valid indices. (The default implementation of index
is not overridden in the instance Ix Int64
.)
45,46c37
< GHC.Types.False ->
< case poly_$w$j5 (GHC.Types.I# a) l u of wild2 { };
---
> GHC.Types.False -> case GHC.Arr.hopelessIndexError of wild1 { };
Different error, indexError
vs. hopelessIndexError
. The following differences also only concern index errors.
49,50c40
< GHC.Types.False ->
< case poly_$w$j5 (GHC.Types.I# a) l u of wild2 { };
---
> GHC.Types.False -> case GHC.Arr.hopelessIndexError of wild2 { };
58c48
< case poly_$w$j4 y (GHC.Types.I# sc2) of wild3 { };
---
> case poly_$w$j3 y (GHC.Types.I# sc2) of wild4 { };
62c52
< case poly_$w$j4 y (GHC.Types.I# sc2) of wild5 { };
---
> case poly_$w$j3 y (GHC.Types.I# sc2) of wild5 { };
77,78c67
< GHC.Types.False ->
< case poly_$w$j3 (GHC.Types.I# a1) l u of wild6 { };
---
> GHC.Types.False -> case GHC.Arr.hopelessIndexError of wild6 { };
81,82c70
< GHC.Types.False ->
< case poly_$w$j3 (GHC.Types.I# a1) l u of wild7 { };
---
> GHC.Types.False -> case GHC.Arr.hopelessIndexError of wild7 { };
Now once more the different index type:
110c98
< GHC.Types.Int
---
> GHC.Int.Int64
152c140
< s GHC.Types.Int GHC.Types.Int>)>)
---
> s GHC.Int.Int64 GHC.Types.Int>)>)
And finally, 0
and 1
have gotten different top-level names.
177,178c165,166
< 0 -> (# sc5, lvl5 #);
< 1 -> (# sc5, lvl6 #)
---
> 0 -> (# sc5, lvl #);
> 1 -> (# sc5, lvl1 #)
So the entire code that does the actual work is identical. Yet the one causes a stack overflow (though only just, -K9M
suffices [-K8731K
is enough here, -K8730K
not]) and the other doesn't.
The difference is indeed caused by the index errors. The code with Int
indices allocates two boxed Int
s in every recursive call that the Int64
code doesn't allocate, because
Main.main_$spoly_$wa [Occ=LoopBreaker]
:: forall s.
GHC.Prim.Int#
-> GHC.Prim.Int#
-> GHC.Prim.Int#
-> GHC.Prim.MutableByteArray# s
-> (GHC.Prim.~#)
*
(Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int)
(Main.C (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int))
-> GHC.Prim.Int#
-> GHC.Prim.State# s
-> (# GHC.Prim.State# s, GHC.Types.Int #)
carries around two references to the array.
That uses more stack, and these boxed Int
s have to be garbage collected, which causes the much larger GC figures. Additionally, the thunk for the index error is a bit larger than the hopelessIndexError
thunk.
Now, if you help the compiler by
- removing the newtype wrapper
- making the function strict in the array (by bang patterns or
data C a = C !a
)
or some other ways, it produces better code that manages without a stack overflow for the given argument, since there is only one reference to the array in the worker, and thus the allocation of the boxed Int
s for the bounds isn't needed.
Note that this algorithm causes a stack overflow for slightly larger arguments even with the help for the compiler.