Looking at the core from 7.6.1, with `-O2`

and `-dsuppress-uniques`

, the function that does the work, `Main.main_$spoly_$wa`

is structurally (almost) identical whether I use `int`

or `Int64`

as the index type. Since the core is long and complicated, here's the `diff`

output:

```
$ diff Int_core.dump-simpl Int64_core.dump-simpl
11,12c11,12
< (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int)
< (Main.C (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int))
---
> (Data.Array.Base.STUArray s GHC.Int.Int64 GHC.Types.Int)
> (Main.C (Data.Array.Base.STUArray s GHC.Int.Int64 GHC.Types.Int))
26,27c26,27
< (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int)
< (Main.C (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int)))
---
> (Data.Array.Base.STUArray s GHC.Int.Int64 GHC.Types.Int)
> (Main.C (Data.Array.Base.STUArray s GHC.Int.Int64 GHC.Types.Int)))
```

Different index types, these are of course different.

```
33,40d32
< l :: GHC.Types.Int
< [LclId]
< l = GHC.Types.I# sc } in
< let {
< u :: GHC.Types.Int
< [LclId]
< u = GHC.Types.I# sc1 } in
< let {
```

For index type `Int`

, GHC produces somewhat more informative errors for out-of-bounds indices, for that it needs the lower and upper bound of the valid indices. (The default implementation of `index`

is not overridden in the `instance Ix Int64`

.)

```
45,46c37
< GHC.Types.False ->
< case poly_$w$j5 (GHC.Types.I# a) l u of wild2 { };
---
> GHC.Types.False -> case GHC.Arr.hopelessIndexError of wild1 { };
```

Different error, `indexError`

vs. `hopelessIndexError`

. The following differences also only concern index errors.

```
49,50c40
< GHC.Types.False ->
< case poly_$w$j5 (GHC.Types.I# a) l u of wild2 { };
---
> GHC.Types.False -> case GHC.Arr.hopelessIndexError of wild2 { };
58c48
< case poly_$w$j4 y (GHC.Types.I# sc2) of wild3 { };
---
> case poly_$w$j3 y (GHC.Types.I# sc2) of wild4 { };
62c52
< case poly_$w$j4 y (GHC.Types.I# sc2) of wild5 { };
---
> case poly_$w$j3 y (GHC.Types.I# sc2) of wild5 { };
77,78c67
< GHC.Types.False ->
< case poly_$w$j3 (GHC.Types.I# a1) l u of wild6 { };
---
> GHC.Types.False -> case GHC.Arr.hopelessIndexError of wild6 { };
81,82c70
< GHC.Types.False ->
< case poly_$w$j3 (GHC.Types.I# a1) l u of wild7 { };
---
> GHC.Types.False -> case GHC.Arr.hopelessIndexError of wild7 { };
```

Now once more the different index type:

```
110c98
< GHC.Types.Int
---
> GHC.Int.Int64
152c140
< s GHC.Types.Int GHC.Types.Int>)>)
---
> s GHC.Int.Int64 GHC.Types.Int>)>)
```

And finally, `0`

and `1`

have gotten different top-level names.

```
177,178c165,166
< 0 -> (# sc5, lvl5 #);
< 1 -> (# sc5, lvl6 #)
---
> 0 -> (# sc5, lvl #);
> 1 -> (# sc5, lvl1 #)
```

So the entire code that does the actual work is identical. Yet the one causes a stack overflow (though only just, `-K9M`

suffices [`-K8731K`

is enough here, `-K8730K`

not]) and the other doesn't.

The difference is indeed caused by the index errors. The code with `Int`

indices allocates two boxed `Int`

s in every recursive call that the `Int64`

code doesn't allocate, because

```
Main.main_$spoly_$wa [Occ=LoopBreaker]
:: forall s.
GHC.Prim.Int#
-> GHC.Prim.Int#
-> GHC.Prim.Int#
-> GHC.Prim.MutableByteArray# s
-> (GHC.Prim.~#)
*
(Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int)
(Main.C (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int))
-> GHC.Prim.Int#
-> GHC.Prim.State# s
-> (# GHC.Prim.State# s, GHC.Types.Int #)
```

carries around *two* references to the array.

That uses more stack, and these boxed `Int`

s have to be garbage collected, which causes the much larger GC figures. Additionally, the thunk for the index error is a bit larger than the `hopelessIndexError`

thunk.

Now, if you help the compiler by

- removing the newtype wrapper
- making the function strict in the array (by bang patterns or
`data C a = C !a`

)

or some other ways, it produces better code that manages without a stack overflow for the given argument, since there is only *one* reference to the array in the worker, and thus the allocation of the boxed `Int`

s for the bounds isn't needed.

Note that this algorithm causes a stack overflow for slightly larger arguments even with the help for the compiler.