For the simple test case, both versions yield the same Core when compiled with optimisations but without profiling.
When compiling with profiling enabled (-prof -fprof-auto), the pointfull version gets inlined, resulting in the main part being
    Rec {
    Main.main_go [Occ=LoopBreaker]
      :: [GHC.Integer.Type.Integer] -> [GHC.Integer.Type.Integer]
    [GblId, Arity=1, Str=DmdType S]
    Main.main_go =
      \ (ds_asR :: [GHC.Integer.Type.Integer]) ->
        case ds_asR of _ {
          [] -> xs_r1L8;
          : y_asW ys_asX ->
            let {
              r_aeN [Dmd=Just S] :: [GHC.Integer.Type.Integer]
              [LclId, Str=DmdType]
              r_aeN = Main.main_go ys_asX } in
            scctick<opEvery.\>
            GHC.Base.map
              @ GHC.Integer.Type.Integer
              @ GHC.Integer.Type.Integer
              (GHC.Integer.Type.plusInteger y_asW)
              r_aeN
        }
    end Rec }
(you get something better without profiling).
When compiling the pointfree version with profiling enabled, opEvery' is not inlined, and you get
    Main.opEvery'
      :: forall a_aeW.
         (a_aeW -> a_aeW -> a_aeW) -> [a_aeW] -> [a_aeW] -> [a_aeW]
    [GblId,
     Str=DmdType,
     Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
             ConLike=False, WorkFree=False, Expandable=False,
             Guidance=IF_ARGS [] 80 60}]
    Main.opEvery' =
      \ (@ a_c) ->
        tick<opEvery'>
        \ (x_ass :: a_c -> a_c -> a_c) ->
          scc<opEvery'>
          GHC.Base.foldr
            @ a_c
            @ [a_c]
            (\ (x1_XsN :: a_c) -> GHC.Base.map @ a_c @ a_c (x_ass x1_XsN))

    Main.main4 :: [GHC.Integer.Type.Integer]
    [GblId,
     Str=DmdType,
     Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
             ConLike=False, WorkFree=False, Expandable=False,
             Guidance=IF_ARGS [] 40 0}]
    Main.main4 =
      scc<main>
      Main.opEvery'
        @ GHC.Integer.Type.Integer
        GHC.Integer.Type.plusInteger
        Main.main7
        Main.main5
When you add an {-# INLINABLE opEvery' #-} pragma, it can be inlined even when compiling for profiling, giving
    Rec {
    Main.main_go [Occ=LoopBreaker]
      :: [GHC.Integer.Type.Integer] -> [GHC.Integer.Type.Integer]
    [GblId, Arity=1, Str=DmdType S]
    Main.main_go =
      \ (ds_asz :: [GHC.Integer.Type.Integer]) ->
        case ds_asz of _ {
          [] -> lvl_r1KU;
          : y_asE ys_asF ->
            GHC.Base.map
              @ GHC.Integer.Type.Integer
              @ GHC.Integer.Type.Integer
              (GHC.Integer.Type.plusInteger y_asE)
              (Main.main_go ys_asF)
        }
    end Rec }
which is even a bit faster than the pragma-less pointfull version, since it doesn't need to tick the counters.
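For orientation, here is a minimal sketch of what the two definitions and the pragma might look like in source form. It is reconstructed from the Core output above rather than copied from the original program, so the exact shape of the pointfull lambda and of the pointfree composition is an assumption, as are the small input lists in main.

```haskell
module Main where

-- Pointfull version (reconstructed from the Core above; an assumption,
-- not necessarily the original source).
opEvery :: (a -> a -> a) -> [a] -> [a] -> [a]
opEvery op xs ys = foldr (\y r -> map (op y) r) xs ys

-- Pointfree version. With the INLINABLE pragma, GHC can still inline
-- the definition when compiling for profiling, as described above.
{-# INLINABLE opEvery' #-}
opEvery' :: (a -> a -> a) -> [a] -> [a] -> [a]
opEvery' = foldr . (map .)

main :: IO ()
main = do
  -- Hypothetical inputs, chosen only to show that both versions agree.
  let xs = [10, 20] :: [Integer]
      ys = [1, 2]   :: [Integer]
  print (opEvery  (+) xs ys)
  print (opEvery' (+) xs ys)
```

Both calls print [13,23]. Only the generated Core, and hence the running time, differs between the two definitions when profiling is enabled.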
It is likely that a similar effect occurred for the Stream case.
The takeaway:
- Profiling inhibits optimisations. Code that is equivalent without profiling may not be equivalent when compiled with profiling support.
- Never measure performance using code that was compiled for profiling or without optimisations.
- Profiling can help you find out where the time is spent in your code, but it cannot tell you how fast your code is. Occasionally, enabling profiling can entirely alter the behaviour of the code; anything relying heavily on rewrite-rule optimisations and/or inlining is a candidate for that to happen.
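
To act on the second point, timings should come from a binary built with optimisations and without -prof. The following is a rough sketch of a hand-rolled timing harness. The workload function is a hypothetical stand-in for the code under test, and a dedicated benchmarking library such as criterion would usually give more reliable numbers.

```haskell
module Main where

import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Hypothetical workload standing in for the code being measured.
workload :: Int -> Integer
workload n = sum (map fromIntegral [1 .. n])

-- Run an action and return its result together with the CPU time
-- it consumed, in seconds. getCPUTime reports picoseconds.
timed :: IO a -> IO (a, Double)
timed act = do
  start <- getCPUTime
  r <- act
  end <- getCPUTime
  pure (r, fromIntegral (end - start) / 1e12)

main :: IO ()
main = do
  -- evaluate forces the result to weak head normal form, which is
  -- enough to fully compute an Integer.
  (r, secs) <- timed (evaluate (workload 1000000))
  putStrLn ("result: " ++ show r)
  putStrLn ("CPU seconds: " ++ show secs)
```

Build it with optimisations enabled (for example -O2) and without profiling flags, so that the measured code is the code you would actually ship.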