Pointfree version worsens the performance

Question 1

For the simple test case, both versions yield the same core when compiled with optimisations, but without profiling.

When compiling with profiling enabled (-prof -fprof-auto), the pointfull version gets inlined, resulting in the main part being

Rec {
Main.main_go [Occ=LoopBreaker]
  :: [GHC.Integer.Type.Integer] -> [GHC.Integer.Type.Integer]
[GblId, Arity=1, Str=DmdType S]
Main.main_go =
  \ (ds_asR :: [GHC.Integer.Type.Integer]) ->
    case ds_asR of _ {
      [] -> xs_r1L8;
      : y_asW ys_asX ->
        let {
          r_aeN [Dmd=Just S] :: [GHC.Integer.Type.Integer]
          [LclId, Str=DmdType]
          r_aeN = Main.main_go ys_asX } in
        scctick<opEvery.\>
        GHC.Base.map
          @ GHC.Integer.Type.Integer
          @ GHC.Integer.Type.Integer
          (GHC.Integer.Type.plusInteger y_asW)
          r_aeN
    }
end Rec }

(you get something better without profiling).

When compiling the pointfree version with profiling enabled, opEvery' is not inlined, and you get

Main.opEvery'
  :: forall a_aeW.
     (a_aeW -> a_aeW -> a_aeW) -> [a_aeW] -> [a_aeW] -> [a_aeW]
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, WorkFree=False, Expandable=False,
         Guidance=IF_ARGS [] 80 60}]
Main.opEvery' =
  \ (@ a_c) ->
    tick<opEvery'>
    \ (x_ass :: a_c -> a_c -> a_c) ->
      scc<opEvery'>
      GHC.Base.foldr
        @ a_c
        @ [a_c]
        (\ (x1_XsN :: a_c) -> GHC.Base.map @ a_c @ a_c (x_ass x1_XsN))

Main.main4 :: [GHC.Integer.Type.Integer]
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, WorkFree=False, Expandable=False,
         Guidance=IF_ARGS [] 40 0}]
Main.main4 =
  scc<main>
  Main.opEvery'
    @ GHC.Integer.Type.Integer
    GHC.Integer.Type.plusInteger
    Main.main7
    Main.main5

When you add an {-# INLINABLE opEvery' #-} pragma, it can be inlined even when compiling for profiling, giving

Rec {
Main.main_go [Occ=LoopBreaker]
  :: [GHC.Integer.Type.Integer] -> [GHC.Integer.Type.Integer]
[GblId, Arity=1, Str=DmdType S]
Main.main_go =
  \ (ds_asz :: [GHC.Integer.Type.Integer]) ->
    case ds_asz of _ {
      [] -> lvl_r1KU;
      : y_asE ys_asF ->
        GHC.Base.map
          @ GHC.Integer.Type.Integer
          @ GHC.Integer.Type.Integer
          (GHC.Integer.Type.plusInteger y_asE)
          (Main.main_go ys_asF)
    }
end Rec }

which is even a bit faster than the pragma-less pointfull version, since it doesn't need to tick the counters.

It is likely that a similar effect occurred for the Stream case.

The takeaway:

Profiling inhibits optimisations. Code that is equivalent without profiling may not be with profiling support.
Never measure performance using code that was compiled for profiling or without optimisations.
Profiling can help you find out where the time is spent in your code [but, occasionally, enabling profiling can entirely alter the behaviour of the code; anything relying heavily on rewrite-rule optimisations and/or inlining is a candidate for that to happen], but it cannot tell you how fast your code is.

Question 2

This is a big assumption on what I'm going to say, but I think the compiler has got not enough information to optimize your program. While not answering directly to your question but adding the Eq a constraint to both functions (as a test) I've got an improvement from the pointfree variant. See image attached (splits explanation)

Right -> TOP = everyOp initial, BOTTOM = everyOp' initial
Left  -> TOP = everyOp with Eq a constraint, BOTTOM = everyOp' Eq a constraint

enter image description here

EDIT: I'm using GHC 7.4.2

Pointfree version worsens the performance

Simple test case