Question

There is a performance issue I am trying to debug as part of a more complicated code. It seems that append function that I am using to create a dynamic, growable vector of (Int,Int,Int,Int) is causing one of the Int in the tuple to be boxed and unboxed before being written to the vector. I wrote a simpler code that reproduces the issue - it seems to happen only when I add the vector growth functionality in append function - sample code below (it doesn't do much useful work except reproducing the issue), followed by snippets of core which show the value being boxed and unboxed:

{-# LANGUAGE BangPatterns #-}
module Test
where
import Data.Vector.Unboxed.Mutable as MU
import Data.Vector.Unboxed as U hiding (mapM_)
import Control.Monad.ST as ST
import Control.Monad.Primitive (PrimState)
import Control.Monad (when)
import GHC.Float.RealFracMethods (int2Float)
import Data.STRef (newSTRef, writeSTRef, readSTRef)
import Data.Word

type MVI1 s  = MVector (PrimState (ST s)) Int
type MVI4 s  = MVector (PrimState (ST s)) (Int,Int,Int,Int)
data Snakev s = S {-# UNPACK #-}!Int
                                !(MVI4 s)

newVI1 :: Int -> Int -> ST s (MVI1 s)
newVI1 n x = do
          a <- new n
          mapM_ (\i -> MU.unsafeWrite a i x) [0..n-1]
          return a

-- Growable array - we always append an element. It grows by factor of 1.5 if more capacity is needed
append :: Snakev s -> (Int,Int,Int,Int) -> ST s (Snakev s)
append (S i v) x = do
   if i < MU.length v then MU.unsafeWrite v i x >> return (S (i+1) v)
   else MU.unsafeGrow v (floor $! 1.5 * (int2Float $ MU.length v)) >>= (\y -> MU.unsafeWrite y i x >> return (S (i+1) y))

gridWalk :: Vector Word8 -> Vector Word8 -> MVI1 s -> MVI1 s -> Snakev s -> Int -> (Vector Word8 -> Vector Word8 -> Int -> Int ->      Int) -> ST s (Snakev s)
gridWalk a b fp snodes snakesv !k cmp = do
   let offset = 1+U.length a
       xp = offset-k
   snodep <- MU.unsafeRead snodes xp -- get the index of previous snake node in snakev array
   append snakesv (snodep,xp,xp,xp)
{-#INLINABLE gridWalk #-}

GHC generates a version of append for use in gridWalk. That function is $wa in the core - Please note the boxed Int argument:

$wa
  :: forall s.
     Int#
     -> MVI4 s
     -> Int#
     -> Int#
     -> Int#
     -> Int  ======= Boxed value - one of (Int,Int,Int,Int) is boxed
     -> State# s
     -> (# State# s, Snakev s #)
$wa =
  \ (@ s)
    (ww :: Int#)
    (ww1 :: MVI4 s)
    (ww2 :: Int#)
    (ww3 :: Int#)
    (ww4 :: Int#)
    (ww5 :: Int) === Boxed value
    (w :: State# s) ->

....
....
of ipv12 { __DEFAULT ->
              case (writeIntArray# ipv7 ww ww4 (ipv12 `cast` ...)) `cast` ...
              of ipv13 { __DEFAULT ->
              (# case ww5 of _ { I# x# ->
                 (writeIntArray# ipv10 ww x# (ipv13 `cast` ...)) `cast` ...
                 },
                 S (+# ww 1)
                   ((MV_4
                       (+# y rb)
                       ==== x below unboxed from arg ww5 ======
                       ((MVector 0 x ipv1) `cast` ...) 
                       ((MVector 0 x1 ipv4) `cast` ...)
                       ((MVector 0 x2 ipv7) `cast` ...)
                       ((MVector 0 x3 ipv10) `cast` ...))
                    `cast` ...) #)

gridWalk boxes the value when calling append:

=== function called by gridWalk ======
a :: forall s.
     Vector Word8
     -> Vector Word8
     -> MVI1 s
     -> MVI1 s
     -> Snakev s
     -> Int
     -> (Vector Word8 -> Vector Word8 -> Int -> Int -> Int)
     -> State# s
     -> (# State# s, Snakev s #)
a =
  \ (@ s)
    (a1 :: Vector Word8)
    _
    _
    (snodes :: MVI1 s)
    (snakesv :: Snakev s)
    (k :: Int)
    _
    (eta :: State# s) ->
    case k of _ { I# ipv ->
    case snodes `cast` ... of _ { MVector rb _ rb2 ->
    case a1 `cast` ... of _ { Vector _ rb4 _ ->
    let {
      y :: Int#
      y = -# (+# 1 rb4) ipv } in
    case readIntArray# rb2 (+# rb y) (eta `cast` ...)
    of _ { (# ipv1, ipv2 #) ->
    case snakesv of _ { S ww ww1 ->
    ====== y boxed below before append called ======
    $wa ww ww1 ipv2 y y (I# y) (ipv1 `cast` ...) 
    }
    }
    }
    }
    }

So, the effect seems to be boxing of value in gridWalk and unboxing in append before insertion into the vector of (Int,Int,Int,Int). Marking append INLINE doesn't change the behavior - those boxed values just move in the function body of gridWalk.

I will appreciate pointers on how to make this value unboxed. I will like to keep the functionality of append (i.e., handle vector growth when capacity is exceeded) while refactoring it.

GHC version is 7.6.1. Vector version is 0.10.

Was it helpful?

Solution

This is just a comment. I figured I would get rid of the tuple argument (adjusting the use of append in gridWalk), but the result is that (only) the last Int argument has to be bang'd to get everything unboxed, which does seem strange:

append :: Snakev s -> Int -> Int -> Int -> Int -> ST s (Snakev s)
append (S i v) a b c !d = do
   if i < len then do MU.unsafeWrite v i (a,b,c,d)
                      return $ S (i+1) v
              else do y <- MU.unsafeGrow v additional        
                      MU.unsafeWrite y i (a,b,c,d) 
                      return $ S (i+1) y
  where len = MU.length v           
        additional = floor (1.5 * int2Float len) -- this seems kind of bizarre 
                                        -- by the way; can't you stay inside Int?
                                        --  3 * (len `div` 2) or something

Edit, also, you get everything unboxed if you move the application of S (i+1) outside the do block, but I'm not sure if that gets us closer to the quarry...:

append :: Snakev s -> Int -> Int -> Int -> Int -> ST s (Snakev s)
append (S i v) a b c d = do
       if i < len then liftM (S (i+1)) $ do MU.unsafeWrite v i (a,b,c,d)
                                            return v
                  else liftM ( S (i+1)) $ do y <- MU.unsafeGrow v zzz         
                                             MU.unsafeWrite y i (a,b,c,d) 
                                             return  y
      where len = MU.length v           
            zzz = floor (1.5 * int2Float len)     

But if liftM is replaced by fmap we are back to the lone unboxed in. Things go well if liftM (S (1+i) or fmap (S (i+1) is moved all the way out to the front:

append (S i v) a b c d = S (i+1) <$> do ...
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top