Repa nested array definitions resulting in “Performing nested parallel computation sequentially…”

StackOverflow https://stackoverflow.com/questions/8214478

  •  05-03-2021
  •  | 
  •  

Question

As part of a larger problem, I am trying to define an array inside an array like so:

import Data.Array.Repa
type Arr = Array DIM2 Int

arr = force $ fromList (Z :. 5 :. 5) [1..25] :: Arr

combined :: Arr
combined = arr `deepSeqArray` 
    traverse arr (\_ -> Z :. 4 :. 4 :: DIM2) (\f (Z :. x :. y) -> 
        let reg = force $ extract f (x,y) (2,2)
        in  reg `deepSeqArray` sumAll reg)

extract :: (DIM2 -> Int) -> (Int,Int) -> (Int,Int) -> Arr
extract lookup (x0,y0) (width,height) = fromFunction bounds 
  $ \sh -> offset lookup sh
    where 
    bounds = Z :. width :. height
    offset :: (DIM2 -> Int) -> DIM2 -> Int
    offset f (Z :. x :. y) = f (Z :. x + x0 :. y + y0)

main = print combined

The extract function is using fromFunction and the lookup function provided to it, but it could also use traverse and arr ! ... for the same effect. Despite using force and deepSeqArray everywhere as early as possible, the console is filled with the message here, followed by the correct result:

Data.Array.Repa: Performing nested parallel computation sequentially. You've probably called the 'force' function while another instance was already running. This can happen if the second version was suspended due to lazy evaluation. Use 'deepSeqArray' to ensure each array is fully evaluated before you 'force' the next one.

While I haven't constructed a version with lists to compare the speeds, in the larger version performance is suffering.

Is this simply a consequence of nested array definitions and thus I should restructure my program for either the inner or outer definition to be lists? Is my extract function horrible and the cause of the problems?

The tips from this question were useful to get this far but I haven't yet gone crawling through the compiled code.

Was it helpful?

Solution

It's because 'print' implicitly forces the array as well. The inner 'force' and 'sumAll' functions invoke parallel computation, but does 'print', so you have nested parallelism. That fact that this is so non-obvious is a great sadness in the Repa 2 API.

Repa 3 addresses these sort of issues by exporting both sequential and parallel versions of 'force' and 'sumAll' etc. It also adds a tag to the array type to indicate whether the array is delayed or manifest. Repa 3 isn't finished yet, but you could use the head version on http://code.ouroborus.net/repa. It should be out shorty after GHC 7.4 later this year.

Here is a Repa 3 version of your example that runs without giving the warning about nested parallelism. Note that 'force' is now 'compute'.

import Data.Array.Repa

arr :: Array U DIM2 Int
arr = fromListUnboxed (Z :. 5 :. 5) [1..25]

combined :: Array U DIM2 Int
combined 
  = computeP $ traverse arr (\_ -> Z :. 4 :. 4 :: DIM2) 
  $ \f (Z :. x :. y) -> sumAllS $ extract f (x,y) (2,2)

extract :: (DIM2 -> Int) -> (Int,Int) -> (Int,Int) -> Array D DIM2 Int
extract lookup (x0,y0) (width,height) 
  = fromFunction bounds 
  $ \sh -> offset lookup sh
    where 
    bounds = Z :. width :. height
    offset :: (DIM2 -> Int) -> DIM2 -> Int
    offset f (Z :. x :. y) = f (Z :. x + x0 :. y + y0)

main = print combined
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top