The most efficient way is to use the Partitioned array representation (your 4th option). However, it is inconvenient, because you have to handle 5 regions (the inner region plus four border strips) by hand.
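To make the "5 regions" concrete: for a stencil of radius r on an h-by-w array, the points where the stencil never leaves the array form an inner rectangle, and everything else splits into four border strips. The sketch below is a plain-Haskell illustration with hypothetical names (`Region`, `partitionFive`, `inside`). It is not Repa's API, and it assumes h and w are both at least 2r.

```haskell
-- Hypothetical illustration (not Repa's API): a rectangle given by its
-- top-left corner (row, col) and its extent (rows, cols).
data Region = Region (Int, Int) (Int, Int)
  deriving (Eq, Show)

-- Split an h-by-w array into the five regions used for a stencil of
-- radius r. Assumes h >= 2 * r and w >= 2 * r.
partitionFive :: Int -> (Int, Int) -> [Region]
partitionFive r (h, w) =
  [ Region (r, r)     (h - 2 * r, w - 2 * r) -- inner: no boundary handling needed
  , Region (0, 0)     (r, w)                 -- top strip, full width
  , Region (h - r, 0) (r, w)                 -- bottom strip, full width
  , Region (r, 0)     (h - 2 * r, r)         -- left strip, between top and bottom
  , Region (r, w - r) (h - 2 * r, r)         -- right strip, between top and bottom
  ]

-- Does a region contain the index (i, j)?
inside :: Region -> (Int, Int) -> Bool
inside (Region (r0, c0) (nr, nc)) (i, j) =
  i >= r0 && i < r0 + nr && j >= c0 && j < c0 + nc
```

Only the inner region can use unchecked indexing; each of the four strips needs its own boundary logic, and that per-region bookkeeping is what makes the Partitioned approach tedious to write by hand.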
In Yarr, you could write a border-handling utility that wraps out-of-range indices around:
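```haskell
{-# LANGUAGE FlexibleContexts, QuasiQuotes #-}
import Data.Yarr
import Data.Yarr.Convolution

-- Border getter: out-of-bounds indices wrap to the opposite edge (torus).
dim2WrapAround :: USource r l Dim2 a => UArray r l Dim2 a -> Dim2 -> Dim2 -> IO a
{-# INLINE dim2WrapAround #-}
dim2WrapAround arr (sizeX, sizeY) (posX, posY) =
    index arr (wrap sizeX posX, wrap sizeY posY)
  where wrap size pos = (pos + size) `mod` size  -- e.g. -1 wraps to size - 1

-- Type signature omitted: it is long (I'm afraid to write it out).
{-# INLINE convolveOnThorus #-}
convolveOnThorus = convolveLinearDim2WithStaticStencil dim2WrapAround
```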
Usage:
```haskell
myConvolution :: UArray F L Dim2 Float -> UArray CV CVL Dim2 Float
-- Example coefficients (unnormalized 3x3 blur); substitute your own.
myConvolution = convolveOnThorus [dim2St| 1 2 1
                                          2 4 2
                                          1 2 1 |]
```
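If you want to check the wrap-around semantics without Yarr, here is a minimal plain-Haskell sketch using `Data.Array` from the `array` package. `convolveTorus` is a hypothetical name; it assumes the array's bounds start at (0, 0) and that the stencil is a square list of lists with an odd side. It applies the coefficients without flipping the kernel (correlation), which coincides with convolution for symmetric kernels such as a blur. It is a slow reference version, not a replacement for Yarr's stencil machinery.

```haskell
import Data.Array

-- Same rule as dim2WrapAround. For a positive size, Haskell's `mod` is
-- already in [0, size), so the extra "+ size" is not needed here.
wrap :: Int -> Int -> Int
wrap size pos = pos `mod` size

-- Hypothetical reference implementation: apply a square stencil with odd
-- side to a 2D array whose bounds start at (0, 0), treating it as a torus.
-- Coefficients are applied without flipping the kernel (correlation).
convolveTorus :: Num a => [[a]] -> Array (Int, Int) a -> Array (Int, Int) a
convolveTorus stencil arr = array bnds
  [ ((i, j), sum [ k * arr ! (wrap h (i + di), wrap w (j + dj))
                 | (di, row) <- zip [-r ..] stencil
                 , (dj, k)   <- zip [-r ..] row ])
  | i <- [0 .. h - 1], j <- [0 .. w - 1] ]
  where
    bnds@(_, (hi, wi)) = bounds arr
    h = hi + 1                  -- number of rows
    w = wi + 1                  -- number of columns
    r = length stencil `div` 2  -- stencil radius
```

For example, `convolveTorus [[1, 2, 1], [2, 4, 2], [1, 2, 1]] arr` is an unnormalized 3x3 blur in which pixels on one edge see their neighbours on the opposite edge.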