Question

This a topic on Haskell discussed a lot (e.g. mutable-array-implementation), but I am still not sure what is the best practice for the case requiring frequent modification and random-access of an array/vector.

Say a vector of length 1,000,000. Operation on it involves accessing a (small, e.g 1000) subset of it based on input, and modifying the values based on the input. Furthermore, such operation are repeated 2,000,000 times. The task itself can be implemented in pure data structure such as list as the following for example, though very inefficient:

type Vect = [Int]

f :: Vect -> [[Int]] -> Vect
f x indsList = foldl g x indsList

-- g is just an example of random-access and modifications on the values.
g :: Vect -> [Int] -> Vect
g x inds = map h $ zip x [0..]
    where h (x, i) = if i `elem` inds then x !! i + 1 else x !! i

Hash/Map data structure (e.g. IntMap) could be used for efficient large amounts of random-accesses, but array/vector should do it too. More importantly, the large amount of modifications is still need to be addressed by a mutable structure to avoid memory replication. Is there a mutable, random-acesss array/vector in Haskell indeed? If ST/IO Monads used, does such controls affect performance in my settings?

Was it helpful?

Solution

Haskell does have efficient mutable arrays.

  • There is STUArray, which has the quite sophisticated but often just unnecessary Ix-indexing methodology with many bounds-checks and little special optimisation, which makes it a bit slower than theoretically possible.

  • All Data.Vector have very little overhead, make heavy use of stream fusion optimisation, prefer a simple, "list-like" interface. That means you can actually transfer your example very easily directly to immutable vectors, and may still get better performance than you might expect:

    import Data.Vector.Unboxed as VU
    
    type Vect = VU.Vector Int
    
    f :: Vect -> [[Int]] -> Vect
    f x indsList = VU.foldl g x indsList
    
    
    g :: Vect -> [Int] -> Vect
    g x inds = VU.zipWith h x [0..]
        -- h is just an example of modifications on the values.
        where h x i
               | i`elem`inds   = x VU.! i + 1
               | otherwise     = x VU.! i
    

Yes, you probably want to work in the ST monad for mutable updates. Not sure what you mean by "does such controls affect performance": there isn't really any "control" that's not also present in imperative languages, once the compiler has optimised away the proven type-safety. Which GHC can do quite well; you can get pretty close to C performance with Data.Vector.Unboxed. There is always some inevitable overhead, but that has mostly to do with garbage-collection etc. issues, which you would also get in Java.

Since ST and IO are monads, the compiler can in fact do some more high-level optimisations that wouldn't be possible in an imperative language, though compiler's aren't really that far yet.

Performance, particularly of array operations, is discussed in many places, for instance in RWH.

OTHER TIPS

Foreign UArrays from Yarr are mutable, random-access and maximum fast. Also they are "quick and dirty", i. e. don't impose freezing/thawing boilerplate for every mutation operation.

Disadvantage: almost all "low-level" operations are under IO.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top