Question

I have a problem with writing a simple function without too much repeating myself, below is a simplified example. The real program I am trying to write is a port of an in-memory database for a BI server from python. In reality there are more different types (around 8) and much more logic, that is mostly expressible as functions operating on polymorphic types, like Vector a, but still some logic must deal with different types of values.

Wrapping each value separatly (using [(Int, WrappedValue)] type) is not an option due to efficiency reasons - in real code I am using unboxed vectors.

type Vector a = [(Int, a)] -- always sorted by fst

data WrappedVector = -- in fact there are 8 of them
      FloatVector (Vector Float)
    | IntVector (Vector Int)
    deriving (Eq, Show)

query :: [WrappedVector] -> [WrappedVector] -- equal length
query vectors = map (filterIndexW commonIndices) vectors
    where
        commonIndices = intersection [mapFstW vector | vector <- vectors]

intersection :: [[Int]] -> [Int]
intersection = head -- dummy impl. (intersection of sorted vectors)

filterIndex :: Eq a => [Int] -> Vector a -> Vector a
filterIndex indices vector = -- sample inefficient implementation
    filter (\(idx, _) -> idx `elem` indices) vector

mapFst :: Vector a -> [Int]
mapFst = map fst

-- idealy I whould stop here, but I must write repeat for all possible types
-- and kinds of wrapped containers and function this:

filterIndexW :: [Int] -> WrappedVector -> WrappedVector
filterIndexW indices vw = case vw of
    FloatVector v -> FloatVector $ filterIndex indices v
    IntVector   v -> IntVector $ filterIndex indices v

mapFstW :: WrappedVector -> [Int]
mapFstW vw = case vw of
    FloatVector v -> map fst v
    IntVector   v -> map fst v

-- sample usage of query
main = putStrLn $ show $ query [FloatVector [(1, 12), (2, -2)],
                                IntVector   [(2, 17), (3, -10)]]

How can I express such code without wrapping and unwrapping like in mapFstW and filterIndexW functions?

Was it helpful?

Solution

If you're willing to work with a few compiler extensions, ExistentialQuantification solves your problem nicely.

{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE StandaloneDeriving #-}
module VectorTest where

type PrimVector a = [(Int, a)]

data Vector = forall a . Show a => Vector (PrimVector a)

deriving instance Show Vector

query :: [Vector] -> [Vector] -- equal length
query vectors = map (filterIndex commonIndices) vectors
    where
        commonIndices = intersection [mapFst vector | vector <- vectors]

intersection :: [[Int]] -> [Int]
intersection = head -- dummy impl. (intersection of sorted vectors)

filterIndex :: [Int] -> Vector -> Vector
filterIndex indices (Vector vector) = -- sample inefficient implementation
    Vector $ filter (\(idx, _) -> idx `elem` indices) vector

mapFst :: Vector -> [Int]
mapFst (Vector l) = map fst l

-- sample usage of query
main = putStrLn $ show $ query [Vector [(1, 12), (2, -2)],
                                Vector [(2, 17), (3, -10)]]

The StandaloneDeriving requirement can be removed if you write a manual Show instance for Vector, e.g.

instance Show Vector where
    show (Vector v) = show v

OTHER TIPS

The standard option for wrapping a single type without a performance hit is to do

{-# LANGUAGE GeneralizedNewtypeDeriving #-} -- so we can derive Num
newtype MyInt = My Int deriving (Eq,Ord,Show,Num)
newtype AType a = An a deriving (Show, Eq)

Because it creates a difference only at the type level - the data representation is identical because it all gets compiled away. You can even specify that values are unboxed, BUT... this doesn't help you here because you're wrapping multiple types.

The real problem is that you're trying to represent a dynamically typed solution in a staticly typed language. There is necessarily a performance hit for dynamic typing which is hidden from you in a dynamic language but made explicit here in tagging.

You have two solutions:

  1. Accept that dynamic typing involves additional runtime checks over static typing, and live with the ugly.
  2. Reject the need for dynamic typing, accepting that polymorphic typing tidies up all the code and moves the type checking to compile time and data aquisition.

I feel that 2 is by far the best solution, and you should give up trying to list an program all the types you want to use, instead programming to use any type. It's neat, clear and efficient. You check validity and handle it once, then stop worrying.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top