Question

My application involves heavy array operations (e.g. log(1) indexing), thus Data.Vector and Data.Vector.Unboxed are preferred to Data.List. It also involves many set operations (e.g. intersectBy), which however, are not provided by the Data.Vector.

Each of these functions can be implemented like in Data.List in 3-4 lines . Is there any reason they all not implemented with Data.Vector? I can only speculate. Maybe set operations in Data.Vector is discouraged for performance reasons, i.e. intersectBy would first produce the intersection through list comprehension and then convert the list into a Data.Vector?

Était-ce utile?

La solution

I assume it's missing because intersection of unsorted, immutable arrays must have a worst-case run time of Ω(n*m) without using additional space and Data.Vector is optimized for performance. If you want, you can write that function yourself, though:

import Data.Vector as V

intersect :: Eq a => V.Vector a -> V.Vector a -> V.Vector a
intersect x = V.filter (`V.elem` x)

Or by using a temporary set data structure to achieve an expected O(n + m) complexity:

import Data.HashSet as HS

intersect :: (Hashable a, Eq a) => V.Vector a -> V.Vector a -> V.Vector a
intersect x = V.filter (`HS.member` set)
    where set = HS.fromList $ V.toList x

If you can afford the extra memory usage, maybe you can use some kind of aggregate type for your data, for example an array for fast random access and a hash trie like Data.HashSet for fast membership checks and always keep both containers up to date. That way you can reduce the asymptotic complexity for intersection to something like O(min(n, m))

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top