Set operators not provided by Data.Vector of Haskell, what is the reason

https://stackoverflow.com/questions/15708971

30-03-2022
|

题

My application involves heavy array operations (e.g. log(1) indexing), thus Data.Vector and Data.Vector.Unboxed are preferred to Data.List. It also involves many set operations (e.g. intersectBy), which however, are not provided by the Data.Vector.

Each of these functions can be implemented like in Data.List in 3-4 lines . Is there any reason they all not implemented with Data.Vector? I can only speculate. Maybe set operations in Data.Vector is discouraged for performance reasons, i.e. intersectBy would first produce the intersection through list comprehension and then convert the list into a Data.Vector?

解决方案

I assume it's missing because intersection of unsorted, immutable arrays must have a worst-case run time of Ω(n*m) without using additional space and Data.Vector is optimized for performance. If you want, you can write that function yourself, though:

import Data.Vector as V

intersect :: Eq a => V.Vector a -> V.Vector a -> V.Vector a
intersect x = V.filter (`V.elem` x)

Or by using a temporary set data structure to achieve an expected O(n + m) complexity:

import Data.HashSet as HS

intersect :: (Hashable a, Eq a) => V.Vector a -> V.Vector a -> V.Vector a
intersect x = V.filter (`HS.member` set)
    where set = HS.fromList $ V.toList x

If you can afford the extra memory usage, maybe you can use some kind of aggregate type for your data, for example an array for fast random access and a hash trie like Data.HashSet for fast membership checks and always keep both containers up to date. That way you can reduce the asymptotic complexity for intersection to something like O(min(n, m))

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow