My application involves heavy array operations (e.g. log(1) indexing), thus Data.Vector and Data.Vector.Unboxed are preferred to Data.List. It also involves many set operations (e.g. intersectBy), which however, are not provided by the Data.Vector.

Each of these functions can be implemented like in Data.List in 3-4 lines . Is there any reason they all not implemented with Data.Vector? I can only speculate. Maybe set operations in Data.Vector is discouraged for performance reasons, i.e. intersectBy would first produce the intersection through list comprehension and then convert the list into a Data.Vector?

有帮助吗?

解决方案

I assume it's missing because intersection of unsorted, immutable arrays must have a worst-case run time of Ω(n*m) without using additional space and Data.Vector is optimized for performance. If you want, you can write that function yourself, though:

import Data.Vector as V

intersect :: Eq a => V.Vector a -> V.Vector a -> V.Vector a
intersect x = V.filter (`V.elem` x)

Or by using a temporary set data structure to achieve an expected O(n + m) complexity:

import Data.HashSet as HS

intersect :: (Hashable a, Eq a) => V.Vector a -> V.Vector a -> V.Vector a
intersect x = V.filter (`HS.member` set)
    where set = HS.fromList $ V.toList x

If you can afford the extra memory usage, maybe you can use some kind of aggregate type for your data, for example an array for fast random access and a hash trie like Data.HashSet for fast membership checks and always keep both containers up to date. That way you can reduce the asymptotic complexity for intersection to something like O(min(n, m))

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top