Question

I often have to map multiple functions to the same data. I have implemented dpMap to do this for me:

dpMap fns = (`map` fns) . flip ($)
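
Expanded to a pointful form, this is equivalent to the following (a sketch, with dpMap' as my name for the expansion):

dpMap' :: [a -> b] -> a -> [b]
dpMap' fns dt = map ($ dt) fns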

dpMap is one function; does this mean I read the data dt just once (like a circuit fed with the same single input)? A point-free system reminds me of a circuit: just the plumbing, no data.

As an example, consider calculating the minimum and maximum of a list dt.

minimax dt = (dpMap [minimum, maximum]) dt

(I could get rid of the dt, but I would have to use -XNoMonomorphismRestriction.)
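
For illustration, the point-free version would look like this (a sketch, with dpMap as above; an explicit type signature would also sidestep the restriction):

{-# LANGUAGE NoMonomorphismRestriction #-}

dpMap fns = (`map` fns) . flip ($)

minimax = dpMap [minimum, maximum]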

Is there a performance advantage over implementing the same function in a pointful form like this:

minimax2 dt = [minimum dt, maximum dt]

EDIT: Is there a general implementation of dpMap which works with constant memory?

I found another nice blog post: http://www.haskellforall.com/2013/08/composable-streaming-folds.html; hope this helps.

EDIT2: With some more context, here is a solution. Even though I don't have an exact implementation of dpMap, the pattern is simple enough that it doesn't warrant a separate function:

minimax = (,) <$> minimum <*> maximum

Usage:

> minimax [1..100]
(1,100)
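
(As an aside, the same function Applicative gives a general dpMap almost for free: sequenceA over a list of functions applies each of them to the same argument. A sketch; note that, like the tuple version, it still traverses the input once per function:)

import Data.Traversable (sequenceA)

-- For f = ((->) a), sequenceA :: [a -> b] -> a -> [b]
dpMap :: [a -> b] -> a -> [b]
dpMap = sequenceA

> dpMap [minimum, maximum] [1..100]
[1,100]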

If you also want to calculate the sum and the length:

func = (,,,) <$> minimum <*> maximum <*> sum <*> length

Usage:

> func [1..100]
(1,100,5050,100)
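
Note that all of these applicative combinations still traverse the list once per function. For the constant-memory behaviour asked about in the EDIT, the approach described in the blog post linked above is available as the foldl package; here is a sketch, assuming that package is installed:

import qualified Control.Foldl as L

-- Folds combine applicatively, and L.fold traverses the input exactly once.
minimaxSum :: [Integer] -> (Maybe Integer, Maybe Integer, Integer)
minimaxSum = L.fold ((,,) <$> L.minimum <*> L.maximum <*> L.sum)

Usage:

> minimaxSum [1..100]
(Just 1,Just 100,5050)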


Solution 2

I'm going to take a fairly broad view of the question in this answer, mostly because of the comments under WillNess's answer.

In a blog post, Max Rabkin introduced some work on folding combinators. Conal Elliott picked up this idea and published a few more blog posts, as well as the ZipFold package on Hackage. I would highly recommend reading this material; it's short and pretty accessible. The ZipFold package is probably very usable, although it hasn't been updated for some time.

Edward Kmett's recent tour de force, lens, also includes some folding combinators. I'm not sure I'd want to use it just for that, but if you're using lens anyway it's probably worth looking into.

An alternative approach is to use parallelism. If you write

import Control.Parallel

minimax2 dt = let a = minimum dt
                  b = maximum dt
              in a `par` b `pseq` [a,b]

and link with -threaded, then it's possible for minimax2 to run in something close to constant space, depending on the vagaries of the scheduler, the phases of the moon, etc. (mostly the scheduler and the allocation patterns of the functions, IIRC). Of course this doesn't provide reliable guarantees, but it can work well in practice. Generalizing this approach to dpMap should be straightforward; you'd probably want to use Control.Parallel.Strategies or similar rather than the lower-level par directly, as sketched below.
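
A sketch of what that might look like (dpMapPar is just a name I've picked; the same caveats about constant space apply):

import Control.Parallel.Strategies (parList, rseq, using)

-- Apply every function to the same input, evaluating the results in parallel
-- so that each consumer can stream through the input independently.
dpMapPar :: [a -> b] -> a -> [b]
dpMapPar fns dt = map ($ dt) fns `using` parList rseq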

Finally, most of the iteratee-derived libraries are quite good at handling this sort of task. In general they provide explicit control over when input streams are produced and consumed. In iteratee I provide sequence_, which does almost the same as dpMap (I will be adding sequence, which will do exactly the same thing), along with a number of zips, all of which run in constant space (provided that the consuming functions are themselves constant-space). I wouldn't be surprised if most of the other packages provided similar operations.

Other tips

TL;DR: There are no guarantees about performance in the language itself. None whatsoever. It is a compiler thing.

As a rule of thumb, a named entity will be memory-resident. If it is accessed lazily by only one consumer, it is reasonable to expect it to be optimized such that the compiled program will run in constant memory.

The creation and consumption of memory cells will be interleaved, and each cell will be GC-ed after it has been processed.


In minimax2 dt = [minimum dt, maximum dt], the expression [minimum dt, maximum dt] is inside the scope where the named entity dt is defined. Most probably (almost certainly, in fact) GHC will allocate it as a memory entity, i.e. once, and both occurrences of dt inside the expression will refer to the same entity (point to it, as if by pointers).

But as Cat Plus Plus points out in the comments, how the entity is accessed is of course quite another matter. The two sub-expressions will each access it separately, i.e. it will be retained in memory in full. That's no good.

We can do better, and find our answer by accessing the list only once, with a fold, collecting the two pieces of data as we go along. In such a situation, it is almost certain that GHC will perform an optimization whereby the list is not retained in memory as a whole.

This is what is usually referred to as the list being consumed lazily. When that's the case, its creation will be interleaved with that one access, and each memory cell produced will be immediately consumed and released by the GC (garbage collector), so that constant-memory operation is achieved.

But that's predicated on our ability to scan through the list only once:

{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O2 #-}

import Data.List (foldl')

minmax :: (Ord b) => [b] -> (b, b)
minmax (x:xs) = foldl' (\(!a,!b) y -> (min a y, max b y)) (x,x) xs

Bang patterns prevent thunk build-up, making evaluation of arguments more eager. Testing:

Prelude> minmax [1..6]
(1,6)
Prelude> minmax []
*** Exception: <interactive>:1:4-65: Non-exhaustive patterns in function minmax

An empty list, of course, has neither a minimum nor a maximum defined.
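
If you'd rather have a total function, a Maybe-returning variant is a small change (a sketch; minmaxSafe is a made-up name):

{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- Returns Nothing for an empty list, otherwise folds exactly as above.
minmaxSafe :: Ord b => [b] -> Maybe (b, b)
minmaxSafe []     = Nothing
minmaxSafe (x:xs) = Just (foldl' (\(!a,!b) y -> (min a y, max b y)) (x,x) xs)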

For the optimizations to kick in, the -O2 flag must be used when compiling with GHC.
