Question

I'm trying to understand when clojure's lazy sequences are lazy, and when the work happens, and how I can influence those things.

user=> (def lz-seq (map #(do (println "fn call!") (identity %)) (range 4)))
#'user/lz-seq
user=> (let [[a b] lz-seq])
fn call!
fn call!
fn call!
fn call!
nil

I was hoping to see only two "fn call!"s here. Is there a way to manage that? Anyway, moving on to something which indisputably only requires one evaluation:

user=> (def lz-seq (map #(do (println "fn call!") (identity %)) (range 4)))
#'user/lz-seq
user=> (first lz-seq)
fn call!
fn call!
fn call!
fn call!
0

Is first not suitable for lazy sequences?

user=> (def lz-seq (map #(do (println "fn call!") (identity %)) (range 4)))
#'user/lz-seq
user=> (take 1 lz-seq)
(fn call!
fn call!
fn call!
fn call!
0)

At this point, I'm completely at a loss as to how to access the beginning of my toy lz-seq without having to realize the entire thing. What's going on?

Was it helpful?

Solution

Clojure's sequences are lazy, but for efficiency are also chunked, realizing blocks of 32 results at a time.

=>(def lz-seq (map #(do (println (str "fn call " %)) (identity %)) (range 100)))
=>(first lz-seq)

fn call 0
fn call 1
...
fn call 31
0

The same thing happens once you cross the 32 boundary first

=>(nth lz-seq 33)
fn call 0
fn call 1
...
fn call 63
33

For code where considerable work needs to be done per realisation, Fogus gives a way to work around chunking, and gives a hint an official way to control chunking might be underway.

OTHER TIPS

I believe that the expression produces a chunked sequence. Try replacing 4 with 10000 in the range expression - you'll see something like 32 calls on first eval, which is the size of the chunk.

A lazy sequence is one where we evaluate the sequence as and when needed. (hence lazy). Once a result is evaluated, it is cached so that it can be re-used (and we don't have to do the work again). If you try to realize an item of the sequence that hasn't been evaluated yet, clojure evaluates it and returns the value to you. However, it also does some extra work. It anticipates that you might want to evaluate the next element(s) in the sequence and does that for you too. This is done to avoid some performance overheads, the exact nature of which is beyond my skill-level. Thus, when you say (first lz-seq), it actually calculates the first as well as the next few elements in the seq. Since your println statement is a side effect, you can see the evaluation happening. Now if you were to say (second lz-seq), you will not see the println again since the result has already been evaluated and cached.

A better way to see that your sequence is lazy is :

user=> def lz-seq (map #(do (println "fn call!") (identity %)) (range 400))
#'user/lz-seq
user=> (first lz-seq)

This will print a few "fn call!" statements, but not all 400 of them. That's because the first call will actually end up evaluating more than one element of the sequence.

Hope this explanation is clear enough.

I think its some sort of optimization made by repl. My repl is caching 32 at a time.

user=> (def lz-seq (map #(do (println "fn call!") (identity %)) (range 100))
#'user/lz-seq
user=> (first lz-seq)
prints 32 times
user=> (take 20 lz-seq)
does not print any "fn call!"
user=> (take 33 lz-seq)
prints 0 to 30, then prints 32 more "fn call!"s followed by 31,32
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top