Question

I'd like to understand the behaviour of a lazy sequence if I iterate over it with doseq but hold onto part of the first element.

    (with-open [log-file-reader (clojure.java.io/reader (clojure.java.io/file input-file-path))]
      ;; parse-line returns some kind of representation of the line.
      (let [parsed-lines (map parse-line (line-seq log-file-reader))
            first-item (first parsed-lines)]
        ;; Iterate over the parsed lines
        (doseq [line parsed-lines]
          ;; Do something with a side effect
          )))

I don't want to retain any of the list, I just want to perform a side-effect with each element. I believe that without the first-item there would be no problem.

I'm having memory issues in my program, and I think that retaining a reference to something at the start of the parsed-lines sequence may mean that the whole sequence is kept in memory.

What's the defined behaviour here? If the sequence is being stored, is there a generic way to take a copy of an object and enable the realised portion of the sequence to be garbage collected?


Solution

The sequence-holding occurs here:

...
(let [parsed-lines (map parse-line (line-seq log-file-reader))
...

The lines of the file are lazily produced and parsed, but the entire sequence is held onto within the scope of the let. The sequence is realized by doseq, but doseq itself is not the problem: it does not hold onto the sequence.

...
(doseq [line parsed-lines]
 ; Do something
...
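One way to avoid naming the sequence at all is to build it inline as the doseq binding, so that nothing outside the loop refers to its head. A minimal sketch, assuming `parse-line` is your own parsing function; `process-log!` and `handle!` are hypothetical names used only for illustration:

```clojure
(require '[clojure.java.io :as io])

(defn process-log!
  "Sketch: calls handle! on each line of the file at path after parsing it
  with parse-line. process-log! and handle! are hypothetical names."
  [path parse-line handle!]
  (with-open [rdr (io/reader (io/file path))]
    ;; The lazy seq is created inline as the doseq binding, so no
    ;; let-bound local outside the loop keeps a reference to its head.
    (doseq [item (map parse-line (line-seq rdr))]
      (handle! item))))
```

The side effect goes in `handle!`; the function returns nil, like doseq.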

Sequence-holding in a let is not necessarily a concern, because the scope of the let is limited. Here, though, presumably your file is large and/or you stay within the dynamic scope of the let for a while, or perhaps the "do something" section returns a closure that captures the sequence.

Note that holding onto any single element of the sequence, including the first, does not hold onto the sequence. The term head-holding is a bit of a misnomer if you take head to mean the first element, as in "head of the list" in Prolog. The problem is holding a reference to the sequence itself, because every element realized after that point stays reachable from it.
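Since keeping a single element is fine, you can keep the first parsed item and still process every line, as long as what you carry between steps is only that item and never the sequence. A sketch using `reduce`, under the same assumptions as before (`process-log-returning-first!` and `handle!` are hypothetical names; `::none` is just a sentinel meaning "no item seen yet"):

```clojure
(require '[clojure.java.io :as io])

(defn process-log-returning-first!
  "Sketch: calls handle! on every parsed line and returns the first parsed
  item (nil for an empty file). Hypothetical name, for illustration only."
  [path parse-line handle!]
  (with-open [rdr (io/reader (io/file path))]
    ;; reduce consumes the line seq as it goes; the accumulator is only
    ;; the first parsed item (or the ::none sentinel), never the seq.
    (let [result (reduce (fn [first-item line]
                           (let [item (parse-line line)]
                             (handle! item)
                             ;; Keep the first item, ignore later ones.
                             (if (= ::none first-item) item first-item)))
                         ::none
                         (line-seq rdr))]
      ;; Translate the sentinel back to nil for an empty file.
      (when-not (= ::none result) result))))
```

The sentinel is used instead of nil so that a `parse-line` that returns nil for the first line still counts as having seen the first item.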

OTHER TIPS

The JVM often does not promptly return memory to the OS once it has become part of the Java heap (depending on the garbage collector and JVM version, unused heap may eventually be released, but don't count on it), and unless you configure it otherwise, the default maximum heap size is fairly large (usually 1/4 of physical memory). So if you're only seeing vague symptoms like "Gosh, this takes up a lot of memory" rather than "Well, the JVM threw an OutOfMemoryError", you probably just haven't tuned the JVM to behave the way you'd like. partition-by is a little eager, in that it holds one or two partitions in memory at once, but unless your partitions are huge, you shouldn't be running out of heap space with this code. Try setting -Xmx100m, or whatever you consider a reasonable heap size for your program, and see whether you still have problems.
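To check that an -Xmx setting actually took effect, you can ask the running JVM for its maximum heap size. A small sketch (`max-heap-mb` is a hypothetical helper name); pass the flag when launching the JVM, e.g. `java -Xmx100m ...`, or via `:jvm-opts ["-Xmx100m"]` in a Leiningen `project.clj`:

```clojure
(defn max-heap-mb
  "Approximate maximum heap size of the running JVM, in megabytes
  (hypothetical helper, for illustration only)."
  []
  ;; Runtime.maxMemory returns the maximum heap the JVM will attempt to use, in bytes.
  (quot (.maxMemory (Runtime/getRuntime)) (* 1024 1024)))

(println "Max heap:" (max-heap-mb) "MB")
```

The reported value may be slightly lower than the -Xmx figure, since some collectors exclude part of the young generation from `maxMemory`.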

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow