Question

These two expressions should mean the same thing:

Stream.from(1).filter(_ < 0).head
Stream.from(1).find(_ < 0)

The should loop around until they return Int.MinValue. And that is exactly what the version with filter does, but with find an OutOfMemoryError is produced. Looking at their implementations though, I can't figure out both versions don't produce an OutOfMemoryError.

Here is the implementation of Stream.filter:

override def filter(p: A => Boolean): Stream[A] = {
  // optimization: drop leading prefix of elems for which f returns false
  // var rest = this dropWhile (!p(_)) - forget DRY principle - GC can't collect otherwise
  var rest = this
  while (!rest.isEmpty && !p(rest.head)) rest = rest.tail
  // private utility func to avoid `this` on stack (would be needed for the lazy arg)
  if (rest.nonEmpty) Stream.filteredTail(rest, p)
  else Stream.Empty
}

find is inherited from LinearSeqOptimized, with this definition:

override /*IterableLike*/
def find(p: A => Boolean): Option[A] = {
  var these = this
  while (!these.isEmpty) {
    if (p(these.head)) return Some(these.head)
    these = these.tail
  }
  None
}

They both have a while loop that discards elements of the Stream that don't satisfy the predicate. Because this should maintain a reference to the beginning of the Stream all of these created elements should accumulate in memory until we run out of space. Unless I am really misunderstanding what is happening here, Stream.filter is somehow eliminating this from its stack frame before it enters the while loop. The comment in Stream.filter on why dropWhile isn't used looks like a hint, but I have no idea what it is referring to.

My next step would be learning how to disassemble and read JVM bytecode, but I'm really hoping someone knows what is happening here.

Was it helpful?

Solution

It's a combination of HotSpot and the way Scala's traits are implemented.

If I turn HotSpot off with -Xint, Stream.filter will also die with an OutOfMemoryException. In the generated bytecode itself, this and the variables rest and these are stored in different memory locations, but because this is only used to initialize these variables I believe HotSpot is smart enough to simply reuse the memory location for this. This explains why Stream.filter does not run out of memory.

HotSpot's optimization to Stream.filter should also apply to LinearSeqOptimized.find, however because of the way traits are implemented a reference to this is preserved. When a method is implemented inside of a trait, Scala compiles that method into a static method. When a class inherits from that trait, Scala creates a small stub method that invokes the static method. So even though HotSpot optimizes the static method for LinearSeqOptimized.find the stub method's stack frame still has a reference to this.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top