Lazy sequences in R

Question 1

Alternate Implementation

Introduction

I've had occasion to be working in R more frequently since this post, so I offer an alternate base R implementation. Again, I suspect you could get much better performance by dropping down to the C extension level. Original answer follows after the break.

The first challenge in base R is the lack of a true cons (exposed at the R level, that is). R uses c for a hybrid cons/concat operation, but this does not create a linked list, but rather a new vector populated with the elements of both arguments. In particular, the length of both arguments has to be known, which is not the case with lazy sequences. Additionally, successive c operations exhibit quadratic performance rather than constant time. Now, you could use "lists" (which are really vectors, not linked-lists) of length two to emulate cons cells, but...

The second challenge is the forcing of promises in data structures. R has some lazy evaluation semantics using implicit promises, but these are second class citizens. The function for returning an explicit promise, delay, has been deprecated in favor implicit delayedAssign, which is only executed for its side effects -- "Unevaluated promises should never be visible." Function arguments are implicit promises, so you can get your hands on them, but you can't place a promise into a data structure without it being forced.

CS 101

It turns out these two challenges can be solved by thinking back to Computer Science 101. Data structures can be implemented via closures.

cons <- function(h,t) function(x) if(x) h else t

first <- function(kons) kons(TRUE)
rest <- function(kons) kons(FALSE)

Now due to R's lazy function argument semantics, our cons already capable of lazy sequences.

fibonacci <- function(a,b) cons(a, fibonacci(b, a+b))
fibs <- fibonacci(1,1)

In order to be useful though we need a suite of lazy sequence processing functions. In Clojure, the sequence processing functions that are part of the core language work naturally with lazy sequences as well. R's sequence functions, on the other hand, would not be immediately compatible. Many rely on knowing ahead of time the (finite) sequence length. Let's define a few capable of working on lazy sequences instead.

filterz <- function(pred, s) {
  if(is.null(s)) return(NULL)
  f <- first(s)
  r <- rest(s)
  if(pred(f)) cons(f, filterz(pred, r)) else filterz(pred, r) }

take_whilez <- function(pred, s) {
   if(is.null(s) || !pred(first(s))) return(NULL)
   cons(first(s), take_whilez(pred, rest(s))) }

reduce <- function(f, init, s) {
  r <- init
  while(!is.null(s)) {
    r <- f(r, first(s))
    s <- rest(s) }
  return(r) }

Let's use what we've created to sum all of the even Fibonacci numbers less than 4 million (Euler Project #2):

reduce(`+`, 0, filterz(function(x) x %% 2 == 0, take_whilez(function(x) x < 4e6, fibs)))
# [1] 4613732

Original Answer

I'm very rusty with R, but since (1) since I'm familiar with Clojure, and (2) I don't think you are getting your point across to the R users, I'll attempt a sketch based on my illustration of how Clojure lazy-sequences work. This is for example purposes only, and not tuned for performance in any way. It would probably be better implemented as an C extension (if one does not exist already).

A lazy sequence has the rest of the sequence generating calculation in a thunk. It is not immediately called. As each element (or chunk of elements as the case may be) is requested, a call to the next thunk is made to retrieve the value(s). That thunk may create another thunk to represent the tail of the sequence if it continues. The magic is that (1) these special thunks implement a sequence interface and can transparently be used as such and (2) each thunk is only called once -- its value is cached -- so the realized portion is a sequence of values.

Standard examples first

Natural numbers

numbers <- function(x) as.LazySeq(c(x, numbers(x+1)))
nums <- numbers(1)

take(10,nums) 
#=> [1]  1  2  3  4  5  6  7  8  9 10

#Slow, but does not overflow the stack (level stack consumption)
sum(take(100000,nums))
#=> [1] 5000050000

Fibonacci sequence

fibonacci <- function(a,b) { 
 as.LazySeq(c(a, fibonacci(b, a+b)))}

fibs <- fibonacci(1,1)

take(10, fibs)
#=> [1]  1  1  2  3  5  8 13 21 34 55

nth(fibs, 20)
#=> [1] 6765

Followed by naive R implementation

Lazy sequence class

is.LazySeq <- function(x) inherits(x, "LazySeq")

as.LazySeq <- function(s) {
  cache <- NULL
  value <- function() {
    if (is.null(cache)) {
      cache <<- force(s)
      while (is.LazySeq(cache)) cache <<- cache()}
    cache}
  structure(value, class="LazySeq")}

Some generic sequence methods with implementations for LazySeq

first <- function(x) UseMethod("first", x)
rest <- function(x) UseMethod("rest", x)

first.default <- function(s) s[1]

rest.default <- function(s) s[-1]

first.LazySeq <- function(s) s()[[1]]

rest.LazySeq <- function(s) s()[[-1]]

nth <- function(s, n) {
  while (n > 1) {
    n <- n - 1
    s <- rest(s) }
  first(s) }

#Note: Clojure's take is also lazy, this one is "eager"
take <- function(n, s) {
  r <- NULL
  while (n > 0) {
    n <- n - 1
    r <- c(r, first(s))
    s <- rest(s) }
  r}

Question 2

The iterators library might be able to achieve what you're looking for:

library(iterators)
i <- icount()
nextElem(i)
# 1
nextElem(i)
# 2

You can keep calling nextElem forever.

Question 3

Since R works best on vectors it seems like one often wants an iterator that return a vector, for instance

library(iterators)
ichunk <- function(n, ..., chunkSize) {
    it <- icount(n)              # FIXME: scale by chunkSize
    chunk <- seq_len(chunkSize)
    structure(list(nextElem=function() {
        (nextElem(it) - 1L) * chunkSize + chunk
    }), class=c("abstractiter", "iter"))
}

with typical use being with chunkSize in the millions range. At least this speeds up the approach to forever.

Question 4

You didn't state your intended goal, so I'll point out that, in any language,

j <- 1
while(TRUE) { x[j+1] <- x[j]+1  ; j<-j+1}

will give you an infinite sequence. What do you actually want to do with an iterator?