Question

I'm implementing an S4 class that contains a data.table, and attempting to implement [ subsetting of the object (as described here) such that it also subsets the data.table. For example (defining just i subsetting):

library(data.table)

.SuperDataTable <- setClass("SuperDataTable", representation(dt="data.table"))

setMethod("[", c("SuperDataTable", "ANY", "missing", "ANY"),
    function(x, i, j, ..., drop=TRUE)
{
    initialize(x, dt=x@dt[i])
})

d = data.table(a=1:4, b=rep(c("x", "y"), each=2))
s = new("SuperDataTable", dt=d)

At this point, subsetting with a numeric vector (s[1:2]) works as desired (it subsets the data.table in the slot). However, I'd like to add the ability to subset using an expression. This works for the data.table itself:

s@dt[b == "x"]
#    a b
# 1: 1 x
# 2: 2 x

But not for the S4 [ method:

s[b == "x"]
# Error: object 'b' not found

The problem appears to be that arguments in the signature of an S4 method are not evaluated using R's traditional lazy evaluation (see here):

All arguments in the signature of the generic function will be evaluated when the function is called, rather than using the traditional lazy evaluation rules of S. Therefore, it's important to exclude from the signature any arguments that need to be dealt with symbolically (such as the first argument to function substitute).

This explains why it doesn't work, but not how one can implement this kind of subsetting, since i and j are included in the signature of the generic. Is there any way to have the i argument not be evaluated immediately?


Solution

You may be out of luck on this one. From the R developer notes,

Arguments appearing in the signature of the generic will be evaluated as soon as the generic function is called; therefore, any arguments that need to take advantage of lazy evaluation must not be in the signature. These are typically arguments treated literally, often via the substitute() function. For example, if one wanted to turn substitute() itself into a generic, the first argument, expr, would not be in the signature since it must not be evaluated but rather treated as a literal.

Furthermore, due to method caching,

All the arguments in the full signature are evaluated as described above, not just the active ones. Otherwise, in special circumstances the behavior of the function could change for one method when another method was cached, definitely undesirable.
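The rule quoted above can be checked with a small user-defined generic. This is only a sketch: the generics `eager` and `lazy` and the symbol `no_such_column` are hypothetical names made up for illustration, and the `signature = "x"` argument to setGeneric() is used to keep `i` out of dispatch. That option is available here only because these are ordinary generics; it does not apply to the primitive `[`.

```r
library(methods)

# Hypothetical generic: both x and i take part in dispatch (the default).
setGeneric("eager", function(x, i) standardGeneric("eager"))
setMethod("eager", c("numeric", "numeric"), function(x, i) substitute(i))

# Same shape, but i is excluded from the signature, so it stays a promise.
setGeneric("lazy", function(x, i) standardGeneric("lazy"), signature = "x")
setMethod("lazy", "numeric", function(x, i) substitute(i))

# no_such_column is deliberately undefined.
eager_result <- tryCatch(eager(1, no_such_column), error = function(e) e)
lazy_result  <- lazy(1, no_such_column)

inherits(eager_result, "error")  # dispatch had to evaluate i and failed
is.name(lazy_result)             # i was captured as an unevaluated symbol
```

With `[`, both i and j are part of the implicit generic's signature, which is why the expression `b == "x"` is evaluated (and fails) before the method body ever runs.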

I would follow the example of the data.table authors and use an S3 object (see line 304 of R/data.table.R in their source code). Your S3 object can still create and manipulate an S4 object underneath to keep the semi-static typing.
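A minimal sketch of the S3 route, assuming a plain list wrapper is acceptable. The constructor `super_dt` and the class name "super_dt" are hypothetical, and this is not how data.table itself is implemented; it only relies on the fact that S3 dispatch on `[` evaluates x alone, leaving i available to substitute().

```r
library(data.table)

# Hypothetical S3 wrapper: a plain list carrying the data.table.
super_dt <- function(dt) structure(list(dt = dt), class = "super_dt")

# S3 dispatch only needs the class of x, so i is still unevaluated here.
`[.super_dt` <- function(x, i, ...) {
    i_expr <- substitute(i)                     # e.g. b == "x"
    rows <- eval(i_expr, x$dt, parent.frame())  # columns first, then the caller's scope
    super_dt(x$dt[rows])
}

d <- data.table(a = 1:4, b = rep(c("x", "y"), each = 2))
s3 <- super_dt(d)

s3[b == "x"]$dt  # rows where b is "x"
s3[1:2]$dt       # numeric indices still work
```

The list element could equally hold an S4 object whose slot contains the data.table; the S3 layer is only there to receive the unevaluated i.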

We can't get extraordinarily clever:

‘[’ is a primitive function; methods can be defined, but the generic function is implicit, and cannot be changed.

Defining both an S3 and an S4 method dispatches the S3 method, which suggests we could route around the S4 call and dispatch manually. Unfortunately, the arguments are still evaluated before the method runs. You can get close by borrowing plyr::. to quote the expression explicitly, which gives syntax like:

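library(plyr)  # provides .(), which captures its arguments unevaluated
s <- new('SuperDataTable', dt = as.data.table(iris))
# Requires a `[` method that evaluates the quoted i against s@dt (not shown here)
s[.(Sepal.Length > 4), 2]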

Not ideal, but closer than anything else.
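The same explicit-quoting idea can be sketched without plyr by passing a call built with base R's quote(). This is a swapped-in variant rather than the plyr::. approach itself, and it covers only the i argument; the `is.call()` branch is an assumption about how such a method might be written.

```r
library(data.table)

.SuperDataTable <- setClass("SuperDataTable", representation(dt = "data.table"))

setMethod("[", c("SuperDataTable", "ANY", "missing", "ANY"),
    function(x, i, j, ..., drop = TRUE)
{
    # If i arrived as a quoted call, evaluate it with the columns of x@dt in scope.
    if (is.call(i)) i <- eval(i, x@dt)
    initialize(x, dt = x@dt[i])
})

d <- data.table(a = 1:4, b = rep(c("x", "y"), each = 2))
s <- new("SuperDataTable", dt = d)

s[quote(b == "x")]@dt  # expression subsetting, quoted explicitly
s[1:2]@dt              # numeric subsetting behaves as before
```

Dispatch still evaluates i eagerly, but evaluating `quote(b == "x")` just yields the call object, which the method then evaluates against the data.table's columns.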

Licensed under: CC-BY-SA with attribution