Question

I want to create an S4 class in R that will allow me to access large datasets (in chunks) from the cloud (similar to the goals of the ff package). Right now I'm working with a toy example called "range.vec" (I don't want to deal with internet access yet), which stores a sequence of numbers like so:

setClass("range.vec",
representation(start = "numeric", #beginning num in sequence
end = "numeric",                  #last num in sequence
step = "numeric",                 #step size
chunk = "numeric",                #cache a chunk here to save memory
chunkpos = "numeric"),            #where does the chunk start in the overall vec
contains="numeric"                #inherits methods from numeric
)

I want this class to inherit the methods from "numeric", but I want it to use these methods on the whole vector, not just the chunk that I'm storing. For example, I don't want to define my own method for 'mean', but I want 'mean' to get the mean of the whole vector by accessing it chunk by chunk, using length(), '[', '[[', and el() functions that I've defined. I've also defined a chunking function:

setGeneric("set.chunk", function(x,...) standardGeneric("set.chunk"))
setMethod("set.chunk",  signature(x = "range.vec"),
    function (x, chunksize=100, chunkpos=1) {
    #This function extracts a chunk of data from the range.vec object.
    begin <- x@start + (chunkpos - 1)*x@step
    end <- x@start + (chunkpos + chunksize - 2)*x@step
    data <- seq(begin, end, x@step) #calculate values in data chunk

    #get rid of out-of-bounds values
    data[data > x@end] <- NA

    x@chunk <- data
    x@chunkpos <- chunkpos
    return(x)
})

When I try to call a method like 'mean', the function inherits correctly, and accesses my length function, but returns NA because I don't have any data stored in the .Data slot. Is there a way that I can use the .Data slot to point to my chunking function, or to tell the class to chunk numeric methods without defining every single method myself? I'm trying to avoid coding in C if I can. Any advice would be very helpful!


Solution

It looks like there isn't a good way to do this within the class. The only solution I've found is to have the user loop through all of the chunks of data from the cloud and calculate the result as they go.
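
A minimal sketch of that chunk-by-chunk loop, using the toy class from the question. The helper name chunked.mean is hypothetical, and the "numeric" parent is dropped here only to keep the example small. It keeps a running sum and count so that only one chunk is held in memory at a time:

```r
library(methods)

# Toy class from the question; contains = "numeric" is omitted in this sketch.
setClass("range.vec",
         representation(start = "numeric", end = "numeric", step = "numeric",
                        chunk = "numeric", chunkpos = "numeric"))

setGeneric("set.chunk", function(x, ...) standardGeneric("set.chunk"))
setMethod("set.chunk", signature(x = "range.vec"),
          function(x, chunksize = 100, chunkpos = 1) {
    begin <- x@start + (chunkpos - 1) * x@step
    end <- x@start + (chunkpos + chunksize - 2) * x@step
    data <- seq(begin, end, x@step)
    data[data > x@end] <- NA          # mark out-of-bounds values
    x@chunk <- data
    x@chunkpos <- chunkpos
    x
})

# Hypothetical helper: mean of the whole vector, computed one chunk at a time.
chunked.mean <- function(x, chunksize = 100) {
    n <- floor((x@end - x@start) / x@step) + 1   # total number of elements
    total <- 0
    count <- 0
    for (pos in seq(1, n, by = chunksize)) {
        x <- set.chunk(x, chunksize = chunksize, chunkpos = pos)
        vals <- x@chunk[!is.na(x@chunk)]         # drop out-of-bounds padding
        total <- total + sum(vals)
        count <- count + length(vals)
    }
    total / count
}

rv <- new("range.vec", start = 1, end = 1000, step = 1)
result <- chunked.mean(rv, chunksize = 128)
print(result)   # 500.5, the mean of 1:1000
```

The same pattern works for any statistic that can be accumulated incrementally (sum, min, max, count). Statistics such as the median need a different approach.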

Other answers

You could remove your chunk slot and use numeric's .Data slot in its place.

Little example:

## class definition
setClass("foo", representation(bar="numeric"), contains="numeric")
setGeneric("set.chunk", function(x, y, z) standardGeneric("set.chunk"))
setMethod("set.chunk",
        signature(x="foo", y="numeric", z="numeric"), 
        function(x, y, z) {
    ## instead of x@chunk you could use numeric's .Data slot
    x@.Data <- y
    x@bar <- z
    return(x)
})

a <- new("foo")

a <- set.chunk(a, 1:10, 4)

mean(a) # 5.5
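
One caveat worth illustrating: with this approach, inherited methods such as mean() and length() only see whatever chunk is currently stored in .Data, not the whole vector. A small sketch using the same "foo" class, with made-up chunk values:

```r
library(methods)

setClass("foo", representation(bar = "numeric"), contains = "numeric")
setGeneric("set.chunk", function(x, y, z) standardGeneric("set.chunk"))
setMethod("set.chunk",
          signature(x = "foo", y = "numeric", z = "numeric"),
          function(x, y, z) {
    x@.Data <- y   # the current chunk lives in numeric's .Data slot
    x@bar <- z     # position of the chunk in the overall vector
    x
})

a <- new("foo")

# Load two different chunks of the (conceptual) vector 1, 2, ..., 6.
first  <- set.chunk(a, c(1, 2, 3, 4), 1)
second <- set.chunk(a, c(5, 6), 5)

m1 <- mean(first)    # 2.5: covers the first chunk only
m2 <- mean(second)   # 5.5: covers the second chunk only

# Combining per-chunk results is still up to the caller.
overall <- (m1 * length(first) + m2 * length(second)) /
           (length(first) + length(second))
print(overall)       # 3.5, the mean of 1:6
```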
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow