Question

I'm having some trouble understanding how R handles subsetting internally and this is causing me some issues while trying to build some functions. Take the following code:

f <- function(directory, variable, number_seq) {

  ## Create an empty data frame

  new_frame <- data.frame()

  ## Add every data frame in the directory whose name is in number_seq to new_frame;
  ## the file variable specifies the path to the file

  for (i in number_seq){
    file <- paste("~/", directory, "/",sprintf("%03d", i), ".csv", sep = "")
    x <- read.csv(file)
    new_frame <- rbind.data.frame(new_frame, x)
    }

  ## calculate and return the mean
  mean(new_frame[, variable], na.rm = TRUE)
  }

While calculating the mean, I first tried to subset using the $ sign (new_frame$variable) and the subset function (subset(new_frame, select = variable)), but it would only return a None value. It only worked when I used new_frame[, variable].

Can anyone explain why the other subsetting approaches didn't work? It took me a really long time to figure this out, and even though I managed to make it work, I still don't know why the other ways failed. I really want to look inside the black box so I won't have the same issues in the future.

Thanks for the help.


Solution

This behavior has to do with the fact that you are subsetting inside a function.

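Both new_frame$variable and subset(new_frame, select = variable) look for a column in the data frame with the literal name variable, not for the column whose name is stored in the variable argument. The $ form returns NULL when no such column exists, and subset() always returns a data frame rather than a vector, so neither gives mean() a numeric vector.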

On the other hand, new_frame[, variable] uses the value of the variable argument of f(directory, variable, number_seq) to select the column.
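A minimal sketch of the difference, using a made-up data frame and a hypothetical helper col_mean (neither comes from the question):

```r
# Hypothetical data frame standing in for new_frame
df <- data.frame(sulfate = c(1, 2, NA), nitrate = c(4, 5, 6))

# Hypothetical helper: `variable` holds a column name as a string
col_mean <- function(data, variable) {
  # $ looks for a column literally called "variable"
  by_dollar <- data$variable
  # [ and [[ evaluate `variable` and use the string it contains
  by_bracket <- data[, variable]
  by_double <- data[[variable]]
  list(
    dollar_is_null = is.null(by_dollar),
    same = identical(by_bracket, by_double),
    mean = mean(by_bracket, na.rm = TRUE)
  )
}

res <- col_mean(df, "sulfate")
str(res)
```

Inside the function, data[[variable]] is an equivalent way to pull out a single column by a name stored in a variable.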

OTHER TIPS

The dollar sign ($) can only be used with literal column names. That avoids ambiguity in a case like this:

dd<-data.frame(
    id=1:4,
    var=rnorm(4),
    value=runif(4)
)
var <- "value"
dd$var

If $ accepted both variables and column names, which result would you expect here: the dd$var column, or the dd$value column (because var == "value")? That is why dd[, var] behaves differently: the index inside the brackets is evaluated as an ordinary expression, so var is replaced by its value, the character vector "value", rather than being treated as a column name. With dd[, var] you get the dd$value column.

I'm not quite sure why you got None with subset(); I was unable to replicate that problem.
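One possible explanation, sketched with made-up data (the names dd2 and col_name are illustrative only): when no column matches the name given to select, subset() falls back to the variable's value and returns a one-column data frame. Passing that data frame to mean() gives NA with a warning rather than a number.

```r
# Made-up data; note there is no column called col_name
dd2 <- data.frame(id = 1:4, value = c(0.5, 1.5, 2.5, 3.5))
col_name <- "value"

# select = col_name finds no such column, so the variable's value is used
picked <- subset(dd2, select = col_name)
class(picked)  # a data frame, not a numeric vector

# mean() of a data frame is NA (with a warning); extract the column first
m_df <- suppressWarnings(mean(picked))
m_vec <- mean(picked[[col_name]])
```

If the data frame did contain a column literally named col_name, subset() would select that column instead, which is the same ambiguity described above for $.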

Licensed under: CC-BY-SA with attribution