Why can't parameters passed to a function be found when using sapply with subset on a list of data frames in R?

StackOverflow https://stackoverflow.com/questions/12244534

  •  29-06-2021

Question

I ran into a weird problem when using sapply to apply subset to a list of data frames inside a function: R says "Error in eval(expr, envir, enclos) : object 'thresh' not found". Why does this happen?

test<-list()
test[[1]]<-as.data.frame(matrix(rnorm(50*5,10,100),50,5))
test[[2]]<-as.data.frame(matrix(rnorm(50*5,10,100),50,5))


findmax<-function(test,thresh){
  print(thresh)
  max(unlist(sapply(test,subset,V1>thresh,select=c("V1"))))
}

findmax(test,thresh=10)

Solution

Do heed the Warning in ?subset:

Warning:

     This is a convenience function intended for use interactively.
     For programming it is better to use the standard subsetting
     functions like ‘[’, and in particular the non-standard evaluation
     of argument ‘subset’ can have unanticipated consequences.

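subset uses non-standard evaluation: the condition is looked up first in the columns of the data frame and then in the frame from which subset was called. Called interactively at the top level, that frame is the global environment, so everything works. Inside your function, however, subset is called by sapply rather than by findmax itself, so the lookup never reaches findmax's arguments and thresh is not found.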
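To make the lookup behaviour concrete, here is a minimal sketch. The names findmax_nse and findmax_wrapped are made up for illustration, and the seed is only there so the data are reproducible. It assumes no object called thresh exists in your global environment; if one does, the first call silently picks it up instead of failing.

```r
set.seed(1)  # assumption: fixed seed, not in the original question
test <- list(
  as.data.frame(matrix(rnorm(50 * 5, 10, 100), 50, 5)),
  as.data.frame(matrix(rnorm(50 * 5, 10, 100), 50, 5))
)

# The original pattern: subset is called by sapply/lapply, so the condition
# V1 > thresh is evaluated without access to this function's arguments.
findmax_nse <- function(test, thresh) {
  max(unlist(sapply(test, subset, V1 > thresh, select = c("V1"))))
}

# Capture the error instead of stopping the script.
err <- tryCatch(findmax_nse(test, thresh = 10), error = function(e) e)
print(conditionMessage(err))

# Calling subset directly from a function defined inside findmax_wrapped
# puts thresh on the lookup path (pick's enclosing environment).
findmax_wrapped <- function(test, thresh) {
  pick <- function(d) subset(d, V1 > thresh, select = "V1")
  max(unlist(lapply(test, pick)))
}

print(findmax_wrapped(test, thresh = 10))
```

The wrapped version is shown only to illustrate where subset looks for thresh; the warning quoted above still applies, and standard subsetting with `[` is the safer choice inside functions.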

Here is one way to rewrite the function using standard subsetting:

findmax <- function(test, thresh, want) {
    foo <- function(x, thresh, want) {
       take <- x[, want] > thresh
       x[take, want]
    }
    max(unlist(sapply(test, foo, thresh = thresh, want = want)))
}
findmax(test, thresh = 10, want = "V1")

which, for one random draw of your test data (no seed is set, so your value will differ), gives:

R> findmax(test, thresh = 10, want = "V1")
[1] 230.9756
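As a variation on the same idea, the sketch below takes the maximum per data frame with vapply and then the maximum of those. The name findmax_vapply, the fixed seed, and the choice to return -Inf for a data frame with no values above the threshold are assumptions for illustration, not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
test <- list(
  as.data.frame(matrix(rnorm(50 * 5, 10, 100), 50, 5)),
  as.data.frame(matrix(rnorm(50 * 5, 10, 100), 50, 5))
)

findmax_vapply <- function(test, thresh, want) {
  # One number per data frame: the largest value of column `want`
  # that exceeds thresh, or -Inf when no value qualifies.
  per_df <- vapply(test, function(x) {
    vals <- x[[want]]
    vals <- vals[vals > thresh]
    if (length(vals) > 0) max(vals) else -Inf
  }, numeric(1))
  max(per_df)
}

findmax_vapply(test, thresh = 10, want = "V1")
```

Because thresh and want are ordinary arguments used with standard subsetting, nothing depends on the calling environment, and vapply checks that each data frame contributes exactly one numeric value.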
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow