Using dlply with pROC

https://stackoverflow.com/questions/11321490

19-06-2021

Question

I'm trying to apply the roc() function from the pROC package to specific variables of the data frame df, split by df$site, a character column with values like "01", "02", "03". Since roc() returns a list, I expect my object roc_site to be a list that in turn contains one list of results per site.

roc_site <- dlply(
  .data = df, 
  .variables = "site", 
  .fun = roc, 
  .progress = "text",
  response = df$Risk,
  predictor = df$Rating, 
  na.rm = TRUE, plot = TRUE)

This runs without error, and roc_site is a list containing one list per site, but the results for every site are identical, as if the data frame had never been split. What am I missing?

Solution

The function you pass to .fun in dlply() is called once per subset, and it receives that entire chunk of the data frame as its first argument, so it has to be written to work on a data frame.

So in this case, what you really want is to write your own small function that takes one chunk of your data frame and calculates what you want from that chunk's own columns, e.g.

foo <- function(x){
    # x is one site's subset of df, so use its columns, not df$...
    roc(x$Risk, x$Rating, na.rm = TRUE, plot = TRUE)
}

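and then pass that function as .fun, without the response and predictor arguments: roc_site <- dlply(df, "site", foo, .progress = "text").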
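A minimal end-to-end sketch of that approach, assuming the plyr and pROC packages are installed. The data frame below is made-up toy data standing in for the asker's df, the wrapper name roc_for_site is hypothetical (it plays the role of foo), and levels and direction are set explicitly only so the AUCs are deterministic and roc() does not print messages; plot = TRUE is left out.

```r
library(plyr)   # dlply()
library(pROC)   # roc(), auc()

# Made-up toy data for illustration: two sites, a binary outcome Risk
# (0 = control, 1 = case) and a numeric score Rating.
df <- data.frame(
  site   = c("01", "01", "01", "01", "02", "02", "02", "02"),
  Risk   = c(0, 0, 1, 1, 0, 0, 1, 1),
  Rating = c(1, 2, 3, 4, 1, 3, 2, 4),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: x is the chunk of df for one site, so only x$...
# columns are used here, never df$...
roc_for_site <- function(x) {
  roc(x$Risk, x$Rating,
      levels    = c(0, 1),  # controls first, then cases
      direction = "<",      # higher Rating is assumed to mean higher risk
      na.rm     = TRUE)
}

# One roc object per site, each built from that site's rows only.
roc_site <- dlply(.data = df, .variables = "site", .fun = roc_for_site)

# Per-site AUCs as a plain numeric vector.
aucs <- sapply(roc_site, function(r) as.numeric(auc(r)))
print(aucs)
```

With this toy data, site "01" separates perfectly (AUC 1) and site "02" gives an AUC of 0.75. Had the wrapper used df$Risk and df$Rating instead, both sites would report the pooled AUC of 0.875.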

You get identical results because dlply() does call roc() once per chunk, but every call also receives the same response = df$Risk and predictor = df$Rating, and those hold the values for the entire data set rather than for the current chunk.
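The pass-through behaviour can be seen without pROC at all. This sketch assumes only plyr; probe and v are hypothetical names chosen for illustration. The chunk changes from call to call, while the extra argument v is the same full-length vector every time, which is exactly what happened to response and predictor in the question.

```r
library(plyr)

# Made-up toy data: site "01" has two rows, site "02" has one.
df <- data.frame(site = c("01", "01", "02"), Rating = c(5, 6, 7),
                 stringsAsFactors = FALSE)

# Hypothetical probe: reports the size of the chunk it was given and the
# length of the extra argument forwarded by dlply().
probe <- function(chunk, v) {
  c(chunk_rows = nrow(chunk), v_length = length(v))
}

# v = df$Rating is evaluated once and passed unchanged to every call.
res <- dlply(df, "site", probe, v = df$Rating)
print(res)
```

Each element reports a different chunk_rows (2, then 1) but the same v_length of 3, the length of the whole column.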

Licensed under: CC-BY-SA with attribution
