Question
I'm trying to apply the roc() function from the pROC package to specific variables from the data frame df, split by df$site, which consists of character strings like "01", "02", "03". The function roc() returns a list, so I'm expecting my object roc_site to be a list that in turn contains a list of results for each site.
roc_site <- dlply(
    .data = df,
    .variables = "site",
    .fun = roc,
    .progress = "text",
    response = df$Risk,
    predictor = df$Rating,
    na.rm = TRUE, plot = TRUE)
This runs successfully, and roc_site is a list that consists of one list for each site, but the results for each site are identical; it hasn't split the data frame apart. What am I missing?
Solution
The function that you pass to .fun in dlply needs to accept the entire chunk of the data frame as its (first) argument.
So in this case, what you really want is to write your own small function that takes one chunk of your data frame and calculates what you want from that chunk's columns, e.g.
foo <- function(x) {
    roc(x$Risk, x$Rating, na.rm = TRUE, plot = TRUE)
}
and then pass that function to .fun.
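Putting that together, here is a minimal sketch of the whole call. The toy data frame is made up purely for illustration (it only borrows the column names site, Risk and Rating from the question), and plot = FALSE is used so the example runs without opening a graphics device:

```r
library(plyr)
library(pROC)

# Hypothetical toy data standing in for the asker's df:
# three sites, 40 rows each, with both outcome classes in every site.
set.seed(1)
df <- data.frame(
  site   = rep(c("01", "02", "03"), each = 40),
  Risk   = rep(c(0, 1), times = 60),
  Rating = rnorm(120)
)

# Wrapper that receives ONE chunk of df and pulls its columns from x, not df.
foo <- function(x) {
  roc(x$Risk, x$Rating, na.rm = TRUE, plot = FALSE)
}

# dlply splits df by site and calls foo() once per chunk.
roc_site <- dlply(.data = df, .variables = "site", .fun = foo)

# One AUC per site; each is computed from that site's 40 rows only.
sapply(roc_site, function(r) as.numeric(auc(r)))
```

Each element of roc_site should now be a roc object built from that site's rows only, so the per-site results are free to differ.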
The reason you're getting identical results is that for each chunk, dlply is calling roc on your chunk, but passing the same df$Risk and df$Rating along each time, and those are the vectors for the entire data set, not for the chunk.
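You can see this for yourself with a small check. The sketch below uses a made-up data frame and a hypothetical helper, check(), that just reports how many rows the chunk has and how long the extra argument is:

```r
library(plyr)

# Hypothetical toy data: three sites with 40 rows each.
df <- data.frame(
  site   = rep(c("01", "02", "03"), each = 40),
  Rating = seq_len(120)
)

# Reports the size of the chunk and the size of the extra argument.
check <- function(x, extra) {
  c(chunk_rows = nrow(x), extra_length = length(extra))
}

# `extra` is forwarded unchanged to every call, just like
# response = df$Risk and predictor = df$Rating in the question.
res <- dlply(df, "site", check, extra = df$Rating)

res[["01"]]
# chunk_rows is 40, but extra_length is 120 for every site
```

The chunk is split correctly, but anything passed through the extra arguments is the full-length vector on every call, which is why computing from x inside your own function is the fix.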