Question

I'm trying to pass a function (weight.func) to a different function (wrapper) that calls ddply. I want ddply to use that function (weight.func) as part of its calculations. I get the output I want when weight.func is set 'globally' but not when it is passes as an anonymous function to the wrapper.

Can I get ddply to do what I want? Here is a code example:

> print(sampleData)
   studentId   problem  part       workerId rating
1       8001 problem26 partA A127R5QI5OGBIK    0.0
2       8001 problem26 partA A1FCLYRBAB430F    0.0
3       8001 problem26 partA A25FZQY34C6RVO    0.0
4       8001 problem26 partA A3G0MO562MHMZ3    0.5
5       8001 problem26 partA A3RB9ZOIUC3NWG    2.0
6       8001 problem26 partB A1FCLYRBAB430F    0.5
7       8001 problem26 partB A1XRDZKSJBWY8Q    0.5
8       8001 problem26 partB A22CRWMZUX7FFR    0.5
9       8001 problem26 partB A25FZQY34C6RVO    1.0
10      8001 problem26 partB A3G0MO562MHMZ3    0.5
11      8001 problem27 partA A1ET309DW6M2XA    2.0
12      8001 problem27 partA A1FCLYRBAB430F    0.0
13      8001 problem27 partA A22CRWMZUX7FFR    0.0
14      8001 problem27 partA A25FZQY34C6RVO    0.0
15      8001 problem27 partA A3G0MO562MHMZ3    0.0
16      8001 problem27 partB A1FCLYRBAB430F    1.0
17      8001 problem27 partB A22CRWMZUX7FFR    0.0
18      8001 problem27 partB A25FZQY34C6RVO    0.0
19      8001 problem27 partB A2U9676210WST5    0.0
20      8001 problem27 partB A3G0MO562MHMZ3    0.0
21      8002 problem26 partA A127R5QI5OGBIK    0.0
22      8002 problem26 partA A1FCLYRBAB430F    0.5
23      8002 problem26 partA A22CRWMZUX7FFR    0.0
24      8002 problem26 partA A25FZQY34C6RVO    2.0
25      8002 problem26 partA A3G0MO562MHMZ3    0.5
26      8002 problem26 partB A17EHJZNJGNRAN    2.0
27      8002 problem26 partB A1FCLYRBAB430F    0.0
28      8002 problem26 partB A2IPRDTE6B4TAB    0.0
29      8002 problem26 partB A3G0MO562MHMZ3    0.0
30      8002 problem26 partB  A6SON3OS15XKA    0.0
31      8002 problem27 partA A1FCLYRBAB430F    0.0
32      8002 problem27 partA A25FZQY34C6RVO    0.0
33      8002 problem27 partA A2IPRDTE6B4TAB    0.0
34      8002 problem27 partA A2U9676210WST5    0.0
35      8002 problem27 partA A3G0MO562MHMZ3    0.0
36      8002 problem27 partB A1FCLYRBAB430F    0.0
37      8002 problem27 partB A1V52SSKROBV8E    2.0
38      8002 problem27 partB A25FZQY34C6RVO    2.0
39      8002 problem27 partB A2IPRDTE6B4TAB    0.0
40      8002 problem27 partB A3G0MO562MHMZ3    0.0
> 
> #Make a wrapper
> wrapper <- function ( ratingData, weight.func ) {
+   print(weight.func) #prove that the function is being passed
+   ddply(ratingData, c('studentId','problem','part'), summarize, 
+           sum.weights = sum ( weight.func(rating)  ))
+ }
> wrapper( sampleData, weight.func=function(x) (x+.001)^-1  )
function(x) (x+.001)^-1
Error in data.frame(sum.weights = sum(weight.func(rating))) : 
  could not find function "weight.func"
> 
> #'globally' declare weight.func
> weight.func <- function(x) (x+.001)^-1
> wrapper( sampleData, weight.func=NULL  )
NULL
  studentId   problem  part sum.weights
1      8001 problem26 partA 3002.495758
2      8001 problem26 partB    8.983033
3      8001 problem27 partA 4000.499750
4      8001 problem27 partB 4000.999001
5      8002 problem26 partA 2004.491766
6      8002 problem26 partB 4000.499750
7      8002 problem27 partA 5000.000000
8      8002 problem27 partB 3000.999500

The second output is the goal. Any help appreciated! (Including a non plyr based way to accomplish the same task.)

The example above is a toy example. It's the simplest case I could get to reproduce the behavior.

Was it helpful?

Solution

you can use aggregate:

w2 <- function(d, f){
  aggregate(rating~studentId+problem+part, function(x)sum(f(x)), data=d)
}

w2( sampleData, function(x) (x+.001)^-1  )

Note that the name of the aggregated column is automatically determined, so if you want to name then you need to do it by yourself.

and you can same thing by ddply without summarize

wrapper <- function ( ratingData, weight.func ) {
   ddply(ratingData, c('studentId','problem','part'), function(x)c(sum.weights=sum(weight.func(x$rating))))
 }

wrapper( sampleData, weight.func=function(x) (x+.001)^-1  )

in this case you can specify the name inside function.

OTHER TIPS

This is a known bug in plyr: https://github.com/hadley/plyr/issues#issue/3

I'm not exactly sure which change I made (taking out the spaces after "sum" or changing the NULL to a real function or << something >> ), but this now works:

wrapper <- function ( ratingData, weight.func=weight.func) {
      ddply(ratingData, .variables=c('studentId','problem','part'),  
            .fun=summarise, sum.weights = sum(weight.func(rating)  ))
  }

wrapper( sampleData, weight.func=weight.func  )
  studentId   problem  part sum.weights
1      8001 problem26 partA 3002.495758
2      8001 problem26 partB    8.983033
3      8001 problem27 partA 4000.499750
4      8001 problem27 partB 4000.999001
5      8002 problem26 partA 2004.491766
6      8002 problem26 partB 4000.499750
7      8002 problem27 partA 5000.000000
8      8002 problem27 partB 3000.999500

An update on this issue in plyr (https://github.com/hadley/plyr/issues/3):

Use the 'here' function in plyr, just replace 'summarize', with 'here(summarize)' to access the environment where ddply was called from.

wrapper <- function(ratingData, weight.func){
           ddply(ratingData, c('studentId','problem','part'),
                 here(summarize),  # here(summarize)!
                 sum.weights = sum(weight.func(rating))
                 )
            }
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top