Accessing fitted.values when using ddply

https://stackoverflow.com/questions/18082160

23-06-2022
|

Question

I am using ddply to execute glm on subsets of my data. I am having difficulty accessing the estimated Y values. I am able to get the model parameter estimates using the below code, but all the variations I've tried to get the fitted values have fallen short. The dependent and independent variables in the glm model are column vectors, as is the "Dmsa" variable used in the ddply operation.

Define the model:

Model <- function(df){coef(glm(Y~D+O+B+A+log(M), family=poisson(link="log"), data=df))}

Execute the model on subsets:

Modrpt <- ddply(msadata, "Dmsa", Model)

Print Modrpt gives the model coefficients, but no Y estimates.

I know that if I wasn't using ddply, I can access the glm estimated Y values by using the code:

Model <- glm(Y~D+O+B+A+log(M), family=poisson(link="log"), data=msadata)

fits <- Model$fitted.values

I have tried both of the following to get the fitted values for the subsets, but no luck:

fits <- fitted.values(ddply(msadata, "Dmsa", Model))

fits <- ddply(msadata, "Dmsa", fitted.values(Model))

I'm sure this is a very easy to code...unfortunately, I'm just learning R. Does anyone know where I am going wrong?

La solution

You can use an anonymous function in your call to ddply e.g.

require(plyr)
data(iris)
model <- function(df){
    lm( Petal.Length ~ Sepal.Length + Sepal.Width , data = df )
    }

ddply( iris , "Species" , function(x) fitted.values( model(x) ) )

This has the advantage that you can also, without rewriting your model function, get thecoef values by doing

    ddply( iris , "Species" , function(x) coef( model(x) ) )

As @James points out, this will fall down if you have splits of unequal size, better to use dlply which puts the result of each subset in it's own list element.

(I make no claims for statistical relevance or correctness of the example model - it is just an example)

Autres conseils

I'd recommending doing this in two steps:

library(plyr)

# First first the models
models <- dlply(iris, "Species", lm, 
  formula = Petal.Length ~ Sepal.Length + Sepal.Width )

# Next, extract the fitted values
ldply(models, fitted.values)

# Or maybe
ldply(models, as.data.frame(fitted.values))

Licencié sous: CC-BY-SA avec attribution

Non affilié à StackOverflow