Predictive Margins/predictions holding variables constant in R
Question
I recently switched to R, and I love it. But one of the things I miss the most is being able to generate predicted model responses holding certain variables at preset levels (the mean, 90th percentile, etc). This comes in enormously handy when trying to discern the effect of interaction terms, transformed variables, etc.
I can do this easily in Stata using the adjust
command. I've tried and tried to figure out how to do it in R, but one of the big pitfalls of using a language named R (for which there is also a statistic R) and searching for terms like "Adjust" are that I can only seem to find hits on adjusted R squared. It's beyond frustrating.
So, at the risk of asking a really easy question, does anyone know how to do this? I've looked into predictive margins, and that seems like at least a related type of method, but its implementation usually involves standardizing the explanatory variables in some way.
Solution
The sort of tasks you are describing have been encapsulated into the rms/Hmisc package combo. Frank Harrell is the author and he builds a data description object called a datadist
which his other functions (enhanced version of ordinary R regression and analytic operations) use when building various output tables. I mention this because you seem to be reinventing the "Hmisc-wheel". You could, of course, build your own version with some combinations of expand.grid
, and the newdata=
argument to predict
, but it may not be needed. Frank's other important contribution to the R/S world is very well-thought-of "Regression Modeling Strategies" text which summarizes much of his work and the work of others in performing high quality data analysis.
OTHER TIPS
http://www.stata.com/support/faqs/stat/adjust.html states that:
In fact, adjust is really just a front end for predict
And it works by creating a new data set for the prediction.
R's predict
has an argument newdata
be used to vary the prediction assumptions, but maybe not quite as easily as Stata's adjust
. transform
may be of interest too, see the following example.
# create model
mtcars.lm <- lm(hp~disp*cyl,mtcars)
# default fit predictions
predict(mtcars.lm)
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
128.60896 128.60896 80.25811 128.88296
Hornet Sportabout Valiant Duster 360 Merc 240D
208.48842 128.79069 208.48842 75.58796
Merc 230 Merc 280 Merc 280C Merc 450SE
76.29995 128.63021 128.63021 197.85671
Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
197.85671 197.85671 222.63037 221.11516
Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla
218.58981 83.79391 84.15593 84.71104
Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
78.79793 203.18519 201.41745 207.22575
Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa
213.53912 83.75770 78.77380 81.81483
Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
207.35202 128.56702 201.03865 78.68933
# predict assuming all cars have 8 cylinders
predict(mtcars.lm,newdata=transform(mtcars,cyl=8))
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
183.2349 183.2349 176.6690 195.6091
Hornet Sportabout Valiant Duster 360 Merc 240D
208.4884 191.4423 208.4884 181.5556
Merc 230 Merc 280 Merc 280C Merc 450SE
180.8106 184.1946 184.1946 197.8567
Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
197.8567 197.8567 222.6304 221.1152
Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla
218.5898 172.9694 172.5906 172.0098
Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
178.1969 203.1852 201.4174 207.2257
Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa
213.5391 173.0073 178.2221 175.0402
Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
207.3520 181.3409 201.0386 178.3105
As for searching for R related topics, you might find the following link useful: How to search for "R" materials?
To make model predictions, use the predict
function. See, e.g., ?predict.lm
and example(predict.lm)
.
The .lm
bit above is an example of the S3
class system. In this case it means "call the predict
function on an object of class lm
". That is, the result of a linear regression. Most of the common models will work use their own version of predict
(if it's appropriate) but not all models are guaranteed to support it.