Predictive Margins/predictions holding variables constant in R

https://stackoverflow.com/questions/8515259

17-03-2021
|

Question

I recently switched to R, and I love it. But one of the things I miss the most is being able to generate predicted model responses holding certain variables at preset levels (the mean, 90th percentile, etc). This comes in enormously handy when trying to discern the effect of interaction terms, transformed variables, etc.

I can do this easily in Stata using the adjust command. I've tried and tried to figure out how to do it in R, but one of the big pitfalls of using a language named R (for which there is also a statistic R) and searching for terms like "Adjust" are that I can only seem to find hits on adjusted R squared. It's beyond frustrating.

So, at the risk of asking a really easy question, does anyone know how to do this? I've looked into predictive margins, and that seems like at least a related type of method, but its implementation usually involves standardizing the explanatory variables in some way.

Solution

The sort of tasks you are describing have been encapsulated into the rms/Hmisc package combo. Frank Harrell is the author and he builds a data description object called a datadist which his other functions (enhanced version of ordinary R regression and analytic operations) use when building various output tables. I mention this because you seem to be reinventing the "Hmisc-wheel". You could, of course, build your own version with some combinations of expand.grid, and the newdata= argument to predict, but it may not be needed. Frank's other important contribution to the R/S world is very well-thought-of "Regression Modeling Strategies" text which summarizes much of his work and the work of others in performing high quality data analysis.

OTHER TIPS

http://www.stata.com/support/faqs/stat/adjust.html states that:

In fact, adjust is really just a front end for predict

And it works by creating a new data set for the prediction.

R's predict has an argument newdata be used to vary the prediction assumptions, but maybe not quite as easily as Stata's adjust. transform may be of interest too, see the following example.

# create model
mtcars.lm <- lm(hp~disp*cyl,mtcars)

# default fit predictions

predict(mtcars.lm)
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
          128.60896           128.60896            80.25811           128.88296 
  Hornet Sportabout             Valiant          Duster 360           Merc 240D 
          208.48842           128.79069           208.48842            75.58796 
           Merc 230            Merc 280           Merc 280C          Merc 450SE 
           76.29995           128.63021           128.63021           197.85671 
         Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
          197.85671           197.85671           222.63037           221.11516 
  Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
          218.58981            83.79391            84.15593            84.71104 
      Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
           78.79793           203.18519           201.41745           207.22575 
   Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
          213.53912            83.75770            78.77380            81.81483 
     Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
          207.35202           128.56702           201.03865            78.68933 


# predict assuming all cars have 8 cylinders

predict(mtcars.lm,newdata=transform(mtcars,cyl=8))
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
           183.2349            183.2349            176.6690            195.6091 
  Hornet Sportabout             Valiant          Duster 360           Merc 240D 
           208.4884            191.4423            208.4884            181.5556 
           Merc 230            Merc 280           Merc 280C          Merc 450SE 
           180.8106            184.1946            184.1946            197.8567 
         Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
           197.8567            197.8567            222.6304            221.1152 
  Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
           218.5898            172.9694            172.5906            172.0098 
      Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
           178.1969            203.1852            201.4174            207.2257 
   Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
           213.5391            173.0073            178.2221            175.0402 
     Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
           207.3520            181.3409            201.0386            178.3105

As for searching for R related topics, you might find the following link useful: How to search for "R" materials?

To make model predictions, use the predict function. See, e.g., ?predict.lm and example(predict.lm).

The .lm bit above is an example of the S3 class system. In this case it means "call the predict function on an object of class lm". That is, the result of a linear regression. Most of the common models will work use their own version of predict (if it's appropriate) but not all models are guaranteed to support it.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow