Extrapolating GLM coefficients for year a product was sold into future years?

https://datascience.stackexchange.com/questions/1024

16-10-2019

Question

I've fit a Poisson GLM to a data set in which one of the predictors is a categorical variable for the year a customer bought a product from my company, ranging from 1999 to 2012. The coefficients for the levels of this variable show a linear trend as the year of sale increases.

Is there any problem with trying to improve predictions for 2013 and maybe 2014 by extrapolating to get the coefficients for those years?

Solution

I believe that this is a case for applying time series analysis, in particular time series forecasting (http://en.wikipedia.org/wiki/Time_series). Consider the following resources on time series regression:

http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471363553.html
http://www.stats.uwo.ca/faculty/aim/tsar/tsar.pdf (especially section 4.6)
http://arxiv.org/abs/0802.0219 (Bayesian approach)

OTHER TIPS

If you suspect your response is linear in year, then put year into your model as a numeric term rather than a categorical one.

Extrapolation is then valid under the usual assumptions of the GLM family, as long as the trend continues past the observed years. Make sure you compute the standard errors of your extrapolated estimates correctly.
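To make the numeric-year idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the counts are made-up toy data, the model has only an intercept and a slope on (centered) year, and the helper names `fit_poisson_trend` and `forecast_mean` are hypothetical. With real data you would normally use the GLM routine of your statistics package rather than a hand-rolled Newton loop.

```python
import math

# Toy data, invented for illustration: one event count per sale year.
years = list(range(1999, 2013))
counts = [19, 23, 21, 25, 24, 27, 26, 30, 29, 33, 32, 36, 35, 40]


def fit_poisson_trend(years, counts, n_iter=25):
    """Fit log(E[count]) = b0 + b1 * (year - center) by Newton-Raphson.

    Returns (b0, b1, cov, center); cov is the 2x2 covariance matrix of
    (b0, b1), taken as the inverse Fisher information at the final estimate.
    """
    center = sum(years) / len(years)
    x = [yr - center for yr in years]
    b0 = math.log(sum(counts) / len(counts))  # start from intercept-only fit
    b1 = 0.0

    def information(b0, b1):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        i00 = sum(mu)
        i01 = sum(xi * m for xi, m in zip(x, mu))
        i11 = sum(xi * xi * m for xi, m in zip(x, mu))
        return mu, i00, i01, i11

    for _ in range(n_iter):
        mu, i00, i01, i11 = information(b0, b1)
        det = i00 * i11 - i01 * i01
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(c - m for c, m in zip(counts, mu))
        g1 = sum(xi * (c - m) for xi, c, m in zip(x, counts, mu))
        # Newton step: beta += I^{-1} g, with the 2x2 inverse written out.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det

    _, i00, i01, i11 = information(b0, b1)
    det = i00 * i11 - i01 * i01
    cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
    return b0, b1, cov, center


def forecast_mean(year, b0, b1, cov, center, z=1.96):
    """Approximate 95% confidence interval for the expected count in `year`.

    The interval is built on the log (link) scale and then exponentiated.
    It covers the mean only, not an individual future observation.
    """
    x = year - center
    eta = b0 + b1 * x
    se = math.sqrt(cov[0][0] + 2 * x * cov[0][1] + x * x * cov[1][1])
    return math.exp(eta - z * se), math.exp(eta), math.exp(eta + z * se)


b0, b1, cov, center = fit_poisson_trend(years, counts)
forecast = {yr: forecast_mean(yr, b0, b1, cov, center) for yr in (2013, 2014)}
for yr, (lo, mid, hi) in forecast.items():
    print(f"{yr}: expected count {mid:.1f} (approx. 95% CI {lo:.1f} to {hi:.1f})")
```

The interval for 2014 comes out wider than the one for 2013 because the standard error of the linear predictor grows with distance from the bulk of the data. That is the part you lose if you only extend a line through the categorical coefficients by eye.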

Just extrapolating the parameters from a categorical variable is wrong for a number of reasons. The first one I can think of is that there may be more observations in some years than in others, so any linear extrapolation needs to weight those years' estimates more. Just eyeballing a line - or even fitting an ordinary line to the coefficients - won't do this.
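A small sketch of the weighting issue, again with invented numbers: the per-year coefficients and standard errors below are hypothetical, with two years (2000 and 2011) given large standard errors as if they had very few observations. The helper `fit_line` is also just an illustrative name. It compares an ordinary line through the coefficients with an inverse-variance weighted one.

```python
def fit_line(x, y, w=None):
    """Weighted least-squares line y = intercept + slope * x.

    With w=None every point gets equal weight (ordinary least squares).
    """
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


years = list(range(1999, 2013))
# Hypothetical per-year coefficients: a trend of 0.05 per year ...
coefs = [0.05 * k for k in range(len(years))]
ses = [0.05] * len(years)
# ... except for two sparsely observed years with noisy estimates.
coefs[1], ses[1] = 0.30, 0.40    # year 2000
coefs[12], ses[12] = 0.35, 0.40  # year 2011

a_u, slope_u = fit_line(years, coefs)
weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
a_w, slope_w = fit_line(years, coefs, weights)

for label, a, b in (("unweighted", a_u, slope_u), ("weighted", a_w, slope_w)):
    print(f"{label}: slope {b:.4f}, extrapolated 2013 coefficient {a + b * 2013:.3f}")
```

In this toy case the unweighted line is pulled noticeably by the two noisy years, while the weighted line stays close to the trend in the well-estimated ones. Weighting is still only a partial fix, since it ignores correlation between the coefficient estimates; fitting year as a numeric term inside the GLM handles all of this in one step.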

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange