Question

Is there a way to add more importance to points which are more recent when analyzing data with xgboost?


Solution

You could try building multiple xgboost models, with some of them limited to more recent data, then weighting their results together (see the sketch below). Another idea would be to use a customized evaluation metric that penalizes errors on recent points more heavily, which would give those points more importance.
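As a rough sketch of the first idea, here is one way to blend a full-history model with a recent-only model. The synthetic data, the cutoff year, and the blending weight are all illustrative assumptions, not part of the original answer:

import numpy as np
import xgboost as xgb

# Illustrative synthetic data: a few features, a binary target, and a year per row
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = rng.integers(0, 2, size=500)
years = rng.integers(2011, 2016, size=500)

# One model on the full history, one restricted to recent years
model_full = xgb.XGBClassifier(n_estimators=50).fit(X, y)
recent = years >= 2014                        # assumed cutoff for "recent"
model_recent = xgb.XGBClassifier(n_estimators=50).fit(X[recent], y[recent])

# Blend the predicted probabilities, favouring the recent model
w_recent = 0.7                                # assumed blending weight
proba = (w_recent * model_recent.predict_proba(X)[:, 1]
         + (1 - w_recent) * model_full.predict_proba(X)[:, 1])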

OTHER TIPS

Just add weights based on your time labels to your xgb.DMatrix. The following example is written in R, but the same principle applies to xgboost in Python or Julia.

library(xgboost)

data <- data.frame(feature = rep(5, 5),
                   year = seq(2011, 2015),
                   target = c(1, 0, 1, 0, 0))

# Linear decay by year: the oldest row (2011) gets weight 0.80, the newest (2015) gets 1.00
weightsData <- 1 + (data$year - max(data$year)) * 5 * 0.01

# Now create the xgboost matrix with your data and weights
xgbMatrix <- xgb.DMatrix(as.matrix(data$feature),
                         label = data$target,
                         weight = weightsData)

In Python there is a scikit-learn wrapper, so you can pass the weights directly to fit:

import xgboost as xgb

# X: feature matrix, y: target, sample_weights_data: per-row weights (e.g. by recency)
exgb_classifier = xgb.XGBClassifier()
exgb_classifier.fit(X, y, sample_weight=sample_weights_data)
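If needed, the sample_weights_data vector can be built the same way as in the R example above; the year values here are just placeholders:

import numpy as np

years = np.array([2011, 2012, 2013, 2014, 2015])
sample_weights_data = 1 + (years - years.max()) * 5 * 0.01   # 0.80, 0.85, ..., 1.00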

More information is available in the XGBoost documentation: http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier.fit

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange