Question

We have a click-model which is currently being used for search ranking in production, and I want to create a new model which takes the old model's click probability as one input and adds some other variables in too. The problem is that the training data will be positionally biased by the fact that the probability of a click is correlated with the old model's prediction.

My plan is to introduce a penalty factor on the original model's prediction to ensure that it doesn't dominate the new model (eyeballing the results to decide on an appropriate penalty factor). Is this approach valid or would there be a better way to approach this?

Note that I don't want to rebuild the old model with the new variables because

  1. The existing model takes a long time (days) to build
  2. The new model and old model will be deployed separately, i.e. the old model will be scored offline/batch whereas the new model will be scored real-time

Solution

The problem is that the training data will be positionally biased by the fact that the probability of a click is correlated with the old model's prediction.

It should be the case that many of your input variables are correlated in some way with the output; otherwise your model could not work. The main difference here is that you are expecting a strong correlation from a single feature. This is not a problem: you could think of it as a complex form of feature engineering.

You are essentially stacking the old model with some new variables which you hope are predictive. In this case you should probably also include all the existing/old variables, so that the new model can more easily spot mistakes made by the old model.
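The stacking idea above can be sketched in a few lines. This is a minimal, hypothetical illustration with synthetic data: the old model's click probability is treated as just another input feature alongside a new real-time variable (all names and the data-generating process are invented for the example).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Feature 1: the old model's predicted click probability (scored offline/batch).
old_model_prob = rng.uniform(0, 1, n)
# A new real-time feature the old model never saw (illustrative).
new_feature = rng.normal(0, 1, n)

# Synthetic click labels correlated with both inputs, for illustration only.
logits = 3 * (old_model_prob - 0.5) + 0.5 * new_feature
y = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logits))).astype(int)

# The stacked model treats the old prediction as just another feature;
# its learned coefficients decide how much weight the old model gets.
X = np.column_stack([old_model_prob, new_feature])
stacked = LogisticRegression().fit(X, y)
print(stacked.coef_)
```

The point of the sketch is that no manual penalty is needed: the fitted coefficient on `old_model_prob` already controls how much the old model's prediction contributes.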

My plan is to introduce a penalty factor on the original model's prediction to ensure that it doesn't dominate the new model

I doubt this would be useful. However, the correct way to assess this plan is to try it and measure its performance against the simpler version without any penalty.
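That "measure it rather than eyeball it" suggestion might look like the following sketch, which trains one model per candidate penalty factor and compares held-out AUC. The data, feature names, and candidate factors are all synthetic/illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
old_prob = rng.uniform(0, 1, n)        # old model's click probability
new_feat = rng.normal(0, 1, n)         # a new real-time feature
logits = 3 * (old_prob - 0.5) + 0.5 * new_feat
y = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logits))).astype(int)

aucs = {}
for penalty in [1.0, 0.5, 0.1]:  # candidate down-weighting factors
    # Apply the penalty to the old model's prediction before training.
    X = np.column_stack([penalty * old_prob, new_feat])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = LogisticRegression().fit(X_tr, y_tr)
    aucs[penalty] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(aucs)
```

If the penalized variants do not beat `penalty = 1.0` on held-out data, that is direct evidence the penalty is not helping.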

OTHER TIPS

This depends a bit on the use case, but you could also build the second model independently and then use a third model to combine them. This has the advantage that there should be no leakage between the two models (if the variables are independent), and you can merge them in a very controlled way.
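A minimal sketch of this "third model" idea, under invented assumptions: two models are trained on disjoint feature sets, and a small meta-model is fit on their out-of-sample predictions so the combination stays controlled and leakage-free.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 2000
f_a = rng.normal(0, 1, n)   # feature available to model A (e.g. batch side)
f_b = rng.normal(0, 1, n)   # feature available to model B (e.g. real-time side)
y = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-(f_a + f_b)))).astype(int)

# Hold out a blend set so the meta-model only ever sees
# out-of-sample predictions from A and B.
idx = np.arange(n)
train_idx, blend_idx = train_test_split(idx, random_state=0)

model_a = LogisticRegression().fit(f_a[train_idx, None], y[train_idx])
model_b = LogisticRegression().fit(f_b[train_idx, None], y[train_idx])

# The third model combines the two probability streams.
P = np.column_stack([
    model_a.predict_proba(f_a[blend_idx, None])[:, 1],
    model_b.predict_proba(f_b[blend_idx, None])[:, 1],
])
meta = LogisticRegression().fit(P, y[blend_idx])
print(meta.coef_)
```

The meta-model's two coefficients make the relative weight of each base model explicit, which is the "very controlled way" of merging mentioned above.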

What is unclear to me is what advantage you would gain from taking input from a model that cannot be used in the same context (offline vs. real-time).

If you want to use them combined, with Model A as an input to Model B, you would need to estimate what Model A's output is at scoring time.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange