Question

Let's say I have a supervised learning problem with a sequence of features and labels. First, I learn on the training data and then I decide to stream in data, point by point and do online learning. Is it possible to update the weights or figure out the feature importances as each data point comes in? Also, what online learning algorithms would allow me to do this and can this be done in Python?

Solution

Online learning is essentially an optimization approach designed for large-scale data and very large feature spaces.

FTRL (Follow-The-Regularized-Leader) is a typical example, closely related to stochastic gradient descent. For details, see the paper at http://www.jmlr.org/proceedings/papers/v15/mcmahan11b/mcmahan11b.pdf.
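To make the idea concrete, here is a minimal sketch of a per-coordinate FTRL-Proximal update for logistic regression, written with NumPy. It follows the update rule as it is commonly presented, not any particular library. The class name `FTRLProximal`, the hyperparameter values (`alpha`, `beta`, `l1`, `l2`) and the synthetic data stream are all assumptions made for illustration.

```python
import numpy as np


class FTRLProximal:
    """Dense sketch of per-coordinate FTRL-Proximal for logistic regression.

    Hypothetical helper for illustration; hyperparameter defaults are assumptions.
    """

    def __init__(self, n_features, alpha=0.1, beta=1.0, l1=0.1, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = np.zeros(n_features)  # accumulated (shifted) gradients
        self.n = np.zeros(n_features)  # accumulated squared gradients

    def weights(self):
        # Closed-form weights: coordinates with |z_i| <= l1 stay exactly zero.
        w = np.zeros_like(self.z)
        active = np.abs(self.z) > self.l1
        z_a = self.z[active]
        w[active] = -(z_a - np.sign(z_a) * self.l1) / (
            (self.beta + np.sqrt(self.n[active])) / self.alpha + self.l2
        )
        return w

    def update(self, x, y):
        # One online step on a single example (x, y) with y in {0, 1}.
        w = self.weights()
        score = np.clip(x @ w, -35.0, 35.0)
        p = 1.0 / (1.0 + np.exp(-score))
        g = (p - y) * x  # gradient of the log loss w.r.t. the weights
        sigma = (np.sqrt(self.n + g * g) - np.sqrt(self.n)) / self.alpha
        self.z += g - sigma * w
        self.n += g * g
        return p  # prediction made before seeing the label


# Synthetic stream (an assumption): only the first two features carry signal.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0, 0.0, 0.0, 0.0])
model = FTRLProximal(n_features=5)

for _ in range(2000):
    x = rng.normal(size=5)
    prob = 1.0 / (1.0 + np.exp(-(x @ true_w)))
    y = float(rng.random() < prob)
    model.update(x, y)

w = model.weights()
print("current weights:", np.round(w, 3))
```

Because the weights are recomputed from `z` and `n` on demand, they can be inspected after every single data point, which is what makes this family of methods convenient for streaming settings.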

Other, more specialized online methods build on it, such as TDAP; see the paper at http://www.cs.cmu.edu/~epxing/papers/2016/HuaWei_KDD16.pdf for more.

You said you want to know the "feature importances" during training. The model changes as iterations proceed or as new data points come in, so at any moment the current model state gives you the current "feature importances".

In practice, most implementations are written in Scala or Java on top of Spark, and others in C++ with OpenMP (OMP), but you can also implement your own online learning method in Python.

Hope this helps :-)

OTHER TIPS

Yes, this can be done in Python. Scikit-Learn has a few online learning algorithms available, from which you can derive the feature importances. Look at the following webpage under 6.1.3. Incremental learning:

http://scikit-learn.org/stable/modules/scaling_strategies.html
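As a rough illustration of the workflow in the question (fit on training data first, then stream points one at a time), here is a minimal sketch using `SGDClassifier.partial_fit`. The synthetic data, the helper `make_batch`, and the use of absolute coefficient size as a stand-in for "feature importance" are assumptions. The coefficient-based ranking is only meaningful when features are on comparable scales.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic data (an assumption): only the first two features matter.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0, 0.0, 0.0])


def make_batch(n):
    """Hypothetical helper that draws n labelled points."""
    X = rng.normal(size=(n, 4))
    y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y


# Step 1: initial pass over the training data.
# The full list of classes must be given on the first partial_fit call.
X_train, y_train = make_batch(500)
clf = SGDClassifier(random_state=0)  # default loss is a linear SVM (hinge)
clf.partial_fit(X_train, y_train, classes=np.array([0, 1]))

# Step 2: stream new data point by point; the weights change on each call.
X_stream, y_stream = make_batch(200)
for x_t, y_t in zip(X_stream, y_stream):
    clf.partial_fit(x_t.reshape(1, -1), [y_t])

# Step 3: read the current weights at any time and rank features by magnitude.
importance = np.abs(clf.coef_[0])
ranking = np.argsort(importance)[::-1]
print("coefficients:", np.round(clf.coef_[0], 3))
print("features ranked by |weight|:", ranking)
```

Other estimators listed on that page expose the same `partial_fit` method, so the streaming loop stays the same if you swap the model. How you read importances depends on the estimator.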

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange