Question
Does sklearn.linear_model.LinearRegression support online/incremental learning?
I have 100 groups of data, and I am trying to fit a single model to all of them. Each group has over 10,000 instances and about 10 features, so building one huge matrix (10^6 by 10) leads to a memory error in sklearn. It would be nice if I could update the regressor with a batch of samples from each new group in turn.
I found this post relevant, but the accepted solution handles online learning with a single new data point (only one instance) rather than a batch of samples.
Solution
Take a look at linear_model.SGDRegressor, which learns a linear model using stochastic gradient descent.
In general, sklearn has many models that support partial_fit; they are all quite useful on medium to large datasets that don't fit in RAM.
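As a rough sketch of what that could look like for the setup in the question: the loop below feeds one group at a time to partial_fit, so only a single group is in memory at once. The load_group helper, the synthetic data and the scaled-down sizes are assumptions made for illustration; in practice each group would be read from wherever it is stored.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
# Scaled down from the question (100 groups x 10,000 rows) to run quickly.
n_groups, n_per_group, n_features = 20, 1000, 10
true_coef = rng.normal(size=n_features)  # hypothetical ground truth


def load_group(i):
    """Hypothetical loader: stands in for reading group i from disk."""
    X = rng.normal(size=(n_per_group, n_features))
    y = X @ true_coef + 0.1 * rng.normal(size=n_per_group)
    return X, y


model = SGDRegressor(random_state=0)
for i in range(n_groups):
    X_batch, y_batch = load_group(i)
    # Each call makes one pass of SGD over this batch and keeps
    # the coefficients learned from the earlier batches.
    model.partial_fit(X_batch, y_batch)

# Largest absolute gap between learned and true coefficients.
print(np.abs(model.coef_ - true_coef).max())
```

Note that SGDRegressor minimizes a regularized loss by default, so the result approximates an ordinary least-squares fit rather than reproducing it exactly.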
Other tips
Not all algorithms can learn incrementally, that is, without seeing all of the instances at once. That said, all estimators implementing the partial_fit API are candidates for mini-batch learning, also known as "online learning".
Here is an article that goes over scaling strategies for incremental learning. For your purposes, have a look at the sklearn.linear_model.SGDRegressor class. It is truly online, so its memory use and convergence rate are not affected by the batch size.
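To illustrate the batch-size point, here is a small experiment on synthetic data (the data, the sizes and the fit_in_batches helper are all assumptions for the sake of the example). It feeds the same samples in the same order using two different batch sizes; shuffle=False is set so that the order is not changed inside each call.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))
true_coef = rng.normal(size=10)  # hypothetical ground truth
y = X @ true_coef + 0.1 * rng.normal(size=5000)


def fit_in_batches(batch_size):
    """Hypothetical helper: one pass over X, y in chunks of batch_size."""
    # shuffle=False keeps the sample order identical across batch sizes.
    model = SGDRegressor(shuffle=False, random_state=0)
    for start in range(0, len(X), batch_size):
        stop = start + batch_size
        model.partial_fit(X[start:stop], y[start:stop])
    return model


small = fit_in_batches(100)
large = fit_in_batches(1000)

# Largest absolute difference between the two sets of coefficients.
print(np.abs(small.coef_ - large.coef_).max())
```

The two models should end up with very similar coefficients, since the updates happen per sample regardless of how the samples are grouped into calls. The batch size then mostly becomes a question of how much data you want in memory at once.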