Question

I'm trying to piece together how the SGDClassifier picks its learning rate when I use the partial_fit method to train it.

That is, my main training loop looks like this:

import time

from sklearn.linear_model import SGDClassifier

# One SGD epoch per partial_fit call (n_iter was later renamed max_iter
# in newer scikit-learn versions).
m = SGDClassifier(n_iter=1, alpha=0.01)

n_passes = 40  # full passes over the training set
t0 = time.time()
for i in range(n_passes):
    for fname in files:
        X, y = load_next_batch(fname)
        m.partial_fit(X, y, classes=[0, 1])
    print("%d:  valid-error: %f  (time: %fs)"
          % (i, 1.0 - m.score(Xvalid, yvalid), time.time() - t0))

Now, since I make 40 passes through the whole training set, I'd like to anneal my learning rate over time. If I used fit instead of partial_fit, my understanding is that this would happen automatically (unless I modified the learning_rate parameter).

However, it is unclear to me how this happens when using partial_fit, and skimming the source code didn't help either. Could anyone clarify how I can achieve an annealed learning rate in this setting?


Solution

fit uses partial_fit internally, so the learning-rate configuration parameters apply to both fit and partial_fit. The default annealing schedule is eta0 / sqrt(t) with eta0 = 0.01.
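
If you want the eta0 / sqrt(t) behaviour explicitly, you can request it in the constructor and partial_fit will honor it just like fit does. A minimal sketch (the learning_rate, eta0, and power_t parameters are part of the scikit-learn API; the batch data here is made up for illustration):

import numpy as np
from sklearn.linear_model import SGDClassifier

# invscaling uses eta = eta0 / t^power_t, so power_t=0.5 gives eta0 / sqrt(t).
m = SGDClassifier(alpha=0.01, learning_rate='invscaling',
                  eta0=0.01, power_t=0.5)

# Toy batches standing in for load_next_batch(fname).
rng = np.random.RandomState(0)
for _ in range(5):
    X = rng.randn(20, 3)
    y = rng.randint(0, 2, size=20)
    m.partial_fit(X, y, classes=[0, 1])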

Edit: this is not correct; as pointed out in the comments, the actual default schedule for SGDClassifier is:

eta = 1.0 / (t + t0), where t0 is set heuristically and t is the number of samples seen so far.
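
A quick way to convince yourself that this counter t keeps growing across partial_fit calls (and hence that the learning rate keeps annealing over your 40 passes) is to inspect the estimator's t_ attribute after each batch. A hedged sketch with toy data:

import numpy as np
from sklearn.linear_model import SGDClassifier

m = SGDClassifier(alpha=0.01)  # default schedule, learning_rate='optimal'
rng = np.random.RandomState(0)
for i in range(3):
    X = rng.randn(10, 4)
    y = rng.randint(0, 2, size=10)
    m.partial_fit(X, y, classes=[0, 1])
    # t_ counts weight updates performed so far; it is not reset between
    # partial_fit calls, so eta = 1.0 / (t + t0) keeps decreasing.
    print("after batch %d: t_ = %d" % (i, m.t_))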
