Question

Consider any situation in which you ultimately use a gradient descent optimization method. Assume that you've successfully created a hypothesis that fits your training set and works fine. After some time your algorithm receives more and more new data, which it has to learn from.

Questions: 1) Can this algorithm continue to be considered supervised?

2) If so, is there a way to learn from the new data without iterating again through all (new + old) data?


Solution

There is no generic answer to your question, as this is a very broad problem in machine learning. You should research two topics:

  • Online learning - the family of algorithms/models that can learn from new data without complete re-learning (see the sketch after this list). The simplest such model is naive Bayes, but even SVMs can be trained this way.
  • Concept drift - a more advanced topic that arises when not only is new data being added, but old data can also become "wrong" (no longer valid or true).
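
As a minimal sketch of the online-learning idea, the snippet below uses scikit-learn's SGDClassifier (a linear model fitted with stochastic gradient descent, which matches the gradient-descent setting of the question); partial_fit updates the model on each new batch without revisiting old data. The data and batch sizes are synthetic and chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Initial training set (synthetic, for illustration only)
rng = np.random.RandomState(0)
X_old = rng.randn(1000, 5)
y_old = (X_old[:, 0] + X_old[:, 1] > 0).astype(int)

# Linear classifier trained with stochastic gradient descent
clf = SGDClassifier(random_state=0)
# On the first call, partial_fit needs the full set of classes
clf.partial_fit(X_old, y_old, classes=np.array([0, 1]))

# Later: new data arrives in batches; update the model incrementally,
# without iterating over the old data again
for _ in range(10):
    X_new = rng.randn(100, 5)
    y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)
    clf.partial_fit(X_new, y_new)

print("accuracy on the latest batch:", clf.score(X_new, y_new))
```

Other scikit-learn estimators that expose partial_fit (e.g. MultinomialNB) can be swapped in the same way.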

There are dozens of approaches to both problems (and it does not really matter that you use gradient descent; what matters more is which exact model you are fitting); everything depends on the particular dataset and application.

So in general:

  1. If your problem seems to be completely stationary, nothing really changes with time, and you are satisfied with your results, you can ignore new data
  2. If the problem seems to be mostly stationary (small fluctuations) or you are not satisfied with its accuracy - try online learning (or, if that is impossible for your particular model, retrain the whole model once in a while)
  3. If the problem seems to be very dynamic (large fluctuations) - consider concept drift solutions (see the sketch after this list)
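
One common, simple way to handle drift is to monitor accuracy on each incoming batch and refit on a sliding window of recent data when performance degrades. The sketch below continues the SGDClassifier example above; the helper name drift_aware_update, the accuracy threshold, and the window size are all hypothetical, illustrative choices, not a prescribed method.

```python
from collections import deque
import numpy as np
from sklearn.linear_model import SGDClassifier

def drift_aware_update(clf, window, X_batch, y_batch,
                       acc_threshold=0.8, window_size=5):
    """Update clf on a new batch; refit on recent batches if accuracy drops.

    acc_threshold and window_size are arbitrary illustrative values.
    """
    acc = clf.score(X_batch, y_batch)      # evaluate before learning from the batch
    window.append((X_batch, y_batch))      # keep only the most recent batches
    if len(window) > window_size:
        window.popleft()
    if acc < acc_threshold:
        # Suspected drift: refit from scratch on the recent window only
        X_recent = np.vstack([X for X, _ in window])
        y_recent = np.concatenate([y for _, y in window])
        clf = SGDClassifier(random_state=0)
        clf.partial_fit(X_recent, y_recent, classes=np.unique(y_recent))
    else:
        clf.partial_fit(X_batch, y_batch)  # no drift suspected: incremental update
    return clf

# Usage, continuing the toy stream from the previous sketch:
# window = deque()
# clf = drift_aware_update(clf, window, X_new, y_new)
```

More principled drift detectors exist (for example, tests on the error rate over time), but the window-plus-threshold pattern shows the basic idea.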

And yes, it is still supervised learning, although semi-supervised and unsupervised algorithms are also used for dealing with concept drift.

Licensed under: CC-BY-SA with attribution