Question

I came across this line in the Gensim documentation for LDA: "The model can also be updated with new documents for online training."

My assumption about what it means: once we have a model trained on one corpus, we can add new data and continue training the model on it, thereby adding more vocabulary and enriching the results. Is this correct?

Is this the same approach discussed in the paper "Online Learning for LDA"? Help me understand this technique.

Solution

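Yes, your intuition about online learning in topic modeling (LDA) is essentially correct, and Gensim's LDA implementation is based on the online variational Bayes algorithm from the paper you mention (Hoffman, Blei, and Bach, "Online Learning for Latent Dirichlet Allocation"). The documentation line you quote refers to exactly this: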

"The model can also be updated with new documents for online training."

For context, here is the standard definition of online learning in machine learning:

It is a method in machine learning in which data becomes available in sequential order and is used to update our best predictor for future data at each step, as opposed to batch learning techniques, which generate the best predictor by learning on the entire training dataset at once. It is a very useful technique in areas of machine learning where it is computationally infeasible to train over the entire dataset, which calls for out-of-core algorithms.
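To make the online-versus-batch distinction concrete, here is a minimal sketch in plain Python. It is only an illustration and has nothing to do with LDA internals: the "predictor" is simply the running mean of a stream, and the class name `OnlineMean` is made up for this example.

```python
class OnlineMean:
    """Toy online predictor: the mean of all values seen so far."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental update: only the new value and the current state
        # are needed, never the full history of the stream.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


stream = [2.0, 4.0, 6.0, 8.0]

# Online: the predictor is refined each time a value arrives.
online = OnlineMean()
for x in stream:
    online.update(x)

# Batch: the whole dataset must be available (and in memory) at once.
batch_mean = sum(stream) / len(stream)

print(online.mean, batch_mean)
```

Both approaches arrive at the same estimate here, but the online version never needs to hold the whole dataset, which is the property that matters when the data is too large to process at once.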

You can find more about it here.

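In your case (NLP), online learning with Gensim's LDA would:

  1. Convert each new batch of documents to bag-of-words using the dictionary the model was created with. Note that the vocabulary is fixed at that point, so words that are not in the dictionary are ignored rather than added.
  2. Train the model on the newly added corpus.
  3. Update the topics accordingly.

These steps repeat as new data arrives.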

Licensed under: CC-BY-SA with attribution