SVM: How to calculate tf-idf of test documents in document classification?

https://stackoverflow.com/questions/18206097

machine-learning
svm
tf-idf
feature-extraction
feature-selection

24-06-2022

Question

In my SVM, I am using tf-idf on the documents for feature extraction. These tf-idf values are calculated over the whole set of training documents.

Now, when I get a test document that I want to classify, how do I generate the vector for it?

I applied stemming before calculating tf-idf, and I can do the same for the test document. I also have count_of_words for the training documents.

To calculate the tf-idf of the test document, should I increment the training count_of_words for the words that appear in it, or should I use count_of_words as it is?

Solution

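Calculate the vector the same way as during training, with one difference: take idf from the training documents and tf from the test document. In other words, use the training count_of_words as it is and do not increment it for the test document. If you have many new documents coming in, update the training data from time to time and retrain your model.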
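As a rough illustration of the above, here is a minimal pure-Python sketch. The function names (fit_idf, tfidf_vector), the toy documents, and the smoothed idf formula are all assumptions made for the example and are not from the original answer. Your own tf-idf weighting may differ, but the split stays the same: idf comes from the training set only, tf comes from the test document.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Compute idf for every term seen in the (already stemmed) training documents."""
    n_docs = len(train_docs)
    doc_freq = Counter()
    for tokens in train_docs:
        doc_freq.update(set(tokens))  # count each term at most once per document
    # Assumed variant: smoothed idf = ln((1 + N) / (1 + df)) + 1
    return {term: math.log((1 + n_docs) / (1 + df)) + 1
            for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf, vocabulary):
    """Build a tf-idf vector for one document from a fixed, training-derived idf table."""
    counts = Counter(tokens)  # tf: raw term counts of this document only
    # Terms that never occurred in training have no dimension and are ignored.
    return [counts[term] * idf[term] for term in vocabulary]


# Hypothetical stemmed training documents.
train_docs = [
    ["cat", "sat", "mat"],
    ["dog", "sat", "log"],
    ["cat", "chase", "dog"],
]
idf = fit_idf(train_docs)
vocabulary = sorted(idf)  # fixed feature order, decided at training time

# A test document: tf is taken from it, idf is reused unchanged.
test_doc = ["cat", "cat", "sat", "zebra"]
test_vector = tfidf_vector(test_doc, idf, vocabulary)
```

The test vector has exactly one entry per training vocabulary term, so it can be passed to the trained SVM directly. The unseen word "zebra" contributes nothing, and the idf table is not modified by the test document.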

Licensed under: CC-BY-SA with attribution
