LibSVM and non-numerical data

https://stackoverflow.com/questions/4279421

machine-learning
svm
categorization
libsvm
document-classification

28-09-2019

Question

I'm interested in doing text categorization using LibSVM. How do you recommend I convert the terms/words to numerical data, so LibSVM can understand it?

Thank you!

Solution

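In text categorization, people tend to build histograms of the words used in the domain (a bag-of-words representation): each distinct word becomes one numeric feature, and its value is how often that word occurs in the document. Sometimes they also count combinations of two adjacent words, called bigrams, and add those to the histogram. But it really depends on your data and your objectives.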
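As a rough illustration of the histogram idea, here is a minimal Python sketch that counts words and bigrams and writes each document in LibSVM's sparse text format (`<label> <index>:<value> ...`, with feature indices starting at 1 and listed in ascending order). The helper names (`tokenize`, `terms`, `build_vocab`, `to_libsvm_line`), the whitespace tokenizer, and the toy documents are all assumptions made for this example, not part of LibSVM.

```python
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()

def terms(tokens, use_bigrams=True):
    # Unigrams, optionally followed by bigrams of adjacent tokens.
    feats = list(tokens)
    if use_bigrams:
        feats += [a + " " + b for a, b in zip(tokens, tokens[1:])]
    return feats

def build_vocab(texts, use_bigrams=True):
    # Map each distinct term to a feature index, starting at 1.
    vocab = {}
    for text in texts:
        for term in terms(tokenize(text), use_bigrams):
            if term not in vocab:
                vocab[term] = len(vocab) + 1
    return vocab

def to_libsvm_line(label, text, vocab, use_bigrams=True):
    # Histogram of known terms; terms missing from the vocabulary are dropped.
    counts = Counter(t for t in terms(tokenize(text), use_bigrams) if t in vocab)
    # LibSVM expects index:value pairs sorted by ascending index.
    pairs = sorted((vocab[t], c) for t, c in counts.items())
    return " ".join([str(label)] + [f"{i}:{c}" for i, c in pairs])

# Toy labelled documents (hypothetical data).
docs = [(1, "the cat sat"), (-1, "the dog sat")]
vocab = build_vocab(text for _, text in docs)
lines = [to_libsvm_line(label, text, vocab) for label, text in docs]
print("\n".join(lines))
```

The resulting lines can be written to a file and passed to `svm-train`. Raw counts are only one possible choice of feature value; depending on the data, scaling the counts or using a weighting such as tf-idf may work better.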

Licensed under: CC-BY-SA with attribution
