Question

I'm interested in doing text categorization using LibSVM. How do you recommend I convert the terms/words to numerical data, so LibSVM can understand it?

Thank you!

Solution

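In text categorization, people tend to build histograms of the words used in the domain (a bag-of-words representation): each distinct word becomes one feature, and its value is how often the word occurs in the document. Sometimes they also count combinations of two adjacent words and add those to the histogram (these are called bigrams). Which features work best really depends on your data and your objectives.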
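As a rough illustration, here is a minimal Python sketch of turning documents into word and bigram histograms and writing them in LibSVM's sparse text format (`<label> <index>:<value> ...`, with feature indices starting at 1 and listed in ascending order). The function names, the lowercase whitespace tokenizer, and the use of raw counts as feature values are assumptions made for this example; real data will likely need better tokenization and possibly some weighting or scaling.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split is good enough for a sketch.
    return text.lower().split()


def term_counts(text, use_bigrams=True):
    # Histogram of single words, optionally extended with adjacent word pairs.
    tokens = tokenize(text)
    terms = list(tokens)
    if use_bigrams:
        terms += [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]
    return Counter(terms)


def build_vocabulary(docs, use_bigrams=True):
    # Map every term seen in the training documents to a 1-based feature index.
    vocab = {}
    for text in docs:
        for term in term_counts(text, use_bigrams):
            if term not in vocab:
                vocab[term] = len(vocab) + 1
    return vocab


def to_libsvm_line(label, text, vocab, use_bigrams=True):
    # Terms missing from the vocabulary are dropped; indices are sorted ascending.
    counts = term_counts(text, use_bigrams)
    pairs = sorted((vocab[t], c) for t, c in counts.items() if t in vocab)
    return " ".join([str(label)] + [f"{index}:{count}" for index, count in pairs])


if __name__ == "__main__":
    train = [(1, "good movie"), (-1, "bad movie")]
    vocab = build_vocabulary(text for _, text in train)
    for label, text in train:
        print(to_libsvm_line(label, text, vocab))
```

Each printed line can be written to a file and passed to LibSVM's training tool. Build the vocabulary from the training documents only, and reuse the same vocabulary when converting test documents so the feature indices stay consistent.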

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow