문제

I have been reading about text classification and found several Java tools which are available for classification, but I am still wondering: Is text classification the same as sentence classification!

Is there any tool which focuses on sentence classification?

도움이 되었습니까?

해결책

Theres no formal difference between 'Text classification' and 'Sentence classification'. After all, a sentence is a type of text. But generally, when people talk about text classification, IMHO they mean larger units of text such as an essay, review or speech. Classifying a politician's speech into democrat or republican is a lot easier than classifying a tweet. When you have a lot of text per instance, you don't need to squeeze each training instance for all the information it can give you and get pretty good performance out a bag-of-words naive-bayes model.

Basically you might not get the required performance numbers if you throw off-the-shelf weka classifiers at a corpora of sentences. You might have to augment the data in the sentence with POS tags, parse trees, word ordering, ngrams, etc. Also get any related metadata such as creation time, creation location, attributes of sentence author, etc. Obviously all of this depends on what exactly are you trying to classify.. the features that will work out for you need to be intuitively meaningful to the problem at hand.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top