문제

I am using Naive Bayes classifier for my sentiment analysis on customer support. But unfortunately I don't have huge annotated data sets in the customer support domain. But I have a little amount of annotated data in the same domain(around 100 positive and 100 negative). I have the amazon product review data set as well.

Is there anyway can I implement a weighted naive bayes classifier using mahout, so that I can give more weight to the small set of customer support data and small weight to the amazon product review data. A training on the above weighted data set would drastically improve accuracy I guess. Kindly help me with the same.

도움이 되었습니까?

해결책

One really simple approach is oversampling. Ie just repeat the customer support examples in your training data multiple times.

Though it's not the same problem you might get some further ideas by looking into the approaches used for class imbalance; in particular oversampling (as mentioned) and undersampling.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top