Question

I have a collection of 1000 documents. Each has one feature: a vector with 5 elements whose sum equals 100. For example, a document can have the feature [10,15,40,20,15].

Each vector element corresponds to a sentiment level, ranging from very negative to very positive. The results I get for the 1000 text documents come out a little on the negative side, so I am trying to nudge them all a little to the right without altering the total sum.

For example, [10,15,40,20,15] should, after applying the formula, result in [7,13,32,40,8]. How can I manage this?

Thanks in advance!

No correct solution

OTHER TIPS

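As I understand it, you want the first (left) elements of that vector to get smaller and the right elements to get bigger, right? This can be accomplished by adding something like [-10,-5,0,5,10] to each vector. Because that offset sums to zero, the total stays at 100. Note that any element smaller than the amount subtracted from it would go negative, so you may need to clip and rescale.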
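A minimal Python sketch of that idea, in case it helps. The function name `shift_right` and the clip-then-rescale step are my own assumptions, not something from the question; the rescale is just one way to keep the total unchanged when an element would otherwise drop below zero.

```python
def shift_right(scores, offset=(-10, -5, 0, 5, 10)):
    """Nudge a sentiment vector toward the positive end, keeping its total.

    `shift_right` and the default offset are illustrative assumptions.
    """
    if len(scores) != len(offset):
        raise ValueError("scores and offset must have the same length")
    if sum(offset) != 0:
        raise ValueError("offset must sum to zero to preserve the total")

    total = sum(scores)
    # Apply the offset, clipping anything that would go negative to zero.
    shifted = [max(s + o, 0) for s, o in zip(scores, offset)]
    clipped_total = sum(shifted)
    if clipped_total == 0:
        return [float(v) for v in shifted]
    # Clipping can raise the sum above the original total, so rescale.
    scale = total / clipped_total
    return [v * scale for v in shifted]


print(shift_right([10, 15, 40, 20, 15]))  # [0.0, 10.0, 40.0, 25.0, 25.0]
```

This will not reproduce the exact [7,13,32,40,8] from the question, since that target is not a fixed offset of the input; tune the offset to get the amount of shift you want.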

If the issue is that the corpus is genuinely more negative than you'd like it to be, then how about prepending the following to each document, just before the analysis:

I am a happy bunny!

And if that isn't enough, then also add in:

The sun is shining beautifully in Happy Bunny Land today!!

If the issue is that your analysis is producing a more negative result than what you believe is the correct answer, then fiddle with the weights (if using a weighted approach). If you are not using a weighted word approach and you have lists of positive and negative words, then review those lists against the document context and either remove some negative words or add more words to the positive list.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow