Question

I have a sentiment dictionary of positive and negative words, each with a sentiment strength value. My main task is to check whether this strength value has an effect on the final classification. That is, I want to check whether a text containing the word "good" (strength = 6) and a text containing the word "outstanding" (strength = 9) get different final sentiment scores.

I am confused about how to create the feature vector for an SVM. If I use a TF-IDF measure or POS tagging, the strength value is not taken into account. So my main problem is how to use this strength value in an SVM, and how to generate a feature vector that contains the words' strength values.

For example,

"This book is good." 

For this sentence, how can I generate a feature vector that takes the strength value into account?

  • At first I thought I could multiply the strength value by the term frequency and use this weighted score as the feature input, but that seems to just inflate the frequency of the word. For example, if "good" occurs 2 times and I multiply that by its strength value of 6, the value becomes 12, so it looks as if "good" simply occurred more often. Am I right?

  • Can anyone tell me whether I can use the sentiment strength value with an SVM, and if so, how?

  • How can I generate the feature vectors with these values?


Solution

Just some suggestions:

Construct a vocabulary

This vocabulary serves as a dictionary. Any word that is not present in the dictionary is left out of your feature vector. Suppose your dictionary contains 5000 words.

Prepare the sentiment strength for each word in the vocabulary

Of course, you can set up a default value for words whose sentiment strength you don't know.
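A minimal sketch of such a lookup with a fallback, in Python. The words, the strength values, and the choice of 1.0 as the default are illustrative assumptions, not part of the original dictionary; a default of 1.0 simply leaves a word's count unchanged when it is later multiplied by its strength.

```python
# Hypothetical sentiment lexicon: word -> strength (illustrative values).
strength = {"book": 0.01, "good": 6.0, "outstanding": 9.0}

# Assumed default for vocabulary words with no known strength.
DEFAULT_STRENGTH = 1.0

def strength_of(word):
    """Return the word's strength, or the default if it is not in the lexicon."""
    return strength.get(word.lower(), DEFAULT_STRENGTH)
```

Whether a neutral default, zero, or some small value works best is something you would have to check on your own data.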

Construct a feature vector for each text you want to classify

For any given text, e.g.,

This book is good.

construct a feature vector with 5000 dimensions. Each dimension corresponds to one word in the dictionary and holds that word's Tf-Idf score, or simply its number of occurrences in the text. Suppose in your dictionary, you have

strength(book) = 0.01
strength(good) = 6.0, 

and you don't have entries for "this" or "is". Then you will end up with a vector with 5000 elements. (I am using the number of occurrences instead of Tf-Idf in the following example. Feel free to try Tf-Idf in a similar way.)

          book,good
[0,0,0, ..., 1,1,0,0,....,0]

All elements are zero except the two that correspond to "book" and "good". Plugging in your sentiment strengths, you get:

           book,good
[0,0,0, ...,0.01,6.0,0,0,....,0]

Multiplying the number of occurrences by the strength value scales the corresponding element up or down. This is fine, because you do want to boost or weaken each word's contribution according to its sentiment strength.
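The construction above can be sketched in a few lines of Python. This is only an illustration: the four-word vocabulary stands in for the 5000-word dictionary, the entries for "bad" and "outstanding" are made up, and the whitespace tokenizer is a simplifying assumption.

```python
from collections import Counter

# Tiny stand-in for the 5000-word dictionary (illustrative values).
vocabulary = ["bad", "book", "good", "outstanding"]
strength = {"bad": 5.0, "book": 0.01, "good": 6.0, "outstanding": 9.0}
index = {word: i for i, word in enumerate(vocabulary)}

def tokenize(text):
    """Very rough tokenizer: split on whitespace, strip punctuation, lowercase."""
    return [t.strip(".,!?;:").lower() for t in text.split()]

def feature_vector(text):
    """Occurrence count of each vocabulary word, multiplied by its strength."""
    counts = Counter(tokenize(text))
    vec = [0.0] * len(vocabulary)
    for word, n in counts.items():
        if word in index:  # words outside the dictionary are ignored
            vec[index[word]] = n * strength[word]
    return vec

print(feature_vector("This book is good."))  # [0.0, 0.01, 6.0, 0.0]
```

With this weighting, a text containing "outstanding" once gets 9.0 in that dimension, while a text containing "good" once gets 6.0 in a different dimension, so the strength values do reach the classifier.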

Training the SVM

Once each feature vector is paired with a target value or class label, you can train your SVM.

Hope this helps.
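As a rough sketch of this step, assuming scikit-learn is available (the answer itself does not name a library): the vocabulary, strength values, training texts, and labels below are all made up for illustration, with 1 meaning positive and -1 meaning negative.

```python
from collections import Counter
from sklearn.svm import SVC

# Illustrative vocabulary and strengths (assumptions, not real lexicon data).
vocabulary = ["bad", "book", "good", "outstanding", "terrible"]
strength = {"bad": 5.0, "book": 0.01, "good": 6.0,
            "outstanding": 9.0, "terrible": 8.0}

def feature_vector(text):
    """Count of each vocabulary word in the text, scaled by its strength."""
    counts = Counter(t.strip(".,!?").lower() for t in text.split())
    return [counts[w] * strength[w] for w in vocabulary]

# Toy training set: one label per text.
train_texts = ["This book is good.", "An outstanding book!",
               "This book is bad.", "A terrible book."]
train_labels = [1, 1, -1, -1]

X = [feature_vector(t) for t in train_texts]
clf = SVC(kernel="linear")
clf.fit(X, train_labels)

# Classify new texts with the same feature construction.
predictions = clf.predict([feature_vector("good good book"),
                           feature_vector("a terrible book")])
print(predictions)
```

To test whether the strength values matter, you could train a second model on plain counts (all strengths set to 1) and compare the two on held-out data.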

Licensed under: CC-BY-SA with attribution