Just some suggestions:
Construct a vocabulary
This vocabulary serves as a dictionary: any word that does not appear in it will be excluded from your feature vector. Suppose your dictionary contains 5000 words.
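A minimal sketch of building such a vocabulary from a corpus. The `build_vocabulary` helper and the whitespace tokenization below are illustrative assumptions, not from any particular library:

```python
from collections import Counter

def build_vocabulary(corpus, size=5000):
    """Keep the `size` most frequent words and map each one to a fixed index."""
    counts = Counter(
        word.strip(".,!?").lower()
        for text in corpus
        for word in text.split()
    )
    return {word: i for i, (word, _) in enumerate(counts.most_common(size))}

# Toy corpus; every surviving word gets its own feature dimension.
vocab = build_vocabulary(["This book is good.", "This book is bad."])
```

A real corpus would of course be much larger, and you might plug in a proper tokenizer instead of `str.split`.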
Prepare the sentiment strength for each word in the vocabulary
Of course, you can set up a default strength for those words whose sentiment you have no information about.
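One simple way to handle that is a dictionary lookup with a fallback default. The strength values and the neutral default of 1.0 below are made-up placeholders; real values would come from a sentiment lexicon or your own annotation:

```python
# Hypothetical sentiment strengths; values here are invented for illustration.
strength = {"good": 6.0, "bad": -6.0, "book": 0.01}

DEFAULT_STRENGTH = 1.0  # neutral fallback for words with no known strength

def get_strength(word):
    # dict.get returns the default when the word has no entry
    return strength.get(word, DEFAULT_STRENGTH)
```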
Construct a feature vector for each text you want to classify
For any given text, e.g.,
This book is good.
construct a feature vector with 5000 dimensions, one per dictionary word. Each dimension holds that word's Tf-Idf score, or simply the number of times the word occurs in the text. Suppose in your dictionary, you have
strength(book) = 0.01
strength(good) = 6.0,
and you don't have entries for "this" or "is". Then you will end up with a vector of 5000 elements. (I am using the number of occurrences instead of Tf-Idf in the following example; feel free to try Tf-Idf in a similar way.)
book,good
[0,0,0, ..., 1,1,0,0,....,0]
All elements are zeros except the two that correspond to "book" and "good". Plug in your sentiment strengths and you get:
book,good
[0,0,0, ...,0.01,6.0,0,0,....,0]
Multiplying the strength value by the number of occurrences will increase or decrease the value of the corresponding element. This is fine, because you do want to boost or weaken each component's contribution according to its sentiment strength.
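The whole construction can be sketched as occurrence counts scaled by strength. A toy 4-word dictionary stands in for the 5000-word one, and the `feature_vector` helper is a hypothetical name; swap the counting step for Tf-Idf scores if you prefer:

```python
import numpy as np

def feature_vector(text, vocab, strength, default=1.0):
    """One element per dictionary word: occurrence count times sentiment strength."""
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in vocab:  # words outside the dictionary are ignored
            vec[vocab[word]] += strength.get(word, default)
    return vec

vocab = {"apple": 0, "book": 1, "good": 2, "zebra": 3}   # toy dictionary
strength = {"book": 0.01, "good": 6.0}                   # made-up strengths
v = feature_vector("This book is good.", vocab, strength)
# "this" and "is" are not in the dictionary, so v == [0, 0.01, 6.0, 0]
```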
Training the SVM
Once you pair each feature vector with a target value (class label), you can train your SVM.
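With labeled vectors in hand, training takes only a few lines with any SVM library. Here is a sketch using scikit-learn's `SVC`; the vectors and labels below are invented toy data (4 dimensions instead of 5000):

```python
import numpy as np
from sklearn.svm import SVC

# Toy strength-weighted feature vectors and their sentiment labels:
# 1 = positive, 0 = negative.
X = np.array([
    [0.0, 0.01, 6.0, 0.0],   # e.g. "This book is good."
    [0.0, 0.01, -6.0, 0.0],  # e.g. "This book is bad."
    [0.0, 0.02, 6.0, 0.0],
    [0.0, 0.02, -6.0, 0.0],
])
y = [1, 0, 1, 0]

clf = SVC(kernel="linear")
clf.fit(X, y)
print(clf.predict([[0.0, 0.01, 5.0, 0.0]]))  # classify a new text's vector
```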
Hope these suggestions help.