Weka: Text sentiment analysis on multiple text attributes

Question

What you describe is multi-faceted sentiment analysis, since you keep information about different facets (attributes) of the retail store. To get an overall analysis of the store, it is not wrong to mix all attributes in the analysis; just apply StringToWordVector to all String attributes and that's it.
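To make the "mix all attributes" idea concrete, here is a minimal sketch in plain Python (not Weka code). The field names, the sample review and the helper functions are all hypothetical; it only contrasts one bag of words built over every text field with a bag that keeps track of which facet each word came from.

```python
import re
from collections import Counter

# Hypothetical review: one free-text field per facet of the store.
review = {
    "store_experience": "Friendly staff, the store was clean",
    "product_quality": "Poor quality, the product broke",
    "label": "negative",
}
TEXT_FIELDS = ["store_experience", "product_quality"]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def combined_bag(row):
    """One bag of words over all text fields (facets mixed together)."""
    bag = Counter()
    for field in TEXT_FIELDS:
        bag.update(tokenize(row[field]))
    return bag


def per_facet_bag(row):
    """Same counts, but each word is prefixed with the field it came from."""
    bag = Counter()
    for field in TEXT_FIELDS:
        bag.update(f"{field}:{tok}" for tok in tokenize(row[field]))
    return bag
```

In the combined bag, a word such as "the" is counted once per occurrence across all fields, so the statistics are pooled; in the per-facet bag the same word becomes two separate features, one for each field.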

On the one hand, you may increase accuracy, because you get better statistics and more features than when using only one of the attributes. On the other hand, you may decrease accuracy, because a review may say positive things about the Store Experience while being negative overall, so mixing the attributes may add some noise to the model. However, this is unlikely, because such a review would also be a bad example when learning only from the Store Experience attribute.

If you follow the tutorial, you will see that there are plenty of options in the StringToWordVector filter, and you can add AttributeSelection as well. I suggest testing both the per-attribute setup and the combination of all attributes, using binary/TF/TF.IDF weights in the StringToWordVector filter, using the NGramTokenizer (for identifying positive/negative multiword expressions, e.g. "very very good"), using AttributeSelection with Ranker and InfoGainAttributeEval, and, of course, testing as many learning algorithms as you can.
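The options mentioned above can be illustrated with a small sketch in plain Python (again, not Weka code). It assumes textbook definitions: n-grams of 1 to 3 words, binary/TF/TF.IDF weights with idf = log(N / df), and information gain of a term's presence with respect to the class. The exact formulas Weka applies may differ, and the documents, labels and function names below are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus with class labels.
DOCS = ["very very good service", "good prices", "very bad service", "bad prices"]
LABELS = ["pos", "pos", "neg", "neg"]


def ngrams(text, n_min=1, n_max=3):
    """Return all word n-grams of length n_min..n_max as space-joined strings."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def weights(docs, scheme="binary"):
    """Weight each document's n-grams with a binary, tf or tfidf scheme."""
    bags = [Counter(ngrams(d)) for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for bag in bags for term in bag)
    n_docs = len(docs)
    if scheme == "binary":
        return [{t: 1.0 for t in bag} for bag in bags]
    if scheme == "tf":
        return [{t: float(c) for t, c in bag.items()} for bag in bags]
    if scheme == "tfidf":
        return [{t: c * math.log(n_docs / df[t]) for t, c in bag.items()}
                for bag in bags]
    raise ValueError(f"unknown scheme: {scheme}")


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def info_gain(term, docs, labels):
    """Information gain of 'term is present' with respect to the labels."""
    present = [y for d, y in zip(docs, labels) if term in ngrams(d)]
    absent = [y for d, y in zip(docs, labels) if term not in ngrams(d)]
    n = len(labels)
    return (entropy(labels)
            - (len(present) / n) * entropy(present)
            - (len(absent) / n) * entropy(absent))


# Rank the vocabulary by information gain, highest first.
vocab = sorted({t for d in DOCS for t in ngrams(d)})
ranked = sorted(vocab, key=lambda t: info_gain(t, DOCS, LABELS), reverse=True)
```

On this toy corpus, "good" and "bad" separate the classes perfectly, while "very" appears equally in both classes and carries no information on its own; the trigram "very very good" is what lets the intensifier contribute to the positive class.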

You have an additional tutorial here.