Question

I am looking to do text sentiment analysis on multiple text attributes. I followed this great beginner's video tutorial, which covers a single text attribute and its class (positive or negative). I want to extend the idea to multiple attributes simultaneously.

To make this clear, here's an example of what I am trying to do:

Attributes collected from customers about a retail store:

  1. Store Experience review - String
  2. Collection review - String
  3. Assistance provided review - String
  4. Overall ranking - Integer (1 to 5) - class

I want to predict the class attribute (4) from all of the text attributes (1 - 3).

When I applied filter > unsupervised > attribute > StringToWordVector individually to each of these attributes, I observed a lower percentage of correctly classified instances.

Is this the correct way to proceed here to perform the text sentiment analysis?


Solution

What you describe is multi-faceted sentiment analysis, since you keep information about different facets (attributes) of the retail store. To get an overall analysis of the store, it is not wrong to mix all attributes in the analysis; just apply StringToWordVector to all String attributes and that's it.
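To make the difference between the two setups concrete, here is a minimal sketch in plain Python (not Weka's API). The record, the field names and the two helper functions are all hypothetical; the sketch only illustrates the idea of pooling the words of all text attributes into one vocabulary versus keeping one vocabulary per attribute.

```python
from collections import Counter

# Hypothetical record: three free-text reviews plus the overall ranking (class).
review = {
    "experience": "friendly staff and clean store",
    "collection": "poor collection",
    "assistance": "staff helped quickly",
    "rating": 3,
}

TEXT_FIELDS = ("experience", "collection", "assistance")

def pooled_bag(record):
    """One shared vocabulary: words from all text fields are counted together."""
    bag = Counter()
    for field in TEXT_FIELDS:
        bag.update(record[field].lower().split())
    return bag

def per_field_bag(record):
    """Separate vocabularies: each word is tagged with the field it came from."""
    bag = Counter()
    for field in TEXT_FIELDS:
        bag.update(f"{field}:{word}" for word in record[field].lower().split())
    return bag

pooled = pooled_bag(review)
tagged = per_field_bag(review)

print(pooled["staff"])               # 2: both mentions feed one feature
print(tagged["experience:staff"])    # 1: mentions are kept apart per field
print(tagged["assistance:staff"])    # 1
```

Pooling gives each word more supporting examples, while tagging lets a classifier treat "staff" in the assistance review differently from "staff" in the store experience review, at the cost of sparser features.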

On one hand, you may increase accuracy because you get better statistics and more features than when using only one of the attributes. On the other hand, you may decrease accuracy because a review may say positive things about the Store Experience while being negative overall, so mixing the attributes may add some noise to the model. However, this is unlikely, because such a review would also be a bad example when learning only from the Store Experience attribute.

If you follow the tutorial, you will see that there are plenty of options in the StringToWordVector filter, and you can add AttributeSelection as well. I suggest testing both setups (per attribute and all attributes combined), trying binary/TF/TF.IDF weights in the StringToWordVector filter, using the NGramTokenizer (to identify positive/negative multiword expressions, e.g. "very very good"), using AttributeSelection with Ranker and InfoGainAttributeEval and, of course, testing as many learning algorithms as you can.
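For readers who want to see what these options compute, the following is a rough sketch in plain Python, not Weka code. The toy documents, the labels and the function names are made up for illustration, and the TF-IDF formula shown (tf * log(N / df)) is just one common variant, so Weka's exact numbers may differ. It covers word n-grams, the three weighting schemes, and ranking terms by information gain with respect to the class.

```python
import math
from collections import Counter

def ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams with n_min <= n <= n_max."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(words) - n + 1)]

def weights(docs, scheme="tfidf"):
    """Weight each document's terms as 'binary', 'tf', or 'tfidf'."""
    bags = [Counter(ngrams(d)) for d in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for bag in bags for term in bag)
    n_docs = len(docs)
    out = []
    for bag in bags:
        if scheme == "binary":
            out.append({t: 1.0 for t in bag})
        elif scheme == "tf":
            out.append({t: float(c) for t, c in bag.items()})
        else:  # assumed variant: tf * log(N / df)
            out.append({t: c * math.log(n_docs / df[t]) for t, c in bag.items()})
    return out

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(docs, labels, term):
    """Information gain of the class given presence/absence of a term."""
    present = [y for d, y in zip(docs, labels) if term in ngrams(d)]
    absent = [y for d, y in zip(docs, labels) if term not in ngrams(d)]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (present, absent) if part)
    return entropy(labels) - remainder

# Toy data, invented for this sketch.
docs = ["very very good service", "not good at all",
        "very bad service", "good collection"]
labels = ["pos", "neg", "neg", "pos"]

# Rank every unigram and bigram by information gain, best first.
vocab = sorted({t for d in docs for t in ngrams(d)})
ranked = sorted(vocab, key=lambda t: info_gain(docs, labels, t), reverse=True)
for term in ranked[:5]:
    print(f"{term!r}: {info_gain(docs, labels, term):.3f}")
```

In this toy data the bigram "not good" carries some information about the class, whereas "service" appears equally often in positive and negative reviews and gets an information gain of zero. That is the kind of term a ranking-based attribute selection would push to the bottom.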

You have an additional tutorial here.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow