Question

How do you find the aggregated sentiment of multiple sentences, a paragraph, or a large passage of text?

I have the code below, which I based on the Stanford CoreNLP tests on GitHub and various examples, but everything I've found computes sentiment only for individual sentences. I want the overall sentiment of a tweet, regardless of how many sentences it contains.

The only other way I can think of is to run SentimentPipeline.main(String[]) in a separate thread, feed the text to stdin, and collect the overall sentiment from stdout. I would prefer to do this in my own code, which would be simpler and more efficient, but I haven't found a way.

Also, I don't want to make a system call to a jar, as most people do, because I will be processing millions of tweets per day. The overhead of loading the resources each time would be too great.

Annotation document = new Annotation(text);
pipeline.annotate(document);

List<CoreMap> sentences = document.get(SentencesAnnotation.class);
String output;
for (CoreMap sentence : sentences) {
    // Traverse the words in the current sentence.
    // A CoreLabel is a CoreMap with additional token-specific methods.
    output = "";
    for (CoreLabel token : sentence.get(TokensAnnotation.class)) {

        // this is the text of the token
        String word = token.get(TextAnnotation.class);

        // this is the part-of-speech tag of the token (noun, verb, adjective etc.)
        // String pos = token.get(PartOfSpeechAnnotation.class);

        // this is the NER label of the token; "O" means "not a named entity"
        String ne = token.get(NamedEntityTagAnnotation.class);
        if (ne != null && !ne.contentEquals("O")) {
            output = output + (ne + " " + word + " ");
        }
    }

    //**************Sentiment Analysis
    Tree tree = sentence.get(SentimentCoreAnnotations.AnnotatedTree.class);
    // getPredictedClass returns the class index as an int, not a String
    int sentiment = RNNCoreAnnotations.getPredictedClass(tree);
}

Solution

The sentiment analysis toolkit in Stanford CoreNLP is trained on a sentence-level dataset. If you need a document-level sentiment engine, I think training a new model on documents is the better choice. You can also process the sentences one by one and combine the results with simple heuristics (such as the average or the max) as baselines, to see how well that works.
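
As a rough illustration of the baseline idea, here is a minimal sketch that combines per-sentence classes into one score for the whole text. It does not call CoreNLP itself: it assumes you have already collected one int per sentence (for example from RNNCoreAnnotations.getPredictedClass(tree) inside your loop) and that the classes run from 0 (very negative) to 4 (very positive). The class SentimentAggregator and its method names are made up for this example, and treating "max" as "the class farthest from neutral" is just one possible reading.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CoreNLP. Input: one sentiment class per
// sentence, assumed to be 0 (very negative) .. 4 (very positive).
final class SentimentAggregator {

    static final int NEUTRAL = 2; // assumed midpoint of the 0..4 scale

    // Unweighted average of the sentence classes.
    static double mean(List<Integer> classes) {
        requireNonEmpty(classes);
        double sum = 0;
        for (int c : classes) {
            sum += c;
        }
        return sum / classes.size();
    }

    // Average weighted by sentence length (e.g. token count),
    // so longer sentences contribute more to the result.
    static double weightedMean(List<Integer> classes, List<Integer> lengths) {
        requireNonEmpty(classes);
        if (classes.size() != lengths.size()) {
            throw new IllegalArgumentException("one length per sentence is required");
        }
        double weighted = 0;
        long total = 0;
        for (int i = 0; i < classes.size(); i++) {
            weighted += (double) classes.get(i) * lengths.get(i);
            total += lengths.get(i);
        }
        if (total <= 0) {
            throw new IllegalArgumentException("total length must be positive");
        }
        return weighted / total;
    }

    // The class farthest from neutral; on a tie the earlier sentence wins.
    static int mostExtreme(List<Integer> classes) {
        requireNonEmpty(classes);
        int best = classes.get(0);
        for (int c : classes) {
            if (Math.abs(c - NEUTRAL) > Math.abs(best - NEUTRAL)) {
                best = c;
            }
        }
        return best;
    }

    private static void requireNonEmpty(List<Integer> classes) {
        if (classes.isEmpty()) {
            throw new IllegalArgumentException("at least one sentence is required");
        }
    }

    public static void main(String[] args) {
        // Made-up values for a three-sentence tweet.
        List<Integer> classes = Arrays.asList(1, 3, 4);
        List<Integer> lengths = Arrays.asList(5, 12, 3);
        System.out.println("mean: " + mean(classes));
        System.out.println("weighted mean: " + weightedMean(classes, lengths));
        System.out.println("most extreme: " + mostExtreme(classes));
    }
}
```

In the question's loop you would add each sentence's predicted class and token count to two lists, then call one of these methods once per tweet, so the pipeline and its models are loaded only once. Which heuristic works best is something to measure on labeled tweets.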

Licensed under: CC-BY-SA with attribution