Question

This seems like a fairly easy question, but I still can't work out how to solve it. I have an Elasticsearch cluster that uses the Twitter River plugin to download tweets. I would like to implement a sentiment analysis module that takes each tweet and computes a score (positive or negative). The score should be computed for every existing tweet as well as for new tweets, so that I can visualize the results in Kibana.

However, I am not sure where in the Elasticsearch pipeline I should place the call to this sentiment analysis module.

I have considered modifying the Twitter River plugin, but that would not apply retroactively to tweets that are already indexed.

Essentially, I need to answer two questions: 1) how do I call Python/Java code while a document is being indexed, so that I can modify its JSON accordingly? 2) how do I use the same code to modify all the existing documents in Elasticsearch?


Solution

If you don't want an external application to do the analysis before the documents are indexed in Elasticsearch, I think the best approach is to write a plugin. The plugin can implement a custom analyzer that performs the sentiment analysis. Then, in the mapping, assign that analyzer to the fields you want it to run on.

See these examples of analysis plugins:
https://github.com/barminator/elasticsearch-analysis-annotation
https://github.com/yakaz/elasticsearch-analysis-combo/
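As a rough illustration of the mapping step, here is a minimal sketch that builds the request body for creating an index whose tweet field uses a plugin-provided analyzer. The analyzer name `sentiment_analyzer` is hypothetical (it would be whatever name your plugin registers), and the field name `text` is an assumption about how the tweets are stored. The layout assumes a recent Elasticsearch version; older releases nest the properties under a type name and use `string` rather than `text` as the field type.

```python
import json


def sentiment_mapping(field="text", analyzer="sentiment_analyzer"):
    """Build an index-creation body that assigns `analyzer` to `field`.

    Both defaults are placeholders: `analyzer` must match the name
    registered by your own plugin, and `field` must match your documents.
    """
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": analyzer},
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body you would send when creating the index.
    print(json.dumps(sentiment_mapping(), indent=2))
```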

To run the analysis on all existing documents you will need to reindex them after defining the correct mapping.
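For comparison, the external-application route mentioned at the start of this answer can be sketched in a few lines of Python. This is not the plugin approach: it computes a score outside Elasticsearch and writes it back as an ordinary field. The word lists, the `sentiment` field name, and the `text` source field are all placeholders for illustration; a real scorer would replace `score_sentiment`. The action dictionaries follow the shape accepted by the bulk helper in the official Python client, and older Elasticsearch versions may also require a `_type` key.

```python
import re

# Toy lexicon, for illustration only; swap in a real sentiment model.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "sad", "terrible"}


def score_sentiment(text):
    """Return (# positive words) - (# negative words) found in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def enrich(doc, field="text"):
    """Return a copy of a new tweet with a `sentiment` field added,
    ready to be indexed."""
    enriched = dict(doc)
    enriched["sentiment"] = score_sentiment(doc.get(field, ""))
    return enriched


def update_actions(hits, index, field="text"):
    """Yield one partial-update action per existing document.

    `hits` is any iterable of search hits carrying `_id` and `_source`.
    """
    for hit in hits:
        yield {
            "_op_type": "update",
            "_index": index,
            "_id": hit["_id"],
            "doc": {"sentiment": score_sentiment(hit["_source"].get(field, ""))},
        }
```

With the official Python client, the hits could come from `elasticsearch.helpers.scan(client, index=...)` and the generated actions could be passed to `elasticsearch.helpers.bulk(client, actions)`. New tweets would go through `enrich` before being indexed, so the same scoring code covers both cases.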

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow