Question

I am trying to use Flume to consume the Twitter Streaming API and index the tweets into Elasticsearch. I set up my flume.conf to use com.cloudera.flume.source.TwitterSource as the Twitter source (with my dev tokens), and I use the default Elasticsearch sink.

I am able to get the tweets (I also save them to HDFS, and when I open the file I can see them), but when I search Elasticsearch, I get this response:

    {
      _index: twitter-2014-02-14
      _type: tweet-rt
      _id: ilL5ZrBRSlqrZcsVUbnO-g
      _version: 1
      _score: 1
      _source: {
        @message: org.elasticsearch.common.xcontent.XContentBuilder@12da4409
        @timestamp: 2014-02-14T10:16:13.000Z
        @fields: {
          timestamp: 1392372973000
        }
      }
    }

Here is the relevant part of my Flume config:

# - ElasticSearch Sink
TwitterAgent.sinks.ES.type = elasticsearch
TwitterAgent.sinks.ES.channel = FileChannel
TwitterAgent.sinks.ES.hostNames = 192.168.10.100:9300
TwitterAgent.sinks.ES.indexName = twitter
TwitterAgent.sinks.ES.indexType = tweet-rt
TwitterAgent.sinks.ES.clusterName = testou

Do I have to add something else? I don't understand why ES cannot deserialize my tweet.

Any ideas?

Thank you.


Solution

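This is weird. The @message value looks like the default Object.toString() output for the XContentBuilder (class name, then "@", then a hex hash code derived from identityHashCode), which means the builder object itself was written into the document instead of the JSON it contains. That should not happen.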
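To illustrate the symptom, here is a minimal Java sketch of how a value shaped like XContentBuilder@12da4409 can arise. It does not reproduce the Flume or Elasticsearch code: FakeBuilder and its string() accessor are hypothetical stand-ins, used only to show what happens when an object that does not override toString() is converted to text.

```java
// Hypothetical demo class, not part of Flume or Elasticsearch.
class ToStringSymptom {

    // Stand-in for a builder that holds JSON but does not override toString().
    static final class FakeBuilder {
        private final String json;

        FakeBuilder(String json) {
            this.json = json;
        }

        // Hypothetical accessor that returns the content the builder holds.
        String string() {
            return json;
        }
    }

    // Implicit conversion falls back to Object.toString():
    // class name + "@" + hex hash code.
    static String wrong(FakeBuilder b) {
        return "" + b;
    }

    // Asking the builder for its content returns the actual JSON.
    static String right(FakeBuilder b) {
        return b.string();
    }

    public static void main(String[] args) {
        FakeBuilder b = new FakeBuilder("{\"text\":\"hello\"}");
        System.out.println(wrong(b)); // something like ToStringSymptom$FakeBuilder@<hex>
        System.out.println(right(b)); // the JSON string itself
    }
}
```

The hex suffix differs between runs, but the shape matches the @message value in the question, which suggests the sink stored the builder's toString() rather than its content.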

I think I'd recommend clearing out Flume and re-installing. I'd be concerned about classpath and JAR dependency issues.

Which version of Flume are you using?

OTHER TIPS

For others who come across this error: this was a bug in the Flume Elasticsearch sink, and it has since been fixed. See https://issues.apache.org/jira/browse/FLUME-2126

If you are on a Flume version earlier than 1.6, you may want to cherry-pick this patch and build the sink against your version.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow