Question

I would like some design advice for a centralized logging project I am considering. I have a number of components producing logs on various servers. Apache Flume looks like the sensible choice for streaming them to a central log server, most likely into an Elasticsearch instance for querying and analysis.

Here's my question: I would like to provide a scripting engine that listens to the flow of log events arriving on the central server. Would it make sense to do that as an interceptor in Flume, as a plugin to Elasticsearch, or something else entirely?


Solution

Flume was originally built as the pipeline into Hadoop/HBase, and it lets you do pretty much any kind of decorating, transforming and intercepting of events before they reach the final storage. That makes Flume a natural place for your pre-processing (alerting, in your case). The Flume sink can be Elasticsearch, so the logs will still end up in Elasticsearch. To answer your question: it makes perfect sense to trigger all of your alerting/alarms/notifications in the pipeline before the logs reach their final destination, and both the old Flume and the Flume NG architectures are customisable and powerful in this regard.
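As a rough illustration of what that looks like in Flume NG, here is a minimal interceptor sketch that flags events containing a keyword. It uses the standard org.apache.flume.interceptor.Interceptor API; the class name AlertingInterceptor and the keyword property are made up for this example, and a real implementation would likely call out to your scripting engine or a notification service instead of just tagging a header.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

// Minimal alerting interceptor sketch: inspects each event body and tags
// events that contain a configured keyword so downstream components (or an
// external notifier invoked here) can react.
public class AlertingInterceptor implements Interceptor {

    private final String keyword;

    private AlertingInterceptor(String keyword) {
        this.keyword = keyword;
    }

    @Override
    public void initialize() {
        // No setup needed for this sketch.
    }

    @Override
    public Event intercept(Event event) {
        String body = new String(event.getBody(), StandardCharsets.UTF_8);
        if (body.contains(keyword)) {
            // Mark the event; a real implementation might invoke a scripting
            // engine or push a notification at this point.
            event.getHeaders().put("alert", "true");
        }
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        List<Event> out = new ArrayList<Event>(events.size());
        for (Event e : events) {
            out.add(intercept(e));
        }
        return out;
    }

    @Override
    public void close() {
        // Nothing to release in this sketch.
    }

    // Flume instantiates interceptors through a Builder, referenced in the
    // agent configuration as fully.qualified.ClassName$Builder.
    public static class Builder implements Interceptor.Builder {
        private String keyword;

        @Override
        public void configure(Context context) {
            // "keyword" is a hypothetical property name for this sketch.
            keyword = context.getString("keyword", "ERROR");
        }

        @Override
        public Interceptor build() {
            return new AlertingInterceptor(keyword);
        }
    }
}
```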

Another thing worth mentioning: Elasticsearch is excellent for full-text search, but for heavier analytics it cannot compete with the Hadoop ecosystem. Cloudera CDH 4.3 added SolrCloud (Cloudera Search) on top of Hadoop, which is a plus for the combination Flume + HDFS or HBase + Solr. That mix is worth looking at as well.
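To show how the pieces could be wired together, here is a sketch of a flume-ng agent configuration that runs the interceptor above and fans the same events out to both an Elasticsearch sink (for search) and an HDFS sink (for batch analytics). Host names, ports, paths, and the interceptor class are placeholders for your environment.

```
# Placeholder agent config: one source, two channels, two sinks.
agent.sources  = logSrc
agent.channels = esCh hdfsCh
agent.sinks    = esSink hdfsSink

# Avro source receiving events from the edge agents; replicate to both channels.
agent.sources.logSrc.type = avro
agent.sources.logSrc.bind = 0.0.0.0
agent.sources.logSrc.port = 4141
agent.sources.logSrc.channels = esCh hdfsCh
agent.sources.logSrc.selector.type = replicating
agent.sources.logSrc.interceptors = alert
agent.sources.logSrc.interceptors.alert.type = com.example.AlertingInterceptor$Builder
agent.sources.logSrc.interceptors.alert.keyword = ERROR

agent.channels.esCh.type = memory
agent.channels.hdfsCh.type = memory

# Elasticsearch sink for querying.
agent.sinks.esSink.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.esSink.channel = esCh
agent.sinks.esSink.hostNames = es-host:9300
agent.sinks.esSink.indexName = logs
agent.sinks.esSink.clusterName = elasticsearch

# HDFS sink for Hadoop-side analytics (or Solr indexing downstream).
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = hdfsCh
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/logs
agent.sinks.hdfsSink.hdfs.fileType = DataStream
```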

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow