Question

I need to collect and store a large quantity of data from different kinds of log files, but first I need to filter certain fields to extract only the necessary information. So I'm considering using an ETL tool to do the dirty work for me. My idea is to build a solution based on a file connector, program or customize the transformation processes, and finally deploy the solution on Linux machines so it can watch files on the fly, extract the needed information, and store it in a database, for example.
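For illustration, here is a minimal sketch of the kind of pipeline I have in mind, in plain Java: poll a growing log file, extract fields with a regex, and insert them through JDBC. The line pattern, table name, and connection URL are placeholders, not my actual setup.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class LogWatcher {
        // Hypothetical line format: "2013-01-15 12:00:00 ERROR some message"
        private static final Pattern LINE =
                Pattern.compile("^(\\S+ \\S+) (\\w+) (.*)$");

        public static void main(String[] args) throws Exception {
            // Placeholder URL/credentials; log_events has three text columns
            try (Connection db = DriverManager.getConnection(
                         "jdbc:postgresql://localhost/logs", "user", "secret");
                 PreparedStatement insert = db.prepareStatement(
                         "INSERT INTO log_events (ts, level, message) VALUES (?, ?, ?)");
                 BufferedReader in = new BufferedReader(new FileReader(args[0]))) {

                while (true) {
                    String line = in.readLine();
                    if (line == null) {      // reached end of file: wait for new data
                        Thread.sleep(1000);
                        continue;
                    }
                    Matcher m = LINE.matcher(line);
                    if (!m.matches()) {
                        continue;            // skip lines that don't match the format
                    }
                    insert.setString(1, m.group(1));
                    insert.setString(2, m.group(2));
                    insert.setString(3, m.group(3));
                    insert.executeUpdate();
                }
            }
        }
    }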

So my question is: which open source tool is the most suitable, flexible, and KISS for this job?

Scriptella, Kettle, Talend, or something else?

Also: is there a de facto standard tool for working with log/text files?

The main intention and objective is to create an efficient solution for watching logs, extracting the data, and storing it, across distinct log formats.

Thx!


Solution

What I believe to be the best combination is: a map-reduce implementation such as Apache Hadoop, GridGain, or JPPF (for processing large datasets) + JDMP for data mining + a NoSQL database for query and retrieval (Neo4j, Bigtable, etc.). It is still not clear what the exact use case is ;-)
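To make the map-reduce part concrete, here is a sketch of a Hadoop job that scans log lines and counts occurrences per log level. The field layout ("date time LEVEL ...") is an assumption you would adapt to your own formats.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LogLevelCount {

        // Emits (level, 1) for each line, assuming layout: date time LEVEL ...
        public static class LevelMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text level = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\\s+");
                if (fields.length > 2) {
                    level.set(fields[2]);
                    context.write(level, ONE);
                }
            }
        }

        // Sums the per-level counts
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "log level count");
            job.setJarByClass(LogLevelCount.class);
            job.setMapperClass(LevelMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The same mapper/reducer split transfers to the other frameworks mentioned; the point is that field extraction happens in the map phase, so adding a new log format means adding a new mapper, not changing the storage side.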

Also see this link for more details: Do you know batch log processing tools for hadoop (zohmg alternatives)?
