Problem

I need to collect and store a large quantity of data from different kinds of log files, but first I have to filter certain fields to extract only the necessary information. So I'm considering using an ETL tool to do the dirty work for me. My idea is to build a solution based on a file connector, program or customize the transformation processes, and finally deploy this solution on Linux machines so it can watch files on the fly, extract the needed information, and store it in a database, for example.
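Roughly, what I'm picturing is something like the loop below — a minimal sketch only. The log layout, the `ERROR` filter, and the class name are placeholders I made up, and the real version would insert into a database instead of printing (and would have to handle log rotation, which this does not):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal tail-and-extract sketch. Format and filter are hypothetical.
public class LogTailer {
    // Assumed line format: "2023-01-01 12:00:00 ERROR some message"
    private static final Pattern LINE =
        Pattern.compile("^(\\S+ \\S+) (\\w+) (.*)$");

    public static void main(String[] args) throws IOException, InterruptedException {
        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            while (true) {
                String line = reader.readLine();
                if (line == null) {        // reached end of file: wait for new data
                    Thread.sleep(500);
                    continue;
                }
                Matcher m = LINE.matcher(line);
                if (m.matches() && "ERROR".equals(m.group(2))) {
                    // In the real solution this would be an INSERT into the database.
                    System.out.printf("ts=%s level=%s msg=%s%n",
                                      m.group(1), m.group(2), m.group(3));
                }
            }
        }
    }
}
```

The question is whether one of the ETL tools gives me this watch/extract/store pipeline out of the box, instead of hand-rolling it like this.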

So my question is: which open source tool is the most suitable, flexible, and KISS for this job?

Scriptella, Kettle, Talend, or something else?

Also, is there a de facto tool for working with log/text files?

The main intention and objective is to create an efficient solution to watch logs, extract the data, and store it, across distinct log formats.

Thx!


Solution

What I believe to be the best combination is: a map-reduce implementation like Apache Hadoop, GridGain, or JPPF (for processing large datasets) + JDMP for data mining + a NoSQL DB for query and retrieval (Neo4j, Bigtable, etc.). It is still not clear what the exact use case is ;-)
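For illustration, here is a minimal sketch of what the map-reduce stage could look like with the standard Hadoop MapReduce API — counting log lines per level. The line layout (level as the third whitespace-separated token) is an assumption, not your actual format:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch: count log lines per level across a large set of log files.
public class LogLevelCount {

    public static class LevelMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text level = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Assumes e.g. "2023-01-01 12:00:00 ERROR ..."; adjust per format.
            String[] parts = value.toString().split("\\s+");
            if (parts.length >= 3) {
                level.set(parts[2]);
                ctx.write(level, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log level count");
        job.setJarByClass(LogLevelCount.class);
        job.setMapperClass(LevelMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The reducer output would then feed whichever NoSQL store you pick for querying; whether that is the right shape depends on the use case you have in mind.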

Also see this link for more details: Do you know batch log processing tools for hadoop (zohmg alternatives)?
