Question

I have a requirement in my project: I have to collect log data using Flume, and that data has to be fed into a Hive table.

My requirement is to collect files placed in a folder into HDFS, which I am doing with a spooldir source. After this I need to process these files and place the output in the Hive warehouse folder so the data can be queried immediately.

Can I process the source files in the sink in such a way that the data placed in HDFS is already processed into the required format?

Thanks, Sathish


Solution 2

Using the configuration below served my purpose.

source.type = spooldir
source.spooldir = ${location}
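
For reference, a fuller agent definition wires the spooldir source through a channel to an HDFS sink pointed at the Hive data directory. The sketch below assumes an agent named agent1 and hypothetical paths; note that the Flume user guide spells the source key spoolDir (camel case):

agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Spooling directory source: ingests files dropped into the watched folder
agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = /var/log/incoming
agent1.sources.src1.channels = ch1

# In-memory channel buffering events between source and sink
agent1.channels.ch1.type = memory

# HDFS sink writing plain text into a directory a Hive external table can point at
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/user/hive/warehouse/logs
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.channel = ch1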

Other tips

Yes, you need to write a custom serializer (implement this interface: http://flume.apache.org/releases/content/1.2.0/apidocs/org/apache/flume/serialization/EventSerializer.html), drop it into plugin.d/, and then add it to the configuration for the HDFS sink. A sketch follows below.
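
A minimal sketch of such a serializer, using hypothetical package and class names (com.example.flume.HiveFormatSerializer) and placeholder transformation logic; the actual write() body depends on the format your Hive table expects:

package com.example.flume; // hypothetical package

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.serialization.EventSerializer;

public class HiveFormatSerializer implements EventSerializer {

    private final OutputStream out;

    private HiveFormatSerializer(OutputStream out) {
        this.out = out;
    }

    @Override
    public void afterCreate() throws IOException {
        // No file header needed for plain text output.
    }

    @Override
    public void afterReopen() throws IOException {
        // Nothing to restore on reopen.
    }

    @Override
    public void write(Event event) throws IOException {
        // Placeholder transformation: turn a comma-delimited log line into
        // the tab-delimited layout a Hive table might expect.
        String line = new String(event.getBody(), StandardCharsets.UTF_8);
        String transformed = line.replace(',', '\t');
        out.write(transformed.getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void beforeClose() throws IOException {
        // Nothing to finalize before the file is closed.
    }

    @Override
    public boolean supportsReopen() {
        return false;
    }

    // Flume instantiates serializers through a Builder named in the sink config.
    public static class Builder implements EventSerializer.Builder {
        @Override
        public EventSerializer build(Context context, OutputStream out) {
            return new HiveFormatSerializer(out);
        }
    }
}

The builder class name then goes into the HDFS sink configuration, for example:

agent1.sinks.sink1.serializer = com.example.flume.HiveFormatSerializer$Builder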

License: CC-BY-SA with attribution
Not affiliated with StackOverflow