Question

I need to create a MapReduce program that reads an Excel file from HDFS, does some analysis on it, and then stores the output as an Excel file. I know that TextInputFormat is used to read a .txt file from HDFS, but which method or InputFormat should I use for an Excel file?


Solution

Generally, Hadoop is overkill for this scenario, but here are some relevant solutions:

  1. Parse the file externally and convert it to a Hadoop-compatible format.

  2. Read the complete file as a single record (see this answer).

  3. Use two chained jobs. The first, as in option 2, reads the file in bulk and emits each record as input for the next job.
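
Option 1 can be sketched roughly as follows. This is a minimal illustration, not a complete converter: it assumes the file is an .xlsx workbook (a zip archive of XML parts, not the legacy binary .xls format), that the sheet of interest is stored at xl/worksheets/sheet1.xml, and that date cells can stay as raw serial numbers. The function names (column_index, read_shared_strings, xlsx_to_tsv) are made up for this example. It uses only the Python standard library; in practice a library such as Apache POI or openpyxl covers many more cases.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by the XML parts inside an .xlsx archive.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
T_TAG = "{%s}t" % NS["m"]


def column_index(cell_ref):
    """Turn a cell reference such as 'C7' into a zero-based column index (2)."""
    index = 0
    for ch in cell_ref:
        if not ch.isalpha():
            break
        index = index * 26 + (ord(ch.upper()) - ord("A") + 1)
    return index - 1


def read_shared_strings(archive):
    """Return the workbook's shared-string table, or [] if it has none."""
    if "xl/sharedStrings.xml" not in archive.namelist():
        return []
    root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # An <si> entry may hold several <t> text runs; join them into one string.
    return ["".join(t.text or "" for t in si.iter(T_TAG))
            for si in root.findall("m:si", NS)]


def xlsx_to_tsv(xlsx_source, sheet_part="xl/worksheets/sheet1.xml"):
    """Convert one worksheet to tab-separated text, one line per row.

    xlsx_source may be a path or a binary file-like object.
    sheet_part is an assumed default location, not guaranteed for every workbook.
    """
    with zipfile.ZipFile(xlsx_source) as archive:
        strings = read_shared_strings(archive)
        sheet = ET.fromstring(archive.read(sheet_part))

    lines = []
    for row in sheet.findall("m:sheetData/m:row", NS):
        cells = []
        for cell in row.findall("m:c", NS):
            ref = cell.get("r")
            col = column_index(ref) if ref else len(cells)
            # Empty cells are omitted from the XML, so pad the skipped columns.
            cells.extend([""] * (col + 1 - len(cells)))
            cell_type = cell.get("t")
            if cell_type == "inlineStr":
                text = "".join(t.text or "" for t in cell.iter(T_TAG))
            else:
                value = cell.find("m:v", NS)
                text = (value.text or "") if value is not None else ""
                if cell_type == "s" and text:
                    # Shared-string cells store an index into the string table.
                    text = strings[int(text)]
            # Tabs and newlines would break the line-oriented output.
            cells[col] = text.replace("\t", " ").replace("\n", " ")
        lines.append("\t".join(cells))
    return "\n".join(lines)
```

The resulting text can be written to a file, uploaded with hdfs dfs -put, and read with the TextInputFormat mentioned in the question, so that each spreadsheet row reaches the mapper as one line.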

License: CC-BY-SA with attribution
Not affiliated with StackOverflow