Could anybody give me advice on how to efficiently merge a lot of small local files into a single file in HDFS?

StackOverflow https://stackoverflow.com/questions/23126273


Could anybody give me advice on how to efficiently merge a lot of small files from a normal file system into a single file in HDFS?


Solution

If your files are on a local Linux file system, you can merge them and stream the result straight into HDFS in a single step (hadoop fs -put - reads from standard input):

cat *.txt | hadoop fs -put - mergedFile.log
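
If you'd rather do the same thing from code, here is a minimal Java sketch of the idea using the standard Hadoop FileSystem API: open one output stream on HDFS and stream each small local file into it, so no intermediate merged file is created on local disk. The local input directory /data/input and the HDFS target /user/hdfs/mergedFile.log are placeholder paths, not from the original answer.

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class MergeLocalToHdfs {
    public static void main(String[] args) throws Exception {
        FileSystem hdfs = FileSystem.get(new Configuration());

        // One output stream on HDFS; every small local file is appended to it.
        try (OutputStream out = hdfs.create(new Path("/user/hdfs/mergedFile.log"))) {
            for (File f : new File("/data/input").listFiles()) {
                try (InputStream in = new FileInputStream(f)) {
                    // false = leave the HDFS stream open for the next file
                    IOUtils.copyBytes(in, out, 4096, false);
                }
            }
        }
    }
}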

Other tips

You could consider the techniques below:

  1. HAR -- Hadoop Archive files
  2. Sequence Files -- pack each small file as one record of a single large file (see the sketch after this list)
  3. CombineFileInputFormat -- it might be a little tricky to implement, though you can get some help from here
  4. Or a different storage system, such as HBase
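
To illustrate option 2, here is a hedged sketch of packing small files into a SequenceFile, with each record keyed by the original file name; the paths (/data/input, /user/hdfs/packed.seq) are placeholders and not part of the original answer.

import java.io.File;
import java.nio.file.Files;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackIntoSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // One SequenceFile on HDFS holding all the small files as records.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(new Path("/user/hdfs/packed.seq")),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (File f : new File("/data/input").listFiles()) {
                byte[] contents = Files.readAllBytes(f.toPath());
                // key = original file name, value = raw file bytes
                writer.append(new Text(f.getName()), new BytesWritable(contents));
            }
        }
    }
}

Because the key preserves each original file name, a later MapReduce or Spark job can still tell the records apart while reading one large, splittable file.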

This is a common problem, and you should be able to Google it; this blog here should also give you some pointers.

Let me know if you need help with something more specific.

hadoop fs -getmerge <src> <localdst> [addnl]

-getmerge: Get all the files in the directories that match the source file pattern, and merge and sort them into a single file on the local file system. <src> is kept. The optional [addnl] argument adds a newline character at the end of each file.

Example:

hadoop fs -getmerge /user/hdfs/test/ /home/hdfs/Desktop/merge

where /user/hdfs/test/ is the HDFS directory containing the files to be merged, and /home/hdfs/Desktop/merge is the local destination path where the merged file will be written.
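
If you need the same behavior from Java rather than the shell, a rough equivalent of -getmerge can be written with the FileSystem API; the sketch below reuses the example paths above and sorts entries by path name to mimic getmerge's ordering.

import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class GetMergeEquivalent {
    public static void main(String[] args) throws Exception {
        FileSystem hdfs = FileSystem.get(new Configuration());
        FileStatus[] parts = hdfs.listStatus(new Path("/user/hdfs/test/"));
        Arrays.sort(parts); // sort by path, mimicking getmerge's ordering

        // One local output file; each HDFS file is appended to it in turn.
        try (OutputStream out = new FileOutputStream("/home/hdfs/Desktop/merge")) {
            for (FileStatus st : parts) {
                if (st.isFile()) {
                    try (InputStream in = hdfs.open(st.getPath())) {
                        IOUtils.copyBytes(in, out, 4096, false); // keep out open
                    }
                }
            }
        }
    }
}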

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow