Could anybody give me advice on how to efficiently merge a lot of small files from the local file system into a single file in HDFS?

StackOverflow https://stackoverflow.com/questions/23126273


Question

Could anybody give me advice on how to efficiently merge a lot of small files from a normal file system into a single file in HDFS?


Solution

If your files are on a Linux file system, you can try this command:

# "-" makes hadoop fs -put read from stdin, so the concatenated
# stream goes straight into HDFS without an intermediate local file.
cat *.txt | hadoop fs -put - mergedFile.log
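
If you would rather grow the HDFS file incrementally, for example as new batches of small files arrive, hadoop fs -appendToFile also accepts - for stdin. A minimal sketch, assuming a hypothetical batch directory ./batch1/ and that mergedFile.log already exists:

# Append another batch of local files to the existing HDFS file.
# "-" again means "read from stdin"; appending requires the cluster
# to support append (the default in modern Hadoop releases).
cat ./batch1/*.txt | hadoop fs -appendToFile - mergedFile.log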

Other tips

You could consider the techniques below:

  1. HAR -- Hadoop archive files (see the sketch after this list)
  2. Sequence files
  3. Maybe use CombineFileInputFormat, though it can be a little tricky to implement
  4. Or a different storage system, such as HBase
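
As a sketch of option 1: a Hadoop archive packs many small HDFS files into a single HAR file, reducing NameNode metadata pressure. The paths below (/user/hdfs/small and /user/hdfs/archives) are placeholders; hadoop archive launches a MapReduce job to build the archive.

# Pack everything under /user/hdfs/small into one archive.
# Syntax: hadoop archive -archiveName <name>.har -p <parent> <src>* <dest>
hadoop archive -archiveName small.har -p /user/hdfs small /user/hdfs/archives

# The archived files remain readable through the har:// scheme.
hadoop fs -ls har:///user/hdfs/archives/small.har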

This is a common problem, and you should be able to find plenty of material on it with a quick search.

Let me know if you need help with something more specific.

hadoop fs -getmerge <src> <localdst> [addnl]

-getmerge: Get all the files in the directories that match the source file pattern, and merge and sort them into a single file on the local file system. The source files are kept.

Example: hadoop fs -getmerge /user/hdfs/test/ /home/hdfs/Desktop/merge, where /user/hdfs/test/ is the HDFS directory containing the files to be merged and /home/hdfs/Desktop/merge is the local destination path to which the merged file will be copied.
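
Note that -getmerge only produces a local file. If the goal is a single merged file back in HDFS, one option is to stage the result through local disk; a minimal sketch, with placeholder paths:

# Merge all files under the HDFS directory into one local file...
hadoop fs -getmerge /user/hdfs/test/ /tmp/merged.log
# ...then upload the merged result back into HDFS.
hadoop fs -put /tmp/merged.log /user/hdfs/merged.log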

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow