Question

I am a Java programmer learning Hadoop. I read that the NameNode in HDFS stores its information in two files, the fsimage and the edit log. At startup it reads this data from disk and performs a checkpoint operation.

But in many places I have also read that the NameNode stores its data in RAM, and that this is why Apache recommends a machine with plenty of RAM for the NameNode server.

Please enlighten me on this. What data does it store in RAM, and where does it store the fsimage and edit log?

Sorry if I asked anything obvious.

Solution

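Let me first answer:

What data does it store in RAM, and where does it store the fsimage and edit log?

In RAM: the whole namespace, including the file-to-block mapping and the block-to-DataNode mapping. In persistent storage (the fsimage plus the edit log): the namespace and file-related metadata (permissions, names, the blocks that make up each file, and so on). Block locations are not persisted; the NameNode rebuilds them from DataNode block reports after startup.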

Regarding the storage location of the fsimage and edit log, @mashuai's answer is spot on.

For a more detailed discussion you can read up on this

Other answers

When the NameNode starts, it loads the fsimage from persistent storage (disk). Its location is specified by the property dfs.name.dir (hadoop-1.x) or dfs.namenode.name.dir (hadoop-2.x) in hdfs-site.xml. The fsimage is loaded into main memory. As you noted, the NameNode also performs a checkpoint operation during startup. It keeps the fsimage contents in RAM in order to serve requests quickly.

Apart from the initial checkpoint, subsequent checkpoints can be controlled by tuning the following parameters in hdfs-site.xml:

dfs.namenode.checkpoint.period       # maximum time between checkpoints, in seconds (3600 by default)
dfs.namenode.checkpoint.txns         # number of NameNode transactions that triggers a checkpoint (1000000 by default)
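
To make the interplay of the two settings concrete, here is a minimal sketch in Java. It is a toy model, not Hadoop's actual code: the class and method names are hypothetical, and it assumes a checkpoint becomes due as soon as either limit is reached, whichever comes first. The values in `main` mirror the 3600-second period above and what I understand to be the default transaction limit of 1000000.

```java
import java.time.Duration;

/** Toy model of the two checkpoint triggers; hypothetical names, not a Hadoop API. */
final class CheckpointPolicySketch {
    private final Duration period;   // models dfs.namenode.checkpoint.period
    private final long txnThreshold; // models dfs.namenode.checkpoint.txns

    CheckpointPolicySketch(Duration period, long txnThreshold) {
        this.period = period;
        this.txnThreshold = txnThreshold;
    }

    /** A checkpoint is due when either limit has been reached (assumed semantics). */
    boolean isCheckpointDue(Duration sinceLastCheckpoint, long uncheckpointedTxns) {
        return sinceLastCheckpoint.compareTo(period) >= 0
                || uncheckpointedTxns >= txnThreshold;
    }

    public static void main(String[] args) {
        CheckpointPolicySketch policy =
                new CheckpointPolicySketch(Duration.ofSeconds(3600), 1_000_000L);
        // Recent checkpoint, few transactions: nothing to do yet.
        System.out.println(policy.isCheckpointDue(Duration.ofMinutes(10), 500L));
        // More than an hour has passed: the period limit fires.
        System.out.println(policy.isCheckpointDue(Duration.ofMinutes(61), 500L));
        // Busy cluster: the transaction limit fires before the hour is up.
        System.out.println(policy.isCheckpointDue(Duration.ofMinutes(10), 1_000_000L));
    }
}
```

On a busy cluster the transaction limit keeps the edit log from growing too large between checkpoints; on a quiet one the period guarantees a checkpoint still happens regularly.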

It stores the fsimage and edit log in the directory given by dfs.name.dir, which is set in hdfs-site.xml. When you start the cluster, the NameNode loads the fsimage and edit log into memory.

When the NameNode starts, it goes into safe mode. It loads the fsimage from persistent storage and replays the edit logs to create an updated view of HDFS storage (the FILE TO BLOCK MAPPING). Then it writes this updated fsimage back to persistent storage. Next, the NameNode waits for block reports from the DataNodes. From the block reports it creates the BLOCK TO DATA NODE MAPPING. Once the NameNode has received a certain threshold of block reports, it leaves safe mode and can start serving client requests.

Whenever a client changes any metadata, the NameNode (NN) first writes the change to an edit log segment, with an increasing transaction ID, on persistent storage (hard disk). Then it updates the fsimage held in its RAM.
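
The startup sequence described above can be sketched as a small Java model. This is only an illustration under simplifying assumptions: every class and method name is hypothetical (none of this is a Hadoop API), files are reduced to a path plus a list of block IDs, and safe mode ends once the fraction of known blocks that have been reported reaches a made-up threshold.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of NameNode startup; all names are hypothetical, not Hadoop APIs. */
final class NameNodeStartupSketch {
    // File -> block IDs: rebuilt from the fsimage plus the edit log.
    private final Map<String, List<Long>> fileToBlocks = new HashMap<>();
    // Block ID -> DataNodes: never persisted, built only from block reports.
    private final Map<Long, Set<String>> blockToDataNodes = new HashMap<>();
    private final double exitThreshold; // assumed fraction of blocks that must be reported
    private boolean safeMode = true;    // the NameNode starts in safe mode

    NameNodeStartupSketch(double exitThreshold) {
        this.exitThreshold = exitThreshold;
    }

    /** Step 1: load the last checkpointed namespace. */
    void loadFsImage(Map<String, List<Long>> image) {
        image.forEach((path, blocks) -> fileToBlocks.put(path, new ArrayList<>(blocks)));
    }

    /** Step 2: replay one edit; a null block list models a delete. */
    void replayEdit(String path, List<Long> blocks) {
        if (blocks == null) {
            fileToBlocks.remove(path);
        } else {
            fileToBlocks.put(path, new ArrayList<>(blocks));
        }
    }

    /** Step 3: record which DataNode holds which known block, then re-check safe mode. */
    void receiveBlockReport(String dataNode, Collection<Long> reportedBlocks) {
        Set<Long> known = knownBlocks();
        for (Long id : reportedBlocks) {
            if (known.contains(id)) {
                blockToDataNodes.computeIfAbsent(id, k -> new HashSet<>()).add(dataNode);
            }
        }
        if (known.isEmpty()
                || (double) blockToDataNodes.size() / known.size() >= exitThreshold) {
            safeMode = false;
        }
    }

    boolean inSafeMode() {
        return safeMode;
    }

    Set<String> locate(long blockId) {
        return blockToDataNodes.getOrDefault(blockId, Collections.emptySet());
    }

    private Set<Long> knownBlocks() {
        Set<Long> ids = new HashSet<>();
        fileToBlocks.values().forEach(ids::addAll);
        return ids;
    }
}
```

For example, after loading an image with one file of two blocks and replaying an edit that adds a second file with one block, the model stays in safe mode with a threshold of 1.0 until DataNodes have reported all three blocks. The sketch also shows why the two mappings differ: the first can be restored from disk, the second exists only in RAM.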

The fsimage and edit log are stored in the directory given by dfs.name.dir, which is set in hdfs-site.xml. When the cluster starts, the NameNode loads the fsimage and edit log into memory (RAM).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow