Question

We are running out of space on our Hadoop cluster, which is set up like this:

  • 1x 1TB HDD / <- Ubuntu System Partition
  • 3x 1.5TB HDD /data1 /data2 /data3 <- HDFS Data Volumes

The system partition is almost unused (97% free) and will not be used for tasks unrelated to Hadoop.

Is it safe to add a directory on the system partition as an HDFS data dir in the DataNode configuration?

I'm afraid that Hadoop will fill up the partition and make the system unusable.

The best approach would probably be to set up separate LVM volumes or re-partition the disk, but I would like to avoid going that route.

Does Hadoop respect Unix quotas? E.g. if I add a directory from the system partition and restrict the Hadoop user via a quota to only use, say, 0.5 TB, would that help?
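For example, something along these lines (just a sketch; it assumes the DataNode runs as a user named hdfs and that / is mounted with the usrquota option and quotas are enabled):

    # Block limits are in 1 KiB units: soft ~450 GiB, hard 500 GiB; 0 = no inode limit.
    # Requires quota support on / (usrquota mount option, quotacheck/quotaon already run).
    sudo setquota -u hdfs 471859200 524288000 0 0 /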


Solution

Yes, Hadoop uses ordinary file-writing APIs to write block data, so it would respect Unix-level quotas. In addition, there is the configuration property dfs.datanode.du.reserved, which lets you set an amount of reserved space per volume (the same value applies to every volume) that the DataNode will not consider available for writing.
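For illustration, a minimal hdfs-site.xml sketch for the DataNodes (the /dataN directories are from the question; the extra path /hdfs-data on the system partition and the 100 GB reserve are assumptions, and older Hadoop versions name the first property dfs.data.dir):

    <property>
      <name>dfs.datanode.data.dir</name>
      <!-- existing data volumes plus an assumed directory on the system partition -->
      <value>/data1,/data2,/data3,/hdfs-data</value>
    </property>
    <property>
      <name>dfs.datanode.du.reserved</name>
      <!-- bytes HDFS leaves untouched on every volume; here roughly 100 GB -->
      <value>107374182400</value>
    </property>

After adding the new directory, restart the DataNode so it picks up the extra volume.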

However, it is generally bad practice to allow HDFS writes onto the OS partition. If you expect to need more storage eventually (given that you are hitting limits already), it may be better to buy a few more disks and mount them on the DataNodes.

License: CC-BY-SA with attribution