Question

A simple question: is it preferable to have more nodes splitting the resources or fewer 'stronger' nodes?

Solution

It's a trade-off between reliability and replication on one side and general performance on the other.

Assuming a fixed number of disks, CPUs, and RAM evenly distributed across a cluster with X_1 nodes and a cluster with X_2 nodes, where X_1 < X_2, then:

  • If all nodes stay up, jobs will run faster on X_1, since less data has to cross the network.
  • If nodes die during the job, it's possible that the remaining resources on the X_2 cluster exceed those of the X_1 cluster. It's easier to imagine examples if X_1 is only 1 or 2 nodes. In this case the cost of the extra network IO may be less than the loss of resources, and so the job may run faster on X_2.
  • Your replication factor is limited by the size of the cluster: if you want replication 3, then you'll need at least 3 nodes.
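To put rough numbers on the points above, here is a minimal sketch. It assumes resources are split perfectly evenly across nodes and ignores network IO entirely; the function names `remaining_fraction` and `max_replication` are made up for illustration and are not part of Hadoop.

```python
def remaining_fraction(nodes: int, failed: int) -> float:
    """Fraction of the cluster's total resources left after `failed` nodes die.

    Assumption: every node holds an equal share (1/nodes) of the resources.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    if not 0 <= failed <= nodes:
        raise ValueError("failed must be between 0 and nodes")
    return (nodes - failed) / nodes


def max_replication(nodes: int, desired: int) -> int:
    """Highest usable replication factor, assuming one replica per node."""
    return min(nodes, desired)


# Compare a small cluster (X_1 = 2) with a larger one (X_2 = 10),
# each losing a single node while wanting replication 3.
for nodes in (2, 10):
    print(nodes, remaining_fraction(nodes, failed=1), max_replication(nodes, desired=3))
```

With one node lost, the 2-node cluster keeps half of its resources while the 10-node cluster keeps nine tenths, and only the larger cluster can actually hold 3 replicas.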

Other suggestions

Well, the simple question does not really have a simple answer :) It depends on your use case.

If you have a problem which is easy to divide, then I guess having more nodes should be the way to go. Divide and conquer, basically.

But if your problem is not easy to divide up, then having fewer, stronger nodes is the only option you have left.

In general, Hadoop is meant for the former kind of problem.
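One way to make "easy to divide" concrete is Amdahl's law, which is my own choice of illustration here and not something specific to Hadoop. The sketch below assumes a job where a fraction `parallel_fraction` of the work splits perfectly across nodes and the rest is strictly serial; it ignores network and scheduling overhead.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Ideal speedup over a single node under Amdahl's law.

    parallel_fraction: share of the work that can be divided (0.0 to 1.0).
    nodes: number of equally sized workers.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


# An easily divided job keeps gaining from extra nodes,
# a half-serial job flattens out below a 2x speedup.
for fraction in (1.0, 0.5):
    print(fraction, [round(amdahl_speedup(fraction, n), 2) for n in (1, 8, 100)])
```

For a job that is only half divisible, even 100 nodes stay under a 2x speedup, which is why a few stronger nodes are the better fit for that kind of problem.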

I hope this helps. If you can give us more specifics, then we should be able to help out better, I guess.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow