It's a trade-off between reliability and replication on one side and general performance on the other.
Assume a fixed number of disks, CPUs, and RAM, evenly distributed across a cluster with X_1 nodes and a cluster with X_2 nodes, where X_1 < X_2. Then:
- If all nodes stay up, jobs will run faster on X_1, since fewer nodes means less network IO between them.
- If nodes die during the job, the remaining resources on the X_2 cluster may exceed those of the X_1 cluster. This is easiest to picture when X_1 is only 1 or 2 nodes: losing one node then costs all or half of the cluster. In that case the cost of the extra network IO on X_2 may be less than the loss of resources on X_1, so the job may run faster on X_2.
- Your replication factor is limited by the size of the cluster: if you want a replication factor of 3, you'll need at least 3 nodes.
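
To make the failure case concrete, here is a rough sketch of the arithmetic, assuming the total resources are split evenly across nodes and that a dead node simply removes its share. The function names and the figure of 100 resource units are made up for illustration, and the model deliberately ignores network IO cost, so it only shows the resource side of the trade-off.

```python
def remaining_capacity(total_resources: float, nodes: int, failed: int) -> float:
    """Resources left after `failed` of `nodes` equally sized nodes die.

    Assumes an even split of resources per node (hypothetical model).
    """
    if nodes <= 0 or not 0 <= failed <= nodes:
        raise ValueError("need nodes > 0 and 0 <= failed <= nodes")
    return total_resources * (nodes - failed) / nodes


def max_replication(nodes: int, requested: int) -> int:
    """Replication factor actually achievable: one replica per node at most."""
    return min(nodes, requested)


# Same total resources (100 units), one node lost in each cluster.
small = remaining_capacity(100, nodes=2, failed=1)   # 50.0 units left
large = remaining_capacity(100, nodes=10, failed=1)  # 90.0 units left

# A 2-node cluster cannot hold 3 replicas of a block.
achievable = max_replication(nodes=2, requested=3)   # 2
```

With one node down, the 2-node cluster keeps half of its resources while the 10-node cluster keeps 90%, which is the situation where the larger cluster can win despite its extra network traffic.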