8 hosts per node would certainly be a valid way of doing things.
You could also use fewer if you plan to combine MPI with threads, since each rank would then occupy several cores. It depends on your application. In general, though, planning for 1 rank per core is a safe default.
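
To make the arithmetic concrete, here is a rough sketch of how the rank count per node shrinks once each rank runs several threads. It assumes a node with 8 cores (inferred from the numbers above, not something I know about your cluster), and the function name `ranks_per_node` is just made up for illustration.

```python
def ranks_per_node(cores_per_node: int, threads_per_rank: int = 1) -> int:
    """Return how many MPI ranks fit on one node without oversubscribing cores.

    Assumes every rank uses the same number of threads and each thread
    gets its own core. Any leftover cores are simply left idle.
    """
    if cores_per_node < 1 or threads_per_rank < 1:
        raise ValueError("core and thread counts must be positive")
    if threads_per_rank > cores_per_node:
        raise ValueError("a single rank would need more cores than the node has")
    return cores_per_node // threads_per_rank


if __name__ == "__main__":
    cores = 8  # assumed core count per node
    for threads in (1, 2, 4, 8):
        print(f"{threads} thread(s) per rank: {ranks_per_node(cores, threads)} rank(s) per node")
```

With 1 thread per rank this gives the plain 1 rank per core layout. With MPI + threads you trade ranks for threads, so 4 threads per rank on the same node leaves room for 2 ranks.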