Question

Let's consider the following simple scenario for an MPI application: a root process broadcasts (MPI_Bcast) some parameters (several tens of bytes, fixed size), then all processes perform some computations, and then the root gathers the results (MPI_Gather, possibly a very large data set). After the root saves the data, the program ends.
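
For concreteness, here is a minimal sketch of that scenario, assuming MPI_COMM_WORLD throughout; the parameter contents, the CHUNK size, and the stand-in computation are illustrative, not part of the question:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 1024  /* per-rank result size; "possibly very large" in practice */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Root broadcasts a few tens of bytes of fixed-size parameters. */
    double params[4] = { 0 };
    if (rank == 0) { params[0] = 1.0; params[1] = 2.0; }
    MPI_Bcast(params, 4, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Every rank computes its share of the results. */
    double *local = malloc(CHUNK * sizeof(double));
    for (int i = 0; i < CHUNK; i++)
        local[i] = params[0] * rank + params[1] * i;  /* stand-in computation */

    /* Root gathers all results - this is the potentially huge transfer. */
    double *all = NULL;
    if (rank == 0)
        all = malloc((size_t)size * CHUNK * sizeof(double));
    MPI_Gather(local, CHUNK, MPI_DOUBLE, all, CHUNK, MPI_DOUBLE, 0,
               MPI_COMM_WORLD);

    if (rank == 0) {
        /* ... save results to disk ... */
        free(all);
    }
    free(local);
    MPI_Finalize();
    return 0;
}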

Under what circumstances, if any (number of processes, latency, etc.), would using a communicator created with a virtual star topology provide better performance than using MPI_COMM_WORLD, and why? Does a communicator use lazy initialization for the actual channels (i.e. only open a pipe, socket, etc. the first time it is required)? Is this behavior implementation dependent?

Note: I'm using Open MPI 1.4.3-2 and plain C.


Solution

Communicator topologies are a convenience mapping feature; they are not required to change the way actual communication happens. Even if a star (or any other graph) topology contains processes that are not connected in the topological sense, nothing prevents them from sending messages to each other, as long as they know each other's ranks in the communicator. MPI implementations may use the topology as a hint to optimise the communication paths, but that would make them very complex pieces of code, and at least Open MPI does not do it in its collective algorithms (the not very well tested, and hence usually disabled, hierarch collective component takes the hardware hierarchy into consideration, but not virtual topologies).
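
To illustrate, here is a hedged sketch that builds a star over MPI_COMM_WORLD with MPI_Graph_create (rank 0 as the hub is my choice, not mandated by anything) and then sends a message directly between two leaves that are not neighbours in the graph - which is perfectly legal:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Star: rank 0 is the hub, ranks 1..size-1 are leaves.
     * index[] holds cumulative neighbour counts, edges[] the neighbours. */
    int *index = malloc(size * sizeof(int));
    int *edges = malloc(2 * (size - 1) * sizeof(int));

    index[0] = size - 1;              /* hub has size-1 neighbours */
    for (int i = 1; i < size; i++)
        index[i] = index[i - 1] + 1;  /* each leaf has exactly one */

    for (int i = 0; i < size - 1; i++)
        edges[i] = i + 1;             /* hub -> every leaf */
    for (int i = 1; i < size; i++)
        edges[size - 2 + i] = 0;      /* each leaf -> hub */

    MPI_Comm star;
    MPI_Graph_create(MPI_COMM_WORLD, size, index, edges, 0, &star);

    /* Leaves 1 and 2 are not connected in the star, yet direct
     * point-to-point communication between them still works. */
    if (size >= 3) {
        int token = 42;
        if (rank == 1)
            MPI_Send(&token, 1, MPI_INT, 2, 0, star);
        else if (rank == 2) {
            MPI_Recv(&token, 1, MPI_INT, 1, 0, star, MPI_STATUS_IGNORE);
            printf("leaf 2 got %d directly from leaf 1\n", token);
        }
    }

    MPI_Comm_free(&star);
    free(index);
    free(edges);
    MPI_Finalize();
    return 0;
}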

Topologies can influence communication through rank reordering, if one passes reorder = 1 to the communicator constructor (e.g. MPI_Graph_create or MPI_Cart_create). This gives the MPI implementation the freedom to renumber the process ranks so that their physical placement matches the supplied topology scheme as closely as possible, given the physical topology of the underlying hardware. There are hardware platforms with dedicated networks for collective operations. For example, IBM Blue Gene/P has a global interrupt network that allows for a fast MPI_BARRIER implementation, and a specialised collective network that speeds up some collective operations (including broadcasts). But these are only usable on MPI_COMM_WORLD; the fall-back software implementation is used for any other communicator.
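
A small self-contained sketch of the reorder mechanism, using a 1-D Cartesian communicator only because it needs less setup than a graph (whether any renumbering actually happens is entirely up to the implementation and the machine):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One non-periodic dimension spanning all processes. */
    int dims[1] = { size };
    int periods[1] = { 0 };
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1 /* reorder */, &cart);

    /* With reorder = 1 the rank in the new communicator may differ
     * from the rank in MPI_COMM_WORLD. */
    int cart_rank;
    MPI_Comm_rank(cart, &cart_rank);
    if (cart_rank != world_rank)
        printf("world rank %d was renumbered to %d\n", world_rank, cart_rank);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}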

Is this behavior implementation dependent?

Yes, it is implementation and system dependent (for implementations that support multiple hardware or communication systems). The same goes for the rest of your question: whether channels are set up eagerly or lazily is up to the implementation; Open MPI, for example, establishes most point-to-point connections lazily, on first use.
