(This is a modified version of my post: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Why-so-many-vnodes-td7588267.html)
The number of tokens per node, 256 (call it T, and call the number of nodes N), was chosen to give good load balancing with random token assignment for most cluster sizes. For small T, a random choice of initial tokens will in most cases give a poor distribution of data. The larger T is, the more likely the distribution is to be close to uniform.
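To see the load-balancing effect, here is a small simulation. It is a sketch under stated assumptions, not Cassandra's code: each of N nodes draws T tokens uniformly at random on an abstract ring of 2**64 positions, each token owns the range back to the previous token, and the helper names are hypothetical. Under this model a node's share of the ring follows a Beta(T, (N-1)T) distribution, so its standard deviation relative to the fair share 1/N is sqrt((N-1)/(NT+1)), roughly 1/sqrt(T).

```python
import math
import random
import statistics

# Assumption: an abstract ring of 2**64 token positions (the same number of
# values as Murmur3Partitioner's token range, but nothing here depends on that).
RING_SIZE = 2**64


def ownership(num_nodes, tokens_per_node, rng):
    """Give each node tokens_per_node random tokens; return each node's fraction of the ring.

    Assumes num_nodes * tokens_per_node >= 2.
    """
    ring = sorted((rng.randrange(RING_SIZE), node)
                  for node in range(num_nodes)
                  for _ in range(tokens_per_node))
    owned = [0] * num_nodes
    for i, (token, node) in enumerate(ring):
        # Each token owns (previous token, token]; for i == 0 the previous token
        # is the last one on the ring, and the modulo handles the wrap-around.
        owned[node] += (token - ring[i - 1][0]) % RING_SIZE
    return [o / RING_SIZE for o in owned]


def max_over_fair_share(shares):
    """How many times its fair share (1/N) the most-loaded node holds."""
    return max(shares) * len(shares)


def predicted_relative_std(num_nodes, tokens_per_node):
    """Std of a node's share divided by its mean 1/N, from the Beta(T, (N-1)T) model."""
    n, t = num_nodes, tokens_per_node
    return math.sqrt((n - 1) / (n * t + 1))


if __name__ == "__main__":
    rng = random.Random(42)
    n = 10
    for t in (1, 4, 16, 64, 256):
        shares = ownership(n, t, rng)
        observed = statistics.pstdev(shares) * n  # std relative to the mean 1/n
        print(f"T={t:>3}: max {max_over_fair_share(shares):.2f}x fair share, "
              f"relative std {observed:.2f} (predicted {predicted_relative_std(n, t):.2f})")
```

With one run per T and only 10 nodes the observed numbers are noisy, but the most-loaded node should move toward 1.0x its fair share as T grows.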
Also, for small T, a newly added node won't have many ranges to split, so it can only take data from a few existing nodes and won't end up with an even share of the data.
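The point about adding a node can be checked the same way. This sketch makes the same assumptions as a plain random-token model (random tokens, each token owns the range back to its predecessor, hypothetical helper names): existing nodes keep their tokens, a new node picks T random tokens of its own, and we record which existing nodes lost ownership to it. With T = 1 the new token falls inside a single existing range, so exactly one node hands over data, and the new node's share is whatever part of that one range it happens to split off.

```python
import random

RING_SIZE = 2**64  # assumption: abstract ring of 2**64 token positions


def owned_by_node(ring):
    """Map node -> number of ring positions it owns, for a sorted list of (token, node)."""
    owned = {}
    for i, (token, node) in enumerate(ring):
        # Token owns (previous token, token], wrapping around at the start of the ring.
        owned[node] = owned.get(node, 0) + (token - ring[i - 1][0]) % RING_SIZE
    return owned


def add_node(num_nodes, tokens_per_node, rng):
    """Add one node with random tokens; return (donors, new_node_share).

    donors maps each existing node that lost ownership to the fraction of the
    ring it handed over. Existing tokens do not move.
    """
    ring = sorted((rng.randrange(RING_SIZE), node)
                  for node in range(num_nodes)
                  for _ in range(tokens_per_node))
    before = owned_by_node(ring)
    new_node = num_nodes
    new_tokens = [(rng.randrange(RING_SIZE), new_node) for _ in range(tokens_per_node)]
    after = owned_by_node(sorted(ring + new_tokens))
    donors = {node: (before[node] - after[node]) / RING_SIZE
              for node in before if after[node] < before[node]}
    return donors, after[new_node] / RING_SIZE


if __name__ == "__main__":
    rng = random.Random(7)
    n = 10
    for t in (1, 256):
        donors, share = add_node(n, t, rng)
        print(f"T={t:>3}: new node took data from {len(donors)} of {n} nodes "
              f"and owns {share:.3f} of the ring (fair share {1 / (n + 1):.3f})")
```

Because the total ring is fixed, the donations always add up to the new node's share; what T changes is how many nodes contribute and how close that share lands to 1/(N+1).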
For this reason T should be large. But if it is too large, there are too many slices to keep track of and performance suffers: the function that finds which keys live where becomes more expensive, and operations that deal with individual vnodes, e.g. repair, become slow. (An extreme example is SELECT * LIMIT 1, which, when there is no data, has to scan each vnode in turn in search of a single row. This is O(NT) and, even for quite small T, takes seconds to complete.)
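The cost side can be sketched too. Assuming the ring is kept as a sorted list of N·T tokens (a simplification; `TokenRing` and the helper below are hypothetical, not Cassandra's classes), finding the node for a key's token is a binary search, so it grows only logarithmically with NT. Anything that walks the vnodes one by one, such as the empty-table LIMIT 1 scan described above, touches all NT ranges.

```python
import bisect
import random

RING_SIZE = 2**64  # assumption: abstract ring of 2**64 token positions


class TokenRing:
    """Hypothetical minimal ring: N nodes with T random tokens each, kept sorted."""

    def __init__(self, num_nodes, tokens_per_node, seed=0):
        rng = random.Random(seed)
        pairs = sorted((rng.randrange(RING_SIZE), node)
                       for node in range(num_nodes)
                       for _ in range(tokens_per_node))
        self.tokens = [token for token, _ in pairs]
        self.owners = [node for _, node in pairs]

    def owner(self, key_token):
        """Node owning key_token: the first token >= key_token, wrapping to the
        first token on the ring. Binary search, O(log(N*T))."""
        i = bisect.bisect_left(self.tokens, key_token)
        return self.owners[i % len(self.tokens)]

    def ranges(self):
        """Yield every vnode range (start exclusive, end inclusive, owner) in ring order."""
        for i, end in enumerate(self.tokens):
            yield self.tokens[i - 1], end, self.owners[i]


def ranges_visited_by_empty_limit_1(ring):
    """Count the ranges a 'find one row' scan visits when the table is empty.

    Each range is assumed to be queried in turn; none yields a row, so the scan
    never stops early and visits all N*T ranges.
    """
    visited = 0
    for _start, _end, _owner in ring.ranges():
        visited += 1  # a real scan would ask a replica of this range for up to one row
    return visited


if __name__ == "__main__":
    for n, t in ((3, 256), (100, 256), (1000, 256)):
        ring = TokenRing(n, t)
        print(f"N={n:>4}, T={t}: {ranges_visited_by_empty_limit_1(ring)} ranges to scan; "
              f"owner lookup is a binary search over {len(ring.tokens)} tokens")
```

For example, 1000 nodes with T = 256 gives 256,000 ranges for such a scan, while a binary search over 256,000 tokens needs only about 18 comparisons.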
So 256 was chosen to be a reasonable balance. I don't think most users will find it too slow; users with extremely large clusters may need to increase it.