The problem is caused by TCP connection keep-alive: when clients first connect, their connections are established to the instances that exist at that moment, and those connections then persist to the same instances. So when the service scales out, existing clients won't reconnect, and therefore won't reach the new instances, unless their connection is broken. Only new clients will connect to both existing and new instances.
Here's another question for a very similar scenario. For testing purposes, you can simply disable keep-alive on the client to confirm that load is indeed distributed between instances once every request opens a new connection.
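To make the effect concrete, here is a minimal sketch in Python, assuming the clients speak HTTP (the question does not say which protocol or client library is in use). It starts a throwaway local server that counts accepted TCP connections, then sends the same number of requests with and without keep-alive. The names `make_server` and `connections_used` are made up for this illustration, and the local server stands in for whatever sits in front of your instances.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_server():
    """Start a local HTTP/1.1 server that counts accepted TCP connections."""
    stats = {"connections": 0}
    lock = threading.Lock()

    class Handler(BaseHTTPRequestHandler):
        # HTTP/1.1 makes the server keep connections open between requests.
        protocol_version = "HTTP/1.1"

        def setup(self):
            # setup() runs once per accepted TCP connection, not per request.
            super().setup()
            with lock:
                stats["connections"] += 1

        def do_GET(self):
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the demo output quiet

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, stats


def connections_used(requests, keep_alive):
    """Send `requests` GETs and return how many TCP connections the server saw."""
    server, stats = make_server()
    host, port = server.server_address
    conn = http.client.HTTPConnection(host, port)
    # "Connection: close" asks the server to drop the connection after replying.
    headers = {} if keep_alive else {"Connection": "close"}
    try:
        for _ in range(requests):
            conn.request("GET", "/", headers=headers)
            conn.getresponse().read()
            if not keep_alive:
                conn.close()  # the next request() opens a fresh connection
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return stats["connections"]


if __name__ == "__main__":
    print("keep-alive on: ", connections_used(5, keep_alive=True))
    print("keep-alive off:", connections_used(5, keep_alive=False))
```

With keep-alive on, all five requests should travel over a single connection, which is why a long-lived client stays pinned to the instance it first reached. With keep-alive off, each request opens its own connection, so a load balancer gets a fresh chance to pick an instance every time. That costs a new TCP handshake per request, so it is better suited to testing than to production.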