Aside from questions about your communications protocol, I see a fundamental problem: how will server B know about your client? In a query-response approach, the client contacts server A and waits for A to respond. Server A knows about the client because it received a request from it, and the client knows to wait for A's response because it made the contact in the first place. B knows nothing of the client, only about server A. While server A could, in theory, send information about the client to B, B would then have to be able to connect to the client in order to send information back.
A better design is to let server A handle communication in both directions and to use server B (and any others) to do the work. When your system is too big for a single server A to handle, you can introduce a load balancer to direct traffic to multiple server A's.
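To make that design concrete, here is a minimal sketch in Python. It is not based on your actual protocol: the one-line-per-request format, the names `FrontA`, `WorkerB`, `request`, `start` and `demo`, and the upper-casing "work" are all assumptions made up for illustration. The point is only that the client connects to A, A hands the job to B, and the reply goes back over the connection the client already opened.

```python
import socket
import socketserver
import threading


def request(addr, line):
    """Send one newline-terminated request and return the one-line reply."""
    with socket.create_connection(addr) as sock:
        sock.sendall(line.encode() + b"\n")
        with sock.makefile("rb") as reader:
            return reader.readline().decode().rstrip("\n")


class WorkerB(socketserver.StreamRequestHandler):
    """Server B: does the work. It only ever talks to server A."""

    def handle(self):
        job = self.rfile.readline().decode().rstrip("\n")
        # Placeholder for the real work (assumption: upper-case the input).
        self.wfile.write(job.upper().encode() + b"\n")


class FrontA(socketserver.StreamRequestHandler):
    """Server A: the only server the client ever connects to."""

    def handle(self):
        job = self.rfile.readline().decode().rstrip("\n")
        # Delegate the job to B over a connection that A opens.
        result = request(self.server.worker_addr, job)
        # Reply on the connection the client opened to A.
        self.wfile.write(result.encode() + b"\n")


def start(handler, **attrs):
    """Start a threaded TCP server on a free local port and return it."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), handler)
    server.daemon_threads = True
    for name, value in attrs.items():
        setattr(server, name, value)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo(message):
    """Run one client request through A, which delegates to B."""
    b = start(WorkerB)
    a = start(FrontA, worker_addr=b.server_address)
    try:
        # The client only knows A's address; it never learns about B.
        return request(a.server_address, message)
    finally:
        a.shutdown()
        a.server_close()
        b.shutdown()
        b.server_close()


if __name__ == "__main__":
    print(demo("hello"))
```

Note that B never needs the client's address and never opens a connection to it. Adding more workers, or putting a load balancer in front of several copies of A, changes nothing from the client's point of view.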