Question

I am working on a proof of concept implementation of NServiceBus v4.x for work.

Right now I have two subscribers and a single publisher.

The publisher can publish over 500 messages per second. It runs great.

Subscriber A runs without distributors/workers. It is a single process.

Subscriber B runs with a single distributor powering N number of workers.

In my test I hit an endpoint that creates and publishes 100,000 messages. I publish them while the subscribers are offline.

Subscriber A processes a steady 100 messages per second. Subscriber B with 2+ workers (same result with 2, 3, or 4) struggles to top 50 messages per second gross across all workers.

It seems in my scenario that the workers (which I ramped up to 40 threads per worker) are waiting around for the distributor to give them work.

Am I missing something that could be throttling the distributor? All buses are running an unlimited dev license.

System Information: Intel Core i5 M520 @ 2.40 GHz, 8 GB of RAM, SSD hard drive

UPDATE 08/06/2013: I finished deploying the system to a set of servers. I am experiencing the same results. Every server with a worker that I add decreases the performance of the subscriber.

Subscriber B has a distributor on one server and two additional servers for workers. With Subscriber B and one server with an active worker I am experiencing ~80 messages/events per second. Adding in another worker on an additional physical machine decreases that to ~50 messages per second. Also, these are "dummy messages". No logic actually happens in the handlers other than a log of the message through log4net. Turning off the logging doesn't increase performance.

Suggestions?


Solution

If you're scaling out with NServiceBus master/worker nodes on one server, then trying to measure performance is meaningless. One process with multiple threads will always do better than a distributor and multiple worker nodes on the same machine because the distributor will become a bottleneck while everything is competing for the same compute resources.

If the workers are moved to separate servers, it becomes a completely different story. The distributor is very efficient at doling out messages if that's the only thing happening on the server.

Give it a try with multiple servers and see what happens.

OTHER TIPS

Rather than using a dummy handler that does nothing, can you simulate actual processing by adding some sleep time, say 5 seconds? Then compare the results of a single subscriber with those of the same subscriber running behind the distributor.
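To see why handler time matters for this comparison, here is a rough back-of-the-envelope model in Python. It is only a sketch under stated assumptions: the function name `estimated_throughput` and the `distributor_cap` parameter are hypothetical, the cap of 80 messages per second is borrowed from the figure reported in the question, and the model assumes every worker thread is always busy. It does not use or describe any NServiceBus API.

```python
def estimated_throughput(workers, threads_per_worker, handler_seconds, distributor_cap):
    """Estimate messages per second for a toy distributor/worker model.

    Assumptions (hypothetical, not taken from NServiceBus):
    - every worker thread is always busy handling a message;
    - the distributor can hand out at most `distributor_cap` messages/second.
    """
    if handler_seconds <= 0:
        # A handler that does no work is limited only by the distributor.
        return float(distributor_cap)
    # Each thread completes 1 / handler_seconds messages per second.
    worker_capacity = workers * threads_per_worker / handler_seconds
    return min(float(distributor_cap), worker_capacity)


if __name__ == "__main__":
    # 40 threads per worker and a 5 second handler, as discussed in this thread.
    for workers in (1, 2, 4):
        rate = estimated_throughput(workers, 40, 5.0, 80)
        print(f"{workers} worker(s): {rate:.0f} msg/s")
```

In this model, a handler that does no work only ever measures the distributor's dispatch rate, whereas a 5 second handler makes the workers the limiting factor, so adding workers should raise throughput until the cap is reached.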

Scaling out (with or without a distributor) is only useful where the work done by a single machine takes time, so that more computing resources help. To judge this, monitor the CriticalTime performance counter on the endpoint and add the distributor when the need arises. Scaling out with the distributor is easy because it requires no code changes: you just start the same endpoint in the distributor and worker profiles.
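As an illustration of the kind of check the CriticalTime counter supports, here is a minimal Python sketch. It assumes critical time means the time from when a message is sent until it has been fully processed. The function names and the threshold are hypothetical, and the sketch does not read the actual performance counter.

```python
from datetime import datetime, timedelta


def critical_time(sent_at, processed_at):
    """Time from when a message was sent until its handler finished."""
    return processed_at - sent_at


def should_scale_out(samples, threshold):
    """Return True when the average critical time exceeds the threshold.

    `samples` is a list of timedelta values; `threshold` is a timedelta
    chosen by the operator (a hypothetical tuning knob).
    """
    if not samples:
        return False
    average = sum(samples, timedelta()) / len(samples)
    return average > threshold


if __name__ == "__main__":
    sent = datetime(2013, 8, 6, 12, 0, 0)
    samples = [
        critical_time(sent, sent + timedelta(seconds=2)),
        critical_time(sent, sent + timedelta(seconds=4)),
    ]
    print(should_scale_out(samples, timedelta(seconds=2.5)))
```

If the average stays low on a single endpoint, that machine is keeping up and adding a distributor is unlikely to help.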

The whole chain is transactional, and you pay heavily for that. Spreading the workload across machines will not really increase performance unless you have very fast disk storage with write-through caching to speed up transactional writes.

Once your PoC is scaled out to several servers, try marking a message as 'Express', which skips transactional writes to the queue, and disable MSDTC on the bus instance to see what kind of performance is possible without transactions. This is not really usable in production unless you know where transactions are not mandatory, or you have an architecture that does not require DTC.

Licensed under: CC-BY-SA with attribution