Question

We are trying to determine whether we are using Service Broker appropriately and getting the max performance out of it. We have been tweaking our SB conversations and handling, and have gone from 3000/minute to 8000/minute, but CPU has stayed constant at 100%. Additionally, on some days the SB queue stays empty, but on similar-traffic days the queue can back up by 500k.

The machine is a quad-socket, quad-core box (16 cores), no hyper-threading, with 32 GB RAM, 26 GB of it assigned to SQL Server, and AWE enabled.

SQL Server 2008 SP1 (no CUs), Enterprise Edition. Microsoft SQL Server 2008 (SP1) - 10.0.2531.0 (X64) Mar 29 2009 10:11:52 Copyright (c) 1988-2008 Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 6.1 (Build 7600: )

The messages are inserted into a Service Broker queue. Groups of messages are then pulled off the queue and run through a CLR routine, which parses the XML (not a simple parse, alas) and inserts into a table. The CLR code is considerably faster than our T-SQL code.

We have an average of 35 runnable tasks per scheduler.

We run nightly stats/index maintenance.

We have set server MAXDOP = 1 to try and help performance.

We've upped our number of tempdb files to 64 to avoid SGAM contention, which, combined with trace flag 1118, seems to have stopped the tempdb contention.

Looking at sys.dm_os_waiting_tasks, we typically have ~60 tasks waiting on THREADPOOL, with only a handful on other types.

Our signal waits are 70% (resource waits = 30%).
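For reference, a signal-versus-resource split like the one above can be derived from `sys.dm_os_wait_stats`. This is a minimal sketch and not necessarily the query used for the numbers quoted here. It does not filter out benign background wait types, which is an assumption that can skew the percentages.

```sql
-- Share of total wait time spent waiting for a CPU after the resource became
-- available (signal wait) versus waiting for the resource itself.
-- Values are cumulative since the last restart or since the stats were cleared.
SELECT
    CAST(100.0 * SUM(signal_wait_time_ms)
         / NULLIF(SUM(wait_time_ms), 0) AS decimal(5, 2)) AS signal_wait_pct,
    CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms)
         / NULLIF(SUM(wait_time_ms), 0) AS decimal(5, 2)) AS resource_wait_pct
FROM sys.dm_os_wait_stats;
```

Because the counters are cumulative, comparing two snapshots taken a few minutes apart usually gives a more representative picture of the current load than a single read.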

We've verified that the TokenAndUserPermCache stays under 20 MB.

Looking at sys.dm_os_latch_stats, we see 40-200k BUFFER latches in 1 minute, which are mostly on sysdesend and a user table we use to deal with Dialogs.

We also see high SOS_SCHEDULER_YIELD waits, which also indicate CPU pressure. But is that because of the CLR being freakishly busy, or because of Service Broker overhead? I'll happily provide code - let me know what I need to post here.

Thanks in advance.

Solution

  1. Do you use SSB only as a local queueing/processing mechanism, or is there any remote message delivery (x-machine transmission) involved?
  2. How many queues?
  3. Is activation involved? I assume yes; what is max_queue_readers set to?
  4. Is there anything you can associate with the 500k spikes? How long does it take for them to drain?
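Questions 2 and 3 can probably be answered straight from the catalog. The following is a rough sketch, assuming it is run in the database that hosts the queues; it lists each user queue with its activation settings and then counts the activated readers running at that moment.

```sql
-- User-defined queues and their activation settings
-- (run in the database that hosts the queues).
SELECT
    q.name,
    q.is_activation_enabled,
    q.activation_procedure,
    q.max_readers,          -- the MAX_QUEUE_READERS value
    q.is_receive_enabled,
    q.is_enqueue_enabled
FROM sys.service_queues AS q
WHERE q.is_ms_shipped = 0;

-- Activated procedures running right now, counted per queue (server-wide DMV).
SELECT
    t.database_id,
    t.queue_id,
    COUNT(*) AS running_readers
FROM sys.dm_broker_activated_tasks AS t
GROUP BY t.database_id, t.queue_id;
```

If `running_readers` sits at `max_readers` while the queue is backing up, the readers are saturated; if it stays well below, activation itself may not be keeping up.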

Some shots in the dark:

About 60 tasks waiting for workers on a 16-CPU machine is something I would normally consider OK, but for a machine dedicated to SSB processing it is a bit odd: such machines tend to run a few long-running tasks (the activated jobs) rather than many short-running ones, so they don't usually show THREADPOOL waits.
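To see where the workers are going, something along these lines might help. It is only a sketch built on the standard DMVs: it groups the currently waiting tasks by wait type, shows the load on each scheduler, and reports the configured worker limit.

```sql
-- Tasks waiting right now, grouped by wait type.
-- THREADPOOL means the task is waiting for a worker thread.
SELECT
    wait_type,
    COUNT(*)              AS waiting_tasks,
    MAX(wait_duration_ms) AS max_wait_ms
FROM sys.dm_os_waiting_tasks
GROUP BY wait_type
ORDER BY waiting_tasks DESC;

-- Load per scheduler: runnable tasks are ready to run but waiting for CPU time.
SELECT
    scheduler_id,
    current_tasks_count,
    runnable_tasks_count,
    current_workers_count,
    active_workers_count,
    work_queue_count
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';

-- Upper limit on worker threads for the instance.
SELECT max_workers_count
FROM sys.dm_os_sys_info;
```

If the sum of `current_workers_count` across schedulers is close to `max_workers_count`, the instance has run out of workers, which would be consistent with the THREADPOOL waits.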

Licensed under: CC-BY-SA with attribution