Question

I have an agent job set to run log backups every two hours from 2:00 AM to 11:59 PM (leaving a window for running a full or differential backup). A similar job is set up in every one of my 50 or so instances, and I may be adding several hundred instances over time (we host SQL Servers for some of our customers). They all back up to the same SAN disk volume, which is causing latency issues and otherwise impacting performance.

I'd like to offset the job run times on each instance by 5 minutes, so that instance one runs the job at 2:00, 4:00, etc., instance two at 2:05, 4:05, etc., instance three at 2:10, 4:10, etc. If I offset the start time for the job on each instance (2:00 for instance one, 2:05 for instance two, 2:10 for instance three, and so on), can I reasonably expect to get my desired result: that the instances won't all run the job at the same time?


Solution

If this is the same conversation we just had on Twitter: when you tell SQL Server Agent to run every n minutes or every n hours, the next run is based on the start time, not the finish time. So if you set a job on instance 1 to run at 2:00 and repeat every 2 hours, the second run will start at 4:00, whether the first run took 1 minute, 12 minutes, or 45 minutes.

There are some caveats:

  • there can be minor delays due to internal agent synchronization, but I've never seen this off by more than a few seconds
  • if the first run at 2:00 takes more than 2 hours (but less than 4 hours), the next time the job runs will be at 6:00 (the 4:00 run is skipped, it doesn't run at 4:10 or 4:20 to "catch up")
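You can sanity-check what Agent has computed for the next run by querying msdb directly (a minimal sketch; note that next_run_date/next_run_time are cached values refreshed by Agent, so they can briefly lag behind reality):

    -- See when Agent has scheduled the next run of each enabled job.
    -- next_run_date is YYYYMMDD and next_run_time is HHMMSS, both integers.
    SELECT  j.name,
            s.next_run_date,
            s.next_run_time
    FROM    msdb.dbo.sysjobs AS j
    JOIN    msdb.dbo.sysjobschedules AS s
        ON  s.job_id = j.job_id
    WHERE   j.enabled = 1
    ORDER BY j.name;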

There was another suggestion to add a WAITFOR to offset the start time (and we should discard a random WAITFOR, because that is probably not what you want: random <> unique, so two instances could still land on the same time). If you want to hard-code a different delay on each instance (1 minute, 2 minutes, etc.), it is much more straightforward to do that with the schedule itself than by adding steps to all of your jobs, IMHO.
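For example, you can bake a per-instance offset into the schedule's start time. This is just a sketch: the job and schedule names are hypothetical, @OffsetMinutes is whatever number you assign to that instance (0, 5, 10, ...), and the HHMMSS arithmetic only works for offsets under 60 minutes:

    -- Sketch: a daily "every 2 hours" schedule whose start time is offset
    -- per instance. Job and schedule names here are made up.
    DECLARE @OffsetMinutes int = 5;   -- instance two: 2:05, 4:05, ...
    -- @active_start_time is an HHMMSS integer, so 2:05:00 AM = 20500.
    -- (This simple arithmetic assumes an offset under 60 minutes.)
    DECLARE @StartTime int = 20000 + (@OffsetMinutes * 100);

    EXEC msdb.dbo.sp_add_jobschedule
        @job_name             = N'Log Backups',           -- hypothetical job name
        @name                 = N'Every 2 hours, offset',
        @freq_type            = 4,          -- daily
        @freq_interval        = 1,          -- every 1 day
        @freq_subday_type     = 8,          -- unit: hours
        @freq_subday_interval = 2,          -- every 2 hours
        @active_start_time    = @StartTime,
        @active_end_time      = 235959;     -- stop recurring at 11:59:59 PM

Each subsequent run is then computed from @active_start_time, which is exactly the start-time-based behavior described above.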

OTHER TIPS

Perhaps you could set up a centralized database that manages the "schedule" and have each job add/update a row when it runs. Each server's job then polls that table and starts only when it is clear to do so. That way, any latency in one job causes the others to wait, so you don't get a disparity in your timings when one of the servers is thrown off.

Being a little paranoid, I'd add a catch-all rule that says after "x" minutes of waiting, proceed anyway, so that a delay doesn't cascade far enough that the jobs never run.
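A minimal sketch of that idea, assuming a central table named dbo.BackupSlots that every instance can reach (all object names here are hypothetical, and a production version should use something like sp_getapplock for real concurrency safety instead of this naive polling):

    -- Hypothetical coordination table, created once on the central server:
    -- CREATE TABLE dbo.BackupSlots
    -- (
    --     InstanceName sysname   NOT NULL PRIMARY KEY,
    --     IsRunning    bit       NOT NULL DEFAULT (0),
    --     LastStart    datetime2 NULL,
    --     LastFinish   datetime2 NULL
    -- );

    DECLARE @MaxWaitMinutes int = 30,   -- catch-all: proceed anyway after this long
            @Waited         int = 0;

    -- Poll until no other instance is mid-backup, or the timeout expires.
    WHILE @Waited < @MaxWaitMinutes
      AND EXISTS (SELECT 1 FROM dbo.BackupSlots
                  WHERE IsRunning = 1 AND InstanceName <> @@SERVERNAME)
    BEGIN
        WAITFOR DELAY '00:01:00';       -- re-check once a minute
        SET @Waited += 1;
    END;

    -- Claim our slot, run the backup, then release the slot.
    UPDATE dbo.BackupSlots
       SET IsRunning = 1, LastStart = SYSDATETIME()
     WHERE InstanceName = @@SERVERNAME;

    -- ... BACKUP LOG commands run here ...

    UPDATE dbo.BackupSlots
       SET IsRunning = 0, LastFinish = SYSDATETIME()
     WHERE InstanceName = @@SERVERNAME;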
