How to handle failed Index Optimization (Ola Hallengren) step in SQL Azure Elastic Job Agent?

https://dba.stackexchange.com/questions/233673

26-01-2021
|

Question

I have created a SQL maintenance job using Azure Elastic Job Agent with the following step:

EXEC jobs.sp_add_jobstep @job_name = @jobName,
@step_name = 'Optimize indexes and statistics',
@command=N'
    EXECUTE dbo.IndexOptimize
    @Databases = ''USER_DATABASES'',
    @FragmentationLow = NULL,
    @FragmentationMedium = ''INDEX_REORGANIZE,INDEX_REBUILD_ONLINE'',
    @FragmentationHigh = ''INDEX_REBUILD_ONLINE'',
    @FragmentationLevel1 = 10,
    @FragmentationLevel2 = 30,
    @MinNumberOfPages = 10,
    @TimeLimit = 3600,
    @UpdateStatistics = ''ALL'',
    @OnlyModifiedStatistics = ''Y'',
    @SortInTempdb = ''Y'',
    @MaxDOP = 1,
    @LogToTable = ''Y''
',
@credential_name = @jobStepCredName,
@target_group_name= @targetGroupName,
@retry_attempts  = 0,
@step_timeout_seconds = 3600,
@max_parallelism = 1 -- IMPORTANT! We don't want to run index optimization on multiple databases at the same time

The code uses stored procedure dbo.IndexOptimize provided by Ola Hallengren.

The job is scheduled to run daily at 5 am and fails once or twice per week. The reason for failure is the internal issue to the Azure Elastic Job Agent: "Jobs service restarted while this task was in progress.". The service from Azure is still in preview so internal service errors are expected.

My current solution is to set @retry_attempts to some number greater than 0 so that Job Agent can retry the step, however I'm not sure if that is a good idea to retry a failed step for index optimization.

In particular, I'm not sure what would happen to the INDEX REBUILD, INDEX REORGANIZE, OR UPDATE STATISTICS processes if they would be cancelled or killed.

So to summarize I have the following questions:

Is this a good idea to retry index maintenance if step fails?
What happens when INDEX REBUILD, INDEX REORGANIZE, OR UPDATE STATISTICS processes fail or terminate.

I appreciate your feedback on this matter.

Solution

In your case you should add the @Resumable = 'Y' option to your job and either restart or just wait for the next schedule to run the job and finish, depending on time and concurrency issues, as running index rebuild while many users are using the database can be an issue.

Statistic and index rebuilds were transactional in earlier versions so if an alter index rebuild statement failed the results were rolled back and the old object used but in 2017 onward there is a new option WITH (RESUMABLE=ON) that allows you to resume the index rebuild later.

You should also look at the @WaitAtLowPriorityMaxDuration and @LockTimeout parameters in Ola's scripts as those can help in minimizing blocking.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange