Queries and updates extremely slow after IndexOptimize
-
07-02-2021 - |
Question
Database SQL Server 2017 Enterprise CU16 14.0.3076.1
We recently tried switching from the default Index Rebuild maintenance jobs to the Ola Hallengren IndexOptimize
. The default Index Rebuild jobs had been running for a couple of months without any issues, and the queries and updates were working with acceptable execution times.
After running IndexOptimize
on the database:
EXECUTE dbo.IndexOptimize
@Databases = 'USER_DATABASES',
@FragmentationLow = NULL,
@FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@FragmentationLevel1 = 5,
@FragmentationLevel2 = 30,
@UpdateStatistics = 'ALL',
@OnlyModifiedStatistics = 'Y'
performance was extremely degraded. An update statement that took 100ms before IndexOptimize
took 78.000ms afterwards (using an identical plan), and queries were also performing several orders of magnitude worse.
Since this still is a test database (we're migrating a production system from Oracle) we reverted to a backup and disabled IndexOptimize
and everything returned to normal.
However, we would like to understand what IndexOptimize
does differently from the "normal" Index Rebuild
that could have caused this extreme performance degradation in order to make sure we avoid it once we go to production. Any suggestions on what to look for would be greatly appreciated.
Execution plan for the update statement when it is slow. i.e.
After IndexOptimize
Actual execution plan (coming asap)
I haven't been able to spot a difference.
Plan for the same query when it is fast
Actual execution plan
Solution
I suspect you've got a different sample rate defined between your two maintenance approaches. I believe Ola's scripts use default sampling unless you specify the @StatisticsSample
parameter, which it doesn't look like you're currently doing.
At this point, this is speculation, but you can check to see what sampling rate is currently being used on your statistics by running the following query in your database:
SELECT OBJECT_SCHEMA_NAME(st.object_id) + '.' + OBJECT_NAME(st.object_id) AS TableName
, col.name AS ColumnName
, st.name AS StatsName
, sp.last_updated
, sp.rows_sampled
, sp.rows
, (1.0*sp.rows_sampled)/(1.0*sp.rows) AS sample_pct
FROM sys.stats st
INNER JOIN sys.stats_columns st_col
ON st.object_id = st_col.object_id
AND st.stats_id = st_col.stats_id
INNER JOIN sys.columns col
ON st_col.object_id = col.object_id
AND st_col.column_id = col.column_id
CROSS APPLY sys.dm_db_stats_properties (st.object_id, st.stats_id) sp
ORDER BY 1, 2
If you see this is coming through a 1s (e.g. 100%) chances are this is your issue. Maybe try Ola's scripts again including the @StatisticsSample
parameter with the percentage getting returned by this query and see if that fixes your problem?
As additional supporting evidence for this theory, the execution plan XML shows vastly different sampling rates for the slow query (2.18233 %):
<StatisticsInfo LastUpdate="2019-09-01T01:07:46.04" ModificationCount="0"
SamplingPercent="2.18233" Statistics="[INDX_UPP_4]" Table="[UPPDRAG]"
Schema="[SVALA]" Database="[ulek-sva]" />
Versus the fast query (100 %):
<StatisticsInfo LastUpdate="2019-08-25T23:01:05.52" ModificationCount="555"
SamplingPercent="100" Statistics="[INDX_UPP_4]" Table="[UPPDRAG]"
Schema="[SVALA]" Database="[ulek-sva]" />
OTHER TIPS
John's answer is the correct solution, this is just an addition as to what parts of the execution plan changed and en example on how to easily spot the differences with Sentry One Plan explorer
An update statement that took 100ms before IndexOptimize took 78.000ms afterwards (using an identical plan)
When looking at all the query plans when your performance was degraded, you can easily spot the differences.
Degraded performance
Two counts of over 35 seconds of cpu time & elapsed time
Expected performance
Much better
The main degradation is twice on this update query:
UPDATE SVALA.INGÅENDEANALYS
SET
UPPDRAGAVSLUTAT = @NEW$AVSLUTAT
WHERE INGÅENDEANALYS.ID IN
(
SELECT IA.ID
FROM
SVALA.INGÅENDEANALYS AS IA
JOIN SVALA.INGÅENDEANALYSX AS IAX
ON IAX.INGÅENDEANALYS = IA.ID
JOIN SVALA.ANALYSMATERIAL AS AM
ON AM.ID = IA.ANALYSMATERIALID
JOIN SVALA.ANALYSMATERIALX AS AMX
ON AMX.ANALYSMATERIAL = AM.ID
JOIN SVALA.INSÄNTMATERIAL AS IM
ON IM.ID = AM.INSÄNTMATERIALID
JOIN SVALA.INSÄNTMATERIALX AS IMX
ON IMX.INSÄNTMATERIAL = IM.ID
WHERE IM.UPPDRAGSID = SVALA.PKGSVALA$STRIPVERSION(@NEW$ID)
)
the execution plan for this query with degraded performance
The estimated query plan of this update query has very high estimates when performance was degraded:
While in reality (the actual execution plan) it still has to do work, just not the crazy amount that the estimates show.
The biggest impact on performance is the two scans & hash match joins below:
Actual scan on degraded performance #1
Actual scan on degraded performance #2
The execution plan for this query with expected performance
When you compare that to the estimates (or actuals) of the query plan with normal expected performance, the differences are easy to spot.
Also, the previous two table accesses did not even happen:
You don't see this elimination on the hash join because the build (top) input is inserted into the hash table first. Afterwards zero values are probed in this hash table, returning zero values.
Without more information we can only take lightly informed stabs in the dark, so you should edit the question to provide a little more. For instance the query plans for that update statement that you have given the timings for, both before and after the index maintenance operations as the plans may differ due to the index stats having been updated (https://www.brentozar.com/pastetheplan/ is useful for this, rather than filling the question with what might be a huge chunk of XML or giving a screen-grab which doesn't include some of the pertinent information the text of the plan contains).
Two very simple points off the bat though:
- Has the optimise run definitely completed? If your tests are competing with the IO of long running index rebuilds that will affect the timings.
- Did you test multiple times? If the update is based on data from a query that considers a lot of data (rather than a simple `UPDATE TheTable SET ThisColumn = 'A Static Value') then it may be that this data is normally in memory but has been flushed out in which the first runs of related queries will be slower than usual due to hitting disk rather than finding the needed pages already in the buffer pool in memory.