Question

I've just upgraded our data warehouse to SQL Server 2016, and I'm seeing some really interesting graphs in the Query Store (I love this feature!). Below is the weirdest example I've seen: 22 plans for the same query.

[Screenshot: Query Store report showing 22 execution plans for the same query]

It's making me consider performance tuning of my ETL process and the pros and cons of temporary tables and how you could influence execution plan behavior.

My ETL process uses a number of stored procedures which use a mix of standard and temporary #tables as staging tables. The #tables are typically used once and then dropped. Some are only a few thousand rows; some are millions. SSMS advises that there are missing indexes, but on smaller tables would they make enough of a difference to be worth the effort of adding them? Are better statistics sufficient?

I've just read Brent Ozar's blog post about statistics on temp tables, and Paul White's article on temporary tables in stored procedures.

They explain that statistics are created automatically when the #table is queried, and are then presumably used by the optimizer.

My questions are: Is there much point or benefit in creating an index on a #table? And is it worth explicitly updating statistics as a step in the stored procedure before the table is used in queries, given the tables are only used once?

Are the additional steps and overhead worth it? Would it result in significantly better or different execution plans?

Solution

There can be benefit in creating indexes on temporary tables, but maybe not for a staging table. It's an "it depends" answer, unfortunately. You will need to test. If you posted the code for how you are interacting with the staging table, we could help determine if any indexes would help. An example of where an index might help is if you were joining the temp table to another table. If you were to index the joined column, there could be performance gains, especially if there are a lot of rows in the temp table.
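As a rough sketch of the join scenario described above (all table and column names here are illustrative, not taken from the question):

```sql
-- Hypothetical staging pattern: load a temp table, then join it to a
-- permanent table on a single column.
CREATE TABLE #Staging
(
    CustomerID int   NOT NULL,
    OrderTotal money NOT NULL
);

INSERT INTO #Staging (CustomerID, OrderTotal)
SELECT CustomerID, OrderTotal
FROM dbo.SourceOrders;              -- assumed source table

-- Indexing the join column may pay off when #Staging holds many rows;
-- test against your own workload before committing to it.
CREATE NONCLUSTERED INDEX IX_Staging_CustomerID
    ON #Staging (CustomerID);

SELECT c.CustomerName, s.OrderTotal
FROM #Staging AS s
JOIN dbo.Customers AS c             -- assumed dimension table
    ON c.CustomerID = s.CustomerID;
```

Comparing the actual execution plans (and reads) with and without the index is the only reliable way to decide whether it earns its creation cost.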

You probably do not need to update statistics on the temporary tables. It's also an "it depends" answer, though I've never seen an update stats on temp tables in any of the thousands upon thousands of stored procedures I've looked at, nor have I needed to add it to resolve a performance issue.
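If you do want to test an explicit statistics update anyway, it is a single statement inside the procedure (the table name below is illustrative):

```sql
-- Refresh all statistics on the temp table before the heavy queries run.
UPDATE STATISTICS #Staging;

-- Or, at higher cost, sample every row rather than a subset:
UPDATE STATISTICS #Staging WITH FULLSCAN;
```

Measure whether the plans actually change; if they do not, the extra step is pure overhead.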

OTHER TIPS

Statistics alone are not sufficient. The storage engine has to have some way of getting to the rows that match the query predicate. There is no value in knowing that, say, three rows match the condition out of one million in the table if it cannot determine which three they are. Without an index the only strategy is a table scan. One million rows will be read. 99.9997% will be discarded. With a matching index the pointers can be followed to pick out just the three rows required.

With small tables that need only a few pages, one must take into account the effort of reading the index pages. Say a non-clustered index that matches the query exactly needs just two levels: that's two page reads to follow the keys. Then the clustered index is followed, which could well be two more page reads. So if the entire table occupies fewer than four pages, that non-clustered index is unlikely to be used.
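The seek-versus-scan trade-off above can be sketched as follows (table, column, and index names are hypothetical):

```sql
-- With a million rows and a predicate matching only a handful of them,
-- a matching non-clustered index lets the engine seek straight to those
-- rows instead of scanning the whole table.
CREATE NONCLUSTERED INDEX IX_BigTable_StatusCode
    ON dbo.BigTable (StatusCode);       -- assumed table/column

SELECT StatusCode, CreatedDate
FROM dbo.BigTable
WHERE StatusCode = 'ERR';
-- Seek follows the index keys to the matching rows, then follows the
-- clustered index (a key lookup) to fetch CreatedDate.
```

On a table of only a few pages, the optimizer will usually prefer a scan of the whole table over this seek-plus-lookup path, for exactly the page-read arithmetic described above.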

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange