Question

A legacy application has a nightly job that repeatedly calls a stored procedure using a TVP, passing in batches of 10,000 sequential ids that it needs to process. Now that the ids are in the millions, this process is taking noticeably longer. Roughly the same number of batch calls are being run each night, but profiling showed that the procedure itself was getting slower. We checked the usual culprits: rebuilt the indexes and updated stats on the tables in use, and tried sticking a recompile on the procedure. But nothing fixed the regression.

The procedure does some processing and returns a few result sets, each with a cardinality of maybe 10,000 rows. One of my colleagues looked at it and fixed the performance regression simply by adding the following to the top of the stored procedure:

select id into #t from @ids

and replacing all usages of @ids with #t.

I was amazed at this simple fix and wanted to understand it better, so I tried to create a very simple reproduction.

create table dbo.ids
(
   id int primary key clustered,
   timestamp
);

create type dbo.tvp as table(id int primary key clustered)

insert into dbo.ids(id)
select row_number() over (order by 1/0)
from string_split(space(1414),' ') a,string_split(space(1414),' ') b
go
create or alter procedure dbo.tvp_proc
(
    @ids dbo.tvp readonly
)
as
begin
    declare @_ int = 0, @r int = 5;
    while(@r > 0)
        select @_ = count(*), @r -= 1
        from dbo.ids i
        where exists (
            select 1
            from @ids t
            where t.id = i.id     
        );
end 
go
create or alter procedure dbo.temp_proc
(
    @ids dbo.tvp readonly
)
as
begin
    select * into #t from @ids
    declare @_ int = 0, @r int = 5;
    while(@r > 0)
        select @_ = count(*), @r -= 1
        from dbo.ids i
        where exists (
            select 1
            from #t t
            where t.id = i.id     
        );
end

And here is my simple benchmark.

set nocount on;
declare @s nvarchar(4000)=
'declare @ids tvp;
insert into @ids(id)
select @init + row_number() over (order by 1/0)
from string_split(space(99),char(32)) a,string_split(space(99),char(32)) b
declare @s datetime2 = sysutcdatetime()
create table #d(_ int)
insert into #d
exec dbo.tvp_proc @ids
print concat(right(concat(space(10),format(@init,''N0'')),10),char(9),datediff(ms, @s, sysutcdatetime()))',
@params nvarchar(20)=N'@init int'
print 'tvp result'
exec sp_executesql @s,@params,10000000
exec sp_executesql @s,@params,1000000
exec sp_executesql @s,@params,100000
exec sp_executesql @s,@params,10000
select @s=replace(@s,'tvp_proc','temp_proc')
print 'temp table result'
exec sp_executesql @s,@params,10000000
exec sp_executesql @s,@params,1000000
exec sp_executesql @s,@params,100000
exec sp_executesql @s,@params,10000

Running this benchmark on my machine yields the following results (elapsed time in milliseconds per call):

tvp result
10,000,000  653
 1,000,000  341
   100,000  42
    10,000  12
temp table result
10,000,000  52
 1,000,000  60
   100,000  57
    10,000  59

The results show that the TVP approach seems to slow down as the id values inside it get bigger, whereas the temp table stays pretty consistent. Does anyone have an idea as to why referencing a TVP with larger values is slower than a temp table?

Solution

Table Variables, even when used as a parameter (TVP), are given very poor cardinality estimates, as opposed to Temp Tables, which are estimated much more accurately. The difference is especially noticeable as the amount of data in the TVP grows. If you look closely at the Estimated Number of Rows vs the Actual Number of Rows in the execution plans of each implementation, you should see the Temp Table is estimated a lot more accurately.
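
If you want to see this in your own repro, something like the sketch below should work (the procedure names come from your question; SET STATISTICS XML ON returns the actual execution plan for each statement, which includes both the estimated and actual row counts per operator):

-- Populate a TVP the same way the benchmark does, then run both procedures
-- with actual-plan capture enabled. Each procedure runs its query five times,
-- so you'll get one plan per iteration; compare EstimateRows vs ActualRows
-- on the scan of @ids (tvp_proc) versus the scan of #t (temp_proc).
declare @ids dbo.tvp;
insert into @ids(id)
select 10000000 + row_number() over (order by 1/0)
from string_split(space(99),' ') a, string_split(space(99),' ') b;

set statistics xml on;
exec dbo.tvp_proc  @ids;
exec dbo.temp_proc @ids;
set statistics xml off;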

You can read more on TVPs and their downsides in this Jeremiah Peschka post, specifically the Gotchas section:

First: the table variable that comes in as a table valued parameter cannot be changed. You’re stuck with whatever values show up. No inserts, updates, or deletes can be applied.

Second: table valued parameters are still table variables – they get terrible cardinality estimates.

We can get around both of these problems with the same technique – copy the contents of the TVP into a temp table.
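
As a quick illustration of the first gotcha (the procedure name below is made up for the example): a TVP parameter must be declared READONLY, so any attempt to modify it inside the procedure fails at creation time, while the temp table copy can be modified freely.

create or alter procedure dbo.readonly_demo
(
    @ids dbo.tvp readonly
)
as
begin
    -- This would fail: a table-valued parameter is READONLY and cannot be modified.
    -- delete from @ids where id = 1;

    -- Copying it into a temp table lifts the restriction.
    select id into #t from @ids;
    delete from #t where id = 1;
end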

Additionally, TVPs used in procedures can result in parameter sniffing issues, as this other post details. This quote adds some specifics about the cardinality estimates you'll encounter with TVPs, regardless of how big the actual Table Variable is:

Table Variables (unless you Recompile, or use a Trace Flag), will sport either a 1 or 100 row estimate, depending on which version of the Cardinality Estimator you use. The old version guesses 1 row, the new guesses 100 rows.

This article by Pinal Dave is another good read on the cardinality estimate issues that result from Table Variables.
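
For completeness, the statement-level recompile that quote mentions would look roughly like this when applied to the repro's tvp_proc. It lets the optimizer see the Table Variable's actual row count at each execution (at the cost of compiling the statement every time), though it still won't give it the column statistics a Temp Table has; note this is different from the procedure-level recompile the question says was already tried.

create or alter procedure dbo.tvp_recompile_proc
(
    @ids dbo.tvp readonly
)
as
begin
    declare @_ int = 0, @r int = 5;
    while(@r > 0)
        select @_ = count(*), @r -= 1
        from dbo.ids i
        where exists (
            select 1
            from @ids t
            where t.id = i.id
        )
        option (recompile); -- statement-level recompile: actual TVP row count is visible
end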


One key problem with a bad cardinality estimate, an under-estimate in this case, is that it causes the SQL Engine to under-provision the server resources needed to process the query and serve the data. For example, your query is likely being granted much less memory than it needs, because the low cardinality estimate makes the SQL Engine think far fewer rows will be returned than actually are. The larger the Table Variable gets, the larger the discrepancy between the estimate and the actual.
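
One rough way to observe this is the sys.dm_exec_query_memory_grants DMV, as in the sketch below (run it from another session while the nightly job or the benchmark is executing). A large gap between granted and ideal memory is a common symptom of the misestimate described above; whether the small repro queries request a grant at all depends on the plan shapes, so treat this as a diagnostic sketch rather than a guaranteed repro.

-- Shows queries that are currently requesting or holding a memory grant.
select session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       ideal_memory_kb,
       query_cost
from sys.dm_exec_query_memory_grants
order by session_id;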


You should pretty much always opt for Temp Tables when possible, precisely because they have many more performance benefits than a Table Variable and can do almost everything a Table Variable can do and more. In the cases where you do need to use a Table Variable, such as a TVP parameter, selecting it into a Temp Table first and using that Temp Table in your subsequent querying is the way to go.
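
Put together, the general pattern looks like the sketch below; the procedure name dbo.process_batch and its body are placeholders standing in for the legacy procedure, which isn't shown in the question.

create or alter procedure dbo.process_batch
(
    @ids dbo.tvp readonly
)
as
begin
    -- Copy the TVP into a temp table once, up front. The temp table gets
    -- real column statistics, so everything below compiles against far
    -- better cardinality estimates.
    select id into #ids from @ids;

    -- ... the rest of the procedure references #ids instead of @ids ...
    select count(*)
    from dbo.ids i
    where exists (select 1 from #ids t where t.id = i.id);
end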
