Proper use of varbinary type in MS SQL Server

https://dba.stackexchange.com/questions/186863

09-10-2020
|

سؤال

I am at a new position and I am being told to implement a stored procedure that will accept a list of user ids and update a flag.

When I suggested the use of a table value parameter (array emulation), I was told to implement as demonstrated below, as this is how it's been done before. I'm not a DBA but I am a full stack developer and this smells funny to me.

FYI- the user ids are implemented as type int

CREATE PROCEDURE [dbo].[UpdateUsers]       
 @rgIDs varbinary(max)      -- contains several ids
AS      
 DECLARE @tblTmp TABLE (ID int PRIMARY KEY)      

 DECLARE @ich int, @cch int, @ID int      
 SET @cch = DATALENGTH (@rgIDs)      
 SET @ich = 1   

 WHILE (@ich < @cch)      
 BEGIN      
  SET @ID = SUBSTRING (@rgIDs, @ich, 4) 

  UPDATE dbo.Users u 
  SET isUpdated = 1
  WHERE u.ID = @ID

  SET @ich = @ich + 4      
 END

Doesn't a table valued type work better here? More readable, performance, less error prone, etc...?

I will be using SQL Server 2012 or 2014, I believe. Definitely > 2008.

المحلول

I was told to implement as demonstrated below as this is how its been done before. ... this smells funny to me.

You know what smells funnier than this approach? That line of "reasoning". The old "this is how it's always been done" is simply a means of avoiding thinking about it and discussing it. And that alone should raise a red flag, even if the code is found to be the best approach. I've been told the same thing before and it was for the same sad underlying reason: that approach had been taken for an obsolete reason (such as an older version of SQL Server didn't have a particular feature, but we were using a newer version that did have that feature). Very frustrating.

Doesn't a table valued type work better here? More readable, performance, less error prone, etc...

Assuming that you are using SQL Server version 2008 or newer, then the answer is generally YES to all of that. I mean, the current code is a somewhat clever-ish approach to avoiding a string split operation. But, a TVP can be strongly-typed so no munging the int values in the app layer into a byte[] just so that it can be unpacked in a WHILE loop here. A single set-based UPDATE statement will be much better for performance than N number of UPDATEs based on how many IDs are in that list (the current code creates, and commits) 1 transaction per each UPDATE statement since the WHILE loop is not wrapped in an explicit transaction. Hence, 20 IDs to update = 20 transactions in the current approach. This is part of why the single, set-based UPDATE is more efficient.

Just be sure to implement the full-streaming approach which means: do not dump the values from a collection into a DataTable! Doing so, which is unfortunately what you find in most examples, is a needless waste of time, memory, and CPU. Instead, create a method that implements IEnumerable, accepts the collection of UserIDs, and returns IEnumerable<SqlDataRecord>. Then, use that method as the "value" for the SqlParameter that is the TVP. In that method, cycle through the collection, and per each element, call yield return;.

I have some example code on StackOverflow in my answer to the following question:

Pass Dictionary to Stored Procedure T-SQL

Please note that I have heard, though not yet tested, that on the more recent versions of SQL Server (2014 or newer) that there might be additional performance gains by using in-memory OLTP for the User-Defined Table Type.

Also, since you mentioned this as part of a desire for "more readable code", I would suggest using meaningful parameter and variable names :-). And yes, I do realize that this is probably not your code, but just thought I would mention it. Also, I would get rid of DECLARE @tblTmp as it isn't be used, but again, I realize that this might be a part of a larger example that you redacted.

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى dba.stackexchange