Question

I have an SP in SQL Server which runs hundreds of times a minute and needs to check incoming traffic against a database. At the moment it does the following:

INSERT INTO table
SELECT @value1,@value2 WHERE NOT EXISTS 
(SELECT * FROM table WHERE value1 = @value1 AND value2 = @value2);

However, I could also go with

IF NOT EXISTS(SELECT * FROM table WHERE value1 = @value1 AND value2 = @value2)    
   INSERT INTO table (value1,value2) VALUES (@value1,@value2);

Which would be faster? I get the feeling there's not much difference between them but I'm historically not very good at TSQL... =/

UPDATE: Whoops... meant to state that the EXISTS uses more than 1 value to find if a record exists, so a unique constraint won't work. Edited the sample to reflect that...


Solution

After adding a gazillion comments on this question and its answers, I will have my own go at answering it.

I would not expect any major difference in performance between the two approaches proposed in the original question. On one hand, as pointed out by Ray, the second approach might save you from doing some preparation for the insert; on the other hand, an RDBMS usually performs best with batch statements, as in the first solution.

KM and DVK suggest adding a UNIQUE constraint, which will make the uniqueness test implicit but will require you to add some kind of error handling around your INSERT statement. I have a hard time seeing why this should yield any additional performance, assuming that you already have an index covering the two columns. If you do not have such an index, add it, and then reconsider your need for more performance.

Whether the uniqueness check is performed explicitly or implicitly should not matter, AFAIK. If anything is gained by having the check done "inside" the stomach of the DBMS, that gain might just be eaten up by the overhead of raising and handling errors when duplicates exist.
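(As an editorial aside, not from the original answers: on SQL Server 2008 and later, a single-statement way to keep the check "inside" the engine without error handling is MERGE with a HOLDLOCK hint. A sketch, assuming the same two-column table from the question:)

```sql
-- HOLDLOCK gives the check-then-insert serializable semantics,
-- so concurrent callers cannot both see "not matched" for the same pair.
MERGE [Table] WITH (HOLDLOCK) AS t
USING (SELECT @value1 AS value1, @value2 AS value2) AS s
    ON t.value1 = s.value1 AND t.value2 = s.value2
WHEN NOT MATCHED THEN
    INSERT (value1, value2) VALUES (s.value1, s.value2);
```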


The bottom line: assuming an index is already in place, if you still find yourself lusting for performance, my recommendation is that you run empirical tests on the three suggested solutions. Cook up a small program that simulates the expected input data, and blast each of the three solutions with a few billion rows, including a plausible share of duplicates. If you do this, be sure to post your results :-)
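(Editorial note: a minimal T-SQL timing harness for such a test might look like the sketch below. The procedure name `dbo.InsertIfNotExists` is hypothetical; swap in whichever of the three variants you are measuring.)

```sql
DECLARE @start DATETIME2 = SYSDATETIME();
DECLARE @i INT = 0, @v1 INT, @v2 INT;

WHILE @i < 100000
BEGIN
    -- Modular arithmetic deliberately produces duplicate pairs,
    -- simulating repeated incoming traffic.
    SET @v1 = @i % 1000;
    SET @v2 = @i % 50;
    EXEC dbo.InsertIfNotExists @value1 = @v1, @value2 = @v2;
    SET @i = @i + 1;
END

SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;
```

Run the same harness against each variant on the same hardware and compare `elapsed_ms`.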

OTHER TIPS

Both variants are incorrect. You will insert pairs of duplicate @value1, @value2, guaranteed.

The correct way to handle this is to enforce a unique constraint on the two columns and to always INSERT, handling any constraint violation:

ALTER TABLE Table ADD CONSTRAINT uniqueValue1Value2 UNIQUE (value1, value2);

and to insert:

BEGIN TRY
   INSERT INTO Table (value1, value2) VALUES (@value1, @value2);
END TRY
BEGIN CATCH
   DECLARE @error_number INT, @error_message NVARCHAR(4000), @xact_state INT;
   SET @error_number = ERROR_NUMBER();
   SET @error_message = ERROR_MESSAGE();
   SET @xact_state = XACT_STATE();
   IF (@xact_state = -1)
   BEGIN
      ROLLBACK TRANSACTION;
   END
   IF (@error_number != 2627) /* 2627 is 'Cannot insert duplicate key in object ...' */
   BEGIN
      RAISERROR(N'Error inserting into Table: %i %s', 16, 1, @error_number, @error_message);
   END
END CATCH

While this may seem complicated, one has to factor in a little detail named correctness. It is by far simpler than a solution based on lock hints, and it is also the most performant solution: it does only one seek. All the other solutions need at least two seeks (one to check whether the row can be inserted, one to insert it).

In a highly concurrent environment, a concurrent INSERT can happen between the IF NOT EXISTS and the INSERT in your second query.

Your first query will place shared locks on the records it examines, which will not be lifted until the end of the query, so it will be impossible to insert a new record while the query is running.

However, you should not rely solely on this behavior. Place an additional UNIQUE constraint on the value.

It will not only make the database more consistent, but will also create an index that makes the first query faster.

If you want the values to be unique, why not just create a unique constraint on the value, do an INSERT without the SELECT, and gracefully handle the constraint violation error?

That'd be faster than either of these approaches.

Also, your first approach doesn't work - by the time you get to select, you already inserted the value so select will obviously find what you just inserted.

Just do it, and ignore any error (assumes a unique constraint on value)...

BEGIN TRY
    INSERT INTO Table (value) VALUES (@value);
END TRY
BEGIN CATCH
    PRINT 'it was already in there!'
END CATCH

Since this runs hundreds of times a minute, locking hints should be added to the SELECT, and the check and insert wrapped in a transaction, to avoid a race condition:

(SELECT * FROM Table WITH (UPDLOCK, HOLDLOCK)  WHERE value = @value);
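(Editorial note: put together, the hinted check-then-insert would look something like this sketch, using the single-column form of this answer. UPDLOCK plus HOLDLOCK takes a key-range lock on the checked value, so a second caller blocks until the first commits rather than racing past the existence check.)

```sql
BEGIN TRANSACTION;

IF NOT EXISTS (SELECT * FROM [Table] WITH (UPDLOCK, HOLDLOCK)
               WHERE value = @value)
BEGIN
    INSERT INTO [Table] (value) VALUES (@value);
END

COMMIT TRANSACTION;
```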

However, my proposed idea of just INSERTing and ignoring any duplicate-constraint error would avoid the race condition as well.

If I had to guess, I would guess the second option would be faster. SQL Server would not have to do any kind of setup for the INSERT if the EXISTS check fails, whereas with the first one it might look up some table and field names and prepare for an insert that never happens. However, I would try it in the Query Analyzer and see what the plan says.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow