SQL Arithmetic Overflow for Identity Insert

https://stackoverflow.com/questions/16156214

11-04-2022
|

Question

I have an error with arithmetic overflow inserting values into a lookup table with a row-id set as TINYINT datatype. This IS NOT a case where the number of unique records exceeds 255 values. This is a bit more unusual and did not occur during the first tests of this setup.

The production version of the code below actually only has 66 unique values, but it is possible that new values could be added (slowly and in very small numbers) over time... 255 available slots should be more than enough for the lifespan of this analysis process.

My initial thoughts were that it may be due to a cached plan recognizing the hierarchical source table has more than 255 values (there are in fact 1028), and evaluating that this may exceed the destination table's capacity. I have tested that this is not true however.

-- This table represents a small (tinyint) subset of unique primary values.
CREATE TABLE #tmp_ID10T_Test (
ID10T_Test_ID tinyint identity (1,1) not null,
ID10T_String varchar(255) not null
PRIMARY KEY CLUSTERED
(ID10T_String ASC)
WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = ON, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
) ON [PRIMARY]



-- This table represents a larger (smallint) set of non-unique source values, defined by a secondary key value (Value_Set).
CREATE TABLE #tmp_ID10T_Values (
ID10T_Value_ID smallint identity (1,1) not null,
ID10T_Value_Set tinyint not null,
ID10T_String varchar(255) not null
) ON [PRIMARY]



-- Create the initial dataset - 100 unique records; The insertion tests below illustrate that the INDEX is working
--  correctly on the primary key field for repetative values, however something is happening with the IDENTITY field...
DECLARE @ID10T tinyint
, @i tinyint -- A randomized value to determine which subset of printable ASCII characters will be used for the string.
, @String varchar(255)

SET @ID10T = 0
WHILE @ID10T < 100
    BEGIN

    SET @String = ''
        WHILE LEN(@String) < (1+ROUND((254 * RAND(CHECKSUM(NEWID()))),0))
        BEGIN
            SELECT @i = (1 + ROUND((2 * RAND()),0)) -- Randomize which printable character subset is drawn from.
            SELECT @String = @String + ISNULL(CASE WHEN @i = 1 THEN char(48 + ROUND(((57-48)* RAND(CHECKSUM(NEWID()))),0))
            WHEN @i = 2 THEN char(65 + ROUND(((90-65) * RAND(CHECKSUM(NEWID()))),0))
            WHEN @i = 3 THEN char(97 + ROUND(((122-97) * RAND(CHECKSUM(NEWID()))),0))
            END,'-')
        END

    INSERT INTO #tmp_ID10T_Values (ID10T_Value_Set, ID10T_String)
    SELECT 1, @String

    SET @ID10T = @ID10T + 1

    END

-- Demonstrate that IGNORE_DUP_KEY = ON works for primary key index on string-field
 SELECT * FROM #tmp_ID10T_Values


-- Method 1 - Simple INSERT INTO: Expect Approx. (100 row(s) affected)
INSERT INTO #tmp_ID10T_Test (ID10T_String)
SELECT DISTINCT ID10T_String
FROM #tmp_ID10T_Values
GO


-- Method 2 - LEFT OUTER JOIN WHERE NULL to prevent dupes.
--  this is the test case to determine whether the procedure cache is mixing plans
INSERT INTO #tmp_ID10T_Test (ID10T_String)
SELECT DISTINCT T1.ID10T_String
FROM #tmp_ID10T_Values AS T1
LEFT OUTER JOIN #tmp_ID10T_Test AS t2
ON T1.ID10T_String = T2.ID10T_String
WHERE T2.ID10T_Test_ID IS NULL
GO


-- Repeat Method 1: Duplicate key was ignored (0 row(s) affected).
INSERT INTO #tmp_ID10T_Test (ID10T_String)
SELECT DISTINCT ID10T_String
FROM #tmp_ID10T_Values
GO

This does not seem to be a query plan cache issue - I should see the arithmetic error on Method 1 retests if that were true.

-- Repeat Method 1: Expected: Arithmetic overflow error converting IDENTITY to data type tinyint.
INSERT INTO #tmp_ID10T_Test (ID10T_String)
SELECT DISTINCT ID10T_String
FROM #tmp_ID10T_Values
GO

I am particularly curious why the exception would be thrown. I can understand that in Method 1 all 100 unique values are tested... So conceivably the query agent sees a potential of 200 records after the second insert attempt; I DO NOT understand why it would see a potential for 300 records after a third repetition - the second attempt resulted in 0 rows so at most there would be a potential of 200 unique values.

Can someone explain this please?

Solution

The sequence of events using IGNORE_DUP_KEY is:

record is prepared for insert, including consumption of IDENTITY values. This makes the IDENTITY sequence go up to 200
record is inserted. IGNORE_DUP_KEY is observed and the insertions are silently failed

Now your failed batch of INSERT actually generates the IDENTITY values 201-300, for which anything above 255 overflows your tinyint column.

This is trivially verified by running

select ident_current('tmp_ID10T_Test')

liberally throughout your code. So for your code, annotated below:

-- Method 1 - Simple INSERT INTO: Expect Approx. (100 row(s) affected)
   -- ident_current = 1

-- Method 2 - LEFT OUTER JOIN WHERE NULL to prevent dupes.
   -- ident_current = 100
   -- NOTE: The SELECT doesn't produce any rows. No rows to insert

-- Repeat Method 1: Duplicate key was ignored (0 row(s) affected).
   -- ident_current = 200
   -- NOTE: SELECT produced 100 rows, which consumed IDENTITY numbers. Later ignored

-- Repeat Method 1: Expected: Arithmetic overflow error converting IDENTITY to data type tinyint.
   -- ident_current = 255
   -- error raised when IDENTITY reaches 256

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow