Question

We have a stored procedure which purges old data. On a regular basis, it is getting deadlocked with an application stored procedure. Below is the deadlock graph:

Deadlock Graph

Deadlock XML:

<deadlock>   <victim-list>    <victimProcess id="process1a780008c8"/>   </victim-list>   <process-list>    <process id="process1a780008c8" taskpriority="0" logused="1604" waitresource="KEY: 8:72065847945068544 (862c7d05dafc)" waittime="1477" ownerId="4649107786" transactionname="batch_archive" lasttranstarted="2020-12-18T22:05:16.377" XDES="0x27495e1c40" lockMode="U" schedulerid="6" kpid="7560" status="suspended" spid="1049" sbid="0" ecid="4" priority="0" trancount="0" lastbatchstarted="2020-12-18T22:05:01.070" lastbatchcompleted="2020-12-18T22:05:01.070" lastattention="1900-01-01T00:00:00.070" clientapp="SQLAgent - TSQL JobStep (Job 0xAC6E654BCA73184DA4769BFB133908A1 : Step 2)" hostname="SERVER_NAME" hostpid="4024" isolationlevel="read committed (2)" xactid="4649107786" currentdb="6" currentdbname="DB_NAME" lockTimeout="4294967295" clientoption1="673316896" clientoption2="128056">     <executionStack>      <frame procname="DB_NAME.dbo.PURGE_SP_NAME" line="674" stmtstart="45158" stmtend="45440" sqlhandle="0x03000600119b6e78797169003dab000001000000000000000000000000000000000000000000000000000000">  delete from DBNAME..TABLE_NAME with(rowlock) where email_id in      (select email_id from ARCHIVAL_BATCH    </frame>      <frame procname="adhoc" line="6" stmtstart="166" stmtend="566" sqlhandle="0x010006002208560560621cea2000000000000000000000000000000000000000000000000000000000000000">  EXEC @return_value = [dbo].[PURGE_SP_NAME]    @arch_mon_crit = 93,    @batch_size = 3,    @po_sql_code = @po_sql_code OUTPUT,    @po_sql_msg = @po_sql_msg OUTPU    </frame>     </executionStack>     <inputbuf>    DECLARE @return_value int,    @po_sql_code int,    @po_sql_msg nvarchar(2000)    EXEC @return_value = [dbo].[PURGE_SP_NAME]    @arch_mon_crit = 93,    @batch_size = 3,    @po_sql_code = @po_sql_code OUTPUT,    @po_sql_msg = @po_sql_msg OUTPUT    SELECT @po_sql_code as N&apos;@po_sql_code&apos;,    @po_sql_msg as N&apos;@po_sql_msg&apos;    SELECT &apos;Return Value&apos; = @return_value       </inputbuf>    </process>    <process id="process1a93ff7c28" taskpriority="0" logused="10000" waittime="1457" schedulerid="1" kpid="5312" status="suspended" spid="1049" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2020-12-18T22:05:01.070" lastbatchcompleted="2020-12-18T22:05:01.070" lastattention="1900-01-01T00:00:00.070" clientapp="SQLAgent - TSQL JobStep (Job 0xAC6E654BCA73184DA4769BFB133908A1 : Step 2)" hostname="SERVER_NAME" hostpid="4024" loginname="DOMAIN\SQL_SRVC_ACCOUNT" isolationlevel="read committed (2)" xactid="4649107786" currentdb="6" currentdbname="DB_NAME" lockTimeout="4294967295" clientoption1="673316896" clientoption2="128056">     <executionStack>      <frame procname="DB_NAME.dbo.PURGE_SP_NAME" line="674" stmtstart="45158" stmtend="45440" sqlhandle="0x03000600119b6e78797169003dab000001000000000000000000000000000000000000000000000000000000">  delete from DBNAME..TABLE_NAME with(rowlock) where email_id in      (select email_id from ARCHIVAL_BATCH    </frame>      <frame procname="adhoc" line="6" stmtstart="166" stmtend="566" sqlhandle="0x010006002208560560621cea2000000000000000000000000000000000000000000000000000000000000000">  EXEC @return_value = [dbo].[PURGE_SP_NAME]    @arch_mon_crit = 93,    @batch_size = 3,    @po_sql_code = @po_sql_code OUTPUT,    @po_sql_msg = @po_sql_msg OUTPU    </frame>     </executionStack>     <inputbuf>    DECLARE @return_value int,    @po_sql_code int,    @po_sql_msg nvarchar(2000)    EXEC @return_value = [dbo].[PURGE_SP_NAME]    @arch_mon_crit = 93,    @batch_size = 3,    @po_sql_code = @po_sql_code OUTPUT,    @po_sql_msg = @po_sql_msg OUTPUT    SELECT @po_sql_code as N&apos;@po_sql_code&apos;,    @po_sql_msg as N&apos;@po_sql_msg&apos;    SELECT &apos;Return Value&apos; = @return_value       </inputbuf>    </process>    <process id="process1a78191468" taskpriority="0" logused="21516" waitresource="OBJECT: 8:1761441349:4 " waittime="3632" ownerId="4649111643" transactionname="user_transaction" lasttranstarted="2020-12-18T22:05:19.960" XDES="0x208e1d4400" lockMode="IX" schedulerid="5" kpid="7936" status="suspended" spid="332" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2020-12-18T22:05:19.963" lastbatchcompleted="2020-12-18T22:05:19.547" lastattention="1900-01-01T00:00:00.547" clientapp="rx-process_jpid[8344]" hostname="APP_SERVER_NAME" hostpid="0" loginname="DOMAIN\SQL_SRVC_ACCOUNT" isolationlevel="read committed (2)" xactid="4649111643" currentdb="8" currentdbname="DBNAME" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">     <executionStack>      <frame procname="DBNAME.dbo.APP_SERVER_NAME1" line="99" stmtstart="3616" stmtend="3868" sqlhandle="0x0300080011c5170b94ab5e01b0a8000001000000000000000000000000000000000000000000000000000000">  INSERT INTO TABLE_NAME_alt ( email_id ,activity_id ,text_content ) VALUES ( @p_email_id ,@p_activity_id ,@p_text_content    </frame>      <frame procname="DBNAME.dbo.APP_SERVER_NAME" line="301" stmtstart="14314" stmtend="15014" sqlhandle="0x0300080012f6881b97ab5e01b0a8000001000000000000000000000000000000000000000000000000000000">  EXEC @v_ret_value = APP_SERVER_NAME1 @p_email_id , @p_activity_id , @p_alias_id , @p_subject , @p_email_date , @p_email_size , @p_num_attachments , @p_charset , @p_message_id , @p_from_email_address , @p_recv_email_address , @p_delete_flag , @p_header , @p_content , @p_content_type , @p_text_content , @v_sql_code OUTPUT , @v_sql_message OUTPU    </frame>      <frame procname="adhoc" line="1" stmtstart="1204" stmtend="1900" sqlhandle="0x0100080051e7691ef0b8d2062300000000000000000000000000000000000000000000000000000000000000">  EXEC @P0 = APP_SP_NAME   @P1  ,  @P2  ,  @P3  ,  @P4  ,  @P5  ,  @P6  ,  @P7  ,  @P8  ,  @P9  ,  @P10  ,  @P11  , &apos;n&apos; , @P12  ,  @P13  ,  @P14  ,  @P15  ,  @P16  ,  @P17  ,  @P18  ,  @P19  ,  @P20 OUT  ,  @P21  ,  @P22  ,  @P23  ,  @P24  ,  @P25  ,  @P26  ,  @P27  ,  @P28  ,  @P29  ,  @P30  ,  @P31  ,  @P32  ,  @P33 OUT  ,  @P34 OU    </frame>      <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">  unknown    </frame>     </executionStack>     <inputbuf>  (@P0 decimal(38,4) OUTPUT,@P1 bigint,@P2 bigint,@P3 nvarchar(4000),@P4 datetime2,@P5 int,@P6 int,@P7 nvarchar(4000),@P8 nvarchar(4000),@P9 nvarchar(4000),@P10 nvarchar(4000),@P11 nvarchar(4000),@P12 nvarchar(4000),@P13 nvarchar(max),@P14 nvarchar(4000),@P15 nvarchar(4000),@P16 nvarchar(4000),@P17 nvarchar(4000),@P18 nvarchar(4000),@P19 nvarchar(4000),@P20 decimal(38,4) OUTPUT,@P21 int,@P22 int,@P23 bigint,@P24 bigint,@P25 nvarchar(4000),@P26 bigint,@P27 datetime2,@P28 int,@P29 nvarchar(4000),@P30 nvarchar(4000),@P31 nvarchar(4000),@P32 bigint,@P33 decimal(38,4) OUTPUT,@P34 nvarchar(4000) OUTPUT)EXEC @P0 = APP_SP_NAME   @P1  ,  @P2  ,  @P3  ,  @P4  ,  @P5  ,  @P6  ,  @P7  ,  @P8  ,  @P9  ,  @P10  ,  @P11  , &apos;n&apos; , @P12  ,  @P13  ,  @P14  ,  @P15  ,  @P16  ,  @P17  ,  @P18  ,  @P19  ,  @P20 OUT  ,  @P21  ,  @P22  ,  @P23  ,  @P24  ,  @P25  ,  @P26  ,  @P27  ,  @P28  ,  @P29  ,  @P30  ,  @P31  ,  @P32  ,  @P33 OUT  ,  @P34 OUT                                                                           </inputbuf>    </process>   </process-list>   <resource-list>    <keylock hobtid="72065847945068544" dbid="8" objectname="DBNAME.dbo.TABLE_NAME" indexname="TABLE_NAME_IDX" id="lock271c759a00" mode="X" associatedObjectId="72065847945068544">     <owner-list>      <owner id="process1a78191468" mode="X"/>     </owner-list>     <waiter-list>      <waiter id="process1a780008c8" mode="U" requestType="wait"/>     </waiter-list>    </keylock>    <exchangeEvent id="Pipe1d4b680400" WaitType="e_waitPipeGetRow" waiterType="Coordinator" nodeId="12" tid="0" ownerActivity="sentData" waiterActivity="needMoreData" merging="false" spilling="false" waitingToClose="false">     <owner-list>      <owner id="process1a780008c8"/>     </owner-list>     <waiter-list>      <waiter id="process1a93ff7c28"/>     </waiter-list>    </exchangeEvent>    <objectlock lockPartition="4" objid="1761441349" subresource="FULL" dbid="8" objectname="DBNAME.dbo.TABLE_NAME_ALT" id="lock23febd7300" mode="X" associatedObjectId="1761441349">     <owner-list>      <owner id="process1a93ff7c28" mode="X"/>     </owner-list>     <waiter-list>      <waiter id="process1a78191468" mode="IX" requestType="wait"/>     </waiter-list>    </objectlock>   </resource-list>  </deadlock> 

I have just started troubleshooting to fix the deadlock. I am not able to get a few things in about the deadlock graph:

  1. Our Purging SP is a single SP which runs as one transaction, then why we are we seeing two processes for it.
  2. What is exchange event? From where did it come?
  3. At what level are locks are being taken? Row, Page or table?
  4. How to proceed further?

NOTE: 1. We are deleting less records batch wise in purging SP.
2. I have removed the actual SP and table Names with Aliases.

UPDATE 1:

After implementing Eric Darling's suggestion mentioned in the comment (i.e adding OPTION (MAXDOP 1) in the problematic statement of purging SP), it is still getting deadlocked, but this time the deadlock graph is different. This time there is only one process for purging SP instead of 2 and also exchange event has disappeared. Now error message is also the traditional one.

I am still puzzled about why our purging SP is getting object level lock even though we are deleting very less number of rows compared to total number of rows in the table.

Any help would be appreciated.

No correct solution

OTHER TIPS

  1. The query in the SP which is deadlocking is running in parallel (multi-threaded), Eric Darling's suggestion of MAXDOP 1 is good in that it eliminates parallelism as the casue of the deadlock. It is clearly something else.
  2. An exchange event happens in parallelism operators when rows are passed between threads.
  3. In this case you have both a row lock (Key Lock) on the NC index TABLE_NAME_IDX and a table lock (Object Lock) on TABLE_NAME_ALT.
    Without access to your code and query plans, I would assume that the order of access of the two queries that are deadlocking are different: the purging SP is probably updating the clustered index with a table lock then going for a row lock on the NC index. Whereas the other query is doing the opposite.
  4. One method is to make sure that access to the table is always in the same order. This can be difficult (careful study of query plans) and fragile (a recompilation can break it).
    A better method in this case would be for one or both queries to have a TABLOCKX hint, so that the access to the table is serialized. This obviously reduces concurrency.
    A further option is simply to retry. Note that this can be problematic inside of an SP due to deadlocking being batch-aborting with XACT_ABORT ON. You may need to run the whole SP in a transaction and catch the error client-side.
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top