Cursor Replacement for Newbies

https://dba.stackexchange.com/questions/233884

26-01-2021
|

Question

I'd like to know what the general replacement is for a cursor. The general implementation of a cursor I see out and about is

DECLARE @variable INT, @sqlstr NVARCHAR(MAX)

DECLARE cursor_name CURSOR
FOR select_statement --essentially to get an array for @variable 
                     --usually it's a subset of unique ids for accounts, clients, parts, etc

OPEN cursor_name
FETCH NEXT FROM cursor_name INTO @variable
WHILE @@FETCH_STATUS = 0
BEGIN
     SET @sqlstr = N'
     /* some query that uses '+ str(@variable) +' to do dirty work
     such as: go through all our accounts, if it''s some subset (possible new cursor), 
     go through those accounts and connect this way, 
     map those fields and add it to our big uniform table */
     '

     EXEC sp_executesql @sqlstr
FETCH NEXT FROM cursor_name INTO @variable
END

CLOSE cursor_name
DEALLOCATE cursor_name

Since so many people are anti-cursor (with a nod to SO: Why do people hate cursors) what is the general replacement for the general implementation (preferably SQL Server)?

Solution

It depends™

The ability to work around one or multiple cursors, will depend on what is going to be executed inside of this cursor. Without knowing what is going on in it, there is no way to tell. It could be that there is no workaround, and you have to do row by row processing.

Below are a some examples.

Not working in sets

This example is the most basic one, and is simply the fact that you could query your entire dataset or parts of your dataset at once, but the cursor was created and is querying the data row by row. Common ones to replace this with are JOIN's, CROSS APPLY / OUTER APPLY and others.

Consider the following data set:

CREATE TABLE dbo.Lotr(LotrId int, CharacterName varchar(255), Val varchar(255));
CREATE TABLE dbo.LotrAttributes(LotrATtributeId int, LotrId int, AttrVal varchar(255));

INSERT INTO dbo.Lotr(LotrId,CharacterName,Val)
VALUES(1,'Frodo','Ring')
,(2,'Gandalf','Staff');

INSERT INTO dbo.LotrAttributes(LotrId,LotrATtributeId,AttrVal)
VALUES(1,1,'RingAttribute1')
,(1,2,'RingAttribute2')
,(1,3,'RingAttribute3')
,(2,4,'StaffAttribute1')
,(2,5,'StaffAttribute2');

One could try and find each record and it matches separately, by looping over the Lotr Table.

Cursor:

DECLARE @LotrID int
DECLARE C CURSOR FOR SELECT LotrId from dbo.Lotr;
OPEN C
FETCH NEXT FROM C INTO @LotrID;
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT LotrATtributeId from dbo.LotrAttributes where LotrId = @LotrID;
FETCH NEXT FROM C INTO @LotrID;
END
CLOSE C
DEALLOCATE C

Resulting in two result sets

LotrATtributeId
1
2
3
LotrATtributeId
4
5

When this inner join is used, we get the same result as one resultset.

SELECT LotrATtributeId from dbo.Lotr L
INNER JOIN dbo.LotrAttributes LA 
ON L.LotrId = LA.LotrId;

LotrATtributeId
1
2
3
4
5

String Manipulation

A common one is to use FOR XML PATH('') to replace string manipulations inside of cursors.

Dataset

CREATE TABLE dbo.Lotr(LotrId int, CharacterName varchar(255), Val varchar(255));
CREATE TABLE dbo.LotrAttributes(LotrATtributeId int, LotrId int, AttrVal varchar(255));

INSERT INTO dbo.Lotr(LotrId,CharacterName,Val)
VALUES(1,'Frodo','Ring');

INSERT INTO dbo.LotrAttributes(LotrId,LotrATtributeId,AttrVal)
VALUES(1,1,'RingAttribute1')
,(1,2,'RingAttribute2')
,(1,3,'RingAttribute3');

Double cursor with string manipulation

DECLARE @LotrId int, @CharacterName varchar(255), @Val varchar(255)
DECLARE @LotrATtributeId int, @AttrVal varchar(255)
DECLARE C CURSOR FOR
SELECT LotrId,CharacterName, Val FROM dbo.Lotr
OPEN C
FETCH NEXT FROM C INTO @LotrId,@CharacterName,@Val
WHILE @@FETCH_STATUS = 0
BEGIN

        SET @CharacterName +='|'+ @Val

        DECLARE D CURSOR FOR
        SELECT LotrATtributeId, AttrVal FROM dbo.LotrAttributes where LotrId = @LotrId
        OPEN D
        FETCH NEXT FROM D INTO @LotrATtributeId,@AttrVal
        WHILE @@FETCH_STATUS = 0
        BEGIN
        SET @CharacterName +='['+@AttrVal+ '],'

        FETCH NEXT FROM D INTO @LotrATtributeId,@AttrVal
        END
        CLOSE D 
        DEALLOCATE D

FETCH NEXT FROM C INTO @LotrId,@CharacterName,@Val
END
CLOSE C
DEALLOCATE C
SELECT LEFT(@CharacterName,len(@charactername)-1);

Result

(No column name)
Frodo|Ring[RingAttribute1],[RingAttribute2],[RingAttribute3],

Removing the cursors with FOR XML PATH('')

SELECT L.Charactername +'|'+ L.Val + (SELECT stuff((SELECT ','+QUOTENAME(AttrVal) FROM dbo.LotrAttributes LA WHERE LA.LotrId = L.LotrId FOR XML PATH('')), 1, 1, ''))
FROM
dbo.Lotr L;

*

The real workaround here would be figuring out why the data is presented in this manner, and changing the application/... as to not need it in this format, storing it somewhere, ....

If your hands are tied, this would be the next best thing.

Insert top 10 values into a temp table based on Id's in another table

Data

CREATE TABLE dbo.sometable(InsertTableId int, val varchar(255)); CREATE TABLE dbo.Top10Table(Top10TableId int, InsertTableId int, val varchar(255));

INSERT INTO dbo.sometable(InsertTableId,val)
VALUES(1,'bla')
,(2,'blabla');
INSERT INTO dbo.Top10Table(Top10TableId,InsertTableId,Val)
VALUES(1,1,'WUW')
,(2,1,'WUW')
,(3,1,'WUW');

Cursor

CREATE TABLE #Top10Values(Top10TableId int, InsertTableId int, val varchar(255))

    DECLARE @InsertTableId int;
    DECLARE C CURSOR FOR select InsertTableId from dbo.sometable;
    OPEN C
    FETCH NEXT FROM C INTO @InsertTableId;
    WHILE @@FETCH_STATUS =0
    BEGIN
    INSERT INTO #Top10Values(Top10TableId,InsertTableId,val)
    SELECT top(10) Top10TableId,InsertTableId,Val FROM dbo.Top10Table 
    where InsertTableId = @InsertTableId
    ORDER BY Top10TableId 

    FETCH NEXT FROM C INTO @InsertTableId;
    END
    CLOSE C
    DEALLOCATE C

    SELECT * FROM  #Top10Values;
    DROP TABLE #Top10Values;

Result

Top10TableId    InsertTableId   val
1   1   WUW
2   1   WUW
3   1   WUW

Replacing the cursor with CROSS APPLY and a CTE

CREATE TABLE #Top10Values(Top10TableId int, InsertTableId int, val varchar(255));
;WITH CTE 
AS
(
select InsertTableId  from dbo.sometable
)

INSERT INTO #Top10Values(Top10TableId,InsertTableId,val)
SELECT  T1T.Top10TableId,T1T.InsertTableId,T1T.Val 
FROM 
CTE
CROSS APPLY (SELECT TOP (10) Top10TableId,InsertTableId,Val from dbo.Top10Table T1T
WHERE T1T.InsertTableId = CTE.InsertTableId
) T1T ;

SELECT * FROM  #Top10Values;
DROP TABLE #Top10Values;

Other examples

An example on replacing a cursor to select a dynamic set of items per Supplier by using CROSS APPLY here.
An example on using windowing functions to replace a cursor here.

Sometimes there is no other choice

If you cannot work in sets, and have to do row by row processing, you could still optimize the cursor.

One of the biggest changes in speeding up the cursor is by adding LOCAL FAST_FORWARD to it.

DECLARE C CURSOR LOCAL FAST_FORWARD FOR SELECT LotrId from dbo.Lotr

Take a look at this blogpost by @AaronBertrand where he explains the possible differences in performance when using or not using cursor settings like LOCAL & FAST_FORWARD.

OTHER TIPS

There is no "general replacement" - you hid all the "dirty work" here so it's hard to tell if there is even a specific replacement in this case. There are certainly some specific cases where you're processing a set of rows one row at a time, whether using a cursor, or while loop, or any other iterative process, where converting to a set-based process that processes all the rows at once is much better. But there are other things that just have to be done one row at a time, like executing a stored procedure or some dynamic SQL per row, the same query across multiple databases, etc.

Cursor or not, the problems you're alluding to and linking are the same whether you use declare cursor or some other looping struct (see this post), and are irrelevant when the thing you have to do has to be done one row at a time anyway. So if you provide some specific details about what this cursor is doing, you might get some advice about how to remove the cursor (or that you can't), but your search for a magical eliminate-all-cursors approach that you can apply to all scenarios is going to be pretty frustrating for you.

The general advice for new people entering the language, IMHO, should be to always think about what you need to do to a set of rows, as opposed to what you need to do to each row in a set. The difference in the language is subtle, but crucial. If people think about the problem as a set of data instead of a bunch of individual rows, they might be less likely to use a cursor by default. But if they come from different types of programming - where iterative is the best/only way - other than simply teaching them that SQL Server isn't optimized to work that way, I don't know that there's any way to make that obvious or automatic.

Your question still asks for a general replacement, and I still believe there is no such thing.

Doug Lane did a series of videos called "T-SQL Level Up" that are on YouTube. Part of the series explores a general approach to removing cursors that goes something like this:

Remove all the cursor language (declare cursor, open, fetch, while, close, deallocate, etc) and other variable declarations
Identify places where set-based operations can be combined (variables populated by a SELECT that are later used in an INSERT might be replaced by an INSERT INTO...SELECT statement, for example)
Move conditional logic (IF...ELSE) into WHERE clauses, CASE statements, subqueries, etc

As the other great answers here have pointed out, there's no silver bullet for this. But these videos are, in my opinion, a really intuitive approach to solving the problem.

Doug goes through three cursor replacements of increasing complexity in each part, I'd highly recommend watching (as the whole deal comes across better in video):

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange