When will a FAST_FORWARD cursor have a work table (and is this something to avoid)?

https://stackoverflow.com/questions/8828996

15-04-2021
|

Question

Background

I noticed whilst experimenting with running total queries that sometimes the estimated plan just shows a "Fetch Query"

Fetch

and the actual plan shows repeated Fetches from the Clustered Index Scan

Fetch Scan

on other occasions (e.g when adding a TOP to the query) the estimated plan shows a "Population Query" stage that populates a work table

Fetch and Populate

With the actual plan showing a clustered index scan to populate the work table then repeated seeks against that work table.

Seeks

Question

What criteria does SQL Server use in choosing one approach over the other?
Would I be right in thinking that the first method (without the additional work table population step) is more efficient?

(Bonus question: If anyone could explain why each scan in the first query counts as 2 logical reads that might be quite enlightening too)

Additional Information

I have found this article here which explains that FAST_FORWARD cursors can either use a dynamic plan or a static plan. The first query in this case appears to be using a dynamic plan and the second one a static plan.

I've also found that if I try

SET @C2 = CURSOR DYNAMIC TYPE_WARNING FOR SELECT TOP ...

The cursor gets implicitly converted to a keyset cursor so it is clear that the TOP construct is not supported for dynamic cursors, perhaps for the reasons in Ruben's answer - Still looking for a definitive explanation of this.

However I have also read that dynamic cursors tend to be slower than their static counterparts (source 1, source 2) which seems surprising to me given that the static variety have to read the source data, copy it, then read the copy rather than just read the source data. The article I referenced earlier mentions that dynamic cursors use markers. Can anyone explain what these are? Is it just a RID or the CI key, or something different?

Script

SET STATISTICS IO OFF

CREATE TABLE #T ( ord INT IDENTITY PRIMARY KEY, total INT, Filler char(8000))

INSERT INTO #T (total) VALUES (37),(80),(55),(31),(53)

DECLARE @running_total INT, 
    @ord INT, 
    @total INT
    
SET @running_total = 0
SET STATISTICS IO ON
DECLARE @C1 AS CURSOR;
SET @C1 = CURSOR FAST_FORWARD FOR SELECT ord, total FROM #T ORDER BY ord;
OPEN @C1;
PRINT 'Initial FETCH C1'
FETCH NEXT FROM @C1 INTO @ord, @total ;
WHILE @@FETCH_STATUS = 0
BEGIN
  SET @running_total = @running_total + @total
  PRINT 'FETCH C1'
  FETCH NEXT FROM @C1 INTO @ord, @total ;
END

SET @running_total = 0
SET STATISTICS IO ON
DECLARE @C2 AS CURSOR;
SET @C2 = CURSOR FAST_FORWARD FOR SELECT TOP 5 ord, total FROM #T ORDER BY ord;
OPEN @C2;
PRINT 'Initial FETCH C2'
FETCH NEXT FROM @C2 INTO @ord, @total ;
WHILE @@FETCH_STATUS = 0
BEGIN
  SET @running_total = @running_total + @total
  PRINT 'FETCH C2'
  FETCH NEXT FROM @C2 INTO @ord, @total ;
END

PRINT 'End C2'
DROP TABLE #T

Solution

What criteria does SQL Server use in choosing one approach over the other?

It is primarily a cost-based decision. Quoting from the article you linked to, "In situations where the dynamic plan looks promising, the cost comparison may be heuristically skipped. This occurs mainly for extremely cheap queries, though the details are esoteric."

Would I be right in thinking that the first method (without the additional work table population step) is more efficient?

It depends. Dynamic and static cursor plans have different strengths and weaknesses. If all rows will eventually be touched, the static plan is likely to perform better. More on this in a moment.

It is clear that the TOP construct is not supported for dynamic cursors

This is true. All iterators in a dynamic cursor plan must be able to save and restore state, scan forward and backward, process one input row for each output row, and be non-blocking. Top, in general, does not satisfy all these requirements; the class CQScanTopNew does not implement the necessary Set/Get/Goto/Marker(), and ReverseDirection() methods (among others).

I have also read that dynamic cursors tend to be slower than their static counterparts.

This is often true, for Transact-SQL cursors, where most or all of the cursor set is touched. There is a cost associated with saving and restoring the state of a dynamic query plan. Where a single row is processed on each call, and all rows eventually touched, this save/restore overhead is maximized.

Static cursors have the overhead of making a copy of the set (which may be the dominant factor for a large set), but the per-row cost of retrieval is quite small. Keysets have a higher per-row retrieval overhead than static because they must outer join back to the source tables to retrieve non-key columns.

Dynamic cursors are optimal when a relatively small fraction of the set is accessed, and/or retrieval is not row-at-a-time. This is a typical access pattern in many common cursor scenarios, just not the ones blog posts tend to test :)

If anyone could explain why each scan in the first query counts as 2 logical reads that might be quite enlightening too

This is down to the way state is saved for the scan, and the way reads are counted.

The article I referenced earlier mentions that dynamic cursors use markers. Can anyone explain what these are? Is it just a RID or the CI key, or something different?

Markers exist for each iterator in a dynamic cursor plan, not just access methods. The 'marker' is all the state information necessary to restart the plan iterator at the point it left off. For an access method, an RID or index key (with uniquifier if necessary) is a big part of this, but not the whole story by any means.

OTHER TIPS

Just a hunch, but normally a TOP-ORDER BY requires SQL Server to buffer the result in some way (either the index scan's result or indeed the entire result in a temp structure, or anything in between).

One could argue that for cursors this is also necessary even when ordering by the primary key (as in your example), as you cannot allow a TOP 5 cursor to unexpectedly return less than 5 rows when the corresponding SELECT does return exactly 5 rows (or worse: the cursor returns more than 5 rows).

This weird situation could theoretically happen when there are deletes or inserts on the table after the index scan's range has already been determined for the cursor, and the inserts/deletes fall within the index scan's range, but you're not yet done fetching. To prevent this from happening, they might err on the safe side here. (And they just didn't optimize for #temp tables.)

A question though: does SQL Server allow a FETCH FROM SELECT TOP n without an ORDER BY clause? (Haven't got a SQL Server instance running here.) Might be interesting to know what plan that causes.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow