Question

I have a view with complex logic and three levels of nesting (nested views). Because of the complexity, I can't paste the execution plan.

The purpose of the view is to provide business analytics to data analysts. While they are developing reports, they usually check a sample of the view by running a SELECT TOP (N) query.

These TOP (N) queries against the view perform very badly because the optimizer chooses a different execution plan for them (as far as I know, one involving CQScanTopSortNew).

I've tried some optimizations for the TOP (N) use case, such as using hash joins, but this spoils the non-TOP (N) use cases.

The non-TOP (N) query performs well. I would like to know how I can prevent the optimizer from choosing a different execution plan when the query has a TOP (N) clause, without dramatically changing the structure or functionality of the view.

For instance, if I add a SELECT DISTINCT inside the view, the optimizer always chooses the good plan, but the functionality of the view changes.

Solution

I can't think of a fully transparent way to achieve what you want with views without disabling row goals in general, which probably won't suit your purposes.

The purpose of the view is to provide business analytics to data analysts. While they are developing reports, they usually check a sample of the view by running a SELECT TOP (N) query.

Perhaps you could convince them to use a TOP (x) PERCENT instead? This has to count the number of rows in the whole set to work out the percentage. Materializing the full set comes with a tempdb cost, but this could be manageable if the data is a reasonable size. In any case it will give a plan without a row goal.

SELECT TOP (1) PERCENT TV.* FROM TheView AS TV;

They may find this more natural than choosing an arbitrary sort order (and one which is guaranteed to need a physical sort).
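If the analysts are willing to add a hint to their own queries, row goals can also be disabled for a single statement rather than in general. The following is only a sketch: it assumes SQL Server 2016 SP1 or later (where the USE HINT syntax is available) and reuses the TheView name from the example above. It is not transparent, because an OPTION clause cannot be placed inside the view definition itself, so each query would have to carry the hint.

```sql
-- Sketch: disable the optimizer row goal for this one statement only.
-- Assumes SQL Server 2016 SP1 or later; TheView is the placeholder view name used above.
SELECT TOP (100) TV.*
FROM TheView AS TV
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));
```

Whether this gives the same plan as the non-TOP query would need to be checked against the actual view.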

All that said, it's possible that a detailed investigation would reveal tuning options to make the query with TOP as quick as it should be. That's consulting work though, so beyond what can be addressed here on the basis of estimated plans alone.

OTHER TIPS

First, the plans you pasted were estimated execution plans; we would need the actual execution plan. I'm guessing that you right-clicked in SSMS and selected "show execution plan". Instead, you need to select "Include Actual Execution Plan" before running the query. The two are usually the same, but the actual plan also includes the actual number of rows returned by each table/index.
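If it is easier than using the SSMS menu, the actual plan can also be captured from T-SQL. This is a minimal sketch that reuses the view name and filter from the query in this answer; the plan comes back as an extra XML result set after the query results.

```sql
-- Sketch: return the actual execution plan (as XML) together with the query results.
SET STATISTICS XML ON;

SELECT TOP (10) *
FROM TheView
WHERE Date = '2020-02-20';

-- Switch plan capture off again for the rest of the session.
SET STATISTICS XML OFF;
```

Note that this really runs the query, so the slow TOP version will take as long as it normally does.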

That said, based on what you posted, it appears that the problem is that the TOP query is getting a serial execution plan while the non-TOP one is getting a parallel plan. This would make sense because the optimizer assigns a lower cost to the TOP query, as it's expecting to do much less work, so the estimated cost may not reach the cost threshold for parallelism.

The undocumented way to force a parallel execution plan is to use trace flag 8649, like so:

select top 10 * 
from TheView
where Date = '2020-02-20'
option (querytraceon 8649)

Running undocumented stuff in Production environments is a bad idea. Fortunately, Adam Machanic created make_parallel(): http://dataeducation.com/next-level-parallel-plan-forcing-an-alternative-to-8649/

You would use it like this:

select top 10 * 
from TheView
CROSS JOIN dbo.make_parallel()
where Date = '2020-02-20';

Another thing to consider is that you are not including an ORDER BY clause with your TOP clause. If the order doesn't matter, you can play around with different ORDER BYs. Forcing a sort may force a parallel plan.
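As a rough illustration of that last idea, the query below adds an ORDER BY to the earlier example. SomeColumn is a hypothetical column name, not something taken from your view; substitute any column that is not already delivered in order by an index. There is no guarantee the optimizer will respond with a parallel plan, so treat this as something to experiment with.

```sql
-- Sketch: add an ORDER BY so the plan needs a sort before TOP can return rows.
-- SomeColumn is a hypothetical column name; replace it with a real column from the view.
SELECT TOP (10) *
FROM TheView
WHERE Date = '2020-02-20'
ORDER BY SomeColumn;
```

Ordering by Date itself is unlikely to help here, since the WHERE clause already restricts it to a single value.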

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange