Question

I've got a SQL Server data warehouse with a Fact Constellation Schema design. I've got to build a report on 4 objects/views:

  1. Sales actuals - fact table [Sales]
  2. Sales targets - fact table [Targets]
  3. Stocks - fact table [Inventory]
  4. Inbound - fact table [Transit]

All the objects/views have the same signature, for example:

Sales actuals: ProductID, RegionID, SalesManagerID, ..., <product data>, <region data>, ..., Quantity;
Sales targets: ProductID, --null--, SalesManagerID, ..., <product data>, -----null----, ..., Quantity;
Stocks:        ProductID, RegionID, -----null-----, ..., <product data>, <region data>, ..., Quantity;
...

To achieve such a signature, each object/view is sourced from a fact table and 5-6 dimension tables. Dimension tables are shared between objects (table with product data, table with region data, ...).

It doesn't take SQL more than 5-10 seconds to calculate each view.

NOW that I want to combine them in a single report, I'm doing this:

Select * from [Sales actuals]
UNION
Select * from [Sales targets]
UNION
Select * from [Stocks]
UNION
Select * from [Inbound]

And here SQL doesn't even manage to retrieve 10% of data within 1 minute. It seems that the query optimizer combines 4 fact tables into one large vector and attaches dimension table - which drives the system crazy.

What I want is to keep views/objects encapsulated. That means, the engine must calculate the views first (4 * 5 secs = 20 secs). and only then apply Union operation (10 secs + some overhead) to retrieve the result.

Question: how can I disable query optimization in nested views to achieve such "computational encapsulation"?

Doing like the compiler does: union the fact tables first and then join the dimension tables - is no option, because I want to keep code interpretable and reusable.

Thanks in advance! Constantine

Was it helpful?

Solution

For starters you probably want to change UNION into UNION ALL ?!?

To have the server do what you want you could add the NOEXPAND table hint in the 'final' view; but to be honest I've never seen it being beneficial to overall performance.

I'm wondering how smart the query optimizer would be if you tried something along the lines of :

;WITH facts (dimension_ids & measures)
         AS (Select * from [Sales actuals]
             UNION ALL
             Select * from [Sales targets]
             UNION ALL
             Select * from [Stocks]
             UNION ALL
             Select * from [Inbound])
SELECT dimension_values, measures
  FROM facts
  JOIN dimension1 ON dim1.id = facts.dim1_id
                 AND dim1.property = @filter_dim1 
  JOIN dimension2 ON dim2.id = ...

etc...

It probably would do what you want, although I'm not convinced it would be better than

SELECT dimension_values, measures
  FROM [Sales actuals] fct
  JOIN dimension1 ON dim1.id = fct.dim1_id
                 AND dim1.property = @filter_dim1 
  JOIN dimension2 ON dim2.id = ...

UNION ALL 

SELECT dimension_values, measures
  FROM [Sales targets] fct
  JOIN dimension1 ON dim1.id = fct.dim1_id
                 AND dim1.property = @filter_dim1 
  JOIN dimension2 ON dim2.id = ...

UNION ALL
etc...

which --going by your explanation-- is what you're running up against and apparently has terrible performance ?

PS: I know this will sound 'basic', but did you run sp_updatestats first ?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top