Question

My warehouse datamart data is split across 3 databases on the same server. This is Proof-Of-Concept project with three extracts that were loaded into individual databases.

While pulling into a cube, I am essentially doing this:

SELECT * FROM DB1.dbo.Fact_Pres
UNION
SELECT * FROM DB2.dbo.Fact_Pres
UNION
SELECT * FROM DB3.dbo.Fact_Pres

Should I actually consolidate the data into ONE table? Will that make my processing any faster?

I have no issues with disk space - I want to implement the best solution.

In either case, can you help me understand why the method you suggest would be optimal?

Was it helpful?

Solution

The cross-database query is (somewhat) slower than than in-database, If you need three separate tables, it is advisable to use different schemas inside the same db. If this happens to be one fact table, then it is best to load into a a single fact table and use partitioning if the size happens to be too big.

When it comes to ETL, the suggestion is to ETL into staging tables in a separate schema -- like ETL -- within same DB and load the fact table from there. Once done, you truncate staging tables.

The recommendations are from the Microsoft Project Real.

Also keep in mind that foreign key can not be used across databases.

OTHER TIPS

Consider federating your table using SQL Server's first class partitioning (as opposed to doing it yourself). If you're always select every data point, then maybe yeah, pulling off multiple disks is faster.

But why have multiple databases? You could always stack the three tables into one table, but have that one table implemented atop three drives RAIDed together. This is a more clear cut solution if what you're after is speed.

The federating only makes sense if you're ever selecting specific, adjacent parts of the set. But according to your OP, you're selecting everything, so that eliminates that benefit.

Yes, you definitely should. There is no point in splitting the same table in different databases. If you have problems with hard disk space, think about partitioning your table.

Regarding your comment:

The performance cost isn't THAT huge, but a union performs a merge join, which brings a bit ov an overhead.

In addition to that, are you sure you are using UNION correctly? UNION will eliminate duplicate values. Maybe what you really want to do is UNION ALL?

A UNION essentially performs a select distinct so that it can remove duplicate records. This will (potentially) impair performance all by itself. You could rectify that by changing it to UNION ALL.

Beyond that, you should check the execution plan to see what kind of performance hit you're getting. I do know that SQL Server will use indexes even on tables in other databases, but what you're doing here still doesn't make a lot of sense. I'd suggest using table partitioning and keeping it all in one database if the normal usage scenario is to UNION all of the tables together, there is little if any good reason to separate them into different databases.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top