Question

I'm looking to refactor the below query to something more readable and modifiable. The first half is identical to the second, with the exception of the database queried from (table names are the same though.)

  SELECT
    Column 1 AS c1,
    ...
    Column N AS cN
  FROM
    database1.dbo.Table1

UNION

  SELECT
    'Some String' as c1,
    ...
    NULL as cN
  FROM
    database1.dbo.Table2

UNION

  SELECT
    Column 1 AS c1,
    ...
    Column N AS cN
  FROM
    database2.dbo.Table1

UNION

  SELECT
    'Some String' as c1,
    ...
    NULL as cN
  FROM
    database2.dbo.Table2

This query is the definition of DRY and is calling to me to be re-written, but I have no idea how!

EDIT: We can't use linq and we desire distinct results; I'm looking to make the query smaller in physical file size, not in results returned.

EDIT: The database I'm querying off is a proprietary ERP database. Restructuring it is not an option.

Was it helpful?

Solution

This is a pretty standard SQL pattern. Sometimes it's easy to inadvisedly transfer OOP/Procedural code principles like DRY to SQL, but they aren't necessarily transferable concepts.

Note how easily you can grok the entire logical design of the query, vs. hunting through submodules. If one of the subexpressions had an extra column, or columns reversed, it would stick out. This is basically a pretty simple SQL statement to grok as an execution unit, where disaggregating it would muddle it.

And when you're debugging, it's handy to be able to use the editor's text highlighting option to selectively exercise parts of the statement - a technique that doesn't exist in procedural code. OTOH, it can get messy trying to trace down all the pieces if they are scattered into views etc. Even CTEs can make this inconvenient.

OTHER TIPS

I'm going to go out on a limb here, and say, based on the information you've given us;

That's as good as it's going to get

One performance tip that I see off the bat is using UNION ALL instead of UNION unless you intentionally want distinct records. A simple UNION will eliminate duplicates which takes time. UNION ALL doesn't do that.

You could rewrite it with dynamic SQL and a loop but I think the result would be worse. If there is enough duplicate code to justify the dynamic sql approach, then I guess it could be justified.

Alternatively, have you considered moving the logic out of the stored procedure into something like LINQ? For many, this isn't an option so I'm just asking.

A final note: resist the urge to fix what's not broken just to make it look cleaner. If cleanup will aide in maintenance, verification, etc., then go for it.

What's the problem? Too long? Too repetitive?

Sometimes you get ugly SQL - not much you can do about it.

I don't see any way to clean it up unless you want to use separate views and then union them together.

I vote for views, which impose near-enough zero overhead (OK, maybe a small compile-time cost but that ought to be all). Then your procs become something of the form

SELECT * FROM database1.view1
UNION
SELECT * FROM database1.view2
UNION
SELECT * FROM database2.view1
UNION
SELECT * FROM database2.view2

I'm not sure if I'd want to condense it any further, although I expect most platforms would tolerate it.

On the Dynamic SQL theme - here is a sample - not sure if it is any better. The benefit is you only have to write the SELECT list once.

DECLARE @Select1 varchar(1000)
DECLARE @Select2 varchar(1000)

DECLARE @SQL varchar(4000)


SET @Select1 = 'SELECT
    Column 1 AS c1,
    ...
    Column N AS cN'


SET @Select2 = 'SELECT
    ''Some String'' as c1,
    ...
    NULL as cN'


SET @SQL = @Select1 + ' FROM database1.dbo.Table1 '

SET @SQL = @SQL + ' UNION ' + @Select2 + ' FROM database1.dbo.Table2 '

SET @SQL = @SQL + ' UNION ' + @Select1 + ' FROM database2.dbo.Table1 '

SET @SQL = @SQL + ' UNION ' + @Select2 + ' FROM database2.dbo.Table2 '


EXEC @SQL

If all your procs look like this - you've probably got an architectural problem.

Are all your calls to table2 just have one useful field? (and because of UNION, end up having just one row?)

I totally second the idea of going with parameterized dynamic SQL and/or code-generation for this job, even going so far as to generate the column list dynamically using INFORMATION_SCHEMA. This isn't exactly what you need but it's a start (you might generate off a table of databases and tables):

DECLARE @template AS varchar(MAX)
SET @template = 'SELECT {@column_list} FROM {@database_name}.dbo.{@table_name}'
DECLARE @column_list AS varchar(MAX)

SELECT @column_list = COALESCE(@column_list + ',', '') + COLUMN_NAME
FROM database1.dbo.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @table_name
ORDER BY ORDINAL_POSITION

DECLARE @sql AS varchar(MAX)
SET @sql = @template
SET @sql = REPLACE(@sql, '{@column_list}', @column_list)
SET @sql = REPLACE(@sql, '{@database_name}', @database_name)
SET @sql = REPLACE(@sql, '{@table_name}', @table_name)

Depending on the number of rows returned, you may be best using UNION ALL on the selects with a select distinct query around it. I've seen a similar problem before and had different execution plans for the two different styles

SELECT DISTINCT subquery.c1, subquery.cN
FROM
(
SELECT Column 1 AS c1, Column N AS cN FROM database1.dbo.Table1
UNION ALL
SELECT 'Some String' as c1, NULL as cN FROM database1.dbo.Table2
UNION ALL
SELECT Column 1 AS c1, Column N AS cN FROM database2.dbo.Table1
UNION ALL
SELECT 'Some String' as c1, NULL as cN FROM database2.dbo.Table2
) subquery
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top