Question

It was a long day; perhaps this is a simple question, but I'm stuck anyway.

Basically, I have two similar tables, Sales and Forecasts. I'm trying to create a view that selects rows from both tables and picks whatever is there for a given model+month+country. If both tables contain data, Sales has priority, which means that the Forecast rows should be omitted.

To simplify the query, I'm using CTEs. In reality the schemas of the two tables differ and many tables are joined; Forecasts also contains history rows, of which only the last should be shown.

I have created a simplified schema and data to show what I'm trying to do:

WITH Sales AS
(
    SELECT 
        ID, Model, Month, Country,
        Amount              = Count,
        [Forecast / Sales]  = 'Sales'
    FROM dbo.Sales
)
, Forecasts AS
(
    SELECT 
        ID, Model, Month, Country,
        Amount              = Count,
        [Forecast / Sales]  = 'Forecast'
    FROM dbo.Forecast
)
SELECT  ID = COALESCE(s.ID, fc.ID), 
        Model = COALESCE(s.Model, fc.Model), 
        Month = COALESCE(s.Month, fc.Month),
        Country = COALESCE(s.Country, fc.Country),
        Amount = COALESCE(s.Amount, fc.Amount),
        [Forecast / Sales] = COALESCE(s.[Forecast / Sales], fc.[Forecast / Sales])
FROM Sales s
FULL OUTER  JOIN Forecasts fc 
    ON s.Model = fc.Model
        AND s.Month = fc.Month
        AND s.Country = fc.Country
ORDER BY ID,Month,Country,Model

Here's a SQL Fiddle with sample data: http://sqlfiddle.com/#!3/9081b/9/2

Result:

ID  MODEL   MONTH   COUNTRY AMOUNT  FORECAST / SALES
1   ABC December, 01 2013 00:00:00+0000 Germany 777 Sales
2   ABC January, 01 2014 00:00:00+0000  Germany 999 Sales
3   ABC February, 01 2014 00:00:00+0000 Germany 900 Sales
3   ABC February, 01 2014 00:00:00+0000 Germany 900 Sales
4   ABC January, 01 2014 00:00:00+0000  UK  600 Forecast
4   ABC February, 01 2014 00:00:00+0000 UK  444 Sales
5   ABC March, 01 2014 00:00:00+0000    UK  500 Forecast

This query returns duplicate rows, identical in every column including the ID and the source (last column):

3   ABC February, 01 2014 00:00:00+0000 Germany 900 Sales
3   ABC February, 01 2014 00:00:00+0000 Germany 900 Sales

Apparently the Sales rows are being duplicated because several Forecast rows exist for that model+month+country combination. How do I get only the Sales rows, without duplicates, when both Sales and Forecast rows are available, and the Forecast rows when there are no Sales rows?


Solution 2

Lamak's answer provides the reason for the duplicate rows in the result. Here is one solution:

WITH Sales AS
( ... )
, Forecasts AS
( ...)
, Combos AS                             -- get all distinct
(                                       -- model + month + country  
   SELECT Model, Month, Country         -- combinations
   FROM Sales                           -- from Sales
 UNION                                  -- this is UNION DISTINCT
   SELECT Model, Month, Country
   FROM Forecasts                       -- and Forecasts
)
SELECT  ID = COALESCE(s.ID, f.ID), 
        c.Model, 
        c.Month,
        c.Country,
        Amount = COALESCE(s.Amount, f.Amount),
        [Forecast / Sales] = COALESCE(s.[Forecast / Sales], 
                                      f.[Forecast / Sales])
FROM Combos c
  LEFT JOIN Sales s
    ON  s.Model = c.Model
    AND s.Month = c.Month
    AND s.Country = c.Country
  LEFT JOIN Forecasts f 
    ON  s.Model IS NULL           -- join Forecasts only if there is no Sales
    AND f.Model = c.Model
    AND f.Month = c.Month
    AND f.Country = c.Country
ORDER BY ID, Month, Country, Model ;

Test at: SQL-Fiddle
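
The question mentions that Forecasts holds history rows of which only the last should be shown. If a combination has several Forecast rows and no Sales row, the query above still returns all of them. A minimal sketch of one way to keep a single Forecast row per combination inside the CTE follows. It assumes, purely for illustration, that the highest ID is the most recent row; the real schema may offer a better ordering column. The two sample rows are the duplicate Forecast rows from the question's data.

```sql
-- Sketch only: keep one Forecast row per Model + Month + Country.
-- ASSUMPTION: a higher ID means a more recent history row.
WITH Forecasts AS
(
    SELECT ID, Model, [Month], Country, Amount,
           rn = ROW_NUMBER() OVER (PARTITION BY Model, [Month], Country
                                   ORDER BY ID DESC)
    FROM (VALUES
        (2, 'ABC', CAST('20140201' AS datetime), 'Germany', 1100),
        (3, 'ABC', CAST('20140201' AS datetime), 'Germany',  900)
    ) AS v (ID, Model, [Month], Country, Amount)
)
SELECT ID, Model, [Month], Country, Amount
FROM Forecasts
WHERE rn = 1;       -- only the row ranked first within each combination
```

With these two rows, only the row with ID 3 is kept, because it has the higher ID within the February/Germany combination.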

OTHER TIPS

The problem with your query isn't the use of COALESCE but the JOIN. Two rows in the Forecast table, those with ID 2 and 3, have the same combination of Model, Month, Country:

╔════╦═══════╦═════════════════════════╦═════════╦═══════╗
║ ID ║ Model ║          Month          ║ Country ║ Count ║
╠════╬═══════╬═════════════════════════╬═════════╬═══════╣
║  2 ║ ABC   ║ 2014-02-01 00:00:00.000 ║ Germany ║  1100 ║
║  3 ║ ABC   ║ 2014-02-01 00:00:00.000 ║ Germany ║   900 ║
╚════╩═══════╩═════════════════════════╩═════════╩═══════╝

Both of them join with the row with ID 3 from the Sales table:

╔════╦═══════╦═════════════════════════╦═════════╦═══════╗
║ ID ║ Model ║          Month          ║ Country ║ Count ║
╠════╬═══════╬═════════════════════════╬═════════╬═══════╣
║  3 ║ ABC   ║ 2014-02-01 00:00:00.000 ║ Germany ║   900 ║
╚════╩═══════╩═════════════════════════╩═════════╩═══════╝

And since your query uses COALESCE(s.ID, fc.ID), which takes the Sales ID whenever one exists, you get two rows with ID 3 in the results.
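
To check whether other combinations are affected in the same way, a grouping query along these lines may help. It is a sketch against the dbo.Forecast table from the question; the RowsPerCombination alias is made up here.

```sql
-- List every Model + Month + Country combination that occurs
-- more than once in dbo.Forecast (the table name used in the question).
SELECT Model, [Month], Country,
       RowsPerCombination = COUNT(*)      -- hypothetical column alias
FROM dbo.Forecast
GROUP BY Model, [Month], Country
HAVING COUNT(*) > 1;
```

Each combination this returns will multiply the matching Sales row in the FULL OUTER JOIN.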

It appears you simply want to return the entire Sales set and complement it with entries from Forecasts that are not found in Sales. For that, I would probably just use UNION ALL like this:

WITH Sales AS
(
  ...
)
, Forecasts AS
(
  ...
)

SELECT ID, Model, Month, Country, Amount, [Forecast / Sales]
FROM Sales

UNION ALL

SELECT ID, Model, Month, Country, Amount, [Forecast / Sales]
FROM Forecasts
WHERE NOT EXISTS
(
  SELECT Model, Month, Country
  INTERSECT
  SELECT Model, Month, Country
  FROM Sales
);
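
The NOT EXISTS ... INTERSECT filter above can also be written as a conventional correlated subquery, which may be easier to read. The sketch below does that on a few rows taken from the tables and result shown in the question, inlined with VALUES so it runs on its own. One difference to keep in mind: INTERSECT treats two NULLs as equal, whereas the = comparisons below do not, so the two forms only behave the same when the key columns contain no NULLs.

```sql
-- Sketch: same idea as the UNION ALL query, with an explicit correlated NOT EXISTS.
WITH Sales AS
(
    SELECT ID, Model, [Month], Country, Amount, [Forecast / Sales] = 'Sales'
    FROM (VALUES
        (3, 'ABC', CAST('20140201' AS datetime), 'Germany', 900)
    ) AS v (ID, Model, [Month], Country, Amount)
)
, Forecasts AS
(
    SELECT ID, Model, [Month], Country, Amount, [Forecast / Sales] = 'Forecast'
    FROM (VALUES
        (2, 'ABC', CAST('20140201' AS datetime), 'Germany', 1100),
        (3, 'ABC', CAST('20140201' AS datetime), 'Germany',  900),
        (5, 'ABC', CAST('20140301' AS datetime), 'UK',       500)
    ) AS v (ID, Model, [Month], Country, Amount)
)
SELECT ID, Model, [Month], Country, Amount, [Forecast / Sales]
FROM Sales

UNION ALL

SELECT f.ID, f.Model, f.[Month], f.Country, f.Amount, f.[Forecast / Sales]
FROM Forecasts AS f
WHERE NOT EXISTS
(
    -- a Sales row for the same combination suppresses the Forecast row
    SELECT 1
    FROM Sales AS s
    WHERE s.Model   = f.Model
      AND s.[Month] = f.[Month]
      AND s.Country = f.Country
)
ORDER BY ID, [Month], Country, Model;
```

With these sample rows, the Sales row with ID 3 and the Forecast row with ID 5 are returned; both February/Germany Forecast rows are filtered out because a Sales row exists for that combination.
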
Licensed under: CC-BY-SA with attribution