Question

I have a sales table that is keyed by sku, store, and period. From this, I need a query that returns a record containing both This Year and Last Year's information.

The logic behind the query below is this:

  1. Calculate last year sales (in the with table)
  2. Calculate this year sales in the main body (WHERE CLAUSE)
  3. Join the "LAST YEAR" table to the main table, joining only on sku and store (you cannot join on the date because the two periods do not overlap)

My problem is that the results for last year are not the entire amount. My results act as though I am doing a LEFT JOIN, and not returning all the results from the "LAST YEAR" table.

Additional Detail:

  • When I run it with a LEFT JOIN and with a FULL OUTER JOIN, I get the same results.
  • When I execute the "WITH" clause independently, the results are correct
  • When I run the entire statement, last year sales are not the full amount

The code below has been simplified some... I'm not so worried about the syntax, but more about the LOGIC. If anyone has any ideas, or know possible flaws in my logic, I'm all ears! Thanks in advance!

WITH lastYear AS (                                                 
    SELECT 
        spsku "sku", 
        spstor "store", 
        sum(spales) "sales_ly"   
    FROM SALES                                              
    WHERE spyypp BETWEEN 201205 AND 201205 
    GROUP BY spstor, spsku
)                                                                  
SELECT 
    Sales_report.spstor "store", 
    sum(spales) "bom_retail", 
    sum(LY."sales_ly") "sales_ly"
FROM SALES Sales_report                              
FULL OUTER JOIN lastYear LY ON LY."sku" = spsku AND LY."store" = spstor
WHERE spyypp BETWEEN 201305 AND 201305      
GROUP BY spstor
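To see the symptom on something small, here is a sketch that runs the question's query shape against an in-memory SQLite database through Python's standard sqlite3 module. The four SALES rows are invented for illustration, SQLite stands in for the original database, and LEFT JOIN is used because older SQLite versions have no FULL OUTER JOIN (the question reports that both give the same result). Last year's true total for store 1 is 15, but sku B, which sold only last year, drops out:

```python
import sqlite3

# Hypothetical toy data: sku "B" sold only last year, sku "C" only this year.
ROWS = [
    ("A", 1, 201205, 10),
    ("A", 1, 201305, 20),
    ("B", 1, 201205, 5),
    ("C", 1, 201305, 7),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES (spsku TEXT, spstor INTEGER, spyypp INTEGER, spales INTEGER)"
)
conn.executemany("INSERT INTO SALES VALUES (?, ?, ?, ?)", ROWS)

# Same shape as the question's query, with LEFT JOIN instead of FULL OUTER JOIN.
QUERY = """
WITH lastYear AS (
    SELECT spsku AS sku, spstor AS store, SUM(spales) AS sales_ly
    FROM SALES
    WHERE spyypp BETWEEN 201205 AND 201205
    GROUP BY spstor, spsku
)
SELECT s.spstor AS store,
       SUM(s.spales) AS bom_retail,
       SUM(LY.sales_ly) AS sales_ly
FROM SALES s
LEFT JOIN lastYear LY ON LY.sku = s.spsku AND LY.store = s.spstor
WHERE s.spyypp BETWEEN 201305 AND 201305
GROUP BY s.spstor
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(1, 27, 10)]: sales_ly should be 15, sku B is lost
```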

Solution

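The clause WHERE spyypp BETWEEN 201305 AND 201305 is evaluated after the join is completed, so it throws away every joined row whose SALES side is not a 201305 row, including the rows where the SALES side is NULL (a NULL never satisfies BETWEEN). Every last-year sku/store with no matching this-year row is therefore lost, which effectively coerces your FULL OUTER JOIN into a LEFT JOIN, exactly as you observed.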
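A minimal sketch of the ON-versus-WHERE difference, again using Python's sqlite3 module with made-up rows, and a LEFT JOIN with lastYear as the preserved side in place of the FULL OUTER JOIN. With the period filter in WHERE, sku B disappears; with it in ON, B is kept with a NULL this-year value. The last query shows why NULL-extended rows never survive such a WHERE: comparing NULL with BETWEEN yields NULL, not true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES (spsku TEXT, spstor INTEGER, spyypp INTEGER, spales INTEGER)"
)
# Hypothetical rows: sku "B" has last-year sales only.
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?, ?, ?)",
    [("A", 1, 201205, 10), ("A", 1, 201305, 20), ("B", 1, 201205, 5)],
)

LAST_YEAR = """
WITH lastYear AS (
    SELECT spsku AS sku, spstor AS store, SUM(spales) AS sales_ly
    FROM SALES
    WHERE spyypp BETWEEN 201205 AND 201205
    GROUP BY spstor, spsku
)
"""

# Filter in WHERE: applied to the joined result, so B's only joined row
# (period 201205) is filtered away and B vanishes.
where_rows = conn.execute(LAST_YEAR + """
SELECT LY.sku, s.spales
FROM lastYear LY
LEFT JOIN SALES s ON s.spsku = LY.sku AND s.spstor = LY.store
WHERE s.spyypp BETWEEN 201305 AND 201305
ORDER BY LY.sku
""").fetchall()

# Filter in ON: applied while matching, so B survives as a NULL-extended row.
on_rows = conn.execute(LAST_YEAR + """
SELECT LY.sku, s.spales
FROM lastYear LY
LEFT JOIN SALES s ON s.spsku = LY.sku AND s.spstor = LY.store
                 AND s.spyypp BETWEEN 201305 AND 201305
ORDER BY LY.sku
""").fetchall()

# NULL BETWEEN ... evaluates to NULL (unknown), which WHERE treats as false.
null_check = conn.execute("SELECT NULL BETWEEN 201305 AND 201305").fetchone()

print(where_rows)  # [('A', 20)]
print(on_rows)     # [('A', 20), ('B', None)]
print(null_check)  # (None,)
```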

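To get the effect you want, the this-year filter has to be applied to SALES before the join, not to the joined result. Moving it into the ON condition is not enough on its own: in a FULL OUTER JOIN, SALES rows from other periods would then come back as unmatched rows and be summed into bom_retail. Filter SALES first, for example in a derived table, and take the store from whichever side of the join is present:

WITH lastYear AS (
    SELECT
        spsku "sku",
        spstor "store",
        sum(spales) "sales_ly"
    FROM SALES
    WHERE spyypp BETWEEN 201205 AND 201205
    GROUP BY spstor, spsku
)
SELECT
    COALESCE(Sales_report.spstor, LY."store") "store",
    sum(Sales_report.spales) "bom_retail",
    sum(LY."sales_ly") "sales_ly"
FROM (
    SELECT spsku, spstor, spales
    FROM SALES
    WHERE spyypp BETWEEN 201305 AND 201305
) Sales_report
FULL OUTER JOIN lastYear LY
    ON LY."sku" = Sales_report.spsku
   AND LY."store" = Sales_report.spstor
GROUP BY COALESCE(Sales_report.spstor, LY."store")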

Alternatively, and often more readably, make both the last-year and this-year sets common table expressions and full outer join those:

WITH
lastYear AS (
    SELECT
        spsku "sku",
        spstor "store",
        sum(spales) "sales_ly"
    FROM SALES
    WHERE spyypp BETWEEN 201205 AND 201205
    GROUP BY spstor, spsku
),
thisYear AS (
    SELECT
        spsku "sku",
        spstor "store",
        sum(spales) "sales_ty"
    FROM SALES
    WHERE spyypp BETWEEN 201305 AND 201305
    GROUP BY spstor, spsku
)
SELECT
    COALESCE(TY."store", LY."store") "store",
    sum(TY."sales_ty") "bom_retail",
    sum(LY."sales_ly") "sales_ly"
FROM thisYear TY
FULL OUTER JOIN lastYear LY
    ON LY."sku"   = TY."sku"
   AND LY."store" = TY."store"
GROUP BY COALESCE(TY."store", LY."store")
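A sketch of the two-CTE approach that can be run locally with Python's sqlite3 module, using invented rows. Because SQLite only added FULL OUTER JOIN in version 3.39.0, this version emulates it with a LEFT JOIN plus the last-year rows that have no this-year match (an anti-join). That emulation is a swapped-in technique, not the answer's query; on a database that supports FULL OUTER JOIN the answer's form applies directly. Both years now contribute their full totals:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES (spsku TEXT, spstor INTEGER, spyypp INTEGER, spales INTEGER)"
)
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?, ?, ?)",
    [
        ("A", 1, 201205, 10),
        ("A", 1, 201305, 20),
        ("B", 1, 201205, 5),   # hypothetical: last year only
        ("C", 1, 201305, 7),   # hypothetical: this year only
    ],
)

QUERY = """
WITH lastYear AS (
    SELECT spsku AS sku, spstor AS store, SUM(spales) AS sales_ly
    FROM SALES WHERE spyypp BETWEEN 201205 AND 201205
    GROUP BY spstor, spsku
),
thisYear AS (
    SELECT spsku AS sku, spstor AS store, SUM(spales) AS sales_ty
    FROM SALES WHERE spyypp BETWEEN 201305 AND 201305
    GROUP BY spstor, spsku
),
joined AS (
    -- every this-year row, with its last-year match if there is one
    SELECT TY.store AS ty_store, LY.store AS ly_store,
           TY.sales_ty AS sales_ty, LY.sales_ly AS sales_ly
    FROM thisYear TY
    LEFT JOIN lastYear LY ON LY.sku = TY.sku AND LY.store = TY.store
    UNION ALL
    -- plus the last-year rows that have no this-year match
    SELECT NULL, LY.store, NULL, LY.sales_ly
    FROM lastYear LY
    LEFT JOIN thisYear TY ON TY.sku = LY.sku AND TY.store = LY.store
    WHERE TY.sku IS NULL
)
SELECT COALESCE(ty_store, ly_store) AS store,
       SUM(sales_ty) AS bom_retail,
       SUM(sales_ly) AS sales_ly
FROM joined
GROUP BY COALESCE(ty_store, ly_store)
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(1, 27, 15)]
```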

OTHER TIPS

There seem to be multiple problems. This predicate:

WHERE spyypp BETWEEN 201305 AND 201305 

is probably eliminating some of the "outer joined" rows, because those rows are going to have a NULL for spyypp, and a NULL never satisfies BETWEEN. (The grouping by spsku is a bit odd, but it may not actually be a problem: you just get separate rows, some where spsku matched and some where it didn't, and those all get collapsed by the GROUP BY anyway, so I don't see the point.)

If you want to use common table expressions, I think you want two of them, with the full outer join done between those result sets. For non-matching rows, I'd use a function that picks up whichever value is non-NULL; the ISNULL function (or the standard COALESCE) is handy for this.

WITH lastYear AS
(
    SELECT 
        spsku,
        spstor,
        sum(spales) AS sales_ly
    FROM SALES
    WHERE spyypp BETWEEN 201205 AND 201205
    GROUP BY spstor, spsku
)
, thisYear AS (
    SELECT 
        spsku,
        spstor,
        SUM(spales) AS sales_ty
    FROM SALES
    WHERE spyypp BETWEEN 201305 AND 201305
    GROUP BY spstor, spsku
)
SELECT ISNULL(TY.spstor,LY.spstor) AS "store"
     , SUM(TY.sales_ty) AS "bom_retail"
     , SUM(LY.sales_ly) AS "sales_ly"
  FROM thisYear TY
  FULL
 OUTER
  JOIN lastYear LY
    ON LY.spsku = TY.spsku 
   AND LY.spstor = TY.spstor
 GROUP
    BY ISNULL(TY.spstor,LY.spstor)

If that's the result set you're after, that seems like a whole lot of unnecessary noise. If you aren't concerned with spsku being returned, and it's a full outer join, then this query returns an equivalent result set:

SELECT r.spstor AS "store"
     , SUM(CASE WHEN r.spyypp BETWEEN 201305 AND 201305 THEN r.spales END) AS "bom_retail"
     , SUM(CASE WHEN r.spyypp BETWEEN 201205 AND 201205 THEN r.spales END) AS "sales_ly"
  FROM SALES r
 WHERE r.spyypp BETWEEN 201305 AND 201305
    OR r.spyypp BETWEEN 201205 AND 201205
 GROUP
    BY r.spstor

The "trick" here is using a conditional test to decide whether each sales amount is included in the SUM: the CASE returns NULL for rows from the other period, and SUM ignores NULLs.
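The conditional-aggregation trick can be checked the same way. This sketch runs the query above (with the question's spales column) in an in-memory SQLite database from Python; the rows are hypothetical, including a store with no last-year sales and a row from a period outside both ranges. Because CASE without ELSE yields NULL and SUM skips NULLs, each column totals only its own period, and a store with no last-year rows gets NULL rather than 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES (spsku TEXT, spstor INTEGER, spyypp INTEGER, spales INTEGER)"
)
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?, ?, ?)",
    [
        ("A", 1, 201205, 10),
        ("A", 1, 201305, 20),
        ("B", 1, 201205, 5),
        ("C", 1, 201305, 7),
        ("D", 2, 201305, 3),   # store 2 has no last-year sales
        ("E", 1, 201105, 99),  # outside both periods, excluded by WHERE
    ],
)

QUERY = """
SELECT r.spstor AS store,
       SUM(CASE WHEN r.spyypp BETWEEN 201305 AND 201305 THEN r.spales END) AS bom_retail,
       SUM(CASE WHEN r.spyypp BETWEEN 201205 AND 201205 THEN r.spales END) AS sales_ly
FROM SALES r
WHERE r.spyypp BETWEEN 201305 AND 201305
   OR r.spyypp BETWEEN 201205 AND 201205
GROUP BY r.spstor
ORDER BY r.spstor
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(1, 27, 15), (2, 3, None)]
```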


If this is actually for MySQL (and not SQL Server), then I'd write it like this:

SELECT r.spstor AS `store`
     , SUM(IF(r.spyypp BETWEEN 201305 AND 201305,r.spales,NULL)) AS `bom_retail`
     , SUM(IF(r.spyypp BETWEEN 201205 AND 201205,r.spales,NULL)) AS `sales_ly`
  FROM SALES r
 WHERE r.spyypp BETWEEN 201305 AND 201305
    OR r.spyypp BETWEEN 201205 AND 201205
 GROUP
    BY r.spstor

Thank you all for your suggestions. I did restructure the SQL to have both this year and last year in the WITH clause. The fatal flaw I was overlooking is that SKUs that existed only in the last-year dataset were not being included unless I selected/grouped by sku in the main query.

To solve the issue, I used the code below. I built the two data sets separately, each with a placeholder column for the other year's sales, then combined them with UNION ALL (TY and LY values end up in different columns and different rows) and wrapped all of that in a subquery. Because the outer query sums the sales columns and groups by the non-summed field (store), the rows collapse into one row per store in the desired format.

WITH lastYear AS (
   SELECT spsku "sku", spstor "store", sum(spales) "sales_ly"
   FROM DWHLIB.SLSSUMPD
   WHERE spyypp BETWEEN 201205 AND 201205
   GROUP BY spstor, spsku
),
thisYear AS (
   SELECT spsku "sku", spstor "store", sum(spales) "sales"
   FROM DWHLIB.SLSSUMPD
   WHERE spyypp BETWEEN 201305 AND 201305
   GROUP BY spstor, spsku
)
SELECT sum(AY."sales") "sales", sum(AY."sales_ly") "sales_ly", AY."store"
FROM (
    SELECT sum(TY."sales") "sales", 0 "sales_ly", TY."store"
    FROM thisYear TY
    GROUP BY TY."store"
    UNION ALL
    SELECT 0 "sales", sum(LY."sales_ly") "sales_ly", LY."store"
    FROM lastYear LY
    GROUP BY LY."store"
) AY
GROUP BY AY."store"
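For comparison, a sketch of the same UNION ALL pattern in SQLite via Python's sqlite3 module, on invented rows, with the question's SALES table and column names standing in for DWHLIB.SLSSUMPD. Each branch fills the other year's column with a 0 placeholder, and the outer GROUP BY folds the two rows per store into one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES (spsku TEXT, spstor INTEGER, spyypp INTEGER, spales INTEGER)"
)
# Hypothetical rows; store 2 has this-year sales only.
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?, ?, ?)",
    [
        ("A", 1, 201205, 10),
        ("A", 1, 201305, 20),
        ("B", 1, 201205, 5),
        ("C", 1, 201305, 7),
        ("D", 2, 201305, 3),
    ],
)

QUERY = """
WITH lastYear AS (
    SELECT spsku AS sku, spstor AS store, SUM(spales) AS sales_ly
    FROM SALES WHERE spyypp BETWEEN 201205 AND 201205
    GROUP BY spstor, spsku
),
thisYear AS (
    SELECT spsku AS sku, spstor AS store, SUM(spales) AS sales
    FROM SALES WHERE spyypp BETWEEN 201305 AND 201305
    GROUP BY spstor, spsku
)
SELECT AY.store, SUM(AY.sales) AS sales, SUM(AY.sales_ly) AS sales_ly
FROM (
    -- this-year totals, with 0 as the last-year placeholder
    SELECT SUM(TY.sales) AS sales, 0 AS sales_ly, TY.store AS store
    FROM thisYear TY GROUP BY TY.store
    UNION ALL
    -- last-year totals, with 0 as the this-year placeholder
    SELECT 0, SUM(LY.sales_ly), LY.store
    FROM lastYear LY GROUP BY LY.store
) AY
GROUP BY AY.store
ORDER BY AY.store
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(1, 27, 15), (2, 3, 0)]
```

One design consequence: with 0 placeholders, a store with no last-year sales reports 0, whereas the conditional-aggregation query reports NULL for the same store; wrap the sums in COALESCE if the two need to agree.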
Licensed under: CC-BY-SA with attribution