Question

I am coding a dashboard, and I need to pull some data out of Microsoft SQL Server.

For a simple example, I have three tables, one master Category table, and two tables containing values linked to the Category table via a primary/foreign key relationship (Blue and Green value tables).

Using Microsoft SQL Sever (t-sql), I wish to total (sum) the values in the two value tables, grouped by the common category found in the category table.

Category Table

CategoryID (PK) | CategoryName
1               | Square
2               | Circle

Blue Table

BlueID (PK) | CategoryID (FK) | BlueValue | BlueMonth | BlueYear
1           | 1               | 10        | 6         | 2012
2           | 1               | 20        | 12        | 2012
3           | 2               | 5         | 6         | 2012
4           | 2               | 9         | 12        | 2012
5           | 1               | 12        | 6         | 2013
6           | 1               | 21        | 12        | 2013
7           | 2               | 4         | 6         | 2013
8           | 2               | 8         | 12        | 2013

Green Table

GreenID (PK)| CategoryID (FK) | GreenValue| GreenMonth| GreenYear
1           | 1               | 3         | 6         | 2012
2           | 1               | 6         | 12        | 2012
3           | 2               | 2         | 6         | 2012
4           | 2               | 7         | 12        | 2012
5           | 1               | 2         | 6         | 2013
6           | 1               | 5         | 12        | 2013
7           | 2               | 4         | 6         | 2013
8           | 2               | 8         | 12        | 2013

If I use the following SQL, I get the results I expect.

SELECT
    [Category].[CategoryName],
    SUM([Green].[GreenValue]) AS [GreenTotal]
FROM
    [Category]
LEFT JOIN
    [Green] ON [Category].[CategoryID] = [Green].[CategoryID]
GROUP BY
    [Category].[CategoryName]

Results:

CategoryName | GreenTotal
Square       | 16
Triangle     | 21

However, if I add the Blue table, to try and fetch a total for BlueValue as well, my obviously incorrect T-SQL gives me unexpected results.

SELECT
    [Category].[CategoryName],
    SUM([Green].[GreenValue]) AS [GreenTotal],
    SUM([Blue].[BlueValue]) AS [BlueTotal]
FROM
    [Category]
LEFT JOIN
    [Green] ON [Category].[CategoryID] = [Green].[CategoryID]
LEFT JOIN
    [Blue] ON [Category].[CategoryID] = [Blue].[CategoryID]
GROUP BY
    [Category].[CategoryName]

Incorrect Results:

CategoryName | GreenTotal | BlueTotal
Square       | 64         | 252
Triangle     | 84         | 104

The results all seem to be out by a factor of 4, which is the total number of rows in each value table for each category.

I am aiming to see the following results:

CategoryName | GreenTotal | BlueTotal
Square       | 16         | 63
Triangle     | 21         | 26

I would be over the moon if someone could tell me what on earth I am doing wrong?

Thanks, Mark.

Was it helpful?

Solution 2

What you're getting is a Cartesian product. You can see the effects of this by removing the grouping and looking through the data.

For example; if your green table contained 2 rows and your blue table contained 4, your join would return a total of 8 records.

To resolve the problem, well, you're nearly there. You've got all the right pieces, just not put them together quite right.

Assuming the following query returns the correct results for green:

SELECT CategoryID
     , Sum(GreenValue) As GreenTotal
FROM   Green
GROUP
    BY CategoryID

The results for blue can be retrieved by following the same method:

SELECT CategoryID
     , Sum(BueValue) As BlueTotal
FROM   Blue
GROUP
    BY CategoryID

Now that we have two distinct results that are correct, we should join these results to our category table:

SELECT Category.CategoryName
     , GreenSummary.GreenTotal
     , BlueSummary.BlueTotal
FROM   Category
 LEFT
  JOIN (    
        SELECT CategoryID
             , Sum(GreenValue) As GreenTotal
        FROM   Green
        GROUP
            BY CategoryID
       ) As GreenSummary
    ON GreenSummary.CategoryID = Category.CategoryID
 LEFT
  JOIN (    
        SELECT CategoryID
             , Sum(BlueValue) As BlueTotal
        FROM   Blue
        GROUP
            BY CategoryID
       ) As BlueSummary
    ON BlueSummary.CategoryID = Category.CategoryID

OTHER TIPS

Something like this would be best done with an APPLY in my opinion. Fast performance-wise, simple to use, and easy to control in case of variations in the query.

IE:

SELECT C.[CategoryName], G.[GreenTotal], B.[BlueTotal]
FROM [Category] C
OUTER APPLY (SELECT SUM([GreenValue]) AS [GreenTotal] FROM [Green] WHERE [CategoryID] = C.CategoryID) G
OUTER APPLY (SELECT SUM([BlueValue]) AS [BlueTotal] FROM [Blue] WHERE [CategoryID] = C.CategoryID) B

Create a query for each total table. Group by category, create the sum column and add the column id.

Then use this querys as sub query and make a left outer join with the main table. This can give you the result expected and can have null values when the sum is not available. You can use the isnull function to convert the null values to 0.

I would sum them up first with CTE. Then simply join the 2 together on what is common with only 1 occurrence in each, the CategoryName. This way you can't get a Cartesian product. I put the isnull statement in because there is a possibility that there are no results for a CategoryName in Blue or in Green. If you didn't do this you could get null for your CategoryName.

WITH GREENSUM as (
SELECT
    [Category].[CategoryName],
    SUM([Green].[GreenValue]) AS [GreenTotal]
FROM
    [Category]
LEFT JOIN
    [Green] ON [Category].[CategoryID] = [Green].[CategoryID]
GROUP BY
    [Category].[CategoryName]
),
WITH BLUESUM as (
SELECT
    [Category].[CategoryName],
    SUM([Blue].[BlueValue]) AS [BlueTotal]
FROM
    [Category]
LEFT JOIN
    [Blue] ON [Category].[CategoryID] = [Blue].[CategoryID]
GROUP BY
    [Category].[CategoryName])
SELECT isnull(GREENSUM.CategoryName, BLUESUM.CategoryName) as CategoryName, 
    GreenTotal, BlueTotal 
FROM [GREENSUM] 
FULL OUTER JOIN 
    [BLUESUM] ON [GREENSUM].CategoryName = [BLUESUM].CategoryName)

I also use CTE, find it easier on the eye - but rank the selects internal.

/*
create table Category ( CategoryId Integer, CategoryName nvarchar(50) )
create table Green ( CategoryId Integer, GreenValue Integer )
create table Blue ( CategoryId Integer, BlueValue Integer )

insert into Category VALUES (1,'Square'),(2,'Circle')
insert into Blue VALUES (1,10),(1,20),(2,5),(2,9),(1,12),(1,21),(2,4),(2,8)
insert into Green VALUES (1,3),(1,6),(2,2),(2,7),(1,2),(1,5),(2,4),(2,8)
*/

with CatSums(ColorRank, CategoryId, CategoryValue) as
(
  select 1, CategoryId, GreenValue from Green
  union all
  select 2, CategoryId, BlueValue  from Blue
)

select 
    C.CategoryName, 
    Sum(case when ColorRank = 1 then CategoryValue else 0 end) as GreenTotal,
    Sum(case when ColorRank = 2 then CategoryValue else 0 end) as BlueTotal
 from CatSums S left join Category C on C.CategoryId = S.CategoryId
group by C.CategoryName
  • Although I must profess to increasingly liking the OUTER APPLY solution.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top