Can somebody explain the running total and SQL self-join in this tutorial to me?

https://stackoverflow.com/questions/2529367

22-09-2019
|

문제

I was reading over the tutorial here: http://www.1keydata.com/sql/sql-running-totals.html and it all made sense until it suddenly got extremely ridiculously unbelievably complicated when it got to rank, median, running totals, etc. Can somebody explain in plain English how that query results in a running total? Thanks!

해결책

Before I get started, I've not seen this before and it doesn't look like a terribly comprehensible way to accomplish a running total.

Okay, here's the query from the tutorial:

SELECT a1.Name, a1.Sales, SUM(a2.Sales) Running_Total
FROM Total_Sales a1, Total_Sales a2
WHERE a1.Sales <= a2.sales or (a1.Sales=a2.Sales and a1.Name = a2.Name)
GROUP BY a1.Name, a1.Sales
ORDER BY a1.Sales DESC, a1.Name DESC;

And the sample output

Name    Sales   Running_Total
Greg     50     50
Sophia    40    90
Stella    20    110
Jeff      20    130
Jennifer  15    145
John      10    155

The simple part of this query is displaying the sales data for each employee. All we're doing is selecting name and sales from each employee and ordering them by the sale amount (descending). This gives us our base list.

Now for the running total, we want every row that has already been displayed. So, we join the table against itself, on each row that would already have been displayed:

WHERE a1.Sales <= a2.sales or (a1.Sales=a2.Sales and a1.Name = a2.Name)

Then we use the SUM aggregate function and group accordingly. A good way to understand this is if you look at what would happen if you didn't use the group function. The 'Sophia' row would look like this:

Name    A1.Sales    A2.Sales
Sophia  40          50
Sophia    40         40

Notice how we got Greg's sales row? The group will sum that up, and voila!

Hope that helps. Joe

다른 팁

The first table joins to itself, the join resulting in x number of rows, where x is the number of rows that have total sales lower than itself, or the name in the row is the same (i.e. all those sales previous to the row we are looking at, when ordered by sales amount).

It then groups on the fields in the left side of the join and sums the rows we join to, thus a running total. To see how it works, you might want to run it without the sum and grouping, to see the raw results returned.

The SQL above gives a different result on Sybase (ASE 15). I think the reason is that the 'order by' is not applied until display time. Here is the SQL and the result:

drop table Total_Sales
go
create table Total_Sales
(
    Name char(15),
    Sales  int
)

INSERT INTO Total_Sales VALUES( 'John', 10 )
INSERT INTO Total_Sales VALUES( 'Jennifer', 15)
INSERT INTO Total_Sales VALUES('Stella', 20 )
INSERT INTO Total_Sales VALUES('Sophia', 40 )
INSERT INTO Total_Sales VALUES('Greg', 50 )
INSERT INTO Total_Sales VALUES('Jeff', 20 )

SELECT a1.Name, a1.Sales, SUM(a2.Sales) Running_Total 
FROM Total_Sales a1, Total_Sales a2 
WHERE a1.Sales <= a2.Sales or (a1.Sales=a2.Sales and a1.Name = a2.Name) 
GROUP BY a1.Name, a1.Sales 
ORDER BY a1.Sales DESC, a1.Name DESC

Result:

Name           Sales Running_Total   
Greg            50  50   
Sophia          40  90   
Stella          20  130 --note that two running totals are the same! 
Jeff            20  130  
Jennifer        15  145  
John            10  155

Bob

I also get the same incorrect output as Bob above where the running total breaks down at Stella & Jeff, who have the same sales number. I'm using SQL Server 2014 Management Studio Express. I don't think the website's solution is actually correct. I did the join based on name instead of on sales and came up with these, which produce a correct running total:

select a1.name
, a1.sales
, sum(a2.sales) 'running_total'
from #total_sales a1
inner join #total_sales a2 on a1.name <= a2.name 
group by a1.name, a1.sales
order by sum(a2.sales);

Yields:

name      sales  running_total
Stella    20     20
Sophia    40     60
John      10     70
Jennifer  15     85
Jeff      20     105
Greg      50     155

You could also do the variant below if you're uncomfortable sorting on an aggregate. It changes the order, but the running total is still correct:

select a1.name
, a1.sales
, sum(a2.sales) 'running_total'
from #total_sales a1
inner join #total_sales a2 on a1.name >= a2.name 
group by a1.name, a1.sales
order by a1.name;

Yields:

name     sales  running_total
Greg     50     50
Jeff     20     70
Jennifer 15     85
John     10     95
Sophia   40     135
Stella   20     155

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow