What is the difference between GROUP BY and DISTINCT?
Question
I have a table with the following data:
empid empname deptid address
--------------------------------
aa76 John 6 34567
aa75 rob 4 23456
aa71 smith 3 12345
aa74 dave 2 12345
aa77 blake 2 12345
aa73 andrew 3 12345
aa90 sam 1 12345
aa72 will 6 34567
aa70 rahul 5 34567
I've used the following queries:
select deptid, EMPID ,EMPNAME ,ADDRESS
from mytable
group by 1,2,3,4
Which gives the result:
deptid empid empname address
------------------------------
1 aa90 sam 12345
2 aa74 dave 12345
2 aa77 blake 12345
3 aa71 smith 12345
3 aa73 andrew 12345
4 aa75 rob 23456
5 aa70 rahul 34567
6 aa76 John 34567
6 aa72 will 34567
And for the query:
select distinct (deptid),EMPID,EMPNAME,ADDRESS
from mytable
The result set is:
deptid empid empname address
----------------------------
1 aa90 sam 12345
2 aa74 dave 12345
2 aa77 blake 12345
3 aa71 smith 12345
3 aa73 andrew 12345
4 aa75 rob 23456
5 aa70 rahul 34567
6 aa72 will 34567
6 aa76 John 34567
In the second query I applied DISTINCT to DEPTID, so why do I still get duplicate DEPTID values?
Could you explain this?
Solution
DISTINCT refers to distinct records as a whole, not to distinct fields within a record.
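A quick way to see this is to load the question's rows into an in-memory SQLite database from Python. SQLite stands in for Teradata here (an assumption), but the DISTINCT semantics shown are standard SQL. The table and column names follow the question, and the empid "a77" is assumed to be "aa77", as it appears in the query output.

```python
import sqlite3

# Rows from the question; "a77" is assumed to be "aa77", as in the query output.
EMPLOYEES = [
    ("aa76", "John", 6, "34567"), ("aa75", "rob", 4, "23456"),
    ("aa71", "smith", 3, "12345"), ("aa74", "dave", 2, "12345"),
    ("aa77", "blake", 2, "12345"), ("aa73", "andrew", 3, "12345"),
    ("aa90", "sam", 1, "12345"), ("aa72", "will", 6, "34567"),
    ("aa70", "rahul", 5, "34567"),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE mytable (empid TEXT, empname TEXT, deptid INTEGER, address TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", EMPLOYEES)

# DISTINCT compares entire rows; every row has a different empid,
# so no row is removed and deptid values still repeat.
all_cols = conn.execute(
    "SELECT DISTINCT deptid, empid, empname, address FROM mytable"
).fetchall()

# Only when deptid is the sole selected column do its duplicates collapse.
dept_only = conn.execute(
    "SELECT DISTINCT deptid FROM mytable ORDER BY deptid"
).fetchall()

print(len(all_cols))  # 9
print(dept_only)      # [(1,), (2,), (3,), (4,), (5,), (6,)]
```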
OTHER TIPS
DISTINCT works only on the entire row. Don't be misled into thinking that SELECT DISTINCT(A), B does something different; it is equivalent to SELECT DISTINCT A, B.
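The parentheses in DISTINCT(A) only wrap the expression A; they do not scope DISTINCT to that column. A small sketch, assuming SQLite via Python's sqlite3 and a hypothetical table t with columns a and b, shows both spellings returning the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
# Hypothetical rows: one exact duplicate, and a = 1 paired with two different b values.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "x"), (1, "x"), (1, "y"), (2, "x")])

# (a) is just a parenthesized expression; DISTINCT still applies to the pair (a, b).
with_parens = conn.execute("SELECT DISTINCT (a), b FROM t ORDER BY a, b").fetchall()
without_parens = conn.execute("SELECT DISTINCT a, b FROM t ORDER BY a, b").fetchall()

print(with_parens)                    # [(1, 'x'), (1, 'y'), (2, 'x')]
print(with_parens == without_parens)  # True
```

Note that a = 1 still appears twice, because (1, 'x') and (1, 'y') are different rows.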
While GROUP BY on all selected columns and DISTINCT give the same results in Teradata, they use different algorithms behind the scenes, and you will generally get better performance from GROUP BY than from DISTINCT. I believe there were plans to implement both the same way, but they are still different in the version I'm using (V2R6), and I haven't tried Teradata 12 yet.
GROUP BY and DISTINCT both produce the same result here. Compared with DISTINCT, GROUP BY gives better performance because it processes fewer rows and occupies less spool space.
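The result-equivalence part of these answers can be checked directly. This sketch uses SQLite through Python's sqlite3 and a hypothetical two-column table; it only compares result sets and says nothing about Teradata's spool usage or relative performance, which depend on that engine's execution plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (deptid INTEGER, address TEXT)")
# Hypothetical rows with repeated (deptid, address) pairs.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(2, "12345"), (2, "12345"), (3, "12345"), (6, "34567"), (6, "34567")],
)

# DISTINCT over the selected columns.
distinct_rows = conn.execute(
    "SELECT DISTINCT deptid, address FROM t ORDER BY deptid, address"
).fetchall()

# GROUP BY over exactly the same columns, with no aggregate.
grouped_rows = conn.execute(
    "SELECT deptid, address FROM t GROUP BY deptid, address ORDER BY deptid, address"
).fetchall()

print(distinct_rows)                  # [(2, '12345'), (3, '12345'), (6, '34567')]
print(distinct_rows == grouped_rows)  # True
```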
DISTINCT does not work per column when several columns are selected: even if you write DISTINCT on a single column, you get the unique combinations of all the specified columns.
GROUP BY also returns unique records, and it can compute aggregates too.
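To get one row per department, group by that column and aggregate the rest. A minimal sketch, again assuming SQLite via Python's sqlite3 and the question's rows (with "a77" taken to be "aa77"), counts employees per deptid, something DISTINCT alone cannot express:

```python
import sqlite3

# Rows from the question; "a77" is assumed to be "aa77", as in the query output.
EMPLOYEES = [
    ("aa76", "John", 6, "34567"), ("aa75", "rob", 4, "23456"),
    ("aa71", "smith", 3, "12345"), ("aa74", "dave", 2, "12345"),
    ("aa77", "blake", 2, "12345"), ("aa73", "andrew", 3, "12345"),
    ("aa90", "sam", 1, "12345"), ("aa72", "will", 6, "34567"),
    ("aa70", "rahul", 5, "34567"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (empid TEXT, empname TEXT, deptid INTEGER, address TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", EMPLOYEES)

# One output row per deptid; COUNT(*) is computed over each group's rows.
per_dept = conn.execute(
    "SELECT deptid, COUNT(*) FROM mytable GROUP BY deptid ORDER BY deptid"
).fetchall()

print(per_dept)  # [(1, 1), (2, 2), (3, 2), (4, 1), (5, 1), (6, 2)]
```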
I don't know how to explain the difference in words, but the following example queries should help you understand the difference between GROUP BY and DISTINCT.
Question: How many people are in each unique state in the customers table?
select distinct(state), count(*) from customers;
RESULT
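Washington 17
(Only one row comes back: without GROUP BY, COUNT(*) counts all 17 rows in the table, and the state shown is taken from an arbitrary row. Most databases reject this query outright; MySQL accepts it when ONLY_FULL_GROUP_BY is disabled.)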
----------------------------------------------------------
select State, count(*) from customers GROUP BY STATE;
RESULT
Arizona 6
Colorado 2
Hawaii 1
Idaho 1
North Carolina 1
Oregon 2
South Carolina 1
Washington 2
Wisconsin 1
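Both queries can be reproduced with a hypothetical customers table rebuilt from the per-state counts above. This sketch assumes SQLite via Python's sqlite3, which, like MySQL without ONLY_FULL_GROUP_BY, accepts the first query; the id column is made up for illustration.

```python
import sqlite3

# Hypothetical customers table rebuilt from the per-state counts in the answer.
STATE_COUNTS = {
    "Arizona": 6, "Colorado": 2, "Hawaii": 1, "Idaho": 1, "North Carolina": 1,
    "Oregon": 2, "South Carolina": 1, "Washington": 2, "Wisconsin": 1,
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO customers (state) VALUES (?)",
    [(state,) for state, n in STATE_COUNTS.items() for _ in range(n)],
)

# No GROUP BY: one row whose COUNT(*) covers the whole table;
# the state value comes from an arbitrary row.
no_group = conn.execute("SELECT DISTINCT(state), COUNT(*) FROM customers").fetchall()

# GROUP BY: one row per state with that state's own count.
grouped = conn.execute(
    "SELECT state, COUNT(*) FROM customers GROUP BY state ORDER BY state"
).fetchall()

print(len(no_group), no_group[0][1])            # 1 17
print(grouped == sorted(STATE_COUNTS.items()))  # True
```

Which state appears in the single-row result is not guaranteed, so only its count is worth relying on.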
Create your own table and check the results.