What is the difference between GROUP BY and DISTINCT?

https://stackoverflow.com/questions/1720105

sql
teradata

19-09-2019

Question

I have a table with the following data:

empid   empname deptid   address
--------------------------------
aa76    John     6       34567
aa75    rob      4       23456
aa71    smith    3       12345
aa74    dave     2       12345
aa77    blake    2       12345
aa73    andrew   3       12345
aa90    sam      1       12345
aa72    will     6       34567
aa70    rahul    5       34567

I've used the following queries:

select deptid, empid, empname, address
from mytable
group by 1, 2, 3, 4

Which gives the result:

deptid  empid  empname address
------------------------------
1       aa90   sam      12345
2       aa74   dave     12345
2       aa77   blake    12345
3       aa71   smith    12345
3       aa73   andrew   12345
4       aa75   rob      23456
5       aa70   rahul    34567
6       aa76   John     34567
6       aa72   will     34567

And for the query:

select distinct (deptid), empid, empname, address
from mytable

The result set is:

deptid empid empname address   
----------------------------
1      aa90  sam     12345
2      aa74  dave    12345
2      aa77  blake   12345
3      aa71  smith   12345
3      aa73  andrew  12345
4      aa75  rob     23456
5      aa70  rahul   34567
6      aa72  will    34567
6      aa76  John    34567

In the second query, even though I applied DISTINCT to DEPTID, why do I still get duplicate DEPTID values?

Could you explain this?

Solution

DISTINCT refers to distinct records as a whole, not to distinct fields within the record.
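To see this with the question's own data, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Teradata (an assumption: SQLite is not Teradata, but it treats DISTINCT the same way here). The blake row uses empid aa77, as in the query output. Selecting four columns with DISTINCT keeps all nine rows because every combination is unique; only a DISTINCT on deptid alone collapses the repeated department IDs.

```python
import sqlite3

# Sample data from the question (SQLite used as a stand-in for Teradata).
ROWS = [
    ("aa76", "John", 6, "34567"),
    ("aa75", "rob", 4, "23456"),
    ("aa71", "smith", 3, "12345"),
    ("aa74", "dave", 2, "12345"),
    ("aa77", "blake", 2, "12345"),
    ("aa73", "andrew", 3, "12345"),
    ("aa90", "sam", 1, "12345"),
    ("aa72", "will", 6, "34567"),
    ("aa70", "rahul", 5, "34567"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (empid TEXT, empname TEXT, deptid INTEGER, address TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)

# DISTINCT applies to the whole selected row; the parentheses around deptid change nothing.
whole_rows = conn.execute(
    "SELECT DISTINCT (deptid), empid, empname, address FROM mytable"
).fetchall()

# Only when deptid is the sole selected column are repeated deptid values removed.
dept_only = conn.execute(
    "SELECT DISTINCT deptid FROM mytable ORDER BY deptid"
).fetchall()

print(len(whole_rows))  # 9: every (deptid, empid, empname, address) combination is unique
print(dept_only)        # [(1,), (2,), (3,), (4,), (5,), (6,)]
```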

OTHER TIPS

DISTINCT eliminates repeated rows. GROUP BY groups rows that share the same values in the grouping columns and lets you apply aggregate functions to each group.
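A minimal sketch of the difference, again assuming an in-memory SQLite database via Python's sqlite3 module instead of Teradata, and a cut-down hypothetical two-column version of the question's table: DISTINCT and GROUP BY return the same department keys, but only GROUP BY can attach a per-group aggregate such as COUNT(*).

```python
import sqlite3

# Hypothetical cut-down table: only empid and deptid from the question's data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (empid TEXT, deptid INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("aa76", 6), ("aa75", 4), ("aa71", 3), ("aa74", 2), ("aa77", 2),
     ("aa73", 3), ("aa90", 1), ("aa72", 6), ("aa70", 5)],
)

# DISTINCT: only removes repeated rows of the selected column(s).
distinct_depts = conn.execute(
    "SELECT DISTINCT deptid FROM mytable ORDER BY deptid"
).fetchall()

# GROUP BY: the same keys, plus an aggregate computed for each group.
per_dept = conn.execute(
    "SELECT deptid, COUNT(*) FROM mytable GROUP BY deptid ORDER BY deptid"
).fetchall()

print(distinct_depts)  # [(1,), (2,), (3,), (4,), (5,), (6,)]
print(per_dept)        # [(1, 1), (2, 2), (3, 2), (4, 1), (5, 1), (6, 2)]
```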

DISTINCT works only on the entire row. Don't be misled into thinking that SELECT DISTINCT(A), B does something different: the parentheses just wrap the expression A, so it is equivalent to SELECT DISTINCT A, B.
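A quick check of that equivalence, sketched with Python's sqlite3 module against a hypothetical table t (SQLite standing in for Teradata, which is an assumption): the parenthesized and unparenthesized forms return identical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
# Hypothetical rows: 'a' repeats, and the pair (1, 'x') appears twice.
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "x"), (1, "y"), (1, "x"), (2, "x")])

with_parens = conn.execute("SELECT DISTINCT (a), b FROM t ORDER BY a, b").fetchall()
without_parens = conn.execute("SELECT DISTINCT a, b FROM t ORDER BY a, b").fetchall()

print(with_parens)                    # [(1, 'x'), (1, 'y'), (2, 'x')]
print(with_parens == without_parens)  # True
```

Note that a = 1 still appears twice in the output: only the exact duplicate (1, 'x') is removed.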

While GROUP BY on all columns and DISTINCT will give you the same results in Teradata, they use different algorithms behind the scenes, and you will generally get better performance from GROUP BY than from DISTINCT. I believe there were plans to implement both the same way, but they are still different in the version I'm using (V2R6), and I haven't tried Teradata 12 yet.
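The result equivalence (not the performance difference, which is specific to Teradata's optimizer) can be sketched with Python's sqlite3 module on a hypothetical table containing exact duplicate rows. On Teradata itself, comparing the EXPLAIN output of the two queries is the way to see which plan each one gets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (deptid INTEGER, address TEXT)")
# Hypothetical rows with exact duplicates, so both queries have something to remove.
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(2, "12345"), (2, "12345"), (3, "12345"), (6, "34567"), (6, "34567")])

# Deduplicate with DISTINCT over every selected column.
via_distinct = conn.execute(
    "SELECT DISTINCT deptid, address FROM t ORDER BY 1, 2").fetchall()

# Deduplicate with GROUP BY over every selected column (no aggregates).
via_group_by = conn.execute(
    "SELECT deptid, address FROM t GROUP BY deptid, address ORDER BY 1, 2").fetchall()

print(via_distinct)                  # [(2, '12345'), (3, '12345'), (6, '34567')]
print(via_distinct == via_group_by)  # True
```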

GROUP BY and DISTINCT both work the same way here. Compared to DISTINCT, GROUP BY gives better performance because it processes fewer rows and uses less spool space.

DISTINCT does not apply column by column when several columns are selected: even if you put DISTINCT next to a single column, it returns the unique combinations of all the specified columns.

GROUP BY also gives you the unique records, and it can compute aggregates too.

I don't know how to explain the difference in words, but the example queries below should help you understand the difference between GROUP BY and DISTINCT.

Question: How many people are in each unique state in the customers table?

select distinct(state), count(*) from customers;

RESULT

Washington  17
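Only one row comes back because COUNT(*) without GROUP BY collapses the whole table into a single row before DISTINCT is applied: 17 is the total number of customers, and Washington is just an arbitrary state value taken from one of them. Standard SQL (including Teradata) rejects this query because state is neither grouped nor aggregated; permissive engines such as MySQL without ONLY_FULL_GROUP_BY accept it.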

select state, count(*) from customers group by state;

RESULT

Arizona          6
Colorado         2
Hawaii           1
Idaho            1
North Carolina   1
Oregon           2
South Carolina   1
Washington       2
Wisconsin        1

Just create your own table and check the results.
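If you want to try this without a real customers table, here is a minimal sketch using Python's sqlite3 module. Only the per-state counts come from the GROUP BY result above; the individual customer rows, the table layout, and the id column are hypothetical. SQLite accepts the DISTINCT plus COUNT(*) query (Teradata would not), so it reproduces the single collapsed row.

```python
import sqlite3

# Per-state counts from the GROUP BY result above; the rows themselves are hypothetical.
STATE_COUNTS = {
    "Arizona": 6, "Colorado": 2, "Hawaii": 1, "Idaho": 1, "North Carolina": 1,
    "Oregon": 2, "South Carolina": 1, "Washington": 2, "Wisconsin": 1,
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, state TEXT)")
for state, n in STATE_COUNTS.items():
    conn.executemany("INSERT INTO customers (state) VALUES (?)", [(state,)] * n)

# GROUP BY: one row per state, each with its own count.
grouped = conn.execute(
    "SELECT state, COUNT(*) FROM customers GROUP BY state ORDER BY state"
).fetchall()

# No GROUP BY: COUNT(*) collapses the table to one row before DISTINCT runs.
# SQLite accepts the bare 'state' column and takes its value from some row;
# Teradata would reject this query.
collapsed = conn.execute("SELECT DISTINCT state, COUNT(*) FROM customers").fetchall()

print(len(grouped))     # 9
print(collapsed[0][1])  # 17
```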

Licensed under: CC-BY-SA with attribution
