A little fuzzy on getting DISTINCT on one column?

https://stackoverflow.com/questions/1544314

20-09-2019
|

Question

I've seen a few different versions of this question but having difficulty applying it to what I need...

MS SQL Server 2008 query:

SELECT Receipts.ReceiptID, Receipts.UserID, Receipts.UserCardID, FolderLink.ReceiptFolderLinkID
FROM dbo.tbl_ReceiptFolderLnk AS FolderLink 
    INNER JOIN dbo.tbl_Receipt AS Receipts ON FolderLink.ReceiptID = Receipts.ReceiptID

ReceiptID            UserID               UserCardID           ReceiptFolderLinkID
-------------------- -------------------- -------------------- --------------------
1                    1                    3                    2
2                    1                    3                    3
3                    1                    3                    10
4                    1                    4                    11
5                    1                    4                    13
6                    2                    5                    14
3                    1                    3                    15
4                    1                    4                    16

Now I would like to get distinct values on ReceiptID. Using distinct will not work as distinct works on the entire row.

Help would be appreciated.

Thanks!

Solution

If you want all the results columns in your original query and want each ReceiptID to only show up one time, you'll need to use GROUP BY.

Like:

SELECT Receipts.ReceiptID,
    MIN(Receipts.UserID),
    MIN(Receipts.UserCardID),
    MIN(FolderLink.ReceiptFolderLinkID)
FROM dbo.tbl_ReceiptFolderLnk AS FolderLink
INNER JOIN dbo.tbl_Receipt AS Receipts ON FolderLink.ReceiptID = Receipts.ReceiptID
GROUP BY Receipts.ReceiptID

OTHER TIPS

Asking for one of many rows with the same receiptID is ambiguous. How do you want to choose that one row from the others?

I've done this with the MIN or MAX aggregate functions which seems to work pretty well and is far simpler to write than most other methods I've seen.

So it'd be something like this:

SELECT [ColumnA], MIN(ColumnB) AS ColumnB, MIN(ColumnC) AS ColumnC 
FROM MyTable 
GROUP BY ColumnA

In this case ColumnA would be the distinct column.

DISTINCT is distinct for the whole row. Try GROUP BY ReceiptID, of course the values for the other columns may not make sense, because it will just pick one row.

SELECT  Receipts.ReceiptID, Receipts.UserID, Receipts.UserCardID, ReceiptFolderLinkID
FROM    dbo.tbl_Receipt AS Receipts
CROSS APPLY
        (
        SELECT  TOP 1 FolderLink.ReceiptFolderLinkID
        FROM    dbo.tbl_ReceiptFolderLnk FolderLink
        WHERE   FolderLink.ReceiptID = Receipts.ReceiptID
        ORDER BY
                FolderLink.ReceiptFolderLinkID
        ) fl

This will eliminate the need for GROUP BY or DISTINCT which may be quite inoptimal.

Also by changing the ORDER BY clause you can choose which FolderLink you want to return if there are muptiple.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow