Question

I have a CTE in which I am finding duplicate records matching on 5 columns:

    ;WITH DuplicateCount AS
    (
    SELECT 
                   FirstName, 
                   LastName, 
                   DateofBirth,  
                   Email,  
                   c1.Status, 
                   Count(*) AS TotalCount
    FROM Customer c
    INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
    GROUP BY   FirstName, LastName, DateofBirth, Email, c1.Status
    HAVING COUNT(*) > 1
    )

I am then selecting Status and TotalCount from that CTE and joining an Enum table to produce readable data:

    ;WITH DuplicateCount AS
    (
        SELECT FirstName,
               LastName,
               DateofBirth,
               Email,
               c1.Status,
               COUNT(*) AS TotalCount
        FROM Customer c
        INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
        GROUP BY FirstName, LastName, DateofBirth, Email, c1.Status
        HAVING COUNT(*) > 1
    )
    SELECT e.Display, dc.TotalCount
    FROM DuplicateCount dc
    -- Index is a reserved word in T-SQL, so the column name is bracketed
    INNER JOIN Enum e ON dc.Status = e.[Index]

In this scenario, I am able to pull back readable data and use Excel to spit out a graph report of duplicates by Status.

Problem

I need to join the Customer_1 table once again to gather one more column: Stage. Here is how I tried to do it:

    ;WITH DuplicateCount AS
    (
        SELECT c.customerID,  -- qualified: customerID exists in both joined tables
               FirstName,
               LastName,
               DateofBirth,
               Email,
               c1.Status,
               COUNT(*) AS TotalCount
        FROM Customer c
        INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
        GROUP BY c.customerID, FirstName, LastName, DateofBirth, Email, c1.Status
        HAVING COUNT(*) > 1
    )
    SELECT e.Display,
           CASE
                WHEN c1.Stage = 6 THEN 'First'
                WHEN c1.Stage = 7 THEN 'Second'
                WHEN c1.Stage = 8 THEN 'Third'
                WHEN c1.Stage = 11 THEN 'Fourth'
                WHEN c1.Stage = 9 THEN 'Fifth'
                WHEN c1.Stage = 10 THEN 'Sixth'
                WHEN c1.Stage = 12 THEN 'Unknown'
                ELSE ''
           END AS Stage,
           dc.TotalCount
    FROM DuplicateCount dc
    INNER JOIN Enum e ON dc.Status = e.[Index]
    INNER JOIN Customer_1 c1 ON c1.customerID = dc.customerID

Obviously, that didn't work: none of my records have duplicate PKs, so grouping by customerID never produces a count above 1.

Is there a way to join a table to my CTE without a PK? Or somehow add a PK to my CTE without grouping by it?

Edit: This is what I am trying to achieve

    FirstName  LastName  Stage   Total Count
    John       Smith     First   2
    John       Smith     Third   2
    Alex       Smith     First   2
    Jane       Smith     Third   2
    Jane       Smith     First   2
    Jack       Smith     Second  2

Then, when reporting on this data:

  • John Smith has 4 total records. Two in First, two in Third

  • Alex Smith has 2 total records. Two in First

  • Jane Smith has 4 total records. Two in First and two in Third

  • Jack Smith has 2 total records. Two in Second.

When graphing this data, I should be able to see:

  • First: 6 total.

  • Second: 2 total.

  • Third: 4 total.

Ideally, I could then also bring in CreatedDate and begin to gather data-over-time reports for:

  • How many duplicates per Stage.

  • How many duplicates per Person.

  • How many duplicates for specific date ranges, events, etc.


Solution

The cardinality of the two data sets doesn't match. The first set, the one that identifies the duplicates, is aggregated across a number of customers and no longer identifies any individual customer. You can't then take the separate customerID values and attribute them back to the aggregated rows.

I think what you need to do is re-frame what you are trying to get out of your data and work backwards. Post an example set of results that you are trying to achieve.
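
If the aim is to keep customerID on each row without grouping by it, one option is a windowed aggregate. The following is a minimal sketch, not a tested solution: it assumes SQL Server's COUNT(*) OVER (PARTITION BY ...) and that the unqualified columns resolve to the same tables as in the question's queries. The CTE name Counted is made up for illustration.

```sql
;WITH Counted AS
(
    SELECT c.customerID,
           FirstName,
           LastName,
           DateofBirth,
           Email,
           c1.Status,
           c1.Stage,
           -- Rows sharing the same five values get the same count;
           -- no GROUP BY is involved, so every row keeps its own customerID.
           COUNT(*) OVER (PARTITION BY FirstName, LastName, DateofBirth,
                                       Email, c1.Status) AS TotalCount
    FROM Customer c
    INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
)
SELECT customerID, FirstName, LastName, Stage, TotalCount
FROM Counted
WHERE TotalCount > 1;  -- plays the role of HAVING COUNT(*) > 1
```

Because each duplicate row survives with its key, the result can be joined back to Customer_1 or the Enum table on customerID or Status as needed.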

UPDATE:

It seems you want a list of customer/stage groups with counts:

    SELECT c.customerID,  -- qualified: customerID exists in both joined tables
           FirstName,
           LastName,
           DateofBirth,
           Email,
           c1.Status,
           CASE
                WHEN c1.Stage = 6 THEN 'First'
                WHEN c1.Stage = 7 THEN 'Second'
                WHEN c1.Stage = 8 THEN 'Third'
                WHEN c1.Stage = 11 THEN 'Fourth'
                WHEN c1.Stage = 9 THEN 'Fifth'
                WHEN c1.Stage = 10 THEN 'Sixth'
                WHEN c1.Stage = 12 THEN 'Unknown'
                ELSE ''
           END AS Stage,
           COUNT(*) AS TotalCount
    FROM Customer c
    INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
    GROUP BY c.customerID, FirstName, LastName, DateofBirth, Email, c1.Status, c1.Stage
    HAVING COUNT(*) > 1
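
If customerID is unique per joined row, as the question suggests, grouping on it would keep every group at a single row. A possible variant, sketched here under that assumption, drops customerID from the grouping, groups on the person columns plus Stage, and then rolls the result up to the per-Stage totals described in the question. The names DuplicateByStage and StageTotal are illustrative, not from the original schema.

```sql
;WITH DuplicateByStage AS
(
    SELECT FirstName,
           LastName,
           DateofBirth,
           Email,
           c1.Stage,
           COUNT(*) AS TotalCount
    FROM Customer c
    INNER JOIN Customer_1 c1 ON c1.customerID = c.customerID
    -- customerID is deliberately left out so matching people fall into one group
    GROUP BY FirstName, LastName, DateofBirth, Email, c1.Stage
    HAVING COUNT(*) > 1
)
SELECT CASE Stage
            WHEN 6 THEN 'First'
            WHEN 7 THEN 'Second'
            WHEN 8 THEN 'Third'
            WHEN 11 THEN 'Fourth'
            WHEN 9 THEN 'Fifth'
            WHEN 10 THEN 'Sixth'
            WHEN 12 THEN 'Unknown'
            ELSE ''
       END AS Stage,
       SUM(TotalCount) AS StageTotal  -- duplicates summed across all people in the stage
FROM DuplicateByStage
GROUP BY Stage;
```

Selecting from DuplicateByStage directly, without the final roll-up, gives the per-person, per-Stage rows shown in the question's example table.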
Licensed under: CC-BY-SA with attribution