SQL query to get duplicate record counts based on other factors
-
07-07-2019 - |
Question
I have a table (participants) which has multiple columns that could all be distinct. Two columns that are of special interest in this query are the userID and the programID I have a two part inquery here.
- I want to be able to acquire the list of all userIDs that appear more than once in this table. How do I go about doing it?
I want to be able to acquire the count of all programID's where the same userID appears in multiple programIDs. (I.E. count of programs where same userID appears in 2 programs, count of programs where same USErID appears in 3 programs, etc...) For Example:
programID: prog1 userID: uid1 userID: uid3 userID: uid12 programID: prog2 userID: uid3 userID: uid5 userID: uid14 userID: uid27 programID: prog3 userID: uid3 userID: uid7 userID: uid14 userID: uid30 programID: prog4 userID: uid1
Expected Results: userID count = 2; programs = 3 userID count = 3; programs = 3
Can anyone please help me with this.
my current code for question 1 is:
SELECT
WPP.USERID,
WPI.EMAIL,
WPI.FIRSTNAME,
WPI.LASTNAME,
WPI.INSTITUTION
FROM WEBPROGRAMPARTICIPANTS WPP
INNER JOIN WEBPERSONALINFO WPI
ON WPP.USERID = WPI.USERID
INNER JOIN WEBPROGRAMS WP
ON WPP.PROGRAMCODE = WP.PROGRAMCODE
WHERE
WP.PROGRAMTYPE IN ('1','2','3','4','5','6', '9', '10')
GROUP BY
WPP.USERID,
WPI.EMAIL,
WPI.FIRSTNAME,
WPI.LASTNAME,
WPI.INSTITUTION
HAVING COUNT(WPP.USERID) > 1
ORDER BY WPI.EMAIL
Solution 3
This was an issue on my side, with a logic step that was left out.
OTHER TIPS
1.
select userID , SUM(userID ) AS Count
from Preparations
group by userID where Count > 1
Your query for part one looks good. Here's your query for part 2:
SELECT DISTINCT p1.programID, COUNT(p1.userID) AS Multiple
FROM participants p1
JOIN participants p2 ON p2.userID = p1.userID
GROUP BY p1.userID, programID
ORDER BY Multiple, programID
It lists programID and the number of other programIDs that the same userID appears in for each programID. I think your expected results are wrong for your sample data. It should be:
userID count = 1; programs = 3; userID count = 2; programs = 4; userID count = 3; programs = 3;
You can use the above as a subquery (derived table) if you want to fine tune the results to look more like your expected results.