I am trying to develop a query that counts certain values for multiple tables.
The query works fine when I take an aggregate count of a field with just one left join in place. But when I add a second left join, the aggregate counts in my results are wrong.
I want to left join two tables to my main table (dbo.REP_PROFILE) and then get a count of certain values within each joined table. As soon as I add the second left join, some of the counts come out wrong while others are still right.
Here is my code, and beneath it is a better synopsis of my issue:
select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY,
count(ac.ACTIVITY_CONTACT_ID) as count_of_activities,
count(cl.LABEL_ID) as count_of_labels --including public, private, and shared
from dbo.REP_PROFILE rp (nolock)
left join dbo.ACTIVITY_CONTACT ac (nolock) on rp.CONTACT_ID = ac.CONTACT_ID
left join dbo.CONTACT_LABEL cl (nolock) on rp.CONTACT_ID = cl.CONTACT_ID --if this join is removed or commented out, the query returns logically correct results
where
rp.CREATED_DT between '2013-06-01' and '2014-01-01'
and rp.UPDATED_DT != rp.CREATED_DT --record has been updated at least one time after the date of its creation
and rp.CREATED_BY in --record was created by a past or present member of our team
(select ur.user_id
from dbo.SP_USER_ROLE ur
where ur.ROLE_ID = 'X')
/*and rp.UPDATED_BY not in --last update NOT made by our team
(select ur.user_id
from dbo.SP_USER_ROLE ur
where ur.ROLE_ID = 'X')*/
group by rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY
having count(ac.ACTIVITY_CONTACT_ID)>0 --record has at least one activity
--or count(cl.LABEL_ID)>0 --record has at least one label
order by rp.CONTACT_ID
If a CONTACT_ID (the key I am joining on) appears in both of the tables I am joining (ACTIVITY_CONTACT and CONTACT_LABEL), then both the count_of_activities and count_of_labels aggregate columns are incorrect for that contact. But if a CONTACT_ID appears in just ONE of the joined tables, the aggregate results are correct.
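To make the symptom concrete, here is a minimal sketch of a repro. The table variables @rep, @activity, and @label and all of the data in them are made up for illustration; they only mimic the shape of the real tables.

```sql
-- Hypothetical miniature tables; names and data are illustrative only.
declare @rep table (CONTACT_ID int primary key);
declare @activity table (ACTIVITY_CONTACT_ID int, CONTACT_ID int);
declare @label table (LABEL_ID int, CONTACT_ID int);

insert into @rep values (1), (2);
-- contact 1 has 2 activities, contact 2 has 2 activities
insert into @activity values (10, 1), (11, 1), (12, 2), (13, 2);
-- contact 1 has 3 labels, contact 2 has none
insert into @label values (100, 1), (101, 1), (102, 1);

select rp.CONTACT_ID,
       count(ac.ACTIVITY_CONTACT_ID) as count_of_activities,
       count(cl.LABEL_ID) as count_of_labels
from @rep rp
left join @activity ac on rp.CONTACT_ID = ac.CONTACT_ID
left join @label cl on rp.CONTACT_ID = cl.CONTACT_ID
group by rp.CONTACT_ID
order by rp.CONTACT_ID;
-- CONTACT_ID 1: returns 6 and 6, although the true counts are 2 and 3
-- CONTACT_ID 2: returns 2 and 0, which is correct
```

Contact 1, which exists in both joined tables, gets the same inflated number in both columns, while contact 2, which exists in only one of them, is counted correctly.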
Here is a Venn Diagram of what I am attempting to do with all my left joins leading to the Rep_Profile table:
I am stumped. I don't understand the logical flaw that is causing erroneous aggregate counts.
EDIT: Here is my working code, with the new sub-queries in the select statement:
select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY,
(select count(ac.ACTIVITY_CONTACT_ID) from ACTIVITY_CONTACT ac where rp.CONTACT_ID = ac.CONTACT_ID) as count_of_activities,
(select count(cl.LABEL_ID) from contact_label cl where rp.CONTACT_ID = cl.CONTACT_ID) as count_of_labels, --including public, private, and shared
(select count(th.TRANSACTION_ID) from TRANSACTION_HISTORY th where rp.CONTACT_ID = th.CONTACT_ID) as count_of_trades
from dbo.REP_PROFILE rp (nolock) --query gave logical errors when multiple joins were attempted, used sub-queries in Select statement to fix the issue
where
rp.CREATED_DT between '2013-06-01' and '2014-01-01'
and rp.UPDATED_DT != rp.CREATED_DT --record has been updated at least one time after the date of its creation
and rp.CREATED_BY in --record was created by a past or present member of our team
(select ur.user_id
from dbo.SP_USER_ROLE ur
where ur.ROLE_ID = 'X')
/*the following criteria ensure that the query results will display reps with at least 1 activity, label, or trade. */
and
((select count(th.TRANSACTION_ID) from TRANSACTION_HISTORY th where rp.CONTACT_ID = th.CONTACT_ID)>0 --trades > 0
or (select count(cl.LABEL_ID) from contact_label cl where rp.CONTACT_ID = cl.CONTACT_ID)>0 --labels > 0
or (select count(ac.ACTIVITY_CONTACT_ID) from ACTIVITY_CONTACT ac where rp.CONTACT_ID = ac.CONTACT_ID)>0) --activities > 0
group by rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY
order by rp.CONTACT_ID
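A possible alternative to the correlated sub-queries, offered only as a sketch and untested against the real tables: aggregate each child table on its own in a derived table, so that each one yields at most one row per CONTACT_ID, and then left join those results. The aliases a, l, and t and the column name n are illustrative. It also assumes dbo.REP_PROFILE has one row per contact, since the outer group by is dropped.

```sql
-- Sketch only: aliases a, l, t and column n are illustrative names.
-- Assumes one row per contact in dbo.REP_PROFILE (no outer group by).
select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY,
       coalesce(a.n, 0) as count_of_activities,
       coalesce(l.n, 0) as count_of_labels, --including public, private, and shared
       coalesce(t.n, 0) as count_of_trades
from dbo.REP_PROFILE rp
-- each derived table is aggregated before the join, so the joins cannot multiply rows
left join (select CONTACT_ID, count(ACTIVITY_CONTACT_ID) as n
           from ACTIVITY_CONTACT group by CONTACT_ID) a on a.CONTACT_ID = rp.CONTACT_ID
left join (select CONTACT_ID, count(LABEL_ID) as n
           from contact_label group by CONTACT_ID) l on l.CONTACT_ID = rp.CONTACT_ID
left join (select CONTACT_ID, count(TRANSACTION_ID) as n
           from TRANSACTION_HISTORY group by CONTACT_ID) t on t.CONTACT_ID = rp.CONTACT_ID
where rp.CREATED_DT between '2013-06-01' and '2014-01-01'
  and rp.UPDATED_DT != rp.CREATED_DT
  and rp.CREATED_BY in (select ur.user_id from dbo.SP_USER_ROLE ur where ur.ROLE_ID = 'X')
  -- a missing match leaves n as NULL, which fails the > 0 test, so at least one count must be positive
  and (a.n > 0 or l.n > 0 or t.n > 0)
order by rp.CONTACT_ID;
```

Each count is computed once per contact here, so the select list and the where clause do not have to repeat the same sub-query.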