Question

I am trying to develop a query that counts certain values for multiple tables.

The query works fine when I take the aggregate count of a field with just one left join in place. But when I add another left join, the aggregate fields in my results are incorrect, i.e., the counts are plain wrong.

I want to left join two tables to my main table (dbo.rep_profile), and then get a count of certain values within each table. But as soon as I left join the second table, my results are thrown off and some appear wrong, while others appear right.

Here is my code, and beneath it is a better synopsis of my issue:

    select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY,
        count(ac.ACTIVITY_CONTACT_ID) as count_of_activities,
        count(cl.LABEL_ID) as count_of_labels --including public, private, and shared

    from dbo.REP_PROFILE rp (nolock)
    left join dbo.ACTIVITY_CONTACT ac (nolock) on rp.CONTACT_ID = ac.CONTACT_ID
    left join dbo.CONTACT_LABEL cl (nolock) on rp.CONTACT_ID = cl.CONTACT_ID --if this join is removed or commented out, the query returns logically correct results

    where
        rp.CREATED_DT between '2013-06-01' and '2014-01-01'
        and rp.UPDATED_DT != rp.CREATED_DT --record has been updated at least one time after the date of its creation
        and rp.CREATED_BY in  --record was created by a past or present member of our team
            (select ur.user_id
             from dbo.SP_USER_ROLE ur
             where ur.ROLE_ID = 'X')
        /*and rp.UPDATED_BY not in --last update NOT made by our team
            (select ur.user_id
             from dbo.SP_USER_ROLE ur
             where ur.ROLE_ID = 'X')*/

    group by rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY

    having count(ac.ACTIVITY_CONTACT_ID) > 0 --record has at least one activity
    --or count(cl.LABEL_ID) > 0 --record has at least one label

    order by rp.CONTACT_ID

If a contact_ID (the key I am joining on) appears in both of the tables I am joining (activity_contact and contact_label), then both the count_of_activities and count_of_labels aggregate columns are incorrect. BUT... if a given contact_id appears in just ONE of the joined tables, then the aggregate results are correct.

Here is a Venn diagram of what I am attempting to do, with all my left joins leading to the Rep_Profile table:

[image: Venn diagram, not reproduced]

I am stumped. I don't understand the logical flaw that is causing erroneous aggregate counts.

EDIT: Here is my working code, with the new sub-queries in the select list:

    select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY,
        (select count(ac.ACTIVITY_CONTACT_ID) from ACTIVITY_CONTACT ac where rp.CONTACT_ID = ac.CONTACT_ID) as count_of_activities,
        (select count(cl.LABEL_ID) from contact_label cl where rp.CONTACT_ID = cl.CONTACT_ID) as count_of_labels, --including public, private, and shared
        (select count(th.TRANSACTION_ID) from TRANSACTION_HISTORY th where rp.CONTACT_ID = th.CONTACT_ID) as count_of_trades

    from dbo.REP_PROFILE rp (nolock) --query gave logical errors when multiple joins were attempted; used sub-queries in the select list to fix the issue

    where
        rp.CREATED_DT between '2013-06-01' and '2014-01-01'
        and rp.UPDATED_DT != rp.CREATED_DT --record has been updated at least one time after the date of its creation
        and rp.CREATED_BY in  --record was created by a past or present member of our team
            (select ur.user_id
             from dbo.SP_USER_ROLE ur
             where ur.ROLE_ID = 'X')
        /*the following criteria ensure that the query results will display reps with at least 1 activity, label, or trade. */
        and
        ((select count(th.TRANSACTION_ID) from TRANSACTION_HISTORY th where rp.CONTACT_ID = th.CONTACT_ID) > 0 --trades > 0
        or (select count(cl.LABEL_ID) from contact_label cl where rp.CONTACT_ID = cl.CONTACT_ID) > 0 --labels > 0
        or (select count(ac.ACTIVITY_CONTACT_ID) from ACTIVITY_CONTACT ac where rp.CONTACT_ID = ac.CONTACT_ID) > 0) --activities > 0

    group by rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY

    order by rp.CONTACT_ID

Solution

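A left join on a 1:m relationship produces m rows for each row of the left table.

So when you add a join to the contact_label table, the result contains as many rows per contact as there are matching rows in contact_label. With two such joins, a contact that has a activities and l labels yields a × l rows, so both counts come out as a × l. That is why the aggregates are wrong only for contacts that appear in both tables. One fix is to keep one join and compute the other count in a correlated subquery: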

    select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY,
        count(ac.ACTIVITY_CONTACT_ID) as count_of_activities,
        (select count(cl.LABEL_ID) from contact_label cl where rp.CONTACT_ID = cl.CONTACT_ID) as count_of_labels
    from dbo.REP_PROFILE rp (nolock)
    left join dbo.ACTIVITY_CONTACT ac (nolock) on rp.CONTACT_ID = ac.CONTACT_ID
    where...
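To see the multiplication in isolation, here is a minimal T-SQL sketch. The table variables @rep, @activity and @label and their rows are made up for illustration and only mimic the shape of the tables in the question. Contact 1 has 2 activities and 3 labels; contact 2 has 1 label and no activities.

```sql
-- Hypothetical sample data, not the real schema.
declare @rep      table (CONTACT_ID int primary key);
declare @activity table (ACTIVITY_CONTACT_ID int, CONTACT_ID int);
declare @label    table (LABEL_ID int, CONTACT_ID int);

insert into @rep      values (1), (2);
insert into @activity values (10, 1), (11, 1);
insert into @label    values (100, 1), (101, 1), (102, 1), (103, 2);

-- Two 1:m joins: contact 1 expands to 2 x 3 = 6 rows before grouping,
-- so both counts report 6. Contact 2 matches only one child table,
-- so it still reports the correct 0 and 1.
select r.CONTACT_ID,
       count(a.ACTIVITY_CONTACT_ID) as count_of_activities,
       count(l.LABEL_ID)            as count_of_labels
from @rep r
left join @activity a on a.CONTACT_ID = r.CONTACT_ID
left join @label    l on l.CONTACT_ID = r.CONTACT_ID
group by r.CONTACT_ID;

-- One join plus a correlated subquery: contact 1 reports 2 and 3,
-- contact 2 reports 0 and 1.
select r.CONTACT_ID,
       count(a.ACTIVITY_CONTACT_ID) as count_of_activities,
       (select count(l.LABEL_ID)
        from @label l
        where l.CONTACT_ID = r.CONTACT_ID) as count_of_labels
from @rep r
left join @activity a on a.CONTACT_ID = r.CONTACT_ID
group by r.CONTACT_ID;
```

Run as a single batch, since table variables only live for the duration of the batch that declares them.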

OTHER TIPS

This HAVING clause cancels the "left" part of the left join to dbo.ACTIVITY_CONTACT: it discards every contact with no matching activity, so the join behaves like an inner join.

    having count(ac.ACTIVITY_CONTACT_ID) > 0

The dbo.CONTACT_LABEL join brings in duplicate rows from dbo.ACTIVITY_CONTACT. Try a distinct count.

Not sure this is a fix, but it might get you there:

    select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY
          ,count(distinct ac.ACTIVITY_CONTACT_ID) as count_of_activities
          --distinct here too, because the ACTIVITY_CONTACT join duplicates label rows
          --(assumes a contact carries each LABEL_ID at most once)
          ,count(distinct cl.LABEL_ID) as count_of_labels --including public, private, and shared
      from dbo.REP_PROFILE rp (nolock)
      join dbo.SP_USER_ROLE ur
        on ur.user_id = rp.CREATED_BY
       and ur.ROLE_ID = '00003'
       and rp.CREATED_DT between '2013-06-01' and '2014-01-01'
       and rp.UPDATED_DT != rp.CREATED_DT --record has been updated at least one time after the date of its creation
      join dbo.ACTIVITY_CONTACT ac (nolock) --inner join: only contacts with at least one activity
        on rp.CONTACT_ID = ac.CONTACT_ID
      left join dbo.CONTACT_LABEL cl (nolock)
        on rp.CONTACT_ID = cl.CONTACT_ID
     group by rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY
     order by rp.CONTACT_ID
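Another option that avoids the row multiplication without relying on distinct is to count each child table in its own derived table before joining, so each join contributes at most one row per contact. This is only a sketch under assumptions: the table variables @rep, @activity and @label and their rows are invented for illustration, and the real query would still need the date and role filters from the question.

```sql
-- Hypothetical sample data, not the real schema.
declare @rep      table (CONTACT_ID int primary key);
declare @activity table (ACTIVITY_CONTACT_ID int, CONTACT_ID int);
declare @label    table (LABEL_ID int, CONTACT_ID int);

insert into @rep      values (1), (2), (3);
insert into @activity values (10, 1), (11, 1);
insert into @label    values (100, 1), (101, 1), (102, 1), (103, 2);

-- Each derived table is grouped to one row per CONTACT_ID,
-- so the joins cannot multiply rows.
select r.CONTACT_ID,
       coalesce(a.cnt, 0) as count_of_activities,
       coalesce(l.cnt, 0) as count_of_labels
from @rep r
left join (select CONTACT_ID, count(*) as cnt
           from @activity
           group by CONTACT_ID) a on a.CONTACT_ID = r.CONTACT_ID
left join (select CONTACT_ID, count(*) as cnt
           from @label
           group by CONTACT_ID) l on l.CONTACT_ID = r.CONTACT_ID
-- keep contacts with at least one activity or label;
-- a NULL cnt (no match) fails both comparisons
where a.cnt > 0 or l.cnt > 0
order by r.CONTACT_ID;
-- contact 1: 2 activities, 3 labels
-- contact 2: 0 activities, 1 label
-- contact 3: filtered out (no activities, no labels)
```

Because the outer query no longer aggregates, it needs no group by, and the "at least one" condition moves from having to where.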
Licensed under: CC-BY-SA with attribution