Question

I am trying to figure out a more efficient way of writing a query that my company uses. Currently we are using LEFT JOINs, but I feel like that may be a bad way to approach this.

How would you all approach this? I'm trying to familiarize myself with EXISTS and CROSS APPLY. Maybe this is a situation where I should be using these types of statements.

SELECT  p.people_id ,
        p.date_created ,
        p.last_name ,
        p.first_name ,
        p.middle_name ,
        p.known_as ,
        p.ssn ,
        p.home_phone ,
        p.work_mobile ,
        p.other_phone ,
        p.display_email ,
        s.source ,
        ISNULL(p.address_1, '') AS address_1 ,
        ISNULL(p.address_2, '') AS address_2 ,
        p.city ,
        p.state ,
        p.zip_code ,
        pec.emergency_name ,
        pec.work_phone ,
        pec.emergency_relationship ,
        jc.job_category ,
        et.education_type ,
        pp.part_time_only ,
        pp.perm_job ,
        pp.temp_job ,
        p.applied_online ,
        p.owner_division_id ,
        p.role_id ,
        p.older_18 ,
        p.disclaimer ,
        SUBSTRING(p.ssn, 6, 4) AS L4_ssn ,
        pp.custom_code_4 AS job_title ,
        p.external_id ,
        p.last4 ,
        p.resume_category ,
        rc.resume_category_description ,
        p.home_phone_perm ,
        p.work_mobile_perm
FROM    people p
        LEFT OUTER JOIN lkp_resume_category rc ON p.resume_category = rc.resume_category_id
        LEFT OUTER JOIN people_profile pp ON pp.people_id = p.people_id
        LEFT OUTER JOIN companies_job_titles cjt ON cjt.job_title_id = pp.job_title_1
        LEFT OUTER JOIN lkp_job_categories jc ON jc.job_category_id = pp.job_class_id
        LEFT OUTER JOIN lkp_education_types et ON et.education_type_id = pp.education_id
        LEFT OUTER JOIN lkp_sources s ON pp.source_id = s.source_id
        LEFT OUTER JOIN people_emergency_contacts pec ON p.people_id = pec.people_id
WHERE   ( p.role_id <= 4 )

[Images: query results and execution plan diagram]


OTHER TIPS

There are actually two separate questions being asked here:

  1. Should I be using LEFT JOINs?
  2. How can I make my query more efficient?

I'll answer #2 first because I think it's easier. In your query plan, over 70% of the cost comes from the table scan of the "people" table, so you can optimize your JOINs all day and still not improve your efficiency much. The critical question is: what percentage of your "people" rows have "role_id <= 4"? If it's less than about 10%, you have some room to optimize by indexing role_id. If it's more than about 70% (that is, if the purpose of this query really is to pull a nearly complete list of everyone in the "people" table), then you pretty much just have to pay what it costs to do that.
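To answer that question for your own data, a quick check like the one below could be run first. It is a sketch, not a guaranteed fix: the index name IX_people_role_id is hypothetical, and whether SQL Server actually uses such an index instead of scanning depends on how many rows match.

```sql
-- How selective is the query's filter?
-- NULLIF avoids a divide-by-zero error if the table is empty.
SELECT COUNT(*) AS total_rows,
       SUM(CASE WHEN role_id <= 4 THEN 1 ELSE 0 END) AS matching_rows,
       100.0 * SUM(CASE WHEN role_id <= 4 THEN 1 ELSE 0 END)
             / NULLIF(COUNT(*), 0) AS pct_matching
FROM   people;

-- If pct_matching turns out to be small, an index on the filter column
-- gives the optimizer an alternative to scanning all of "people".
-- IX_people_role_id is a hypothetical index name.
CREATE INDEX IX_people_role_id ON people (role_id);
```

Because the query returns many columns from "people", each row found through a narrow index like this still needs a lookup into the base table, so the optimizer will typically only choose it when relatively few rows match. That is consistent with the thresholds above.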

Now, about question #1: as long as the following inferences about your data model are true, your LEFT JOINs are probably the best way to do what you are trying to do. The inferences are:

  1. A "people" entry has zero-to-one corresponding resume category; that is, people.resume_category can be NULL or can have a meaningful value. (If it can have invalid values not found in the parent table, then you have a referential integrity problem and what you need is a foreign key constraint.)
  2. A "people" entry has zero-to-many emergency contacts.
  3. A "people" entry has zero-to-many people profiles.
  4. A "people profile" entry has zero-to-one job title (as above with resume_category)
  5. A "people profile" entry has zero-to-one job categories (as above)
  6. A "people profile" entry has zero-to-one education type (as above)
  7. A "people profile" entry has zero-to-one source (as above)
  8. You want to list all people regardless of presence or absence of data in any of these other tables
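The referential-integrity point in item 1 can be checked directly, and this is also a natural place for the EXISTS you asked about: NOT EXISTS tests for the absence of a matching row without returning any of its columns, which is why it suits this check but not the lookups in your main query. A sketch, assuming lkp_resume_category.resume_category_id is that table's primary key (or has a unique constraint); the constraint name is hypothetical.

```sql
-- People whose resume_category has no matching lookup row
-- (these are the "invalid values" that break referential integrity).
SELECT p.people_id,
       p.resume_category
FROM   people p
WHERE  p.resume_category IS NOT NULL
  AND  NOT EXISTS (SELECT 1
                   FROM   lkp_resume_category rc
                   WHERE  rc.resume_category_id = p.resume_category);

-- Once any orphans are cleaned up, enforce the relationship.
-- FK_people_resume_category is a hypothetical constraint name.
ALTER TABLE people
    ADD CONSTRAINT FK_people_resume_category
    FOREIGN KEY (resume_category)
    REFERENCES lkp_resume_category (resume_category_id);
```

The same pattern applies to the other lookup columns in items 4 to 7 (job_title_1, job_class_id, education_id, source_id on people_profile).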

Hope that helps and all the best.

--- EDIT ---

Hey, something has been bothering me about this answer, and I just now figured out what it is. There is an actual problem with your query structure, but it isn't related to your use of LEFT JOINs. It's that you are joining to two different child tables at once, both of which have the same parent table, "people". Depending on how your data is actually distributed, this can give you a Cartesian product of those child rows in your resultset. For example, suppose you have a person "Bob" with two profiles ("Work" and "Home") and two emergency contacts ("Alice" and "Carol"). Then a query structured like yours would give:

Person   Profile   Contact
------   -------   -------
Bob      Work      Alice
Bob      Home      Alice
Bob      Work      Carol
Bob      Home      Carol
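The Bob example can be reproduced in a scratch database with a sketch like the one below. The demo_* tables are hypothetical stand-ins with only the columns needed to show the effect, not your real schema.

```sql
-- Hypothetical minimal tables mirroring people / people_profile /
-- people_emergency_contacts. Run in a scratch database.
CREATE TABLE demo_people   (people_id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE demo_profile  (people_id INT, profile_name VARCHAR(50));
CREATE TABLE demo_contacts (people_id INT, contact_name VARCHAR(50));

INSERT INTO demo_people   VALUES (1, 'Bob');
INSERT INTO demo_profile  VALUES (1, 'Work'), (1, 'Home');
INSERT INTO demo_contacts VALUES (1, 'Alice'), (1, 'Carol');

-- Same shape as the original query: two one-to-many LEFT JOINs
-- hanging off the same parent row.
-- Returns 2 profiles x 2 contacts = 4 rows for Bob (order not guaranteed).
SELECT p.name AS person,
       pp.profile_name,
       pec.contact_name
FROM   demo_people p
       LEFT OUTER JOIN demo_profile  pp  ON pp.people_id  = p.people_id
       LEFT OUTER JOIN demo_contacts pec ON pec.people_id = p.people_id;

-- Which people actually fan out in a given child table?
SELECT people_id,
       COUNT(*) AS profile_count
FROM   demo_profile
GROUP  BY people_id
HAVING COUNT(*) > 1;
```

Against your real schema, the same GROUP BY ... HAVING COUNT(*) > 1 check on people_profile and people_emergency_contacts tells you whether this duplication is actually happening in your data.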

If the zero-to-many relationships can, in fact, have multiple child rows, then the right fix depends on how your app is using the data. There are, however, two basic approaches:

  1. Separate each zero-to-many JOIN into its own query, so you would have a total of three queries instead of just one.
  2. Use an aggregate function such as MIN or MAX (slightly sketchy, since it can give misleading results: aggregating each column independently can mix and match fields from different child rows in the same output row).
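A less sketchy variant of option 2 is to pick one whole child row per person using an explicit ordering rule, so all of its fields come from the same row. The sketch below uses ROW_NUMBER(); in SQL Server, OUTER APPLY (SELECT TOP (1) ... ORDER BY ...) expresses the same idea and is one place where the APPLY operator you mentioned fits naturally. Ordering by emergency_name is only a placeholder assumption, and only a few of the original columns are shown.

```sql
-- At most one emergency contact per person, chosen by a repeatable rule.
-- ORDER BY emergency_name is an assumption: substitute whatever column
-- defines the "primary" contact in your data. Ties are broken arbitrarily.
SELECT p.people_id,
       p.last_name,
       p.first_name,
       pec.emergency_name,
       pec.work_phone,
       pec.emergency_relationship
FROM   people p
       LEFT OUTER JOIN (SELECT people_id,
                               emergency_name,
                               work_phone,
                               emergency_relationship,
                               ROW_NUMBER() OVER (PARTITION BY people_id
                                                  ORDER BY emergency_name) AS rn
                        FROM   people_emergency_contacts) pec
                    ON pec.people_id = p.people_id
                   AND pec.rn = 1   -- in the ON clause, so people with no contacts are kept
WHERE  p.role_id <= 4;
```

Putting rn = 1 in the ON clause rather than the WHERE clause preserves the LEFT JOIN behavior: people without any emergency contact still appear, with NULLs in the contact columns.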

As a side note, if a person can't actually have more than one row in those child tables, then you should enforce this by putting a unique constraint on the "people_id" column of each of those tables.
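A sketch of those constraints, assuming each person really should have at most one profile and at most one emergency contact. The constraint names are hypothetical, and each statement will fail if any people_id already appears more than once in that table.

```sql
-- Enforce "at most one row per person" in each child table.
-- Constraint names are hypothetical.
ALTER TABLE people_profile
    ADD CONSTRAINT UQ_people_profile_people_id UNIQUE (people_id);

ALTER TABLE people_emergency_contacts
    ADD CONSTRAINT UQ_people_emergency_contacts_people_id UNIQUE (people_id);
```

Besides protecting the data, these constraints tell anyone reading the schema (and the optimizer) that the LEFT JOINs in the original query cannot multiply rows.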

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow