How can I make an SQL statement that finds unassociated records?
21-09-2019
Question
I have two tables as follows:
tblCountry (countryID, countryCode)
tblProjectCountry(ProjectID, countryID)
The tblCountry table is a list of all countries with their codes, and the tblProjectCountry table associates certain countries with certain projects. I need an SQL statement that gives me a list of the countries, with their country codes, that do NOT have an associated record in the tblProjectCountry table. So far I got to here:
SELECT tblCountry.countryID, tblCountry.countryCode
FROM tblProjectCountry INNER JOIN
tblCountry ON tblProjectCountry.countryID = tblCountry.countryID
WHERE (SELECT COUNT(ProjectID)
FROM tblProjectCountry
WHERE (ProjectID = 1) AND (countryID = tblCountry.countryID)) = 0
The above statement parses as correct but doesn't give the exact result I'm looking for. Can anyone help?
Solution
Does this work?
SELECT countryID, countryCode
FROM tblCountry
WHERE countryID NOT IN ( SELECT countryID FROM tblProjectCountry )
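To try this query without a database server, here is a minimal sketch using Python's built-in sqlite3 module. Only the table and column names come from the question; the column types and the sample rows (country codes, project IDs) are invented for illustration.

```python
import sqlite3

# In-memory database; the schema types and all sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCountry (countryID INTEGER PRIMARY KEY, countryCode TEXT);
    CREATE TABLE tblProjectCountry (ProjectID INTEGER, countryID INTEGER);
    INSERT INTO tblCountry VALUES (1, 'US'), (2, 'GB'), (3, 'FR'), (4, 'DE');
    -- Countries 1 and 3 are attached to projects; 2 and 4 are not.
    INSERT INTO tblProjectCountry VALUES (1, 1), (1, 3), (2, 3);
""")

# The NOT IN query from the answer, with an ORDER BY for a stable result.
not_in_rows = conn.execute("""
    SELECT countryID, countryCode
    FROM tblCountry
    WHERE countryID NOT IN (SELECT countryID FROM tblProjectCountry)
    ORDER BY countryID
""").fetchall()

print(not_in_rows)  # [(2, 'GB'), (4, 'DE')]
```

Unlike the query in the question, this one does not join to tblProjectCountry first, so countries without any project row are not filtered out before the check runs.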
OTHER TIPS
Another alternative:
SELECT outerTbl.countryID, outerTbl.countryCode
FROM tblCountry AS outerTbl
WHERE NOT EXISTS
(
SELECT countryID FROM tblProjectCountry WHERE countryID = outerTbl.countryID
)
This uses what's called a correlated subquery.
Note that I also make use of the EXISTS keyword.
On SQL Server, NOT EXISTS is generally thought to be more performant. On other RDBMSs, your mileage may vary.
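Besides performance, there is one behavioural difference worth knowing. The sketch below uses Python's built-in sqlite3 module and assumes that tblProjectCountry.countryID is nullable, which the question does not state. The sample rows are invented. With a NULL in the subquery's result, NOT IN returns no rows at all, while NOT EXISTS still finds the unassociated countries.

```python
import sqlite3

# In-memory database; schema types and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCountry (countryID INTEGER PRIMARY KEY, countryCode TEXT);
    CREATE TABLE tblProjectCountry (ProjectID INTEGER, countryID INTEGER);
    INSERT INTO tblCountry VALUES (1, 'US'), (2, 'GB'), (3, 'FR'), (4, 'DE');
    -- The last row has a NULL countryID (hypothetical bad data).
    INSERT INTO tblProjectCountry VALUES (1, 1), (1, 3), (2, NULL);
""")

NOT_EXISTS_SQL = """
    SELECT c.countryID, c.countryCode
    FROM tblCountry AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM tblProjectCountry AS pc WHERE pc.countryID = c.countryID
    )
    ORDER BY c.countryID
"""

NOT_IN_SQL = """
    SELECT countryID, countryCode
    FROM tblCountry
    WHERE countryID NOT IN (SELECT countryID FROM tblProjectCountry)
    ORDER BY countryID
"""

not_exists_rows = conn.execute(NOT_EXISTS_SQL).fetchall()
not_in_rows = conn.execute(NOT_IN_SQL).fetchall()

print(not_exists_rows)  # [(2, 'GB'), (4, 'DE')]
print(not_in_rows)      # [] because "x NOT IN (..., NULL)" is never true
```

If the column is declared NOT NULL, both forms return the same rows and the choice comes down to readability and performance on your DBMS.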
There are, at least, two ways to find unassociated records.
1. Using LEFT JOIN
SELECT DISTINCT -- each country only once
tblCountry.countryID,
tblCountry.countryCode
FROM
tblCountry
LEFT JOIN
tblProjectCountry
ON
tblProjectCountry.countryID = tblCountry.countryID
WHERE
tblProjectCountry.ProjectID IS NULL -- get only records with no pair in projects table
ORDER BY
tblCountry.countryID
As erikkallen mentioned, this may not perform well.
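To see why the IS NULL filter works, it can help to look at the joined rows before filtering. This is a minimal sketch using Python's built-in sqlite3 module; the column types and sample rows are invented, and only the table and column names come from the question.

```python
import sqlite3

# In-memory database; schema types and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCountry (countryID INTEGER PRIMARY KEY, countryCode TEXT);
    CREATE TABLE tblProjectCountry (ProjectID INTEGER, countryID INTEGER);
    INSERT INTO tblCountry VALUES (1, 'US'), (2, 'GB'), (3, 'FR'), (4, 'DE');
    INSERT INTO tblProjectCountry VALUES (1, 1), (1, 3), (2, 3);
""")

# Step 1: the raw LEFT JOIN. Countries with no project get ProjectID = NULL.
joined_rows = conn.execute("""
    SELECT tblCountry.countryID, tblCountry.countryCode, tblProjectCountry.ProjectID
    FROM tblCountry
    LEFT JOIN tblProjectCountry
        ON tblProjectCountry.countryID = tblCountry.countryID
    ORDER BY tblCountry.countryID, tblProjectCountry.ProjectID
""").fetchall()
for row in joined_rows:
    print(row)
# (1, 'US', 1)
# (2, 'GB', None)
# (3, 'FR', 1)
# (3, 'FR', 2)
# (4, 'DE', None)

# Step 2: keep only the rows where the join found no partner.
unmatched_rows = conn.execute("""
    SELECT tblCountry.countryID, tblCountry.countryCode
    FROM tblCountry
    LEFT JOIN tblProjectCountry
        ON tblProjectCountry.countryID = tblCountry.countryID
    WHERE tblProjectCountry.ProjectID IS NULL
    ORDER BY tblCountry.countryID
""").fetchall()
print(unmatched_rows)  # [(2, 'GB'), (4, 'DE')]
```

In this sample, each unmatched country appears exactly once in the joined rows, so DISTINCT makes no difference here; it only matters if countryID is not unique in tblCountry.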
2. Using NOT EXISTS
Various versions using NOT EXISTS or IN were suggested by rohancragg and others:
SELECT
tblCountry.countryID,
tblCountry.countryCode
FROM
tblCountry
WHERE
-- get only records with no pair in projects table
NOT EXISTS (SELECT TOP 1 1 FROM tblProjectCountry WHERE tblProjectCountry.countryID = tblCountry.countryID)
ORDER BY
tblCountry.countryID
Depending on your DBMS and the size of the countries and projects tables, either version could perform better.
In my test on MS SQL 2005, there was no significant difference between the first and second query for a table with ~250 countries and ~5000 projects. However, on a table with over 3M projects, the second version (using NOT EXISTS) performed much, much better.
So, as always, it's worth checking both versions.
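One way to check both versions is sketched below: confirm that they return the same rows, then time each one. This sketch uses Python's built-in sqlite3 module with generated data (250 countries, 5000 project rows); the data, the `run_timed` helper, and the code format `C001` are all invented. SQLite timings say nothing about MS SQL, so treat this only as a pattern to repeat against your own DBMS and data.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCountry (countryID INTEGER PRIMARY KEY, countryCode TEXT);
    CREATE TABLE tblProjectCountry (ProjectID INTEGER, countryID INTEGER);
""")

# Generated sample data: 250 countries, 5000 project rows that only
# reference countries 1..200, leaving 201..250 unassociated.
conn.executemany(
    "INSERT INTO tblCountry VALUES (?, ?)",
    [(i, f"C{i:03d}") for i in range(1, 251)],
)
conn.executemany(
    "INSERT INTO tblProjectCountry VALUES (?, ?)",
    [(p, (p % 200) + 1) for p in range(1, 5001)],
)

QUERIES = {
    "LEFT JOIN": """
        SELECT c.countryID, c.countryCode
        FROM tblCountry AS c
        LEFT JOIN tblProjectCountry AS pc ON pc.countryID = c.countryID
        WHERE pc.ProjectID IS NULL
        ORDER BY c.countryID
    """,
    "NOT EXISTS": """
        SELECT c.countryID, c.countryCode
        FROM tblCountry AS c
        WHERE NOT EXISTS (
            SELECT 1 FROM tblProjectCountry AS pc WHERE pc.countryID = c.countryID
        )
        ORDER BY c.countryID
    """,
}


def run_timed(sql):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


results = {name: run_timed(sql) for name, sql in QUERIES.items()}
for name, (rows, seconds) in results.items():
    print(f"{name}: {len(rows)} rows in {seconds:.4f}s")

left_join_rows = results["LEFT JOIN"][0]
not_exists_rows = results["NOT EXISTS"][0]
print("same result:", left_join_rows == not_exists_rows)
```

For a real comparison, run each query several times and look at the execution plan your DBMS reports, since a single timing can be dominated by caching.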
SELECT ... WHERE ID NOT IN (SELECT ... )