SQL SELECT: combining and grouping data between three tables using subqueries
Question
Sorry for a long question and not a very descriptive title, but my problem is very difficult to explain briefly.
I have three database tables:
TABLE A:
AID PK
STATUS VARCHAR
TABLE B:
BID PK
AID FK
CID FK
TABLE C:
CID PK
CREATIONTIME DATE
For each STATUS = 'OK' row in table A I want to find the corresponding row in C which has the latest creation time.
First I can to fetch all rows from table A where STATUS = 'OK'.
Next I can to fetch all corresponding rows from table B.
But how to continue from there?
For example:
select AID, CID from B where AID in (select AID from A where STATUS = 'OK')
could return something like:
AID, CID
1 1
2 2
2 3
3 4
4 5
4 6
Let's say that CID 2 has later creation time than CID 3 and CID 6 is newer than CID 5. This means that the correct result would be rows 1, 2, 4 and 6 in table C.
Is there a way to express this with a query?
EDIT: Sorry that I wasn't specific enough. What I want to get is the CIDs from table C.
EDIT:
I counted returned rows with the different solutions. Results were very interesting - and diversified:
HAINSTECH: 298 473 rows
JMUCCHIELLO: 298 473 rows
RUSS CAM: 290 121 rows
CHRIS: 344 093 rows
TYRANNOSAURS: 290 119 rows
I have not yet had the time to analyse returned rows in depth, but I'd really appreciate views on which of the queries are "broken" and why.
Solution
Something like this, if I've understood you correctly
SELECT
MAX(CREATIONTIME),
A.AID
FROM
A
INNER JOIN
B
ON
A.AID = B.AID
INNER JOIN
C
ON
B.CID = C.CID
WHERE
A.STATUS = 'OK'
GROUP BY
A.AID
EDIT:
I have now checked the following in SQL Server (I would epxect the same outcome in Oracle) and it returns the CID
for the C
record with the Maximum CREATIONTIME
where the STATUS
for the related record in A
id 'OK'
.
SELECT C.CID
FROM
C C
INNER JOIN
B B
ON
C.CID = B.CID
INNER JOIN
(
SELECT
MAX(C.CREATIONTIME) CREATIONTIME,
A.AID
FROM
A A
INNER JOIN
B B
ON
A.AID = B.AID
INNER JOIN
C C
ON
B.CID = C.CID
WHERE
A.STATUS = 'OK'
GROUP BY
A.AID
) ABC
ON B.AID = ABC.AID
AND C.CREATIONTIME = ABC.CREATIONTIME
Demonstrated with the following T-SQL
DECLARE @A TABLE(AID INT IDENTITY(1,1), STATUS VARCHAR(10))
DECLARE @B TABLE(BID INT IDENTITY(1,1), AID INT, CID INT)
DECLARE @C TABLE(CID INT IDENTITY(1,1), CREATIONTIME DATETIME)
INSERT INTO @A VALUES ('OK')
INSERT INTO @A VALUES ('OK')
INSERT INTO @A VALUES ('NOT OK')
INSERT INTO @A VALUES ('OK')
INSERT INTO @A VALUES ('NOT OK')
INSERT INTO @C VALUES ('10 MAR 2008')
INSERT INTO @C VALUES ('13 MAR 2008')
INSERT INTO @C VALUES ('15 MAR 2008')
INSERT INTO @C VALUES ('17 MAR 2008')
INSERT INTO @C VALUES ('21 MAR 2008')
INSERT INTO @B VALUES (1,1)
INSERT INTO @B VALUES (1,2)
INSERT INTO @B VALUES (1,3)
INSERT INTO @B VALUES (2,2)
INSERT INTO @B VALUES (2,3)
INSERT INTO @B VALUES (2,4)
INSERT INTO @B VALUES (3,3)
INSERT INTO @B VALUES (3,4)
INSERT INTO @B VALUES (3,5)
INSERT INTO @B VALUES (4,5)
INSERT INTO @B VALUES (4,1)
INSERT INTO @B VALUES (4,2)
SELECT C.CID
FROM
@C C
INNER JOIN
@B B
ON
C.CID = B.CID
INNER JOIN
(
SELECT
MAX(C.CREATIONTIME) CREATIONTIME,
A.AID
FROM
@A A
INNER JOIN
@B B
ON
A.AID = B.AID
INNER JOIN
@C C
ON
B.CID = C.CID
WHERE
A.STATUS = 'OK'
GROUP BY
A.AID
) ABC
ON B.AID = ABC.AID
AND C.CREATIONTIME = ABC.CREATIONTIME
Results in the following
CID
-----------
3
4
5
EDIT 2:
In response to your comment about each of the statements giving different results, I have ran some of the different answers here through SQL Server 2005 using my test data above (I appreciate you are using Oracle). Here are the results
--Expected results for CIDs would be
--CID
-----------
--3
--4
--5
--As indicated in the comments next to the insert statements
DECLARE @A TABLE(AID INT IDENTITY(1,1), STATUS VARCHAR(10))
DECLARE @B TABLE(BID INT IDENTITY(1,1), AID INT, CID INT)
DECLARE @C TABLE(CID INT IDENTITY(1,1), CREATIONTIME DATETIME)
INSERT INTO @A VALUES ('OK') -- AID 1
INSERT INTO @A VALUES ('OK') -- AID 2
INSERT INTO @A VALUES ('NOT OK')
INSERT INTO @A VALUES ('OK') -- AID 4
INSERT INTO @A VALUES ('NOT OK')
INSERT INTO @C VALUES ('10 MAR 2008')
INSERT INTO @C VALUES ('13 MAR 2008')
INSERT INTO @C VALUES ('15 MAR 2008')
INSERT INTO @C VALUES ('17 MAR 2008')
INSERT INTO @C VALUES ('21 MAR 2008')
INSERT INTO @B VALUES (1,1)
INSERT INTO @B VALUES (1,2)
INSERT INTO @B VALUES (1,3) -- Will be CID 3 For AID 1
INSERT INTO @B VALUES (2,2)
INSERT INTO @B VALUES (2,3)
INSERT INTO @B VALUES (2,4) -- Will be CID 4 For AID 2
INSERT INTO @B VALUES (3,3)
INSERT INTO @B VALUES (3,4)
INSERT INTO @B VALUES (3,5)
INSERT INTO @B VALUES (4,5) -- Will be CID 5 FOR AID 4
INSERT INTO @B VALUES (4,1)
INSERT INTO @B VALUES (4,2)
-- Russ Cam
SELECT C.CID, ABC.CREATIONTIME
FROM
@C C
INNER JOIN
@B B
ON
C.CID = B.CID
INNER JOIN
(
SELECT
MAX(C.CREATIONTIME) CREATIONTIME,
A.AID
FROM
@A A
INNER JOIN
@B B
ON
A.AID = B.AID
INNER JOIN
@C C
ON
B.CID = C.CID
WHERE
A.STATUS = 'OK'
GROUP BY
A.AID
) ABC
ON B.AID = ABC.AID
AND C.CREATIONTIME = ABC.CREATIONTIME
-- Tyrannosaurs
select A.AID,
max(AggC.CREATIONTIME)
from @A A,
@B B,
( select C.CID,
max(C.CREATIONTIME) CREATIONTIME
from @C C
group by CID
) AggC
where A.AID = B.AID
and B.CID = AggC.CID
and A.Status = 'OK'
group by A.AID
-- jmucchiello
SELECT c.cid, max(c.creationtime)
FROM @B b, @C c
WHERE b.cid = c.cid
AND b.aid IN (SELECT a.aid FROM @A a WHERE status = 'OK')
GROUP BY c.cid
-- hainstech
SELECT agg.aid, agg.cid
FROM (
SELECT a.aid
,c.cid
,max(c.creationtime) as maxcCreationTime
FROM @C c INNER JOIN @B b ON b.cid = c.cid
INNER JOIN @A a on a.aid = b.aid
WHERE a.status = 'OK'
GROUP BY a.aid, c.cid
) as agg
--chris
SELECT A.AID, C.CID, C.CREATIONTIME
FROM @A A, @B B, @C C
WHERE A.STATUS = 'OK'
AND A.AID = B.AID
AND B.CID = C.CID
AND C.CREATIONTIME =
(SELECT MAX(C2.CREATIONTIME)
FROM @C C2, @B B2
WHERE B2.AID = A.AID
AND C2.CID = B2.CID);
the results are as follows
--Russ Cam - Correct CIDs (I have added in the CREATIONTIME for reference)
CID CREATIONTIME
----------- -----------------------
3 2008-03-15 00:00:00.000
4 2008-03-17 00:00:00.000
5 2008-03-21 00:00:00.000
--Tyrannosaurs - No CIDs in the resultset
AID
----------- -----------------------
1 2008-03-15 00:00:00.000
2 2008-03-17 00:00:00.000
4 2008-03-21 00:00:00.000
--jmucchiello - Incorrect CIDs in the resultset
cid
----------- -----------------------
1 2008-03-10 00:00:00.000
2 2008-03-13 00:00:00.000
3 2008-03-15 00:00:00.000
4 2008-03-17 00:00:00.000
5 2008-03-21 00:00:00.000
--hainstech - Too many CIDs in the resultset, which CID has the MAX(CREATIONTIME) for each AID?
aid cid
----------- -----------
1 1
1 2
1 3
2 2
2 3
2 4
4 1
4 2
4 5
--chris - Correct CIDs, it is the same SQL as mine
AID CID CREATIONTIME
----------- ----------- -----------------------
1 3 2008-03-15 00:00:00.000
2 4 2008-03-17 00:00:00.000
4 5 2008-03-21 00:00:00.000
I would recommend running each of the given answers against a smaller number of records, so that you can ascertain whether the resultset returned is the expected one.
OTHER TIPS
SQL> create table a (aid,status)
2 as
3 select 1, 'OK' from dual union all
4 select 2, 'OK' from dual union all
5 select 3, 'OK' from dual union all
6 select 4, 'OK' from dual union all
7 select 5, 'NOK' from dual
8 /
Tabel is aangemaakt.
SQL> create table c (cid,creationtime)
2 as
3 select 1, sysdate - 1 from dual union all
4 select 2, sysdate - 2 from dual union all
5 select 3, sysdate - 3 from dual union all
6 select 4, sysdate - 4 from dual union all
7 select 5, sysdate - 6 from dual union all
8 select 6, sysdate - 5 from dual
9 /
Tabel is aangemaakt.
SQL> create table b (bid,aid,cid)
2 as
3 select 1, 1, 1 from dual union all
4 select 2, 2, 2 from dual union all
5 select 3, 2, 3 from dual union all
6 select 4, 3, 4 from dual union all
7 select 5, 4, 5 from dual union all
8 select 6, 4, 6 from dual union all
9 select 7, 5, 6 from dual
10 /
Tabel is aangemaakt.
SQL> select a.aid
2 , max(c.cid) keep (dense_rank last order by c.creationtime) cid
3 , max(c.creationtime) creationtime
4 from a
5 , b
6 , c
7 where b.aid = a.aid
8 and b.cid = c.cid
9 and a.status = 'OK'
10 group by a.aid
11 /
AID CID CREATIONTIME
---------- ---------- -------------------
1 1 30-04-2009 09:26:00
2 2 29-04-2009 09:26:00
3 4 27-04-2009 09:26:00
4 6 26-04-2009 09:26:00
4 rijen zijn geselecteerd.
Select the field you are looking for using a join of all 3 tables and then limit the results to the ones where the CREATIONDATE is the most recent.
SELECT A.AID, C.CID, C.CREATIONTIME
FROM A A, B B, C C
WHERE A.STATUS = 'OK'
AND A.AID = B.AID
AND B.CID = C.CID
AND C.CREATIONTIME =
(SELECT MAX(C2.CREATIONTIME)
FROM C C2, B B2
WHERE B2.AID = A.AID
AND C2.CID = B2.CID);
EDIT: My previous answer was nonsense. This is now a complete rewrite
This is actually a problem which has bugged me throughout my SQL life. The solution I'm going to give you is messy as hell but it works and I'd appreciate anyone either saying "yes this is messy as hell but it's the only way to do it" or say "no, do this...".
I think the unease comes from joining two dates. The way it happens here it's not an issue as they will be an exact match (they have exactly the same root data) but it still feels wrong...
Anyway, breaking this down, you need to do this in two stages.
1) The first is to return a results set [AID], [earliest CreationTime] giving you the earliest creationtime for each AID.
2) You can then use latestCreationTime to pull the CID you want.
So for part (1), I'd personally create a view to do it just to keep things neat. It allows you to test this part and get it working before you merge it with the other stuff.
create view LatestCreationTimes
as
select b.AID,
max(c.CreationTime) LatestCreationTime
from TableB b,
TableC c
where b.CID = c.CID
group by b.AID
Note, we've not taken into account the status at this point.
You then need to join that to TableA (to get the status) and TableB and TableC (to get the CID). You need to do all the obvious links (AID, CID) and also join the LatestCreationTime column in the view to the CreationTime column in TableC. Don't also forget to join the view on AID otherwise where two records have been created at the same time for different A records you'll get issues.
select A.AID,
C.CID
from TableA a,
TableB b,
TableC c,
LatestCreationTimes lct
where a.AID = b.AID
and b.CID = c.CID
and a.AID = lct.AID
and c.CreationTime = lct.LatestCreationTime
and a.STATUS = 'OK'
I'm certain that works - I've tested it, tweaked data, retested it and it behaves. At least it does what I believe it's meant to do.
It doesn't however deal with the possibility of two identical CreationTimes in table C for the same record. I'm guessing that this shouldn't happen however unless you've written sometime that absolutely constrains it it needs to be accounted for.
To do this I need to make an assumption about which one you'd prefer. In this case I'm going to say that if there are two CIDs which match, you'd rather have the higher one (it's most likely more up to date).
select A.AID,
max(C.CID) CID
from TableA a,
TableB b,
TableC c,
LatestCreationTimes lct
where a.AID = b.AID
and b.CID = c.CID
and c.CreationTime = lct.LatestCreationTime
and a.STATUS = 'OK'
group by A.AID
And that, I believe should work for you. If you want it as one query rather than with the view then:
select A.AID,
max(C.CID) CID
from TableA a,
TableB b,
TableC c,
(select b.AID,
max(c.CreationTime) LatestCreationTime
from TableB b,
TableC c
where b.CID = c.CID
group by b.AID) lct
where a.AID = b.AID
and b.CID = c.CID
and c.CreationTime = lct.LatestCreationTime
and a.STATUS = 'OK'
group by A.AID
(I've just embedded the view in the query, otherwise the principal is exactly the same).
There is no need for a subquery, the aggregation to determine the latest cid creation time is straightforward:
SELECT a.aid
,c.cid
,max(c.creationtime) as maxcCreationTime
FROM c INNER JOIN b ON b.cid = c.cid
INNER JOIN a on a.aid = b.aid
WHERE a.status = 'OK'
GROUP BY a.aid, c.cid
If you really don't want the creationtime in your row set, you can just wrap it in a subquery and drop it from the projection:
SELECT agg.aid, agg.cid
FROM (
SELECT a.aid
,c.cid
,max(c.creationtime) as maxcCreationTime
FROM c INNER JOIN b ON b.cid = c.cid
INNER JOIN a on a.aid = b.aid
WHERE a.status = 'OK'
GROUP BY a.aid, c.cid
) as agg
Coding in the web page, please excuse any syntax mistakes. Also, I'm an mssql guy so I hope there is nothing different in the Oracle world for this..
Note that the schema you have provided does not enforce uniqueness of CREATIONTIME per cid. If there are ever two cid values that map to a given aid value with the same creationtime, they will both be outputted. If you rely on the pair of cid,creationtime to be unique, you should enforce it declaratively with a constraint.
Am I missing something? What is wrong with:
EDIT: Okay, I see you actually want to group by aid.
SELECT c.cid FROM b, c,
(SELECT b.aid as aid, max(c.creationtime) as creationtime
FROM b, c
WHERE b.cid = c.cid
AND b.aid IN (SELECT a.aid FROM a WHERE status = 'OK')
GROUP BY b.aid) as z
WHERE b.cid = c.cid
AND z.aid = b.aid
AND z.creationtime = c.creationtime