I need to perform a self join that can match multiple rows, but I need to limit the join to a single row per record. When multiple rows match the join criteria, only the Value from the row with the maximum PK (Id) should be used. Here is a simplified, hypothetical schema:
CREATE TABLE #Records(
Id int NOT NULL,
GroupId int NOT NULL,
Node varchar(10) NOT NULL,
Value varchar(10) NULL,
Meta1 varchar(10) NULL,
Meta2 varchar(10) NULL,
Meta3 varchar(10) NULL
)
Here are some sample inserts:
INSERT INTO #Records VALUES(1,123,'Parent', '888', 'meta1', 'meta2', 'meta3')
INSERT INTO #Records VALUES(2,123,'Guardian', '789', 'meta1', 'meta2', 'meta3')
INSERT INTO #Records VALUES(3,123,'Parent', '999', 'meta1', 'meta2', 'meta3')
INSERT INTO #Records VALUES(4,123,'Guardian', '654', 'meta1', 'meta2', 'meta3')
INSERT INTO #Records VALUES(5,123,'Sibling', '222', 'meta1', 'meta2', 'meta3')
INSERT INTO #Records VALUES(6,456,'Parent', '777', 'meta1', 'meta2', 'meta3')
INSERT INTO #Records VALUES(7,456,'Guardian', '333', 'meta1', 'meta2', 'meta3')
In generic terms, I want the number of rows returned to equal the number of records in the table. I need a Parent column and a Guardian column. Parent should equal the Value of the most recent row, based on Id, that has a Node of 'Parent' for the matching GroupId. I need the same for Guardian, but with a Node of 'Guardian'. The results would look like this:
Id  GroupId  Node      Value  Meta1  Meta2  Meta3  Parent  Guardian
--  -------  --------  -----  -----  -----  -----  ------  --------
1   123      Parent    888    meta1  meta2  meta3  999     654
2   123      Guardian  789    meta1  meta2  meta3  999     654
3   123      Parent    999    meta1  meta2  meta3  999     654
4   123      Guardian  654    meta1  meta2  meta3  999     654
5   123      Sibling   222    meta1  meta2  meta3  999     654
6   456      Parent    777    meta1  meta2  meta3  777     333
7   456      Guardian  333    meta1  meta2  meta3  777     333
Note that I have this partially working now, but it does not limit the join to the latest value. It only works when all Parent rows, and all Guardian rows, within a group share the same Value. I tried to limit it with MAX, but failed. Looking at this query may bias your judgment, so please don't hesitate to toss it out completely.
SELECT #Records.*, Parent, Guardian
FROM #Records
LEFT JOIN (
    SELECT MAX(Id) AS ParentRow, GroupId, Value AS Parent
    FROM #Records
    WHERE Node = 'Parent'
    GROUP BY GroupId, Value
) AS Parents
    ON #Records.GroupId = Parents.GroupId
LEFT JOIN (
    SELECT MAX(Id) AS ParentRow, GroupId, Value AS Guardian
    FROM #Records
    WHERE Node = 'Guardian'
    GROUP BY GroupId, Value
) AS Guardians
    ON #Records.GroupId = Guardians.GroupId
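As far as I can tell, the derived tables above return one row per distinct Value, because Value is part of the GROUP BY, so the join fans out whenever a group has more than one Parent or Guardian value. One direction that might work is sketched below. It is only a sketch: I have not tested it against the real schema, it assumes SQL Server 2005 or later (for OUTER APPLY), and the aliases r, x, p and g are purely illustrative. The idea is to pick, for each record, the single row with the highest Id for that GroupId and Node.

```sql
-- Sketch only: for every record, look up the Value of the highest-Id
-- 'Parent' row and the highest-Id 'Guardian' row in the same GroupId.
-- OUTER APPLY keeps records whose group has no matching row (NULL result).
SELECT r.*, p.Parent, g.Guardian
FROM #Records AS r
OUTER APPLY (
    SELECT TOP (1) x.Value AS Parent
    FROM #Records AS x
    WHERE x.GroupId = r.GroupId
      AND x.Node = 'Parent'
    ORDER BY x.Id DESC          -- latest row wins
) AS p
OUTER APPLY (
    SELECT TOP (1) x.Value AS Guardian
    FROM #Records AS x
    WHERE x.GroupId = r.GroupId
      AND x.Node = 'Guardian'
    ORDER BY x.Id DESC          -- latest row wins
) AS g
ORDER BY r.Id;
```

Because each APPLY returns at most one row, the row count should stay equal to the number of records in #Records. I don't know how this would perform on a large table without an index covering GroupId, Node and Id.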
Thanks in advance!