Rewrite 2022
I expect your added solution to perform poorly, as it's doing a lot of unnecessary work. The following should be much faster.
The question and the added solution do not define which row to pick when there are multiple with the same dob. Typically you'll want a deterministic pick. This query picks the alphabetically first name from each group of peers with the same dob. Adapt to your needs.
UPDATE person p
SET    younger_sibling_name = y.name
     , younger_sibling_dob  = y.dob
FROM  (
   SELECT dob, name, lead(dob) OVER (ORDER BY dob) AS next_dob
   FROM  (
      SELECT DISTINCT ON (dob)
             dob, name
      FROM   person p
      ORDER  BY dob, name  -- ①
      ) sub
   ) y
WHERE  p.dob = y.next_dob;
db<>fiddle here - with extended test case
Works since at least Postgres 8.4.
Needs an index on dob to be fast, ideally a multicolumn index on (dob, name).
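A minimal sketch of such an index. The index name is a placeholder of mine; only the table and column names come from the query above.

```sql
-- Hypothetical index name; table and columns as used in the UPDATE above.
CREATE INDEX person_dob_name_idx ON person (dob, name);
```

The column order matches ORDER BY dob, name in the inner subquery, so Postgres may be able to read the rows pre-sorted from the index instead of sorting the whole table. Whether the planner actually does so depends on table size and statistics.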
Subquery sub passes over the whole table once and distills distinct rows per dob.
① I added name to ORDER BY as tiebreaker to pick the row with the alphabetically first name. Adapt to your needs.
In the outer SELECT, add the next later dob (next_dob) to each row with lead(), which is simple now that every dob is distinct. Then join to that next_dob, and the rest is simple.
If no younger person exists, no UPDATE happens and the columns stay NULL.
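To see what the UPDATE joins against, it can help to run the derived table y on its own with a few rows. This is only a sketch: the table layout and the sample rows are my assumptions, since the real table definition is not shown.

```sql
-- Assumed table layout and made-up sample rows, for illustration only.
-- Note: a temp table named "person" hides a regular table of the same name
-- for the rest of the session.
CREATE TEMP TABLE person (
   pid                  serial PRIMARY KEY
 , name                 text
 , dob                  date
 , younger_sibling_name text
 , younger_sibling_dob  date
);

INSERT INTO person (name, dob) VALUES
   ('Ann', '2000-01-01')
 , ('Bob', '2001-05-05')
 , ('Al' , '2001-05-05')   -- same dob as Bob; 'Al' sorts first
 , ('Cid', '2003-03-03');

-- The derived table y from the UPDATE, run stand-alone.
SELECT dob, name, lead(dob) OVER (ORDER BY dob) AS next_dob
FROM  (
   SELECT DISTINCT ON (dob)
          dob, name
   FROM   person
   ORDER  BY dob, name
   ) sub
ORDER  BY dob;
```

With these rows the query returns one row per dob: (2000-01-01, Ann, 2001-05-05), (2001-05-05, Al, 2003-03-03) and (2003-03-03, Cid, NULL). Running the UPDATE afterwards assigns Ann to both Bob and Al, and Al to Cid. Ann has no match, so her columns stay NULL.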
About DISTINCT ON and possibly faster query techniques for many duplicates:
Taking dob and name from the same row guarantees we stay in sync. Multiple correlated subqueries would not offer this guarantee, and would be more expensive anyway.
Original answer
Still valid.
Old query 1
WITH cte AS (
   SELECT *, dense_rank() OVER (ORDER BY dob) AS drk
   FROM   person
   )
UPDATE person p
SET    younger_sibling_name = y.name
     , younger_sibling_dob  = y.dob
FROM   cte x
JOIN  (SELECT DISTINCT ON (drk) * FROM cte) y ON y.drk = x.drk - 1
WHERE  x.pid = p.pid;
Old sqlfiddle
In the CTE cte use the window function dense_rank() to get a rank without gaps according to the dob for every person.
Join cte to itself, but remove duplicates on dob from the second instance. Thereby everybody gets exactly one UPDATE. If more than one person shares the same dob, the same one is selected as younger sibling for all persons on the next dob. I do this with:
(SELECT DISTINCT ON (drk) * FROM cte)
Add ORDER BY drk, ... to this subquery to pick a particular person for every dob.
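For illustration, here is old query 1 with such an ORDER BY added, using the column alias drk from the query above. Picking by name is only one possible tiebreaker and my assumption, chosen to match the rewritten query.

```sql
WITH cte AS (
   SELECT *, dense_rank() OVER (ORDER BY dob) AS drk
   FROM   person
   )
UPDATE person p
SET    younger_sibling_name = y.name
     , younger_sibling_dob  = y.dob
FROM   cte x
JOIN  (
   SELECT DISTINCT ON (drk) *
   FROM   cte
   ORDER  BY drk, name   -- tiebreaker: alphabetically first name per dob
   ) y ON y.drk = x.drk - 1
WHERE  x.pid = p.pid;
```

The leading ORDER BY expressions have to match the DISTINCT ON expressions, so drk stays first and any tiebreaker columns follow.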
Old query 2
WITH cte AS (
   SELECT dob, min(name) AS name
        , row_number() OVER (ORDER BY dob) rn
   FROM   person p
   GROUP  BY dob
   )
UPDATE person p
SET    younger_sibling_name = y.name
     , younger_sibling_dob  = y.dob
FROM   cte x
JOIN   cte y ON y.rn = x.rn - 1
WHERE  x.dob = p.dob;
Old sqlfiddle
This works because aggregate functions are applied before window functions. And it should be very fast since both operations agree on the sort order.
Obviates the need for a later DISTINCT like in query 1.
Result is the same as query 1, exactly.
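The evaluation order (aggregate first, window function second) can be seen in isolation with a small query like the following sketch. The column aliases are mine; it only assumes the person table with a dob column.

```sql
SELECT dob
     , count(*)                          AS people         -- aggregate: one value per dob group
     , sum(count(*)) OVER (ORDER BY dob) AS running_total  -- window function over the aggregated rows
FROM   person
GROUP  BY dob
ORDER  BY dob;
```

The window function sees one row per dob, because GROUP BY and count(*) have already collapsed the table by the time it runs. The same mechanism lets row_number() in the CTE number distinct dob values.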
Again, you can add more columns to ORDER BY to pick a particular person for every dob.