How can you recursively query across 2 columns?
-
15-03-2021 - |
Question
Sorry my DBA skills are a bit limited, but here's an example schema (simplified from my real table)
create table test
(
id serial primary key,
person TEXT,
start_point TEXT,
end_point TEXT
);
and here's some sample data
insert into test(person, start_point, end_point)
values ('Bob', 'a', 'b'),
('Bob', 'b', 'c'),
('Bob', 'c', 'd'),
('Alice', 'a', 'b'),
('Alice', 'b', 'c'),
('Fred', 'a', 'b'),
('Fred', 'c', 'd')
;
Notice how a particular person can go from a->b, then b->c etc.
The Question
How can I SELECT
that traverses the start/end and gives me something like this as the output?
| person | place | group_id |
|--------|-------|----------|
| Bob | a | 1 |
| Bob | b | 1 |
| Bob | c | 1 |
| Bob | d | 1 |
| Alice | a | 2 |
| Alice | b | 2 |
| Alice | c | 2 |
| Fred | a | 3 |
| Fred | b | 3 |
| Fred | c | 4 |
| Fred | d | 4 |
Notice how Bob went thru a-b-c-d, Alice went thru a-b-c, but Fred didn't make it all the way thru so he gets split out into an a-b then a c-d, because he was missing the b-c. group_id
can just be from a sequence.
What I have tried so far
What I'm currently doing, is selecting the entire dataset into a CURSOR
, then doing a LOOP
over it, checking if the start_point
matches the end_point
of the previous row, and the person
matches, whilst this condition is true group them up and update a table.
When we don't find a match, assume it's for a new person and start counting again.
This is incredibly inefficient when the table grows, and I'm in a multithreaded application so I'd much rather be able to do an atomic UPDATE...SELECT
rather than opening a cursor and looping through it.
Thanks
Solution
It isn't exactly clear what the group_id represents but for the traversing of the points this can be handled with a recursive query like so:
WITH RECURSIVE movement(id, person, place, end_point, level) as
(
select id,person, start_point, end_point, 1
from test
union
select p.id, c.person, c.end_point, c.end_point
, case when p.id=c.id then p.level else p.level + 1 end
from test c
inner join movement p on (c.start_point=p.end_point or c.start_point=p.place)
and c.person=p.person
), movementSum as
(
select id,person,place,level
from movement m
where m.id = (select min(id) from movement t where t.person=m.person and m.place = t.place)
)
select *
from movementSum ms
order by id, place, level
Once you share more about group_id perhaps the exact sql can be produced. Use this sql fiddle to decide what group_id should represent: http://sqlfiddle.com/#!17/cc042/2