How can you recursively query across 2 columns?

https://dba.stackexchange.com/questions/284572

15-03-2021
|

Question

Sorry my DBA skills are a bit limited, but here's an example schema (simplified from my real table)

create table test
(
    id          serial primary key,
    person      TEXT,
    start_point TEXT,
    end_point   TEXT
);

and here's some sample data

insert into test(person, start_point, end_point)
values ('Bob', 'a', 'b'),
       ('Bob', 'b', 'c'),
       ('Bob', 'c', 'd'),
       ('Alice', 'a', 'b'),
       ('Alice', 'b', 'c'),
       ('Fred', 'a', 'b'),
       ('Fred', 'c', 'd')
;

Notice how a particular person can go from a->b, then b->c etc.

The Question

How can I SELECT that traverses the start/end and gives me something like this as the output?

| person | place | group_id |
|--------|-------|----------|
| Bob    | a     | 1        |
| Bob    | b     | 1        |
| Bob    | c     | 1        |
| Bob    | d     | 1        |
| Alice  | a     | 2        |
| Alice  | b     | 2        |
| Alice  | c     | 2        |
| Fred   | a     | 3        |
| Fred   | b     | 3        |
| Fred   | c     | 4        |
| Fred   | d     | 4        |

Notice how Bob went thru a-b-c-d, Alice went thru a-b-c, but Fred didn't make it all the way thru so he gets split out into an a-b then a c-d, because he was missing the b-c. group_id can just be from a sequence.

What I have tried so far What I'm currently doing, is selecting the entire dataset into a CURSOR, then doing a LOOP over it, checking if the start_point matches the end_point of the previous row, and the person matches, whilst this condition is true group them up and update a table.

When we don't find a match, assume it's for a new person and start counting again.

This is incredibly inefficient when the table grows, and I'm in a multithreaded application so I'd much rather be able to do an atomic UPDATE...SELECT rather than opening a cursor and looping through it.

Thanks

Solution

It isn't exactly clear what the group_id represents but for the traversing of the points this can be handled with a recursive query like so:

WITH RECURSIVE movement(id, person, place, end_point, level) as
(
 select id,person, start_point, end_point, 1
 from test
 union
 select p.id, c.person, c.end_point, c.end_point
  , case when p.id=c.id then p.level else p.level + 1 end
 from test c 
 inner join movement p on (c.start_point=p.end_point or c.start_point=p.place)
 and c.person=p.person
), movementSum as
(
  select id,person,place,level
  from movement m
  where m.id = (select min(id) from movement t where t.person=m.person and m.place = t.place)   
)

select *
from movementSum ms
order by id, place, level

Once you share more about group_id perhaps the exact sql can be produced. Use this sql fiddle to decide what group_id should represent: http://sqlfiddle.com/#!17/cc042/2

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange