How to recursively update a same string?

https://dba.stackexchange.com/questions/255805

20-02-2021
|

Question

I have a table with a column containing names like this:

id, employee
1, Mr. John Cole Thornton
2, Mr. Paul George Mckenzie
3, Mr. George Mick McDoughal
4, Ms. Emily Suzan Flemming
5, Mr. Alan Bourdillion Traherne

I have a second table with a list of first names, like this

id, first_name
1, Emily
2, John
3, George
4, Suzan
5, Paul
6, Alan
7, Mary
8, Mick
9, Bourdillion
10, Jim
11, Cole

And I want to remove the first names in the first table, in order to obtain this:

id, employee
1, Mr. Thornton
2, Mr. Mckenzie
3, Mr. McDoughal
4, Ms. Flemming
5, Mr. Traherne

No matter how many first names, I would like to remove them all without calling several times my first_names table, and I wonder if this would be possible without using a loop in a function.

I have tried a query like this :

WITH RECURSIVE name AS (   SELECT REPLACE(t1.employee, t2.first_name, '') sec_name 
                        FROM  t1, t2 
                        WHERE position(t2.first_name in t1.employee) > 0 ) 
SELECT sec_name FROM name ;

But I get as many output as there are first names in the input, like :

Id, sec_name
1, John Thornton
1, Cole Thornton
2, Paul Mckenzie
2, George Mckenzie
...

My Postgres version is 9.6 .

Any help will be much appreciated !

Solution

WITH RECURSIVE cte AS (
    SELECT employee, 1 id
    FROM t1
    UNION ALL
    SELECT REPLACE(employee, first_name, ''), id+1
    FROM cte
    JOIN t2 USING (id)
)
SELECT REGEXP_REPLACE(employee, ' +', ' ') employee
FROM cte 
WHERE id > ( SELECT MAX(id) 
             FROM t2 )

fiddle

If t2.id values do not start with 1 or have omissions, you must re-enumerate t2 records in CTE using ROW_NUMBER().

UPDATE.

Possible problem - some first_name is a substring of some employee (John and Johnson, for example). If so use spaces as additional wrappers:

WITH RECURSIVE cte AS (
SELECT employee || ' ' employee, 1 id
FROM t1
UNION ALL
SELECT REPLACE(employee, ' ' || first_name || ' ', ' '), id+1
FROM cte
JOIN t2 USING (id)
)
SELECT REGEXP_REPLACE(employee, ' +', ' ') employee
FROM cte 
WHERE id > ( SELECT MAX(id) 
             FROM t2 )

fiddle

PS. Trim excess trailing space if needed.

OTHER TIPS

You may use regexp_replace with first names presented in an alternation, to be replaced by an empty string. No recursivity is needed in that case.

The model is:

SELECT regexp_replace(fullname,
  '\m(firstname1|firstname2|firstname3|...)\M ', -- note the ending space!
   '',
   'g')
from...

\m and \M match at word boundaries, ensuring that partial name matches don't occur. The space at the end is meant to avoid matching the last name if it happens to coincide with a first name in the list. It also works when there is a single first name rather than two, even if your sample data has always two.

If there is any chance that first names might contain non-alphabetic characters that are special to regular expressions, they'd need to be quoted with backslashes, like this:

CREATE FUNCTION quote_meta(text) RETURNS text AS $$
  select regexp_replace($1, '([\[\]\\\^\$\.\|\?\*\+\(\)])', '\\\1', 'g');
$$ language sql strict immutable;

Then the alternation can be formed by aggregating all first names like this:

SELECT string_agg(quote_meta(first_name), '|') FROM table

Finally a global update in your table could plausibly be done in a single pass by combining the above pieces into a query like this:

WITH replacement AS (
 SELECT id,
     regexp_replace(employee,
        concat (
          '\m(',
          (SELECT string_agg(quote_meta(first_name), '|') FROM table_first_name),
          ')\M ' -- note the ending space!
     ),
     '',
     'g') AS newval
   FROM table_employees
)
UPDATE table_employees
   SET employee = newval
  FROM replacement
 WHERE replacement.id = table_employees.id
   AND employee <> newval;

Warning: this is untested.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange