How to recursively update the same string?
20-02-2021
Question
I have a table with a column containing names like this:
id, employee
1, Mr. John Cole Thornton
2, Mr. Paul George Mckenzie
3, Mr. George Mick McDoughal
4, Ms. Emily Suzan Flemming
5, Mr. Alan Bourdillion Traherne
I have a second table with a list of first names, like this:
id, first_name
1, Emily
2, John
3, George
4, Suzan
5, Paul
6, Alan
7, Mary
8, Mick
9, Bourdillion
10, Jim
11, Cole
And I want to remove the first names in the first table, in order to obtain this:
id, employee
1, Mr. Thornton
2, Mr. Mckenzie
3, Mr. McDoughal
4, Ms. Flemming
5, Mr. Traherne
However many first names a row contains, I would like to remove them all without querying my first-names table several times, and I wonder whether this is possible without using a loop in a function.
I have tried a query like this:
WITH RECURSIVE name AS (
    SELECT REPLACE(t1.employee, t2.first_name, '') sec_name
    FROM t1, t2
    WHERE position(t2.first_name in t1.employee) > 0
)
SELECT sec_name FROM name;
But I get one output row per matching first name, like this:
Id, sec_name
1, John Thornton
1, Cole Thornton
2, Paul Mckenzie
2, George Mckenzie
...
My Postgres version is 9.6.
Any help will be much appreciated!
Solution
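WITH RECURSIVE cte AS (
    -- Anchor: every employee string, with a step counter starting at 1
    SELECT employee, 1 id
    FROM t1
    UNION ALL
    -- Step n: strip the first name whose t2.id equals the counter
    SELECT REPLACE(employee, first_name, ''), id+1
    FROM cte
    JOIN t2 USING (id)
)
-- Keep only the rows that went through every step; collapse runs of spaces
SELECT REGEXP_REPLACE(employee, ' +', ' ', 'g') employee
FROM cte
WHERE id > ( SELECT MAX(id)
             FROM t2 )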
If the t2.id values do not start at 1 or have gaps, you must renumber the t2 rows in a CTE using ROW_NUMBER().
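The renumbering mentioned above could look roughly like the following. This is only a sketch: the CTE name names and the counter column rn are made up for illustration, and the ::text cast assumes employee may be a varchar column, which would otherwise clash with the text result of REPLACE in the recursive term.

```sql
WITH RECURSIVE names AS (
    -- Dense 1..N numbering, independent of the actual t2.id values
    SELECT first_name,
           ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM t2
),
cte AS (
    -- rn is bigint here because ROW_NUMBER() returns bigint
    SELECT employee::text AS employee, 1::bigint AS rn
    FROM t1
    UNION ALL
    -- Step rn: strip the rn-th first name, then advance the counter
    SELECT REPLACE(cte.employee, names.first_name, ''), cte.rn + 1
    FROM cte
    JOIN names USING (rn)
)
SELECT REGEXP_REPLACE(employee, ' +', ' ', 'g') AS employee
FROM cte
WHERE rn > (SELECT COUNT(*) FROM t2);   -- rows that passed through all N steps
```

The final filter compares against COUNT(*) rather than MAX(id), since the counter now follows the row numbers and not the original ids.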
UPDATE.
Possible problem: some first_name may be a substring of another word in employee (John and Johnson, for example). If so, use spaces as additional delimiters:
WITH RECURSIVE cte AS (
    -- Append a space so that every word is followed by a space
    SELECT employee || ' ' employee, 1 id
    FROM t1
    UNION ALL
    -- Only whole, space-delimited words are replaced
    SELECT REPLACE(employee, ' ' || first_name || ' ', ' '), id+1
    FROM cte
    JOIN t2 USING (id)
)
SELECT REGEXP_REPLACE(employee, ' +', ' ', 'g') employee
FROM cte
WHERE id > ( SELECT MAX(id)
             FROM t2 )
PS. Trim excess trailing space if needed.
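The queries above return only the cleaned string, while the desired output in the question also lists the row id. A possible variant that carries the id through the recursion is sketched below. The column names emp_id and step are hypothetical, and the sketch assumes t2.id is an integer sequence 1..N as in the sample data.

```sql
WITH RECURSIVE cte AS (
    -- emp_id keeps the t1 key; step is the counter that walks t2
    SELECT t1.id AS emp_id,
           (t1.employee || ' ')::text AS employee,
           1 AS step
    FROM t1
    UNION ALL
    SELECT cte.emp_id,
           REPLACE(cte.employee, ' ' || t2.first_name || ' ', ' '),
           cte.step + 1
    FROM cte
    JOIN t2 ON t2.id = cte.step
)
SELECT emp_id AS id,
       RTRIM(employee) AS employee      -- drop the space appended in the anchor
FROM cte
WHERE step > (SELECT MAX(id) FROM t2)
ORDER BY emp_id;
```

One caveat that applies to any variant appending a trailing space: the last name is then also followed by a space, so a surname that happens to appear in the first-name list (George, say) would be removed as well.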
OTHER TIPS
You may use regexp_replace with the first names presented in an alternation, to be replaced by an empty string. No recursion is needed in that case.
The model is:
SELECT regexp_replace(fullname,
           '\m(firstname1|firstname2|firstname3|...)\M ', -- note the ending space!
           '',
           'g')
FROM ...
\m and \M match at the beginning and at the end of a word respectively, ensuring that partial name matches don't occur.
The space at the end is meant to avoid matching the last name if it happens to coincide with a first name in the list. It also works when there is a single first name rather than two, even if your sample data always has two.
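As a quick check of the pattern, the following sketch runs it against three literal strings instead of the real tables. The inline sample rows and the shortened name list are made up for illustration, and it assumes standard_conforming_strings is on (the default in 9.6) so that the backslashes reach the regex engine unchanged.

```sql
SELECT employee,
       regexp_replace(employee,
                      '\m(John|Cole|Paul|George)\M ',  -- trailing space is part of the pattern
                      '',
                      'g') AS stripped
FROM (VALUES ('Mr. John Cole Thornton'),
             ('Mr. Paul George'),
             ('Mr. John Johnson')) AS sample(employee);
-- 'Mr. John Cole Thornton' -> 'Mr. Thornton'
-- 'Mr. Paul George'        -> 'Mr. George'   (last name kept: no space follows it)
-- 'Mr. John Johnson'       -> 'Mr. Johnson'  (\M rejects the partial match in Johnson)
```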
If there is any chance that first names might contain non-alphabetic characters that are special to regular expressions, they'd need to be quoted with backslashes, like this:
CREATE FUNCTION quote_meta(text) RETURNS text AS $$
    SELECT regexp_replace($1, '([\[\]\\\^\$\.\|\?\*\+\(\)])', '\\\1', 'g');
$$ LANGUAGE sql STRICT IMMUTABLE;
Then the alternation can be formed by aggregating all first names like this:
SELECT string_agg(quote_meta(first_name), '|') FROM table_first_name
Finally a global update in your table could plausibly be done in a single pass by combining the above pieces into a query like this:
WITH replacement AS (
    SELECT id,
           regexp_replace(employee,
               concat (
                   '\m(',
                   (SELECT string_agg(quote_meta(first_name), '|') FROM table_first_name),
                   ')\M ' -- note the ending space!
               ),
               '',
               'g') AS newval
    FROM table_employees
)
UPDATE table_employees
SET employee = newval
FROM replacement
WHERE replacement.id = table_employees.id
  AND employee <> newval;
Warning: this is untested.
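Since the update is untested, one cautious option is to preview the result with a plain SELECT before changing anything. The sketch below reuses the table names and the quote_meta function from above; the aliases e, p and alternation are made up for illustration.

```sql
-- Read-only preview: shows old and new values side by side, changes nothing
SELECT e.id,
       e.employee,
       regexp_replace(e.employee,
                      concat('\m(', p.alternation, ')\M '),
                      '',
                      'g') AS newval
FROM table_employees AS e
CROSS JOIN (
    -- Build the alternation once for the whole query
    SELECT string_agg(quote_meta(first_name), '|') AS alternation
    FROM table_first_name
) AS p
ORDER BY e.id;
```

If the newval column looks right, the UPDATE can then be run inside a transaction (BEGIN ... COMMIT) so that it can still be rolled back after a final look.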