Strip longest common suffix and concat in Postgres
-
16-01-2021 - |
Question
Given a table t
:
id | name
------------
1 | abcfug
1 | deffug
1 | hijfug
2 | etc
How can I do something like:
select string_agg(strip_lcs(name), ', ') from t where id = 1
returning:
abc, def, hij
NB I wrote an aggregate function to return lcs if that helps:
CREATE FUNCTION lcs_iterate(_state TEXT, _value TEXT)
RETURNS TEXT
AS
$$
SELECT RIGHT($2, s - 1)
FROM generate_series(1, LEAST(LENGTH($1), LENGTH($2))) s
WHERE RIGHT($1, s) <> RIGHT($2, s)
UNION ALL
SELECT LEAST($1, $2)
LIMIT 1;
$$
LANGUAGE 'sql';
CREATE AGGREGATE lcs(TEXT) (SFUNC = lcs_iterate, STYPE = TEXT);
Solution
Your aggregate function is smart and fast, but there is a bug. If one string matches the tail of another completely, the UNION ALL
part kicks in to return LEAST($1, $2)
. That must instead be something like CASE WHEN length($1) > length($2) THEN $2 ELSE $1 END
. Test with 'match' and 'aa_match'. (See fiddle below.)
Plus, make the function IMMUTABLE STRICT
:
CREATE OR REPLACE FUNCTION lcs_iterate(_state text, _value text)
RETURNS text AS
$func$
SELECT right($2, s - 1)
FROM generate_series(1, least(length($1), length($2))) s
WHERE right($1, s) <> right($2, s)
UNION ALL
SELECT CASE WHEN length($1) > length($2) THEN $2 ELSE $1 END -- !
LIMIT 1;
$func$ LANGUAGE sql IMMUTABLE STRICT; -- !
NULL values are ignored and empty strings lead to zero-length common suffix. You may want to treat these special cases differently ...
While we only need the length of the common suffix, a very simple FINALFUNC
returns just that:
CREATE AGGREGATE lcs_len(text) (
SFUNC = lcs_iterate
, STYPE = text
, FINALFUNC = length() -- !
);
Then your query can look like:
SELECT string_agg(trunc, ', ') AS truncated_names
FROM (
SELECT left(name, -lcs_len(name) OVER ()) AS trunc
FROM tbl
WHERE id = 1
) sub;
.. using the custom aggregate as window function.
db<>fiddle here
I also tested with Postgres 9.4, and it should work with your outdated Postgres 9.1, but that's too old for me to test. Consider upgrading to a current version.
Related:
OTHER TIPS
Given that you have an aggregate that finds the longest common suffix
WITH x AS(
SELECT left(name,length(name)-length(lcs(name) over ())) AS s
FROM t WHERE id = 1
)
SELECT string_agg(s,', ') FROM x;