Question

Given a table t:

id | name
------------
1  | abcfug
1  | deffug
1  | hijfug
2  | etc

How can I do something like:

select string_agg(strip_lcs(name), ', ') from t where id = 1

returning:

abc, def, hij

NB I wrote an aggregate function to return lcs if that helps:

CREATE FUNCTION lcs_iterate(_state TEXT, _value TEXT)
RETURNS TEXT
AS
$$
        SELECT  RIGHT($2, s - 1)
        FROM    generate_series(1, LEAST(LENGTH($1), LENGTH($2))) s
        WHERE   RIGHT($1, s) <> RIGHT($2, s)
        UNION ALL
        SELECT  LEAST($1, $2)
        LIMIT 1;
$$
LANGUAGE 'sql';

CREATE AGGREGATE lcs(TEXT) (SFUNC = lcs_iterate, STYPE = TEXT);

Solution

Your aggregate function is smart and fast, but there is a bug. If one string matches the tail of the other completely, no row of the first SELECT qualifies, and the UNION ALL branch kicks in to return LEAST($1, $2). That picks the string that sorts first, not the shorter one. It must instead be something like CASE WHEN length($1) > length($2) THEN $2 ELSE $1 END. Test with 'match' and 'aa_match'. (See fiddle below.)
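
A quick way to see the difference, as a minimal sketch that uses only built-in functions and the two test strings mentioned above:

```sql
-- least() compares by sort order, so it returns the string that sorts first.
-- The CASE expression returns the shorter string, which is the common suffix here.
SELECT least(a, b) AS via_least
     , CASE WHEN length(a) > length(b) THEN b ELSE a END AS via_case
FROM  (VALUES ('match'::text, 'aa_match'::text)) v(a, b);
-- via_least = 'aa_match', via_case = 'match'
```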

Plus, make the function IMMUTABLE STRICT:

CREATE OR REPLACE FUNCTION lcs_iterate(_state text, _value text)
  RETURNS text AS
$func$
SELECT  right($2, s - 1)
FROM    generate_series(1, least(length($1), length($2))) s
WHERE   right($1, s) <> right($2, s)

UNION   ALL
SELECT  CASE WHEN length($1) > length($2) THEN $2 ELSE $1 END  -- !
LIMIT  1;
$func$ LANGUAGE sql IMMUTABLE STRICT;  -- !

Because the function is STRICT, NULL values are ignored, and an empty string leads to a zero-length common suffix. You may want to treat these special cases differently.
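
To illustrate both special cases, here is a small sketch. It assumes the corrected lcs_iterate() above and the aggregate lcs() from the question are in place; the VALUES lists are made-up test data:

```sql
-- A NULL input is skipped because the transition function is STRICT.
SELECT lcs(v) AS suffix
FROM  (VALUES ('abcfug'), (NULL), ('deffug')) x(v);   -- 'fug'

-- An empty string has no characters in common with anything, so the result is ''.
SELECT lcs(v) AS suffix
FROM  (VALUES ('abcfug'), (''), ('deffug')) x(v);     -- ''
```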

Since we only need the length of the common suffix, a very simple FINALFUNC returns just that:

CREATE AGGREGATE lcs_len(text) (
   SFUNC = lcs_iterate
 , STYPE = text
 , FINALFUNC = length  -- ! function name only, no parentheses
   );

Then your query can look like:

SELECT string_agg(trunc, ', ') AS truncated_names
FROM  (
   SELECT left(name, -lcs_len(name) OVER ()) AS trunc
   FROM   t
   WHERE  id = 1
   ) sub;

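.. using the custom aggregate as a window function. One caveat: if the names share no common suffix, lcs_len() is 0 and left(name, 0) returns an empty string rather than the full name.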
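
As an end-to-end check, here is a sketch that runs the query against the sample data from the question. It assumes the lcs_len() aggregate above exists. The temporary table name t_demo is made up for this example, and the ORDER BY inside string_agg() is added only to make the output order deterministic:

```sql
CREATE TEMP TABLE t_demo (id int, name text);
INSERT INTO t_demo VALUES
  (1, 'abcfug'), (1, 'deffug'), (1, 'hijfug'), (2, 'etc');

SELECT string_agg(trunc, ', ' ORDER BY trunc) AS truncated_names
FROM  (
   -- lcs_len(name) OVER () is computed across all rows that pass the WHERE clause
   SELECT left(name, -lcs_len(name) OVER ()) AS trunc
   FROM   t_demo
   WHERE  id = 1
   ) sub;
-- truncated_names = 'abc, def, hij'
```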

db<>fiddle here

I also tested with Postgres 9.4, and it should work with your outdated Postgres 9.1, but that's too old for me to test. Consider upgrading to a current version.


OTHER TIPS

Given that you have an aggregate that finds the longest common suffix, you can use it as a window function and cut that many characters off the end of each name:

WITH x AS( 
    SELECT left(name,length(name)-length(lcs(name) over ())) AS s 
    FROM t WHERE id = 1
)
SELECT string_agg(s,', ') FROM x;
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange