Group by binary operator in PostgreSQL

https://stackoverflow.com/questions/17374289

01-06-2022
|

Question

I'm using PostgreSQL 9.2.4.

Question

I have a table with an ID and a second column of some type. Let's call the type X. I also have a binary function that operates on a pair of Xs and returns a boolean. We'll call this function f. f is transitive; i.e., if f(a,b) and f(b,c) both return true, then f(a,c) will as well.

What I want to do is a get a set of IDs for which all pairs the second column give back true for this binary function. How can I accomplish this?

Performance is not a huge concern; this is part of an import process that will only be run about once annually. The database will not be in use otherwise during this import.

(Relatively) Simple Example

I've created a SQL Fiddle to get things started: http://sqlfiddle.com/#!12/57b97/3. I want to collect IDs by the result of the f function. Keep in mind that in general, f might be more complicated. This is just an example.

The output I'm looking for with this example SQL Fiddle would be something like the following:

{1,3,6}
{2,4}

For example, say we choose any pair of IDs from a single set. Let's say we pick 1 and 3. Then SELECT f((SELECT data FROM temp WHERE id = 1), (SELECT data FROM temp WHERE id = 3)); returns true.

5 doesn't show up anywhere because 'green' is the only string of length 5. It would be fine if I got back duplicates; I can figure out how to clean them up.

Real Situation Details

In reality, my "second column" is a PostGIS GEOMETRY(LINESTRING) and my "binary function" is ST_Equals. So really, I'm searching for a bunch of duplicate line strings. I don't feel like this information is relevant to the question at hand other than showing I can't simplify the problem down to easier to handle operations.

Solution 2

After working with Clodoaldo Neto's answer a while, I finally got it.

WITH matches AS (
    select t1.id id1, t2.id id2
    from temp t1
    inner join temp t2 on t1.id < t2.id
    where f(t1.data, t2.data)
)
SELECT id1 || ARRAY_AGG(id2)
FROM matches
WHERE id1 NOT IN (SELECT DISTINCT id2 FROM matches)
GROUP BY id1

SQL Fiddle: http://sqlfiddle.com/#!12/57b97/14

The CTE comes straight from Clodoaldo Neto's inner query. This is really nice because it also lets me split them and have the lowest ID by itself if I want:

WITH matches AS (select t1.id id1, t2.id id2
                 from temp t1
                 inner join temp t2 on t1.id < t2.id
                 where f(t1.data, t2.data)
                )
SELECT id1, ARRAY_AGG(id2) AS duplicates
FROM matches
WHERE id1 NOT IN (SELECT DISTINCT id2 FROM matches)
GROUP BY id1

OTHER TIPS

Start fiddling with this SQL Fiddle

select
    t1.id id1,
    t1.data data1,
    t2.id id2,
    t2.data data2,
    f(t1.data, t2.data) f
from
    temp t1
    inner join
    temp t2 on t1.id < t2.id
order by t1.id, t2.id

And then go to the final version SQL Fiddle

select array[id1] || array_agg(id2) id2
from (
    select t1.id id1, t2.id id2
    from
        temp t1
        inner join
        temp t2 on t1.id < t2.id
    where f(t1.data, t2.data)
) s
group by id1
order by id1, id2

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow