Question

I have two tables, map1 and map2. There are multiple possible combinations between the columns map1.id1 and map2.id2.

I have tried the query below:

SELECT map1.id1, map2.id2, MIN(ST_HausdorffDistance(map1.g1, map2.g2)),map2.g2
FROM map1, map2
WHERE ST_HausdorffDistance(map1.g1, map2.g2) < 2
GROUP BY map1.id1,map2.g2,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
ORDER BY map1.id1,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)

Current output

Multiple rows for id1:

id1   id2   min               g2
----  ----  ----------------  ----------------------------------------------------------------------------------------------------------------------------
6116   338  1.8122154049353   "0102000020E610000002000000590E3EDEF5AB23409255B6B4BF1B4A4031197DBBDBAB23404AF663EEB51B4A40"
6116   645  1.82162999509807  "0102000020E61000000300000057900B73277D2340A1675831011E4A4094C55801197D23406A204C40021E4A40B0CFF7AE9C7C234079AAE8B4131E4A40"
6116   674  1.82397666934862  "0102000020E610000002000000AC0E6F8C53B723405B80118F1F1E4A404EFF48C78BB723406C9159620A1E4A40"
  65   695  1.22999509807     "01456020E61000000300000057900B73277D2340A1675831011E4A4094C55801197D23406A204C40021E4A40B0CFF7AE9C7C234079AAE8B4131E4A40"
  65   689  1.556666934862    "0202000020E610000002000000AC0E6F8C53B723405B80118F1F1E4A404EFF48C78BB723406C9159620A1E4A40"
--  many more ...

Desired output

I want to select only the first 1 or 2 rows for each id1, as defined by the minimum Hausdorff distance:

id1   id2   min               g2
----  ----  ----------------  --------------------------------------------------------------------------------------------------------------------------
6116   338  1.8122154049353   "0102000020E610000002000000590E3EDEF5AB23409255B6B4BF1B4A4031197DBBDBAB23404AF663EEB51B4A40"
6116   645  1.82162999509807  "0102000020E61000000300000057900B73277D2340A1675831011E4A4094C55801197D23406A204C40021E4A40B0CFF7AE9C7C234079AAE8B4131E4A40"
  65   695  1.22999509807     "01456020E61000000300000057900B73277D2340A1675831011E4A4094C55801197D23406A204C40021E4A40B0CFF7AE9C7C234079AAE8B4131E4A40"
  65   689  1.556666934862    "0202000020E610000002000000AC0E6F8C53B723405B80118F1F1E4A404EFF48C78BB723406C9159620A1E4A40"

Related answer on gis.SE to illustrate the term "Hausdorff distance":

Solution

This would achieve it:

SELECT m1.id1, m2.*
FROM   map1 m1
CROSS  JOIN LATERAL (
   SELECT ST_HausdorffDistance(m1.g1, m2.g2) AS h_dist, m2.id2, m2.g2
   FROM   map2 m2
   WHERE  ST_HausdorffDistance(m1.g1, m2.g2) < 2
   ORDER  BY 1, 2
   LIMIT  2
   ) m2;

This returns up to 2 rows for every row in map1: its id1 plus the closest matching row(s) in map2 (ordered by Hausdorff distance, with ties broken by id2) and the Hausdorff distance between them. If map2 has no row with a Hausdorff distance < 2 for a given row in map1, no row is returned for that map1 row, because CROSS JOIN LATERAL drops rows for which the subquery comes back empty.

The key element is the LATERAL subquery, which is evaluated once per row of map1 and can therefore reference m1.g1. There are variants of this query, depending on the exact (unstated) requirements. Related:

I don't know of any way to use an index here, so this is going to be an expensive query: the distance has to be computed for every combination of rows from map1 and map2.
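
To experiment with the "top 2 per id1" pattern without a PostGIS installation, here is a minimal sketch using Python's built-in sqlite3 module. It is not the query above: SQLite has no LATERAL, so this uses the row_number() window-function variant instead. The geometries are replaced by plain numbers and ST_HausdorffDistance by abs(g1 - g2); both are stand-ins invented for illustration, as are the sample values and the extra id1 7000.

```python
import sqlite3

# Toy stand-in data (assumption): plain numbers replace geometries, and
# abs(g1 - g2) replaces ST_HausdorffDistance. Values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE map1 (id1 INTEGER PRIMARY KEY, g1 REAL);
    CREATE TABLE map2 (id2 INTEGER PRIMARY KEY, g2 REAL);
    INSERT INTO map1 VALUES (65, 10.0), (6116, 20.0), (7000, 99.0);
    INSERT INTO map2 VALUES (338, 20.5), (645, 21.0), (674, 21.5),
                            (689, 11.5), (695, 11.0);
""")

# Rank the candidate rows of map2 within each id1 by distance (ties broken
# by id2), then keep at most the two best-ranked rows per id1.
TOP_N_SQL = """
SELECT id1, id2, h_dist, g2
FROM  (
   SELECT m1.id1, m2.id2, m2.g2,
          abs(m1.g1 - m2.g2) AS h_dist,
          row_number() OVER (PARTITION BY m1.id1
                             ORDER BY abs(m1.g1 - m2.g2), m2.id2) AS rn
   FROM   map1 m1
   JOIN   map2 m2 ON abs(m1.g1 - m2.g2) < 2
   ) ranked
WHERE  rn <= 2
ORDER  BY id1, h_dist, id2
"""

rows = conn.execute(TOP_N_SQL).fetchall()
for row in rows:
    print(row)
```

With this toy data, id1 6116 has three candidates under the threshold but only the two closest are kept, and id1 7000 has none, so it does not appear in the result at all. In PostgreSQL the same window-function form should work with ST_HausdorffDistance in place of abs(), though it still evaluates every pair.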

OTHER TIPS

In PostgreSQL you can use LIMIT to cap the number of rows returned.

SELECT map1.id1, map2.id2, MIN(ST_HausdorffDistance(map1.g1, map2.g2)),map2.g2
FROM map1, map2
WHERE ST_HausdorffDistance(map1.g1, map2.g2) < 2
GROUP BY map1.id1,map2.g2,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
ORDER BY map1.id1,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
LIMIT 1

Adding LIMIT to the query caps the result at the number of rows you specify; the example above returns just the first row. Note that LIMIT applies to the whole result set, not to each id1 separately.

https://www.postgresql.org/docs/current/queries-limit.html

To get closer to your desired output, this might work. It keeps only the closest map2 row for each id1 (more than one only when distances tie). However, your sample output does not match your explanation.

WITH a AS (
    -- all pairs with a Hausdorff distance below the threshold
    SELECT map1.id1, map2.id2, MIN(ST_HausdorffDistance(map1.g1, map2.g2)) AS min, map2.g2
    FROM map1, map2
    WHERE ST_HausdorffDistance(map1.g1, map2.g2) < 2
    GROUP BY map1.id1, map2.g2, map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
    ORDER BY map1.id1, map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
), b AS (
    -- smallest distance found for each id1
    SELECT id1, MIN(min) AS min
    FROM a
    GROUP BY id1
)
-- keep only the pair(s) at that smallest distance;
-- map1 and map2 are joined back in only to recompute the distance for ORDER BY
SELECT a.*
FROM a
INNER JOIN b ON a.min = b.min AND a.id1 = b.id1
INNER JOIN map1 ON map1.id1 = a.id1
INNER JOIN map2 ON map2.id2 = a.id2
ORDER BY a.id1, a.id2, ST_HausdorffDistance(map1.g1, map2.g2)
;
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange