How to SELECT “closest” rows from another table?
30-01-2021
Question
I have two tables, map1 and map2. There are multiple possible combinations between the columns map1.id1 and map2.id2.
I have tried the below query:
SELECT map1.id1, map2.id2, MIN(ST_HausdorffDistance(map1.g1, map2.g2)),map2.g2
FROM map1, map2
WHERE ST_HausdorffDistance(map1.g1, map2.g2) < 2
GROUP BY map1.id1,map2.g2,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
ORDER BY map1.id1,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
Current output
Multiple rows for id1:
id1 id2 min g2
---- ---- ---------------- ----------------------------------------------------------------------------------------------------------------------------
6116 338 1.8122154049353 "0102000020E610000002000000590E3EDEF5AB23409255B6B4BF1B4A4031197DBBDBAB23404AF663EEB51B4A40"
6116 645 1.82162999509807 "0102000020E61000000300000057900B73277D2340A1675831011E4A4094C55801197D23406A204C40021E4A40B0CFF7AE9C7C234079AAE8B4131E4A40"
6116 674 1.82397666934862 "0102000020E610000002000000AC0E6F8C53B723405B80118F1F1E4A404EFF48C78BB723406C9159620A1E4A40"
65 695 1.22999509807 "01456020E61000000300000057900B73277D2340A1675831011E4A4094C55801197D23406A204C40021E4A40B0CFF7AE9C7C234079AAE8B4131E4A40"
65 689 1.556666934862 "0202000020E610000002000000AC0E6F8C53B723405B80118F1F1E4A404EFF48C78BB723406C9159620A1E4A40"
-- many more ...
Desired output
I want to SELECT only the first 1 or 2 rows for each id1, as defined by the minimum Hausdorff distance:
id1 id2 min g2
---- ---- ---------------- --------------------------------------------------------------------------------------------------------------------------
6116 338 1.8122154049353 "0102000020E610000002000000590E3EDEF5AB23409255B6B4BF1B4A4031197DBBDBAB23404AF663EEB51B4A40"
6116 645 1.82162999509807 "0102000020E61000000300000057900B73277D2340A1675831011E4A4094C55801197D23406A204C40021E4A40B0CFF7AE9C7C234079AAE8B4131E4A40"
65 695 1.22999509807 "01456020E61000000300000057900B73277D2340A1675831011E4A4094C55801197D23406A204C40021E4A40B0CFF7AE9C7C234079AAE8B4131E4A40"
65 689 1.556666934862 "0202000020E610000002000000AC0E6F8C53B723405B80118F1F1E4A404EFF48C78BB723406C9159620A1E4A40"
Related answer on gis.SE to illustrate the term "Hausdorff distance":
Solution
This would achieve it:
SELECT m1.id1, m2.*
FROM map1 m1
CROSS JOIN LATERAL (
SELECT ST_HausdorffDistance(m1.g1, m2.g2) AS h_dist, m2.id2, m2.g2
FROM map2 m2
WHERE ST_HausdorffDistance(m1.g1, m2.g2) < 2
ORDER BY 1, 2
LIMIT 2
) m2;
Returns 1 or 2 rows for every row in map1 that has a match, each extended with one of the top 2 corresponding rows in map2 (as defined by minimum Hausdorff distance) and the Hausdorff distance between them. If there is no row with Hausdorff distance < 2 in map2, no row is returned for that map1 row.
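If map1 rows without any match below the threshold should still show up, one possible variant replaces CROSS JOIN LATERAL with LEFT JOIN LATERAL ... ON true. This is a minimal sketch that assumes the same tables and columns as the query above; unmatched map1 rows would come back with NULL in h_dist, id2 and g2.

```sql
SELECT m1.id1, m2.*
FROM   map1 m1
LEFT   JOIN LATERAL (
   SELECT ST_HausdorffDistance(m1.g1, m2.g2) AS h_dist, m2.id2, m2.g2
   FROM   map2 m2
   WHERE  ST_HausdorffDistance(m1.g1, m2.g2) < 2
   ORDER  BY 1, 2      -- closest first, id2 breaks ties
   LIMIT  2
   ) m2 ON true;       -- ON true keeps every map1 row, matched or not
```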
The key element is the LATERAL subquery. There are variants of this query, depending on exact (missing) requirements. Related:
- OFFSET and LIMIT on complex query
- How to make DISTINCT ON faster in PostgreSQL?
- How to speed up querying last values in a time series?
- Optimise a LATERAL JOIN query on a big table
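One such variant, sketched here as an illustration rather than as part of the original answer, ranks the candidates with the window function row_number() instead of a LATERAL subquery. It assumes id1 identifies a row in map1; if id1 is not unique, this version picks the top 2 per id1 value, while the LATERAL query picks the top 2 per map1 row.

```sql
SELECT id1, h_dist, id2, g2
FROM  (
   SELECT m1.id1, m2.id2, m2.g2
        , ST_HausdorffDistance(m1.g1, m2.g2) AS h_dist
          -- rank candidates per id1: smallest distance first, id2 breaks ties
        , row_number() OVER (PARTITION BY m1.id1
                             ORDER BY ST_HausdorffDistance(m1.g1, m2.g2), m2.id2) AS rn
   FROM   map1 m1
   JOIN   map2 m2 ON ST_HausdorffDistance(m1.g1, m2.g2) < 2
   ) sub
WHERE  rn <= 2          -- keep at most 2 rows per id1
ORDER  BY id1, h_dist, id2;
```

Both forms still evaluate the distance for every candidate pair, so neither is expected to be cheap on large tables.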
I wouldn't know of any way to use an index here. So this is going to be an expensive query.
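One idea that might reduce the cost, offered as an assumption to test rather than something the answer above claims: the Hausdorff distance between two geometries is never smaller than their minimum distance, so ST_DWithin(g1, g2, 2) can only discard pairs that would fail the Hausdorff condition anyway. ST_DWithin can use a GiST index on the geometry column. The index name below is hypothetical, and whether the planner actually benefits should be checked with EXPLAIN; with a threshold as wide as 2 in the units of the data, the prefilter may not be very selective.

```sql
-- Hypothetical index name; skip this if map2.g2 already has a GiST index.
CREATE INDEX IF NOT EXISTS map2_g2_gist_idx ON map2 USING gist (g2);

SELECT m1.id1, m2.*
FROM   map1 m1
CROSS  JOIN LATERAL (
   SELECT ST_HausdorffDistance(m1.g1, m2.g2) AS h_dist, m2.id2, m2.g2
   FROM   map2 m2
   WHERE  ST_DWithin(m1.g1, m2.g2, 2)              -- coarse prefilter, may use the index
   AND    ST_HausdorffDistance(m1.g1, m2.g2) < 2   -- exact condition, unchanged
   ORDER  BY 1, 2
   LIMIT  2
   ) m2;
```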
Other tips
In Postgres you can use LIMIT to limit the number of rows returned.
SELECT map1.id1, map2.id2, MIN(ST_HausdorffDistance(map1.g1, map2.g2)),map2.g2
FROM map1, map2
WHERE ST_HausdorffDistance(map1.g1, map2.g2) < 2
GROUP BY map1.id1,map2.g2,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
ORDER BY map1.id1,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
LIMIT 1
Adding LIMIT caps the whole result at the number of rows you specify, so the example above returns only the first row of the entire result, not one row per id1.
To get your desired output, this might work. However, your result output does not match your explanation.
with a as (
SELECT map1.id1, map2.id2, MIN(ST_HausdorffDistance(map1.g1, map2.g2)) min,map2.g2
FROM map1, map2
WHERE ST_HausdorffDistance(map1.g1, map2.g2) < 2
GROUP BY map1.id1,map2.g2,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
ORDER BY map1.id1,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
), b as (
SELECT id1, MIN(min) min
FROM a
GROUP BY id1
)
SELECT a.*
FROM a
INNER JOIN b on a.min = b.min AND a.id1 = b.id1
INNER JOIN map1 on map1.id1 = a.id1
INNER JOIN map2 on map2.id2 = a.id2
ORDER BY a.id1,a.id2, ST_HausdorffDistance(map1.g1, map2.g2)
;
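For the single closest row per id1, a shorter formulation with the Postgres-specific DISTINCT ON may be worth trying. This is a sketch under the same table and column assumptions, not a drop-in replacement: the CTE query above returns every row that ties for the minimum, while DISTINCT ON keeps exactly one row per id1 (here the one with the smallest id2 among ties).

```sql
SELECT DISTINCT ON (m1.id1)
       m1.id1, m2.id2
     , ST_HausdorffDistance(m1.g1, m2.g2) AS min
     , m2.g2
FROM   map1 m1
JOIN   map2 m2 ON ST_HausdorffDistance(m1.g1, m2.g2) < 2
-- the leading ORDER BY expression must match DISTINCT ON;
-- the remaining ones decide which row survives per id1
ORDER  BY m1.id1, ST_HausdorffDistance(m1.g1, m2.g2), m2.id2;
```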