How to SELECT “closest” rows from another table?
30-01-2021
Question
I have two tables, map1 and map2. There are multiple possible combinations between the columns map1.id1 and map2.id2.
I have tried the below query:
SELECT map1.id1, map2.id2, MIN(ST_HausdorffDistance(map1.g1, map2.g2)),map2.g2
FROM map1, map2
WHERE ST_HausdorffDistance(map1.g1, map2.g2) < 2
GROUP BY map1.id1,map2.g2,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
ORDER BY map1.id1,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
Current output
Multiple rows for each id1:
id1 id2 min g2
---- ---- ---------------- ----------------------------------------------------------------------------------------------------------------------------
6116 338 1.8122154049353 "0102000020E610000002000000590E3EDEF5AB23409255B6B4BF1B4A4031197DBBDBAB23404AF663EEB51B4A40"
6116 645 1.82162999509807 "0102000020E61000000300000057900B73277D2340A1675831011E4A4094C55801197D23406A204C40021E4A40B0CFF7AE9C7C234079AAE8B4131E4A40"
6116 674 1.82397666934862 "0102000020E610000002000000AC0E6F8C53B723405B80118F1F1E4A404EFF48C78BB723406C9159620A1E4A40"
65 695 1.22999509807 "01456020E61000000300000057900B73277D2340A1675831011E4A4094C55801197D23406A204C40021E4A40B0CFF7AE9C7C234079AAE8B4131E4A40"
65 689 1.556666934862 "0202000020E610000002000000AC0E6F8C53B723405B80118F1F1E4A404EFF48C78BB723406C9159620A1E4A40"
-- many more ...
Desired output
I want to SELECT only the first 1 or 2 rows for each id1, as defined by the minimum Hausdorff distance:
id1 id2 min g2
---- ---- ---------------- --------------------------------------------------------------------------------------------------------------------------
6116 338 1.8122154049353 "0102000020E610000002000000590E3EDEF5AB23409255B6B4BF1B4A4031197DBBDBAB23404AF663EEB51B4A40"
6116 645 1.82162999509807 "0102000020E61000000300000057900B73277D2340A1675831011E4A4094C55801197D23406A204C40021E4A40B0CFF7AE9C7C234079AAE8B4131E4A40"
65 695 1.22999509807 "01456020E61000000300000057900B73277D2340A1675831011E4A4094C55801197D23406A204C40021E4A40B0CFF7AE9C7C234079AAE8B4131E4A40"
65 689 1.556666934862 "0202000020E610000002000000AC0E6F8C53B723405B80118F1F1E4A404EFF48C78BB723406C9159620A1E4A40"
Related answer on gis.SE to illustrate the term "Hausdorff distance":
Solution
This would achieve it:
SELECT m1.id1, m2.*
FROM map1 m1
CROSS JOIN LATERAL (
SELECT ST_HausdorffDistance(m1.g1, m2.g2) AS h_dist, m2.id2, m2.g2
FROM map2 m2
WHERE ST_HausdorffDistance(m1.g1, m2.g2) < 2
ORDER BY 1, 2
LIMIT 2
) m2;
Returns up to 2 rows for every row in map1: the closest row(s) in map2 (as defined by minimum Hausdorff distance), together with the Hausdorff distance between them. If map2 has no row with Hausdorff distance < 2 for a given row in map1, no row is returned for it.
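If rows of map1 without any match should still appear in the result, one possible variant swaps CROSS JOIN LATERAL for LEFT JOIN LATERAL ... ON true. This is a sketch that assumes the same table and column names as the query above. Unmatched map1 rows then come back once, with NULL in the map2 columns:

```sql
-- Variant sketch: keep every map1 row, even when nothing in map2
-- has a Hausdorff distance below 2.
SELECT m1.id1, m2.*
FROM   map1 m1
LEFT   JOIN LATERAL (
   SELECT ST_HausdorffDistance(m1.g1, m2.g2) AS h_dist, m2.id2, m2.g2
   FROM   map2 m2
   WHERE  ST_HausdorffDistance(m1.g1, m2.g2) < 2
   ORDER  BY h_dist, id2   -- closest first, id2 breaks ties
   LIMIT  2
   ) m2 ON true;           -- "ON true" keeps map1 rows with an empty subquery result
```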
Key element is the LATERAL subquery. There are variants of this query, depending on exact (missing) requirements. Related:
- OFFSET and LIMIT on complex query
- How to make DISTINCT ON faster in PostgreSQL?
- How to speed up querying last values in a time series?
- Optimise a LATERAL JOIN query on a big table
I wouldn't know of any way to use an index here. So this is going to be an expensive query.
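One idea that may reduce the cost, offered as an untested sketch rather than a confirmed optimization: the Hausdorff distance between two geometries is never smaller than their minimum distance, so ST_DWithin(g1, g2, 2) can serve as a cheaper pre-filter, and ST_DWithin is able to use a GiST index. The sketch assumes g1 and g2 are geometry columns with the same SRID. The index name map2_g2_gist is hypothetical, and whether the planner actually benefits depends on the data:

```sql
-- Hypothetical index name; a GiST index on the geometry column can support ST_DWithin.
CREATE INDEX IF NOT EXISTS map2_g2_gist ON map2 USING gist (g2);

SELECT m1.id1, m2.*
FROM   map1 m1
CROSS  JOIN LATERAL (
   SELECT ST_HausdorffDistance(m1.g1, m2.g2) AS h_dist, m2.id2, m2.g2
   FROM   map2 m2
   WHERE  ST_DWithin(m1.g1, m2.g2, 2)              -- pre-filter: minimum distance <= 2
   AND    ST_HausdorffDistance(m1.g1, m2.g2) < 2   -- exact condition from the question
   ORDER  BY h_dist, id2
   LIMIT  2
   ) m2;
```

The pre-filter only discards pairs that could not satisfy the Hausdorff condition anyway, so the result should match the query without it.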
Other tips
In Postgres you can use LIMIT to limit the number of rows returned.
SELECT map1.id1, map2.id2, MIN(ST_HausdorffDistance(map1.g1, map2.g2)),map2.g2
FROM map1, map2
WHERE ST_HausdorffDistance(map1.g1, map2.g2) < 2
GROUP BY map1.id1,map2.g2,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
ORDER BY map1.id1,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
LIMIT 1
Adding that to the query limits the result to the number of rows you specify. The example above returns only the first row of the whole result, not the first row for each id1.
To get your desired output, this might work. However, your result output does not match your explanation.
with a as (
SELECT map1.id1, map2.id2, MIN(ST_HausdorffDistance(map1.g1, map2.g2)) min,map2.g2
FROM map1, map2
WHERE ST_HausdorffDistance(map1.g1, map2.g2) < 2
GROUP BY map1.id1,map2.g2,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
ORDER BY map1.id1,map2.id2, ST_HausdorffDistance(map1.g1, map2.g2)
), b as (
SELECT id1, MIN(min) min
FROM a
GROUP BY id1
)
SELECT a.*
FROM a
INNER JOIN b on a.min = b.min AND a.id1 = b.id1
INNER JOIN map1 on map1.id1 = a.id1
INNER JOIN map2 on map2.id2 = a.id2
ORDER BY a.id1,a.id2, ST_HausdorffDistance(map1.g1, map2.g2)
;
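Note that the CTE query above keeps only the row(s) with the single smallest distance per id1. If up to two rows per id1 are wanted, as in the desired output, a window function is one alternative. This is a sketch under the assumption that id1 is unique in map1; the names ranked, h_dist and rn are chosen here for illustration:

```sql
WITH ranked AS (
   SELECT m1.id1, m2.id2,
          ST_HausdorffDistance(m1.g1, m2.g2) AS h_dist,
          m2.g2,
          -- number the candidates per id1, closest first; id2 breaks ties
          row_number() OVER (PARTITION BY m1.id1
                             ORDER BY ST_HausdorffDistance(m1.g1, m2.g2), m2.id2) AS rn
   FROM   map1 m1
   JOIN   map2 m2 ON ST_HausdorffDistance(m1.g1, m2.g2) < 2
)
SELECT id1, id2, h_dist, g2
FROM   ranked
WHERE  rn <= 2              -- keep at most the two closest rows per id1
ORDER  BY id1, h_dist, id2;
```

Changing the constant in rn <= 2 adjusts how many rows are kept per id1.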