Question

I have been struggling with a marker-clustering problem with 1000+ markers that should be put on a Google map. I am not keen on rendering large JSON structures with all the markers, nor am I fond of complex server-side "geo" computations with PostGIS.

The solution I came up with is to divide the world map into a hierarchical spatial tree, say a quad tree, where each point in my db is assigned "coordinates" in that tree. These coordinates are strings in which the character at position x is the index of the tile at tier x, e.g. '031232320012'. The length of the string depends on the number of zoom levels that will be enabled for the front-end map. Basically, if a user moves or zooms the map, I'll launch an Ajax GET request with the current zoom level and viewport coordinates as parameters. Then in the back-end I plan to build a string that points to the "viewport at the given zoom level", e.g. '02113', and I want to find all points whose tree-coordinates column starts with that prefix ('02113').
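
For illustration, I imagine the tile-path computation would be something along these lines (a rough PL/pgSQL sketch using standard Web Mercator tile math; the function name and details are placeholders, not settled code):

CREATE OR REPLACE FUNCTION quadkey(lat double precision,
                                   lon double precision,
                                   zoom integer)
RETURNS text
LANGUAGE plpgsql IMMUTABLE AS $$
DECLARE
    sin_lat double precision;
    tile_x  integer;
    tile_y  integer;
    key     text := '';
BEGIN
    -- standard Web Mercator tile numbering at the requested zoom
    sin_lat := sin(radians(lat));
    tile_x  := floor(((lon + 180) / 360) * (1 << zoom));
    tile_y  := floor((0.5 - ln((1 + sin_lat) / (1 - sin_lat)) / (4 * pi())) * (1 << zoom));
    -- one digit (0-3) per tier, most significant tier first
    FOR i IN REVERSE zoom - 1 .. 0 LOOP
        key := key || (((tile_x >> i) & 1) + 2 * ((tile_y >> i) & 1))::text;
    END LOOP;
    RETURN key;
END;
$$;

The same math applied to the viewport at the current zoom level would give the prefix (or the handful of prefixes) to search for.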

EDIT: I will also need a fast GROUP BY, e.g. SELECT count(*) FROM points GROUP BY left(coordinates, 5);

My question is how to perform these operations as fast as possible? My database is PostgreSQL.

Solution

Then in the back-end I plan to build a string that points to the "viewport at the given zoom level", e.g. '02113', and I want to find all points whose tree-coordinates column starts with that prefix ('02113').

An ordinary index should perform well on any modern dbms as long as you're looking at the left-most five (or six or seven) characters of a string in an indexed column.

SELECT ...
...
WHERE column_name LIKE '02113%';
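
One PostgreSQL-specific caveat worth checking: a plain b-tree index is only used for LIKE 'prefix%' searches when the database uses the C collation. Otherwise you would build the index with the text_pattern_ops operator class, along these lines:

-- text_pattern_ops makes the b-tree index usable for left-anchored LIKE
-- patterns regardless of collation (placeholder names)
CREATE INDEX your_index_name ON your_table (column_name text_pattern_ops);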

In PostgreSQL, you can also build an index on an expression. So you could create an index on the first five characters.

CREATE INDEX your_index_name ON your_table (left(column_name, 5));

I'd expect PostgreSQL's query optimizer to pick the right index if there were three or four like that. (One for 5 characters, one for 6 characters, etc.)
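
For example, one expression index per prefix length you actually filter or group on (the index names here are just illustrative):

CREATE INDEX your_table_left5_idx ON your_table (left(column_name, 5));
CREATE INDEX your_table_left6_idx ON your_table (left(column_name, 6));
CREATE INDEX your_table_left7_idx ON your_table (left(column_name, 7));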


I built a table and populated it with a million rows of random data.
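
Something like this reproduces the setup (approximately; it is not the exact script, and the index definitions are guessed from the plans below):

CREATE TABLE coords (s text);

INSERT INTO coords (s)
SELECT lpad((floor(random() * 100000000))::bigint::text, 8, '0')  -- random 8-digit strings
FROM generate_series(1, 1000000);

CREATE INDEX coords_left_idx1 ON coords (left(s, 2));
CREATE INDEX coords_left_idx2 ON coords (left(s, 5));
ANALYZE coords;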

In the following query, PostgreSQL's query optimizer did pick the right index.

explain analyze
select s
from coords
where left(s, 5) = '12345';

It returned in 0.1 ms.

I also tested using GROUP BY. Again, PostgreSQL's query optimizer picked the right index.
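
The grouping query was roughly of this shape (aggregating on a two-character prefix):

explain analyze
select left(s, 2), count(*)
from coords
group by left(s, 2);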

"GroupAggregate  (cost=0.00..62783.15 rows=899423 width=8) (actual time=91.300..3096.788 rows=90 loops=1)"
"  ->  Index Scan using coords_left_idx1 on coords  (cost=0.00..46540.36 rows=1000000 width=8) (actual time=0.051..2915.265 rows=1000000 loops=1)"
"Total runtime: 3096.914 ms"

An expression like left(s, 2) in the GROUP BY clause requires PostgreSQL to touch every row in the index, if not every row in the table. That's why my query took 3096 ms: it had to touch a million rows in the index. But you can see from the EXPLAIN plan that it used the index.

Ordinarily, I'd expect a geographic application to use a bounding box against a PostGIS table to reduce the number of rows you access. If your quad tree implementation can't do better than that, I'd stick with PostGIS long enough to become an expert with it. (You won't know for sure that it can't do the job until you've spent some time in it.)
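
For comparison, the PostGIS version of a viewport query is just a bounding-box filter backed by a GiST index. A minimal sketch, assuming a geometry(Point, 4326) column named geom:

-- spatial index on the marker geometries
CREATE INDEX points_geom_gist ON points USING gist (geom);

-- markers inside the current viewport (west, south, east, north in lon/lat)
SELECT count(*)
FROM points
WHERE geom && ST_MakeEnvelope(14.2, 49.9, 14.7, 50.2, 4326);

The && operator is a bounding-box overlap test, so the GiST index narrows the scan to the viewport before anything else runs.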

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow