Question

Does anyone have any handy algorithms that could be used to reduce the number of geo-points ?

I am using a list of 2,000,000 postcodes which come with their own geo-point. I am using them to collect data from an API to be used offline. The program is written in C++.

I have to go through each postcode, calculate a bounding box based on the postcodes location, and then send it to the API which gives me some data near to that postcode.

However 2,000,000 is a lot to process and some of the postcodes are next to each other or close enough to each other that they would share some of the same data.

So far I've came up with two ways I could reduce them but I am not sure if they would work:

1 - Program uses data structure to record which postcode overlaps which and then run a routine a few time to removes the ones that have overlaps one by one until we are left without ones without overlapping postcodes.

  1. Start at the top left geo point of the UK and slowly increment it the rough size of a postcode area until we have covered the entire UK.

Is there a easy way to reduce these number of postcodes so that I have few of them overlapping as possible ? whilst still making sure I get data covering as much of the UK as possible ? I was thinking there may be an algorithm handy for this, that people use else where.

Was it helpful?

Solution

You can use a quadtree especially a quadkey. A quadkey plot the points along a curve. It's similar to sort the points into a grid. Then you can traverse the grid to search deeper in the tree. You can also search around a center point. You can also use a database with a spatial index. It depends how much the data overlap but with a quadtree you can choose the size of the grid.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top