Assessing OpenStreetMap for international routing

https://stackoverflow.com/questions/4109635

29-09-2019

Question

I have been using a commercial solution for route distances and travel times for North America and Western/Mid Europe. I am considering expanding the project to cover other countries, and perhaps the entire world. A very limited budget and patchy regional coverage from individual commercial providers probably make a locally hosted OpenStreetMap the only viable option. Before someone suggests an online solution: my application requires a lot of intensive route calculation, which would cost a lot or be very impolite (and probably banned) if performed using a web service. The results of the calculations are put back in the public domain, so crediting OpenStreetMap is not a problem.

My problem is this: how do I assess the routing data coverage for individual countries in the OpenStreetMap database? Such an assessment could determine whether the project is viable, and a suitable order for processing (i.e. do the countries with the best coverage first).

High-end commercial data providers can typically supply statistical descriptions, as well as regional descriptions of surveyed coverage. OpenStreetMap is much more patchy: an area typically includes some roads, but not all roads. Individual location errors of a few metres, or even 10-20 m, will not be a problem for my application (I'm looking at city-to-city distances), but route graph connectivity is. That is, the road vectors must logically meet correctly at a junction.
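One way to check the junction requirement: in OpenStreetMap, ways that genuinely meet share a node. A way that ends at a node no other way uses, yet lies within a few metres of another way's node, is therefore a likely "near miss" junction. The sketch below assumes the data has already been extracted into plain dictionaries (`nodes` mapping node IDs to (lat, lon), `ways` mapping way IDs to node-ID lists). The function names and the 10 m tolerance are illustrative choices, not part of any OSM tool. It compares endpoints against nodes only, not against the segments between them, and the brute-force search only suits small extracts.

```python
import math
from collections import Counter

def approx_dist_m(a, b):
    """Equirectangular approximation in metres between (lat, lon) points; fine at junction scale."""
    lat1, lon1 = a
    lat2, lon2 = b
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371000.0 * math.hypot(x, y)

def near_miss_endpoints(nodes, ways, tolerance_m=10.0):
    """Hypothetical helper over pre-extracted data:
    nodes: {node_id: (lat, lon)}; ways: {way_id: [node_id, ...]}.
    Returns (way_id, end_node, nearby_node) triples where a way ends at a node that
    no other way uses, yet a node of another way lies within tolerance_m.
    Brute force, O(endpoints x nodes)."""
    # Count how many distinct ways use each node.
    usage = Counter(n for nds in ways.values() for n in set(nds))
    hits = []
    for wid, nds in ways.items():
        if not nds:
            continue
        # dict.fromkeys dedupes closed ways (first == last) while keeping order.
        for end in dict.fromkeys((nds[0], nds[-1])):
            if usage[end] > 1:
                continue  # endpoint shared with another way: a proper junction
            for other_wid, other_nds in ways.items():
                if other_wid == wid:
                    continue
                for n in other_nds:
                    if approx_dist_m(nodes[end], nodes[n]) <= tolerance_m:
                        hits.append((wid, end, n))
    return hits
```

Each hit is only a candidate for review. A dead end that really does stop a few metres short of another road would be flagged too.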

Has anyone attempted to create statistics describing data coverage of the OpenStreetMap database?

If not, how would you go about it?

The best I can think of is to take a random sample of places (e.g. cities) and then attempt to calculate routes between them. This would rest on the assumption that major roads tend to be added before minor roads. A route between two distant cities would then use the logical major road, rather than a minor road (typically longer/slower) chosen only because the major road is missing.
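The sampling idea can be sketched by comparing each sampled route with the great-circle distance between its endpoints. A detour ratio well above what is typical for a region hints that a major road may be missing. In this sketch `route_km` is a placeholder for whatever locally hosted router you use: it should return kilometres, or `None` when no route is found. The function names, the fixed seed and the 6371 km spherical Earth radius are assumptions. What counts as a "suspicious" ratio is left to you.

```python
import math
import random

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, assuming a 6371 km sphere."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))

def sample_detour_ratios(cities, route_km, n_pairs=100, seed=0):
    """cities: {name: (lat, lon)}, e.g. taken from GeoNames.
    route_km(a, b): placeholder for your router; km, or None if no route.
    Returns (list of road / great-circle ratios, number of unroutable pairs)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    names = sorted(cities)
    ratios, unroutable = [], 0
    for _ in range(n_pairs):
        a, b = rng.sample(names, 2)
        crow = haversine_km(cities[a], cities[b])
        if crow == 0:
            continue  # co-located places give no useful ratio
        road = route_km(a, b)
        if road is None:
            unroutable += 1
        else:
            ratios.append(road / crow)
    return ratios, unroutable
```

Comparing the distribution of ratios (say, the median and the upper tail) between countries is probably more informative than any single pair.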

Another problem is that it is physically impossible to drive between many towns. Often this is due to islands (where ferries could be used), but often there is simply no surface route at all (e.g. settlements in Nunavut). So how would such statistics be used when comparing, say, Tonga and Afghanistan? Afghanistan probably has very low data coverage. Tonga's is probably better, but its settlements are spread across an archipelago.
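Island effects can at least be partly separated from missing data by measuring what share of city pairs land in the same connected component of the road graph. Run it once with roads only and once with ferry ways (tagged `route=ferry` in OSM) added. A low share that jumps when ferries are included suggests an archipelago. A low share that stays low suggests gaps, or genuinely unconnected settlements as in Nunavut. The sketch assumes each city has already been snapped to a nearest node ID; all names are hypothetical.

```python
from itertools import combinations

def component_roots(ways):
    """Union-find over ways ({way_id: [node_id, ...]}); returns {node_id: component root}."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for nds in ways.values():
        if not nds:
            continue
        find(nds[0])  # register single-node ways too
        for a, b in zip(nds, nds[1:]):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb  # consecutive nodes of a way are connected
    return {n: find(n) for n in list(parent)}

def reachable_share(city_nodes, ways):
    """city_nodes: {city: nearest node id} (snapping assumed done beforehand).
    Returns the share of city pairs whose nodes lie in the same component."""
    root = component_roots(ways)
    pairs = list(combinations(sorted(city_nodes), 2))
    if not pairs:
        return 0.0
    connected = sum(
        1 for a, b in pairs
        if city_nodes[a] in root and root[city_nodes[a]] == root.get(city_nodes[b])
    )
    return connected / len(pairs)
```

For example, `reachable_share(cities, roads)` versus `reachable_share(cities, {**roads, **ferries})` gives the two numbers to compare.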

Some details about my application: all start and end points are towns and cities, with locations taken from the GeoNames database. Typically I look at the 1000 largest cities in a country that also have a population of at least 1000. Routes are currently calculated twice, as both fastest and shortest routes. Reasonable road speeds vary by broad road category, and estimated travel times are computed alongside road distances. These details are preferences for consistency; they are not set in stone.
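The fastest/shortest pair can come from the same Dijkstra search by switching the edge cost between length and length divided by a category speed. In this sketch the category names and speeds in `SPEED_KMH` are made-up placeholders. Edges are treated as two-way, whereas real OSM data has `oneway` tags. The function returns both the distance and the travel time of whichever route it picks.

```python
import heapq

# Hypothetical speeds in km/h per broad road category: placeholders, not recommendations.
SPEED_KMH = {"motorway": 110, "primary": 80, "secondary": 60, "minor": 40}

def best_route(edges, start, goal, mode="fastest"):
    """Dijkstra over two-way edges given as (u, v, length_km, category).
    mode='fastest' minimises travel time; mode='shortest' minimises length.
    Returns (length_km, hours) of the chosen route, or None if goal is unreachable."""
    if mode not in ("fastest", "shortest"):
        raise ValueError("mode must be 'fastest' or 'shortest'")
    graph = {}
    for u, v, km, cat in edges:
        hours = km / SPEED_KMH[cat]
        graph.setdefault(u, []).append((v, km, hours))
        graph.setdefault(v, []).append((u, km, hours))
    # Heap entries: (cost so far, km so far, hours so far, node).
    heap = [(0.0, 0.0, 0.0, start)]
    settled = set()
    while heap:
        cost, km, hours, node = heapq.heappop(heap)
        if node == goal:
            return km, hours
        if node in settled:
            continue
        settled.add(node)
        for nxt, e_km, e_h in graph.get(node, []):
            if nxt not in settled:
                step = e_h if mode == "fastest" else e_km
                heapq.heappush(heap, (cost + step, km + e_km, hours + e_h, nxt))
    return None
```

With a 50 km minor road versus a 70 km motorway detour, `mode="shortest"` picks the minor road and `mode="fastest"` the motorway, which is exactly the kind of divergence that a missing major road would distort.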

Solution

There are a number of initiatives to describe the quality of OpenStreetMap, but they're all confined to specific areas. Muki Haklay has done extensive research into the data quality of OpenStreetMap; many of his quantitative results pertain to the UK. His blog is a prime resource if you're looking to learn about the quality of OpenStreetMap in general, which covers a lot more than just data completeness. Here's his assessment of the completeness of OpenStreetMap in the UK. A comparable study has recently been done for Germany (PDF).

The thing is, to measure completeness you need an accurate reference dataset to measure against. You could use Tele Atlas or NAVTEQ data for that, but that data is expensive and these companies don't readily give it out for research purposes. Government data may also be suitable, but it is not always available or, as is the case in the US for example, it can be hopelessly outdated and inaccurate. In fact, OpenStreetMap jump-started the US mapping effort with a huge import from TIGER, a dataset that was never intended for routing/navigation and whose topology is a mess. Volunteers are working hard to improve that data, but progress is slow.
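Given such a reference dataset, the crudest completeness measure is total road length per road class in OSM divided by the same total in the reference. The sketch assumes both datasets have been reduced to (road class, node list) ways with node coordinates, and that their road classes have already been mapped onto a common scheme. The function names are hypothetical. A ratio near 1 says nothing about topology or positional accuracy, only about how much road has been drawn.

```python
import math
from collections import defaultdict

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, assuming a 6371 km sphere."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))

def length_by_class(ways, nodes):
    """ways: iterable of (road_class, [node_id, ...]); nodes: {node_id: (lat, lon)}.
    Returns {road_class: total polyline length in km}."""
    totals = defaultdict(float)
    for road_class, nds in ways:
        totals[road_class] += sum(haversine_km(nodes[a], nodes[b])
                                  for a, b in zip(nds, nds[1:]))
    return dict(totals)

def completeness(osm_km, ref_km):
    """Length ratio OSM / reference per road class; classes with zero reference length are skipped."""
    return {c: osm_km.get(c, 0.0) / ref for c, ref in ref_km.items() if ref > 0}
```

Applied per country (or per region inside a country), this would give one rough ordering for processing, with the caveats above about what the reference itself is worth.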

If you want to try generating quantitative quality metrics yourself, your best bet is to engage with the OpenStreetMap community to learn about the data model and see how it fits with what you're trying to do. What constitutes 'routing data'? The roads and ferry routes themselves, obviously. Turn restrictions? Maximum speeds? Road quality? Grades? The OpenStreetMap help forum would probably be a good place to start. My guess is that, with a limited budget, you will need to make a lot of assumptions to attain worldwide coverage.
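As a starting point for the 'what constitutes routing data' question, you could measure how many routable ways carry each attribute tag. `highway`, `route=ferry`, `maxspeed`, `surface` and `incline` are real OSM tags. However, the particular set of `highway` values treated as routable here is an assumption. Turn restrictions are not covered, because OSM stores them as relations rather than as way tags.

```python
# Assumed subset of highway=* values counted as routable for cars; adjust to your needs.
ROUTABLE_HIGHWAYS = {
    "motorway", "trunk", "primary", "secondary", "tertiary", "unclassified",
    "residential", "motorway_link", "trunk_link", "primary_link",
    "secondary_link", "tertiary_link",
}
# Attribute keys whose presence is measured (maximum speed, road quality, grade).
ATTRIBUTE_KEYS = ("maxspeed", "surface", "incline")

def is_routable(tags):
    """tags: dict of OSM way tags."""
    return tags.get("highway") in ROUTABLE_HIGHWAYS or tags.get("route") == "ferry"

def attribute_coverage(ways):
    """ways: iterable of tag dicts. Returns {key: share of routable ways carrying that key}."""
    routable = [t for t in ways if is_routable(t)]
    if not routable:
        return {k: 0.0 for k in ATTRIBUTE_KEYS}
    return {k: sum(k in t for t in routable) / len(routable) for k in ATTRIBUTE_KEYS}
```

Presence of a tag is not the same as correctness, but a country where almost no roads carry `maxspeed` will force you back onto category-based speed assumptions.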

HTH

OTHER TIPS

You will probably get a better range of answers over at https://gis.stackexchange.com/questions

There's a nice project looking specifically at the connectivity of OpenStreetMap; for details, see these blog articles about OSM connectivity. It produces statistics on the number of "routing islands" and duplicate ways.

And this link shows islands/duplicates etc on a map visualisation.
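Duplicate ways in the simplest sense, two ways with exactly the same node sequence, are easy to find yourself. This sketch ignores direction and only catches exact duplicates. The connectivity project mentioned above may use a broader definition (for example overlapping geometry with different nodes), so treat this as an approximation. The `ways` dict format and the function name are assumptions.

```python
from collections import defaultdict

def duplicate_ways(ways):
    """ways: {way_id: [node_id, ...]} (hypothetical pre-extracted data).
    Groups ways whose node sequences are identical, ignoring direction.
    Returns a list of sorted way-id lists that have more than one member."""
    groups = defaultdict(list)
    for wid, nds in ways.items():
        # Canonical key: the smaller of the forward and reversed node sequences.
        key = min(tuple(nds), tuple(reversed(nds)))
        groups[key].append(wid)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

A closed way whose node list starts at a different node than its twin would not be caught; rotating closed ways to a canonical starting node would be the next refinement.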

Licensed under: CC-BY-SA with attribution
