Question

Is there an open data format for representing GIS data such as roads, localities, sublocalities, countries, buildings, etc.?

I expect that such a format would define an address structure and names for the components of an address.
What I need is a data format to return in response to reverse geocoding requests.
I looked for it on the Internet, but it seems that every geocoding provider defines its own format.

Should I design my own format?

Does my question make any sense at all? (I'm a newbie to GIS.) In case I have not made myself clear: I am not looking for data formats such as GeoJSON, GML or WKT, since they define geometry and don't define any address structure.

UPD. I'm experimenting with different geocoding services and trying to isolate them in a separate module. I need to provide one common interface for all of them, and I don't want to make up yet another data format (on the one hand I don't fully understand the domain, and on the other hand the field itself seems to be well studied). The module's responsibility is to take a partial address (or coordinates) like "96, Dubininskaya, Moscow" and return a data structure containing the house number (96), street name (Dubininskaya), sublocality (Danilovsky rn), city (Moscow), administrative area (Moskovskaya oblast) and country (Russia). The problem is that different countries may have more or fewer levels of division (more or fewer address components), and I need to unify these components across countries.

Solution

Nope, unfortunately there is not.

Why, you may ask?

Because different nations and countries have vastly different formats and requirements for storing addresses.

Here in the UK, for example, a postcode is defined by quite a complex set of rules, whereas ZIP codes in the US are 5-digit numbers (optionally followed by a 4-digit extension), usually written after a simple 2-letter state code.

Then you have to consider the question: what exactly constitutes an address? Again, this differs not just from country to country, but sometimes drastically within the same territory.

For example (here in the UK):

Smith and Sons Butchers
10 High Street
Some Town

Mr Smith
10 High Street
Some Town

The Occupier
10 High Street
Some Town

Smith and Sons Butchers
High Street
Some Town

All of these are valid addresses in the UK, and in every case the post would arrive at the correct destination. A GPS, however, may have trouble.

A GPS database might be set up so that each building is a square bit of geometry, with the ID being the house number.

That would give us the ability to say exactly where number 10 is, which immediately means the last lookup (the address with no house number) is going to fail.

Plots may instead be indexed by the name of the business. Again, that's fine until you start using personal names or generic titles.
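To make the failure concrete, here is a minimal sketch of such an index. The dictionary layout, the `locate` helper and the coordinates are all made up for illustration; a real GPS database would be far more involved.

```python
# Hypothetical index: each building is keyed by (street, house number).
# The coordinates are invented for illustration only.
buildings = {
    ("high street", "10"): (51.5, -0.1),
}


def locate(street, house_number=None):
    """Return the stored point for a building, or None if it cannot be found."""
    if house_number is None:
        # The index has no name-only entries, so there is nothing to look up.
        return None
    return buildings.get((street.lower(), house_number))


with_number = locate("High Street", "10")   # the first three address forms
without_number = locate("High Street")      # the fourth form, no house number
```

Here `with_number` resolves to a point while `without_number` is `None`, even though the postal service would deliver both.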

There's so much variation that it's simply not possible to create one unified format that encompasses every rule needed for any application on the planet to format any geocoded address correctly.

So how do we solve the problem?

Simple, by narrowing your scope.

  • Deal ONLY with the specific set of defined entities that you need to work with.
  • Hold only the information you need to describe what you need to describe. (Always remember YAGNI* here.)
  • Use standard data transmission formats such as JSON, XML and CSV. This reduces the work needed to get code you don't control to read your data output.

(* YAGNI = You ain't gonna need it)
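As a rough sketch of what a narrowed scope could look like for the module described in the question: one small, fixed structure plus a per-provider mapping. The `Address` fields, the `normalize` helper and the provider component names below are all assumptions made up for illustration, not any real geocoder's format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Address:
    """Hypothetical provider-neutral result; field names are illustrative."""
    house_number: Optional[str] = None
    street: Optional[str] = None
    sublocality: Optional[str] = None
    city: Optional[str] = None
    admin_area: Optional[str] = None
    country: Optional[str] = None
    # Components that do not fit the fixed fields (extra admin levels, etc.).
    extra: dict = field(default_factory=dict)


def normalize(components: dict, mapping: dict) -> Address:
    """Map one provider's component names onto the common Address fields."""
    known = {}
    extra = {}
    for name, value in components.items():
        target = mapping.get(name)
        if target is None:
            extra[name] = value  # keep unmapped data rather than drop it
        else:
            known[target] = value
    return Address(extra=extra, **known)


# Made-up component names for an imaginary "provider A".
PROVIDER_A_MAPPING = {
    "number": "house_number",
    "road": "street",
    "district": "sublocality",
    "town": "city",
    "region": "admin_area",
    "country": "country",
}

result = normalize(
    {"number": "96", "road": "Dubininskaya", "town": "Moscow",
     "country": "Russia", "plot": "A-1"},
    PROVIDER_A_MAPPING,
)
```

Every field is optional because countries differ in how many levels they use, and anything a provider returns that has no home in the fixed fields lands in `extra`. The structure serializes to JSON easily via `dataclasses.asdict`.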

Now, to dig in deeper:

When it comes to actual GIS data, there are a lot of standard file formats. The three most common are:

  • Esri shapefiles (*.shp)
  • Keyhole Markup Language (*.kml)
  • Comma-separated values (*.csv)

All of the mainstay GIS packages, free and paid, can work with any of these three file types, and many more.

Shapefiles are by far the most common ones you're going to come across; just about every bit of geospatial data I've come across in my years in I.T. has been in a shapefile. I would, however, NOT recommend storing your data in them for processing: they are quite a complex format, and often slow and sequential to access.

If your geometry files are to be consumed in other systems, however, you can't go wrong with them.

They have the added bonus that you can attach attributes to each item of data, such as address details, names, etc.

The problem is that there is no standard for what to call the attribute columns or what to include. Probably more drastically, the column names are conventionally UPPERCASE and limited to 10 characters in length by the underlying dBASE (.dbf) attribute table.

KML files are another format that's quite universally recognized, and because they're XML-based and used by Google, you can include a lot of extra data in them that is technically self-describing to the machine reading it.

Unfortunately, file sizes can be incredibly bulky even for a handful of simple geometries. The trade-off, though, is that they are pretty easy to handle in just about any programming language on the planet.
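As an illustration of both points (extra data travelling with the geometry, and easy handling), here is a small sketch that reads a KML placemark with Python's standard library. The sample document and the `Data` names (`house_number`, `street`) are made up; KML's `ExtendedData` element lets you attach such name/value pairs, but it does not standardize what they are called.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Minimal made-up document; the Data names are an assumption, not a standard.
KML_TEXT = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Smith and Sons Butchers</name>
    <ExtendedData>
      <Data name="house_number"><value>10</value></Data>
      <Data name="street"><value>High Street</value></Data>
    </ExtendedData>
    <Point><coordinates>-0.1,51.5,0</coordinates></Point>
  </Placemark>
</kml>"""


def read_placemarks(kml_text):
    """Return name, position and attached attributes for each point placemark."""
    root = ET.fromstring(kml_text)
    results = []
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        attrs = {
            data.get("name"): data.findtext("kml:value", default="", namespaces=KML_NS)
            for data in pm.iterfind("kml:ExtendedData/kml:Data", KML_NS)
        }
        coords = pm.findtext("kml:Point/kml:coordinates", default="", namespaces=KML_NS)
        # KML stores coordinates as lon,lat[,alt].
        lon, lat = (float(part) for part in coords.strip().split(",")[:2])
        results.append({
            "name": pm.findtext("kml:name", default="", namespaces=KML_NS),
            "lon": lon,
            "lat": lat,
            "attrs": attrs,
        })
    return results


placemarks = read_placemarks(KML_TEXT)
```

Note how much markup is needed for a single point with two attributes, which is where the bulk comes from.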

And that brings us to the humble CSV.

The mainstay of data transfer (not just geospatial) ever since time began.

If you can put your data in a database table or a spreadsheet, then you can put it in a CSV file.

Again, there are no standards other than how columns may or may not be quoted and what the separators are, so readers have to know ahead of time what each column represents.

Also, there's no "pre-made" geographic storage element (in fact there are no data types at all), so your reading application will also need to know ahead of time what the column data types are meant to be so it can parse them appropriately.
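In practice that means the reader carries its own schema. A minimal sketch, assuming a made-up column layout (`house_number`, `street`, `lat`, `lon`) agreed between writer and reader in advance:

```python
import csv
import io

# Assumed column layout: the reader must know this in advance,
# because the file itself carries no type information.
SCHEMA = {"house_number": str, "street": str, "lat": float, "lon": float}

# Made-up sample data.
CSV_TEXT = 'house_number,street,lat,lon\n10,"High Street",51.5,-0.1\n'


def read_typed(text, schema):
    """Parse CSV text, converting each column with the type the schema names."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({name: convert(raw[name]) for name, convert in schema.items()})
    return rows


rows = read_typed(CSV_TEXT, SCHEMA)
```

The house number is deliberately kept as a string, since values like "10A" are common; that is exactly the kind of decision the file cannot make for you.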

On the plus side, however, EVERYTHING can read them; whether they can make sense of them is a different story.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow