Question

If you have geographic data stored in ESRI shapefiles, you have at least three files: one ending with .shp containing the vector data, one ending with .dbf containing the attributes, and one ending with .shx containing an index.

I'm interested in the .shx file. How does it work? Does it contain a complete mapping, like 'first geometry maps to third row in the dbf and second geometry maps to the first row' for every geometry? Or does it work differently?


Solution

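According to the spec, the .shx file contains a 100-byte header followed by a sequence of 8-byte records. Each record stores a 4-byte offset and a 4-byte content length for the corresponding record in the main .shp data file. It does not map geometries to .dbf rows: the n-th index record describes the n-th shape, and the n-th shape belongs to the n-th row of the .dbf, so the link to the attributes is purely by record order.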

+-----------------------------------------------+
| header (100 bytes)                            |
+-----------------+------------------+----------+
| offset(4 bytes) | length (4 bytes) | 
+-----------------+------------------+
| offset(4 bytes) | length (4 bytes) | 
+-----------------+------------------+
| offset(4 bytes) | length (4 bytes) | 
+-----------------+------------------+
| offset(4 bytes) | length (4 bytes) | 
+-----------------+------------------+
| ....                               | 
+-----------------+------------------+

Note that the offset is specified in 16-bit words, so the offset for the first record is 50 (the .shp header is 100 bytes, or 50 words, long). The content length is also specified in 16-bit words, so multiply either value by 2 to get bytes.

So you can work out the number of records from (index_file_length - 100) / 8, and use the index to access a particular shape record in the .shp file, either at random or in sequence.
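The layout above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function names `read_shx_index` and `read_shape_content` are made up for this example, the two integers are decoded as big-endian (which is how the spec defines them, though the answer above does not mention it), and the header is skipped without any validation.

```python
import struct

SHX_HEADER_BYTES = 100  # fixed-size header described above
SHX_RECORD_BYTES = 8    # 4-byte offset + 4-byte content length
SHP_RECORD_HEADER_BYTES = 8  # record number + content length in the .shp


def read_shx_index(shx_path):
    """Return a list of (byte_offset, content_bytes) tuples, one per shape.

    Hypothetical helper: values are stored in 16-bit words, so both
    are multiplied by 2 to convert them to bytes.
    """
    with open(shx_path, "rb") as f:
        data = f.read()
    count = (len(data) - SHX_HEADER_BYTES) // SHX_RECORD_BYTES
    entries = []
    for i in range(count):
        start = SHX_HEADER_BYTES + i * SHX_RECORD_BYTES
        # Assumption: big-endian signed 32-bit integers, per the spec.
        offset_words, length_words = struct.unpack_from(">ii", data, start)
        entries.append((offset_words * 2, length_words * 2))
    return entries


def read_shape_content(shp_path, entry):
    """Return the raw content bytes of one .shp record, given an index entry."""
    byte_offset, content_bytes = entry
    with open(shp_path, "rb") as f:
        # The offset points at the record header; the content follows it.
        f.seek(byte_offset + SHP_RECORD_HEADER_BYTES)
        return f.read(content_bytes)
```

With this, `read_shx_index("roads.shx")[n]` (the file name is just a placeholder) gives the position of the n-th shape, and the same n selects the matching row in the .dbf.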

OTHER TIPS

Fine answer by Paul Dixon.

I was wondering what you are going to do with it, though. If you're going to write code to read or write SHP files, I would strongly suggest using a library instead. There are some good free, open-source ones such as GDAL, as well as some good commercial ones.

Licensed under: CC-BY-SA with attribution