Question

I am trying to understand the file format of a Visual FoxPro compact index (*.IDX). I am currently referring to Microsoft's documentation for guidance.

The index is a B-tree of 512-byte nodes. Each leaf ("exterior") node contains multiple entries. Each entry consists of four pieces of data:

  • Row number [FIXED LENGTH]
  • Duplicate byte count (documentation doesn't explain this) [FIXED LENGTH]
  • Trailing byte count (documentation doesn't explain this) [FIXED LENGTH]
  • Key [VARIABLE LENGTH]

The entries (without their keys) are stored at the beginning of the node, immediately after the node's 24-byte header. Their keys are not included at this location because the keys vary in length, while the row number, duplicate byte count and trailing byte counts are fixed in length. The keys are stored at the end of the node and work their way backward. For example:

  • 24 byte header
  • row number, duplicate byte count, trailing byte count (entry #1)
  • row number, duplicate byte count, trailing byte count (entry #2)
  • row number, duplicate byte count, trailing byte count (entry #3)
  • ...
  • key (entry #3)
  • key (entry #2)
  • key (entry #1)

How do I determine the individual lengths of the keys? The documentation does not appear to specify this. They are perfectly contiguous (no null-byte separators).

I can isolate the keys manually by visual inspection. I suspected that the trailing byte count represented the length of the key. However, it did not correlate to the lengths determined by this inspection.

I believe that the FoxPro file formats are derived from the xBase standard. Perhaps this rings a bell?

Was it helpful?

Solution

After discovering the XBase::Index Perl module, I have determined that the keys in the exterior node are effectively the same length as the fixed-length keys found in the interior nodes, except any trailing spaces are removed. That is what the "trailing byte count" mentioned in the documentation refers to (how many trailing spaces were truncated off the end of the key). I still have not determined what the "duplicate byte count" is, but the module at least clarified its relationship:

variable_key_length = fixed_key_length - duplicate_byte_count - trailing_byte_count

For example, suppose the fixed key length for this index was 10 bytes. Now suppose that the key "DOG " was stored in an external node. Its duplicate byte count (according to what I have observed) will most likely be zero, while its trailing byte count will be 7 (the number of spaces truncated). Therefore, only the three bytes representing "DOG" would be stored.

OTHER TIPS

About duplicate byte count: this mean the number of first bytes, which are same in current key and in previous key. The first key entry stored at the end of the node has a full length, except trailing blanks; successive key entry has only symbols different from previous key entry.

In Xbase indexing rarely exceeds 10 characters or 15 (rare) when using indices (index discussing texts).

In any case if you know what is the number of keys proportionally divides the binary portion. When you make an algorithm that stores data, or store the data using: start or end markers or tabs, or do you leave a static size so you do not use left blank. The static format is less efficient but provides greater speed in reading and obviously generates more predictable structures.

Microsoft says this about the IDX file structure (and at the bottom of the page there are links to all others like the Compact Index format.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top