What accounts for a large single index size (other than bloat)?

https://dba.stackexchange.com/questions/254305

19-02-2021
Question

I've been doing some digging into the size of our production PostgreSQL 9.6 database and found some results I thought were surprising.

We have a table (let's call it foos) with about 10 million records. The primary key is an integer. We have a B-tree index on this table for an optional foreign key into another table (let's call it bars). Let's call the index index_foos_on_bar_id. bars.id is just an integer data type column.

When I look at the index size using the \di+ meta-command, I see that it occupies somewhere in the neighborhood of 1GB of space. Some back-of-the-envelope math implies that each entry in the index takes about 1GB / 10 million = 100 bytes of space per row.

There's almost no deletion taking place on the foos table, so the bloat is non-existent.

In my mind, an index would effectively contain something like sorted pairs of numbers mapping the indexed column to the primary key of the relevant table. However, since it's only integer types, that would only use about 4 + 4 = 8 bytes per row, which is way off from the 100 bytes per row that's actually occupied. I guess the fact that it's a tree structure could bump that up slightly, but the over 10x difference was a bit of an eyebrow-raiser for me.

What accounts for all of this "extra" space being used by the index?

Solution

"There's almost no deletion taking place on the foos table, so the bloat is non-existent."

You don't need deletions to get bloat. Updates will cause it as well. A freshly made index on 10,000,000 ints gives me a size of 214MB, so you do have bloat (or are using weird hardware, perhaps). You can't detect bloat just by thinking about it. Rebuild the index and see if it gets smaller, or use pgstattuple. Of course, a tightly packed index is not very efficient either, because nearly every insert then has to split a page.
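The gap between the two sizes can be put into per-row terms with some quick arithmetic. This is only a rough sketch: it assumes the 1GB figure means 10^9 bytes (as the question's own math does) and that the 214MB figure is in binary units, and the function name is made up for illustration.

```python
def bytes_per_entry(index_size_bytes, row_count):
    """Average number of index bytes consumed per indexed row."""
    return index_size_bytes / row_count

ROWS = 10_000_000

# Assumption: "about 1GB" is treated as 10**9 bytes, matching the question's math.
observed = bytes_per_entry(1_000_000_000, ROWS)

# Assumption: the 214MB of the freshly built index is 214 * 1024 * 1024 bytes.
fresh = bytes_per_entry(214 * 1024 * 1024, ROWS)

print(f"observed index: {observed:.1f} bytes per row")
print(f"fresh index:    {fresh:.1f} bytes per row")
print(f"ratio:          {observed / fresh:.1f}x")
```

A ratio well above 1 only says the existing index is larger than a fresh build would be. Rebuilding it or inspecting it with pgstattuple is still what confirms the cause.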

Your mental model of an index does not match how PostgreSQL does things.

An index does not point to a primary key value, it points directly into the table. This is called a "tid", and generally takes 6 bytes.

The data value itself is stored with 8-byte alignment, so even if your data is an int4, it still takes 8 bytes of disk space. Because of this alignment, you could build an index on two int4 columns and it would take the same space as an index on one int4 column.
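The rounding described here can be sketched in a few lines of Python. It assumes an 8-byte maximum alignment, as in the answer, and the helper name maxalign is chosen for illustration. It also ignores any padding between columns of different types, which does not arise when every column is an int4.

```python
MAXALIGN = 8  # assumption: 8-byte alignment, as described above
INT4 = 4      # size of an int4 value in bytes

def maxalign(size, align=MAXALIGN):
    """Round size up to the next multiple of align."""
    return (size + align - 1) // align * align

one_column = maxalign(1 * INT4)    # 4 bytes of data, padded to 8
two_columns = maxalign(2 * INT4)   # 8 bytes of data, no padding needed
three_columns = maxalign(3 * INT4) # 12 bytes of data, padded to 16

print(one_column, two_columns, three_columns)
```

The second int4 simply occupies the bytes that would otherwise be padding, which is why the two-column index costs nothing extra per entry.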

PostgreSQL is always prepared to handle NULL values on each row, as well as variable length data, even if the definition of the table/index precludes such things. So each index tuple has a header which tells us whether that row has any NULL or variable length data, as well as the overall length of that tuple. This takes up some space.

PostgreSQL uses page-organized storage, not just a huge malloc chunk. Each page contains some more overhead, both per-block and per-entry within the block (see typedef struct ItemIdData).

So you get 8 bytes for the data itself (bar_id, aligned), 8 bytes for the tuple header, which includes the tid pointer, and 4 bytes for the line pointer within each block. That comes to 20 bytes. Add some overhead for non-leaf pages, per-block overhead, and space within blocks that is unusable because of alignment or because it sits at the ragged end of the data, and I get an actual value of 22.4 bytes per entry. The remaining factor of about 4.5 would be bloat (although some of the "bloat" may be useful free space, unless your index is going to be absolutely static from now on).
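The 20-byte figure and the rough size of a freshly built index can be reproduced with a small estimate. Treat this as a sketch under stated assumptions rather than an exact model: it assumes the default 8192-byte block size, a 24-byte page header, 16 bytes of B-tree bookkeeping at the end of each page, and the default B-tree leaf fillfactor of 90. It counts leaf pages only and ignores the high key on each page, the non-leaf pages, and the metapage.

```python
# Per-entry cost, as broken down above.
DATA_BYTES = 8          # int4 key padded to 8-byte alignment
TUPLE_HEADER_BYTES = 8  # index tuple header, including the 6-byte tid
LINE_POINTER_BYTES = 4  # ItemIdData slot in the page's line pointer array
entry_bytes = DATA_BYTES + TUPLE_HEADER_BYTES + LINE_POINTER_BYTES

# Per-page assumptions (default build settings, not measured here).
PAGE_SIZE = 8192
PAGE_HEADER_BYTES = 24
BTREE_SPECIAL_BYTES = 16
FILLFACTOR = 90         # percent of each leaf page filled at build time

ROWS = 10_000_000

usable = PAGE_SIZE - PAGE_HEADER_BYTES - BTREE_SPECIAL_BYTES
entries_per_page = (usable * FILLFACTOR // 100) // entry_bytes
leaf_pages = -(-ROWS // entries_per_page)  # ceiling division
leaf_bytes = leaf_pages * PAGE_SIZE

print(f"bytes per entry before page overhead: {entry_bytes}")
print(f"entries per leaf page: {entries_per_page}")
print(f"estimated leaf size: {leaf_bytes / (1024 * 1024):.0f} MiB")
print(f"effective bytes per row: {leaf_bytes / ROWS:.1f}")
```

Under these assumptions the estimate lands close to the measured 214MB and 22.4 bytes per entry, which suggests that the free space left by the fillfactor accounts for most of the difference between 20 and 22.4 bytes.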

Licensed under: CC-BY-SA with attribution