What is the impact of a large number of boolean columns in a table on storage and index performance in PostgreSQL?

dba.stackexchange https://dba.stackexchange.com/questions/268646

  •  03-03-2021

Question

I have a lot of boolean attributes for entities in a table and need to store even more of them. Currently I have 17 such attributes (the other fields are four bigint columns), and I need to add about a dozen or so more.

Should I just go ahead and create a boolean column for each of the required attributes, or are there better ways? How would that impact storage?

These fields will be searched on in multiple combinations (so multi-column indexes are not going to help).

A (premature) thought about saving storage: how about I use a 64-bit integer and treat each bit as a boolean value? I can use the bit operators for matching, but will such an operation use indexes, given that the scan is not based on ordering or equality of the integer?
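
For concreteness, a minimal sketch of what I mean; the table and column names are made up for illustration, and bit 0 would stand for the first attribute, bit 2 for the third, and so on:

    -- Illustrative only: pack each boolean attribute into one bit of a bigint.
    CREATE TABLE entity (
        id    bigint PRIMARY KEY,
        ref_a bigint,
        ref_b bigint,
        ref_c bigint,
        flags bigint NOT NULL DEFAULT 0   -- bit 0 = attr_1, bit 1 = attr_2, ...
    );

    -- "attr_1 AND attr_3" becomes a mask test (1 = bit 0, 4 = bit 2):
    SELECT id
    FROM   entity
    WHERE  (flags & (1 | 4)) = (1 | 4);

    -- My worry: a plain B-tree index on flags presumably cannot serve this
    -- predicate, since it is neither an equality nor a range test on flags.
    -- A partial (or expression) index could cover one specific combination,
    -- but that would need one index per combination:
    CREATE INDEX entity_attr1_attr3_idx ON entity (id)
        WHERE (flags & (1 | 4)) = (1 | 4);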


Solution

I wouldn't bother.

  1. Postgres has a record overhead of at least 23 bytes per record -- and that's not counting a bitmap to keep track of NULLs, or other block overheads which combined probably average another couple of bytes per record. See https://stackoverflow.com/a/26206031/391445
  2. The single bigint you propose takes 8 bytes of storage, versus 17 bytes for 17 boolean columns (each boolean occupies 1 byte).
  3. Another option is to use bit strings, but these have an overhead of 5 or 8 bytes, so storing your 17 booleans takes 8 bytes anyway (5 bytes of overhead plus 3 bytes for the 17 bits).
  4. Your 4 other bigint fields take another 32 bytes of storage. (A quick way to check these sizes on real rows is sketched after this list.)
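
If you want to put numbers on this for your own schema, pg_column_size() applied to a whole row reports the tuple size. A rough sketch with made-up layouts follows; the exact figures will vary with column order, alignment padding and NULLs:

    -- Two throwaway layouts, just to compare per-row sizes.
    CREATE TEMP TABLE with_booleans (
        a bigint, b bigint, c bigint, d bigint,
        f01 boolean, f02 boolean, f03 boolean, f04 boolean, f05 boolean,
        f06 boolean, f07 boolean, f08 boolean, f09 boolean, f10 boolean,
        f11 boolean, f12 boolean, f13 boolean, f14 boolean, f15 boolean,
        f16 boolean, f17 boolean
    );

    CREATE TEMP TABLE with_bitmask (
        a bigint, b bigint, c bigint, d bigint,
        flags bigint
    );

    INSERT INTO with_booleans VALUES (1, 2, 3, 4,
        true, false, true, false, true, false, true, false, true,
        false, true, false, true, false, true, false, true);
    INSERT INTO with_bitmask VALUES (1, 2, 3, 4, 87381);  -- same pattern packed into bits 0..16

    -- pg_column_size(t.*) measures the whole row; page and item-pointer
    -- overheads are not included, so treat the numbers as approximate.
    SELECT pg_column_size(t.*) AS row_bytes FROM with_booleans t;
    SELECT pg_column_size(t.*) AS row_bytes FROM with_bitmask t;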

So expect to save only around 10% (roughly 9 bytes out of something like 75-80 bytes per row) -- less if you have more non-boolean columns, but only slightly more as you add more boolean columns -- at the expense of making your code and data much more difficult to understand.

But if you have a lot of data (hundreds of millions of records), I would run some performance tests to compare; a rough outline is below.
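
A possible shape for such a test, reusing the made-up layouts from the sketch above; the row counts, value distributions and predicates are placeholders to adjust to your data:

    -- Synthetic rows; scale the count up toward production volume.
    INSERT INTO with_bitmask
    SELECT g, g, g, g, (random() * 131071)::bigint   -- random 17-bit pattern
    FROM   generate_series(1, 1000000) AS g;

    INSERT INTO with_booleans (a, b, c, d, f01, f03)
    SELECT g, g, g, g, (g % 2 = 0), (g % 5 = 0)
    FROM   generate_series(1, 1000000) AS g;

    -- Compare plans and timings for a representative combination of attributes.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM with_bitmask  WHERE (flags & (1 | 4)) = (1 | 4);

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM with_booleans WHERE f01 AND f03;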

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange