How to determine or predict space on disk per table row?
02-03-2021
Question
I'm very new to Postgres so my math could be off here ...
This is my table:
CREATE TABLE audit (
    id BIGSERIAL PRIMARY KEY,
    content_id VARCHAR (50) NULL,
    type VARCHAR (100) NOT NULL,
    size bigint NOT NULL,
    timestamp1 timestamp NOT NULL DEFAULT NOW(),
    timestamp2 timestamp NOT NULL DEFAULT NOW()
);
I want to estimate how much space 1 row would occupy. After some reading I've come up with this. Is it correct?
1 row =
23 (heaptupleheader)
+ 1 (padding)
+ 8 (id)
+ 50 (content_id)
+ 6 (padding)
+ 100 (type)
+ 4 (padding)
+ 8 (size)
+ 8 (timestamp)
+ 8 (timestamp)
= 216 bytes
I also created this same table in my local Postgres DB but the numbers don't seem to match:
INSERT INTO public.audit(content_id, type, size)
VALUES ('aaa', 'bbb', 100);

SELECT pg_size_pretty( pg_total_relation_size('audit') ); -- returns 24 kB

INSERT INTO public.audit(content_id, type, size)
VALUES ('aaaaaaaaaaaaa', 'bbbbbbbbbbbbbb', 100000000000);

SELECT pg_size_pretty( pg_total_relation_size('audit') ); -- still returns 24 kB
Which brings me to think that Postgres reserves a space of 24 kb to start with and as I put in more data it will get incremented by 132 bytes once I go beyond 24 kb? But something inside me says that can't be right.
I want to see how much space 1 row would occupy in Postgres db so I can analyze how much data I can potentially store in it. Maybe I'm missing something very obvious.
Solution
Your calculation is close for the maximum bare row size. The actual range is 68 - 212 bytes per row, or 84 - 228 bytes including the index.
Most importantly, a varchar(n) does not have to occupy the maximum length. The data type is implemented with varlena internally, which adds 1 byte of overhead for short strings (up to 126 bytes) on disk, plus the actual number of bytes for the string.
Data types implemented with varlena don't require alignment padding on disk while they use this short 1-byte header.
And a NULL value (allowed for content_id) is effectively free: it only sets a bit in the null bitmap, which here fits into the byte that would otherwise be padding after the tuple header.
Finally, the PRIMARY KEY constraint is implemented using a standard btree index. We have to add that to the total size on disk.
So the calculation for the row size on disk is:
      23 bytes   tuple header
       1 byte    padding or null bitmap
       8 bytes   id BIGSERIAL PRIMARY KEY
  0 - 51 bytes   content_id VARCHAR (50) NULL
       0 bytes   alignment padding
 2 - 101 bytes   type VARCHAR (100) NOT NULL
   0 - 7 bytes   alignment padding
       8 bytes   size bigint NOT NULL
       8 bytes   timestamp1 timestamp NOT NULL
       8 bytes   timestamp2 timestamp NOT NULL
---
min 64 (incl 6 bytes padding) - max 208 bytes (no padding required)
+ 4 bytes item identifier in heap page
---
68 - 212 bytes
+ 16 bytes for the index tuple
---
84 - 228 bytes
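The arithmetic above can be sketched as a small Python helper, in case you want to try other string lengths. This is only an illustration, not how Postgres computes anything: the function names are made up for this example, and it assumes 8-byte alignment (typical 64-bit builds), a UTF-8 database encoding, and strings of at most 126 bytes so that the 1-byte varlena header applies.

```python
TUPLE_HEADER = 23        # fixed part of the heap tuple header
MAXALIGN = 8             # assumed alignment on a typical 64-bit build
ITEM_ID = 4              # item identifier in the heap page
SHORT_VARLENA_MAX = 126  # longest payload that still gets the 1-byte header


def align(offset, boundary=MAXALIGN):
    """Round offset up to the next multiple of boundary."""
    return -(-offset // boundary) * boundary


def varchar_bytes(value):
    """Size of a short varchar on disk: 1-byte header plus payload, 0 for NULL."""
    if value is None:
        return 0
    payload = len(value.encode("utf-8"))  # assumes a UTF-8 database
    if payload > SHORT_VARLENA_MAX:
        raise ValueError("this sketch only covers the 1-byte varlena header")
    return 1 + payload


def audit_row_bytes(content_id, type_):
    """Estimate heap bytes for one audit row, including the item identifier."""
    if type_ is None:
        raise ValueError("type is declared NOT NULL")
    null_bitmap = 1 if content_id is None else 0
    offset = align(TUPLE_HEADER + null_bitmap)  # user data starts at byte 24
    offset += 8                                 # id bigint
    offset += varchar_bytes(content_id)         # short varlena: no padding
    offset += varchar_bytes(type_)
    offset = align(offset)                      # size bigint is 8-byte aligned
    offset += 3 * 8                             # size, timestamp1, timestamp2
    return align(offset) + ITEM_ID


print(audit_row_bytes(None, "b"))             # 68  (minimum)
print(audit_row_bytes("a" * 50, "b" * 100))   # 212 (maximum)
print(audit_row_bytes("aaa", "bbb"))          # 68  (first row from the question)
```

The 16 bytes for the index tuple are not included here; add them per row if you want the figure including the primary key index.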
Plus some overhead for heap pages and index pages as detailed in these related questions:
- Calculating and saving space in PostgreSQL
- Measure the size of a PostgreSQL table row
- Configuring PostgreSQL for read performance
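To turn a per-row figure into a rough capacity estimate, here is a minimal sketch of the page arithmetic. It assumes the default 8 kB block size, the 24-byte heap page header, and the default table fillfactor of 100. It ignores dead tuples, the free space map, the visibility map and the index, so treat the result as a lower bound rather than a prediction. The function names are hypothetical.

```python
PAGE_SIZE = 8192   # default block size in bytes
PAGE_HEADER = 24   # fixed header at the start of every heap page


def rows_per_page(row_bytes):
    """Rows that fit in one heap page; row_bytes includes the 4-byte item identifier."""
    return (PAGE_SIZE - PAGE_HEADER) // row_bytes


def heap_bytes(row_count, row_bytes):
    """Heap size for row_count rows, rounded up to whole pages."""
    per_page = rows_per_page(row_bytes)
    pages = -(-row_count // per_page)  # ceiling division
    return pages * PAGE_SIZE


print(rows_per_page(68))           # 120 rows per page at the minimum row size
print(rows_per_page(212))          # 38 rows per page at the maximum row size
print(heap_bytes(1_000_000, 68))   # 68272128 bytes, roughly 65 MB of heap
```

Because space is allocated in whole pages, a single row already costs one full 8 kB heap page.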
Actual row sizes on disk (without item identifier):
SELECT pg_column_size(a) AS size_on_disk, *
FROM audit a;
Note that size in RAM can differ:
SELECT pg_column_size(content_id) AS content_id_size_on_disk
, pg_column_size(content_id || '') AS content_id_size_in_ram
, content_id
FROM audit a;
About ...
"Which brings me to think that Postgres reserves a space of 24 kb to start"
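The heap starts with 0 bytes. The index starts with one meta-page (8 kB). After adding the first row, we see a minimum of 8 kB for the first heap page and 16 kB for the index (1 meta page + 1st index page). pg_total_relation_size() counts the table together with its indexes, which gives the 24 kB you observed. The total stays at 24 kB until the first heap page is full, because space is allocated in whole 8 kB pages rather than row by row. To see the parts separately, compare pg_relation_size('audit') with pg_indexes_size('audit').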
OTHER TIPS
The PostgreSQL documentation and the following StackOverflow question/answer provide a good starting point.