How to determine or predict space on disk per table row?

https://dba.stackexchange.com/questions/267413

02-03-2021

Question

I'm very new to Postgres so my math could be off here ...

This is my table:

CREATE TABLE audit (
    id BIGSERIAL PRIMARY KEY,
    content_id VARCHAR (50) NULL, 
    type VARCHAR (100) NOT NULL, 
    size bigint NOT NULL, 
    timestamp1 timestamp NOT NULL DEFAULT NOW(), 
    timestamp2 timestamp NOT NULL DEFAULT NOW()
);

I want to estimate how much space 1 row would occupy. After some reading I've come up with this. Is it correct?

1 row =
23 (heaptupleheader)
+ 1 (padding)
+ 8 (id)
+ 50 (content_id)
+ 6 (padding)
+ 100 (type)
+ 4 (padding)
+ 8 (size)
+ 8 (timestamp)
+ 8 (timestamp)
= 216 bytes

I also created this same table in my local Postgres DB but the numbers don't seem to match:

INSERT INTO public.audit(content_id, type, size)
    VALUES ('aaa', 'bbb', 100);

SELECT pg_size_pretty( pg_total_relation_size('audit') );  -- returns 24 kB

INSERT INTO public.audit(content_id, type, size)
    VALUES ('aaaaaaaaaaaaa', 'bbbbbbbbbbbbbb', 100000000000);

SELECT pg_size_pretty( pg_total_relation_size('audit') ); -- still returns 24 kB

Which brings me to think that Postgres reserves a space of 24 kb to start with and as I put in more data it will get incremented by 132 bytes once I go beyond 24 kb? But something inside me says that can't be right.

I want to see how much space 1 row would occupy in Postgres db so I can analyze how much data I can potentially store in it. Maybe I'm missing something very obvious.

Solution

Your calculation is close for the maximum bare row size. The actual range is 68 - 212 bytes per row, or 84 - 228 bytes including the index.

Most importantly, a varchar(n) does not have to occupy the maximum length. The data type is implemented with varlena internally, which adds 1 byte of overhead on disk for short strings (up to 126 bytes), plus the actual number of bytes for the string.

Values stored with this 1-byte varlena header don't require alignment padding on disk.

And a NULL value (allowed for content_id) is effectively free. See:

Does not using NULL in PostgreSQL still use a NULL bitmap in the header?

Finally, the PRIMARY KEY constraint is implemented with a standard btree index, so we have to add the index entry to the total size on disk.

So the calculation for the row size on disk is:

23        bytes  tuple header
 1        byte   padding or null bitmap
 8        bytes  id BIGSERIAL PRIMARY KEY
 0 - 51   bytes  content_id VARCHAR (50) NULL
 0        bytes  alignment padding
 2 - 101  bytes  type VARCHAR (100) NOT NULL
 0 - 7    bytes  alignment padding (so that size starts at a multiple of 8)
 8        bytes  size bigint NOT NULL
 8        bytes  timestamp1 timestamp NOT NULL 
 8        bytes  timestamp2 timestamp NOT NULL
---
min 64 (incl 6 bytes padding) - max 208 bytes (no padding required)

+ 4      bytes item identifier in heap page
---
68 - 212 bytes

+ 16     bytes for the index tuple
---
84 - 228 bytes
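
The arithmetic above can be checked with a small query that needs no table. This is only a sketch under the same assumptions as the calculation (single-byte characters, 1-byte varlena headers); content_id_bytes and type_bytes are hypothetical input names, and the 24-byte heap page header used for the last column is an added assumption that ignores fillfactor and any other per-page detail.

```sql
-- Sketch: reproduce the per-row estimate with plain integer arithmetic.
-- content_id_bytes / type_bytes are hypothetical inputs: the stored size of
-- each varchar value including its 1-byte header (0 for a NULL content_id).
SELECT content_id_bytes,
       type_bytes,
       tuple_bytes,                                        -- aligned heap tuple
       tuple_bytes + 4                  AS row_bytes,      -- plus item identifier
       tuple_bytes + 4 + 16             AS row_bytes_with_index,
       (8192 - 24) / (tuple_bytes + 4)  AS max_rows_per_heap_page
FROM  (VALUES (0, 2), (51, 101)) AS v(content_id_bytes, type_bytes)
CROSS  JOIN LATERAL (
   -- 24 = tuple header + null bitmap / padding, 8 = id.
   -- Round up to the next multiple of 8 (integer division), then add
   -- 3 x 8 bytes for size, timestamp1 and timestamp2.
   SELECT (24 + 8 + content_id_bytes + type_bytes + 7) / 8 * 8 + 24 AS tuple_bytes
) t;
```

The two input rows are the minimum and maximum cases, so the result should show 64 / 68 / 84 bytes and 208 / 212 / 228 bytes, with roughly 120 and 38 rows fitting on one 8 kB heap page respectively.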

Plus some per-page overhead for heap pages and index pages.

Actual row sizes on disk (without item identifier):

SELECT pg_column_size(a) AS size_on_disk, *
FROM   audit a;
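
Once the table holds realistic data, the same function can be aggregated to compare measured sizes with the estimate. A minimal sketch, assuming the audit table from the question; the column aliases are made up for illustration:

```sql
-- Sketch: summarize measured tuple sizes (item identifier not included).
SELECT count(*)                       AS row_count,
       min(pg_column_size(a))         AS min_tuple_bytes,
       max(pg_column_size(a))         AS max_tuple_bytes,
       round(avg(pg_column_size(a)))  AS avg_tuple_bytes
FROM   audit a;
```

Add 4 bytes per row for the item identifier and 16 bytes for the index tuple to get figures comparable to the ranges above.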

Note that size in RAM can differ:

SELECT pg_column_size(content_id) AS content_id_size_on_disk
     , pg_column_size(content_id || '') AS content_id_size_in_ram
     , content_id
FROM   audit a;

See:

What is the overhead for varchar(n)?

About ...

"Which brings me to think that Postgres reserves a space of 24 kb to start"

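The heap starts with 0 bytes. The index starts with one meta page (8 kB). After adding the first row, we see a minimum of 8 kB for the first heap page and 16 kB for the index (1 meta page + 1st index page), which adds up to the 24 kB you observed. Storage is allocated in whole 8 kB pages, not per row, so the total stays at 24 kB until the first heap page or index page is full. Details in the fiddle!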

db<>fiddle here
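
To see the split between heap and index on your own instance, the total can be broken down with the built-in size functions. A minimal sketch, assuming the audit table from the question and the default block size; the column aliases are made up for illustration:

```sql
-- Sketch: break the total relation size down into heap and indexes.
SELECT pg_relation_size('audit')        AS heap_bytes,   -- main fork of the table
       pg_indexes_size('audit')         AS index_bytes,  -- all indexes, here only the PK
       pg_total_relation_size('audit')  AS total_bytes,
       pg_relation_size('audit')
          / current_setting('block_size')::int AS heap_pages;
```

Going by the figures above, after the first insert this should report one heap page (8192 bytes) and 16384 bytes for the index. Running it again after bulk-loading test rows shows the sizes growing one page at a time.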

OTHER TIPS

The PostgreSQL documentation and the following StackOverflow question/answer provide a good starting point.

Licensed under: CC-BY-SA with attribution
