Question

I have a table that contains two text fields which hold a lot of text. For some reason our table has started growing exponentially. I suspect that TOAST (the mechanism PostgreSQL uses to compress large text fields and store them out of line) is not kicking in automatically. Our table definition does not include any clause to force compression of these fields. Is there a way to check whether compression is working on that table or not?


Solution

From the docs:

If any of the columns of a table are TOAST-able, the table will have an associated TOAST table, whose OID is stored in the table's pg_class.reltoastrelid entry. Out-of-line TOASTed values are kept in the TOAST table, as described in more detail below.

So you can determine whether a TOAST table exists by querying the pg_class system catalog. This should get you close to what you're looking for.

select t1.oid, t1.relname, t1.relkind, t2.relkind, t2.relpages, t2.reltuples
from pg_class t1
inner join pg_class t2
on t1.reltoastrelid = t2.oid
where t1.relkind = 'r'
  and t2.relkind = 't';
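
To see whether out-of-line storage is actually being used, you can also check how much space the TOAST table itself occupies. A minimal sketch, assuming your table is named my_table (substitute your own name):

-- compare the main table's size to its TOAST table's size
select c.relname,
       pg_size_pretty(pg_relation_size(c.oid)) as table_size,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) as toast_size
from pg_class c
where c.relname = 'my_table'
  and c.reltoastrelid <> 0;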

In psql, you can use \d+. I'll use the pg_class system catalog as an example; you'd use your own table name.

sandbox=# \d+ pg_class
     Column     |   Type    | Modifiers | Storage  | Stats target | Description 
----------------+-----------+-----------+----------+--------------+-------------
 relname        | name      | not null  | plain    |              | 
 relnamespace   | oid       | not null  | plain    |              | 
 [snip]
 relacl         | aclitem[] |           | extended |              | 
 reloptions     | text[]    |           | extended |              | 

Where Storage is 'extended', PostgreSQL will try to reduce row size by compressing first, then by storing data out of line. Where Storage is 'main' (not shown), PostgreSQL will compress, but will move the value out of line only as a last resort.
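
If a column has ended up with the wrong strategy, you can change it with ALTER TABLE. A minimal sketch, assuming a table my_table with a text column body (hypothetical names); note this affects only values stored after the change, not existing rows:

-- EXTENDED allows compression first, then out-of-line TOAST storage
alter table my_table alter column body set storage extended;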

In your particular case, you might find it useful to monitor changes in size over time. You can use this query, and save the results for later analysis.

select table_catalog, table_schema, table_name,
       pg_total_relation_size(table_catalog || '.' || table_schema || '.' || table_name) as pg_total_relation_size,
       pg_relation_size(table_catalog || '.' || table_schema || '.' || table_name) as pg_relation_size,
       pg_table_size(table_catalog || '.' || table_schema || '.' || table_name) as pg_table_size
from information_schema.tables
where table_type = 'BASE TABLE';
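
To save the results for later analysis, one approach is to snapshot them into a history table along with a timestamp; a sketch (table_size_history is a hypothetical name):

-- one-time setup: create the history table from the first snapshot
create table table_size_history as
select now() as captured_at, table_schema, table_name,
       pg_total_relation_size(table_catalog || '.' || table_schema || '.' || table_name) as total_size
from information_schema.tables
where table_type = 'BASE TABLE';

-- later snapshots, e.g. scheduled from cron
insert into table_size_history
select now(), table_schema, table_name,
       pg_total_relation_size(table_catalog || '.' || table_schema || '.' || table_name)
from information_schema.tables
where table_type = 'BASE TABLE';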

The PostgreSQL admin functions documentation has details about what each function includes in its calculations.

OTHER TIPS

This is old, but I've recently had some success with a similar issue. ANALYZE VERBOSE revealed that a couple of our tables had grown to > 1 page of disk per tuple, and EXPLAIN ANALYZE revealed that sequential scans were taking up to 30 seconds on a table of 27K rows. Estimates of the number of active rows were getting further and further off.

After much searching, I learned that rows can only be vacuumed if there is no transaction that has been open since they were updated. This table was rewritten every 3 minutes, and there was a connection that was "idle in transaction" that was 3 days old. You can do the math.

In this case, we had to:

  1. kill the connection with the open transaction (see the pg_stat_activity query after this list)
  2. reconnect to the database. Unfortunately, the cutoff transaction ID below which rows can be vacuumed is (as of 9.3) tracked per connection, so a VACUUM FULL issued from the old session will not reclaim anything.
  3. VACUUM FULL your table. This takes an ACCESS EXCLUSIVE lock, which blocks everything including reads; you may want to run a plain VACUUM first (non-blocking) to reduce the time VACUUM FULL takes.
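
For step 1, you can find and kill the offending session from SQL; a sketch using pg_stat_activity (column names valid for 9.2+; pg_terminate_backend requires superuser or the same role as the target session):

-- list sessions sitting idle in an open transaction, oldest first
SELECT pid, usename, state, xact_start
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;

-- terminate a specific backend by pid
SELECT pg_terminate_backend(12345);  -- 12345 is a placeholder pid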

This may not have been your problem, but if you would like to see if tables are affected in your own database, I wrote a query to order tables by the average number of tuples stored in a page of disk. Tables with large rows should be at the top - ANALYZE VERBOSE should give you an idea of the ratio of dead to live tuples in these tables. Valid for 9.3 - this will probably require some minor tweaks for other versions:

SELECT rolname AS owner
     , nspname AS schemaname
     , relname AS tablename
     , relpages, reltuples
     , (reltuples::FLOAT / relpages::FLOAT) AS tuples_per_page
FROM pg_class
JOIN pg_namespace ON relnamespace = pg_namespace.oid
JOIN pg_roles     ON relowner     = pg_roles.oid
WHERE relkind = 'r' AND relpages > 20 AND reltuples > 1000
  AND nspname != 'pg_catalog'
ORDER BY tuples_per_page;
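
As a cross-check that doesn't require re-running ANALYZE VERBOSE on each suspect table, the statistics collector already tracks live and dead tuple estimates per table; a sketch using pg_stat_user_tables:

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;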

You can see what queries psql runs by starting it with the -E parameter, then running normal commands:
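For example (mydb and my_table are placeholder names; -E is short for --echo-hidden):

$ psql -E mydb
mydb=# \d+ my_table

psql then prints each internal query it runs, set off by rows of asterisks, before showing its usual output.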

In this instance, it came down to two queries. First, psql looks up your table's OID:

SELECT c.oid,
  n.nspname,
  c.relname
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname ~ '^(YOUR_TABLE_NAME_HERE)$'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 2, 3;

then it executes this to look up per-column details, including storage:

SELECT a.attname,
  pg_catalog.format_type(a.atttypid, a.atttypmod),
  (SELECT substring(pg_catalog.pg_get_expr(d.adbin, d.adrelid) for 128)
   FROM pg_catalog.pg_attrdef d
   WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
  a.attnotnull, a.attnum,
  (SELECT c.collname FROM pg_catalog.pg_collation c, pg_catalog.pg_type t
   WHERE c.oid = a.attcollation AND t.oid = a.atttypid AND a.attcollation <> t.typcollation) AS attcollation,
  NULL AS indexdef,
  NULL AS attfdwoptions,
  a.attstorage,
  CASE WHEN a.attstattarget=-1 THEN NULL ELSE a.attstattarget END AS attstattarget, pg_catalog.col_description(a.attrelid, a.attnum)
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = '57692' AND a.attnum > 0 AND NOT a.attisdropped  -- '57692' is the OID returned by the first query
ORDER BY a.attnum;

a.attstorage is what you care about: 'p' is PLAIN, 'x' is EXTENDED, 'e' is EXTERNAL, and 'm' is MAIN.
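
If you just want the storage settings without wading through psql's full query, a minimal sketch (substitute your own table name in the regclass cast):

SELECT attname, attstorage
FROM pg_catalog.pg_attribute
WHERE attrelid = 'my_table'::regclass
  AND attnum > 0 AND NOT attisdropped
ORDER BY attnum;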

If vacuuming the table shrinks it from 80GB to 19GB, what you're likely seeing is MVCC at work: dead rows take up space until they are vacuumed or their space is re-used.

http://wiki.postgresql.org/wiki/MVCC

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow