How to reclaim space taken by an index that partially built and was terminated by a power outage

https://dba.stackexchange.com/questions/109092

28-09-2020

Question

I'm running postgres (postgis) 9.4.2 on a mac (10.10.4).

I've got a couple big tables (several TBs).

During an index build on one of them that takes about a week, I watched the available HD space drop, as you'd expect, to nearly the point at which the index would be finished. Then a power outage lasted longer than the battery unit and the system went down. I had buffers off, and fillfactor=100 during the build since it's a static datasource. On reboot, the available space left on the drive is exactly where it was at nearly the end of the index build. Vacuum analyze doesn't free the space.

I tried dropping the table and re-ingesting, and that didn't free the space either. Now I'm at a place where I don't have enough space to build the index.

Are the files generated during the index build stuck in some limbo where they can't be removed by the system because of the way the machine went down during the power outage?

When I look at the table sizes + indexes in the db (which is the only data on that drive) they add up to about 6TB. The drive is 8TB, and there is less than 500GB left on the drive, so it seems there are about 1.5TB lost somewhere which is about the size that index would have been.

Any ideas?

Solution

Normally we'd expect that when postgres was restarted, the crash recovery process would have removed the files belonging to the rolled-back index from the data directory.

Let's assume that it didn't work, or at least that it has to be checked manually.

The list of files that should be in the datadir can be established with a query like this:

select pg_relation_filenode(oid)
   from pg_class
  where relkind in ('i','r','t','S','m')
    and reltablespace=0
  order by 1;

reltablespace=0 is for the default tablespace. If the problematic index was created in a non-default tablespace, this 0 must be replaced by its OID in pg_tablespace.

i,r,t,S,m in relkind correspond respectively to indexes, tables, TOAST tables, sequences and materialized views. All these objects have their data in files whose names match pg_relation_filenode(oid).

On disk, the data files are below $PGDATA/base/oid/, where oid is the OID of the database, obtained by select oid,datname from pg_database. For a non-default tablespace, the files are instead under the tablespace's own location, in a PG_version_somelabel/oid/ subdirectory (also reachable through $PGDATA/pg_tblspc/).
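As a rough sketch of the path rule above, the directory can be worked out from the database OID and, optionally, the tablespace OID. The function name relation_dir and its argument order are made up for illustration, and the lookup through $PGDATA/pg_tblspc/ assumes the usual layout where each entry there points at a tablespace location.

```shell
# Hypothetical helper: print the directory holding a database's relation files.
# Arguments: data directory, database OID, tablespace OID (0 or omitted = default).
relation_dir() {
    pgdata="$1"
    db_oid="$2"
    spc_oid="${3:-0}"
    if [ "$spc_oid" = "0" ]; then
        # Default tablespace: files live directly under base/<database oid>
        printf '%s\n' "$pgdata/base/$db_oid"
    else
        # Other tablespaces: pg_tblspc/<tablespace oid>/PG_<version>_<label>/<database oid>
        for d in "$pgdata/pg_tblspc/$spc_oid"/PG_*/"$db_oid"; do
            if [ -d "$d" ]; then
                printf '%s\n' "$d"
            fi
        done
    fi
}
```

For the default tablespace this would be called as relation_dir "$PGDATA" followed by the OID returned from pg_database.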

List and sort the files matching relfilenodes in that directory:

ls | grep -E '^[0-9]+$' | sort -n > /tmp/list-of-relations.txt

(that actually keeps only the first segment for relations larger than 1GB; further segments are named filenode.1, filenode.2 and so on. If there are lingering segments not attached to anything, they should be considered separately)

and diff that file with the result of the query above.
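One possible way to cover the segment caveat is to fold extra segments back onto their base filenode before comparing, so that a stray segment whose first file is gone still shows up. This is only a sketch: list_disk_filenodes is a made-up name, and it deliberately skips fork files such as the _fsm and _vm ones, which would need the same check separately.

```shell
# Hypothetical helper: print the distinct filenodes present in a database
# directory, one per line, in numeric order.
# Extra segments (12345.1, 12345.2, ...) are counted under 12345.
# Names that are not purely numeric (PG_VERSION, 12345_fsm, ...) are ignored.
list_disk_filenodes() {
    dir="$1"
    ls "$dir" \
        | grep -E '^[0-9]+(\.[0-9]+)?$' \
        | sed -E 's/\.[0-9]+$//' \
        | sort -n -u
}
```

Run against the database directory and redirected to a file, this gives a list in the same one-number-per-line shape as the query output.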

If there are lingering data files that do not correspond to any object that the db knows about, they should appear in that diff.
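The comparison itself can be sketched as below, as an alternative to reading a raw diff. It assumes both lists are plain files with one filenode per line; for the query side that could be produced with something like psql -X -At -d mydb -f filenodes.sql, where mydb and filenodes.sql are placeholder names. The function name find_orphan_filenodes is also invented for this example.

```shell
# Hypothetical helper: print filenodes that exist on disk but that the
# database does not know about.
# Arguments: file listing filenodes found on disk, file listing filenodes
# returned by the pg_class query (one number per line in both).
find_orphan_filenodes() {
    disk_list="$1"
    db_list="$2"
    # -F: fixed strings, -x: whole-line match, -f: patterns from file,
    # -v: keep only the disk entries that match none of the known filenodes
    grep -v -x -F -f "$db_list" "$disk_list"
}
```

Anything this prints is only a candidate. It is worth checking each one again with the server quiet before touching any file, since relations created between the two listings would also appear here.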

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange