Question

I need to migrate a table from one PostgreSQL database to another. There was a chance I would need to fix some data, so I exported it to a CSV. Then I imported the CSV into the second database with a COPY statement.

This process has been running for 5 days now. The only way I found to inspect its progress was to compare the sizes on disk. The original table was 95 GB (from psql's \dt+), and the CSV was 40 GB. So I thought I could compare the new table size with those numbers. I thought that the new table would stop at 95 GB, or even before. Instead, it's now at 103 GB and who knows when it will stop.

Of course, select count(*) does not work because the copy happens in its own transaction, so the rows are not visible to other sessions until it commits. But I know that the table has about 1500 million rows. So if I could somehow get an estimate of the number of rows currently in the new table, I could compare.

Solution

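In PostgreSQL versions before 14 there is no formal facility for monitoring the progress of a COPY operation (14 and later provide the pg_stat_progress_copy view). You can use the pageinspect extension, which must be installed with CREATE EXTENSION pageinspect and whose functions require superuser privileges by default, to get an estimate of the number of rows, even uncommitted ones. Assuming the existence of the table has been committed, and the table was empty other than the in-progress COPY, you could use: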

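-- Replace 'pgbench_accounts' (an example name) with the table being loaded.
-- Counts the item pointers on every page, including rows not yet committed.
select sum(
        (select count(*) from
          heap_page_items(get_raw_page('pgbench_accounts', x))
        )
    )
from generate_series(0, (pg_relation_size('pgbench_accounts')/8192)::int-1) as f(x);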

(This assumes you are using the default block size of 8192 bytes; SHOW block_size reports the actual value.)
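
Reading every page of a table of roughly 100 GB can itself take a long time, so one option is to count rows on only a sample of pages (for example by giving generate_series a step argument) and scale the average up to the whole relation. The arithmetic for turning such a sample into a progress estimate can be sketched as below. This is a rough sketch, not part of pageinspect: the function name and the per-page counts in the usage example are hypothetical, and the remaining-time figure assumes the load proceeds at a constant rate, which may not hold (for instance if index maintenance slows down as the table grows).

```python
def estimate_copy_progress(rows_per_sampled_page, relation_bytes, expected_rows,
                           elapsed_hours, block_size=8192):
    """Extrapolate COPY progress from row counts taken on a sample of pages.

    rows_per_sampled_page: row counts from heap_page_items for the sampled pages
    relation_bytes:        current pg_relation_size of the target table
    expected_rows:         row count of the source table
    elapsed_hours:         how long the COPY has been running so far
    """
    if not rows_per_sampled_page:
        raise ValueError("need at least one sampled page")
    if expected_rows <= 0:
        raise ValueError("expected_rows must be positive")

    total_pages = relation_bytes // block_size
    avg_rows_per_page = sum(rows_per_sampled_page) / len(rows_per_sampled_page)
    estimated_rows = avg_rows_per_page * total_pages

    # Cap at 100%: sampling error can push the estimate past the expected total.
    fraction_done = min(estimated_rows / expected_rows, 1.0)

    # Assumption: constant load rate, so remaining time scales with remaining rows.
    if fraction_done > 0:
        remaining_hours = elapsed_hours * (1 - fraction_done) / fraction_done
    else:
        remaining_hours = float("inf")
    return estimated_rows, fraction_done, remaining_hours


if __name__ == "__main__":
    # Table size, row total and elapsed time come from the question;
    # the per-page counts are made-up sample values.
    rows, done, left = estimate_copy_progress(
        rows_per_sampled_page=[60, 61, 59, 60],
        relation_bytes=103 * 1024**3,
        expected_rows=1_500_000_000,
        elapsed_hours=5 * 24,
    )
    print(f"~{rows:,.0f} rows loaded, {done:.0%} done, ~{left:.0f} hours left")
```

Note that the on-disk size alone is a poor progress indicator here: the new table can legitimately end up larger or smaller than the original, so a row-based estimate is the more meaningful comparison.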

Licensed under: CC-BY-SA with attribution