Question

I want to write a script to back up a table using COPY:

psql "connection parameters" -c "COPY (SELECT * FROM tbl WHERE insertion_date > 'date') TO STDOUT WITH CSV HEADER;" | bzip2 -c > backup.csv.bz2

Now I want to log how many rows are copied to the compressed file. I want to count them while copying, not with a separate command.

Solution

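Update: Since Postgres 9.3, plpgsql can access the number of rows processed by COPY directly. A COPY executed inside a plpgsql function now sets the value retrieved with GET DIAGNOSTICS ... = ROW_COUNT.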
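
A minimal sketch of that Postgres 9.3+ route, wrapped in a shell function so the SQL can be piped to psql. The names print_copy_count_sql and f_copy_count are hypothetical and not part of the original answer. It assumes the target file lives on the database server, because COPY inside plpgsql cannot write to the client's STDOUT, and writing server-side files needs superuser rights (or membership in pg_write_server_files from Postgres 11 on). Like f_copy_prep below, it executes the query string as is, so the same safety caveats apply.

```shell
# Prints the DDL for a hypothetical helper function, f_copy_count.
# The quoted heredoc delimiter keeps the shell from touching $func$ or %.
print_copy_count_sql() {
  cat <<'SQL'
CREATE OR REPLACE FUNCTION f_copy_count(_query text, _file text)
  RETURNS bigint AS
$func$
DECLARE
   ct bigint;
BEGIN
   -- COPY runs on the server, so _file is a path on the database host
   EXECUTE format('COPY (%s) TO %L WITH CSV HEADER', _query, _file);

   GET DIAGNOSTICS ct = ROW_COUNT;  -- set by COPY since Postgres 9.3
   RAISE LOG 'rows copied: %', ct;
   RETURN ct;
END
$func$ LANGUAGE plpgsql;
SQL
}

# Possible usage (connection parameters and file path are placeholders):
#   print_copy_count_sql | psql "connection parameters"
#   psql "connection parameters" -c "SELECT f_copy_count('SELECT * FROM tbl', '/tmp/backup.csv')"
```

The header line is not included in the count, since ROW_COUNT reports the rows COPY processed.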


I recently ran into the same problem. I tried a plpgsql function, but in Postgres 9.2 or older COPY does not set ROW_COUNT.

You could just run two queries: first count(*), then COPY. For simple queries that is probably the way to go. But for huge or complex queries this is a pain and can double the execution time.

I came up with a solution that uses a temporary table and counts the rows before executing COPY.

I adapted what I have for you so you can COPY TO STDOUT and pipe to bzip2, which is not possible from within a plpgsql function:

1. Create a function that takes an SQL string and creates a temporary table with it:

CREATE OR REPLACE FUNCTION f_copy_prep(_query text)
  RETURNS void AS
$func$
DECLARE
   ct integer;
BEGIN
   -- If you deal with huge results, set more RAM for temp tables locally:
   -- SET temp_buffers = '512MB';  -- example value

   EXECUTE 'CREATE TEMP TABLE cp_tmp ON COMMIT DROP AS (' || _query || ')';

   GET DIAGNOSTICS ct = ROW_COUNT;
   RAISE LOG 'My text here. rows: %', ct;

END
$func$  LANGUAGE plpgsql;
ALTER FUNCTION f_copy_prep(text) SET search_path=public,pg_temp;
REVOKE ALL ON FUNCTION f_copy_prep(text) FROM public;
GRANT EXECUTE ON FUNCTION f_copy_prep(text) TO ???;
  • The function executes dynamic SQL, so you can use it for any query.

  • This is inherently unsafe because the query string is executed as is. Run it with the minimum rights necessary, revoke all rights from public and grant EXECUTE exclusively to a trusted user. Follow the instructions in the manual!

  • The temporary table is created with ON COMMIT DROP, so it gets dropped automatically at the end of the transaction.

  • The row count comes from GET DIAGNOSTICS: ROW_COUNT is set by the CREATE TABLE AS statement run via EXECUTE. It is written to the log, as you required. No need for a separate count(*).

2. Call it from a shell script and pipe the output through bzip2:

psql "connection parameters" \
-c "SELECT f_copy_prep('SELECT * FROM tbl WHERE insertion_date > ''date'''); \
    COPY cp_tmp TO STDOUT WITH CSV HEADER;" \
| bzip2 -c > backup.csv.bz2
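  • Put both SQL commands into your -c argument, or put complex queries in a file and use the -f parameter. With -c, everything is executed in one transaction. With -f, add --single-transaction (-1) or wrap the commands in BEGIN / COMMIT; otherwise ON COMMIT DROP removes the temp table before COPY runs. With -c, only the output of the last command is returned, which fits our need (psql 15 and later print every result unless SHOW_ALL_RESULTS is set to off). Careful with the syntax - there are multiple layers of interpretation (first the shell, then Postgres).

  • The first command calls the function above with your query string as its parameter. The second is the COPY TO STDOUT.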
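
The quoting layers mentioned above can be handled mechanically. The following is a sketch under stated assumptions, not part of the original answer: sql_literal and build_backup_sql are hypothetical helper names, and the quote doubling assumes standard_conforming_strings = on (the default since Postgres 9.1). It only fixes the quoting; it does not make f_copy_prep any safer against untrusted input.

```shell
# sql_literal (hypothetical): wrap a string in single quotes and double
# every single quote inside it, so Postgres reads it as one string literal.
sql_literal() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# build_backup_sql (hypothetical): produce the two-command string for
# psql -c from a plain, unescaped query.
build_backup_sql() {
  printf 'SELECT f_copy_prep(%s); COPY cp_tmp TO STDOUT WITH CSV HEADER;' \
    "$(sql_literal "$1")"
}

# Possible usage (connection parameters are a placeholder):
#   psql "connection parameters" \
#     -c "$(build_backup_sql "SELECT * FROM tbl WHERE insertion_date > 'date'")" \
#   | bzip2 -c > backup.csv.bz2
```

The query is written once with ordinary single quotes, and the helper produces the doubled form that the function call needs.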

I tested this with PostgreSQL 9.1 on Linux and it worked for me: data in the file, message with row-count in the log.
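
For comparison, a different technique from the temp-table approach: counting inside the shell pipeline itself, so the query runs only once and nothing is created in the database. This is a sketch, not something tested against the setup above. count_csv_rows and the file name rows.count are hypothetical, and the count lands in a file rather than the Postgres log. It counts lines, not rows, so a field containing an embedded newline would inflate the number.

```shell
# count_csv_rows (hypothetical): pass stdin through to stdout line for line
# and write the number of data lines (all lines minus the CSV header) to
# the file named in $1.
count_csv_rows() {
  awk -v out="$1" '
    { print }
    END { n = (NR > 0) ? NR - 1 : 0; print n > out }
  '
}

# Possible usage (connection parameters are a placeholder):
#   psql "connection parameters" \
#     -c "COPY (SELECT * FROM tbl WHERE insertion_date > 'date') TO STDOUT WITH CSV HEADER;" \
#   | count_csv_rows rows.count \
#   | bzip2 -c > backup.csv.bz2
```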

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange