How many rows were copied from table
16-10-2019
Question
I want to write a script to back up a table using COPY.
psql "connection parameters" -c "COPY (SELECT * FROM tbl WHERE insertion_date > 'date') TO STDOUT WITH CSV HEADER;" | bzip2 -c > backup.csv.bz2
Now I want to log how many lines are copied to the compressed file. I want to count them while copying, not with a separate command.
Solution
Update: Since Postgres 9.3, plpgsql can access the number of rows processed by COPY directly, through GET DIAGNOSTICS ... ROW_COUNT.
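A minimal sketch of that newer approach, assuming Postgres 9.3 or later and a role that is allowed to write files on the database server (COPY to a file name runs server-side). The names install_f_copy_count, f_copy_count and _file are hypothetical and not part of the original answer:

```shell
#!/bin/sh
# Hypothetical helper: installs f_copy_count() through psql.
# Pass psql connection arguments through, e.g.
#   install_f_copy_count "connection parameters"
install_f_copy_count() {
    psql "$@" <<'SQL'
CREATE OR REPLACE FUNCTION f_copy_count(_query text, _file text)
  RETURNS bigint AS
$func$
DECLARE
   ct bigint;
BEGIN
   -- COPY writes _file on the database server, not on the client
   EXECUTE format('COPY (%s) TO %L WITH (FORMAT csv, HEADER)', _query, _file);
   GET DIAGNOSTICS ct = ROW_COUNT;  -- set by COPY in Postgres 9.3 or later
   RAISE LOG 'COPY wrote % rows to %', ct, _file;
   RETURN ct;
END
$func$ LANGUAGE plpgsql;
SQL
}
```

Once installed, SELECT f_copy_count('SELECT ...', '/some/server/path.csv') would return the row count (the path is a placeholder). Because the file is written by the server, this variant cannot pipe the data through bzip2 on the client.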
I recently ran into the same problem. I tried a plpgsql function, but in Postgres 9.2 or older ROW_COUNT is not set by COPY.
You could just run two queries: first count(), then COPY. With simple queries that's probably the way to go. But with huge or complex queries this is a pain and can double the execution time.
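For comparison, the two-query variant could look roughly like this. It is only a sketch: the function name backup_two_queries and its arguments are hypothetical, and since the two psql calls run in separate transactions the count can differ from what COPY writes if the table changes in between.

```shell
#!/bin/sh
# Hypothetical helper: run count(*) first, then COPY, as two psql calls.
# $1 = psql connection string, $2 = SELECT statement, $3 = output file
backup_two_queries() {
    conn=$1
    query=$2
    out=$3
    # -A (unaligned) and -t (tuples only) make psql print just the number
    rows=$(psql "$conn" -At -c "SELECT count(*) FROM ($query) AS sub") || return 1
    echo "rows to copy: $rows" >&2
    # The query is executed a second time here, which is the extra cost
    psql "$conn" -c "COPY ($query) TO STDOUT WITH CSV HEADER" | bzip2 -c > "$out"
}
```

A call would look like backup_two_queries "connection parameters" "SELECT * FROM tbl WHERE insertion_date > 'date'" backup.csv.bz2, with the count printed to stderr.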
I came up with a solution that uses a temporary table and counts the rows before executing COPY.
I adapted what I have for you so you can COPY TO STDOUT and pipe to bzip2, which is not possible from within a plpgsql function:
1. Create a function that takes an SQL string and creates a temporary table with it:
CREATE OR REPLACE FUNCTION f_copy_prep(_query text)
  RETURNS void AS
$func$
DECLARE
   ct integer;
BEGIN
   -- If you deal with huge results, set more RAM for temp tables locally:
   -- SET temp_buffers = '512MB';  -- example value

   -- Materialize the query result; the table is dropped at end of transaction
   EXECUTE 'CREATE TEMP TABLE cp_tmp ON COMMIT DROP AS (' || _query || ')';
   -- Number of rows written to cp_tmp
   GET DIAGNOSTICS ct = ROW_COUNT;
   RAISE LOG 'My text here. rows: %', ct;
END
$func$ LANGUAGE plpgsql;
ALTER FUNCTION f_copy_prep(text) SET search_path=public,pg_temp;
REVOKE ALL ON FUNCTION f_copy_prep(text) FROM public;
GRANT EXECUTE ON FUNCTION f_copy_prep(text) TO ???;
The function executes dynamic SQL, so you can use it for any query. This is inherently unsafe, so run it with the minimum rights necessary: revoke all rights from public and grant EXECUTE exclusively to a trusted user. Follow the instructions in the manual!
Create the temporary table with ON COMMIT DROP, so it gets dropped automatically at the end of the transaction.
Get the row count with GET DIAGNOSTICS. ROW_COUNT is set by the SELECT statement in EXECUTE. Write it to the log, per your requirement. No need for a separate count(*).
2. Call from a shell script to pipe the output through bzip2:
psql "connection parameters" \
-c "SELECT f_copy_prep('SELECT * FROM tbl WHERE insertion_date > ''date'''); \
COPY cp_tmp TO STDOUT WITH CSV HEADER;" \
| bzip2 -c > backup.csv.bz2
Put two SQL commands into your -c argument, or put complex queries in a file and use the -f parameter. With -c, everything is executed in one transaction; with -f, wrap the commands in BEGIN and COMMIT or pass --single-transaction, otherwise the ON COMMIT DROP table is gone before COPY runs. Only the output of the last command is returned, which fits our need. Be careful with the syntax: there are multiple layers of interpretation (first the shell, then Postgres).
The first command is the above function with your query string as parameter. The second is the COPY TO STDOUT.
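One way to keep the quoting layers manageable is to build the SQL string in the script and double the single quotes in one place. This is a sketch under that assumption; the helper name sql_quote is hypothetical, and the query is the placeholder from the question:

```shell
#!/bin/sh
# Hypothetical helper: double every single quote so the text can be
# embedded inside a '...' string literal in SQL.
sql_quote() {
    printf '%s' "$1" | sed "s/'/''/g"
}

# Write the query once, with ordinary single quotes
query="SELECT * FROM tbl WHERE insertion_date > 'date'"
quoted=$(sql_quote "$query")

# Both commands go into a single -c argument
sql="SELECT f_copy_prep('$quoted'); COPY cp_tmp TO STDOUT WITH CSV HEADER;"
printf '%s\n' "$sql"

# Then pass it on as in step 2:
# psql "connection parameters" -c "$sql" | bzip2 -c > backup.csv.bz2
```

Printing the string before running it makes it easier to check what Postgres will actually receive after the shell has done its own interpretation.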
I tested this with PostgreSQL 9.1 on Linux and it worked for me: data in the file, message with the row count in the log.