Diagnosing an error in a PostgreSQL to Redshift migration through psycopg2
08-12-2020
Question
I am running a job class that contains the following:
- A Postgresql connection that can issue SQL statements
- A Redshift connection that can do the same
- an S3 connection to function as an intermediary between the two
My current approach is to handle the columns that differ between PostgreSQL and Redshift on a case-by-case basis. The column types involved (ignoring duplicate types; I can provide the rest if needed) are:
id integer NOT NULL,
client_id integer NOT NULL,
manual_dt date,
scheduled_at timestamp,
some_other_id varchar(255),
is_good integer DEFAULT 0 NOT NULL,
url varchar(4096) NOT NULL,
image_width integer,
image_dim_ratio float8,
invalid_reason varchar(256) DEFAULT NULL::character varying
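The per-case handling of type differences can be sketched as a small translation helper. The mapping entry below is a placeholder, not a claim about which types Redshift actually rejects; fill in whichever types differ in your schema.

```python
# Sketch: apply per-type rewrites when porting column DDL to Redshift.
# The mapping entries are placeholders -- add whichever types actually
# differ in your case; unlisted types pass through unchanged.
TYPE_MAP = {
    "float8": "double precision",  # example entry, adjust as needed
}

def translate_column(column_def: str) -> str:
    """Rewrite one 'name type [modifiers]' column definition."""
    name, col_type, *modifiers = column_def.split()
    new_type = TYPE_MAP.get(col_type.lower(), col_type)
    return " ".join([name, new_type, *modifiers])
```

Parenthesized lengths such as `varchar(255)` contain no spaces, so they survive the simple `split()` intact and pass through unless listed in the map.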
There are a total of ~2.3 million records to copy in this initial load. I run a SELECT and COPY on PostgreSQL, upload the result to S3, then use Redshift's COPY to load it from the S3 source. I can post all of this too if needed.
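The unload/upload/load steps described above can be sketched as follows. The table, bucket, key, and IAM role names are placeholders, and `pg_conn`/`rs_conn` are assumed to be ordinary psycopg2 connections; this is a sketch of the shape of the pipeline, not the questioner's actual code.

```python
import io

def build_redshift_copy(table: str, s3_uri: str, iam_role: str) -> str:
    """Build the Redshift COPY statement for a CSV object on S3."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV"
    )

def migrate(pg_conn, rs_conn, table, bucket, key, iam_role):
    # boto3 is imported here so the pure helper above stays importable
    # without AWS dependencies installed.
    import boto3

    # 1. Export from PostgreSQL into an in-memory CSV buffer.
    buf = io.BytesIO()
    with pg_conn.cursor() as cur:
        cur.copy_expert(f"COPY {table} TO STDOUT WITH CSV", buf)
    buf.seek(0)

    # 2. Upload the buffer to S3.
    boto3.client("s3").upload_fileobj(buf, bucket, key)

    # 3. Load into Redshift from the S3 object.
    with rs_conn.cursor() as cur:
        cur.execute(
            build_redshift_copy(table, f"s3://{bucket}/{key}", iam_role)
        )
    rs_conn.commit()
```

For a 2.3-million-row table an on-disk temporary file (or S3 multipart upload) would be a safer buffer choice than `io.BytesIO`, but the control flow is the same.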
This works for loads of 100, 1,000, 10,000, 100,000, and 1,000,000 records. But if I copy the entire set, or use a LIMIT equal to the exact record count, I receive the following trace:
psycopg2.extensions.TransactionRollbackError: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed.
Is this the source of what is stopping the copy, or am I on the wrong track and it is another issue?
Solution
This happens because you are querying a PostgreSQL hot-standby replica. During recovery, the standby cancels queries that conflict with the WAL it is replaying; in this case the full export runs long enough to need row versions that have already been vacuumed away on the primary, while the smaller batches finish before a conflict arises. Ask the database administrator to raise `max_standby_streaming_delay` on the replica (or enable `hot_standby_feedback`), or run the export against the primary instead.
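If changing the replica's settings is not an option, one workaround consistent with the observation that smaller loads succeed is to export in bounded batches, so that no single statement runs long enough to be cancelled. A sketch using keyset ranges on the `id` column from the question's schema (the batch size is arbitrary, and `pg_conn` is assumed to be a psycopg2 connection):

```python
def batch_ranges(min_id: int, max_id: int, batch_size: int):
    """Yield (low, high] id ranges covering min_id..max_id inclusive."""
    low = min_id - 1
    while low < max_id:
        high = min(low + batch_size, max_id)
        yield low, high
        low = high

def export_in_batches(pg_conn, table, out_file, batch_size=100_000):
    # Each COPY touches a bounded id range, so each statement on the
    # standby finishes quickly instead of conflicting with recovery.
    with pg_conn.cursor() as cur:
        cur.execute(f"SELECT min(id), max(id) FROM {table}")
        min_id, max_id = cur.fetchone()
        for low, high in batch_ranges(min_id, max_id, batch_size):
            cur.copy_expert(
                f"COPY (SELECT * FROM {table} "
                f"WHERE id > {low} AND id <= {high}) TO STDOUT WITH CSV",
                out_file,
            )
```

Note the batches are no longer a single snapshot of the table, so this is only appropriate for an initial load where rows changed mid-export can be reconciled afterwards.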