Question

I have a Python script that uses psycopg2 to execute a COPY command that loads data from S3 into Redshift. It runs fine on a cron schedule.

Now I want to check that the data has loaded properly each time by querying the STL_LOAD_COMMITS and STL_LOAD_ERRORS tables.

Does anyone know if there is a way of getting the query ID returned from the COPY command so it can be used to query the tables above and retrieve the relevant log record?

I don't believe COPY returns anything at all, but if someone has come across a clever way of checking loads in code, I'd be interested.

EDIT: Perhaps the right way to do this is to query using the filename instead of the query ID since I know the names of the files I've loaded.

select *
from STL_LOAD_COMMITS
where filename in ('s3://bucket/4f737c05-8f16-4ba7-8f50-30423369c389.csv.gz',
's3://bucket/5fe4fea9-a9e4-4622-b9f6-ed3f98f7d1e2.csv.gz')
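The filename-based check above could be scripted roughly as follows. This is only a sketch: the function name `find_unloaded_files` is made up for illustration, `cursor` is assumed to be a DB-API cursor that uses `%s` placeholders (as psycopg2 does), and it assumes each file name is unique per load (true for UUID-style names like the ones above), since the query would otherwise also match earlier loads of the same file.

```python
def find_unloaded_files(cursor, expected_files):
    """Return the expected S3 files that have no row in STL_LOAD_COMMITS.

    `cursor` is assumed to be a DB-API cursor using %s placeholders.
    """
    expected = list(expected_files)
    if not expected:
        return []  # avoid generating an invalid empty IN () list

    # One placeholder per file name; values are passed as parameters,
    # never interpolated into the SQL string.
    placeholders = ", ".join(["%s"] * len(expected))
    cursor.execute(
        "SELECT DISTINCT TRIM(filename) FROM stl_load_commits "
        f"WHERE TRIM(filename) IN ({placeholders})",
        expected,
    )
    loaded = {row[0] for row in cursor.fetchall()}

    # Keep the caller's ordering for the files that were not found.
    return [name for name in expected if name not in loaded]
```

An empty result would mean every expected file has a commit record. It does not by itself say anything about rows rejected during the load, which is what STL_LOAD_ERRORS is for.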

Solution

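PG_LAST_COPY_ID() returns, as its name suggests, the query ID of the most recently completed COPY command in the current session. Call it on the same connection that ran the COPY, then use the returned ID to filter STL_LOAD_COMMITS and STL_LOAD_ERRORS on their query column.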

Source: AWS Redshift documentation for PG_LAST_COPY_ID()
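A minimal sketch of how this might look from Python, not a definitive implementation. The function name `run_copy_and_fetch_logs` and the returned dictionary layout are invented for illustration, and `conn` is assumed to be a DB-API connection such as the one returned by `psycopg2.connect`. It also assumes, going by the AWS documentation, that the COPY ID is still updated when a COPY fails on invalid load data. If the COPY fails for another reason (for example a syntax error), the ID returned may belong to an earlier COPY in the session.

```python
def run_copy_and_fetch_logs(conn, copy_sql):
    """Run a COPY, then look up its load log rows by query ID.

    `conn` is assumed to be a DB-API connection to Redshift.
    """
    cur = conn.cursor()
    copy_failed = False
    try:
        cur.execute(copy_sql)
        conn.commit()
    except Exception:
        # A failed COPY aborts the transaction; roll back so that the
        # follow-up queries on this connection can run.
        conn.rollback()
        copy_failed = True

    # Must run on the same connection (session) as the COPY itself.
    cur.execute("SELECT pg_last_copy_id()")
    query_id = cur.fetchone()[0]

    # Files committed by this COPY.
    cur.execute(
        "SELECT TRIM(filename), lines_scanned, status "
        "FROM stl_load_commits WHERE query = %s",
        (query_id,),
    )
    commits = cur.fetchall()

    # Rows rejected by this COPY, if any.
    cur.execute(
        "SELECT TRIM(filename), line_number, TRIM(colname), TRIM(err_reason) "
        "FROM stl_load_errors WHERE query = %s",
        (query_id,),
    )
    errors = cur.fetchall()

    return {
        "query_id": query_id,
        "copy_failed": copy_failed,
        "commits": commits,
        "errors": errors,
    }
```

The cron script could call this in place of its current `execute` of the COPY statement and alert when `copy_failed` is true, `errors` is non-empty, or `commits` lists fewer files than expected.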

Licensed under: CC-BY-SA with attribution