Question

This is my first post to Stack Overflow. Your forum has been SO helpful while I've been learning Python and Postgres on the fly over the last 6 months that I haven't needed to post until now. But this task is tripping me up, and I figure I need to start earning reputation points:

I am creating a Python script that backs up data into an SQL database daily. I have a CSV file with an entire month's worth of hourly data, but I only want to select a single day of data from the file and copy those rows into my database. Can I query the CSV file and append the query results to my database? For example:

        sys.stdin = open('file.csv', 'r')    
        cur.copy_expert("COPY table FROM STDIN 
                         SELECT 'yyyymmddpst LIKE 20140131' 
                         WITH DELIMITER ',' CSV HEADER", sys.stdin)

This code and other variations aren't working out - I keep getting syntax errors. Can anyone help me out with this task? Thanks!!


Solution

First, create a temporary table with the same columns as your target table:

cur.execute('CREATE TEMPORARY TABLE "temp_table" (LIKE "your_table")')

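Then copy the CSV data into it. COPY ... FROM '/path/to/file' reads the file on the database server and needs superuser rights (or membership in pg_read_server_files from PostgreSQL 11); if the file is on the client machine, stream it through STDIN with copy_expert instead:

with open('/full/path/to/file.csv') as f:
    cur.copy_expert("COPY temp_table FROM STDIN WITH CSV HEADER DELIMITER ','", f)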

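Next, insert only the rows you need. Quote the date or pass it as a parameter: LIKE against a bare integer such as 20140131 fails with an "operator does not exist" error.

cur.execute("INSERT INTO your_table SELECT * FROM temp_table WHERE yyyymmddpst = %s", ('20140131',))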

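Finally, don't forget to call conn.commit(). The temporary table is dropped automatically at the end of the session, i.e. when the connection is closed; closing the cursor does not drop it.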
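Put together, those steps might look like the following minimal psycopg2 sketch. It assumes the target table is your_table with a yyyymmddpst column (the placeholders used above), that the CSV columns match that table in order, and that the connection uses psycopg2's default non-autocommit mode. The function name load_one_day and the staging table name are hypothetical; ON COMMIT DROP is an optional extra that discards the staging table as soon as the transaction commits.

```python
# Minimal sketch of the temporary-table approach with psycopg2.
# Assumptions: the target table is "your_table" with a "yyyymmddpst" column
# (placeholders from the answer), the CSV columns match that table in order,
# and the connection is in psycopg2's default non-autocommit mode.


def load_one_day(conn, csv_path, day):
    """Insert only the rows for `day` (e.g. '20140131') from csv_path.

    Returns the number of rows inserted into your_table.
    """
    try:
        with conn.cursor() as cur:
            # Staging table with the same columns; ON COMMIT DROP removes it
            # when the transaction commits, and a rollback undoes its creation.
            cur.execute(
                "CREATE TEMPORARY TABLE staging (LIKE your_table) ON COMMIT DROP"
            )
            # Stream the client-side file; HEADER skips the first line.
            with open(csv_path, newline="") as f:
                cur.copy_expert(
                    "COPY staging FROM STDIN WITH (FORMAT csv, HEADER)", f
                )
            # psycopg2 sends the day as a quoted literal, so the comparison
            # works whether yyyymmddpst is text or integer.
            cur.execute(
                "INSERT INTO your_table "
                "SELECT * FROM staging WHERE yyyymmddpst = %s",
                (day,),
            )
            inserted = cur.rowcount
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    return inserted
```

With a real connection from psycopg2.connect(...) and your own connection settings, load_one_day(conn, 'file.csv', '20140131') would load just that day. Streaming through STDIN keeps the file on the client, so no server-side file access is needed.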

OTHER TIPS

You can COPY (SELECT ...) TO an external file, because PostgreSQL just has to read the rows from the query and send them to the client.
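For the direction that does work, here is a minimal psycopg2 sketch that exports one day with COPY (SELECT ...) TO STDOUT and writes the rows to a local CSV file. The table your_table, the column yyyymmddpst and the function name export_one_day are placeholders. Because copy_expert has no parameter binding, the sketch only accepts an eight-digit YYYYMMDD string before putting it into the SQL text.

```python
# Minimal sketch: export one day's rows to a client-side CSV file.
# "your_table" and "yyyymmddpst" are placeholders, not a real schema.
import re


def export_one_day(cur, day, out_path):
    """Write the rows of your_table for `day` (e.g. '20140131') to out_path.

    `cur` is an open psycopg2 cursor.
    """
    # copy_expert cannot bind parameters, so accept only YYYYMMDD digits
    # before embedding the value in the SQL text.
    if not re.fullmatch(r"[0-9]{8}", day):
        raise ValueError(f"expected YYYYMMDD, got {day!r}")
    sql = (
        f"COPY (SELECT * FROM your_table WHERE yyyymmddpst = '{day}') "
        "TO STDOUT WITH (FORMAT csv, HEADER)"
    )
    # The server runs the query and streams the rows back to this client,
    # which writes them into the local file.
    with open(out_path, "w", newline="") as f:
        cur.copy_expert(sql, f)
```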

The reverse is not true. You can't COPY (SELECT ...) FROM ... . If it were a simple SELECT, PostgreSQL could try to treat it like a view, but really it doesn't make much sense, and in any case it would apply to the target table, not the source rows. So the code you wrote wouldn't do what you think it does, even if it worked.

In this case you can create an unlogged or temporary table, copy the full CSV to it, and then use SQL to extract just the rows you want, as pointed out by Dmitry.

An alternative is to use the file_fdw extension to map the CSV file as a foreign table. The CSV isn't copied into the database; it's read on demand each time the table is queried. This lets you skip the temporary table step.
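A minimal sketch of the file_fdw route, run through psycopg2. The server csv_files, the foreign table month_csv and its three columns are hypothetical: the foreign table has to declare the real CSV columns in file order, with types compatible with your_table. The file must be readable by the database server process, and creating the extension or setting the filename option typically requires superuser rights (or pg_read_server_files for the option).

```python
# Minimal sketch of the file_fdw approach with psycopg2.
# Hypothetical names: server "csv_files", foreign table "month_csv" and its
# columns; declare the real CSV columns, in file order, instead.

SETUP_SQL = [
    "CREATE EXTENSION IF NOT EXISTS file_fdw",
    "CREATE SERVER IF NOT EXISTS csv_files FOREIGN DATA WRAPPER file_fdw",
    """CREATE FOREIGN TABLE IF NOT EXISTS month_csv (
           yyyymmddpst text,
           hour integer,
           value numeric
       ) SERVER csv_files
       OPTIONS (filename '/full/path/to/file.csv', format 'csv', header 'true')""",
]


def load_day_via_fdw(conn, day):
    """Copy the rows for `day` from the mapped CSV into your_table."""
    with conn.cursor() as cur:
        # IF NOT EXISTS makes the setup safe to repeat on every run.
        for stmt in SETUP_SQL:
            cur.execute(stmt)
        # The CSV is read on demand here; no staging table is involved.
        cur.execute(
            "INSERT INTO your_table "
            "SELECT * FROM month_csv WHERE yyyymmddpst = %s",
            (day,),
        )
        inserted = cur.rowcount
    conn.commit()
    return inserted
```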

From PostgreSQL 12 onward you can add a WHERE clause to a COPY ... FROM statement, and only the rows that match the condition are loaded. Your COPY statement could then look like this:

COPY your_table
 FROM '/full/path/to/file.csv'
 WITH (FORMAT CSV, HEADER, DELIMITER ',')
 WHERE yyyymmddpst = '20140131'
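On PostgreSQL 12 or later the same filter can also be applied while streaming the file from the client, which avoids the server-side file path and its permission requirements. A minimal psycopg2 sketch, assuming the CSV columns match your_table in order; the function name copy_one_day is hypothetical, and the day is validated as eight digits because copy_expert does not bind parameters.

```python
# Minimal sketch (PostgreSQL 12+): filter rows inside COPY while streaming
# the client-side CSV through STDIN. "your_table" and "yyyymmddpst" are the
# placeholders used in the answers.
import re


def copy_one_day(conn, csv_path, day):
    """Load only the rows for `day` (e.g. '20140131') from csv_path."""
    # copy_expert cannot bind parameters, so accept only YYYYMMDD digits.
    if not re.fullmatch(r"[0-9]{8}", day):
        raise ValueError(f"expected YYYYMMDD, got {day!r}")
    sql = (
        "COPY your_table FROM STDIN WITH (FORMAT csv, HEADER) "
        f"WHERE yyyymmddpst = '{day}'"
    )
    try:
        with conn.cursor() as cur, open(csv_path, newline="") as f:
            # Rows failing the WHERE condition are discarded by the server.
            cur.copy_expert(sql, f)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
```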
Licensed under: CC-BY-SA with attribution