Question

I'm going to use pymysql, tempfile, and subprocess to write a script which transports my MySQL database to MongoDB, but I don't make it right so far.

In the follow code:

for table_name in table_list:
    cur.execute("select * from {0};".format(table_name))
    with tempfile.TemporaryFile(mode='w+') as fp:
        writer = csv.writer(fp)
        writer.writerow([i[0] for i in cur.description])
        for record in cur.fetchall():
            writer.writerow(record)

        fp.seek(0)
        subprocess.Popen(['mongoimport', '-d', db_mongo, '-c', table_name, '--type', 'csv', '--file', *** I don't know what should be specified here ***, '--headerline'])

I tried to iterate table names of any mysql database, select all data from the table, write out the results into a temporary file, and then use mongoimport to pour the data in the file into mongodb database. However, one, I don't know how to point out the file name within subprocess.Popen function. Two, are there better ways to make what I want to do? I assume there are, but couldn't come up with the one which is better than this in my mind...

For your information, I originally used actual (not temporary) file name, but it failed if there are too large tables, since it attempts to create another file derived from another table before finishing the previous table's subprocess.Popen operation (go into next iteration I mean), and it also forces me to delete the file once the whole process finished, so I prefer using temporary file to actual one, but not completely sure whether I'm even correct...

Thanks.

Was it helpful?

Solution

tempfile.TemporaryFile doesn't have a name that you can get. In particular:

Under Unix, the directory entry for the file is removed immediately after the file is created.

On other platforms, there may or may not be a directory entry for the file, but you still have no way of getting it from the API.

This is exactly what tempfile.NamedTemporaryFile is for:

This function operates exactly as TemporaryFile() does, except that the file is guaranteed to have a visible name in the file system (on Unix, the directory entry is not unlinked). That name can be retrieved from the name attribute of the file object.


Note that even NamedTemporaryFile will unlink the name as soon as you close it, unless you pass delete=False. So, you have two choices:

  • Use delete=False and rely on the OS to clean up the file at some point in the future. (On OS X, with the default per-user temp system, I believe the folder will get scanned and cleaned either at next logout or at next login, which man not be for a long time.)
  • Wait until the child process has opened the file before you close it. The obvious way to do that is to use a blocking subprocess.check_call or the like instead of just firing-and-forgetting a Popen, but if you want to get sneakier, you can. For example, you can do the waiting-and-deleting in a background thread. Or, if this script is part of a larger suite of scripts, you can have it pass the temp path upward (via stdout, a file, whatever) and have the last script clean up all the files. Or you could just find a way to wait for the file to be opened instead of waiting for the whole process to be finished. Or patch mongoimport to add a --unlink-input-file flag. Or…
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top