Question

I'm looking to schedule FTP file transfers, but to conserve bandwidth, I would like to only upload files that have changed. What's a good reliable way to do this that will work on a variety of different hosting providers?

Solution

First, checking whether a local file has changed really has nothing to do with FTP itself. What you're describing is opening an FTP connection to upload a file only if/when it has changed, so the FTP part only comes into play after the change detection.

At a high level, the basic strategy is to keep track of when your application last checked for changes (the previous execution timestamp) and compare that against the modification timestamps of the files you're interested in uploading. If a file's timestamp is more recent, it has most likely changed. I say most likely because it is possible to update only the timestamp without touching the content (e.g. touch on unix/linux).

Here's a quick example showing you how to check the modification time for all of the items in a specific directory:

import os

checkdir = "./"

for item in os.listdir(checkdir):
    # os.path.join handles the separator correctly even if
    # checkdir doesn't end with a slash.
    item_path = os.path.join(checkdir, item)
    # Modification time, in seconds since the epoch.
    mtime = os.path.getmtime(item_path)
    print("%s: %s" % (item_path, mtime))

Note that this does not differentiate between file types (e.g. regular file, directory, symlink). Read the docs on os.path to discover how to determine file type so you can skip certain types, if you so choose.
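For example, a quick sketch that restricts the scan to regular files (note that os.path.isfile() follows symlinks, so add an os.path.islink() check if you want to exclude those as well):

import os

checkdir = "./"

for item in os.listdir(checkdir):
    item_path = os.path.join(checkdir, item)
    # Skip directories, sockets, and anything else that
    # isn't a regular file.
    if not os.path.isfile(item_path):
        continue
    print("%s: %s" % (item_path, os.path.getmtime(item_path)))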

You'll still need to come up with the logic to store the time of the previous 'scan' so that you can refer to it in subsequent scans. A really simple way to do this would be to store a value in a file.
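Here's one way that could look. This is just a sketch, and the state-file name last_scan.txt is an arbitrary choice, not anything standard:

import os, time

STATE_FILE = "last_scan.txt"  # arbitrary example name for the state file
checkdir = "./"

def read_last_scan():
    # On the first run (or if the file is missing/corrupt), fall back
    # to 0, i.e. the epoch, so that every file counts as changed.
    try:
        with open(STATE_FILE) as f:
            return float(f.read().strip())
    except (OSError, ValueError):
        return 0.0

def write_last_scan(timestamp):
    with open(STATE_FILE, "w") as f:
        f.write(str(timestamp))

last_scan = read_last_scan()
scan_started = time.time()

changed = []
for item in os.listdir(checkdir):
    item_path = os.path.join(checkdir, item)
    if os.path.isfile(item_path) and os.path.getmtime(item_path) > last_scan:
        changed.append(item_path)

# ... upload the files in `changed` over FTP here ...

# Record the start (not the end) of the scan, so files modified
# while the uploads were in progress get picked up next time.
write_last_scan(scan_started)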

Make sure you use a locking strategy in case two 'scans' overlap; FTP uploads can take long enough that the next scheduled run starts before the previous one has finished.
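A crude but workable approach on a single machine is an exclusive lock file; the name scan.lock below is just an example:

import os, sys

LOCK_FILE = "scan.lock"  # arbitrary example name for the lock file

try:
    # O_CREAT | O_EXCL fails atomically if the file already exists,
    # so only one scan can hold the lock at a time.
    fd = os.open(LOCK_FILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
except FileExistsError:
    sys.exit("Another scan appears to be running; exiting.")

try:
    pass  # ... scan and upload here ...
finally:
    # Always release the lock, even on failure. Note that a crash
    # before reaching this point leaves a stale lock file behind,
    # which you would have to clean up by hand.
    os.close(fd)
    os.remove(LOCK_FILE)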

Licensed under: CC-BY-SA with attribution