Question

I have a script that pulls files from an FTP directory on a set interval. However, since the files are only copied, not moved, it ends up pulling the same files over and over again. What's the best way to ensure that I only pull new files? I'm thinking of cross-referencing the files on the FTP site with the files in the local directory, but I'm not quite sure how to do that.

Also, how would I go about checking not only file names but modified dates as well? For example: random_file.txt was originally placed on the FTP site on 10/25/2012 at 2:15 pm and was downloaded 5 minutes later. Then, on 10/26/2012 at 11:40 am, random_file.txt was replaced on the FTP site with an updated version. Can I download only the newer files from the FTP site and overwrite the older copies in the local directory? Thanks!

Here is my existing code:

import ftplib, os

def fetch():
    server = 'ftp.example.com'
    username = 'foo'
    password = 'bar'
    directory = '/random_directory/'
    filematch = '*.txt'
    ftp = ftplib.FTP(server)
    ftp.login(username, password)
    ftp.cwd(directory)
    for filename in ftp.nlst(filematch):
        fhandle = open(os.path.join(r'C:\my_directory', filename), 'wb')
        print 'Getting ' + filename
        ftp.retrbinary('RETR ' + filename, fhandle.write)
        fhandle.close()

UPDATE: I used Siddharth Toshniwal's links to figure it out, partially at least. For those who stumble across this and need it, here is my new code so far. Note that this only checks whether the file exists locally, not its modified date:

for filename in ftp.nlst(filematch):
    # Build the local path once; the raw string keeps the backslash literal
    local_path = os.path.join(r'C:\my_directory', filename)
    if not os.path.exists(local_path):
        fhandle = open(local_path, 'wb')
        print 'Getting ' + filename
        ftp.retrbinary('RETR ' + filename, fhandle.write)
        fhandle.close()
    else:
        print 'File', filename, 'already exists, skipping download'

Solution

I second the suggestion to use something like rsync rather than hacking something together in Python.

If that is not feasible for whatever reason, the following links should help you:

http://code.activestate.com/recipes/327141-simple-ftp-directory-synch/
http://alexharvey.eu/code/python/get-a-files-last-modified-datetime-using-python/
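For the modified-date part of the question, here is a minimal sketch of one possible approach, not taken from the links above. It assumes the server supports the MDTM command (defined in RFC 3659, reported in UTC, but not implemented by every server) and sends it through ftplib's sendcmd. The helper names parse_mdtm, is_newer and fetch_new are made up for illustration, and the ftp argument is expected to be an already logged-in ftplib.FTP object that has been moved to the right directory with cwd.

```python
import calendar
import os
import time


def parse_mdtm(response):
    """Convert an MDTM reply such as '213 20121026114000' to epoch seconds (UTC)."""
    # Drop the '213' status code and any fractional seconds after the 14 digits.
    stamp = response.split()[1][:14]
    return calendar.timegm(time.strptime(stamp, '%Y%m%d%H%M%S'))


def is_newer(remote_mtime, local_path):
    """Return True if the local copy is missing or older than the remote file."""
    if not os.path.exists(local_path):
        return True
    return remote_mtime > os.path.getmtime(local_path)


def fetch_new(ftp, local_dir, filematch='*.txt'):
    """Download only files that are missing locally or newer on the server."""
    downloaded = []
    for filename in ftp.nlst(filematch):
        local_path = os.path.join(local_dir, filename)
        remote_mtime = parse_mdtm(ftp.sendcmd('MDTM ' + filename))
        if not is_newer(remote_mtime, local_path):
            continue
        with open(local_path, 'wb') as fhandle:
            ftp.retrbinary('RETR ' + filename, fhandle.write)
        # Stamp the local copy with the server's time so that the next run
        # compares the remote timestamp against itself, not the download time.
        os.utime(local_path, (remote_mtime, remote_mtime))
        downloaded.append(filename)
    return downloaded
```

Calling fetch_new(ftp, r'C:\my_directory') after ftp.login and ftp.cwd would replace the loop in the question. Setting the local modification time with os.utime is a design choice: without it, the local file carries the download time, which is always later than the upload time, so a comparison would still work but would depend on the two clocks agreeing.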

Licensed under: CC-BY-SA with attribution