Question

I have been using a very simple batch file to download millions of files from a UNIX FTP server for years:

login
passwd
ascii
prompt n
cd to the right directory
get some_file
get another_file
cd to the next directory
repeat the pattern

The nice thing about this was that it was simple, and all the files arrived with Windows line breaks, so they were ready to use with my existing programs. Because of some changes to my router, I had to write a Python script to pull the files. My first version of the script is very simple, but it works:

for key in key_filings:
    for filing in key_filings[key]:
        remote_directory = '/foo/bar/' + key + '/' + filing['key_number']
        ftp.cwd(remote_directory)
        text_file = filing['txt']
        ftp.retrlines('RETR ' + text_file, open(save_dir + text_file, 'w').writelines)
        hdr_file = filing['hdr']
        ftp.retrlines('RETR ' + hdr_file, open(save_dir + hdr_file, 'w').writelines)

However, the saved files have no apparent line breaks. The files are stored on a UNIX system. When I used to download them through the Windows CMD shell, the line breaks were just there. I have tried sending the ASCII command, but as expected that had no effect.
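Here is a small offline sketch of why the lines run together. The Python documentation for `retrlines` says the callback receives each line with its trailing CRLF stripped. `strip_line_ending` and the sample `server_lines` are hypothetical stand-ins for ftplib's internals and for the server, so no FTP connection is needed.

```python
import io

def strip_line_ending(line):
    """Hypothetical helper mimicking how retrlines trims each line for the callback."""
    if line.endswith('\r\n'):
        return line[:-2]
    if line.endswith('\n'):
        return line[:-1]
    return line

# Hypothetical lines as they might arrive from the server in ASCII mode.
server_lines = ['first line\r\n', 'second line\r\n']

out = io.StringIO()
for raw in server_lines:
    # Same kind of callback as the question: writelines() on a str writes its
    # characters and adds no separator, so nothing puts a line break back.
    out.writelines(strip_line_ending(raw))

print(repr(out.getvalue()))  # 'first linesecond line'
```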

It is critical that I be able to have access to the line breaks that existed originally as some of my code processing is line based.


Solution

Well, as usually happens, once I wrote the question out I was able to find the answer. I thought about deleting the question instead of answering it, but there are probably others like me who could use the answer, so I am posting what I took away from a webpage by Fredrik Lundh.

The script on that page prints the file to the screen; I want to save it instead.

Basically, retrlines retrieves the file from the server one line at a time and passes each line, with its line ending removed, to the callback. In the script below, I write each line (s) as it arrives, adding a newline character.

I don't really understand lambda functions or callbacks, so this is an excuse to finally wrap my head around those concepts.
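A callback is simply a function you pass to another function so it can call it later; retrlines calls yours once per line. The sketch below shows that the lambda used in the script behaves like an ordinary named function. `fake_retrlines` is a hypothetical stand-in that only mimics the calling pattern, with no networking.

```python
import io

def fake_retrlines(lines, callback):
    """Hypothetical stand-in for ftp.retrlines: call `callback` once per line."""
    for line in lines:
        callback(line)

lines = ['alpha', 'beta']

# 1. The lambda style from the answer. `w=out1.write` is a default argument,
#    so the bound write method is captured when the lambda is created.
out1 = io.StringIO()
fake_retrlines(lines, lambda s, w=out1.write: w(s + '\n'))

# 2. The same callback written as an ordinary named function.
out2 = io.StringIO()

def write_line(s):
    out2.write(s + '\n')

fake_retrlines(lines, write_line)

print(repr(out1.getvalue()))  # 'alpha\nbeta\n'
```

Because retrlines calls the callback immediately, before the loop moves on, a plain `lambda s: save_text_ref.write(s + '\n')` would work just as well here; the default-argument trick matters mainly when a lambda is stored and called after the loop variable has changed.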

import ftplib
ftp = ftplib.FTP('ftp.some.site', user='username', passwd='password_for_username')

for key in key_filings:
    for filing in key_filings[key]:
        remote_directory = '/foo/bar/' + key + '/' + filing['key_number']
        ftp.cwd(remote_directory)

        # retrlines strips each line's ending, so the callback adds '\n' back.
        # Text mode ('w') turns that '\n' into CRLF when run on Windows.
        text_file = filing['txt']
        save_text_ref = open(save_dir + text_file, 'w')
        ftp.retrlines('RETR ' + text_file, lambda s, w=save_text_ref.write: w(s + '\n'))
        save_text_ref.close()

        hdr_file = filing['hdr']
        save_hdr_ref = open(save_dir + hdr_file, 'w')
        ftp.retrlines('RETR ' + hdr_file, lambda s, w=save_hdr_ref.write: w(s + '\n'))
        save_hdr_ref.close()

ftp.quit()
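On Python 3, a `with` block closes the file even if a transfer fails, and passing `newline='\r\n'` to `open()` produces CRLF endings regardless of which OS runs the script. This is a minimal sketch, not the answer's code: `download_text` and `FakeFTP` are hypothetical names, and the fake class only stands in for a connected `ftplib.FTP` object so the example runs offline.

```python
import os
import tempfile

def download_text(ftp, remote_name, local_path):
    """Save a remote text file with Windows (CRLF) line endings.

    `ftp` is assumed to be a connected ftplib.FTP object, or anything with
    a compatible retrlines(cmd, callback) method.
    """
    # newline='\r\n' translates every '\n' written below into CRLF on any OS.
    with open(local_path, 'w', newline='\r\n') as f:
        ftp.retrlines('RETR ' + remote_name, lambda s: f.write(s + '\n'))

class FakeFTP:
    """Hypothetical test double: feeds fixed lines instead of contacting a server."""
    def retrlines(self, cmd, callback):
        for line in ['header', 'body']:
            callback(line)

path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
download_text(FakeFTP(), 'sample.txt', path)
with open(path, 'rb') as f:
    print(f.read())  # b'header\r\nbody\r\n'
```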

OTHER TIPS

Love PyNEwbie's use of lambda, thanks. Here is a more generic version of the same code. I tried to add this as a comment on your post, but the comment wouldn't take code:

from ftplib import FTP

def ftp_download_textfile(host, user, passwd, subdir, filename):
    ftp = FTP(host, user=user, passwd=passwd)
    ftp.cwd(subdir)
    with open(filename, 'w') as fp:
        # Re-add the newline that retrlines strips from every line.
        ftp.retrlines('RETR ' + filename, lambda s, w=fp.write: w(s + '\n'))
    ftp.quit()

ftp_download_textfile('ftp.example.com', 'skywalker', 'maltesefalcon',
                      'spec/files', 'secretplans.csv')

I was looking at this and wondering why the creators of ftplib decided to strip out the newline characters in the first place. I googled around and did not find a satisfactory answer, so I considered changing the code in ftplib itself; this seemed simpler to me than my first answer. I found the ftplib.py file in C:\Python27\Lib.

I made a copy of it named ftplib_myMOD.py, opened it in IDLE, found the retrlines function, and modified it:

    fp = conn.makefile('rb')
    while 1:
        line = fp.readline()
        if self.debugging > 2: print '*retr*', repr(line)
        if not line:
            break
        # Commented out so each line keeps its original ending:
        #if line[-2:] == CRLF:
        #    line = line[:-2]
        #elif line[-1:] == '\n':
        #    line = line[:-1]
        callback(line)
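On Python 2.7, which this excerpt comes from, the data connection is read with `makefile('rb')`, so with the stripping commented out, a line sent in ASCII mode (CRLF-terminated) reaches the callback as `'one\r\n'`. If that line is then written to a text-mode file on Windows, the `'\n'` is translated again and the file ends up with `'\r\r\n'`. The offline sketch below simulates both paths. `strip_line_ending` is a hypothetical helper copying the trimming logic shown above, and `newline='\r\n'` stands in for Windows text mode so the result is the same on any OS.

```python
import os
import tempfile

def strip_line_ending(line):
    """Hypothetical helper: the trimming the stock retrlines applies."""
    if line[-2:] == '\r\n':
        return line[:-2]
    if line[-1:] == '\n':
        return line[:-1]
    return line

raw = 'one\r\n'  # a line as received in ASCII mode, ending intact

path = os.path.join(tempfile.mkdtemp(), 'out.txt')
# newline='\r\n' reproduces Windows text-mode translation on any OS.
with open(path, 'w', newline='\r\n') as f:
    f.write(raw)                            # modified module: ending kept as-is
    f.write(strip_line_ending(raw) + '\n')  # stock module plus the lambda

with open(path, 'rb') as f:
    print(f.read())  # b'one\r\r\none\r\n'
```

If your processing depends on exact line endings, it may be worth checking files saved with the modified module for stray `'\r'` characters.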

I saved the file, closed IDLE, and then restarted it. After that, I imported the modified module:

import ftplib_myMOD as myftp

I found that the line breaks were present.

I like this approach because it needs fewer steps than the lambda. I'm not sure it is good practice, but it was interesting to look through the functions and learn something from them.

It is strange that the CRLF is stripped out. I was trying this against an IBM iSeries DB2 system and ended up doing the following to avoid disk I/O for each line read:

lines = []
ftp.retrlines('RETR ' + remote_file, lambda d: lines.append(d + '\n'))
with open(yourfile, 'w') as f:
    f.writelines(lines)
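Python file objects are buffered by default, so writing each line directly from the callback does not by itself mean one disk write per line; collecting everything in a list first mainly trades memory (the whole file is held at once) for fewer method calls. This offline sketch, with a hypothetical `fake_retrlines` in place of the real transfer, checks that the two approaches save identical content.

```python
import os
import tempfile

def fake_retrlines(lines, callback):
    """Hypothetical stand-in for ftp.retrlines: one callback call per line."""
    for line in lines:
        callback(line)

incoming = ['row 1', 'row 2', 'row 3']
tmp = tempfile.mkdtemp()
buffered_path = os.path.join(tmp, 'buffered.txt')
direct_path = os.path.join(tmp, 'direct.txt')

# Approach from this answer: collect in a list, write once at the end.
lines = []
fake_retrlines(incoming, lambda d: lines.append(d + '\n'))
with open(buffered_path, 'w') as f:
    f.writelines(lines)

# Alternative: write each line as it arrives; the file object's own buffer
# batches these small writes before they reach the disk.
with open(direct_path, 'w') as f:
    fake_retrlines(incoming, lambda d: f.write(d + '\n'))

with open(buffered_path) as a, open(direct_path) as b:
    print(a.read() == b.read())  # True
```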
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow