Question

I have been playing around with Python's FTP library and am starting to think that it is slow compared to using a script file in DOS. I run sessions where I download thousands of data files (I think I have over 8 million right now). My observation is that the download process seems to take five to ten times as long in Python as it does using the ftp commands in the DOS shell.

Since I don't want anyone to fix my code, I have not included any. I am more interested in understanding whether my observation is valid or whether I need to tinker more with the arguments.

Solution

ftplib is implemented in Python, whereas your "DOS script" actually calls a compiled command. Executing that command is probably faster than interpreting Python code. If ftplib is too slow for you, I suggest calling the DOS command from Python using the subprocess module.
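
For example, a minimal sketch along these lines (the script path is a placeholder, not taken from the question) would shell out to the Windows ftp client from Python:

import subprocess

# Hypothetical command file containing the FTP commands to run.
FTP_SCRIPT = r'C:\scripts\ftpscript.txt'

# -i turns off interactive prompting, -s: runs the commands in the script file.
ret = subprocess.call(['ftp', '-i', '-s:' + FTP_SCRIPT])
print "ftp exited with code %d" % ret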

OTHER TIPS

ftplib may not be the cleanest Python API, but I don't think it is so bad that it runs ten times slower than a DOS shell script.

Unless you provide some code to compare, e.g. your shell script and your Python snippet that batch-downloads 5000 files, I can't see how we can help you.

The speed problem is probably in your code. FTPlib is not 10 times slower.

Define a blocksize along with storbinary on the FTP connection, and you will get a 1.5-3.0x faster transfer than with the FileZilla FTP client :)

from ftplib import FTP

USER = "Your_user_id"
PASS = "Your_password"
PORT = 21
SERVER = 'ftp.billionuploads.com'  # use your FTP server name here

ftp = FTP()
ftp.connect(SERVER, PORT)
ftp.login(USER, PASS)

try:
    f = open(r'C:\Python27\1.jpg', 'rb')
    # Store the file with a 100 KB blocksize instead of the 8 KB default.
    ftp.storbinary('STOR ' + '1.jpg', f, 102400)
    f.close()
    ftp.quit()
    print "File transferred"
except Exception as e:
    print "Error transferring file:", e

import ftplib
import time

# Time the download while writing the file to disk.
ftp = ftplib.FTP("localhost", "mph")
t0 = time.time()
with open('big.gz.sav', 'wb') as f:
    ftp.retrbinary('RETR ' + '/Temp/big.gz', f.write)
t1 = time.time()
ftp.close()

# Time the same download while discarding the data.
ftp = ftplib.FTP("localhost", "mph")
t2 = time.time()
ftp.retrbinary('RETR ' + '/Temp/big.gz', lambda x: x)
t3 = time.time()

print "saving file: %f to %f: %f delta" % (t0, t1, t1 - t0)
print "not saving file: %f to %f: %f delta" % (t2, t3, t3 - t2)

So, maybe not 10x. But my runs of this that save the file are all above 160 s on a laptop with a 1.8 GHz Core i7 and 8 GB of RAM (which should be overkill) running Windows 7. A native client does it in 100 s. Without the file save I'm just under 70 s.

I came to this question because I've seen slow performance with ftplib on a Mac (I'll rerun this test once I have access to that machine again). While going async with the writes might be a good idea in this case, on a real network I suspect it would be far less of a gain.
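
If you want to experiment with that idea, here is a rough sketch (it reuses the test server and file from the benchmark above; the writer thread is my own addition, not part of the benchmark) that feeds the downloaded chunks to a queue and lets a separate thread do the disk I/O:

import ftplib
import threading
import Queue  # Python 2 module name

q = Queue.Queue()

def writer(path):
    # Drain chunks from the queue and write them to disk; None signals end of file.
    with open(path, 'wb') as f:
        while True:
            chunk = q.get()
            if chunk is None:
                break
            f.write(chunk)

t = threading.Thread(target=writer, args=('big.gz.sav',))
t.start()

ftp = ftplib.FTP("localhost", "mph")
ftp.retrbinary('RETR ' + '/Temp/big.gz', q.put, blocksize=32768)
q.put(None)   # tell the writer thread we are done
t.join()
ftp.quit()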

Skip ftplib and run ftp via the MS-DOS shell:

import os

os.system('FTP -v -i -s:C:\\ndfd\\wgrib2\\ftpscript.txt')

Inside ftpscript.txt:

open example.com
username
password
!:--- FTP commands below here ---
lcd c:\MyLocalDirectory
cd  public_html/MyRemoteDirectory
binary
mput "*.*"
disconnect
bye

A bigger blocksize is not always optimal. For example, uploading the same 167 MB file over a wired network to the same FTP server, I got the following times in seconds for various blocksizes:

Blocksize  Time
102400       40
 51200       30
 25600       28
 32768       30
 24576       31
 19200       34
 16384       61
 12800      144

In this configuration the optimum was around 32768 (4x8192).

But if I used wireless instead, I got these times:

Blocksize  Time
204800       78
102400       76
 51200       79
 25600       76
 32768       89
 24576       86
 19200       75
 16384      166
 12800      178
default     223

In this case there were several optimum blocksize values, all different from 32768.
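
If you want to find the sweet spot for your own setup, you can simply time the same upload over a range of blocksizes; a minimal sketch, assuming a placeholder server, credentials, and test file:

import ftplib
import time

SERVER = "ftp.example.com"   # placeholder server
USER = "user"                # placeholder credentials
PASS = "password"
TEST_FILE = "bigfile.bin"    # placeholder local file to upload

for blocksize in (8192, 16384, 32768, 65536, 131072):
    ftp = ftplib.FTP(SERVER)
    ftp.login(USER, PASS)
    with open(TEST_FILE, 'rb') as f:
        t0 = time.time()
        ftp.storbinary('STOR ' + TEST_FILE, f, blocksize)
        t1 = time.time()
    ftp.quit()
    print "blocksize %6d: %.1f s" % (blocksize, t1 - t0)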

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow