Question

I've been searching on this for a couple of days and haven't found an answer yet.

I'm trying to download video files from an FTP server. My script checks the server, compares the nlst() listing against a list of already-downloaded files parsed from a text file, builds a new list of files to get, and iterates over it, downloading each file in turn. I disconnect from the server and reconnect for the next file (I thought a server timeout might be the issue, so I quit() the connection after each download).
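The comparison step looks roughly like this (the host, credentials and text-file name here are placeholders, not my real values):

from ftplib import FTP

ftp = FTP('host', 'user', 'pass')
serverFiles = ftp.nlst()
ftp.quit()

# files already downloaded, one name per line
with open('downloaded.txt') as f:
    alreadyDownloaded = set(line.strip() for line in f)

# only fetch what we don't have yet
getFiles = [name for name in serverFiles if name not in alreadyDownloaded]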

This works for the first few files, but as soon as I hit a file that takes longer than 5 minutes, ftplib just hangs at the end of the transfer. I can see in Explorer that the file is the correct size, so the download has completed, but the script doesn't seem to get the message and move on to the next file.

Any help would be greatly appreciated; my code is below:

import os
from download import downloadFile   # downloadFile is defined in download.py below

newPath = "Z:\\pathto\\downloads\\"

# announce what is about to be fetched
for f in getFiles:
    print("Getting " + f)

# download each new file that has a valid extension
for f in getFiles:

    fil = f.rstrip()
    ext = os.path.splitext(fil)[1]
    if ext in validExtensions:
        print("Downloading new file: " + fil)
        downloadFile(fil, newPath)

Here is download.py:

from ftplib import FTP
def downloadFile(filename, folder):
    myhost = 'host'
    myuser = 'user'
    passw = 'pass'
    #login
    ftp = FTP(myhost,myuser,passw)
    localfile = open(folder + filename, 'wb')
    ftp.retrbinary("RETR " + filename, localfile.write, 1024)
    print("Downloaded " + filename)
    localfile.close()
    ftp.quit()

Solution

Without more information, I can't actually debug your problem, so I can only suggest the most general answer. It's probably more than you need, but it should be sufficient for just about any variant of this problem.

retrbinary will block until the entire file is done. If that's longer than 5 minutes, nothing will get sent over the control channel for the entire 5 minutes. Either your client is timing out the control channel, or the server is. So, when you try to hang up with ftp.quit(), it will either hang forever or raise an exception.

You can control your side's timeout with the timeout argument to the FTP constructor, and some servers support an IDLE command for raising the server-side timeout. But even if one of those turns out to be doable, how do you pick an appropriate timeout in the first place?
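For example (7200 seconds here is purely illustrative, and the server-side command is only a hypothetical sketch, since its spelling varies by server):

from ftplib import FTP

# client-side: ftplib's FTP constructor accepts a timeout in seconds
ftp = FTP('host', 'user', 'pass', timeout=7200)

# server-side: if your server supports an IDLE-style command, it would be
# sent over the control channel, e.g. something like:
#     ftp.voidcmd('SITE IDLE 7200')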

What you really want is to keep the control socket from timing out while a transfer is happening on the data socket. But how? If you, say, send ftp.voidcmd('NOOP') every so often from your callback function, that's enough to keep the connection alive, but it also forces you to block until the server responds to the NOOP, which many servers won't do until the data transfer is complete. You'd just end up blocking forever (or until a different timeout kicks in) without getting your data.

The standard techniques for handling two sockets without one blocking on the other are a multiplexer like select.select, or threads. You can do that here, but you'll have to give up the simple retrbinary interface and instead use transfercmd to get the data socket explicitly.

For example:

import threading

def downloadFile(…):
    ftp = FTP(…)
    # open the data connection ourselves instead of using retrbinary
    sock = ftp.transfercmd('RETR ' + filename)
    def background():
        # drain the data socket in a worker thread
        f = open(…)
        while True:
            block = sock.recv(1024*1024)
            if not block:
                break
            f.write(block)
        f.close()
        sock.close()
    t = threading.Thread(target=background)
    t.start()
    # meanwhile, keep the control connection alive from the main thread
    while t.is_alive():
        t.join(60)
        ftp.voidcmd('NOOP')
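One detail the sketch glosses over: transfercmd leaves the server's end-of-transfer reply for you to read, so after the loop you'd normally finish up with something like this (it may need adjusting if a late NOOP has already consumed that reply):

    # after the while loop:
    ftp.voidresp()   # read the end-of-transfer reply (usually 226)
    ftp.quit()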

An alternative solution would be to read, say, 20MB at a time, then call ftp.abort(), and use the rest argument to resume the transfer with each new retrbinary until you reach the end of the file. However, ABOR could hang forever, just like that NOOP, so that doesn't guarantee anything—not to mention that servers don't have to respond to it.

What you could do is just close the whole connection down (not quit, but close). This is not very nice to the server, and may result in some wasted data being re-sent, and may also prevent TCP from doing its usual ramp up to full speed if you kill the sockets too quickly. But it should work.
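As a rough sketch of that reconnect-and-resume idea (the helper name, chunk size and timeout below are mine, and it assumes the server supports REST):

import ftplib

CHUNK = 20 * 1024 * 1024  # arbitrary chunk size

def downloadInChunks(host, user, passw, filename, localpath):
    # pull CHUNK bytes, drop the connection (close, not quit), then
    # reconnect and resume from where we left off using REST
    rest = 0
    done = False
    with open(localpath, 'wb') as f:
        while not done:
            ftp = ftplib.FTP(host, user, passw, timeout=120)
            received = 0

            def callback(block):
                nonlocal received
                f.write(block)
                received += len(block)
                if received >= CHUNK:
                    raise EOFError('chunk finished')  # bail out of retrbinary

            try:
                ftp.retrbinary('RETR ' + filename, callback,
                               rest=rest if rest else None)
                done = True            # reached the end of the file
            except ftplib.all_errors:  # includes the EOFError raised above
                pass                   # chunk limit hit; resume on the next pass
            finally:
                rest += received
                ftp.close()            # close(), not quit(): don't wait on the server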

See this answer—and notice that it requires a bit of testing against your particular broken server to figure out which, if any, variation works correctly and efficiently.

OTHER TIPS

Based on abarnet's solution (which was still hanging at the end for me), I've written this, which finally works :-)

import ftplib
from tempfile import SpooledTemporaryFile

MEGABYTE = 1024 * 1024

def download(ftp_host, ftp_user, ftp_pass, ftp_path, filename):
    ftp = ftplib.FTP(ftp_host, ftp_user, ftp_pass, timeout=3600) # timeout: 1-hour
    ftp.cwd(ftp_path)

    filesize = ftp.size(filename) / MEGABYTE
    print(f"Downloading: {filename}   SIZE: {filesize:.1f} MB")

    with SpooledTemporaryFile(max_size=MEGABYTE, mode="w+b") as ff:
        sock = ftp.transfercmd('RETR ' + filename)
        while True:
            buff = sock.recv(MEGABYTE)
            if not buff: break
            ff.write(buff)
        sock.close()
        ff.rollover()  # force saving to HDD of the final chunk!!
        ff.seek(0)     # prepare for data reading
        print("Reading the buffer...")
        # alldata = ff.read()
        # upload_file_to_adls(filename, alldata, account_name, account_key, container, adls_path)
    ftp.quit()
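A call would look something like this (placeholder values):

download('ftp.example.com', 'user', 'pass', '/path/on/server', 'video.mp4')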
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow