Read a file in buffer from FTP python

Question 1

Make sure to login to the ftp server first. After this, use retrbinary which pulls the file in binary mode. It uses a callback on each chunk of the file. You can use this to load it into a string.

from ftplib import FTP
ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login() # Username: anonymous password: anonymous@

# Setup a cheap way to catch the data (could use StringIO too)
data = []
def handle_binary(more_data):
    data.append(more_data)

resp = ftp.retrbinary("RETR pub/pmc/PMC-ids.csv.gz", callback=handle_binary)
data = "".join(data)

Bonus points: how about we decompress the string while we're at it?

Easy mode, using data string above

import gzip
import StringIO
zippy = gzip.GzipFile(fileobj=StringIO.StringIO(data))
uncompressed_data = zippy.read()

Little bit better, full solution:

from ftplib import FTP
import gzip
import StringIO

ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login() # Username: anonymous password: anonymous@

sio = StringIO.StringIO()
def handle_binary(more_data):
    sio.write(more_data)

resp = ftp.retrbinary("RETR pub/pmc/PMC-ids.csv.gz", callback=handle_binary)
sio.seek(0) # Go back to the start
zippy = gzip.GzipFile(fileobj=sio)

uncompressed = zippy.read()

In reality, it would be much better to decompress on the fly but I don't see a way to do that with the built in libraries (at least not easily).

Question 2

There are two easy ways I can think of to download a file using FTP and store it locally:

Using ftplib:

from ftplib import FTP

ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login()
ftp.cwd('pub/pmc')
ftp.retrbinary('RETR PMC-ids.csv.gz', open('PMC-ids.csv.gz', 'wb').write)
ftp.quit()

Using urllib

from urllib import urlretrieve

urlretrieve("ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz", "PMC-ids.csv.gz")

If you don't want to download and store it to a file, but you want to process it gradually as it comes, I suggest using urllib2:

from urllib2 import urlopen

u = urlopen("ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/readme.txt")

for line in u:
   print line

which prints your file line by line.

Question 3

That is not possible. To process data on the server, you need to have some sort of execution permissions, be it for a shell script you would send or SQL access.

FTP is pure file transfer, no execution allowed. You will need either to enable SSH access, load the data into a Database and access that with queries or download the file with urllib then process it locally, like this:

import urllib
handle = urllib.urlopen('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz')
# Use data, maybe: buffer = handle.read()

In particular, I think the third one is the only zero-effort solution.