Process a file in memory using Python

Question 1

You can read the entire file into memory using:

data = urllib2.urlopen(url).read()
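On Python 3 the `urllib2` module no longer exists; its functionality moved to `urllib.request`. A minimal sketch of the same read-everything-into-memory step under that assumption is shown here. The helper name `fetch_bytes` is hypothetical and not part of the original code.

```python
import urllib.request


def fetch_bytes(url):
    """Download url and return the whole response body as bytes."""
    # On Python 3 the response object works as a context manager,
    # so it is closed even if read() raises.
    with urllib.request.urlopen(url) as response:
        return response.read()
```

The returned bytes object plays the same role as `data` in the line above.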

Once the file is in memory, you can load it into xlrd using the file_contents argument of open_workbook:

wb = xlrd.open_workbook(url, file_contents=data)

Pass the URL as the filename: the documentation states that when file_contents is supplied, filename is not used except possibly in messages.

Thus, your traverseWorkbook method can be rewritten as:

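import urllib2
import xlrd

def traverseWorkbook(url):
    values = []
    # Download the whole workbook into memory.
    data = urllib2.urlopen(url).read()
    # The url is only used as a label; the contents come from file_contents.
    wb = xlrd.open_workbook(url, file_contents=data)
    for s in wb.sheets():
        for row in range(s.nrows):
            # Skip rows 0 through 10.
            if row > 10:
                # processRow and type are assumed to be defined as in the question.
                rowData = processRow(s, row, type)
                if rowData:
                    values.append(rowData)
    return values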
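The loop above visits every row index and discards rows 0 through 10 with an `if`. The same filter can be expressed by starting the range at 11. The sketch below illustrates this with a stand-in sheet object so that it runs without xlrd. The names `collect_rows`, `FakeSheet`, and `keep_even` are hypothetical, and the row processor here takes two arguments rather than the three that processRow takes in the question.

```python
def collect_rows(sheet, process_row, first_row=11):
    """Apply process_row to each row index from first_row on; keep truthy results."""
    values = []
    # Starting the range at 11 is equivalent to testing `row > 10` inside the loop.
    for row in range(first_row, sheet.nrows):
        row_data = process_row(sheet, row)
        if row_data:
            values.append(row_data)
    return values


class FakeSheet(object):
    """Stand-in exposing only the nrows attribute that the loop relies on."""

    def __init__(self, nrows):
        self.nrows = nrows


def keep_even(sheet, row):
    """Hypothetical row processor: return the row index, or None for odd rows."""
    return row if row % 2 == 0 else None


result = collect_rows(FakeSheet(15), keep_even)
```

A sheet with 11 rows or fewer simply yields an empty list, because the range is empty.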

Question 2

You could use the StringIO library and write the downloaded data to a file-like StringIO object, rather than a normal file.

import urllib2
import cStringIO as cs
from contextlib import closing

def retrieveFile(url, filename):
    # filename is kept from the original signature; nothing is written to disk.
    try:
        req = urllib2.urlopen(url)
        CHUNK = 16 * 1024
        full_str = None
        with closing(cs.StringIO()) as fp:
            while True:
                chunk = req.read(CHUNK)
                if not chunk:
                    break
                fp.write(chunk)
            full_str = fp.getvalue()  # This contains the full contents of the downloaded file.
        # Return the data itself (instead of True) so the caller can use it.
        return full_str
    except Exception:
        return None
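The cStringIO module exists only on Python 2. On Python 3 the in-memory buffer for binary data is `io.BytesIO`. The following is a sketch of the same chunked-read loop under that assumption. The function name `read_in_chunks` is hypothetical, and an in-memory stream stands in for the network response so that the example runs offline.

```python
import io


def read_in_chunks(stream, chunk_size=16 * 1024):
    """Read a binary file-like object to exhaustion and return its bytes."""
    with io.BytesIO() as buffer:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # an empty read signals end of stream
                break
            buffer.write(chunk)
        # getvalue() must be called before the buffer is closed.
        return buffer.getvalue()


# An in-memory source standing in for a downloaded response.
source = io.BytesIO(b"x" * 40000)
payload = read_in_chunks(source)
```

Any object with a `read(size)` method that returns bytes, such as the response from `urllib.request.urlopen`, can be passed as `stream`.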

Question 3

You can use pandas for this. One benefit is that it is optimized for working with data in memory, since much of its computation runs in compiled C code rather than pure Python. It also abstracts away many of the messy details of downloading the data, because ExcelFile can be given the URL directly.

import pandas as pd

xl = pd.ExcelFile(url, engine='xlrd')
sheets = xl.sheet_names

# Work with the first sheet, or iterate through the sheets if there is more than one.
df = xl.parse(sheets[0])

# The file is now a dataframe.
# You can manipulate the data in memory using the Pandas API
# ...
# ...

# After massaging the data, write it to an xls file:
out_file = '~/Documents/out_file.xls'
df.to_excel(out_file, encoding='utf-8', index=False)