Question

I am using the Python avro library. I want to send an Avro file over HTTP without saving it to disk first, so I thought I'd use StringIO to hold the file contents until I'm ready to send. But avro.datafile.DataFileWriter thoughtfully closes the file handle for me, which makes it difficult to get the data back out of the StringIO. Here's what I mean in code:

from StringIO import StringIO
from avro.datafile import DataFileWriter
from avro import schema, io
from testdata import BEARER, PUBLISHURL, SERVER, TESTDATA
from httplib2 import Http

HTTP = Http()
##
# Write the message data to a StringIO
#
# @return StringIO
#
def write_data():
    message = TESTDATA
    schema = getSchema()
    datum_writer = io.DatumWriter(schema)
    data = StringIO()
    with DataFileWriter(data, datum_writer, writers_schema=schema, codec='deflate') as datafile_writer:
        datafile_writer.append(message)
        # If I return data inside the with block, the DFW buffer isn't flushed
        # and I may get an incomplete file
    return data

##
# Make the POST and dump its response
#
def main():
    headers = {
        "Content-Type": "avro/binary",
        "Authorization": "Bearer %s" % BEARER,
        "X-XC-SCHEMA-VERSION": "1.0.0",
    }
    body = write_data().getvalue() # AttributeError: StringIO instance has no attribute 'buf'
    # the StringIO instance returned by write_data() is already closed. :(
    resp, content = HTTP.request(
        uri=PUBLISHURL,
        method='POST',
        body=body,
        headers=headers,
    )
    print resp, content

I do have some workarounds I can use, but none of them are terribly elegant. Is there any way to get the data from the StringIO after it's closed?

Solution

Not really.

The docs are very clear on this:

StringIO.close()

Free the memory buffer. Attempting to do further operations with a closed StringIO object will raise a ValueError.
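
To see this behaviour for yourself, here is a minimal sketch. It uses io.BytesIO as a stand-in for StringIO.StringIO (an assumption on my part, chosen so the snippet runs on both Python 2 and 3); the exact exception raised by the old StringIO module may differ between Python versions.

```python
import io

buf = io.BytesIO()
buf.write(b"payload")
buf.close()  # frees the in-memory buffer

try:
    buf.getvalue()
    error = None
except ValueError as exc:
    # Reading from a closed in-memory buffer is rejected
    error = exc
```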

The cleanest way of doing it would be to inherit from StringIO and override the close method to do nothing:

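class MyStringIO(StringIO):
    def close(self):
        # No-op, so DataFileWriter cannot free the buffer when it exits
        pass

    def _close(self):
        # StringIO.StringIO is an old-style class in Python 2, so super() fails here
        StringIO.close(self)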

And call _close() when you're ready.
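
Here is a rough, self-contained sketch of how that pattern plays out end to end. To keep it runnable without avro installed, it uses io.BytesIO instead of StringIO.StringIO, and a made-up ClosingWriter class standing in for DataFileWriter; both, along with the names NonClosingBytesIO and really_close, are assumptions for illustration only.

```python
import io


class NonClosingBytesIO(io.BytesIO):
    """In-memory buffer whose close() is a no-op (hypothetical name)."""

    def close(self):
        # Ignore close() so a writer that owns the handle cannot free the buffer
        pass

    def really_close(self):
        # Release the buffer explicitly once the contents have been read
        super(NonClosingBytesIO, self).close()


class ClosingWriter(object):
    """Hypothetical stand-in for a writer that closes its handle on exit."""

    def __init__(self, handle):
        self.handle = handle

    def append(self, payload):
        self.handle.write(payload)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.handle.flush()
        self.handle.close()


def write_data(payload):
    data = NonClosingBytesIO()
    with ClosingWriter(data) as writer:
        writer.append(payload)
    return data


buf = write_data(b"example payload")
body = buf.getvalue()  # still readable, because close() was ignored
buf.really_close()     # now the buffer is actually released
```

The trade-off is that the caller becomes responsible for releasing the buffer, which is easy to forget.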

OTHER TIPS

I was looking to do exactly the same thing. DataFileWriter has a flush method, so you should be able to call flush() after append() and then read the contents with data.getvalue() while still inside the with block, before the writer closes the handle. That seems a little more elegant to me than deriving a class from StringIO.
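
A minimal sketch of the flush-then-read idea follows. It does not use avro itself: BufferingWriter is a hypothetical stand-in that buffers records and closes its handle on exit, and io.BytesIO replaces StringIO so the snippet runs on Python 2 and 3. With the real library, this assumes DataFileWriter.flush() writes out every pending block, which is worth confirming against the avro version you have installed.

```python
import io


class BufferingWriter(object):
    """Hypothetical stand-in: buffers records, closes the handle on exit."""

    def __init__(self, handle):
        self.handle = handle
        self.pending = []

    def append(self, payload):
        # Records are held back until flush() is called
        self.pending.append(payload)

    def flush(self):
        self.handle.write(b"".join(self.pending))
        self.pending = []
        self.handle.flush()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.flush()
        self.handle.close()


def write_data(payload):
    data = io.BytesIO()
    with BufferingWriter(data) as writer:
        writer.append(payload)
        writer.flush()          # push buffered records into the handle
        body = data.getvalue()  # copy the bytes out before the handle closes
    return body


body = write_data(b"example payload")
```

Note that the function returns the bytes rather than the buffer object, since the buffer is closed by the time the with block has exited.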

Licensed under: CC-BY-SA with attribution