Question

I've come across a problem today. I'm uploading a multipart form via HTTP POST using the poster module.

Part of the form is a file, which poster streams up, and that part works great.

The problem is that the Content-Length is calculated up front, before the upload begins, but the form data is then generated dynamically, so the amount of data actually uploaded can end up being different. This happens to me when something external modifies the file in the form during the upload.
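A minimal sketch of the mismatch, assuming (as when a fixed filesize is handed to poster) that the Content-Length is taken from the file's size on disk before any data is sent. The helpers declared_length, stream_file and demo are hypothetical, and the append in the middle stands in for the external process modifying the file.

```python
import os
import tempfile


def declared_length(path):
    # Size measured up front; this is the value that goes into Content-Length.
    return os.path.getsize(path)


def stream_file(path, blocksize=4096):
    # Read the file block by block, as a streaming uploader would,
    # and count how many bytes would actually be sent.
    sent = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(blocksize)
            if not block:
                break
            sent += len(block)
    return sent


def demo():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"x" * 10000)
        promised = declared_length(path)
        # Simulate another process appending after the header was built.
        with open(path, "ab") as f:
            f.write(b"y" * 500)
        actual = stream_file(path)
        return promised, actual
    finally:
        os.remove(path)


if __name__ == "__main__":
    promised, actual = demo()
    print("Content-Length promised %d, body actually sent %d" % (promised, actual))
```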

If the file grows, the server closes the connection as soon as it has received the number of bytes given in the Content-Length, before I've finished sending, and I get a "Connection reset by peer" error. If the file shrinks, the upload hangs, because the server keeps waiting for the rest of the bytes I promised.
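A simplified model of the server side, assuming the server reads exactly Content-Length bytes of body and nothing more. It runs over socket.socketpair() so no network is needed; read_body and simulate are hypothetical names, not part of poster or httplib. In the grown case the extra bytes are simply never read (a real server that then closes a TCP connection with unread data typically triggers a reset, which the client sees as "Connection reset by peer"); in the shrunk case the server blocks waiting for bytes that never arrive.

```python
import socket


def read_body(sock, content_length):
    # A server that trusts Content-Length reads exactly that many bytes
    # and stops, whatever the client sends afterwards.
    chunks = []
    remaining = content_length
    while remaining > 0:
        chunk = sock.recv(min(remaining, 4096))
        if not chunk:
            raise EOFError("client closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def simulate(declared, actual, timeout=0.2):
    client, server = socket.socketpair()
    try:
        # The timeout only exists so this demo terminates; without it the
        # server side of the "shrunk" case would block forever.
        server.settimeout(timeout)
        client.sendall(b"a" * actual)
        try:
            body = read_body(server, declared)
        except socket.timeout:
            return "server still waiting for %d more bytes" % (declared - actual)
        # Count whatever the client sent beyond the declared length.
        server.setblocking(False)
        try:
            leftover = len(server.recv(65536))
        except BlockingIOError:
            leftover = 0
        return "server read %d bytes, %d unread" % (len(body), leftover)
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(simulate(100, 120))  # file grew
    print(simulate(100, 80))   # file shrank
```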

In the latter case I get this stack trace when I interrupt the hung upload:

Traceback (most recent call last):
  File "/Users/paul/Source/Python/test_uploader.py", line 35, in <module>
    gUpload(target_file, size, result.signed, callback, md5=md5)
  File "/Users/paul/Source/Python/PythonApp/upload.py", line 597, in handlingHttpError
    return func(*args, **kwargs)
  File "/Users/paul/Source/Python/PythonApp/upload.py", line 663, in gUpload
    urllib2.urlopen(request)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/poster-0.8.1-py2.7.egg/poster/streaminghttp.py", line 142, in http_open
    return self.do_open(StreamingHTTPConnection, req)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1180, in do_open
    r = h.getresponse(buffering=True)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1030, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 365, in _read_status
    line = self.fp.readline()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 447, in readline
    data = self._sock.recv(self._rbufsize)
KeyboardInterrupt

How can I deal with this situation? I don't mind it throwing an error, but this hang is killing me!
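If an error is acceptable, a socket timeout is a blunt way to turn the hang into one. The traceback shows urllib2 passing a timeout through (_opener.open(url, data, timeout)), so urllib2.urlopen(request, timeout=...) should bound the wait, though it cannot tell a size mismatch apart from a slow server and is best treated as a backstop. The Python 3 sketch below uses a hypothetical helper, wait_for_status_line, to reproduce the blocking readline() at the bottom of the traceback over socket.socketpair() and shows it giving up instead of hanging.

```python
import socket


def wait_for_status_line(sock, timeout):
    # Mirrors the bottom of the traceback: httplib reads the response
    # status line with readline() on a file object wrapped around the socket.
    sock.settimeout(timeout)
    fp = sock.makefile("rb")
    try:
        return fp.readline()
    except socket.timeout:
        # With a timeout set, the blocked read raises instead of hanging forever.
        return None
    finally:
        fp.close()


if __name__ == "__main__":
    client, server = socket.socketpair()
    # The "server" never replies, as when it is still waiting for body bytes.
    print(wait_for_status_line(client, 0.2))
    client.close()
    server.close()
```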


Solution

Thanks for the suggestions; however, I can't afford to lock any files, as my process will almost always have a lower priority than the process that may be editing the file I'm uploading.

This is what I went with in the end, and it seems to work well:

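class UploadSizeMismatchError(Exception):
    """Raised when the file's length no longer matches the size given to poster."""


class SizeCheckFile(file):
    # Python 2 only: subclasses the built-in `file` type and checks that
    # exactly `size` bytes are read before end of file.
    def __init__(self, size, *args, **kwargs):
        file.__init__(self, *args, **kwargs)
        self.size = size      # bytes promised in the Content-Length
        self.data_read = 0    # running total of bytes read so far

    def read(self, *args, **kwargs):
        data = file.read(self, *args, **kwargs)
        self.data_read += len(data)
        if self.data_read > self.size:
            # More bytes than promised: the file grew during the upload.
            raise UploadSizeMismatchError("File has grown!")
        elif not data and self.data_read != self.size:
            # End of file before the promised size: the file shrank.
            raise UploadSizeMismatchError("File has shrunk!")
        return data

    def seek(self, *args, **kwargs):
        # Allow no-op seeks (e.g. seek(0) before the first read) but refuse
        # anything that moves the position, since data_read would then be wrong.
        current_pos = self.tell()
        file.seek(self, *args, **kwargs)
        if current_pos != self.tell():
            raise NotImplementedError("%s currently assumes the file is being read from start to finish!" % self.__class__.__name__)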

The size I pass into the constructor is the same as the size I pass to poster for the MultipartParam filesize parameter.
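The file built-in that the class above subclasses only exists in Python 2. A minimal Python 3 sketch of the same check, assuming it is acceptable to wrap an already-open binary file rather than subclass one; SizeCheckReader is a hypothetical name. It also treats read(0) specially, because an empty result from a zero-byte request does not mean end of file (the Python 2 class would report "File has shrunk!" in that case, although poster's block-sized reads should not trigger it).

```python
import io


class UploadSizeMismatchError(Exception):
    """The file's length changed between measuring it and streaming it."""


class SizeCheckReader(object):
    # Wraps an open binary file instead of subclassing Python 2's `file`.
    def __init__(self, fileobj, size):
        self._f = fileobj
        self.size = size       # bytes promised in the Content-Length
        self.data_read = 0     # running total of bytes read so far

    def read(self, n=-1):
        data = self._f.read(n)
        self.data_read += len(data)
        if self.data_read > self.size:
            raise UploadSizeMismatchError("File has grown!")
        # read(0) returns b"" without being at EOF, so only an empty
        # result for a non-zero request counts as end of file.
        if not data and n != 0 and self.data_read != self.size:
            raise UploadSizeMismatchError("File has shrunk!")
        return data

    def seek(self, offset, whence=io.SEEK_SET):
        # Same policy as the original: no-op seeks are fine, real ones are not.
        current = self._f.tell()
        pos = self._f.seek(offset, whence)
        if pos != current:
            self._f.seek(current)  # restore the position before complaining
            raise NotImplementedError(
                "%s assumes the file is read from start to finish" % type(self).__name__)
        return pos

    def __getattr__(self, name):
        # Delegate everything else (tell, close, name, ...) to the wrapped file.
        return getattr(self._f, name)
```

In use it would wrap the file opened for upload, with the same size passed to poster, e.g. SizeCheckReader(open(path, "rb"), size).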

Of course, this assumes that no seeking takes place (the seek override raises NotImplementedError if anything moves the position). Otherwise I would have to keep track of exactly what had been read, but for my use case I needn't worry, as the file is streamed out from start to finish.
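If seeking did need to be supported, one option (an assumption, not something from the answer above) is to check the file position instead of a running byte count. After any read that returns data, the position must not pass the promised size, and a read that hits end of file must land exactly on it. Re-reading after a seek does not move the position past what a single pass would, so seeks become harmless. PositionCheckReader is a hypothetical name, and the sketch assumes nobody seeks past the end of the file.

```python
import io


class UploadSizeMismatchError(Exception):
    """The file's length no longer matches the size promised up front."""


class PositionCheckReader(object):
    # Hypothetical variant that tolerates seeks: it checks the file
    # position after each read instead of keeping a running byte count.
    # Assumes callers never seek past the end of the file.
    def __init__(self, fileobj, size):
        self._f = fileobj
        self.size = size

    def read(self, n=-1):
        data = self._f.read(n)
        pos = self._f.tell()
        if data and pos > self.size:
            raise UploadSizeMismatchError("File has grown!")
        # An empty result for a non-zero request means end of file,
        # which must coincide with the promised size.
        if not data and n != 0 and pos != self.size:
            raise UploadSizeMismatchError("File has shrunk!")
        return data

    def __getattr__(self, name):
        # seek, tell, close, ... go straight to the wrapped file.
        return getattr(self._f, name)
```

The trade-off is that a seek-and-partial-read pattern that never reaches the end cannot detect growth, but in that case the extra bytes are never sent either.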

Licensed under: CC-BY-SA with attribution