Using MultipartPostHandler to POST form-data with Python

https://stackoverflow.com/questions/680305

22-08-2019

Question

Problem: When POSTing data with Python's urllib2, all data is URL-encoded and sent with Content-Type: application/x-www-form-urlencoded. When uploading files, the Content-Type should instead be set to multipart/form-data and the contents should be MIME-encoded. A discussion of this problem is here: http://code.activestate.com/recipes/146306/
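As a small illustration of the default form encoding described above, here is a minimal sketch. It assumes Python 3, where the encoding helper lives in urllib.parse rather than urllib/urllib2, and the field values are only examples. Binary content ends up percent-encoded inside a single URL-encoded string instead of being sent as a separate MIME part:

```python
from urllib.parse import urlencode

# Ordinary text fields become a single key=value&key=value string.
fields = {"analysisType": "file", "notification": "email"}
body = urlencode(fields)
print(body)

# Raw bytes are percent-encoded byte by byte, which is not what a
# file-upload endpoint expecting multipart/form-data will accept.
binary_body = urlencode({"executable": b"\x00\x01MZ"})
print(binary_body)
```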

To get around this limitation, some sharp coders created a library called MultipartPostHandler, which creates an OpenerDirector you can use with urllib2 to POST with multipart/form-data mostly automatically. A copy of this library is here: http://peerit.blogspot.com/2007/07/multipartposthandler-doesnt-work-for.html

I am new to Python and am unable to get this library to work. I wrote out essentially the following code. When I capture the request in a local HTTP proxy, I can see that the data is still URL-encoded and not multipart MIME-encoded. Please help me figure out what I am doing wrong, or suggest a better way to get this done. Thanks :-)

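import sys
import urllib2
import MultipartPostHandler

FROM_ADDR = 'my@email.com'

# Read the whole file to upload into memory
try:
    data = open(file, 'rb').read()
except IOError:
    print "Error: could not open file %s for reading" % file
    print "Check permissions on the file or folder it resides in"
    sys.exit(1)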

# Build the POST request
url = "http://somedomain.com/?action=analyze"       
post_data = {}
post_data['analysisType'] = 'file'
post_data['executable'] = data
post_data['notification'] = 'email'
post_data['email'] = FROM_ADDR

# MIME encode the POST payload
opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler)
urllib2.install_opener(opener)
request = urllib2.Request(url, post_data)
request.set_proxy('127.0.0.1:8080', 'http') # For testing with Burp Proxy

# Make the request and capture the response
try:
    response = urllib2.urlopen(request)
    print response.geturl()
except urllib2.URLError, e:
    print "File upload failed..."

EDIT1: Thanks for your response. I'm aware of the ActiveState httplib solution to this (I linked to it above). I'd rather abstract away the problem and use a minimal amount of code so I can keep using urllib2 the way I have been. Any idea why the opener isn't being installed and used?

Solution

It seems that the easiest and most compatible way to get around this problem is to use the 'poster' module.

# test_client.py
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers
import urllib2

# Register the streaming http handlers with urllib2
register_openers()

# Start the multipart/form-data encoding of the file "DSC0001.jpg"
# "image1" is the name of the parameter, which is normally set
# via the "name" parameter of the HTML <input> tag.

# headers contains the necessary Content-Type and Content-Length
# datagen is a generator object that yields the encoded parameters
# The file is opened in binary mode so the image bytes are sent unmodified
datagen, headers = multipart_encode({"image1": open("DSC0001.jpg", "rb")})

# Create the Request object
request = urllib2.Request("http://localhost:5000/upload_image", datagen, headers)
# Actually do the request, and get the response
print urllib2.urlopen(request).read()

This worked perfectly and I didn't have to muck around with httplib. The module is available here: http://atlee.ca/software/poster/index.html

Other tips

I found this recipe for posting multipart data using httplib directly (no external libraries involved):

import httplib
import mimetypes

def post_multipart(host, selector, fields, files):
    content_type, body = encode_multipart_formdata(fields, files)
    h = httplib.HTTP(host)
    h.putrequest('POST', selector)
    h.putheader('content-type', content_type)
    h.putheader('content-length', str(len(body)))
    h.endheaders()
    h.send(body)
    errcode, errmsg, headers = h.getreply()
    return h.file.read()

def encode_multipart_formdata(fields, files):
    LIMIT = '----------lImIt_of_THE_fIle_eW_$'
    CRLF = '\r\n'
    L = []
    for (key, value) in fields:
        L.append('--' + LIMIT)
        L.append('Content-Disposition: form-data; name="%s"' % key)
        L.append('')
        L.append(value)
    for (key, filename, value) in files:
        L.append('--' + LIMIT)
        L.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (key, filename))
        L.append('Content-Type: %s' % get_content_type(filename))
        L.append('')
        L.append(value)
    L.append('--' + LIMIT + '--')
    L.append('')
    body = CRLF.join(L)
    content_type = 'multipart/form-data; boundary=%s' % LIMIT
    return content_type, body

def get_content_type(filename):
    return mimetypes.guess_type(filename)[0] or 'application/octet-stream'
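The recipe above joins everything as Python 2 strings. For comparison, here is a rough sketch of the same idea working on bytes, assuming Python 3. The uuid-based boundary and the sample field and file names are my own assumptions for illustration, not part of the recipe:

```python
import mimetypes
import uuid


def encode_multipart_formdata(fields, files):
    """fields: iterable of (name, str); files: iterable of (name, filename, bytes)."""
    boundary = uuid.uuid4().hex  # assumption: any unlikely-to-collide token works
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields:
        lines += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    for name, filename, content in files:
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        lines += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"; filename="%s"'
             % (name, filename)).encode("utf-8"),
            ("Content-Type: %s" % ctype).encode("ascii"),
            b"",
            content,  # raw bytes, never decoded to text
        ]
    lines += [delimiter + b"--", b""]
    body = b"\r\n".join(lines)
    content_type = "multipart/form-data; boundary=%s" % boundary
    return content_type, body


# Hypothetical sample data, just to show the shape of the result
content_type, body = encode_multipart_formdata(
    [("analysisType", "file")],
    [("executable", "sample.bin", b"\x00\xff\x10binary")],
)
print(content_type)
print(len(body))
```

Keeping every piece as bytes means the file content is never passed through a text codec, which is where binary uploads tend to break.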

Just use python-requests; it will set the proper headers and do the upload for you:

import requests 
files = {"form_input_field_name": open("filename", "rb")}
requests.post("http://httpbin.org/post", files=files)

I ran into the same problem and needed to do a multipart form post without using external libraries. I wrote a whole blog post about the issues I ran into.

I ended up using a modified version of http://code.activestate.com/recipes/146306/. The code at that URL just appends the content of the file as a string, which can cause problems with binary files. Here's my working code.

import codecs
import httplib
import io
import json
import mimetools
import mimetypes
import os
import socket
import urlparse


form = MultiPartForm()
form.add_field("form_field", "my awesome data")

# Add a file ("file" is a placeholder form field name)
key = "file"
filepath = "/path/to/my/file.zip"
form.add_file(key, os.path.basename(filepath),
    fileHandle=codecs.open(filepath, "rb"))

# Build the request
url = "http://www.example.com/endpoint"
schema, netloc, url, params, query, fragments = urlparse.urlparse(url)

try:
    form_buffer =  form.get_binary().getvalue()
    http = httplib.HTTPConnection(netloc)
    http.connect()
    http.putrequest("POST", url)
    http.putheader('Content-type',form.get_content_type())
    http.putheader('Content-length', str(len(form_buffer)))
    http.endheaders()
    http.send(form_buffer)
except socket.error, e:
    raise SystemExit(1)

r = http.getresponse()
if r.status == 200:
    print(json.loads(r.read()))
else:
    print('Upload failed (%s): %s' % (r.status, r.reason))

class MultiPartForm(object):
    """Accumulate the data to be used when posting a form."""

    def __init__(self):
        self.form_fields = []
        self.files = []
        self.boundary = mimetools.choose_boundary()
        return

    def get_content_type(self):
        return 'multipart/form-data; boundary=%s' % self.boundary

    def add_field(self, name, value):
        """Add a simple field to the form data."""
        self.form_fields.append((name, value))
        return

    def add_file(self, fieldname, filename, fileHandle, mimetype=None):
        """Add a file to be uploaded."""
        body = fileHandle.read()
        if mimetype is None:
            mimetype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
        self.files.append((fieldname, filename, mimetype, body))
        return

    def get_binary(self):
        """Return a binary buffer containing the form data, including attached files."""
        part_boundary = '--' + self.boundary

        binary = io.BytesIO()
        needsCLRF = False
        # Add the form fields
        for name, value in self.form_fields:
            if needsCLRF:
                binary.write('\r\n')
            needsCLRF = True

            block = [part_boundary,
              'Content-Disposition: form-data; name="%s"' % name,
              '',
              value
            ]
            binary.write('\r\n'.join(block))

        # Add the files to upload
        for field_name, filename, content_type, body in self.files:
            if needsCLRF:
                binary.write('\r\n')
            needsCLRF = True

            block = [part_boundary,
              'Content-Disposition: form-data; name="%s"; filename="%s"' % \
              (field_name, filename),
              'Content-Type: %s' % content_type,
              ''
              ]
            binary.write('\r\n'.join(block))
            binary.write('\r\n')
            binary.write(body)


        # add closing boundary marker,
        binary.write('\r\n--' + self.boundary + '--\r\n')
        return binary
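If you don't have a proxy handy, one way to sanity-check a generated body is to parse it back with the standard library's email package. This is only a sketch, assuming Python 3. The boundary, field names and payload are made up for the example, and a hand-written body stands in for the output of a form builder:

```python
from email import message_from_bytes

boundary = "exampleboundary123"  # hypothetical boundary for the example
body = (
    b"--exampleboundary123\r\n"
    b'Content-Disposition: form-data; name="form_field"\r\n'
    b"\r\n"
    b"my awesome data\r\n"
    b"--exampleboundary123\r\n"
    b'Content-Disposition: form-data; name="upload"; filename="file.zip"\r\n'
    b"Content-Type: application/zip\r\n"
    b"\r\n"
    b"zip bytes would go here\r\n"
    b"--exampleboundary123--\r\n"
)

# Prepend the header the HTTP request would carry so the parser
# knows which boundary to split on.
raw = (b"Content-Type: multipart/form-data; boundary="
       + boundary.encode("ascii") + b"\r\n\r\n" + body)
message = message_from_bytes(raw)

parts = message.get_payload()
names = [p.get_param("name", header="content-disposition") for p in parts]
filenames = [p.get_filename() for p in parts]
print(message.is_multipart(), names, filenames)
```

If the parser reports a single non-multipart message, or the field names come back wrong, the body or the boundary in the Content-Type header is probably malformed.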

What a coincidence: 2 years and 6 months ago I created the project https://pypi.python.org/pypi/MultipartPostHandler2, which fixes MultipartPostHandler for utf-8 systems. I have also made some minor improvements. You are welcome to test it :)

To answer the OP's question of why the original code didn't work: the handler passed in was not an instance of a class. The line

# MIME encode the POST payload
opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler)

should read

opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler())
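Either way, it may be worth confirming that the handler really ended up in the opener. Here is a minimal sketch, assuming Python 3's urllib.request (the successor to urllib2) and a hypothetical stand-in handler class, since MultipartPostHandler itself may not be installed:

```python
import urllib.request


class PassThroughHandler(urllib.request.BaseHandler):
    """Hypothetical stand-in for MultipartPostHandler."""
    handler_order = 100

    def http_request(self, request):
        # A real handler would rewrite the request body and headers here.
        return request


# Pass an instance, as suggested above
opener = urllib.request.build_opener(PassThroughHandler())

# opener.handlers lists every handler the opener will actually use
installed = [type(h).__name__ for h in opener.handlers]
print("PassThroughHandler" in installed)
```

Note that the urllib documentation says build_opener accepts either handler instances or handler classes, so if the handler shows up in opener.handlers and the request is still URL-encoded, the cause may lie elsewhere, for example in what the handler does with the data it is given.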

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow