How to read the headers with pycurl

https://stackoverflow.com/questions/472179

19-08-2019

Question

How can I read the response headers returned by a PyCurl request?

Solution

There are several solutions (by default, the headers are discarded). Here is an example using the HEADERFUNCTION option, which lets you specify a function to handle them.

Other solutions are the WRITEHEADER option (not compatible with WRITEFUNCTION) or setting HEADER to True so that the headers are delivered together with the body.

#!/usr/bin/python

import pycurl

class Storage:
    """Accumulates the chunks passed to a pycurl callback, numbering each one."""

    def __init__(self):
        self.contents = ''
        self.line = 0

    def store(self, buf):
        # pycurl passes bytes to callbacks on Python 3; decode before formatting.
        if isinstance(buf, bytes):
            buf = buf.decode('iso-8859-1')
        self.line = self.line + 1
        self.contents = "%s%i: %s" % (self.contents, self.line, buf)

    def __str__(self):
        return self.contents

retrieved_body = Storage()
retrieved_headers = Storage()
c = pycurl.Curl()
c.setopt(c.URL, 'http://www.demaziere.fr/eve/')
c.setopt(c.WRITEFUNCTION, retrieved_body.store)      # receives the body chunks
c.setopt(c.HEADERFUNCTION, retrieved_headers.store)  # receives one header line per call
c.perform()
c.close()
print(retrieved_headers)
print(retrieved_body)
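The HEADERFUNCTION callback is invoked once per header line, including the status line and the blank line that ends the header block. If you would rather have a dictionary than numbered text, the callback can parse each line as it arrives. The following is only a sketch: `make_header_collector` and `fetch_headers` are hypothetical helper names, not part of pycurl, and it assumes a single response in which no header name repeats (a repeated header would overwrite the earlier value).

```python
from io import BytesIO


def make_header_collector(headers):
    """Return a callback suitable for HEADERFUNCTION that fills `headers` (a dict)."""
    def collect(line):
        # pycurl hands over bytes on Python 3; decode as latin-1, which never fails.
        if isinstance(line, bytes):
            line = line.decode('iso-8859-1')
        # The status line ("HTTP/1.1 200 OK") and the blank terminator have no colon.
        if ':' not in line:
            return
        name, value = line.split(':', 1)
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return collect


def fetch_headers(url):
    """Hypothetical helper: perform a GET and return (headers_dict, body_bytes)."""
    import pycurl  # imported here so the parser above is usable without pycurl
    headers = {}
    body = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.WRITEFUNCTION, body.write)
    c.setopt(c.HEADERFUNCTION, make_header_collector(headers))
    c.perform()
    c.close()
    return headers, body.getvalue()
```

With this in place, `headers, body = fetch_headers('http://www.demaziere.fr/eve/')` would give a dictionary keyed by lowercase header name.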

Other suggestions

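import pycurl
from io import BytesIO

# pycurl passes bytes to callbacks on Python 3, so collect them in a BytesIO.
headers = BytesIO()

c = pycurl.Curl()
c.setopt(c.URL, url)   # url: the address to request, defined by the caller
c.setopt(c.HEADER, 1)  # also include the headers in the body stream
c.setopt(c.NOBODY, 1) # header only, no body
c.setopt(c.HEADERFUNCTION, headers.write)

c.perform()
c.close()

print(headers.getvalue().decode('iso-8859-1'))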

Add any other curl setopts as necessary/desired, such as FOLLOWLOCATION.
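One thing to keep in mind with FOLLOWLOCATION: the header callback is called for every response in the redirect chain, not only the final one, so a plain buffer ends up holding several header blocks back to back. A possible way to keep them apart is to start a new record each time a status line arrives. This is a sketch under that assumption; `make_response_collector` is a hypothetical name, not a pycurl API.

```python
def make_response_collector(responses):
    """Return a HEADERFUNCTION-style callback that appends one record per response."""
    def collect(line):
        # pycurl hands over bytes on Python 3; decode as latin-1, which never fails.
        if isinstance(line, bytes):
            line = line.decode('iso-8859-1')
        line = line.strip()
        if line.startswith('HTTP/'):
            # A status line marks the start of a new response (e.g. after a redirect).
            responses.append({'status': line, 'headers': {}})
        elif ':' in line and responses:
            name, value = line.split(':', 1)
            responses[-1]['headers'][name.strip().lower()] = value.strip()
    return collect
```

To use it, create `responses = []`, pass `make_response_collector(responses)` to `c.setopt(c.HEADERFUNCTION, ...)` alongside `c.setopt(c.FOLLOWLOCATION, 1)`, and read the final response's headers from `responses[-1]['headers']` after `c.perform()`.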

Another alternative is human_curl. Install it with: pip install human_curl

In [1]: import human_curl as hurl

In [2]: r = hurl.get("http://stackoverflow.com")

In [3]: r.headers
Out[3]: 
{'cache-control': 'public, max-age=45',
 'content-length': '198515',
 'content-type': 'text/html; charset=utf-8',
 'date': 'Thu, 01 Sep 2011 11:53:43 GMT',
 'expires': 'Thu, 01 Sep 2011 11:54:28 GMT',
 'last-modified': 'Thu, 01 Sep 2011 11:53:28 GMT',
 'vary': '*'}

This may or may not be an alternative for you:

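import urllib.request
# On Python 3, urlopen lives in urllib.request; items() yields (name, value) pairs.
headers = urllib.request.urlopen('http://www.pythonchallenge.com').headers.items()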

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow