HTTPS log in with urllib2

https://stackoverflow.com/questions/1045886

20-08-2019

Question

I currently have a small script that downloads a web page and extracts some data I'm interested in.

At the moment I download the page like this:

import commands
command = 'wget --output-document=- --quiet --http-user=USER --http-password=PASSWORD https://www.example.ca/page.aspx'
status, text = commands.getstatusoutput(command)

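This works perfectly, but I thought it would make sense to remove the dependency on wget. I assumed converting the above to urllib2 would be trivial, but so far I've had no success. The Internet is full of urllib2 examples, but I haven't found one that matches my need: simple username and password HTTP authentication against an HTTPS server.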

Solution

The requests module provides a modern API for HTTP/HTTPS.

import requests

url = 'https://www.someserver.com/toplevelurl/somepage.htm'

res = requests.get(url, auth=('USER', 'PASSWORD'))

status = res.status_code
text   = res.text
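
For anyone who wants to stay on the standard library: the `auth=('USER', 'PASSWORD')` tuple above is shorthand for HTTP Basic authentication, where the credentials travel in an `Authorization: Basic ...` header. Below is a minimal sketch of building that header by hand with Python 3's `urllib.request` (the successor to urllib2). The helper names `basic_auth_header` and `build_request` are my own, not from the question or from any library.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header."""
    # Basic auth is just base64("username:password"); it is not encryption,
    # so it should only be sent over HTTPS.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def build_request(url, username, password):
    """Return a Request that sends the credentials with the first request."""
    # Hypothetical helper: attaches the header up front instead of waiting
    # for the server to answer with a 401 challenge.
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

Passing the result of `build_request(...)` to `urllib.request.urlopen` should then fetch the page without any external dependency. This can also help with servers that never send a proper 401 challenge.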

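Other tips

According to this, it should be straightforward as long as your local Python has SSL support.

If you only use HTTP Basic authentication, you need to set up a different handler, as described here.

Quoting the example from there: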

import urllib2

theurl = 'http://www.someserver.com/toplevelurl/somepage.htm'
username = 'johnny'
password = 'XXXXXX'
# a great password

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
# this creates a password manager
passman.add_password(None, theurl, username, password)
# because we have put None at the start it will always
# use this username/password combination for  urls
# for which `theurl` is a super-url

authhandler = urllib2.HTTPBasicAuthHandler(passman)
# create the AuthHandler

opener = urllib2.build_opener(authhandler)

urllib2.install_opener(opener)
# All calls to urllib2.urlopen will now use our handler
# Make sure not to include the protocol in with the URL, or
# HTTPPasswordMgrWithDefaultRealm will be very confused.
# You must (of course) use it when fetching the page though.

pagehandle = urllib2.urlopen(theurl)
# authentication is now handled automatically for us
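
The quoted example is Python 2. On Python 3 the same classes live in `urllib.request`, so a rough port might look like the sketch below. The function names `build_basic_auth_opener` and `fetch` and the 30-second timeout are my own choices, not part of the quoted example. It returns an opener instead of installing one globally, which keeps the credentials scoped to the calls that need them.

```python
import urllib.request


def build_basic_auth_opener(top_level_url, username, password):
    """Return an opener that answers HTTP Basic challenges under top_level_url."""
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials for any realm under top_level_url
    passman.add_password(None, top_level_url, username, password)
    authhandler = urllib.request.HTTPBasicAuthHandler(passman)
    # build_opener also adds the default handlers, including the HTTPS
    # handler when the interpreter was built with SSL support
    return urllib.request.build_opener(authhandler)


def fetch(url, username, password, timeout=30):
    """Fetch url and return (status, text), like the wget call in the question."""
    opener = build_basic_auth_opener(url, username, password)
    # Raises urllib.error.HTTPError if the server rejects the credentials
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.status, response.read().decode(charset)
```

With that in place, `status, text = fetch('https://www.example.ca/page.aspx', 'USER', 'PASSWORD')` would stand in for the `commands.getstatusoutput` call in the question.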

If you use Digest authentication, you'll have to set some additional headers, but they are the same whether or not SSL is used. Google for python+urllib2+http+digest.
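
As far as I can tell, you rarely need to build the Digest headers yourself: the standard library ships an `HTTPDigestAuthHandler` that computes them from the server's challenge. Here is a hedged Python 3 sketch of an opener that answers either a Basic or a Digest challenge from one shared password manager. The function name `build_auth_opener` is hypothetical.

```python
import urllib.request


def build_auth_opener(top_level_url, username, password):
    """Return an opener that can answer Basic or Digest challenges."""
    passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, top_level_url, username, password)
    # Both handlers share one password manager; whichever scheme the
    # server asks for in its 401 response is the one that replies.
    basic = urllib.request.HTTPBasicAuthHandler(passman)
    digest = urllib.request.HTTPDigestAuthHandler(passman)
    return urllib.request.build_opener(basic, digest)
```

The opener is used the same way over HTTP or HTTPS, for example `build_auth_opener(url, user, password).open(url)`.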

Cheers,

The urllib2 documentation has an example of working with Basic authentication:

http://docs.python.org/library/urllib2.html#examples

License: CC-BY-SA with attribution

Not affiliated with StackOverflow