I'm downloading a file with Python urllib2. How do I check how large the file is?

https://stackoverflow.com/questions/1636637

06-07-2019

Question

And if it is large, how do I stop the download? I don't want to download files larger than 12MB.

request = urllib2.Request(ep_url)
request.add_header('User-Agent',random.choice(agents))
thefile = urllib2.urlopen(request).read()

Solution

There's no need to drop down to httplib as bobince did. You can do all of this with urllib2 directly:

>>> import urllib2
>>> f = urllib2.urlopen("http://dalkescientific.com")
>>> f.headers.items()
[('content-length', '7535'), ('accept-ranges', 'bytes'), ('server', 'Apache/2.2.14'),
 ('last-modified', 'Sun, 09 Mar 2008 00:27:43 GMT'), ('connection', 'close'),
 ('etag', '"19fa87-1d6f-447f627da7dc0"'), ('date', 'Wed, 28 Oct 2009 19:59:10 GMT'),
 ('content-type', 'text/html')]
>>> f.headers["Content-Length"]
'7535'
>>>

If you use httplib, you may have to implement redirect handling, proxy support, and the other nice things that urllib2 does for you.
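As a rough illustration of how the header check above could feed a size decision, here is a minimal sketch written for Python 3, where urllib2 became urllib.request. The helper names declared_length and exceeds_limit are hypothetical, and MAX_BYTES is simply the 12MB limit from the question. The helpers accept any mapping with a get method, which is an assumption that holds for the headers object of a urlopen response.

```python
# Python 3 sketch; the helper names below are illustrative, not a library API.
MAX_BYTES = 12 * 1024 * 1024  # the 12MB limit from the question


def declared_length(headers):
    """Return Content-Length as an int, or None if absent or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def exceeds_limit(headers, max_bytes=MAX_BYTES):
    """True only when the server declares a length above max_bytes."""
    length = declared_length(headers)
    return length is not None and length > max_bytes
```

With a live response f from urllib.request.urlopen, the check would be exceeds_limit(f.headers) before calling f.read(). A missing header is treated as "unknown", not "small", so a capped read is still needed afterwards.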

Other tips

You could say:

maxlength= 12*1024*1024
thefile= urllib2.urlopen(request).read(maxlength+1)
if len(thefile)==maxlength+1:
    raise ThrowToysOutOfPramException()

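But then of course you've still read 12MB of unwanted data. To minimise the risk of this happening, you can check the HTTP Content-Length header, if it is present (it might not be). But to do that you need to drop down to httplib instead of the more general urllib.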

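import httplib
import urlparse

# ua is the User-Agent string, e.g. random.choice(agents) from the question
u= urlparse.urlparse(ep_url)
cn= httplib.HTTPConnection(u.netloc)
cn.request('GET', u.path, headers= {'User-Agent': ua})
r= cn.getresponse()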

try:
    l= int(r.getheader('Content-Length', '0'))
except ValueError:
    l= 0
if l>maxlength:
    raise IAmCrossException()

thefile= r.read(maxlength+1)
if len(thefile)==maxlength+1:
    raise IAmStillCrossException()
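The capped read at the end of the snippet above can also be done in chunks, which lets the caller give up as soon as the limit is crossed. The following is a minimal Python 3 sketch under stated assumptions: read_capped and TooLargeError are hypothetical names, the 64KB chunk size is arbitrary, and the stream can be any object with a read(n) method, such as an HTTP response.

```python
import io


class TooLargeError(Exception):
    """Hypothetical exception raised when the body exceeds the limit."""


def read_capped(stream, max_bytes, chunk_size=64 * 1024):
    """Read stream in chunks; raise TooLargeError once max_bytes is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        total += len(chunk)
        if total > max_bytes:
            raise TooLargeError("body exceeds %d bytes" % max_bytes)
        chunks.append(chunk)
    return b"".join(chunks)


# Usage with an in-memory stream standing in for a response object.
data = read_capped(io.BytesIO(b"hello"), max_bytes=10)
```

At most one chunk beyond the limit is read before the exception is raised, so this works even when the server sends no Content-Length header.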

If you like, you can also check the length before you go ahead and request the file. This is basically the same as above, except that it uses the method 'HEAD' instead of 'GET'.
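For readers on Python 3, a HEAD request of this kind can be built with urllib.request, whose Request class accepts a method argument. This is only a sketch: build_head_request is a hypothetical helper, the user agent string is a placeholder, and the request is constructed here without being sent.

```python
import urllib.request


def build_head_request(url, user_agent):
    """Build (but do not send) a HEAD request for url."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent},
        method="HEAD",
    )


req = build_head_request("http://example.com/file.zip", "example-agent/1.0")
# Sending it would look like:
#   urllib.request.urlopen(req).headers.get("Content-Length")
```

Because the server returns only headers for a HEAD request, no body is transferred. The Content-Length header may still be absent, so the result of the lookup can be None.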

You can check the Content-Length in a HEAD request first, but be warned: this header does not have to be set. See "How do you send a HEAD HTTP request in Python 2?"

This works if the Content-Length header is set:

import urllib2
req = urllib2.urlopen("http://example.com/file.zip")
total_size = int(req.info().getheader('Content-Length'))

License: CC-BY-SA with attribution

Not affiliated with StackOverflow