Getting wget -c functionality with Python's urllib when downloading files

https://stackoverflow.com/questions/2021519

19-09-2019

Question

I'm writing software in Python that downloads PDFs over HTTP from a database. Sometimes the download stops with this message:

retrieval incomplete: got only 3617232 out of 10689634 bytes

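How can I ask the download to restart from where it stopped, using the 206 Partial Content HTTP feature?

I can do this with wget -c and it works very well, but I'd like to implement it directly in my Python software.

Any ideas?

Thank you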

Solution

You can request a partial download by sending a GET with the Range header:

import urllib2
req = urllib2.Request('http://www.python.org/')
#
# Here we request that bytes 18000--19000 be downloaded.
# The range is inclusive, and starts at 0.
#
req.headers['Range'] = 'bytes=%s-%s' % (18000, 19000)
f = urllib2.urlopen(req)
# This shows you the *actual* bytes that have been downloaded.
# (Named content_range so the built-in range() is not shadowed.)
content_range = f.headers.get('Content-Range')
print(content_range)
# bytes 18000-18030/18031
print(repr(f.read()))
# '  </div>\n</body>\n</html>\n\n\n\n\n\n\n'

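Be sure to check Content-Range to learn which bytes were actually downloaded, because your range may be out of bounds, and/or not all servers seem to respect the Range header.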
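
To get closer to what wget -c does, the same idea can be applied to a partial file already on disk: ask for everything from the current file size onward and append only if the server really answers with 206 and a matching Content-Range. The following is a rough sketch, not a definitive implementation. It assumes Python 3, where urllib.request takes the place of urllib2. The names resume_download and parse_content_range and the urlopen parameter are made up for this illustration.

```python
import os
import re
import urllib.request


def parse_content_range(value):
    """Parse 'bytes START-END/TOTAL' into (start, end, total).

    Returns None if the header is missing or malformed; total is None
    when the server reports it as '*'.
    """
    match = re.match(r"bytes (\d+)-(\d+)/(\d+|\*)$", value or "")
    if not match:
        return None
    start, end, total = match.groups()
    return int(start), int(end), None if total == "*" else int(total)


def resume_download(url, dest, urlopen=urllib.request.urlopen,
                    chunk_size=64 * 1024):
    """Download url to dest, continuing from a partial file if one exists.

    Hypothetical helper: urlopen is injectable only so the function can
    be exercised without a network connection.
    """
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url)
    if offset:
        # Open-ended range: from the first missing byte to the end.
        req.add_header("Range", "bytes=%d-" % offset)

    with urlopen(req) as resp:
        if resp.status == 206:
            # Partial content: only append if it starts where we stopped.
            parsed = parse_content_range(resp.headers.get("Content-Range"))
            if parsed is None or parsed[0] != offset:
                raise IOError("unexpected Content-Range: %r" % (parsed,))
            mode = "ab"
        else:
            # The server ignored Range and sent the whole body: start over.
            mode = "wb"
        with open(dest, mode) as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return os.path.getsize(dest)
```

It could be called as resume_download(url, 'file.pdf') in a retry loop until the returned size matches the expected length. One case is left out on purpose: if the file on disk is already complete, a server will typically answer 416 Range Not Satisfiable, which urlopen raises as an HTTPError.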

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow