Question

I use the Scrapy framework to crawl data. My crawler gets interrupted when it encounters a 500 error, so I need to check whether a link is available before parsing the page content.
Is there any approach to resolve this problem?
Thank you so much.


Solution

You can check the HTTP status code of a URL with the getcode() method of the response object that urllib returns:

import urllib
import sys

# note: urllib.urlopen() is Python 2; it returns the response
# object even when the server answers with an HTTP error code
webFile = urllib.urlopen('http://www.some.url/some/file')
returnCode = webFile.getcode()  # getcode(), not getCode()

if returnCode == 500:
    sys.exit()

# otherwise, go on and parse the content.
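
Note that urllib.urlopen() only exists in Python 2. In Python 3 the equivalent call is urllib.request.urlopen(), and it raises an HTTPError for status codes such as 500 instead of returning the response. A minimal sketch for Python 3 (the URL is a placeholder):

import sys
import urllib.request
from urllib.error import HTTPError, URLError

try:
    response = urllib.request.urlopen('http://www.some.url/some/file')
except HTTPError as e:
    # 4xx/5xx responses raise HTTPError; e.code holds the status
    if e.code == 500:
        sys.exit()
except URLError:
    # the server could not be reached at all
    sys.exit()
else:
    # 2xx response: safe to read and parse the body
    content = response.read()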
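
Since the question mentions Scrapy specifically, an alternative is to let Scrapy deliver the 500 response to your callback and skip it there, rather than pre-checking every URL with a separate request. A minimal sketch, assuming a hypothetical spider (the class name, spider name, and start URL are placeholders); handle_httpstatus_list is Scrapy's attribute for letting the listed non-2xx statuses reach the callback:

import scrapy

class MySpider(scrapy.Spider):
    name = 'my_spider'  # hypothetical spider name
    start_urls = ['http://www.some.url/some/file']  # placeholder URL

    # let 500 responses reach parse() instead of being filtered out
    handle_httpstatus_list = [500]

    def parse(self, response):
        if response.status == 500:
            # skip pages that returned a server error
            return
        # otherwise parse the page content here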