Question

I use the Scrapy framework to crawl data. My crawler is interrupted when it encounters a 500 error, so I need to check that a link is available before I parse the page content.
Is there any way to solve this problem?
Thank you.


Solution

You can check the HTTP status code with the getcode() method of the file-like object that urllib.urlopen() returns (Python 2):

import urllib
import sys

webFile = urllib.urlopen('http://www.some.url/some/file')
returnCode = webFile.getcode()  # note: the method is getcode(), not getCode()

if returnCode == 500:
    sys.exit()

# otherwise, continue processing the response
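On Python 3, urllib.request.urlopen() raises HTTPError for 4xx/5xx responses instead of returning normally, so the check needs a try/except. Here is a minimal sketch; the helper name is_available is my own, not part of any library:

```python
import urllib.request
from urllib.error import HTTPError, URLError

def is_available(url):
    """Return True if the URL responds without a server error (status < 500)."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.getcode() < 500
    except HTTPError as e:
        # urlopen raises HTTPError for 4xx/5xx responses in Python 3
        return e.code < 500
    except URLError:
        # network-level failure (DNS error, connection refused, ...)
        return False
```

Also note that within Scrapy itself, non-2xx responses are normally filtered out by HttpErrorMiddleware before they reach your callback; if you want to handle 500 responses in the spider instead of pre-checking the URL, you can set `handle_httpstatus_list = [500]` on the spider class.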
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow