Question

I use the Scrapy framework to crawl data. My crawler gets interrupted when it encounters a 500 error, so I need to check that a link is reachable before I parse the page content.
Is there any approach that would solve this problem?
Thank you.


Solution

You can check the HTTP status code with the getcode() method of the response object returned by urllib.urlopen():

# Python 2: urllib.urlopen() returns a file-like object
# whose getcode() method gives the HTTP status code.
import urllib
import sys

webFile = urllib.urlopen('http://www.some.url/some/file')
returnCode = webFile.getcode()

if returnCode == 500:
    sys.exit()

# otherwise, continue processing the page
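On Python 3 the same check needs urllib.request, and there is a wrinkle: urlopen() raises urllib.error.HTTPError for 4xx/5xx responses, so the status must be read from the exception rather than the response. A minimal sketch (the helper name and URL are placeholders, not from the original answer):

```python
# Python 3 sketch: urlopen() raises HTTPError for error
# status codes, so both paths must be handled.
import urllib.request
import urllib.error


def url_is_ok(url):
    """Return True if the URL answers with a non-5xx status."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.getcode() < 500
    except urllib.error.HTTPError as e:
        # e.code holds the HTTP status of the error response
        return e.code < 500
    except urllib.error.URLError:
        # DNS failure, refused connection, etc.
        return False
```

Within Scrapy itself there is also a built-in alternative that avoids the extra request: list the status codes you want delivered to your callback in the spider's handle_httpstatus_list attribute (e.g. [500]), and Scrapy will pass those responses to parse() instead of dropping them.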
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow