Question

I want to know how many of my requests, and which ones, didn't return any data with Scrapy. There are several cases: a 404 response is returned, or the server returned something but no data was scraped because the format was not the one expected (e.g. when I'm extracting from a particular named div, and it doesn't actually exist in one of the URLs). Thanks!


Solution

"e.g. when I'm extracting from a particular named div, and it doesn't actually exist in one of the URLs"

This is not actually an empty response in HTTP terms. Your selector just didn't match.

You have to implement this logic yourself:

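from scrapy.selector import Selector

# Spider callback: runs once per downloaded response.
def parse(self, response):
    sel = Selector(response)
    data = sel.xpath('//div[@class="class"]').extract()
    if not data:
        # Nothing matched: increment failure stats here, then skip this page.
        return
    # Otherwise, fill the item from `data`.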

For the counts, you could use Scrapy's stats collector, which a spider can reach as `self.crawler.stats`.

Licensed under: CC-BY-SA with attribution