Question

I want to know how many of my Scrapy requests didn't return any data, and which ones. There are several cases: a 404 response is returned, or the server returns something but no data is scraped because the format is not the expected one (e.g. I'm extracting from a particular named div, and it doesn't actually exist at one of the URLs). Thanks!

Solution

"e.g. when I'm extracting from a particular named div, and it doesn't actually exist at one of the URLs"

This is not actually an empty response in HTTP terms. Your selector just didn't match.

You have to implement this check yourself, in the callback:

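def parse(self, response):
    # response.xpath() is a shortcut for response.selector.xpath()
    data = response.xpath('//div[@class="class"]').extract()
    if not data:
        # the selector matched nothing: increment failure stats here
        return
    # else fill the item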

For the stats, you could use Scrapy's stats collector, which is available in a spider as self.crawler.stats.

Licensed under: CC-BY-SA with attribution