Question

The crawler had been working fine, but it suddenly stopped working properly. It still follows the pages but no longer pulls any items from them.

And here is the crawler:

from scrapy.item import Item, Field
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector

class MobiItem(Item):
    brand = Field()
    title = Field()
    price = Field()

class MobiSpider(CrawlSpider):
    name = "mobi2"
    allowed_domains = ["mobi.ge"]
    start_urls = [
        "http://mobi.ge/?page=products&category=60"
    ]

    rules = (Rule (SgmlLinkExtractor(allow=("\?page=products&category=60&m_page=\d*", ))
            , callback="parse_items", follow=True),)

    def parse_items(self, response):
        sel = Selector(response)
        blocks = sel.xpath('//table[@class="m_product_previews"]/tr/td/a')
        for block in blocks:
            item = MobiItem()
            try:
                item["brand"] = block.xpath(".//div[@class='m_product_title_div']/span/text()").extract()[0].strip()
                item["model"] = block.xpath(".//div[@class='m_product_title_div']/span/following-sibling::text()").extract()[0].strip()
                item["price"] = block.xpath(".//div[@id='m_product_price_div']/text()").extract()[0].strip()
                yield item
            except:
                continue

Examining the XPath expressions didn't reveal anything suspicious. Any help would be appreciated.

Solution

Analyze the logs, and instead of silently swallowing every exception like this:

        try:
            item["brand"] = block.xpath(".//div[@class='m_product_title_div']/span/text()").extract()[0].strip()
            item["model"] = block.xpath(".//div[@class='m_product_title_div']/span/following-sibling::text()").extract()[0].strip()
            item["price"] = block.xpath(".//div[@id='m_product_price_div']/text()").extract()[0].strip()
            yield item
        except:
            continue

do this, so that any exception is written to the log:

        try:
            item["brand"] = block.xpath(".//div[@class='m_product_title_div']/span/text()").extract()[0].strip()
            item["model"] = block.xpath(".//div[@class='m_product_title_div']/span/following-sibling::text()").extract()[0].strip()
            item["price"] = block.xpath(".//div[@id='m_product_price_div']/text()").extract()[0].strip()
            yield item
        except Exception as exc:
            self.log('item filling exception: %s' % exc)
            continue
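
The same pattern can be tried outside Scrapy. What follows is only a minimal sketch using the standard logging module; fill_item, raw, allowed_fields and the logger name are hypothetical names made up for illustration and are not part of the spider above. It catches only the exceptions it expects and logs their type and message instead of discarding them.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("item_filling")


def fill_item(raw, allowed_fields):
    """Build a dict from raw, which maps a field name to a list of strings.

    The lists stand in for what extract() returns; allowed_fields stands in
    for the fields an item class declares. Both are assumptions of this sketch.
    """
    item = {}
    try:
        for name, values in raw.items():
            if name not in allowed_fields:
                # Mimics an item class rejecting an undeclared field.
                raise KeyError("unsupported field: %s" % name)
            # Raises IndexError when nothing was extracted.
            item[name] = values[0].strip()
    except (IndexError, KeyError) as exc:
        # Log the exception type and message instead of hiding them.
        logger.warning("item filling exception: %s: %s", type(exc).__name__, exc)
        return None
    return item
```

Catching a narrow tuple of exceptions rather than using a bare except keeps unrelated errors visible, and the logged type makes it clear which step failed.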

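I think you might be getting IndexError exceptions: extract()[0] raises one whenever an XPath matches nothing. Note also that the code assigns item["model"] while MobiItem declares only brand, title and price. Scrapy items raise KeyError for undeclared fields, and the bare except would hide that as well.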
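
If an empty match does turn out to be the cause, one way to avoid the IndexError is to stop indexing the extracted list directly. The helper below is a sketch under that assumption; first_text is a hypothetical name, and it works on any list of strings such as the one extract() returns.

```python
def first_text(values, default=None):
    """Return the first string in values, stripped, or default if empty.

    values is assumed to be a list of strings, like the result of extract().
    """
    if not values:
        # Nothing matched: fall back instead of raising IndexError.
        return default
    return values[0].strip()
```

With such a helper a line like item["brand"] = first_text(block.xpath(...).extract()) yields None for a missing field instead of aborting the whole item, so it is still worth logging or checking for None afterwards.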

License: CC-BY-SA with attribution
Not affiliated with StackOverflow