Problem

I am trying to scrape Craigslist classifieds using Scrapy to extract items that are for sale.

I am able to extract the date, post title, and post URL, but am having trouble extracting the price.

For some reason the current code puts every price on the page into each item, but when I remove the // before the price span lookup, the price field comes back empty.

Can someone please review the code below and help me out?

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from craigslist_sample.items import CraigslistSampleItem

    class MySpider(BaseSpider):
        name = "craig"
        allowed_domains = ["craigslist.org"]
        start_urls = ["http://longisland.craigslist.org/search/sss?sort=date&query=raptor%20660&srchType=T"]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            titles = hxs.select("//p")
            items = []
            for titles in titles:
                item = CraigslistSampleItem()
                item['date'] = titles.select('span[@class="itemdate"]/text()').extract()
                item['title'] = titles.select("a/text()").extract()
                item['link'] = titles.select("a/@href").extract()
                item['price'] = titles.select('//span[@class="itempp"]/text()').extract()
                items.append(item)
            return items

Solution

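itempp appears to be nested inside another element, itempnr. A leading // searches the whole document rather than the current <p>, which is why every item ends up with all of the prices; without it, span[@class="itempp"] only matches direct children of the <p>, so nothing is found. Perhaps it would work if you were to change //span[@class="itempp"]/text() to span[@class="itempnr"]/span[@class="itempp"]/text(). A relative descendant search, .//span[@class="itempp"]/text(), should also work.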
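
To illustrate the difference between a document-wide search and searches relative to each row, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors, so it runs without Scrapy installed. The markup in SAMPLE is hypothetical and only assumes the nesting described above (itempp inside itempnr); the real page may differ. ElementTree does not accept an absolute // path on an element, so the document-wide case is run from the root instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: assumes span.itempp is nested inside
# span.itempnr within each <p> row, as the answer suggests.
SAMPLE = """
<div>
  <p><span class="itemdate">Jan 1</span><a href="/a.html">Raptor 660</a>
     <span class="itempnr"><span class="itempp">$3000</span></span></p>
  <p><span class="itemdate">Jan 2</span><a href="/b.html">Raptor 660R</a>
     <span class="itempnr"><span class="itempp">$4500</span></span></p>
</div>
"""

root = ET.fromstring(SAMPLE)
rows = root.findall("p")


def texts(nodes):
    """Return the text of each matched element."""
    return [node.text for node in nodes]


# Document-wide search: every price on the page, regardless of row.
all_prices = texts(root.findall(".//span[@class='itempp']"))

# Direct-child search from each row: itempp is not a direct child of <p>,
# so each row yields an empty list.
direct = [texts(row.findall("span[@class='itempp']")) for row in rows]

# Explicit nested path, relative to each row.
nested = [
    texts(row.findall("span[@class='itempnr']/span[@class='itempp']"))
    for row in rows
]

# Descendant search, still relative to each row.
descendant = [texts(row.findall(".//span[@class='itempp']")) for row in rows]

print(all_prices)
print(direct)
print(nested)
print(descendant)
```

Under these assumptions, the last two searches give one price per row, while the direct-child search gives none. In the spider, the matching expressions would be the ones from the answer, ending in /text().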

License: CC-BY-SA with attribution