Question

I want to follow the pagination links on this web page, so I wrote the code below.

pageNav.py:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request

class pageNaviSpider(Spider):
    name = 'navi'
    start_urls = ['https://itunes.apple.com/us/genre/ios-books/id6018?mt=8&letter=A&page=1#page']

    def parse(self, response):
        print 'response from: ', response.url
        self.parseLink(response)

    def parseLink(self, response):
        print 'response from: ', response.url
        sel = Selector(response)

        for url in sel.xpath("//a[@class='paginate-more']/@href").extract():
            yield Request(url, callback=self.parseLink) 

The Python code above does not work. However, the second spider below works well, and I do not know why. Any suggestions?

pageNav2.py:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request

class pageNaviSpider(Spider):
    name = 'navi2'
    start_urls = ['https://itunes.apple.com/us/genre/ios-books/id6018?mt=8&letter=A&page=1#page']

    def parse(self, response):
        print 'response from: ', response.url
        sel = Selector(response)

        for url in sel.xpath("//a[@class='paginate-more']/@href").extract():
            yield Request(url, callback=self.parseLink)

Solution

You should change:

def parse(self, response):
    print 'response from: ', response.url
    self.parseLink(response)

to this:

def parse(self, response):
    print 'response from: ', response.url
    for item in self.parseLink(response):
        yield item

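This is needed because parseLink contains a yield, which makes it a generator function: calling self.parseLink(response) only creates a generator object, and its body does not run until something iterates over it. The original parse discards that generator and, having no return or yield statement of its own, returns None, so Scrapy never receives any Request objects.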
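
The same behaviour can be reproduced without Scrapy. The following is a minimal sketch in plain Python; the names parse_link, parse_broken and parse_fixed and the example.com URL are hypothetical stand-ins for the spider callbacks, not part of the original code.

```python
# Hypothetical stand-ins for the spider callbacks; no Scrapy required.
calls = []  # records every time the body of parse_link actually runs


def parse_link(url):
    # Generator function: the body runs only when the result is iterated.
    calls.append(url)
    yield url + "&page=2"


def parse_broken(url):
    # Creates a generator object and throws it away; the body never runs.
    parse_link(url)


def parse_fixed(url):
    # Iterates the generator and passes each item on to the caller.
    for item in parse_link(url):
        yield item


broken_result = parse_broken("http://example.com/?letter=A")
calls_after_broken = list(calls)  # snapshot before the fixed version runs
fixed_result = list(parse_fixed("http://example.com/?letter=A"))
```

Here broken_result is None and calls_after_broken is empty, because nothing ever iterated the generator, while fixed_result contains the yielded item. On Python 3.3 or later, the loop in parse_fixed could probably be shortened to "yield from parse_link(url)".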

Licensed under: CC-BY-SA with attribution