I am trying to crawl a website using scrapy and store the scraped data in the fields of an item class

StackOverflow https://stackoverflow.com/questions/21551068

Question

I have a spider file dmoz_spider.py and its contents are:

    from scrapy.spider import Spider
    from scrapy.selector import Selector
    from dmoz.items import DmozItem


    class DmozSpider(Spider):
        name = "dmoz"
        allowed_domains = ["m.timesofindia.com"]
        start_urls = ["http://m.timesofindia.com/india/Congress-BJP-spar-over-Gujarat-govts-Rs-11-per-day-poverty-line/articleshow/29830237.cms"]

        def parse(self, response):
            sel = Selector(response)
            torrent = DmozItem()
            filename = response.url.split("/")[-2] + "1.txt"
            torrent['link'] = response.url
            torrent['title'] = sel.xpath("//h1/text()").extract()
            open(filename, 'wb').write(torrent['link'])

The second file is items.py:

    from scrapy.item import Item, Field


    class DmozItem(Item):
        title = Field()
        link = Field()
        desc = Field()

I am getting the following error on the command line when I run my crawler:

ImportError: No module named dmoz.items

When I removed the import statement from my spider file, it gave me an error saying:

exceptions.NameError: global name 'DmozItem' is not defined

Was it helpful?

The solution

I found the problem in my question and am posting it so that anyone who runs into a similar problem can find the answer.

In my code, where I wrote

 from dmoz.items import DmozItem

it should actually be

 from tutorial.items import DmozItem

or

 from tutorial.items import *

since my project directory (the package name) is tutorial. That was the mistake I was making earlier.
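For context, this is roughly the layout that `scrapy startproject tutorial` generates (shown for illustration); the importable Python package is the inner tutorial/ directory, which is why the import path must start with tutorial, not dmoz:

    tutorial/
        scrapy.cfg
        tutorial/
            __init__.py
            items.py
            pipelines.py
            settings.py
            spiders/
                __init__.py
                dmoz_spider.py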

Other tips

I was writing

item[title] = sel.xpath('a/text()').extract()

instead of

item['title'] = sel.xpath('a/text()').extract()
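To see why the first form fails: Scrapy items are accessed like dictionaries, so the field name must be a string key. The sketch below uses a plain dict as a stand-in for DmozItem (an assumption, to keep it runnable without Scrapy installed); a bare name like `title` is treated as an undefined variable, which is exactly the NameError seen above.

```python
# A plain dict stands in for the Scrapy Item; field access works the same way.
item = {}

item['title'] = 'some headline'   # correct: 'title' is a string key

try:
    item[title] = 'oops'          # wrong: bare name `title` was never defined
except NameError as exc:
    print(exc)                    # -> name 'title' is not defined
```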
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow