I am trying to crawl a website using Scrapy and store the scraped data in the fields of an item class

StackOverflow https://stackoverflow.com/questions/21551068

Question

I have a spider file, dmoz_spider.py, and its contents are:

    from scrapy.spider import Spider
    from scrapy.selector import Selector
    from dmoz.items import DmozItem


    class DmozSpider(Spider):
        name = "dmoz"
        allowed_domains = ["m.timesofindia.com"]
        start_urls = ["http://m.timesofindia.com/india/Congress-BJP-spar-over-Gujarat-govts-Rs-11-per-day-poverty-line/articleshow/29830237.cms"]

        def parse(self, response):
            sel = Selector(response)
            torrent = DmozItem()
            filename = response.url.split("/")[-2] + "1.txt"
            torrent['link'] = response.url
            torrent['title'] = sel.xpath("//h1/text()").extract()
            open(filename, 'wb').write(torrent['link'])

The second file is items.py:

    from scrapy.item import Item, Field


    class DmozItem(Item):
        title = Field()
        link = Field()
        desc = Field()

I am getting the following error on the command line when I run my crawler:

ImportError: No module named dmoz.items

When I removed the import statement from my spider file, it gave me a different error:

exceptions.NameError: global name 'DmozItem' is not defined


Solution

I found the problem with my code and am posting the answer so that anyone who runs into a similar problem can find it.

In my spider, where I wrote

    from dmoz.items import DmozItem

it should actually be

    from tutorial.items import DmozItem

or

    from tutorial.items import *

since my project directory (and therefore the package name) is tutorial. That was the mistake I was making earlier.
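The fix above comes down to how Python resolves imports: the first segment of `from tutorial.items import DmozItem` must be a package directory that exists on `sys.path`. Here is a minimal, self-contained sketch (no Scrapy involved; the package is built in a temp directory purely for illustration) showing that `dmoz.items` fails while `tutorial.items` succeeds, because only a `tutorial/` package exists:

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway "tutorial" package containing items.py,
# mimicking the Scrapy project layout described above.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "tutorial")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "items.py"), "w") as f:
    f.write("class DmozItem:\n    pass\n")

sys.path.insert(0, root)

# The wrong import: there is no "dmoz" package on sys.path.
try:
    importlib.import_module("dmoz.items")
    wrong_import_failed = False
except ImportError:
    wrong_import_failed = True

# The correct import: the package directory is named "tutorial".
items = importlib.import_module("tutorial.items")
print(wrong_import_failed, items.DmozItem.__name__)
```

In a real Scrapy project the same rule applies: the name before the first dot must match the project package directory (the one containing `__init__.py` and `items.py`), not the spider's `name` attribute.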

OTHER TIPS

I was writing

    item[title] = sel.xpath('a/text()').extract()

instead of

    item['title'] = sel.xpath('a/text()').extract()
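The difference matters because Scrapy items are dict-like and keyed by strings. A bare `title` is looked up as a Python variable, which is undefined at that point, so Python raises `NameError` before any item access happens. A small sketch (using a plain dict as a stand-in for the dict-like `DmozItem`):

```python
# Stand-in for a dict-like Scrapy item with 'title' and 'link' fields.
item = {"title": None, "link": None}

# Wrong: bare name `title` is an undefined variable, not a key.
try:
    item[title] = "some value"
    raised = False
except NameError:
    raised = True

# Correct: the key is the string 'title'.
item['title'] = "some value"
print(raised, item['title'])
```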
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow