Question

I have been trying to get a simple spider to run with Scrapy, but I keep getting the error:

Could not find spider for domain:stackexchange.com

when I run it with the command scrapy-ctl.py crawl stackexchange.com. The spider is as follows:

from scrapy.spider import BaseSpider
from __future__ import absolute_import


class StackExchangeSpider(BaseSpider):
    domain_name = "stackexchange.com"
    start_urls = [
        "http://www.stackexchange.com/",
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        open(filename, 'wb').write(response.body)

SPIDER = StackExchangeSpider()
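For reference, the parse method above names its output file after the second-to-last "/"-separated piece of the response URL. A minimal stdlib-only sketch of just that step (the helper name filename_for_url is hypothetical, not part of Scrapy) shows what it produces for the start URL:

```python
def filename_for_url(url):
    # Same expression as in the spider's parse(): response.url.split("/")[-2].
    # "http://www.stackexchange.com/" splits into
    # ["http:", "", "www.stackexchange.com", ""], so [-2] is the host name.
    return url.split("/")[-2]


print(filename_for_url("http://www.stackexchange.com/"))  # www.stackexchange.com
```

The trailing slash matters here: without it, "http://www.stackexchange.com" splits into three pieces and [-2] is the empty string, which is not a usable filename.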

Another person posted almost exactly the same problem months ago ("Scrapy spider is not working") but did not say how they fixed it. I have been following the tutorial at http://doc.scrapy.org/intro/tutorial.html exactly, and cannot figure out why it is not working.

When I run this code in Eclipse, I get the error:

Traceback (most recent call last):
  File "D:\Python Documents\dmoz\stackexchange\stackexchange\spiders\stackexchange_spider.py", line 1, in <module>
    from scrapy.spider import BaseSpider
ImportError: No module named scrapy.spider

I cannot figure out why Python is not finding the scrapy.spider module (which provides BaseSpider). Does my spider have to be saved in the Scripts directory?
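One possible explanation (an assumption, not confirmed in this thread) is that Eclipse runs the file with a different interpreter or module search path than the command line where scrapy-ctl.py works. A minimal stdlib-only check, where the helper module_location is hypothetical, prints which interpreter is running and where, if anywhere, it finds Scrapy:

```python
import importlib
import sys


def module_location(name):
    """Return the file `name` would be imported from, or None if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Built-in modules (e.g. sys) have no __file__ attribute.
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    # The interpreter executing this script; under Eclipse/PyDev this is the
    # interpreter configured for the project, not necessarily the one on PATH.
    print("Interpreter: " + sys.executable)
    print("scrapy: " + str(module_location("scrapy")))
```

Running it both from Eclipse and from the command line and comparing the two outputs would show whether they see the same Scrapy installation.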


Solution

Try running python yourproject/spiders/domain.py to see if there are any syntax errors. I don't think you should enable absolute imports, as Scrapy relies on relative imports.
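As posted, the spider puts from __future__ import absolute_import after another import. Python only allows the module docstring, comments, blank lines, and other future statements before a __future__ import, so a file with exactly that order raises SyntaxError when it is compiled. A minimal sketch of checking spider source for syntax errors without importing Scrapy, using the built-in compile() (the helper name syntax_error_in is hypothetical):

```python
def syntax_error_in(source, filename="<spider>"):
    """Compile source without running it; return the SyntaxError text or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return "%s:%s: %s" % (filename, exc.lineno, exc.msg)
    return None


# Import order as posted: the __future__ import is not the first statement.
bad = (
    "from scrapy.spider import BaseSpider\n"
    "from __future__ import absolute_import\n"
)
# Same imports with the __future__ line moved to the top.
good = (
    "from __future__ import absolute_import\n"
    "from scrapy.spider import BaseSpider\n"
)

print(syntax_error_in(bad))   # prints the SyntaxError message
print(syntax_error_in(good))  # prints None (compiling does not import Scrapy)
```

To check a real file, pass its contents and path, e.g. syntax_error_in(open(path).read(), path). Running the file with python, as suggested above, performs the same compile step first and then also executes the imports.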

Licensed under: CC-BY-SA with attribution