Question

I have three different spiders in three different Scrapy projects, called REsale, REbuy and RErent, each with its own pipeline that directs its output to various MySQL tables on my server. They all run fine when called with scrapy crawl. Ultimately, I want a script that can run as a service on my Windows 7 machine and run the spiders at different intervals. At the moment I am stuck at the Scrapy API; I can't even get it to run one of the spiders! Is there somewhere special this script needs to be saved? Right now it is just in my root Python directory. Sale, Buy and Rent are the names of the spiders I would call using scrapy crawl, and sale_spider etc. are the spiders' .py files.

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log
from REsale.spiders.sale_spider import Sale
from REbuy.spiders.buy_spider import Buy
from RErent.spiders.rent_spider import Rent

spider = Buy()
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()

spider = Rent()
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()

spider = Sale()
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()

This is returning the error:

c:\Python27>python real_project.py
  File "real_project.py", line 5, in <module>
    from REsale.spiders.sale_spider import Sale
ImportError: No module named REsale.spiders.sale_spider

I am new to this, so any help is greatly appreciated.


Solution

I suggest you look at scrapyd (http://scrapyd.readthedocs.org/en/latest/), a ready-made Scrapy daemon for deploying and scheduling Scrapy spiders.
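For example, once each project has been deployed to a running scrapyd instance, a crawl can be queued with a POST to scrapyd's schedule.json endpoint. Below is a minimal sketch, assuming scrapyd is listening locally on its default port (6800) and that the projects were deployed under the names from your question; adjust the names to whatever you actually deploy.

# Minimal sketch: queue all three spiders on a local scrapyd instance.
# Assumed: scrapyd is running on its default port (6800) and the
# projects were deployed as "REsale", "REbuy" and "RErent".
import urllib
import urllib2

SCHEDULE_URL = "http://localhost:6800/schedule.json"

def schedule(project, spider):
    # POST the project and spider names to schedule.json; scrapyd
    # queues the crawl and replies with JSON containing a job id.
    data = urllib.urlencode({"project": project, "spider": spider})
    return urllib2.urlopen(SCHEDULE_URL, data).read()

for project, spider in [("REsale", "Sale"), ("REbuy", "Buy"), ("RErent", "Rent")]:
    print schedule(project, spider)

Because each request only queues a job, a script like this can be run from the Windows Task Scheduler at whatever intervals you need; scrapyd takes care of actually running the spiders, so you never have to manage the Twisted reactor yourself.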

Licensed under: CC-BY-SA with attribution