Question
I have a working Scrapy spider and I'm able to run it through a separate script, following the example here. I have also created a wxPython GUI for my script that simply contains a multi-line TextCtrl for users to enter a list of URLs to scrape and a button to submit. Currently the start_urls are hardcoded into my spider. How can I pass the URLs entered in the TextCtrl to the start_urls list in my spider? Thanks in advance for the help!
Solution 2
Just set start_urls on your Spider instance:

spider = FollowAllSpider(domain=domain)
spider.start_urls = ['http://google.com']
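For the wxPython GUI in the question, the TextCtrl's contents first need to be split into a clean list of URLs before being assigned to start_urls. A minimal sketch, assuming one URL per line; the parse_url_lines helper and the widget names in the comment are illustrative, not part of Scrapy or wxPython:

```python
def parse_url_lines(text):
    """Split multi-line TextCtrl input into a clean list of URLs.

    Blank lines and surrounding whitespace are dropped, and a scheme is
    prepended when the user omitted one.
    """
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue
        if not url.startswith(('http://', 'https://')):
            url = 'http://' + url  # assume plain HTTP when no scheme given
        urls.append(url)
    return urls

# In a wx button handler you might then do (hypothetical attribute names):
#   spider.start_urls = parse_url_lines(self.url_textctrl.GetValue())
print(parse_url_lines("google.com\n\nhttps://example.com\n"))
# → ['http://google.com', 'https://example.com']
```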
Other tips
alecxe's answer doesn't work for me. My solution works with Scrapy 1.0.3:
from scrapy.crawler import CrawlerProcess
from tutorial.spiders.some_spider import SomeSpider
process = CrawlerProcess()
# Keyword arguments to crawl() are passed to the spider's constructor,
# which sets them as attributes on the spider instance
process.crawl(SomeSpider, start_urls=["http://www.example.com"])
process.start()  # blocks until the crawl is finished
It might help someone in the future.
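Passing start_urls as a keyword to process.crawl() works because Scrapy's base Spider.__init__ copies its keyword arguments onto the spider instance, overriding any class-level default. A Scrapy-free sketch of that pattern (MiniSpider is an illustrative stand-in, not a real Scrapy class):

```python
class MiniSpider:
    """Illustrative stand-in for how scrapy.Spider treats constructor kwargs."""
    start_urls = []  # class-level default, like hardcoded URLs in a spider

    def __init__(self, **kwargs):
        # Scrapy's Spider.__init__ does essentially this with its kwargs
        self.__dict__.update(kwargs)

spider = MiniSpider(start_urls=["http://www.example.com"])
print(spider.start_urls)  # → ['http://www.example.com']
```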