Question

I am new to Python and Scrapy, so I have some basic doubts (please excuse my ignorance of some fundamentals, which I am willing to learn :D).

Right now I am writing some spiders and running them with scrapy-ctl.py from the command line by typing:

C:\Python26\dmoz>python scrapy-ctl.py crawl spider

But I do not want two separate Python files and a command line to do this. I want to define a spider and make it crawl URLs by writing and running a single Python file. I noticed that scrapy-ctl.py imports a function called 'execute', but I am clueless as to how this function can be used in the file that contains the spider. Could someone explain how to do this, if it is possible? It would greatly reduce the work.

Thanks in advance!!


Solution

"But I do not want two separate python codes and a command line to implement this. I want to somehow define a spider and make it crawl urls by writing and running a single python code."

I'm not sure the effort pays off if you just want to scrape something. You have at least two options:

  • Dig into scrapy/cmdline.py. You'll see that it is a kind of dispatch script that finally hands the work over to the run method of the specified command, here crawl (in scrapy/commands/crawl.py). Look at line 54; I think scrapymanager.start() is what begins the actual crawl, after some setup.

  • A slightly hacky method: use Python's subprocess module to launch the crawl command from your own script, so that your project and its execution live in one file (or project).
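The subprocess option could look roughly like the sketch below. It only wraps the same command the question runs by hand (python scrapy-ctl.py crawl spider). The helper names build_crawl_command and run_crawl, and the project_dir parameter, are made up for illustration and are not part of Scrapy; the sketch also assumes scrapy-ctl.py sits in the project directory you pass in.

```python
import subprocess
import sys


def build_crawl_command(spider_name, script="scrapy-ctl.py"):
    """Return the argument list for: python scrapy-ctl.py crawl <spider_name>.

    Hypothetical helper, not a Scrapy API.
    """
    # sys.executable is the interpreter running this file, so the child
    # process uses the same Python installation.
    return [sys.executable, script, "crawl", spider_name]


def run_crawl(spider_name, project_dir=".", script="scrapy-ctl.py"):
    """Run the crawl in a child process and return its exit code.

    Hypothetical helper; assumes `script` exists inside `project_dir`.
    """
    command = build_crawl_command(spider_name, script)
    # cwd is set to the project directory so the script is found and runs
    # in the same place as it would from the command line.
    return subprocess.call(command, cwd=project_dir)
```

Calling run_crawl("spider", project_dir=r"C:\Python26\dmoz") from the file that defines the spider should then behave like the command-line call in the question; a non-zero return value means the child process reported an error.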

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow