Question

I have a spider that I have written using the Scrapy framework, but I am having trouble getting any pipelines to work. I have the following code in my pipelines.py:

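class FilePipeline(object):

    def __init__(self):
        # Open in text mode, because process_item writes str values.
        self.file = open('items.txt', 'w')

    def process_item(self, item, spider):
        # Write one title per line, then pass the item on to later pipelines.
        line = item['title'] + '\n'
        self.file.write(line)
        return item

    def close_spider(self, spider):
        # Close the file when the spider finishes so buffered lines are flushed.
        self.file.close()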

My CrawlSpider subclass has these lines to activate the pipeline:

ITEM_PIPELINES = [
    'event.pipelines.FilePipeline'
]

However, when I run it using

scrapy crawl my_spider

I get a line that says

2010-11-03 20:24:06+0000 [scrapy] DEBUG: Enabled item pipelines:

with no pipelines (I presume this is where the logging should output them).

I have tried looking through the documentation, but there don't seem to be any full examples of a whole project that would show whether I have missed anything.

Any suggestions on what to try next, or where to look for further documentation?

Solution

Got it! The ITEM_PIPELINES setting needs to go in the settings module for the project (settings.py), not in the spider class. Now it works!
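For illustration, here is a minimal sketch of what the project's settings.py might contain. The package name event comes from the question; the order value 300 is an arbitrary assumption. Note that recent Scrapy releases expect ITEM_PIPELINES to be a dict mapping each pipeline path to an integer order, while the list form shown in the question comes from older releases.

```python
# settings.py (the project's settings module)

# Map each pipeline's dotted path to an integer order.
# Pipelines with lower values run first; 300 is an arbitrary choice here.
ITEM_PIPELINES = {
    'event.pipelines.FilePipeline': 300,
}
```

With the setting in place, the "Enabled item pipelines" log line should list the pipeline when the spider starts.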

Other tips

I'm willing to bet that it's a capitalisation difference in the word "pipeline" somewhere:

Pipeline vs. PipeLine

I notice 'event.pipelines.FilePipeline' uses the former, whereas your code uses the latter: which do your filenames use?

(I have fallen victim to this spelling mistake many times!)
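One way to rule out a capitalisation mismatch is to check whether the dotted path actually resolves to an object. The sketch below uses only the standard library; the helper names resolve_dotted_path and path_resolves are hypothetical and not part of Scrapy.

```python
import importlib


def resolve_dotted_path(path):
    """Import the module part of a dotted path and return the named attribute."""
    module_path, _, name = path.rpartition('.')
    module = importlib.import_module(module_path)
    # Attribute lookup is case-sensitive, so 'PipeLine' will not match 'Pipeline'.
    return getattr(module, name)


def path_resolves(path):
    """Return True if the dotted path names an importable object."""
    try:
        resolve_dotted_path(path)
        return True
    except (ImportError, AttributeError, ValueError):
        # ImportError: the module is missing; AttributeError: the name is missing
        # or differs in case; ValueError: the path has no module part.
        return False
```

Run from the project root, path_resolves('event.pipelines.FilePipeline') would then indicate whether the string in ITEM_PIPELINES matches the class name exactly.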

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow