Question

I am using Scrapy to crawl several sites, and each site has its own Item because different information is extracted from each.

I have a generic pipeline, since most of the information is the same across sites, but I am now crawling Google search results and those items need a different pipeline.

For example:

GenericItem uses GenericPipeline

GoogleItem should use GoogleItemPipeline, but when the spider is crawling it tries to use GenericPipeline instead. How can I specify which pipeline the Google spider must use?


Solution

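For now there is only one way: check the item type inside the pipeline, then either process the item or return it as is. Every enabled pipeline receives every item, so each pipeline has to ignore the item types it is not responsible for.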

pipelines.py:

from grabbers.items import FeedItem

class StoreFeedPost(object):

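    # Argument order as in the Scrapy version this answer was written for;
    # recent releases call process_item(self, item, spider).
    def process_item(self, domain, item):
        if isinstance(item, FeedItem):
            # Process FeedItem instances here...
            pass

        # Return every item, handled or not, so later pipelines still receive it.
        return item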

items.py:

from scrapy.item import ScrapedItem

class FeedItem(ScrapedItem):
    pass
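
Applied to the question, the same type check could look like the sketch below. It is only an illustration and does not import Scrapy: GenericItem and GoogleItem are plain dict subclasses standing in for real item classes, run_pipelines is a hypothetical helper that mimics an item being passed through every enabled pipeline in order, and the handled_by key is invented for the demonstration. The process_item(self, item, spider) argument order is the one used by recent Scrapy releases, which differs from the older order in the answer above.

```python
# Stand-in item classes (assumption: a real project would subclass Scrapy's item class).
class GenericItem(dict):
    pass


class GoogleItem(dict):
    pass


class GenericPipeline:
    def process_item(self, item, spider):
        # Only touch the item type this pipeline is responsible for.
        if isinstance(item, GenericItem):
            item["handled_by"] = "GenericPipeline"
        # Always return the item so the next pipeline receives it.
        return item


class GoogleItemPipeline:
    def process_item(self, item, spider):
        if isinstance(item, GoogleItem):
            item["handled_by"] = "GoogleItemPipeline"
        return item


def run_pipelines(item, pipelines, spider=None):
    """Hypothetical helper: pass the item through each pipeline in order."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


pipelines = [GenericPipeline(), GoogleItemPipeline()]
generic = run_pipelines(GenericItem(title="a"), pipelines)
google = run_pipelines(GoogleItem(query="b"), pipelines)
```

Both pipelines see both items, but each one modifies only its own item type and hands everything else through unchanged.
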
Licensed under: CC-BY-SA with attribution