Question

I'm scraping a page that contains multiple H2 tags, and I need each of those titles stored on a separate row in my CSV file. I'm using Scrapy for this, and my current code is:

    item["title"] = titles.select("//h2/text()").extract()

Obviously, this stores all of the page's h2 titles in one single cell of my CSV.

Is there any way to start a new row after each h2 tag is scraped?

Thanks


Solution

You can loop over the h2 elements and create one Item per h2, setting its "title" field. Each item is then exported as its own row in the CSV:

    items = []
    for title in titles.select("h2"):

        item = MyItem()

        # note the relative XPath expression (starting with "./")
        item["title"] = title.select("./text()").extract()

        items.append(item)

    return items
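
To see the one-row-per-heading idea in isolation, here is a minimal stand-alone sketch. It deliberately does not use Scrapy: it swaps in the standard library's `html.parser` and `csv` modules so it runs anywhere. The names `H2Collector`, `titles_to_csv` and `SAMPLE_HTML` are hypothetical and exist only for this illustration.

```python
import csv
import io
from html.parser import HTMLParser


class H2Collector(HTMLParser):
    """Hypothetical helper: collects the text of every <h2> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_h2 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <h2>.
        if self._in_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self.titles.append("".join(self._buffer).strip())
            self._in_h2 = False


def titles_to_csv(html):
    """Return CSV text with a header row and one row per <h2> title."""
    parser = H2Collector()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["title"])
    for title in parser.titles:
        # One write per heading, mirroring "one Item per h2" above.
        writer.writerow([title])
    return out.getvalue()


# Assumed sample input, not taken from the original page.
SAMPLE_HTML = "<html><body><h2>First</h2><p>x</p><h2>Second</h2></body></html>"

if __name__ == "__main__":
    print(titles_to_csv(SAMPLE_HTML))
```

The point of the sketch is the loop at the end of `titles_to_csv`: writing the whole `titles` list in one call would put every heading on the same row, while writing once per heading gives each title its own row, which is what creating one Item per h2 achieves in the spider.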
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow