Question

I'm trying to use Scrapy's CSVFeedSpider on a CSV link. Here is an example line:

number,"may contain commas","may contains commas","may contain commas",text,text,text,text,text,"may contain commas"

If a value contains commas, it is surrounded by quotes. How can I handle this, given that CSVFeedSpider accepts only one delimiter?

http://doc.scrapy.org/en/latest/topics/spiders.html#csvfeedspider


Solution

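If the fields are enclosed in double quotes, commas inside them are handled correctly, so the single ',' delimiter is enough. If the fields are enclosed in single quotes instead, the embedded commas are treated as delimiters, and Scrapy complains that the row length does not match the headers. CSVFeedSpider also has a quotechar attribute (the double quote by default) for files that use a different enclosure character.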
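The quoting behavior can be checked without running a spider. This is a minimal sketch using Python's standard csv module, on the assumption that Scrapy's CSV parsing treats the delimiter and quote character the same way. The helper name field_counts and the two sample rows are illustrative only.

```python
import csv
import io

# One double-quoted row and one single-quoted row, as in the test file.
SAMPLE = '''1,"John, Doe","1234 Main Street, APT A","2nd Floor",John.Doe@test.com
3,'John3, Doe','1234 Main Street, APT A','2nd Floor',John.Doe@test.com
'''


def field_counts(text, quotechar='"'):
    """Return the number of fields the csv module finds in each row."""
    reader = csv.reader(io.StringIO(text), delimiter=',', quotechar=quotechar)
    return [len(row) for row in reader]


# Default quote character: the double-quoted row has 5 fields,
# while the single-quoted row is split at every comma.
print(field_counts(SAMPLE))
# Single quote as the quote character: the situation is reversed.
print(field_counts(SAMPLE, quotechar="'"))
```

Only one quote character applies per file, so a file that mixes both styles would need to be normalized before it is parsed.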

Here is the spider code:

# -*- coding: utf-8 -*-
# Import paths follow current Scrapy releases; older versions exposed
# CSVFeedSpider under scrapy.contrib.spiders and logging via scrapy.log.
from scrapy.spiders import CSVFeedSpider

from stackoverflow23429315.items import DemoItem


class DmozSpider(CSVFeedSpider):
    name = 'csvFeedTest'
    start_urls = ['file:////home/vagrant/labs/stackoverflow23429315/test.csv']
    delimiter = ','
    # Field enclosure character; the double quote is also the default.
    quotechar = '"'
    headers = ['id', 'name', 'address1', 'address2', 'email']

    def parse_row(self, response, row):
        # row is a dict keyed by the names listed in headers
        self.logger.info('Hi, this is a row!: %r', row)

        item = DemoItem()
        item['id'] = row['id']
        item['name'] = row['name']
        item['address1'] = row['address1']
        item['address2'] = row['address2']
        item['email'] = row['email']
        return item

Item class:

from scrapy.item import Item, Field

class DemoItem(Item):
    id = Field()
    name = Field()
    address1 = Field()
    address2 = Field()
    email = Field()

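Test CSV file (rows 1 and 2 use double quotes and parse correctly; rows 3 and 4 use single quotes and trigger the length mismatch):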

1,"John, Doe","1234 Main Street, APT A","2nd Floor",John.Doe@test.com
2,"John2, Doe","1234 Main Street, APT A","2nd Floor",John.Doe@test.com
3,'John3, Doe','1234 Main Street, APT A','2nd Floor',John.Doe@test.com
4,'John4, Doe','1234 Main Street, APT A','2nd Floor',John.Doe@test.com
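To see which rows of this file would reach parse_row, the header matching can be imitated with the standard csv module. This is only a sketch of the idea, not Scrapy's own code: the helper rows_matching_headers is hypothetical, and it assumes that rows whose field count differs from the header count are rejected.

```python
import csv
import io

HEADERS = ['id', 'name', 'address1', 'address2', 'email']

TEST_CSV = '''1,"John, Doe","1234 Main Street, APT A","2nd Floor",John.Doe@test.com
2,"John2, Doe","1234 Main Street, APT A","2nd Floor",John.Doe@test.com
3,'John3, Doe','1234 Main Street, APT A','2nd Floor',John.Doe@test.com
4,'John4, Doe','1234 Main Street, APT A','2nd Floor',John.Doe@test.com
'''


def rows_matching_headers(text, headers):
    """Split rows into dicts that fit the headers and raw rows that do not."""
    accepted, rejected = [], []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) == len(headers):
            # Same shape as the row dict handed to parse_row
            accepted.append(dict(zip(headers, fields)))
        else:
            rejected.append(fields)
    return accepted, rejected


accepted, rejected = rows_matching_headers(TEST_CSV, HEADERS)
for row in accepted:
    print(row['id'], row['name'])
print('rejected rows:', len(rejected))
```

The double-quoted rows come through with their commas intact, while the single-quoted rows are split into too many fields to match the five headers.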
Licensed under: CC-BY-SA with attribution