Question

Scrapy seems to be pulling the data out correctly, but is formatting the output in my JSON object as if it were an array:

[{"price": ["$34"], "link": ["/product/product..."], "name": ["productname"]},
{"price": ["$37"], "link": ["/product/product"]...

My spider class looks like this:

def parse(self, response):
    sel = Selector(response)
    items = sel.xpath('//div/ul[@class="product"]')
    skateboards = []
    for item in items:
        skateboard = SkateboardItem()
        skateboard['name'] = item.xpath('li[@class="desc"]//text()').extract()
        skateboard['price'] = item.xpath('li[@class="price"]//text()[1]').extract()
        skateboard['link'] = item.xpath('li[@class="image"]').extract()
        skateboards.append(skateboard)
    return skateboards

How would I go about ensuring that Scrapy is only outputting a single value for each key, rather than the array it's currently producing?


Solution

.extract() always returns a list, even when only one node matches. To get a single string, join the list:

''.join(item.xpath('li[@class="desc"]//text()').extract())
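As a rough illustration of the joining step, the sketch below works on a plain Python list shaped like what .extract() returns, so it runs without Scrapy. The helper name join_text and the sample fragments are made up for this example, and it joins with a space after stripping whitespace, which is a variation on the ''.join call above rather than the same thing.

```python
def join_text(fragments):
    """Collapse a list of text fragments into one string.

    `fragments` stands in for the list returned by .extract();
    this helper is hypothetical and not part of Scrapy.
    """
    # Strip surrounding whitespace from every fragment.
    cleaned = [part.strip() for part in fragments]
    # Drop fragments that were whitespace only, then join with single spaces.
    return " ".join(part for part in cleaned if part)


# Hypothetical sample of what //text() might yield for a description node.
extracted = ["\n  ", "productname", " ", "deluxe\n"]
name = join_text(extracted)
```

In the spider, the same helper could wrap the existing call, for example join_text(item.xpath('li[@class="desc"]//text()').extract()), assuming whitespace-only text nodes should be discarded.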

Other tips

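To get the data as a string, use either:
1. .extract_first(), which returns None when nothing matches, or
2. .extract()[0], which raises an IndexError when nothing matches.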

PS: using Scrapy 1.2
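The difference between the two options only shows up when the XPath matches nothing. The sketch below imitates that case with plain lists in place of .extract() results, so it needs no Scrapy install. first_or_default is a hypothetical stand-in written for this example, not Scrapy's own implementation.

```python
def first_or_default(values, default=None):
    """Return the first element of `values`, or `default` if it is empty.

    Hypothetical stand-in for what .extract_first() does with its matches.
    """
    return values[0] if values else default


# Lists shaped like .extract() results: one match, and no match at all.
price_hits = ["$34"]
no_hits = []

price = first_or_default(price_hits)   # first match as a plain string
missing = first_or_default(no_hits)    # falls back to None, no exception

# Indexing the empty list directly, as .extract()[0] would, fails instead.
try:
    no_hits[0]
    raised = False
except IndexError:
    raised = True
```

If some products on the page might lack a price or link, the .extract_first() style is the more forgiving choice. .extract()[0] suits cases where a missing value should stop the spider with an error.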

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow