Question

I have the following issue when trying to get information from a website using Scrapy.

I'm trying to get all the text inside a <p> tag, but in some cases the tag contains not just text but also an <a> tag, and my code stops collecting the text when it reaches that tag.

This is my XPath expression. It works properly when there are no tags nested inside:

description = descriptionpath.xpath("span[@itemprop='description']/p/text()").extract()

Solution

Posting Pawel Miech's comment as an answer, since it has helped many of us so far and contains the right answer:

Tack //text() onto the end of the XPath so that text is extracted recursively, including text inside nested tags such as <a>.

So your XPath would look like this:

span[@itemprop='description']/p//text()
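
To see the difference between the two expressions, here is a minimal sketch. It uses lxml directly (the library Scrapy's selectors are built on) so it runs without a spider. The sample HTML and the variable names are made up for illustration and are not taken from the page in the question.

```python
from lxml import html

# Hypothetical markup: a <p> that contains text and a nested <a> tag.
SAMPLE = (
    "<div>"
    "<span itemprop='description'>"
    "<p>Read the <a href='/docs'>documentation</a> before you start.</p>"
    "</span>"
    "</div>"
)

# For a fragment with a single top-level element, fromstring returns that element.
root = html.fromstring(SAMPLE)

# /text() selects only text nodes that are direct children of <p>,
# so the text inside <a> is missing.
direct_only = root.xpath("span[@itemprop='description']/p/text()")

# //text() selects text nodes at any depth below <p>, in document order.
all_text = root.xpath("span[@itemprop='description']/p//text()")

# Join the pieces to get the paragraph as a single string.
description = "".join(all_text)

print(direct_only)
print(all_text)
print(description)
```

In Scrapy itself the same expression should drop into the original call unchanged, for example descriptionpath.xpath("span[@itemprop='description']/p//text()").extract(), which returns a list of strings that you can join the same way.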
Licensed under: CC-BY-SA with attribution