Question

With Scrapy, the following query works only for plain text. It excludes the text inside the bold (<b>) tags that I am trying to retrieve:

hxs.select('//td[@class="Info_Cell"]/text()').extract()

The following extracts only the bold text and excludes the plain text:

hxs.select('//td[@class="Info_Cell"]/b/text()').extract()

How do I extract all of the text, both plain and bold?

Solution

In XPath, // selects descendant nodes at any depth, not only direct children, so //text() returns the text inside nested tags such as <b> as well. In your case you need:

hxs.select('//td[@class="Info_Cell"]//text()').extract()
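
To see which text nodes each of the three queries picks up, here is a minimal sketch that uses only Python's standard library rather than Scrapy. ElementTree does not support text() in its XPath subset, so the helper functions below (direct_text, bold_text, all_text) are hypothetical stand-ins that mimic /text(), /b/text() and //text(); the sample markup is also an assumption made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample cell; the real page markup is not shown in the question.
SAMPLE = '<table><tr><td class="Info_Cell">Price: <b>42</b> USD</td></tr></table>'


def direct_text(td):
    """Text nodes that are direct children of td, like td/text()."""
    parts = [td.text] + [child.tail for child in td]
    return [p for p in parts if p]


def bold_text(td):
    """Leading text of each <b> child, like td/b/text() for simple <b> tags."""
    return [b.text for b in td.findall("b") if b.text]


def all_text(td):
    """Every descendant text node in document order, like td//text()."""
    return list(td.itertext())


td = ET.fromstring(SAMPLE).find(".//td[@class='Info_Cell']")

print(direct_text(td))        # plain text only, the bold part is missing
print(bold_text(td))          # bold text only
print(all_text(td))           # plain and bold text together
print("".join(all_text(td)))  # the pieces joined into one string
```

With Scrapy itself, the list returned by the //text() query can be joined the same way, for example with "".join(...), to get the cell's full text as a single string.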
Licensed under: CC-BY-SA with attribution