I´m scraping a web with lxml html, but I´m getting a problem. When I make a selection of HTML for example:

 html.cssselect('a.asig')

I must get the elements with class="asig" but the selection also prints the elements that contains "asig" in his id for example:

<a class="asig drcha" ...>

What could I do for get only the elements with "asig" and not the elements that contains asig? Thanks!

有帮助吗?

解决方案

Use either html.xpath and adjust accordingly, or be very implicit when declaring the class to locate. See the following code.

from lxml import html

sample = '<?xml version="1.0" encoding="UTF-8"?><root><a class="asig">I am the correct one.</a><a class="asig drcha">I am the wrong one.</a></root>'
tree = html.fromstring(sample)
print tree.xpath("//a[@class='asig']/text()")[0]
print tree.cssselect("a[class='asig']")[0].text

Result is as follows:

I am the correct one.
I am the correct one.
[Finished in 0.2s]

Notice how cssselect was used in the last line. Hope this helps.

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top