How do I select an element with the exact class using cssselect in lxml?

https://stackoverflow.com/questions/23203287

07-07-2023

Question

I'm scraping a website with lxml.html, but I've run into a problem. When I select elements, for example:

 html.cssselect('a.asig')

I expect to get only the elements with class="asig", but the selection also returns elements whose class attribute merely contains "asig", for example:

<a class="asig drcha" ...>

What can I do to get only the elements whose class is exactly "asig", and not the elements whose class just contains asig? Thanks!

Solution

Either use html's xpath and match the class attribute exactly, or be very explicit in the CSS selector about the class value to locate. See the following code.

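from lxml import html  # cssselect() also requires the separate cssselect package

# XML declaration dropped: lxml can reject str input that declares an encoding.
sample = '<root><a class="asig">I am the correct one.</a><a class="asig drcha">I am the wrong one.</a></root>'
tree = html.fromstring(sample)

# XPath: compare the whole class attribute with the exact string.
print(tree.xpath("//a[@class='asig']/text()")[0])
# CSS: use an attribute selector instead of the .asig class selector.
print(tree.cssselect("a[class='asig']")[0].text)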

Result is as follows:

I am the correct one.
I am the correct one.

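Notice how cssselect was used in the last line: a[class='asig'] compares the whole class attribute with "asig", whereas a.asig matches any element that has asig among its classes. Hope this helps.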

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow