Question

I'm scraping a website with lxml.html, but I've run into a problem. When I select elements, for example:

 html.cssselect('a.asig')

I expect to get only the elements with class="asig", but the selection also returns elements whose class attribute contains "asig" alongside other classes, for example:

<a class="asig drcha" ...>

How can I get only the elements whose class is exactly "asig", and not the elements whose class merely contains "asig"? Thanks!


Solution

Use either html.xpath and adjust the expression accordingly, or be very explicit when declaring the class to locate. See the following code.

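from lxml import html

# No XML declaration here: lxml rejects str input that carries an encoding declaration.
sample = '<root><a class="asig">I am the correct one.</a><a class="asig drcha">I am the wrong one.</a></root>'
tree = html.fromstring(sample)
# Both queries compare the whole class attribute value against 'asig'.
print(tree.xpath("//a[@class='asig']/text()")[0])
print(tree.cssselect("a[class='asig']")[0].text)  # needs the cssselect package installed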

Result is as follows:

I am the correct one.
I am the correct one.

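Notice how cssselect was used in the last line: the attribute selector a[class='asig'] matches only when the whole class attribute equals "asig", whereas a.asig matches any element that has asig among its classes. Hope this helps.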
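To make the difference between the two kinds of match concrete, here is a minimal sketch that uses only XPath, so it does not depend on the cssselect package. The sample markup and the variable names (SAMPLE, token_match, exact_match) are made up for illustration. The first expression is the usual XPath spelling of the CSS class selector a.asig; the second is the exact comparison used in the answer above.

```python
from lxml import html

# Hypothetical sample markup, modelled on the question.
SAMPLE = (
    '<div>'
    '<a class="asig">one</a>'
    '<a class="asig drcha">two</a>'
    '<a class="asignatura">three</a>'
    '</div>'
)

tree = html.fromstring(SAMPLE)

# Token match: equivalent in meaning to the CSS selector a.asig.
# It matches any <a> whose space-separated class list includes "asig".
token_match = tree.xpath(
    "//a[contains(concat(' ', normalize-space(@class), ' '), ' asig ')]"
)

# Exact match: the whole class attribute must be the string "asig".
exact_match = tree.xpath("//a[@class='asig']")

print([a.text for a in token_match])  # ['one', 'two']
print([a.text for a in exact_match])  # ['one']
```

Note that the exact comparison is strict about the raw attribute value, so an element with extra whitespace or a different class order would not match. If that matters for the page being scraped, the token match combined with filtering in Python may be the safer choice.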

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow