Problem

I'm scraping a website with lxml.html, but I'm running into a problem. When I make a selection, for example:

 html.cssselect('a.asig')

I expect to get only the elements with class="asig", but the selection also returns elements whose class attribute contains "asig" alongside other classes, for example:

<a class="asig drcha" ...>

How can I get only the elements whose class is exactly "asig", and not the elements whose class merely contains "asig"? Thanks!


Solution

Either use xpath() with an exact comparison on @class, or be explicit in the CSS selector by matching the whole class attribute instead of using the .asig class shorthand. See the following code.

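from lxml import html

# Pass bytes: lxml raises ValueError for str input that contains an XML encoding declaration.
sample = b'<?xml version="1.0" encoding="UTF-8"?><root><a class="asig">I am the correct one.</a><a class="asig drcha">I am the wrong one.</a></root>'
tree = html.fromstring(sample)

# XPath: the class attribute must be exactly "asig".
print(tree.xpath("//a[@class='asig']/text()")[0])
# CSS attribute selector: also compares the whole class string (requires the cssselect package).
print(tree.cssselect("a[class='asig']")[0].text)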

Result is as follows:

I am the correct one.
I am the correct one.

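Note the selector on the last line: the attribute selector a[class='asig'] matches only when the entire class attribute equals "asig", whereas a.asig matches any element whose class list contains the token "asig". Hope this helps.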
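One caveat: an exact string comparison also rejects markup such as `class=" asig "`, where the only class is "asig" but the attribute has extra whitespace. The sketch below compares three options on hypothetical sample markup: plain equality, XPath `normalize-space()`, and a hypothetical helper `only_class` that keeps elements whose class token list is exactly `["asig"]`. It only uses lxml's `xpath()` and `iter()`, so it does not need cssselect.

```python
from lxml import html

# Hypothetical sample markup, not taken from the original page.
SAMPLE = (
    '<div>'
    '<a class="asig">exact</a>'
    '<a class="asig drcha">two classes</a>'
    '<a class=" asig ">padded</a>'
    '</div>'
)


def only_class(root, tag, cls):
    """Hypothetical helper: elements whose class list is exactly [cls]."""
    return [el for el in root.iter(tag) if (el.get("class") or "").split() == [cls]]


root = html.fromstring(SAMPLE)

# Exact string equality: the padded attribute does not match.
exact_attr = [a.text for a in root.xpath("//a[@class='asig']")]
# normalize-space() trims and collapses whitespace before comparing.
normalized = [a.text for a in root.xpath("//a[normalize-space(@class)='asig']")]
# Token comparison in Python, equivalent to the normalize-space() variant here.
token_only = [a.text for a in only_class(root, "a", "asig")]

print(exact_attr)
print(normalized)
print(token_only)
```

Whether the padded case matters depends on the site being scraped; for markup like the question's, the plain equality in the answer is sufficient.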

License: CC-BY-SA with attribution
Not affiliated with StackOverflow