Question

I'm scraping a website with lxml.html, but I've run into a problem. When I select elements, for example:

 html.cssselect('a.asig')

I expect to get only the elements with class="asig", but the selection also returns elements whose class attribute contains "asig" alongside other classes, for example:

<a class="asig drcha" ...>

How can I get only the elements whose class is exactly "asig", and not the elements whose class merely contains "asig"? Thanks!


Solution

Use either the element's xpath method and adjust the expression accordingly, or be very explicit when declaring the class to match. See the following code.

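from lxml import html

# No XML declaration here: on Python 3, lxml rejects str input that declares an encoding.
sample = '<root><a class="asig">I am the correct one.</a><a class="asig drcha">I am the wrong one.</a></root>'
tree = html.fromstring(sample)
# XPath: match only when the whole class attribute equals 'asig'.
print(tree.xpath("//a[@class='asig']/text()")[0])
# CSS: the attribute selector compares the whole value, unlike 'a.asig'.
print(tree.cssselect("a[class='asig']")[0].text)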

Result is as follows:

I am the correct one.
I am the correct one.

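Notice how cssselect was used in the last line: a[class='asig'] matches only when the whole class attribute equals "asig", whereas a.asig matches any element whose class list includes asig, such as class="asig drcha". Hope this helps.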
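One caveat with comparing the whole attribute value is that it is a plain string comparison, so markup with stray whitespace in the class attribute may be missed. The following is a minimal sketch, not part of the original answer, that instead splits the class attribute into tokens and keeps elements whose only class is the requested one. The sample markup and the helper name only_class are hypothetical and used purely for illustration.

```python
from lxml import html

# Hypothetical sample markup, not taken from the page being scraped.
sample = (
    '<div>'
    '<a class="asig">only asig</a>'
    '<a class="asig drcha">asig plus drcha</a>'
    '<a class=" asig ">asig with stray spaces</a>'
    '</div>'
)
tree = html.fromstring(sample)


def only_class(root, tag, name):
    """Return `tag` elements under `root` whose class list is exactly [name].

    Hypothetical helper: it tokenizes the class attribute on whitespace,
    so surrounding spaces do not affect the comparison.
    """
    return [el for el in root.iter(tag) if el.get("class", "").split() == [name]]


# Whole-value comparison, as in the answer above.
exact_attr = [a.text for a in tree.xpath("//a[@class='asig']")]
# Token-based comparison, tolerant of extra whitespace.
tokenized = [a.text for a in only_class(tree, "a", "asig")]

print(exact_attr)
print(tokenized)
```

Both approaches exclude the "asig drcha" link; the token-based filter additionally keeps the link whose class attribute is padded with spaces. It uses only lxml itself, so it does not need the separate cssselect package.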

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow