In Python BeautifulSoup4, how to extract specific text like this

https://stackoverflow.com/questions/23505303

16-07-2023

Question

I am trying to extract a string from this text:

    text = """<li>(<a rel="nofollow" class="external text" href="http://www.icd9data.com/getICD9Code.ashx?
    icd9=999.1">999.1</a>) <a href="/wiki/Air_embolism" title="Air embolism">Air embolism</a> as
    a complication of medical care not elsewhere classified</li>"""

My target is the text "as a complication of medical care not elsewhere classified", but this code doesn't work:

    import bs4

    soup = bs4.BeautifulSoup(text, "html.parser")
    for tag in soup.find_all('li'):
        print(tag.string)  # prints None: .string is None when a tag has more than one child

Does anybody know a method that returns the string I want? Thanks.

Solution

    for tag in soup.find_all('li'):
        print(tag.get_text())

prints

    (999.1) Air embolism as
    a complication of medical care not elsewhere classified

The get_text method returns all the text inside a tag, including text that belongs to nested tags.
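
Since get_text returns the whole list item rather than only the trailing sentence asked about, one way to isolate that piece with BeautifulSoup alone is to read the text node that follows the last link. This is a minimal sketch, assuming the wanted text always comes directly after the last <a> inside each <li>; the helper name trailing_text is hypothetical and not part of bs4:

```python
from bs4 import BeautifulSoup, NavigableString

text = """<li>(<a rel="nofollow" class="external text" href="http://www.icd9data.com/getICD9Code.ashx?
icd9=999.1">999.1</a>) <a href="/wiki/Air_embolism" title="Air embolism">Air embolism</a> as
a complication of medical care not elsewhere classified</li>"""


def trailing_text(li):
    """Hypothetical helper: text after the last <a> in li, whitespace collapsed."""
    links = li.find_all("a")
    if not links:
        # No links at all: fall back to the full text of the item.
        return li.get_text(" ", strip=True)
    tail = links[-1].next_sibling
    if not isinstance(tail, NavigableString):
        # The last link is followed by another tag or by nothing.
        return ""
    # split() + join() collapses the embedded newline and trims the ends.
    return " ".join(tail.split())


soup = BeautifulSoup(text, "html.parser")
results = [trailing_text(li) for li in soup.find_all("li")]
for line in results:
    print(line)  # as a complication of medical care not elsewhere classified
```

The isinstance check matters because next_sibling can be another tag or None, in which case there is no trailing text to return.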

Using lxml, you could read the tail of the second <a> element, which holds the text that follows it:

    import lxml.html as LH
    text = """<li>(<a rel="nofollow" class="external text" href="http://www.icd9data.com/getICD9Code.ashx?
    icd9=999.1">999.1</a>) <a href="/wiki/Air_embolism" title="Air embolism">Air embolism</a> as
    a complication of medical care not elsewhere classified</li>"""

    doc = LH.fromstring(text)
    for tag in doc.xpath('//li/a[2]'):
        print(tag.tail)

to obtain

     as
    a complication of medical care not elsewhere classified
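
The tail printed above still carries a leading space and an embedded newline. A possible variation, sketched under the assumption that the wanted text follows the last <a> child of each <li> (so a[last()] replaces the hard-coded a[2]), collapses the whitespace as well; the variable name tails is illustrative only:

```python
import lxml.html as LH

text = """<li>(<a rel="nofollow" class="external text" href="http://www.icd9data.com/getICD9Code.ashx?
icd9=999.1">999.1</a>) <a href="/wiki/Air_embolism" title="Air embolism">Air embolism</a> as
a complication of medical care not elsewhere classified</li>"""

doc = LH.fromstring(text)
tails = []
# iter() also yields doc itself when the parsed fragment is the <li>.
for li in doc.iter("li"):
    # Relative XPath: the last <a> that is a direct child of this <li>.
    for a in li.xpath("a[last()]"):
        # tail is None when nothing follows the element, hence the `or ""`.
        tails.append(" ".join((a.tail or "").split()))

print(tails)  # ['as a complication of medical care not elsewhere classified']
```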

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow