Can two 'findAll' search blocks in BeautifulSoup be combined into one?

https://stackoverflow.com/questions/1825187

22-07-2019

Question

Can these two blocks be combined into one?

Edit: Is there any method other than concatenating the loops, as Yacoby did in his answer?

for tag in soup.findAll(['script', 'form']):
    tag.extract()

for tag in soup.findAll(id="footer"):
    tag.extract()

Likewise, can several blocks like these be combined into one?

for tag in soup.findAll(id="footer"):
    tag.extract()

for tag in soup.findAll(id="content"):
    tag.extract()

for tag in soup.findAll(id="links"):
    tag.extract()

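Or perhaps there is a lambda expression that checks whether the id is in a list, or some other simpler method.

Also, how do I find tags by their class attribute, given that class is a reserved keyword in Python?

Edit: this part is solved by soup.findAll(attrs={'class': 'noprint'}):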

# SyntaxError: class is a reserved word in Python, so this does not run
for tag in soup.findAll(class="noprint"):
    tag.extract()

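Solution

You can pass a function to .findAll(), like this:

# tag.get('id') returns None for tags without an id; tag['id'] would raise KeyError
soup.findAll(lambda tag: tag.name in ['script', 'form'] or tag.get('id') == "footer")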
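As a minimal, self-contained sketch of the function-filter approach, assuming the bs4 package and some made-up sample HTML (the markup and the helper name unwanted are illustrative, not from the original post). It looks up the id with tag.get('id'), because indexing tag['id'] raises KeyError on tags that have no id attribute:

```python
from bs4 import BeautifulSoup

# Made-up sample markup, for illustration only.
html = """
<html><body>
<script>var x = 1;</script>
<form action="/search"><input name="q"></form>
<div id="content"><p>Keep me</p></div>
<div id="footer">Footer text</div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")


def unwanted(tag):
    # Hypothetical helper: true for <script>, <form>, or any tag with id="footer".
    # tag.get("id") returns None when the attribute is missing.
    return tag.name in ("script", "form") or tag.get("id") == "footer"


# find_all() builds the full result list first, so extracting while looping is safe.
removed = [tag.extract() for tag in soup.find_all(unwanted)]
removed_names = [tag.name for tag in removed]
remaining_ids = [tag["id"] for tag in soup.find_all(id=True)]
```

After this runs, only the element with id="content" should still carry an id in the tree.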

However, it may be better to build the list of tags first and then iterate over it:

tags = soup.findAll(['script', 'form'])
tags.extend(soup.findAll(id="footer"))

for tag in tags:
    tag.extract()

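To filter on several ids, you can use:

# tag.get('id') is None for tags without an id, so no has_key() check is needed
# (has_key() was renamed has_attr() in bs4)
for tag in soup.findAll(lambda tag: tag.get('id') in ['footer', 'content', 'links']):
    tag.extract()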

A more specific approach is to pass a lambda to the id parameter:

for tag in soup.findAll(id=lambda value: value in ['footer', 'content', 'links']):
    tag.extract()
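If you are on bs4 rather than the older BeautifulSoup 3, an attribute filter can also be a plain list, which should match any tag whose id equals one of the items. A small sketch under that assumption, with made-up markup:

```python
from bs4 import BeautifulSoup

# Made-up sample markup, for illustration only.
html = """
<div id="links">Links</div>
<div id="content">Content</div>
<div id="main">Main</div>
<div id="footer">Footer</div>
<p>No id here</p>
"""
soup = BeautifulSoup(html, "html.parser")

# A list as the attribute value matches any one of its items.
unwanted_ids = ["footer", "content", "links"]
for tag in soup.find_all(id=unwanted_ids):
    tag.extract()

# Collect the id (or None) of every tag still in the tree.
remaining = [tag.get("id") for tag in soup.find_all(True)]
```

This avoids the lambda entirely when the condition is simple membership in a fixed set of ids.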

Other tips

I don't know whether BeautifulSoup can do this more elegantly, but you can merge the two loops like this:

for tag in soup.findAll(['script', 'form']) + soup.findAll(id="footer"):
    tag.extract()
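Another option, not mentioned in the original answers, is a CSS selector list. This sketch assumes a reasonably recent bs4 (where select() is backed by the soupsieve package) and uses made-up markup; a comma-separated selector matches any of its parts in a single call:

```python
from bs4 import BeautifulSoup

# Made-up sample markup, for illustration only.
html = """
<div id="content"><form><input name="q"></form><p>Body</p></div>
<script>var x = 1;</script>
<div id="footer">Footer</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches any <script>, any <form>, or the element with id="footer".
for tag in soup.select("script, form, #footer"):
    tag.extract()

# Names of the tags left in the tree, in document order.
remaining_names = [tag.name for tag in soup.find_all(True)]
```

Whether this reads better than concatenating two findAll() results is a matter of taste; it mainly helps when the conditions are naturally expressed as selectors.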

You can also search by class like this (documentation):

for tag in soup.findAll(attrs={'class': 'noprint'}):
    tag.extract()

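The answer to the second part of your question is right there in the documentation:

Searching by CSS class

The attrs argument would be a pretty obscure feature were it not for one thing: CSS. It's very useful to search for a tag that has a certain CSS class, but the name of the CSS attribute, class, is also a Python reserved word.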

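You could search by CSS class with soup.find("tagname", { "class" : "cssclass" }), but that's a lot of code for such a common operation. Instead, you can pass a string for attrs instead of a dictionary. The string will be used to restrict the CSS class.

# BeautifulSoup 3 import; with bs4 use: from bs4 import BeautifulSoup
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup("""Bob's <b>Bold</b> Barbeque Sauce now available in 
 <b class="hickory">Hickory</b> and <b class="lime">Lime</a>""")

soup.find("b", { "class" : "lime" })
# <b class="lime">Lime</b>

soup.find("b", "hickory")
# <b class="hickory">Hickory</b>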

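In bs4 we can pass class_ to filter on class values, for example links = soup.find_all('a', class_='external'):

from bs4 import BeautifulSoup
from urllib.request import urlopen

# Download the page; the response is closed when the with block exits
with urlopen('http://www.espncricinfo.com/') as f:
    raw_data = f.read()

# Parse with the lxml parser (requires the lxml package)
soup = BeautifulSoup(raw_data, 'lxml')

# Print every <a> tag whose class includes "external"
links = soup.find_all('a', class_='external')
for link in links:
    print(link)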
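The example above needs network access and the lxml package. As an offline sketch, assuming bs4 with the built-in html.parser and made-up markup, the three ways of searching by class discussed in this thread should select the same tags:

```python
from bs4 import BeautifulSoup

# Made-up sample markup, for illustration only.
html = """
<a class="external" href="http://example.com/a">A</a>
<a class="internal" href="/b">B</a>
<a class="external noprint" href="http://example.com/c">C</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Three spellings of the same class search.
by_keyword = soup.find_all("a", class_="external")          # bs4 keyword argument
by_attrs = soup.find_all("a", attrs={"class": "external"})  # attrs dictionary
by_string = soup.find_all("a", "external")                  # string passed as attrs

hrefs = [a["href"] for a in by_keyword]
hrefs_attrs = [a["href"] for a in by_attrs]
hrefs_string = [a["href"] for a in by_string]
```

Note that a tag with several classes, such as class="external noprint", matches when any one of its classes equals the search string.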

License: CC-BY-SA with attribution

Not affiliated with StackOverflow