What does this function do in Python, involving urllib2 and BeautifulSoup?

https://stackoverflow.com/questions/991967

13-09-2019

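Question

I asked a question earlier about retrieving high scores from an HTML page, and another user gave me the code below. I am new to Python and BeautifulSoup, so I am trying to work through other people's code piece by piece. I understand most of it, but I don't get what this piece of code is or what it does: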

    def parse_string(el):
       text = ''.join(el.findAll(text=True))
       return text.strip()

Here is the full code:

    from urllib2 import urlopen
    from BeautifulSoup import BeautifulSoup
    import sys

    URL = "http://hiscore.runescape.com/hiscorepersonal.ws?user1=" + sys.argv[1]

    # Grab page html, create BeautifulSoup object
    html = urlopen(URL).read()
    soup = BeautifulSoup(html)

    # Grab the <table id="mini_player"> element
    scores = soup.find('table', {'id':'mini_player'})

    # Get a list of all the <tr>s in the table, skip the header row
    rows = scores.findAll('tr')[1:]

    # Helper function to return concatenation of all character data in an element
    def parse_string(el):
       text = ''.join(el.findAll(text=True))
       return text.strip()

    for row in rows:

       # Get all the text from the <td>s
       data = map(parse_string, row.findAll('td'))

       # Skip the first td, which is an image
       data = data[1:]

       # Do something with the data...
       print data

Answer

el.findAll(text=True) returns all the text contained within an element and its sub-elements. By text I mean everything that is not inside a tag; so in <b>hello</b>, "hello" would be the text, but <b> and </b> would not.

So the function joins together all the text found beneath the given element and strips whitespace from the front and back.
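To see the effect without fetching the page, here is a rough sketch of the same idea using only Python 3's standard-library html.parser instead of BeautifulSoup. The names TextCollector and parse_string_sketch and the sample cell are made up for illustration; this is not how BeautifulSoup does it internally, just the same "collect text nodes, join, strip" behaviour:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers every text node, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of character data found between tags.
        self.chunks.append(data)


def parse_string_sketch(html_fragment):
    # Rough stand-in for parse_string(el): join all text, then strip the ends.
    collector = TextCollector()
    collector.feed(html_fragment)
    collector.close()
    return ''.join(collector.chunks).strip()


# Made-up table cell, loosely shaped like one row entry on a score page.
cell = '<td>  <a href="#">Attack</a> <b>99</b>\n</td>'
result = parse_string_sketch(cell)
print(repr(result))
```

The tags are dropped, the text pieces from the nested elements are concatenated in document order, and only the leading and trailing whitespace is removed; whitespace between the pieces is kept.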

Here is a link to the findAll documentation: http://www.crummy.com/software/beautifulsoup/documentation.html#arg-text

License: CC-BY-SA with attribution
