Question

I've looked at the other BeautifulSoup questions about getting elements at the same level, but mine seems slightly different.

Here is the website: http://engine.data.cnzz.com/main.php?s=engine&uv=&st=2014-03-01&et=2014-03-31

I'm trying to get the table on the right. Notice how the first row of the table expands into a detailed breakdown of that data. I don't want that breakdown; I only want the top-level rows. The other rows can also be expanded, although they are not expanded in this case, so simply looping and skipping tr[2] might not work. I've tried this:

import re
import requests
from bs4 import BeautifulSoup
r = requests.get(page)  # page holds the URL given above
r.encoding = 'gb2312'
soup = BeautifulSoup(r.text, 'html.parser')
table = soup.find('div', class_='right1').findAll('tr', {"class": re.compile('list.*')})

but this still picks up nested list* rows at other levels. How can I get only the first level?


Solution

Limit your search to direct children of the table element only by setting the recursive argument to False:

table = soup.find('div', class_='right1').table
rows = table.find_all('tr', {"class" : re.compile('list.*')}, recursive=False)
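To see the effect of recursive=False without fetching the live page, here is a minimal sketch on made-up markup. The HTML, the class names list1, list2 and detail, and the cell labels are all assumptions for illustration; they are not copied from the real site.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: one top-level row
# expands into a nested table whose rows also carry a list* class.
html = """
<div class="right1">
<table>
<tr class="list1"><td>outer A</td></tr>
<tr class="detail"><td>
  <table>
    <tr class="list2"><td>inner A1</td></tr>
    <tr class="list2"><td>inner A2</td></tr>
  </table>
</td></tr>
<tr class="list1"><td>outer B</td></tr>
</table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("div", class_="right1").table
pattern = re.compile(r"list.*")

# Default search descends into the nested table as well.
all_rows = table.find_all("tr", class_=pattern)
# recursive=False only inspects direct children of the outer table.
top_rows = table.find_all("tr", class_=pattern, recursive=False)

all_labels = [row.td.get_text(strip=True) for row in all_rows]
top_labels = [row.td.get_text(strip=True) for row in top_rows]
print(all_labels)
print(top_labels)
```

Note that this only works when the tr elements are direct children of the table in the parsed tree. If the markup wraps the rows in a tbody, the same call would have to be made on the tbody element instead.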

Other tips

@MartijnPieters' solution is already perfect, but don't forget that BeautifulSoup allows you to use multiple attributes as well when locating elements. See the following code:

from bs4 import BeautifulSoup as bsoup
import requests as rq
import re

url = "http://engine.data.cnzz.com/main.php?s=engine&uv=&st=2014-03-01&et=2014-03-31"
r = rq.get(url)
r.encoding = "gb2312"

soup = bsoup(r.text, "html.parser")  # r.text so the encoding set above is actually applied
div = soup.find("div", class_="right1")
rows = div.find_all("tr", {"class":re.compile(r"list\d+"), "style":"cursor:pointer;"})

for row in rows:
    first_td = row.find_all("td")[0]
    print(first_td.get_text())  # Python 3 print; text is already decoded

Notice how I also added "style":"cursor:pointer;". This is unique to the top-level rows and is not an attribute of the inner rows. This gives the same result as the accepted answer:

百度汇总
360搜索
新搜狗
谷歌
微软必应
雅虎
0
有道
其他
[Finished in 2.6s]
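The attribute filter can also be tried offline. The following is a small sketch on invented markup (the HTML, class values, and labels are assumptions, not taken from the real page) showing that only rows matching both the class pattern and the exact style string are returned.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: only the top-level rows carry the style attribute.
html = """
<table>
<tr class="list1" style="cursor:pointer;"><td>top 1</td></tr>
<tr><td><table>
  <tr class="list11"><td>nested 1</td></tr>
</table></td></tr>
<tr class="list2" style="cursor:pointer;"><td>top 2</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Both conditions must hold: class matches the regex AND style equals the string.
rows = soup.find_all(
    "tr",
    {"class": re.compile(r"list\d+"), "style": "cursor:pointer;"},
)
labels = [row.find("td").get_text(strip=True) for row in rows]
print(labels)
```

Because the style value is compared as an exact string, a page that writes it with different spacing (for example "cursor: pointer;") would not match; a compiled regex could be passed for style as well in that case.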

Hopefully this also helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow