Question

I've looked at the other beautifulsoup get same level type questions. Seems like my is slightly different.

Here is the website http://engine.data.cnzz.com/main.php?s=engine&uv=&st=2014-03-01&et=2014-03-31

I'm trying to get that table on the right. Notice how the first row of the table expands into a detailed break down of that data. I don't want that data. I only want the very top level data. You can also see that the other rows also can be expanded, but not in this case. So just looping and skipping tr[2] might not work. I've tried this:

r = requests.get(page)
r.encoding = 'gb2312'
soup = BeautifulSoup(r.text,'html.parser')
table=soup.find('div', class_='right1').findAll('tr', {"class" : re.compile('list.*')})

but there is still more nested list* at other levels. How to get only the first level?

Était-ce utile?

La solution

Limit your search to direct children of the table element only by setting the recursive argument to False:

table = soup.find('div', class_='right1').table
rows = table.find_all('tr', {"class" : re.compile('list.*')}, recursive=False)

Autres conseils

@MartijnPieters' solution is already perfect, but don't forget that BeautifulSoup allows you to use multiple attributes as well when locating elements. See the following code:

from bs4 import BeautifulSoup as bsoup
import requests as rq
import re

url = "http://engine.data.cnzz.com/main.php?s=engine&uv=&st=2014-03-01&et=2014-03-31"
r = rq.get(url)
r.encoding = "gb2312"

soup = bsoup(r.content, "html.parser")
div = soup.find("div", class_="right1")
rows = div.find_all("tr", {"class":re.compile(r"list\d+"), "style":"cursor:pointer;"})

for row in rows:
    first_td = row.find_all("td")[0]
    print first_td.get_text().encode("utf-8")

Notice how I also added "style":"cursor:pointer;". This is unique to the top-level rows and is not an attribute of the inner rows. This gives the same result as the accepted answer:

百度汇总
360搜索
新搜狗
谷歌
微软必应
雅虎
0
有道
其他
[Finished in 2.6s]

Hopefully this also helps.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top