Question

I'm scraping a web page and I need to know how many pages there are to scrape. The pagination markup looks like this:

<div class="pagination">
    <a href="/travel__world-desktop-wallpapers/page/2">2</a>
    <a href="/travel__world-desktop-wallpapers/page/3">3</a>
    <a href="/travel__world-desktop-wallpapers/page/4">4</a>
    ...
    <a href="/travel__world-desktop-wallpapers/page/31">31</a>
    <a href="/travel__world-desktop-wallpapers/page/32">32</a>
    <a href="/travel__world-desktop-wallpapers/page/33">33</a>
    <a href="/travel__world-desktop-wallpapers/page/2">Next »</a>
</div>

How can I set up a list comprehension that returns the highest page number (in this case, 33)?


Solution

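You don't need a list comprehension. Pass a generator expression to max() instead, keeping only the links whose text is a number (this skips labels such as "Next »"):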

max(int(link.text) 
    for link in soup.find('div', class_='pagination').find_all('a')
    if link.text.strip().isdigit())
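To see how the filter and max() interact without fetching a page, here is a minimal sketch over a hard-coded list of link texts. The list is a hypothetical stand-in for the text of the tags that find_all('a') returns. The sketch also shows the equivalent list comprehension and the default= argument, which stops max() from raising ValueError when no numeric links are found.

```python
# Hypothetical link texts, standing in for the .text of each <a> in the pagination div
link_texts = ["2", "3", "4", "31", "32", "33", "Next »"]

# Generator expression: max() consumes the page numbers one at a time,
# skipping non-numeric labels such as "Next »".
last_page = max(int(t) for t in link_texts if t.strip().isdigit())

# The list comprehension gives the same result but builds a full list first.
last_page_list = max([int(t) for t in link_texts if t.strip().isdigit()])

# With no numeric links, max() would raise ValueError; default= avoids that.
# Treating such a listing as a single page (default=1) is an assumption.
single_page = max((int(t) for t in ["Next »"] if t.strip().isdigit()), default=1)

print(last_page, last_page_list, single_page)  # 33 33 1
```

When max() receives more than one argument, the generator expression needs its own parentheses, as in the default= line.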

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <div class="pagination">
...     <a href="/travel__world-desktop-wallpapers/page/2">2</a>
...     <a href="/travel__world-desktop-wallpapers/page/3">3</a>
...     <a href="/travel__world-desktop-wallpapers/page/4">4</a>
...     ...
...     <a href="/travel__world-desktop-wallpapers/page/31">31</a>
...     <a href="/travel__world-desktop-wallpapers/page/32">32</a>
...     <a href="/travel__world-desktop-wallpapers/page/33">33</a>
...     <a href="/travel__world-desktop-wallpapers/page/2">Next »</a>
... </div>
... ''', 'html.parser')
>>> max(int(link.text) for link in soup.find('div', class_='pagination').find_all('a') if link.text.strip().isdigit())
33
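If BeautifulSoup isn't available, you can apply the same max()-over-a-generator idea using only the standard library's html.parser. This is a minimal sketch, not part of the original answer. PaginationParser and highest_page are hypothetical names, and the parser assumes the pagination div contains no nested div elements.

```python
from html.parser import HTMLParser


class PaginationParser(HTMLParser):
    """Hypothetical helper: collects the text of <a> tags inside <div class="pagination">.

    Assumes the pagination div contains no nested <div> elements.
    """

    def __init__(self):
        super().__init__()
        self.in_pagination = False
        self.in_link = False
        self.link_texts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "pagination" in classes:
            self.in_pagination = True
        elif tag == "a" and self.in_pagination:
            self.in_link = True
            self.link_texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "div":
            self.in_pagination = False

    def handle_data(self, data):
        # Accumulate text only while inside a pagination link
        if self.in_link:
            self.link_texts[-1] += data


def highest_page(html, default=1):
    """Return the largest numeric link text in the pagination div, or default."""
    parser = PaginationParser()
    parser.feed(html)
    return max(
        (int(t) for t in parser.link_texts if t.strip().isdigit()),
        default=default,
    )
```

Fed the pagination markup from the question, highest_page() should return 33. Links outside the pagination div are ignored.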
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow