Question

I have to parse the player names of the Adelaide Crows from this link. For that I have written a parser like this:

import requests
from bs4 import BeautifulSoup

href_val = requests.get("http://www.afl.com.au/news/teams?round=9")
soup1 = BeautifulSoup(href_val.content)

players_info_adel = soup1.find_all("ul", {"class" : "team1 team-adel"})
for li in players_info_adel:

    player_names_adel = li.find_all("li", {"class" : "player"})
    #print player_names_adel

#print player_names_adel

for span in player_names_adel:

    if span.find(text = True):
        text = ''.join(span.find(text = True))
        text1 = text.encode('ascii')
        print text

But whenever I run this code, I always get a bunch of "\n" printed instead of the names. What should I do to get the names of the players?

Solution

The problem is span.find(text = True): it returns only the first text node inside each player <li> element, and that node contains just a newline. Use Tag.get_text() instead to get all the text from the element.
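
To see the difference without hitting the live page, here is a minimal offline sketch. The HTML snippet is an assumption that approximates the markup described above (a player <li> that starts with a newline, then a <span> with the number, then the name); the real page may differ. It is written for Python 3 and uses string=True, the newer name for text=True.

```python
from bs4 import BeautifulSoup

# Assumed markup, modelled on the structure described in the answer;
# this is not a copy of the real page.
html = """<ul class="team1 team-adel">
<li class="player">
<span>24</span> Sam Jacobs
</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")
player = soup.find("li", {"class": "player"})

# find(string=True) stops at the first text node, which is the
# newline between <li class="player"> and <span>.
first_text = player.find(string=True)

# get_text() joins every text node in the element.
all_text = player.get_text().strip()

print(repr(first_text))
print(all_text)
```

The first value is the lone newline the question keeps printing; the second is the number and name together.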

Using a CSS selector to simplify the code:

for player in soup1.select('ul.team1 li.player'):
    text = player.get_text().strip()
    print text

This includes the player number; you can separate this number and the player name by using:

number, name = player.span.get_text().strip(), player.span.next_sibling.strip()

instead.

Demo:

>>> import requests
>>> from bs4 import BeautifulSoup
>>> href_val = requests.get("http://www.afl.com.au/news/teams?round=9")
>>> soup1 = BeautifulSoup(href_val.content)
>>> for player in soup1.select('ul.team1 li.player'):
...     text = player.get_text().strip()
...     print text
... 
24 Sam Jacobs
32 Patrick Dangerfield
26 Richard Douglas
41 Kyle Hartigan
25 Ben Rutten
16 Luke Brown
33 Brodie Smith
# .. etc ..
>>> for player in soup1.select('ul.team1 li.player'):
...     number, name = player.span.get_text().strip(), player.span.next_sibling.strip()
...     print name
... 
Sam Jacobs
Patrick Dangerfield
Richard Douglas
Kyle Hartigan
Ben Rutten
Luke Brown
Brodie Smith
# ... etc ...
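
Because the live page changes from round to round, the same approach can also be tried against a fixed snippet. The sketch below is for Python 3; the HTML and the helper name parse_players are hypothetical, chosen only to mirror the selectors used above.

```python
from bs4 import BeautifulSoup

# Assumed markup; the real page may be structured differently.
html = """<ul class="team1 team-adel">
<li class="player">
<span>24</span> Sam Jacobs
</li>
<li class="player">
<span>32</span> Patrick Dangerfield
</li>
</ul>"""


def parse_players(markup):
    """Return a list of (number, name) tuples (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    players = []
    for player in soup.select("ul.team1 li.player"):
        # The <span> holds the number; the text node right after it
        # holds the name, padded with whitespace.
        number = player.span.get_text().strip()
        name = player.span.next_sibling.strip()
        players.append((number, name))
    return players


for number, name in parse_players(html):
    print(number, name)
```

Returning tuples keeps the number and name separate, so the caller can print either one or both.
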
Licensed under: CC-BY-SA with attribution