Question

I am trying to scrape some data from Wikipedia using BeautifulSoup4. Unfortunately, I can't get one findAll call to work. I have a workaround, but I would like to understand why this call doesn't work.

Sample code:

from bs4 import BeautifulSoup
import requests

html = requests.get('http://en.wikipedia.org/wiki/Brazil_national_football_team').text
soup = BeautifulSoup(html, "html.parser")

title = "Edit section: Current squad"

# Direct search with findAll: this prints an empty list
print("findAll method : ", soup.findAll("a",{"title",title}))

# Workaround: check the title attribute of every <a> tag by hand
results = soup.findAll("a")

for r in results:
    if 'title' in r.attrs:
        if r.attrs['title'] == title:
            print("for if if method : ", r['href'])

Sample output:

findAll method :  []
for if if method :  /w/index.php?title=Brazil_national_football_team&action=edit&section=35

So my workaround (the 'for if if' loop) returns the correct href, but the findAll call doesn't.

What am I doing wrong?


Solution

You made a mistake in your dictionary syntax:

soup.findAll("a",{"title",title})
#  ----------------------^

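You passed in a set, not a dictionary. BeautifulSoup doesn't raise an error for this: when the second argument to findAll is not a dictionary, it is treated as a filter on the CSS class attribute, so your call looks for <a> tags whose class is "title" or "Edit section: Current squad", and there are none. Replace the comma with a colon: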

soup.findAll("a",{"title":title})
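You can see the difference without BeautifulSoup at all by building both literals in plain Python and checking their types. This snippet is only illustrative; the variable names as_set and as_dict are made up:

```python
title = "Edit section: Current squad"

# A comma between the two items builds a set containing two strings...
as_set = {"title", title}
# ...while a colon builds a dictionary mapping the key "title" to the value.
as_dict = {"title": title}

print(type(as_set).__name__)   # set
print(type(as_dict).__name__)  # dict
print(as_dict["title"])        # Edit section: Current squad
```

Only the dictionary carries the "attribute name maps to value" meaning that findAll needs.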

Alternatively, just use a keyword argument:

soup.findAll("a", title=title)

Demo:

>>> soup.findAll("a",{"title",title})
[]
>>> soup.findAll("a",{"title":title})
[<a href="/w/index.php?title=Brazil_national_football_team&amp;action=edit&amp;section=35" title="Edit section: Current squad">edit</a>]
>>> soup.findAll("a", title=title)
[<a href="/w/index.php?title=Brazil_national_football_team&amp;action=edit&amp;section=35" title="Edit section: Current squad">edit</a>]
Licensed under: CC-BY-SA with attribution