First of all, omit the line read = BeautifulSoup(openurl.read()).
Also, the line x = soup.find('ul', {"class": "i_p0"}) doesn't actually make any difference, because you are reusing the x variable in the loop.
Also, soup.findAll('a href') doesn't find anything, because 'a href' is not a valid tag name; the method matches tags, not tag-plus-attribute strings.
Also, instead of the old-fashioned findAll(), BeautifulSoup4 provides find_all().
Here's the code with several alterations:
from bs4 import BeautifulSoup
import urllib2

openurl = urllib2.urlopen("http://pastebin.com/archive/Python")
# Passing the parser name explicitly avoids BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(openurl, "html.parser")
sp = soup.find_all('a')
for x in sp:
    print x['href']
This prints the value of the href attribute of every link on the page.
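Note that the code above is Python 2 (urllib2 and the print statement no longer exist in Python 3). A minimal Python 3 sketch of the same find_all() extraction, using a small inline HTML snippet as a stand-in for the fetched page, looks like this:

```python
from bs4 import BeautifulSoup

# Stand-in for the real page; on Python 3 you would fetch it with
# urllib.request.urlopen("http://pastebin.com/archive/Python").read()
html = """
<ul class="i_p0">
  <li><a href="/abc123">First paste</a></li>
  <li><a href="/def456">Second paste</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
for link in soup.find_all('a'):
    print(link['href'])
```

This prints /abc123 and /def456; swapping the inline snippet for the downloaded page gives the same behavior as the Python 2 version.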
Hope that helps.