Question

I'm trying to parse a web page, and this is my code:

from bs4 import BeautifulSoup
import urllib2

openurl = urllib2.urlopen("http://pastebin.com/archive/Python")
read = BeautifulSoup(openurl.read())
soup = BeautifulSoup(openurl)
x = soup.find('ul', {"class": "i_p0"})
sp = soup.findAll('a href')
for x in sp:
    print x

I really wish I could be more specific, but as the title says, it gives me no response. No errors, nothing.


Solution

First of all, omit the line read = BeautifulSoup(openurl.read()). It consumes the response, so the later call BeautifulSoup(openurl) has nothing left to parse.
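To see why reading the response twice matters, here is a minimal sketch that uses io.BytesIO as a stand-in for the object returned by urlopen() (an assumption made so the example runs without network access; it is written with Python 3 print syntax). Like the real response, it is a file-like object that can only be read forward, so a second read() returns nothing:

```python
import io

# io.BytesIO stands in for the urlopen() response (assumption for illustration);
# both are file-like objects that are read forward only.
response = io.BytesIO(b"<html><body><a href='/x'>x</a></body></html>")

first = response.read()   # consumes the whole stream
second = response.read()  # nothing is left, so this is empty

print(len(first))    # length of the HTML bytes
print(repr(second))  # empty bytes object
```

In the question's code, the second BeautifulSoup call is in the position of that second read(), so it ends up parsing an empty document.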

Also, the line x = soup.find('ul', {"class": "i_p0"}) doesn't actually make any difference, because you are reusing the x variable in the loop.

Also, soup.findAll('a href') doesn't find anything: the first argument is a tag name, and there is no tag named 'a href'.

Also, BeautifulSoup 4 provides find_all(), which replaces the old-fashioned findAll() spelling.

Here's the code with several alterations:

from bs4 import BeautifulSoup
import urllib2

openurl = urllib2.urlopen("http://pastebin.com/archive/Python")
soup = BeautifulSoup(openurl)
sp = soup.find_all('a')
for x in sp:
    print x['href']

This prints the value of the href attribute for every link on the page.
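One caveat: x['href'] raises a KeyError for any a tag that has no href attribute. The sketch below shows two ways around that. The inline markup is hypothetical and stands in for the downloaded page, the explicit "html.parser" argument is my choice rather than part of the original code, and the print calls use Python 3 syntax:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page.
html = """
<ul>
  <li><a href="/abc123">first paste</a></li>
  <li><a name="anchor-only">no href here</a></li>
  <li><a href="/def456">second paste</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# 'a href' is treated as a tag name, and no tag has that name.
no_match = soup.find_all("a href")

# href=True keeps only <a> tags that carry an href attribute,
# so link["href"] cannot raise KeyError here.
hrefs = [link["href"] for link in soup.find_all("a", href=True)]

# .get() returns None instead of raising when the attribute is missing.
maybe_hrefs = [link.get("href") for link in soup.find_all("a")]

print(hrefs)
print(maybe_hrefs)
```

Filtering with href=True is probably the tidier option when only real links are wanted; .get('href') is handy when the tags without an href should still be visited.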

Hope that helps.

OTHER TIPS

I altered a couple of lines in your code and I do get a response, though I'm not sure it is what you want.

Here:

openurl = urllib2.urlopen("http://pastebin.com/archive/Python")
soup = BeautifulSoup(openurl.read())  # This is what you need to use for selecting elements
# soup = BeautifulSoup(openurl)  # This is not needed
# x = soup.find('ul', {"class": "i_p0"})  # You don't seem to be making use of this either
sp = soup.findAll('a')
for x in sp:
    print x.get('href')  # This gets the href (None if the tag has none)

Hope this helps.

Licensed under: CC-BY-SA with attribution