Question

I'm getting this error:

NameError: name 'htmltext' is not defined

It comes from the code below:

from bs4 import BeautifulSoup
import urllib
import urllib.parse

url = "http://nytimes.com"

urls = [url]
visited = [url]

while len(urls) > 0:
    try:
        htmltext = urllib.urlopen(urls[0]).read()
    except:
        print(urls[0])

    soup = BeautifulSoup(htmltext)
    urls.pop(0)

    print(soup.findAll('a',href = true))

Solution

In Python 3.x, urlopen is no longer available as urllib.urlopen; it lives in the urllib.request submodule. Import urllib.request instead of urllib, then change the line:

htmltext = urllib.urlopen(urls[0]).read()

to:

htmltext = urllib.request.urlopen(urls[0]).read()

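Finally, change true to True. Python's boolean literal is capitalized, and true would raise its own NameError.

The NameError about htmltext is a side effect of the first bug. On Python 3, urllib.urlopen raises an AttributeError, the bare except: catches it and only prints the URL, and execution then reaches BeautifulSoup(htmltext) even though htmltext was never assigned. To avoid hiding errors like this, catch specific exceptions (for example urllib.error.URLError) and skip the failed URL with continue. If you do, pop the URL at the top of the loop (url = urls.pop(0)); otherwise the failing URL is never removed and the loop never ends.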
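A stdlib-only sketch can reproduce the failure and show a safer fetch that needs no network access to try out. The function names fetch_like_question and fetch_html are hypothetical, and catching OSError and ValueError is an assumption about which failures are worth skipping (URLError subclasses OSError, and a malformed URL makes urlopen raise ValueError). Inside a function the unassigned name surfaces as UnboundLocalError, which is a subclass of NameError.

```python
import urllib.request  # this also binds the name urllib


def fetch_like_question(url):
    """Hypothetical helper that repeats the question's pattern."""
    try:
        htmltext = urllib.urlopen(url).read()  # AttributeError on Python 3
    except:  # bare except copied from the question: the AttributeError is swallowed
        print("failed:", url)
    return htmltext  # never assigned, so this raises UnboundLocalError


def fetch_html(url):
    """Hypothetical fixed fetch: Python 3 API, narrow except, None on failure."""
    try:
        return urllib.request.urlopen(url).read()
    except (OSError, ValueError) as exc:  # URLError is a subclass of OSError
        print("skipping", url, "-", exc)
        return None


if __name__ == "__main__":
    try:
        fetch_like_question("http://nytimes.com")
    except NameError as exc:  # UnboundLocalError is a subclass of NameError
        print(type(exc).__name__, exc)
    # A malformed URL raises ValueError inside urlopen, so this returns None.
    print(fetch_html("not a url"))
```

In the crawler loop you would then write url = urls.pop(0), call htmltext = fetch_html(url), and continue when it returns None.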

Licensed under: CC-BY-SA with attribution