Question
I'm getting this error:
NameError: name 'htmltext' is not defined
It comes from the code below:
from bs4 import BeautifulSoup
import urllib
import urllib.parse

url = "http://nytimes.com"
urls = [url]
visited = [url]

while len(urls) > 0:
    try:
        htmltext = urllib.urlopen(urls[0]).read()
    except:
        print(urls[0])
    soup = BeautifulSoup(htmltext)
    urls.pop(0)
    print(soup.findAll('a',href = true))
Solution
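In Python 3.x, you have to import urllib.request instead of urllib. urllib.urlopen no longer exists there, so the call raises an AttributeError, the bare except swallows it, and htmltext is never assigned. That is why the next line fails with a NameError. Then, change the line: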
htmltext = urllib.urlopen(urls[0]).read()
to:
htmltext = urllib.request.urlopen(urls[0]).read()
Finally, change true to True.
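
To avoid ending up with an undefined htmltext again, the fetch step can catch specific exceptions and skip the page when the request fails. The following is only a minimal sketch of that idea using the standard library; the helper names fetch_html and fetch_all and the 10-second timeout are assumptions for illustration and do not come from the original code.

```python
import urllib.request


def fetch_html(url):
    """Return the page body as bytes, or None if the request fails."""
    try:
        # timeout=10 is an arbitrary choice for this sketch
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read()
    except (OSError, ValueError) as exc:
        # OSError covers URLError/HTTPError and socket timeouts;
        # ValueError covers malformed URLs.
        print(f"could not fetch {url}: {exc}")
        return None


def fetch_all(urls):
    """Fetch each URL in order, skipping the ones that fail."""
    pages = {}
    queue = list(urls)
    while queue:
        url = queue.pop(0)
        htmltext = fetch_html(url)
        if htmltext is None:
            continue  # nothing to parse, so htmltext is never used undefined
        pages[url] = htmltext
    return pages
```

Each value in the returned dictionary can then be passed to BeautifulSoup(htmltext, "html.parser"), followed by find_all('a', href=True) to list the links.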