Question

I have this code:

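from urllib.request import urlopen
from bs4 import BeautifulSoup

page = urlopen("http://www.doctoralia.com")
soup = BeautifulSoup(page, "html.parser")  # name the parser explicitly
# utf-8 avoids UnicodeEncodeError with the default Windows encoding
with open('data.txt', 'w', encoding='utf-8') as myfile:
    myfile.write(soup.prettify())
print('done boy !')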

It works well, but when I change urlopen("http://www.doctoralia.com") to urlopen("http://www.doctoralia.com/healthpros"), it throws this error:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    page = urlopen("http://www.doctoralia.com/healthpros")
  File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 513, in error
    return self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

What's the problem? Thanks.

No correct solution

OTHER TIPS

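urlopen raises an HTTPError when the server answers with an HTTP error status such as 404 Not Found; that is what the last frame of your traceback (http_error_default) is doing. If you still want to see the HTML the server sent back, you have to catch this HTTPError and read its body. Example: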

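from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

try:
    page = urlopen("http://www.doctoralia.com/healthpros")
except HTTPError as e:
    if e.code == 404:
        # The exception doubles as the response: read() returns the error page body
        soup = BeautifulSoup(e.read(), "html.parser")
        print(soup.prettify())
    else:
        raise  # not a 404: let other HTTP errors propagate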

This prints the HTML of the error page when the server responds with a 404 (Not Found) status.
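Why this works: HTTPError is not only an exception but also a file-like response object that carries the status code, the reason phrase, the headers and the body of the error page, which is why the body can be read from the exception itself (via e.read() or e.fp.read()). The sketch below builds one by hand with the same constructor arguments that appear in the traceback (url, code, msg, hdrs, fp) so it runs without network access; the header and page content are made up for illustration.

```python
import io
from email.message import Message
from urllib.error import HTTPError

# Illustrative values only: a hand-built 404 response instead of a real request.
hdrs = Message()
hdrs["Content-Type"] = "text/html"
body = io.BytesIO(b"<html><body><h1>Not Found</h1></body></html>")
err = HTTPError("http://www.doctoralia.com/healthpros", 404, "Not Found", hdrs, body)

print(err.code)                     # numeric status code
print(err.reason)                   # reason phrase from the server
print(err.headers["Content-Type"])  # response headers
html = err.read().decode("utf-8")   # equivalent to err.fp.read()
print(html)
```

Printing err.code is usually the quickest way to see which error the server actually returned.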

To handle every HTTPError the same way, not only 404, remove the if e.code == 404: check.
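A minimal sketch of handling every HTTPError the same way, assuming you want the status code and the page text whether or not the request succeeded. The helper name fetch_html is hypothetical, and so the sketch runs without network access it serves a local 404 page with the standard library's http.server; in practice, pass the real URL instead. The returned text can be handed to BeautifulSoup as in the example above.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_html(url):
    """Hypothetical helper: return (status_code, body_text), even for 4xx/5xx."""
    try:
        with urlopen(url) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as e:
        # No status check: every HTTP error status is handled the same way.
        return e.code, e.read().decode("utf-8", errors="replace")


class NotFoundHandler(BaseHTTPRequestHandler):
    """Local stand-in server that answers every GET with a 404 HTML page."""

    def do_GET(self):
        page = b"<html><body><h1>Not Found</h1></body></html>"
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(page)))
        self.end_headers()
        self.wfile.write(page)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/healthpros" % server.server_address[1]

status, html = fetch_html(url)
print(status)
print(html)

server.shutdown()
server.server_close()
```

Returning the code alongside the body lets the caller decide what to do per status, instead of branching inside the except clause.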

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow