Download a URL only if it is an HTML web page

https://stackoverflow.com/questions/9750481

download
python
html-parsing
beautifulsoup
printing-web-page

24-05-2021

Question

I want to write a Python script that downloads a web page only if the page is HTML. I know the Content-Type header is what needs to be checked, but I cannot find a way to get the headers before the file is downloaded. Please suggest a way to do it.

Solution

Use http.client to send a HEAD request to the URL. This returns only the headers for the resource, so you can look at the Content-Type header and check whether it is text/html. If it is, send a GET request to the URL to get the body.
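The HEAD-then-GET approach could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names (is_html, download_if_html, _request) and the 10-second timeout are assumptions made for the example, redirects are not followed, and any status other than 200 is treated as "do not download".

```python
import http.client
from urllib.parse import urlsplit


def is_html(content_type):
    """Return True if a Content-Type header value denotes text/html."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


def _request(method, url, timeout=10):
    """Hypothetical helper: send one request, return (status, content_type, body)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request(method, path)
        resp = conn.getresponse()
        body = resp.read()  # empty for a HEAD response
        return resp.status, resp.getheader("Content-Type"), body
    finally:
        conn.close()


def download_if_html(url):
    """Return the page body as bytes if the URL serves text/html, else None."""
    # Step 1: HEAD request, headers only.
    status, content_type, _ = _request("HEAD", url)
    if status != 200 or not is_html(content_type):
        return None
    # Step 2: GET request for the body; re-check the header in case it differs.
    status, content_type, body = _request("GET", url)
    if status == 200 and is_html(content_type):
        return body
    return None
```

Calling download_if_html("https://example.com/") would return the page bytes, or None if the resource is not HTML. Some servers reject HEAD requests or answer them with different headers than they send for GET, so a fallback to a GET that inspects the headers before reading the body may be needed in practice.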

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow