Download a URL only if it is an HTML web page
24-05-2021
Question
I want to write a Python script that downloads a web page only if it contains HTML. I know that the content-type header can be used for this. Please suggest a way to do it, as I cannot find a way to get the headers before the file is downloaded.
Solution
Use http.client to send a HEAD request to the URL. This returns only the headers for the resource, so you can look at the content-type header and check whether it is text/html. If it is, send a GET request to the URL to get the body.
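The HEAD-then-GET approach above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names `is_html`, `_fetch` and `download_if_html` are made up for this example, the 10-second timeout is an arbitrary choice, and the sketch assumes the server answers HEAD requests with the same content-type it would send for GET. It does not follow redirects, so any status other than 200 is treated as "do not download".

```python
import http.client
from urllib.parse import urlsplit


def is_html(content_type):
    """Return True if a Content-Type header value denotes an HTML document."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


def _fetch(method, url, timeout):
    """Send one request (hypothetical helper); return (status, content_type, body)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request(method, target)
        response = conn.getresponse()
        # For a HEAD request the body is empty, so read() returns b"".
        return response.status, response.getheader("Content-Type"), response.read()
    finally:
        conn.close()


def download_if_html(url, timeout=10):
    """Return the page body as bytes if the URL serves HTML, otherwise None."""
    # Step 1: HEAD request, which transfers only the headers.
    status, content_type, _ = _fetch("HEAD", url, timeout)
    if status != 200 or not is_html(content_type):
        return None
    # Step 2: the resource looks like HTML, so fetch the body with GET.
    _, _, body = _fetch("GET", url, timeout)
    return body
```

Calling `download_if_html("http://example.com/")` would then return the page bytes for an HTML page and `None` for anything else. Some servers reject or mishandle HEAD; in that case one fallback is to send a GET, inspect the headers, and close the connection without reading the body.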
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow