Question

I want to write a Python script that downloads a web page only if the page contains HTML. I know the Content-Type header can be used for this. Please suggest a way to do it, as I cannot find a way to get the headers before downloading the file.

Solution

Use http.client to send a HEAD request to the URL. This returns only the headers for the resource, so you can look at the Content-Type header and see whether it is text/html. If it is, send a GET request to the URL to get the body.
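
A minimal sketch of that HEAD-then-GET approach, using only the standard library. The function names (is_html, fetch_if_html, _request) and the 10-second timeout are illustrative assumptions, not part of http.client. The sketch does not follow redirects, and it treats any non-200 response as "not HTML". Note that the header value often carries a parameter such as "; charset=utf-8", so it is compared after stripping parameters.

```python
import http.client
from urllib.parse import urlsplit


def is_html(content_type):
    """Return True if a Content-Type header value denotes text/html."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


def _request(method, url, timeout=10):
    """Send one request; return (status, Content-Type value, body bytes)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request(method, path)
        resp = conn.getresponse()
        body = resp.read()  # empty for a HEAD response
        return resp.status, resp.getheader("Content-Type"), body
    finally:
        conn.close()


def fetch_if_html(url):
    """Return the page body as bytes if the URL serves HTML, else None."""
    # Step 1: HEAD returns only the headers, so nothing is downloaded yet.
    status, content_type, _ = _request("HEAD", url)
    if status != 200 or not is_html(content_type):
        return None
    # Step 2: the resource claims to be HTML, so fetch the body.
    status, content_type, body = _request("GET", url)
    if status == 200 and is_html(content_type):
        return body
    return None
```

Calling fetch_if_html("http://example.com/") would return the page bytes, or None if the server reports a different content type. Some servers reject or mishandle HEAD; in that case a possible fallback is to send the GET, inspect the header, and close the connection without reading the body.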

Licensed under: CC-BY-SA with attribution