Question

I want to cache the IPs of hosts so that I can refresh the contents of regularly accessed web pages more quickly. But I ran into a problem: some web pages can be accessed by URL (such as 'www.auto-club74.ru') but not by the corresponding IP ('192.169.52.119', as returned by socket.gethostbyname('www.auto-club74.ru')).

Here is my sample code which fails:

import socket
import requests

sIP = socket.gethostbyname('www.auto-club74.ru')
print(sIP)
r1 = requests.get('http://www.auto-club74.ru')
r2 = requests.get('http://{}/'.format(sIP))
assert r1.text==r2.text

It raises an AssertionError. Thanks in advance for your help!

Solution

Did you actually try to open the IP-based URL? It renders a different page. Most probably this is shared hosting, where several sites with different domains are served from the same IP.

You can try passing the Host header with the request.

My first attempt at this does not work:

header = {"Host": 'www.auto-club74.ru' }
r2 = requests.get('http://{}/'.format(sIP), headers=header)

It goes into a redirect loop for me. That is because www.auto-club74.ru redirects to auto-club74.ru, so this code works:

header = {"Host": 'auto-club74.ru' }
r2 = requests.get('http://{}/'.format(sIP), headers=header)
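
As a minimal sketch of the same idea, the URL and header can be built by a small helper. The name ip_request_args is hypothetical, and the sketch assumes plain HTTP; over HTTPS the certificate and SNI are tied to the host name, so swapping in the IP this way would generally not work.

```python
def ip_request_args(hostname, ip, path="/"):
    """Build the URL and headers for fetching `hostname` via a known IP.

    Hypothetical helper: the server picks the virtual host from the
    Host header, so the IP goes in the URL and the name in the header.
    """
    url = "http://{}{}".format(ip, path)
    headers = {"Host": hostname}
    return url, headers


url, headers = ip_request_args("auto-club74.ru", "192.169.52.119")
# Needs network access and the requests package:
# r2 = requests.get(url, headers=headers, allow_redirects=False)
# A 3xx status here means the site redirects to another host name,
# which is what happened with 'www.auto-club74.ru'.
```

Passing allow_redirects=False makes the redirect visible as a 3xx response instead of letting requests follow it.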

Also, comparing web pages with string equality doesn't make much sense: two renderings of the same page are very likely to differ, because pages contain a lot of dynamic elements.
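
If you still want an automated check, one option is a fuzzy comparison instead of ==. The sketch below uses difflib from the standard library; the function name pages_similar and the 0.9 threshold are assumptions picked for illustration, not values tuned for this site.

```python
import difflib


def pages_similar(text_a, text_b, threshold=0.9):
    """Return True if the two page bodies are mostly the same text.

    The threshold is an arbitrary assumption; adjust it for your pages.
    """
    # autojunk=False turns off the heuristic that ignores frequent
    # characters in long inputs, which can distort the ratio for HTML.
    matcher = difflib.SequenceMatcher(None, text_a, text_b, autojunk=False)
    return matcher.ratio() >= threshold


# Two renderings that differ only in a dynamic timestamp.
same = pages_similar("<html>time: 10:01</html>", "<html>time: 10:02</html>")
# Two unrelated bodies.
different = pages_similar("abc", "xyz")
```

SequenceMatcher can be slow on very large pages, so comparing something smaller, such as the page title, may be a better fit in practice.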

Also, this smells like premature optimization: a DNS lookup takes only a fraction of the time needed to fetch the page, so caching IPs will not help you much.
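
You can measure this yourself before deciding. Here is a rough sketch; the helper name timed is hypothetical, and the commented usage needs network access and the requests package.

```python
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Usage (needs network access):
# import socket, requests
# _, dns_s = timed(socket.gethostbyname, "auto-club74.ru")
# _, page_s = timed(requests.get, "http://auto-club74.ru/")
# print("DNS: {:.3f}s, page: {:.3f}s".format(dns_s, page_s))
```

Note that the page fetch includes its own name lookup, and the operating system may already cache DNS results, so treat the numbers as a rough indication only.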

Licensed under: CC-BY-SA with attribution