How to parse URLs using urlparse and split() in python?

https://stackoverflow.com/questions/17478425

02-06-2022

Question

Could someone explain the purpose of the line host = parsed.netloc.split('@')[-1].split(':')[0] in the following code? I understand that we are trying to get the host name from netloc, but I don't understand why we split on the @ delimiter and then again on the : delimiter.

import urlparse
parsed = urlparse.urlparse('https://www.google.co.uk/search?client=ubuntu&channel=fs')
print parsed
host = parsed.netloc.split('@')[-1].split(':')[0]
print host


Result:

ParseResult(scheme='https', netloc='www.google.co.uk', path='/search', params='', query='client=ubuntu&channel=fs', fragment='')

www.google.co.uk

Surely, if one just needs the domain, we can get that directly from parsed.netloc?
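The snippet in the question is Python 2 code: the urlparse module and the print statement do not exist in Python 3, where the module became urllib.parse. A minimal Python 3 port, shown only to make the example runnable today, follows. For this particular URL the netloc has no credentials and no port, which is why the split chain returns the same string as netloc:

```python
# Python 3 port of the question's snippet; urllib.parse replaces Python 2's urlparse module.
from urllib.parse import urlparse

parsed = urlparse('https://www.google.co.uk/search?client=ubuntu&channel=fs')
print(parsed)

# Drop any "user:password@" prefix, then any ":port" suffix.
host = parsed.netloc.split('@')[-1].split(':')[0]
print(host)  # www.google.co.uk
```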

Solution

In its full form, the netloc can include HTTP authentication credentials and a port number:

login:password@www.google.co.uk:80
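To see this in practice, here is a short Python 3 sketch that parses a URL built around the example netloc above (the path and query are made up for illustration). The credentials and port stay inside netloc, so netloc alone is not just the host name:

```python
from urllib.parse import urlparse

# Hypothetical URL whose netloc carries both credentials and an explicit port.
parsed = urlparse("https://login:password@www.google.co.uk:80/search?client=ubuntu")

# netloc is everything between "//" and the start of the path.
print(parsed.netloc)  # login:password@www.google.co.uk:80
print(parsed.path)    # /search
```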

See RFC1808 and RFC1738

So we potentially have to split that into ["login:password", "www.google.co.uk:80"], take the last part, split that into ["www.google.co.uk", "80"] and take the hostname.
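The two steps can be traced one at a time. In this sketch, host_from_netloc is a hypothetical helper name that simply wraps the expression from the question:

```python
def host_from_netloc(netloc: str) -> str:
    # Step 1: split on "@" and keep the last piece, discarding any "user:password" part.
    host_and_port = netloc.split("@")[-1]
    # Step 2: split on ":" and keep the first piece, discarding any port number.
    return host_and_port.split(":")[0]

netloc = "login:password@www.google.co.uk:80"
step1 = netloc.split("@")
step2 = step1[-1].split(":")
print(step1)                      # ['login:password', 'www.google.co.uk:80']
print(step2)                      # ['www.google.co.uk', '80']
print(host_from_netloc(netloc))   # www.google.co.uk
```

Using [-1] after the first split (rather than [1]) matters: when there is no "@", the list has only one element, and [-1] still picks it.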

If these parts are omitted, there's no harm in splitting on nonexistent delimiters, so there's no need to check whether they are present.
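This works because str.split returns a one-element list containing the whole string when the delimiter does not occur, so both [-1] and [0] select that same string. A quick check over the four possible combinations:

```python
# Every combination of optional credentials and port reduces to the bare host.
netlocs = [
    "www.google.co.uk",                     # neither part present
    "www.google.co.uk:80",                  # port only
    "login:password@www.google.co.uk",      # credentials only
    "login:password@www.google.co.uk:80",   # both present
]

print("www.google.co.uk".split("@"))  # ['www.google.co.uk'] (delimiter absent)

hosts = [n.split("@")[-1].split(":")[0] for n in netlocs]
print(hosts)  # ['www.google.co.uk', 'www.google.co.uk', 'www.google.co.uk', 'www.google.co.uk']
```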

urlparse documentation
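For comparison, the parse result objects already expose these pieces as attributes (hostname, port, username, password; present since Python 2.5 and in Python 3's urllib.parse), which avoids hand-splitting. The sketch below also shows one case where the manual split gives the wrong answer: an IPv6 literal host, which itself contains colons:

```python
from urllib.parse import urlsplit

# Hypothetical URL with credentials, a port, and mixed-case host.
parts = urlsplit("https://login:password@WWW.Google.co.uk:80/search?client=ubuntu&channel=fs")
print(parts.hostname)  # www.google.co.uk (hostname is lowercased)
print(parts.port)      # 80 (converted to int)
print(parts.username)  # login
print(parts.password)  # password

# The manual split stops at the first ":" inside the bracketed IPv6 address.
v6 = urlsplit("http://[::1]:8080/")
manual = v6.netloc.split("@")[-1].split(":")[0]
print(manual)       # [
print(v6.hostname)  # ::1
print(v6.port)      # 8080
```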

Licensed under: CC-BY-SA with attribution
