Question

When netloc is empty, urlparse.urlunparse behaves inconsistently:

>>> urlparse.urlunparse(('http','','test_path', None, None, None))
'http:///test_path'
>>> urlparse.urlunparse(('ftp','','test_path', None, None, None))
'ftp:///test_path'
>>> urlparse.urlunparse(('ssh','','test_path', None, None, None))
'ssh:test_path'

Is this a bug or a feature? I would expect urlunparse to always behave as in the first example, even if the scheme is not recognized.


Solution

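urlunparse takes a six-item tuple: scheme, netloc, url, params, query, fragment. It appends any params to url and hands the remaining five components to urlunsplit, which unpacks them as:

scheme, netloc, url, query, fragment = data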

When there is no netloc and the scheme is not in uses_netloc, no '//' is inserted, and the url is simply built as

    url = scheme + ':' + url

That is how urlunsplit (which urlunparse calls) is defined:

def urlunsplit(data):
    ...
    scheme, netloc, url, query, fragment = data
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
        if url and url[:1] != '/': url = '/' + url
        url = '//' + (netloc or '') + url
    if scheme:
        url = scheme + ':' + url

Note that 'ssh' is not in uses_netloc:

uses_netloc = ['ftp', 'http', 'gopher', 'nntp', 'telnet',
               'imap', 'wais', 'file', 'mms', 'https', 'shttp',
               'snews', 'prospero', 'rtsp', 'rtspu', 'rsync', '',
               'svn', 'svn+ssh', 'sftp','nfs','git', 'git+ssh']

You do get a url that begins with ssh:// if you supply a netloc:

>>> urlparse.urlunparse(('ssh','netloc','test_path', None, None, None))
'ssh://netloc/test_path'
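
If you want the '//' regardless of whether the scheme is registered, one option is to build the string yourself. A minimal sketch, assuming you only need scheme, netloc and path; unparse_with_authority is a hypothetical helper, not part of urlparse:

```python
def unparse_with_authority(scheme, netloc, path):
    """Always emit 'scheme://netloc/path', even for unregistered schemes.

    Hypothetical helper for illustration only; it ignores params,
    query and fragment.
    """
    # Make the path absolute so it cannot run into the netloc.
    if path and not path.startswith('/'):
        path = '/' + path
    return scheme + '://' + (netloc or '') + path


print(unparse_with_authority('ssh', '', 'test_path'))
print(unparse_with_authority('ssh', 'netloc', 'test_path'))
```

With an empty netloc this yields the triple-slash form from the first example in the question for any scheme.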