Question

I tried to get the home site of a URL. First I used a for loop, which achieved the goal:

home = ''
my_url = 'http://www.mysite.com/subdir/subdir2/index.html'
for item in my_url.split('/')[:3]:
    home += item + '/'
print home

and I get

'http://www.mysite.com/' 

Then I came across reduce(), which I had never used before, so I gave it a shot. Here is the code:

my_url = 'http://www.mysite.com/subdir/subdir2/index.html'
home = ''
home = reduce(lambda x,y : x + y + '/',my_url.split('/')[:3])
print home

This time I got

'http:/www.mysite.com/'

Does reduce just omit the None in it? What explains this result?

Yes, I know from this topic that I could just use urllib's parser functions to do it, so I hope the discussion here can focus on reduce().


Solution

my_url = 'http://www.mysite.com/subdir/subdir2/index.html'
home = ''
home = reduce(lambda x,y : x + y + '/',my_url.split('/')[:3])

my_url.split('/')[:3] #=> ['http:', '', 'www.mysite.com']

'http:' + '' + '/' #=> 'http:/'
'http:/' + 'www.mysite.com' + '/' #=> 'http:/www.mysite.com/'

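This is not mysterious. Everything works as expected - the problem is that URLs are not uniform, in that the protocol is separated with a double slash. Because reduce is called without an initial value, the first element, 'http:', becomes the starting accumulator and never gets its own trailing slash, so only the slash contributed by the empty string survives.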
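To make the role of the starting value concrete, here is a minimal sketch, written so that it also runs on Python 3, where reduce lives in functools. The optional third argument to reduce is an initial value; passing '' is an addition for illustration, not part of the original code, and it makes reduce behave like the for loop in the question.

```python
from functools import reduce  # a builtin in Python 2, an import in Python 3

my_url = 'http://www.mysite.com/subdir/subdir2/index.html'
parts = my_url.split('/')[:3]   # ['http:', '', 'www.mysite.com']

# No initial value: 'http:' is the starting accumulator,
# so the lambda never appends a '/' directly after it.
without_init = reduce(lambda x, y: x + y + '/', parts)

# Initial value '': every element, including 'http:', goes through the lambda.
with_init = reduce(lambda x, y: x + y + '/', parts, '')

print(without_init)  # http:/www.mysite.com/
print(with_init)     # http://www.mysite.com/
```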

A useful tool for understanding how reduce works is scanl from the functional package ( http://pypi.python.org/pypi/functional ), which yields every intermediate result instead of only the final one:

In [11]: home = scanl(lambda x,y : '%s%s/'%(x,y),my_url.split('/')[0],my_url.split('/')[1:3])

In [12]: home
Out[12]: <generator object _scanl at 0x0000000003DEC828>

In [13]: list(home)
Out[13]: ['http:', 'http:/', 'http:/www.mysite.com/']
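If installing functional is not an option, itertools.accumulate from the standard library gives a similar view. This is a sketch that assumes Python 3.3 or later, where accumulate accepts a function argument. It is not the scanl call above: no separate initial value is passed, so the first element is emitted unchanged.

```python
from itertools import accumulate

my_url = 'http://www.mysite.com/subdir/subdir2/index.html'
parts = my_url.split('/')[:3]   # ['http:', '', 'www.mysite.com']

# Each entry is what reduce would hold as its accumulator at that step.
steps = list(accumulate(parts, lambda x, y: x + y + '/'))

print(steps)  # ['http:', 'http:/', 'http:/www.mysite.com/']
```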

Note that str.join implements a slightly different algorithm: it puts the separator between elements rather than after each one:

In [16]: '/'.join(my_url.split('/'))
Out[16]: 'http://www.mysite.com/subdir/subdir2/index.html'

This is what people usually want - it is equivalent to:

In [22]: reduce(lambda x,y : '%s/%s'%(x,y),my_url.split('/'))
Out[22]: 'http://www.mysite.com/subdir/subdir2/index.html'

OTHER TIPS

"Yes, I know from this topic that I could just use urllib's parser functions to do it, so I hope the discussion here can focus on reduce()."

I don't understand why you want to reinvent the wheel when the standard library already has a function for this. I really suggest that you not waste your time; get familiar with Python's standard library and use the functionality it provides.

Anyway, back to your question. When I type my_url.split('/')[:3], I get this:
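For reference, a minimal sketch of the standard-library route. The question does not say which parser function it had in mind, so urlsplit is an assumption here; the try/except only covers the different module names in Python 2 and Python 3.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2

my_url = 'http://www.mysite.com/subdir/subdir2/index.html'

# urlsplit separates the scheme ('http') from the host ('www.mysite.com').
pieces = urlsplit(my_url)
home = '{0}://{1}/'.format(pieces.scheme, pieces.netloc)

print(home)  # http://www.mysite.com/
```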

['http:', '', 'www.mysite.com']

So there is no None in it, just an empty string, which behaves like any other string. Your lambda function simply concatenates the strings back together, and reduce applies it pairwise from the left. I suggest you use the string's join method instead, as it is more readable and easier to understand:

>>> parts = my_url.split('/')[:3]
>>> print "/".join(parts)
http://www.mysite.com

You have to append the last / yourself, though.
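Put together, the join-based version might look like the sketch below. It uses the same my_url as the question; the variable name home is taken from the asker's code.

```python
my_url = 'http://www.mysite.com/subdir/subdir2/index.html'
parts = my_url.split('/')[:3]   # ['http:', '', 'www.mysite.com']

# join places '/' between the parts; the trailing one is added by hand.
home = '/'.join(parts) + '/'

print(home)  # http://www.mysite.com/
```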

Licensed under: CC-BY-SA with attribution