Question

I am walking through a directory tree. I found a snippet here that works well, but I cannot figure out how or where its variable dir gets updated.

What I am trying to do is leave out empty folders.

import os
from functools import reduce  # reduce is not a builtin in Python 3

def get_directory_structure(rootdir):
    """
    Creates a nested dictionary that represents the folder structure of rootdir
    """
    dir = {}
    rootdir = rootdir.rstrip(os.sep)
    start = rootdir.rfind(os.sep) + 1
    for path, dirs, files in os.walk(rootdir):
        folders = path[start:].split(os.sep)
        subdir = dict.fromkeys(files)
        parent = reduce(dict.get, folders[:-1], dir)
        parent[folders[-1]] = subdir
    return dir

dir seems to be updated together with parent on this line:

        parent[folders[-1]] = subdir

How come?

dir is mutable and is passed to reduce, but it is not modified there; it only changes on the following line.

Any idea?

I want to be able to leave out the empty folders, and would rather find an elegant way to do it. Should I give up and walk through the dict in a second pass?

[Edit after solved] As Hans and Adrin pointed out, reduce returns a reference to dir itself (or to a dict nested inside it) rather than a copy, so parent and dir share the same objects, and any update to parent shows up in dir.

I ended up keeping the same code but renamed the vars for clarity:

dir -> token_dict
folders -> path_as_list
subdir -> files_in_dir
parent -> full_dir (and I end up returning full_dir)

More typing, but next time I look, I'll get to it straight away.


Solution 2

You're passing dir to the reduce function. That means you're passing a reference to the dict object itself, not a copy, so whatever comes back from reduce can still be used to change it.

Look at the implementation of the reduce function here, and note the line:

accum_value = function(accum_value, x)

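At this point, accum_value refers to the same object as initializer, which is your dir, and it is passed to the function, which in your case is dict.get. dict.get returns the nested dict object stored under the key, not a copy of it, so the final result is either dir itself or a dict that lives inside dir.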
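To make the aliasing concrete, here is a minimal sketch of a reduce-like loop. The name my_reduce is hypothetical and this is a simplified stand-in that always requires an initializer, not the actual library source. It shows that with an empty sequence the initializer object itself comes back, so writing to the result writes to the original dict.

```python
def my_reduce(function, iterable, initializer):
    # Simplified stand-in for functools.reduce (hypothetical helper).
    accum_value = initializer            # same object, no copy is made
    for x in iterable:
        accum_value = function(accum_value, x)
    return accum_value

tree = {}
parent = my_reduce(dict.get, [], tree)   # empty list: nothing to descend into
print(parent is tree)                    # True

parent["root"] = {"a.txt": None}         # assign through parent...
print(tree)                              # ...and tree shows it: {'root': {'a.txt': None}}
```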

OTHER TIPS

A little explanation of reduce with dictionaries, for anybody who is not very familiar with reduce:

Before we come to the snippet, let's look at the reduce function itself.

Reduce will apply a function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value.

Here is the syntax:

reduce(function, sequence[, initial]) -> value

If initial is present, it is placed before the items of the sequence in the calculation, and serves as a default when the sequence is empty.

Without initial:

>>> reduce(lambda x, y: x+y, [1, 2, 3, 4, 5])
15
>>>
This is similar to ((((1+2)+3)+4)+5).

With initial:

>>> reduce(lambda x, y: x+y, [], 1) 
1
>>>

That covers lists. Now for dictionaries:

First, let's check what the dict.get() method does:

>>> d = {'a': {'b': {'c': 'files'}}}
>>> dict.get(d,'a')
{'b': {'c': 'files'}}
>>>

So, when you pass the dict.get method to reduce, this is what happens:

>>> d = {'a': {'b': {'c': 'files'}}}
>>> reduce(dict.get, ['a','b','c'], d)
'files'
>>>

Which is equivalent to:

>>> dict.get(dict.get(dict.get(d,'a'),'b'),'c')
'files'
>>>

And when the list is empty, reduce returns the initial value unchanged, which here is an empty dict:

>>> reduce(dict.get, [], {})
{}
>>>
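The point that matters for the question is that these calls return the stored objects themselves, not copies. A small sketch under that reading, using Python 3's functools.reduce (the variable names inner and d are only for illustration):

```python
from functools import reduce

d = {'a': {'b': {}}}

# Descend through the keys; the result is the very dict stored at d['a']['b'].
inner = reduce(dict.get, ['a', 'b'], d)
print(inner is d['a']['b'])          # True

# Writing through inner therefore changes d.
inner['c'] = 'files'
print(d)                             # {'a': {'b': {'c': 'files'}}}

# With an empty key list, the initial value itself is returned.
print(reduce(dict.get, [], d) is d)  # True
```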

Let's come back to your snippet:

dir in your snippet is not the builtin dir() function. It is just a name bound to an empty dictionary, which shadows the builtin inside the function.

parent = reduce(dict.get, folders[:-1], dir)

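So, in the above line, folders[:-1] is the list of directory names leading to the parent of the current folder, and dir is the dictionary being built (it is empty only on the first iteration). reduce follows those names down through dir and returns the dict stored for the parent folder, or dir itself for the root, so parent[folders[-1]] = subdir stores the new entry inside dir.

Please let me know if it helps in any way.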
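As for leaving out empty folders, the second pass mentioned in the question is one straightforward option. A minimal sketch, assuming the tree has the shape the snippet produces (files are keys with None values from dict.fromkeys, folders are nested dicts); prune_empty is a hypothetical helper name and the sample tree is made up for illustration:

```python
def prune_empty(tree):
    """Return a copy of tree without folders that hold no files at any depth."""
    pruned = {}
    for name, value in tree.items():
        if isinstance(value, dict):       # a folder
            child = prune_empty(value)
            if child:                     # keep it only if something survived
                pruned[name] = child
        else:                             # a file (value is None)
            pruned[name] = value
    return pruned

sample = {'root': {'a.txt': None,
                   'empty': {},
                   'sub': {'deep': {}},
                   'docs': {'b.txt': None}}}
pruned = prune_empty(sample)
print(pruned)   # {'root': {'a.txt': None, 'docs': {'b.txt': None}}}
```

This treats a folder that contains only empty folders as empty too; drop the recursion on the condition if that is not what you want.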

Licensed under: CC-BY-SA with attribution