Python recursive setattr()-like function for working with nested dictionaries

Question 1

I have separated this out into two steps. In the first step, the query string is broken down into a series of instructions. This way the problem is decoupled, we can view the instructions before running them, and there is no need for recursive calls.

def build_instructions(obj, q):
    """
    Breaks down a query string into a series of actionable instructions.

    Each instruction is a (_type, arg) tuple.
    arg -- The key used for the __getitem__ or __setitem__ call on
           the current object.
    _type -- Used to determine the data type for the value of
             obj.__getitem__(arg)

    If a key/index is missing, _type is used to initialize an empty value.
    In this way _type provides the ability to
    """
    arg = []
    _type = None
    instructions = []
    for i, ch in enumerate(q):
        if ch == "[":
            # Begin list query
            if _type is not None:
                arg = "".join(arg)
                if _type == list and arg.isalpha():
                    _type = dict
                instructions.append((_type, arg))
                _type, arg = None, []
            _type = list
        elif ch == ".":
            # Begin dict query
            if _type is not None:
                arg = "".join(arg)
                if _type == list and arg.isalpha():
                    _type = dict
                instructions.append((_type, arg))
                _type, arg = None, []

            _type = dict
        elif ch.isalnum():
            if i == 0:
                # Query begins with alphanum, assume dict access
                _type = type(obj)

            # Fill out args
            arg.append(ch)
        else:
            TypeError("Unrecognized character: {}".format(ch))

    if _type is not None:
        # Finish up last query
        instructions.append((_type, "".join(arg)))

    return instructions

For your example

>>> x = {"a": "stuff"}
>>> print(build_instructions(x, "f[0].a"))
[(<type 'dict'>, 'f'), (<type 'list'>, '0'), (<type 'dict'>, 'a')]

The expected return value is simply the _type (first item) of the next tuple in the instructions. This is very important because it allows us to correctly initialize/reconstruct missing keys.

This means that our first instruction operates on a dict, either sets or gets the key 'f', and is expected to return a list. Similarly, our second instruction operates on a list, either sets or gets the index 0 and is expected to return a dict.

Now let's create our _setattr function. This gets the proper instructions and goes through them, creating key-value pairs as necessary. Finally, it also sets the val we give it.

def _setattr(obj, query, val):
    """
    This is a special setattr function that will take in a string query,
    interpret it, add the appropriate data structure to obj, and set val.

    We only define two actions that are available in our query string:
    .x -- dict.__setitem__(x, ...)
    [x] -- list.__setitem__(x, ...) OR dict.__setitem__(x, ...)
           the calling context determines how this is interpreted.
    """
    instructions = build_instructions(obj, query)
    for i, (_, arg) in enumerate(instructions[:-1]):
        _type = instructions[i + 1][0]
        obj = _set(obj, _type, arg)

    _type, arg = instructions[-1]
    _set(obj, _type, arg, val)

def _set(obj, _type, arg, val=None):
    """
    Helper function for calling obj.__setitem__(arg, val or _type()).
    """
    if val is not None:
        # Time to set our value
        _type = type(val)

    if isinstance(obj, dict):
        if arg not in obj:
            # If key isn't in obj, initialize it with _type()
            # or set it with val
            obj[arg] = (_type() if val is None else val)
        obj = obj[arg]
    elif isinstance(obj, list):
        n = len(obj)
        arg = int(arg)
        if n > arg:
            obj[arg] = (_type() if val is None else val)
        else:
            # Need to amplify our list, initialize empty values with _type()
            obj.extend([_type() for x in range(arg - n + 1)])
        obj = obj[arg]
    return obj

And just because we can, here's a _getattr function.

def _getattr(obj, query):
    """
    Very similar to _setattr. Instead of setting attributes they will be
    returned. As expected, an error will be raised if a __getitem__ call
    fails.
    """
    instructions = build_instructions(obj, query)
    for i, (_, arg) in enumerate(instructions[:-1]):
        _type = instructions[i + 1][0]
        obj = _get(obj, _type, arg)

    _type, arg = instructions[-1]
    return _get(obj, _type, arg)


def _get(obj, _type, arg):
    """
    Helper function for calling obj.__getitem__(arg).
    """
    if isinstance(obj, dict):
        obj = obj[arg]
    elif isinstance(obj, list):
        arg = int(arg)
        obj = obj[arg]
    return obj

In action:

>>> x = {"a": "stuff"}
>>> _setattr(x, "f[0].a", "test")
>>> print x
{'a': 'stuff', 'f': [{'a': 'test'}]}
>>> print _getattr(x, "f[0].a")
"test"

>>> x = ["one", "two"]
>>> _setattr(x, "3[0].a", "test")
>>> print x
['one', 'two', [], [{'a': 'test'}]]
>>> print _getattr(x, "3[0].a")
"test"

Now for some cool stuff. Unlike python, our _setattr function can set unhashable dict keys.

x = []
_setattr(x, "1.4", "asdf")
print x
[{}, {'4': 'asdf'}]  # A list, which isn't hashable

>>> y = {"a": "stuff"}
>>> _setattr(y, "f[1.4]", "test")  # We're indexing f with 1.4, which is a list!
>>> print y
{'a': 'stuff', 'f': [{}, {'4': 'test'}]}
>>> print _getattr(y, "f[1.4]")  # Works for _getattr too
"test"

We aren't really using unhashable dict keys, but it looks like we are in our query language so who cares, right!

Finally, you can run multiple _setattr calls on the same object, just give it a try yourself.

Question 2

>>> class D(dict):
...     def __missing__(self, k):
...         ret = self[k] = D()
...         return ret
... 
>>> x=D()
>>> x['f'][0]['a'] = 'whatever'
>>> x
{'f': {0: {'a': 'whatever'}}}

Question 3

You can hack something together by fixing two problems:

List that automatically grows when accessed out of bounds (PaddedList)
A way to delay the decision of what to create (list of dict) until you accessed it by the first time (DictOrList)

So the code will look like this:

import collections

class PaddedList(list):
    """ List that grows automatically up to the max index ever passed"""
    def __init__(self, padding):
        self.padding = padding

    def __getitem__(self, key):
        if  isinstance(key, int) and len(self) <= key:
            self.extend(self.padding() for i in xrange(key + 1 - len(self)))
        return super(PaddedList, self).__getitem__(key)

class DictOrList(object):
    """ Object proxy that delays the decision of being a List or Dict """
    def __init__(self, parent):
        self.parent = parent

    def __getitem__(self, key):
        # Type of the structure depends on the type of the key
        if isinstance(key, int):
            obj = PaddedList(MyDict)
        else:
            obj = MyDict()

        # Update parent references with the selected object
        parent_seq = (self.parent if isinstance(self.parent, dict)
                      else xrange(len(self.parent)))
        for i in parent_seq:
            if self == parent_seq[i]:
                parent_seq[i] = obj
                break

        return obj[key]


class MyDict(collections.defaultdict):
    def __missing__(self, key):
        ret = self[key] = DictOrList(self)
        return ret

def pprint_mydict(d):
    """ Helper to print MyDict as dicts """
    print d.__str__().replace('defaultdict(None, {', '{').replace('})', '}')

x = MyDict()
x['f'][0]['a'] = 'whatever'

y = MyDict()
y['f'][10]['a'] = 'whatever'

pprint_mydict(x)
pprint_mydict(y)

And the output of x and y will be:

{'f': [{'a': 'whatever'}]}
{'f': [{}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {'a': 'whatever'}]}

The trick consist on creating a defaultdict of objects that can be either a dict or a list depending how you access it. So when you have the assigment x['f'][10]['a'] = 'whatever' it will work the following way:

Get X['f']. It wont exist so it will return a DictOrList object for the index 'f'
Get X['f'][10]. DictOrList.getitem will be called with an integer index. The DictOrList object will replace itself in the parent collection by a PaddedList
Access the 11th element in the PaddedList will grow it by 11 elements and will return the MyDict element in that position
Assign "whatever" to x['f'][10]['a']

Both PaddedList and DictOrList are bit hacky, but after all the assignments there is no more magic, you have an structure of dicts and lists.

Question 4

It is possible to synthesize recursively setting items/attributes by overriding __getitem__ to return a return a proxy that can set a value in the original function.

I happen to be working on a library that does a few things similar to this, so I was working on a class that can dynamically assign its own subclasses at instantiation. It makes working with this sort of thing easier, but if that kind of hacking makes you squeamish, you can get similar behavior by creating a ProxyObject similar to the one I create and by creating the individual classes used by the ProxyObject dynamically in the a function. Something like

class ProxyObject(object):
    ... #see below

def instanciateProxyObjcet(val):
   class ProxyClassForVal(ProxyObject,val.__class__):
       pass
   return ProxyClassForVal(val)

You can use dictionary like I've used in FlexibleObject below would make that implementation significantly more efficient if this is the way you implement it. The code I will providing uses the FlexibleObject though. Right now it only supports classes that, like almost all of Python's builtin classes are capable of being generated by taking an instance of themselves as their sole argument to their __init__/__new__. In the next week or two, I'll add support for anything pickleable, and link to a github repository that contains it. Here's the code:

class FlexibleObject(object):
    """ A FlexibleObject is a baseclass for allowing type to be declared
        at instantiation rather than in the declaration of the class.

        Usage:
        class DoubleAppender(FlexibleObject):
            def append(self,x):
                super(self.__class__,self).append(x)
                super(self.__class__,self).append(x)

        instance1 = DoubleAppender(list)
        instance2 = DoubleAppender(bytearray)
    """
    classes = {}
    def __new__(cls,supercls,*args,**kws):
        if isinstance(supercls,type):
            supercls = (supercls,)
        else:
            supercls = tuple(supercls)
        if (cls,supercls) in FlexibleObject.classes:
            return FlexibleObject.classes[(cls,supercls)](*args,**kws)
        superclsnames = tuple([c.__name__ for c in supercls])
        name = '%s%s' % (cls.__name__,superclsnames)
        d = dict(cls.__dict__)
        d['__class__'] = cls
        if cls == FlexibleObject:
            d.pop('__new__')
        try:
            d.pop('__weakref__')
        except:
            pass
        d['__dict__'] = {}
        newcls = type(name,supercls,d)
        FlexibleObject.classes[(cls,supercls)] = newcls
        return newcls(*args,**kws)

Then to use this to use this to synthesize looking up attributes and items of a dictionary-like object you can do something like this:

class ProxyObject(FlexibleObject):
    @classmethod
    def new(cls,obj,quickrecdict,path,attribute_marker):
        self = ProxyObject(obj.__class__,obj)
        self.__dict__['reference'] = quickrecdict
        self.__dict__['path'] = path
        self.__dict__['attr_mark'] = attribute_marker
        return self
    def __getitem__(self,item):
        path = self.__dict__['path'] + [item]
        ref = self.__dict__['reference']
        return ref[tuple(path)]
    def __setitem__(self,item,val):
        path = self.__dict__['path'] + [item]
        ref = self.__dict__['reference']
        ref.dict[tuple(path)] = ProxyObject.new(val,ref,
                path,self.__dict__['attr_mark'])
    def __getattribute__(self,attr):
        if attr == '__dict__':
            return object.__getattribute__(self,'__dict__')
        path = self.__dict__['path'] + [self.__dict__['attr_mark'],attr]
        ref = self.__dict__['reference']
        return ref[tuple(path)]
    def __setattr__(self,attr,val):
        path = self.__dict__['path'] + [self.__dict__['attr_mark'],attr]
        ref = self.__dict__['reference']
        ref.dict[tuple(path)] = ProxyObject.new(val,ref,
                path,self.__dict__['attr_mark'])

class UniqueValue(object):
    pass

class QuickRecursiveDict(object):
    def __init__(self,dictionary={}):
        self.dict = dictionary
        self.internal_id = UniqueValue()
        self.attr_marker = UniqueValue()
    def __getitem__(self,item):
        if item in self.dict:
            val = self.dict[item]
            try:
                if val.__dict__['path'][0] == self.internal_id:
                    return val
                else:
                    raise TypeError
            except:
                return ProxyObject.new(val,self,[self.internal_id,item],
                        self.attr_marker)
        try:
            if item[0] == self.internal_id:
                return ProxyObject.new(KeyError(),self,list(item),
                        self.attr_marker)
        except TypeError:
            pass #Item isn't iterable
        return ProxyObject.new(KeyError(),self,[self.internal_id,item],
                    self.attr_marker)
    def __setitem__(self,item,val):
        self.dict[item] = val

The particulars of the implementation will vary depending on what you want. It's obviously significantly easier to just override __getitem__ in the proxy than it is to override both __getitem__ and __getattribute__ or __getattr__. The syntax you are using in setbydot makes it look like you would be happiest with some solution that overrides a mixture of the two.

If you are just using the dictionary to compare values, using =,<=,>= etc. Overriding __getattribute__ works really nicely. If you are wanting to do something more sophisticated, you will probably be better off overriding __getattr__ and doing some checks in __setattr__ to determine whether you want to be synthesizing setting the attribute by setting a value in the dictionary or whether you want to be actually setting the attribute on the item you've obtained. Or you might want to handle it so that if your object has an attribute, __getattribute__ returns a proxy to that attribute and __setattr__ always just sets the attribute in the object (in which case, you can completely omit it). All of these things depend on exactly what you are trying to use the dictionary for.

You also may want to create __iter__ and the like. It takes a little bit of effort to make them, but the details should follow from the implementation of __getitem__ and __setitem__.

Finally, I'm going to briefly summarize the behavior of the QuickRecursiveDict in case it's not immediately clear from inspection. The try/excepts are just shorthand for checking to see whether the ifs can be performed. The one major defect of synthesizing the recursive setting rather than find a way to do it is that you can no longer be raising KeyErrors when you try to access a key that hasn't been set. However, you can come pretty close by returning a subclass of KeyError which is what I do in the example. I haven't tested it so I won't add it to the code, but you may want to pass in some human-readable representation of the key to KeyError.

But aside from all that it works rather nicely.

>>> qrd = QuickRecursiveDict
>>> qrd[0][13] # returns an instance of a subclass of KeyError
>>> qrd[0][13] = 9
>>> qrd[0][13] # 9
>>> qrd[0][13]['forever'] = 'young'
>>> qrd[0][13] # 9
>>> qrd[0][13]['forever'] # 'young'
>>> qrd[0] # returns an instance of a subclass of KeyError
>>> qrd[0] = 0
>>> qrd[0] # 0
>>> qrd[0][13]['forever'] # 'young'

One more caveat, the things being returned is not quite what it looks like. It's a proxy to what it looks like. If you want the int 9, you need int(qrd[0][13]) not qrd[0][13]. For ints this doesn't matter much since, +,-,= and all that bypass __getattribute__ but for lists, you would lose attributes like append if you didn't recast them. (You'd keep len and other builtin methods, just not attributes of list. You lose __len__.)

So that's it. The code's a little bit convoluted, so let me know if you have any questions. I probably can't answer them until tonight unless the answer's really brief. I wish I saw this question sooner, it's a really cool question, and I'll try to update a cleaner solution soon. I had fun trying to code a solution into the wee hours of last night. :)