Question

I'm using Python 2.7 with plistlib to import a .plist in a nested dict/array form, then look for a particular key and delete it wherever I see it.

When it comes to the actual files we're working with in the office, I already know where to find the values -- but I wrote my script with the idea that I didn't, in the hopes that I wouldn't have to make changes in the future if the file structure changes or we need to do likewise to other similar files.

Unfortunately I seem to be trying to modify a dict while iterating over it, but I'm not certain how that's actually happening, since I'm using iteritems() and enumerate() to get generators and work with those instead of the object I'm actually working with.

def scrub(someobject, badvalue='_default'): ##_default isn't the real variable
    """Walks the structure of a plistlib-created dict and finds all the badvalues and viciously eliminates them.

Can optionally be passed a different key to search for."""
    count = 0

    try:
        iterator = someobject.iteritems()
    except AttributeError:
        iterator = enumerate(someobject)

    for key, value in iterator:
        try:
            scrub(value)
        except:
            pass
        if key == badvalue:
            del someobject[key]
            count += 1

    return "Removed {count} instances of {badvalue} from {file}.".format(count=count, badvalue=badvalue, file=file)

Unfortunately, when I run this on my test .plist file, I get the following error:

Traceback (most recent call last):
  File "formscrub.py", line 45, in <module>
    scrub(loadedplist)
  File "formscrub.py", line 19, in scrub
    for key, value in iterator:
RuntimeError: dictionary changed size during iteration

So the problem might be the recursive call to itself, but even then shouldn't it just be removing from the original object? I'm not sure how to avoid recursion (or if that's the right strategy) but since it's a .plist, I do need to be able to identify when things are dicts or lists and iterate over them in search of either (a) more dicts to search, or (b) the actual key-value pair in the imported .plist that I need to delete.

Ultimately, this is a partial non-issue, in that the files I'll be working with on a regular basis have a known structure. However, I was really hoping to create something that doesn't care about the nesting or order of the object it's working with, as long as it's a Python dict with arrays in it.

Was it helpful?

Solution

Adding or removing items to/from a sequence while iterating over this sequence is tricky at best, and just illegal (as you just discovered) with dicts. The right way to remove entries from a dict while iterating over it is to iterate on a snapshot of the keys. In Python 2.x, dict.keys() provides such a snapshot. So for dicts the solution is:

for key in mydict.keys():
    if key == bad_value:
        del mydict[key]

As mentionned by cpizza in a comment, for python3, you'll need to explicitely create the snapshot using list():

for key in list(mydict.keys()):
    if key == bad_value:
        del mydict[key]

For lists, trying to iterate on a snapshot of the indexes (ie for i in len(thelist):) would result in an IndexError as soon as anything is removed (obviously since at least the last index will no more exist), and even if not you might skip one or more items (since the removal of an item makes the sequence of indexes out of sync with the list itself). enumerate is safe against IndexError (since the iteration will stop by itself when there's no more 'next' item in the list, but you'll still skip items:

>>> mylist = list("aabbccddeeffgghhii")
>>> for x, v  in enumerate(mylist):
...     if v in "bdfh":
...         del mylist[x]
>>> print mylist
['a', 'a', 'b', 'c', 'c', 'd', 'e', 'e', 'f', 'g', 'g', 'h', 'i', 'i']

Not a quite a success, as you can see.

The known solution here is to iterate on reversed indexes, ie:

>>> mylist = list("aabbccddeeffgghhii")
>>> for x in reversed(range(len(mylist))):
...     if mylist[x] in "bdfh":
...         del mylist[x]
>>> print mylist
['a', 'a', 'c', 'c', 'e', 'e', 'g', 'g', 'i', 'i']

This works with reversed enumeration too, but we dont really care.

So to summarize: you need two different code path for dicts and lists - and you also need to take care of "not container" values (values which are neither lists nor dicts), something you do not take care of in your current code.

def scrub(obj, bad_key="_this_is_bad"):
    if isinstance(obj, dict):
        # the call to `list` is useless for py2 but makes
        # the code py2/py3 compatible
        for key in list(obj.keys()):
            if key == bad_key:
                del obj[key]
            else:
                scrub(obj[key], bad_key)
    elif isinstance(obj, list):
        for i in reversed(range(len(obj))):
            if obj[i] == bad_key:
                del obj[i]
            else:
                scrub(obj[i], bad_key)

    else:
        # neither a dict nor a list, do nothing
        pass

As a side note: never write a bare except clause. Never ever. This should be illegal syntax, really.

OTHER TIPS

Here a generalized version of the one of @bruno desthuilliers, with a callable to test against the keys.

def clean_dict(obj, func):
    """
    This method scrolls the entire 'obj' to delete every key for which the 'callable' returns
    True

    :param obj: a dictionary or a list of dictionaries to clean
    :param func: a callable that takes a key in argument and return True for each key to delete
    """
    if isinstance(obj, dict):
        # the call to `list` is useless for py2 but makes
        # the code py2/py3 compatible
        for key in list(obj.keys()):
            if func(key):
                del obj[key]
            else:
                clean_dict(obj[key], func)
    elif isinstance(obj, list):
        for i in reversed(range(len(obj))):
            if func(obj[i]):
                del obj[i]
            else:
                clean_dict(obj[i], func)

    else:
        # neither a dict nor a list, do nothing
        pass

And an example with a regex callable :

func = lambda key: re.match(r"^<div>", key)

clean_dict(obj, func)
def walk(d, badvalue, answer=None, sofar=None):
    if sofar is None:
        sofar = []
    if answer is None:
        answer = []
    for k,v in d.iteritems():
        if k == badvalue:
            answer.append(sofar + [k])
        if isinstance(v, dict):
            walk(v, badvalue, answer, sofar+[k])
    return answer

def delKeys(d, badvalue):
    for path in walk(d, badvalue):
        dd = d
        while len(path) > 1:
            dd = dd[path[0]]
            path.pop(0)
        dd.pop(path[0])

Output

In [30]: d = {1:{2:3}, 2:{3:4}, 5:{6:{2:3}, 7:{1:2, 2:3}}, 3:4}

In [31]: delKeys(d, 2)

In [32]: d
Out[32]: {1: {}, 3: 4, 5: {6: {}, 7: {1: 2}}}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top