Question

I'm looking for a data structure that preserves the order of its elements (which may change over the life of the data structure, as the client may move elements around).

It should allow fast search, insertion before/after a given element, removal of a given element, lookup of the first and last elements, and bidirectional iteration starting at a given element.

What would be a good implementation?

Here's my first attempt:

A class deriving from both collections.abc.Iterable and collections.abc.MutableSet that contains a linked list and a dictionary. The dictionary's keys are elements, values are nodes in the linked list. The dictionary would handle search for a node given an element. Once an element is found, the linked list would handle insertion before/after, deletion, and iteration. The dictionary would be updated by adding or deleting the relevant key/value pair. Clearly, with this approach the elements must be hashable and unique (or else, we'll need another layer of indirection where each element is represented by an auto-assigned numeric identifier, and only those identifiers are stored as keys).

It seems to me that this would be strictly better in asymptotic complexity than either list or collections.deque, but I may be wrong. [EDIT: Wrong, as pointed out by @roliu. Unlike list or deque, I would not be able to find an element by its numeric index in O(1). As of now, it is O(N) but I am sure there's some way to make it O(log N) if necessary.]

Was it helpful?

Solution

A slightly modified version of Raymond Hettinger's OrderedSet recipe seems to satisfy all my requirements. I only added support for position-based access and read/write.

# changes vs. original recipe at http://code.activestate.com/recipes/576696/:
# added a position parameter to add
# changed how pop works, and added popleft
# added find, get_start, get_end, next_pos, prev_pos, __getitem__, __setitem__

class OrderedSetPlus(collections.MutableSet, collections.Iterable):
    '''
    >>> oset = OrderedSetPlus([3, 3, 3, 2, 1, 8, 8])
    >>> oset.add(13)
    >>> p = oset.find(2)
    >>> oset.add(15, p)
    >>> oset
    OrderedSetPlus([3, 15, 2, 1, 8, 13])
    >>> p = oset.next_pos(p)
    >>> oset[p]
    1
    >>> oset.add(7, p)
    >>> oset
    OrderedSetPlus([3, 15, 2, 7, 1, 8, 13])
    >>> oset[p] = 20
    >>> oset
    OrderedSetPlus([3, 15, 2, 7, 20, 8, 13])
    '''

    class DuplicateElement(Exception):
        pass

    def __init__(self, iterable=None):
        self.end = end = [] 
        end += [None, end, end]         # sentinel node for doubly linked list
        self.map = {}                   # key --> [key, prev, next]
        if iterable is not None:
            self |= iterable

    def __len__(self):
        return len(self.map)

    def __contains__(self, key):
        return key in self.map

    def find(self, key):
        return self.map.get(key, None)

    # inserts element before the specified position
    # if pos is None, inserts at the end
    # position can only be obtained by calling instance methods
    def add(self, key, pos = None):
        if pos is None:
            pos = self.end
        if key not in self.map:
            curr = pos[PREV]
            curr[NEXT] = pos[PREV] = self.map[key] = [key, curr, pos]

    def discard(self, key):
        if key in self.map:        
            key, prev, next = self.map.pop(key)
            prev[NEXT] = next
            next[PREV] = prev

    def __iter__(self):
        end = self.end
        curr = end[NEXT]
        while curr is not end:
            yield curr[KEY]
            curr = curr[NEXT]

    def get_end(self):
        return self.end[PREV]

    def get_start(self):
        return self.end[NEXT]

    def next_pos(self, pos):
        pos = pos[NEXT]
        return None if pos is self.end else pos

    def prev_pos(self, pos):
        pos = pos[PREV]
        return None if pos is self.end else pos

    def __getitem__(self, pos):
        return pos[KEY]

    def __setitem__(self, pos, key):
        if key in self.map:
            raise DuplicateElement
        pos[KEY] = key

    def __reversed__(self):
        end = self.end
        curr = end[PREV]
        while curr is not end:
            yield curr[KEY]
            curr = curr[PREV]

    def popleft(self):
        return self.pop(pos = self.get_start())


    def pop(self, pos=None):
        if not self:
            raise IndexError()
        if pos is None:
            pos = self.get_end()
        key = self[pos]
        #key = next(reversed(self)) if last else next(iter(self))
        self.discard(key)
        return key

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, list(self))

    def __eq__(self, other):
        if isinstance(other, OrderedSet):
            return len(self) == len(other) and list(self) == list(other)
        return set(self) == set(other)

OTHER TIPS

Using doubly-linked lists in Python is a bit uncommon. However, your own proposed solution of a doubly-linked list and a dictionary has the correct complexity: all the operations you ask for are O(1).

I don't think there is in the standard library a more direct implementation. Trees might be nice theoretically, but also come with drawbacks, like O(log n) or (precisely) their general absence from the standard library.

I do know this isn't exactly a direct answer to your question (because this is not a python-implemented solution), but if your data structure is going to be rather large I'd consider a Redis db. You can use redis-pi to talk to it from Python.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top