Question

I'd like to comb off the n largest extremes from a time series. heapq works perfectly for the n largest:

import heapq as hp

def nlargest(series, n):
    count = 0
    heap = []
    for e in series:
        if count < n:
            count += 1
            hp.heappush(heap, e)
        else:
            # keeps heap size fixed: push e, then pop the smallest item
            hp.heappushpop(heap, e)
    # note: heap[0] is the smallest of the n items kept
    return heap

But what about the n smallest? Note that I want a subset of the original series, so heapify and reversing the order won't work. What I'd like is essentially to overload the comparison operator from gt to lt. I'm not so familiar with overloading in Python.

A less attractive option (assuming numerical values) would be to negate each item before inserting and then negate the entire returned heap (returning a list or re-heaping the negated list), but this seems kludgy, and it no longer works for non-numericals that do have gt and lt. Any elegant solutions?


Solution

You can easily 'create' an inverted heap by multiplying the priorities of your items by -1.

So your nsmallest simply needs to be told how to 'invert' the priorities, decorating each value as needed:

import heapq as hp

def nsmallest(series, n, invert=lambda x: -1 * x):
    count = 0
    heap = []
    for e in series:
        if count < n:
            count += 1
            hp.heappush(heap, (invert(e), e))
        else:
            # keeps heap size fixed: push e, then pop the largest original value
            hp.heappushpop(heap, (invert(e), e))
    # note: heap[0][1] is largest, remove inverted priorities
    return [h[1] for h in heap]

Note that we use a (invertedpriority, value) tuple to keep the heap inverted.

For non-numericals, you simply have to provide an inversion function that turns the priorities upside down. It only needs to produce a simple sort key, not anything readable:

alphanumeric_invert = lambda x: [(ord(c) * -1) for c in x] 
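One caveat with negating character codes: it does not invert the order when one string is a prefix of another ("a" maps to [-97] and "ab" to [-97, -98], and the shorter list still compares as smaller). A possible alternative, closer to the "overload the comparison operator" idea in the question, is a small wrapper whose __lt__ is flipped. The following is only a sketch; Reversed and nsmallest_wrapped are hypothetical names chosen for illustration, not part of heapq.

```python
import heapq
from functools import total_ordering


@total_ordering
class Reversed:
    """Hypothetical wrapper that flips the ordering of the wrapped value."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        # Reversed(a) < Reversed(b) exactly when a > b
        return self.value > other.value


def nsmallest_wrapped(series, n):
    """Keep the n smallest items using a min-heap of Reversed wrappers."""
    heap = []
    for e in series:
        if len(heap) < n:
            heapq.heappush(heap, Reversed(e))
        else:
            # push e, then pop the wrapper that sorts lowest,
            # which holds the largest original value
            heapq.heappushpop(heap, Reversed(e))
    # unwrap; the result is in heap order, not sorted order
    return [r.value for r in heap]


words = ["pear", "a", "fig", "ab", "kiwi"]
smallest_words = sorted(nsmallest_wrapped(words, 2))
```

Because the wrapper only relies on the items' own comparison operators, the same function should work for any values that define gt and lt, numeric or not.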

However, rather than writing your own, you should use the heapq.nsmallest() function, which uses an optimised max-heap implementation (there is an internal _heappop_max() function it uses) and also adds a tie-breaker counting value to keep the sort stable. There is a matching heapq.nlargest() function as well.
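As a rough illustration of the library route, here is how the two functions might be applied to a time series. The (timestamp, value) layout and the sample data are assumptions made up for this example; the key argument selects the value to compare while the original samples are returned unchanged.

```python
import heapq

# Hypothetical time series of (timestamp, value) samples
series = [(0, 3.2), (1, -1.5), (2, 7.8), (3, 0.4), (4, -4.1), (5, 5.0)]

# Compare on the value only; the returned items are the original tuples
lowest = heapq.nsmallest(2, series, key=lambda sample: sample[1])
highest = heapq.nlargest(2, series, key=lambda sample: sample[1])
```

The results are subsets of the original series, so the timestamps of the extremes stay attached to their values.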

OTHER TIPS

Use heapq.nsmallest from the Python standard library:

heapq.nsmallest(n, iterable[, key])

Return a list with the n smallest elements from the dataset defined by iterable. Equivalent to: sorted(iterable, key=key)[:n]
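A quick way to convince yourself of that equivalence, and to see that non-numerical items need no inversion trick, is a check like the one below. The word list is made-up sample data.

```python
import heapq

# Made-up sample data: strings compared by length via the key function
words = ["pear", "a", "fig", "ab", "kiwi"]

via_heapq = heapq.nsmallest(3, words, key=len)
via_sorted = sorted(words, key=len)[:3]
```

Both expressions give the same list; heapq.nsmallest is generally the better fit when n is small relative to the size of the input.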

Licensed under: CC-BY-SA with attribution