Python Sorting Question

https://stackoverflow.com/questions/3325574

29-09-2020

Question

I need to sort the following list of tuples in Python:

ListOfTuples = [('10', '2010 Jan 1;', 'Rapoport AM', 'Role of antiepileptic drugs as preventive agents for migraine', '20030417'), ('21', '2009 Nov;', 'Johannessen SI', 'Antiepilepticdrugs in epilepsy and other disorders--a population-based study of prescriptions', '19679449'),...]

My goal is to order it by descending year (taken from element 1 of each tuple) and by ascending author (element 2 of each tuple):

sorted(result, key = lambda item: (item[1], item[2]))

But it doesn't work. How can I obtain sort stability?

Solution

def descyear_ascauth(atup):
  datestr = atup[1]
  authstr = atup[2]
  year = int(datestr.split(None, 1)[0])
  return -year, authstr

... sorted(result, key=descyear_ascauth) ...

Notes: you need to extract the year as an integer (not as a string) so that you can change its sign. Negating the year is the key trick that satisfies the "descending" part of the specification. Squeezing it all into a lambda would be possible, but there's no reason to sacrifice readability when a def works just as well (and far more readably).
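As a quick check, the key function above can be run against the two sample tuples from the question. This is only a sketch: the titles are abbreviated here, and the third row is made up for illustration (it is not in the original data) to show the author tie-break within the same year.

```python
def descyear_ascauth(atup):
    datestr = atup[1]
    authstr = atup[2]
    year = int(datestr.split(None, 1)[0])  # '2010 Jan 1;' -> 2010
    return -year, authstr                  # negative year => descending

result = [
    ('21', '2009 Nov;', 'Johannessen SI', 'Antiepilepticdrugs in epilepsy ...', '19679449'),
    ('10', '2010 Jan 1;', 'Rapoport AM', 'Role of antiepileptic drugs ...', '20030417'),
    # Hypothetical row, added only to demonstrate the tie-break on author.
    ('99', '2010 Mar;', 'Adams X', 'hypothetical title', '00000000'),
]

ordered = sorted(result, key=descyear_ascauth)
print([(t[1], t[2]) for t in ordered])
# [('2010 Mar;', 'Adams X'), ('2010 Jan 1;', 'Rapoport AM'), ('2009 Nov;', 'Johannessen SI')]
```

Both 2010 rows come first, ordered by author, followed by the 2009 row.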

OTHER TIPS

The easiest way is to sort on each key value separately. Start at the least significant key and work your way up to the most significant.

So in this case:

import operator
ListOfTuples.sort(key=operator.itemgetter(2))
ListOfTuples.sort(key=lambda x: x[1][:4], reverse=True)

This works because Python's sorting is always stable, even when you use the reverse flag: reverse doesn't just sort and then reverse the result (which would lose stability); it preserves the original order of equal elements.

Of course if you have a lot of key columns this can be inefficient as it does a full sort several times.

You don't have to convert the year to a number this way, as it's a genuine reverse sort, though you could if you wanted.
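To see the stability claim in action, here is a small sketch using made-up rows (the ids, dates and author names are illustrative only, not from the question). It runs the two-pass sort and then contrasts it with a plain "sort ascending, then reverse the list" approach, which flips the order of ties.

```python
import operator

# Hypothetical rows: (id, date string, author)
rows = [
    ('1', '2009 Nov;', 'Zed A'),
    ('2', '2010 Jan 1;', 'Baker B'),
    ('3', '2009 Mar;', 'Adams C'),
    ('4', '2010 Feb;', 'Adams C'),
]

# Pass 1: least significant key (author, ascending).
rows.sort(key=operator.itemgetter(2))
# Pass 2: most significant key (year, descending). The sort is stable,
# so rows with equal years keep the author order from pass 1.
rows.sort(key=lambda x: x[1][:4], reverse=True)

two_pass = [(r[1][:4], r[2]) for r in rows]
print(two_pass)
# [('2010', 'Adams C'), ('2010', 'Baker B'), ('2009', 'Adams C'), ('2009', 'Zed A')]

# Contrast: sort ascending, then reverse the whole list. Ties get flipped,
# so the author order within each year is no longer ascending.
flipped = sorted(rows, key=lambda x: x[1][:4])
flipped.reverse()
naive = [(r[1][:4], r[2]) for r in flipped]
print(naive)
# [('2010', 'Baker B'), ('2010', 'Adams C'), ('2009', 'Zed A'), ('2009', 'Adams C')]
```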

Here is an idiom that works for everything, even things you can't negate, such as strings:

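from functools import cmp_to_key

data = [('a', 'a'), ('a', 'b'), ('b', 'a')]

def sort_func(a, b):
    # compare tuples with the 2nd entry switched;
    # this inverts the sorting on the 2nd entry
    x, y = (a[0], b[1]), (b[0], a[1])
    return (x > y) - (x < y)  # Python 3 replacement for cmp(x, y)

# Python 3 dropped the cmp= argument, so wrap the function with cmp_to_key.
print(sorted(data))                             # [('a', 'a'), ('a', 'b'), ('b', 'a')]
print(sorted(data, key=cmp_to_key(sort_func)))  # [('a', 'b'), ('a', 'a'), ('b', 'a')]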
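If you would rather stay with a key function than a comparison function, one alternative (a different technique from the one above, sketched here under the assumption that a small helper class is acceptable) is to wrap the value in an object whose ordering is reversed. The class name Descending is hypothetical and not part of the standard library.

```python
from functools import total_ordering

@total_ordering
class Descending:
    """Hypothetical helper: wraps a value so that it compares in reverse."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        # Reversed on purpose: "less than" means the wrapped value is greater.
        return self.value > other.value

data = [('a', 'a'), ('a', 'b'), ('b', 'a')]

# First entry ascending, second entry descending, in a single pass.
ordered = sorted(data, key=lambda t: (t[0], Descending(t[1])))
print(ordered)  # [('a', 'b'), ('a', 'a'), ('b', 'a')]
```

This works for any value that supports comparison, including strings, at the cost of one extra object per key.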

Here's a rough solution that takes the month abbreviation and the day (if present) into account:

import time
import operator

def sortkey(seq):
    strdate, author = seq[1], seq[2]
    spdate = strdate[:-1].split()
    month = time.strptime(spdate[1], "%b").tm_mon
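    # In Python 3, map() returns an iterator, so convert to lists
    # before concatenating and before using the result as a sort key.
    date = [int(spdate[0]), month] + list(map(int, spdate[2:]))
    return list(map(operator.neg, date)), author

print(sorted(result, key=sortkey))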

"%b" is the locale's abbreviated month name; you can use a dictionary if you prefer not to deal with locales.
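The dictionary variant could look roughly like the sketch below. It assumes the dates always use English three-letter month abbreviations, as in the sample data; the MONTHS table and the shortened three-element rows are introduced here for illustration only.

```python
# Assumed lookup table: English three-letter month abbreviations.
MONTHS = {
    'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
    'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12,
}

def sortkey(seq):
    strdate, author = seq[1], seq[2]
    parts = strdate.rstrip(';').split()         # '2010 Jan 1;' -> ['2010', 'Jan', '1']
    date = [int(parts[0])]                      # year
    if len(parts) > 1:
        date.append(MONTHS[parts[1]])           # month number, no locale involved
    date.extend(int(p) for p in parts[2:])      # day, if present
    return [-n for n in date], author           # negate for descending dates

# Shortened versions of the question's sample rows (id, date, author).
rows = [
    ('21', '2009 Nov;', 'Johannessen SI'),
    ('10', '2010 Jan 1;', 'Rapoport AM'),
]
ordered = sorted(rows, key=sortkey)
print([r[2] for r in ordered])  # ['Rapoport AM', 'Johannessen SI']
```

An unknown month abbreviation raises a KeyError here, which is arguably preferable to silently mis-sorting the row.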

Here is the lambda version of Alex's answer. I think it looks more compact than Duncan's answer now, but obviously a lot of the readability of Alex's answer has been lost.

sorted(ListOfTuples, key=lambda atup: (-int(atup[1].split(None, 1)[0]), atup[2]))

Readability and efficiency should usually be preferred to compactness.

Licensed under: CC-BY-SA with attribution
