Sort Set of Numbers in the form (XX-YY) in Python
21-12-2019
Question
I have a Python list whose values follow the form CCXX-YY, where CC is two alphabetical characters that are the same for all values, and XX and YY are integers. I want to sort it numerically by XX and then by YY,
e.g.
KA13-1 KA13-2 KA14-2 KA14-1 KA11-12 KA13-11
into
KA11-12 KA13-1 KA13-2 KA13-11 KA14-1 KA14-2
and not
KA11-12 KA13-1 KA13-11 KA13-2 KA14-1 KA14-2
Things I have tried:
natsort (issue with thinking the - is a negative sign)
Late edit: natsort works if you use the parameter alg=ns.UNSIGNED, as noted below.
naturalsort (doesn't work with python 3?)
sort using a key value (I'm sure this can be done, but I'm a bit new to python and am failing at it)
Things I'm currently trying:
Removing all the extra fields besides the numbers and attempting to sort based on that (this has some issues because KA12-10 will come after KA14-1 as it'll be 1210 compared to 141)
I cannot easily change the values to not include dashes as I am pulling the data from a request to a website and need the values to be in the original form to query individual items.
I'm sure someone that has more experience manipulating the built-in sort could help me out.
Thanks.
Solution
ISTM the real question is where you'd want something like KA13-12 to go. If you want it to come after KA13-2, then I think you need something like:
>>> seq = "KA13-1 KA13-2 KA14-2 KA14-1 KA11-12 KA13-12".split()
>>> seq.sort(key=lambda x: tuple(map(int, x[2:].split("-"))))
>>> seq
['KA11-12', 'KA13-1', 'KA13-2', 'KA13-12', 'KA14-1', 'KA14-2']
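The same idea can be written as a named key function that validates the format first. This is only a sketch: the name sort_key and the regular expression are illustrative assumptions (two letters, an integer, a dash, another integer), not part of the original answer.

```python
import re

# Assumed format: two letters, an integer, a dash, another integer.
_PATTERN = re.compile(r"[A-Za-z]{2}(\d+)-(\d+)")


def sort_key(value):
    """Return (XX, YY) as integers for a value such as 'KA13-2'."""
    match = _PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"unexpected format: {value!r}")
    return int(match.group(1)), int(match.group(2))


values = "KA13-1 KA13-2 KA14-2 KA14-1 KA11-12 KA13-11".split()
result = sorted(values, key=sort_key)
print(result)
```

Because the key is a tuple of integers, 2 compares below 11, and the original strings are left untouched for later queries.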
OTHER TIPS
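If you want to ignore the two-letter prefix and sort on the rest of the value (note that the remainder is still compared as a string, so KA13-11 would sort before KA13-2):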
l = ['KB13-1', 'KA13-2', 'KC11-11', 'KA14-1', 'KA11-12']
sorted(l, key=lambda i: i[2:])
Output
['KC11-11', 'KA11-12', 'KB13-1', 'KA13-2', 'KA14-1']
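If the first two characters never change, then you can just use the plain vanilla sort or sorted functions. This is a string comparison, so it only gives numeric order when the numbers being compared have the same number of digits.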
x = ['KA13-1', 'KA13-2', 'KA11-11', 'KA14-1', 'KA11-12']
sorted(x)
Output
['KA11-11', 'KA11-12', 'KA13-1', 'KA13-2', 'KA14-1']
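Plain string comparison only matches numeric order while the numbers have equal digit counts. A small check with the data from the question, comparing it against an integer-based key (the variable names here are illustrative):

```python
values = ["KA13-1", "KA13-2", "KA14-2", "KA14-1", "KA11-12", "KA13-11"]

# Plain string comparison: "KA13-11" < "KA13-2" because "1" < "2" at index 5.
plain = sorted(values)

# Comparing the two numeric parts as integers instead.
numeric = sorted(values, key=lambda v: tuple(int(p) for p in v[2:].split("-")))

print(plain)    # KA13-11 lands before KA13-2
print(numeric)  # KA13-2 lands before KA13-11
```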
UPDATED ANSWER
As of natsort version 4.0.0, this will work for you right out of the box, without having to use any special options.
>>> from natsort import natsorted
>>> natsorted('KA11-12 KA13-1 KA13-11 KA13-2 KA14-1 KA14-2'.split())
['KA11-12', 'KA13-1', 'KA13-2', 'KA13-11', 'KA14-1', 'KA14-2']
OLD ANSWER for natsort < 4.0.0
You mentioned that natsort did not work for you because of negative signs. This is because by default '-' is interpreted as part of the following number, but you can disable this with the "UNSIGNED" modifier.
>>> from natsort import natsorted, ns
>>> natsorted('KA11-12 KA13-1 KA13-11 KA13-2 KA14-1 KA14-2'.split(), alg=ns.UNSIGNED)
['KA11-12', 'KA13-1', 'KA13-2', 'KA13-11', 'KA14-1', 'KA14-2']
Using versorted would also work.
>>> from natsort import versorted
>>> versorted('KA11-12 KA13-1 KA13-11 KA13-2 KA14-1 KA14-2'.split())
['KA11-12', 'KA13-1', 'KA13-2', 'KA13-11', 'KA14-1', 'KA14-2']
Full disclosure: I am the natsort author.
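If installing a third-party package is not an option, a rough standard-library approximation of unsigned natural sorting is to split each string into digit and non-digit runs. This is a minimal sketch and not how natsort itself is implemented; natural_key is a hypothetical helper name.

```python
import re


def natural_key(text):
    """Split text into alternating non-digit and digit runs.

    re.split with a capturing group puts the digit runs at odd indices,
    so those are converted to int and the rest are kept as strings.
    """
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


values = "KA11-12 KA13-1 KA13-11 KA13-2 KA14-1 KA14-2".split()
ordered = sorted(values, key=natural_key)
print(ordered)
```

Here the dash is treated as ordinary text rather than a sign, which mirrors the behaviour described for alg=ns.UNSIGNED.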