Sort Set of Numbers in the form (XX-YY) in Python

https://stackoverflow.com//questions/24045348

21-12-2019

Question

I have a python list that contains values that follow the form

CCXX-YY, where CC is two alphabetical characters that are the same for all values, and XX and YY are integers.

For example, I want to sort

KA13-1 KA13-2 KA14-2 KA14-1 KA11-12 KA13-11

into

KA11-12 KA13-1 KA13-2 KA13-11 KA14-1 KA14-2

and not

KA11-12 KA13-1 KA13-11 KA13-2 KA14-1 KA14-2

Things I have tried:

natsort (issue with thinking the - is a negative sign)

Late edit: natsort works if you use the parameter alg=ns.UNSIGNED, as noted below.

naturalsort (doesn't work with python 3?)

sort using a key value (I'm sure this can be done, but I'm a bit new to python and am failing at it)

Things I'm currently trying:

Removing all the extra fields besides the numbers and attempting to sort based on that (this has some issues because KA12-10 will come after KA14-1 as it'll be 1210 compared to 141)
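The pitfall described above can be reproduced in a few lines. This is only a sketch: the helper name digits_only is made up for illustration, and it assumes "removing the extra fields" means concatenating every digit in the value.

```python
import re

values = ["KA12-10", "KA14-1"]

def digits_only(value):
    # Hypothetical helper: drop every non-digit, so "KA12-10" becomes 1210
    # and "KA14-1" becomes 141.
    return int(re.sub(r"\D", "", value))

wrong_order = sorted(values, key=digits_only)
print(wrong_order)  # KA14-1 (141) ends up before KA12-10 (1210)
```

Because the two fields are fused into a single number, the boundary between XX and YY is lost, which is why the fields need to be compared separately.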

I cannot easily change the values to not include dashes as I am pulling the data from a request to a website and need the values to be in the original form to query individual items.

I'm sure someone that has more experience manipulating the built-in sort could help me out.

Thanks.

Solution

It seems to me the real question is where you'd want something like KA13-12 to go. If you want it to come after KA13-2, then I think you need something like

>>> seq = "KA13-1 KA13-2 KA14-2 KA14-1 KA11-12 KA13-12".split()
>>> seq.sort(key=lambda x: tuple(map(int, x[2:].split("-"))))
>>> seq
['KA11-12', 'KA13-1', 'KA13-2', 'KA13-12', 'KA14-1', 'KA14-2']
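For readability, the same idea can be written as a named key function. This is a sketch rather than a different technique: numeric_key and its prefix_len parameter are illustrative names, and it assumes every value has a fixed-length prefix followed by exactly two integers separated by a single dash.

```python
def numeric_key(value, prefix_len=2):
    """Return (XX, YY) as a tuple of ints for a value such as 'KA13-12'."""
    # Drop the alphabetic prefix, then split the remainder on the dash.
    major, minor = value[prefix_len:].split("-")
    # Tuples compare element by element, so XX decides first and YY breaks ties.
    return int(major), int(minor)

seq = "KA13-1 KA13-2 KA14-2 KA14-1 KA11-12 KA13-12".split()
result = sorted(seq, key=numeric_key)
print(result)
```

The original strings are left untouched, so they can still be used to query individual items afterwards.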

OTHER TIPS

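Your desired sort output is plain lexicographic ordering only when the numbers being compared have the same number of digits. In that case, just use sorted/list.sort without any custom key or comparer; otherwise string comparison places KA13-11 before KA13-2.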

If you want to ignore the two-letter prefix and sort on the rest of the string (note that this still compares text, not integers):

l = ['KB13-1', 'KA13-2', 'KC11-11', 'KA14-1', 'KA11-12']
sorted(l, key=lambda i: i[2:])

Output

['KC11-11', 'KA11-12', 'KB13-1', 'KA13-2', 'KA14-1']
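Slicing off the prefix still leaves a string comparison, which matters once the numbers differ in width. The sketch below uses a slightly different list (KB13-11 instead of KB13-1, chosen here to expose the difference) and a hypothetical helper int_parts that converts both fields to integers.

```python
l = ['KB13-11', 'KA13-2', 'KC11-11', 'KA14-1', 'KA11-12']

def int_parts(item):
    # Ignore the two-letter prefix and compare the fields as integers.
    return tuple(int(part) for part in item[2:].split('-'))

by_string = sorted(l, key=lambda i: i[2:])  # '13-11' < '13-2' as text
by_number = sorted(l, key=int_parts)        # (13, 2) < (13, 11) as ints

print(by_string)
print(by_number)
```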

If the first two characters never change and the numeric fields all have the same width, then you can just use the plain vanilla sort or sorted functions.

x = ['KA13-1', 'KA13-2', 'KA11-11', 'KA14-1', 'KA11-12']
sorted(x)

Output

['KA11-11', 'KA11-12', 'KA13-1', 'KA13-2', 'KA14-1']
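Whether plain sorting is enough can be checked directly against the list from the question. This quick sketch only calls the built-in sorted; the variable names are arbitrary.

```python
question_values = "KA13-1 KA13-2 KA14-2 KA14-1 KA11-12 KA13-11".split()

plain = sorted(question_values)
print(plain)

# Strings are compared character by character, so "KA13-11" lands
# before "KA13-2" because "1" < "2" at the first differing position.
position_11 = plain.index("KA13-11")
position_2 = plain.index("KA13-2")
```

With this data the result is the ordering the question lists under "and not", so plain sorting only fits when the numeric parts share the same number of digits.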

UPDATED ANSWER

As of natsort version 4.0.0, this will work for you right out of the box, without having to use any special options.

>>> from natsort import natsorted
>>> natsorted('KA11-12 KA13-1 KA13-11 KA13-2 KA14-1 KA14-2'.split())
['KA11-12', 'KA13-1', 'KA13-2', 'KA13-11', 'KA14-1', 'KA14-2']

OLD ANSWER for natsort < 4.0.0

You mentioned that natsort did not work for you because of negative signs. This is because by default '-' is interpreted as part of the following number, but you can disable this with the "UNSIGNED" modifier.

>>> from natsort import natsorted, ns
>>> natsorted('KA11-12 KA13-1 KA13-11 KA13-2 KA14-1 KA14-2'.split(), alg=ns.UNSIGNED)
['KA11-12', 'KA13-1', 'KA13-2', 'KA13-11', 'KA14-1', 'KA14-2']

Using versorted would work also.

>>> from natsort import versorted
>>> versorted('KA11-12 KA13-1 KA13-11 KA13-2 KA14-1 KA14-2'.split())
['KA11-12', 'KA13-1', 'KA13-2', 'KA13-11', 'KA14-1', 'KA14-2']

Full disclosure, I am the natsort author.
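If installing a third-party package is not an option, the general idea of unsigned natural sorting can be approximated with the standard library. This is a minimal sketch and not natsort's actual implementation: natural_key is a made-up name, and the approach assumes unsigned integers only (no signs, decimals, or locale handling).

```python
import re

def natural_key(text):
    """Split text into alternating non-digit and digit runs.

    re.split with a capturing group always yields non-digit runs at even
    indices and digit runs at odd indices, so the resulting lists compare
    str against str and int against int.
    """
    chunks = re.split(r"(\d+)", text)
    return [int(chunk) if index % 2 else chunk
            for index, chunk in enumerate(chunks)]

values = 'KA11-12 KA13-1 KA13-11 KA13-2 KA14-1 KA14-2'.split()
result = sorted(values, key=natural_key)
print(result)
```

Here the dash is just another non-digit run, so it is never read as a minus sign.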

Licensed under: CC-BY-SA with attribution
