Question

I have some files that need to be sorted by name, unfortunately I can't use a regular sort, because I also want to sort the numbers in the string, so I did some research and found that what I am looking for is called natural sorting.

I tried the solution given here and it worked perfectly.

However, for strings like PresserInc-1_10.jpg and PresserInc-1_11.jpg which causes that specific natural key algorithm to fail, because it only matches the first integer which in this case would be 1 and 1, and so it throws off the sorting. So what I think might help is to match all numbers in the string and group them together, so if I have PresserInc-1_11.jpg the algorithm should give me 111 back, so my question is, is this possible ?

Here's a list of filenames:

files = ['PresserInc-1.jpg', 'PresserInc-1_10.jpg', 'PresserInc-1_11.jpg', 'PresserInc-10.jpg', 'PresserInc-2.jpg', 'PresserInc-3.jpg', 'PresserInc-4.jpg', 'PresserInc-5.jpg', 'PresserInc-6.jpg', 'PresserInc-11.jpg']

Was it helpful?

Solution

Google: Python natural sorting.

Result 1: The page you linked to.

But don't stop there!

Result 2: Jeff Atwood's blog that explains how to do it properly.

Result 3: An answer I posted based on Jeff Atwood's blog.

Here's the code from that answer:

import re

def natural_sort(l): 
    convert = lambda text: int(text) if text.isdigit() else text.lower() 
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)] 
    return sorted(l, key=alphanum_key)

Results for your data:

PresserInc-1.jpg
PresserInc-1_10.jpg
PresserInc-1_11.jpg
PresserInc-2.jpg
PresserInc-3.jpg
etc...

See it working online: ideone

OTHER TIPS

If you don't mind third party libraries, you can use natsort to achieve this.

>>> import natsort
>>> files = ['PresserInc-1.jpg', 'PresserInc-1_10.jpg', 'PresserInc-1_11.jpg', 'PresserInc-10.jpg', 'PresserInc-2.jpg', 'PresserInc-3.jpg', 'PresserInc-4.jpg', 'PresserInc-5.jpg', 'PresserInc-6.jpg', 'PresserInc-11.jpg']
>>> natsort.natsorted(files)
['PresserInc-1.jpg',
 'PresserInc-1_10.jpg',
 'PresserInc-1_11.jpg',
 'PresserInc-2.jpg',
 'PresserInc-3.jpg',
 'PresserInc-4.jpg',
 'PresserInc-5.jpg',
 'PresserInc-6.jpg',
 'PresserInc-10.jpg',
 'PresserInc-11.jpg']
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top