Question

what is the easiest way to sort a list of strings with digits at the end where some have 3 digits and some have 4:

>>> list = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> list.sort()
>>> print list
['asdf111', 'asdf123', 'asdf1234', 'asdf124']

should put the 1234 one on the end. is there an easy way to do this?

Was it helpful?

Solution

is there an easy way to do this?

No

It's perfectly unclear what the real rules are. The "some have 3 digits and some have 4" isn't really a very precise or complete specification. All your examples show 4 letters in front of the digits. Is this always true?

import re
key_pat = re.compile(r"^(\D+)(\d+)$")
def key(item):
    m = key_pat.match(item)
    return m.group(1), int(m.group(2))

That key function might do what you want. Or it might be too complex. Or maybe the pattern is really r"^(.*)(\d{3,4})$" or maybe the rules are even more obscure.

>>> data= ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> data.sort( key=key )
>>> data
['asdf111', 'asdf123', 'asdf124', 'asdf1234']

OTHER TIPS

is there an easy way to do this?

Yes

You can use the natsort module.

>>> from natsort import natsorted
>>> natsorted(['asdf123', 'asdf1234', 'asdf111', 'asdf124'])
['asdf111', 'asdf123', 'asdf124', 'asdf1234']

Full disclosure, I am the package's author.

l = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
l.sort(cmp=lambda x,y:cmp(int(x[4:]), int(y[4:]))

You need a key function. You're willing to specify 3 or 4 digits at the end and I have a feeling that you want them to compare numerically.

sorted(list_, key=lambda s: (s[:-4], int(s[-4:])) if s[-4] in '0123456789' else (s[:-3], int(s[-3:]))) 

Without the lambda and conditional expression that's

def key(s):
    if key[-4] in '0123456789':
         return (s[:-4], int(s[-4:]))
    else:
         return (s[:-3], int(s[-3:]))

sorted(list_, key=key)

This just takes advantage of the fact that tuples sort by the first element, then the second. So because the key function is called to get a value to compare, the elements will now be compared like the tuples returned by the key function. For example, 'asdfbad123' will compare to 'asd7890' as ('asdfbad', 123) compares to ('asd', 7890). If the last 3 characters of a string aren't in fact digits, you'll get a ValueError which is perfectly appropriate given the fact that you passed it data that doesn't fit the specs it was designed for.

The issue is that the sorting is alphabetical here since they are strings. Each sequence of character is compared before moving to next character.

>>> 'a1234' < 'a124'  <----- positionally '3' is less than '4' 
True
>>> 

You will need to due numeric sorting to get the desired output.

>>> x = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> y = [ int(t[4:]) for t in x]
>>> z = sorted(y)
>>> z
[111, 123, 124, 1234]
>>> l = ['asdf'+str(t) for t in z]
>>> l
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
>>> 

What you're probably describing is called a Natural Sort, or a Human Sort. If you're using Python, you can borrow from Ned's implementation.

The algorithm for a natural sort is approximately as follows:

  • Split each value into alphabetical "chunks" and numerical "chunks"
  • Sort by the first chunk of each value
    • If the chunk is alphabetical, sort it as usual
    • If the chunk is numerical, sort by the numerical value represented
  • Take the values that have the same first chunk and sort them by the second chunk
  • And so on
L.sort(key=lambda s:int(''.join(filter(str.isdigit,s[-4:]))))

rather than splitting each line myself, I ask python to do it for me with re.findall():

import re
import sys

def SortKey(line):
  result = []
  for part in re.findall(r'\D+|\d+', line):
    try:
      result.append(int(part, 10))
    except (TypeError, ValueError) as _:
      result.append(part)
  return result

print ''.join(sorted(sys.stdin.readlines(), key=SortKey)),
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top