Question

After applying a regex to the file names in a directory that start with 'chr[0-9XY]*', I obtain a list in the following order:

['chr9', 'chr8', 'chr7', 'chr6', 'chr5', 'chr4', 'chr3', 'chr2', 'chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']

I used glob.glob to iterate through the desired files in the directory, and it returns them in this order.

My question is whether it's possible to make glob sort the files differently: numerically by integer first, with X and Y at the end. Like this:

['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']

Is there any way to accomplish that? Thanks in advance!


Solution 2

You can use a third-party library called natsort, so named because it sorts elements in natural order.

You can install it via pip install natsort. You will need pip; if it is not already installed on your system, a quick search will turn up a suitable installation guide.

Once this is done, you can easily use natsort to do all the work for you:

>>> import natsort
>>> var = ['chr9', 'chr8', 'chr7', 'chr6', 'chr5', 'chr4', 'chr3', 'chr2', 'chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']
>>> natsort.natsorted(var)
['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']
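If installing a third-party package is not an option, a similar ordering can be approximated with the standard library alone. The following is only a sketch of the common "split on digit runs" recipe, not how natsort itself is implemented; the helper name natural_key is made up for this example.

```python
import re

def natural_key(text):
    """Return a sort key in which runs of digits compare as integers."""
    # A capturing group makes re.split keep the digit runs in the result,
    # e.g. 'chr10' -> ['chr', '10', ''].
    parts = re.split(r'(\d+)', text)
    # Odd positions are always the digit runs, even positions the text between them,
    # so two keys only ever compare str with str or int with int.
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]

names = ['chr9', 'chr10', 'chrX', 'chr1', 'chr22', 'chrY', 'chr2']
print(sorted(names, key=natural_key))
# ['chr1', 'chr2', 'chr9', 'chr10', 'chr22', 'chrX', 'chrY']
```

Because the key is built from the whole string rather than from a fixed prefix, it should also cope with names that carry an extension or a different prefix, though it has none of natsort's options for signs, floats, or locale.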

OTHER TIPS

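Try a lambda as the sort key. The tuples keep the numeric names in one group and the lettered names in another, so Python 3 never has to compare an int with a str:

sorted(var, key=lambda name: (0, int(name[3:])) if name[3:].isdigit() else (1, name[3:]))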

Use this code to sort the blobs:

blobs = ['chr9', 'chr8', 'chr7', 'chr6', 'chr5', 'chr4', 'chr3', 'chr2', 'chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']

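def blob_key(item):
    # Drop the "chr" prefix to get the part that decides the order.
    item_id = item.replace("chr", "")
    if item_id.isdigit():
        # Numeric names: group 0, ordered by integer value.
        return (0, int(item_id))
    # Non-numeric names (X, Y): group 1, ordered alphabetically after the numbers.
    return (1, item_id)

blobs.sort(key=blob_key)

print(blobs)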

Output: ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']

The blob_key function receives an item (in this case a string) and returns the key used to order it: names with a numeric suffix after chr are ordered by integer value, and the others (chrX, chrY) by the remaining string. The numeric names have to come out first. Python 2 places integers before strings when it compares them, but Python 3 raises a TypeError for such a comparison, so there the key has to keep the two kinds apart, for example by returning a tuple whose first element is the group.
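Since the question is about glob, here is a minimal sketch of applying a key function directly to what glob.glob returns. It assumes Python 3 and that every file name starts with chr and has no extension; the helper name chrom_key and the temporary directory are made up for the demonstration. The glob module does not offer a sorting option of its own, so the ordering is done afterwards with sorted.

```python
import glob
import os
import tempfile

def chrom_key(path):
    """Order numeric names by value, then lettered names such as chrX and chrY."""
    # Ignore the directory part and the 'chr' prefix (assumed to be present).
    suffix = os.path.basename(path)[3:]
    if suffix.isdigit():
        return (0, int(suffix))   # group 0: numbers, compared as integers
    return (1, suffix)            # group 1: everything else, compared as text

# A throwaway directory with empty files stands in for the real data.
with tempfile.TemporaryDirectory() as tmp:
    for name in ['chr10', 'chr2', 'chrX', 'chr1', 'chrY', 'chr9']:
        open(os.path.join(tmp, name), 'w').close()
    paths = sorted(glob.glob(os.path.join(tmp, 'chr*')), key=chrom_key)
    ordered = [os.path.basename(p) for p in paths]

print(ordered)
# ['chr1', 'chr2', 'chr9', 'chr10', 'chrX', 'chrY']
```

The key works on the base name, so it does not matter whether the glob pattern includes a directory.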

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow