Question

I have just read the post Sorting alphanumeric hash keys in Perl?, but I am new to Perl and I don't understand it very well.

So I have a hash like this one:

my %hash = (
    "chr1"  => 1,
    "chr2"  => 3,
    "chr19" => 14,
    "chr22" => 1,
    "X"     => 2,
);

I want output like this:

chr1
chr2
chr19
chr22

But I'm getting this instead:

chr1
chr19
chr2
chr22

I have written this code, but it produces the wrong order shown above:

# %differenthash is defined elsewhere in my script (not shown), with the same keys as %hash
foreach my $chr (sort { $a cmp $b } keys %hash) {
    my $total            = $hash{$chr};
    my $differentpercent = ($differenthash{$chr} / $total) * 100;
    my $round            = int($differentpercent * 1000) / 1000;   # truncate to 3 decimal places
    print "$chr\t$hash{$chr}\t$differenthash{$chr}\t$round\n";
}

It prints:

chr1    342421    7449    2.175
chr10    227648    5327    2.34
chr11    220415    4468    2.027
chr12    213263    4578    2.146
chr13    172379    3518    2.04
chr14    143534    2883    2.008
chr15    126441    2588    2.046
chr16    138239    3596    2.601
chr17    122137    3232    2.646
chr18    130275    3252    2.496
chr19    99876    2836    2.839
chr2    366815    8123    2.214

How can I fix this?


Solution

Update: note @Miller's comment on some shortcomings of the Sort::Naturally module.

What you are asking for is a relatively complicated sort: it splits each string into alphabetical and numeric fields, then compares the alphabetical fields as text and the numeric fields by value.
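To see why the question's `sort { $a cmp $b }` puts chr19 before chr2, and what splitting into fields buys you, here is a small illustrative check. The variable names are made up for the example; the field-splitting regex `(\D+|\d+)` is one simple way to do it, not the only one.

```perl
use strict;
use warnings;

# Plain string comparison works character by character: "chr19" and "chr2"
# agree up to "chr", then '1' lt '2', so "chr19" sorts first.
my $as_strings = "chr19" cmp "chr2";

# Splitting each key into runs of non-digits and digits exposes the number.
my @fields_a = "chr19" =~ /(\D+|\d+)/g;    # ("chr", "19")
my @fields_b = "chr2"  =~ /(\D+|\d+)/g;    # ("chr", "2")

# Comparing the numeric fields by value gives the order you actually want.
my $as_numbers = $fields_a[1] <=> $fields_b[1];

print "cmp: $as_strings, <=> on numeric field: $as_numbers\n";   # cmp: -1, <=> on numeric field: 1
print "fields: @fields_a | @fields_b\n";                         # fields: chr 19 | chr 2
```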

The module Sort::Naturally will do what you ask, or you can write something like this. You appear to have ignored the X key, so I have sorted it to the end using a case-independent sort.

use strict;
use warnings;

my %hash = map { $_ => 1 } qw(
    chr22  chr20  chr19  chr13  chr21  chr16  chr12  chr10  chr18
    chr17  chrY   chr5   chrX   chr8   chr14  chr6   chr3   chr9
    chr1   chrM   chr11  chr2   chr7   chr4   chr15
);

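my @sorted_keys = sort {
    # Split each key into a leading alphabetic part and an optional trailing number
    my @aa = $a =~ /^([A-Za-z]+)(\d*)/;
    my @bb = $b =~ /^([A-Za-z]+)(\d*)/;
    # Compare the letters case-insensitively; if they are equal, compare the numbers by value
    lc $aa[0] cmp lc $bb[0] or $aa[1] <=> $bb[1];
} keys %hash;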

print "$_\n" for @sorted_keys;

Output:

chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr20
chr21
chr22
chrM
chrX
chrY

Using the Sort::Naturally module (you will probably have to install it from CPAN), you could write this instead:

use strict;
use warnings;

use Sort::Naturally;

my %hash = map { $_ => 1 } qw(
    chr22  chr20  chr19  chr13  chr21  chr16  chr12  chr10  chr18
    chr17  chrY   chr5   chrX   chr8   chr14  chr6   chr3   chr9
    chr1   chrM   chr11  chr2   chr7   chr4   chr15
);

my @sorted_keys = nsort keys %hash;

print "$_\n" for @sorted_keys;

The output is identical to the above.
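If installing Sort::Naturally is not an option, the same idea can be written as a general comparator. This is a minimal sketch under stated assumptions, not the algorithm Sort::Naturally actually uses: `natural_cmp` is a hypothetical helper name, a field is a run of digits or a run of non-digits, digit runs compare by value, text runs compare case-insensitively, and keys are assumed to be plain ASCII.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two strings field by field. A field is a run
# of digits (compared numerically) or a run of non-digits (compared as text,
# ignoring case). Illustration only, not Sort::Naturally's implementation.
sub natural_cmp {
    my ($x, $y) = @_;
    my @xf = $x =~ /(\d+|\D+)/g;
    my @yf = $y =~ /(\d+|\D+)/g;
    while (@xf && @yf) {
        my ($p, $q) = (shift @xf, shift @yf);
        my $r;
        if ($p =~ /^\d/ && $q =~ /^\d/) {
            $r = $p <=> $q;            # both fields numeric: compare by value
        } else {
            $r = lc $p cmp lc $q;      # otherwise compare as case-insensitive text
        }
        return $r if $r;
    }
    # All shared fields are equal: the string with fewer fields sorts first.
    return @xf <=> @yf;
}

my @keys   = qw(chr22 chrX chr2 chr19 chr1 chrM X chr10);
my @sorted = sort { natural_cmp($a, $b) } @keys;
print "@sorted\n";    # chr1 chr2 chr10 chr19 chr22 chrM chrX X
```

Fields such as `1` and `01` compare as equal here; appending a final `$x cmp $y` tie-break would make the order fully deterministic if that matters to you.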

OTHER TIPS

This can also be solved with a common Perl idiom called map-sort-map, also known as the Schwartzian transform:

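#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

my %hash = (
   "chr1"  => 1,
   "chr2"  => 3,
   "chr19" => 14,
   "chr22" => 1,
);

# Read from the bottom up:
#   1. map each key to [key, number] (0 if the key has no chrNN number)
#   2. sort the pairs numerically on the number
#   3. map back to just the key
my @sorted = map  { $_->[0]                   }
             sort { $a->[1] <=> $b->[1]       }
             map  { [$_, /chr(\d+)/ ? $1 : 0] } keys %hash;

print Dumper \@sorted;
__END__
$VAR1 = [
          'chr1',
          'chr2',
          'chr19',
          'chr22'
        ];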

Note: Unlike @Borodin, I chose to sort X to the front (a key without a number gets 0). The question didn't say where it should go, so I just picked an end.
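A pitfall worth knowing with this idiom: `||` evaluates its left operand in scalar context, and a match in scalar context returns only success (1) or failure, not the captured digits. So `(/chr(\d+)/) || 0` yields 1 for every matching key. A quick check, using a made-up `$key`:

```perl
use strict;
use warnings;

my $key = "chr19";

# Left operand of || is in scalar context: the match returns true (1), not the capture.
my $scalar_result = ($key =~ /chr(\d+)/) || 0;

# List assignment puts the match in list context, so it returns the captured digits.
my ($list_result) = $key =~ /chr(\d+)/;

# Ternary form: use $1 only when the match succeeded, otherwise fall back to 0.
my $ternary_result = $key =~ /chr(\d+)/ ? $1 : 0;

print "scalar: $scalar_result, list: $list_result, ternary: $ternary_result\n";
# scalar: 1, list: 19, ternary: 19
```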

This is how I have been doing it for a long time. I'm borrowing the comparison code from Borodin's answer for reference; it is simple to follow if you understand regular expressions. I prefer to put complicated sort comparisons into a named sub, because inline blocks get messy otherwise. Here you go:

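use strict;
use warnings;

my %hash = (
    "chr1"  => 1,
    "chr2"  => 3,
    "chr19" => 14,
    "chr22" => 1,
    "X"     => 2,
);

# sort SUBNAME LIST: Perl sets $a and $b and calls sortalphanum for each comparison
foreach my $key (sort sortalphanum keys %hash)
{
  print "  $key = $hash{$key}\n";
}

# Compare the alphabetic prefix case-insensitively, then the trailing number by value
sub sortalphanum
{
  my @aa = $a =~ /^([A-Za-z]+)(\d*)/;
  my @bb = $b =~ /^([A-Za-z]+)(\d*)/;
  lc $aa[0] cmp lc $bb[0] or $aa[1] <=> $bb[1];
}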
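One benefit of a named comparator is reuse: Perl's `sort` accepts the sub name directly (`sort SUBNAME LIST`), so the same sub can order several lists, or be combined with `reverse` for descending order. A small sketch with a made-up list of keys:

```perl
use strict;
use warnings;

# Same comparator as above: alphabetic prefix case-insensitively,
# then the trailing number numerically.
sub sortalphanum {
    my @aa = $a =~ /^([A-Za-z]+)(\d*)/;
    my @bb = $b =~ /^([A-Za-z]+)(\d*)/;
    lc $aa[0] cmp lc $bb[0] or $aa[1] <=> $bb[1];
}

my @keys = qw(chr22 X chr2 chr19 chr1);

# sort SUBNAME LIST: no block or & call needed.
my @ascending = sort sortalphanum @keys;

# Reuse the same named sub, reversed, for descending order.
my @descending = reverse sort sortalphanum @keys;

print "@ascending\n";     # chr1 chr2 chr19 chr22 X
print "@descending\n";    # X chr22 chr19 chr2 chr1
```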

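Alternatively, if you only need to order the lines by the number in each key, you could read the records into a hash and sort on that number: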

#!/usr/bin/perl

use warnings;
use strict;

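my %records;
while (<DATA>) {
    my ($key) = split;        # first whitespace-separated field, e.g. "chr10"
    $records{$key} = $_;      # keep the whole line, newline included
}

# Sort the keys by the number embedded in each one
my @keys = sort {
    my ($aa) = $a =~ /(\d+)/;
    my ($bb) = $b =~ /(\d+)/;
    $aa <=> $bb;
} keys %records;

foreach my $key (@keys) {
    print $records{$key};     # print, not printf: the data is not a format string
}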


__DATA__
chr1    342421  7449    2.175
chr10   227648  5327    2.34
chr11   220415  4468    2.027
chr12   213263  4578    2.146
chr13   172379  3518    2.04
chr14   143534  2883    2.008
chr15   126441  2588    2.046
chr16   138239  3596    2.601
chr17   122137  3232    2.646
chr18   130275  3252    2.496
chr19   99876   2836    2.839
chr2    366815  8123    2.214

Output:

$ perl t01.pl 
chr1    342421  7449    2.175
chr2    366815  8123    2.214
chr10   227648  5327    2.34
chr11   220415  4468    2.027
chr12   213263  4578    2.146
chr13   172379  3518    2.04
chr14   143534  2883    2.008
chr15   126441  2588    2.046
chr16   138239  3596    2.601
chr17   122137  3232    2.646
chr18   130275  3252    2.496
chr19   99876   2836    2.839
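The comparator in this answer assumes every key contains digits; for a key such as chrX, `$aa` would be undef and `use warnings` would complain during the numeric comparison. A minimal sketch of one way to handle that, assuming you want digit-less keys placed after the numbered ones (the question leaves this choice open), using a made-up list of keys:

```perl
use strict;
use warnings;

my @keys = qw(chr10 chrX chr2 chr1 chrM);

my @sorted = sort {
    my ($na) = $a =~ /(\d+)/;
    my ($nb) = $b =~ /(\d+)/;
    # Keys with a number come first, ordered by value (ties broken as strings);
    # digit-less keys such as chrM and chrX go last, ordered as strings.
      (defined $na && defined $nb) ? ($na <=> $nb || $a cmp $b)
    : defined $na                  ? -1
    : defined $nb                  ?  1
    :                                 $a cmp $b;
} @keys;

print "@sorted\n";    # chr1 chr2 chr10 chrM chrX
```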
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow