Question

I need a Perl comparison function that can be used with sort.

Each key is a text string that has an arbitrary number of subkeys, separated by delimiter characters (dot, colon, space, and slash). Some subkeys are numeric, and need to be sorted numerically. The key format and number of subkeys varies. Therefore, the comparison has to handle one key being longer than the other, and has to handle the case where a subkey is numeric in one key but not in another (in which case a textual comparison is appropriate for that subkey).

This works, but I bet there are better solutions:

use warnings;
use strict;
use Scalar::Util qw[looks_like_number];

sub hier_cmp {

    my $aa = $a;
    my $bb = $b;

    # convert all delims (. : / space) to the same delim

    $aa =~ tr/.:\/ /::::/;
    $bb =~ tr/.:\/ /::::/;
    my @lista = split(":", $aa);
    my @listb = split(":", $bb);

    my $result;

    for my $ix (0 .. min($#lista, $#listb)) {
        if (exists($lista[$ix]) && exists($listb[$ix])) {
            if ( looks_like_number($lista[$ix]) && looks_like_number($listb[$ix])) {
                # compare numerically
                $result = ($lista[$ix] <=> $listb[$ix]);
            } else {
                # compare as strings
                $result = ($lista[$ix] cmp $listb[$ix]);
            }
            if ($result == 0) {
                next;
            }
            return $result;

        } elsif (exists($lista[$ix])) {
            return 1;
        } else {
            return -1;
        }
    }
}

For my purposes, readability is more important than speed. This is just for an internal tool, and lists will rarely have more than hundreds of elements. However, any opportunity to learn something is good.

As you can see, I'm not a perl wizard. Even trivial improvements on my code would be appreciated.

Thanks!

Was it helpful?

Solution

It would help if you gave us some data to test with, but this code passes a few basic tests and it looks right.

It simplifies the problem by using the List::MoreUtils function pairwise to create an array of field pairs.

Then it is just a matter of checking whether only one is defined, when one of the lists has come to an end before the other and should be sorted first; if they are both numeric, when they should be compared with a numeric comparison; or otherwise simply compare them as strings.

If the end of the array of pairs is reached then everything has matched and zero is returned to indicate equiality.

Update

I have changed this code to remove the dependency on List::MoreUtils::pairwise.

use strict;
use warnings;

use Scalar::Util 'looks_like_number';

sub hier_cmp {

  our ($a, $b);

  my @a = split m|[.: /]+|, $a;
  my @b = split m|[.: /]+|, $b;

  for my $i (0 .. $#a > $#b ? $#a : $#b) {
    my @ab = ( $a[$i], $b[$i] );
    if (grep defined, @ab < 2) {
      return defined $ab[0] ? 1 : -1;
    }
    else {
      my $numeric = grep(looks_like_number($_), @ab) == 2;
      my $result = $numeric ? $ab[0] <=> $ab[1] : $ab[0] cmp $ab[1];
      return $result if $result;
    }
  }

  return 0;
}

OTHER TIPS

That looks like natural sorting. There are several modules on CPAN that already do that such as Sort::Naturally or Sort::Key::Natural.

For instance:

use Sort::Key::Natural qw(natsort);
my @sorted = natsort @data;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top