Question

I have a method that returns a list of Strings that needs to be sorted. However, I'm running into the old String number sorting issue and was wondering if anyone could assist with a Comparator implementation or point me in the direction of one.

The list is going to look something like this:

State Lower Legislative District 1
State Lower Legislative District 11
State Lower Legislative District 12
...
State Lower Legislative District 2
...
State Lower Legislative District 100
...
State Upper Legislative District 1
State Upper Legislative District 11
...

So, first I need to do a basic String sort, but then I need to sort by the number. The number to sort on should always trail, and may be 2 or 3 digits.

(Edit) My initial thought is to split the string on space, run StringUtils.isNumeric on the number portion, then sort. However, it seems a bit of a kludge to me.

Can anyone assist?


Solution

There is an article about this on Coding Horror. This is called natural sorting, where you effectively treat a group of digits as a single "character". See this question for some Java implementations of the idea.

Sorting for Humans : Natural Sort Order

The default sort functions in almost every programming language are poorly suited for human consumption. What do I mean by that? Well, consider the difference between sorting filenames in Windows explorer, and sorting those very same filenames via Array.Sort() code:

Windows Explorer Array.sort()

continued...
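
The idea of treating a run of digits as a single "character" can be sketched roughly as follows. This is only an illustration, not the code from the linked article or question: the class name NaturalOrderSketch and the helper sorted are made up here, and it assumes plain ASCII digits with no signs or decimal points.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical class name, used for illustration only.
final class NaturalOrderSketch implements Comparator<String> {

    @Override
    public int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i);
            char cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                // Find the end of the digit run in each string.
                int ie = i;
                int je = j;
                while (ie < a.length() && isDigit(a.charAt(ie))) ie++;
                while (je < b.length() && isDigit(b.charAt(je))) je++;
                // Compare the runs by numeric value; BigInteger avoids overflow on long runs.
                int c = new BigInteger(a.substring(i, ie))
                        .compareTo(new BigInteger(b.substring(j, je)));
                if (c != 0) return c;
                i = ie;
                j = je;
            } else {
                // Ordinary character comparison outside digit runs.
                if (ca != cb) return ca - cb;
                i++;
                j++;
            }
        }
        // One string is exhausted: the one with characters left over sorts last.
        return (a.length() - i) - (b.length() - j);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Convenience helper (hypothetical): returns a naturally sorted copy.
    static List<String> sorted(String... values) {
        List<String> list = new ArrayList<>(Arrays.asList(values));
        list.sort(new NaturalOrderSketch());
        return list;
    }
}
```

With this sketch, "State Lower Legislative District 2" sorts before "State Lower Legislative District 11". Note that digit runs with the same value, such as "07" and "7", compare as equal here.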

OTHER TIPS

I wrote a variation on String.compareTo that compares the lengths of the numbers found in the two strings. When it encounters two numbers of the same length, the alphanumeric comparison resumes as normal. It also skips leading zeros.

public static int compareNatural(String a, String b) {
    int la = a.length();
    int lb = b.length();
    int ka = 0;
    int kb = 0;
    while (true) {
        if (ka == la)
            return kb == lb ? 0 : -1;
        if (kb == lb)
            return 1;
        if (a.charAt(ka) >= '0' && a.charAt(ka) <= '9' && b.charAt(kb) >= '0' && b.charAt(kb) <= '9') {
            int na = 0;
            int nb = 0;
            while (ka < la && a.charAt(ka) == '0')
                ka++;
            while (ka + na < la && a.charAt(ka + na) >= '0' && a.charAt(ka + na) <= '9')
                na++;
            while (kb < lb && b.charAt(kb) == '0')
                kb++;
            while (kb + nb < lb && b.charAt(kb + nb) >= '0' && b.charAt(kb + nb) <= '9')
                nb++;
            if (na > nb)
                return 1;
            if (nb > na)
                return -1;
            if (ka == la)
                return kb == lb ? 0 : -1;
            if (kb == lb)
                return 1;

        }
        if (a.charAt(ka) != b.charAt(kb))
            return a.charAt(ka) - b.charAt(kb);
        ka++;
        kb++;
    }
}

One way would be to use a simple regex to parse out the fields of interest in your comparator and then compare them manually. Here's an untested example:

private static final Pattern pattern = Pattern.compile("^State (Lower|Upper) Legislative District (\\d+)$");

public int compare(String a, String b) {
    Matcher matcher1 = pattern.matcher(a);
    Matcher matcher2 = pattern.matcher(b);
    if( matcher1.matches() && matcher2.matches() ) {
        //compare upper/lower
        int upperLowerComparison = matcher1.group(1).compareTo(matcher2.group(1));
        if ( upperLowerComparison != 0 ) {
            return upperLowerComparison;
        }

        //number comparison
        return Integer.valueOf(matcher1.group(2)).compareTo(Integer.valueOf(matcher2.group(2)));
    }

    //...what to do if they don't match? One possible choice (an assumption): fall back to plain String ordering
    return a.compareTo(b);
}

You have two options. The first is to create a class with two fields, the name and the number. You parse each string into name and number once, and the comparator then compares the names first and the numbers second. The second is to do the parsing in place, inside the compare method. Choose whichever is more appropriate for you.
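
A minimal sketch of the first option, assuming every string ends with a space followed by a number that fits in an int. The names DistrictEntry, BY_NAME_THEN_NUMBER and sortLabels are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical class: holds the parsed name and number of one entry.
final class DistrictEntry {
    final String name;      // e.g. "State Lower Legislative District"
    final int number;       // trailing number, e.g. 11
    final String original;  // the unparsed input, kept for output

    DistrictEntry(String value) {
        String trimmed = value.trim();
        int lastSpace = trimmed.lastIndexOf(' ');
        // Assumption: a space exists and only digits follow it; otherwise this throws.
        this.name = trimmed.substring(0, lastSpace);
        this.number = Integer.parseInt(trimmed.substring(lastSpace + 1));
        this.original = value;
    }

    // Compare by name first, then by number.
    static final Comparator<DistrictEntry> BY_NAME_THEN_NUMBER =
            Comparator.comparing((DistrictEntry e) -> e.name)
                      .thenComparingInt(e -> e.number);

    // Parses each label once, sorts the parsed entries, and returns the labels in order.
    static List<String> sortLabels(List<String> labels) {
        List<DistrictEntry> entries = new ArrayList<>();
        for (String label : labels) {
            entries.add(new DistrictEntry(label));
        }
        entries.sort(BY_NAME_THEN_NUMBER);
        List<String> result = new ArrayList<>();
        for (DistrictEntry e : entries) {
            result.add(e.original);
        }
        return result;
    }
}
```

Parsing up front means each string is split only once, rather than on every comparison during the sort.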

Have a look at this implementation:

public static int naturalCompare(String a, String b, boolean ignoreCase) {
    if (ignoreCase) {
        a = a.toLowerCase();
        b = b.toLowerCase();
    }
    int aLength = a.length();
    int bLength = b.length();
    int minSize = Math.min(aLength, bLength);
    char aChar, bChar;
    boolean aNumber, bNumber;
    boolean asNumeric = false;
    int lastNumericCompare = 0;
    for (int i = 0; i < minSize; i++) {
        aChar = a.charAt(i);
        bChar = b.charAt(i);
        aNumber = aChar >= '0' && aChar <= '9';
        bNumber = bChar >= '0' && bChar <= '9';
        if (asNumeric)
            if (aNumber && bNumber) {
                if (lastNumericCompare == 0)
                    lastNumericCompare = aChar - bChar;
            } else if (aNumber)
                return 1;
            else if (bNumber)
                return -1;
            else if (lastNumericCompare == 0) {
                if (aChar != bChar)
                    return aChar - bChar;
                asNumeric = false;
            } else
                return lastNumericCompare;
        else if (aNumber && bNumber) {
            asNumeric = true;
            if (lastNumericCompare == 0)
                lastNumericCompare = aChar - bChar;
        } else if (aChar != bChar)
            return aChar - bChar;
    }
    if (asNumeric)
        if (aLength > bLength && a.charAt(bLength) >= '0' && a.charAt(bLength) <= '9') // as number
            return 1;  // a has bigger size, thus b is smaller
        else if (bLength > aLength && b.charAt(aLength) >= '0' && b.charAt(aLength) <= '9') // as number
            return -1;  // b has bigger size, thus a is smaller
        else
            return lastNumericCompare;
    else
        return aLength - bLength;
}

It should be fast: it uses no regular expressions or array manipulation, just a couple of flags and a lot of cases.

It should sort any combination of numbers inside strings. When two numbers are equal, it moves on and compares the rest of the strings.

I usually do this by prefixing the number with zeros and handling the whole thing as a string, then sorting it.

See this:

public abstract class MyNumberComparator {

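    // fillUpLeftWithZeros is not shown in this answer; it is presumably a helper
    // that left-pads its argument with '0' up to the given length.
    protected int doCompare(final String number1, final String number2) {
        String strNumber1 = fillUpLeftWithZeros(number1, 30);
        String strNumber2 = fillUpLeftWithZeros(number2, 30);

        return strNumber1.toUpperCase().compareTo(strNumber2.toUpperCase());
    }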

}
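
The zero-padding idea can also be applied to numbers embedded in longer strings. The following is only a sketch: the names ZeroPadSketch, sortKey and sortByKey are hypothetical, and the width of 10 is an assumed upper bound on the number of digits in any number.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used for illustration only.
final class ZeroPadSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    private static final int WIDTH = 10; // assumed maximum number of digits

    // Builds a key in which every digit run is left-padded with zeros to WIDTH.
    static String sortKey(String value) {
        Matcher m = DIGITS.matcher(value);
        StringBuffer key = new StringBuffer();
        while (m.find()) {
            String digits = m.group();
            StringBuilder padded = new StringBuilder();
            for (int i = digits.length(); i < WIDTH; i++) {
                padded.append('0');
            }
            padded.append(digits);
            m.appendReplacement(key, padded.toString());
        }
        m.appendTail(key);
        return key.toString();
    }

    // Returns a copy of the input sorted by the padded key, using plain String order.
    static List<String> sortByKey(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparing(ZeroPadSketch::sortKey));
        return copy;
    }
}
```

Once the numbers are padded to the same width, ordinary String comparison orders them numerically. Numbers longer than the assumed width would break that property.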

A simple implementation would be like this one (this works with any string that ends with a number):

import java.util.Comparator;

public class SplitComparator implements Comparator<String> {

  static class Pair implements Comparable<Pair> {

      private String name;
      private Integer number;

      public Pair(String value) {       
        value = value.trim();
        this.name = value.substring( 0, value.lastIndexOf(" ") );
        this.number = Integer.valueOf( value.substring( value.lastIndexOf(" ") + 1, value.length() ) );
      }

      @Override
      public int compareTo( Pair right) {

        int result = this.name.compareTo( right.name );

        if ( result == 0 ) {
            result = this.number.compareTo( right.number );
        }

        return result;
      } 

  }

  @Override
  public int compare(String left, String right) {                       
    return new Pair( left ).compareTo( new Pair( right ) );
  }

  public static void main( String ... args ) {

    String[] values = { "State Lower Legislative District 1", 
            "State Lower Legislative District 11",
            "State Upper Legislative District 1",
            "State Upper Legislative District 11"};

    SplitComparator comparator = new SplitComparator();

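    System.out.println( comparator.compare( values[1] , values[0]) ); // prints 1: same name, 11 > 1
    System.out.println( comparator.compare( values[0] , values[1]) ); // prints -1: same name, 1 < 11
    System.out.println( comparator.compare( values[0] , values[3]) ); // negative: "Lower" sorts before "Upper"

  }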

}
Licensed under: CC-BY-SA with attribution