Question

I have the following C# code, compiled as Sort.exe:

using System;
using System.Collections.Generic;

class Test
{
    public static int Main(string[] args)
    {
        string text = null;
        List<string> lines = new List<string>();
        while((text = Console.In.ReadLine()) != null)
        {
            lines.Add(text);
        }

        lines.Sort();

        foreach(var line in lines)
            Console.WriteLine(line);

        return 0;
    }
}

I have a file input.txt with the following 5 lines as its content:

x000000000000000000093.000000000
x000000000000000000037.000000000
x000000000000000100000.000000000
x000000000000000000538.000000000
x-00000000000000000020.000000000

If I run it from the command prompt, this is the output:

C:\Users\girijesh\AppData\Local\Temp>sort < input.txt
x000000000000000000037.000000000
x000000000000000000093.000000000
x-00000000000000000020.000000000
x000000000000000000538.000000000
x000000000000000100000.000000000

I can't understand what kind of string sorting puts the string starting with x- (the 3rd line of the output) in the middle of the strings starting with x0. I would have expected that line to be either at the top or at the bottom. Excel shows the same behaviour.

Solution

lines.Sort() with no arguments uses a culture-sensitive comparison, and in many cultures (including the invariant culture) the hyphen carries only minor weight for sorting purposes. In most text this makes sense: pre-whatever and prewhatever are pretty similar. For example, the following list is sorted like this, which I think is good:

preasdf
prewhatever
pre-whatever
prezxcv

You seem to want an ordinal comparison, where strings are compared purely by the numeric Unicode values of their characters. If you change the sort call to:

lines.Sort(StringComparer.Ordinal);

Then your results are:

x-00000000000000000020.000000000
x000000000000000000037.000000000
x000000000000000000093.000000000
x000000000000000000538.000000000
x000000000000000100000.000000000
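
As a minimal, self-contained sketch of that change, here are the five lines from the question hard-coded in place of standard input (the class name OrdinalSortDemo is just illustrative):

```csharp
using System;
using System.Collections.Generic;

class OrdinalSortDemo
{
    public static void Main()
    {
        // The five input lines from the question, hard-coded for illustration.
        var lines = new List<string>
        {
            "x000000000000000000093.000000000",
            "x000000000000000000037.000000000",
            "x000000000000000100000.000000000",
            "x000000000000000000538.000000000",
            "x-00000000000000000020.000000000",
        };

        // Ordinal comparison looks only at character values:
        // '-' (U+002D) is lower than '0' (U+0030), so the x- line sorts first.
        lines.Sort(StringComparer.Ordinal);

        foreach (var line in lines)
            Console.WriteLine(line);
    }
}
```

This should print the lines in the order listed above, with the x- line at the top.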

If you're wondering why the -...20.0 value ended up where it did, consider what it'd look like if you removed the - (and compare with the above pre list).

x000000000000000000037.000000000
x000000000000000000093.000000000
x00000000000000000020.000000000
x000000000000000000538.000000000
x000000000000000100000.000000000
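
One rough way to see this in code is to sort ordinally on a key with the hyphens stripped out. This is only an approximation for illustration, not how the culture-sensitive comparer works internally, and the class name is made up:

```csharp
using System;
using System.Linq;

class IgnoreHyphenDemo
{
    public static void Main()
    {
        // The five input lines from the question, hard-coded for illustration.
        string[] lines =
        {
            "x000000000000000000093.000000000",
            "x000000000000000000037.000000000",
            "x000000000000000100000.000000000",
            "x000000000000000000538.000000000",
            "x-00000000000000000020.000000000",
        };

        // Approximation only: remove '-' from the sort key, then compare the keys ordinally.
        // The original lines (hyphen included) are what gets printed.
        var sorted = lines.OrderBy(line => line.Replace("-", ""), StringComparer.Ordinal);

        foreach (var line in sorted)
            Console.WriteLine(line);
    }
}
```

With the question's input, the x- line should land between the 093 and 538 lines, just as in the original output.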

If your input is always in the format x[some number], I'd parse the value after x as a decimal or double and sort on that. That makes the intended ordering explicit and keeps it independent of any string collation rules.
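
A minimal sketch of the numeric approach, assuming every line really is an x followed by a number that fits in a decimal. ParseValue is a hypothetical helper name, and it throws a FormatException on anything else:

```csharp
using System;
using System.Globalization;
using System.Linq;

class NumericSortDemo
{
    // Hypothetical helper: parse everything after the leading 'x' as a decimal.
    // InvariantCulture makes '.' the decimal separator regardless of machine settings.
    static decimal ParseValue(string line)
    {
        return decimal.Parse(line.Substring(1), NumberStyles.Number, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // The five input lines from the question, hard-coded for illustration.
        string[] lines =
        {
            "x000000000000000000093.000000000",
            "x000000000000000000037.000000000",
            "x000000000000000100000.000000000",
            "x000000000000000000538.000000000",
            "x-00000000000000000020.000000000",
        };

        // Sort by numeric value (-20, 37, 93, 538, 100000) but print the original text.
        foreach (var line in lines.OrderBy(line => ParseValue(line)))
            Console.WriteLine(line);
    }
}
```

For this input the result matches the ordinal order, but it would stay correct for cases an ordinal sort gets wrong, such as several negative values or numbers of different widths.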

Licensed under: CC-BY-SA with attribution