Question

A lot of sorting algorithms are based on comparisons of numbers. If I correctly understand, when we are using comparison algorithms for alphabetical sort, we compare char codes(their integer presentation) and sort depending on their values. (That's why at ASCII table letter B has a bigger code then A). But during this comparing we sort only by the first letter and not the whole word. When we use db query with ORDER BY we are getting sorting for the whole words. (As I understand the reason of it is db background mechanisms like indexes etc.). Also I heard about Radix sort (sorry, but never used it before) and as I can see it can help with alphabetical sort(maybe I'm wrong).

What algorithm is better to use for sorting by the whole words?

Not correct:

Adam
Aaron
Antony

Correct:

Aaron
Adam
Antony

And am I correct with my assumptions about the whole workflow?

Was it helpful?

Solution

You're not quite correct with the assumption about "Compare only the first letter". The algorithm is - if the first letters are the same, compare the next letter. And the next. And the next. Until either you find some letters that are different, or one of the strings runs out.

Also note that simply comparing by ASCII codes is not always enough. Sometimes you need to do a case-insensitive comparison where you consider A to be equal to a. Sometimes you need to do accent-insensitive comparison where you consider ā to be equal to a. And sometimes you need to account for crazy language shit where ß is equal to ss or worse.

My advice is - your programming language should probably have some mechanism for comparing strings. Use that. Don't roll out your own.

After that, any sorting algorithm will work. They all use one simple assumption - that you can compare the items that you sort. Whether they are integers, strings or complex objects, is irrelevant. As long as you can take any two objects and say "this one is bigger and this one is smaller", you're good to go.

(Note also that you need to be consistent about it. If A==B and B==C, then also you need to make sure that A==C. Similarly if A < B and B < C, then you must A < C. Etc.)

OTHER TIPS

No, the sorting is not based on first character or length. Alphabetical or better to put it as lexicographical ordering are done in the following way,

In C++ the comparison function would look like this,

bool operator<(const string &a, const string &b){
    int l = min(a.size(),b.size());
    for(int i = 0; i < l; i++){
        if( a[i] > b[i]) return false; // a is greater than b
        if( b[i] > a[i]) return true;  // b is greater than a
    }
    if ( a.size() > l) return false;   // a is greater than b
    return true;                       // b is greater than a
}

Every sorting algorithm works, if provided correct comparison function.

Just make sure the comparison between elements compares the whole strings instead of looking at only the 1st character when you implement your favorite sorting algorithm or call your sorting library function of choice.

As you already mentioned the only difference between sorting Strings instead of numbers is the compare method used by nearly all sorting algorithms (Radix-sort, bucket-sort are exceptions). Most times one of the fastest sorting algorithms is quick-sort.

You just have to implement comparing function which any sort algorithm can use. In that compare function you will compare whole words, letter by letter, as you need it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top