Question

I have a school project where I need to sort all kind of data types with different sorting algorithms. Radix sort works well but it can't sort anything else than integers. I'm probably not going to add sorting results for anything else than integer since every data type will get sorted as integers.

That said, I'd like to know if there is a better way to convert strings to integers? Here's what I came with. I didn't want to outsmart python and tried to use standard function as much as possible.

def charToHex(char):
    return hex(ord(char))[2:]

def stringToHex(text):
    t = ''
    for char in text:
        t += charToHex(char)

    return t

def stringToInt(text):
    return int(stringToHex(text), 16)

print stringToInt('allo')
print stringToInt('allp')
print stringToInt('all')

It does work well but I'd be happy to know if there is a better way to handle that. For what it's worth, sorting anything else than integers with the radix sort sounds pointless. Because even if you can sort a list of integers. You'll have to get the values for all keys back into the list.

I had in mind to do something like that. For each value in my list, get an integer key. Put that key inside a hashtable and the value in a list for that hash table. Replace the value in the list by the integer key and then sort the list of keys.

For each keys in the sorted list, get the list of value for that key and pop one item. Put that item inside the list and continue.

I'd also like to know if there is a way to optimize this process in a way to make it worth it using radix sort instead of an other sort that doesn't require any conversion. The amount of item in the list may go beyond 50000.

edit

Actually the code here doesn't work for strings of different sizes. I'm not so sure how to check that. Padding the strings with space seems to work.

def getMaxLen(ls):
    lenght = 0

    for text in ls:
        lenght = max(lenght, len(text))

    return lenght

def convertList(ls):
    size = getMaxLen(ls)
    copy = ls[:]

    for i, val in enumerate(copy):
        copy[i] = stringToInt(val.ljust(size, ' '))

    return copy

print convertList(["allo", "all", "bal"])
Was it helpful?

Solution

First, take a look at this article. That article shows that yes, in some cases, you can figure out a radix sort algorithm for strings that is faster than any other sort.

Second, and more importantly, I'd ask yourself if you are doing premature optimization. Sorting 50k items with python's sort() function is going to be incredibly fast. Unless you're sure this is a bottleneck in your application I wouldn't worry about it and would just use the sort() function. If it is a bottleneck, I'd also make sure there isn't someway you can avoid doing all of these sorts (e.g. caching, algorithms that work on unsorted data, etc.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top