How to encode a number as a string such that the lexicographic order of the generated string is in the same order as the numeric order

StackOverflow https://stackoverflow.com/questions/12751624

  •  05-07-2021
  •  | 
  •  

Question

For eg. if we have two strings 2 and 10, 10 will come first if we order lexicographically.

The very trivial sol will be to repeat a character n number of time.

eg. 2 can be encoded as aa

10 as aaaaaaaaaa

This way the lex order is same as the numeric one.

But, is there a more elegant way to do this?

Was it helpful?

Solution

When converting the numbers to strings make sure that all the strings have the same length, by appending 0s in the front if necessary. So 2 and 10 would be encoded as "02" and "10".

OTHER TIPS

While kjampani's solution is probably the best and easiest in normal applications, another way which is more space-efficient is to prepend every string with its own length. Of course, you need to encode the length in a way which is also consistently sorted.

If you know all the strings are fairly short, you can just encode their length as a fixed-length base-X sequence, where X is the number of character codes you're willing to use (popular values are 64, 96, 255 and 256.) Note that you have to use the character codes in lexicographical order, so normal base64 won't work.

One variable-length order-preserving encoding is the one used by UTF-8. (Not UTF-8 directly, which has a couple of corner cases which will get in the way, but the same encoding technique. The order-preserving property of UTF-8 is occasionally really useful.) The full range of such compressed codes can encode values up to 42 bits long, with an average of five payload bits per byte. That's sufficient for pretty long strings; four terabyte long strings are pretty rare in the wild; but if you need longer, it's possible, too, by extending the size prefix over more than one byte.

Break the string into successive sub strings of letters and numbers and then sort by comparing each substring as an integer if it's an numeric string

"aaa2" ---> aaa + 2

"aaa1000" ---> aaa + 1000

aaa == aaa

Since they're equal, we continue:

1000 > 2

Hence, aaa1000 > aaa2.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top