How do I convert a number to a custom 2char BaseX and back? (aka: How to do Azure Table property compression)

StackOverflow https://stackoverflow.com/questions/3740865

Question

Similar to how one counts from 0 to F in hex, I have an array of numbers and letters I want to "count" through, and when I hit the maximum value, I want to start over again in the "tens" column.

I need this to increase storage efficiency in Azure Table, and to keep my PrimaryKeys tiny (so I can use them in a tinyURL). First consider that only these characters are permitted as a propertyName, as documented here. In the array below, each character is positioned according to how Azure will sort it.

  public static string[] AzureChars = new string[]
   {
        "0","1","2","3","4","5","6","7","8","9","A",
        "B","C","D","E","F","G","H","I",
        "J","K","L","M","N","O","P","Q",
        "R","S","T","U","V","W","X","Y",
        "Z","a","b","c","d","e","f","g",
        "h","i","j","k","l","m","n","o",
        "p","q","r","s","t","u","v","w",
        "x","y","z"       
   };

My goal is to use 2 string/ASCII characters to count from the string "00" to lowercase "zz".

What is the best way to approach this concept using C#?
-- Is an array the correct object to use?
-- How will I associate a given character (uppercase 'Y') with its position in the array?

I'm just experimenting with this idea. At first brush it seems like a good one, but I haven't seen anyone consider doing things this way. What do you think?

Solution

Your question is really about converting a number into a two-digit base-62 number. Here is a general snippet of code for converting a non-negative number into an arbitrary base:

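var n = 1234;
var baseNumber = 62;
var numberOfDigits = 2;
var digits = new Int32[numberOfDigits];
// digits[0] receives the least significant digit.
for (var i = 0; i < digits.Length; i += 1) {
  digits[i] = n%baseNumber;
  n /= baseNumber;
}
// For n = 1234 this leaves digits = { 56, 19 }.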

You then have to map the digits to characters; a lookup table or a small function is suitable for that.
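
As a minimal sketch of the "small function" option, the mapping can be done with character arithmetic instead of a table. The class and method names below (Base62Digit, ToChar, ToDigit) are made up for illustration, and the code assumes the 62-character alphabet from the question in ASCII order.

```csharp
using System;

public static class Base62Digit
{
    // Maps a digit value 0..61 to '0'-'9', 'A'-'Z', 'a'-'z' (ASCII order).
    public static char ToChar(int digit)
    {
        if (digit < 0 || digit > 61)
            throw new ArgumentOutOfRangeException(nameof(digit));
        if (digit < 10) return (char)('0' + digit);
        if (digit < 36) return (char)('A' + digit - 10);
        return (char)('a' + digit - 36);
    }

    // Inverse of ToChar; returns -1 for characters outside the alphabet.
    public static int ToDigit(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
        if (c >= 'a' && c <= 'z') return c - 'a' + 36;
        return -1;
    }
}
```

For n = 1234 the loop above produces the digit values 56 and 19, least significant first. ToChar(19) is 'J' and ToChar(56) is 'u', so read most significant first the result is "Ju".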

For your specific problem with the additional feature of having a variable number of digits I would write this code:

var n = 123456; 
var digitCount = 3;
var digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
var number = String.Empty;
for (var i = 0; i < digitCount; ++i) {
  number = digits[n%digits.Length] + number;
  n /= digits.Length;
}

Note that this code will convert 0 into 000, 1 into 001 etc. but I think that is actually what you want.
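
The padding matters for sorting. Assuming the store compares keys as plain ordinal strings (which is what the sorted array in the question suggests), fixed-width values sort in the same order as the numbers they encode. The sketch below wraps the loop above in a hypothetical Encode helper and checks that property with string.CompareOrdinal; PaddingDemo and SortsLikeNumbers are illustrative names only.

```csharp
using System;

public static class PaddingDemo
{
    private const string Digits =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Same loop as above: fixed-width base 62, padded with leading '0'.
    public static string Encode(int n, int digitCount)
    {
        var number = string.Empty;
        for (var i = 0; i < digitCount; ++i)
        {
            number = Digits[n % Digits.Length] + number;
            n /= Digits.Length;
        }
        return number;
    }

    // True when ordinal string order agrees with numeric order of a and b.
    public static bool SortsLikeNumbers(int a, int b, int digitCount) =>
        Math.Sign(string.CompareOrdinal(Encode(a, digitCount), Encode(b, digitCount)))
        == Math.Sign(a.CompareTo(b));
}
```

With two digits, 61 becomes "0z" and 62 becomes "10", and "0z" sorts before "10". Without the padding, "z" would sort after "10" even though 61 is smaller than 62.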

To convert back you can use this code:

var n = 0;
for (var i = 0; i < number.Length; ++i)
  n = n*digits.Length + digits.IndexOf(number[i]);

String.IndexOf() isn't the most efficient way to do the conversion, but in most cases it should be OK.
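
If the IndexOf scan ever does become a bottleneck, one possible alternative is a reverse lookup table indexed by character code. This is only a sketch: Base62Decoder and its members are hypothetical names, and the choice to throw FormatException on an unknown character is an assumption (the IndexOf version would quietly use -1 instead).

```csharp
using System;

public static class Base62Decoder
{
    private const string Digits =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Reverse[c] holds the digit value of character c, or -1 if c is not a digit.
    private static readonly int[] Reverse = BuildReverse();

    private static int[] BuildReverse()
    {
        var table = new int[128];               // covers the ASCII range
        for (var i = 0; i < table.Length; ++i) table[i] = -1;
        for (var i = 0; i < Digits.Length; ++i) table[Digits[i]] = i;
        return table;
    }

    public static int Decode(string number)
    {
        var n = 0;
        foreach (var c in number)
        {
            var d = c < Reverse.Length ? Reverse[c] : -1;
            if (d < 0) throw new FormatException($"Invalid digit '{c}'.");
            n = n * Digits.Length + d;
        }
        return n;
    }
}
```

Each character now costs a single array read rather than a scan of up to 62 characters.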

Note that if your original number is greater than the largest number that can be stored in your base-62 number, the conversion back will result in a different number. For 3 digits in base 62, this happens when the original number is greater than zzz = 62^3 - 1 = 238327.
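
One way to avoid that silent wrap-around is to reject out-of-range input before encoding. The following is a sketch under assumptions: Base62Checked, MaxValue and Encode are illustrative names, and throwing ArgumentOutOfRangeException is just one reasonable policy.

```csharp
using System;

public static class Base62Checked
{
    private const string Digits =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Largest value that fits in digitCount digits, e.g. 238327 for 3 digits.
    // Assumes digitCount <= 10; beyond that 62^digitCount overflows a long.
    public static long MaxValue(int digitCount)
    {
        long limit = 1;
        for (var i = 0; i < digitCount; ++i) limit *= Digits.Length;
        return limit - 1;
    }

    // Encodes n as exactly digitCount characters, or throws if it does not fit.
    public static string Encode(long n, int digitCount)
    {
        if (n < 0 || n > MaxValue(digitCount))
            throw new ArgumentOutOfRangeException(nameof(n));
        var chars = new char[digitCount];
        for (var i = digitCount - 1; i >= 0; --i)
        {
            chars[i] = Digits[(int)(n % Digits.Length)];
            n /= Digits.Length;
        }
        return new string(chars);
    }
}
```

With this guard, Encode(238327, 3) gives "zzz", while Encode(238328, 3) throws instead of wrapping around to "000".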

OTHER TIPS

Since the elements of your array are all single characters, you can probably declare it as an array of characters:

public static char[] AzureChars = new char[]
{
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D',
    'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R',
    'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
    'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
    'u', 'v', 'w', 'x', 'y', 'z'
};

Now you can easily write a function that returns the entire set of all n-character strings for any desired string length n. My version is recursive; if you find that it’s too slow for longish strings, you can probably optimise it:

// Requires: using System.Linq;
public static IEnumerable<string> AzureStrings(int desiredLength)
{
    if (desiredLength == 0)
        return new[] { "" };
    return AzureChars.SelectMany(ch => AzureStrings(desiredLength - 1)
                                       .Select(str => ch + str));
}

Now we can generate any chunk of the sequence using Skip and Take:

// Prints “4q, 4r, 4s, 4t, 4u, 4v, 4w, 4x, 4y, 4z”
Console.WriteLine(string.Join(", ", AzureStrings(2).Skip(300).Take(10)));
// Prints “3844”
Console.WriteLine(AzureStrings(2).Count());

Despite the fact that this computes the first 300 elements before outputting anything, it is plenty fast enough for me. Even this crazy computation here takes less than a second:

// Prints “4C92, 4C93, 4C94, 4C95, 4C96, 4C97, 4C98, 4C99, 4C9A, 4C9B”
Console.WriteLine(string.Join(", ", AzureStrings(4).Skip(1000000).Take(10)));
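
If the recursion does turn out to be too slow, one possible optimisation is to count like an odometer: bump the rightmost position and carry to the left whenever a position runs past 'z'. This is a sketch rather than a drop-in replacement; AzureCounter and Count are hypothetical names, and the alphabet is assumed to be the full 62-character set from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AzureCounter
{
    private const string Digits =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Yields every fixed-length string from "00...0" to "zz...z" in order,
    // advancing like an odometer instead of recursing.
    public static IEnumerable<string> Count(int length)
    {
        var indexes = new int[length];          // all zero: "00...0"
        var chars = new char[length];
        while (true)
        {
            for (var i = 0; i < length; ++i) chars[i] = Digits[indexes[i]];
            yield return new string(chars);

            // Increment the rightmost digit, carrying left on overflow.
            var pos = length - 1;
            while (pos >= 0 && ++indexes[pos] == Digits.Length)
            {
                indexes[pos] = 0;
                --pos;
            }
            if (pos < 0) yield break;           // carried out of the top digit
        }
    }
}
```

Count(2) starts with "00", "01", "02" and moves from "0z" to "10", which is the "start over in the tens column" behaviour the question describes. It composes with Skip and Take in the same way as the recursive version.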

Use the modulus for that (it gives you the remainder):

        int i = AzureChars.Length;
        int index = 62; // index of the character to look up
        string a = AzureChars[index % i];

Get the index of a char:

        int index = Array.IndexOf(AzureChars, "Y");

For example:

        string text = "YY";
        int index1 = Array.IndexOf(AzureChars, text[1].ToString());
        int index2 = Array.IndexOf(AzureChars, text[0].ToString());

Perhaps you should use a char array (char[]) instead, or just a long string like:

 static string AzureChars= "0123456789.....qrstuvwxyz";

All together, to make it clear:

    static void Main(string[] args)
    {
        char[] b = AzureCharConverter.ToCharArray(522);      // { '8', 'Q' }
        int i = AzureCharConverter.ToInteger(new string(b)); // 522
    }

    public static class AzureCharConverter
    {
        private static readonly string _azureChars
            = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

        // Converts a two-character string back to its integer value.
        public static int ToInteger(string chars)
        {
            int l = _azureChars.IndexOf(chars[0]);
            int r = _azureChars.IndexOf(chars[1]);
            return (l * _azureChars.Length) + r;
        }

        // Converts a value in the range 0..3843 to two characters.
        public static char[] ToCharArray(int value)
        {
            char l = _azureChars[value / _azureChars.Length];
            char r = _azureChars[value % _azureChars.Length];
            return new char[] { l, r };
        }
    }

This works provided that the input string is always two characters long and the value is always less than 3844 (62^2).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow