Question

I get a possibly large number (up to UInt64.MaxValue: 18446744073709551615) as a normal base-10 number. This number will eventually become a filename: 12345678945768.txt

Since filenames on Windows aren't limited to just numerical digits, I would like to "compress" this into a shorter string, but I need to make sure the string can be mapped back to the number.

For smaller numbers such as 0001365555, hex is much shorter than anything else. Everything I've found so far states that Base64 would be shortest, but it isn't.

So far I've tried this:

//18446744073709551615 - 20
UInt64 i = UInt64.MaxValue; // 0001365555

//"//////////8=" - 12
string encoded = Convert.ToBase64String(BitConverter.GetBytes(i)); 

//"FFFFFFFFFFFFFFFF" - 16
string hexed = i.ToString("X"); 

//"MTg0NDY3NDQwNzM3MDk1NTE2MTU=" - 28
string utf = Convert.ToBase64String(System.Text.Encoding.ASCII.GetBytes(i.ToString())); 

Is there a better way to "compress" an integer, similar to hex, but using digits 00-zz rather than just 00-FF?

Thanks in advance!


Solution

Everything I've found so far states that Base64 would be shortest, but it isn't.

You don't want to use Base64. Base64 encoded text can use the / character, which is disallowed in file names on Windows. You need to come up with something else.

What else?

Well, you could write your own base conversion, perhaps something like this:

// Requires: using System.Text;
public static string Convert(ulong number)
{
    // Alphabet of characters that are legal in Windows file names.
    const string validCharacters = "qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM1234567890!@#$%^&()_-";
    ulong radix = (ulong)validCharacters.Length;

    // Without this check, zero would encode as an empty string.
    if (number == 0)
        return validCharacters[0].ToString();

    var buffer = new StringBuilder();
    var quotient = number;
    while (quotient != 0)
    {
        // Peel off the least significant digit and prepend it.
        ulong remainder = quotient % radix;
        quotient /= radix;
        buffer.Insert(0, validCharacters[(int)remainder]);
    }
    return buffer.ToString();
}

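This produces a "base-73" result. The more characters in validCharacters, the shorter the output will be. Feel free to add more, so long as they are legal characters in your file system. Keep in mind that Windows file names are case-insensitive by default, so an alphabet that mixes upper- and lowercase letters can map two different numbers to the same file.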
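The question also needs the mapping back to a number. A minimal sketch of the reverse direction, assuming the same 73-character alphabet; the class name FileNameCodec and the method name Decode are made up for illustration and are not part of the answer above:

```csharp
using System;

public static class FileNameCodec
{
    // Same alphabet as the encoder; repeated so the sketch is self-contained.
    private const string ValidCharacters =
        "qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM1234567890!@#$%^&()_-";

    // Hypothetical helper: turns an encoded string back into the number.
    public static ulong Decode(string text)
    {
        if (text == null)
            throw new ArgumentNullException(nameof(text));

        ulong radix = (ulong)ValidCharacters.Length;
        ulong result = 0; // an empty string decodes to 0
        foreach (char c in text)
        {
            int digit = ValidCharacters.IndexOf(c);
            if (digit < 0)
                throw new FormatException($"Unexpected character '{c}'.");

            // checked: throw OverflowException instead of silently wrapping
            // when the text encodes a value larger than ulong.MaxValue.
            result = checked(result * radix + (ulong)digit);
        }
        return result;
    }
}
```

For example, FileNameCodec.Decode("wq") returns 73, because 'w' is digit 1 and 'q' is digit 0 in this alphabet.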

OTHER TIPS

What is your allowed character set? If you could identify 7132 different Unicode characters that were safe to use, you could encode a 64-bit number as five Unicode characters. On the other hand, not all file systems will support such characters. If you could identify 139 legal characters, you could compress the data to a nine-character string. With 85, you could use a ten-character string.
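The figures above can be checked with a little arithmetic: the shortest fixed length for an alphabet of k characters is the smallest n with k^n >= 2^64. A small sketch of that check follows; the names AlphabetMath and MinLength are hypothetical:

```csharp
using System;
using System.Numerics;

public static class AlphabetMath
{
    // Smallest n such that alphabetSize^n covers all 2^64 values of a ulong.
    public static int MinLength(int alphabetSize)
    {
        if (alphabetSize < 2)
            throw new ArgumentOutOfRangeException(nameof(alphabetSize));

        BigInteger needed = BigInteger.One << 64; // number of distinct ulong values
        BigInteger capacity = BigInteger.One;     // values representable so far
        int length = 0;
        while (capacity < needed)
        {
            capacity *= alphabetSize;
            length++;
        }
        return length;
    }
}
```

MinLength(7132) gives 5, MinLength(139) gives 9 and MinLength(85) gives 10, matching the numbers quoted above. For comparison, 16 characters (hex) give 16 and 36 characters give 13.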

You misused Base64.

System.Text.Encoding.ASCII.GetBytes(i.ToString())

This produces a byte sequence that contains the base-10 digits of the integer as text, and then encodes that text again in Base64. That's obviously inefficient.

You need to get the raw bytes of your integer and encode them with base64. Which encoding is the most efficient depends on how many characters you want to allow. If you want the sho

You should also trim the zero bytes from the high end of the array.

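var bytes = BitConverter.GetBytes(input);
// Find the highest non-zero byte; keep at least one byte so that 0 still encodes.
int len = 1;
for (int i = 7; i > 0; i--)
{
  if (bytes[i] != 0)
  {
    len = i + 1;
    break;
  }
}
// '/' is not allowed in Windows file names, so swap it for '-'.
string s = Convert.ToBase64String(bytes, 0, len).Replace('/', '-');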

Note that this will not work as expected on big-endian systems.
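Reading such a string back means undoing each step in reverse order. A rough sketch, assuming the same '/' to '-' substitution and a little-endian machine; the class name TrimmedBase64 and its Decode method are invented for this example:

```csharp
using System;

public static class TrimmedBase64
{
    // Hypothetical inverse of the snippet above: undo the '/' -> '-' swap,
    // decode, then pad the bytes back out to 8 before converting.
    public static ulong Decode(string s)
    {
        byte[] trimmed = Convert.FromBase64String(s.Replace('-', '/'));
        if (trimmed.Length > 8)
            throw new FormatException("Too many bytes for a 64-bit value.");

        var bytes = new byte[8]; // the trimmed high bytes stay zero
        Array.Copy(trimmed, bytes, trimmed.Length);
        return BitConverter.ToUInt64(bytes, 0);
    }
}
```

On a little-endian machine, TrimmedBase64.Decode("AQ==") returns 1 and TrimmedBase64.Decode("----------8=") returns ulong.MaxValue.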

But perhaps you should avoid byte encodings altogether, and just use integer encodings with a higher base.

A simple version might be:

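string digitChars = "0123..."; // the rest of the alphabet goes here
ulong radix = (ulong)digitChars.Length;
string result = "";
while (i != 0)
{
  // Take the least significant digit and prepend its character.
  int digit = (int)(i % radix);
  i /= radix;
  result = digitChars[digit] + result;
}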

Here's some code that uses vcsjones's answer above, but also includes the reverse conversion. As in his answer, feel free to add more characters if needed to reduce the string size. The characters below produce a string of length 13 for ulong.MaxValue.

private const string _conversionCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

public static string UlongToCompressedString(ulong number)
{
    char[] charArray = _conversionCharacters.ToCharArray();
    var buffer = new System.Text.StringBuilder();
    var quotient = number;
    ulong remainder;
    while (quotient != 0)
    {
        remainder = quotient % (ulong)charArray.LongLength;
        quotient = quotient / (ulong)charArray.LongLength;
        buffer.Insert(0, charArray[remainder].ToString());
    }
    return buffer.ToString();
}

public static ulong? CompressedStringToULong(string compressedNumber)
{
    if (compressedNumber == null)
        return null;

    if (compressedNumber.Length == 0)
        return 0;

    ulong result   = 0;
    int   baseNum  = _conversionCharacters.Length;
    ulong baseMult = 1;

    for (int i=compressedNumber.Length-1; i>=0; i--)
    {
        int cPos = _conversionCharacters.IndexOf(compressedNumber[i]);
        if (cPos < 0)
            return null;
        result += baseMult * (ulong)cPos;
        baseMult *= (ulong)baseNum;
    }

    return result;
}
Licensed under: CC-BY-SA with attribution