Question

A while back I learned of these great algorithms:

function parseInt(value, code) {
  let x = 0
  let i = 0
  while (i < value.length) {
    const a = value[i]
    // Horner's rule: shift the accumulator up one digit, then add the next
    x = x * code.length + code.indexOf(a)
    i++
  }
  return x
}

function toString(value, code) {
  const radix = code.length
  let result = ''

  do {
    // peel off the least-significant digit and prepend its character
    const digit = value % radix
    result = code[digit] + result
    value = Math.floor(value / radix)
  } while (value)

  return result
}

which allow you to convert any non-negative integer (within JavaScript's safe-integer range) to a string over a custom "alphabet" or "code" with toString, and do the reverse with parseInt, parsing the corresponding string back to an int.

console.log(parseInt('dj', 'abcdefghijklmnopqrstuvwxyz0123456789+-'));
// 123
console.log(toString(123, 'abcdefghijklmnopqrstuvwxyz0123456789+-'));
// dj
console.log(parseInt('a', 'abcdefghijklmnopqrstuvwxyz0123456789+-'));
// 0
console.log(toString(0, 'abcdefghijklmnopqrstuvwxyz0123456789+-'));
// a

How do you prove that these algorithms actually do the trick? What does the proof look like from first principles? Tangent/bonus: how would you ever go about figuring this out from scratch? To me this is pure magic, and I would never have been able to figure it out "from scratch". I would like to know what I could have done to figure this out using mathematical techniques/principles.


Solution

First, you need to figure out what it is you want to prove. Here, I'd say we want to prove that

toString(parseInt(x, code), code) = x

and

parseInt(toString(x, code), code) = x
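
Stated as executable checks (a minimal sketch, assuming the question's parseInt and toString are in scope), these identities say the two round trips are lossless:

const code = 'abcdefghijklmnopqrstuvwxyz0123456789+-';

// string -> number -> string
console.log(toString(parseInt('dj', code), code) === 'dj');
// true

// number -> string -> number
console.log(parseInt(toString(123, code), code) === 123);
// true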

It's pretty obvious that trying to prove this is going to fail if:

  • we overflow the result in parseInt (past Number.MAX_SAFE_INTEGER), because then parseInt isn't injective
  • the input to parseInt contains characters that aren't in the code, because indexOf returns -1 for them
  • the input to parseInt starts with the character code[0] (a leading "zero"), because toString never produces leading zeros
  • code contains the same character twice, because then indexOf cannot recover every digit

So, our proof has to assume that none of these is the case.
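
A quick sketch of these failure modes (again assuming the question's parseInt and toString are in scope, with the 38-character alphabet from the question):

const code = 'abcdefghijklmnopqrstuvwxyz0123456789+-';

// Overflow: twelve 'z's encode a value around 6e18, far past
// Number.MAX_SAFE_INTEGER (2^53 - 1), so precision is lost
console.log(Number.isSafeInteger(parseInt('zzzzzzzzzzzz', code)));
// false

// A character outside the code: indexOf returns -1 and corrupts x
console.log(parseInt('a?', code));
// -1

// A leading "zero": toString never emits a leading code[0]
console.log(toString(parseInt('aa', code), code));
// 'a', not 'aa'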

The first proof is probably easier than the second. We can do induction on the length of the input string (called value in parseInt; x below):

toString(parseInt(x, code), code) = x

Given an arbitrary code that contains all characters of x:

If x.length == 0:

  • x == ''
  • parseInt(x, code) returns 0
  • toString(0, code) returns code[0] ('a' here), not '', so the proof breaks for the empty string; take strings of length 1 as the base case instead, where parseInt(code[i], code) = i and toString(i, code) = code[i]

If x.length > 1:

  • our induction hypothesis: for every proper nonempty prefix y of x, toString(parseInt(y, code), code) = y
  • parseInt(x, code) = parseInt(x[:-1], code) * code.length + code.indexOf(x[-1]) (I am using Python slice syntax: x[:-1] is everything but the last character and x[-1] is the last character. Strictly, this step would itself have to be proved, but it is immediate if the algorithm is written recursively instead of iteratively; see the recursive sketch after the proof)
  • toString(a * code.length + b, code) = toString(a, code) + code[b] for a > 0 and 0 <= b < code.length: the first iteration of the do/while loop computes digit = b and sets result = code[b], and because result = code[digit] + result prepends, the remaining iterations build the digits of a in front of it (again, this would be straightforward for a recursive version, but it should be convincing enough here)
  • since x has no leading code[0] and x.length > 1, parseInt(x[:-1], code) > 0, so that identity applies with a = parseInt(x[:-1], code) and b = code.indexOf(x[-1])
  • that means toString(parseInt(x, code), code) = toString(parseInt(x[:-1], code) * code.length + code.indexOf(x[-1]), code) = toString(parseInt(x[:-1], code), code) + code[code.indexOf(x[-1])] = (using the induction hypothesis) x[:-1] + x[-1] = x

QED.
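
For reference, here is a recursive restatement of both functions (a sketch; parseIntRec and toStringRec are names I am introducing, equivalent to the question's loops for valid inputs). The two identities used in the induction step are literally its recursive cases:

function parseIntRec(value, code) {
  // base case: a single character is its own digit value
  if (value.length === 1) return code.indexOf(value)
  // parseInt(x) = parseInt(x[:-1]) * code.length + code.indexOf(x[-1])
  return parseIntRec(value.slice(0, -1), code) * code.length + code.indexOf(value[value.length - 1])
}

function toStringRec(value, code) {
  // base case: a single digit is its own character
  if (value < code.length) return code[value]
  // toString(a * code.length + b) = toString(a) + code[b]
  return toStringRec(Math.floor(value / code.length), code) + code[value % code.length]
}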

The other direction is similar: you can again use induction, as long as you take care that your induction hypothesis is "for all y < x: parseInt(toString(y, code), code) = y" (i.e., strong induction on the value).
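
Not a proof, but a cheap sanity check of this direction is to sweep every value below some bound (again assuming the question's functions are in scope):

const code = 'abcdefghijklmnopqrstuvwxyz0123456789+-';

// check parseInt(toString(y, code), code) === y for all small y
for (let y = 0; y < 100000; y++) {
  if (parseInt(toString(y, code), code) !== y) throw new Error('round trip failed at ' + y)
}
console.log('all round trips below 100000 hold');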

Licensed under: CC-BY-SA with attribution