Question

I have found through various Stack Exchange Q&As that a Base64-encoded 256-bit number will have a single = as padding, and that the character just before it can only be one of AEIMQUYcgkosw048.

I'm fairly confident that a Base64-encoded 512-bit number will have two = characters of padding, because of the remainder left when the bits are divided into Base64 groups.
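For reference, the padding count depends on the number of input bytes modulo 3, since each 4-character group encodes 3 bytes. A minimal Java sketch, assuming standard padded Base64 (RFC 4648) and exactly 32 or 64 input bytes; the class and method names are made up for illustration:

```java
// Sketch: '=' padding and output length for n input bytes in standard padded Base64.
final class PaddingMath {
    // Every 3 input bytes become 4 characters; a final partial group of
    // 2 bytes gets one '=', a final partial group of 1 byte gets two.
    static int paddingChars(int byteCount) {
        return (3 - byteCount % 3) % 3;
    }

    // Padded output length: one 4-character group per started 3-byte group.
    static int encodedLength(int byteCount) {
        return 4 * ((byteCount + 2) / 3);
    }

    public static void main(String[] args) {
        for (int bits : new int[] {256, 512}) {
            int bytes = bits / 8;
            System.out.println(bits + "-bit: " + bytes + " bytes, remainder " + (bytes % 3)
                    + ", padding " + paddingChars(bytes) + ", length " + encodedLength(bytes));
        }
    }
}
```

Running main prints remainder 2, one = and length 44 for the 256-bit case, and remainder 1, two = and length 88 for the 512-bit case.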

For Base64-encoded 512-bit numbers, what is the range of possible values for the final (non-padding) character? I assumed the remainder of the bit count is the same in both cases; does that mean the final character range is the same for both 256-bit and 512-bit encodings?

This is for saving space and for regex matching of readable Ed25519 signatures.


Specifically, I'm converting Java byte[64]s to Strings with org.apache.commons.codec.binary.Base64's encodeBase64.
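A comparable conversion can be sketched with the JDK's own java.util.Base64 (Java 8+) instead of commons-codec. This substitution is an assumption, made so the example needs no external dependency; it relies on both producing standard-alphabet output with = padding. The class name is hypothetical, and the random bytes are only a placeholder for a real signature:

```java
import java.security.SecureRandom;
import java.util.Base64;

// Sketch: encode a 64-byte array (the size of an Ed25519 signature) as padded Base64 text.
// Uses java.util.Base64 as a stand-in for commons-codec (assumption, see lead-in).
final class SignatureEncoding {
    static String encode(byte[] signature) {
        return Base64.getEncoder().encodeToString(signature);
    }

    public static void main(String[] args) {
        byte[] sig = new byte[64];
        new SecureRandom().nextBytes(sig); // random placeholder, not a real signature
        String s = encode(sig);
        System.out.println(s.length() + " chars, ends with \"==\": " + s.endsWith("=="));
    }
}
```

Whatever the random bytes are, the output is 88 characters long and ends with ==.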


Solution

I am assuming here that the 256-bit and 512-bit numbers in question are encoded using exactly 32 or 64 bytes respectively (i.e. no dropping of leading zeros, no extra leading zero byte added to avoid signed/unsigned issues, no ASN.1 BER encoding header, ...).
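To illustrate why the exact-width assumption matters: in Java, BigInteger.toByteArray() returns the minimal two's-complement encoding, which can be shorter than 32 bytes (leading zeros dropped) or 33 bytes long (extra 0x00 sign byte). Below is a minimal sketch of a hypothetical helper, toFixedBytes, that trims or left-pads to a fixed unsigned width; the big-endian byte order is an assumption:

```java
import java.math.BigInteger;

// Hypothetical helper: force a non-negative BigInteger into exactly 'width' big-endian bytes.
final class FixedWidth {
    static byte[] toFixedBytes(BigInteger value, int width) {
        if (value.signum() < 0 || value.bitLength() > width * 8) {
            throw new IllegalArgumentException("value does not fit in " + width + " unsigned bytes");
        }
        byte[] minimal = value.toByteArray(); // two's complement, minimal length
        byte[] out = new byte[width];
        // Copy the low-order bytes right-aligned: this drops a leading 0x00 sign byte
        // and leaves zeros in front of short encodings.
        int copy = Math.min(minimal.length, width);
        System.arraycopy(minimal, minimal.length - copy, out, width - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        BigInteger big = BigInteger.ONE.shiftLeft(255); // top bit of a 256-bit number set
        System.out.println(big.toByteArray().length + " vs " + toFixedBytes(big, 32).length); // 33 vs 32
        System.out.println(BigInteger.ONE.toByteArray().length + " vs "
                + toFixedBytes(BigInteger.ONE, 32).length); // 1 vs 32
    }
}
```

Encoding the 33-byte or 1-byte arrays directly would give different lengths and different padding than the analysis here assumes.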

Base64 uses 4 characters for each byte triple, each character representing 6 bits of the data:

        byte #1    |    byte #2    |    byte #3
bit 7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0

becomes

bit 5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0
      char #1  |  char #2  |  char #3  |  char #4
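The regrouping in the diagram can be sketched in Java by packing the three bytes into one 24-bit int and shifting out 6 bits at a time. The class and method names are hypothetical, and mapping the resulting indices to characters is a separate table lookup:

```java
// Sketch: three 8-bit bytes -> four 6-bit indices, following the bit diagram.
final class Regroup {
    static int[] sextets(byte b1, byte b2, byte b3) {
        // Mask with 0xFF so negative Java bytes are not sign-extended.
        int triple = ((b1 & 0xFF) << 16) | ((b2 & 0xFF) << 8) | (b3 & 0xFF);
        return new int[] {
            (triple >>> 18) & 0x3F, // char #1: bits 7..2 of byte #1
            (triple >>> 12) & 0x3F, // char #2: bits 1..0 of byte #1, bits 7..4 of byte #2
            (triple >>> 6) & 0x3F,  // char #3: bits 3..0 of byte #2, bits 7..6 of byte #3
            triple & 0x3F           // char #4: bits 5..0 of byte #3
        };
    }

    public static void main(String[] args) {
        // Prints [19, 22, 5, 46]
        System.out.println(java.util.Arrays.toString(sextets((byte) 'M', (byte) 'a', (byte) 'n')));
    }
}
```

For the bytes of "Man" this yields the indices 19, 22, 5 and 46, which the standard table maps to "TWFu".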

Which character is used for which 6-tuple of bits is specified by a table; cf., e.g., the Wikipedia article.
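As a sketch, the standard table (the RFC 4648 alphabet, as shown in the Wikipedia article) can be written as a 64-character string indexed by the 6-bit value. The names are illustrative only:

```java
// Sketch: standard Base64 alphabet, index 0..63 -> character.
final class Alphabet {
    static final String TABLE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"   // indices 0..25
        + "abcdefghijklmnopqrstuvwxyz" // indices 26..51
        + "0123456789"                 // indices 52..61
        + "+/";                        // indices 62 and 63

    static char charFor(int sextet) {
        return TABLE.charAt(sextet); // throws for indices outside 0..63
    }

    static int indexOf(char c) {
        return TABLE.indexOf(c); // -1 if c is not a Base64 digit, e.g. '='
    }

    public static void main(String[] args) {
        System.out.println(charFor(19) + " " + indexOf('w')); // T 48
    }
}
```

Note that = is not part of the table; it only marks missing input bytes.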

Thus, in the case of the 256-bit number, 32 bytes have to be encoded (32 = 10 × 3 + 2), i.e. 11 character quadruples are used, the last of which encodes only 2 instead of the maximum of 3 bytes, i.e. only 16 bits of data. The last character (for which there is no data) is therefore a =, and the second-to-last character (which only carries data in its top 4 bits) can only be one representing a 6-tuple of bits whose two lowest bits are 0, i.e. the characters you enumerated.
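The same reasoning can be written as a small calculation: count the data bits in the final partial group, work out how many low bits of the last data character are zero fill, and keep only the table entries whose index is a multiple of 2 raised to that count. A sketch assuming standard padded Base64 and exactly n input bytes (names hypothetical):

```java
// Sketch: possible last non-'=' characters for exactly n input bytes.
final class LastChar {
    static final String TABLE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static String allowedLastChars(int byteCount) {
        int leftoverBytes = byteCount % 3;
        if (leftoverBytes == 0) {
            return TABLE; // no partial group: the last character carries 6 data bits
        }
        int dataBits = 8 * leftoverBytes;            // 16 for 32 bytes, 8 for 64 bytes
        int charsUsed = (dataBits + 5) / 6;          // 3 or 2 data characters in the last group
        int zeroFillBits = 6 * charsUsed - dataBits; // 2 or 4 low bits forced to 0
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 64; i += 1 << zeroFillBits) {
            sb.append(TABLE.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(allowedLastChars(32)); // AEIMQUYcgkosw048
        System.out.println(allowedLastChars(64)); // AQgw
    }
}
```

Under these assumptions, allowedLastChars(32) returns AEIMQUYcgkosw048 and allowedLastChars(64) returns AQgw.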

And in the case of the 512-bit number, 64 bytes have to be encoded (64 = 21 × 3 + 1), i.e. 22 character quadruples are used, the last of which encodes only 1 instead of the maximum of 3 bytes, i.e. only 8 bits of data. The last two characters (for which there is no data) are therefore both =, and the second character of that last quadruple (the third-to-last character overall, which only carries data in its top 2 bits) can only be one representing a 6-tuple of bits whose four lowest bits are 0, i.e. the characters AQgw. So the range is not the same: it shrinks from 16 possible characters to 4.
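An empirical cross-check is also possible: the last non-padding character depends only on the final input byte, so trying all 256 values of that byte covers every possible ending. This sketch again uses the JDK's java.util.Base64 as a stand-in for commons-codec (an assumption), and the names are hypothetical:

```java
import java.util.Base64;
import java.util.TreeSet;

// Sketch: collect every character that appears just before the '=' padding
// when the final byte of an otherwise zero array takes all 256 values.
final class EmpiricalCheck {
    static String observedBeforePadding(int byteCount) {
        TreeSet<Character> seen = new TreeSet<>(); // sorted by character code
        byte[] data = new byte[byteCount];
        for (int v = 0; v < 256; v++) {
            data[byteCount - 1] = (byte) v;
            String encoded = Base64.getEncoder().encodeToString(data);
            String trimmed = encoded.replaceAll("=+$", ""); // strip the padding
            seen.add(trimmed.charAt(trimmed.length() - 1));
        }
        StringBuilder sb = new StringBuilder();
        for (char c : seen) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(observedBeforePadding(64)); // AQgw
        System.out.println(observedBeforePadding(32)); // 048AEIMQUYcgkosw
    }
}
```

For 64 bytes this yields AQgw; for 32 bytes it yields the 16 characters from the question, listed in character-code order.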

As mentioned above, though, I made certain assumptions about the encoding of the numbers...
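Under those same assumptions, the result can be turned into the kind of regular expression the question was after. A sketch with hypothetical names; it checks only the shape of the text (length, alphabet, restricted final character, padding), not whether the bytes form a valid signature:

```java
import java.util.regex.Pattern;

// Sketch: shape checks for padded Base64 of exactly 32 or 64 raw bytes.
final class SignaturePatterns {
    // 44 chars: 42 unrestricted + one char whose 2 lowest bits are 0 + "="
    static final Pattern BASE64_32_BYTES =
        Pattern.compile("[A-Za-z0-9+/]{42}[AEIMQUYcgkosw048]=");

    // 88 chars: 85 unrestricted + one of A, Q, g, w + "=="
    static final Pattern BASE64_64_BYTES =
        Pattern.compile("[A-Za-z0-9+/]{85}[AQgw]==");

    static boolean looksLikeBase64Of64Bytes(String s) {
        return BASE64_64_BYTES.matcher(s).matches(); // matches() anchors the whole string
    }
}
```

Encoding 64 bytes of 0xFF, for example, gives 85 slashes followed by w==, which matches; replacing that w with B does not.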

Licensed under: CC-BY-SA with attribution