What text encoding scheme do you use when you have binary data that you need to send over an ASCII channel?

StackOverflow https://stackoverflow.com/questions/2090009

  •  21-09-2019

Question

If you have binary data that you need to encode, what encoding scheme do you use?

I know about:

  • Hex encoding. Very simple, but quite verbose, expands one byte to two.
  • Base 64. Most common, not so verbose, expands three bytes to four.
  • Base 85. Not common, less verbose again, expands four bytes to five.
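
To make the expansion ratios above concrete, here is a minimal Python sketch using only the standard library. The 12-byte sample input is an assumption chosen because it is divisible by both 3 and 4, so no padding skews the numbers; `base64.b85encode` is just one of several Base 85 variants.

```python
import base64
import binascii

# 12 bytes: divisible by both 3 and 4, so neither Base64 nor Base85 pads.
data = bytes(range(12))

hex_text = binascii.hexlify(data).decode("ascii")
b64_text = base64.b64encode(data).decode("ascii")
b85_text = base64.b85encode(data).decode("ascii")  # one Base 85 variant

sizes = {
    "hex": len(hex_text),     # 1 byte -> 2 chars
    "base64": len(b64_text),  # 3 bytes -> 4 chars
    "base85": len(b85_text),  # 4 bytes -> 5 chars
}
print(sizes)  # {'hex': 24, 'base64': 16, 'base85': 15}
```

All three round-trip with `binascii.unhexlify`, `base64.b64decode` and `base64.b85decode` respectively.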

Are there any other encoding schemes in common use? If so, what are their advantages and disadvantages?

Edit: This is useful, for example, when trying to store arbitrary data in a cookie. Cookies can only store text, not arbitrary data, so you need to convert it in some way, preferably with a way to convert it back. Further, assume that you are using a stateless server so that you cannot save the state on the server and just put an identifier into the cookie. Of course, if you do this you would also need some way of verifying that what the user is passing back to you is what you passed to the user, for example a signature.
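
As a rough illustration of the signature idea, here is a sketch in Python that prepends an HMAC-SHA256 tag to the payload and hex encodes the result. The names `SECRET_KEY`, `pack` and `unpack` are hypothetical, and the layout (32-byte tag followed by payload) is just one possible choice, not an established cookie format.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in real use it would come from configuration.
SECRET_KEY = b"replace-with-a-real-secret"
MAC_SIZE = hashlib.sha256().digest_size  # 32 bytes


def pack(payload: bytes) -> str:
    """Return a hex string holding an HMAC tag followed by the payload."""
    mac = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return (mac + payload).hex()


def unpack(cookie_value: str) -> bytes:
    """Verify the tag and return the payload; raise ValueError if it fails."""
    raw = bytes.fromhex(cookie_value)  # raises ValueError on non-hex input
    mac, payload = raw[:MAC_SIZE], raw[MAC_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("cookie signature mismatch")
    return payload


value = pack(b"\x00\x01binary state\xff")
print(unpack(value))
```

Note that this only detects tampering. The payload is still readable by the user, so anything secret would need encryption as well.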

Also, since the current consensus is that you should use base64 since it is widespread, I will also point out that this is what I use... I am just curious if anyone used anything else, and if so, why.

Edit: Just in case someone stumbles across this, if you do want to use Base64 to store data in a cookie, you need to use a modified Base64 implementation. See this answer for the reason why.

Solution

For encoding cookie values, you need to be careful. See this older answer:

With Version 0 cookies, values should not contain white space, brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, at signs, colons, and semicolons. Empty values may not behave the same way on all browsers.

Base64 encoding generates = padding whenever the input length is not a multiple of three bytes, and the standard alphabet also includes /. Neither character is technically permitted in cookies (version 0 cookies, anyway, which are the most widely supported). In practice, I suspect the = will work fine, but that is not guaranteed.

To be absolutely sure that your encoded binary is cookie-compatible, I would suggest basic hex encoding as the safest option (e.g. in Java).

edit: As @Paul helpfully pointed out, there is a modified version of Base 64 that is "URL safe" (and, I assume, "cookie safe"). Using a modified version of a standard algorithm rather dilutes its charm, mind you.
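
For what it's worth, Python's standard library ships such a "URL safe" variant as `base64.urlsafe_b64encode`, which swaps + and / for - and _. A small sketch, with the three input bytes picked (as an assumption, purely for illustration) so that every output character is one of the problematic ones:

```python
import base64

# These three bytes map to the Base64 indices 62, 63, 63, 62.
data = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # +//+
print(url_safe)  # -__-

# Decoding must use the matching variant.
assert base64.urlsafe_b64decode(url_safe) == data
```

The URL-safe variant still emits = padding when the input length is not a multiple of three, so that has to be handled separately.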

edit: @shoosh pointed out that the = is only used as padding at the end of the base64 string, so you could trim the =, set the cookie, then reattach the = when you need to decode it.
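
The trim-and-reattach trick could look roughly like this in Python. The helper names `encode_unpadded` and `decode_unpadded` are made up for this sketch, and it assumes the URL-safe alphabet so that + and / are avoided too. The amount of padding to restore follows from the length, since a padded Base64 string is always a multiple of four characters long.

```python
import base64


def encode_unpadded(data: bytes) -> str:
    """URL-safe Base64 with the trailing '=' padding removed."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def decode_unpadded(text: str) -> bytes:
    """Restore the padding implied by the length, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


token = encode_unpadded(b"a")
print(token)                   # YQ
print(decode_unpadded(token))  # b'a'
```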

OTHER TIPS

Base64 wins because it's so common that I don't have to ever worry about rolling my own encoder/decoder. I haven't run into any applications where I've been worried about saving bandwidth or filespace in encoded binary data.

Once upon a time, there was UTF-7. It's officially deprecated, but it still works as an ACE (ASCII Compatible Encoding). Now there's IDN, which uses Punycode for the same purpose.

Base64 is the de-facto standard. Using anything else is asking for trouble.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow