Question

A simple yes-or-no question. I'm 90% sure the answer is no, but I'm not certain.

Can a Base64 string contain tabs?

Solution

It depends on what you're asking. If you are asking whether or not tabs can be base-64 encoded, then the answer is "yes" since they can be treated the same as any other ASCII character.

However, if you are asking whether or not base-64 output can contain tabs, then the answer is no. The following link is for an article detailing base-64, including which characters are considered valid:

http://en.wikipedia.org/wiki/Base64
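Both halves of that answer can be checked with a small sketch. The class and method names here (TabInBase64, EncodeTab) are made up for illustration; only Convert.ToBase64String is a real .NET call. It encodes a single tab byte and confirms that the output holds only alphabet and padding characters.

```csharp
using System;

public static class TabInBase64
{
    // Base64-encode a single tab character (byte value 9).
    public static string EncodeTab()
    {
        byte[] tab = { 0x09 };
        return Convert.ToBase64String(tab);
    }

    public static void Main()
    {
        string encoded = EncodeTab();
        Console.WriteLine(encoded);                     // CQ==
        Console.WriteLine(encoded.IndexOf('\t') >= 0);  // False: no tab in the output
    }
}
```

So the tab goes in as data and comes out as "CQ==", which contains no tab.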

OTHER TIPS

The short answer is no - but Base64 cannot contain carriage returns either.

That is why, when you have multiple lines of Base64, you strip out any carriage returns, line feeds, and anything else that is not in the Base64 alphabet.

That includes tabs.
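A minimal sketch of that stripping step, assuming the standard alphabet plus "=" padding. Base64Cleaner and StripNonAlphabet are hypothetical names, and the sample string is just "Hello, world" with a tab, a CRLF and a space inserted.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class Base64Cleaner
{
    // Removes every character that is not in the standard alphabet or '=' padding.
    public static string StripNonAlphabet(string input)
    {
        return Regex.Replace(input, "[^A-Za-z0-9+/=]", "");
    }

    public static void Main()
    {
        string wrapped = "SGVs\tbG8s\r\nIHdv cmxk";
        string cleaned = StripNonAlphabet(wrapped);
        Console.WriteLine(cleaned);  // SGVsbG8sIHdvcmxk
        Console.WriteLine(Encoding.ASCII.GetString(Convert.FromBase64String(cleaned)));  // Hello, world
    }
}
```

Note that this silently drops anything unexpected, which is the lenient behaviour. A stricter consumer might prefer to reject such input instead.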

From Wikipedia:

The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols. The "=" symbol is also used as a special suffix code. The original specification, RFC 989, additionally used the "*" symbol to delimit encoded but unencrypted data within the output stream.

As you can see, tab characters are not included. However, you can of course encode a tab character into a base64 string.

Sure. Tab is just ASCII character 9, and it has a Base64 representation just like any other byte value.

Haha, as you can see from the responses, this is actually not such a simple yes-or-no question.

A Base64 string produced by encoding cannot contain a tab character. But it seems to me that this is not what you are asking: you seem to be asking whether a string containing a tab (before conversion) can be represented in Base64, and the answer to that is yes.

I would add, though, that you should take care to preserve the encoding of your string: convert it to an array of bytes using the correct encoding (Unicode, UTF-8, whatever), then convert that array of bytes to Base64.

EDIT: A simple test.

// Requires: using System; using System.Text; using System.Windows.Forms;
private void button2_Click(object sender, EventArgs e)
{
  StringBuilder sb = new StringBuilder();
  string test = "The rain in spain falls \t mainly on the plain";
  sb.AppendLine(test);

  // Convert the string to bytes with an explicit encoding, then to Base64.
  UTF8Encoding enc = new UTF8Encoding();
  byte[] b = enc.GetBytes(test);
  string cvtd = Convert.ToBase64String(b);
  sb.AppendLine(cvtd);

  // Decode with the same encoding; the tab survives the round trip.
  byte[] c = Convert.FromBase64String(cvtd);
  string backAgain = enc.GetString(c);
  sb.AppendLine(backAgain);
  MessageBox.Show(sb.ToString());
}
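To illustrate why the choice of encoding matters, here is a small console sketch (EncodingMatters and ToBase64 are names invented for this example). The same one-character string gives different Base64 depending on which bytes the encoding produces, so the decoder has to use the same encoding as the encoder.

```csharp
using System;
using System.Text;

public static class EncodingMatters
{
    // Encode text to Base64 using an explicitly chosen text encoding.
    public static string ToBase64(string text, Encoding encoding)
    {
        return Convert.ToBase64String(encoding.GetBytes(text));
    }

    public static void Main()
    {
        // UTF-8 turns a tab into one byte (09); UTF-16LE turns it into two (09 00).
        Console.WriteLine(ToBase64("\t", Encoding.UTF8));     // CQ==
        Console.WriteLine(ToBase64("\t", Encoding.Unicode));  // CQA=
    }
}
```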

The Base64 specification (RFC 4648) states in Section 3.3 that non-alphabet characters must be rejected unless another specification explicitly allows them:

Implementations MUST reject the encoded data if it contains characters outside the base alphabet when interpreting base-encoded data, unless the specification referring to this document explicitly states otherwise. Such specifications may instead state, as MIME does, that characters outside the base encoding alphabet should simply be ignored when interpreting data ("be liberal in what you accept"). Note that this means that any adjacent carriage return/line feed (CRLF) characters constitute "non-alphabet characters" and are ignored.

Specs such as PEM (RFC 1421) and MIME (RFC 2045) specify that Base64 strings can be broken up by whitespace. Per the referenced RFC 822, a tab (HTAB) is considered a whitespace character.

So, when Base64 is used in the context of either MIME or PEM (and probably other similar specifications), whitespace, including tabs, should be handled (stripped out) while decoding the encoded content.
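The difference between the strict RFC 4648 reading and the MIME/PEM-style lenient reading can be sketched as follows. This is an illustration, not a reference implementation: StrictBase64, IsStrictBase64 and DecodeLenient are hypothetical names, and the pattern only checks the alphabet and padding layout (it does not check that the unused padding bits are zero).

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrictBase64
{
    // Groups of four alphabet characters; '=' padding is allowed only in the final group.
    private static readonly Regex Strict = new Regex(
        @"\A(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?\z");

    // Strict view: any character outside the alphabet, tabs included, is an error.
    public static bool IsStrictBase64(string s)
    {
        return Strict.IsMatch(s);
    }

    // Lenient view: drop space, tab, CR and LF first, then apply the strict check.
    public static byte[] DecodeLenient(string s)
    {
        string compact = Regex.Replace(s, @"[ \t\r\n]", "");
        if (!IsStrictBase64(compact))
            throw new FormatException("Non-alphabet character in Base64 input.");
        return Convert.FromBase64String(compact);
    }

    public static void Main()
    {
        Console.WriteLine(IsStrictBase64("ABCDDEFG"));           // True
        Console.WriteLine(IsStrictBase64("ABCD\tDEFG"));         // False
        Console.WriteLine(DecodeLenient("ABCD\tDEFG").Length);   // 6
    }
}
```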

Convert.FromBase64String() in the .NET Framework does not seem to mind tabs. I believe all whitespace in the string is ignored.

string xxx = "ABCD\tDEFG";   //simulated Base64 encoded string w/added tab
Console.WriteLine(xxx);
byte[] xx = Convert.FromBase64String(xxx); // convert string back to binary
Console.WriteLine(BitConverter.ToString(xx));

output:

ABCD    DEFG
00-10-83-0C-41-46
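As a follow-up sketch, the tolerance covers whitespace only (space, tab, CR, LF). Other characters outside the alphabet, such as "*", still make Convert.FromBase64String throw a FormatException. The DotNetWhitespace class and its Accepts helper are invented for this example.

```csharp
using System;

public static class DotNetWhitespace
{
    // Returns true if Convert.FromBase64String accepts the input,
    // false if it throws FormatException.
    public static bool Accepts(string s)
    {
        try
        {
            Convert.FromBase64String(s);
            return true;
        }
        catch (FormatException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Accepts("ABCD\tDEFG"));     // True: tab is skipped
        Console.WriteLine(Accepts("ABCD \r\nDEFG"));  // True: space and CRLF are skipped
        Console.WriteLine(Accepts("ABCD*DEFG"));      // False: '*' is not whitespace
    }
}
```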

The relevant clause of RFC 2045 (Section 6.8):

The encoded output stream must be represented in lines of no more than 76 characters each. All line breaks or other characters not found in Table 1 must be ignored by decoding software. In base64 data, characters other than those in Table 1, line breaks, and other white space probably indicate a transmission error, about which a warning message or even a message rejection might be appropriate under some circumstances.
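For what it's worth, .NET can produce that 76-character line layout itself via Base64FormattingOptions.InsertLineBreaks, and its decoder skips the inserted CRLF pairs. A minimal sketch, with MimeLineWrap and EncodeWrapped as made-up names and 100 zero bytes as arbitrary sample data:

```csharp
using System;

public static class MimeLineWrap
{
    // Encode with a CRLF inserted after every 76 output characters.
    public static string EncodeWrapped(byte[] data)
    {
        return Convert.ToBase64String(data, Base64FormattingOptions.InsertLineBreaks);
    }

    public static void Main()
    {
        byte[] data = new byte[100];  // 100 bytes -> 136 Base64 characters
        string wrapped = EncodeWrapped(data);
        string[] lines = wrapped.Split(new[] { "\r\n" }, StringSplitOptions.None);

        Console.WriteLine(lines.Length);     // 2
        Console.WriteLine(lines[0].Length);  // 76
        Console.WriteLine(lines[1].Length);  // 60

        // The line break is ignored on the way back in.
        Console.WriteLine(Convert.FromBase64String(wrapped).Length);  // 100
    }
}
```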

YES!

Base64 is used to encode ANY 8-bit value (decimal 0 to 255) into a string using a set of safe characters. TAB is decimal 9.

Base64 uses one of the following character sets:

Data: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
URLs: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_
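The two alphabets differ only in their last two characters, which a short sketch can show. Convert.ToBase64String emits the "Data" alphabet; the ToUrlSafe helper below is a hypothetical translation step written for this example (it leaves any "=" padding in place), and the two sample bytes were picked because they happen to hit indexes 62 and 63.

```csharp
using System;

public static class UrlSafeBase64
{
    // Translate standard Base64 output to the URL alphabet ('+' -> '-', '/' -> '_').
    public static string ToUrlSafe(string standard)
    {
        return standard.Replace('+', '-').Replace('/', '_');
    }

    public static void Main()
    {
        string standard = Convert.ToBase64String(new byte[] { 0xFB, 0xFF });
        Console.WriteLine(standard);             // +/8=
        Console.WriteLine(ToUrlSafe(standard));  // -_8=
    }
}
```

Neither alphabet contains a tab, so the encoded output is tab-free either way.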

Binary attachments embedded in text (e.g., in email) are also encoded using this system.

It seems that there is a lot of confusion here, and surprisingly most answers are of the "No" variety. I don't think that is a good canonical answer. The reason for the confusion is probably that Base64 is not strictly specified; multiple practical implementations and interpretations exist. You can check out link text for more discussion on this.

In general, however, conforming Base64 codecs SHOULD understand line feeds, as they are mandated by some Base64 definitions (76-character segments, then a line feed, etc.). Because of this, most decoders also allow for indentation whitespace, and quite commonly any whitespace between 4-character "triplets" (so named since they encode 3 bytes).

So there's a good chance that in practice you can use tabs and other white space.

But I would not add tabs myself when generating Base64 content to send to a service: be conservative in what you send, (more) liberal in what you receive.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow