Question

This is my fourth attempt at Base64 encoding. My first attempts worked, but they weren't standard, and they were also extremely slow!!! I used vectors with a lot of push_back and erase.

So I decided to rewrite it, and this version is much, much faster! Except that it loses data. -__- I need as much speed as I can possibly get because I'm compressing a pixel buffer and Base64-encoding the compressed string. I'm using zlib. The images are 1366 x 768, so yeah.

I do not want to copy any code I find online because... Well, I like to write things myself and I don't like worrying about copyright stuff or having to put a ton of credits from different sources all over my code..

Anyway, my code is as follows below. It's very short and simple.

const static std::string Base64Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

inline bool IsBase64(std::uint8_t C)
{
    return (isalnum(C) || (C == '+') || (C == '/'));
}

std::string Copy(std::string Str, int FirstChar, int Count)
{
    if (FirstChar <= 0)
        FirstChar = 0;
    else
        FirstChar -= 1;
    return Str.substr(FirstChar, Count);
}

std::string DecToBinStr(int Num, int Padding)
{
    int Bin = 0, Pos = 1;
    std::stringstream SS;
    while (Num > 0)
    {
        Bin += (Num % 2) * Pos;
        Num /= 2;
        Pos *= 10;
    }
    SS.fill('0');
    SS.width(Padding);
    SS << Bin;
    return SS.str();
}

int DecToBinStr(std::string DecNumber)
{
    int Bin = 0, Pos = 1;
    int Dec = strtol(DecNumber.c_str(), NULL, 10);

    while (Dec > 0)
    {
        Bin += (Dec % 2) * Pos;
        Dec /= 2;
        Pos *= 10;
    }
    return Bin;
}

int BinToDecStr(std::string BinNumber)
{
    int Dec = 0;
    int Bin = strtol(BinNumber.c_str(), NULL, 10);

    for (int I = 0; Bin > 0; ++I)
    {
        if(Bin % 10 == 1)
        {
            Dec += (1 << I);
        }
        Bin /= 10;
    }
    return Dec;
}

std::string EncodeBase64(std::string Data)
{
    std::string Binary = std::string();
    std::string Result = std::string();

    for (std::size_t I = 0; I < Data.size(); ++I)
    {
        Binary += DecToBinStr(Data[I], 8);
    }

    for (std::size_t I = 0; I < Binary.size(); I += 6)
    {
        Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
        if (I == 0) ++I;
    }

    int PaddingAmount = ((-Result.size() * 3) & 3);
    for (int I = 0; I < PaddingAmount; ++I)
        Result += '=';

    return Result;
}

std::string DecodeBase64(std::string Data)
{
    std::string Binary = std::string();
    std::string Result = std::string();

    for (std::size_t I = Data.size(); I > 0; --I)
    {
        if (Data[I - 1] != '=')
        {
            std::string Characters = Copy(Data, 0, I);
            for (std::size_t J = 0; J < Characters.size(); ++J)
                Binary += DecToBinStr(Base64Chars.find(Characters[J]), 6);
            break;
        }
    }

    for (std::size_t I = 0; I < Binary.size(); I += 8)
    {
        Result += (char)BinToDecStr(Copy(Binary, I, 8));
        if (I == 0) ++I;
    }

    return Result;
}

I've been using the above like this:

int main()
{
    std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604));  //IMG.677*604
    std::cout<<DecodeBase64(Data);        //Prints IMG.677*601
}

As you can see in the above, it prints the wrong string. It's fairly close but for some reason, the 4 is turned into a 1!

Now if I do:

int main()
{
    std::string Data = EncodeBase64("IMG." + ::ToString(1366) + "*" + ::ToString(768));  //IMG.1366*768
    std::cout<<DecodeBase64(Data);        //Prints IMG.1366*768
}

It prints correctly.. I'm not sure what is going on at all or where to begin looking.

Just in case anyone is curious and wants to see my other attempts (the slow ones): http://pastebin.com/Xcv03KwE

I'm really hoping someone could shed some light on speeding things up or at least figuring out what's wrong with my code :l

Solution

The main encoding issue is that you are not accounting for data whose length in bits is not a multiple of 6. In this case, the final 4 bits (0100, the low half of the character '4') are converted as 0100 instead of 010000 because there are no more bits to read. You are supposed to pad the final group with 0s.

After changing the end of your Copy like this, the final encoded character is Q instead of the original E.

std::string data = Str.substr(FirstChar, Count);
while (data.size() < static_cast<std::size_t>(Count)) data += '0';  // zero-pad a short final group
return data;
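To make the E versus Q difference concrete, here is a small sketch of what happens to the last group of bits. IMG.677*604 is 11 bytes, or 88 bits, which leaves 4 bits after fourteen full 6-bit groups. FinalGroupChar and kAlphabet are hypothetical names used only for this illustration; they are not part of the question's code.

```cpp
#include <cassert>

// Standard Base64 alphabet (same contents as Base64Chars in the question).
static const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: returns the Base64 character for the final, possibly
// short, group. `bits` holds the leftover bits in its low `count` bits.
// With zero_pad set, the group is shifted left so it fills all 6 bits.
char FinalGroupChar(unsigned bits, unsigned count, bool zero_pad)
{
    assert(count >= 1 && count <= 6);
    const unsigned index = zero_pad ? (bits << (6 - count)) : bits;
    return kAlphabet[index & 0x3F];
}
```

The leftover bits of '4' (0x34) are 0100. Read as-is they give index 4, which is E; shifted up to 010000 they give index 16, which is Q.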

Also, your logic for adding the = padding is off: in this case it appends three = characters where only one is needed.
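The number of = characters depends only on the input length in bytes, so it can be computed without looking at the encoded output at all. A minimal sketch, with PadCount and EncodedLength as hypothetical helper names that do not appear in the question:

```cpp
#include <cstddef>

// Number of '=' characters for an input of byte_count bytes:
// 0 when the length is a multiple of 3, 2 for one leftover byte,
// 1 for two leftover bytes.
std::size_t PadCount(std::size_t byte_count)
{
    return (3 - byte_count % 3) % 3;
}

// Total length of the padded output: four characters per 3-byte group,
// with the number of groups rounded up.
std::size_t EncodedLength(std::size_t byte_count)
{
    return 4 * ((byte_count + 2) / 3);
}
```

For the 11-byte IMG.677*604 this gives one = and 16 characters in total. For the 12-byte IMG.1366*768 it gives no padding at all, which is consistent with that input surviving the round trip.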

As for speed, I'd focus primarily on reducing your use of std::string. Converting the data into a string of '0' and '1' characters is pretty inefficient, considering that the source bytes could be read directly with bitwise operators.
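As a rough illustration of reading the source with bitwise operators, here is one possible shape for an encoder. It is a sketch rather than a tuned implementation, and EncodeBase64Bits is a hypothetical name chosen so it does not clash with the EncodeBase64 in the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Sketch of a bitwise Base64 encoder (hypothetical name, standard alphabet).
std::string EncodeBase64Bits(const std::string& data)
{
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::string out;
    out.reserve(4 * ((data.size() + 2) / 3));  // final size is known up front

    std::size_t i = 0;
    // Full 3-byte groups: pack 24 bits, then emit four 6-bit indices.
    for (; i + 3 <= data.size(); i += 3)
    {
        const std::uint32_t n =
            (static_cast<std::uint32_t>(static_cast<unsigned char>(data[i])) << 16) |
            (static_cast<std::uint32_t>(static_cast<unsigned char>(data[i + 1])) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(data[i + 2]));
        out += kAlphabet[(n >> 18) & 0x3F];
        out += kAlphabet[(n >> 12) & 0x3F];
        out += kAlphabet[(n >> 6) & 0x3F];
        out += kAlphabet[n & 0x3F];
    }

    const std::size_t rest = data.size() - i;  // 0, 1 or 2 leftover bytes
    if (rest == 1)
    {
        const std::uint32_t n = static_cast<unsigned char>(data[i]);
        out += kAlphabet[(n >> 2) & 0x3F];
        out += kAlphabet[(n & 0x03) << 4];  // low 2 bits, zero-padded to 6
        out += "==";
    }
    else if (rest == 2)
    {
        const std::uint32_t n =
            (static_cast<std::uint32_t>(static_cast<unsigned char>(data[i])) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(data[i + 1]));
        out += kAlphabet[(n >> 10) & 0x3F];
        out += kAlphabet[(n >> 4) & 0x3F];
        out += kAlphabet[(n & 0x0F) << 2];  // low 4 bits, zero-padded to 6
        out += '=';
    }
    return out;
}
```

The casts through unsigned char matter for compressed data: where char is signed, bytes above 0x7F are negative and would otherwise corrupt the shifts.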

Other suggestions

I'm not sure whether I could easily come up with a slower method of doing Base-64 conversions.

The code requires 4 headers (on Mac OS X 10.7.5 with G++ 4.7.1) and the compiler option -std=c++11 to make the #include <cstdint> acceptable:

#include <string>
#include <iostream>
#include <sstream>
#include <cstdint>

It also requires a function ToString() that was not defined; I created:

std::string ToString(int value)
{
    std::stringstream ss;
    ss << value;
    return ss.str();
}

The code in your main() — which is what uses the ToString() function — is a little odd: why do you need to build a string from pieces instead of simply using "IMG.677*604"?

Also, it is worth printing out the intermediate result:

int main()
{
    std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604));
    std::cout << Data << std::endl;
    std::cout << DecodeBase64(Data) << std::endl;        //Prints IMG.677*601
}

This yields:

SU1HLjY3Nyo2MDE===
IMG.677*601

The output string (SU1HLjY3Nyo2MDE===) is 18 bytes long; that has to be wrong as a valid Base-64 encoded string has to be a multiple of 4 bytes long (as three 8-bit bytes are encoded into four bytes each containing 6 bits of the original data). This immediately tells us there are problems. You should only get zero, one or two pad (=) characters; never three. This also confirms that there are problems.
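Those two observations (length a multiple of 4, at most two trailing pad characters) are cheap to check mechanically. A minimal sketch, where HasValidBase64Shape is a hypothetical helper that looks only at length and padding and does not validate the alphabet:

```cpp
#include <cstddef>
#include <string>

// Hypothetical sanity check for padded Base64 text: the length must be a
// multiple of 4, and at most two '=' may appear, only at the very end.
bool HasValidBase64Shape(const std::string& text)
{
    if (text.size() % 4 != 0)
        return false;

    // Count the trailing '=' characters.
    std::size_t pads = 0;
    while (pads < text.size() && text[text.size() - 1 - pads] == '=')
        ++pads;
    if (pads > 2)
        return false;

    // The first '=' (if any) must be part of the trailing padding.
    // find() returns npos when there is none, which passes the comparison.
    return text.find('=') >= text.size() - pads;
}
```

The 18-character output above fails this check, while the same string with two pad characters removed passes it.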

Removing two of the pad characters leaves a valid Base-64 string. When I use my own home-brew Base-64 encoding and decoding functions to decode your (truncated) output, it gives me:

Base64:
0x0000: SU1HLjY3Nyo2MDE=
Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 31 00               IMG.677*601.

Thus it appears you have encoded the null terminating the string. When I encode IMG.677*604, the output I get is:

Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 34                  IMG.677*604
Base64: SU1HLjY3Nyo2MDQ=

You say you want to speed up your code. Quite apart from fixing it so that it encodes correctly (I've not really studied the decoding), you will want to avoid all the string manipulation you do. It should be a bit manipulation exercise, not a string manipulation exercise.

I have 3 small encoding routines in my code, to encode triplets, doublets and singlets:

/* Encode 3 bytes of data into 4 */
static void encode_triplet(const char *triplet, char *quad)
{
    quad[0] = base_64_map[(triplet[0] >> 2) & 0x3F];
    quad[1] = base_64_map[((triplet[0] & 0x03) << 4) | ((triplet[1] >> 4) & 0x0F)];
    quad[2] = base_64_map[((triplet[1] & 0x0F) << 2) | ((triplet[2] >> 6) & 0x03)];
    quad[3] = base_64_map[triplet[2] & 0x3F];
}

/* Encode 2 bytes of data into 4 */
static void encode_doublet(const char *doublet, char *quad, char pad)
{
    quad[0] = base_64_map[(doublet[0] >> 2) & 0x3F];
    quad[1] = base_64_map[((doublet[0] & 0x03) << 4) | ((doublet[1] >> 4) & 0x0F)];
    quad[2] = base_64_map[((doublet[1] & 0x0F) << 2)];
    quad[3] = pad;
}

/* Encode 1 byte of data into 4 */
static void encode_singlet(const char *singlet, char *quad, char pad)
{
    quad[0] = base_64_map[(singlet[0] >> 2) & 0x3F];
    quad[1] = base_64_map[((singlet[0] & 0x03) << 4)];
    quad[2] = pad;
    quad[3] = pad;
}

This is written as C code rather than using native C++ idioms, but the code shown should compile with C++ (unlike the C99 initializers elsewhere in the source). The base_64_map[] array corresponds to your Base64Chars string. The pad character passed in is normally '=', but can be '\0' since the system I work with has eccentric ideas about not needing padding (pre-dating my involvement in the code, and it uses a non-standard alphabet to boot) and the code handles both the non-standard and the RFC 3548 standard.

The driving code is:

/* Encode input data as Base-64 string.  Output length returned, or negative error */
static int base64_encode_internal(const char *data, size_t datalen, char *buffer, size_t buflen, char pad)
{
    size_t outlen = BASE64_ENCLENGTH(datalen);
    const char *bin_data = data;    /* no cast through void *, so this also compiles as C++ */
    char *b64_data = buffer;

    if (outlen > buflen)
        return(B64_ERR_OUTPUT_BUFFER_TOO_SMALL);
    while (datalen >= 3)
    {
        encode_triplet(bin_data, b64_data);
        bin_data += 3;
        b64_data += 4;
        datalen -= 3;
    }
    b64_data[0] = '\0';

    if (datalen == 2)
        encode_doublet(bin_data, b64_data, pad);
    else if (datalen == 1)
        encode_singlet(bin_data, b64_data, pad);
    if (datalen != 0)
        b64_data[4] = '\0';    /* terminate only after a partial group was written */
    return((b64_data - buffer) + strlen(b64_data));
}

/* Encode input data as Base-64 string.  Output length returned, or negative error */
int base64_encode(const char *data, size_t datalen, char *buffer, size_t buflen)
{
    return(base64_encode_internal(data, datalen, buffer, buflen, base64_pad));
}

The base64_pad constant is the '='; there's also a base64_encode_nopad() function that supplies '\0' instead. The errors are somewhat arbitrary but relevant to the code.

The main point to take away from this is that you should be doing bit manipulation and building up a string that is an exact multiple of 4 bytes for a given input.

std::string EncodeBase64(std::string Data)
{
    std::string Binary = std::string();
    std::string Result = std::string();

    for (std::size_t I = 0; I < Data.size(); ++I)
    {
        Binary += DecToBinStr(Data[I], 8);
    }

    // Zero-pad the bit string so its length is a multiple of 6.
    if (Binary.size() % 6)
    {
        Binary.resize(Binary.size() + 6 - Binary.size() % 6, '0');
    }

    for (std::size_t I = 0; I < Binary.size(); I += 6)
    {
        Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
        if (I == 0) ++I;
    }

    // Pad the output with '=' so its length is a multiple of 4.
    if (Result.size() % 4)
    {
        Result.resize(Result.size() + 4 - Result.size() % 4, '=');
    }

    return Result;
}
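
None of the answers rewrites the decoding side, but the same bit-manipulation idea applies there. The following is only a sketch under a few assumptions: DecodeBase64Bits and Base64Value are hypothetical names, the character set is assumed to be ASCII, and input errors are handled by simply stopping at the first '=' or unknown character.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Maps a Base64 character to its 6-bit value, or -1 if it is not in the
// standard alphabet. Assumes an ASCII-compatible character set.
static int Base64Value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Sketch of a bitwise decoder. Decoding stops at the first '=' or at any
// character outside the alphabet; no error is reported.
std::string DecodeBase64Bits(const std::string& text)
{
    std::string out;
    out.reserve(text.size() / 4 * 3);

    std::uint32_t acc = 0;  // bit accumulator
    int bits = 0;           // number of not-yet-emitted bits in acc
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        const int v = Base64Value(text[i]);
        if (v < 0)
            break;
        acc = (acc << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8)
        {
            bits -= 8;
            out += static_cast<char>((acc >> bits) & 0xFF);
        }
    }
    // Any leftover bits (fewer than 8) are the encoder's zero padding.
    return out;
}
```

Fed the two strings from the dumps above, this should return IMG.677*604 for SU1HLjY3Nyo2MDQ= and IMG.677*601 for SU1HLjY3Nyo2MDE=.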
Licensed under: CC-BY-SA with attribution