Question

I'm trying to decode Base64 data in C. I found an implementation I want to use, but I'm not sure exactly how it works, and I'm hoping for some explanation of the general syntax used here: Base64Decode

The code I'm trying to understand is:

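#include <stddef.h>  /* size_t */

/* d, WHITESPACE, INVALID and EQUALS come from the rest of the
   Base64Decode implementation and are not shown here. */
int base64decode (char *in, size_t inLen, unsigned char *out, size_t *outLen) {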
    char *end = in + inLen;
    size_t buf = 1, len = 0;

    while (in < end) {
        unsigned char c = d[(unsigned char)*in++]; /* cast: a signed char could give a negative index */

        switch (c) {
        case WHITESPACE: continue;   /* skip whitespace */
        case INVALID:    return 1;   /* invalid input, return error */
        case EQUALS:                 /* pad character, end of data */
            in = end;
            continue;
        default:
            buf = buf << 6 | c;

            /* If the buffer is full, split it into bytes */
            if (buf & 0x1000000) {
                if ((len += 3) > *outLen) return 1; /* buffer overflow */
                *out++ = buf >> 16;
                *out++ = buf >> 8;
                *out++ = buf;
                buf = 1;
            }   
        }
    }

    if (buf & 0x40000) {
        if ((len += 2) > *outLen) return 1; /* buffer overflow */
        *out++ = buf >> 10;
        *out++ = buf >> 2;
    }
    else if (buf & 0x1000) {
        if (++len > *outLen) return 1; /* buffer overflow */
        *out++ = buf >> 4;
    }

    *outLen = len; /* modify to reflect the actual output size */
    return 0;
}

I'm stuck at *out++ = buf >> 16;. It reads to me as: the value at out will be equal to buf, which should be 1000000, shifted 16 bits to the right? There aren't even 16 bits in the value, so wouldn't it become zero? I'd really like to understand the rest of this code as well; any help would be greatly appreciated. Thanks in advance!


Solution

You can work this out step by step. Let's skip the handling of whitespace, equals signs and invalid characters, as well as the padding code at the end, and focus on the loop and the default clause:

size_t buf = 1;

while (in < end) {
    unsigned char c;

    /* read the next character and map it to its 6-bit value */
    c = d[(unsigned char)*in++];

    /* append the six bits to buf */
    buf = buf << 6 | c;

    /* If the buffer is full, split it into bytes */
    if (buf & 0x1000000) {
        *out++ = buf >> 16;
        *out++ = buf >> 8;
        *out++ = buf;
        buf = 1;
    }
}

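Each input character is read, mapped through the lookup table d to its six-bit value, and appended to buf. The input comes in six-bit chunks and the output should be eight-bit chunks, i.e. bytes. (Characters that don't carry data are those whose decoded value has either of the top two bits set, i.e. c >= 64; in the full code, these are the whitespace, padding and invalid markers handled by the switch.)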
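The question's code relies on a lookup table d and the constants WHITESPACE, INVALID and EQUALS, which are defined elsewhere in the referenced implementation. Below is a hypothetical reconstruction, assuming the special codes are 64, 65 and 66 (any values from 64 to 255 would work, since real Base64 digits decode to 0 to 63) and that the table is filled at runtime by a made-up helper, build_decode_table:

```c
#include <stddef.h>

/* Assumed special codes: all >= 64, so they never collide with data. */
#define WHITESPACE 64
#define EQUALS     65
#define INVALID    66

/* d[byte] is the 6-bit value of a Base64 digit, or a special code. */
static unsigned char d[256];

void build_decode_table(void)
{
    const char *alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    for (size_t i = 0; i < 256; i++)
        d[i] = INVALID;                              /* default: not Base64 */
    for (size_t i = 0; i < 64; i++)
        d[(unsigned char)alphabet[i]] = (unsigned char)i; /* 'A'->0 ... '/'->63 */
    d['='] = EQUALS;                                 /* padding */
    d[' '] = d['\t'] = d['\r'] = d['\n'] = WHITESPACE;
}
```

After build_decode_table() runs, every digit maps to a value with the top two bits clear, while every special code has at least one of them set.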

The idea is to use buf as an auxiliary buffer that collects four six-bit values until it is full, and then to write its contents out as three eight-bit values.

We start with buf == 1:

.... .... .... .... .... .... .... ...1

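Empty bits are represented as dots here, which is easier to read than zeros. The 1 is the sentinel value: it always sits just above the data bits, so its position tells how many bits have been filled and no separate counter is needed. Now read the next byte, denoted by a, and shift the buffer six places to the left: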

.... .... .... .... .... .... .1.. ....    // buf = buf << 6

and do a bitwise OR with the data:

.... .... .... .... .... .... .1aa aaaa    // buf = buf | 'a'
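Each append step relies on operator precedence: in C, << binds more tightly than |, so buf << 6 | c means (buf << 6) | c. A tiny sketch of one step, using a hypothetical helper name append_sextet:

```c
#include <stddef.h>

/* One append step: make room for six bits, then OR the new value in.
   The parentheses are optional, since << binds tighter than |. */
size_t append_sextet(size_t buf, unsigned char c)
{
    return (buf << 6) | c;
}
```

For example, append_sextet(1, 0x15) returns 0x55: the sentinel moves to bit 6 and the six data bits fill bits 0 to 5.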

Okay, next byte, 'b':

.... .... .... .... ...1 aaaa aa.. ....    // buf = buf << 6
.... .... .... .... ...1 aaaa aabb bbbb    // buf = buf | 'b'

Next byte, 'c':

.... .... .... .1aa aaaa bbbb bb.. ....    // buf = buf << 6
.... .... .... .1aa aaaa bbbb bbcc cccc    // buf = buf | 'c'

And 'd':

.... ...1 aaaa aabb bbbb cccc cc.. ....    // buf = buf << 6
.... ...1 aaaa aabb bbbb cccc ccdd dddd    // buf = buf | 'd'
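To check the diagrams with real numbers, here is a sketch that records buf after each append (trace_buffer is a hypothetical helper). It assumes the standard Base64 alphabet, in which the input TWFu decodes to the six-bit values 19, 22, 5 and 46:

```c
#include <stddef.h>

/* Feed n decoded 6-bit values into a buffer that starts at the
   sentinel value 1, storing buf after each append in trace[i]. */
void trace_buffer(const unsigned char *sextets, size_t n, size_t *trace)
{
    size_t buf = 1;                   /* sentinel only, no data yet */
    for (size_t i = 0; i < n; i++) {
        buf = buf << 6 | sextets[i];  /* shift left, OR in the new bits */
        trace[i] = buf;
    }
}
```

For those values the recorded buffers are 0x53, 0x14D6, 0x53585 and 0x14D616E. The last one has bit 24 (0x1000000) set, and its low 24 bits, 0x4D616E, are the ASCII bytes of "Man".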

Now check whether the buffer is full. (This is done after every byte is read, but I've left it out above for clarity.) The check is a bitwise AND of buf with 0x1000000:

.... ...1 aaaa aabb bbbb cccc ccdd dddd    // buf
.... ...1 .... .... .... .... .... ....    // 0x1000000
.... ...1 .... .... .... .... .... ....    // buf & 0x1000000
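Since every append moves the sentinel up six places, after k appends it sits at bit 6k: 0x40 after one value, 0x1000 after two, 0x40000 after three and 0x1000000 after four. A sketch that counts the stored values by locating the sentinel (sextets_held is a hypothetical helper):

```c
#include <stddef.h>

/* Return how many 6-bit values buf holds. After k appends the
   sentinel is at bit 6*k, so k right-shifts by 6 leave exactly 1. */
int sextets_held(size_t buf)
{
    int k = 0;
    while (buf > 1) {   /* strip six bits until only the sentinel remains */
        buf >>= 6;
        k++;
    }
    return k;
}
```

The same bit positions explain the checks after the loop in the question's code: 0x40000 means three values are left over and 0x1000 means two. The 0x40000 test has to come first, because with three values held, bit 12 can be set by data (for TWF, buf is 0x53585, which has bit 12 set).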

The result of buf & 0x1000000 is now nonzero (true) for the first time, which means we've read four six-bit chunks and need to write the data as three eight-bit chunks, using these shifts:

.... .... .... .... .... ...1 aaaa aabb    // buf >> 16
.... .... .... ...1 aaaa aabb bbbb cccc    // buf >> 8
.... ...1 aaaa aabb bbbb cccc ccdd dddd    // buf

These values are written to bytes, i.e. unsigned chars, which will truncate them to the lowest eight bits:

---- ---- ---- ---- ---- ---- aaaa aabb    // (uchar) (buf >> 16)
---- ---- ---- ---- ---- ---- bbbb cccc    // (uchar) (buf >> 8)
---- ---- ---- ---- ---- ---- ccdd dddd    // (uchar) buf
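This also answers the original worry about buf >> 16: the shift only happens once buf holds 25 significant bits. For TWFu (values 19, 22, 5 and 46 in the standard alphabet) the full buffer is 0x14D616E, so buf >> 16 is 0x14D, and converting it to unsigned char keeps 0x4D, the letter M. A minimal sketch of the split (split_full_buffer is a hypothetical helper):

```c
#include <stddef.h>

/* Split a full buffer (sentinel at bit 24 plus 24 data bits) into
   three bytes. Converting to unsigned char keeps only the low 8 bits,
   so the sentinel and the higher bits are discarded. */
void split_full_buffer(size_t buf, unsigned char out[3])
{
    out[0] = (unsigned char)(buf >> 16);  /* aaaaaabb */
    out[1] = (unsigned char)(buf >> 8);   /* bbbbcccc */
    out[2] = (unsigned char)buf;          /* ccdddddd */
}
```

Called with 0x14D616E, it fills out with 'M', 'a' and 'n'.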

Now reset buf to 1 (just the sentinel) and read the next bytes.
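For completeness, here is a minimal self-contained sketch of the whole decoder, including the tail handling the question's code does after the loop. If three values are left over (sentinel at bit 18, 0x40000), they hold 18 bits: buf >> 10 yields the first byte and buf >> 2 drops the two padding bits to yield the second. If two are left (sentinel at bit 12, 0x1000), buf >> 4 drops four padding bits to yield one byte. The helper decode_char and the numeric values of WHITESPACE, EQUALS and INVALID are assumptions standing in for the original table d, and the character ranges assume ASCII:

```c
#include <stddef.h>

/* Assumed special codes, outside the 0..63 range of real digits. */
#define WHITESPACE 64
#define EQUALS     65
#define INVALID    66

/* Map one character to its 6-bit value or to a special code
   (a stand-in for the table d; assumes ASCII letter ranges). */
static unsigned char decode_char(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z') return ch - 'A';
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 26;
    if (ch >= '0' && ch <= '9') return ch - '0' + 52;
    if (ch == '+') return 62;
    if (ch == '/') return 63;
    if (ch == '=') return EQUALS;
    if (ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n') return WHITESPACE;
    return INVALID;
}

/* Returns 0 on success and sets *outLen to the number of bytes written;
   returns 1 on invalid input or if *outLen bytes are not enough. */
int decode_base64(const char *in, size_t inLen,
                  unsigned char *out, size_t *outLen)
{
    const char *end = in + inLen;
    size_t buf = 1, len = 0;

    while (in < end) {
        unsigned char c = decode_char((unsigned char)*in++);

        if (c == WHITESPACE) continue;       /* skip whitespace */
        if (c == INVALID) return 1;          /* invalid input */
        if (c == EQUALS) break;              /* padding: end of data */

        buf = buf << 6 | c;
        if (buf & 0x1000000) {               /* four values: emit 3 bytes */
            if ((len += 3) > *outLen) return 1;
            *out++ = (unsigned char)(buf >> 16);
            *out++ = (unsigned char)(buf >> 8);
            *out++ = (unsigned char)buf;
            buf = 1;
        }
    }

    if (buf & 0x40000) {                     /* three values left: 2 bytes */
        if ((len += 2) > *outLen) return 1;
        *out++ = (unsigned char)(buf >> 10);
        *out++ = (unsigned char)(buf >> 2);
    } else if (buf & 0x1000) {               /* two values left: 1 byte */
        if (++len > *outLen) return 1;
        *out++ = (unsigned char)(buf >> 4);
    }

    *outLen = len;
    return 0;
}
```

With this sketch, "TWFu" decodes to "Man", "TWE=" to "Ma" and "TQ==" to "M". As in the question's code, a single leftover value (too few bits for a byte) is silently ignored, and break replaces the original in = end; continue;, which has the same effect.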

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow