Combining two 16 bits RGB colors with alpha blending

Question 1

My (untested) solution: I split the foreground and background colors to (red + blue) and (green) components and multiply them with a 6bit alpha value. Enjoy! (Only if it works :)

                            //   rrrrrggggggbbbbb
#define MASK_RB       63519 // 0b1111100000011111
#define MASK_G         2016 // 0b0000011111100000
#define MASK_MUL_RB 4065216 // 0b1111100000011111000000
#define MASK_MUL_G   129024 // 0b0000011111100000000000
#define MAX_ALPHA        64 // 6bits+1 with rounding

uint16 alphablend( uint16 fg, uint16 bg, uint8 alpha ){

  // alpha for foreground multiplication
  // convert from 8bit to (6bit+1) with rounding
  // will be in [0..64] inclusive
  alpha = ( alpha + 2 ) >> 2;
  // "beta" for background multiplication; (6bit+1);
  // will be in [0..64] inclusive
  uint8 beta = MAX_ALPHA - alpha;
  // so (0..64)*alpha + (0..64)*beta always in 0..64

  return (uint16)((
            (  ( alpha * (uint32)( fg & MASK_RB )
                + beta * (uint32)( bg & MASK_RB )
            ) & MASK_MUL_RB )
          |
            (  ( alpha * ( fg & MASK_G )
                + beta * ( bg & MASK_G )
            ) & MASK_MUL_G )
         ) >> 6 );
}

/*
  result masks of multiplications
  uppercase: usable bits of multiplications
  RRRRRrrrrrrBBBBBbbbbbb // 5-5 bits of red+blue
        1111100000011111 // from MASK_RB * 1
  1111100000011111000000 //   to MASK_RB * MAX_ALPHA // 22 bits!


  -----GGGGGGgggggg----- // 6 bits of green
        0000011111100000 // from MASK_G * 1
  0000011111100000000000 //   to MASK_G * MAX_ALPHA
*/

Question 2

The correct formula is something like this:

unsigned short blend(unsigned short fg, unsigned short bg, unsigned char alpha)
{
    // Split foreground into components
    unsigned fg_r = fg >> 11;
    unsigned fg_g = (fg >> 5) & ((1u << 6) - 1);
    unsigned fg_b = fg & ((1u << 5) - 1);

    // Split background into components
    unsigned bg_r = bg >> 11;
    unsigned bg_g = (bg >> 5) & ((1u << 6) - 1);
    unsigned bg_b = bg & ((1u << 5) - 1);

    // Alpha blend components
    unsigned out_r = (fg_r * alpha + bg_r * (255 - alpha)) / 255;
    unsigned out_g = (fg_g * alpha + bg_g * (255 - alpha)) / 255;
    unsigned out_b = (fg_b * alpha + bg_b * (255 - alpha)) / 255;

    // Pack result
    return (unsigned short) ((out_r << 11) | (out_g << 5) | out_b);
}

There is a shortcut you can use for dividing by 255. The compiler should be able to provide some strength reduction, but you might be able to do better by using the following formula instead:

// Alpha blend components
unsigned out_r = fg_r * a + bg_r * (255 - alpha);
unsigned out_g = fg_g * a + bg_g * (255 - alpha);
unsigned out_b = fg_b * a + bg_b * (255 - alpha);
out_r = (out_r + 1 + (out_r >> 8)) >> 8;
out_g = (out_g + 1 + (out_g >> 8)) >> 8;
out_b = (out_b + 1 + (out_b >> 8)) >> 8;

Note the large number of variables in the function... this is okay. If you try to "optimize" the code by rewriting the equations so that it creates fewer temporary variables, you are only doing work that the compiler already does for you. Unless you have a really bad compiler.

If this is not fast enough, there are a few options for how to proceed. However, choosing the correct option depends on the results of profiling, how the code is used, and the target architecture.

Question 3

I've found an alternative method which is ~25% faster than biziclop's approximation. This is also an approximation because it reduces the levels of alpha from 0-255 to 0-31 (32 levels of alpha) but it does not truncate color bits as far as I can tell.

Visually on my TFT display the results look the same as those from biziclop's algorithm, but I haven't checked individual pixel values to see what the difference is (if any).

Note that although fg and bg arguments are 32bit unsigned, you must actually just pass in the 16bit RGB565 colors. The 32bit width is required within the function by the algorithm.

/**
 * Fast RGB565 pixel blending
 * @param fg      The foreground color in uint16_t RGB565 format
 * @param bg      The background color in uint16_t RGB565 format
 * @param alpha   The alpha in range 0-255
 **/
color alphaBlendRGB565( uint32_t fg, uint32_t bg, uint8_t alpha ){
    alpha = ( alpha + 4 ) >> 3;
    bg = (bg | (bg << 16)) & 0b00000111111000001111100000011111;
    fg = (fg | (fg << 16)) & 0b00000111111000001111100000011111;
    uint32_t result = ((((fg - bg) * alpha) >> 5) + bg) & 0b00000111111000001111100000011111;
    return (uint16_t)((result >> 16) | result);
}

I found this solution in a pull request by Chris Chua to the Adafruit Arduino framebuffer libraries. Here is an expanded version with comments to explain the math:

// Fast RGB565 pixel blending
// Found in a pull request for the Adafruit framebuffer library. Clever!
// https://github.com/tricorderproject/arducordermini/pull/1/files#diff-d22a481ade4dbb4e41acc4d7c77f683d
color alphaBlendRGB565( uint32_t fg, uint32_t bg, uint8_t alpha ){
    // Alpha converted from [0..255] to [0..31]
    alpha = ( alpha + 4 ) >> 3;

    // Converts  0000000000000000rrrrrggggggbbbbb
    //     into  00000gggggg00000rrrrr000000bbbbb
    // with mask 00000111111000001111100000011111
    // This is useful because it makes space for a parallel fixed-point multiply
    bg = (bg | (bg << 16)) & 0b00000111111000001111100000011111;
    fg = (fg | (fg << 16)) & 0b00000111111000001111100000011111;

    // This implements the linear interpolation formula: result = bg * (1.0 - alpha) + fg * alpha
    // This can be factorized into: result = bg + (fg - bg) * alpha
    // alpha is in Q1.5 format, so 0.0 is represented by 0, and 1.0 is represented by 32
    uint32_t result = (fg - bg) * alpha; // parallel fixed-point multiply of all components
    result >>= 5;
    result += bg;
    result &= 0b00000111111000001111100000011111; // mask out fractional parts
    return (color)((result >> 16) | result); // contract result
}

Question 4

This one is based on http://www-personal.umich.edu/~bazald/l/api/_s_d_l___r_l_eaccel_8c_source.html

On my RGB565 16_bit device, it produced the cleanest result.

COLOUR ALPHA_BLIT16_565(uint32_t fg, uint32_t bg, int8u alpha) {
    // Alpha converted from [0..255] to [0..31]
    uint32_t ALPHA = alpha >> 3;     
    fg = (fg | fg << 16) & 0x07e0f81f;
    bg = (bg | bg << 16) & 0x07e0f81f;
    bg += (fg - bg) * ALPHA >> 5;
    bg &= 0x07e0f81f;
        return (COLOUR)(bg | bg >> 16);
}