fast padded strcpy for a single word

Question 1

How about something like this:

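typedef unsigned int word;   // assumed to be 32 bits wide
word spacePad(word input) {
    static const word spaces = 0x20202020;

    // mask has 0xff in every byte position that must become a space.
    // Assumes every byte after the terminator is zero as well.
    word mask =
       !input ?                0xffffffff :
       !(input & 0x00ffffff) ? 0x00ffffff :
       !(input & 0x0000ffff) ? 0x0000ffff :
       !(input & 0x000000ff) ? 0x000000ff :
                               0;
    // or without branches: count the non-zero bytes, then shift that many
    // bytes out of a full mask (64-bit literal so a shift by 32 is defined)
    word branchless_mask = (word)(
       0xffffffffull >> (8 * (
         bool(input & 0xff000000) +
         bool(input & 0x00ff0000) +
         bool(input & 0x0000ff00) +
         bool(input & 0x000000ff)
       )));
    (void)branchless_mask;  // same value as mask; use whichever is faster

    return (spaces & mask) | (input & ~mask);
}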

And if I didn't screw up, spacePad(0xaabb0000) is 0xaabb2020.

Instead of computing and-masks, you could use SSE intrinsics, which would probably be faster since you'd get the mask in a couple of instructions and a masked move would do the rest. However, the compiler would probably move your variables around between SSE and general-purpose registers, which could outweigh the slight gain. It all depends on how much data you need to process, how it's packed in memory, etc.
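As a rough illustration of the SSE idea, here is a minimal sketch using SSE2 intrinsics. It is not the masked-move variant mentioned above: it builds the mask with a byte-wise compare against zero and merges the spaces in with AND/OR. The name spacePadSse2 is made up for this example, and it assumes that every byte after the terminator is zero. On targets without SSE2 it falls back to a plain byte loop.

```cpp
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <emmintrin.h>
#define SPACEPAD_HAVE_SSE2 1
#endif

// Hypothetical helper: replaces every zero byte of the word with a space.
// Assumes all bytes after the terminator are zero.
inline std::uint32_t spacePadSse2(std::uint32_t input) {
#ifdef SPACEPAD_HAVE_SSE2
    // Put the word into the low 32 bits of an SSE register.
    const __m128i v      = _mm_cvtsi32_si128(static_cast<int>(input));
    // 0xff in every byte lane that is zero, 0x00 elsewhere.
    const __m128i isZero = _mm_cmpeq_epi8(v, _mm_setzero_si128());
    // A space (0x20) in exactly those lanes.
    const __m128i spaces = _mm_and_si128(isZero, _mm_set1_epi8(0x20));
    // Merge and move the low 32 bits back to a general-purpose register.
    return static_cast<std::uint32_t>(_mm_cvtsi128_si32(_mm_or_si128(v, spaces)));
#else
    // Portable fallback: test each byte in turn.
    std::uint32_t result = input;
    for (int shift = 0; shift < 32; shift += 8)
        if (((input >> shift) & 0xffu) == 0)
            result |= 0x20u << shift;
    return result;
#endif
}
```

For a single word the two register moves are most of the work, which is the overhead the paragraph above warns about; the approach only starts to pay off when several words are processed per register.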

If the input is a char* and not an int, additional code would normally be necessary, since a cast could read into unallocated memory. But since you mention that all strings are word-aligned, a cast is enough: even if there are a few unallocated bytes, they are in the same word as at least one allocated byte. Since you are only reading, there's no risk of memory corruption, and on all architectures I know of, hardware memory protection has a granularity larger than a word. For instance, on x86 memory pages are typically 4 KiB in size and 4 KiB aligned.
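To make the char* case concrete, here is a minimal sketch that goes from a char* to a padded 4-byte output. It swaps the pointer cast for std::memcpy, which compilers typically turn into the same single load while staying clear of aliasing rules. It still reads all four bytes, so the word-alignment argument above is what makes that safe. It also works byte by byte inside the word instead of using fixed masks, because on a little-endian machine such as x86 the first character lands in the low byte rather than the high one. The name padWordFromChars and the zero-byte bit trick are my own choices for this example, and the sketch assumes the bytes after the terminator are zero.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies 4 bytes from `in` to `out`, turning zero bytes
// into spaces. Assumes all 4 bytes at `in` are readable and that every byte
// after the terminator is zero.
inline void padWordFromChars(const char* in, char* out) {
    std::uint32_t w;
    std::memcpy(&w, in, sizeof w);      // one 32-bit load, no aliasing issue

    // Exact zero-byte test: 0x80 in every byte of w that is zero, 0 elsewhere.
    std::uint32_t zero = (w & 0x7f7f7f7fu) + 0x7f7f7f7fu;
    zero = ~(zero | w | 0x7f7f7f7fu);

    w |= zero >> 2;                     // 0x80 >> 2 == 0x20 (a space), per byte
    std::memcpy(out, &w, sizeof w);     // write the 4 padded bytes
}
```

Because every step acts on each byte independently, the result is the same on big-endian and little-endian machines.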

Now that's all nice and hacky, but before selecting a solution, benchmark it. That's the only way to know which is best for you (apart, of course, from the warm fuzzy feeling of writing code like this ^^)
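A benchmark does not need to be elaborate to be useful. The sketch below times two candidates with std::chrono; padLoop, padChain and timeNs are names made up for this example, the four inputs are arbitrary well-formed words, and the numbers it produces are only meaningful with optimizations enabled and on your own data.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical candidate 1: test each byte of the word in a loop.
inline std::uint32_t padLoop(std::uint32_t w) {
    for (int shift = 0; shift < 32; shift += 8)
        if (((w >> shift) & 0xffu) == 0)
            w |= 0x20u << shift;
    return w;
}

// Hypothetical candidate 2: a ternary chain that picks the pad mask.
// Assumes every byte after the terminator is zero.
inline std::uint32_t padChain(std::uint32_t w) {
    const std::uint32_t mask =
        !w                 ? 0xffffffffu :
        !(w & 0x00ffffffu) ? 0x00ffffffu :
        !(w & 0x0000ffffu) ? 0x0000ffffu :
        !(w & 0x000000ffu) ? 0x000000ffu : 0u;
    return (0x20202020u & mask) | (w & ~mask);
}

// Runs f `iterations` times over a few sample words and returns the
// elapsed time in nanoseconds.
template <typename F>
long long timeNs(F f, int iterations) {
    static const std::uint32_t inputs[4] = {
        0x41424344u, 0x41424300u, 0x41420000u, 0x41000000u };
    volatile std::uint32_t sink = 0;    // keeps the calls from being optimized away
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = f(inputs[i & 3]);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Compare timeNs(padLoop, n) with timeNs(padChain, n) for the same n, and repeat the run a few times, since a single measurement is noisy.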

Question 2

If speed is your issue, use brute force.

This neither accesses input outside its bounds nor modifies it.

 const char* input = TBD();
 char output[4] = {' ', ' ', ' ', ' '};  // {' '} would set only output[0] and zero the rest
 if (input[0]) {
   output[0] = input[0];
   if (input[1]) {
     output[1] = input[1];
     if (input[2]) {
       output[2] = input[2];
       if (input[3]) {
         output[3] = input[3];
       }
     }
   }
 }

Question 3

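const char* input = "AB";
char output[4];

// Copy a character and advance until the terminator is reached; after that,
// input stays on the terminator and the remaining slots become spaces.
// (Comparing the copied character with ' ' to decide whether to advance
// would stall on a space inside the input.)
output[0] = *input ? *input++ : ' ';
output[1] = *input ? *input++ : ' ';
output[2] = *input ? *input++ : ' ';
output[3] = *input ? *input : ' ';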

Note that this advances the original input pointer, so make a copy of it first if you need to preserve it.

Question 4

For short strings like this, I don't think you can do much better than the trivial implementation:

char buffer[4];

const char * input = "AB";
const char * in = input;
char * out = buffer;
char * end = buffer + sizeof buffer;

while (out < end)
{
    *out = *in != 0 ? *in++ : ' ';
    out++;
}

Question 5

If your input is null-terminated, a simple strcpy will suffice, provided the destination has room for the terminator, which strcpy copies too. memcpy is faster, but it will copy whatever garbage it finds after the null character.

Question 6

You are looking for memcpy:

const char* input = "AB\0\0";
char output[4];
memcpy(output, input, 4);

If your input is variable, you'll need to calculate the size first:

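#include <algorithm>  // std::min
#include <cstddef>    // std::size_t
#include <cstring>    // std::strlen, std::memcpy

const char* input = "AB";
std::size_t len = std::strlen(input);
char output[4] = {' ', ' ', ' ', ' '};
std::memcpy(output, input, std::min<std::size_t>(4, len));  // both arguments must have the same type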
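If the input can be much longer than four characters, strlen walks the whole string just to have the result clamped to 4. A possible variation, sketched here with a made-up helper name padCopy4, is to search for the terminator within the first four bytes only, using std::memchr. It relies on memchr stopping at the first match, so a shorter string is not read past its terminator.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: writes exactly 4 bytes to `out`, space-padded and
// without a terminator. `in` must be null-terminated or at least 4 bytes long.
inline void padCopy4(const char* in, char* out) {
    // Look for the terminator in the first 4 bytes only.
    const void* nul = std::memchr(in, '\0', 4);
    const std::size_t len =
        nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - in) : 4;

    std::memset(out, ' ', 4);   // start with four spaces
    std::memcpy(out, in, len);  // then overwrite with the characters present
}
```

With this, padCopy4("AB", output) leaves 'A', 'B', ' ', ' ' in output, and an input longer than four characters is cut off after the fourth.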