Question

I'm trying to write a very cheap C++ code snippet to do the following operation on a short null terminated string.

The input is a string like "ABC". It is null terminated and has a maximum length of 4 (or 5 with the null terminator).

The output goes to a char[4] which is not null terminated and should be space padded on the right. So in this case it would be {'A','B','C',' '}

It is ok to assume that the input string is properly null terminated, so there's no need to read a second word of the input to make sure. 4 bytes is the longest it can be.

So the code around it looks like this:

const char* input = "AB";
char output[4];
// code snippet goes here
// afterward output will be populated with {'A','B',' ',' '}

How cheaply can this be done? If it matters: I'm working with:

Linux 2.6.32-358.11.1.el6.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux

Lastly, the input is word aligned.


Solution

How about something like this:

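typedef unsigned int word;
word spacePad(word input) {
    static const word spaces = 0x20202020;

    // Assumes the first character is the most significant byte and that
    // every byte after the terminator is zero.
    // mask selects the bytes that become spaces.
    word mask =
       !input ?                0xffffffff :
       !(input & 0x00ffffff) ? 0x00ffffff :
       !(input & 0x0000ffff) ? 0x0000ffff :
       !(input & 0x000000ff) ? 0x000000ff :
                               0;
    // or without branches: shift out one byte of mask per non-zero input byte
    // (64-bit constant so that a shift by 32 stays well defined)
    word branchless_mask = word(
       0xffffffffull >> (8 * (
         bool(input & 0xff000000) +
         bool(input & 0x00ff0000) +
         bool(input & 0x0000ff00) +
         bool(input & 0x000000ff)
       )));

    return (spaces & mask) | (input & ~mask);
}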

And if I didn't screw up, spacePad(0xaabb0000) is 0xaabb2020.

Instead of computing and-masks, you could use SSE intrinsics, which would probably be faster since you'd get the mask in a couple of instructions and a masked move would do the rest. However, the compiler would probably move your variables around between SSE and general-purpose registers, which could outweigh the slight gain. It all depends on how much data you need to process, how it's packed in memory, etc.

If the input is a char* and not an int, additional code would normally be necessary, since a cast could read into unallocated memory. But since you mention all strings are word-aligned, a cast is enough: even if there are a few unallocated bytes, they are in the same word as at least one allocated byte. Since you are only reading, there's no risk of memory corruption, and on all architectures I know of, hardware memory protection has a granularity larger than a word. For instance, on x86 a memory page is typically 4 KiB in size and 4 KiB-aligned. Keep in mind that on little-endian x86 the first character ends up in the least significant byte of the word, so the masks above would have to be mirrored.
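As a rough illustration of the char* case, here is a minimal sketch for a little-endian target such as x86-64. The name spacePadLE is hypothetical and not part of the answer above. It assumes that reading 4 bytes from the input is safe (the word-alignment argument) and uses memcpy for the load and store instead of a pointer cast, which sidesteps aliasing concerns. Because it looks for the first zero byte starting from the low end, bytes after the terminator do not need to be zero.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: pads a word-aligned, null-terminated input of at
// most 4 characters into out[4]. Assumes a little-endian target, where the
// first character is the least significant byte of the loaded word, and
// assumes that reading 4 bytes from `in` is safe.
inline void spacePadLE(const char* in, char out[4]) {
    std::uint32_t w;
    std::memcpy(&w, in, 4);              // load the whole word

    // Find the first zero byte from the low end; that byte and everything
    // above it becomes padding, so garbage after the terminator is ignored.
    const std::uint32_t mask =
        !(w & 0x000000ffu) ? 0xffffffffu :
        !(w & 0x0000ff00u) ? 0xffffff00u :
        !(w & 0x00ff0000u) ? 0xffff0000u :
        !(w & 0xff000000u) ? 0xff000000u :
                             0u;

    w = (0x20202020u & mask) | (w & ~mask);
    std::memcpy(out, &w, 4);             // store all four bytes at once
}
```

On a big-endian machine the masks would need to be reversed, so this is only a starting point to benchmark against the simpler loops.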

Now that's all nice and hacky, but: before selecting a solution, benchmark it, that's the only way to know which is best for you (except of course the warm fuzzy feeling of writing code like this ^^)
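A very rough sketch of what such a benchmark could look like is below. The names padLoop and nsPerCall are made up for illustration, and the plain byte loop is just a stand-in for whichever candidate you want to measure. It is only a starting point: with a constant input the optimizer may still hoist or simplify work, so real measurements should use varied inputs and be checked against the generated assembly.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical candidate under test: the plain byte-by-byte loop.
inline void padLoop(const char* in, char out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = *in ? *in++ : ' ';
}

// Times `iterations` calls of `fn` and returns the average nanoseconds per
// call. `sink` accumulates one output byte per call so that the result is
// observable and the calls are harder to optimize away.
template <typename Fn>
double nsPerCall(Fn fn, const char* in, std::size_t iterations,
                 std::uint32_t& sink) {
    using clock = std::chrono::steady_clock;
    char out[4];
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        fn(in, out);
        sink += static_cast<unsigned char>(out[i & 3]);
    }
    const auto stop = clock::now();
    const double total =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return total / static_cast<double>(iterations);
}
```

Calling nsPerCall(padLoop, "AB", 10000000, sink) for each candidate on the target machine gives a first comparison; the differences are likely to be small for inputs this short.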

OTHER TIPS

If speed is your issue - use brute force.

This neither accesses input outside its bounds nor destroys it.

 const char* input = TBD();
 char output[4] = {' ', ' ', ' ', ' '};
 if (input[0]) {
   output[0] = input[0];
   if (input[1]) {
     output[1] = input[1];
     if (input[2]) {
       output[2] = input[2];
       if (input[3]) {
         output[3] = input[3];
       }
     }
   }
 }

const char* input = "AB";
char output[4];

// Advance only past real characters, so a space inside the input is still copied.
output[0] = *input ? *input++ : ' ';
output[1] = *input ? *input++ : ' ';
output[2] = *input ? *input++ : ' ';
output[3] = *input ? *input : ' ';

Note that this destroys the original input pointer, so make a copy of that if you need to preserve it.

For short strings like this, I don't think you can do much better than the trivial implementation:

char buffer[4];

const char * input = "AB";
const char * in = input;
char * out = buffer;
char * end = buffer + sizeof buffer;

while (out < end)
{
    *out = *in != 0 ? *in++ : ' ';
    out++;
}

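If your input is null terminated, a simple strcpy will copy it, but note that strcpy also writes the terminator (so a 4-character input overflows a char[4]) and does not pad with spaces. memcpy is faster but will copy whatever garbage it finds after the null char.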

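You are looking for memcpy. Note that this copies the bytes as they are, so the trailing nulls stay nulls rather than becoming spaces:

const char* input = "AB\0\0";
char output[4];
memcpy(output, input, 4);  // output is {'A','B','\0','\0'}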

If your input is variable, you'll need to calculate the size first:

const char* input = "AB";
std::size_t len = strlen(input);
char output[4] = {' ', ' ', ' ', ' '};
memcpy(output, input, std::min<std::size_t>(4, len));  // both arguments of std::min must have the same type
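For reference, the pre-fill-then-copy idea can be wrapped up as a self-contained function with the headers it needs. The name padWithMemcpy is hypothetical; it assumes the input is properly null terminated, as the question states.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical wrapper: fill the output with spaces first, then copy at
// most 4 characters of the null-terminated input over the padding.
inline void padWithMemcpy(const char* in, char out[4]) {
    std::memset(out, ' ', 4);
    const std::size_t len = std::min<std::size_t>(4, std::strlen(in));
    std::memcpy(out, in, len);
}
```

This makes two passes over the input (strlen, then memcpy), so for inputs this short it is unlikely to beat the simple loop; it mainly has the advantage of being obviously correct.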
Licensed under: CC-BY-SA with attribution