C++ templated array operator[] using ints

Question 1

Since you said in a comment that your index is always a template parameter, you can indeed do the branching at compile time instead of at runtime. Here is a possible solution using std::enable_if:

#include <iostream>
#include <type_traits>

struct f32x4
{
    float fLow[2];
    float fHigh[2];

    f32x4(float a, float b, float c, float d)
    {
        fLow[0] = a; 
        fLow[1] = b;
        fHigh[0] = c;
        fHigh[1] = d;
    }

    template <int x>
    float& get(typename std::enable_if<(x >= 0 && x < 2)>::type* = 0)
    {
        return fLow[x];
    }

    template <int x>
    float& get(typename std::enable_if<(x >= 2 && x < 4)>::type* = 0)
    {
        return fHigh[x-2];
    }
};

int main()
{
    f32x4 f(0.f, 1.f, 2.f, 3.f);

    std::cout << f.get<0>() << " " << f.get<1>() << " "
              << f.get<2>() << " " << f.get<3>(); // prints 0 1 2 3
}

Regarding performance, I don't think there will be any difference, since the optimizer should easily propagate the constants and then remove the dead code, eliminating the branch altogether. With this approach, however, you also get the benefit that any attempt to call the function with an invalid index results in a compile-time error.
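If the "no matching function" diagnostic that the enable_if overloads produce for a bad index is too cryptic, a static_assert can state the problem directly. The following is only a sketch of that alternative, not the code from the question: the names f32x4_checked, get_impl and sum_all are made up for illustration, and tag dispatch on std::true_type / std::false_type stands in for the two enable_if overloads.

```cpp
#include <type_traits>

// Hypothetical variant of f32x4; only the accessor differs from the code above.
struct f32x4_checked
{
    float fLow[2];
    float fHigh[2];

    template <int x>
    float& get()
    {
        // Fails the build with a readable message for any index outside [0, 4).
        static_assert(x >= 0 && x < 4, "f32x4 index must be in [0, 4)");
        // Pick the low or high half at compile time via tag dispatch.
        return get_impl<x>(std::integral_constant<bool, (x < 2)>());
    }

    // Helpers (hypothetical names), selected by the tag type rather than at runtime.
    template <int x>
    float& get_impl(std::true_type) { return fLow[x]; }

    template <int x>
    float& get_impl(std::false_type) { return fHigh[x - 2]; }
};

// Adds up all four elements through the compile-time accessor.
inline float sum_all(f32x4_checked& f)
{
    return f.get<0>() + f.get<1>() + f.get<2>() + f.get<3>();
}
```

With this variant, a call such as f.get<4>() is rejected at compile time with the message from the static_assert.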

Question 2

Either the index x is a runtime variable or it is a compile-time constant.

If it is a compile-time constant, there's a good chance the optimizer will be able to prune the dead branch when inlining operator[] anyway.
If it is a runtime variable, as in
```
for (int i=0; i<4; ++i) { dosomething(f[i]); }
```
you need the branch anyway, unless your optimizer unrolls the loop, in which case it can replace the variable with four constants, then inline and prune as above.

Did you profile this to show that there's a real problem, and inspect the generated assembly to confirm that the branch is really emitted where it could be avoided?

Example code:

float foo(f32x4 &f)
{
    return f[0]+f[1]+f[2]+f[3];
}

Example output from g++ -O3 -S:

.globl _Z3fooR5f32x4
        .type       _Z3fooR5f32x4, @function
_Z3fooR5f32x4:
.LFB4:
        .cfi_startproc
        movss       (%rdi), %xmm0
        addss       4(%rdi), %xmm0
        addss       8(%rdi), %xmm0
        addss       12(%rdi), %xmm0
        ret
        .cfi_endproc
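The snippet above relies on the operator[] from the question, which is not reproduced in this answer. To repeat the experiment, a minimal self-contained sketch could look like this, assuming the operator branches on x < 2 (that body is a guess at the question's code, and sum_loop is a hypothetical name for the runtime-index case):

```cpp
struct f32x4
{
    float fLow[2];
    float fHigh[2];

    // Assumed shape of the question's operator: a branch on the index.
    float& operator[](int x)
    {
        if (x < 2)
            return fLow[x];
        return fHigh[x - 2];
    }
};

// Constant indices: once operator[] is inlined, each branch can be folded away.
float foo(f32x4& f)
{
    return f[0] + f[1] + f[2] + f[3];
}

// Runtime index: the branch is needed unless the optimizer unrolls the loop.
float sum_loop(f32x4& f)
{
    float total = 0.f;
    for (int i = 0; i < 4; ++i)
        total += f[i];
    return total;
}
```

Compiling this with g++ -O3 -S and comparing the two functions shows whether your compiler actually emits the branch in either case.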

Question 3

Seriously, don't do this!! Just combine the arrays. But since you asked the question, here's an answer:

#include <iostream>

float fLow [2] = {1.0,2.0};
float fHigh [2] = {50.0,51.0};

float * fArrays[2] = {fLow, fHigh};

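float getFloat (int i)
{
    // (i >= 2) converts to 0 or 1 and selects fLow or fHigh; i % 2 is the offset within it.
    // Only valid for i in [0, 3].
    return fArrays[i>=2][i%2];
}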

int main()
{
    for (int i = 0; i < 4; ++i)
        std::cout << getFloat(i) << '\n';
    return 0;
}

Output:

1
2
50
51

Question 4

Create one array (or vector) holding all 4 elements, with the fLow values in the first two positions and the fHigh values in the last two. Then just index into it.

inline float& operator[] (int x) {
    // Do some bounds checking on x before indexing.
    return newFancyArray[x];
}
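A minimal sketch of that suggestion, assuming assert-based bounds checking (the answer does not say which kind of check it has in mind) and keeping the placeholder member name newFancyArray; sum_all is a hypothetical helper added only to show the indexing in use:

```cpp
#include <cassert>

struct f32x4
{
    // Elements 0-1 hold the former fLow values, 2-3 the former fHigh values.
    float newFancyArray[4];

    float& operator[](int x)
    {
        assert(x >= 0 && x < 4); // compiled out when NDEBUG is defined
        return newFancyArray[x];
    }

    const float& operator[](int x) const
    {
        assert(x >= 0 && x < 4);
        return newFancyArray[x];
    }
};

// Reads all four elements through the single index, with no branch on the half.
inline float sum_all(const f32x4& f)
{
    float total = 0.f;
    for (int i = 0; i < 4; ++i)
        total += f[i];
    return total;
}
```

Because the storage is contiguous, the only remaining branch is the optional bounds check.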

Question 5

Based on Luc Touraille's answer, but without type traits because of their limited compiler support, I found that the following achieves what the question asks for. Since operator[] cannot take the index as a template argument while keeping the usual subscript syntax, I introduced an at method instead. This is the result:

struct f32x4
{
    float fLow[2];
    float fHigh[2];

    f32x4(float a, float b, float c, float d)
    {
        fLow[0] = a; 
        fLow[1] = b;
        fHigh[0] = c;
        fHigh[1] = d;
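    }

    // Declared only; each valid index is provided as an explicit specialization below.
    template <unsigned T>
    const float& at() const;
};

// Explicit specializations are not implicitly inline, so mark them inline
// if this code lives in a header included from several source files.
template<>
inline const float& f32x4::at<0>() const { return fLow[0]; }
template<>
inline const float& f32x4::at<1>() const { return fLow[1]; }
template<>
inline const float& f32x4::at<2>() const { return fHigh[0]; }
template<>
inline const float& f32x4::at<3>() const { return fHigh[1]; }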
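For completeness, here is how the accessor might be called. The sketch repeats a condensed copy of the struct (without the constructor) so that it compiles on its own, and sum_all is a hypothetical helper that is not part of the original code.

```cpp
struct f32x4
{
    float fLow[2];
    float fHigh[2];

    // Primary template is declared but never defined.
    template <unsigned T>
    const float& at() const;
};

template<> inline const float& f32x4::at<0>() const { return fLow[0]; }
template<> inline const float& f32x4::at<1>() const { return fLow[1]; }
template<> inline const float& f32x4::at<2>() const { return fHigh[0]; }
template<> inline const float& f32x4::at<3>() const { return fHigh[1]; }

// Every index is a template argument, so each call resolves to one
// specialization and no runtime branch is involved.
inline float sum_all(const f32x4& f)
{
    return f.at<0>() + f.at<1>() + f.at<2>() + f.at<3>();
}
```

One caveat: because the primary template has no definition, a call such as f.at<4>() still compiles and typically only fails at link time with an undefined reference, rather than with a compile-time error.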