Question

I am writing a function f to be used in a Runge Kutta integrator.

output RungeKutta(function f, initial conditions IC, etc.)

Since the function will be called many times, I am looking for a way to generate the function f at compile time.

In this case, function f depends on a fixed list of parameters vector p, where p is sparse and is fixed before the code is compiled. To be concrete,

double f(const vector<double>& x) {
    return x dot p;  // pseudocode: the dot product of x and p
}

Since p is sparse, taking the full dot product in f is not the most efficient approach. Hard-coding x dot p seems to be the way to go, but p can be very long (around 1000 elements).

What are my options?

Is writing another program (taking p as input) to generate a .cpp file my only option?

Thanks for the comments. Here is a more concrete example for the differential equation.

dy/dx = f_p(x)

One example for f_p(x):

p = [0, 1, 0]; x = [x1, x2, x3]

double f_p(vector<double> x) {
     return x[1];  // i.e. x2; this is what I meant by hard-coding
}

instead of:

   double f(vector<double> p, vector<double> x) {
       double r = 0;
       for (size_t i = 0; i < p.size(); i++) {
           r += p[i] * x[i];
       }
       return r;
   }

Solution

The key problem you are trying to solve is that a "leaf" function in your calculation, which will be called many times, mostly does work that is known in advance to be unnecessary for this problem domain. The hope is that the redundant work, namely multiplying a value with an element of an array known at compile time to be zero, can be eliminated as part of a compile-time step.

C++ has language facilities to deal with this, namely template metaprogramming. C++ templates are very powerful (they are Turing-complete) and allow for things like recursive calculations based on compile-time constants.

Below is an example of how to implement your example using templates and template specialization (you can also find a runnable example I've created here: http://ideone.com/BDtBt7). The basic idea behind the code is to generate a type with a static function that returns the dot product of an input vector of values and a compile-time constant array. The static function recursively calls instances of itself, passing a lower index as it moves through the input and constant arrays. It is also templated on whether the element of the compile-time constant array p currently being evaluated is zero; if it is, we can skip that multiplication and move on to the next value in the recursion. Lastly, there is a base case that stops the recursion once we have reached the first element of the array.

#include <array>
#include <iostream>
#include <vector>

constexpr std::array<double, 5> p = { 1.0, 0.0, 3.0, 5.0, 0.0 };

template<size_t index, bool isZero>
struct DotProductCalculator
{
    static double Calculate(const std::vector<double>& xArg)
    {
        return (xArg[index] * p[index]) 
            + DotProductCalculator<index - 1, p[index - 1] == 0.0>::Calculate(xArg);
    }
};

template<>
struct DotProductCalculator<0, true>
{
    static double Calculate(const std::vector<double>& xArg)
    {
        return 0.0;
    }
};

template<>
struct DotProductCalculator<0, false>
{
    static double Calculate(const std::vector<double>& xArg)
    {
        return xArg[0] * p[0];
    }
};

template<size_t index>
struct DotProductCalculator<index, true>
{
    static double Calculate(const std::vector<double>& xArg)
    {
        return 0.0 + DotProductCalculator<index - 1, p[index - 1] == 0.0>::Calculate(xArg);
    }
};

template<typename ArrayType>
double f_p_driver(const std::vector<double>& xArg, const ArrayType& pAsArgument) 
{
     return DotProductCalculator<std::tuple_size<ArrayType>::value - 1, 
                            p[std::tuple_size<ArrayType>::value -1] == 0.0>::Calculate(xArg); 
}

int main() 
{
    std::vector<double> x = { 1.0, 2.0, 3.0, 4.0, 5.0 };
    double result = f_p_driver(x, p);
    std::cout << "Result: " << result;
    return 0;
}
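
If a C++17 compiler is available, the same zero-skipping idea can be expressed with if constexpr and far less specialization boilerplate. The following is a minimal sketch of my own (not part of the original answer), using the same p values as above:

#include <array>
#include <cstddef>
#include <iostream>
#include <vector>

constexpr std::array<double, 5> p = { 1.0, 0.0, 3.0, 5.0, 0.0 };

// The index is a template parameter, so p[index] is a compile-time constant
// and the multiply can be discarded whenever that element is zero.
template<std::size_t index>
double dot_with_p(const std::vector<double>& x)
{
    double tail = 0.0;
    if constexpr (index > 0)
        tail = dot_with_p<index - 1>(x);   // recurse over the remaining elements
    if constexpr (p[index] != 0.0)
        return x[index] * p[index] + tail; // element known to be non-zero
    else
        return tail;                       // element known to be zero: no multiply
}

int main()
{
    std::vector<double> x = { 1.0, 2.0, 3.0, 4.0, 5.0 };
    std::cout << dot_with_p<p.size() - 1>(x) << '\n';  // 1*1 + 3*3 + 5*4 = 30
    return 0;
}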

OTHER TIPS

You say in the comments that P really is a row or column of a matrix, and that the matrix is sparse. I'm not familiar with the specific physical problem you are solving, but often, sparse matrices have a fixed diagonal "banding" structure of some kind, e.g.:

| a1 b1  0  0  0  0  0 d1 |
| c1 a2 b2  0  0  0  0  0 |
| 0  c2 a3 b3  0  0  0  0 |
| 0  0  c3 a4 b4  0  0  0 |
| 0  0  0  c4 a5 b5  0  0 |
| 0  0  0  0  c5 a6 b6  0 |
| 0  0  0  0  0  c6 a7 b7 |
| e1 0  0  0  0  0  c7 a8 |

The most efficient way to store such matrices tends to be to store the diagonals as arrays/vectors, so:

A = [a1, a2, a3, a4, a5, a6, a7, a8]
B = [b1, b2, b3, b4, b5, b6, b7]
C = [c1, c2, c3, c4, c5, c6, c7]
D = [d1]
E = [e1]

Multiplying a row-vector X = [x1, x2, x3, x4, x5, x6, x7, x8] by the above matrix thus becomes:

Y = X . M

Y[0] =               X[0] * A[0] + X[1] * C[0] + X[7] * E[0]
Y[1] = X[0] * B[0] + X[1] * A[1] + X[2] * C[1]

etc.

or more generally:

Y[i] = X[i-7] * D[i-7] + X[i-1] * B[i-1] + X[i] * A[i] + X[i+1] * C[i] + X[i+7] * E[i]

Out-of-range array accesses (index < 0 or past the end of the corresponding diagonal) should be treated as evaluating to 0. To avoid having to test for out-of-bounds everywhere, you can store each diagonal and the vector itself in oversize arrays whose leading and trailing elements are filled with zeroes.

Note that this will also be highly cache efficient, as all array accesses are linear.
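
To make the zero-padding idea concrete, here is a small sketch of the multiplication restricted to the three central diagonals A, B and C (the function name is my own, and the corner diagonals D and E are omitted for brevity; they would be padded and added in exactly the same way):

#include <cstddef>
#include <vector>

// Computes y = x . M for a tridiagonal M stored as diagonals A (length n),
// B and C (length n - 1). Each diagonal and the vector are copied into
// arrays padded with a zero at both ends, so the loop needs no bounds checks.
std::vector<double> bandedMultiply(const std::vector<double>& A,
                                   const std::vector<double>& B,
                                   const std::vector<double>& C,
                                   const std::vector<double>& X)
{
    const std::size_t n = A.size();

    // Padded copies: padded index i + 1 corresponds to original index i.
    std::vector<double> a(n + 2, 0.0), b(n + 2, 0.0), c(n + 2, 0.0), x(n + 2, 0.0);
    for (std::size_t i = 0; i < n; ++i)     { a[i + 1] = A[i]; x[i + 1] = X[i]; }
    for (std::size_t i = 0; i + 1 < n; ++i) { b[i + 1] = B[i]; c[i + 1] = C[i]; }

    std::vector<double> y(n);
    for (std::size_t i = 1; i <= n; ++i) {
        // Y[i] = X[i-1]*B[i-1] + X[i]*A[i] + X[i+1]*C[i], shifted by the padding
        y[i - 1] = x[i - 1] * b[i - 1] + x[i] * a[i] + x[i + 1] * c[i];
    }
    return y;
}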

With the given constraints I would create a custom function object which stores the matrix p and computes the operation in its function call operator. I would implement two versions of the function: one which preprocesses the matrix upon construction to "know" where the non-zero elements are, and one which just does the operations as stated, accepting that many of the computations simply produce 0. The quoted figure of 10% non-zero elements sounds too dense for the complication of taking advantage of the sparsity to pay off.

Ignoring that p is a matrix and using it as a vector, the version without preprocessing would be something like this:

#include <numeric>
#include <vector>

class dotProduct {
    std::vector<double> p;
public:
    dotProduct(std::vector<double> const& p): p(p) {}
    double operator()(std::vector<double> const& x) const {
        // note the explicit initial value required by std::inner_product
        return std::inner_product(p.begin(), p.end(), x.begin(), 0.0);
    }
};
// ...
... RungeKutta(dotProduct(p), initial conditions IC, etc.);

When using C++11 a lambda function could be used instead:

... RungeKutta([=](std::vector<double> const& x) {
        return std::inner_product(p.begin(), p.end(), x.begin(), 0.0);
    }, initial conditions IC, etc.);

For the preprocessing version you'd store a std::vector<std::pair<double, std::size_t>> indicating which indices actually need to be multiplied:

class sparseDotProduct {
    typedef std::vector<std::pair<double, std::size_t>> Vector;
    Vector p;
public:
    sparseDotProduct(std::vector<double> const& op) {
        for (std::size_t i(0), s(op.size()); i != s; ++i) {
            if (op[i]) {
                p.push_back(std::make_pair(op[i], i));
            }
        }
    }
    double operator()(std::vector<double> const& x) const {
        double result(0);
        for (Vector::const_iterator it(p.begin()), end(p.end()); it != end; ++it) {
            result += it->first * x[it->second];
        }
        return result;
    }
};

The use of this function object is just the same, although it may be worthwhile to keep the object around if p doesn't change.
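
For instance (a sketch mirroring the pseudo-call from the question, so RungeKutta and the IC placeholders stand in for whatever your integrator actually takes):

sparseDotProduct f_p(p);  // preprocessing of p happens once, at construction
... RungeKutta(f_p, initial conditions IC, etc.);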

I would personally expect the non-sparse version to actually outperform the sparse version if about 10% of the values are non-zero. However, with these two versions in place it should be relatively simple to measure the performance of the different approaches. I wouldn't expect custom-generated code to be substantially better, although it could improve on the computation; if so, metaprogramming techniques could be used to create that code, but I doubt it would be very practical.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow