I've discovered my own algorithm to calculate the coefficients of the polynomial obtained from Faulhaber's formula; it, its proof and several implementations can be found at github.com/fcard/PolySum. This question inspired me to include a c++ implementation (using the GMP library for arbitrary precision numbers), which, as of the time of writing and minus several usability features, is:
#include <gmpxx.h>
#include <vector>
namespace polysum {
typedef std::vector<mpq_class> mpq_row;
typedef std::vector<mpq_class> mpq_column;
typedef std::vector<mpq_row> mpq_matrix;
mpq_matrix make_matrix(size_t n) {
mpq_matrix A(n+1, mpq_row(n+2, 0));
A[0] = mpq_row(n+2, 1);
for (size_t i = 1; i < n+1; i++) {
for (size_t j = i; j < n+1; j++) {
A[i][j] += A[i-1][j];
A[i][j] *= (j - i + 2);
}
A[i][n+1] = A[i][n-1];
}
A[n][n+1] = A[n-1][n+1];
return A;
}
void reduced_row_echelon(mpq_matrix& A) {
size_t n = A.size() - 1;
for (size_t i = n; i+1 > 0; i--) {
A[i][n+1] /= A[i][i];
A[i][i] = 1;
for (size_t j = i-1; j+1 > 0; j--) {
auto p = A[j][i];
A[j][i] = 0;
A[j][n+1] -= A[i][n+1] * p;
}
}
}
mpq_column sum_coefficients(size_t n) {
auto A = make_matrix(n);
reduced_row_echelon(A);
mpq_column result;
for (auto row: A) {
result.push_back(row[n+1]);
}
return result;
}
}
We can use the above like so:
#include <cmath>
#include <gmpxx.h>
#include <polysum.h>
mpq_class power_sum(size_t K, unsigned int N) {
auto coeffs = polysum::sum_coefficients(K)
mpq_class result(0);
for (size_t i = 0; i <= K; i++) {
result += A[i][n+1] * pow(N, i+1);
}
return result;
}
The full implementation provides a Polynomial
class that is printable and callable, as well as a polysum
function to construct one as a sum of another polynomial.
#include "polysum.h"
void power_sum_print(size_t K, unsigned int N) {
auto F = polysum::polysum(K);
std::cout << "polynomial: " << F;
std::cout << "result: " << F(N);
}
As for efficiency, the above calculates the result for K=1000
and N=1e16
in about 1.75
seconds on my computer, compared to the much more mature and optimized SymPy implementation which takes about 90
seconds on the same machine, and mathematica which takes 30
seconds. For K=3000
the above takes about 4
minutes, mathematica took almost 20
minutes, (but uses much less memory) and I left sympy running all night but it didn't finish, maybe due to it running out of memory.
Among the optimizations that can be done here are making the matrix sparse and taking advantage of the fact that only half of the rows and columns need to be calculated. The Rust version in the linked repository implements the sparse and rows optimizations, and takes about 0.7
seconds to calculate K=1000
, and about 45
to calculate K=3000
(using 105mb
and 2.9gb
of memory respectively). The Haskell version implements all three optimizations and takes about 1
second for K=1000
and about 34
seconds for K=3000
. (using 60mb
and 880mb
of memory respectively) and The completely unoptimized python implementation takes about 12
seconds for K=1000
but runs out of memory for K=3000
.
It's looking like this method is the fastest regardless of the language used, but the research is ongoing. Since Schultz's method also boils down to solving a system of n+1
equations and should be able to be optimized the same way, it will depend on whether his matrix is faster to calculate or not. Also, memory usage is not scaling well at all, and Mathematica is still the clear winner here, using only 80mb
for K=3000
. We'll see.