Problem

New to SO. I am test-driving Armadillo + OpenBLAS, and a simple Monte Carlo geometric Brownian motion simulation runs much slower than in MATLAB. I believe something must be wrong.

Environment: Intel i5 (4 cores), 8 GB RAM, VS 2012 Express, Armadillo 4.2, OpenBLAS v0.2.9.rc2 (official x64 binary).

MATLAB takes 2 seconds for the same logic, but Armadillo + OpenBLAS takes 12 seconds. I also noticed that the program runs on a single thread, even though I turned to OpenBLAS for its multi-core capability.

Thanks for any advice.

#include <iostream>
#include <armadillo>
#include <ctime>
#include <cstdlib>   // system()

using namespace std;
using namespace arma;

int main()
{
    clock_t start = clock();

    unsigned int R = 100000;          // number of Monte Carlo paths
    vec Spre = 100 * ones<vec>(R);    // prices at the previous step (S0 = 100)
    vec S = zeros<vec>(R);            // prices at the current step
    double r = 0.03;                  // risk-free rate
    double Vol = 0.2;                 // volatility
    double TTM = 5;                   // time to maturity (years)
    unsigned int T = 260 * TTM;       // number of time steps
    double dt = TTM / T;              // step size

    for (unsigned int iT = 0; iT < T; ++iT)
    {
        // GBM update: S = Spre .* exp((r - 0.5*Vol^2)*dt + Vol*sqrt(dt)*Z)
        S = Spre % exp((r - 0.5 * Vol * Vol) * dt + Vol * sqrt(dt) * randn(R));
        Spre = S;
    }

    cout << mean(S) << endl;
    cout << (clock() - start) / (double) CLOCKS_PER_SEC << endl;
    system("pause");
    return 0;
}

Solution 2

The key observation is that Armadillo's exp() function is much slower than MATLAB's. Similar overhead shows up in log(), pow(), and sqrt().
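
One way to check this on your own machine is to time those element-wise functions with Armadillo's wall_clock and compare against the equivalent MATLAB one-liners (tic; exp(x); toc, and so on). A minimal sketch; the vector size matches the simulation and the repeat count of 100 is arbitrary:

#include <armadillo>
#include <iostream>

int main()
{
    const arma::uword R = 100000;
    arma::vec x = arma::randu<arma::vec>(R) + 0.5;   // positive values so log/sqrt are well defined
    arma::vec y(R);
    arma::wall_clock timer;

    timer.tic();
    for (int k = 0; k < 100; ++k) y = arma::exp(x);
    std::cout << "exp:  " << timer.toc() << " s\n";

    timer.tic();
    for (int k = 0; k < 100; ++k) y = arma::log(x);
    std::cout << "log:  " << timer.toc() << " s\n";

    timer.tic();
    for (int k = 0; k < 100; ++k) y = arma::sqrt(x);
    std::cout << "sqrt: " << timer.toc() << " s\n";

    return 0;
}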

Other tips

First, the bottleneck is not exp(), although std::exp is slow. The real problem is randn().
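
A quick way to confirm that is to time the random draw and the exp() separately in a loop shaped like the original; a rough sketch (the per-iteration timer calls add a little overhead of their own, so treat the split as approximate):

#include <armadillo>
#include <cmath>
#include <iostream>

int main()
{
    const arma::uword R = 100000, T = 1300;
    const double r = 0.03, Vol = 0.2, dt = 5.0 / T;
    arma::vec z(R), growth(R);
    arma::wall_clock timer;
    double t_randn = 0.0, t_exp = 0.0;

    for (arma::uword iT = 0; iT < T; ++iT)
    {
        timer.tic();
        z = arma::randn<arma::vec>(R);   // random draws
        t_randn += timer.toc();

        timer.tic();
        growth = arma::exp((r - 0.5 * Vol * Vol) * dt + Vol * std::sqrt(dt) * z);   // per-step factor
        t_exp += timer.toc();
    }

    std::cout << "randn total: " << t_randn << " s, exp total: " << t_exp << " s\n";
    return 0;
}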

On my machine, randn() takes most of the time. When I use MKL VSL's Gaussian generator instead, the runtime drops from 12 s to 4 s, comparable to MATLAB's 3 s or so.

#include <iostream>
#include <armadillo>
#include <ctime>
#include "mkl_vml.h"
#include "mkl_vsl.h"

using namespace std;
using namespace arma;

#define SEED   0
#define BRNG   VSL_BRNG_MCG31
#define METHOD 0

int main()
{
    clock_t start = clock();

    // MKL VSL random-number stream
    VSLStreamStatePtr stream;
    vslNewStream(&stream, BRNG, SEED);

    unsigned int R = 100000;
    vec Spre = 100 * ones<vec>(R);
    vec S = zeros<vec>(R);
    double r = 0.03;
    double Vol = 0.2;
    double TTM = 5;
    unsigned int T = 260 * TTM;
    double dt = TTM / T;
    double tmp = sqrt(dt);
    vec tmp2 = zeros<vec>(R);   // exponent of the per-step growth factor
    vec tmp3 = zeros<vec>(R);   // normal draws, then exp(tmp2)

    for (unsigned int iT = 0; iT < T; ++iT)
    {
        // Draw R standard normals with MKL VSL instead of arma::randn()
        vdRngGaussian(METHOD, stream, R, tmp3.memptr(), 0, 1);
        tmp2 = (r - 0.5 * Vol * Vol) * dt + Vol * tmp * tmp3;
        // Vectorised exp from MKL VML
        vdExp(R, tmp2.memptr(), tmp3.memptr());
        S = Spre % tmp3;
        Spre = S;
    }

    cout << mean(S) << endl;
    cout << (clock() - start) / (double) CLOCKS_PER_SEC << endl;

    vslDeleteStream(&stream);
    //system("pause");
    return 0;
}
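
If MKL is not available, a portable fallback is to fill the vector through C++11 <random> instead of arma::randn(). Whether this is actually faster depends on your standard library's normal_distribution, so benchmark it rather than assume a win; a minimal sketch of the drawing step only:

#include <armadillo>
#include <random>

int main()
{
    const arma::uword R = 100000;
    arma::vec z(R);

    std::mt19937_64 gen(12345);                     // fixed seed for reproducibility
    std::normal_distribution<double> nd(0.0, 1.0);  // standard normals

    for (arma::uword i = 0; i < R; ++i)
        z[i] = nd(gen);

    return 0;
}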

Just a guess, but it looks like you need to set the number of threads OpenBLAS uses via the OPENBLAS_NUM_THREADS environment variable.

Try something like:

set OPENBLAS_NUM_THREADS=4

...on the command line before you run your program. Substitute the number of cores in your system for "4" (some would say set it to twice the number of cores; YMMV).
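
Alternatively, the thread count can be set from code. OpenBLAS exports openblas_set_num_threads(); the extern "C" declaration below is only needed if you are not including OpenBLAS's own cblas.h, and this assumes the binary you link against actually exports the symbol:

extern "C" void openblas_set_num_threads(int num_threads);

int main()
{
    openblas_set_num_threads(4);   // match the number of cores, before the first BLAS call
    // ... rest of the program ...
    return 0;
}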

Make sure you have Streaming SIMD Extensions enabled when you compile your code. In Visual Studio, check your project's C/C++ code generation options.

License: CC-BY-SA with attribution. Not affiliated with Stack Overflow.