I suspect that the performance issue is in the bodies of your carbon()
, hydrogen()
, nitrogen()
, oxygen()
, and sulfur()
functions. You should show how they produce the random data.
Or it could be in the if (sum < threshold) {} else {}
code.
I wanted to keep setting the seed so the results would not be deterministic (closer to being truly random)
Since you're using the result of time(0)
as a seed you're not getting particularly random results either way.
Instead of using srand()
and rand()
you should take a look at the <random>
library and choose an engine with the performance/quality characteristics that meed your needs. If your implementation supports it you can even get non-deterministic random data from std::random_device
(either to generate seeds or as an engine).
Additionally <random>
provides pre-made distributions such as std::uniform_real_distribution<double>
which is likely to be better than the average programmer's method of manually computing the distribution you want from the results of rand()
.
Okay, here's how you can eliminate the inner loops from your code and drastically speed it up (In Java or C++).
Your code:
double carbon() {
if (rand() % 10000 < 107)
return 13.0033548378;
else
return 12.0;
}
picks one of two values with a particular probability. Presumably you intended the first value to be picked about 107 times out of 10000 (although using %
with rand()
doesn't quite give you that). When you run this in a loop and sum the results as in:
for (int i = 0; i < composition[0]; i++) sum += carbon();
you'll essentially get sum += X*13.0033548378 + Y*12.0;
where X is the number of times the random number stays under the threshold and Y is (trials-X). It just so happens that you can simulate running a bunch of trials and calculating the number of successes using a binomial distribution, and <random>
happens to provide a binomial distribution.
Given a function sum_trials()
std::minstd_rand0 eng; // global random engine
double sum_trials(int trials, double probability, double A, double B) {
std::binomial_distribution<> dist(trials, probability);
int successes = dist(eng);
return successes*A + (trials-successes)*B;
}
You can replace your carbon()
loop:
sum += sum_trials(composition[0], 107.0/10000.0, 13.003354378, 12.0); // carbon trials
I don't have the actual values you're using, but your whole loop will look something like:
for (int i = 0; i < 100000000; i++) {
double sum = 0;
sum += sum_trials(composition[0], 107.0/10000.0, 13.003354378, 12.0); // carbon trials
sum += sum_trials(composition[1], 107.0/10000.0, 13.003354378, 12.0); // hydrogen trials
sum += sum_trials(composition[2], 107.0/10000.0, 13.003354378, 12.0); // nitrogen trials
sum += sum_trials(composition[3], 107.0/10000.0, 13.003354378, 12.0); // oxygen trials
sum += sum_trials(composition[4], 107.0/10000.0, 13.003354378, 12.0); // sulfur trials
if (sum > threshold) {
} else {
}
}
Now one thing to note is that inside the function we're constructing distributions over and over with the same data. We can extract that by replacing the function sum_trials()
with a function object, which we construct with the appropriate data once before the loop, and then just use the functor repeatedly:
struct sum_trials {
std::binomial_distribution<> dist;
double A; double B; int trials;
sum_trials(int t, double p, double a, double b) : dist{t, p}, A{a}, B{b}, trials{t} {}
double operator() () {
int successes = dist(eng);
return successes * A + (trials - successes) * B;
}
};
int main() {
int threshold = 5;
int composition[5] = { 10, 10, 10, 10, 10 };
sum_trials carbon = { composition[0], 107.0/10000.0, 13.003354378, 12.0};
sum_trials hydrogen = { composition[1], 107.0/10000.0, 13.003354378, 12.0};
sum_trials nitrogen = { composition[2], 107.0/10000.0, 13.003354378, 12.0};
sum_trials oxygen = { composition[3], 107.0/10000.0, 13.003354378, 12.0};
sum_trials sulfur = { composition[4], 107.0/10000.0, 13.003354378, 12.0};
for (int i = 0; i < 100000000; i++) {
double sum = 0;
sum += carbon();
sum += hydrogen();
sum += nitrogen();
sum += oxygen();
sum += sulfur();
if (sum > threshold) {
} else {
}
}
}
The original version of the code took my system about one minute 30 seconds. The last version here takes 11 seconds.
Here's a functor to generate the oxygen sums using two binomial_distributions. Maybe one of the other distributions can do this in one shot but I don't know.
struct sum_trials2 {
std::binomial_distribution<> d1;
std::binomial_distribution<> d2;
double A; double B; double C;
int trials;
double probabilty2;
sum_trials2(int t, double p1, double p2, double a, double b, double c)
: d1{t, p1}, A{a}, B{b}, C{c}, trials{t}, probability2{p2} {}
double operator() () {
int X = d1(eng);
d2.param(std::binomial_distribution<>{trials-X, p2}.param());
int Y = d2(eng);
return X*A + Y*B + (trials-X-Y)*C;
}
};
sum_trials2 oxygen{composition[3], 17.0/1000.0, (47.0-17.0)/(1000.0-17.0), 17.9999, 16.999, 15.999};
You can further speed this up if you can just calculate the probability that the sum is under your threshold
:
int main() {
std::minstd_rand0 eng;
std::bernoulli_distribution dist(probability_sum_is_over_threshold);
for (int i=0; i< 100000000; ++i) {
if (dist(eng)) {
} else {
}
}
}
Unless the values for the other elements can be negative then the probability that the sum is greater than five is 100%. In that case you don't even need to generate random data; execute the 'if' branch of your code 100,000,000 times.
int main() {
for (int i=0; i< 100000000; ++i) {
//execute some code
}
}