You need to make sure your parallel processes do not share any state.
For example, in the case of the factorial, I would do the following:
- set a degree of parallelism (DOP) - usually the number of processors on your machine for compute-bound operations
- divide the problem into independent subsets
- process each subset in parallel
- aggregate the results obtained from the parallel processes
This is somehow a simplified Map-Reduce.
The problem consists in multiplying together a set numbers. One way to divide this set into subsets is to use N
parallel for loops where each start at a value i
(where 0 < i <= N
) with a step of N
(and N
= DOP
).
Here's the code that does that:
/// <summary>
/// The max number of parallel tasks
/// </summary>
static readonly int DegreeOfParallelism = Environment.ProcessorCount;
public BigInteger Factorial(long x)
{
// Make as many parallel tasks as our DOP
// And make them operate on separate subsets of data
var parallelTasks =
Enumerable.Range(1, DegreeOfParallelism)
.Select(i => Task.Factory.StartNew(() => Multiply(x, i),
TaskCreationOptions.LongRunning))
.ToArray();
// after all tasks are done...
Task.WaitAll(parallelTasks);
// ... take the partial results and multiply them together
BigInteger finalResult = 1;
foreach (var partialResult in parallelTasks.Select(t => t.Result))
{
finalResult *= partialResult;
}
return finalResult;
}
/// <summary>
/// Multiplies all the integers up to upperBound, with a step equal to DOP
/// starting from a different int
/// </summary>
/// <param name="upperBoud"></param>
/// <param name="startFrom"></param>
/// <returns></returns>
public BigInteger Multiply(long upperBound, int startFrom)
{
BigInteger result = 1;
for (var i = startFrom; i <= upperBound; i += DegreeOfParallelism)
result *= i;
return result;
}
On my machine this computes 100000!
in about 30 seconds and the result is what Wolfram Alpha says it should be.
Update
After running a few more tests, I discovered something which I didn't expect: printing out the 100000!
result to the console takes ~18 seconds (the result has 456574
digits).
The results of the 100000!
computation alone (without printing the number) the are:
- ~10 seconds for parallel execution
- ~16 seconds for sequential execution