You are not using the Fork-Join pool the way it was intended. You need to split the work into small sections, then compute() each section and combine the results.
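compute() needs code such as:
if (work <= threshold) {
    do the work directly and return the result
} else {
    left = first half of the work
    right = second half of the work
    left.fork()
    rightResult = right.compute()
    leftResult = left.join()
    return combine(leftResult, rightResult)
}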
Without splitting the work, you are not going to use multiple threads. Guarding the work with semaphores serializes it, so only one thread makes progress at a time and you will never see a speedup.
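As a rough illustration of the split/fork/compute/join pattern, here is a minimal sketch that sums a long[] with a RecursiveTask. The class name SumTask, the helper sum(), and the THRESHOLD value of 1,000 are assumptions made up for this example, not anything from your code; tune the threshold for your own workload.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums data[from, to) by recursive splitting.
class SumTask extends RecursiveTask<Long> {
    // Assumed cutoff below which the work is done sequentially.
    private static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        // Base case: the section is small enough, so just do the work.
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Recursive case: split in half.
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                      // schedule left half asynchronously
        long rightSum = right.compute();  // work on right half in this thread
        long leftSum = left.join();       // wait for the left half's result
        return leftSum + rightSum;        // combine
    }

    // Hypothetical convenience entry point using the common pool.
    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }
}
```

Calling SumTask.sum(array) submits the root task and blocks until the combined result is ready. Note there are no locks or semaphores: each subtask works on its own range and the results are merged through join().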
Look at the examples in the API documentation for RecursiveTask and RecursiveAction to see how this framework is meant to be used.