Question

On a parallel machine with $n$ processors we can compute the sum (or product, or the result of any associative operation) on $n$ numbers in $\log n$ steps. In the first step combine neighbors to get $n/2$ numbers left, then combine them so that $n/4$ numbers are left and so on until a single number, the result, is left which happens after $\lceil \log n \rceil$ steps. With this is it obvious that we can compute the matrix product of two $n \times n$ matrices $A = (A_{ij}), B = (B_{ij})$ with $n^3$ processors in $\log n + 1$ steps, in a first step compute all the $n^3$ products of the entries $A_{ik} \cdot B_{kj}$ for $i,j,k \in \{1,\ldots, n\}$. Then with $n^2$ processors compute the $n^2$ sums of the $n$ numbers $A_{i1}\cdot B_{1j} + \ldots + A_{ik}\cdot B_{kj}$. Hence we get by with $1 + \log n$ parallel steps.

Now the following argument is from C. Papadimitriou, Computational Complexity, page 361, to show that we get time $2\log n$ using just $n^2 / \log n$ processors (the argument used is called Brent's principle according to the book).

We compute the $n^3$ products not in a single step as before, but rather in $\log n$ ''shifts'' using $\lceil n^3 / \log \rceil$ processors at each shift. We use shifts of the same $\lceil n^3 / \log n \rceil$ processors to compute the first $\log \log n$ parallel addition steps, where more than $\frac{n^3}{\log n}$ processors would be ordinary needed. The total number of parallel steps is now no more than $2\log n$, with $\frac{n^3}{\log n}$ processors [...].

(those arguments could also be found on these slides).

But I get more parallel steps as $2\log n$.

1) First we compute the $n^3$ products $A_{ik}\cdot B_{kj}$ with $\log n$ shifts, hence we need $\log n$ parallel steps for this.

2) If we combine the results as written in the introduction, then in the $k$-th step we have $n/2^k$ numbers to combine. Hence if $2^k < \log n$ we do not have enough processors to do this in a single step, hence we have to ''shift'' again, this time $\lceil \log n / 2^k \rceil$ times. If $2^k \ge \log n$ we can use a single parallel step as we have enough processors at hand.

Combining the above reasoning we get the following equation (I omit the rounding) for the number of parallel steps: $$ \log n + \sum_{k=1}^{\log \log n} \frac{\log n}{2^k} + \log n - \log \log n. $$ The first summand commes from the shift over the $n^3$ numbers, the second summand for the shifts needed if in the combination/summation of the product we still have to many numbers for our processors, and the last for the number of steps if we do have enough processors. This equals \begin{align*} & \log n + \log n \left( 1 - \frac{1}{2^{\log \log n}} \right) + \log n - \log \log n \\ & = \log n + \log n - 1 + \log n - \log \log n \\ & = 3 \log n - \log \log n - 1 \end{align*} The last is in general more than $2\log n$, so what am I missing. Why does they argue they get by with $2\log n$ steps. Do I miss anything here?

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top