The easiest way is to write a cilk_for loop over the output coefficients and, inside the loop, accumulate an inner product for each output coefficient.
Call the output coefficient c[k]. The loop will look like:
cilk_for( int k=0; k<2*n-1; ++k )
    c[k] = __sec_reduce_add( a[...:...]*b[...:...:-1] );
The ... need to be expressions that yield the subsections that contribute to each output coefficient. I have an intermittent Internet connection, so I'm leaving that as an exercise to the reader.
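For anyone who wants to check their answer to the exercise, here is a plain serial C++ sketch of the same scheme, not the book's code. It assumes both inputs have n coefficients, and the names poly_mul_simple, lo and hi are made up for illustration. For output k, the contributing terms are a[i]*b[k-i] with i running from lo = max(0, k-(n-1)) to hi = min(k, n-1).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiplies two polynomials that each have n coefficients.
// Assumes a.size() == b.size(); returns 2n-1 output coefficients.
std::vector<double> poly_mul_simple(const std::vector<double>& a,
                                    const std::vector<double>& b) {
    const std::size_t n = a.size();
    if (n == 0) return {};
    std::vector<double> c(2 * n - 1, 0.0);
    // Each k is independent of the others, so this outer loop is the one
    // that would become a cilk_for.
    for (std::size_t k = 0; k < 2 * n - 1; ++k) {
        const std::size_t lo = (k < n) ? 0 : k - (n - 1);  // first valid index into a
        const std::size_t hi = std::min(k, n - 1);         // last valid index into a
        double sum = 0.0;
        // Inner product of a[lo..hi] with b[k-lo..k-hi], the latter walked backwards.
        for (std::size_t i = lo; i <= hi; ++i)
            sum += a[i] * b[k - i];
        c[k] = sum;
    }
    return c;
}
```

If I have the array notation right, the inner loop corresponds to sections of length hi-lo+1, starting at lo in a and at k-lo in b with stride -1, but I have not compiled that form, so treat it as unverified.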
The download site for the book (http://parallelbook.com/downloads) has a recursive version that is asymptotically faster than the scheme above.