You shouldn't actually need to lock the matrix (unless it's in a file). If it's in a file, I would just load the matrix into memory first, and then you won't need a lock. Look at it this way:
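If you have an n x m matrix (n rows, m columns), put it in memory that is shared across fork() (for example an mmap region with MAP_SHARED), then have your parent process fork off n child processes, one per row, and wait for them. Without shared memory each child would only modify its own private copy of the matrix.
Give each child process exactly one of the n rows.
Have each child add up its row, set the row's values to 0, and put the sum in the row's last column.
End each child process.
When all the children are done, have your parent process sum up the last column to get the total.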
Since each child process works only on its own row, none of them needs to lock any part of the matrix: no two processes ever access the same region of memory.
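For what it's worth, here is a rough sketch in C of the steps above, with one child per row. It is only an illustration under a few assumptions of mine: the function name parallel_matrix_sum is made up, the elements are longs, and the system is POSIX-like with MAP_ANONYMOUS available (Linux, the BSDs, macOS). The matrix is copied into a MAP_SHARED mapping because memory from malloc or the stack becomes private to each process after fork(), so the parent would never see the children's row sums.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from any library): sums a rows x cols matrix of
 * longs using one child process per row and no locks.
 * Returns 0 on success and stores the result in *total, or -1 on failure. */
int parallel_matrix_sum(const long *input, size_t rows, size_t cols, long *total)
{
    assert(input != NULL && total != NULL && rows > 0 && cols > 0);

    size_t bytes = rows * cols * sizeof(long);

    /* MAP_SHARED makes the children's writes visible to the parent. */
    long *m = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    memcpy(m, input, bytes);

    int status = 0;
    size_t forked = 0;
    for (size_t r = 0; r < rows; r++) {
        pid_t pid = fork();
        if (pid < 0) {          /* fork failed: stop creating children */
            status = -1;
            break;
        }
        if (pid == 0) {
            /* Child: touches only row r, so no lock is needed. */
            long *row = m + r * cols;
            long sum = 0;
            for (size_t c = 0; c < cols; c++) {
                sum += row[c];
                row[c] = 0;
            }
            row[cols - 1] = sum; /* row sum goes in the last column */
            _exit(0);
        }
        forked++;
    }

    /* Parent: wait for every child that was actually started. */
    for (size_t i = 0; i < forked; i++) {
        int wstatus;
        if (wait(&wstatus) < 0 || !WIFEXITED(wstatus) || WEXITSTATUS(wstatus) != 0)
            status = -1;
    }

    if (status == 0) {
        /* Add up the last column, which now holds the row sums. */
        long sum = 0;
        for (size_t r = 0; r < rows; r++)
            sum += m[r * cols + (cols - 1)];
        *total = sum;
    }

    munmap(m, bytes);
    return status;
}
```

Calling it on the 2 x 3 matrix {1, 2, 3, 4, 5, 6} should store 21 in total. A process per row is heavy for a big matrix, so in practice you would probably hand each child a block of rows instead, but the no-lock argument stays the same.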