You may consider implementing lock striping in order to reduce the number of locks and still maintain performance.
Basically, an internal array holding mylockscount = min(concurrencylevel, size)
locks is created.
Whenever an read/write operation occurs, you lock on e.g. mylocks[somehashfunction(x, y) % mylockscount]
.
This way, the number of locks should only scale with the number of concurrent threads and not the size of the matrix.