If I were using 2D notation, I would simply use U[l][k] and so on.
So, add that layer of abstraction - don't let everything else get complicated. You have:
A = (float *) malloc( M * N * sizeof(float) );
At a minimum, you can use:
float& at(float* p, int rows, int col, int row) { return p[rows * col + row]; }
(reorder arguments to taste)
Then you can say:
at(A, M, col, row)
(Or similar - I wouldn't swear I got all the row/column names right - but IMHO you should have used Rows and Columns instead of M and N, so I'm not going to bust a gut over it.)
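To make that concrete, here is a minimal sketch of the helper in use. It assumes column-major storage (which is what rows * col + row gives you); demo_at and the fill pattern are made up purely for illustration, and the assert is my addition, not something the helper requires.

```cpp
#include <cassert>   // assert
#include <cstdlib>   // std::malloc, std::free

// Accessor from above: element (row, col) of a column-major matrix
// with `rows` rows, stored in one flat buffer.
inline float& at(float* p, int rows, int col, int row) {
    return p[rows * col + row];
}

// Hypothetical demo: fill an M x N matrix so that entry (row, col)
// holds 10 * row + col, then read a single entry back.
float demo_at(int M, int N, int col, int row) {
    assert(0 <= row && row < M && 0 <= col && col < N);
    float* A = static_cast<float*>(std::malloc(sizeof(float) * M * N));
    if (A == nullptr) return -1.0f;  // allocation failed
    for (int c = 0; c < N; ++c)
        for (int r = 0; r < M; ++r)
            at(A, M, c, r) = 10.0f * r + c;  // reads like 2D indexing
    float value = at(A, M, col, row);
    std::free(A);
    return value;
}
```

The point is that the index arithmetic lives in exactly one place; if you later decide you want row-major storage, you change at() and nothing else.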
If you want to get a little fancier, in C++ you can wrap the allocation in a class that stores the pointer and the row and column counts, and overloads
float& operator()(int col, int row)
and
const float& operator()(int col, int row) const
(or just
float operator()(int col, int row) const
if you don't care about being able to take the address of an array entry).
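A rough sketch of such a class might look like the following. The name Matrix2D and the rows()/cols() getters are my own inventions, and I've swapped the raw malloc'd pointer for a std::vector so that cleanup and copying come for free; treat it as one possible shape, not the only way to do it. The bounds assert is optional and compiles out under NDEBUG.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <vector>    // std::vector

// Hypothetical wrapper: owns the storage, remembers the dimensions,
// and hides the column-major index arithmetic behind operator().
class Matrix2D {
public:
    Matrix2D(int rows, int cols)
        : rows_(rows), cols_(cols),
          data_(static_cast<std::size_t>(rows) * cols, 0.0f) {}

    // Mutable access: m(col, row) = value;
    float& operator()(int col, int row) { return data_[index(col, row)]; }

    // Read-only access through a const object or const reference.
    const float& operator()(int col, int row) const {
        return data_[index(col, row)];
    }

    int rows() const { return rows_; }
    int cols() const { return cols_; }

private:
    // Same formula as the free function: rows * col + row.
    std::size_t index(int col, int row) const {
        assert(col >= 0 && col < cols_ && row >= 0 && row < rows_);
        return static_cast<std::size_t>(rows_) * col + row;
    }

    int rows_;
    int cols_;
    std::vector<float> data_;
};
```

With that in place, Matrix2D U(M, N); U(k, l) = 1.0f; is about as close to the 2D notation as you can get while keeping a single flat buffer underneath.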