Improving a badly conditioned matrix

Question 1

In discrete inverse theory, adding a small value c to the diagonal of the matrix A about to be inverted, is called damping the inversion and the small value to be added c is called Marquardt-Levenberg coefficient. Sometimes matrix A has zero or close to zero eigenvalues, due to which the matrix becomes singular; adding a small damping coefficient to the diagonal elements makes it stable. Bigger is the value of c, bigger is the damping, your matrix inversion is more stable, but you are further away from true solution. Smaller is the value of c, smaller is the damping, closer is your inverted matrix to true inverted matrix but it might become unstable. One 'adaptive damping' technique sometimes used is - start with a test value of c, invert the matrix A, then decrease the value of c, do the inversion again and so on. stop when you get weird values in inverted matrix due to A becoming singular again, like really large numbers. I think this doesn't exactly answer your question, but it was too long to put it in a comment.

Question 2

As already pointed out in the comments, the answer to your question crucially depends on your application. Maybe adding a small multiple of the identity matrix is the right thing to do, maybe not. In order to determine that you need to tell us: How did this matrix arise? And what do you need the inverse for?

Two common cases are:

If you know the matrix A exactly, e.g. because it is the design matrix in a general linear model b = A * X, then modifying it is not a good idea. In this case, the matrix defines a linear system of equations, and if the matrix is singular this means there is no unique solution to this system. To pick one from the infinitely many possible solutions, there are different strategies: X = A \ b picks a solution with as many zero coefficients as possible, while X = pinv(A) * b picks the solution with the minimum L2 norm. See the examples in the documentation of pinv.
If the matrix A is estimated from data, e.g. a covariance matrix for an LDA classifier, and you have reason to believe the true value is not singular and the singularity is just due to not having enough data points for the estimate, then applying regularization or "shrinkage" by adding a small multiple of the identity matrix is a common strategy. In this case, Schäfer and Strimmer (2005) describe a way to estimate the optimal regularization coefficient from the data itself.

But I'm sure there are other cases with other answers, too.

Question 3

Adding small values to the diagonal of A is roughly equivalent to introducing an L2-norm regularization term in the least squares problem Ax=b. That is one seeks to minimize the residual as well the added constraint:

min ||Ax-b||^2 + lambda*||x||^2

where lamdba controls the weight given to minimizing the constraint vs. minimizing the residual norm.

Usually this parameter is chosen using some kind of cross-validation technique.