Your instincts are almost correct.
Generally, the metric is the sum of squared distances, and the goal is to find the least-squares fit: the transformation that minimizes the sum of all the individual squared distances. Essentially this minimizes the root-mean-square (RMS) distance. Strictly, it minimizes the mean squared distance, which behaves like a variance, but because the square root is monotonic the end effect is the same.
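Spelled out for the rigid case, here is a sketch of the objective. The rotation R, translation t and corresponding pairs p_i ↔ q_i (i = 1…N) are notation introduced here for illustration. The least-squares fit chooses R and t to minimize E:

$$
E(R,\mathbf{t}) = \sum_{i=1}^{N} \left\lVert R\,\mathbf{p}_i + \mathbf{t} - \mathbf{q}_i \right\rVert^2,
\qquad
\mathrm{MSD} = \frac{E}{N},
\qquad
\mathrm{RMSD} = \sqrt{\mathrm{MSD}}
$$

Dividing by N only matters when comparing fits that use different numbers of pairs. It does not change which R and t minimize E.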
So take all your corresponding pairs and calculate the squared distance for each pair. This is a fast calculation: no sqrt is involved, so it is cheaper than computing actual distances. Add them up; the lower the total, the better. If your point sets differ in count, you may wish to divide by the number of pairs to get a proper variance value (the mean squared distance) that can be compared across sets.
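A minimal pure-Python sketch of that metric follows. It assumes the pairs come as two equal-length lists of coordinate tuples where `src[i]` corresponds to `dst[i]`. The function names are illustrative, not from any library.

```python
import math
from typing import Sequence

Point = Sequence[float]


def sum_squared_distances(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Sum of squared Euclidean distances over corresponding pairs src[i] <-> dst[i]."""
    if len(src) != len(dst):
        raise ValueError("src and dst must contain the same number of corresponding points")
    total = 0.0
    for p, q in zip(src, dst):
        # Squared distance: sum of squared coordinate differences, no sqrt needed.
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return total


def mean_squared_distance(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Normalize by the number of pairs so fits with different pair counts are comparable."""
    if not src:
        raise ValueError("need at least one pair")
    return sum_squared_distances(src, dst) / len(src)


def rms_distance(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Optional: a single sqrt at the very end gives an error in the original distance units."""
    return math.sqrt(mean_squared_distance(src, dst))


# Usage: three pairs whose squared distances are 1, 1 and 4.
src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
dst = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]
print(sum_squared_distances(src, dst))  # prints 6.0
print(mean_squared_distance(src, dst))  # prints 2.0
```

The sqrt is taken at most once, at the very end, and only if you want a number in the original units.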
This metric applies to pretty much any registration algorithm.
By the way, if you already have point correspondences and you know there is no scaling or skewing, you might also be interested in Horn's method. It is a closed-form (non-iterative) algorithm that returns the least-squares fit directly, and it is very efficient.
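For the closed-form route, here is a minimal NumPy sketch of Horn's unit-quaternion method. It assumes NumPy is installed, the 3D correspondences `P[i]` ↔ `Q[i]` are known, and the motion is purely rigid (rotation plus translation, no scale). The function name `horn_absolute_orientation` and the demo data are illustrative only. The sketch works in four steps:

* It centers both sets on their centroids.
* It builds Horn's symmetric 4×4 matrix from the 3×3 cross-covariance.
* It takes the eigenvector of the largest eigenvalue as the rotation quaternion.
* It recovers the translation from the centroids.

```python
import numpy as np


def horn_absolute_orientation(P, Q):
    """Closed-form rigid fit (R, t) minimizing sum_i ||R @ P[i] + t - Q[i]||^2.

    Sketch of Horn's unit-quaternion method. P and Q are (N, 3) arrays of
    corresponding points (N >= 3, not all collinear).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be (N, 3) arrays of corresponding points")

    # 1. Remove the centroids; the optimal translation maps one centroid onto the other.
    mu_p = P.mean(axis=0)
    mu_q = Q.mean(axis=0)
    Pc = P - mu_p
    Qc = Q - mu_q

    # 2. Cross-covariance: S[a, b] = sum_i Pc[i, a] * Qc[i, b]
    S = Pc.T @ Qc
    (Sxx, Sxy, Sxz), (Syx, Syy, Syz), (Szx, Szy, Szz) = S

    # 3. Horn's symmetric 4x4 matrix; the optimal unit quaternion is the
    #    eigenvector belonging to its largest eigenvalue.
    N = np.array([
        [Sxx + Syy + Szz, Syz - Szy,        Szx - Sxz,        Sxy - Syx],
        [Syz - Szy,       Sxx - Syy - Szz,  Sxy + Syx,        Szx + Sxz],
        [Szx - Sxz,       Sxy + Syx,       -Sxx + Syy - Szz,  Syz + Szy],
        [Sxy - Syx,       Szx + Sxz,        Syz + Szy,       -Sxx - Syy + Szz],
    ])
    _, eigvecs = np.linalg.eigh(N)  # eigenvalues come back in ascending order
    w, x, y, z = eigvecs[:, -1]     # unit quaternion (w, x, y, z)

    # 4. Convert the unit quaternion to a rotation matrix, then solve for t.
    R = np.array([
        [w*w + x*x - y*y - z*z, 2*(x*y - w*z),         2*(x*z + w*y)],
        [2*(y*x + w*z),         w*w - x*x + y*y - z*z, 2*(y*z - w*x)],
        [2*(z*x - w*y),         2*(z*y + w*x),         w*w - x*x - y*y + z*z],
    ])
    t = mu_q - R @ mu_p
    return R, t


# Demo: move random points by a known rigid motion, then recover it.
rng = np.random.default_rng(0)
P_demo = rng.normal(size=(10, 3))
cz, sz = np.cos(0.7), np.sin(0.7)
cx, sx = np.cos(0.3), np.sin(0.3)
R_true = (np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
          @ np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]))
t_true = np.array([1.0, -2.0, 0.5])
Q_demo = P_demo @ R_true.T + t_true  # Q[i] = R_true @ P[i] + t_true

R_est, t_est = horn_absolute_orientation(P_demo, Q_demo)
# Sum of squared distances after alignment (the metric discussed above).
residual = np.sum((P_demo @ R_est.T + t_est - Q_demo) ** 2)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true), residual)
```

An SVD of the same 3×3 cross-covariance (the Kabsch/Umeyama approach, with a determinant check to rule out reflections) yields the same least-squares rotation. The quaternion form is shown here only because it is the method named above.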
(P.S. For a very simple explanation of why the variance is a better indicator than the mean distance, check out this page).