How to efficiently shrink an array to a target standard deviation

Question 1

Instead of removing values from the array and recomputing the standard deviation each time, start from an empty array, add values to it one at a time, and maintain the standard deviation with an updating formula.

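The trick is to add the values in an order that keeps the standard deviation growing as values are added. This can be done by adding first the values closest to the mean of the original array, that is, the values with the smallest absolute deviation from the mean. (This is a heuristic: the standard deviation usually grows under this ordering, but it is not strictly guaranteed to be non-decreasing, because a new value that lands near the running mean can lower it slightly.)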
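A minimal Python sketch of that ordering, using the standard library's `statistics.fmean`. The function name `order_by_deviation` is illustrative only. Ties keep their original order because `sorted` is stable.

```python
from statistics import fmean


def order_by_deviation(values):
    """Return the values sorted by absolute deviation from their mean, closest first."""
    m = fmean(values)
    return sorted(values, key=lambda v: abs(v - m))


print(order_by_deviation([10, 14, 16, 18, 20, 22, 24]))  # -> [18, 16, 20, 14, 22, 24, 10]
```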

So, an algorithm that should work:

1. Compute the mean of the array
2. Sort the array by absolute deviation from the mean, smallest first
3. Set N = 2
4. Compute the standard deviation of the first two elements
5. While standard deviation <= target standard deviation AND the array still has elements
5.1. Set N = N + 1
5.2. Take the next item from the array
5.3. Recompute the standard deviation with the additional value using an updating formula
6. If standard deviation > target standard deviation, return N - 1 (the last value added overshot the target); otherwise return N
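The steps above can be sketched in Python as follows. The answer does not name a specific updating formula, so this sketch assumes Welford's running update of the mean and sum of squared deviations, and the sample standard deviation (n - 1 denominator), which is the convention the worked example uses. `largest_prefix_within` is a hypothetical name, and at least two input values are assumed.

```python
import math
from statistics import fmean


def largest_prefix_within(values, target):
    """Count how many values, taken closest-to-the-mean first, keep the
    sample standard deviation at or below `target` (assumes len(values) >= 2)."""
    m0 = fmean(values)
    ordered = sorted(values, key=lambda v: abs(v - m0))

    n, mean, ss = 0, 0.0, 0.0          # count, running mean, sum of squared deviations
    for x in ordered:
        n += 1
        delta = x - mean
        mean += delta / n              # Welford: update the running mean
        ss += delta * (x - mean)       # Welford: update the sum of squares
        if n >= 2 and math.sqrt(ss / (n - 1)) > target:
            return n - 1               # the value just added overshot the target
    return n                           # every value fits within the target


print(largest_prefix_within([10, 14, 16, 18, 20, 22, 24], 2.5))  # -> 3
```

Folding N = 2 into the loop (skipping the check while n < 2) gives the same result as starting from the first two elements, and keeps the update in one place.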

Here is an example case to demonstrate the algorithm:

Source Array: [10, 14, 16, 18, 20, 22, 24]
Target SD: 2.5

Step 1) Mean = 124 / 7 = 17.714 (rounded)
Step 2) [18, 16, 20, 14, 22, 24, 10]
Step 4) SD([18,16]) = 1.4142
Step 5.3) SD([18,16,20]) = 2
Step 5.3) SD([18,16,20,14]) = 2.5820, which exceeds 2.5, so the last value added is dropped

Return 3
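The numbers in the example can be reproduced with the standard library's `statistics.stdev`, which computes the sample standard deviation:

```python
from statistics import fmean, stdev

source = [10, 14, 16, 18, 20, 22, 24]
mean = fmean(source)                                   # 124 / 7
ordered = sorted(source, key=lambda v: abs(v - mean))  # closest to the mean first

print(f"mean = {mean:.3f}")   # -> mean = 17.714
print(ordered)                # -> [18, 16, 20, 14, 22, 24, 10]
for k in range(2, 5):
    # prefix standard deviations: 1.4142, 2.0000, 2.5820
    print(ordered[:k], f"{stdev(ordered[:k]):.4f}")
```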

Question 2

If you have a collection of n values with mean m and (sample) standard deviation s, and you add another element x, you can compute the new mean and standard deviation directly (a prime ' marks the "new" value):

m' = (n * m + x) / (n + 1) = m + (x-m) / (n+1)
s'^2 = (n-1)/n * s^2 + (x-m)^2 / (n+1) = (n-1)/n * s^2 + (n+1) * (m'-m)^2
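A small Python sketch of the add-one update for the sample variance (the name `add_value` and its tuple interface are illustrative). Starting from [16, 18] (n = 2, m = 17, s^2 = 2) and adding 14 reproduces the earlier example's SD of 2:

```python
def add_value(n, m, s2, x):
    """Return (count, mean, sample variance) after appending x to n values.

    m'   = m + (x - m) / (n + 1)
    s'^2 = (n - 1) / n * s^2 + (x - m)^2 / (n + 1)
    Requires n >= 1; for n == 1 the old variance is ignored (its weight is 0).
    """
    new_m = m + (x - m) / (n + 1)
    new_s2 = (n - 1) / n * s2 + (x - m) ** 2 / (n + 1)
    return n + 1, new_m, new_s2


print(add_value(2, 17.0, 2.0, 14))  # -> (3, 16.0, 4.0)
```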

Going the other way, you can find the new mean and variance when you remove a value as well.

Using that, you can compute the new standard deviation from the old one with a simple expression; you don't have to loop over all the values again and again.

All you have to do is:

1. Sort the data
2. Compute the mean and standard deviation
3. Find the "leftmost" and "rightmost" values
4. Whichever of these is further from the current mean is the one to remove
5. After removing it, compute the new mean and standard deviation
6. Check whether you are within the limit
7. If not, take a new "leftmost" or "rightmost" value (only the end you removed from changes; the other end stays the same)
8. Repeat from step 4 until you are done

For step 5, the formula for the mean when you remove an element is almost the same. If you had n samples and remove one (x), the new mean is

m' = (n * m - x) / (n - 1) = m + (m - x)/(n - 1)

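and, writing M for the current sum of squared deviations, M = sum((x_i - m)^2) over all n values (so that s^2 = M/(n-1)), we find

M' = M - (x-m)*(x-m')
s' = sqrt( M' / (n-2))

where the divisor n - 2 is the new count n - 1 minus one, as the sample standard deviation requires.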
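A short Python sketch of the two removal formulas (the name `remove_value` is illustrative). Removing 20 from [14, 16, 18, 20] (n = 4, m = 17, M = 20) should leave [14, 16, 18], whose mean is 16 and whose sample SD is 2:

```python
import math


def remove_value(n, m, M, x):
    """Return (count, mean, sum of squared deviations) after removing x.

    m' = m + (m - x) / (n - 1)
    M' = M - (x - m) * (x - m')
    Requires n >= 2; the new sample SD sqrt(M' / (n - 2)) needs n >= 3.
    """
    new_m = m + (m - x) / (n - 1)
    new_M = M - (x - m) * (x - new_m)
    return n - 1, new_m, new_M


n, m, M = remove_value(4, 17.0, 20.0, 20)
print(n, m, M, math.sqrt(M / (n - 1)))  # -> 3 16.0 8.0 2.0
```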

I think I got this right... you might want to double-check the signs and the +1, -2, etc. though.
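Putting the steps above together, a Python sketch of the trimming loop. `trim_to_target` is a hypothetical name; it assumes at least two values and the sample standard deviation, and it stops at two remaining values because sqrt(M'/(n-2)) is undefined below that. When both ends are equally far from the mean it removes the left end, though floating-point rounding of the mean can flip such ties.

```python
import math


def trim_to_target(values, target):
    """Repeatedly drop whichever end of the sorted data is further from the
    running mean until the sample SD is at most `target` (or 2 values remain).
    Returns the kept values in sorted order."""
    data = sorted(values)                        # step 1
    n = len(data)
    m = sum(data) / n                            # step 2: mean
    M = sum((v - m) ** 2 for v in data)          # step 2: sum of squared deviations
    lo, hi = 0, n - 1                            # step 3: leftmost / rightmost

    # step 6: check the limit (max() guards against tiny negative round-off in M)
    while n > 2 and math.sqrt(max(M, 0.0) / (n - 1)) > target:
        # step 4: remove whichever end is further from the current mean
        if m - data[lo] >= data[hi] - m:
            x = data[lo]
            lo += 1                              # step 7: new leftmost
        else:
            x = data[hi]
            hi -= 1                              # step 7: new rightmost
        # step 5: downdate the mean and the sum of squares
        new_m = m + (m - x) / (n - 1)
        M -= (x - m) * (x - new_m)
        m, n = new_m, n - 1
    return data[lo:hi + 1]


print(trim_to_target([1, 2, 3, 100], 5))  # -> [1, 2, 3]
```

Only two indices move, so after the initial sort each removal costs O(1).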