How to compute a threshold for a given vector/array of float numbers

Question 1

You could just eliminate the values above the 90% or 95% mark. Technically, you calculate the 90th (or 95th) percentile of the array's distribution, i.e. p = 0.9 (or 0.95).

Just sort the array ascending:

int[] data; // your input values, assumed to be filled in elsewhere

Arrays.sort(data); // needs java.util.Arrays; or use ArrayList<Integer> with Collections.sort(dataArrayList)

Then calculate the position of percentile p:

double p = 0.9; // e.g. p = 0.9 for the 90th percentile
// Cut off the fractional part; clamp so that p = 1.0 stays inside the array.
int posInt = Math.min((int) (data.length * p), data.length - 1);

// this is the threshold value
int threshold = data[posInt];

Now filter the array by keeping all values < or <= the threshold. This keeps roughly the smallest 90% of the values.

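int i = 0;
// data is sorted, so we can stop at the first value above the threshold
while (i < data.length && data[i] <= threshold) {
  // output data[i];
  i++; // advance, otherwise the loop never terminates
}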
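The steps above can be put together roughly as follows. This is only a sketch: the class and method names (PercentileFilter, threshold, keepBelow) are made up for illustration, it uses double[] because the question is about floating-point values, and it filters with a strict < so that, when there are no ties at the threshold, about p * n values survive.

```java
import java.util.Arrays;

// Hypothetical helper; names are illustrative, not from the original answer.
class PercentileFilter {

    // Value at the truncated percentile position, for 0 < p <= 1.
    static double threshold(double[] values, double p) {
        if (values.length == 0) throw new IllegalArgumentException("empty input");
        if (p <= 0.0 || p > 1.0) throw new IllegalArgumentException("p must be in (0, 1]");
        double[] sorted = values.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        // Truncate the fractional position; clamp so p = 1.0 stays in bounds.
        int pos = Math.min((int) (sorted.length * p), sorted.length - 1);
        return sorted[pos];
    }

    // Keeps every value strictly below the threshold, preserving input order.
    static double[] keepBelow(double[] values, double threshold) {
        return Arrays.stream(values).filter(v -> v < threshold).toArray();
    }

    public static void main(String[] args) {
        double[] data = {3.0, 1.0, 250.0, 2.0, 4.0, 5.0, 7.0, 6.0, 9.0, 8.0};
        double t = threshold(data, 0.9);
        System.out.println("threshold = " + t);
        System.out.println(Arrays.toString(keepBelow(data, t)));
    }
}
```

With these ten values and p = 0.9 the position is index 9, i.e. the largest value, so the strict comparison drops only that one entry. Filtering with <= instead would keep it, which is why the choice between < and <= matters for small arrays.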

For mathematically "perfect" results you could search for "calculate percentile of discrete array / values". As I remember, there are two valid algorithms, which differ in whether posInt is rounded down or rounded up. In my example above I just truncated.
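To make the rounding question concrete, here is a small sketch comparing two common position rules. Which two algorithms the answer has in mind is not stated, so treat these as assumptions: the truncating rule from the snippet above, and the "nearest-rank" rule, which rounds the 1-based rank p * n up. The class and method names are hypothetical.

```java
// Hypothetical helper; both rules return a 0-based index into the sorted array.
class PercentilePosition {

    // Truncating rule: index floor(p * n), clamped to the last element.
    static int truncatedIndex(int n, double p) {
        return Math.min((int) (n * p), n - 1);
    }

    // Nearest-rank rule: 1-based rank ceil(p * n), converted to a 0-based index.
    static int nearestRankIndex(int n, double p) {
        int rank = (int) Math.ceil(n * p);
        return Math.max(rank, 1) - 1;
    }

    public static void main(String[] args) {
        System.out.println(truncatedIndex(10, 0.9));
        System.out.println(nearestRankIndex(10, 0.9));
    }
}
```

For n = 10 and p = 0.9 the truncating rule selects index 9 and the nearest-rank rule selects index 8, so the two can differ by one position. On large arrays the difference is usually negligible.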

Question 2

An idea would be to compute both the mean mu and the standard deviation sigma (e.g. using the algorithm described at "Accurately computing running variance") and use both of them for the definition of your threshold.

If your data are assumed to be Gaussian, you know that roughly 97.7% of your data should be below mu + 2*sigma, so that can be a good threshold.
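A minimal sketch of this idea, assuming the running-variance algorithm referred to is Welford's one-pass method and that the sample (n - 1) variance is wanted. The class name, the method names and the multiplier parameter k are illustrative only.

```java
// Hypothetical helper; names and the parameter k are illustrative.
class GaussianThreshold {

    // Returns {mean, standard deviation}, computed in one pass (Welford's method).
    static double[] meanAndStdDev(double[] values) {
        double mean = 0.0;
        double m2 = 0.0;   // running sum of squared deviations from the mean
        int n = 0;
        for (double x : values) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
        double variance = n > 1 ? m2 / (n - 1) : 0.0;   // sample variance
        return new double[] {mean, Math.sqrt(variance)};
    }

    // Threshold mu + k * sigma, e.g. k = 2 as suggested above.
    static double threshold(double[] values, double k) {
        double[] stats = meanAndStdDev(values);
        return stats[0] + k * stats[1];
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0};
        System.out.println(threshold(data, 2.0));
    }
}
```

For the values 1, 2, 3 the mean is 2 and the sample standard deviation is 1, so k = 2 gives a threshold of 4.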

Remark: you might want to recompute your threshold once you have rejected the extreme values since these values can have a significant impact on the mean and the standard deviation.
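The recomputation could look roughly like the sketch below: compute mu + k*sigma, drop everything above it, and repeat until nothing more is rejected or a round limit is hit. The names (IterativeThreshold, refine, maxRounds) and the stopping rule are assumptions for illustration, not something prescribed above.

```java
import java.util.Arrays;

// Hypothetical helper; names and the stopping rule are illustrative.
class IterativeThreshold {

    // mu + k * sigma, using the sample (n - 1) standard deviation.
    static double meanPlusKSigma(double[] values, double k) {
        int n = values.length;
        double mean = Arrays.stream(values).sum() / n;
        double sumSq = 0.0;
        for (double x : values) {
            sumSq += (x - mean) * (x - mean);
        }
        double sigma = n > 1 ? Math.sqrt(sumSq / (n - 1)) : 0.0;
        return mean + k * sigma;
    }

    // Recomputes the threshold on the surviving values, at most maxRounds times.
    static double refine(double[] values, double k, int maxRounds) {
        double[] kept = values.clone();
        double t = meanPlusKSigma(kept, k);
        for (int round = 0; round < maxRounds; round++) {
            final double limit = t;
            double[] next = Arrays.stream(kept).filter(v -> v <= limit).toArray();
            if (next.length == kept.length || next.length == 0) {
                break;   // nothing rejected, or nothing left to work with
            }
            kept = next;
            t = meanPlusKSigma(kept, k);
        }
        return t;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0, 4.0, 5.0, 100.0};
        System.out.println("first pass: " + meanPlusKSigma(data, 2.0));
        System.out.println("refined:    " + refine(data, 2.0, 10));
    }
}
```

In this example the single value 100 pushes the first threshold to about 98; once it is rejected, the recomputed threshold drops to about 6.2, which shows how strongly one extreme value can distort the mean and the standard deviation.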

EDIT:

I just computed the thresholds using the method I proposed, and the results do not look satisfactory for your data: for the first case, the threshold is around 130 (so taking 1.5 sigma might help get rid of the largest entries); for the second case, it is around 8; and for the third case, it is around 262.

Actually, I'm not that surprised by these results: for your last example, you want to get rid of more than half of the data! Assuming that the data are Gaussian with just a few extreme values is far from what you have at hand...