Question

Today a friend of mine showed me a bug on a website. (Link)

You can see that the sum of the percentages is 100.1%.

49 + 20.7 + 10.9 + 7 + 5.5 + 7 = 100.1%

And I thought: if I were programming and had those six (float) numbers, how could I solve this problem?


Assuming all the numbers have one decimal place:

49.0 + 20.7 + 10.9 + 7.0 + 5.5 + 7.0 = 100.1%

I believe this is a round-half problem; I don't see any other explanation for the error.

e.g.:

49.5% + 50.5% = 100%, if we round them up, 50% + 51% = 101%.

But in this case, because we have two numbers, we could use round half to even.

49.5% + 50.5% = 100%, if we round to even, 50% + 50% = 100%.

The numbers are already tainted, since their sum is 100.1%; therefore, at least one of them was rounded away from its exact value.

Round half to even doesn't fix this six-number example either, because it would fail on the following sum:

49 + 20.65 + 10.85 + 7 + 5.5 + 7 = 100%

49 + 20.6 + 10.8 + 7 + 5.5 + 7 = 99.9%
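For what it's worth, that failure can be reproduced in code. The sketch below uses a hypothetical `roundHalfEven` helper (JavaScript has no built-in banker's rounding); ties are detected with a small tolerance because values like 20.65 are not represented exactly in binary floating point:

```javascript
// Hypothetical helper: round x to `decimals` places, ties going to the even neighbour.
function roundHalfEven(x, decimals) {
    var factor = Math.pow(10, decimals);
    var scaled = x * factor;
    var floor = Math.floor(scaled);
    // Treat "almost exactly .5" as a tie to cope with binary representation error.
    if (Math.abs(scaled - floor - 0.5) < 1e-9) {
        return (floor % 2 === 0 ? floor : floor + 1) / factor;
    }
    return Math.round(scaled) / factor;
}

var values = [49, 20.65, 10.85, 7, 5.5, 7];          // exact sum: 100
var rounded = values.map(function (v) { return roundHalfEven(v, 1); });
var sum = rounded.reduce(function (a, b) { return a + b; }, 0);
console.log(rounded, sum.toFixed(1)); // [49, 20.6, 10.8, 7, 5.5, 7] "99.9"
```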

Is there any easy/quick way to solve this problem? Should I use a different rounding method?

Or can I only solve this by writing a function that handles my problem specifically?


Note: I'm not looking for a language-specific approach. I hope to be able to solve this problem in different languages (e.g. in both PHP and C++).


Solution

First, your analysis of the probable cause is not the actual cause here. But the concept you're describing is called bias in rounding schemes. It does exist and is real; it just isn't the cause of your specific issue.

Take the example you claim is causing this: 49.5% + 50.5% = 100%; rounded up, 50% + 51% = 101%. Rounding up (rounding towards positive infinity) is equivalent to rounding away from zero for positive numbers; see the list of rounding schemes below [2]. But again, that's pretty unlikely to be the cause here, unless you happened to get two numbers whose fractional parts were exactly a.5 and b.5 (not a.7 + b.8 + c.5, or any other combination of digits). To see why this is improbable: in a list of n numbers there are (n-1) adjacent pairs, and if we make the reasonable assumption that every last digit is equally probable, then the chance of a given pair having digits a.5, b.5 is only (0.1)^2 = 0.01.

Anyway, the real cause here is the numerical error introduced by lost precision: the numbers were truncated when converted to strings for display (with a format like '%2.1f'), in whichever language the site used (presumably PHP, JavaScript or Java).

The usual and simplest solution is to just carry more precision. Strictly you might only need one or two extra digits here, but IEEE 754 single-precision floats give you 23 bits of mantissa (roughly 7 decimal digits) for free, so everyone uses that.
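To illustrate: with hypothetical raw counts chosen so the exact percentages match the question's figures, rounding only at the display step keeps the underlying values summing to 100, even though the printed one-decimal strings sum to 100.1:

```javascript
// Hypothetical raw data behind the percentages (out of a total of 1001)
var counts = [490, 207, 109, 70, 55, 70];
var total = counts.reduce(function (a, b) { return a + b; }, 0);

// Full-precision percentages; rounding happens only when formatting for display
var exact = counts.map(function (c) { return 100 * c / total; });
var shown = exact.map(function (p) { return p.toFixed(1); });

var exactSum = exact.reduce(function (a, b) { return a + b; }, 0);
console.log(shown.join(' + ')); // "49.0 + 20.7 + 10.9 + 7.0 + 5.5 + 7.0" (strings sum to 100.1)
console.log(exactSum);          // ~100: the full-precision values still sum to 100
```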

However, if you really insist on setting yourself the (artificial) challenge of rounding numbers with missing precision under the constraint that they must sum to 100.0% (or at least maximizing the chance that they do), there are several lesser-used rounding schemes. You can find these in textbooks, but they aren't used much in the real world for an obvious reason: they introduce randomness and possibly nondeterminism (although you could set the random seed to at least ensure reproducibility).

So for whatever it's worth here are those rounding schemes (and many others, see the entire article):

[2] http://en.wikipedia.org/wiki/Rounding#Tie-breaking

The following all introduce some form of bias for the q = .5 case (either toward a direction, or toward even/odd values), and you said you want to avoid using them at all (instead of carrying extra precision, which makes the issue go away):

  • Round half up
  • Round half down
  • Round half away from zero
  • Round half towards zero
  • Round half to even
  • Round half to odd

Now here are the ones of interest to you:

  • Stochastic rounding: another unbiased tie-breaking method:

If the fractional part of y is .5, choose q randomly among y + 0.5 and y − 0.5, with equal probability. Advantages: essentially free of overall bias; but it is also 'fair' among even and odd q values. On the other hand, it introduces a random component into the result; performing the same computation twice on the same data may yield two different results. Also, it is open to nonconscious bias if humans (rather than computers or devices of chance) are "randomly" deciding in which direction to round.
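A sketch of stochastic tie-breaking in JavaScript (hypothetical helper; note that repeated runs may legitimately give different results for the same input):

```javascript
// Hypothetical helper: round x to `decimals` places; on a .5 tie, flip a coin.
function roundHalfStochastic(x, decimals) {
    var factor = Math.pow(10, decimals);
    var scaled = x * factor;
    var floor = Math.floor(scaled);
    if (Math.abs(scaled - floor - 0.5) < 1e-9) {
        // tie: round up or down with equal probability
        return (Math.random() < 0.5 ? floor + 1 : floor) / factor;
    }
    return Math.round(scaled) / factor;
}

console.log(roundHalfStochastic(20.65, 1)); // 20.6 or 20.7, at random
console.log(roundHalfStochastic(20.64, 1)); // always 20.6 (no tie involved)
```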

  • Alternating tie-breaking: One method, more obscure than most, is round half alternatingly.

If the fractional part is 0.5, alternate rounding up and rounding down: for the first occurrence of a 0.5 fractional part, round up; for the second occurrence, round down; and so forth. This suppresses the random component of the result, provided occurrences of 0.5 fractional parts can be effectively numbered. But it can still introduce a positive or negative bias, according to the direction of rounding assigned to the first occurrence, if the total number of occurrences is odd.
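A sketch of the alternating scheme (hypothetical helper; a module-level flag tracks the direction for the next tie):

```javascript
var nextTieGoesUp = true; // direction assigned to the first .5 occurrence

// Hypothetical helper: round x to `decimals` places, ties alternating up/down.
function roundHalfAlternating(x, decimals) {
    var factor = Math.pow(10, decimals);
    var scaled = x * factor;
    var floor = Math.floor(scaled);
    if (Math.abs(scaled - floor - 0.5) < 1e-9) {
        var result = (nextTieGoesUp ? floor + 1 : floor) / factor;
        nextTieGoesUp = !nextTieGoesUp; // flip direction for the following tie
        return result;
    }
    return Math.round(scaled) / factor;
}

console.log(roundHalfAlternating(20.65, 1)); // 20.7 (first tie rounds up)
console.log(roundHalfAlternating(10.85, 1)); // 10.8 (second tie rounds down)
```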

If you want to read all about this stuff (computer arithmetic, and the hardware circuits that implement it), one good reference (that goes heavy on the hardware side) is

Computer Arithmetic Algorithms, 2nd Edition, by Israel Koren, University of Massachusetts Amherst, 2010. www.ecs.umass.edu/ece/koren/arith/

OTHER TIPS

You should not accumulate rounded values; sum the (nearly) exact values instead.

Another dirty way to solve it could be to clamp the total:

if (sum > 100){
    sum = 100; // hide any rounding overshoot
}

An alternative would be to correct the values by the difference between the two sums, as described in the comments. (The implementation below is JavaScript, quick and dirty, for demonstration purposes only; it also only works when rounding to whole integers, otherwise it needs to be modified a bit.)

var numbers = [49, 20.7, 10.8, 7, 5.5, 7];
var roundedNumbers = [49, 21, 11, 7, 6, 7];

// Array.prototype.sum does not exist in JavaScript; use reduce instead
var sum = numbers.reduce(function (a, b) { return a + b; }, 0);
var roundedSum = roundedNumbers.reduce(function (a, b) { return a + b; }, 0);

while (roundedSum != sum){
    var isRoundedSumLarger = roundedSum > sum;
    var maxDifferenceIndex;
    var maxDifferenceValue = 0;
    for (var n = 0; n < numbers.length; n++){
       var difference = Math.abs(roundedNumbers[n] - numbers[n]);
       // only consider entries that were rounded in the direction of the surplus/deficit
       if ((isRoundedSumLarger && roundedNumbers[n] > numbers[n] && maxDifferenceValue < difference)
         ||(!isRoundedSumLarger && roundedNumbers[n] < numbers[n] && maxDifferenceValue < difference)){
           maxDifferenceValue = difference;
           maxDifferenceIndex = n;
       }
    }
    var modifyValue = (isRoundedSumLarger ? -1 : 1); // or whatever granularity the rounding algorithm uses
    roundedNumbers[maxDifferenceIndex] += modifyValue;
    roundedSum += modifyValue;
}
console.log(numbers, sum); // [49, 20.7, 10.8, 7, 5.5, 7] 100
console.log(roundedNumbers, roundedSum); // [49, 21, 11, 7, 5, 7] 100

The 'error' is inherently introduced when calculating the percentages and then dropping accuracy by rounding up or down. Naturally, when you sum those rounded numbers, you get a different total.

What I think you're suggesting is fudging the rounded figures so that they sum to the 'original' total, which you could do programmatically like this, for example:

1. Sum the percentages: 100.1.
2. Calculate the error: 100 - 100.1 = -0.1.
3. Select any one of the percentages and apply the error to it (here, subtract 0.1).
4. Sum them again; the total will equal 100.
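A minimal sketch of those four steps, using the question's figures and pushing the whole error into the first entry (chosen arbitrarily):

```javascript
var rounded = [49.0, 20.7, 10.9, 7.0, 5.5, 7.0];

var sum = rounded.reduce(function (a, b) { return a + b; }, 0);    // step 1: ~100.1
var error = 100 - sum;                                             // step 2: ~-0.1
// step 3: apply the error to one (arbitrary) entry, re-rounding to one decimal
rounded[0] = Math.round((rounded[0] + error) * 10) / 10;           // 49 -> 48.9
var newSum = rounded.reduce(function (a, b) { return a + b; }, 0); // step 4
console.log(rounded, newSum.toFixed(1)); // [48.9, 20.7, 10.9, 7, 5.5, 7] "100.0"
```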

Of course this means the amended percentage no longer reflects the original data (even if only by a fractional amount), and there is no single correct way to choose which one to adjust.

The reality is that there's no 'problem' with the original maths, only with presentation. I would be inclined either to note that the figures are rounded, or to show the full fractional amounts.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow