I am trying to learn the features of IEEE rounding from the following source: On fast IEEE Rounding.

Can anyone explain the equation for round up in that source? What does "round up with fix-up" mean? And what are the floor and ceiling functions? I checked IEEE 754, but it does not mention these.


Solution

Let's start with floor( ) and ceiling( ) (which I'm going to call "ceil" from here on). These are basic mathematical functions which map real numbers to integers. Formally, they are defined as follows:

floor(x) = max { n in Z | n <= x }
ceil(x) = min { n in Z | n >= x }

More plainly, the floor of x is the largest integer that is no bigger than x, and the ceil is the smallest integer that is no smaller than x. Some examples:

  • floor(1.5) is 1.
  • ceil(2) is 2.
  • floor(-3.14159) is -4.
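As a quick check, Python's built-in `math.floor` and `math.ceil` (both return `int`) agree with these definitions. The sketch below uses a hypothetical helper, `check_floor_ceil`, that asserts the defining inequalities and prints the three examples above.

```python
import math


def check_floor_ceil(x: float) -> tuple[int, int]:
    """Return (floor(x), ceil(x)) after checking the defining inequalities.

    check_floor_ceil is a hypothetical helper, for illustration only.
    """
    f, c = math.floor(x), math.ceil(x)
    # floor(x) is the largest integer n with n <= x, so the next integer is > x.
    assert f <= x < f + 1
    # ceil(x) is the smallest integer n with n >= x, so the previous integer is < x.
    assert c - 1 < x <= c
    return f, c


for x in (1.5, 2.0, -3.14159):
    f, c = check_floor_ceil(x)
    print(f"x = {x}: floor = {f}, ceil = {c}")
```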

Consult Wikipedia for more details.

OK, now let's move on to rounding. Every real number x either is an integer (in which case floor(x) == x == ceil(x)) or lies strictly between two consecutive integers, floor(x) < x < ceil(x). Mathematically, a "rounding rule" is a function f that maps real numbers to integers with the following property: for every real number x, f(x) = floor(x) or f(x) = ceil(x). This leaves a lot of freedom in which of the two candidates is chosen in any given case, so there are many different rounding rules. Here are some examples (these certainly aren't exhaustive):

  • each of floor( ) and ceil( ) is a rounding rule.

  • "round toward zero": simply throw away the fractional part of the input. This is also called truncation, and is often written as a mathematical function called trunc( ). It can be defined as trunc(x) = ceil(x) if x < 0, and trunc(x) = floor(x) otherwise*. For example, trunc(1.5) is 1 and trunc(-2.7) is -2.

  • "round away from zero" (sometimes described as rounding towards ±infinity): This is the "opposite" of truncation; if x < 0 the result is floor(x), otherwise it is ceil(x). There isn't a common mathematical name for this rule, so I'll just call it round-away( ). Examples: round-away(1.001) is 2, and round-away(-0.7071067812) is -1.

  • "round to odd": If the input x is an integer, return x. Otherwise, look at floor(x) and ceil(x). Because they are consecutive integers, one of them is even and the other is odd. Return the one that is odd. Some examples: round-to-odd(1.001) is 1, round-to-odd(-2.001) is -3, and round-to-odd(4.0) is 4.

  • "round to nearest, ties to even": This is the default rounding mode of IEEE-754. I would call it round( ), but that name is (rather perversely) used in the C library for a different rounding rule, round half away from zero, and I don't want to confuse everyone, so I'll call it rne( ) here instead. The idea is as follows: if there is a unique integer closest to x, return that integer. Otherwise, x lies exactly halfway between two integers; one of them is even and the other is odd. Return the even one.
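The first three non-trivial rules in this list can be written directly in terms of floor and ceil. This is a minimal Python sketch; the names `trunc`, `round_away` and `round_to_odd` are illustrative (Python's own `math.trunc` behaves like `trunc` here for finite floats). The final loop checks that every result is floor(x) or ceil(x), i.e. that each function really is a rounding rule in the sense defined above.

```python
import math


def trunc(x: float) -> int:
    """Round toward zero: ceil for negative inputs, floor otherwise."""
    return math.ceil(x) if x < 0 else math.floor(x)


def round_away(x: float) -> int:
    """Round away from zero: floor for negative inputs, ceil otherwise."""
    return math.floor(x) if x < 0 else math.ceil(x)


def round_to_odd(x: float) -> int:
    """Return x if it is an integer, otherwise whichever neighbor is odd."""
    f, c = math.floor(x), math.ceil(x)
    if f == c:  # x is already an integer
        return f
    # f and c are consecutive integers, so exactly one of them is odd.
    # Python's % is non-negative for a positive modulus, so this works for negative f too.
    return f if f % 2 == 1 else c


samples = [1.5, -2.7, 1.001, -0.7071067812, -2.001, 4.0]
for rule in (trunc, round_away, round_to_odd):
    for x in samples:
        r = rule(x)
        # The defining property of a rounding rule.
        assert r in (math.floor(x), math.ceil(x))
        print(f"{rule.__name__}({x}) = {r}")
```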

This last rule can be written as "RU with fix-up", though that is a somewhat odd way to think of it mathematically. More commonly, it is formally defined more or less as follows:

rne(x) = floor(x)  if x - floor(x) < 0.5
         floor(x)  if x - floor(x) = 0.5 and floor(x) is even.
         ceil(x)   if x - floor(x) = 0.5 and floor(x) is odd.
         ceil(x)   if x - floor(x) > 0.5
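Here is a direct transcription of this case analysis as a Python sketch. It converts the input to `fractions.Fraction`, which represents any float exactly, so the comparison of x - floor(x) against 0.5 is exact rather than subject to floating-point error. Python 3's built-in `round()` with a single argument also rounds half to even, so it serves as a cross-check.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def rne(x) -> int:
    """Round to nearest, ties to even, following the four cases above."""
    x = Fraction(x)       # exact for ints, floats and Fractions
    f = math.floor(x)
    d = x - f             # fractional part, 0 <= d < 1, computed exactly
    if d < HALF:
        return f
    if d > HALF:
        return f + 1      # this is ceil(x), since x is not an integer here
    # Exact tie: floor(x) and ceil(x) = floor(x) + 1 are equally close.
    return f if f % 2 == 0 else f + 1


for x in (0.5, -1.5, 1.3, 1.8, 2.5, -2.5, 3.0):
    assert rne(x) == round(x)  # built-in round() also uses ties-to-even
    print(f"rne({x}) = {rne(x)}")
```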

Some examples of this rne( ) rule in action: rne(0.5) is 0. rne(-1.5) is -2. rne(1.3) is 1. rne(1.8) is 2.
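One way to read "RU with fix-up", assuming "RU" here means round half up (add one half, then take the floor), is: round half up first, and then, only if x was an exact tie, clear the lowest bit of the result. On a tie, round half up returns ceil(x); if that is odd, clearing the low bit turns it into ceil(x) - 1 = floor(x), the even neighbor, and if it is already even nothing changes. The Python sketch below (the name `rne_via_fixup` is hypothetical) illustrates the idea on integers; the hardware in the source works on significand bits, so its exact formulation may differ.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def rne_via_fixup(x) -> int:
    """Round to nearest even as 'round half up' followed by a tie fix-up.

    Assumes RU means round half up; an illustrative sketch, not the
    source's hardware algorithm.
    """
    x = Fraction(x)                     # exact arithmetic, no float error
    r = math.floor(x + HALF)            # round half up: ties go to ceil(x)
    is_tie = x - math.floor(x) == HALF  # was x exactly halfway between integers?
    if is_tie:
        # Fix-up: clear the least significant bit. Python ints behave like
        # two's complement for bitwise operations, so this works for negative r too.
        r &= ~1
    return r


for x in (0.5, -1.5, 1.3, 1.8, 2.5):
    print(f"rne_via_fixup({x}) = {rne_via_fixup(x)}")
```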

OK, so far this has all been about rounding to integral values. What does that have to do with rounding to the nearest floating-point number, as in IEEE-754? A rounding rule can be used not only to round to an integer but also to round to any fixed number of digits, simply by scaling the number by a factor of b**n, where b is the base of the representation and n is chosen so that the desired rounding position ends up in the units place (the LSB). Of course, we don't actually need to scale the number and un-scale the result; instead we simply replace floor(x) and ceil(x) in the rounding rule with x rounded down and rounded up, respectively, to the desired number of digits.
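To make the scaling argument concrete, here is a Python sketch (the names `rne_int` and `round_to_digits` are hypothetical) that rounds x to n digits after the radix point in base b by scaling by b**n, applying the integer rne rule, and scaling back. It uses `fractions.Fraction` so that both the scaling and the tie test are exact; real floating-point hardware works directly on the significand bits instead of literally multiplying.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def rne_int(x: Fraction) -> int:
    """Integer round-to-nearest, ties-to-even on an exact Fraction."""
    f = math.floor(x)
    d = x - f
    if d != HALF:
        return f if d < HALF else f + 1
    return f if f % 2 == 0 else f + 1


def round_to_digits(x, base: int, n: int) -> Fraction:
    """Round x to n base-`base` digits after the radix point, ties to even."""
    scale = Fraction(base) ** n  # b**n moves the rounding position to the units place
    return Fraction(rne_int(Fraction(x) * scale)) / scale


# 0.101 in binary is 5/8; rounding to 2 bits is a tie, resolved to the even 0.10 (= 1/2).
print(round_to_digits(Fraction(5, 8), 2, 2))  # 1/2
# Passing a string keeps 2.675 exact; 267.5 is a tie and 268 is the even neighbor.
print(round_to_digits("2.675", 10, 2))         # 67/25, i.e. 2.68
```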

[*] I'm defining mathematical functions on real numbers here, not giving IEEE-754 implementations. Thus, there's no need to deal with edge cases like -0, inf, or nan.

Licensed under: CC-BY-SA with attribution