Let's start with floor( ) and ceiling( ) (which I'm going to call "ceil" from here on). These are basic mathematical functions which map real numbers to integers. Formally, they are defined as follows:
floor(x) = max { n in Z | n <= x }
ceil(x) = min { n in Z | n >= x }
More plainly, the floor of x is the largest integer that is no bigger than x, and the ceil is the smallest integer that is no smaller than x. Some examples: floor(1.5) is 1. ceil(2) is 2. floor(-3.14159) is -4.
Consult Wikipedia for more details.
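As a rough illustration of the two set definitions, here is a small Python sketch. The helper names `floor_by_definition` and `ceil_by_definition` are made up for this example, and the search is limited to a finite window of candidate integers (an assumption, since the real definitions range over all of Z). The results are compared against the standard library's `math.floor` and `math.ceil`.

```python
import math

# Finite window of candidate integers; the true definitions range over all of Z.
CANDIDATES = range(-100, 101)

def floor_by_definition(x):
    # floor(x) = max { n in Z | n <= x }
    return max(n for n in CANDIDATES if n <= x)

def ceil_by_definition(x):
    # ceil(x) = min { n in Z | n >= x }
    return min(n for n in CANDIDATES if n >= x)

# The examples from the text, checked against the library functions.
for x in (1.5, 2, -3.14159):
    assert floor_by_definition(x) == math.floor(x)
    assert ceil_by_definition(x) == math.ceil(x)
```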
Ok, now let's move on to rounding. Every real number x either is an integer (in which case floor(x) == x == ceil(x)), or lies strictly between two consecutive integers, floor(x) < x < ceil(x). Mathematically, a "rounding rule" is a function f that maps real numbers to integers with the following property: for every real number x, f(x) = floor(x) or f(x) = ceil(x). This leaves lots of flexibility about which possible result is chosen in any situation, so there are lots of different rounding rules. Here are some examples (these certainly aren't exhaustive):
Each of floor( ) and ceil( ) is a rounding rule.
"round toward zero": simply throw away the fractional part of the input. This is also called truncation, and is often written as a mathematical function called trunc( ). It can be defined as trunc(x) = ceil(x) if x < 0, and trunc(x) = floor(x) otherwise [*]. For example, trunc(1.5) is 1 and trunc(-2.7) is -2.
"round away from zero" or "round towards infinity": This is the "opposite" of truncation; if x < 0 the result is floor(x), and the result is ceil(x) otherwise. There isn't a common mathematical name for this rule, so I'll just call it round-away( ). Examples: round-away(1.001) is 2, and round-away(-0.7071067812) is -1.
"round to odd": If the input x is an integer, return x. Otherwise, look at floor(x) and ceil(x). Because they are consecutive integers, one of them will be even and the other will be odd. Return the one that is odd. Some examples: round-to-odd(1.001) is 1, round-to-odd(-2.001) is -3, and round-to-odd(4.0) is 4.
"round to nearest, ties to even": This is the default rounding mode of IEEE-754. I would call it round( ), but that name is (rather perversely) used for a different rounding rule in the C library, and I don't want to confuse everyone, so I'll call it rne( ) instead here. The idea is as follows: if there is a unique integer closest to x, return that integer. Otherwise, x lies exactly halfway between two integers; one of them is even and the other is odd. Return the even one.
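The simpler rules in this list (truncation, round-away, and round-to-odd) translate almost directly into code. The following is a minimal Python sketch, not an IEEE-754 implementation: the function names are my own (Python identifiers can't contain hyphens), and inputs are passed as `fractions.Fraction` values so that they behave like exact real numbers rather than binary floats.

```python
import math
from fractions import Fraction

def trunc(x):
    # Round toward zero: ceil for negative inputs, floor otherwise.
    return math.ceil(x) if x < 0 else math.floor(x)

def round_away(x):
    # Round away from zero: floor for negative inputs, ceil otherwise.
    return math.floor(x) if x < 0 else math.ceil(x)

def round_to_odd(x):
    lo, hi = math.floor(x), math.ceil(x)
    if lo == hi:
        # x is already an integer; return it unchanged.
        return lo
    # lo and hi are consecutive, so exactly one of them is odd.
    # Python's % yields a non-negative result here, even for negative lo.
    return lo if lo % 2 == 1 else hi

def is_rounding_choice(f, x):
    # The defining property of a rounding rule: f(x) is floor(x) or ceil(x).
    return f(x) in (math.floor(x), math.ceil(x))

samples = [Fraction(s) for s in ("1.5", "-2.7", "1.001", "-0.7071067812", "-2.001", "4.0")]
for f in (trunc, round_away, round_to_odd):
    assert all(is_rounding_choice(f, x) for x in samples)
```

With these definitions, `trunc(Fraction("-2.7"))` gives -2, `round_away(Fraction("1.001"))` gives 2, and `round_to_odd(Fraction("-2.001"))` gives -3, matching the examples in the list.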
This last rule, rne( ), can be written as "RU with fix-up", though that is a somewhat odd way to think of it, mathematically. More commonly, it's formally defined more or less as follows:
rne(x) = floor(x)  if x - floor(x) < 0.5
         floor(x)  if x - floor(x) = 0.5 and floor(x) is even
         ceil(x)   if x - floor(x) = 0.5 and floor(x) is odd
         ceil(x)   if x - floor(x) > 0.5
Some examples of this rne( ) rule in action: rne(0.5) is 0. rne(-1.5) is -2. rne(1.3) is 1. rne(1.8) is 2.
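The four-case definition above can be sketched in Python as follows. This is an illustration under the same assumption as the math (exact real inputs), so it takes `fractions.Fraction` values; the function name `rne` simply mirrors the name used in this answer.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)

def rne(x):
    lo = math.floor(x)
    frac = x - lo              # fractional part, always in [0, 1)
    if frac < HALF:
        return lo              # strictly closer to floor(x)
    if frac > HALF:
        return math.ceil(x)    # strictly closer to ceil(x)
    # Exact tie: choose whichever neighbour is even.
    return lo if lo % 2 == 0 else lo + 1

# The examples from the text.
assert rne(Fraction("0.5")) == 0
assert rne(Fraction("-1.5")) == -2
assert rne(Fraction("1.3")) == 1
assert rne(Fraction("1.8")) == 2
```

As an aside, Python 3's built-in `round()` also breaks ties to even for these exactly representable inputs (`round(0.5)` is 0 and `round(2.5)` is 2), so it agrees with this sketch on those values.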
Ok, so this is all talking about rounding to integral values. What does that have to do with rounding to the nearest floating-point number as in IEEE-754? A rounding rule may be used not only to round to an integer, but to round to any fixed number of digits as well, by simply scaling the input by a factor of b**n, where b is the base of the representation and n is chosen so that the desired rounding point of the number ends up in the units position (the LSB). Of course, we don't actually need to scale the number and un-scale the result; instead we simply replace floor(x) and ceil(x) in the rounding rule with the values of x rounded down and up to the desired number of digits.
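To make the scaling idea concrete, here is a small Python sketch that does the scale, round, and un-scale steps literally. The helper `round_to_digits` and its parameters are hypothetical names chosen for this example; `n` counts digits kept after the radix point, `rule` is any integer rounding rule (such as `math.floor` or an `rne` like the one defined in this answer), and exact `Fraction` arithmetic stands in for real numbers.

```python
import math
from fractions import Fraction

def rne(x):
    # Round to nearest integer, ties to even (exact inputs assumed).
    lo = math.floor(x)
    frac = x - lo
    if frac != Fraction(1, 2):
        return lo if frac < Fraction(1, 2) else lo + 1
    return lo if lo % 2 == 0 else lo + 1

def round_to_digits(x, n, rule, base=10):
    # Scale by base**n so the rounding point lands in the units position,
    # apply the integer rounding rule, then undo the scaling.
    scale = Fraction(base) ** n
    return Fraction(rule(x * scale)) / scale

x = Fraction("3.14159")
assert round_to_digits(x, 2, math.floor) == Fraction("3.14")
assert round_to_digits(x, 2, math.ceil) == Fraction("3.15")
assert round_to_digits(x, 2, rne) == Fraction("3.14")

# Base 2, two fractional bits: 5/8 is a tie between 2/4 and 3/4.
assert round_to_digits(Fraction(5, 8), 2, rne, base=2) == Fraction(1, 2)
```

In the base-2 case, 5/8 scaled by 4 is 2.5, an exact tie, and the even neighbour 2 is chosen, giving 2/4 = 1/2. That is the same decision a ties-to-even rounding makes when the value sits halfway between two representable numbers.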
[*] I'm defining mathematical functions on real numbers here, not giving IEEE-754 implementations. Thus, there's no need to deal with edge cases like -0, inf, or nan.