Why aren't Floating-Point Decimal numbers hardware accelerated like Floating-Point Binary numbers?

StackOverflow https://stackoverflow.com/questions/1447215

  •  22-07-2019

Question

Is it worth it to implement it in hardware? If yes, why? If not, why not?


Sorry, I thought it was clear that I am talking about decimal rational numbers! OK, something like decNumber++ for C++, or decimal for .NET... Hope it is clear now :)


Solution

The latest revision of the standard, IEEE 754-2008, does indeed define decimal floating-point numbers, using the representations found in the software referenced in the question. The previous version of the standard (IEEE 754-1985) did not provide decimal floating-point numbers. Most current hardware implements the 1985 standard and not the 2008 standard, but IBM's iSeries computers using POWER6 chips do have such support, and so do the z10 mainframes.

The standardization effort for decimal floating point was spearheaded by Mike Cowlishaw of IBM UK, who has a web site full of useful information (including the software in the question). It is likely that in due course, other hardware manufacturers will also introduce decimal floating point units on their chips, but I have not heard a statement of direction for when (or whether) Intel might add one. Intel does have optimized software libraries for it.

The C standards committee is also looking at adding support for decimal floating point; that work is ISO/IEC TR 24732.
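For reference, this is roughly what decimal floating point looks like in software today. The sketch below uses Python's standard decimal module, which implements Cowlishaw's General Decimal Arithmetic specification (closely related to the IEEE 754-2008 decimal formats); it illustrates the behaviour, not the hardware path:

    # Software decimal floating point via Python's standard "decimal" module.
    from decimal import Decimal, getcontext

    getcontext().prec = 16                   # roughly decimal64 precision (16 digits)

    print(Decimal("0.1") + Decimal("0.2"))   # 0.3 -- exact in decimal
    print(0.1 + 0.2)                         # 0.30000000000000004 -- binary double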

OTHER TIPS

Some IBM processors include dedicated decimal hardware (a Decimal Floating Point, or DFP, unit).

In contribution to the answer by Daniel Pryden (answered Sep 18 at 23:43):

The main reason is that DFP units need more transistors on a chip than BFP units. The cause is the BCD code needed to handle decimal numbers in a binary environment. IEEE 754-2008 defines several encodings to minimize this overhead; the DPD method (http://en.wikipedia.org/wiki/Densely_packed_decimal) appears to be more efficient than the BID method (http://en.wikipedia.org/wiki/Binary_Integer_Decimal).

Normally, you need 4 bits to cover a decimal digit from 0 to 9. The bit patterns for 10 to 15 are invalid but still representable in BCD. Therefore, DPD compresses 3 * 4 = 12 bits into 10 bits, covering the range 000 to 999 within the 1024 (2^10) possible codes.
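As a rough illustration of that storage arithmetic, here is a small Python sketch; it only packs plain BCD and counts bits, and does not reproduce the actual DPD bit layout, which is more involved:

    # Naive BCD packs each decimal digit into 4 bits: 3 digits = 12 bits,
    # of which only 1000 of the 4096 codes are valid.
    import math

    def to_bcd(n: int) -> int:
        """Pack a 3-digit number into plain BCD (4 bits per digit)."""
        assert 0 <= n <= 999
        hundreds, rest = divmod(n, 100)
        tens, ones = divmod(rest, 10)
        return (hundreds << 8) | (tens << 4) | ones

    print(f"{to_bcd(999):012b}")        # 100110011001 -- 12 bits for "999"
    print(math.ceil(math.log2(1000)))   # 10 -- bits DPD uses for 000..999
    print(2 ** 10)                      # 1024 available codes, 1000 used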

In general, BFP is faster than DFP, and BFP needs less space on a chip than DFP.

The question of why IBM implemented a DFP unit has a simple answer: they build servers for the finance market, and if data represents money, it should be reliable.

With hardware-accelerated decimal arithmetic, some errors that occur in binary do not occur at all. For example, 1/5 = 0.2 in decimal becomes the recurring binary fraction 0.0011001100110011..., so such recurring fractions are avoided in decimal arithmetic.

And the ubiquitous ROUND() function in Excel would no longer be needed :D (try the formula =1*(0,5-0,4-0,1) and see!)
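Both effects are easy to reproduce with a software decimal type. A quick sketch in Python, with the standard decimal module standing in for a hardware DFP unit:

    # Binary doubles vs. a (software) decimal type.
    from decimal import Decimal

    print(f"{0.2:.20f}")      # 0.20000000000000001110 -- 1/5 recurs in binary
    print(0.5 - 0.4 - 0.1)    # -2.7755575615628914e-17 -- the spreadsheet surprise
    print(Decimal("0.5") - Decimal("0.4") - Decimal("0.1"))   # 0.0 -- exact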

Hope that explains your question a little!

There is (a tiny bit of) decimal string acceleration, but...

This is a good question. My first reaction was "macro ops have always failed to prove out", but after thinking about it, what you are talking about would go a whole lot faster if implemented in a functional unit. I guess it comes down to whether those operations are done enough to matter. There is a rather sorry history of macro op and application-specific special-purpose instructions, and in particular the older attempts at decimal financial formats are just legacy baggage now. For example, I doubt if they are used much, but every PC has the Intel BCD opcodes, which consist of

DAA, AAA, AAD, AAM, DAS, AAS
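To make concrete what those "decimal adjust" instructions are for, here is a simplified software model of adding two packed-BCD bytes: a plain binary ADD followed by the kind of nibble adjustment DAA performs. Flag handling (AF/CF) is folded into ordinary Python conditions, so this is an illustration, not an exact emulation of the instruction:

    # Packed BCD stores two decimal digits per byte (e.g. 0x38 means 38).
    def bcd_add_byte(a: int, b: int) -> tuple[int, int]:
        """Add two packed-BCD bytes; return (adjusted result byte, carry)."""
        s = a + b                                        # plain binary ADD
        # Adjust the low digit if it exceeded 9 or carried into the high nibble
        # (the role of the x86 auxiliary-carry flag, AF).
        if (s & 0x0F) > 9 or ((a & 0x0F) + (b & 0x0F)) > 0x0F:
            s += 0x06
        # Adjust the high digit if it exceeded 9 or carried out of the byte (CF).
        if (s & 0xF0) > 0x90 or s > 0xFF:
            s += 0x60
        return s & 0xFF, s >> 8

    print(hex(bcd_add_byte(0x38, 0x45)[0]))   # 0x83  (38 + 45 = 83)
    print(bcd_add_byte(0x99, 0x01))           # (0, 1), i.e. 00 with carry (99 + 1 = 100)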

Once upon a time, decimal string instructions were common on high-end hardware. It's not clear that they ever made much of a benchmark difference. Programs spend a lot of time testing and branching and moving things and calculating addresses. It normally doesn't make sense to put macro-operations into the instruction set architecture, because overall things seem to go faster if you give the CPU the smallest number of fundamental things to do, so it can put all its resources into doing them as fast as possible.

These days, not even all the binary ops are actually in the real ISA. The CPU translates the legacy ISA into micro-ops at runtime. It's all part of going fast by specializing in core operations. For now the left-over transistors seem to be waiting for graphics and 3D work, e.g., MMX, SSE, 3DNow!

I suppose it's possible that a clean-sheet design might do something radical and unify the current (HW) scientific and (SW) decimal floating point formats, but don't hold your breath.

No, they are very memory-inefficient, and the calculations are not easy to implement in hardware either (it can be done, of course, but it can also take a lot of time). Another disadvantage of the decimal format is that it is not widely used; it was popular for a time, before research showed that binary-formatted numbers were more accurate, but now programmers know better. The decimal format isn't efficient and is more lossy. Also, additional hardware representations require additional instruction sets, which can lead to more complicated code.

The hardware you want used to be fairly common.

Older CPUs had hardware BCD (binary-coded decimal) arithmetic. (The little Intel chips had a little support, as noted by earlier posters.)

Hardware BCD was very good at speeding up FORTRAN, which used 80-bit BCD for numbers.

Scientific computing used to make up a significant percentage of the worldwide market.

Since everyone (relatively speaking) got a home PC running Windows, that market shrank to a tiny percentage. So nobody does it anymore.

Since you don't mind having 64-bit doubles (binary floating point) for most things, it mostly works.

If you use 128-bit binary floating point on modern hardware vector units, it's not too bad. Still less accurate than 80-bit BCD, but you get what you get.

At an earlier job, a colleague formerly from JPL was astonished we still used FORTRAN. "We've converted to C and C++," he told us. I asked him how they solved the problem of lack of precision. They hadn't noticed. (They also don't have the same space-probe landing accuracy they used to have. But anyone can miss a planet.)

So, basically, 128-bit doubles in the vector unit are more or less okay, and widely available.

My twenty cents. Please don't represent it as a floating point number :)

The decimal floating-point standard (IEEE 754-2008) is already implemented in hardware by two companies: IBM, in its POWER6/7-based servers, and SilMinds, with its SilAx PCIe-based acceleration card.

SilMinds published a case study about converting decimal arithmetic execution to use its hardware solutions; it reports a large speedup and greatly reduced energy consumption.

Moreover, several publications by Michael J. Schulte and others report very positive benchmark results, along with comparisons between the DPD and BID formats (both defined in the IEEE 754-2008 standard).

You can find PDFs of:

  1. Performance analysis of decimal floating-point libraries and its impact on decimal hardware and software solutions

  2. A survey of hardware designs for decimal arithmetic

  3. Energy and Delay Improvement via Decimal Floating Point Units

Those three papers should be more than enough to answer your questions!

I speculate that there are no compute-intensive applications of decimal numbers. On the other hand, floating-point numbers are used extensively in engineering applications, which must handle enormous amounts of data and do not need exact results, only results within a desired precision.

Decimals (and more generally, fractions) are relatively easy to implement as a pair of integers. General purpose libraries are ubiquitous and easily fast enough for most applications.
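A quick sketch of that numerator/denominator representation, using Python's standard fractions module as a stand-in for such a library:

    # Exact rational arithmetic as a pair of integers (numerator, denominator).
    from fractions import Fraction

    total = Fraction(1, 10) + Fraction(2, 10)   # 0.1 + 0.2, held exactly as 3/10
    print(total)                                # 3/10
    print(total == Fraction(3, 10))             # True
    print(float(total))                         # 0.3 -- only the final conversion rounds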

Anyone who needs the ultimate in speed is going to hand-tune their implementation (e.g., changing the divisor to suit a particular usage, algebraically combining/reordering the operations, clever use of SIMD shuffles...). Merely encoding the most common functions into a hardware ISA would surely never satisfy them -- in all likelihood, it wouldn't help at all.

The simple answer is that computers are binary machines. They don't have ten fingers, they have two. So building hardware for binary numbers is considerably faster, easier, and more efficient than building hardware for decimal numbers.

By the way: decimal and binary are number bases, while fixed-point and floating-point are mechanisms for approximating rational numbers. The two are completely orthogonal: you can have floating-point decimal numbers (.NET's System.Decimal is implemented this way) and fixed-point binary numbers (normal integers are just a special case of this).
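A small sketch of what "floating-point decimal" means concretely, using Python's Decimal as the example type: the value is a sign, a digit coefficient, and a power-of-ten exponent, analogous to the significand and power-of-two exponent of binary floating point:

    # A decimal floating-point value "floats" in base 10: coefficient * 10**exponent.
    from decimal import Decimal

    print(Decimal("123.45").as_tuple())
    # DecimalTuple(sign=0, digits=(1, 2, 3, 4, 5), exponent=-2)  -> 12345 * 10**-2
    print(Decimal("1.2345e6").as_tuple())
    # DecimalTuple(sign=0, digits=(1, 2, 3, 4, 5), exponent=2)   -> 12345 * 10**2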

Floating-point math essentially IS an attempt to implement decimals in hardware. It's troublesome, which is why the Decimal types are partly implemented in software. It's a good question why CPUs don't support more types, but I suppose it goes back to CISC vs. RISC processors: RISC won the performance battle, so they try to keep things simple these days, I guess.

Modern computers are usually general purpose. Floating point arithmetic is very general purpose, while Decimal has a far more specific purpose. I think that's part of the reason.

Do you mean the typical numeric integral types "int", "long", "short" (etc.)? Because operations on those types are definitely implemented in hardware. If you're talking about arbitrary-precision large numbers ("BigNums" and "Decimals" and such), it's probably a combination of rarity of operations using these data types and the complexity of building hardware to deal with arbitrarily large data formats.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow