Decimal vs. Integer: Given a fixed range of values, which is preferable for accurate computation?

https://softwareengineering.stackexchange.com/questions/260072

05-10-2020

Question

After getting into a "heated discussion" with someone, I figured I'd ask this question for the sake of posterity. I'm willing to be corrected if my assumption is wrong, but I'd like to hear a third-party opinion from someone with more credibility than myself.

Let's assume that we need to calculate and store numeric values that are expressed in decimal format. However, all numerical inputs and outputs must fit within a specific range of values. Also, fractional digits are important, but we will never be interested in any digit beyond a specific decimal place. This is true for any calculation. If a calculation results in a value with significant digits below 10^-8, those digits are essentially 0. Likewise, if the value is above our upper bound, this is considered an error case.
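A minimal Python sketch of these constraints as stated, assuming the valid range is 0 to 21,000,000 and that digits below 10^-8 are simply truncated (the question says they are "essentially 0" but does not name a rounding mode). The names `QUANTUM`, `UPPER_BOUND`, and `normalize` are illustrative, not from the original.

```python
from decimal import Decimal, ROUND_DOWN

# Assumed constraints: 8 decimal places, values in [0, 21,000,000].
QUANTUM = Decimal("0.00000001")   # 10^-8, the smallest digit we care about
UPPER_BOUND = Decimal("21000000")

def normalize(value: Decimal) -> Decimal:
    """Truncate digits below 10^-8 and reject out-of-range values."""
    result = value.quantize(QUANTUM, rounding=ROUND_DOWN)
    if result < 0 or result > UPPER_BOUND:
        raise ValueError(f"value out of range: {value}")
    return result

print(normalize(Decimal("1.234567899")))   # the ninth decimal digit is dropped
print(normalize(Decimal("0.000000001")))   # entirely below 10^-8, becomes zero
```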

Considering these fixed constraints, which will never change, is it better to use a traditional DECIMAL(p, s) data type, or an integer type that can hold every value shifted s decimal digits to the left? For instance, for efficiency and accuracy, should we use integers in the range 0 to 2,100,000,000,000,000 to represent all values between 0.00000001 and 21,000,000, or should we use a decimal defined as DECIMAL(16, 8), i.e. 16 total digits with 8 after the decimal point?
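To make the integer option concrete, here is a hedged Python sketch of the "shift eight digits to the left" representation. `SCALE`, `to_units`, and `from_units` are hypothetical names; the sketch assumes 8 decimal places and the 0 to 21,000,000 range from the question, and checks that the largest value (2.1 × 10^15 units) fits in a signed 64-bit integer such as MySQL's BIGINT.

```python
from decimal import Decimal

SCALE = 8                         # assumed number of decimal places
FACTOR = 10 ** SCALE
MAX_UNITS = 21_000_000 * FACTOR   # 2,100,000,000,000,000
INT64_MAX = 2 ** 63 - 1           # signed 64-bit maximum

def to_units(text: str) -> int:
    """Parse a decimal string into an integer count of 10^-8 units."""
    units = Decimal(text) * FACTOR
    if units != units.to_integral_value():
        # More precision than the scale allows: reject instead of rounding.
        raise ValueError(f"more than {SCALE} decimal places: {text}")
    units = int(units)
    if not 0 <= units <= MAX_UNITS:
        raise ValueError(f"out of range: {text}")
    return units

def from_units(units: int) -> Decimal:
    """Convert an integer count of 10^-8 units back to a Decimal."""
    return Decimal(units).scaleb(-SCALE)

# The whole assumed range fits in a signed 64-bit integer column.
assert MAX_UNITS <= INT64_MAX
```

For example, `to_units("0.00000001")` returns 1 and `to_units("21000000")` returns 2100000000000000; an input with a ninth decimal digit is rejected rather than silently rounded.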

CLARIFICATION

My specific discussion was regarding data types within MySQL. I'd prefer a generic response, assuming that most decimal representations are stored in a standardized format. However, if there is no reliable "standard" for decimal type definitions, consider this question as it relates to MySQL.

Solution

For database storage, decimals.

TL;DR Backstory

If you were processing these results primarily in main memory, it might be a toss-up. There are many small-range floating point / decimal problems that can be solved with scaled integers and fixed-point math. For example, I solved a recent partitioning problem from Stack Overflow in the integer realm, then returned the answer as floating point. I've also seen many text formatting and graphical layout algorithms use similar approximations. The integer math is simple, very fast, and works well.
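As a small illustration of why integer math works well for small-range problems (this is not the partitioning solution mentioned above, which is not reproduced here), the Python sketch below adds 0.1 ten times in binary floating point and in hypothetical hundredths-scaled integers, converting to float only at the end.

```python
FACTOR = 100  # hypothetical scale: values are stored as integer hundredths

# Add 0.1 ten times in binary floating point.
float_total = sum([0.1] * 10)

# Add the same quantity as scaled integers: 0.1 == 10 hundredths.
int_total = sum([10] * 10)

print(float_total)          # 0.9999999999999999 (0.1 is not exact in binary)
print(int_total)            # 100, exactly 1.00 in hundredths
print(int_total / FACTOR)   # 1.0, converted to float only at the end
```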

But, without further constraining your use case, there are drawbacks, such as it becoming your responsibility to:

- provide any mathematical functions you might need, such as sin and sqrt;
- convert from your custom-scaled integers to other types you might need, such as formatted strings, floats, or proper decimal types.

Those are "a simple matter of effort," but the details of getting those things right can be vexing and distract you from your primary coding goals.
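To show how such conversion details can be vexing, here is a hedged Python sketch of formatting a custom-scaled integer as a decimal string. `format_units` is a hypothetical helper assuming 8 implied decimal places; it handles two easy-to-miss cases, Python's floor division on negative numbers and zero-padding of the fractional part.

```python
SCALE = 8  # assumed number of implied decimal places

def format_units(units: int, scale: int = SCALE) -> str:
    """Render an integer count of 10^-scale units as a decimal string."""
    sign = "-" if units < 0 else ""
    # Split the magnitude, not the signed value: divmod(-1, 10**8) would give
    # (-1, 99999999) because Python's integer division rounds toward -infinity.
    whole, frac = divmod(abs(units), 10 ** scale)
    # Zero-pad the fraction so 5 units prints as 0.00000005, not 0.5.
    return f"{sign}{whole}.{frac:0{scale}d}"

print(format_units(150_000_000))  # 1.50000000
print(format_units(-1))           # -0.00000001
```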

Add to this that you are specifically interested in storing these values in a database. That implies permanence and a long data lifetime. If there's anything we've learned about fixed-precision data in this industry, it's that requirements will change over time, and then you'll have to change your data and the apps that use it. Think Y2K or financial market decimalization. Fixed-point seems a solid foundation, but in a few years your decimal precision needs might change. The scale chosen for printer internals when 300-dpi printers seemed impossibly high-resolution wasn't so precise when 1200- and 2400-dpi printers rolled in a few years later.
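A hedged sketch of what a precision change means for stored scaled integers: every stored value has to be rewritten, and the new values may no longer fit the column. `rescale` is a hypothetical migration helper in Python; the 64-bit bounds assume a signed BIGINT-style column.

```python
INT64_MIN = -(2 ** 63)   # assumed bounds of a signed 64-bit column
INT64_MAX = 2 ** 63 - 1

def rescale(units: int, old_scale: int, new_scale: int) -> int:
    """Re-express a stored scaled integer with a different number of decimal places."""
    if new_scale >= old_scale:
        result = units * 10 ** (new_scale - old_scale)
    else:
        factor = 10 ** (old_scale - new_scale)
        if units % factor:
            # Reducing precision would throw away digits that were stored.
            raise ValueError("rescaling would discard significant digits")
        result = units // factor
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError("rescaled value no longer fits in a signed 64-bit column")
    return result
```

With the question's upper bound, moving from 8 to 10 decimal places still fits (2.1 × 10^17 units), but moving to 12 places (2.1 × 10^19) exceeds the signed 64-bit maximum of about 9.22 × 10^18.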

So, if you're interested in efficient in-memory representations fitted to specific problems, fixed-point can work well. But if you're storing data over time, decimals make for a richer, more future-proofed representation that someone else takes the responsibility for making work, getting you back to your own development tasks.

Other Tips

Use decimals. Consider

2.00 * 3.00 = 6.00 (as-is)

200 * 300 = 60000 (hmm: divide by 100 to get back to 600, i.e. 6.00)
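The same comparison in Python, as a sketch: `decimal.Decimal` tracks the scale of a product itself, while the scaled-integer version must divide out the extra factor by hand and pick a rounding rule. Note that `Decimal` (like MySQL's DECIMAL multiplication) adds the operands' scales, so the product prints with four decimal places even though it equals 6.

```python
from decimal import Decimal

FACTOR = 100  # two implied decimal places (hundredths)

# Decimal keeps track of the scale for you.
dec_product = Decimal("2.00") * Decimal("3.00")
print(dec_product)              # 6.0000

# Scaled integers: the raw product is in hundredths of hundredths.
a, b = 200, 300                 # 2.00 and 3.00 expressed in hundredths
raw = a * b                     # 60000
product = raw // FACTOR         # 600, i.e. 6.00 in hundredths

# The division also silently truncates: 0.05 * 0.05 = 0.0025 becomes 0.
tiny = (5 * 5) // FACTOR
print(product, tiny)            # 600 0
```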

There is no need to think about what is "more optimal"; the computer is likely to handle decimals efficiently on its own, and all the edge cases are dealt with for you.

It is especially cumbersome in SQL to use your own scaled integers consistently and correctly.

Also consider that at some point the data must be used in application code, and then exactly the same problem repeats itself: multiplication. Automated tooling such as ORMs also becomes more complicated.

License: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange