Question

In C# (and other languages), we can define a numerical variable as a short, an int, or a long (among other types), mostly depending on how big we expect the numbers to get. Many mathematical operations (e.g., addition +) evaluate to an integer (int), and so require an explicit cast to store the result in a short, even when operating on two shorts. It is much easier (and arguably more readable) to simply use ints, even if we never expect the numbers to exceed the storage capacity of a short. Indeed, I'm sure most of us write for loops using int counters, rather than short counters even when a short would suffice.
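
For illustration, here is a minimal sketch of the cast I mean (in C#, arithmetic on two shorts is promoted to int):

```csharp
short a = 1;
short b = 2;

// short c = a + b;        // does not compile: the result of a + b is an int
short c = (short)(a + b);  // an explicit cast is required
int   d = a + b;           // using int avoids the cast entirely
```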

One could argue that using an int is simply future-proofing, but there are certainly cases where we know the short will be big enough.

Is there a practical benefit to using these smaller datatypes, that compensates for the additional casting needed and decrease in readability? Or is it more practical to just use int everywhere, even when we are sure the values will never exceed the capacity of short (e.g., the number of axes on a graph)? Or is the benefit only realized when we absolutely need the space or performance of those smaller datatypes?

Edit to address dupes:

This one is close, but too broad, and it speaks more to CPU performance than memory performance (though there are a lot of parallels).

This one is close, too, but doesn't get to the practical aspect of my question. Yes, there are times when using a short is appropriate, which that question does a good job of illuminating. But appropriate is not always practical, and any increase in performance may not actually be realized.


Solution

Regarding the smaller primitive types, when we feel we have to use them:

Local variables generally perform at the same speed, or slightly worse, when using smaller data types, so there is no benefit (and possibly a cost) in the context of statements: for and while loops, their counters and loop variables, other locals, and even parameters. Further, you should prefer the native int size, the one that matches the pointer size, or else you risk numeric overflow when iterating over collections of arbitrary size.
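
A quick sketch of the loop-counter point (nothing here is specific to any particular codebase):

```csharp
// A short counter compiles, but typically saves nothing: locals are usually
// widened to register size anyway, and the counter wraps past 32,767.
for (short s = 0; s < 1000; s++)
{
    // ...
}

// The idiomatic form; int is also what array and List<T> indexers expect.
for (int i = 0; i < 1000; i++)
{
    // ...
}
```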

The only place to even consider smaller primitive sizes is in data structures with very large numbers of instances. Even there, most languages have a minimum object size dictated by alignment in memory allocation. So using a single short instead of a single int may buy you nothing (depending on the language implementation), because the unused space simply becomes alignment padding, whereas using two shorts can often save space over two ints because the two shorts can be packed together.
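
A small sketch of the packing point, using Unsafe.SizeOf to inspect struct sizes (the exact numbers depend on the runtime and its layout rules; older targets may need the System.Runtime.CompilerServices.Unsafe package):

```csharp
using System;
using System.Runtime.CompilerServices;

struct TwoShorts { public short A; public short B; }  // the two shorts pack together
struct TwoInts   { public int A;   public int B;   }

class SizeDemo
{
    static void Main()
    {
        Console.WriteLine(Unsafe.SizeOf<TwoShorts>()); // typically 4
        Console.WriteLine(Unsafe.SizeOf<TwoInts>());   // typically 8
    }
}
```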

OTHER TIPS

Erik Eidt provided the primary case where small data types should be used: big arrays of them.

However, there is one other case where it is worthwhile: when you have a few megabytes of them in a data structure that will be accessed very heavily. The issue here is the CPU cache: there is a considerable benefit to keeping your working data in cache rather than having to go to main memory. The situations where this is relevant are rare, though.
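
A rough way to see the footprint difference for large arrays (the numbers are approximate; GC.GetTotalMemory is not a precise profiler):

```csharp
using System;

class FootprintDemo
{
    static void Main()
    {
        const int N = 10_000_000;

        long before = GC.GetTotalMemory(forceFullCollection: true);
        short[] shorts = new short[N];                       // about 20 MB of data
        long afterShorts = GC.GetTotalMemory(true);

        int[] ints = new int[N];                             // about 40 MB of data
        long afterInts = GC.GetTotalMemory(true);

        Console.WriteLine($"short[{N}]: ~{(afterShorts - before) / 1_000_000} MB");
        Console.WriteLine($"int[{N}]:   ~{(afterInts - afterShorts) / 1_000_000} MB");

        // Keep both arrays reachable until after the measurements.
        GC.KeepAlive(shorts);
        GC.KeepAlive(ints);
    }
}
```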

A lot of file formats actually use shorts and other small data types. If you need to read a field defined as a short, you don't want to read an int, because that would pull in bytes that don't belong to that field.
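
For instance, with BinaryReader the field widths have to match the format; the file name and field names below are hypothetical:

```csharp
using System.IO;

// Hypothetical record layout: a 2-byte flags field followed by a 4-byte length.
// Reading the flags as an int would also consume the first two bytes of length.
using var reader = new BinaryReader(File.OpenRead("record.bin"));
short flags  = reader.ReadInt16();   // exactly 2 bytes
int   length = reader.ReadInt32();   // the next 4 bytes
```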

There is unlikely to be any benefit unless you can predict tens of millions of bytes "saved", i.e., big arrays, or multiple shortened items packed into structures of which millions of copies will coexist. Note that just declaring a bunch of byte variables in a structure won't guarantee that they are packed; compilers and languages vary in how they handle this.
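
In C#, for example, layout can be controlled explicitly with StructLayout; whether tighter packing is worth the potentially slower unaligned access is a separate trade-off (the sizes shown are typical, not guaranteed):

```csharp
using System;
using System.Runtime.InteropServices;

// Default sequential layout: each field is padded to its natural alignment.
struct Loose
{
    public byte A;
    public int  B;
    public byte C;
}

// Pack = 1 requests byte-level packing.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Packed
{
    public byte A;
    public int  B;
    public byte C;
}

class PackingDemo
{
    static void Main()
    {
        Console.WriteLine(Marshal.SizeOf<Loose>());  // often 12
        Console.WriteLine(Marshal.SizeOf<Packed>()); // often 6
    }
}
```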

Of course. Doubling the size of your data types may double the amount of RAM needed, which may make your program unusable. It all depends on how many instances you need in RAM.

Sometimes it is not an optimization, but rather a signal to the API user that a value must be in a certain range.
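
A hypothetical illustration of that signalling (the class, method, and parameter names are made up):

```csharp
using System;

class Display
{
    // The byte type itself documents the valid range 0..255;
    // no upper-bound check is needed.
    public void SetBrightness(byte level)
    {
        // ...
    }

    // With an int, the valid range has to be enforced and documented separately.
    public void SetBrightnessChecked(int level)
    {
        if (level < 0 || level > 255)
            throw new ArgumentOutOfRangeException(nameof(level));
        // ...
    }
}
```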

Licensed under: CC-BY-SA with attribution