Using low bitsize integral types like `Int8` and what they are for
18-04-2021
Question
I recently learned that processors operate on machine words, which on most contemporary processors and operating systems are either 32 or 64 bits wide. So what is the benefit of using smaller types such as `Int16`, `Int8` and `Word8`? What exactly are they for? Is it only storage reduction?
I am writing a complex calculation program that consists of several modules but is exposed through a single function returning a `Word64`, so the whole program ultimately produces a `Word64` value. I ask because inside this program I found myself using many different `Integral` types, such as `Word16` and `Word8`, to represent small entities. Seeing how often they ended up being converted with `fromIntegral` got me thinking: was I making a mistake, and what exactly is the benefit of these types that I was drawn to without understanding them? Did it make sense to use other integral types and eventually convert them with `fromIntegral`, or should I have just used `Word64` everywhere?
Solution
These smaller types give you a memory reduction only when you store them in unboxed arrays or similar. There, each will take as many bits as indicated by the type suffix.
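As a rough illustration of the unboxed-array case, the sketch below builds a `UArray` of `Word8` and asks `Storable` for the per-element size of `Word8` and `Word64`. The names `smallValues`, `bytesPerWord8` and `bytesPerWord64` are made up for this example, and it assumes the `array` package that ships with GHC is available.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Word (Word8, Word64)
import Foreign.Storable (sizeOf)

-- Hypothetical sample data: 1000 small values in an unboxed array.
smallValues :: UArray Int Word8
smallValues = listArray (0, 999) (take 1000 (cycle [0 .. 255]))

-- Size in bytes of one element, as reported by Storable.
bytesPerWord8 :: Int
bytesPerWord8 = sizeOf (0 :: Word8)

bytesPerWord64 :: Int
bytesPerWord64 = sizeOf (0 :: Word64)

main :: IO ()
main = do
  print bytesPerWord8        -- 1
  print bytesPerWord64       -- 8
  print (smallValues ! 300)  -- 44, i.e. 300 `mod` 256
```

With these element sizes, the 1000-element array needs on the order of 1000 bytes of payload as `Word8`, against roughly 8000 bytes if the same values were stored as `Word64`.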
In general use, they all take exactly as much storage as an `Int` or `Word`. The main difference is that values of the fixed-width types are automatically narrowed to the appropriate bit size, and there are (still) more optimisations, mainly in the form of rewrite rules, for `Int` and `Word` than for `Int8` etc., so some operations will be slower with the smaller types.
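The automatic narrowing can be seen with `fromIntegral`. This is only a small sketch; `narrowToWord8` and `narrowToInt8` are hypothetical helper names, not anything from the question's program.

```haskell
import Data.Int (Int8)
import Data.Word (Word8, Word16)

-- Hypothetical helpers: converting to a fixed-width type keeps only the low bits.
narrowToWord8 :: Int -> Word8
narrowToWord8 = fromIntegral

narrowToInt8 :: Int -> Int8
narrowToInt8 = fromIntegral

main :: IO ()
main = do
  print (narrowToWord8 300)                      -- 44   (300 - 256)
  print (narrowToWord8 (-1))                     -- 255
  print (narrowToInt8 200)                       -- -56  (200 - 256)
  print (fromIntegral (70000 :: Int) :: Word16)  -- 4464 (70000 - 65536)
```

No error is raised when a value does not fit; the result is silently reduced modulo 2^n, which is worth keeping in mind wherever `fromIntegral` converts to a narrower type.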
Whether to use `Word64` throughout or to use smaller types depends on your priorities. On a 64-bit system, when compiling with optimisations, the performance of `Word` and `Word64` should mostly be the same, since where it matters both should be unpacked and the work is done on the raw machine `Word#`. But there are probably still a few rules for `Word` that have no `Word64` counterpart yet, so perhaps there is a difference after all. On a 32-bit system, most operations on `Word64` are implemented via C calls, so there they are much slower than operations on `Word`.
So, depending on which matters more, simplicity of code or performance across different systems, either
- use `Word64` throughout: simple code, good performance on 64-bit systems
- use `Word` as long as your values are guaranteed to fit into 32 bits and convert to `Word64` at the latest safe moment: more complicated code, but better performance on 32-bit systems.
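The second option might look roughly like the sketch below. The functions `sumSmall` and `result` are hypothetical stand-ins for the question's modules and its single entry point, and the code assumes the running total is guaranteed to fit into 32 bits.

```haskell
import Data.List (foldl')
import Data.Word (Word, Word16, Word64)

-- Hypothetical internal computation: accumulate in the machine-word type Word.
-- Assumption: the total always fits into 32 bits.
sumSmall :: [Word16] -> Word
sumSmall = foldl' (\acc x -> acc + fromIntegral x) 0

-- Hypothetical entry point: widen to Word64 only at the boundary.
result :: [Word16] -> Word64
result = fromIntegral . sumSmall

main :: IO ()
main = print (result [1000, 2000, 3000])  -- 6000
```

For the first option, the same code would simply use `Word64` as the accumulator type and drop the final conversion.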
OTHER TIPS
In GHC, the fixed-size integral types all take up a full machine word, so there are no space savings to be had. Using the machine-word-sized types (i.e. `Int` and `Word`) will probably be faster than the fixed-size types in most cases, but using a fixed-size integral type will be faster than doing explicit wrap-around.
You should choose the appropriate type for the range of values you're using. `maxBound :: Word8` is 255 and `255 + 1 :: Word8` is 0; if you're dealing with octets, that's exactly what you want. (For instance, `ByteString`s are defined as storing `Word8`s.)
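To make the wrap-around concrete, here is a minimal sketch that compares addition on `Word8` with explicit wrap-around on `Int`. Both `addOctet` and `addOctetExplicit` are names invented for this illustration.

```haskell
import Data.Word (Word8)

-- Addition on Word8 wraps modulo 256 by the definition of the type.
addOctet :: Word8 -> Word8 -> Word8
addOctet = (+)

-- The same arithmetic with explicit wrap-around on Int.
addOctetExplicit :: Int -> Int -> Int
addOctetExplicit x y = (x + y) `mod` 256

main :: IO ()
main = do
  print (maxBound :: Word8)       -- 255
  print (addOctet maxBound 1)     -- 0
  print (addOctetExplicit 255 1)  -- 0
```

The fixed-size version states the intended range in the type, so the `mod` cannot be forgotten at any call site.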
If you just have some integers that don't need a specific number of bits, and the calculations you're doing aren't going to overflow, just use `Int` or `Word` (or even `Integer`). Fixed-size types are less common than the regular integral types because, most of the time, you don't need a specific size.
So, don't use them for performance; use them if you're looking for their specific semantics: fixed-size integral types with defined overflow behaviour.