256-bit version
__uint128_t a[2], b[2], c[2]; // c = a + b
c[0] = a[0] + b[0]; // add low part
c[1] = a[1] + b[1] + (c[0] < a[0]); // add high part and carry
Edit: 192-bit version. This way you can eliminate the 128-bit comparison like what @harold's stated:
struct uint192_t {
__uint128_t H;
uint64_t L;
} a, b, c; // c = a + b
c.L = a.L + b.L;
c.H = a.H + b.H + (c.L < a.L);
Alternatively you can use the integer overflow builtins or checked arithmetic builtins
bool carry = __builtin_uaddl_overflow(a.L, b.L, &c.L);
c.H = a.H + b.H + carry;
If you do a lot of additions in a loop you should consider using SIMD and/or running them in parallel with multithreading. For SIMD you may need change the layout of the type so that you can add all the low parts at once and all the high parts at once. Once possible solution is an array of struct of array as suggested here practical BigNum AVX/SSE possible?
SSE2: llhhllhhllhhllhh
AVX2: llllhhhhllllhhhh
AVX512: llllllllhhhhhhhh
With AVX-512 you can add eight 64-bit values at once. So you can add eight 192-bit values in 3 instructions plus a few more for the carry. For more information read Is it possible to use SSE and SSE2 to make a 128-bit wide integer?
With AVX-2 or AVX-512 you may also have very fast horizontal add so it may also worth a try for 256-bit even if you don't have parallel addition chains. But for 192-bit addition then 3 add/adc instructions would be much faster
There are also many libraries with a fixed-width integer type. For example Boost.Multiprecision
#include <boost/multiprecision/cpp_int.hpp>
using namespace boost::multiprecision;
uint256_t myUnsignedInt256 = 1;
Some other libraries:
See also