How can you easily calculate the square root of an unsigned long long in C?

Question 1

Function sqrtl(), taking a long double, is part of C99.

Note that your compilation platform does not have to implement long double as 80-bit extended-precision. It is only required to be as wide as double, and Visual Studio implements is as a plain double. GCC and Clang do compile long double to 80-bit extended-precision on Intel processors.

Question 2

There is no need for long double; the square root can be calculated with double (if it is IEEE-754 64-bit binary). The rounding error in converting a 64-bit integer to double is nearly irrelevant in this problem.

The rounding error is at most one part in 2⁵³. This causes an error in the square root of at most one part in 2⁵⁴. The sqrt itself has a rounding error of less than one part in 2⁵³, due to rounding the mathematical result to the double format. The sum of these errors is tiny; the largest possible square root of a 64-bit integer (rounded to 53 bits) is 2³², so an error of three parts in 2⁵⁴ is less than .00000072.

For a uint64_t x, consider sqrt(x). We know this value is within .00000072 of the exact square root of x, but we do not know its direction. If we adjust it to sqrt(x) - 0x1p-20, then we know we have a value that is less than, but very close to, the square root of x.

Then this code calculates the square root of x, truncated to an integer, provided the operations conform to IEEE 754:

uint64_t y = sqrt(x) - 0x1p-20;
if (2*y < x - y*y)
    ++y;

(2*y < x - y*y is equivalent to (y+1)*(y+1) <= x except that it avoids wrapping the 64-bit integer if y+1 is 2³².)

Question 3

Yes, the standard library has sqrtl() (since C99).

Question 4

If you only want to calculate sqrt for integers, using divide and conquer should find the result in max 32 iterations:

uint64_t mysqrt (uint64_t a)
{
  uint64_t min=0;
  //uint64_t max=1<<32;
  uint64_t max=((uint64_t) 1) << 32; //chux' bugfix
  while(1)
    {
       if (max <= 1 + min)
         return min;           

       uint64_t sqt = min + (max - min)/2;
       uint64_t sq = sqt*sqt;

       if (sq == a) 
         return sqt;

       if (sq > a)
         max = sqt;
       else
         min = sqt;
    }

Debugging is left as exercise for the reader.

Question 5

Here we collect several observations in order to arrive to a solution:

In standard C >= 1999, it is garanted that non-netative integers have a representation in bits as one would expected for any base-2 number.
----> Hence, we can trust in bit manipulation of this type of numbers.
If x is a unsigned integer type, tnen x >> 1 == x / 2 and x << 1 == x * 2.
(!) But: It is very probable that bit operations shall be done faster than their arithmetical counterparts.
sqrt(x) is mathematically equivalent to exp(log(x)/2.0).
If we consider truncated logarithms and base-2 exponential for integers, we could obtain a fair estimate: IntExp2( IntLog2(x) / 2) "==" IntSqrtDn(x), where "=" is informal notation meaning almost equatl to (in the sense of a good approximation).
If we write IntExp2( IntLog2(x) / 2 + 1) "==" IntSqrtUp(x), we obtain an "above" approximation for the integer square root.
The approximations obtained in (4.) and (5.) are a little rough (they enclose the true value of sqrt(x) between two consecutive powers of 2), but they could be a very well starting point for any algorithm that searchs for the square roor of x.
The Newton algorithm for square root could be work well for integers, if we have a good first approximation to the real solution.

http://en.wikipedia.org/wiki/Integer_square_root

The final algorithm needs some mathematical comprobations to be plenty sure that always work properly, but I will not do it right now... I will show you the final program, instead:

    #include <stdio.h>   /* For printf()...  */
    #include <stdint.h>  /* For uintmax_t... */
    #include <math.h>    /* For sqrt() ....  */

    int IntLog2(uintmax_t n) {
        if (n == 0) return -1; /* Error */
        int L;
        for (L = 0; n >>= 1; L++)
           ;
        return L; /* It takes < 64 steps for long long */
    }

    uintmax_t IntExp2(int n) {
        if (n < 0)
           return 0; /* Error */            
        uintmax_t E;
        for (E = 1; n-- > 0; E <<= 1)
            ;
        return E; /* It takes < 64 steps for long long */
    }

    uintmax_t IntSqrtDn(uintmax_t n) { return IntExp2(IntLog2(n) / 2); }

    uintmax_t IntSqrtUp(uintmax_t n) { return IntExp2(IntLog2(n) / 2 + 1); }

    int main(void) {
        uintmax_t N = 947612934;  /* Try here your number! */

        uintmax_t sqrtn  = IntSqrtDn(N),  /* 1st approx. to sqrt(N) by below */
                  sqrtn0 = IntSqrtUp(N);  /* 1st approx. to sqrt(N) by above */

        /* The following means while( abs(sqrt-sqrt0) > 1) { stuff... } */
        /* However, we take care of subtractions on unsigned arithmetic, just in case... */
        while ( (sqrtn > sqrtn0 + 1) ||  (sqrtn0 > sqrtn+1) )
           sqrtn0 = sqrtn, sqrtn = (sqrtn0  + N/sqrtn0) / 2; /* Newton iteration */

        printf("N==%llu, sqrt(N)==%g, IntSqrtDn(N)==%llu, IntSqrtUp(N)==%llu, sqrtn==%llu, sqrtn*sqrtn==%llu\n\n",  
                N,            sqrt(N),       IntSqrtDn(N),       IntSqrtUp(N),      sqrtn,       sqrtn*sqrtn);

        return 0;
    }

The last value stored in sqrtn is the integer square root of N.
The last line of the program just shows all the values, with comprobation purposes.
So, you can try different values of Nand see what happens.

If we add a counter inside the while-loop, we'll see that no more than a few iterations happen.

Remark: It is necessary to verify that the condition abs(sqrtn-sqrtn0)<=1 is always achieved when working in the integer-number setting. If not, we shall have to fix the algorithm.

Remark2: In the initialization sentences, observe that sqrtn0 == sqrtn * 2 == sqrtn << 1. This avoids us some calculations.

Question 6

// sqrt_i64 returns the integer square root of v.
int64_t sqrt_i64(int64_t v) {
    uint64_t q = 0, b = 1, r = v;
    for( b <<= 62; b > 0 && b > r; b >>= 2);
    while( b > 0 ) {
        uint64_t t = q + b;
        q >>= 1;           
        if( r >= t ) {     
            r -= t;        
            q += b;        
        }
        b >>= 2;
    }
    return q;
}

The for loop may be optimized by using the clz machine code instruction.