Find position of prime number

Question 1

Your range only goes to ten million, which is small for this kind of thing. I have two suggestions:

1) Create a table of pi(n) at convenient intervals, then use a segmented Sieve of Eratosthenes to count primes between the two table entries that bracket the desired value. The size of the interval determines both the size of the required table and the speed at which results can be computed.

2) Use Legendre's phi(x,a) function and Lehmer's prime-counting formula to calculate the result directly. The phi function requires some storage, I'm not sure exactly how much.

Of the two, I would probably choose the first alternative given your problem size. Implementations of both the segmented Sieve of Eratosthenes and Lehmer's prime-counting function are available at my blog.

EDIT 1:

On reflection, I have a third alternative:

3) Use the logarithmic integral to estimate pi(n). It is monotone increasing, and always larger than pi(n) over the interval you require. But the differences are small, never more than about 200. So you could precompute the differences for all values less than ten million, make a table of the 200 change points, then when requested compute the logarithmic integral and look up the correction factor in the table. Or you could do something similar with Riemann's R function.

The third alternative takes the least amount of space, but I suspect the space required for the first alternative wouldn't be too large, and sieving is probably faster than calculating the logarithmic integral. so I'll stick with my original recommendation. There is an implementation of both the logarithmic integral and the Riemann R function at my blog.

EDIT 2:

That didn't work very well, as the comments indicate. Please ignore my third suggestion.

As penance for my error in suggesting a solution that doesn't work, I wrote the program that uses a table of values of pi(n) and a segmented Sieve of Eratosthenes to calculate values of pi(n) for n < 10000000. I'll use Python, rather than the C requested by the original poster, because Python is simpler and easier to read for pedagogical purposes.

We begin by calculating the sieving primes less than the square root of ten million; these primes will be used both in building the table of values of pi(n) and in performing the sieve that computes the final answer. The square root of ten million is 3162.3. We don't want to use 2 as a sieving prime -- we'll sieve only on odd numbers, and treat 2 as a special case -- but we do want the next prime larger than the square root, so that the list of sieving primes is never exhausted (which would cause an error). So we use this very simple version of the Sieve of Eratosthenes to compute the sieving primes:

def primes(n):
    b, p, ps = [True] * (n+1), 2, []
    for p in xrange(2, n+1):
        if b[p]:
            ps.append(p)
            for i in xrange(p, n+1, p):
                b[i] = False
    return ps

The Sieve of Eratosthenes works in two parts. First, make a list of the numbers less than the target number, starting from 2. Then, repeatedly run through the list, starting with the first uncrossed number, and cross out all multiples of the number from the list. Initially, 2 is the first uncrossed number, so cross out 4, 6, 8, 10, and so on. Then 3 is the next uncrossed number, so cross out 6, 9, 12, 15, and so on. Then 4 was crossed out as a multiple of 2, and the next uncrossed number is 5, so cross out 10, 15, 20, 25, and so on. Continue until all the uncrossed numbers have been considered; the numbers that remain uncrossed are prime. The loop on p considers each number in turn, and if it is uncrossed, the loop on i crosses out the multiples.

The primes function returns a list of 447 primes: 2, 3, 5, 7, 11, 13, ..., 3121, 3137, 3163. We strike 2 from the list and store 446 sieving primes in the global ps variable:

ps = primes(3163)[1:]

The primary function that we will need counts the primes on a range. It uses a sieve that we will store in a global array so that it can be reused instead of being reallocated at each invocation of the count function:

sieve = [True] * 500

The count function uses a segmented Sieve of Eratosthenes to count the primes on a range from lo to hi (lo and hi are both included in the range). The function has four for loops: the first clears the sieve, the last counts the primes, and the other two perform the sieving, in a manner similar to the simple sieve shown above:

def count(lo, hi):
    for i in xrange(500):
        sieve[i] = True
    for p in ps:
        if p*p > hi: break
        q = (lo + p + 1) / -2 % p
        if lo+q+q+1 < p*p: q += p
        for j in xrange(q, 500, p):
            sieve[j] = False
    k = 0
    for i in xrange((hi - lo) // 2):
        if sieve[i]: k += 1
    return k

The heart of the function is the loop for p in ps that performs the sieving, taking each sieving prime p in turn. The loop terminates when the square of the sieving prime is greater than the limit of the range, since all primes will be identified at that point (the reason we needed the next prime larger than the square root is so that there would be a sieving prime to stop the loop). The mysterious variable q is the offset into the sieve of the smallest multiple of p in the range lo to hi (note that it is not the smallest multiple of p in the range, but the index of the offset of the smallest multiple of p in the range, which can be confusing). The if statement increments q when it refers to a number that is a perfect square. Then the loop on j strikes the multiples of p from the sieve.

We use the count function in two ways. The first use builds a table of the values of pi(n) at multiples of 1000; the second use interpolates within the table. We store the table in a global variable piTable:

piTable = [0] * 10000

We choose the parameters 1000 and 10000 based on the original request to keep memory usage within fifty kilobytes. (Yes, I know the original poster relaxed that requirement. But we can honor it anyway.) Ten thousand 32-bit integers will take 40,000 bytes of storage, and sieving over a range of just 1000 from lo to hi will require only 500 bytes of storage and will be very fast. You might want to try other parameters to see how they affect the space and time usage of the program. Building the piTable is done by calling the count function ten thousand times:

for i in xrange(1, 10000):
    piTable[i] = piTable[i-1] + \
        count(1000 * (i-1), 1000 * i)

All of the computation up to this point can be done at compile time instead of run time. When I did these computations at ideone.com, they took about five seconds, but that time doesn't count, because it can be done once for all time when the programmer first writes the code. As a general rule, you should look for opportunities to move code from run time to compile time, to make your programs run very quickly.

The only thing left is to write the function that actually calculates the number of primes less than or equal to n:

def pi(n):
    if type(n) != int and type(n) != long:
        raise TypeError('must be integer')
    if n < 2: return 0
    if n == 2: return 1
    i = n // 1000
    return piTable[i] + count(1000 * i, n+1)

The first if statement does type checking. The second if statement returns a correct response to ridiculous input. The third if statement handles 2 specially; our sieving makes 1 a prime and 2 a composite, both of which are incorrect, so we make the fix here. Then i is calculated as the largest index of piTable less than the requested n, and the return statement adds the value from piTable to the count of primes between the table value and the requested value; the hi limit is n+1, because otherwise in the case that n is prime it wouldn't be counted. As an example, saying:

print pi(6543223)

will cause the number 447519 to be displayed on the terminal.

The pi function is very fast. At ideone.com, a thousand random calls to pi(n) were computed in about half a second, so about half a millisecond each; that includes the time to generate the prime number and sum the result, so the time to actually compute the pi function is even less than half a millisecond. That's a pretty good return on our investment in building the table.

If you're interested in programming with prime numbers, I've done quite a bit of work on my blog. Please come and visit.

Question 2

I would suggest a heuristic hybrid model would work here. Store every nth prime, and then do a linear search via primality testing. To speed things up, you can use a fast and simple primality test (like the Fermat test with a==2) and precompute the false positives. Some fine-tuning based on the maximum size of the input and your storage constraints should be easy to work out.

Question 3

If you know a priori that the input is prime, you may be able to use the approximation pi(n) ≈ n / log n with a small table of fixups for the primes where rounding the result isn't sufficient to get the correct value n. I think this is your best bet within the size constraint, aside from a slow brute-force approach.

Question 4

Here's some code that works. You should replace the primality test based on trial division with a deterministic Miller-Rabin test that works for your input range. Sieving to find primes in the appropriate small range would work better than trial division, but it's a step in the wrong direction.

#include <stdio.h>
#include <bitset>
using namespace std;

short smallprimes[549]; // about 1100 bytes
char in[19531]; // almost 20k

// Replace me with Miller-Rabin using 2, 7, and 61.
int isprime(int j) {
 if (j<3) return j==2;
 for (int i = 0; i < 549; i++) {
  int p = smallprimes[i];
  if (p*p > j) break;
  if (!(j%p)) return 0;
 }
 return 1;
}

void init() {
 bitset<4000> siv;
 for (int i = 2; i < 64; i++) if (!siv[i])
  for (int j = i+i; j < 4000; j+=i) siv[j] = 1;
 int k = 0;
 for (int i = 3; i < 4000; i+=2) if (!siv[i]) {
  smallprimes[k++] = i;
 }

 for (int a0 = 0; a0 < 10000000; a0 += 512) {
  in[a0/512] = !a0;
  for (int j = a0+1; j < a0+512; j+=2)
   in[a0/512] += isprime(j);
 }
}

int whichprime(int k) {
 if (k==2) return 1;
 int a = k/512;
 int ans = 1 + !a;
 for (int i = 0; i < a; i++) ans += in[i];
 for (int i = a*512+1; i<k; i+=2) ans += isprime(i);
 return ans;
}

int main() {
 int k;
 init();
 while (1 == scanf("%i", &k)) printf("%i\n", whichprime(k));
}

Question 5

The following sounds like what you are looking for. http://www.geekviewpoint.com/java/numbers/index_of_prime. There you will find the code and unit tests. Since your list is relatively small (i.e. 10^7), it should handle it.

Basically you find all the prime numbers between 2 and n and then count all the primes less than n to find the index. Also, in case n is not prime, the function returns -1.

Question 6

I did exactly this once. Wrote a code, that given n, can quickly find the nth prime, up to n = 203542528, so about 2e8. Or, it can go backwards, for any number n, can tell how many primes are less than n.

A databased is employed. I store all of the primes up to a certain point (the sqrt of my upper limit.) In your case, this means you would store all of the primes up to sqrt(1e7). There are 446 of them, and you can store that list in compressed form, since the maximum difference up to that point is only 34. Beyond that point, store every kth prime (for some value of k.) Then a quick sieve is sufficient to generate all primes in a short interval.

So in MATLAB, to find the 1e7'th prime:

nthprime(1e7)
ans =
   179424673

Or, it can find the number of primes less than 1e7:

nthprime(1e7,1)
ans =
      664579

The point is, such a database is easy to build and search. If your database can be no more than 50k, there should be no problem.

Question 7

What you suggest is best. Pre-compute (or download) the list of primes less than 10^7, then binary search them.

There are only 664579 primes less than 10^7, so the list will consume ~2.7 MB of space. The binary search to solve each instance will be super speedy - only ~20 operations.