Sum of all prime numbers below 2 million

Question 1

You appear to be trying to implement the Sieve of Eratosthenes which should perform better that O(N^2) (In fact, Wikipedia says it is O(N log(log N)) ...).

The fundamental problem is your choice of data structure. You've chosen to represent the set of remaining prime candidates as an ArrayList of primes. This means that your test to see if a number is still in the set takes O(N) comparisons ... where N is the number of remaining primes. Then you are using ArrayList.remove(int) to remove the non-primes ... which is O(N) also.

That all adds up to making your Sieve implementation worse than O(N^2).

The solution is to replace the ArrayList<Integer> with an boolean[] where the positions (indexes) in the boolean array represent the numbers, and the value of the boolean says whether the number is prime / possibly prime, or not prime.

^{(There were other problems too that I didn't notice ... see the other answers.)}

Question 2

There are a few issues here. First, lets talk about the algorithm. Your isPrime method is actually the very thing that the sieve is designed to avoid. When you get to a number in the sieve, you already know it's prime, you don't need to test it. If it weren't prime, it would already have been eliminated as a factor of a lower number.

So, point 1:

You can eliminate the isPrime method altogether. It should never return false.

Then, there are implementation issues. primes.contains and primes.remove are problems. They run in linear time on an ArrayList, because they require checking each element or rewriting a large portion of the backing array.

Point 2:

Either mark values in place (use boolean[], or use some other more appropriate data structure.)

I typically use something like boolean primes = new boolean[upper+1], and define n to be included if !(primes[n]). (I just ignore elements 0 and 1 so I don't have to subtract indices.) To "remove" an element, I set it to true. You could also use something like TreeSet<Integer>, I suppose. Using boolean[], the method is near-instantaneous.

Point 3:

sum needs to be a long. The answer (roughly 1.429e11) is larger than the maximum value of an integer (2^31-1)

I can post working code if you like, but here's a test output, without spoilers:

public static void main(String[] args) {
    long value;
    long start;
    long finish;

    start = System.nanoTime();
    value = arrayMethod(2000000);
    finish = System.nanoTime();
    System.out.printf("Value: %.3e, time: %4d ms\n", (double)value, (finish-start)/1000000);

    start = System.nanoTime();
    value = treeMethod(2000000);
    finish = System.nanoTime();
    System.out.printf("Value: %.3e, time: %4d ms\n", (double)value, (finish-start)/1000000);
}

output:

Using boolean[]
    Value: 1.429e+11, time:   17 ms
Using TreeSet<Integer>
    Value: 1.429e+11, time: 4869 ms

Edit: Since spoilers are posted, here's my code:

public static long arrayMethod(int upper) {
    boolean[] primes = new boolean[upper+1]; 
    long sum = 0;
    for (int i = 2; i <=upper; i++) {
        if (!primes[i]) {
            sum += i;
            for (int k = 2*i; k <= upper; k+=i) {
                primes[k] = true;
            }
        }
    }
    return sum;
}

public static long treeMethod(int upper) {
    TreeSet<Integer> primes = new TreeSet<Integer>();
    for (int i = 2; i <= upper; i++) {
        primes.add(i);
    }
    long sum = 0;
    for (Integer i = 2; i != null; i=primes.higher(i)) {
        sum += i;
        for (int k = 2*i; k <= upper; k+=i) {
            primes.remove(k);
        }
    }
    return sum;
}

Question 3

Two things:

Your code is hard to follow. You have a list called "primes", that contains non prime numbers!

Also, you should strongly consider whether or not an array list is appropriate. In this case, a LinkedList would be much more efficient.

Why is this? An array list must constantly resize an array by: asking for new memory to create an array, then copying the old memory over in the newly created array. A Linked list would just resize the memory by changing a pointer. This is a lot quicker! However, I do not think that by making this change you can salvage your algorithm.

You should use an array list if you need to access the items non-sequentially, here, (with a suitable algorithm) you need to access the items sequentially.

Also, your algorithm is slow.Take the advice of SJuan76 (or gyrogearless), thanks sjuan76

Question 4

The key to the efficiency of classic implementation of the sieve of Eratosthenes on modern CPUs is the direct (i.e. non-sequential) memory access. Fortunately, ArrayList<E> does implement RandomAccess.

Another key to the sieve's efficiency is its conflation of index and value, just like in integer sorting. Actually removing any number from the sequence destroys this ability to directly address without any computations. We must mark, not remove, any composite as we find them, so any numbers greater than it will remain in their places in the sequence.

ArrayList<Integer> can be used for that (except taking more memory than is strictly necessary, but for 2 million this is inconsequential).

So your code with a minimal edit fix (also changing sum to be long as others point out too), becomes

import java.util.ArrayList;

public class Main
{
  /**
   * The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17
   * Find the sum of all the primes below two million.
   * @param args
   */
  public static void main(String[] args)
  {
    ArrayList<Integer> primes = new ArrayList<Integer>();
    int upper = 5000;
    primes.ensureCapacity(upper);
    for (int i = 0; i < upper; i++) {
      primes.add(i);
    }
    long sum = 0;
    for (int i = 2; i <= upper / i; i++) {
      if ( primes.get(i) > 0 ) {
        for (int k = i*i; k < upper ; k+=i) {
          primes.set(k, 0);
        }
      }
    }
    for (int i = 2; i < upper; i++) {
      sum += primes.get(i);
    }
    System.out.println(sum);
  }
}

Finds the result for 2000000 in half a second on Ideone. The projected run time for your original code there: between 10 and 400 hours (!).

To find rough estimates for the run time when faced with a slow code, you should always try to find out its empirical orders of growth: run it for some small size n1, then a bigger size n2, record the run times t1 and t2. If t ~ n^a, then a = log(t2/t1) / log(n2/n1).

For your original code the empirical orders of growth measured on 10k .. 20k .. 40k range of upper limit value N, are ~ N^1.7 .. N^1.9 .. N^2.1. For the fixed code it's faster than ~ N (in fact, it's ~ N^0.9 in the tested range 0.5 mln .. 1 mln .. 2 mln). The theoretical complexity is O(N log (log N)).

Question 5

Your program is not the Sieve of Eratosthenes; the modulo operator gives it away. Your program will be O(n^2), where a proper Sieve of Eratosthenes is O(n log log n), which is essentially n. Here's my version; I'll leave it to you to translate to Java with appropriate numeric datatypes:

function sumPrimes(n)
    sum := 0
    sieve := makeArray(2..n, True)
    for p from 2 to n step 1
        if sieve[p]
            sum := sum + p
            for i from p * p to n step p
                sieve[i] := False
    return sum

If you're interested in programming with prime numbers, I modestly recommend this essay at my blog.