Question

I wrote a code in Python 2.7 for creating list of prime numbers. The code is

def primes_list(num):
    ans = [2]
    for i in range(3, num, 2):
        for j in ans:
            if i % j == 0:
                break
        else:
            ans.append(i)
    else:
        return ans

Is this more efficient than the Sieve of Eratosthenes or not? I think the memory efficiency should be better, but I have doubts about the time efficiency. How do I calculate the time and memory efficiency, and how do I benchmark them?


Solution 2

No, that's trial division, which has much worse time complexity than the sieve of Eratosthenes. Its space complexity is a bit better, but since there are about n/log(n) primes up to n, you are not saving huge quantities of memory. Also, the sieve can be done using bit vectors, which reduces the constants by a factor of 32/64 (and thus for practical purposes it might even be better).
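
The erat function used in the timings below isn't shown here; a minimal plain-Python sieve along these lines would serve as a stand-in (this is an assumption, not the exact code that was benchmarked):

def erat(n):
    # is_prime[i] means "i has not been crossed off yet"
    is_prime = [True] * n
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            # cross off multiples of i, starting at i*i
            for j in range(i * i, n, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]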


Here is a small benchmark that shows the difference in timings:

>>> timeit.timeit('primes_list(1000)', 'from __main__ import primes_list', number=1000)
0.901777982711792
>>> timeit.timeit('erat(1000)', 'from __main__ import erat', number=1000)
0.2097640037536621

As you can see, even with n=1000 Eratosthenes is more than 4 times faster. If we increase the search up to 10000:

>>> timeit.timeit('primes_list(10000)', 'from __main__ import primes_list', number=1000)
50.41101098060608
>>> timeit.timeit('erat(10000)', 'from __main__ import erat', number=1000)
2.3083159923553467

Now Eratosthenes is 21 times faster. It's clear that the sieve is much faster.


Using numpy arrays it's quite easy to reduce the memory by a factor of 32 or 64 (depending on your machine architecture) and obtain much faster results:

>>> import numpy as np
>>> def erat2(n):
...     ar = np.ones(n, dtype=bool)
...     ar[0] = ar[1] = False
...     ar[4::2] = False
...     for j in xrange(3, n, 2):
...             if ar[j]:
...                     ar[j**2::2*j] = False
...     return ar.nonzero()[0]
... 
>>> timeit.timeit('erat2(10000)', 'from __main__ import erat2', number=1000)
0.5136890411376953

That's another 4 times faster than the plain-Python sieve.

Other tips

What you are doing is trial division: testing each candidate prime against each known prime below it. Skipping odd numbers in the call to range will save you some divisions, but this is exactly the technique sieving is based on: you know that every second number is divisible by 2, and therefore composite. The sieve just extends this to:

  • every third number is divisible by three, and therefore composite;
  • every fifth number is divisible by five, and therefore composite;

and so on. Since the Sieve is regarded as one of the most time-efficient algorithms available, and trial division as one of the least (Wikipedia describes the sieve as O(n log log n), and per the comments below, your algorithm is likely O(n^2 / log n)), it is reasonable to assume that trial division with some sieve-like optimisation falls short of sieving.

"how to benchmark the efficiencies?"

Measure empirical orders of growth, that's how! :)

e.g. with data from the accepted answer, log_10(50.41/0.9) = 1.75, and log_10(2.31/0.21) = 1.04, so it's ~ n^1.75 empirical order of growth for your TD (trial division) code (in the range 1,000 ... 10,000) vs. ~ n^1.04 for the sieve of Eratosthenes.

The latter is consistent with n log log n, and the former with n^2 / log(n)^2, as it should be.

With g(n) = n^2 / log(n)^2, we have g(10000)/g(1000) = 56.25, which is just 0.4% off the empirical value 50.41/0.9 = 56.01. But with g2(n) = n^2 / log(n), we have g2(10000)/g2(1000) = 75, which is way off from the evidence.
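
These figures are easy to check in a couple of lines (the timings come from the accepted answer above; the helper names are just for illustration):

from math import log, log10

def empirical_order(t1, t2, n1=1000.0, n2=10000.0):
    # t ~ n^k  =>  k = log(t2/t1) / log(n2/n1)
    return log10(t2 / t1) / log10(n2 / n1)

print(empirical_order(0.90, 50.41))   # ~1.75 (trial division)
print(empirical_order(0.21, 2.31))    # ~1.04 (sieve of Eratosthenes)

g  = lambda n: n**2 / log(n)**2       # candidate growth law for trial division
g2 = lambda n: n**2 / log(n)          # alternative candidate
print(g(10000.0) / g(1000.0))         # 56.25, close to 50.41/0.9 = 56.01
print(g2(10000.0) / g2(1000.0))       # 75.0, way off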

About time complexity: actually, most composites fail early (they are multiples of small primes). Producing the k = n/log(n) primes takes O(k^2) time here, because each prime is tested against all of its preceding primes, as opposed to just those not exceeding its square root.

Composites don't add to the complexity (the exact analysis in M. O'Neill's JFP article, page 4, gives the same complexity for composites as for primes when testing up to the square root; since each composite is guaranteed to have a prime factor not greater than its square root, the complexity for composites here is even lower than that for primes).

So overall it's O(k^2), which is to say, O(n^2/log(n)^2).


By adding just two lines you can improve your code's speed drastically, from O(n^2/log(n)^2) to O(n^1.5/log(n)^2) time complexity:

    for j in ans:
        if j*j > i:          # no prime divisor up to sqrt(i): i is prime
            ans.append(i)
            break
        if i % j == 0:       # i is composite
            break
    # else:                  # the for-else clause is no longer needed
    #     ans.append(i)

The improvement in time complexity means a run time ratio of about 17.8x for 10,000 over 1,000, instead of 56.25x as before. This translates to ~ n^1.25 empirical order of growth in this range (instead of ~ n^1.75). The absolute run time for the 10,000 call will be much closer to the 2.31 seconds than to the old 50.41 seconds.
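
Folded back into the original function, the whole thing might look like this (a sketch that keeps the structure of the question's code):

def primes_list(num):
    ans = [2]
    for i in range(3, num, 2):
        for j in ans:
            if j * j > i:        # no prime divisor up to sqrt(i): i is prime
                ans.append(i)
                break
            if i % j == 0:       # i is composite
                break
    return ans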


BTW your original code is equivalent to the famous code by David Turner,

primes = sieve [2..]
sieve (x:xs) = x : sieve [y | y <- xs, rem y x /= 0]

and the improved code, to this:

primes = 2 : sieve [3..] primes 
sieve xs (p:ps) | (h,t) <- span (< p*p) xs =
              h ++ sieve [y | y <- t, rem y p /= 0] ps 

(the code is in Haskell but I think it's readable enough; x:xs stands for a list with x as the head element and xs as the rest of the list).

Since range in Python 2 returns a list, your algorithm's space complexity is still O(n), so it is not more space efficient (at least not asymptotically). If you used xrange (or range in Python 3), it would be somewhat more space efficient because you'd only be storing the primes - not all numbers up to n.
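
For illustration, a variant along those lines (an assumed rewrite, not code from the question) iterates lazily with xrange under Python 2 (plain range in Python 3 behaves the same way), so only the primes themselves are ever stored:

def primes_list_lazy(num):
    ans = [2]
    for i in xrange(3, num, 2):      # lazy: no list of all candidates is built
        if all(i % j for j in ans):  # no known prime divides i
            ans.append(i)
    return ans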

Your time efficiency will be worse than a sieve either way.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow