TL;DR: comparing the speed of two code variants at just one input size is meaningless; comparing their empirical orders of growth truly reflects the algorithmic nature of the code, and will be consistent across different test platforms for the same range of test input sizes. Comparing absolute speed values is only meaningful for code variants which exhibit the same asymptotic, or at least the same local, growth behaviour.
It is not enough to measure the speed of your two implementations at just one input size. Usually several data points are needed, to assess the empirical orders of growth of our code's run time (because the code can be run at varying input sizes). An empirical order of growth is found as the logarithm of the ratio of run times, taken in base of the ratio of input sizes.
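That is, for two measurements (n1, t1) and (n2, t2) it is log(t2/t1) / log(n2/n1). A minimal sketch in C (the function name here is mine, just for illustration):

```c
#include <math.h>

/* empirical order of growth between two measurements (n1, t1) and (n2, t2):
   the logarithm of the run-time ratio, taken in base of the input-size ratio */
double order_of_growth(double n1, double t1, double n2, double t2)
{
    return log(t2 / t1) / log(n2 / n1);
}
```

For example, if doubling the input size doubles the run time, this gives log 2 / log 2 = 1.0, i.e. ~ n^1.0, linear growth; if the run time quadruples, it gives ~ n^2.0.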
So even if at some input size code_1 runs 10 times faster than code_2, if its run time doubles with each doubling of the input size whereas for code_2 it only grows 1.1x, then very soon code_2 will become much, much faster than code_1.
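To see how quickly that happens, here is a tiny hypothetical simulation (the 10x head start and the 2.0x and 1.1x per-doubling factors are taken from the paragraph above; everything else is my own illustration):

```c
#include <stdio.h>

/* hypothetical: code_1 starts out 10x faster, but its run time doubles with
   each doubling of the input size, while code_2's only grows 1.1x per doubling */
int main(void)
{
    double t1 = 1.0, t2 = 10.0;          /* relative run times at the base input size */
    for (int k = 1; k <= 10; ++k) {      /* k doublings => input size is 2^k times bigger */
        t1 *= 2.0;
        t2 *= 1.1;
        printf("doublings=%2d  t1=%8.2f  t2=%8.2f  %s\n",
               k, t1, t2, t1 > t2 ? "code_2 is now faster" : "");
    }
    return 0;
}
```

With these numbers code_2 already overtakes code_1 after four doublings, i.e. at only 16 times the original input size.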
So the real measure of an algorithm's efficiency is its run-time complexity (and its space, i.e. memory, complexity). And when we measure it empirically, we only measure it for the particular code at hand (and for a particular range of input sizes), not for the algorithm itself, i.e. not for its ideal implementation.
In particular, the theoretical complexity of trial division is O(n^1.5 / (log n)^0.5), in n primes produced, usually seen as a ~ n^1.40..1.45 empirical order of growth (though it can be ~ n^1.3 initially, for smaller input sizes). For the sieve of Eratosthenes it is O(n log n log (log n)), usually seen as ~ n^1.1..1.2. But there certainly are sub-optimal implementations of both trial division and the sieve of Eratosthenes that run at ~ n^2.0 or worse.
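As a rough sketch of such a measurement, the C program below produces the first n primes by simple trial division (dividing by every number up to the square root, so not quite the optimal variant that divides by primes only), times it at three sizes, and prints the empirical orders of growth between consecutive measurements. The sizes and names are my own choices; enlarge the sizes if the timings come out too noisy on your machine:

```c
#include <stdio.h>
#include <math.h>
#include <time.h>

/* n-th prime by simple trial division: divide each candidate by 2, 3, 4, ...
   up to its square root */
static int nth_prime(int n)
{
    int count = 0, candidate = 1;
    while (count < n) {
        ++candidate;
        int is_prime = 1;
        for (int d = 2; (long)d * d <= candidate; ++d)
            if (candidate % d == 0) { is_prime = 0; break; }
        count += is_prime;
    }
    return candidate;
}

int main(void)
{
    int sizes[] = { 20000, 40000, 80000 };   /* arbitrary test sizes */
    double times[3];

    for (int i = 0; i < 3; ++i) {
        clock_t start = clock();
        int p = nth_prime(sizes[i]);
        times[i] = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("n = %6d   nth prime = %8d   t = %.4f s\n", sizes[i], p, times[i]);
    }
    /* empirical order of growth: log of run-time ratio, in base of size ratio */
    for (int i = 1; i < 3; ++i)
        printf("%d -> %d : ~ n^%.2f\n", sizes[i - 1], sizes[i],
               log(times[i] / times[i - 1]) / log((double)sizes[i] / sizes[i - 1]));
    return 0;
}
```

Note that nothing is printed inside the timed region, so only the computation itself is being measured.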
So no, this proves nothing. One data point is meaningless; at least three are needed to get a "big picture", i.e. to be able to predict with some certainty the run time / space needed for bigger input sizes.
Prediction with known certainty is what the scientific method is all about.
BTW your run times are very long. The calculation of 10,000 primes should be nearly instantaneous, much less than 1/100th of a second for a C program run on a fast box. Perhaps you're measuring printing time as well. Don't. :)