Threading as "slow" as non-threaded

Question 1

Specifically addressing your (general) questions

Is it true that threads are not necessarily faster than using no thread?
If yes, what is the explanation for this?

The speedup you can get from multiple threads is limited, primarily, by the number of CPU cores (including hyper-threading, where available). For example, if your system has two cores, then two threads can run at the same time. In your case (an i5), you may have a 2-core or 4-core processor; if it supports hyper-threading, your system can run 4 or 8 threads at the same time.

Since your application appears to have only two worker threads (three, including the parent 'main()' thread), you might expect a notable improvement. However, keep in mind that your threads are not the only ones active on your system. There are likely many other threads of execution on your machine already, all competing for CPU resources.

As a CPU becomes available, the thread scheduler pulls another thread from the queue of threads waiting for a CPU (the run-queue). Your threads will not always be at the head of that queue, so they spend some of their time waiting for their turn.

Each time your code calls a function that blocks, the thread's context is saved and the thread gives up its CPU; once whatever it was waiting for is ready, it must go back through the run-queue before it can continue. Even innocent-looking functions like 'printf()' may block, and so cost your thread its place on the CPU.

Peer threads also often compete for resources other than the CPU, such as shared memory, shared file access, etc. These resources are generally protected by semaphores, locks, etc., and contention for them can further reduce the efficiency of multiple threads compared with a single thread.

These and many other factors (including the one mentioned by Mark Ransom) may affect the timing results.

Question 2

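Your primes calculator is O(n^2): testing each number costs up to O(n), so the total work to check every number up to n grows roughly as n^2. The first thread (numbers up to 5,000) therefore does work proportional to 5,000^2 = 25,000,000, while an even split would give each thread (10,000^2)/2 = 50,000,000. The first thread gets only about a quarter of the total work, and the second gets the remaining three quarters.

This makes the second thread the bottleneck of the algorithm: the program spends a significant amount of time waiting for it to finish.
In other words, the first thread does very little work compared to the second one, so it sits idle for most of the run.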

Question 3

clock() returns CPU time, summed over all threads of the process. If you're using 2 CPUs concurrently for 1 second, clock() will advance by 2 seconds' worth of ticks (2 * CLOCKS_PER_SEC). You will want to measure wall time (actual elapsed real-world time) instead. Also, as other answerers have said, your thread loads are imbalanced, so one thread will run for much longer than the other; even so, for a sufficiently long workload, total wall time should be only a little over 75% of the single-threaded case.

Question 4

I think you'll find that your isPrime function is O(n), so the second half, with its larger values of n, will dominate the overall timings. You should time both halves individually for the unthreaded test.

Question 5

You can load-balance your threads by partitioning the work differently. Note that 2 is the only even prime, so handle it separately and give each thread half of the odd numbers, with code like this:

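// Start routines for pthread_create() must take a void * argument.
// Note that isPrime(1) must return false, since 1 is tested here.
void *calcFirstHalf( void *arg )
{
    int i;
    (void)arg;  // unused
    for ( i = 1; i < 1000000; i += 4 )  // 1, 5, 9, 13...
        if ( isPrime( i ) )
        {
            // record or count the prime here
        }
    return NULL;
}

void *calcSecondHalf( void *arg )
{
    int i;
    (void)arg;  // unused
    for ( i = 3; i < 1000000; i += 4 )  // 3, 7, 11, 15...
        if ( isPrime( i ) )
        {
            // record or count the prime here
        }
    return NULL;
}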

Side note: you can also improve the efficiency of the isPrime function by only checking factors up to the square root of the proposed prime, since every composite number n has at least one factor greater than 1 and less than or equal to sqrt(n).

Doing performance measurements on a Mac

The high-precision timer on a Mac is accessed through the mach_absolute_time function, as demonstrated by the code below.

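#include <inttypes.h>   // PRIu64
#include <stdio.h>      // printf
#include <unistd.h>     // sleep
#include <mach/mach.h>
#include <mach/mach_time.h>

void testTimer( void )
{
    uint64_t start, end;
    mach_timebase_info_data_t info;

    // The timebase relates ticks to time: nanoseconds = ticks * numer / denom
    mach_timebase_info( &info );
    printf( "numer=%u denom=%u\n", info.numer, info.denom );

    start = mach_absolute_time();
    sleep( 1 );
    end = mach_absolute_time();

    // Raw tick count for roughly one second of sleep
    printf( "%" PRIu64 "\n", end - start );
}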

Note that the tick rate of the timer is not a fixed value, but must be calculated from the numer and denom values returned by the mach_timebase_info function. Elapsed nanoseconds are ticks * numer / denom, so the rate is

timer_rate = 1 GHz * denom / numer

You can confirm the timer rate by calling sleep for one second to see approximately how many ticks you get per second.