How can I reproduce the race conditions in this python code reliably?

Question 1

Traditionally, forcing race conditions in multithreaded code is done with semaphores, so you can force a thread to wait until another thread has achieved some edge condition before continuing.

For example, your object has some code to check that start is not called if the object is already running. You could force this condition to make sure it behaves as expected by doing something like this:

starting a KitchenTimer
having the timer block on a semaphore while in the running state
starting the same timer in another thread
catching AlreadyRunningError

To do some of this you may need to extend the KitchenTimer class. Formal unit tests will often use mock objects which are defined to block at critical times. Mock objects are a bigger topic than I can address here, but googling "python mock object" will turn up a lot of documentation and many implementations to choose from.

Here's a way that you could force your code to throw AlreadyRunningError:

import threading

class TestKitchenTimer(KitchenTimer):

    _runningLock = threading.Condition()

    def start(self, duration=1, whenTimeUp=None):
        KitchenTimer.start(self, duration, whenTimeUp)
        with self._runningLock:
            print "waiting on _runningLock"
            self._runningLock.wait()

    def resume(self):
        with self._runningLock:
            self._runningLock.notify()

timer = TestKitchenTimer()

# Start the timer in a subthread. This thread will block as soon as
# it is started.
thread_1 = threading.Thread(target = timer.start, args = (10, None))
thread_1.start()

# Attempt to start the timer in a second thread, causing it to throw
# an AlreadyRunningError.
try:
    thread_2 = threading.Thread(target = timer.start, args = (10, None))
    thread_2.start()
except AlreadyRunningError:
    print "AlreadyRunningError"
    timer.resume()
    timer.stop()

Reading through the code, identify some of the boundary conditions you want to test, then think about where you would need to pause the timer to force that condition to arise, and add Conditions, Semaphores, Events, etc. to make it happen. e.g. what happens if, just as the timer runs the whenTimeUp callback, another thread tries to stop it? You can force that condition by making the timer wait as soon as it's entered _whenTimeUp:

import threading

class TestKitchenTimer(KitchenTimer):

    _runningLock = threading.Condition()

    def _whenTimeup(self):
        with self._runningLock:
            self._runningLock.wait()
        KitchenTimer._whenTimeup(self)

    def resume(self):
        with self._runningLock:
            self._runningLock.notify()

def TimeupCallback():
    print "TimeupCallback was called"

timer = TestKitchenTimer()

# The timer thread will block when the timer expires, but before the callback
# is invoked.
thread_1 = threading.Thread(target = timer.start, args = (1, TimeupCallback))
thread_1.start()
sleep(2)

# The timer is now blocked. In the parent thread, we stop it.
timer.stop()
print "timer is stopped: %r" % timer.isStopped()

# Now allow the countdown thread to resume.
timer.resume()

Subclassing the class you want to test isn't an awesome way to instrument it for testing: you'll have to override basically all of the methods in order to test race conditions in each one, and at that point there's a good argument to be made that you're not really testing the original code. Instead, you may find it cleaner to put the semaphores right in the KitchenTimer object but initialized to None by default, and have your methods check if testRunningLock is not None: before acquiring or waiting on the lock. Then you can force races on the actual code that you're submitting.

Some reading on Python mock frameworks that may be helpful. In fact, I'm not sure that mocks would be helpful in testing this code: it's almost entirely self-contained and doesn't rely on many external objects. But mock tutorials sometimes touch on issues like these. I haven't used any of these, but the documentation on these like a good place to get started:

Question 2

The most common solution to testing thread (un)safe code is to start a lot of threads and hope for the best. The problem I, and I can imagine others, have with this is that it relies on chance and it makes tests 'heavy'.

As I ran into this a while ago I wanted to go for precision instead of brute force. The result is a piece of test code to cause race-conditions by letting the threads race neck to neck.

Sample racey code

spam = []

def set_spam():
    spam[:] = foo()
    use(spam)

If set_spam is called from several threads, a race condition exists between modification and use of spam. Let's try to reproduce it consistently.

How to cause race-conditions

class TriggeredThread(threading.Thread):
    def __init__(self, sequence=None, *args, **kwargs):
        self.sequence = sequence
        self.lock = threading.Condition()
        self.event = threading.Event()
        threading.Thread.__init__(self, *args, **kwargs)

    def __enter__(self):
        self.lock.acquire()
        while not self.event.is_set():
            self.lock.wait()
        self.event.clear()

    def __exit__(self, *args):
        self.lock.release()
        if self.sequence:
            next(self.sequence).trigger()

    def trigger(self):
        with self.lock:
            self.event.set()
            self.lock.notify()

Then to demonstrate the use of this thread:

spam = []  # Use a list to share values across threads.
results = []  # Register the results.

def set_spam():
    thread = threading.current_thread()
    with thread:  # Acquires the lock.
        # Set 'spam' to thread name
        spam[:] = [thread.name]
    # Thread 'releases' the lock upon exiting the context.
    # The next thread is triggered and this thread waits for a trigger.
    with thread:
        # Since each thread overwrites the content of the 'spam'
        # list, this should only result in True for the last thread.
        results.append(spam == [thread.name])

threads = [
    TriggeredThread(name='a', target=set_spam),
    TriggeredThread(name='b', target=set_spam),
    TriggeredThread(name='c', target=set_spam)]

# Create a shifted sequence of threads and share it among the threads.
thread_sequence = itertools.cycle(threads[1:] + threads[:1])
for thread in threads:
    thread.sequence = thread_sequence

# Start each thread
[thread.start() for thread in threads]
# Trigger first thread.
# That thread will trigger the next thread, and so on.
threads[0].trigger()
# Wait for each thread to finish.
[thread.join() for thread in threads]
# The last thread 'has won the race' overwriting the value
# for 'spam', thus [False, False, True].
# If set_spam were thread-safe, all results would be true.
assert results == [False, False, True], "race condition triggered"
assert results == [True, True, True], "code is thread-safe"

I think I explained enough about this construction so you can implement it for your own situation. I think this fits the 'extra points' section quite nicely:

extra points for a testing framework suitable for other implementations and problems rather than specifically to this code.

Solving race-conditions

Shared variables

Each threading issue is solved in it's own specific way. In the example above I caused a race-condition by sharing a value across threads. Similar problems can occur when using global variables, such as a module attribute. The key to solving such issues may be to use a thread-local storage:

# The thread local storage is a global.
# This may seem weird at first, but it isn't actually shared among threads.
data = threading.local()
data.spam = []  # This list only exists in this thread.
results = []  # Results *are* shared though.

def set_spam():
    thread = threading.current_thread()
    # 'get' or set the 'spam' list. This actually creates a new list.
    # If the list was shared among threads this would cause a race-condition.
    data.spam = getattr(data, 'spam', [])
    with thread:
        data.spam[:] = [thread.name]
    with thread:
        results.append(data.spam == [thread.name])

# Start the threads as in the example above.

assert all(results)  # All results should be True.

Concurrent reads/writes

A common threading issue is the problem of multiple threads reading and/or writing to a data holder concurrently. This problem is solved by implementing a read-write lock. The actual implementation of a read-write lock may differ. You may choose a read-first lock, a write-first lock or just at random.

I'm sure there are examples out there describing such locking techniques. I may write an example later as this is quite a long answer already. ;-)

Notes

Have a look at the threading module documentation and experiment with it a bit. As each threading issue is different, different solutions apply.

While on the subject of threading, have a look at the Python GIL (Global Interpreter Lock). It is important to note that threading may not actually be the best approach in optimizing performance (but this is not your goal). I found this presentation pretty good: https://www.youtube.com/watch?v=zEaosS1U5qY

Question 3

You can test it by using a lot of threads:

import sys, random, thread
def timeup():
    sys.stdout.write("Timer:: Up %f" % time())

def trdfunc(kt, tid):
    while True :
        sleep(1)
        if not kt.isRunning():
            if kt.start(1, timeup):
                sys.stdout.write("[%d]: started\n" % tid)
        else:
            if random.random() < 0.1:
                kt.stop()
                sys.stdout.write("[%d]: stopped\n" % tid)
        sys.stdout.write("[%d] remains %f\n" % ( tid, kt.timeRemaining))

kt = KitchenTimer()
kt.start(1, timeup)
for i in range(1, 100):
    thread.start_new_thread ( trdfunc, (kt, i) )
trdfunc(kt, 0)

A couple of problem problems I see:

When a thread sees the timer as not running and try to start it, the code generally raises an exception due to context switch in between test and start. I think raising an exception is too much. Or you can have an atomic testAndStart function
A similar problem occurs with stop. You can implement a testAndStop function.
Even this code from the timeRemaining function:
```
if self.isRunning():
   self._timeRemaining = self.duration - self._elapsedTime()
```
Needs some sort of atomicity, perhaps you need to grab a lock before testing isRunning.

If you plan to share this class between threads, you need to address these issues.

Question 4

In general - this is not viable solution. You can reproduce this race condition by using debugger (set breakpoints in some locations in the code, than, when it hits one of the breakpoints - freeze the thread and run the code until it hits another breakpoint, then freeze this thread and unfreeze the first thread, you can interleave threads execution in any way using this technique).

The problem is - the more threads and code you have, the more ways to interleave side effects they will have. Actually - it will grow exponentially. There is no viable solution to test it in general. It is possible only in some simple cases.

The solution to this problem are well known. Write code that is aware of it's side effects, control side effects with synchronisation primitives like locks, semaphores or queues or use immutable data if its possible.

Maybe more practical way is to use runtime checks to force correct call order. For example (pseudocode):

class RacyObject:
    def __init__(self):
        self.__cnt = 0
        ...

    def isReadyAndLocked(self):
        acquire_object_lock
            if self.__cnt % 2 != 0:
                # another thread is ready to start the Job
                return False
            if self.__is_ready:
                self.__cnt += 1
                return True
            # Job is in progress or doesn't ready yet
            return False
        release_object_lock

    def doJobAndRelease(self):
        acquire_object_lock
            if self.__cnt % 2 != 1:
                raise RaceConditionDetected("Incorrect order")
            self.__cnt += 1
            do_job()
        release_object_lock

This code will throw exception if you doesn't check isReadyAndLock before calling doJobAndRelease. This can be tested easily using only one thread.

obj = RacyObject()
...
# correct usage
if obj.isReadyAndLocked()
    obj.doJobAndRelease()