How to keep interrupts short?

Question 1

The traditional approach for this is for Interrupts to schedule a deferred procedure and end the interrupt as soon as possible.

Once the interrupt has finished, the list of deferred procedures is walked from most-important to least important.

Consider the case where you have your main (lower proiority) action, and two interrupts I1 and I2, where I2 is more important than main, but less important than I1.

In this case, let's suppose you're running main and I1 fires. I1 schedules a deferred procedure and signals to the hardware that I1 is done. I1's DPC now begins running. Suddenly I2 comes in from the hardware. I2's interrupt takes over from I1's DPC and schedules I2's DPC and signals to the hardware that it's done.

The scheduler then returns to I1's DPC (because it is more important), and when I1's DPC completes, I2's DPC begins (because it is more important than main), and then eventually returns execution to main.

This design allows you to schedule the importance of different interrupts, encourages you to keep your interrupts small, and allows you to complete DPCs in an ordered and in-order prioritized way.

Question 2

There are 100 different ways to skin this cat, depending on CPU architecture (interrupt nesting & prioritization, software interrupt support, etc.) but let's take a pretty straightforward approach that is relatively simple to understand and free from the race conditions and resource-sharing hazards of a preemptive kernel.

(Disclaimer: my first choice is typically a preemptive real time kernel, many of them can run in extremely resource-constrained systems... SecurityMatt's suggestion is good but if you're not comfortable implementing your own preemptible kernel / task switcher, particularly one that handles asynchronous (interrupt-triggered) preemption, you can get wrapped around the axle pretty quickly. So what I'm proposing below is not as responsive as a preemption-based kernel, but it's much simpler and often adequate).

Create 3 event/work queues:

Q1 is the lowest priority and handles your slow, background SD card writes
Q2 holds requests to process incoming UART packets
Q3 (highest priority) holds UART RX FIFO read requests.

I split up the UART RX FIFO reading and the processing of the read packet so that the FIFO reading is always serviced ahead of the packet processing; maybe you want to keep them together, your choice.

For this to work, you break your large (~100ms) SD card write process into a bunch of smaller, discrete, run to completion steps.

So for example, to write 5 blocks, 20ms each, you write the first block, then enqueue "write next block" to Q1. You go back to your scheduler at the end of each step & scan the queues in priority order, starting with Q3. If Q2 and Q3 are empty, you pull the next event off of Q1 ("write next block"), and run that command for another 20ms before returning and scanning the queues again. If 20ms is not responsive enough, you break up each 20ms block write into a more fine-grained set of steps, continually posting to Q1 the next work step.

Now for the incoming UART stuff; in the UART RX ISR, you simple enqueue a "read UART FIFO" command in Q3, and return from interrupt back into the 20ms "write block" step that was interrupted. As soon as the CPU finishes the write, it goes back and scans the queues in priority order (worst case response will be 20ms if the block write had just begun at the time of the interrupt). The queue scanner (scheduler) will see that Q3 now has work to do, and it will run that command before going back and scanning again.

The responsiveness in your system, worst case, will be determined by the longest run-to-completion step in the system, regardless of priority. You keep your system very responsive by doing work in small, discrete, run to completion steps.

Note that I have to speak in generalities here. Maybe you want to read the UART RX FIFO in the ISR, put the data into a buffer, and only defer the packet processing, not the actual reading of the FIFO (then you'd only have 2 queues). You have to work this out for yourself. But I hope the approach makes sense.

This event-driven approach with prioritized queues is exactly the approach used by the Quantum Platform (QP) event-driven framework. The QP actually supports an underlying non-preemptive (cooperative) scheduler, such as what was described here, or a preemptive scheduler which runs the scheduler each an event is queued (similar to the approach suggested by SecurityMatt). You can see the code/implementation of the QP's cooperative scheduler over at QP website.

Question 3

An alternative solution would be as follow:

Anywhere the FAT library can capture the processor for a long time, you insert a call to a new function which is normally very fast and return to the caller after a few machine cycles. Such fast function would not impact the real-time performance of your time consuming operation, such as reading/writing to SD Flash. You would insert such call in any loop that wait for a flash sector to be erased. You also insert a call to such function in between every 512 bytes written or 512 bytes read.

The goal of that function is to perform most of the task that you would normally have inside the "while(1)" loop in a typical "main()" for embedded device. It would first increment an integer and perform a fast modulo on the new value, then return if the modulo is not equal to an arbitrary constant. The code is as follow:

void premption_check(void)
{
    static int fast_modulo = 0;
    //divide the number of call
    fast_modulo++;
    if( (fast_modulo & 0x003F) != 3)
    {
        return;
    }
    //the processor would continue here only once every 64 calls to "premption_check"

Next, you call the functions that extract RS232 characters/strings from the serial port interrupts, process any command if complete strings are received, etc

The binary mask 0x3F used above means that we look only at the 6 least significant bits of the counter. When these 6 bits happen to be equal to the arbitrary value 5, when go ahead with the calls to functions which may take some micro-second or even milli-second to execute. You may want to try smaller or larger binary mask depending on the speed at which you want to service the serial port and other operations. You may even use simultaneously more than one mask to service some operation faster than other.

The FAT library and the SD card should not experience any problem when some sporadic delay happen in between two Flash erase operation, for example.

The solution given here works even with a micro-controller with only 2K byte, like many variant of 8051. As incredible as it may seems, the pinball machine of 1980 to 1990 had a few K of RAM, slow processors (like 10 MHz) and they where able to test one hundred switch... fully debounced, update a X/Y matrix display, produce sound effects, etc The solutions developed by these engineer can still be used to boost the performance of large system. Even with the best servers with 64 Gig RAM and many Terabyte of hard disk, I presume that any bytes count when some company want to index billions of WEB pages.

Question 4

As no-one has suggested coming at it from this end yet I'll throw it in the hat:

It's possible that sticking the SD card service routine in a low-priority interrupt, maybe throwing in some DMA if you can, would free up your main loop & other interrupts to be more responsive, rather than being stuck in a main() loop waiting for longtime for something to finish.

The caveat to this is I don't know if the hardware has any way of triggering the interrupt when the SD card is ready for more, you might have to cheat by running a polling timer to check & force the interrupt. I'm not above that sort of thing though, if you have spare hardware timers & interrupts it can be done with very little overhead.

Resorting to an RTOS for something like this would seem overkill & an admission of failure to me... ;)