improving C circular buffer efficiency

https://stackoverflow.com//questions/9718116

16-12-2019
|

Question

I'd like some help improving the efficiency of my circular buffer code.

I had a look around stackoverflow and found that (nearly) all of the topics on circular buffers are about the uses of such a buffer or the basic implementation of a circular buffer. I really need information about how to make it super efficient.

The plan is to use this buffer with the STM32F4 microcontroller which has a single precicion FPU. I plan to make heavy use of especially the write() and readn() functions. We're literally talking a few million calls a second here so shaving of a few clock cycles here and there is really going to make a difference.

I'll put the most important bits of code here, the full buffer code is available via http://dl.dropbox.com/u/39710897/circular%20buffer.rar

Can anyone provide me with a few pointers on how to improve the efficiency of this buffer?

#define BUFF_SIZE 3             // buffer size set at compile time

typedef struct buffer{
    float buff[BUFF_SIZE];
    int readIndex;
    int writeIndex;
}buffer;

/********************************\
* void write(buffer* buffer, float value)
* writes value into the buffer
* @param buffer* buffer
*   pointer to buffer to be used
* @param float value
*   valueto be written in buffer
\********************************/
void write(buffer* buffer,float value){
    buffer->buff[buffer->writeIndex]=value;
    buffer->writeIndex++;
    if(buffer->writeIndex==BUFF_SIZE)
        buffer->writeIndex=0;
}

/********************************\
* float readn(buffer* buffer, int Xn)
* reads specified value from buffer
* @param buffer* buffer
*   pointer to buffer to be read from
* @param int Xn
*   specifies the value to be read from buffer counting backwards from the most recently written value
*   i.e. the most recently writen value can be read with readn(buffer, 0), the value written before that with readn(buffer, 1)
\********************************/
float readn(buffer* buffer, int Xn){
    int tempIndex;

    tempIndex=buffer->writeIndex-(Xn+1);
    while(tempIndex<0){
        tempIndex+=BUFF_SIZE;
    }

    return buffer->buff[tempIndex];
}

Solution

As "Oli Charlesworth" suggested - you'd be able to simplify things if your buffer size is a power of 2. I'd like to write the read/write function bodies, so that the intent is more clear.

#define BUFF_SIZE (4U)
#define BUFF_SIZE_MASK (BUFF_SIZE-1U)

struct buffer {
    float buff[BUFF_SIZE];
    unsigned writeIndex;
};

void write(struct buffer *buffer, float value) {
    buffer->buff[(++buffer->writeIndex) & BUFF_SIZE_MASK] = value;
}

float readn(struct buffer *buffer, unsigned Xn){
    return buffer->buff[(buffer->writeIndex - Xn) & BUFF_SIZE_MASK];
}

Some explanations. Note that there's no branching (if) at all. We don't limit the array index to the array bounds, instead we're AND-ing it with the mask.

OTHER TIPS

If you can make your buffer size a power-of-2, then the check against zero can be replaced with unconditional bit-masking. On most processors, this should be faster.

This may not be seem elegant but is efficient. Accessing structure elements through the pointer is taking up a lot of instructions. Why not remove the structure altogether and make buffer and writeIndex as global variables? This will considerably decrease the size of your readn and write functions.

I tried in gcc and here is the output with and without the structure

With Structure

_write:
    pushl   %ebp
    movl    %esp, %ebp
    movl    8(%ebp), %ecx
    movl    8(%ebp), %eax
    movl    16(%eax), %edx
    movl    12(%ebp), %eax
    movl    %eax, (%ecx,%edx,4)
    movl    8(%ebp), %eax
    incl    16(%eax)
    movl    8(%ebp), %eax
    cmpl    $3, 16(%eax)
    jne L1
    movl    8(%ebp), %eax
    movl    $0, 16(%eax)
L1:
    popl    %ebp
    ret

Without Structure. ie Making buffer and writeIndex as global

_write:
    pushl   %ebp
    movl    %esp, %ebp
    movl    _writeIndex, %edx
    movl    8(%ebp), %eax
    movl    %eax, _buff(,%edx,4)
    incl    _writeIndex
    cmpl    $3, _writeIndex
    jne L1
    movl    $0, _writeIndex
L1:
    popl    %ebp
    ret

Keeping track of the start and the end of the circular buffer with pointers, is likely a bit faster than array indexing, since the address will computed in runtime in case of the latter. Try to replace readIndex and writeIndex with float* instead. The code would then be

*buffer->writeIndex = value;
buffer->writeIndex++;
if(buffer->writeIndex == buffer + BUFF_SIZE)
  buffer->writeIndex=buffer->buff;

buffer + BUFF_SIZE will still be a constant expression and the compiler will translate that to a fixed address at compile time.

The accepted answer contains code that is incorrect and will invoke undefined behavior. Correction below:

#define BUFF_SIZE (4U)
#define BUFF_SIZE_MASK (BUFF_SIZE-1U)

struct buffer {
    float buff[BUFF_SIZE];
    unsigned writeIndex;
};

void write(struct buffer *buffer, float value) {
    buffer->buff[(++buffer->writeIndex) & BUFF_SIZE_MASK] = value;
}

float readn(struct buffer *buffer, unsigned Xn){
    return buffer->buff[(buffer->writeIndex - Xn) & BUFF_SIZE_MASK];
}

The error in the original answer was to assume that 'int' will wrap around. Using binary masks with int is also unwise.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow