Question

I've got some C code I'm targeting for an AVR. The code is being compiled with avr-gcc, basically the gnu compiler with the right backend.

What I'm trying to do is create a callback mechanism in one of my event/interrupt driven libraries, but I seem to be having some trouble keeping the value of the function pointer.

To start, I have a static library. It has a header file (twi_master_driver.h) that looks like this:

#ifndef TWI_MASTER_DRIVER_H_
#define TWI_MASTER_DRIVER_H_

#define TWI_INPUT_QUEUE_SIZE 256

// define callback function pointer signature
typedef void (*twi_slave_callback_t)(uint8_t*, uint16_t);

typedef struct {
    uint8_t buffer[TWI_INPUT_QUEUE_SIZE];
    volatile uint16_t length; // currently used bytes in the buffer
    twi_slave_callback_t slave_callback;
} twi_global_slave_t;

typedef struct {
    uint8_t slave_address;
    volatile twi_global_slave_t slave;
} twi_global_t;

void twi_init(uint8_t slave_address, twi_global_t *twi, twi_slave_callback_t slave_callback);

#endif

Now the C file (twi_driver.c):

#include <stdint.h>
#include "twi_master_driver.h"

void twi_init(uint8_t slave_address, twi_global_t *twi, twi_slave_callback_t slave_callback)
{
    twi->slave.length = 0;
    twi->slave.slave_callback = slave_callback;

    twi->slave_address = slave_address;

    // temporary workaround <- why does this work??
    twi->slave.slave_callback = twi->slave.slave_callback;
}

void twi_slave_interrupt_handler(twi_global_t *twi)
{
    (twi->slave.slave_callback)(twi->slave.buffer, twi->slave.length);

    // some other stuff (nothing touches twi->slave.slave_callback)
}

Then I build those two files into a static library (.a) and construct my main program (main.c) #include #include #include #include #include "twi_master_driver.h"

//  ...define microcontroller safe way for mystdout ...

twi_global_t bus_a;

ISR(TWIC_TWIS_vect, ISR_NOBLOCK)
{
    twi_slave_interrupt_handler(&bus_a);
}

void my_callback(uint8_t *buf, uint16_t len)
{
    uint8_t i;

    fprintf(&mystdout, "C: ");
    for(i = 0; i < length; i++)
    {
        fprintf(&mystdout, "%d,", buf[i]);
    }
    fprintf(&mystdout, "\n"); 
}

int main(int argc, char **argv)
{
    twi_init(2, &bus_a, &my_callback);

    // ...PMIC setup...

    // enable interrupts.
    sei();

    // (code that causes interrupt to fire)

    // spin while the rest of the application runs...
    while(1){
        _delay_ms(1000);
    }
    return 0;
}

I carefully trigger the events that cause the interrupt to fire and call the appropriate handler. Using some fprintfs I'm able to tell that the location assigned to twi->slave.slave_callback in the twi_init function is different than the one in the twi_slave_interrupt_handler function.

Though the numbers are meaningless, in twi_init the value is 0x13b, and in twi_slave_interrupt_handler when printed the value is 0x100.

By adding the commented workaround line in twi_driver.c:

twi->slave.slave_callback = twi->slave.slave_callback;

The problem goes away, but this is clearly a magic and undesirable solution. What am I doing wrong?

As far as I can tell, I've marked appropriate variables volatile, and I've tried marking other portions volatile and removing the volatile markings. I came up with the workaround when I noticed removing fprintf statements after the assignment in twi_init caused the value to be read differently later on.

The problem seems to be with how I'm passing around the function pointer -- and notably the portion of the program that is accessing the value of the pointer (the function itself?) is technically in a different thread.

Any ideas?

Edits:

  • resolved typos in code.

  • links to actual files: http://straymark.com/code/ [test.c|twi_driver.c|twi_driver.h]

  • fwiw: compiler options: -Wall -Os -fpack-struct -fshort-enums -funsigned-char -funsigned-bitfields -mmcu=atxmega128a1 -DF_CPU=2000000UL

  • I've tried the same code included directly (rather than via a library) and I've got the same issue.

Edits (round 2):

  • I removed all the optimizations, without my "workaround" the code works as expected. Adding back -Os causes an error. Why is -Os corrupting my code?
Was it helpful?

Solution

Just a hunch, but what happens if you switch these two lines around:

twi->slave.slave_callback = slave_callback;
twi->slave.length = 0;

Does removing the -fpack-struct gcc flag fix the problem? I wonder if you haven't stumbled upon a bug where writing that length field is overwriting part of the callback value.


It looks to me like with the -Os optimisations on (you could try combinations of the individual optimisations enabled by -Os to see exactly which one is causing it), the compiler isn't emitting the right code to manipulate the uint16_t length field when its not aligned on a 2-byte boundary. This happens when you include a twi_global_slave_t inside a twi_global_t that is packed, because the initial uint8_t member of twi_global_t causes the twi_global_slave_t struct to be placed at an odd address.

If you make that initial field of twi_global_t a uint16_t it will probably fix it (or you could turn off struct packing). Try the latest gcc build and see if it still happens - if it does, you should be able to create a minimal test case that shows the problem, so you can submit a bug report to the gcc project.

OTHER TIPS

This really sounds like a stack/memory corruption issue. If you run avr-size on your elf file, what do you get? Make sure (data + bss) < the RAM you have on the part. These types of issues are very difficult to track down. The fact that removing/moving unrelated code changes the behavior is a big red flag.

Replace "&my_callback" with "my_callback" in function main().

Because different threads access the callback address, try protecting it with a mutex or read-write lock.

If the callback function pointer isn't accessed by a signal handler, then the "volatile" qualifier is unnecessary.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top