Question

I'm programming an embedded 32-bit system with a 32 KiB 8-way set-associative L2 instruction cache. To avoid cache thrashing, we align functions so that the text of a set of frequently called functions (think interrupt code) ends up in separate cache sets. We do this by inserting dummy functions as needed, e.g.

void high_freq1(void)
{
   ...
}

void dummy(void)
{
   __asm__(/* Silly opcodes to fill ~100 to ~1000 bytes of text segment */);
}

void high_freq2(void)
{
   ...
}

This strikes me as ugly and suboptimal. What I'd like to do is

  • avoid __asm__ entirely and use pure C89 (maybe C99)
  • find a way to create the needed dummy() spacer that the GCC optimizer does not touch
  • the size of the dummy() spacer should be configurable as a multiple of 4 bytes. Typical spacers are 260 to 1000 bytes.
  • should be feasible for a set of about 50 functions out of a total of 500 functions

I'm also willing to explore entirely new techniques of placing a set of selected functions in a way so they aren't mapped to the same cache lines. Can a linker script do this?


Solution 2

Maybe linker scripts are the way to go. The GNU linker can use these, I think. I've used LD files for the AVR and on MQX, both of which were using GCC-based compilers, so this might help.

You can define your memory sections and what goes where. Each time I come to write one, it's been so long since the last that I have to read up again.

Have a search for SVR3-style command files to read up on the format.

DISCLAIMER: The following example is for a very specific compiler, but the SVR3-like format is pretty general. You'll have to read up for your system.

For example you can use commands like...

ApplicationStart = 0x...;
MemoryBlockSize = 0x...;
ApplicationDataSize  = 0x...;
ApplicationLength    = MemoryBlockSize - ApplicationDataSize;

MEMORY {
    RAM: ORIGIN = 0x...                LENGTH = 1M
    ROM: ORIGIN = ApplicationStart     LENGTH = ApplicationLength   
}

This defines two memory regions (RAM and ROM) for the linker. Then you can say things like

SECTIONS
{
    GROUP :
    {       
        .text :
        {
            * (.text)
            * (.init , '.init$*')
            * (.fini , '.fini$*')
        }

        .my_special_text ALIGN(32): 
        {
            * (.my_special_text)
        } 

        .initdat ALIGN(4):
        // Blah blah
    } > ROM
    // SNIP
}

The SECTIONS command tells the linker how to map input sections into output sections, and how to place the output sections in memory. Here we're saying what goes into the ROM region, which we defined in the MEMORY definition above. The bit possibly of interest to you is .my_special_text. In your code you can then do things like...

__attribute__ ((section(".my_special_text")))
void MySpecialFunction(...)
{
    ....
}

The linker will put any function preceded by the __attribute__ statement into the .my_special_text section. In the above example this is placed into ROM on the next 32-byte aligned boundary after the .text section (because of ALIGN(32)), but you can put it anywhere you like. So you could make a few sections, one for each of the functions you describe, and make sure the addresses won't cause clashes.
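To check that two candidate addresses won't clash, it can help to work out which cache set each one maps to. The following is only a sketch: the 32 KiB size and 8 ways come from the question, but the 32-byte line size is an assumption (check your CPU manual), and cache_set_index and the CACHE_* constants are hypothetical names.

```c
#include <stdint.h>

/* Cache geometry from the question: 32 KiB, 8-way set associative.
 * The 32-byte line size is an ASSUMPTION; check your CPU manual. */
#define CACHE_SIZE_BYTES 32768u
#define CACHE_WAYS       8u
#define CACHE_LINE_BYTES 32u   /* assumed */
#define CACHE_NUM_SETS   (CACHE_SIZE_BYTES / (CACHE_WAYS * CACHE_LINE_BYTES))

/* Return the index of the cache set that the given address maps to. */
unsigned cache_set_index(uintptr_t addr)
{
    return (unsigned)((addr / CACHE_LINE_BYTES) % CACHE_NUM_SETS);
}
```

With these numbers there are 128 sets, so two addresses that are a multiple of 4 KiB apart land in the same set and compete for its 8 ways.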

You can get the size and memory location of the section using linker-defined variables of the form

extern char _fsection_name[]; // Set to the address of the start of section_name
extern char _esection_name[]; // Set to the first byte following section_name

So for this example...

extern char _fmy_special_text[]; // Set to the address of the start of .my_special_text
extern char _emy_special_text[]; // Set to the first byte following .my_special_text
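The size then falls out as the difference of the two symbols. A minimal sketch, where section_size is a hypothetical helper name and the _f/_e symbol naming is specific to the toolchain described above:

```c
#include <stddef.h>

/* Size in bytes of a region delimited by linker-style start/end symbols.
 * `start` is the first byte, `end` is the first byte past the region. */
size_t section_size(const char *start, const char *end)
{
    return (size_t)(end - start);
}

/* On the target, with the symbols declared as above, usage would be:
 *     size_t n = section_size(_fmy_special_text, _emy_special_text);
 */
```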

OTHER TIPS

Use GCC's __attribute__(( aligned(size) )).

Or, pass -falign-functions=n on your GCC command line.
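As a rough illustration of the attribute form, assuming GCC (or a compatible compiler) and using 32 bytes as a stand-in for the cache line size of your target; HOT_ALIGN and is_hot_aligned are hypothetical names added for the example:

```c
#include <stdint.h>

/* ASSUMPTION: 32 bytes stands in for the cache line size of the target. */
#define HOT_ALIGN 32

/* GCC starts this function on a HOT_ALIGN-byte boundary. */
__attribute__((aligned(HOT_ALIGN)))
int high_freq1(void)
{
    return 1;
}

/* Returns non-zero when the function's entry point has the requested alignment. */
int is_hot_aligned(int (*fn)(void))
{
    return ((uintptr_t)fn % HOT_ALIGN) == 0;
}
```

Note that alignment only guarantees where a function starts relative to a line boundary; it does not by itself choose which cache set the function lands in.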

If you are willing to expend some effort, you can use

__attribute__((section(".text.hotpath.a")))

to place the function into a separate section, and then in a custom linker script explicitly place the functions.

This gives you a bit more fine-grained control than simply asking for the functions to be aligned, but requires more hand-holding.

Example, assuming that you want to lock 4KiB into cache:

SECTIONS {
    .text.hotpath.one BLOCK(0x1000) {
        *(.text.hotpath.a)
        *(.text.hotpath.b)
    }
}
ASSERT(SIZEOF(.text.hotpath.one) <= 0x1000, "Hot Path functions do not fit into 4KiB")

This will make sure the hot path functions a and b are next to each other and both fit into the same block of 4 KiB that is aligned on a 4 KiB boundary, so you can simply lock that page into the cache; if the code doesn't fit, you get an error.
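On the C side, tagging around 50 functions by hand gets tedious, so a small macro may help. This is a sketch assuming GCC on an ELF target; HOTPATH is a hypothetical helper name, not something GCC provides, and the function bodies are placeholders.

```c
/* HOTPATH(a) expands to a section attribute naming ".text.hotpath.a".
 * HOTPATH is a hypothetical helper name, not a GCC built-in. */
#define HOTPATH(name) __attribute__((section(".text.hotpath." #name)))

/* Ends up in input section .text.hotpath.a */
HOTPATH(a) int hot_a(void)
{
    return 1;
}

/* Ends up in input section .text.hotpath.b */
HOTPATH(b) int hot_b(void)
{
    return 2;
}
```

The section names produced here match the *(.text.hotpath.a) and *(.text.hotpath.b) patterns in the linker script above.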

You can even use

NOCROSSREFS(.text.hotpath.one .text)

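to forbid hot path functions calling functions in .text. Note that NOCROSSREFS reports references between the listed sections in either direction, so code in .text referencing the hot path section is flagged as well.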

Assuming you're using GCC and GAS, this may be a simple solution for you:

void high_freq1(void)
{
   ...
}
asm(".org .+288"); /* Advance location by 288 bytes */
void high_freq2(void)
{
   ...
}

You could, possibly, even use it to set absolute locations for the functions rather than using relative increments in address, which would insulate you from consequences due to the functions changing in size when/if you modify them.

It's not pure C89, for sure, but it may be less ugly than using dummy functions. :)

(Then again, it should be mentioned that linker scripts aren't standardized either.)

EDIT: As noted in the comments, it seems to be important to pass the -fno-toplevel-reorder flag to GCC in this case.
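If you go the relative-increment route, the number given to .org has to be worked out from where the previous function ends and which cache set you want the next one to start in. A hedged sketch of that arithmetic follows; spacer_bytes is a hypothetical helper, and the line size and set count must come from your CPU manual.

```c
#include <stdint.h>

/* Bytes to skip from `addr` so that the next function starts at the
 * beginning of cache set `target_set`. `line_bytes` and `num_sets`
 * describe the cache; the caller must pass non-zero values and
 * target_set < num_sets. */
uint32_t spacer_bytes(uint32_t addr, uint32_t target_set,
                      uint32_t line_bytes, uint32_t num_sets)
{
    uint32_t way_bytes = line_bytes * num_sets;  /* address span of one way */
    uint32_t want = target_set * line_bytes;     /* offset of the set within a way */
    uint32_t have = addr % way_bytes;            /* current offset within a way */
    return (want + way_bytes - have) % way_bytes;
}
```

For example, with an assumed 32-byte line and 128 sets, a function ending at 0x1064 needs 28 bytes of padding for the next one to start in set 4. The value has to be recomputed whenever the preceding code changes size, which is the drawback of relative increments mentioned above.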

Licensed under: CC-BY-SA with attribution