Question

In this slides (after slide 15) it is suggested to use

void updateAims(float* aimDir, const AimingData* aim, vec3 target, uint count)
{
     for(uint i = 0; i < count; i++)
     {
          aimDir[i] = dot3(aim->positions[i], target) * aim->mod[i];
     }
}

because it's more cache efficient.

What about if I have a class

class Bot
{
    vec3 position;
    float mod;
    float aimDir;

    void UpdateAim(vec3 target)
    {
         aimDir = dot3(position, target) * mod;
    }
 };

 void updateBots(Bots* pBots, uint count, vec3 target)
 {
      for(uint i = 0; i < count; i++)
            pBots[i]->UpdateAim(target);
  }

And I store all objects of that class in a single linear array.

Since they're all in the same array will there be cache misses? Why would the first approach be better?

Was it helpful?

Solution

Modern cache architectures are usually structured as lines of data, each large enough to hold several words; 64 bytes is a typical line size. When you try to read data that's not in the cache, a whole line is fetched, not just the word that you need. When writing, data in the cache is updated if it's there, but typically does not need to be fetched if it's not there.

In the first case, for every cache line of input data that's fetched, you'll use every single word of it. In the second, you'll only use some of the structure fields; fetching the others has wasted some bandwidth.

Specifically, you're fetching the old value of aimDir each time, which isn't needed for the calculation. In general, the "object" is likely to contain more fields, which you don't want for this particular calculation, wasting even more bandwidth as they are fetched into the cache and then ignored.

OTHER TIPS

The memory layout differs quite a lot and the benefit of the first approach will be destroyed if you use an array of Bots.

In the first approach, all the aimDir data is stored in a non-fragmented block of memory. So if you processed the first you can immediately proceed to the next item since it is stored in the next unit of memory.

If you have an array of Bots, then you have Bot objects stored in a non-fragmented block of memory. But the different aimDir data of two bots is now separated by other data by bot (positionand mod).

Graphically, the first approach (if one assumes arrays for position and mod too) looks like

[R] means random unknown data not related to bots

[R][position_0][position_1]...[position_n][R][mod_0][mod_1]...[mod_n][R][aimDir_0][aimDir_1]...[aimDir_n][R]

The second approach looks like this:

[R][[position_0],[mod_0],[aimDir_0]][[position_1][mod_1][aimDir_1]]...[[position_n][mod_n][aimDir_n]][R]
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top