Assuming ALL the data used in `A()` fits in the cache, you should see improvement in `B()` too.

However, you can also end up reading data into the cache that isn't being used at all. That serves no purpose, and it keeps the memory bus busy when it could be loading data that IS needed, which matters if your access pattern is as sporadic as you say. By all means give it a try, but don't expect it to magically work effectively. It usually takes a bit of "tuning", particularly with regard to "how far ahead of where you are right now do you prefetch the data".
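As a rough sketch of what that tuning knob looks like, here is a loop that prefetches a fixed distance ahead of the current element. `__builtin_prefetch` is the GCC/Clang intrinsic; the distance of 16 elements is purely a guess you would adjust for your own hardware and access pattern, and `sum_with_prefetch` is just a made-up example function:

```c
#include <stddef.h>

/* How many elements ahead to prefetch. This is the value you tune:
 * too small and the data isn't there yet, too large and you evict
 * lines you still need. 16 is an arbitrary starting point. */
#define PREFETCH_DIST 16

long sum_with_prefetch(const long *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Hint to the CPU to start loading a future element.
         * Args: address, 0 = read, 1 = low temporal locality. */
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&data[i + PREFETCH_DIST], 0, 1);
        total += data[i];
    }
    return total;
}
```

Note that `__builtin_prefetch` is only a hint: the compiler and CPU are free to ignore it, and on a sufficiently sporadic pattern the prefetched line may be evicted before you reach it, which is exactly the "busy bus for nothing" problem above.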
Depending on the exact behaviour of `A()` and `B()`, other tricks may help. For example, if you are switching between reads and writes, reading from one region and writing to a completely different one, then batching up the writes into a "holding area" which is then copied to RAM in one go is often a good plan. Make the holding area something like 1/8 to 1/4 of the L1 cache.
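A minimal sketch of that holding-area idea, under the assumption of a 32 KiB L1 data cache (so a 4 KiB buffer sits at the 1/8 mark). The names `holding_area`, `put`, and `flush` are all invented for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Assumed 1/8 of a 32 KiB L1 data cache -- size to YOUR cache. */
#define HOLD_SIZE 4096

typedef struct {
    unsigned char buf[HOLD_SIZE]; /* cache-resident staging buffer */
    size_t used;                  /* bytes currently staged */
    unsigned char *dest;          /* where flushed bytes land in RAM */
} holding_area;

/* Copy the staged bytes out in one contiguous burst. */
static void flush(holding_area *h)
{
    memcpy(h->dest, h->buf, h->used);
    h->dest += h->used;
    h->used = 0;
}

/* Stage a small write; spill to RAM only when the buffer fills. */
static void put(holding_area *h, const void *p, size_t len)
{
    if (h->used + len > HOLD_SIZE)
        flush(h);
    memcpy(h->buf + h->used, p, len);
    h->used += len;
}
```

The point of the pattern is that the scattered small writes all land in a buffer that stays hot in L1, and the destination region is only touched in large sequential copies, which the memory system handles far better than interleaved read/write traffic.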
[Caveat: I've got absolutely no experience at all with the PowerPC architecture, but I have used cache prefetching and other memory optimisation techniques in my work with x86 processors, with some success at times and less at others.]