Question

Is there a way of expressing the performance of an algorithm or data structure that takes the cache and other hardware concerns into account?

Context: in my class we're looking at binary trees, and it seems like it would be extremely difficult to optimize data retrieval with that structure. I know B-Trees address this problem, but I haven't seen a formal analysis of how much speed improves once some amount of cache is taken into account.


Solution

I've never applied it to rigorously prove the cost of an algorithm or data structure, but I believe what you're looking for is the idealized-cache model. The idea is to do Big O analysis where the cost is the number of cache block transfers instead of atomic machine instructions. For example, traversing an array under that model is O(n/B), where n is the number of elements and B is the cache block size. (Technically n is the number of bytes the array elements occupy, but the difference between that and the number of elements is a constant factor, which Big O notation ignores.)
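To make that block-transfer accounting concrete, here is a minimal Java sketch of my own (not from any of the references mentioned here). Scanning an array of n elements touches roughly n/B cache blocks, because consecutive elements share blocks, while chasing n linked-list nodes scattered through memory can cost one block transfer per node:

```java
// Minimal sketch of the two access patterns, counted in the idealized-cache model.
// With block size B:
//  - scanning an array of n elements costs O(n / B) block transfers,
//    because each fetched block serves B consecutive elements;
//  - chasing n list nodes can cost O(n) block transfers,
//    because each node may live in a different block.
class CacheCostSketch {

    static final class Node {          // a singly linked list node
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    // O(n / B) block transfers: consecutive elements share cache blocks.
    static long sumArray(int[] a) {
        long sum = 0;
        for (int x : a) sum += x;
        return sum;
    }

    // Up to O(n) block transfers: each node may sit in its own block.
    static long sumList(Node head) {
        long sum = 0;
        for (Node n = head; n != null; n = n.next) sum += n.value;
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] a = new int[n];
        Node head = null;
        for (int i = n - 1; i >= 0; i--) {
            a[i] = i;
            Node node = new Node(i);
            node.next = head;
            head = node;
        }
        System.out.println(sumArray(a) + " " + sumList(head));
    }
}
```

Both loops do the same O(n) work in the ordinary RAM model; the idealized-cache model is what distinguishes them.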

You might want to read the paper Cache Efficient Functional Algorithms, which applies this kind of analysis in the context of a purely functional language. I'm sure there are plenty of other papers as well; cache-efficient and cache-oblivious data structures and algorithms are an active research topic.

In a language with a generational garbage collector you can make use of the heuristic that objects allocated close together in time will generally end up adjacent in memory. The reason is that generational garbage collectors usually allocate sequentially in the youngest generation, and when a collection happens the live objects are compacted and possibly moved to an older generation as well. When this happens the object graph is generally traversed in depth-first or breadth-first order, but either way objects end up adjacent to the objects they point to.
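As a rough illustration of that heuristic (my own sketch, and dependent on the runtime actually bump-allocating in the young generation), allocating tree nodes in the order you intend to traverse them tends to place them near each other in memory:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch: build a complete binary tree by allocating nodes in
// breadth-first order. With a bump-allocating young generation,
// nodes allocated back-to-back tend to start out adjacent in memory,
// so a later breadth-first traversal touches memory mostly sequentially.
// This is a heuristic, not a guarantee; a compacting collection may
// later rearrange the nodes (typically in DFS or BFS order, as noted above).
class AllocationOrderSketch {

    static final class TreeNode {
        int key;
        TreeNode left, right;
        TreeNode(int key) { this.key = key; }
    }

    static TreeNode buildBreadthFirst(int nodeCount) {
        if (nodeCount <= 0) return null;
        TreeNode root = new TreeNode(0);
        Queue<TreeNode> frontier = new ArrayDeque<>();
        frontier.add(root);
        int next = 1;
        while (next < nodeCount) {
            TreeNode parent = frontier.remove();
            parent.left = new TreeNode(next++);        // allocated consecutively...
            frontier.add(parent.left);
            if (next < nodeCount) {
                parent.right = new TreeNode(next++);   // ...so siblings are likely adjacent
                frontier.add(parent.right);
            }
        }
        return root;
    }

    public static void main(String[] args) {
        TreeNode root = buildBreadthFirst(1 << 20);
        System.out.println(root.left.key + " " + root.right.key); // prints "1 2"
    }
}
```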

If your language's garbage collector is non-generational and also non-moving (no copying or compacting), I don't know what you'd do, but then your performance is probably screwed either way, since your memory will be fragmented.

On the subject of cache-efficient trees, you might want to read Improving RRB-Tree Performance through Transience, which talks about a vector/list implementation using trees whose nodes have 31 or 32 children each.
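The point of the wide branching (this sketch is mine, not from the paper, and it only covers the balanced radix case, not the "relaxed" nodes that give RRB-trees their name) is that depth shrinks from log2(n) to log32(n), so an indexed lookup touches only a handful of nodes, each of which spans just a few cache blocks:

```java
// Sketch of why a 32-way branching tree (as used by RRB-trees and the
// persistent vectors they generalize) is cache-friendlier than a binary
// tree: a lookup descends only log32(n) levels, consuming 5 bits of the
// index per level, instead of log2(n) pointer hops.
class WideNodeLookupSketch {

    static final int BITS = 5;               // 2^5 = 32 children per node
    static final int WIDTH = 1 << BITS;
    static final int MASK = WIDTH - 1;

    // Internal nodes hold child pointers; leaves hold values.
    static final class Node {
        final Object[] slots = new Object[WIDTH];
    }

    // Index into a perfectly balanced radix tree of the given height.
    // Only height + 1 nodes (and therefore only a few cache blocks)
    // are touched per lookup.
    static Object lookup(Node root, int index, int height) {
        Node node = root;
        for (int level = height; level > 0; level--) {
            node = (Node) node.slots[(index >>> (level * BITS)) & MASK];
        }
        return node.slots[index & MASK];
    }

    public static void main(String[] args) {
        // Two-level tree holding 32 * 32 = 1024 boxed integers.
        Node root = new Node();
        for (int i = 0; i < WIDTH; i++) {
            Node leaf = new Node();
            for (int j = 0; j < WIDTH; j++) leaf.slots[j] = i * WIDTH + j;
            root.slots[i] = leaf;
        }
        System.out.println(lookup(root, 777, 1)); // prints 777
    }
}
```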

Other tips

What kind of formal analysis do you mean? If you look at Big O notation, the complexity does not change when you change only a constant factor. What I mean is: a cache may be faster than RAM, but that does not change your complexity from O(n) to O(log n).

I think the problem is that things like cache optimization are so dependent on the specific hardware that a formal analysis is not really useful.
