Question

From Wikipedia:

The main disadvantages are greater overall space usage and slower indexing, both of which become more severe as the tree structure becomes larger and deeper. However, many practical applications of indexing involve only iteration over the string, which remains fast as long as the leaf nodes are large enough to benefit from cache effects.

I'm implementing a sort of compromise between ropes and strings. Basically it's just ropes, except that I'm flattening concatenation objects into strings when the concatenated strings are short. There are a few reasons for this:

  1. The benefits of concatenation objects are minimal when the concatenated strings are short (it doesn't take too long to concatenate two strings in their normal form).
  2. Doing this reduces the size and depth of the tree (reducing the downsides of ropes).
  3. Doing this increases the size of the leaf nodes (to take better advantage of cache).

However, as the flattened strings get longer, the advantages of ropes also diminish, so I'd like to find some compromise. The "sweet spot" logically seems to be around where "the leaf nodes are large enough to benefit from cache effects". The problem is, I don't know how large that is.

EDIT: While I was writing this, it occurred to me that the ideal size would be the size of a cache line, because then the rope only causes cache misses when they would happen anyway in a string. So my second question is, is this reasoning correct? And is there a cross-platform way to detect the cache line size?

My target language is C++.


Solution

The limit case for a rope-like string would be built on top of a std::list&lt;char&gt;. That obviously isn't very effective. When iterating, you are likely to have one cache miss per "leaf"/char. As the number of characters per leaf goes up, the average number of misses goes down, with a discontinuity as soon as your leaf allocation exceeds a single cache line.

It might still be a good idea to have larger leaves; memory transfers in cache hierarchies might have different granularities at different levels. Also, when targeting a mixed set of CPUs (i.e. consumer PCs), a leaf size which is a higher power of two will be an integral multiple of the cache line size on more machines. E.g. if you're addressing CPUs with 16 and 32 byte cache lines, 32 bytes would be the better choice, as it's always an integral number of cache lines on both. Wasting half a cache line is a shame.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow