Are there versions of the C++ STL's associative data structures optimized for numerous partial copies?

StackOverflow https://stackoverflow.com/questions/7651058

Pregunta

I have a large tree that grows as my algorithm progresses. Each node contains set, which I suppose is implemented as balanced binary search tree. Each node's set shall remain fixed after that node's creation, before its use in creating that node's children.

I fear however that copying each set is prohibitively expensive. Instead, I would prefer that each newly created node's set utilize all appropriate portions of the parent node's set. In short, I'm happy copying O(log n) of the set but not O(n).

Are there any variants of the STL's associative data structures that offer such an partial copy optimization? Perhaps in Boost? Such a data structure would be trivial to implement in Haskell or OCaML of course, but it'd require more effort in C++.

¿Fue útil?

Solución

I know it's not generally productive to suggest a different language, but Haskell's standard container libraries do exactly this. I remember seeing a video (was it Simon Peyton Jones?) talking about this exact problem, and how a Haskell solution ended up being much faster than a C++ solution for the given programmer effort. Of course, this was for a specific problem that had a lot of sets with a lot of shared elements.

There is a fair amount of research into this subject. If you are looking for keywords, I suggest searching for "functional data structures" instead of "immutable data structures", since most functional paradigms benefit from immutability in general. Structures such as finger tree were developed to solve exactly this problem.

I know of no C++ library that implements these data structures. There is nothing stopping you from reading the relevant papers (or the Haskell source code, which is about 1k lines for Data.Set including tests) and implementing it yourself, but I know that is not what you'd want to hear. You'd also need do some kind of reference counting for the shared nodes, which for such deep structures can have a higher overhead than even simple garbage collectors.

Otros consejos

It's practically impossible in C++, since the notion of an immutable container doesn't exist. You may know that you'll be making no changes, and that some sort of shared representation would be preferable, but the compiler and the library don't, and there's no way of communicating this to them.

Each node contains set, which I suppose is implemented as balanced binary search tree. Each node's set shall remain fixed after that node's creation, before its use in creating that node's children.

That's a pretty unique case. I would recommend using std::vector instead. (No really!) The code is creating the node can still use a set, and switching to a vector at the last second. However, the vector is smaller, and takes only a tiny number of memory allocations (one if you use reserve), making the algorithm much faster.

typedef unsigned int treekeytype;
typedef std::vector<unsigned int> minortreetype;
typedef std::pair<treekeytype, minortreetype> majornode
typedef std::set<treekeytype, minortreetype> majortype;
majortype majortree;

void func(majortype::iterator perform) {
     std::set<unsigned int> results;
     results.assign(perform->second.begin(), perform->second.end());
     majortree[perform->first+1].assign(results.begin(), results.end()); //the only change is here
     majortype::iterator next = majortree.find(perform->first+1);
     func(next);
}

You can even use std::lower_bound and std::upper_bound to still get O(log(n)) memory accesses since it's still sorted the same as the set was, so you won't lose any efficiency. It's pure gain as long as you don't need to insert/remove frequently.

I fear however that copying each set is prohibitively expensive.

If this fear is caused because each set contains mostly the same nodes as it's parents, and the data is costly (to copy or in memory, whichever), with only a few nodes changed, make the subtrees contain std::shared_pointers instead of the data themselves. This means the data itself will not get copied, only the pointers.

I realize this isn't what you were aiming at with the question, but as JamesKanze said, I know of no such container. Other than possibly a bizarre and dangerous use of the STL's rope class. Note that I said and meant STL, not the standard C++ library. They're different.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top