Question

I'm trying to figure out the best way to do a cache for resources. I am mainly looking for native C/C++/C++11 solutions (i.e. I don't have boost and the likes as an option).

What I am doing when retrieving from the cache is something like this:

Object *ResourceManager::object_named(const char *name) {
    if (_object_cache.find(name) == _object_cache.end()) {
        _object_cache[name] = new Object();
    }
    return _object_cache[name];
}

Where _object_cache is defined something like: std::unordered_map <std::string, Object *> _object_cache;

What I am wondering is about the time complexity of doing this, does find trigger a linear-time search or is it done as some kind of a look-up operation?

I mean if I do _object_cache["something"]; on the given example it will either return the object or if it doesn't exist it will call the default constructor inserting an object which is not what I want. I find this a bit counter-intuitive, I would have expected it to report in some way (returning nullptr for example) that a value for the key couldn't be retrieved, not second-guess what I wanted.

But again, if I do a find on the key, does it trigger a big search which in fact will run in linear time (since the key will not be found it will look at every key)?

Is this a good way to do it, or does anyone have some suggestions, perhaps it's possible to use a look up or something to know if the key is available or not, I may access often and if it is the case that some time is spent searching I would like to eliminate it or at least do it as fast as possible.

Thankful for any input on this.

Was it helpful?

Solution

The default constructor (triggered by _object_cache["something"]) is what you want; the default constructor for a pointer type (e.g. Object *) gives nullptr (8.5p6b1, footnote 103).

So:

auto &ptr = _object_cache[name];
if (!ptr) ptr = new Object;
return ptr;

You use a reference into the unordered map (auto &ptr) as your local variable so that you assign into the map and set your return value in the same operation. In C++03 or if you want to be explicit, write Object *&ptr (a reference to a pointer).

Note that you should probably be using unique_ptr rather than a raw pointer to ensure that your cache manages ownership.

By the way, find has the same performance as operator[]; average constant, worst-case linear (only if every key in the unordered map has the same hash).

OTHER TIPS

Here's how I'd write this:

auto it = _object_cache.find(name);
return it != _object_cache.end()
       ? it->second
       : _object_cache.emplace(name, new Object).first->second;

The complexity of find on an std::unordered_map is O(1) (constant), specially with std::string keys which have good hashing leading to very low rate of collisions. Even though the name of the method is find, it doesn't do a linear scan as you pointed out.

If you want to do some kind of caching, this container is definitely a good start.

Note that a cache typically is not just a fast O(1) access but also a bounded data structure. The std::unordered_map will dynamically increase its size when more and more elements are added. When resources are limited (e.g. reading huge files from disk into memory), you want a bounded and fast data structure to improve the responsiveness of your system.

In contrast, a cache will use an eviction strategy whenever size() reaches capacity(), by replacing the least valuable element.

You can implement a cache on top of a std::unordered_map. The eviction strategy can then be implemented by redefining the insert() member. If you want to go for an N-way (for small and fixed N) associative cache (i.e. one item can replace at most N other items), you could use the bucket() interface to replace one of the bucket's entries.

For a fully associative cache (i.e. any item can replace any other item), you could use a Least Recently Used eviction strategy by adding a std::list as a secondary data structure:

using key_tracker_type = std::list<K>; 
using key_to_value_type = std::unordered_map< 
    K,std::pair<V,typename key_tracker_type::iterator> 
>;

By wrapping these two structures inside your cache class, you can define the insert() to trigger a replace when your capacity is full. When that happens, you pop_front() the Least Recently Used item and push_back() the current item into the list.

On Tim Day's blog there is an extensive example with full source code that implements the above cache data structure. It's implementation can also be done efficiently using Boost.Bimap or Boost.MultiIndex.

The insert/emplace interfaces to map/unordered_map are enough to do what you want: find the position, and insert if necessary. Since the mapped values here are pointers, ekatmur's response is ideal. If your values are fully-fledged objects in the map rather than pointers, you could use something like this:

Object& ResourceManager::object_named(const char *name, const Object& initialValue) {
    return _object_cache.emplace(name, initialValue).first->second;
}

The values name and initialValue make up arguments to the key-value pair that needs to be inserted, if there is no key with the same value as name. The emplace returns a pair, with second indicating whether anything was inserted (the key in name is a new one) - we don't care about that here; and first being the iterator pointing to the (perhaps newly created) key-value pair entry with key equivalent to the value of name. So if the key was already there, dereferencing first gives the original Ojbect for the key, which has not been overwritten with initialValue; otherwise, the key was newly inserted using the value of name and the entry's value portion copied from initialValue, and first points to that.

ekatmur's response is equivalent to this:

Object& ResourceManager::object_named(const char *name) {
    bool res;
    auto iter = _object_cache.end();
    std::tie(iter, res) = _object_cache.emplace(name, nullptr);
    if (res) {
        iter->second = new Object(); // we inserted a null pointer - now replace it
    }
    return iter->second;
}

but profits from the fact that the default-constructed pointer value created by operator[] is null to decide whether a new Object needs to be allocated. It's more succinct and easier to read.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top