Should I use a map or a set if my key is part of my value?

https://stackoverflow.com/questions/12755395

05-07-2021
|

Domanda

In C++, I have an class which is ordered by its name which is a std::string. I wish to only have one per each unique name in either a std::map or std::set.

I could use a std::set since the operator< will order my instances by their name, however, I need to lookup an instance by its name. Using a map where the key is the name is straight forward, however, I could also use a set and construct a dummy instance of my class with the name I wish to lookup to locate in the set the actual instance of the class for the given name.

I imagine I should just go with the map to make the code straight forward, but wonder if there might be a way to go with the set since the key is effectively part of my object anyway and thus avoid some redundancy.

Is there a way to use the set and be able to locate objects by their key in a clean way or should I just use a map and be done with it?

Here is the class to be inserted (in draft form) and in each directory there is either a set or map of Node(s) keyed off the Node's name:

class Node {
public:
  Node(Directory &parent, const std::string &name)
    : _name(name),
      _parent(&parent),
      _isRoot(false) {
    if (name.empty()) {
      throw InvalidNodeNameError(name);
    }
  }

protected:
  // This is only used for the root directory:
  Node()
    : _name(""),
      _parent(0),
      _isRoot(true) {
  }
  Node(const std::string &name)
    : _name(name),
      _parent(0),
      isRoot(false) {
  }

public:
  virtual ~Node() {
    if (parent()) {
      parent()->remove(*this);
    }
  }

  bool operator<(const Node &rhs) const {
    return _name < rhs._name;
  }

  Directory *parent() const {
    return _parent;
  }
  void setParent(Directory *parent) {
    _parent = parent;
  }

  const std::string &name() const {
    return _name;
  }

  bool isRoot() const {
    return _isRoot;
  }

  std::string pathname() const {
    std::ostringstream path;

    if (parent()) {
      path << parent()->pathname() << '/';
    } else {
      path << '/';
    }
    path << name();

    return path.str();
  }

private:
  // Not defined:
  Node(const Node &rhs);
  Node &operator=(const Node &rhs);

private:
  std::string  _name;
  Directory   *_parent;
  const bool   _isRoot;

};

Soluzione

I think that because Node requires a reference to Directory during construction, making a dummy node to search your set by name will make the Node class more cluttered.

To use set you'd probably need to make a static Directory somewhere, and use that as a dummy reference in a new dummy constructor Node(const std::string&). If you don't declare that explicit you can use a string directly in your call to set::find.

You could instead convert the class to use pointers... But that would change its internal semantics: Directory& is always valid, whereas Directory* doesn't have to be. Ask yourself whether you want to make the semantics less clear to the reader simply because of your preference for the set container.

So my opinion is pretty clear in this case... You have a choice: Either use map and keep your class clean, or use set and write a bit of supporting junk code that has no use for anything else. =)

Altri suggerimenti

Actually, you can just use map<std::string&, Node>, at the cost of one extra pointer, but I think you probably knew that, and it requires some mucking about to get what you want.

I've always thought it was a real pain that std::set didn't come with an explicit KeyExtractor template parameter, particularly since every implementation I've seen uses one of those under the hood in order to not duplicate code between (multi)maps and (multi)sets. Here's a quick and dirty hack, not close to complete, which exposes some of the mechanics of the GNU standard C++ library in order to create a "keyed_set" container:

// Deriving from the tree is probably not a good idea, but it was easy.

template<typename Key, typename Val, typename Extract,
         typename Compare = std::less<Key>, typename Alloc = std::allocator<Val>>
class keyed_set : public std::_Rb_tree<Key, Val, Extract, Compare, Alloc> {
  using Base = std::_Rb_tree<Key, Val, Extract, Compare, Alloc>;

  public:
    template<typename ...Args>
    auto insert(Args... args)
         ->decltype(Base()._M_insert_unique(std::declval<Args>()...)) {
      return this->_M_insert_unique(args...);
    }

    typename Base::iterator insert(typename Base::const_iterator i,
                                   const Val& val) {
      return this->_M_insert_unique_(i, val);
    }

    Val& operator[](const Key& key) {
      auto i = this->lower_bound(key);
      if (i == this->end() || this->key_comp()(key, Extract()(*i))) {
        i = this->_M_insert_unique_(i, Val(key));
      }
      return *i;
    }
};

To make this work, you need to provide a Key Extractor, like this:

template<class T>
struct KeyExtractor;

template<>
struct KeyExtractor<Node> {
  const std::string& operator()(const Node& n) { return n.name(); }
};

To get my version of operator[] to work, you need the value type to have a constructor which takes its key type as an argument.

I left out lots of stuff (erase, for example); but it was good enough to do a simple test.

It would probably have been better to default the key type from the return type of the KeyExtractor, but that would have involved putting the template arguments in a different order, and I already wasted too much time not noticing that _M_insert_unique and _M_insert_unique_ are spelled differently (presumably to avoid template instantiation problems.)

Here's the example I used to check to make sure that works; MyKeyedClass has a name, with a vector of strings, and a double associated with each one. (There's no sublime purpose.)

int main(void) {
  keyed_set<std::string, MyKeyedClass, KeyExtractor<MyKeyedClass>> repo;
  for (std::string name, val; std::cin >> name >> val; ) {
    try {
      size_t end;
      double d = std::stod(val, &end);
      if (end != val.size())
        throw std::invalid_argument("trailing letters");
      repo[name].increment(d);
    } catch (std::invalid_argument(e)) {
      repo[name].push(val);
    } catch (std::out_of_range(e)) {
      std::cerr << "You managed to type an out of range double" << std::endl;
    }
  }
  std::for_each(repo.begin(), repo.end(),
                [](MyKeyedClass& c){ std::cout << c << std::endl; });
  return 0;
}

I implement such classes through a proxy for key, for example in case of std::string I have a class called lightweight_string, that implement operator < and internally it point to an std::string then I use map and have both simplicity of using map and performance of not having 2 version of key.

For your very case check if your compiler is old enough to still implement std::string with COW (copy on write) strategy. This changed in C++11, but old compiler versions still are COWs... This has the advantage that it will cost you almost nothing to have map with string as key and as part of value. But be aware this will change in future (or already changed)...

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow