Having a composite key for hash map in c++

https://stackoverflow.com/questions/9532289

03-12-2019
|

Question

I have a data structure which has,

<Book title>, <Author>, and <rate>

Since Book title or Author can be duplicated, I'd like to build a composite key. (let's say I cannot make extra unique key, such as ID)

Since data is pretty huge, I'm using GCC unordered_map for the sake of speed, and I built my structure like this:

typedef pair<string, string> keys_t
typedef unordered_map<keys_t, double> map_t;

Everything works okay in general, But the problem happens when I want to refer one specific key.

For example, let's suppose I'd like to find the best-rated book among books titled as "math", or I'd like to find the average rate of "Tolstoy"'s books.
In this case, this becomes very bothersome, since I cannot only refer only one of the key pair.

I happened to find boost::multi_index but I'm having some trouble understanding the documents. Does anyone have some idea or guideline for this?

Solution to make multiple indices, succinct example for multi_index, any other approach, etc.. any help will be appreciated.

Thank you.

Solution

I figured out how to use boost::multi_index I referred this code: Boost multi_index composite keys using MEM_FUN

and here's my code for your reference.

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/mem_fun.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/member.hpp>
#include <iostream>
#include <string>

using namespace boost::multi_index;
using namespace std;

class Book {
public:
    Book(const string &lang1, const string &lang2, const double &value) : m_lang1(lang1) , m_lang2(lang2) , m_value(value) {}

    friend std::ostream& operator << (ostream& os,const Book& n)    {
        os << n.m_lang1 << " " << n.m_lang2 << " " << n.m_value << endl;
        return os;
    }

    const string &lang1() const { return m_lang1; }
    const string &lang2() const { return m_lang2; }
    const double &value() const { return m_value; }
private:
    string m_lang1, m_lang2;
    double m_value;
};

// These will be Tag names
struct lang1 {};
struct lang2 {};
struct value {};

typedef multi_index_container <
    Book, 
    indexed_by<
        ordered_non_unique<tag<lang1>, BOOST_MULTI_INDEX_CONST_MEM_FUN( Book, const string &, lang1)
        >,
        ordered_non_unique<tag<lang2>, BOOST_MULTI_INDEX_CONST_MEM_FUN(Book, const string &, lang2)
        >,
        ordered_non_unique<tag<value>, BOOST_MULTI_INDEX_CONST_MEM_FUN(Book, const double &, value), greater<double>
        >,
        ordered_unique<
            // make as a composite key with Title and Author
            composite_key<
                Book,
                BOOST_MULTI_INDEX_CONST_MEM_FUN(Book, const string &, lang1),
                BOOST_MULTI_INDEX_CONST_MEM_FUN(Book, const string &, lang2)
            >
        >
    >
> Book_set;

// Indices for iterators
typedef Book_set::index<lang1>::type Book_set_by_lang1;
typedef Book_set::index<lang2>::type Book_set_by_lang2;
typedef Book_set::index<value>::type Book_set_by_value;

int main() {

    Book_set books;
    books.insert(Book("Math", "shawn", 4.3));
    books.insert(Book("Math", "john", 4.2));
    books.insert(Book("Math2", "abel", 3.8));
    books.insert(Book("Novel1", "Tolstoy", 5.0));
    books.insert(Book("Novel1", "Tolstoy", 4.8)); // This will not be inserted(duplicated)
    books.insert(Book("Novel2", "Tolstoy", 4.2));
    books.insert(Book("Novel3", "Tolstoy", 4.4));
    books.insert(Book("Math", "abel", 2.5));
    books.insert(Book("Math2", "Tolstoy", 3.0));

    cout << "SORTED BY TITLE" << endl;
    for (Book_set_by_lang1::iterator itf = books.get<lang1>().begin(); itf != books.get<lang1>().end(); ++itf)
        cout << *itf;

    cout << endl<<"SORTED BY AUTHOR" << endl;
    for (Book_set_by_lang2::iterator itm = books.get<lang2>().begin(); itm != books.get<lang2>().end(); ++itm)
        cout << *itm;

    cout << endl<<"SORTED BY RATING" << endl;
    for (Book_set_by_value::iterator itl = books.get<value>().begin(); itl != books.get<value>().end(); ++itl)
        cout << *itl;

    // Want to see Tolstoy's books? (in descending order of rating)
    cout << endl;
    Book_set_by_lang2::iterator mitchells = books.get<lang2>().find("Tolstoy");
    while (mitchells->lang2() == "Tolstoy")
        cout << *mitchells++;

    return 0;
}

Thank you all who made comments!

OTHER TIPS

What I did in a similar case was use a single container to contain the objects and separate std::multiset<ObjectType const*, CmpType> for each possible index; when inserting, I'd do a push_back, then recover the address from back(), and insert it into each of the std::set. (std::unordered_set and std::unordered_multiset would be better in your case: in my case, not only was the order significant, but I didn't have access to a recent compiler with unordered_set either.)

Note that this supposes that the objects are immutable once they are in the container. If you're going to mutate one of them, you probably should extract it from all of the sets, do the modification, and reinsert it.

This also supposes that the main container type will never invalidate pointers and references to an object; in my case, I knew the maximum size up front, so I could do a reserve() and use std::vector. Failing this, you could use std::deque, or simply use an std::map for the primary (complete) key.

Even this requires accessing the complete element in the key. It's not clear from your posting whether this is enough—“books titled math” makes me thing that you might need a substring search in the title (and should “Tolstoy” match “Leo Tolstoy”?). If you want to match an arbitrary substring, either your multiset will be very, very large (since you'll insert all possible substrings as entries), or you'll do a linear search. (On a long running system where the entries aren't changing, it might be worth compromizing: do the linear search the first time the substring is requested, but cache the results in a multiset, so that the next time, you can find them quickly. It's likely that people will often use the same substrings, e.g. “math” for any book with “math” in its title.)

There is an article on the same subject: http://marknelson.us/2011/09/03/hash-functions-for-c-unordered-containers/

The author, Mark Nelson, was trying to do the similar : "use of a simple class or structure to hold the person’s name", basically he's using a pair as key (just like you) for his unordered_map:

typedef pair<string,string> Name;

int main(int argc, char* argv[])
{
    unordered_map<Name,int> ids;
    ids[Name("Mark", "Nelson")] = 40561;
    ids[Name("Andrew","Binstock")] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout << ii->first.first
        << " "
        << ii->first.second
        << " : "
        << ii->second
        << endl;
        return 0;
}

He realized that the unordered_map doesn’t know how to create a hash for the given key type of std::pair. So he demonstrates 4 ways of creating a hash function for use in unordered_map.

If it is an infrequent operation you can search for the value.

for(auto& p : m)
{
     if(p.second.name==name_to_find)
     {
          //you now have the element
     }
}

however if the map is large this will be problematic because it will be a linear procedure rather than O(log n), this is a problem because maps are inherently slow.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow