NoSQL Design Principles vs. Joins

https://stackoverflow.com/questions/23530070

17-07-2023

Question

I have a set of users (user_id, user_name), and a set of attributes (attribute_id, user_id, attribute_key, attribute_value), so that each user has a username and key-value attributes.

In SQL, when I want to view a user's attributes, I do SELECT attribute_key, attribute_value FROM users JOIN attributes USING (user_id) WHERE user_id = ?. In NoSQL, to my understanding, I create a collection/superkey users, with a document/key <user-id>, with a field/subkey <attribute-key> whose value is <attribute-value>. For example:

{id: <user-id>, attributes: {<attribute-key>: <attribute-value>}}

or:

users:<user-id>:<attribute-key> = <attribute-value>
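Both layouts hold the same information. As a rough sketch in plain JavaScript (no database involved; the users:<id>:<key> format comes from the question, while the helper name toKeyPaths and the sample user are hypothetical), a user document in the first layout can be flattened into the second:

```javascript
// Hypothetical helper: flatten a user document of the form
// { id, attributes: { key: value, ... } } into Redis-style
// "users:<id>:<key>" -> value entries.
function toKeyPaths(user) {
  const entries = {};
  for (const [key, value] of Object.entries(user.attributes)) {
    entries[`users:${user.id}:${key}`] = value;
  }
  return entries;
}

// Sample (hypothetical) user.
const user = { id: 7, attributes: { isAdmin: true, favoriteFood: "Waffles" } };
console.log(toKeyPaths(user));
```

In either layout, reading one user's attributes is a single lookup by user id; the difficulty only appears when searching across users.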

NoSQL, with its dynamic schemas, intuitively stores the attributes in the user object itself. SQL, on the other hand, requires a counter-intuitive separate table holding ALL the attributes of ALL the users, mapped back to the users with foreign keys. Once you get the idea, it's not that bad, but it's certainly less elegant.

However, if I want to view all the users having a certain attribute (SELECT user_name FROM users JOIN attributes USING (user_id) WHERE attribute_key = ?), or a specific attribute value (SELECT user_name FROM users JOIN attributes USING (user_id) WHERE attribute_key = ? AND attribute_value = ?), I just can't think of a way to do so efficiently in NoSQL, because its design seems inherently "depth" oriented, versus the tabular, two-dimensional nature of SQL.

What am I supposed to do? Run over ALL the user objects, going over ALL their attributes? That doesn't sound very efficient, and NoSQL is all about efficiency (I even improvised some benchmarks, and MySQL scored better than MongoDB and Redis). But my design, or variants of it, is very basic and common, so I feel like I'm misunderstanding some basic NoSQL design principle. What would you do?
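Without an index, the lookup described above is indeed a full scan: every document in the collection has to be checked, so the cost grows with the number of users. A minimal in-memory sketch of that scan (an array standing in for the users collection; the function name usersWithAttribute and the sample data are hypothetical):

```javascript
// In-memory stand-in for a "users" collection using the nested-object layout.
const users = [
  { id: 1, name: "alice", attributes: { isAdmin: true, favoriteFood: "Waffles" } },
  { id: 2, name: "bob", attributes: { favoriteFood: "Pizza" } },
  { id: 3, name: "carol", attributes: { isAdmin: false } },
];

// Full scan: every document is visited.
// If value is undefined, only the presence of the key is tested.
function usersWithAttribute(collection, key, value) {
  return collection
    .filter(
      (u) =>
        Object.prototype.hasOwnProperty.call(u.attributes, key) &&
        (value === undefined || u.attributes[key] === value)
    )
    .map((u) => u.name);
}

console.log(usersWithAttribute(users, "isAdmin"));       // [ 'alice', 'carol' ]
console.log(usersWithAttribute(users, "isAdmin", true)); // [ 'alice' ]
```

The indexes suggested in the answer exist precisely to replace this linear scan with an index lookup.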

Solution

I don't know about Redis, but MongoDB allows you to put indexes on nested documents and even documents nested in arrays.

If you choose the first option, you need to create an index for each attribute you want to check for:

db.collection.createIndex({ "attributes.is_admin": 1 }, { sparse: true });
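Conceptually, a sparse index on a nested field maps each value of that field to the documents holding it and leaves out documents that lack the field entirely. The toy model below illustrates only that idea in plain JavaScript; it is not how MongoDB stores indexes (those are B-trees), and the helper names getPath and buildSparseIndex and the sample data are hypothetical:

```javascript
// Resolve a dotted path such as "attributes.is_admin" inside a document.
function getPath(doc, path) {
  return path.split(".").reduce((obj, part) => (obj == null ? undefined : obj[part]), doc);
}

// Toy sparse index: field value -> array of document ids.
// Documents where the field is missing get no entry (that is what "sparse" means).
function buildSparseIndex(collection, path) {
  const index = new Map();
  for (const doc of collection) {
    const value = getPath(doc, path);
    if (value === undefined) continue; // field missing: not indexed
    if (!index.has(value)) index.set(value, []);
    index.get(value).push(doc.id);
  }
  return index;
}

const users = [
  { id: 1, attributes: { is_admin: true } },
  { id: 2, attributes: { favorite_food: "Waffles" } },
  { id: 3, attributes: { is_admin: false } },
  { id: 4, attributes: { is_admin: true } },
];

const adminIndex = buildSparseIndex(users, "attributes.is_admin");
console.log(adminIndex.get(true)); // [ 1, 4 ]
console.log(adminIndex.size);      // 2 (document 2 is not indexed)
```

It also shows the drawback of this layout: every attribute name needs its own index.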

Or for the second option:

db.collection.createIndex({ attribute_key: 1 }, { sparse: true });

If you do not know in advance how many attributes you will have, and want a single index that covers all current attributes as well as any you add in the future, put the attributes into an array of key/value pairs:

{ id: <user-id>,
  attributes: [
      { key: "isAdmin", value:true },
      { key: "favoriteFood", value:"Waffles" },
      { key: "maximumSize", value:42 }
  ]
}
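If existing documents use the nested-object layout from the question, converting them to this array layout is mechanical. A sketch in plain JavaScript, assuming a hypothetical helper toAttributeArray that you would run yourself (for example in a one-off migration script):

```javascript
// Hypothetical migration helper: turn { attributes: { k: v, ... } }
// into { attributes: [ { key: k, value: v }, ... ] }, keeping all other fields.
// The original document is not modified; a new object is returned.
function toAttributeArray(doc) {
  return {
    ...doc,
    attributes: Object.entries(doc.attributes).map(([key, value]) => ({ key, value })),
  };
}

const before = { id: 7, attributes: { isAdmin: true, favoriteFood: "Waffles", maximumSize: 42 } };
console.log(JSON.stringify(toAttributeArray(before)));
// {"id":7,"attributes":[{"key":"isAdmin","value":true},{"key":"favoriteFood","value":"Waffles"},{"key":"maximumSize","value":42}]}
```

The design point is that the attribute name moves from a field name into a field value, so one index on that value covers every attribute.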

With that schema you could create an index like this:

db.collection.createIndex({ "attributes.key": 1 });

and you will be able to quickly find documents by attribute. You can also create an index like this:

db.collection.createIndex({ "attributes": 1 });

and you can search for exact key/value pairs (the query has to match the whole embedded document, including its field order), but not for a key without supplying its exact value, nor for a value without its key. If you also want to search by key only, query value ranges, or sort by value, use a compound index like this:

db.collection.createIndex({ "attributes.key": 1, "attributes.value": 1 });
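One caveat with this schema: a query such as db.collection.find({ "attributes.key": "maximumSize", "attributes.value": { $gt: 40 } }) can be satisfied by two different array elements (one with the key, another with a large value). To require both conditions on the same element, MongoDB provides the $elemMatch operator, e.g. db.collection.find({ attributes: { $elemMatch: { key: "maximumSize", value: { $gt: 40 } } } }). The plain JavaScript sketch below (hypothetical function names and data, no database) contrasts the two matching rules:

```javascript
// anyElementMatch mimics separate dotted-path conditions on "attributes.key"
// and "attributes.value": each condition may be met by a different element.
function anyElementMatch(doc, key, minValue) {
  const hasKey = doc.attributes.some((a) => a.key === key);
  const hasValue = doc.attributes.some((a) => typeof a.value === "number" && a.value > minValue);
  return hasKey && hasValue;
}

// sameElementMatch mimics $elemMatch: a single element must meet both conditions.
function sameElementMatch(doc, key, minValue) {
  return doc.attributes.some(
    (a) => a.key === key && typeof a.value === "number" && a.value > minValue
  );
}

// maximumSize is small here, but another attribute has a large numeric value.
const doc = {
  id: 1,
  attributes: [
    { key: "maximumSize", value: 10 },
    { key: "loginCount", value: 500 },
  ],
};

console.log(anyElementMatch(doc, "maximumSize", 40));  // true (false positive)
console.log(sameElementMatch(doc, "maximumSize", 40)); // false
```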

Licensed under: CC-BY-SA with attribution