When to use a key-value data store vs. a more traditional relational DB?

https://stackoverflow.com/questions/1500611

19-09-2019
|

Question

When would one choose a key-value data store over a relational DB? What considerations go into deciding one or the other? When is mix of both the best route? Please provide examples if you can.

Solution

In my experience, if you're even asking the question whether to use traditional vs esoteric practices, then go traditional. While esoteric practices are sexy, challenging, and fun, 99.999% of applications call for a traditional approach.

With regards to relational vs KV, the question you should be asking is:

Why would I not want to use a relational model for this scenario: ...

Since you have not described the scenario, it's impossible for anyone to tell you why you shouldn't use it. The "catch all" reason for KV is scalability, which isn't a problem now. Do you know the rules of optimization?

Don't do it.
(for experts only) Don't do it now.

KV is a highly optimized solution to scalability that will most likely be completely unecessary for your application.

OTHER TIPS

Key-value, heirarchical, map-reduce, or graph database systems are much closer to implementation strategies, they are heavily tied to the physical representation. The primary reason to choose one of these is if there is a compelling performance argument and it fits your data processing strategy very closely. Beware, ad-hoc queries are usually not practical for these systems, and you're better off deciding on your queries ahead of time.

Relational database systems try to separate the logical, business-oriented model from the underlying physical representation and processing strategies. This separation is imperfect, but still quite good. Relational systems are great for handling facts and extracting reliable information from collections of facts. Relational systems are also great at ad-hoc queries, which the other systems are notoriously bad at. That's a great fit in the business world and many other places. That's why relational systems are so prevalent.

If it's a business application, a relational system is almost always the answer. For other systems, it's probably the answer. If you have more of a data processing problem, like some pipeline of things that need to happen and you have massive amounts of data, and you know all of your queries up front, another system may be right for you.

If your data is simply a list of things and you can derive a unique identifier for each item, then a KVS is a good match. They are close implementations of the simple data structures we learned in freshman computer science and do not allow for complex relationships.

A simple test: can you represent your data and all of its relationships as a linked list or hash table? If yes, a KVS may work. If no, you need an RDB.

You still need to find a KVS that will work in your environment. Support for KVSes, even the major ones, is nowhere near what it is for, say, PostgreSQL and MySQL/MariaDB.

A traditional relational database has problems scaling beyond a point. Where that point is depends a bit on what you are trying to do.

All (most?) of the suppliers of cloud computing are providing key-value data stores.

However, if you have a reasonably sized application with a complicated data structure, then the support that you get from using a relational database can reduce your development costs.

IMO, Key value pair (e.g. NoSQL databases) works best when the underlying data is unstructured, unpredictable, or changing often. If you don't have structured data, a relational database is going to be more trouble than its worth because you will need to make lots of schema changes and/or jump through hoops to conform your data to the structure.

KVP / JSON / NoSql is great because changes to the data structure do not require completely refactoring the data model. Adding a field to your data object is simply a matter of adding it to the data. The other side of the coin is there are fewer constraints and validation checks in a KVP / Nosql database than a relational database so your data might get messy.

There are performance and space saving benefits for relational data models. Normalized relational data can make understanding and validating the data easier because there are table key relationships and constraints to help you out.

One of the worst patterns i've seen is trying to have it both ways. Trying to put a key-value pair into a relational database is often a recipe for disaster. I would recommend using the technology that suits your data foremost.

If you want O(1) lookups of values based on keys, then you want a KV store. Meaning, if you have data of the form k1={foo}, k2={bar}, etc, even when the values are larger/ nested structures, and want fast lookups, you want a KV store. Even with proper indexing, you cannot achieve O(1) lookups in a relational DB for arbitrary keys. Sometimes this is referred to as "random lookups".

Alliteratively stated, if you only ever query by one column, a "primary key" if you will, to retrieve the rest of the data, then using that column as a keyspace and the rest of the data as a value in a KV store is the most efficient way to do lookups.

In contrast, if you often query the data by any of several columns, aka you support a richer query API for the data, then you may want a relational database.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow