Question

Trying to define some policy for keys in a key-value store (we are using Redis). The keyspace should be:

  • Shardable (can introduce more servers and spread out the keyspace between them)

  • Namespaced (there should be some mechanism to "group" keys together logically, for example by domain or associated concepts)

  • Efficient (use as little space as possible in the DB for keys, to leave as much room as possible for data)

  • As collision-less as possible (avoid two different objects ending up with the same key)


Two alternatives that I have considered are these:

  1. Use prefixes for namespaces, separated by some character (like human_resources:person:<some_id>). The upside of this is that it is pretty scalable and easy to understand. The downside would be possible conflicts depending on the separator (what if an id has the character : in it?), and possibly size efficiency (too many nested namespaces might create very long keys).

  2. Use some data structure (like a Sorted Set or Hash) to store the namespaces. The main drawback to this would be loss of "shardability", since the structure storing the namespaces would need to live in a single database.

Question: What would be a good way to manage a keyspace in a sharded setup? Should we use one of these alternatives, or is there some other, better pattern that we have not considered?

Thanks very much!


Solution

The generally accepted convention in the Redis world is option 1 - i.e. namespaces separated by a character such as a colon. That said, the namespaces are almost always one level deep. For example, person:12321 instead of human_resources:person:12321.

How does this work with the 4 guidelines you set?

Shardable - This approach is shardable. Each key can go to a different shard or to the same shard, depending on how you set it up.
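
For illustration, here is a minimal Python sketch of client-side sharding, where the key string itself decides which server it lives on. The shard addresses are made up for the example, and the hash is a plain CRC32 rather than the CRC16 hash slots Redis Cluster uses internally; the point is only that flat string keys shard naturally.

    import zlib

    # Hypothetical shard addresses - replace with your real servers.
    SHARDS = ["redis://shard0:6379", "redis://shard1:6379", "redis://shard2:6379"]

    def shard_for(key: str) -> str:
        # Hash the full key and map it onto one of the shards deterministically.
        return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]

    print(shard_for("person:12321"))  # always routes to the same shard
    print(shard_for("person:43432"))  # may route to a different shard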

Namespaced - Namespaces as a way to avoid collisions work with this approach. However, namespaces as a way to group keys don't work out. In general, using keys as a way to group data is a bad idea. For example, what if a person moves from one department to another? If you change the key, you will have to update all references - and that gets tricky.

It's best to ensure the key never changes for an object. Grouping can then be handled externally by creating a separate index.

For example, let's say you want to group people by department, by salary range, or by location. Here's how you'd do it (a code sketch follows the list):

  1. Individual people go into separate hashes, with keys like persons:12321
  2. Create a set for each grouping - for example, persons_by:department - and store only the numeric identifiers of each person in that set, e.g. [12321, 43432]. This way you get the advantages of Redis' integer set encoding.
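
As a rough sketch of this pattern with the redis-py client (assuming a Redis server on localhost; the concrete department key persons_by:department:engineering and the sample field values are just illustrative):

    import redis  # redis-py; assumes a Redis server on localhost:6379

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # 1. Each person lives in their own hash; this key never changes.
    r.hset("persons:12321", mapping={"name": "Alice", "department": "engineering"})
    r.hset("persons:43432", mapping={"name": "Bob", "department": "engineering"})

    # 2. Grouping is a separate index: a set per group holding only numeric ids.
    r.sadd("persons_by:department:engineering", 12321, 43432)

    # Moving a person to another department only touches the index sets,
    # never the person's own key.
    r.smove("persons_by:department:engineering", "persons_by:department:sales", 43432)
    r.hset("persons:43432", "department", "sales")

    print(r.smembers("persons_by:department:engineering"))  # {'12321'}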

Efficient - The method explained above is pretty efficient memory-wise. To save some more memory, you can compress the keys further on the application side; for example, you can store p:12321 instead of persons:12321. You should do this only if you have determined via profiling that you need such memory savings. In general, it isn't worth the cost.

Collision free - This depends on your application. Each user or person should have a primary key that never changes. Use it in your Redis key, and you won't have collisions.

You mentioned two problems with this approach, and I will try to address them:

What if the id has a colon?

It is of course possible, but your application's design should prevent it. It's best not to allow special characters in identifiers, because they will be used across multiple systems. For example, the identifier will very likely be part of a URL, and the colon is a reserved character even in URLs.

If you really must allow special characters in your identifier, you would have to write a small wrapper in your code that encodes the special characters. URL encoding is perfectly capable of handling this.
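
In Python, such a wrapper could simply percent-encode the identifier before embedding it in the key (the persons: prefix and sample ids below are just illustrative):

    from urllib.parse import quote

    def person_key(raw_id: str) -> str:
        # Percent-encode everything outside the "safe" set, including ':',
        # so the identifier can never clash with the namespace separator.
        return "persons:" + quote(raw_id, safe="")

    print(person_key("12321"))      # persons:12321
    print(person_key("legacy:42"))  # persons:legacy%3A42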

Size Efficiency

There is a cost to long keys, but it isn't significant. In general, you should worry about the data size of your values rather than of your keys. If you think keys are consuming too much memory, profile the database with a tool like redis-rdb-tools.

If you do determine that key size is a problem and want to save memory, you can write a small wrapper that rewrites the keys using an alias.
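
A sketch of such a wrapper in Python, with made-up alias mappings: readable namespaces stay in application code, and only the short prefixes are ever sent to Redis.

    # Illustrative alias table: long, readable prefix -> short storage prefix.
    ALIASES = {"persons:": "p:", "persons_by:department:": "pbd:"}

    def to_storage_key(key: str) -> str:
        # Shorten a readable key before writing it to Redis.
        for long_prefix, short_prefix in ALIASES.items():
            if key.startswith(long_prefix):
                return short_prefix + key[len(long_prefix):]
        return key

    def from_storage_key(key: str) -> str:
        # Expand a stored key back to its readable form.
        for long_prefix, short_prefix in ALIASES.items():
            if key.startswith(short_prefix):
                return long_prefix + key[len(short_prefix):]
        return key

    print(to_storage_key("persons:12321"))  # p:12321
    print(from_storage_key("p:12321"))      # persons:12321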
