Question

I'm looking to get a list of Redis keys as efficiently as possible. We can model this on the Redis server however we like, so this is as much about the proper way to do it as it is about solving the problem. Let me describe the situation.

Assume a large set of "Customers" that are stored in Redis as Strings.

customer__100000
customer__100001
customer__100002

Each customer has quite a few attributes. One of those is the city they live in. Each city is also stored in Redis.

city__New York
city__San Francisco
city__Washington DC

Through a different process I will end up with a set of customer keys (intersecting sets for a pre-filter). Once I have those keys, I need to find out which distinct cities I have within those customers. My end goal here is to get the names of the cities; however, if I can get keys from which I can pull the city names, that's fine as well.

To give an idea of the scale I'm talking about, assume we're dealing with 200-300k customers with around 70 attributes each (city being one of them), and each attribute having anywhere from 50 to 100,000 possible values. I'd like to keep it as efficient as possible.


Solution

Instead of storing customers as strings, you should store them as hashes. Redis' ziplist encoding for hashes is very space efficient, but it only applies while a hash has no more fields than the hash-max-ziplist-entries limit. If you are storing more than 70 elements per customer, check that limit in your redis.conf and consider raising it.
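As a rough illustration of the hash layout suggested above, here is a minimal Python sketch. It assumes a redis-py style client object (anything exposing `hset` and `execute_command`); the helper names `store_customer` and `hash_encoding` are made up for this example and are not part of any library.

```python
def store_customer(client, customer_id, attributes):
    """Store one customer as a Redis hash with one field per attribute."""
    key = f"customer__{customer_id}"
    # HSET with a mapping writes every field in a single command.
    client.hset(key, mapping=attributes)
    return key


def hash_encoding(client, key):
    """Ask Redis which internal encoding it picked for `key`."""
    # OBJECT ENCODING reports a compact encoding for small hashes
    # and "hashtable" once a configured size limit is exceeded.
    return client.execute_command("OBJECT", "ENCODING", key)
```

With a real server this might be called as `store_customer(redis.Redis(decode_responses=True), 100000, {"city": "New York"})`. Checking `hash_encoding` on a fully populated customer is a quick way to see whether the limit needs raising.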

You can do fun things with SORT when you are using Redis hashes. By using SORT with GET and STORE you can get all the cities from your customers and store them as a list (not distinct). You can then convert the list to a set by calling LPOP and SADD over the list.
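The SORT, LPOP and SADD steps described above could be sketched client-side roughly as follows. This assumes a redis-py style client created with `decode_responses=True`; the function name and the `tmp_list_key` parameter are hypothetical. It also assumes that customers lacking the attribute show up in the stored list as empty strings, which the loop skips.

```python
def distinct_values_via_sort(client, customer_set_key, tmp_list_key,
                             distinct_set_key, attribute="city"):
    """Collect the distinct `attribute` values of all customers in a set."""
    # SORT <set> BY nosort GET *->city STORE <tmp list>
    client.sort(customer_set_key, by="nosort",
                get=f"*->{attribute}", store=tmp_list_key)
    client.delete(distinct_set_key)
    # Drain the temporary list into a set; SADD drops the duplicates.
    while True:
        value = client.lpop(tmp_list_key)
        if value is None:  # list is empty
            break
        if value:  # skip empty entries (customers lacking the attribute)
            client.sadd(distinct_set_key, value)
    return client.smembers(distinct_set_key)
```

Note that this issues one LPOP and one SADD round trip per customer, so for a few hundred thousand customers it is likely to be much slower than doing the same loop server-side in a script.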

Here is an example Redis Lua script:

-- a key which holds a set of customer keys
local set_of_customer_keys = KEYS[1]
-- a maybe-existing key which will hold the set of cities
local distinct_set = ARGV[1]
-- attribute to get (defaults to city)
local attribute = ARGV[2] or 'city'
-- remove the current set of distinct cities
redis.call("DEL", distinct_set)
-- use SORT to build a list out of customer hash values for `attribute`
local cities = redis.call("SORT", set_of_customer_keys, "BY", "nosort", "GET", "*->"..attribute)
-- loop through all cities in the list and add them to the distinct cities set
for _, city in ipairs(cities) do
  -- a customer without the attribute comes back as false (nil reply); skip it
  if city then
    redis.call("SADD", distinct_set, city)
  end
end
-- return the distinct cities
return redis.call("SMEMBERS", distinct_set)
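To show how the script's `KEYS` and `ARGV` line up with an actual call, here is a small sketch that assumes a redis-py style client and that the Lua source above is available as a string. The wrapper name `run_distinct_script` is hypothetical.

```python
def run_distinct_script(client, script_source, customer_set_key,
                        distinct_set_key, attribute="city"):
    """Run the Lua script above and return the distinct values it reports."""
    # EVAL <script> <numkeys> KEYS[1] ARGV[1] ARGV[2]
    # numkeys is 1 because only the customer set is passed as a key;
    # the destination set name and the attribute travel as plain arguments.
    return client.eval(script_source, 1, customer_set_key,
                       distinct_set_key, attribute)
```

Because the destination set and the customer hashes are not declared in `KEYS`, this layout is probably best suited to a single Redis instance rather than a cluster.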

You could also keep a customer__100000__cities set that is stored permanently along with the customer's attributes and use sunion *customer_cities_keys to get a distinct set of cities (a union, since an intersection would only return cities shared by every customer), but that would be less memory efficient.
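A minimal sketch of this per-customer set alternative, assuming a redis-py style client and the `customer__<id>__cities` naming shown above. It uses SUNION, because the union of the per-customer sets is what yields the distinct cities; the function name is hypothetical.

```python
def distinct_cities_from_sets(client, customer_keys):
    """Union the per-customer city sets into one distinct set of cities."""
    city_set_keys = [f"{key}__cities" for key in customer_keys]
    if not city_set_keys:
        return set()  # SUNION needs at least one key
    return client.sunion(city_set_keys)
```

For example, `distinct_cities_from_sets(client, ["customer__100000", "customer__100001"])` would read `customer__100000__cities` and `customer__100001__cities`. With hundreds of thousands of keys it may be worth sending them in batches and merging the results client-side.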

Licensed under: CC-BY-SA with attribution