DynamoDB partitioning with a numeric hash key. Does this key scheme keep uniform data access?

StackOverflow https://stackoverflow.com/questions/22831687

Question

The documentation for DynamoDB is reasonably clear on how to spread data evenly across partitions by managing your hash/range key naming scheme.

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.UniformWorkload

Because of this, I usually use unique alphanumeric hash keys. In this instance, however, the size of the key itself matters a great deal, since the hash key chosen in DynamoDB will be replicated over and over in various streams in Redis.

We therefore need a key that suits both DynamoDB, from a data access/partitioning point of view, and Redis, from a pure key-size point of view.

With this in mind, we have decided to keep an incrementing counter in Redis and use a single NUMBER hash key for DynamoDB items, incrementing the Redis counter each time we insert a new item into DynamoDB.

These integer keys compress very nicely in Redis and, in our testing, yield storage space improvements in excess of 300-400% over unique string-based IDs (since these IDs could potentially be pushed into hundreds of streams, all stored in Redis lists/zsets).

As I understand it, though, this is not good for DynamoDB, since a single incrementing hash key such as:

101
102
103
104

etc...

would be slow on writes when inserting multiple items, and given our access pattern we would expect groups of these keys to be retrieved together.

To work around this, we are thinking of concatenating a random number onto the end of the hash key:

(float)$itemId . '.' . mt_rand(0, 200)

This results in keys like:

101.26
102.199
103.87
104.5
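
For illustration, here is a minimal Python sketch of the same key scheme as the PHP expression above. The helper names `make_hash_key` and `item_id_of` are hypothetical, and using `Decimal` instead of a float is an assumption made here so the value is stored exactly. It shows that the integer part still carries the insertion order, whatever suffix is drawn.

```python
import random
from decimal import Decimal

def make_hash_key(item_id: int, max_suffix: int = 200) -> Decimal:
    """Build '<item_id>.<random suffix>' as an exact decimal, e.g. 101.26."""
    suffix = random.randint(0, max_suffix)  # inclusive bounds, like mt_rand(0, 200)
    return Decimal(f"{item_id}.{suffix}")

def item_id_of(hash_key: Decimal) -> int:
    """Recover the counter value (the integer part) from a key."""
    return int(hash_key)

# Counter values 101..104, as in the question.
keys = [make_hash_key(i) for i in range(101, 105)]

# Sorting numerically keeps insertion order because the suffix is always < 1.
ordered = sorted(keys)
```

Note that the suffix itself is not recoverable from the number (`.2`, `.20` and `.200` are numerically equal), which only matters if something needs to read the suffix back.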

Using these keys, we would still get the storage improvements in Redis, and we would also preserve the insertion order (meaning that we don't need to store a timestamp).

However, I am not completely clear on how DynamoDB would manage and partition these.

So my question is: would single hash keys as shown above be optimal, encourage DynamoDB to partition our table effectively, and ultimately allow us to meet our throughput allocations?

Thanks in advance.


Solution

DynamoDB access speed depends on key access patterns, not just on whether the keys are random.

Incrementing keys are fine, IF you are sure that, say, 101 is accessed as frequently as 102 or 104. On the other hand, if you expect 103 to be accessed a lot more than the others, that causes problems, and you will then have to spread the access to 103 across multiple keys by appending a random suffix.
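
A minimal sketch of that spreading step, assuming a hypothetical shard count of 10 and string keys of the form `103#7`. Both are illustrative choices, not something DynamoDB prescribes, and the function names are made up for this example. Writes pick one sub-key at random, while a reader has to fetch every sub-key and merge the results.

```python
import random
from typing import List

NUM_SHARDS = 10  # assumption: tune to how hot the key actually is

def sharded_write_key(hot_key: str, num_shards: int = NUM_SHARDS) -> str:
    """Pick one of num_shards sub-keys at random for a write."""
    return f"{hot_key}#{random.randrange(num_shards)}"

def all_read_keys(hot_key: str, num_shards: int = NUM_SHARDS) -> List[str]:
    """Every sub-key a reader must fetch to see all data for hot_key."""
    return [f"{hot_key}#{shard}" for shard in range(num_shards)]

write_key = sharded_write_key("103")
read_keys = all_read_keys("103")
```

The trade-off is that reads become more expensive (one request per sub-key), so this is only worth doing for keys that are genuinely hot.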

Quoting the DynamoDB documentation:

For example, if a table has a very small number of heavily accessed hash key elements, possibly even a single very heavily used hash key element, traffic is concentrated on a small number of partitions – potentially only one partition.

To get the most out of DynamoDB throughput, build tables where the hash key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.
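
The point can be illustrated with a small simulation. DynamoDB's internal hash function and partition count are not public, so this sketch assumes SHA-256 modulo 4 partitions as a stand-in, and `partition_for` and `load_per_partition` are hypothetical names. It compares sequential keys requested uniformly against the same number of requests concentrated on one key.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # assumption: real partition counts are managed by DynamoDB

def partition_for(hash_key: str) -> int:
    """Stand-in for DynamoDB's internal hash: SHA-256 of the key, modulo partitions."""
    digest = hashlib.sha256(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

def load_per_partition(requests):
    """Count how many requests land on each partition."""
    return Counter(partition_for(key) for key in requests)

# 1000 sequential keys, each requested once.
uniform = load_per_partition(str(i) for i in range(101, 1101))

# 1000 requests again, but 900 of them hit the single key 103.
skewed = load_per_partition(["103"] * 900 + [str(i) for i in range(101, 201)])

print("uniform:", sorted(uniform.items()))
print("skewed: ", sorted(skewed.items()))
```

Under these assumptions the sequential keys spread across partitions because the hash scrambles them, while the skewed workload piles onto whichever partition holds key 103, regardless of how the keys were numbered.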

Licensed under: CC-BY-SA with attribution