Question

Why are keys in Cassandra usually defined as UUIDs? It looks like the key is generated on the client side, so why not just store it as a string? What is the benefit of storing it specifically as a UUID?


OTHER TIPS

You can use any key with Cassandra; a key is a byte array anyway. If a client wants a key like "foobar", or any other string of arbitrary length, there is nothing wrong with that. The Cassandra client converts it into an array of bytes before sending it to the Cassandra server, and technically it will be stored as "foobar" on the server side.
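To make the "a key is just bytes" point concrete, here is a minimal Python sketch. `key_to_bytes` is a hypothetical helper, not a driver API: real drivers have their own type-specific serializers. It assumes string keys are UTF-8 encoded (as CQL `text` values are) and that UUID keys are sent in their native 16-byte form.

```python
import uuid

def key_to_bytes(key):
    """Hypothetical helper: serialize a key to the raw bytes a client would send.

    Real drivers use their own serializers, but the result is always a byte string.
    """
    if isinstance(key, bytes):
        return key
    if isinstance(key, uuid.UUID):
        return key.bytes              # always 16 raw bytes
    if isinstance(key, str):
        return key.encode("utf-8")    # variable length, one byte per ASCII char
    raise TypeError(f"unsupported key type: {type(key).__name__}")

print(key_to_bytes("foobar"))       # b'foobar'
print(len(key_to_bytes("foobar")))  # 6
print(len(key_to_bytes(uuid.UUID("f5606950-98d1-11e3-a5e2-0800200c9a66"))))  # 16
```

Either way the server only ever sees a byte string; the type only decides how many bytes it takes and how they compare.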

There are other things one needs to consider when deciding on a key format:

  • Key length has a direct impact on Cassandra performance. Keep keys as short as is reasonable while still making them useful for the required data access. A short key that is useless for data access is no better than a longer key with better get/scan properties. Expect trade-offs when designing keys. If you have long strings as keys, it might be a good idea to hash them into UUIDs.
  • Note that you can store a UUID as a human-readable string such as 'f5606950-98d1-11e3-a5e2-0800200c9a66', but a much better idea is to use the internal UUID data type, which uses just 16 bytes to store it.
  • You need to decide up front whether to use the OrderPreservingPartitioner or the RandomPartitioner. There are a number of trade-offs, but what matters most is how the choice affects key distribution across the cluster. One typically goes with the OrderPreservingPartitioner because it allows meaningful range scans, but depending on the key values it typically leads to hot and cold Cassandra nodes. To help with that, one again either uses a hash of the original key (a UUID) or prefixes the real key with some UUID.
  • How do you plan to access your keys? This ranges from a simple get, to slices, to the often-overlooked delete; people often find that a UUID is a good compromise.
  • How do you plan to load-balance your data?
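The hot/cold-node effect and the "hash long keys into UUIDs" advice can be illustrated with a toy simulation. This is a simplified sketch, not Cassandra's actual token logic: it assumes a hypothetical 4-node cluster where each node owns an equal slice of a one-byte token range. The order-preserving placement uses the key's own first byte; the hashed placement uses the first byte of an MD5 digest, in the spirit of RandomPartitioner. `hashed_key` uses Python's name-based `uuid.uuid5` (SHA-1) with an arbitrarily chosen namespace to shrink a long string key to a fixed 16 bytes.

```python
import hashlib
import uuid
from collections import Counter

NUM_NODES = 4  # hypothetical toy cluster size

def order_preserving_node(key: str) -> int:
    # Toy order-preserving placement: each node owns an equal slice of the
    # first-byte range, so lexically close keys land on the same node.
    first_byte = key.encode("utf-8")[0]
    return first_byte * NUM_NODES // 256

def hashed_node(key: str) -> int:
    # Toy hash placement: place by the MD5 digest of the key, not the key itself.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return digest[0] * NUM_NODES // 256

def hashed_key(long_key: str) -> uuid.UUID:
    # Shorten a long string key to a deterministic 16-byte name-based UUID (v5).
    # NAMESPACE_URL is an arbitrary choice for this sketch.
    return uuid.uuid5(uuid.NAMESPACE_URL, long_key)

keys = [f"user{i:04d}" for i in range(1000)]
print(Counter(order_preserving_node(k) for k in keys))  # every key on one node
print(Counter(hashed_node(k) for k in keys))            # spread across all nodes

long_key = "customer/eu-west/" + "x" * 200
print(len(hashed_key(long_key).bytes))  # 16
```

The catch is that hashing gives up ordering: once keys are placed by their hash, a range scan over the original string order is no longer possible, which is exactly the trade-off described in the list above.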

Cassandra keys can be defined as any type (or a combination thereof), so you aren't restricted to UUIDs.

But as to why you would use a UUID over a string:

A UUID is 128 bits. A string is variable length, and the hexadecimal string representation of a UUID requires 32 characters. If you were using 16-bit Unicode characters, each key would require 512 bits, or 4 times as much space.

This saves disk space when there are a large number of rows.

It can also increase performance by reducing the amount of data to fetch from disk when there is a large number of rows.
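The size arithmetic can be checked with Python's standard `uuid` module. This sketch compares the native 16-byte form with the 32-digit hex string stored at 16 bits per character (the answer's 4x case) and with the hyphenated canonical string stored as UTF-8, which is how a CQL `text` column would hold it.

```python
import uuid

u = uuid.UUID("f5606950-98d1-11e3-a5e2-0800200c9a66")

raw = u.bytes        # native binary form
hex_str = u.hex      # 32 hex digits, no hyphens
canonical = str(u)   # 36 characters including hyphens

print(len(raw) * 8)                          # 128 bits
print(len(hex_str))                          # 32 characters
print(len(hex_str.encode("utf-16-le")) * 8)  # 512 bits at 16 bits per character
print(len(canonical.encode("utf-8")) * 8)    # 288 bits as a UTF-8 string
```

So even with UTF-8 rather than 16-bit characters, the hyphenated string takes 36 bytes versus 16, more than twice the native size for every key.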

Licensed under: CC-BY-SA with attribution