Cassandra accepts any key, since a key is just a byte array anyway. If a client wants a key like "foobar", or any other string of arbitrary length, there is nothing wrong with that. The Cassandra client converts it into an array of bytes before sending it to the Cassandra server. On the server side it is stored as the bytes of "foobar".
There are other things one needs to consider when deciding on a key format:
- Key length has a direct impact on Cassandra performance. Keep keys as short as is reasonable while still being useful for the data access you need. A short key that is useless for data access is no better than a longer key with better get/scan properties. Expect trade-offs when designing keys. If you have long strings as keys, it might be a good idea to hash them into UUIDs.
- Note that you can store a UUID as a human-readable string such as 'f5606950-98d1-11e3-a5e2-0800200c9a66' (36 characters), but a much better idea is to use the native UUID datatype, which needs only 16 bytes to store it.
- You need to decide upfront whether to use the OrderPreservingPartitioner or the RandomPartitioner. There are a number of trade-offs, but the most important one is how the choice affects key distribution across the cluster. One typically goes with the OrderPreservingPartitioner because it allows meaningful range scans, but depending on the key values it typically leads to hot and cold Cassandra nodes. To help with that, one again either uses a hash of the original key (a UUID) or prepends a UUID to the real key.
- How do you plan to access your keys? This ranges from a simple get, to slice, to the often ignored delete. People often find that a UUID is a good compromise.
- How do you plan to load-balance your data?
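
As a rough illustration of the hashing and storage points above, here is a minimal Python sketch using only the standard `uuid` module. It does not talk to Cassandra at all. The namespace name `keys.example.com` and the helper names `key_to_uuid` and `prefixed_key` are made up for this example, and a name-based (version 5) UUID is just one possible way of hashing a long key.

```python
import uuid

# Hypothetical namespace for this application's keys (an assumption for the
# example); any fixed UUID works as long as every client uses the same one.
KEY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "keys.example.com")


def key_to_uuid(long_key: str) -> uuid.UUID:
    """Deterministically hash an arbitrary string into a version 5 UUID."""
    return uuid.uuid5(KEY_NAMESPACE, long_key)


def prefixed_key(real_key: str) -> bytes:
    """Prepend the 16-byte hash to the real key.

    The prefix scatters keys that would otherwise sort next to each other,
    while the real key stays readable in the tail of the byte array.
    """
    return key_to_uuid(real_key).bytes + real_key.encode("utf-8")


long_key = "customer/2014/02/18/order/" + "x" * 200
u = key_to_uuid(long_key)

text_form = str(u)     # human-readable form, e.g. for logs
binary_form = u.bytes  # compact form, what a native UUID type stores

print(len(long_key), len(text_form), len(binary_form))  # 226 36 16
```

The same input always produces the same UUID, so a client can recompute the key for a get or delete. The price is that hashed keys no longer sort in any meaningful order, so range scans over them stop being useful.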