Cassandra supercolumn data: from one partition or multiple?

https://stackoverflow.com/questions/8360686

27-10-2019

Question

Assume I have a supercolumn family, and that I have multiple partitions running on different machine instances. For one row, my supercolumn family data looks like this:

RowKey: 4818d991-9df5-4899-aa07-461f4ed19996
=> (super_column=4dddb83e-4096-428d-8d1b-8b0235ae772f,
     (column=1322847333862, value=, timestamp=1322847333863001)
     (column=1322847637237, value=, timestamp=1322847637237000)
     (column=1322847837206, value=, timestamp=1322847837206001)
     (column=1322848197819, value=, timestamp=1322848197819000))

Now I am wondering: if I query the supercolumn family for the data (the sub-columns) of a given {row, super_column}, will all of those sub-column values come back from one machine or from different machines? Basically, the question is whether partitioning happens at the row level, the super_column level, or the sub-column level. And even if everything comes back from one machine, will the sub-columns be returned in the order shown above?

Solution

Partitioning is done at the row level, i.e. the entire row is stored on a single machine (possibly with copies on other machines, depending on your replication factor).
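To illustrate why a row is never split across machines, here is a minimal Python sketch of token-based placement. It assumes a hypothetical four-node ring with evenly spaced tokens, an MD5-derived token loosely modelled on RandomPartitioner, and SimpleStrategy-style replication to the next nodes clockwise. The node names and token layout are made up, and Cassandra's real token computation differs in detail. What it shows: only the row key feeds into the token, so super column and sub-column names cannot influence which node stores the data.

```python
import bisect
import hashlib

# Hypothetical 4-node ring; each node owns the range ending at its token.
RING_SIZE = 2 ** 127
NODES = ["node-a", "node-b", "node-c", "node-d"]
TOKENS = [i * RING_SIZE // len(NODES) for i in range(len(NODES))]


def token_for(row_key: str) -> int:
    """MD5-based token, loosely modelled on RandomPartitioner (assumption)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(row_key: str, replication_factor: int = 2) -> list[str]:
    """Walk the ring clockwise from the key's token, SimpleStrategy-style."""
    token = token_for(row_key)
    # First node whose token is >= the key's token, wrapping around to index 0.
    start = bisect.bisect_left(TOKENS, token) % len(TOKENS)
    return [NODES[(start + i) % len(NODES)] for i in range(replication_factor)]


row_key = "4818d991-9df5-4899-aa07-461f4ed19996"
# Every super column and sub-column in this row maps to the same replicas,
# because the placement function never looks at column names.
print(replicas_for(row_key))
```

Raising `replication_factor` adds copies on neighbouring nodes, but each copy still holds the whole row.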

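Sub-columns are stored in sorted order according to their column names. A super column family can specify a comparator for the super column names as well as a subcomparator for the sub-column names, and a slice query returns sub-columns in that sorted order (or reversed, if the query asks for it). See http://www.datastax.com/docs/0.8/ddl/column_family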
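The effect of the subcomparator can be sketched with a small in-memory model. This is a minimal illustration, not how Cassandra's storage engine is implemented: the `SuperColumn` class is hypothetical, and `long_type`/`utf8_type` are assumed stand-ins for the LongType and UTF8Type comparators (numeric versus text ordering). Sub-columns are kept sorted on insert, so reads come back in comparator order regardless of write order.

```python
import bisect
from typing import Any, Callable, List, Tuple


class SuperColumn:
    """Hypothetical in-memory super column that keeps its sub-columns sorted.

    `subcomparator` maps a sub-column name to a sort key, standing in for the
    column family's subcomparator. Duplicate names are not reconciled by
    timestamp here, unlike real Cassandra.
    """

    def __init__(self, name: str, subcomparator: Callable[[str], Any]) -> None:
        self.name = name
        self.subcomparator = subcomparator
        # Kept sorted as (sort_key, column_name, value, timestamp).
        self._columns: List[Tuple[Any, str, bytes, int]] = []

    def insert(self, column: str, value: bytes, timestamp: int) -> None:
        # Insert at the position the comparator dictates, so the list stays sorted.
        bisect.insort(self._columns, (self.subcomparator(column), column, value, timestamp))

    def column_names(self) -> List[str]:
        return [name for _, name, _, _ in self._columns]


# Assumed stand-ins for LongType (numeric order) and UTF8Type (text order).
long_type = int
utf8_type = str

sc = SuperColumn("4dddb83e-4096-428d-8d1b-8b0235ae772f", long_type)
# Insert the sub-columns from the question out of order.
for name, ts in [
    ("1322848197819", 1322848197819000),
    ("1322847333862", 1322847333863001),
    ("1322847837206", 1322847837206001),
    ("1322847637237", 1322847637237000),
]:
    sc.insert(name, b"", ts)
print(sc.column_names())
# ['1322847333862', '1322847637237', '1322847837206', '1322848197819']

# The choice of comparator matters once names differ in length.
numeric = SuperColumn("numeric", long_type)
text = SuperColumn("text", utf8_type)
for name in ["100", "9", "10"]:
    numeric.insert(name, b"", 0)
    text.insert(name, b"", 0)
print(numeric.column_names())  # ['9', '10', '100']
print(text.column_names())     # ['10', '100', '9']
```

For the 13-digit names in the question, numeric and text ordering happen to agree; for names of varying length, only a numeric comparator gives chronological order.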

Licensed under: CC-BY-SA with attribution
