Is storing 86k super columns (with 1-10 small columns each) per row a good idea in Cassandra?

StackOverflow https://stackoverflow.com/questions/8957430

Problem

tl;dr: Are ~90,000 super columns with 1 to 10 columns each too many in one row? How about ~1,500? Column values are about 6 bytes each.

full question:

I am researching various data stores for time series data. Column-oriented databases such as Cassandra and HBase look to be a very good fit.

The requirement is to store millions of series of data coming in at (minimum) a 1-minute interval. Ideally we would be able to support a 1-second interval if the business needs demand it (they probably will).

The advice offered in this blog post, which is also the approach used by OpenTSDB, makes a ton of sense.

Essentially, keys are the series id concatenated with the first timestamp of the day, and a column is created for each measurement in the day. At a 1-second interval that is about 86,400 columns per row.
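To make that layout concrete, here is a minimal Python sketch of the row structure, using plain dictionaries to stand in for a Cassandra row; the series id, key format, and value shown are made-up examples, not part of the original post:

```python
from datetime import datetime, timezone

def row_key(series_id, ts):
    """Row key: series id concatenated with the first timestamp of the day."""
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return f"{series_id}:{int(day_start.timestamp())}"

def column_name(ts):
    """Column name: offset (in seconds) of the measurement within its day."""
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return int((ts - day_start).total_seconds())

# One row per series per day; one column per measurement.
rows = {}
now = datetime(2012, 1, 21, 14, 30, 5, tzinfo=timezone.utc)
key = row_key("cpu.load.host42", now)                # e.g. "cpu.load.host42:1327104000"
rows.setdefault(key, {})[column_name(now)] = 0.73    # small (~6-byte) value per column

print(key, rows[key])
```

With one measurement per second, a full day fills roughly 86,400 such columns under a single key.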

However, immutability/versioning of the data is quite important. Business needs dictate the ability to update series values while retaining the full history of the data.

Exploring Cassandra's super columns to provide another dimension for versioning the values results in 86,400 super columns per row. Each super column would contain one column when the value is first created (keyed, perhaps, by a TimeUUID), with one more column added on each "update". Updates will occur regularly, but only to limited subsets of series and values; under ideal conditions there will be no updates at all. This means each super column should not have a huge amount of data to load, and most access will be only to the most recent value.
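Here is a similar sketch of the versioned variant, again using plain Python structures rather than a live cluster. A version-1 UUID stands in for Cassandra's TimeUUID, and the `write`/`latest` helpers are hypothetical names introduced only for illustration:

```python
import uuid
from datetime import datetime, timezone

# row -> { second_of_day (super column) -> { time-based UUID (column) -> value } }
row = {}

def write(row, ts, value):
    """Append a new version of the measurement at `ts` under its super column."""
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    second_of_day = int((ts - day_start).total_seconds())
    versions = row.setdefault(second_of_day, {})
    versions[uuid.uuid1()] = value          # uuid1 is time-based, like a TimeUUID
    return versions

def latest(row, second_of_day):
    """Most recent version wins: the column with the highest UUID timestamp."""
    versions = row[second_of_day]
    newest = max(versions, key=lambda u: u.time)
    return versions[newest]

ts = datetime(2012, 1, 21, 14, 30, 5, tzinfo=timezone.utc)
write(row, ts, 0.73)       # initial value: super column holds one column
write(row, ts, 0.75)       # later "update": a second column, history retained
print(latest(row, 52205))  # -> 0.75
```

In the common case each super column holds a single version, so reading the latest value touches only one small column while the full history stays available alongside it.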

So to come back to the question:

Is there a performance hit or issue I am overlooking when using that many (86k) super columns per row?

There is no correct solution.

License: CC-BY-SA with attribution. Not affiliated with StackOverflow.