Question

I am trying to load data from SQL into a NoSQL database (Cassandra), but a few rows do not match. How can I count the number of row keys in a particular column family in Cassandra?

I tried get_count and get_multicount, but both methods require the keys to be passed in, and in my case I do not know the keys. What I need is the number of row keys. "list column_family_name" gives me the rows, but only the first 100. Is there any way to override that 100-row limit?


Solution

As far as I know, there is no way to get a row count for a column family. You have to perform a range query over the whole column family instead.

If cf is your column family, something like this should work:

num_rows = sum(1 for _ in cf.get_range())  # count rows as they stream in instead of building a list of every row

However, the documentation for get_range notes that this can cause problems when the column family has many rows. In that case you may have to fetch the rows in chunks, using the start and row_count parameters.
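A minimal sketch of that chunked approach. It assumes a hypothetical fetch_page(start, row_count) callable that wraps cf.get_range and returns at most row_count (key, columns) pairs, starting at key start inclusive. Because the start key is included, every page after the first repeats the previous page's last key, so that row is not counted twice.

```python
def count_rows_in_chunks(fetch_page, page_size=1000):
    """Count row keys by paging through a full range scan.

    `fetch_page(start, row_count)` is a hypothetical wrapper that must return at
    most `row_count` (key, columns) pairs in range-scan order, beginning at key
    `start` inclusive; an empty `start` means the beginning of the column family.
    """
    if page_size < 2:
        # With a page size of 1 the overlapping start key would never advance.
        raise ValueError("page_size must be at least 2")
    total = 0
    start = ""
    first_page = True
    while True:
        page = list(fetch_page(start, page_size))
        # Every page after the first begins with the previous page's last key,
        # because `start` is inclusive, so drop that duplicate before counting.
        total += len(page) if first_page else max(len(page) - 1, 0)
        if len(page) < page_size:
            # A short page means the scan has reached the end of the data.
            return total
        start = page[-1][0]
        first_page = False
```

With pycassa the wrapper might look like lambda start, n: cf.get_range(start=start, row_count=n), using the start and row_count parameters mentioned above; check them against your client version. If your client silently drops empty rows from a page, a short page no longer proves the scan is finished, so verify that behavior too.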

OTHER TIPS

You can count Cassandra rows without transferring every row to the client.

See the implementation of cassandraCount() in the DataStax Spark Cassandra Connector, which does this efficiently by pushing the count down to Cassandra.
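The same idea can be sketched in Python: split the token ring into ranges and let Cassandra run a COUNT(*) for each range, so only one number per range comes back. This is an illustration of the technique, not the connector's actual code. It assumes the Murmur3Partitioner token range, a DataStax python-driver Session, and CQL access to the data; token_ranges, count_rows_by_token_range, and the table and key_column arguments are hypothetical names.

```python
# Murmur3Partitioner token bounds (assumed partitioner; RandomPartitioner differs).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def token_ranges(splits):
    """Split the ring (MIN_TOKEN, MAX_TOKEN] into `splits` contiguous (lo, hi] ranges."""
    if splits < 1:
        raise ValueError("splits must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // splits for i in range(splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def count_rows_by_token_range(session, table, key_column, splits=64):
    """Sum server-side COUNT(*) results over every token range.

    `session` is assumed to be a DataStax python-driver Session; `table` and
    `key_column` are placeholders for your own table and partition key.
    """
    query = (
        f"SELECT count(*) FROM {table} "
        f"WHERE token({key_column}) > %s AND token({key_column}) <= %s"
    )
    total = 0
    for lo, hi in token_ranges(splits):
        # Each query counts only rows whose partition token falls in (lo, hi],
        # so a single number per range travels back to the client.
        total += session.execute(query, (lo, hi)).one()[0]
    return total
```

Splitting into many small ranges keeps each server-side count short enough to avoid read timeouts. Note that COUNT(*) counts CQL rows: for a table with clustering columns that is more than the number of partition (row) keys.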

Licensed under: CC-BY-SA with attribution