Question

I'm writing a file of 5M user profiles into Cassandra. The write operation finished successfully. Now I want to count the number of rows in my column family.

// Serializer for row keys and column names
StringSerializer se = StringSerializer.get();
Keyspace keyspaceOperator = HFactory.createKeyspace(KEY_SPACE, cluster);
CqlQuery<String,String,Long> cqlQuery = new CqlQuery<String,String,Long>(keyspaceOperator, se, se, new LongSerializer());
cqlQuery.setQuery("SELECT COUNT(*) FROM up");
QueryResult<CqlRows<String,String,Long>> result = cqlQuery.execute();
System.out.println(result.get().getAsCount());

But this code always prints 10000. What am I doing wrong? And how can I run this operation from the CLI?


Solution

You can't for now. There's a default limit of 10K rows per query. There's an open ticket for this (CASSANDRA-3702) but no fix as of yet.

OTHER TIPS

The only other alternative is to iterate via RangeSlicesQuery. I created a "census" program to count both rows and total columns; here's a version for long types. But if this is a frequent activity, the conventional wisdom seems to be to use a separate counter column to keep track; some discussion here.
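The paging logic behind such a count can be sketched as follows. This is a minimal illustration, not the census program mentioned above: PageFetcher is a hypothetical stand-in for one RangeSlicesQuery call (fetch up to N row keys starting at a given key, inclusive), and the in-memory fetcher exists only so the sketch runs without a cluster. The point it shows is that each page after the first starts with the last key of the previous page, so that key must not be counted twice.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Sketch only: PageFetcher is a hypothetical stand-in for a range-slice call.
final class RowCensus {

    /** Returns up to `limit` row keys starting at `startKey` (inclusive); null means "from the start". */
    interface PageFetcher {
        List<String> fetch(String startKey, int limit);
    }

    /** Counts rows by walking the key range one page at a time. */
    static long countRows(PageFetcher fetcher, int pageSize) {
        if (pageSize < 2) {
            // With a page size of 1 every page would just repeat the start key.
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        long count = 0;
        String start = null;
        while (true) {
            List<String> page = fetcher.fetch(start, pageSize);
            int fresh = page.size();
            // After the first page, the first key repeats the previous page's last key.
            if (start != null && !page.isEmpty() && page.get(0).equals(start)) {
                fresh--;
            }
            count += fresh;
            if (page.size() < pageSize) {
                return count; // short page: the range is exhausted
            }
            start = page.get(page.size() - 1);
        }
    }

    /** In-memory fetcher used only to exercise the paging logic. */
    static PageFetcher inMemory(NavigableSet<String> keys) {
        return (startKey, limit) -> {
            NavigableSet<String> view = (startKey == null) ? keys : keys.tailSet(startKey, true);
            return view.stream().limit(limit).collect(Collectors.toList());
        };
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>();
        for (int i = 0; i < 25; i++) {
            keys.add(String.format("user%03d", i));
        }
        System.out.println(countRows(inMemory(keys), 10));
    }
}
```

With a real cluster, the fetcher would wrap the range query and return the keys of the rows it gets back; the counting loop stays the same.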

You simply need to specify a limit at least as large as the count you expect. If you never expect the count to exceed 1e9, then use:

SELECT COUNT(*) FROM up LIMIT 1000000000;

But be aware that COUNT (and RangeSlicesQuery too) are not at all performant, or even meant to be. They're essentially the same as a "sequential scan" in relational db parlance. A counter is a better way to address this sort of problem in a distributed system.
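The counter idea can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the map stands in for the "up" column family and the LongAdder for a Cassandra counter column, and the class and method names are hypothetical. It shows the shape of the approach (bump a counter as part of each write, then read the counter instead of scanning), not a client API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch only: an in-memory stand-in for a column family plus a counter column.
final class CountedProfileStore {
    private final Map<String, String> profiles = new ConcurrentHashMap<>();
    private final LongAdder rowCount = new LongAdder();

    /** Writes a profile and increments the counter when the key is new. */
    void put(String userId, String profile) {
        // The map can tell a new key from an overwrite. A Cassandra write is an
        // upsert and cannot do so without a prior read, so against a real cluster
        // the counter is only accurate if each profile is written exactly once.
        if (profiles.put(userId, profile) == null) {
            rowCount.increment();
        }
    }

    /** Reads the running total without scanning any rows. */
    long count() {
        return rowCount.sum();
    }

    public static void main(String[] args) {
        CountedProfileStore store = new CountedProfileStore();
        store.put("alice", "profile-a");
        store.put("bob", "profile-b");
        store.put("alice", "profile-a2"); // overwrite, not a new row
        System.out.println(store.count());
    }
}
```

Reading the count is then a single lookup rather than a scan over every row.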

Please refer here for an example that does this.

You can freely use the code. Please note that Astyanax has been branched out of Hector and we are finding that it is a very good Cassandra client in Java.

Licensed under: CC-BY-SA with attribution