Question

I have a requirement to update all users with a specific value in a job.

I have millions of users in my Cassandra database. Is it okay to query all of them first and then do some kind of batch update, or is there an existing implementation for this kind of work? I am using the Hector API to interact with Cassandra. What is the best way to do this?

Solution

You never want to fetch 1 million users and keep them all in memory locally. Instead, iterate over the user row keys with a range query, which Hector exposes as RangeSlicesQuery. There is a good example here:

http://irfannagoo.wordpress.com/2013/02/27/hector-slice-query-options-with-cassandra/

Use null for both the start and end key, and call rangeQuery.setRowCount(100) to fetch 100 rows at a time.

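Do this inside a loop. The first query uses null for both the start and end key; the last key in each result set becomes the start key of the next query. Because the start key is inclusive, that row comes back again as the first row of the next page, so skip it. Keep paginating until a query returns fewer rows than the row count.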
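
The paging loop can be sketched without a running cluster. The example below is only an in-memory illustration: a sorted map stands in for the column family, and KeyPager, fetchPage and forEachKey are hypothetical names, not Hector classes or methods. It assumes each query returns keys in a stable order and treats the start key as inclusive, which is why every page after the first skips its first key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.function.Consumer;

// In-memory sketch of key pagination; all names here are hypothetical.
final class KeyPager {

    // Stand-in for one range query: returns up to rowCount keys in order,
    // starting AT startKey (inclusive). A null startKey means "from the beginning".
    static List<String> fetchPage(NavigableMap<String, String> table, String startKey, int rowCount) {
        NavigableMap<String, String> tail =
                (startKey == null) ? table : table.tailMap(startKey, true);
        List<String> keys = new ArrayList<>();
        for (String key : tail.keySet()) {
            if (keys.size() == rowCount) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }

    // Passes every key to the visitor exactly once, fetching one page at a time.
    static void forEachKey(NavigableMap<String, String> table, int rowCount, Consumer<String> visitor) {
        if (rowCount < 2) {
            // With a page size of 1 the inclusive start key would be re-fetched forever.
            throw new IllegalArgumentException("rowCount must be at least 2");
        }
        String startKey = null;
        while (true) {
            List<String> page = fetchPage(table, startKey, rowCount);
            // Every page after the first begins with the key that ended the previous page.
            int firstNew = (startKey == null) ? 0 : 1;
            for (int i = firstNew; i < page.size(); i++) {
                visitor.accept(page.get(i));
            }
            if (page.size() < rowCount) {
                break; // a short page means the end of the key range was reached
            }
            startKey = page.get(page.size() - 1);
        }
    }
}
```

With Hector, fetchPage would presumably be replaced by a RangeSlicesQuery configured with the start key and row count; the loop around it stays the same.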

You can then collect the updates for each page in a batch mutation and send them together, rather than issuing one write per user.

http://hector-client.github.io/hector/source/content/API/core/1.0-1/me/prettyprint/cassandra/service/BatchMutation.html
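
The batching idea can be sketched the same way. This is an in-memory illustration only: BatchedUpdater is a hypothetical class, a plain map stands in for the column family, and the batch size is an arbitrary choice. Row keys are queued, and the whole batch is written once it reaches the configured size, which is the role Hector's Mutator plays with addInsertion (queue) and execute (send).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// In-memory sketch of batched updates; BatchedUpdater is a hypothetical name.
final class BatchedUpdater {
    private final Map<String, String> table; // stand-in: row key -> column value
    private final String newValue;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private int flushCount = 0;

    BatchedUpdater(Map<String, String> table, String newValue, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.table = table;
        this.newValue = newValue;
        this.batchSize = batchSize;
    }

    // Queues one row for update and writes the batch once it is full.
    void add(String rowKey) {
        pending.add(rowKey);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Writes all queued updates in one go; does nothing if the queue is empty.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        for (String rowKey : pending) {
            table.put(rowKey, newValue);
        }
        pending.clear();
        flushCount++;
    }

    // Number of batches written so far.
    int flushCount() {
        return flushCount;
    }
}
```

Call add for every key a page returns, and call flush once after the last page so a final partial batch is not lost.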

Licensed under: CC-BY-SA with attribution