Question

Our column names are purposefully picked so that they are returned in the order we want (the key is basically an internal sequence number). Our rowkeys are similarly ordered. Basically, there is one rowkey per day, and all columns for that day are added to that row.

Given that, how do I create a query in Hector to return me the most recent column from the most recent row? Or the oldest? In a nutshell, the two most common queries are "get me the most recent entry" and "get me the oldest entry".

I'm not familiar enough with Cassandra or Hector to puzzle out the correct query though. It should look something like this?

QueryResult<OrderedRows<String, String, Long>> result = 
  rangeSlicesQuery.setColumnFamily(cf).setKeys("", "").setRowCount(1).setRange("","",true,1).execute();

Since the column names are dynamically generated values and I have no idea what the last or first value is, I don't see any way of getting around the open-ended values for the key and column ranges. Hopefully Hector/Cassandra is smart enough to do this quickly. Or is there some optimization I should make?


Solution

You want to make sure that the columns are ordered in reverse; that way you can run a slice query on the row with a limit of 1 and get only the most recent value. Without reverse ordering, you need to read the whole row.
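To illustrate why the ordering matters, here is a minimal in-memory sketch in plain Java. It is not Hector code: it only models one row as a sorted map from column name (the sequence number) to value, and the class and method names are made up for the example. With a reversed comparator, the first column in storage order is the newest one, so a limit of 1 is enough.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of a single wide row. NOT the Hector API; names are hypothetical.
final class ReversedRowSketch {

    // Columns are kept in descending sequence-number order,
    // mimicking a column family whose columns are ordered in reverse.
    static NavigableMap<Long, String> newReversedRow() {
        return new TreeMap<>(Comparator.<Long>reverseOrder());
    }

    // Stand-in for "slice with limit 1": the first column in storage order.
    // Returns null when the row has no columns.
    static Map.Entry<Long, String> firstColumn(NavigableMap<Long, String> row) {
        return row.firstEntry();
    }

    // Stand-in for a limit-1 slice read from the other end: the oldest column.
    static Map.Entry<Long, String> lastColumn(NavigableMap<Long, String> row) {
        return row.lastEntry();
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> row = newReversedRow();
        row.put(1L, "oldest");
        row.put(3L, "newest");
        row.put(2L, "middle");
        System.out.println(firstColumn(row).getValue());
        System.out.println(lastColumn(row).getValue());
    }
}
```

In the model, the newest entry comes from the front of the row and the oldest from the back, without scanning the columns in between.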

As for the most recent row, there's no way to find it in one query. One way is to maintain an index of all your rows (again in reverse order, so you can use the same trick to pick the most recent one) and hit this index first, then the row. Another way works if you have a rough idea of the key's value and can predict the sequence of values, which it sounds like you can: there should be one row per day, and no rows later than today. Pick the latest possible value and try to load that row; if you get nothing back, try the next most recent value, and so on. If a hit on the first try is unlikely (for example, if there is a row for most days but not every day), query for something like five or ten values at a time and pick the most recent one you get back, repeating if you get nothing back.
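The batched probing described above could be sketched roughly as follows. This is plain Java, not Hector code, and it rests on assumptions: row keys are modelled as `LocalDate` values (one per day), and `fetchExisting` is a hypothetical stand-in for a multi-key lookup that returns whichever of the requested keys actually exist as rows. The class name, method name, and the `maxBatches` cutoff are likewise made up for the example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Sketch of probing backwards from today for the newest daily row.
// NOT the Hector API; fetchExisting is a hypothetical multi-key lookup.
final class LatestRowProbe {

    static Optional<LocalDate> findLatestRow(
            LocalDate today,
            int batchSize,
            int maxBatches,
            Function<List<LocalDate>, List<LocalDate>> fetchExisting) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        LocalDate cursor = today; // newest key that could possibly exist
        for (int batch = 0; batch < maxBatches; batch++) {
            // Build the next window of candidate keys: cursor, cursor-1, ...
            List<LocalDate> candidates = new ArrayList<>();
            for (int i = 0; i < batchSize; i++) {
                candidates.add(cursor.minusDays(i));
            }
            // Ask the store which candidates exist and keep the most recent one.
            Optional<LocalDate> newest =
                    fetchExisting.apply(candidates).stream().max(LocalDate::compareTo);
            if (newest.isPresent()) {
                return newest;
            }
            // Nothing in this window: move to the next older window.
            cursor = cursor.minusDays(batchSize);
        }
        return Optional.empty(); // gave up after maxBatches windows
    }
}
```

The `maxBatches` cutoff is a design choice for the sketch so that an empty column family does not cause an endless loop. Once the newest row key is known, a single limit-1 slice on that row returns the newest column.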

Licensed under: CC-BY-SA with attribution