Question

I am wondering if there is a way to read the previous versions of a particular rowkey in HBase and average their values without having to write a MapReduce program. Is this possible using Hive or Impala (or a similar tool), and if so, how would you do it?

My table looks like this:

  Composite key       | Value
  (md5 + date + id)   | (value)

I'd like to average the values across all versions for a particular date and a substring of the id ("411").

Thanks ahead of time.


Solution

Impala uses the Hive metastore to map its logical notion of a table onto data physically stored in HDFS or HBase (for more details, see the Cloudera documentation).

To learn more about how to tell the Hive metastore about data stored in HBase, see the Hive documentation.

Unfortunately, as noted in the Hive documentation linked above:

there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp

There was some work done to add this feature against an older version of Hive in HIVE-2828, though unfortunately that work has not yet been merged into trunk.

So for your application you'll have to redesign your HBase schema to include a "version" column, tell the Hive metastore about this new column, and make your application aware of this column.
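To make the redesign concrete, here is a minimal Python sketch of the averaging logic once the version is stored explicitly. It is only an illustration, not HBase or Hive code: it assumes each version becomes its own row with the version number appended to the composite key, and the `|` separator, the sample keys and dates, and the helper names `make_key` and `average_values` are all hypothetical.

```python
from statistics import mean

# Hypothetical key layout: md5, date, id and version joined by "|".
SEP = "|"


def make_key(md5, date, record_id, version):
    """Build a composite rowkey that carries an explicit version number."""
    return SEP.join([md5, date, record_id, str(version)])


def average_values(rows, date, id_substring):
    """Average every value whose key has the given date and whose id
    contains id_substring, across all stored versions.

    Returns None when no row matches.
    """
    matches = []
    for key, value in rows:
        _md5, row_date, record_id, _version = key.split(SEP)
        if row_date == date and id_substring in record_id:
            matches.append(value)
    return mean(matches) if matches else None


# Made-up sample data: two versions of one record, plus two rows
# that should be filtered out (wrong id, wrong date).
rows = [
    (make_key("a1b2", "2014-03-01", "x411y", 1), 10.0),
    (make_key("a1b2", "2014-03-01", "x411y", 2), 20.0),
    (make_key("c3d4", "2014-03-01", "z999", 1), 50.0),
    (make_key("e5f6", "2014-03-02", "q411", 1), 70.0),
]

result = average_values(rows, "2014-03-01", "411")
```

With the version exposed as ordinary data like this, the same filter-and-average step could in principle be expressed as a plain aggregate query in Hive or Impala once the metastore knows about the new column.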

Licensed under: CC-BY-SA with attribution