Question

We are using HBase to store data that is Sqooped from Oracle into HDFS. We designed the row key as a byte array built from a composite key: (Md5(schema name).getBytes() + Md5(date (format = yyyy-mm-dd)).getBytes() + ByteBuffer.allocate(8).putLong(pkid).array()), where PKID is a long value.

If I want to get all the rows for a particular schema and a particular date, can I query the HBase table using a start row and an end row, or is there another way to run a query like this?

When I store my row key as a string like user1_20130123, ..., user1_20130127, I am able to filter the table using:

scan 'TempTable', {
    COLUMNS => ['CF:NAME'],
    LIMIT => 10,
    STARTROW => 'user1_20100101',
    ENDROW => 'user1_20100115'
}

Here I get the rows for user1 within those dates. How can I query when the row key is stored as the byte array described above?

Solution

You have a problem with your row keys: if you hash the date, you won't be able to use it as a start/stop row for range scans, because an MD5 hash does not preserve the chronological order of the dates.

Your rowkeys should be something like this:

[16B_schema_MD5_hash][8B_long_timestamp][8B_pkid]
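
A minimal sketch of how a key with this layout could be assembled, using only the JDK. The names `RowKeys`, `md5`, and `rowKey` are hypothetical, and the timestamp is assumed to be epoch milliseconds. `ByteBuffer` writes longs big-endian by default, so non-negative timestamps sort chronologically when compared byte by byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowKeys {

    // Raw 16-byte MD5 digest of the input (not the 32-character hex string).
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // [16-byte schema hash][8-byte timestamp][8-byte pkid] = 32 bytes
    static byte[] rowKey(String schemaName, long timestampMillis, long pkid) {
        return ByteBuffer.allocate(16 + Long.BYTES + Long.BYTES)
                .put(md5(schemaName))
                .putLong(timestampMillis)
                .putLong(pkid)
                .array();
    }

    public static void main(String[] args) {
        // "SALES" and the values below are made-up sample inputs.
        byte[] key = rowKey("SALES", 1358899200000L, 42L);
        System.out.println(key.length); // prints 32
    }
}
```

Only the schema name is hashed here; the timestamp and PKID stay as plain longs so that rows for one schema remain ordered by time.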

Which you can query like this:

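// Assumes schemaNameMD5Hash is the raw 16-byte MD5 digest (byte[]) of the schema name
// and that startTimestamp/stopTimestamp are longs. The stop row is exclusive.
Scan myScan = new Scan(
    Bytes.add(schemaNameMD5Hash, Bytes.toBytes(startTimestamp)),
    Bytes.add(schemaNameMD5Hash, Bytes.toBytes(stopTimestamp))
);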
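
For the case in the question (all rows for one schema on one date), the start and stop rows could be derived from the date itself. This is a sketch under assumptions: timestamps are stored as epoch milliseconds in UTC, and `DayRange`, `prefix`, `startRow`, and `stopRow` are hypothetical names. The stop row is the first instant of the following day, which works because the stop row is exclusive.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class DayRange {

    // Raw 16-byte MD5 digest of the input.
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // [16-byte schema hash][8-byte timestamp]: a prefix of the full row key.
    static byte[] prefix(String schemaName, long timestampMillis) {
        return ByteBuffer.allocate(16 + Long.BYTES)
                .put(md5(schemaName))
                .putLong(timestampMillis)
                .array();
    }

    // First possible key prefix for the given day (midnight UTC, inclusive).
    static byte[] startRow(String schemaName, LocalDate day) {
        return prefix(schemaName, day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli());
    }

    // Midnight UTC of the next day, used as the exclusive upper bound.
    static byte[] stopRow(String schemaName, LocalDate day) {
        return startRow(schemaName, day.plusDays(1));
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.parse("2013-01-23");
        byte[] start = startRow("SALES", day); // "SALES" is a made-up schema name
        byte[] stop = stopRow("SALES", day);
        // With the HBase client on the classpath, these two arrays would be
        // passed as the start and stop rows of the Scan.
        System.out.println(start.length + " " + stop.length); // prints "24 24"
    }
}
```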
Licensed under: CC-BY-SA with attribution