Question

Suppose that I have this RDBMS table (entity-attribute-value model):

col1: entityID
col2: attributeName
col3: value

I want to move it to HBase because of scaling issues.

I know that the only way to access an HBase table is by its primary key (row key): you can get a cursor for a specific key and iterate over the rows one by one.

The issue is that, in my case, I want to be able to look up by all 3 columns. For example:

  • for a given entityID, I want to get all its attributes and values
  • for a given attributeName and value, I want to get all the entityIDs ...

So one idea I had is to build one HBase table that holds the data (table DATA, with entityID as the primary index) and 2 "index" tables: one with attributeName as the primary key, and the other with value.

Each index table will hold a list of pointers (entityIDs) into the DATA table.

Is this a reasonable approach, or is it an 'abuse' of HBase concepts?

In this blog the author says:

HBase allows get operations by primary key and scans (think: cursor) over row ranges. (If you have both scale and need of secondary indexes, don’t worry - Lucene to the rescue! But that’s another post.)

Do you know how Lucene can help?

-- Yonatan

Solution

Secondary indexes would indeed be useful for many potential applications of HBase, and I believe the developers are in fact looking into them. Check out http://www.mail-archive.com/hbase-dev@hadoop.apache.org/msg04801.html.

In the meantime, though, if your application's data storage can be modelled as a star schema (see http://en.wikipedia.org/wiki/Star_schema), you might like to check out the solution that Hypertable proposes for secondary-index-type needs: http://markmail.org/message/rphm4q6cbar2ycgp

OTHER TIPS

I recommend having two different flat tables: one for looking up attributes+values given entityID, and one for looking up the entityID given attributes+values.

Table 1 would look like this:

entityID1 {
  attribute1: value1;
  attribute2: value2;
  ...
}

and Table 2:

attribute1_value1 {
  entityID1;
}
attribute2_value2 {
  entityID1;
}
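
The two-table layout above can be sketched in plain Python. This is only an in-memory illustration, not HBase client code: the dictionaries stand in for the two tables, and the names `put`, `attributes_of`, `entities_with`, `entities_with_attribute` and the "attribute_value" key format with `_` as separator are assumptions made for the example.

```python
from bisect import bisect_left
from collections import defaultdict

# In-memory stand-ins for the two tables (assumption: not real HBase calls).
# Table 1: row key = entityID, columns = attributeName -> value.
entity_table = defaultdict(dict)
# Table 2: row key = "attributeName_value", columns = the matching entityIDs.
index_table = defaultdict(set)


def index_key(attribute, value):
    """Build the Table 2 row key (hypothetical "attribute_value" format)."""
    return f"{attribute}_{value}"


def put(entity_id, attribute, value):
    """Write one EAV triple to both tables, dropping any stale index entry."""
    old = entity_table[entity_id].get(attribute)
    if old is not None:
        old_key = index_key(attribute, old)
        index_table[old_key].discard(entity_id)
        if not index_table[old_key]:
            del index_table[old_key]
    entity_table[entity_id][attribute] = value
    index_table[index_key(attribute, value)].add(entity_id)


def attributes_of(entity_id):
    """Lookup 1: all attributes and values for a given entityID."""
    return dict(entity_table.get(entity_id, {}))


def entities_with(attribute, value):
    """Lookup 2: all entityIDs for a given attributeName and value."""
    return sorted(index_table.get(index_key(attribute, value), ()))


def entities_with_attribute(attribute):
    """Emulate a row-range scan over the Table 2 keys starting "attribute_"."""
    prefix = f"{attribute}_"
    keys = sorted(index_table)  # a real table would already be key-ordered
    found = set()
    for key in keys[bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break
        found |= index_table[key]
    return sorted(found)
```

Every write touches both tables, so the application has to keep them consistent itself; in this sketch the two updates are not atomic together. The prefix scan also assumes the separator never appears inside an attribute name, otherwise keys for different attributes could share a prefix.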
Licensed under: CC-BY-SA with attribution