Question

I would like to represent and store a huge list of sequences as a prefix tree distributed across many machines, as follows:

+ A master machine will represent the prefixes of the sequences.
+ n slave machines will represent n sub-prefix trees, each containing the rest of the sequences.

I wonder whether I can use HBase to solve this problem. Could you share any experience with that?


Solution 2

Maybe your concept of "master" is not quite the same as HBase's HMaster. The HMaster is for administrative purposes, such as assigning Regions to RegionServers and tracking which RegionServer hosts the Region for a given set of rows of a given table.

All of the data in the rows of any of your tables will live inside RegionServers. Reading and writing data goes directly to the RegionServers; the only extra step is a lookup to determine which server the rows live on, which clients typically resolve through ZooKeeper and the hbase:meta table and then cache.

Coming back to your "master" vs "slave" machine topology: you might decide to store the sequence prefixes in a separate table. Then the RegionServers for the prefixes may be managed separately from those for the sub-prefix trees. In any case, there is no "single master machine" that stores the data; instead the data lives in one or more Regions on one or more RegionServers.
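To make the separate-tables idea concrete, here is a minimal in-memory sketch in plain Python, with two dicts standing in for two HBase tables. The cut-off `PREFIX_LEN`, the function `build_tables` and the choice of what each table stores are all assumptions made for illustration, not anything HBase prescribes.

```python
from collections import defaultdict

PREFIX_LEN = 2  # hypothetical cut-off between "prefix" and "rest of sequence"

def build_tables(sequences, prefix_len=PREFIX_LEN):
    """Split each sequence into (prefix, rest) and fill two table stand-ins."""
    prefix_table = defaultdict(int)    # row key = prefix, value = sequences under it
    subtree_table = defaultdict(list)  # row key = prefix, value = remaining suffixes
    for seq in sequences:
        prefix, rest = seq[:prefix_len], seq[prefix_len:]
        prefix_table[prefix] += 1
        subtree_table[prefix].append(rest)
    # Sort the suffixes so each subtree is kept in a deterministic order.
    return dict(prefix_table), {k: sorted(v) for k, v in subtree_table.items()}

prefixes, subtrees = build_tables([b"ACGT", b"ACGA", b"TTGA"])
```

In a real deployment each dict would be its own table, so the Regions of the prefix table and those of the sub-tree table could be balanced independently.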

OTHER TIPS

HBase tables are sorted lexicographically by row key. So the natural way for keys to be stored in HBase in your scenario is that each prefix is immediately followed by its subtrees, i.e. the subtrees would most likely be in the same region as the parent tree (since they share the same prefix).
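The effect of lexicographic ordering can be illustrated without HBase at all. The sketch below uses a sorted Python list of byte strings as a stand-in for a table's row keys; `prefix_scan` is a hypothetical helper, not an HBase call, and it only mimics what a prefix-bounded scan would return.

```python
from bisect import bisect_left

def prefix_scan(sorted_keys, prefix):
    """Return every key that starts with `prefix` from a sorted list of byte keys."""
    start = bisect_left(sorted_keys, prefix)  # first key >= prefix
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches

sequences = [b"ACGT", b"ACGA", b"TTGA", b"ACTT", b"AC", b"TT"]
row_keys = sorted(sequences)  # byte-wise order, like HBase row keys

ac_subtree = prefix_scan(row_keys, b"AC")
```

Because every key under `b"AC"` sits in one contiguous run, the prefix and its subtree end up next to each other, which is why they tend to share a region.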

If it is very important to you to have one node that holds the masters and others that hold the rest, you'd need to work hard on partitioning keys, balancing regions, etc. As Javadba said, the likely solution for you in HBase is to separate the concepts into separate tables, and you'd still have to work on balancing if you want to ensure that they don't share machines.
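As a rough picture of what partitioning keys involves: each region covers a contiguous row key range with an inclusive start key and an exclusive end key, so a list of split points decides which region a key falls into. The sketch below models only that mapping; the split points and the helper `region_for` are hypothetical, and the assignment of regions to machines (the balancer's job) is not modelled.

```python
from bisect import bisect_right

def region_for(row_key, split_points):
    """Index of the region containing row_key.

    Region i covers [split_points[i-1], split_points[i]), with the first
    region open at the start and the last region open at the end.
    """
    return bisect_right(split_points, row_key)

# Hypothetical split points giving 3 regions: [.., G), [G, T), [T, ..)
split_points = [b"G", b"T"]

placement = {key: region_for(key, split_points)
             for key in [b"AC", b"ACGT", b"GATT", b"TT", b"TTGA"]}
```

Keys sharing a prefix land in the same region unless a split point falls inside their range, so forcing prefixes and subtrees apart means choosing keys and split points deliberately.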

If the exact physical architecture is less important to you and what you really want is efficient storage, you may want to look at graph databases, e.g. Titan (which builds on HBase or Cassandra), Neo4j, etc.

Licensed under: CC-BY-SA with attribution