Question

I have some columns in an HBase instance running in a Cloudera cluster.

I have a column that holds phone numbers (let's say phoneNum) which are not normalized. They come from contact center calls and carry many prefixes, #s and *s (for example, *107*065777113512#).

I need to query the database to find a specific phone number (without the options), so the number I search for will be a substring of some records in phoneNum.

The question is as follows:

How can I index HBase in order to get good performance and avoid a full table scan?

I have read about Solr integration, Lucene and Lily, but I don't know whether any of them apply to this problem.

Thanks in advance!


Solution

You are looking for a solution that supports substring search. You can achieve that with Solr.

Solr gives you free-text search, along with many other search features that HBase won't be able to give you. Start by looking at https://wiki.apache.org/solr/.
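To make the idea concrete, here is a toy sketch of one way a search index can answer substring queries without scanning every record: n-gram tokenization (Solr ships n-gram analysis components for this purpose). This is plain in-memory Python, not Solr code, and the row keys, sample values and function names are made up for illustration.

```python
from collections import defaultdict

def ngrams(text, n=3):
    """Return the set of all length-n substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def build_index(rows, n=3):
    """Map each n-gram to the set of row keys whose value contains it."""
    index = defaultdict(set)
    for row_key, value in rows.items():
        for gram in ngrams(value, n):
            index[gram].add(row_key)
    return index

def search(index, rows, query, n=3):
    """Find row keys whose value contains query, using the n-gram index."""
    grams = ngrams(query, n)
    if not grams:
        raise ValueError("query must be at least n characters long")
    # A matching row must contain every n-gram of the query.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Verify each candidate: sharing all n-grams does not guarantee
    # that they appear contiguously in the stored value.
    return sorted(k for k in candidates if query in rows[k])

# Hypothetical sample data: row key -> raw phoneNum value.
rows = {
    "call-001": "*107*065777113512#",
    "call-002": "#31#0445551234",
    "call-003": "065777113512",
}
index = build_index(rows)
matches = search(index, rows, "065777113512")
```

Only the rows that share every n-gram with the query are checked, so the lookup cost depends on the number of candidates rather than on the size of the table.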

OTHER TIPS

HBase does not have indexes on columns. It is indexed purely by the row key. You could create a second table with the normalized phone number as the row key, and then use its column values to link back to rows in the original table. However, this is all manual: the second table would not update automatically when the original table changes, so you would have to maintain it yourself.
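A minimal sketch of that second-table idea, using an in-memory stand-in instead of the real HBase client. The normalization rule (keep digit runs of at least 7 digits), the `IndexTable` class and the row keys are all assumptions made for illustration; a real rule would depend on the prefixes your contact center actually produces.

```python
import re
from collections import defaultdict

def phone_candidates(raw):
    """Extract digit runs from a raw dial string such as '*107*065777113512#'."""
    # Assumption: real numbers have at least 7 digits; shorter runs
    # (like '107' or '31') are treated as service prefixes and dropped.
    return [run for run in re.findall(r"\d+", raw) if len(run) >= 7]

class IndexTable:
    """In-memory stand-in for a second table keyed by normalized number."""

    def __init__(self):
        self._rows = defaultdict(set)

    def put(self, normalized, main_row_key):
        # Index row key = normalized number; value = key in the main table.
        self._rows[normalized].add(main_row_key)

    def get(self, normalized):
        # A direct key lookup, so no scan over the main table is needed.
        return sorted(self._rows.get(normalized, set()))

def write_call(main_table, index_table, row_key, raw_phone):
    """Write to the main table and update the index in the same code path."""
    main_table[row_key] = raw_phone
    for number in phone_candidates(raw_phone):
        index_table.put(number, row_key)

main_table = {}
index_table = IndexTable()
write_call(main_table, index_table, "call-001", "*107*065777113512#")
write_call(main_table, index_table, "call-002", "#31#0445551234")
write_call(main_table, index_table, "call-003", "065777113512")

found = index_table.get("065777113512")
```

Routing every write through one function like `write_call` is what keeps the index in step with the main table; any write that bypasses it leaves the index stale, which is the maintenance cost mentioned above.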

Licensed under: CC-BY-SA with attribution