Question

I have a huge HBase table of about half a billion rows, with about 100 columns (varies per row) of data.

I would like to query this data, based on any column qualifier value, as fast as possible.

I know that HBase is optimized for fast reads when the ROW-KEY is known, but I want to query based on arbitrary column values. Applying column filters (using the Java API) leads to full table scans, which slows the system down.
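For context, here is roughly the kind of filtered scan I am running (a minimal sketch assuming the HBase 1.x Java client; the table, column family, and qualifier names are placeholders):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FilterScan {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("mytable"))) {
                // The filter runs server-side on each region, but every row
                // of the table is still read -- effectively a full scan.
                SingleColumnValueFilter filter = new SingleColumnValueFilter(
                        Bytes.toBytes("cf"), Bytes.toBytes("myColumn"),
                        CompareOp.EQUAL, Bytes.toBytes("someValue"));
                filter.setFilterIfMissing(true); // skip rows lacking the column
                Scan scan = new Scan();
                scan.setFilter(filter);
                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result r : scanner) {
                        System.out.println(Bytes.toString(r.getRow()));
                    }
                }
            }
        }
    }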

What are my options?

  • INDEXING: The set of columns present varies from row to row. Can I still build an index?
  • Do I continue to use HBase to store data? Or use it along with Solr or ElasticSearch?
  • What sort of performance can I expect for random queries on arbitrary column values, with maybe a billion rows?

Any other suggestions are welcome.

No correct solution

Other tips

Getting data by row key is fast in HBase, but since values are not indexed, querying with a value filter is painfully slow. If the number of columns to be indexed is small, you can consider a reverse index table.
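As a sketch of what such a reverse index table could look like (assuming the HBase 1.x client API; the table and column names here are made up), every write to the data table also writes an index row whose key starts with the value, so all rows sharing a value sit next to each other in the index:

    import java.io.IOException;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ReverseIndex {
        static final byte[] CF = Bytes.toBytes("cf");

        // Write the data row and its index row together.
        static void putWithIndex(Connection conn, byte[] rowKey, byte[] value)
                throws IOException {
            try (Table data = conn.getTable(TableName.valueOf("data"));
                 Table index = conn.getTable(TableName.valueOf("data_idx"))) {
                Put dataPut = new Put(rowKey);
                dataPut.addColumn(CF, Bytes.toBytes("myColumn"), value);
                data.put(dataPut);

                // Index key = value + 0x00 separator + original row key.
                byte[] idxKey = Bytes.add(value, new byte[] {0}, rowKey);
                Put idxPut = new Put(idxKey);
                idxPut.addColumn(CF, Bytes.toBytes("ref"), rowKey);
                index.put(idxPut);
            }
        }
    }

A lookup is then a prefix scan on data_idx (for example new Scan().setRowPrefixFilter(value)) followed by gets on the data table. The catch: you have to keep the two tables consistent yourself, since HBase gives you no atomicity across tables.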

But if you want more, such as multi-criteria queries, you should have a look at Elasticsearch and use it to store only the index on your columns, keeping the data itself in HBase. Don't forget to disable the source store with "_source" : {"enabled" : false} when creating your index; all your data is already in HBase, so don't waste your HDD :)
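As a rough sketch of that setup (assuming Elasticsearch 7+ and its low-level Java REST client; the host, index, and field names are invented), each document carries only the fields you want to search on and uses the HBase row key as its _id, so a search hit can be turned directly into an HBase get:

    import org.apache.http.HttpHost;
    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.RestClient;

    public class CreateSearchIndex {
        public static void main(String[] args) throws Exception {
            try (RestClient client = RestClient.builder(
                    new HttpHost("localhost", 9200, "http")).build()) {
                Request req = new Request("PUT", "/hbase_columns");
                // _source disabled: Elasticsearch keeps only the inverted
                // index; the full row stays in HBase, fetched by row key (= _id).
                req.setJsonEntity(
                    "{\n"
                  + "  \"mappings\": {\n"
                  + "    \"_source\": { \"enabled\": false },\n"
                  + "    \"properties\": {\n"
                  + "      \"myColumn\": { \"type\": \"keyword\" }\n"
                  + "    }\n"
                  + "  }\n"
                  + "}");
                client.performRequest(req);
            }
        }
    }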
