How to increase the performance of random gets in HBase with a huge number (10 million) of small (240 bytes on average) records?

StackOverflow https://stackoverflow.com/questions/22138141

Question

I have an HBase table with four column families (10 columns in total); the row key is a fixed 10-byte id. The average row size is 240 bytes.

When I test random gets against this table with 1 million rows, I get 1000+ rows/s, about 0.25 MB/s on average.

But when I run the same test with 10 million rows, I only get 160 rows/s, 0.04 MB/s. After reading some material, I increased HBASE_HEAPSIZE from 1 GB to 5 GB; that brought it up to 320 rows/s, 0.08 MB/s (block cache hit ratio is 87%), but it is still much slower than in the 1 million row test.

Are there any ways to increase the performance? Thanks.
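For reference, a random-get benchmark along the lines described above could look like the following. This is a minimal sketch, assuming the old HTable client API (HBase 0.94/0.98 era), a placeholder table name "testtable", and 10-byte row keys built from zero-padded random numbers; adapt the names and key layout to your schema.

    import java.util.Random;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RandomGetBenchmark {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // "testtable" and the key layout are placeholders for the table from the question.
            HTable table = new HTable(conf, "testtable");
            Random rnd = new Random();
            int gets = 10000;
            int found = 0;
            long start = System.currentTimeMillis();
            for (int i = 0; i < gets; i++) {
                // Fixed-length 10-byte row key; here simply a zero-padded random number.
                byte[] key = Bytes.toBytes(String.format("%010d", rnd.nextInt(10000000)));
                Result r = table.get(new Get(key));
                if (!r.isEmpty()) {
                    found++;
                }
            }
            long elapsedMs = System.currentTimeMillis() - start;
            System.out.printf("%d gets (%d found) in %d ms -> %.1f rows/s%n",
                    gets, found, elapsedMs, gets * 1000.0 / elapsedMs);
            table.close();
        }
    }

Measuring single-threaded sequential gets like this mostly reflects per-request latency; running several client threads in parallel gives a better picture of achievable throughput.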


Solution

For random gets:

  • decrease the block size: no more than 64 KB, 32 KB should be good (a schema-level sketch of these settings follows this list)
  • add a bloom filter on your table, at the row level
  • split your table into multiple regions by setting a low maximum region file size (1 GB or lower) and presplit your table (by country, merchant, or whatever fits your keys)
  • activate the in-memory option on the column family
  • use a fast compression codec (LZO or Snappy are good)
  • use a table pool on your client side (a client-side sketch follows this list)
  • use memcache (...)
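
For the schema-level items above (block size, bloom filter, presplitting with a lower max region size, in-memory, compression), here is a minimal sketch using the Java admin API. The table and family names, the split points, and the HBase version (this uses the 0.96+/1.x classes) are assumptions; adapt them to your key design. The same settings can also be applied from the HBase shell.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.compress.Compression;
    import org.apache.hadoop.hbase.regionserver.BloomType;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreateTunedTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("testtable"));
            // Keep regions small so the table is spread over many regions/servers.
            desc.setMaxFileSize(1024L * 1024L * 1024L); // about 1 GB

            HColumnDescriptor cf = new HColumnDescriptor("cf1");
            cf.setBlocksize(32 * 1024);                           // 32 KB blocks for point gets
            cf.setBloomFilterType(BloomType.ROW);                 // row-level bloom filter
            cf.setInMemory(true);                                 // keep blocks in the block cache aggressively
            cf.setCompressionType(Compression.Algorithm.SNAPPY);  // fast compression codec
            desc.addFamily(cf);

            // Presplit: these split points are placeholders; pick boundaries that match your 10-byte keys.
            byte[][] splits = new byte[][] {
                Bytes.toBytes("2500000000"),
                Bytes.toBytes("5000000000"),
                Bytes.toBytes("7500000000"),
            };
            admin.createTable(desc, splits);
            admin.close();
        }
    }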
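
For the client-side suggestion, a table pool avoids re-creating an HTable instance for every request. Below is a minimal sketch assuming the 0.94-era HTablePool class (later deprecated in favour of sharing a single connection); the table name and pool size are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PooledGets {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Pool up to 20 table handles; reuse them instead of opening a new one per get.
            HTablePool pool = new HTablePool(conf, 20);
            HTableInterface table = pool.getTable("testtable");
            try {
                Result r = table.get(new Get(Bytes.toBytes("0000000042")));
                System.out.println("empty result? " + r.isEmpty());
            } finally {
                table.close(); // returns the handle to the pool
            }
            pool.close();
        }
    }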

Enjoy ;)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow