Poor write performance with the HBase client

https://stackoverflow.com/questions/18275454

24-06-2022

Question

I'm using the HBase client in my application server (which also acts as the web server) against a 6-node HBase cluster running CDH3u4 (HBase 0.90). The HBase/Hadoop services running on the cluster are:

NODENAME -- ROLE

Node1 -- NameNode
Node2 -- RegionServer, SecondaryNameNode, DataNode, Master
Node3 -- RegionServer, DataNode, Zookeeper
Node4 -- RegionServer, DataNode, Zookeeper
Node5 -- RegionServer, DataNode, Zookeeper
Node6 -- Cloudera Manager, RegionServer, DataNode

I'm using the following optimizations in my HBase client:

auto-flush = false
ClearbufferOnFail=true
HTable bufferSize = 12MB
Put setWriteToWAL = false (I'm fine with loss of 1 data).

To keep reads reasonably consistent with writes, I call flushCommits on all the buffered tables every 2 seconds.

In my application, I place each HBase write call on a queue (asynchronously) and drain the queue with 20 consumer threads. When I hit the web server locally using curl, I see 2500 TPS for HBase after curl completes. But under a load test, where requests arrive at a high rate of 1200 hits per second on 3 application servers, the consumer (drain) threads responsible for writing to HBase do not keep up with the input rate: I see no more than 600 TPS when the request rate is 1200 hits per second.
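The queue-and-consumer arrangement described above can be sketched in plain Java. This is only an illustration, not the actual application code: the class `BatchDrainer`, the method `drainOnce`, and the batch size are hypothetical names and values. The sketch shows a consumer that waits for one queued item, then takes whatever else is already queued and hands it to a sink in a single call. In a real client the sink would presumably be something like `HTable.put(List<Put>)`, with one `HTable` instance per consumer thread, since `HTable` is not safe for concurrent use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class BatchDrainer {
    /**
     * Waits up to timeoutMs for one item, then takes whatever else is already
     * queued (at most maxBatch items in total) and passes the batch to the sink.
     * Returns the number of items handed over, or 0 if nothing arrived in time.
     */
    static <T> int drainOnce(BlockingQueue<T> queue, int maxBatch, long timeoutMs,
                             Consumer<List<T>> sink) {
        T first;
        try {
            first = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return 0;
        }
        if (first == null) {
            return 0;
        }
        List<T> batch = new ArrayList<>(maxBatch);
        batch.add(first);
        queue.drainTo(batch, maxBatch - 1); // non-blocking: only takes what is already there
        sink.accept(batch);
        return batch.size();
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 250; i++) {
            queue.add("row-" + i);
        }
        int handed;
        // Hypothetical sink: real code would turn the batch into Puts and write them.
        while ((handed = drainOnce(queue, 100, 10, batch -> { })) > 0) {
            System.out.println("handed over a batch of " + handed);
        }
    }
}
```

Whether batching at this level helps depends on where the time actually goes. With auto-flush off, the client already buffers puts, so measuring each consumer thread is a better guide than the sketch itself.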

Can anyone suggest what we can do to improve performance? I've tried reducing the threads to 7 on each of the 3 app servers, but it had no effect. An expert opinion would be helpful. As this is a production server, I'm not planning to swap the roles unless someone can point to a significant performance benefit.

[EDIT]: To clarify our HBase write pattern: the first transaction checks for the row in Table-A (using HTable.exists). It does not find the row the first time, so it writes to three tables. Each of the subsequent 4 transactions makes the exists check on Table-A and, since it finds the row, writes to only 1 table.
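As a rough worked example of what this pattern means for HBase load, the sketch below counts calls per group of five transactions. It rests on assumptions that the post does not confirm: that the one-then-four pattern repeats for every group, and that each incoming hit corresponds to one transaction. The class `WriteMix` and its methods are hypothetical.

```java
final class WriteMix {
    // One first transaction plus 4 follow-ups, as described in the edit above.
    static final int TRANSACTIONS_PER_GROUP = 5;
    // 3 tables written by the first transaction, 1 table by each of the 4 follow-ups.
    static final int PUTS_PER_GROUP = 3 + 4;

    /** Puts per second generated by the given transaction rate. */
    static double putsPerSecond(double transactionsPerSecond) {
        return transactionsPerSecond * PUTS_PER_GROUP / TRANSACTIONS_PER_GROUP;
    }

    /** Every transaction issues exactly one exists check on Table-A. */
    static double existsPerSecond(double transactionsPerSecond) {
        return transactionsPerSecond;
    }

    public static void main(String[] args) {
        double rate = 1200; // hits per second from the load test (assumed one transaction each)
        System.out.println("exists/s = " + existsPerSecond(rate));
        System.out.println("puts/s   = " + putsPerSecond(rate));
    }
}
```

Under these assumptions, each transaction averages one synchronous read and 1.4 buffered puts, so the exists checks are part of the load alongside the writes.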

Solution

So that's a pretty ancient version of HBase. As of Aug 18, 2013, I would recommend upgrading to something based off of 0.94.x.

Other than that, it's really hard to say for sure. There are lots of tuning knobs. You should:

Make sure that HDFS has enough xceivers (dfs.datanode.max.xcievers).
Make sure that HBase has enough heap space.
Make sure there is no swapping.
Make sure there are enough handlers (hbase.regionserver.handler.count).
Make sure that you have compression turned on. [1]
Check disk I/O.
Make sure that your row keys, column family names, column qualifiers, and values are as small as possible.
Make sure that your writes are well distributed across your key space.
Make sure your regions are (pre-)split.
If you're on a recent version, then you might want to look at encoding. [2]
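Two of the items above, spreading writes across the key space and pre-splitting regions, can be illustrated together. The following is a minimal sketch in plain Java, not HBase API code: the class `SaltedKeys`, its methods, and the one-byte hash prefix are hypothetical choices made for illustration. It assumes row keys that would otherwise sort close together (for example, sequential IDs), prefixes each with a bucket byte, and computes the matching split points.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SaltedKeys {
    /** Prefixes the key with a one-byte bucket id derived from a hash of the key. */
    static byte[] salt(String key, int buckets) {
        checkBuckets(buckets);
        byte[] raw = key.getBytes(StandardCharsets.UTF_8);
        int bucket = Math.floorMod(Arrays.hashCode(raw), buckets);
        byte[] out = new byte[raw.length + 1];
        out[0] = (byte) bucket;
        System.arraycopy(raw, 0, out, 1, raw.length);
        return out;
    }

    /** One split point per bucket boundary, so each bucket starts in its own region. */
    static byte[][] splitKeys(int buckets) {
        checkBuckets(buckets);
        byte[][] splits = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            splits[i - 1] = new byte[] {(byte) i};
        }
        return splits;
    }

    private static void checkBuckets(int buckets) {
        // The bucket id must fit in a single byte.
        if (buckets < 2 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be between 2 and 256");
        }
    }

    public static void main(String[] args) {
        int buckets = 8;
        int[] counts = new int[buckets];
        for (int i = 0; i < 1000; i++) {
            counts[salt("user-" + i, buckets)[0]]++; // sequential ids land in different buckets
        }
        System.out.println("keys per bucket: " + Arrays.toString(counts));
        System.out.println("split points: " + splitKeys(buckets).length);
    }
}
```

The split points could then be supplied when the table is created; `HBaseAdmin` has a `createTable` overload that accepts split keys. The trade-off is that every read, including the `HTable.exists` check from the question, has to apply the same salt, and range scans must fan out over all buckets.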

After all of those things are taken care of, you can start looking at logs and jstacks.

Licensed under: CC-BY-SA with attribution
