Question

I have an HBase cluster with a replication factor of 3. I'm trying to improve read performance so I can have more 'Get's per second.

When I benchmarked Gets (reads), I set up a connection to HBase using an HTable pool and connected to a table. Then I repeatedly read the same row over and over at faster intervals and monitored the cluster load. All nodes in the cluster have the same hardware. What I noticed is that the CPU load went up on only a single node. Given that I've configured the cluster with a replication factor of 3, I would assume that the reads would be performed on any copy of the data, and not on a single copy on a single node (I would expect to see the load increase for 3 servers, not just one).

How can I improve read performance so that my Get queries are better balanced across the 3 replicated sets of data, and not read from a single node?

Thanks


Solution

The replication factor of 3 means that the underlying HDFS blocks are replicated 3 times, but each HBase region is served by a single HRegionServer, so a given row can only be read from one location. That is why repeatedly reading the same row loads only the one server hosting that row's region.
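To make the routing concrete, here is a small toy model in plain Java. It is only a sketch and not HBase code: the class `RegionRoutingSketch`, the server names `rs1` to `rs3`, and the region start keys are all made up for illustration. It assumes a table split into three regions by row-key range, each hosted by exactly one server, and counts which server would handle each read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (not the HBase API): each region covers a contiguous row-key range
// and is hosted by exactly one server, regardless of the HDFS replication factor.
public class RegionRoutingSketch {
    // Region start key -> hosting server. All names here are hypothetical.
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    public void addRegion(String startKey, String server) {
        regionStartToServer.put(startKey, server);
    }

    // The region holding a row is the one with the greatest start key <= the row key.
    public String serverFor(String rowKey) {
        return regionStartToServer.floorEntry(rowKey).getValue();
    }

    // Count how many reads each server would handle for the given row keys.
    public Map<String, Integer> loadFor(List<String> rowKeys) {
        Map<String, Integer> load = new HashMap<>();
        for (String rowKey : rowKeys) {
            load.merge(serverFor(rowKey), 1, Integer::sum);
        }
        return load;
    }

    // A made-up table with three regions: ["", "h"), ["h", "p"), ["p", end).
    public static RegionRoutingSketch threeRegionTable() {
        RegionRoutingSketch table = new RegionRoutingSketch();
        table.addRegion("", "rs1");
        table.addRegion("h", "rs2");
        table.addRegion("p", "rs3");
        return table;
    }

    public static void main(String[] args) {
        RegionRoutingSketch table = threeRegionTable();

        // Benchmark as described in the question: the same row, many times.
        List<String> sameRow = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            sameRow.add("row-42");
        }
        System.out.println("same row:       " + table.loadFor(sameRow));

        // Reads spread over row keys that fall into different regions.
        List<String> differentRows = List.of("apple", "kiwi", "zebra");
        System.out.println("different rows: " + table.loadFor(differentRows));
    }
}
```

In this model all 1000 reads of the same row land on one server, while reads of keys from different regions spread across servers. A benchmark that reads many different rows, spread over regions hosted on different servers, would therefore be a fairer test of aggregate Get throughput than one that hammers a single row.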

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow