The errors about "RPCs waiting on ..."
mean that HBase isn't keeping up with the write load. OpenTSDB retains data points in memory and retries up to a certain limit. Past that limit, it starts discarding data and throws this error back at you to signal that there is a problem.
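To make the buffering behavior concrete, here is a minimal sketch in Python. It is not OpenTSDB's actual code: the class names, the limit, and the exception are all hypothetical, and it only models the idea of holding writes for an unavailable region up to a cap and then refusing new ones.

```python
from collections import deque


class RegionUnavailable(Exception):
    """Hypothetical error: too many writes are already waiting on a region."""


class PendingWrites:
    """Hypothetical bounded buffer for writes aimed at one unavailable region."""

    def __init__(self, limit):
        self.limit = limit
        self.queue = deque()

    def add(self, point):
        # Past the limit, refuse the write instead of buffering without bound.
        if len(self.queue) >= self.limit:
            raise RegionUnavailable(
                f"{len(self.queue)} writes already waiting on this region")
        self.queue.append(point)

    def drain(self, send):
        # Once the region is back, replay the buffered writes in order.
        while self.queue:
            send(self.queue.popleft())
```

The point of the cap is that the client would otherwise run out of memory while HBase is slow, so getting the error means you should throttle your writers or fix the HBase side, not raise the limit.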
Just like any database (distributed or not), HBase needs some basic tuning. The two recommendations that most often help newcomers are:
- Making sure the max region size (hbase.hregion.max.filesize) is large enough that regions don't split too often.
- Pre-creating regions in order to avoid stalling when starting up (this was discussed recently on the mailing list).
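As a rough illustration of the pre-creation step, the sketch below computes split points and formats them for the HBase shell. It rests on several assumptions: that the data table is named tsdb with column family t (OpenTSDB's defaults), that row keys start with a 3-byte metric UID, that UIDs are assigned sequentially from 1, and that you can estimate the highest UID you will use. The helper names are made up for this example, and the resulting regions will only be balanced if your metrics receive similar amounts of data.

```python
def split_keys(max_uid, num_regions, uid_width=3):
    """Return up to num_regions - 1 row-key prefixes dividing UIDs 1..max_uid."""
    if num_regions < 2:
        return []
    keys = set()
    for i in range(1, num_regions):
        # Evenly spaced UID boundaries; never emit the all-zero key.
        uid = max(1, i * max_uid // num_regions)
        keys.add(uid.to_bytes(uid_width, "big"))
    # Fixed-width big-endian keys sort in numeric order.
    return sorted(keys)


def shell_literal(key):
    """Format raw bytes as a double-quoted string with \\xNN escapes."""
    return '"' + "".join(f"\\x{b:02X}" for b in key) + '"'


def create_command(max_uid, num_regions):
    """Build an HBase shell 'create' line with explicit split points."""
    splits = ", ".join(shell_literal(k) for k in split_keys(max_uid, num_regions))
    return f"create 'tsdb', 't', SPLITS => [{splits}]"


if __name__ == "__main__":
    # Example: about 400 metrics expected, spread over 4 regions.
    print(create_command(400, 4))
```

Paste the printed line into the HBase shell before starting the TSD, provided your HBase version's shell supports the SPLITS option on create. If you normally create the tables with OpenTSDB's own script, you would need to apply the same split points there instead.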
The last problem, about waiting on "-ROOT-,,0"
is less expected. You mentioned an HBase failure: did you actually see HBase die during the test? If so, check whether it died because it ran out of memory, or because GC pauses were long enough to make it lose its ZooKeeper session (which forces it to shut itself down, by design). Since you mentioned running in a VMware image, I assume you're in a constrained test environment, so make sure that HBase (and thus the VM it's running on) is given enough memory for your write-heavy workload.