Question

I am new to Big Data, and to HBase in particular. I am now trying to use OpenTSDB to store data from sensors.

The configuration is a Cloudera VMware image with the latest stable OpenTSDB installed on it. After configuring it, I started the server with:

./build/tsdb tsd --port=4242 --staticroot=build/staticroot/ --cachedir=/tmp/tsd/ --auto-metric

Then I ran a simple netcat client:

#!/bin/bash
set -e
while true; do
  ./run "$1" "$2"
  sleep 1
done | nc -w 30 localhost 4242

where ./run is compiled from:

#include <cstdio>
#include <cstdlib>
#include <time.h>       /* time */

int main(int argc, char **argv)
{
  if ( argc <= 2 ) {
    fprintf(stderr, "2 params expected: start point and number of sensors\n");
    return 1;
  }

  unsigned long t = time(NULL);
  srand(t);

  int b;   // index of first sensor
  int n;   // number of sensors
  sscanf(argv[1], "%d", &b);
  sscanf(argv[2], "%d", &n);

  for ( int i = b; i < b+n; ++i ) {
    // t is an unsigned long, so it must be printed with %lu rather than %d
    printf("put democ.%d %lu %lf host=localhost.localdomain\n", i, t, 1.0 + 0.01 * (rand() % 100));
  }

  return 0;
}

Afterwards I watched the democ.%d metrics via localhost:4242.

I am satisfied with its performance, but there are problems when the generator produces a large number of metrics (n).

The first problem is that some data points disappear, and how many depends on n. With n = 10000 there are 29 points per 30 seconds on average, but with n = 75000 there are only 15. This problem is not critical; I think it is caused by disk bandwidth.

After some time, the server sends an error:

put: HBase error: 1000 RPCs waiting on "tsdb,\x00\x98[Q\x96E\xF0\x00\x00\x01\x00\x00\x01,1368809980414.dc6179de43f78eac6c8b745996200664." to come back online

The second problem is an HBase failure after the server has been running for some time. OpenTSDB dies, flooding all clients and its own console with messages like this:

put: HBase error: 10000 RPCs waiting on "-ROOT-,,0" to come back online

What can I do to solve this problem?

I also thought about the possibility of using Cassandra for my project.

What is the best open-source solution for storing time series data? (Approximately: I need to store data from 100 000 sensors for 30 days, with each sensor generating up to 40 bytes of data every second.)


Solution

The errors about "RPCs waiting on ..." are caused by the fact that HBase isn't keeping up. OpenTSDB will retain data points in memory and retry up to a certain limit. But past a certain point, it will start discarding data and throw this error back at you to indicate that there is a problem.
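
To make the behaviour described above concrete, here is a minimal sketch of a bounded buffer that holds writes while a region is unavailable and starts rejecting them once a limit is reached. This is purely illustrative: the class `PendingWrites`, its methods, and the limit parameter are hypothetical names, and this is not the code OpenTSDB or its HBase client actually uses.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical illustration only; not OpenTSDB or HBase client code.
class PendingWrites {
public:
    explicit PendingWrites(std::size_t limit) : limit_(limit) {}

    // Buffer a write while the region is unavailable.
    // Returns false once the buffer is full: the point is discarded and
    // the caller is expected to report an error back to the sender.
    bool offer(const std::string& point) {
        if (queue_.size() >= limit_) {
            ++dropped_;
            return false;
        }
        queue_.push_back(point);
        return true;
    }

    // Called when the region is back online: pass every buffered write
    // to `send` in arrival order and return how many were sent.
    template <typename Send>
    std::size_t drain(Send send) {
        std::size_t sent = 0;
        while (!queue_.empty()) {
            send(queue_.front());
            queue_.pop_front();
            ++sent;
        }
        return sent;
    }

    std::size_t pending() const { return queue_.size(); }
    std::size_t dropped() const { return dropped_; }

private:
    std::size_t limit_;
    std::size_t dropped_ = 0;
    std::deque<std::string> queue_;
};
```

With a generator that keeps producing points faster than the store can absorb them, such a buffer fills up and every further `offer` fails, which matches the pattern of missing points followed by repeated errors.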

Just like for any database (distributed or not), you need to do basic tuning on HBase. The two most useful recommendations for newcomers are typically:

  1. Making sure the max region size is large enough so you don't split too often.
  2. Pre-creating regions in order to avoid stalling when starting up (this was discussed recently on the mailing list)
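
As a rough sketch of the second point, split keys for pre-created regions can be computed from the metric IDs. This assumes that row keys begin with a 3-byte big-endian metric UID (the region name in the first error message, which starts with \x00\x98[, is consistent with that) and that --auto-metric hands out UIDs in increasing order starting at 1. Both are assumptions to verify against your installation, and `uidPrefix` and `splitKeys` are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Format a metric UID as a 3-byte big-endian key prefix, written with
// \x escapes so it can be pasted into a double-quoted HBase shell string.
// Assumption: metric UIDs are 3 bytes wide.
std::string uidPrefix(std::uint32_t uid) {
    char buf[13];  // three "\xNN" groups (4 chars each) plus the terminating NUL
    std::snprintf(buf, sizeof buf, "\\x%02X\\x%02X\\x%02X",
                  static_cast<unsigned>((uid >> 16) & 0xFF),
                  static_cast<unsigned>((uid >> 8) & 0xFF),
                  static_cast<unsigned>(uid & 0xFF));
    return buf;
}

// Return regions-1 split points dividing UIDs 1..maxUid into roughly
// equal ranges. Assumption: UIDs are assigned sequentially from 1.
std::vector<std::string> splitKeys(std::uint32_t maxUid, std::uint32_t regions) {
    std::vector<std::string> keys;
    for (std::uint32_t i = 1; i < regions; ++i) {
        std::uint64_t uid = static_cast<std::uint64_t>(maxUid) * i / regions;
        keys.push_back(uidPrefix(static_cast<std::uint32_t>(uid)));
    }
    return keys;
}
```

For example, splitKeys(75000, 4) yields \x00\x49\x3E, \x00\x92\x7C and \x00\xDB\xBA. Keys like these could be supplied through the SPLITS option of the HBase shell's create command when the table is first created; how many regions make sense depends on your cluster.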

The last problem about waiting on "-ROOT-,,0" is less expected. You mentioned an HBase failure: have you actually seen HBase die during the test? If yes, check that it's not dying because it's running out of memory or experiencing GC pauses that are too long and that cause it to lose its ZooKeeper session (which forces it to commit suicide by design). Since you mentioned running in a VMware image, I assume you're in a constrained environment used for testing, so make sure that HBase (and thus the VM it's running on) is given enough memory for your write-heavy workload.
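
For a sense of scale, the target workload from the question (100 000 sensors, up to 40 bytes each per second, kept for 30 days) can be worked out directly. The sketch below counts raw payload only; it ignores storage overhead, compression, and replication, all of which will change the real footprint, and the function names are hypothetical.

```cpp
#include <cstdint>

// Raw payload rate in bytes per second across all sensors.
constexpr std::uint64_t ingestBytesPerSecond(std::uint64_t sensors,
                                             std::uint64_t bytesPerSensorPerSecond) {
    return sensors * bytesPerSensorPerSecond;
}

// Raw payload accumulated over the retention period (86400 seconds per day).
constexpr std::uint64_t retainedBytes(std::uint64_t sensors,
                                      std::uint64_t bytesPerSensorPerSecond,
                                      std::uint64_t days) {
    return ingestBytesPerSecond(sensors, bytesPerSensorPerSecond) * 86400 * days;
}

// Figures from the question: 100 000 sensors, 40 bytes/s each, 30 days.
static_assert(ingestBytesPerSecond(100000, 40) == 4000000ULL,
              "4 MB/s of raw payload");
static_assert(retainedBytes(100000, 40, 30) == 10368000000000ULL,
              "about 10.4 TB of raw payload over 30 days");
```

That is about 4 MB/s sustained, or roughly 345.6 GB per day, which is well beyond what a single test VM can be expected to absorb continuously.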

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow