Question

I am interested in knowing whether there is any alternative to rrdtool for logging time series data. I am looking for something that can scale to a large number of monitored devices.

From what I have read on the subject, rrdtool becomes I/O bound when you hit it with large amounts of data. Since I expect this to scale to a very large number of devices, I am curious whether there is an alternative that would not choke on I/O. Preferably SQL-based, but not necessarily.

Thanks

Solution

If I/O performance is the main worry, look into rrdcached, which is available in the current version (1.4) of RRDtool.

The I/O overhead is not a function of the amount of data being written; after all, each value is only 8 bytes per data source. The I/O bandwidth comes from the fact that a whole sector (typically 4k) needs to be read in before it can be written out. So to write 8 bytes you end up reading and writing 8k bytes in total.
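To put numbers on that, here is a rough sketch of the arithmetic. The sizes are the ones quoted above (8-byte values, 4k blocks); the function names are made up for illustration, and a real update also touches the RRD header, so treat this as a lower bound.

```shell
# Assumed sizes, taken from the paragraph above.
VALUE_BYTES=8      # one stored value per data source
BLOCK_BYTES=4096   # "typically 4k"

# Bytes moved to update one value: read the block, then write it back.
io_bytes_per_update() {
    echo $(( 2 * BLOCK_BYTES ))
}

# Bytes of I/O per byte of useful data.
amplification() {
    echo $(( $(io_bytes_per_update) / VALUE_BYTES ))
}

echo "I/O per update: $(io_bytes_per_update) bytes"
echo "Amplification:  $(amplification)x"
```

With these assumptions each 8-byte value costs 8192 bytes of I/O, roughly a thousand times the payload.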

rrdcached coalesces these writes, so when an RRD is finally updated on disk the ratio of useful data (actual DS values) to wasted data (the spare bytes in the sector) improves.
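A back-of-the-envelope sketch of why batching helps follows. It assumes 8-byte values, a 4k block, and that all batched values for one data source land in the same block (so at most 512 values per batch). The function name is hypothetical.

```shell
# io_bytes_per_value N
# I/O cost per stored value when N updates share one block read + write.
# Assumes 4096-byte blocks and that all N values fit in a single block.
io_bytes_per_value() {
    batch=$1
    block_bytes=4096
    echo $(( 2 * block_bytes / batch ))
}

for n in 1 30 300; do
    echo "batch of $n: $(io_bytes_per_value "$n") bytes of I/O per value"
done
```

Under those assumptions, holding 30 updates before flushing cuts the cost per value from 8192 bytes to about 273.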

All the RRDtool commands will work with rrdcached automatically when they detect it (via the RRDCACHED_ADDRESS environment variable). This allows them to trigger flushes when needed, for example when generating a graph from the data.
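A minimal sketch of wiring this up is below. The socket path, journal directory, base directory and the 300-second write delay are assumptions for illustration, not recommended values; check the rrdcached man page for your version before relying on any of them. The commands are wrapped in functions so nothing runs until you call them.

```shell
# Assumed socket path; adjust to your installation.
RRDCACHED_SOCK="unix:/var/run/rrdcached.sock"

# Start the daemon:
#   -l  address to listen on
#   -w  seconds to hold updates in memory before writing them out
#   -j  journal directory (must already exist)
#   -b  base directory for relative RRD file names
start_rrdcached() {
    rrdcached -l "$RRDCACHED_SOCK" -w 300 \
        -j /var/lib/rrdcached/journal -b /var/lib/collectd/rrd
}

# RRDtool commands find the daemon through this variable.
export RRDCACHED_ADDRESS="$RRDCACHED_SOCK"

# Force pending updates for one RRD file to disk, e.g. before copying it.
flush_rrd() {
    rrdtool flushcached "$1"
}
```

With the variable exported, commands such as rrdtool update and rrdtool graph talk to the daemon without any further options.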

Switching to an SQL-based solution may help, but consider the extra I/O that will be required to support SQL. Since you don't tend to use RRD data in that sort of random-access pattern, a database is a bit of a sledgehammer for the problem. Sticking with RRDtool also keeps access to the whole ecosystem of tools that understand and can work with the files, which is especially useful if you are already familiar with it.

OTHER TIPS

There are some time series databases which have high availability and/or scalability as goals.

Maybe have a look at

  • rrdcached, a caching layer on top of rrd
  • whisper, the database engine behind graphite
  • opentsdb is a distributed, scalable Time Series Database (TSDB) written on top of HBase
  • reconnoiter although its focus is more on monitoring

A friend of mine did some work a while ago on a SQL backend to store round robin data: http://rrs.decibel.org

However, I suspect that since you're asking about "devices to monitor", you may be looking for a more complete solution.

If I/O operations per second is your main bottleneck and you're using Linux, there's an easy hack that only costs you memory. Use a tmpfs mount to stage your RRD writes.

All the I/O operations will be done in memory and won't incur any of the bottlenecks of disk I/O (this is even faster than using solid state disks). You can then use a cron job and rsync to copy only the changed RRDs to disk once every few minutes.


Create the directories

bash-4.2# mkdir /mnt/rrd-reads
bash-4.2# mkdir /mnt/rrd-writes
bash-4.2# chown collectd:collectd /mnt/rrd-reads

Create a 500MB-maximum RAM filesystem with appropriate options

bash-4.2# mount -t tmpfs -o size=500m,mode=0750,uid=collectd,gid=collectd none /mnt/rrd-writes
bash-4.2# echo "none /mnt/rrd-writes tmpfs size=500m,mode=0750,uid=collectd,gid=collectd 0 0" >> /etc/fstab

Copy the old RRD files into the new mount point

bash-4.2# cp -a /var/lib/collectd/rrd/* /mnt/rrd-writes

Configure your rrd-writing application to write to the new mount point

bash-4.2# sed -i -e 's/DataDir "\/var\/lib\/collectd\/rrd"/DataDir "\/mnt\/rrd-writes"/' /etc/collectd/collectd.conf

Set up a cron job to sync only the changed RRDs to disk once every 2 minutes

bash-4.2# echo "*/2 * * * * collectd rsync -a /mnt/rrd-writes/* /mnt/rrd-reads/ ; sync" > /etc/cron.d/rrd-sync

Don't forget to copy your saved RRD files into the mount point before you start your RRD-writing application! A tmpfs starts out empty after every reboot, so you may need to edit the init script for that service to make sure the files are there before it starts. If it starts without the files in place, new bare ones will be created and you'll be very confused once the read directory gets overwritten with empty RRDs.
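One way to handle that restore step is a small helper called from the init script before the service starts. This is only a sketch: the function name restore_rrds is made up, and it assumes that an empty tmpfs directory means "fresh boot, nothing restored yet".

```shell
# restore_rrds SAVED_DIR TMPFS_DIR
# Copy saved RRDs into the tmpfs directory, but only when it is empty,
# so a populated mount is never overwritten with older files.
restore_rrds() {
    saved=$1
    target=$2
    if [ -n "$(ls -A "$target" 2>/dev/null)" ]; then
        echo "skip: $target already has files"
        return 0
    fi
    # "$saved"/. copies the directory contents, including dotfiles.
    cp -a "$saved"/. "$target"/
}

# Example call, using the directories from this answer:
#   restore_rrds /mnt/rrd-reads /mnt/rrd-writes
```

Running it a second time is harmless because it returns early once the target has files in it.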

If at some point you need to resize the tmpfs mount, you can do that on the fly:

bash-4.2# mount -t tmpfs -o remount,size=850m /mnt/rrd-writes
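
To know when a resize is due, you can watch how full the mount is. A small sketch, assuming a POSIX df and awk; the function name is hypothetical, and it will misparse mounts whose device name contains spaces.

```shell
# tmpfs_usage_percent MOUNTPOINT
# Print the used capacity of the filesystem as an integer percentage.
tmpfs_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Example call, using the mount point from this answer:
#   tmpfs_usage_percent /mnt/rrd-writes
```

Something like this could run from cron and warn you before the tmpfs fills up, since writes fail once it is full.
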
Licensed under: CC-BY-SA with attribution