Question

I've just come across RRD lately while trying out the Ganglia monitoring system. Ganglia stores its monitoring data in RRD. I am wondering how RRD works from a scalability perspective. What if I have a potentially huge amount of data to store? In the Ganglia case, if I want to keep all the historical monitoring statistics instead of only recent data with a specific TTL, will RRD be good enough to cope with that?

Can someone who has used RRD share some experience on how it scales, and how it compares to an RDBMS or even Bigtable?


Solution 2

RRD is designed to automatically blur (average out) your data over time, so that the total size of the database stays roughly constant even as new data continuously arrives.

So it is only a good fit if you want some historical data and are willing to lose precision over time.
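To make the "averaging out" concrete, here is a hypothetical sketch (not rrdtool's actual code) of the consolidation idea: older fine-grained samples are averaged into coarser buckets, so storage shrinks by the consolidation factor while per-sample detail is lost. The function name `consolidate` is an illustration, not part of any RRD API.

```python
# Sketch of RRD-style consolidation with an AVERAGE consolidation
# function: groups of `steps_per_bucket` consecutive samples are
# replaced by their mean, shrinking storage by that factor.

def consolidate(samples, steps_per_bucket):
    """Average consecutive groups of `steps_per_bucket` samples;
    an incomplete trailing group is dropped for simplicity."""
    n = len(samples) // steps_per_bucket
    return [
        sum(samples[i * steps_per_bucket:(i + 1) * steps_per_bucket])
        / steps_per_bucket
        for i in range(n)
    ]

# Twelve 5-minute samples become one 1-hour average:
# 12x less storage, but the individual spikes are gone.
hourly = consolidate([1.0] * 6 + [3.0] * 6, 12)
print(hourly)  # [2.0]
```

This is why a short spike that is obvious in the high-resolution archive can vanish entirely once it has been consolidated into hourly or daily averages.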

In other words, you cannot really compare RRD to standard SQL databases or to Bigtable, because both SQL and NoSQL databases store data precisely: you will read back exactly what was written.

With RRDtool, however, there is no such guarantee. But its speed and fixed storage footprint make it an attractive choice for all kinds of monitoring setups where only the most recent data matters at full precision.

OTHER TIPS

The built-in consolidation feature of rrdtool is configurable, so, disk space permitting, there is no limit to the amount of high-precision data you can store with rrdtool. Also, due to its design, rrdtool databases never need to be vacuumed or otherwise maintained, so you can grow a setup to staggering sizes. Obviously you need enough memory and fast disks for rrdtool to work with big data, but the same is true of any large data set.
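As an illustration of that configurability, here is a sketch of an `rrdtool create` invocation defining several round-robin archives (RRAs) at different resolutions; the data source name `load` and the retention periods are example choices, not requirements:

```shell
# One GAUGE data source sampled every 300 s, kept at three resolutions.
rrdtool create load.rrd --step 300 \
    DS:load:GAUGE:600:0:U \
    RRA:AVERAGE:0.5:1:2016 \
    RRA:AVERAGE:0.5:12:1440 \
    RRA:AVERAGE:0.5:288:3650
# RRA 1: raw 5-min samples, 2016 rows  = 1 week
# RRA 2: 1-hour averages (12 steps), 1440 rows = 60 days
# RRA 3: 1-day averages (288 steps), 3650 rows = ~10 years
```

Because every RRA has a fixed row count, the file size is determined at creation time; you trade disk space for retention and resolution up front, and the database never grows afterwards.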

Some people are confused about rrdtool's abilities because it can also run on a tiny embedded system; when they then start logging gigabytes worth of data on an old PC from the attic and find that it does not cope, they wonder why.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow