AllGoodBits.org

Fixing Erroneous Data in Ganglia Metrics by Editing RRDtool

Once upon a time, some of my network graphs generated by Ganglia showed that some of my machines had managed to shunt more than 450 Petabytes/second of network traffic for about 45 seconds. Given that these machines have a couple of gigabit NICs, good for roughly 250 megabytes/second at best, I figured that we hadn't broken Physics and that these numbers were Incorrect.

This led me to discover that, contrary to what I had assumed, the RRD files that Ganglia uses to store its time-series data are not difficult to work with. RRDtool supports a straightforward editing pattern: dump the file to XML, edit the XML, and restore a new RRD file from the edited XML.
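A minimal sketch of that dump, edit, restore round trip, assuming Python 3.7+ and the `rrdtool` binary on the PATH. The function names `scrub_spikes` and `repair_rrd` and the temporary file suffixes are made up for illustration; they are not part of Ganglia or RRDtool. The edit step here simply blanks any stored value above a ceiling you choose, which may or may not be the right repair for a given file.

```python
import math
import os
import re
import subprocess

# Matches one value cell in an `rrdtool dump` row,
# e.g. <v>1.2345e+03</v> or <v>NaN</v>.
VALUE_CELL = re.compile(r"<v>\s*([^<\s]+)\s*</v>")


def scrub_spikes(xml_text, max_value):
    """Return xml_text with every <v> value above max_value replaced by NaN."""

    def replace(match):
        try:
            value = float(match.group(1))
        except ValueError:
            return match.group(0)  # leave anything unparseable untouched
        if math.isnan(value) or value <= max_value:
            return match.group(0)  # plausible value (or already NaN): keep as-is
        return "<v>NaN</v>"

    return VALUE_CELL.sub(replace, xml_text)


def repair_rrd(rrd_path, max_value):
    """Dump rrd_path to XML, scrub the XML, and restore it over the original."""
    # `rrdtool dump FILE` writes the XML to stdout.
    dumped = subprocess.run(
        ["rrdtool", "dump", rrd_path],
        capture_output=True, text=True, check=True,
    ).stdout

    # Hypothetical scratch file names, chosen only for this sketch.
    xml_path = rrd_path + ".fixed.xml"
    new_rrd_path = rrd_path + ".fixed"

    with open(xml_path, "w") as handle:
        handle.write(scrub_spikes(dumped, max_value))

    # `rrdtool restore XML RRD` builds a new RRD file from the edited XML.
    subprocess.run(["rrdtool", "restore", xml_path, new_rrd_path], check=True)

    os.replace(new_rrd_path, rrd_path)  # swap the repaired file into place
    os.remove(xml_path)
```

Something like `repair_rrd("/path/to/metric.rrd", 2.5e8)` would blank every stored value above 250 million, a ceiling picked on the assumption that the metric is in bytes/second on a machine with two gigabit NICs. Copy the original file somewhere safe first, and check the ownership of the replaced file afterwards, since this sketch does not preserve it.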

I have a Ganglia cluster called 'kvm', so its RRD files live in /var/lib/ganglia/rrds/kvm.