Data retention in OpsCenter

Is there an easy way to limit the amount of data OpsCenter retains? It seems to be using about 50 GB across my cluster at the moment, and I'd like to reduce that. I'm not sure whether it retains a rolling window of data whose size I can tweak, or whether it just retains data forever and I should occasionally purge it all.

Technically the data will eventually shrink on its own (based on TTLs in Cassandra), but because of the nature of compaction, I can't really give you an exact timeline. The main optimizations we made revolve around the "pdps" column family in the "OpsCenter" keyspace (this is where raw data points are stored). If you want to reclaim some space immediately, you can truncate that column family after upgrading. You may lose a minute or two of raw data points when you do that, but it sounds like that may be okay with you.
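Depending on your OpsCenter version, you may also be able to shrink the rolling window itself instead of truncating by hand: newer releases expose the metric TTLs in the per-cluster configuration file. The snippet below is only a sketch; the path, the [cassandra_metrics] section, and the 1min_ttl/5min_ttl option names are assumptions you should verify against the docs for your version.

# /etc/opscenter/clusters/<cluster_name>.conf  (path varies by install; assumed here)
[cassandra_metrics]
# Values are in seconds; these numbers are illustrative, not defaults.
1min_ttl = 86400      # keep raw one-minute points for 1 day
5min_ttl = 604800     # keep five-minute rollups for 1 week

You'd typically restart opscenterd for a config change like this to take effect, and it only affects data written after the change.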

[default@OpsCenter] truncate pdps;
null
UnavailableException()
        at org.apache.cassandra.thrift.Cassandra$truncate_result.read(Cassandra.java:20212)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_truncate(Cassandra.java:1077)
        at org.apache.cassandra.thrift.Cassandra$Client.truncate(Cassandra.java:1052)
        at org.apache.cassandra.cli.CliClient.executeTruncate(CliClient.java:1437)
        at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:270)
        at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:220)
        at org.apache.cassandra.cli.CliMain.main(CliMain.java:348)

In other news, the upgraded OpsCenter is definitely nicer (the data explorer actually works with non-UTF-8 data types, for instance), but there seems to be a problem with the Cluster view. It no longer seems able to see repairs and similar operations happening in the cluster, and actually seems to be "stuck" in that respect (i.e., OpsCenter has been showing node 5 as 97% done repairing one of our CFs for days now, ever since the upgrade, even though the repair has since run successfully more than once). Is there any way to "kick" it?

Alternatively, I think I could fix a lot of my problems by just deleting all the OpsCenter data and starting from scratch. It's not something I've done before while my cluster is running, though. Is there any easy way to do that without disrupting other keyspaces in the cluster?

The UnavailableException could be caused by a couple of things. The first is that not all nodes in the cluster are up; if OpsCenter doesn't show any nodes down, that's probably not the problem. The second is that in a few versions of Cassandra, a node timing out during the truncate raised an UnavailableException instead of a TimedOutException. If the data size for the pdps column family dropped to ~1 MB, the truncate went through and simply timed out. (Not having JNA set up properly is sometimes the cause of slow truncates.)
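To check whether the truncate actually went through despite the exception, you can look at the on-disk size of pdps with nodetool, and check whether JNA loaded at startup. A sketch, assuming a packaged install with logs under /var/log/cassandra (paths and the exact cfstats wording vary by Cassandra version):

nodetool -h localhost cfstats | grep -A 8 "Column Family: pdps"   # check the "Space used" lines for pdps
grep -i jna /var/log/cassandra/system.log                         # see whether JNA loaded at startup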

You can drop all OpsCenter data by shutting down opscenterd and the agents, dropping the OpsCenter keyspace, starting opscenterd, and then starting all of the agents.
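As a concrete sketch of that sequence (the service names are assumptions; packaged installs typically run the daemon as opscenterd, and the agent service name varies by version):

# On the OpsCenter machine:
sudo service opscenterd stop
# On every node in the cluster:
sudo service opscenter-agent stop
# From cassandra-cli on any node (keyspace names are case-sensitive):
[default@unknown] drop keyspace OpsCenter;
# Then start opscenterd, followed by the agents:
sudo service opscenterd start
sudo service opscenter-agent start

opscenterd should recreate the OpsCenter keyspace when it starts back up.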

However, that's probably not necessary for fixing your issues. The repair issue may be that Cassandra is legitimately hung, or there may have just been some lost messages between the agent and opscenterd. If you run nodetool netstats on the nodes that OpsCenter says are streaming, you can see if some parts of the repair never actually completed. If nothing turns up, restarting the agents on the streaming nodes should resolve the issue.
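For example (the hostname and the agent service name are placeholders; adjust for your install):

nodetool -h node5.example.com netstats    # look for streams stuck at the same byte counts across runs
# If no streams are active but OpsCenter still shows the repair, restart the agent on that node:
sudo service opscenter-agent restart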