Once you get a cluster into production, you’ll find that it takes on a life of its
own. Elasticsearch works hard to make clusters self-sufficient and just work.
But a cluster still requires routine care and feeding, such as routine backups
and upgrades.

Elasticsearch releases new versions with bug fixes and performance enhancements at
a very fast pace, and it is always a good idea to keep your cluster current.
Similarly, Lucene continues to find new and exciting bugs in the JVM itself, which
means you should always try to keep your JVM up-to-date.

This means it is a good idea to have a standardized, routine way to perform rolling
restarts and upgrades in your cluster. Upgrading should be a routine process,
rather than a once-yearly fiasco that requires countless hours of precise planning.

Similarly, it is important to have disaster recovery plans in place. Take frequent
snapshots of your cluster—and periodically test those snapshots by performing
a real recovery! It is all too common for organizations to make routine backups but
never test their recovery strategy. Often you’ll find a glaring deficiency
the first time you perform a real recovery (such as users being unaware of which
drive to mount). It’s better to work these bugs out of your process with
routine testing, rather than at 3 a.m. when there is a crisis.