Comparing and Contrasting Backup and Recovery Solutions for Cassandra

Datos Blog

In today’s blog, I’ll follow up on our recent webinar comparing the RecoverX solution for Cassandra against the competition.

The Modern IT Stack

There has been a fundamental shift in IT infrastructure with the evolution of the modern IT stack. From an infrastructure, database, and application standpoint, organizations are moving away from the traditional monolithic view of the application stack to a more micro-services based approach. Modern applications deployed in public cloud environments and across multiple clouds (private, hybrid) are emerging as the de facto enterprise architecture. These applications use cloud resources more elastically: they are composed of micro-services deployed on the elastic database, elastic compute, and elastic storage services of cloud-native environments. The change from the traditional, monolithic, data-center-centric IT stack to the modern IT stack is disrupting the core tenets of data management, specifically backup and recovery.

There is massive growth in the development of modern applications, and organizations are moving to the cloud for data storage. This has led to a shift in the IT stack, with traditional SQL database systems being augmented, and to a new ‘backup reality’. Applications are now being built on modern NoSQL databases and big data file systems. They no longer run within the traditional four walls of the data center; instead, they span a multi-cloud environment.

In these databases, data lives in a clustered environment, so there’s a common misconception that it’s already protected via replication. However, this is not true. Database replication plays a role in the overall availability strategy, but it does not substitute for backup: an accidental delete or a corrupted write is replicated to every copy just as faithfully as a good write. Explaining why “replication is not backup” is one of the longest-running discussion threads in the backup world. Learn more about this here in Michael Colby’s blog.
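
To make the “replication is not backup” point concrete, here is a minimal sketch in Python. It is a toy model, not Cassandra’s API: the ToyCluster class, the key name, and the values are all made up for illustration. It only shows that a delete applied to every replica leaves nothing to recover from, while a point-in-time copy kept elsewhere still holds the row.

```python
import copy


class ToyCluster:
    """Toy model (hypothetical): every write or delete reaches all replicas."""

    def __init__(self, replication_factor=3):
        self.replicas = [dict() for _ in range(replication_factor)]

    def write(self, key, value):
        for replica in self.replicas:
            replica[key] = value

    def delete(self, key):
        # Replication propagates the mistake as faithfully as a good write.
        for replica in self.replicas:
            replica.pop(key, None)


cluster = ToyCluster()
cluster.write("order:42", {"total": 99})

# A point-in-time copy stored outside the cluster stands in for a backup.
backup = copy.deepcopy(cluster.replicas[0])

# An accidental delete is replicated to all three copies.
cluster.delete("order:42")

surviving_copies = sum("order:42" in replica for replica in cluster.replicas)
restored = backup.get("order:42")
print(surviving_copies, restored)
```

In this sketch no replica still holds the row after the delete; only the separate copy does.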

Datos IO RecoverX – Cloud Backup and Recovery Built for the Modern IT Stack

Datos IO RecoverX provides application-consistent backup and recovery for non-relational and NoSQL databases deployed across hybrid cloud environments. RecoverX is an elastic cloud data management software platform that is simple to deploy and architected to protect and manage data at scale. RecoverX integrates at the database level, which makes it data aware: it understands the schemas you are backing up and recovering. RecoverX is built on top of a seminal data management architecture called the Consistent Orchestrated Distributed Recovery (CODR™) engine, which does not depend on media servers and transfers data in parallel directly between application sources and file-based or object-based secondary storage. The architecture is fully distributed, which provides high availability in failure scenarios, and it uses elastic compute resources for scalable performance and resiliency. RecoverX supports the modern data sources of the digital enterprise, from databases to filesystems to cloud infrastructures, and supports enterprise use cases including backup and recovery to/from/within the cloud, user-defined automated and point-in-time backup and recovery, test/dev/QA in the cloud, DR to/from the cloud, and cloud instantiation, all developed for distributed, multi-cloud, scale-out modern application environments. All data that RecoverX backs up remains in native format, so there is no vendor lock-in.

Organizations use RecoverX for three primary use cases:

Backup and Recovery – the ability to create policies that run backups at any interval and for any duration, based on the user’s conditions, and to automate highly granular (sub-table), orchestrated, error-free recovery, also called operational recovery.

Test and Dev Refresh – the platform is built on a set of APIs that organizations use to automate the process of test/dev refresh.

Data Mobility – the ability to take datasets and put them wherever the user wants: on-premises, on a different cluster, or in the cloud. Organizations want to place data in a certain location and then move it around. Data mobility is the foundation of data monetization.

Comparing and Contrasting Alternative Solutions

Script-Based Snapshots vs. Datos IO

Scripted solutions rely on manual scripts for backup and offer no versioning and no storage savings (all replicas are stored). Recovery can take hours, if not days, because the entire replicated data set must be recovered. These solutions are difficult to scale: it is not possible to recover to a different topology, there is no failure resiliency, and there is no support for multi-platform data sources.
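
As a rough illustration of what such a script tends to look like, the Python sketch below builds one `nodetool snapshot` command per node. The host addresses, keyspace name, and snapshot tag are hypothetical, and the commands are only printed, not executed. Because each node snapshots its own SSTables, a keyspace with a replication factor of 3 ends up with roughly three stored copies of every row, which is the “all replicas are stored” cost described above.

```python
def snapshot_commands(hosts, keyspace, tag):
    """Build one `nodetool snapshot` command per node (not executed here)."""
    return [
        ["nodetool", "-h", host, "snapshot", "-t", tag, keyspace]
        for host in hosts
    ]


# Hypothetical node addresses, keyspace, and tag, for illustration only.
hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
commands = snapshot_commands(hosts, keyspace="shop", tag="nightly")

for cmd in commands:
    print(" ".join(cmd))
```

A real script would also have to copy the snapshot files off each node, clear old snapshots, and cope with a node being down, which is where the manual effort and fragility come from.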

DataStax OpsCenter automates scripts that a user creates. It creates snapshots for point-in-time backups, and because most users store multiple snapshots on their database nodes, these consume large amounts of primary storage space. It keeps as many copies of the data as the Replication Factor (RF), and the data is stored in an inconsistent state, resulting in multiple versions. For failure handling, if a source node fails, the backups for that node stop. If the TTL has expired, recovered data is automatically expired by Cassandra.
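
The TTL issue comes down to simple arithmetic, sketched below in Python. The dates and the seven-day TTL are made-up values, and the helper names are hypothetical. Cassandra expires a cell once its write time plus its TTL has passed, so data restored with its original write time after that point is treated as already expired.

```python
from datetime import datetime, timedelta, timezone


def expires_at(write_time, ttl_seconds):
    """Expiry instant for a cell written at write_time with the given TTL."""
    return write_time + timedelta(seconds=ttl_seconds)


def is_expired(write_time, ttl_seconds, now):
    """True once the current time has reached the expiry instant."""
    return now >= expires_at(write_time, ttl_seconds)


# Illustrative values only: a row written with a 7-day TTL.
written = datetime(2017, 1, 1, tzinfo=timezone.utc)
ttl = 7 * 24 * 3600

# Restoring 9 days later with the original write time: already expired.
restore_time = datetime(2017, 1, 10, tzinfo=timezone.utc)
expired_on_restore = is_expired(written, ttl, restore_time)

# One possible remedy (an assumption, not a documented procedure):
# re-insert the row at restore time so the TTL clock starts again.
new_expiry = expires_at(restore_time, ttl)
print(expired_on_restore, new_expiry.isoformat())
```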

With Datos IO, RecoverX retains backup data only on secondary storage (S3, GCS, NFS) and reduces storage costs through semantic de-duplication. It has cluster-consistent versioning, so restored data is consistent. There is no need to run database repairs after the initial restore, which reduces the overall recovery time. Backup and recovery are resilient to primary source node failures, so those failures are handled without losing data. RecoverX can also change the TTL value to ensure data recoverability.
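
To give a feel for what de-duplicating at the data level (rather than the byte level) can mean, here is a small Python sketch. It is not RecoverX’s actual algorithm, which this post does not describe: the function name, row layout, and sample rows are assumptions. It collapses the rows found in each node’s snapshot to one copy per key, keeping the version with the latest write timestamp, which mirrors Cassandra’s last-write-wins reconciliation.

```python
def dedupe_replicas(node_snapshots):
    """Merge per-node rows into one copy per key, keeping the newest write."""
    merged = {}
    for rows in node_snapshots:
        for key, write_ts, value in rows:
            current = merged.get(key)
            if current is None or write_ts > current[0]:
                merged[key] = (write_ts, value)
    return merged


# Made-up rows as (key, write timestamp, value); node_b holds a newer
# version of user:2 that has not yet reached the other replicas.
node_a = [("user:1", 100, "alice"), ("user:2", 105, "bob")]
node_b = [("user:1", 100, "alice"), ("user:2", 110, "bobby")]
node_c = [("user:1", 100, "alice"), ("user:2", 105, "bob")]

merged = dedupe_replicas([node_a, node_b, node_c])
rows_before = sum(len(rows) for rows in (node_a, node_b, node_c))
rows_after = len(merged)
print(rows_before, rows_after, merged["user:2"])
```

In this toy case six stored rows shrink to two, and the merged copy holds a single, reconciled version of each row, which is the intuition behind both the storage savings and not needing a repair after restore.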

Traditional Backup Products Summary vs. Datos IO

Traditional backup products rely on media servers, which achieve minimal de-duplication because the data is compressed, bitwise non-ordered, and inconsistent. There is vendor lock-in on multiple fronts, as data is written to protection storage in a proprietary vendor format. Media servers also require dedicated primary storage, which is expensive to deploy in the cloud.

With Datos IO, there is no dependency on media servers and the bottlenecks they create: backup data is distributed across multiple nodes in the cloud, providing enterprises with automated data protection and point-in-time backup at any interval and granularity. Semantic de-duplication reduces storage costs, while cluster-consistent versioning ensures that recovered data is consistent. Protected data remains accessible in native format for use by the source application or for alternative use cases. RecoverX is architected for the cloud: it uses cloud storage resources (AWS S3, GCS), and its software scales elastically with separate control and data planes. This allows data storage to scale easily in public and private cloud environments.

To learn more about RecoverX 2.5, check out this blog. Click here to listen to the webinar.