
Continuous Delivery or Continuous Deployment of infrastructure and code is relatively simple compared to applying the same approaches to databases, specifically RDBMSs. Code and infrastructure do not change or evolve once deployment has completed. Databases, however, continually have new data added to them, making the data, if not the schema, an inherently mutable component.

I am aware that there are practices such as only adding database objects, i.e. tables and columns, and never modifying or removing them - this purely additive way of approaching database schemas has the advantage of ensuring that schemas remain backwards compatible, at the cost of progressively more complicated schemas.

Equally, there are products such as Flyway and ReadyRoll which assist in managing the migrations between versions of a schema.
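To make the idea behind such migration tools concrete, here is a minimal sketch (not Flyway or ReadyRoll themselves) of a versioned-migration runner, assuming SQL files named V1__create_users.sql, V2__add_email.sql and so on sitting in a migrations directory; the file names and the SQLite target are illustrative only:

    # Minimal sketch of the versioned-migration idea behind tools like Flyway.
    # Hypothetical layout: a ./migrations directory containing files such as
    # V1__create_users.sql and V2__add_email.sql (names are illustrative).
    import os
    import re
    import sqlite3

    MIGRATIONS_DIR = "migrations"

    def applied_versions(conn):
        conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)")
        return {row[0] for row in conn.execute("SELECT version FROM schema_version")}

    def pending_migrations(already_applied):
        pattern = re.compile(r"^V(\d+)__.+\.sql$")
        found = []
        for name in os.listdir(MIGRATIONS_DIR):
            m = pattern.match(name)
            if m and int(m.group(1)) not in already_applied:
                found.append((int(m.group(1)), name))
        return sorted(found)  # apply strictly in version order

    def migrate(db_path="app.db"):
        conn = sqlite3.connect(db_path)
        try:
            for version, name in pending_migrations(applied_versions(conn)):
                with open(os.path.join(MIGRATIONS_DIR, name)) as f:
                    conn.executescript(f.read())        # run the migration script
                conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
                conn.commit()                           # record success per file
                print(f"applied {name}")
        finally:
            conn.close()

    if __name__ == "__main__":
        migrate()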

What other practices and tools currently exist to allow for continuous deployment of database schemas into a production environment where data integrity is a concern?

Frankly, I know very little about traditional DBs - the question could very well apply to their indexes. I'm using Google's Cloud Datastore, for which index changes typically come with app code changes as well. My comment is an honest question, not in any way some "complaint" about Richard's question (which I upvoted).
– Dan Cornilescu, Apr 7 '17 at 17:27

@DanCornilescu I also (honestly) believe what you wrote in your prior comment (and my prior comment was just an attempt to provide a possible answer to the question in your first comment). Next (real?) question?
– Pierre.Vriens, Apr 7 '17 at 17:57

The Challenges

I am aware that there are practices such as only adding database objects, i.e. tables and columns, never modifying or removing them

At one company I worked for, a rolling window of raw data equated to about 6 months and ate up 10 TB. The data was then processed into an RDBMS format, which came to 6 TB of usable data covering about 10 years of reportable data. The point being that at scale, these kinds of practices simply aren't practical. Storage is expensive - probably the greatest compute expense. This provides several interesting challenges:

Backups - the InnoDB plugins are great and all, but the backup times on that much data just aren't practical

Restore times - for large datasets, especially if you need replication to catch up after a restore, getting back to an operational state can take days or even weeks

Creating/seeding new instances - Often the work you may be doing in dev/test involves ETL (Extract, Transform and Load) jobs on your dataset. These need to be validated by QA testing, but this needs to be done in a non-destructive manner so that the original production dataset is preserved. In a disaster you may be willing to accept a long restore time, on the understanding that backups are an insurance policy and the intent is to avoid using them; the DevOps development workflow, however, requires that you be able to perform a restore or copy of your data on a regular basis (perhaps multiple times a day)

Capacity - Slinging around that much data at the scale I just described can be very I/O intensive. Not only do you need to address the problems described in the first three points, but you need to do it in a way that doesn't cause outages or performance slowdowns on your production systems.

While the above considerations may not be a concern at smaller scales, at larger scales, these become huge problems. This means that it is extremely important that you define your requirements and forecast the size of your dataset.

Defining requirements

As a result, you need to define several requirements:

RTO - The Recovery Time Objective (RTO) for backups is one of the most important drivers of database backup solutions. While at first it might not appear relevant to most of the other problems, it becomes extremely relevant when you ask "What if I used my backup solution for creating or seeding new instances?". I'll cover some techniques for doing so in the next section.

RPO - The Recovery Point Objective (RPO) for backups defines A) how far back you are able to restore (minutes, hours, days, weeks, months or years), B) the backup interval at different tiers and C) how granularly you can restore. For example, for e-mail databases, message-level backups - restoring a specific e-mail - are often sought. Similarly, you may find that data older than a few days is completely useless - so there is no point restoring back a year.

Size of your dataset - This is important because for a 1 MB database, your RTO may be achievable with most backup products and solutions. For a 10 TB database, however, you will find that a full, row-level backup to LTO-3 tape probably won't achieve your RTO and could interfere with your RPO, because backups begin exceeding your backup window. You can't exactly just do a mysqldump on a dataset that large, but you could probably get away with that on the 1 MB database. A rough sizing sketch follows this list.

Database consistency - A final thing that makes an immense difference in continuous deployment, site reliability, scalability and high availability when working with databases is your need (or lack thereof) for consistency. There are three basic types: immediate consistency, Just-In-Time (JIT) consistency and eventual consistency.
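To make the interplay between dataset size, throughput and the backup window concrete, here is a minimal back-of-the-envelope sketch in Python; the throughput figures for LTO-3 tape and a disk-based target are illustrative assumptions, not measurements:

    # Back-of-the-envelope check of whether a full backup can meet a backup
    # window / RTO at a given dataset size.  Throughput figures are assumptions.
    def backup_hours(dataset_gb, throughput_mb_per_s):
        return (dataset_gb * 1024) / throughput_mb_per_s / 3600

    for size_gb, label in [(0.001, "1 MB database"), (10 * 1024, "10 TB database")]:
        for tput, medium in [(80, "LTO-3 tape"), (400, "disk-based target")]:
            print(f"{label:>15} to {medium:>17}: {backup_hours(size_gb, tput):8.2f} h")

    # With an 8-hour backup window, the 10 TB full backup to LTO-3 (~36 h here)
    # clearly cannot fit, while the 1 MB case is trivial with any method.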

In addition to the above major considerations, you also need to consider licensing and support requirements (open source or closed source; in-house support, third-party support or vendor support), application/language requirements (the connectors for many databases can be important; is your app compiled? Do you have access to the source code? Can you recompile it, or is it provided by a vendor? Or does it run on an interpreted language?), political requirements (does your organization only trust Oracle? Do they hate Oracle? Do they not trust MySQL? How do you feel about MariaDB or Postgres?), database engine (InnoDB? MyISAM? Blackhole? NDB Cluster? Spider?) and historical or compatibility requirements (we used PL/SQL for years and half our code is built into the Oracle engine! How could we ever port over to MariaDB?!?)

All of these things (significantly) impact the tools available to you.

Some Options for in-house data management

Note: The following is in no way exhaustive, and other SE users should chime in with additional suggestions.

With the generic considerations out of the way, let me provide you with some techniques and technologies for addressing the above. First, ask yourself if you really need to use an RDBMS, or if unstructured data with something like Hadoop or CouchDB, or even object storage (something like OpenStack Swift), is an option.

Second, consider looking into a cloud-based solution. This outsources some of this headache and leaves the complicated problems to highly qualified (and paid) individuals. At scale, however, you can find this really eats into your budget (cloud providers DO make a profit at this, and at a certain scale you can simply afford to employ these experts yourself). If you are working under specific security or political requirements (read: we can't do clouds), consider a hybrid NFS/FibreChannel filer. Most of these filers, such as those from NetApp, Pure Storage and Tegile, support a delta-based snapshotting and cloning technique that can be very handy for A) taking backups, B) restoring backups and C) seeding new instances.

At this point, I need to note that I am not a backup and storage expert, so there are some portions of this problem that I never quite managed to solve before I moved on to other problems (and greener pastures).

That being said, these products allow you to take differential snapshots underneath your database. You will need to script out a full "FLUSH TABLES WITH READ LOCK" on one of your database instances (a read-only slave is recommended) and dump your binlog position or GTID, but once you do, you will be able to use these snaps to create new instances of your database. You will want to put binlogs on a separate partition and put only your database data on these partitions. Once you do, you will be able to clone these partitions (on NetApp, this is known as a "FlexClone"). The only issue with these is that they cannot span datastores, which means that all your I/O is still hitting the same datastore/volume, and calculating deltas can cause some overhead in terms of CPU resources.
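As a rough sketch of the "read lock + binlog position + snapshot" step described above, something like the following could work, assuming mysql-connector-python and the classic (pre-8.4) SHOW MASTER STATUS syntax; take_filer_snapshot() is a placeholder for whatever snapshot API or CLI your filer actually exposes:

    # Sketch of quiescing a read-only replica long enough to take a filer-level
    # snapshot with a known binlog position.  take_filer_snapshot() is a
    # placeholder for your storage vendor's snapshot API or CLI.
    import mysql.connector

    def take_filer_snapshot(volume):
        # Placeholder: call the NetApp/Pure/Tegile snapshot API or CLI here.
        print(f"snapshotting volume {volume} ...")

    def snapshot_with_binlog_position(host, user, password, volume):
        conn = mysql.connector.connect(host=host, user=user, password=password)
        cur = conn.cursor()
        try:
            cur.execute("FLUSH TABLES WITH READ LOCK")   # quiesce writes on the replica
            cur.execute("SHOW MASTER STATUS")            # capture binlog file/position (or GTID set)
            binlog_file, binlog_pos = cur.fetchone()[0:2]
            take_filer_snapshot(volume)                  # snapshot only the data partition
            return binlog_file, binlog_pos
        finally:
            cur.execute("UNLOCK TABLES")                 # never leave the global lock held
            cur.close()
            conn.close()

    if __name__ == "__main__":
        print(snapshot_with_binlog_position("replica01", "backup", "secret", "db_data_vol"))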

The delta-calculation overhead arises because, for each block read, the filer must determine whether the data resides in the frozen original snapshot or in the delta. For volumes/stores with multiple snapshots, this check might need to happen multiple times. You can overcome this by refreshing the data (meaning you discard your snapshot and clone it again periodically - which might happen naturally and organically in a good continuous deployment environment anyway), or by permanently splitting the volume (known as a "Flex Split" in NetApp terminology), which takes a moment to permanently resolve the deltas and create an entirely new and separate volume.
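Purely as a toy model of that read path (not how any particular filer implements it), the extra lookups per snapshot layer look roughly like this:

    # Toy model of why reads from a delta clone cost extra lookups: each block
    # must be searched for in the newest delta, then in each older layer,
    # before falling back to the frozen base volume.
    class Volume:
        def __init__(self, base_blocks):
            self.base = dict(base_blocks)   # block_id -> data in the original volume
            self.deltas = []                # ordered list of delta layers

        def clone(self):
            self.deltas.append({})          # new writable delta layered on top
            return len(self.deltas) - 1     # handle to the clone's delta

        def write(self, clone_id, block_id, data):
            self.deltas[clone_id][block_id] = data

        def read(self, clone_id, block_id):
            # Walk from the newest delta down to the base: one lookup per layer.
            for delta in reversed(self.deltas[: clone_id + 1]):
                if block_id in delta:
                    return delta[block_id]
            return self.base[block_id]

    vol = Volume({0: "orders v1", 1: "users v1"})
    c = vol.clone()
    vol.write(c, 0, "orders v2 (test run)")
    print(vol.read(c, 0))   # found in the delta
    print(vol.read(c, 1))   # falls through to the base volume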

These delta clones have the added benefit of reducing your overall storage requirement - you can spawn several clones or instances of your production database data for development, testing and validation. If you are only keeping one copy of your large dataset plus the (likely small) deltas, you reduce your overall storage cost and footprint.

The only trick here is that this may not constitute a full backup solution, as the "backup" still resides on your filer. For this you may need to use something NetApp calls SnapMirror, which will mirror data (using rsync-style technology) between filers and even datacenters, or use some type of integrated backup solution which can back up to tape one of your delta snapshots or a FlexClone.

This, however, has one major flaw: all of your data - dev, test and prod - is still using I/O on the same filer and storage head. To work around this, consider creating a slave database instance on a second filer which can be the seeding point for your test and/or dev filer, or consider using a load balancer/application delivery controller for your application layer to mirror production requests into your testing (and/or dev) environment(s). This has the added benefit of throwing production traffic at your QA/test environment before promoting to production, surfacing issues that might not be immediately noticed otherwise. You can then check your logs for errors based on production traffic and user behavior.
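As a very rough sketch of the request-mirroring idea (the backend URLs are hypothetical, and a real load balancer/ADC would do this far more robustly):

    # Minimal sketch: serve each request from the production backend while
    # asynchronously replaying a copy against a QA/test backend and ignoring
    # its response.  Backend URLs are hypothetical placeholders.
    import threading
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    PROD_BACKEND = "http://prod-app.internal:8080"
    SHADOW_BACKEND = "http://qa-app.internal:8080"

    class MirroringHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Fire-and-forget copy of the request to the test environment.
            threading.Thread(target=self._shadow, daemon=True).start()
            try:
                with urllib.request.urlopen(PROD_BACKEND + self.path, timeout=5) as resp:
                    body = resp.read()
                    self.send_response(resp.status)
                    self.end_headers()
                    self.wfile.write(body)
            except Exception:
                self.send_error(502)

        def _shadow(self):
            try:
                urllib.request.urlopen(SHADOW_BACKEND + self.path, timeout=5).read()
            except Exception:
                pass  # the shadow copy must never affect the production response

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8000), MirroringHandler).serve_forever()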

This should then allow you to use a few scripts to programmatically spawn and destroy entire (and large) datasets for use with continuous deployment methodologies.

Scalability and High Availability

While you asked about continuous deployment, DevOps is concerned with more than just continuous deployment - so I am going to include some bits about redundancy, scalability and high availability.

I mentioned JIT, immediate and eventual consistency above; this is where the various RDBMS engines come in. Eventual consistency is relatively easy to get by simply configuring circular asynchronous replication. This can cause some collisions, however (what if your application layer updates the same data on both sides of the cluster before replication completes?). For immediate consistency, look at Galera cluster, which forces synchronous replication but causes scalability issues (how will you replicate to your disaster recovery site or load balance geographically without incurring significant latency due to propagation delay at the network layer?). You can also see if you can do synchronous replication within the datacenter and asynchronous replication between sites, but this seems the worst of both worlds.
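For illustration, a minimal sketch of wiring up two-node circular asynchronous replication, assuming GTID-based replication, classic (pre-8.0) statement names and mysql-connector-python; hosts, credentials and the assumption that each node already has a distinct server_id in my.cnf are all placeholders:

    # Sketch of two-node circular (master-master) asynchronous replication.
    # Interleaved auto_increment settings reduce, but do not eliminate,
    # write-collision risk between the two writers.
    import mysql.connector

    NODES = [
        {"host": "db1.internal", "peer": "db2.internal", "offset": 1},
        {"host": "db2.internal", "peer": "db1.internal", "offset": 2},
    ]

    def configure_node(node, repl_user="repl", repl_pass="secret"):
        conn = mysql.connector.connect(host=node["host"], user="root", password="secret")
        cur = conn.cursor()
        # Interleave auto-increment values so both writers can insert concurrently.
        cur.execute("SET GLOBAL auto_increment_increment = 2")
        cur.execute("SET GLOBAL auto_increment_offset = %s" % node["offset"])
        # Point this node at its peer (assumes GTID replication is enabled).
        cur.execute(
            "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', "
            "MASTER_PASSWORD='%s', MASTER_AUTO_POSITION=1"
            % (node["peer"], repl_user, repl_pass)
        )
        cur.execute("START SLAVE")
        cur.close()
        conn.close()

    for n in NODES:
        configure_node(n)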

Typically, however, most people do not need fully synchronous replication - this is usually only needed for very specific (and exotic) high-write environments where multi-master is needed with table sharding. Most apps can deal with Just-In-Time consistency using a database proxy. For example, ScaleArc will monitor replication status and track where writes just went (to send subsequent read requests to until replication catches up), providing Just-In-Time consistency and the appearance of database consistency. ScaleArc is compatible with Postgres, MySQL, MariaDB, Oracle and MSSQL and can use regular expressions to shard/partition your databases for applications that can't use shard keys. It also has a robust REST API for your configuration management software to interact with - and their support team is outstanding.
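This is not ScaleArc's implementation, but the underlying Just-In-Time-consistency idea can be sketched roughly like this: remember the binlog position of the last write, and send reads to the master until the replica has applied past it (assuming mysql-connector-python and classic SHOW MASTER STATUS / SHOW SLAVE STATUS output):

    # Sketch of the JIT-consistency concept a database proxy provides: after a
    # write, pin reads to the master until the replica has caught up.
    import mysql.connector

    class JITRouter:
        def __init__(self, master_cfg, replica_cfg):
            self.master = mysql.connector.connect(**master_cfg)
            self.replica = mysql.connector.connect(**replica_cfg)
            self.last_write_pos = None   # (binlog file, position) of our last write

        def write(self, sql, params=None):
            cur = self.master.cursor()
            cur.execute(sql, params or ())
            self.master.commit()
            cur.execute("SHOW MASTER STATUS")        # remember where the write landed
            self.last_write_pos = cur.fetchone()[0:2]
            cur.close()

        def _replica_caught_up(self):
            if self.last_write_pos is None:
                return True
            cur = self.replica.cursor(dictionary=True)
            cur.execute("SHOW SLAVE STATUS")
            status = cur.fetchone()
            cur.close()
            applied = (status["Relay_Master_Log_File"], status["Exec_Master_Log_Pos"])
            return applied >= self.last_write_pos

        def read(self, sql, params=None):
            conn = self.replica if self._replica_caught_up() else self.master
            cur = conn.cursor()
            cur.execute(sql, params or ())
            rows = cur.fetchall()
            cur.close()
            return rows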

Similarly, you might wish to consider a free alternative, MaxScale, developed by the MariaDB team for MariaDB. It lacks the GUI and some of the caching features of ScaleArc, however.

Finally, MySQL Fabric (and the in-RAM-only MySQL Cluster - if you can afford that much RAM) are other possibilities - especially with MySQL's new proxy. These can provide the scalability and redundancy components of your environment.

Postgres and Oracle should have the replication and sharding features you need, but ScaleArc will pair well if you need a proxy.

Ultimately, all these pieces add up to a highly flexible environment suitable for continuous deployment and development if you are unable to simply use a cloud-based environment and let your cloud provider deal with the above problems for you.

I'd argue a tool alone won't really help unless you shift the schema responsibility to the application team.

We use Liquibase or Flyway at work, where the application team is responsible for creating the changesets.

Along with this, you can avoid a purely additive approach. Each application is required to be compatible with its preceding version: when an alteration of the schema has to be made, the application integrates a new column into the schema, writes to it, and still reads from and writes to the previous column (allowing the previous version to still see the same data). The next version of the application is allowed to stop updating the old column, and only the third version is allowed to remove the now-unused column.
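A minimal sketch of that three-release pattern, using SQLite and a hypothetical rename of users.email to users.contact_email (table and column names are illustrative only):

    # Sketch of the three-release column migration described above.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

    # Release N: add the new column (purely additive), dual-write, read old column.
    conn.execute("ALTER TABLE users ADD COLUMN contact_email TEXT")

    def save_user_release_n(user_id, email):
        conn.execute(
            "INSERT INTO users (id, email, contact_email) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET email = ?, contact_email = ?",
            (user_id, email, email, email, email),
        )

    # Release N+1: backfill once, then write and read only the new column.
    conn.execute("UPDATE users SET contact_email = email WHERE contact_email IS NULL")

    def save_user_release_n1(user_id, email):
        conn.execute(
            "INSERT INTO users (id, contact_email) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET contact_email = ?",
            (user_id, email, email),
        )

    # Release N+2: nothing reads users.email any more, so it can finally be
    # dropped (in SQLite >= 3.35 via ALTER TABLE ... DROP COLUMN).
    save_user_release_n(1, "a@example.com")
    save_user_release_n1(2, "b@example.com")
    print(conn.execute("SELECT id, contact_email FROM users ORDER BY id").fetchall())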

Obviously, regular backups are needed in case something goes wrong. Keeping a proper subset of data that is likely to surface problems in integration tests, so they can be caught before production, is also a good idea. The ideal case is an anonymized subset of production data.
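A rough sketch of pulling such an anonymized subset; the table, columns and hashing policy are hypothetical placeholders for whatever your compliance rules actually require:

    # Sketch of extracting an anonymized subset of production data for tests.
    import hashlib
    import sqlite3

    def anonymize(value):
        # One-way hash keeps values unique/joinable without exposing PII.
        return hashlib.sha256(str(value).encode()).hexdigest()[:16]

    def extract_subset(prod_db, test_db, sample_rows=1000):
        src, dst = sqlite3.connect(prod_db), sqlite3.connect(test_db)
        dst.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, email TEXT, country TEXT)")
        rows = src.execute(
            "SELECT id, email, country FROM customers ORDER BY RANDOM() LIMIT ?",
            (sample_rows,),
        )
        dst.executemany(
            "INSERT INTO customers VALUES (?, ?, ?)",
            [(r[0], anonymize(r[1]), r[2]) for r in rows],   # keep non-PII columns as-is
        )
        dst.commit()
        src.close()
        dst.close()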

Making changes to objects in your database—or even just performing routine administrative tasks—can be difficult, risky work. There are dozens of tasks to keep track of, and a single misstep could have a disastrous impact on availability and data integrity. You can cut back on both effort and risk with BMC Object Administration for DB2 11, a collection of tools to help you:

Reduce the time required to administer complex and disparate DB2 environments.

Automate routine tasks throughout the application lifecycle for improved integrity.

Build and execute utility jobs, allowing users to take advantage of LISTDEFs and TEMPLATEs for increased productivity

Integration with SCM tools

Some time ago I was challenged by a customer to actually "integrate" the BMC alternative into their SCM lifecycle (managed by SERENA ChangeMan ZMF). The idea behind such integration was: "Consider our DB2 DBA team (doing stuff with DDLs) as a special case of a development team, which happens to use some exotic editor (the BMC tool) to produce the code (DDL) to be migrated".

About the only (but real!) challenge with this integration was to also facilitate "restartability", which is one of the key benefits of such a DBA tool:

Restartability means that when this DBA tool is doing its thing (via sometimes long-running jobs, given the nature of the work to be completed), unexpected things may happen (deadlocks, time-out abends, etc.).

If any of those things happen, and you don't want to restart from your backup (taken before you started), then the DBA tool expects you to restart from the (check)point where things started to go wrong (and from where you want everything to be re-executed).

Better still (or should I say "even worse"?), if a newbie to the DBA tool doesn't really know how to restart from such a checkpoint and simply tries again (from the beginning), then the DBA tool will simply fail with some kind of user error, with an indication along the lines of "Until you tell me how you want me to proceed after my last point of failure, I refuse to do anything (so as not to make things even worse)".

In the end, the real trick to implementing this restartability was that the SCM tool itself also had to be tuned: it had to allow restarting failed SCM procedures (typically related to deliveries to target test/prod environments) from the point of failure, instead of just submitting the failed SCM procedure all over again (which is how those SCM tools typically recover from such situations).
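A minimal sketch of that checkpoint/restart behaviour (the step names and checkpoint file are hypothetical):

    # Sketch: a delivery procedure records the last step that completed, and a
    # re-run resumes from the failure point instead of starting from scratch.
    import json
    import os

    CHECKPOINT = "delivery.checkpoint"
    STEPS = ["unload_tables", "apply_ddl", "reload_tables", "rebuild_indexes", "run_checks"]

    def run_step(name):
        print(f"running {name} ...")   # real work (DDL migration, utility job, ...) goes here

    def run_delivery():
        done = set()
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as f:
                done = set(json.load(f))
            print(f"restarting after checkpoint: {sorted(done)}")
        for step in STEPS:
            if step in done:
                continue                        # already completed in a previous run
            run_step(step)
            done.add(step)
            with open(CHECKPOINT, "w") as f:
                json.dump(sorted(done), f)      # checkpoint after each step
        os.remove(CHECKPOINT)                   # clean finish: next run starts fresh

    if __name__ == "__main__":
        run_delivery()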

Bonus: after the SCM integration of the BMC alternative was completed, the customer decided to change their DBA tool to the IBM alternative. And even though the two alternatives don't really look the same, the impact on the SCM integration was rather minimal: simply a matter of replacing (in some reusable SCM customization) the calls to the BMC alternative with the equivalent calls to the IBM alternative ... which (using DevOps terminology) got implemented via feature flags / feature toggles.
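Roughly, the feature-toggle part of that swap looked like this (the function, flag and command names are hypothetical):

    # Sketch of a feature toggle switching between the BMC and IBM tool calls
    # inside a reusable SCM customization.
    import subprocess

    USE_IBM_TOOL = True   # the flag flipped when the customer switched vendors

    def run_bmc_ddl_migration(ddl_member):
        return subprocess.run(["submit_bmc_job", ddl_member], check=True)

    def run_ibm_ddl_migration(ddl_member):
        return subprocess.run(["submit_ibm_job", ddl_member], check=True)

    def run_ddl_migration(ddl_member):
        # The SCM procedure only ever calls this wrapper, so flipping the flag
        # switches vendors without touching the rest of the lifecycle.
        if USE_IBM_TOOL:
            return run_ibm_ddl_migration(ddl_member)
        return run_bmc_ddl_migration(ddl_member)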