Zero Downtime Deployment with a Database

This article will explain in depth how to tackle issues related to database compatibility and the deployment process.
We will present what can happen with your production applications if you try to perform
such a deployment unprepared. We will then walk through the steps in the lifecycle of an application that are necessary
to have zero downtime. The result of our operations will be applying a backward incompatible database change in a backward compatible way.

If you want to work through the code samples below, you will find everything you need in GitHub.

Introduction

Zero downtime deployment

What is this mythical zero downtime deployment? You can say that your application is deployed that way if you can
successfully introduce a new version of your application to production without making the user see that the application
went down in the meantime. From the user’s and the company’s point of view it’s the best possible scenario of deployment
since new features can be introduced and bugs can be eliminated without any outage.

How can you achieve that? There are number of ways but one of them is just to:

deploy version 1 of your service

migrate your database to a new version

deploy version 2 of your service in parallel to the version 1

once you see that version 2 works like a charm just bring down version 1

you’re done!

Easy, isn’t it? Unfortunately, it’s not that easy and we’ll focus on that later on. Right now let’s check another
common deployment process which is the blue green deployment.

Have you ever heard of blue green deployment? With Cloud Foundry it’s
extremely easy to do. Just check out this article where
we describe it in more depth. To quickly recap, doing blue green deployment is as simple as:

maintain two copies of your production environment (“blue” and “green”);

route all traffic to the the blue environment by mapping production URLs to it;

deploy and test any changes to the application in the green environment;

“flip the switch” by mapping URLs onto green and unmapping them from blue.

Blue green deployment is an approach that gives you ease of introducing new features without the stress that
something will completely blow up on production. That’s due to the fact that even if that would be the case,
you can easily rollback your router to point to a previous environment just by "flipping the switch".

After reading all of the above you could ask yourself a question: What does zero downtime deployment have to do with Blue green deployment?

Well, they have quite a lot in common since maintaining two copies of the same environment leads to doubling the effort
required to support it. That’s why some teams, as Martin Fowler states it,
tend to perform a variation of that approach:

Another variation would be to use the same database, making the blue-green switches for web and domain layers.

Databases can often be a challenge with this technique, particularly when you need to change the schema to support a new version of the software.

And here we arrive at the main problem that we will touch in this article. The database. Let’s have another glimpse on this phrase:

migrate your database to a new version

Now you should ask yourself a question - what if the database change is backward incompatible? Won’t my version 1 of the application
just blow up? Actually, it will…​

So even though the benefits of zero downtime / blue green deployment are gigantic, companies tend to follow such a safer process
of deploying their apps:

prepare a package with the new version of the application

shut down the running application

run the database migration scripts

deploy and run the new version of the application

In this article we’ll describe in more depth how you can work with your database and your code so that you can profit from the
benefits of the zero downtime deployment.

Database issues

If you have a stateless application that doesn’t store any data in the database then you can start doing zero downtime deployment
right now. Unfortunately, most software has to store the data somewhere. That’s why you have to think twice before doing any sort
of schema changes. Before we go into the details of how to change the schema in such a way that zero downtime deployment is possible
let’s focus on schema versioning first.

Schema versioning

In this article we will use Flyway as a schema versioning tool. Naturally we’re also writing a Spring Boot application
that has native support for Flyway and will execute the schema migration upon application context setup. When using Flyway
you can store the migration scripts inside your projects folder (by default under classpath:db/migration). Here you can see an example
of such migration files

In this example we can see 4 migration scripts that, if not executed previously, will be executed one after another when the application
starts. Let’s take a look at one of the files (V1__init.sql) as an example.

Solving the database issue

In the following section of the article we will focus on presenting two approaches to database changes.

backward incompatible

backward compatible

The first one will be shown as a warning to not to try to do zero downtime deployment without some preparations.
The second one will present a suggested solution of how one can perform zero downtime deployment and maintain
backward compatibility at the same time.

Our project that we will work on will be a simple Spring Boot Flyway application in which we have a Person
that has a first_name and a last_name in the database. We want to rename the last_name column into surname.

Assumptions

Before we go into details we need to define a couple of assumptions towards our applications. The key result that we
would like to obtain is to have a fairly simple process.

Tip

Business PRO-TIP. Simplifying processes can save you a lot of money on support (the more people work in your company the more money you can save)!

We don’t want to do database rollbacks

Not doing them simplifies the deployment process (some database rollbacks are close to impossible like rolling back a delete).
We prefer to rollback only the applications. That way even if you have different databases (e.g. SQL and NoSQL) then your
deployment pipeline will look the same.

We want ALWAYS to be able to rollback the application one version back (not more)

We want to rollback only as a necessity. If there is a bug in the current version that can’t be solved easily we want to be
able to bring back the last working version. We assume that this last working version is the previous one. Maintaining code and database
compatibility for more than a single deployment would be extremely difficult and costly.

Tip

For readability purposes we will be versioning the applications in this article with major increments.

Step 1: Initial situation

Version of the app: 1.0.0

Version of the DB: v1

Comment

This will be the initial state of the application that we will take into consideration.

Renaming a column in backward-incompatible way

Let’s take a look at the following example if you want to change the column name:

Warning

The following example is deliberately done in such a way that it will break. We’re showing it to depict the problem of database
compatibility.

Version of the app: 2.0.0.BAD

Version of the DB: v2bad

Comment

Current changes DO NOT allow us to run two instances (old and new) at the same time. Thus zero down time
deployment will be difficult to achieve (if we take into consideration out assumptions it’s actually impossible).

A/B testing

The current situation is that we have an app deployed to production in version 1.0.0 and db in v1. We want to deploy the second
instance of the app that will be in version 2.0.0.BAD and update the db to v2bad.

Steps:

a new instance is deployed in version 2.0.0.BAD that updates the db to v2bad

in v2bad of the database the column last_name is no longer existing - it got changed to surname

the db and app upgrade is successful and you have some instances working in 1.0.0, others in 2.0.0.BAD. All are talking to db
in v2bad

all instances of version 1.0.0 will start producing exceptions cause they will try to insert data to last_name column which is
no longer there

all instances of version 2.0.0.BAD will work without any issues

As you can if we do backward incompatible changes of the DB and the application, A/B testing is impossible.

Rolling back the application

Let’s assume that after trying to do A/B deployment we’ve decided that we need to rollback the app back to version 1.0.0. We assumed
that we don’t want to roll back the database.

Steps:

we shut down the instance that was running with version 2.0.0.BAD

the database is still in v2bad

since version 1.0.0 doesn’t understand what surname column is it will produce exceptions

hell broke loose and we can’t go back

As you can if we do backward incompatible changes of the DB and the application, we can’t roll back to a previous version.

Logs from script execution

Backward incompatible scenario:
01) Run 1.0.0
02) Wait for the app (1.0.0) to boot
03) Generate a person by calling POST localhost:9991/person to version 1.0.0
04) Run 2.0.0.BAD
05) Wait for the app (2.0.0.BAD) to boot
06) Generate a person by calling POST localhost:9991/person to version 1.0.0 <-- this should fail
07) Generate a person by calling POST localhost:9992/person to version 2.0.0.BAD <-- this should pass
Starting app in version 1.0.0
Generate a person in version 1.0.0
Sending a post to 127.0.0.1:9991/person. This is the response:
{"firstName":"b73f639f-e176-4463-bf26-1135aace2f57","lastName":"b73f639f-e176-4463-bf26-1135aace2f57"}
Starting app in version 2.0.0.BAD
Generate a person in version 1.0.0
Sending a post to 127.0.0.1:9991/person. This is the response:
curl: (22) The requested URL returned error: 500 Internal Server Error
Generate a person in version 2.0.0.BAD
Sending a post to 127.0.0.1:9995/person. This is the response:
{"firstName":"e156be2e-06b6-4730-9c43-6e14cfcda125","surname":"e156be2e-06b6-4730-9c43-6e14cfcda125"}

Code changes

We have changed the field name from lastName to surname.

Renaming a column in backward-compatible way

This is the most frequent situation that we can encounter. We need to perform backward incompatible changes. We have already
proven that to do zero downtime deployment we must not simply apply the database migration without extra work. In this
section of the article we will go through 3 deployments of the application together with the database migrations to achieve
the desired effect and at the same time be backward compatible.

Tip

As a reminder - Let’s assume that we have the DB in version v1. It contains the columns first_name and last_name.
We want to change the last_name into surname. We also have the app in version 1.0.0 which doesn’t use the surname column just yet.

Step 2: Adding surname

Version of the app: 2.0.0

Version of the DB: v2

Comment

By adding a new column and copying its contents we have created backward compatible changes of the db. ATM if we
rollback the JAR / have an old JAR working at the same tame it won’t break at runtime.

Rolling a new version

Steps:

migrate your db to create the new column called surname. Now your db is in v2

copy the data from the last_name column to surname. NOTE that if you have a lot of this data then you should consider batch
migration!

write the code to use BOTH the new and the old column. Now your app is in version 2.0.0

read the surname value from surname column if it’s not null and from last_name if surname wasn’t set.
You can remove the getLastName() from the code since it will produce nulls when your app is rolled back from 3.0.0 to 2.0.0.

If you’re using Spring Boot Flyway those two steps will be performed upon booting the version 2.0.0 of the app. If you’re running
database versioning tool manually then you’d have to do it in separate processes (first manually upgrade the db version and then deploy
the new app).

Important

Remember that the newly created column MUST NOT be NOT NULL. If you rollback, the old app has no knowledge of the new
column and won’t set it upon Insert. But if you add that constraint and your db is in v2 it would require the value of the new
column to be set. That would result in constraint violations.

Important

You should remove the getLastName() method because in version 3.0.0 there is no notion of last_name column in the code.
That means that nulls will be set there. You can leave the method and add null-checks but a much better solution would be to ensure
that in the logic of getSurname() you pick the proper, non-null value.

A/B testing

The current situation is that we have an app deployed to production in version 1.0.0 and db in v1. We want to deploy the second
instance of the app that will be in version 2.0.0 and update the db to v2.

Steps:

a new instance is deployed in version 2.0.0 that updates the db to v2

in the meantime some requests got processed by instances being in version 1.0.0

the upgrade is successful and you have some instances working in 1.0.0, others in 2.0.0. All are talking to db in v2

version 1.0.0 is not using the database’s column surname and version 2.0.0 is. They don’t interfere each other, no exceptions
should be thrown.

version 2.0.0 is saving data to both old and new column thus it’s backward compatible

Important

If you have any queries that count items basing on values from old / new column you have to remember that now you have
duplicate values (most likely still being migrated). E.g. if you want to count the number of users whose last name (however you call it)
starts with a letter A then until the data migration (old → new column) is done you might have inconsistent data if you
perform the query against the new column.

Rolling back the application

The current situation is that we have app in version 2.0.0 and db in v2.

Steps:

roll back your app to version 1.0.0.

version 1.0.0 is not using the database’s column surname thus rollback should be successful

Remember NOT TO ADD any NOT NULL constraints to the added column. Cause if you rollback the JAR
the old version doesn’t have the notion of the added column and automatically a NULL value will be set. In case
of having a constraint the old application will blow up.

-- NOTE: This field can't have the NOT NULL constraint cause if you rollback, the old version won't know about this field
-- and will always set it to NULL
ALTER TABLE PERSON ADD surname varchar(255);
-- WE'RE ASSUMING THAT IT'S A FAST MIGRATION - OTHERWISE WE WOULD HAVE TO MIGRATE IN BATCHES
UPDATE PERSON SET PERSON.surname = PERSON.last_name

Code changes

We are storing data in both last_name and surname. Also, we are reading from the last_name column cause
it is most up to date. During the deployment process some requests might have been processed by the instance that
hasn’t yet been upgraded.

/*
* Copyright 2012-2016 the original author or authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package sample.flyway;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
@Entity
public class Person {
@Id
@GeneratedValue
private Long id;
private String firstName;
private String lastName;
private String surname;
public String getFirstName() {
return this.firstName;
}
public void setFirstName(String firstName) {
this.firstName = firstName;
}
/**
* Reading from the new column if it's set. If not the from the old one.
*
* When migrating from version 1.0.0 -> 2.0.0 this can lead to a possibility that some data in
* the surname column is not up to date (during the migration process lastName could have been updated).
* In this case one can run yet another migration script after all applications have been deployed in the
* new version to ensure that the surname field is updated.
*
* However it makes sense since when looking at the migration from 2.0.0 -> 3.0.0. In 3.0.0 we no longer
* have a notion of lastName at all - so we don't update that column. If we rollback from 3.0.0 -> 2.0.0 if we
* would be reading from lastName, then we would have very old data (since not a single datum was inserted
* to lastName in version 3.0.0).
*/
public String getSurname() {
return this.surname != null ? this.surname : this.lastName;
}
/**
* Storing both FIRST_NAME and SURNAME entries
*/
public void setSurname(String surname) {
this.lastName = surname;
this.surname = surname;
}
@Override
public String toString() {
return "Person [firstName=" + this.firstName + ", lastName=" + this.lastName + ", surname=" + this.surname
+ "]";
}
}

Step 3: Removing last name from code

Version of the app: 3.0.0

Version of the DB: v3

Comment

By adding a new column and copying its contents we have created backward compatible changes of the db. ATM if we
rollback the JAR / have an old JAR working at the same time it won’t break at runtime.

Rolling back the application

The current situation is that we have app in version 3.0.0 and db in v3. Version 3.0.0 is not storing data
into the last_name column. That means that most up to date information is stored in the surname column.

Steps:

roll back your app to version 2.0.0.

version 2.0.0 is using both last_name and surname column.

version 2.0.0 will pick first surname column if it’s not null and if that’s not the case then it will pick last_name

DB changes

There are no structure changes in the DB. The following script is executed that performs the final migration of old data:

-- WE'RE ASSUMING THAT IT'S A FAST MIGRATION - OTHERWISE WE WOULD HAVE TO MIGRATE IN BATCHES
-- ALSO WE'RE NOT CHECKING IF WE'RE NOT OVERRIDING EXISTING ENTRIES. WE WOULD HAVE TO COMPARE
-- ENTRY VERSIONS TO ENSURE THAT IF THERE IS ALREADY AN ENTRY WITH A HIGHER VERSION NUMBER
-- WE WILL NOT OVERRIDE IT.
UPDATE PERSON SET PERSON.surname = PERSON.last_name;
-- DROPPING THE NOT NULL CONSTRAINT; OTHERWISE YOU WILL TRY TO INSERT NULL VALUE OF THE LAST_NAME
-- WITH A NOT_NULL CONSTRAINT.
ALTER TABLE PERSON MODIFY COLUMN last_name varchar(255) NULL DEFAULT NULL;

Code changes

We are storing data in both last_name and surname. Also, we are reading from the last_name column cause
it is most up to date. During the deployment process some requests might have been processed by the instance that
hasn’t yet been upgraded.

Step 4: Removing last name from db

Comment

Since the code of version 3.0.0 wasn’t using last_name column, if we roll back to 3.0.0 after removing the
column from the database then nothing bad will happen at runtime.

Logs from script execution

We will do it in the following way:
01) Run 1.0.0
02) Wait for the app (1.0.0) to boot
03) Generate a person by calling POST localhost:9991/person to version 1.0.0
04) Run 2.0.0
05) Wait for the app (2.0.0) to boot
06) Generate a person by calling POST localhost:9991/person to version 1.0.0
07) Generate a person by calling POST localhost:9992/person to version 2.0.0
08) Kill app (1.0.0)
09) Run 3.0.0
10) Wait for the app (3.0.0) to boot
11) Generate a person by calling POST localhost:9992/person to version 2.0.0
12) Generate a person by calling POST localhost:9993/person to version 3.0.0
13) Kill app (3.0.0)
14) Run 4.0.0
15) Wait for the app (4.0.0) to boot
16) Generate a person by calling POST localhost:9993/person to version 3.0.0
17) Generate a person by calling POST localhost:9994/person to version 4.0.0
Starting app in version 1.0.0
Generate a person in version 1.0.0
Sending a post to 127.0.0.1:9991/person. This is the response:
{"firstName":"52b6e125-4a5c-429b-a47a-ef18bbc639d2","lastName":"52b6e125-4a5c-429b-a47a-ef18bbc639d2"}
Starting app in version 2.0.0
Generate a person in version 1.0.0
Sending a post to 127.0.0.1:9991/person. This is the response:
{"firstName":"e41ee756-4fa7-4737-b832-e28827a00deb","lastName":"e41ee756-4fa7-4737-b832-e28827a00deb"}
Generate a person in version 2.0.0
Sending a post to 127.0.0.1:9992/person. This is the response:
{"firstName":"0c1240f5-649a-4bc5-8aa9-cff855f3927f","lastName":"0c1240f5-649a-4bc5-8aa9-cff855f3927f","surname":"0c1240f5-649a-4bc5-8aa9-cff855f3927f"}
Killing app 1.0.0
Starting app in version 3.0.0
Generate a person in version 2.0.0
Sending a post to 127.0.0.1:9992/person. This is the response:
{"firstName":"74d84a9e-5f44-43b8-907c-148c6d26a71b","lastName":"74d84a9e-5f44-43b8-907c-148c6d26a71b","surname":"74d84a9e-5f44-43b8-907c-148c6d26a71b"}
Generate a person in version 3.0.0
Sending a post to 127.0.0.1:9993/person. This is the response:
{"firstName":"c6564dbe-9ab5-40ae-9077-8ae6668d5862","surname":"c6564dbe-9ab5-40ae-9077-8ae6668d5862"}
Killing app 2.0.0
Starting app in version 4.0.0
Generate a person in version 3.0.0
Sending a post to 127.0.0.1:9993/person. This is the response:
{"firstName":"cbe942fc-832e-45e9-a838-0fae25c10a51","surname":"cbe942fc-832e-45e9-a838-0fae25c10a51"}
Generate a person in version 4.0.0
Sending a post to 127.0.0.1:9994/person. This is the response:
{"firstName":"ff6857ce-9c41-413a-863e-358e2719bf88","surname":"ff6857ce-9c41-413a-863e-358e2719bf88"}

DB changes

In comparison to v3 we’re just removing the last_name column and add missing constraints.

Code changes

Recap

We have successfully applied the backward incompatible change of renaming the column by doing a couple of
backward compatible deploys. Here you can find the summary of the performed actions:

deploy version 1.0.0 of the application with v1 of db schema (column name = last_name)

deploy version 2.0.0 of the application that saves data to last_name and surname columns.
The app reads from last_name column. Db is in version v2 containing both last_name and surname columns. The surname column is
a copy of the last_name column. (NOTE: this column must not have the not null constraint)

deploy version 3.0.0 of the application that saves data only to surname and reads from surname. As for the db the final
migration of last_name to surname takes place. Also the NOT NULL constraint is dropped from last_name. Db is now in version v3

deploy version 4.0.0 of the application - there are no changes in the code. Deploy db in v4 that first
preforms a final migration of last_name to surname and removes the last_name column. Here you can add any missing constraints

By following this approach you can always rollback one version back without breaking the database / application compatibility.

Code

All the code used in this article is available at Github. Below you can find some additional description.

Projects

Once you clone the repo you’ll see the following folder structure.

├── boot-flyway-v1 - 1.0.0 version of the app with v1 of the schema
├── boot-flyway-v2 - 2.0.0 version of the app with v2 of the schema (backward-compatible - app can be rolled back)
├── boot-flyway-v2-bad - 2.0.0.BAD version of the app with v2bad of the schema (backward-incompatible - app cannot be rolled back)
├── boot-flyway-v3 - 3.0.0 version of the app with v3 of the schema (app can be rolled back)
└── boot-flyway-v4 - 4.0.0 version of the app with v4 of the schema (app can be rolled back)

Scripts

You can run the scripts to execute the scenario that shows the backward compatible and incompatible changes applied to the db.