1. Introduction

Platform upgrades are a key aspect of platform high availability. While it is essential to keep a platform up to date at all times (for reasons ranging from security to performance improvements and the latest features released in each version), a platform upgrade can also mean service downtime.
The rolling upgrade presented here makes it possible to reuse the existing servers and upgrade them one by one while the remaining nodes keep serving HTTP traffic.

2. Prerequisites

The procedure works with DX target version 7.2.2.1 and later.

3. Detailed procedure

The detailed procedure differs slightly depending on whether the target version (the version you migrate to) is earlier than 7.2.3.2 or is 7.2.3.2 or later, since the 7.2.3.2 release introduced new startup options that simplify the procedure a bit.

3.1. Target versions 7.2.2.1-7.2.3.1

Let’s consider the following environment: a cluster of 3 nodes in version X behind a load balancer. The procedure explains step by step how to migrate it to version X+1 without service disruption.

It is advised to perform this procedure during a period of low activity and traffic, as it requires switching to full read-only mode and leaves one node out of the load balancer at all times during the upgrade.

The environment is in its initial state:

Switch all nodes of the cluster to full read-only mode. The procedure is described here.

Create a backup of your environment.

Create a copy of the database, in what we’ll call “DB Schema B”.

Your environment is now in the following state:

Remove the processing node from the load balancer, so it does not serve any request:

Stop the processing node.

Point the processing node to the copied database “DB Schema B”:

Migrate the processing node.

Before restarting the node, create a marker file (empty file) named "backup-restore" in the <digital-factory-data> folder. This will prevent any conflict with the other nodes. The description of this marker is available here.
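As an illustration, the marker file can be created with a few lines of Python. This is a minimal sketch, not an official tool: the function name `create_marker` is hypothetical, and `data_dir` must be replaced with the actual location of your <digital-factory-data> folder.

```python
from pathlib import Path


def create_marker(data_dir: str, marker_name: str = "backup-restore") -> Path:
    """Create an empty marker file in the given data folder and return its path.

    data_dir is assumed to be the <digital-factory-data> folder of the node.
    """
    folder = Path(data_dir)
    if not folder.is_dir():
        # Fail early rather than creating a marker in the wrong place.
        raise FileNotFoundError(f"data folder not found: {folder}")
    marker = folder / marker_name
    # Creates an empty file; an existing file is left as-is (only its timestamp changes).
    marker.touch(exist_ok=True)
    return marker
```

Run it on the node after the migration and before the restart.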

If Mail Service was activated on your DX cluster before this procedure, drop the script ActivateMailService.groovy into <digital-factory-data>/patches/groovy/ before restarting the node. This script automatically re-enables the mail service on DX startup.
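Copying the script into place can be scripted as well. The sketch below assumes you already have ActivateMailService.groovy somewhere on disk (`script_path` is a hypothetical location) and that the patches/groovy folder already exists under the data folder; the function name is illustrative only.

```python
import shutil
from pathlib import Path


def stage_mail_script(script_path: str, data_dir: str) -> Path:
    """Copy the Groovy script into <data_dir>/patches/groovy and return the new path."""
    target_dir = Path(data_dir) / "patches" / "groovy"
    if not target_dir.is_dir():
        # Assumption: the folder is part of the installation, so a missing
        # folder most likely means data_dir is wrong.
        raise FileNotFoundError(f"patch folder not found: {target_dir}")
    # copy2 keeps the file name and returns the path of the copied file.
    return Path(shutil.copy2(script_path, target_dir))
```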

Restart the node. Note that the node will not be in full read-only mode anymore.

Traffic can be redirected to this node:

Repeat steps 5 to 12 for the other, non-processing nodes (remove from the load balancer, stop, point to the migrated database, migrate the node, create the backup-restore file, restart and redirect traffic):

Your environment is now migrated to the newer version!

Re-enable the mail service via the administration:
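The per-node part of this procedure (processing node first, then each remaining node) can be summarised as a small driver loop. This is only a sketch of the ordering: the step names and the `actions` callables are hypothetical placeholders that you would implement for your own load balancer, database and DX installation. The cluster-wide preparation (full read-only mode, backup, database copy) and the optional mail service script are not covered.

```python
from typing import Callable, Dict, List

# Per-node steps, in the order given by the procedure (hypothetical names).
NODE_STEPS: List[str] = [
    "remove_from_load_balancer",
    "stop",
    "point_to_schema_b",
    "migrate",
    "create_marker_file",
    "restart",
    "add_to_load_balancer",
]


def rolling_upgrade(
    processing_node: str,
    other_nodes: List[str],
    actions: Dict[str, Callable[[str], None]],
) -> List[str]:
    """Run every step on the processing node first, then on each other node.

    Returns a log of "node:step" entries in execution order.
    """
    missing = [step for step in NODE_STEPS if step not in actions]
    if missing:
        raise KeyError(f"no action supplied for: {missing}")
    log: List[str] = []
    # Only one node is out of the load balancer at a time.
    for node in [processing_node, *other_nodes]:
        for step in NODE_STEPS:
            actions[step](node)
            log.append(f"{node}:{step}")
    return log
```

Because each node completes all of its steps before the next one starts, the remaining nodes keep serving traffic throughout.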

3.2. Target versions 7.2.3.2 and later

Let’s consider the following environment: a cluster of 3 nodes in version X behind a load balancer. The procedure explains step by step how to migrate it to version X+1 without service disruption.

It is advised to perform this procedure during a period of low activity and traffic, as it requires switching to full read-only mode and leaves one node out of the load balancer at all times during the upgrade.

The environment is in its initial state:

Switch all nodes of the cluster to full read-only mode. The procedure is described here.

Create a backup of your environment.

Create a copy of the database, in what we’ll call “DB Schema B”.

Your environment is now in the following state:

Remove the processing node from the load balancer, so it does not serve any request:

Stop the processing node.

Point the processing node to the copied database “DB Schema B”:

Migrate the processing node.

Before restarting the node, create a marker file (empty file) named "rolling-upgrade" in the <digital-factory-data> folder. This will prevent any conflict with the other nodes. The description of this marker is available here.

Restart the node. Note that the node will not be in full read-only mode anymore.

Traffic can be redirected to this node:

Repeat steps 5 to 11 for the other, non-processing nodes (remove from the load balancer, stop, point to the migrated database, migrate the node, create the rolling-upgrade file, restart and redirect traffic):
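The marker file name depends on the target version: "backup-restore" for 7.2.2.1 to 7.2.3.1 and "rolling-upgrade" for 7.2.3.2 and later. If you script the upgrade, a small helper can pick the right name. This is a minimal sketch that assumes purely numeric, dot-separated version strings; the function names are illustrative.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a string such as "7.2.3.2" into a tuple of integers for comparison."""
    return tuple(int(part) for part in version.split("."))


def marker_for_target(version: str) -> str:
    """Return the marker file name to create for the given target version."""
    v = parse_version(version)
    if v < (7, 2, 2, 1):
        # The procedure only covers target versions 7.2.2.1 and later.
        raise ValueError(f"rolling upgrade not supported for target {version}")
    if v < (7, 2, 3, 2):
        return "backup-restore"
    return "rolling-upgrade"
```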