The structure of the Rolling Upgrade automation is quite straightforward.
* We need to generate config files (.cnf) for all nodes, each with the paths and version of the respective MariaDB release.
Say we have 5 nodes: then there will be 5 config files, like mynode1_10.3.cnf, mynode2_10.3.cnf, etc.
We also need config files for the 10.4 version for the upgrade, like mynode1_10.4.cnf, mynode2_10.4.cnf, etc.
The main difference between those config files is the path to Galera: it should point to Galera 3 for 10.3 and Galera 4 for 10.4.
* Nodes need to be started with the old 10.3 version of MariaDB and Galera 3, using the respective config files mentioned above.
* After a successful cluster start, we shut down and upgrade the nodes one by one, starting from the last node; for us that is node5.
So, shut down node5, then start it with the 10.4 binary and the new config (mynode5_10.4.cnf, not mynode5_10.3.cnf).
Run mysql_upgrade after a successful start with the new binary.
Repeat this process in a loop until all cluster members are upgraded as described. The next nodes are node4, node3, node2 and finally node1.
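
To make the ordering concrete, here is a minimal Python sketch of the loop described above. It only builds the list of steps and does not start or stop anything. The function names config_name and upgrade_plan are made up for this illustration; the file naming follows the mynodeN_VERSION.cnf pattern used in this post.

```python
from typing import List, Tuple


def config_name(node: int, version: str) -> str:
    """Config file name in the post's pattern, e.g. mynode5_10.4.cnf."""
    return f"mynode{node}_{version}.cnf"


def upgrade_plan(node_count: int, new: str = "10.4") -> List[Tuple[str, str]]:
    """Return (action, detail) steps, going from the last node down to node1.

    Hypothetical helper: it only describes the steps, it does not run them.
    """
    plan: List[Tuple[str, str]] = []
    for node in range(node_count, 0, -1):
        name = f"node{node}"
        plan.append(("shutdown", name))
        # Start with the new binary and the new config, not the 10.3 one.
        plan.append(("start", f"{name} with {config_name(node, new)}"))
        plan.append(("mysql_upgrade", name))
    return plan


if __name__ == "__main__":
    for action, detail in upgrade_plan(5):
        print(f"{action}: {detail}")
```

For 5 nodes this yields 15 steps, beginning with the shutdown of node5 and ending with mysql_upgrade on node1.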

But there is another question: should there be any load on the nodes during the upgrade? In other words, should we run DDL/DML after upgrading each node? For example:
* Upgrade node5, run some load on node1 (which is still on the old version), and see if data is replicated from old to new.
* Upgrade node5, run some load on it, and see if data is replicated from new to old.
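
The two directions above can be listed for any intermediate state of the cluster. The sketch below is only an illustration of that bookkeeping, assuming nodes are numbered 1..N as in this post; replication_checks is a hypothetical helper, and actually running the load and comparing the data is left out.

```python
from typing import List, Set, Tuple


def replication_checks(node_count: int, upgraded: Set[int]) -> List[Tuple[str, str, str]]:
    """Return (direction, writer, reader) for every old/new node pair."""
    old_nodes = [n for n in range(1, node_count + 1) if n not in upgraded]
    checks: List[Tuple[str, str, str]] = []
    for new in sorted(upgraded):
        for old in old_nodes:
            # Load on the old node, verify on the upgraded one.
            checks.append(("old->new", f"node{old}", f"node{new}"))
            # Load on the upgraded node, verify on the old one.
            checks.append(("new->old", f"node{new}", f"node{old}"))
    return checks


if __name__ == "__main__":
    # State right after upgrading node5 in a 5-node cluster.
    for direction, writer, reader in replication_checks(5, {5}):
        print(f"{direction}: write on {writer}, check on {reader}")
```

Once every node is upgraded the list is empty, since there is no mixed-version pair left to check.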

Based on these scenarios, here is the easy way to lose node5: MDEV-18422
Simply upgrade node5, then run DDL on node1 and get the crash. Then be happy, because it is already fixed.

Another quite critical issue was data inconsistency.
All nodes were already upgraded and Streaming Replication was enabled on each node.
Then it was time to create a database and then drop it on node5. Unfortunately, it was dropped from all other nodes but not from node5 itself: MDEV-18587
Fixed.

In general we love segfaults, but not in a cluster 🙂 Interestingly, if you enable log-bin and log-slave-updates on all nodes of the cluster and try to do a Rolling Upgrade, the last upgraded node, node1, will crash: MDEV-18588
Fixed.

The hall of fame has a permanent member called Sysbench. Here is the crash while doing sysbench prepare: MDEV-18631

Several other issues were found along the way that are not directly related to the upgrade tests. In general, though, upgrade tests are a great playground for getting your hands dirty with all sorts of issues.
This post will likely be updated as new issues or fixes come in.