This is a step-by-step guide on how to build a mesh of Kubernetes clusters by
connecting them together, enabling pod-to-pod connectivity across all clusters,
defining global services to load-balance between clusters, and enforcing
security policies to restrict access.

This guide and the referenced scripts assume that Cilium was installed using
the Installation with managed etcd instructions, which leads to etcd being
managed by Cilium using etcd-operator. You can manage etcd in any other way,
but you will then have to adjust some of the scripts to account for different
secret names and adjust the LoadBalancer to expose the etcd pods.

Nodes in all clusters must have IP connectivity between each other. This
requirement is typically met by establishing peering or VPN tunnels between
the networks of the nodes of each cluster.

All nodes must have a unique IP address assigned to them. Node IPs of clusters
being connected together must not conflict with each other.

Cilium must be configured to use etcd as the kvstore. Consul is not supported
by cluster mesh at this point.

It is highly recommended to use a TLS protected etcd cluster with Cilium. The
server certificate of etcd must whitelist the host name *.mesh.cilium.io.
If you are using the cilium-etcd-operator as set up in the
Installation with managed etcd instructions then this is automatically
taken care of.

The network between clusters must allow the inter-cluster communication. The
exact ports are documented in the Firewall Rules section.

Each cluster must be assigned a unique human-readable name. The name will be
used to group nodes of a cluster together. The cluster name is specified with
the --cluster-name=NAME argument or cluster-name ConfigMap option.

To ensure scalability of identity allocation and policy enforcement, each
cluster continues to manage its own security identity allocation. In order to
guarantee compatibility with identities across clusters, each cluster is
assigned a unique cluster ID, set with the --cluster-id=ID argument or
cluster-id ConfigMap option. The value must be between 1 and 255.
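
As a small illustration of the two settings above, the following shell sketch
checks that a proposed cluster ID is an integer between 1 and 255 and prints
the corresponding ConfigMap options. The function names and sample values are
made up for this example; only the option names cluster-name and cluster-id
come from this guide.

```shell
# Hypothetical helper: succeed only if $1 is an integer between 1 and 255.
valid_cluster_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 255 ]
}

# Hypothetical helper: print the two ConfigMap options for one cluster.
# ConfigMap values are strings, so the ID is quoted.
print_cluster_options() {
  name=$1
  id=$2
  valid_cluster_id "$id" || { echo "invalid cluster-id: $id" >&2; return 1; }
  printf 'cluster-name: "%s"\ncluster-id: "%s"\n' "$name" "$id"
}
```

For example, print_cluster_options cluster1 1 prints the two options for a
cluster named cluster1, while an ID of 0 or 256 is rejected.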

The Cilium etcd must be exposed to other clusters. There are many ways to
achieve this. The method documented in this guide will work with cloud
providers that implement the Kubernetes LoadBalancer service type.

The example used here exposes the etcd cluster, as managed by the
cilium-etcd-operator installed by the standard installation instructions, as
an internal service. This means that it is only exposed inside of a VPC and is
not publicly accessible outside of the VPC. It is recommended to use a static
IP for the ServiceIP so that the IP mapping set up in one of the later steps
does not have to be updated.

If you are running the cilium-etcd-operator you can simply apply the following
service to expose etcd:

The cluster mesh control plane performs TLS based authentication and encryption.
For this purpose, the TLS keys and certificates of each etcd need to be made
available to all clusters that wish to connect.

Clone the cilium/clustermesh-tools repository. It contains scripts to
extract the secrets and generate a Kubernetes secret in the form of a YAML
file.

Ensure that the kubectl context is pointing to the cluster you want to
extract the secret from.

Extract the TLS certificate, key and root CA certificate.

./extract-etcd-secrets.sh

This will extract the keys that Cilium is using to connect to the etcd in
the local cluster. The key files are written to
config/<cluster-name>.*.{key|crt|-ca.crt}

Repeat this step for all clusters you want to connect with each other.

Generate a single Kubernetes secret from all the keys and certificates
extracted. The secret will contain the etcd configuration with the service
IP or host name of the etcd including the keys and certificates to access
it.

./generate-secret-yaml.sh > clustermesh.yaml

Note

The key files in config/ and the secret represented as YAML are
sensitive. Anyone gaining access to these files is able to connect to the
etcd instances in the local cluster. Delete the files after you are done
setting up the cluster mesh.
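
The extraction and generation steps above can be sketched as a single shell
function. This is only an illustration: the function name is made up, the
arguments are assumed to be kubectl context names for your clusters, and it
must be run from the directory that contains the clustermesh-tools scripts.

```shell
# Sketch: run the extraction script once per cluster, then build the secret.
# Arguments are kubectl context names (hypothetical; use your own).
collect_clustermesh_secrets() {
  for ctx in "$@"; do
    # Point kubectl at the cluster to extract from.
    kubectl config use-context "$ctx" || return 1
    # Writes the key files for this cluster into config/.
    ./extract-etcd-secrets.sh || return 1
  done
  # Combine everything that was extracted into a single secret manifest.
  ./generate-secret-yaml.sh > clustermesh.yaml
}
```

A call such as collect_clustermesh_secrets cluster1 cluster2 would leave the
combined secret in clustermesh.yaml. Remember that config/ and clustermesh.yaml
are sensitive and should be deleted once the mesh is set up.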

For TLS authentication to work properly, agents will connect to etcd in remote
clusters using a pre-defined naming schema {clustername}.mesh.cilium.io. In
order for DNS resolution to work for these virtual host names, the names are
statically mapped to the service IP via the /etc/hosts file.

The following script will generate the required segment which has to be
inserted into the cilium DaemonSet:
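
The script itself is not reproduced here. Purely to illustrate the kind of
segment involved, the sketch below prints a Kubernetes hostAliases fragment
(the pod spec field that populates /etc/hosts) from pairs of cluster name and
service IP. The function name and the sample IPs are hypothetical, and the
output of the real script may be laid out differently.

```shell
# Hypothetical helper: print a hostAliases fragment from "name ip" pairs,
# mapping {clustername}.mesh.cilium.io to the given service IP.
print_host_aliases() {
  echo "hostAliases:"
  while [ "$#" -ge 2 ]; do
    printf '%s\n' "- ip: \"$2\"" "  hostnames:" "  - $1.mesh.cilium.io"
    shift 2
  done
}
```

For example, print_host_aliases cluster1 10.0.0.10 cluster2 10.0.1.10 prints
one entry per cluster. In a DaemonSet, such a fragment belongs under
spec.template.spec.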

Import the cilium-clustermesh secret that you generated in the previous
section into all of your clusters:

kubectl apply -f clustermesh.yaml

Restart the cilium-agent in all clusters so it picks up the new cluster
name, cluster id and mounts the cilium-clustermesh secret. Cilium will
automatically establish connectivity between the clusters.

kubectl -n kube-system delete pod -l k8s-app=cilium

For global services to work (see below), also restart the cilium-operator:
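
The exact command for the operator restart is not shown in this guide. One
plausible form, sketched below together with the agent restart, deletes the
operator pods by label. The selector name=cilium-operator is an assumption
and may not match your deployment; you can list the actual labels with
kubectl -n kube-system get pods --show-labels. The function name is made up.

```shell
# Sketch: restart the Cilium agents and the operator by deleting their pods.
# The default operator selector is an ASSUMPTION; pass a different one as
# the first argument if your operator pods are labelled differently.
restart_cilium() {
  operator_selector=${1:-name=cilium-operator}
  # Agent pods, selected as elsewhere in this guide.
  kubectl -n kube-system delete pod -l k8s-app=cilium || return 1
  # Operator pods, selected by the assumed (or supplied) label.
  kubectl -n kube-system delete pod -l "$operator_selector"
}
```

Run restart_cilium once against each cluster in the mesh.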

Establishing load-balancing between clusters is achieved by defining a
Kubernetes service with identical name and namespace in each cluster and adding
the annotation io.cilium/global-service: "true" to declare it global.
Cilium will automatically perform load-balancing to pods in both clusters.
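
As a minimal sketch, the following writes an example manifest for such a
service to a file. The service name rebel-base, the selector label and the
port are made up for illustration; only the annotation is taken from this
guide.

```shell
# Write an example global service manifest. The service name, selector
# label and port are hypothetical; only the annotation comes from this guide.
cat > global-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: rebel-base
  annotations:
    io.cilium/global-service: "true"
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    name: rebel-base
EOF
```

The same file would then be applied with kubectl apply -f global-service.yaml
in every cluster, so that the name and namespace are identical everywhere.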

As addressing and network security are decoupled, network security enforcement
automatically spans across clusters. Note that Kubernetes security policies are
not automatically distributed across clusters; it is your responsibility to
apply CiliumNetworkPolicy or NetworkPolicy in all clusters.

The following policy illustrates how to allow particular pods to
communicate between two clusters. The cluster name refers to the name given via
the --cluster-name agent option or cluster-name ConfigMap option.
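
A policy of this kind might look like the sketch below, written to a file so
it can be applied in each cluster. The pod labels x-wing and rebel-base and
the cluster names cluster1 and cluster2 are hypothetical; the label key
io.cilium.k8s.policy.cluster is the one this guide uses to identify the
cluster an identity belongs to.

```shell
# Write an example cross-cluster policy. Pod labels and cluster names are
# hypothetical. It allows pods labelled name=x-wing in cluster1 to send
# traffic to pods labelled name=rebel-base in cluster2.
cat > allow-cross-cluster.yaml <<'EOF'
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "allow-cross-cluster"
spec:
  endpointSelector:
    matchLabels:
      name: x-wing
      io.cilium.k8s.policy.cluster: cluster1
  egress:
  - toEndpoints:
    - matchLabels:
        name: rebel-base
        io.cilium.k8s.policy.cluster: cluster2
EOF
```

Because policies are not distributed automatically, the file would be applied
with kubectl apply -f allow-cross-cluster.yaml in each cluster where it
should take effect.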

Validate that the cilium-xxx as well as the cilium-operator-xxx pods are
healthy and ready. It is important that the cilium-operator is healthy as
well, as it is responsible for synchronizing state from the local
cluster into the kvstore. If this fails, check the logs of these pods to
track the reason for failure.

Validate that the ClusterMesh subsystem is initialized by looking for a
cilium-agent log message like this:

Validate that the configuration for remote clusters is picked up correctly.
For each remote cluster, an info log message New remote cluster
configuration along with the remote cluster name must be logged in the
cilium-agent logs.

If the configuration is not found, check the following:

The Kubernetes secret clustermesh-secrets is imported correctly.

The secret contains a file for each remote cluster with the filename
matching the name of the remote cluster.

The contents of the file in the secret are a valid etcd configuration
consisting of the IP to reach the remote etcd as well as the required
certificates to connect to that etcd.

Run kubectl exec -ti [...] bash in one of the Cilium pods and check
the contents of the directory /var/lib/cilium/clustermesh/. It must
contain a configuration file for each remote cluster along with all the
required SSL certificates and keys. The filenames must match the cluster
names as provided by the --cluster-name argument or cluster-name
ConfigMap option. If the directory is empty or incomplete, regenerate the
secret and ensure that it is correctly mounted into the DaemonSet.
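
The log and directory checks above can be sketched as a small shell function
that loops over the Cilium pods. The function name is made up, and the sketch
assumes the agent pods carry the k8s-app=cilium label used elsewhere in this
guide and that kubectl logs can read the agent container without an explicit
container name.

```shell
# Sketch: for every Cilium pod, show whether the agent logged the remote
# cluster configuration and list the mounted clustermesh directory.
check_clustermesh_config() {
  for pod in $(kubectl -n kube-system get pods -l k8s-app=cilium -o name); do
    pod=${pod#pod/}   # strip the "pod/" prefix printed by "-o name"
    echo "== $pod"
    # Print matching log lines, or a hint if there are none.
    kubectl -n kube-system logs "$pod" \
      | grep "New remote cluster configuration" \
      || echo "no remote cluster configuration logged"
    # Expect one configuration file per remote cluster plus certificates.
    kubectl -n kube-system exec "$pod" -- ls /var/lib/cilium/clustermesh/
  done
}
```

Running check_clustermesh_config in each cluster gives a quick overview of
which agents have picked up which remote clusters.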

Validate that the connection to the remote cluster could be established.
You will see a log message like this in the cilium-agent logs for each
remote cluster:

Validate that the hostAliases section in the Cilium DaemonSet maps
each remote cluster to the IP of the LoadBalancer that makes the remote
control plane available.

Validate that a local node in the source cluster can reach the IP
specified in the hostAliases section. The clustermesh-secrets
secret contains a configuration file for each remote cluster. It will
point to a logical name representing the remote cluster:

endpoints:
- https://cluster1.mesh.cilium.io:2379

The name will NOT be resolvable via DNS outside of the cilium pod. The
name is mapped to an IP using hostAliases. Run kubectl -n kube-system get
ds cilium -o yaml and grep for the FQDN to retrieve the
IP that is configured. Then use curl to validate that the port is
reachable.

A firewall between the local cluster and the remote cluster may drop the
control plane connection. Ensure that port 2379/TCP is allowed.
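
As an alternative to grepping the YAML output, the hostAliases mapping can be
printed in a compact form with a kubectl JSONPath query. This is a sketch
only: the function name is made up, and it assumes the DaemonSet is named
cilium in the kube-system namespace, as in the command above.

```shell
# Sketch: print "IP<TAB>hostnames" for each hostAliases entry of the
# cilium DaemonSet, so the IP for a remote cluster can be read off.
show_mesh_host_aliases() {
  kubectl -n kube-system get ds cilium -o \
    jsonpath='{range .spec.template.spec.hostAliases[*]}{.ip}{"\t"}{.hostnames[*]}{"\n"}{end}'
}
```

Piping the output through grep for a name such as cluster1.mesh.cilium.io
gives the IP to test against port 2379.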

Run cilium node list in one of the Cilium pods and validate that it
lists both local nodes and nodes from remote clusters. If this discovery
does not work, validate the following:

In each cluster, check that the kvstore contains information about
local nodes by running:

cilium kvstore get --recursive cilium/state/nodes/v1/

Note

The kvstore will only contain nodes of the local cluster. It will
not contain nodes of remote clusters. The state in the kvstore is
used for other clusters to discover all nodes so it is important that
local nodes are listed.

Validate the connectivity health matrix across clusters by running
cilium-health status inside any Cilium pod. It will list the status of
the connectivity health check to each remote node.

If this fails:

Make sure that the network allows the health checking traffic as
specified in the section Firewall Rules.

Validate that identities are synchronized correctly by running cilium
identity list in one of the Cilium pods. It must list identities from all
clusters. You can determine what cluster an identity belongs to by looking
at the label io.cilium.k8s.policy.cluster.

If this fails:

Is the identity information available in the kvstore of each cluster? You
can confirm this by running cilium kvstore get --recursive
cilium/state/identities/v1/.

Note

The kvstore will only contain identities of the local cluster. It
will not contain identities of remote clusters. The state in the
kvstore is used for other clusters to discover all identities so it is
important that local identities are listed.

Validate that the IP cache is synchronized correctly by running cilium bpf
ipcache list or cilium map get cilium_ipcache. The output must
contain pod IPs from local and remote clusters.

If this fails:

Is the IP cache information available in the kvstore of each cluster? You
can confirm this by running cilium kvstore get --recursive
cilium/state/ip/v1/.

Note

The kvstore will only contain IPs of the local cluster. It will
not contain IPs of remote clusters. The state in the kvstore is
used for other clusters to discover all pod IPs so it is important
that local pod IPs are listed.

When using global services, ensure that global services are configured with
endpoints from all clusters. Run cilium service list in any Cilium pod
and validate that the backend IPs consist of pod IPs from all clusters
running relevant backends. You can further validate the correct datapath
plumbing by running cilium bpf lb list to inspect the state of the BPF
maps.

If this fails:

Are services available in the kvstore of each cluster? You can confirm
this by running cilium kvstore get --recursive cilium/state/services/v1/.

Run cilium debuginfo and look for the section "k8s-service-cache". In
that section, you will find the contents of the service correlation
cache. It will list the Kubernetes services and endpoints of the local
cluster. It will also have a section externalEndpoints which must
list all endpoints of remote clusters.

The sections services and endpoints represent the services of the
local cluster. The section externalEndpoints lists all remote
services, which are correlated with local services that have the same
ServiceID.
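
The kvstore checks from the troubleshooting steps above can be run in one go
with a sketch like the following. The function name is made up, and the pod
name is supplied by the caller; the four key prefixes are the ones listed in
this guide.

```shell
# Sketch: dump the four kvstore prefixes named in this guide from inside
# one Cilium pod. Usage: dump_clustermesh_state <cilium-pod-name>
dump_clustermesh_state() {
  pod=$1
  for prefix in nodes identities ip services; do
    echo "== cilium/state/$prefix/v1/"
    kubectl -n kube-system exec "$pod" -- \
      cilium kvstore get --recursive "cilium/state/$prefix/v1/" || return 1
  done
}
```

Keep in mind that each kvstore only holds the state of its own cluster, so
the function needs to be run against a pod in every cluster of the mesh.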

L7 security policies currently only work across multiple clusters if worker
nodes have routes installed that allow routing to the pod IPs of all clusters.
This is the case when running in direct routing mode, either with a routing
daemon or with --auto-direct-node-routes, but it won't work automatically when
using tunnel/encapsulation mode.

The number of clusters that can be connected together is currently limited
to 255. This limitation will be lifted in the future when running in direct
routing mode or when running in encapsulation mode with encryption enabled.