Routing Optimization

Scaling OpenShift Container Platform HAProxy Router

Baseline Performance

The OpenShift Container Platform router is the ingress point for all external
traffic destined for OpenShift Container Platform services.

On a public cloud instance of size 4 vCPU/16GB RAM, a single HAProxy router is able to handle
between 7000 and 32000 HTTP keep-alive requests per second, depending on encryption,
page size, and the number of connections used. For example, when using TLS edge or
re-encryption terminations with large page sizes and a high number of connections,
expect to see results in the lower range. With HTTP keep-alive, a single HAProxy router is
capable of saturating a 1 Gbit NIC at page sizes as small as 8 kB.

The table below shows HTTP keep-alive performance on such a public cloud
instance with a single HAProxy and 100 routes:

Encryption     Page size    HTTP(s) requests per second
none           1kB          15435
none           4kB          11947
edge           1kB          7467
edge           4kB          7678
passthrough    1kB          25789
passthrough    4kB          17876
re-encrypt     1kB          7611
re-encrypt     4kB          7395
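
To relate these request rates to the 1 Gbit NIC figure mentioned earlier, the following is a minimal Python sketch of the arithmetic. It counts page payload only and ignores HTTP headers, TCP/IP framing, and TLS overhead, so real link usage would be somewhat higher. The function names are hypothetical, and treating 1 kB as 1024 bytes is an assumption.

```python
# Payload-only throughput estimate; headers and protocol overhead are ignored.
NIC_BITS_PER_SECOND = 1_000_000_000  # 1 Gbit/s link
BYTES_PER_KB = 1024                  # assumption: 1 kB = 1024 bytes


def payload_mbit_per_second(requests_per_second, page_size_kb):
    """Payload bandwidth in Mbit/s for a given request rate and page size."""
    return requests_per_second * page_size_kb * BYTES_PER_KB * 8 / 1_000_000


def requests_to_saturate(page_size_kb, nic_bits_per_second=NIC_BITS_PER_SECOND):
    """Requests per second whose payload alone would fill the link."""
    return nic_bits_per_second / (page_size_kb * BYTES_PER_KB * 8)


# (encryption, page size in kB, requests per second), copied from the table above
BASELINE = [
    ("none", 1, 15435), ("none", 4, 11947),
    ("edge", 1, 7467), ("edge", 4, 7678),
    ("passthrough", 1, 25789), ("passthrough", 4, 17876),
    ("re-encrypt", 1, 7611), ("re-encrypt", 4, 7395),
]

for encryption, size_kb, rps in BASELINE:
    mbit = payload_mbit_per_second(rps, size_kb)
    print(f"{encryption:<12} {size_kb}kB  {mbit:7.1f} Mbit/s of payload")

print(f"8kB pages fill 1 Gbit/s at about {requests_to_saturate(8):.0f} requests/s")
```

Under these assumptions, none of the 1kB or 4kB rows reaches 1 Gbit/s of payload, while 8 kB pages would fill the link at a request rate inside the range quoted above.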

When running on bare metal with modern processors, you can expect roughly
twice the performance of the public cloud instance above. This
overhead is introduced by the virtualization layer in place on public clouds and
holds mostly true for private cloud-based virtualization as well. The following
table is a guide on how many applications to use behind the router:

Number of applications    Application type
5-10                      static file/web server or caching proxy
100-1000                  applications generating dynamic content

In general, a single HAProxy router is saturated by about 5-1000 applications,
depending on the technology in use. The number will typically be lower for
applications serving only static content.

Router sharding should be used to serve more routes towards applications and
help horizontally scale the routing tier.

Performance Optimizations

Setting the Maximum Number of Connections

One of the most important tunable parameters for HAProxy scalability is the
maxconn parameter, which sets the maximum number of concurrent connections per
process. Adjust this parameter by editing the ROUTER_MAX_CONNECTIONS
environment variable in the OpenShift Container Platform HAProxy router’s deployment
configuration file.

CPU and Interrupt Affinity

In OpenShift Container Platform, the HAProxy router runs as a single process. The
OpenShift Container Platform HAProxy router typically performs better on a system with fewer
but higher-frequency cores, rather than on a symmetric multiprocessing (SMP)
system with a high number of lower-frequency cores.

Pinning the HAProxy process to one CPU core and the network interrupts to
another CPU core tends to increase network performance. Having processes and
interrupts on the same non-uniform memory access (NUMA) node helps avoid memory
accesses by ensuring a shared L3 cache. However, this level of control is
generally not possible in a public cloud environment. On bare metal hosts,
irqbalance automatically handles peripheral component interconnect (PCI)
locality and NUMA affinity for interrupt request lines (IRQs). In a cloud
environment, this level of information is generally not provided to the
operating system.
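
Pinning a process to a core is a small operation at the operating-system level. As a rough illustration only, the Python sketch below uses os.sched_setaffinity, which is available on Linux, to restrict a process to a single core, much as `taskset -cp <core> <pid>` would. The helper name pin_to_core is hypothetical. The example pins the current Python process so that it is self-contained; in practice the PID would be that of the HAProxy process.

```python
import os


def pin_to_core(pid, core):
    """Restrict process `pid` to a single CPU core and return the new affinity set.

    A pid of 0 means the calling process.
    """
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)


if hasattr(os, "sched_setaffinity"):  # these calls exist on Linux only
    allowed = os.sched_getaffinity(0)  # cores this process may currently use
    core = min(allowed)                # pick the lowest-numbered allowed core
    print(f"allowed before: {sorted(allowed)}")
    print(f"allowed after:  {sorted(pin_to_core(0, core))}")
else:
    print("CPU affinity calls are not available on this platform")
```

Choosing the core from the currently allowed set, rather than hard-coding core 0, avoids an error when the process is already confined to a subset of cores, for example by a container runtime.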

CPU pinning is performed either by taskset or by using HAProxy’s cpu-map
parameter. This directive takes two arguments: the process ID and the CPU core
ID. For example, to pin HAProxy process 1 onto CPU core 0, add the line
"cpu-map 1 0" to the global section of HAProxy’s configuration file.

Impacts of Buffer Increases

The OpenShift Container Platform HAProxy router request buffer configuration limits the size
of headers in incoming requests and responses from applications. The HAProxy
parameter tune.bufsize can be increased to allow processing of larger headers
and to allow applications with very large cookies to work, such as those
accepted by load balancers provided by many public cloud providers. However,
this affects the total memory use, especially when large numbers of connections
are open. With very large numbers of open connections, the memory usage will be
nearly proportional to the increase of this tunable parameter.
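
The proportionality can be made concrete with a back-of-the-envelope calculation. The Python sketch below is an estimate under stated assumptions, not a description of HAProxy's actual allocator: it assumes each open connection holds two buffers of tune.bufsize bytes (one for the request, one for the response) and ignores all other per-connection memory. The function name, the connection count, and the buffer sizes are illustrative.

```python
# Rough buffer-memory estimate for one HAProxy process.
# Assumption (not from the text above): every open connection holds
# BUFFERS_PER_CONNECTION buffers of tune.bufsize bytes each.
BUFFERS_PER_CONNECTION = 2  # assumed: one request buffer, one response buffer


def buffer_memory_mib(connections, bufsize,
                      buffers_per_connection=BUFFERS_PER_CONNECTION):
    """Estimated buffer memory in MiB for `connections` open connections."""
    return connections * bufsize * buffers_per_connection / (1024 * 1024)


connections = 20000  # illustrative, e.g. a value chosen for ROUTER_MAX_CONNECTIONS
for bufsize in (16384, 32768, 65536):  # illustrative tune.bufsize values in bytes
    mib = buffer_memory_mib(connections, bufsize)
    print(f"tune.bufsize={bufsize:>5}: ~{mib:.0f} MiB of buffers")
```

Under these assumptions, doubling tune.bufsize doubles the estimated buffer memory, which is the effect to budget for before raising the value on a router with many open connections.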