Cilium 1.0.0-rc9 - Feature Freeze for 1.0!

We are excited to announce Cilium 1.0.0-rc9 with many, many bugfixes and the
delivery of the final feature we were waiting on prior to 1.0: egress policy
enforcement support. It is therefore only logical that we announce full
feature freeze with rc9. This means that we will only merge critical bugfixes
and will release 1.0 as soon as we have resolved all release blockers. More on
this below. We are thrilled to have come this far and appreciate all of the
efforts by the wide range of contributors who have helped to get us here.

Upgrade Instructions

No special upgrade instructions are required for this release. Please follow
our simple upgrade guide for the generic instructions on how to upgrade.

Highlights

As usual, the full release notes are attached at the end of the blog but can
also be found on the 1.0.0-rc9 release page. The vast majority of the work in
this release has been around bugfixes and testing. Here is a list of some
highlights:

Egress Policy Enforcement capability

Cilium uses identity-based policy enforcement as its standard enforcement
mechanism. In this model, we encode the identity of the sending endpoint in
all packets and then enforce on the receiving side whether that identity is
allowed to communicate with the respective peer. Cilium only falls back to
IP/CIDR-based enforcement when absolutely required, that is, when we are not
in control of the sender.

With this release, we are completing egress policy enforcement by adding
label-based and entity-based enforcement on top of the existing IP/CIDR
egress enforcement.

A few simple egress examples

The following example is tailored for Kubernetes and shows how to enable
default deny at egress for all role=frontend pods and then explicitly
whitelist the connection to role=backend on port TCP/80:
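A minimal sketch of what such a policy could look like, assuming the
CiliumNetworkPolicy resource with apiVersion cilium.io/v2. The policy name
frontend-egress is hypothetical and chosen only for illustration:

```yaml
# Sketch only: the metadata name is a hypothetical placeholder.
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "frontend-egress"
spec:
  # Select all pods labeled role=frontend.
  endpointSelector:
    matchLabels:
      role: frontend
  # Allow egress only to role=backend pods on TCP/80.
  egress:
  - toEndpoints:
    - matchLabels:
        role: backend
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
```

In this sketch, the presence of an egress section for the selected pods is
what puts them into default deny at egress, so only the listed connection
remains allowed.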

This also applies to L7-aware policies. Here is another example which shows
how to whitelist POST /metric on port TCP/8080 from pods with the label
app=myService to their respective local host.
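A hedged sketch of such an L7 egress rule, again assuming the
CiliumNetworkPolicy resource with apiVersion cilium.io/v2 and using the host
entity to refer to the local host. The policy name metrics-egress is
hypothetical:

```yaml
# Sketch only: the metadata name is a hypothetical placeholder.
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "metrics-egress"
spec:
  # Select all pods labeled app=myService.
  endpointSelector:
    matchLabels:
      app: myService
  egress:
  # The "host" entity stands for the local host the pod runs on.
  - toEntities:
    - host
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      # L7 rule: only HTTP POST requests to /metric are allowed.
      rules:
        http:
        - method: "POST"
          path: "/metric"
```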

Scale Improvements

We have done a series of scale and stress tests which led to tweaking of
default limits and improvements that affect scalability:

Several upper limits for BPF maps covering connection state have been
increased. We will likely make this adjustable and improve defaults to be
based on available system memory to take a good guess at expected network
load.

A new expedited garbage collector mode has been introduced which
identifies connections that have never been established (no complete SYN-ACK
handshake observed). Such incomplete connections are removed from state
tables much more aggressively. This strikes a good balance between keeping
long-lived TCP connections in state tables for days without seeing any
traffic and aggressively removing connections created by connection attempt
floods or by services such as Cassandra which perform retries very
aggressively.

We have started enabling TCP keepalive for all proxied connections to gain
a better understanding of the health of long lived connections with minimal
traffic such as TCP connections used for health checking.

Known issues before 1.0

We have a couple of issues that we are tracking and fixing before releasing
1.0. If you are running into any issues, check the list of 1.0 blocker bugs
first.