The one authority worth trusting

Nov 15, 2015

“We’re paying the highest tribute you can pay a man. We trust him to do right. It’s that simple.”
- Harper Lee, To Kill a Mockingbird

As part of the Sproute SPAN service, all
the routers belonging to an account set up secure tunnels with each
other to form a virtual network. To ensure correctness, the routers
need to authenticate each other before setting up a tunnel. For
example, each router needs to make sure:

it talks only to routers that belong to the same account,

it does not talk to any router that has been compromised.

We have relied on public key infrastructure (PKI) using an internally
hosted hierarchy of certificate authorities (CAs) for this purpose.

PKI

At a very high level, we start by creating a set of trusted
third parties, CAs, that certify that entities are who they say they
are. When a CA certifies an entity, it issues a digital
certificate, signed with the CA’s private key. The certificate
includes the entity’s name, its public key, and an expiration date,
among other things. The CA has its own certificate, which is
self-signed if it’s a root CA and signed by a parent CA otherwise.
This enables the chain of trust: when an entity is presented with a
certificate from another entity, it uses the public key in the CA’s
certificate to verify the signature (public-key cryptography). If
that succeeds and the entity trusts the CA, it successfully
authenticates the other entity.
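To make the sign-then-verify step concrete, here is a minimal sketch in Python. It is a toy model, not how Sproute SPAN or any real CA is implemented: it uses textbook RSA on small Mersenne primes (not secure), and the names `KeyPair`, `Certificate`, `issue`, `verify`, `account-ca`, and `router-1` are all made up for illustration. Real certificates are X.509 and should be handled with a proper library.

```python
import hashlib
from dataclasses import dataclass, replace
from datetime import date

E = 65537  # public exponent shared by all toy keys


@dataclass(frozen=True)
class KeyPair:
    n: int  # public modulus (the "public key" in this toy model)
    d: int  # private exponent (kept secret by the key owner)


def make_keypair(p: int, q: int) -> KeyPair:
    # Textbook RSA from two known primes. Illustration only, NOT secure.
    return KeyPair(n=p * q, d=pow(E, -1, (p - 1) * (q - 1)))


@dataclass(frozen=True)
class Certificate:
    subject: str        # the entity's name
    subject_key: int    # the entity's public key (modulus)
    not_after: date     # expiration date
    issuer: str         # name of the CA that signed this certificate
    signature: int = 0  # filled in by the CA

    def to_be_signed(self) -> bytes:
        # Everything except the signature is covered by the signature.
        fields = (self.subject, str(self.subject_key),
                  self.not_after.isoformat(), self.issuer)
        return "|".join(fields).encode()


def digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def issue(ca_name: str, ca_key: KeyPair, subject: str,
          subject_key: int, not_after: date) -> Certificate:
    # The CA signs the certificate fields with its private key.
    unsigned = Certificate(subject, subject_key, not_after, ca_name)
    sig = pow(digest(unsigned.to_be_signed(), ca_key.n), ca_key.d, ca_key.n)
    return replace(unsigned, signature=sig)


def verify(cert: Certificate, trusted_cas: dict, today: date) -> bool:
    # trusted_cas maps a CA name to that CA's public modulus.
    ca_n = trusted_cas.get(cert.issuer)
    if ca_n is None:            # the verifier does not trust this CA
        return False
    if today > cert.not_after:  # the certificate has expired
        return False
    # Check the signature using only the CA's public key.
    return pow(cert.signature, E, ca_n) == digest(cert.to_be_signed(), ca_n)


ca_key = make_keypair(2**61 - 1, 2**89 - 1)
router_key = make_keypair(2**31 - 1, 2**107 - 1)
trusted = {"account-ca": ca_key.n}

cert = issue("account-ca", ca_key, "router-1", router_key.n, date(2030, 1, 1))
print(verify(cert, trusted, date(2025, 1, 1)))
print(verify(replace(cert, subject="router-evil"), trusted, date(2025, 1, 1)))
```

The first check passes and the second fails, because changing any signed field invalidates the CA's signature. Note that `verify` needs only the certificate and the verifier's local trust store.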

This also highlights how PKI scales: the entities maintain
their own certificates, and the authentication process involves an
exchange of data only between the two entities. No other party (e.g. a
server) needs to be online.

Look at this great blog post from Cloudflare to get more details on
setting up a PKI.

On Sproute SPAN, every pair of routers exchanges certificates to
authenticate each other, as described in the following diagram.

Hierarchy

PKI supports a hierarchy of certificate authorities to dole out
certificates at scale. In fact, establishing a hierarchy of CAs is
beneficial for multiple reasons:

Scale. As mentioned, it helps scale the issuance of
certificates through the classic ‘divide and conquer’ method.

Segmentation. The hierarchy helps to maintain strict bounds
between different organizations and departments. For example, the
Sproute SPAN service runs a separate CA for each account that
issues certificates to the routers running on behalf of the
account.

Security. As you move up the CA hierarchy towards the root,
certificates are issued less and less frequently. This allows
the CAs closer to the root to be kept offline and/or stored more
securely, reducing the chances of compromise.
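The segmentation point can be sketched as a walk up the issuer chain. The snippet below is a simplified model under stated assumptions: a three-level hierarchy (root, per-account CA, router), hypothetical names such as `root-ca` and `account-a-ca`, and no signature checks. In a real deployment every link in the chain would also be verified cryptographically.

```python
from typing import Dict, List, Optional

# Hypothetical issuance records: subject -> issuer. The root signs itself.
ISSUED_BY: Dict[str, str] = {
    "root-ca": "root-ca",
    "account-a-ca": "root-ca",
    "account-b-ca": "root-ca",
    "router-a1": "account-a-ca",
    "router-a2": "account-a-ca",
    "router-b1": "account-b-ca",
}


def chain_to_root(subject: str, issued_by: Dict[str, str],
                  max_depth: int = 8) -> Optional[List[str]]:
    """Return [subject, issuer, ..., root], or None if no root is reached."""
    chain = [subject]
    for _ in range(max_depth):
        issuer = issued_by.get(chain[-1])
        if issuer is None:
            return None        # unknown issuer: the chain is broken
        if issuer == chain[-1]:
            return chain       # self-signed: we have reached a root
        chain.append(issuer)
    return None                # too deep or cyclic


def account_ca_of(router: str, issued_by: Dict[str, str],
                  trusted_root: str = "root-ca") -> Optional[str]:
    """Return the account CA for a router, or None if the chain is not
    exactly router -> account CA -> trusted root."""
    chain = chain_to_root(router, issued_by)
    if chain is None or len(chain) != 3 or chain[-1] != trusted_root:
        return None
    return chain[1]


def same_account(a: str, b: str, issued_by: Dict[str, str]) -> bool:
    ca_a = account_ca_of(a, issued_by)
    ca_b = account_ca_of(b, issued_by)
    return ca_a is not None and ca_a == ca_b


print(chain_to_root("router-a1", ISSUED_BY))
print(same_account("router-a1", "router-a2", ISSUED_BY))
print(same_account("router-a1", "router-b1", ISSUED_BY))
```

Because each account has its own CA, two routers share an account exactly when their chains pass through the same intermediate CA, and the root never has to issue router certificates itself.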

HTTPS

The same system is used for HTTPS on the web. For example, gmail.com
has a certificate signed by “Google Internet Authority G2”, which in
turn is signed by the GeoTrust root CA. Assuming you have OpenSSL
installed (available by default on Mac OS X and Linux systems), have a
look at the following snippet using the s_client and x509 commands:

Browsers trust the GeoTrust root CA, so it follows that they also
trust the gmail.com certificate. On Mac OS, for example, you can
run the Keychain Access program to view the trust store (as seen in
the following diagram).

Most client-server communication on the Internet follows what is
called one-way authentication: the client authenticates the
server using the server’s certificate. This is important since you
want to know you are talking to the right server, be it gmail.com or
bankofamerica.com. Sproute SPAN takes this one step further and uses
mutual authentication,
in which both routers exchange certificates and authenticate each
other before creating a secure tunnel.
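The difference between one-way and mutual authentication can be sketched as follows. This models only the acceptance policy (same account, not compromised) and assumes the certificate signatures have already been verified. The `revoked` set is an assumption made for illustration; this post does not describe how Sproute SPAN tracks compromised routers, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class RouterCert:
    router: str   # router name taken from its certificate
    account: str  # account the certificate was issued for


def accepts(own: RouterCert, peer: RouterCert, revoked: FrozenSet[str]) -> bool:
    # A router accepts a peer only if it belongs to the same account
    # and is not known to be compromised.
    return peer.account == own.account and peer.router not in revoked


def one_way(client: RouterCert, server: RouterCert,
            revoked: FrozenSet[str]) -> bool:
    # Only the client checks the server; the server checks nothing.
    return accepts(client, server, revoked)


def mutual(a: RouterCert, b: RouterCert, revoked: FrozenSet[str]) -> bool:
    # Both sides must accept each other before a tunnel is created.
    return accepts(a, b, revoked) and accepts(b, a, revoked)


r1 = RouterCert("router-1", "acme")
r2 = RouterCert("router-2", "acme")   # assume this one was compromised
r3 = RouterCert("router-3", "other")
revoked = frozenset({"router-2"})

print(one_way(r2, r1, revoked))  # compromised client is never checked
print(mutual(r2, r1, revoked))   # rejected once both sides are checked
print(mutual(r1, r3, revoked))   # rejected: different accounts
```

With one-way authentication the compromised router still gets through when it acts as the client, which is why both directions are checked before a tunnel is set up.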

Is there a catch?

One catch is the inherent trust placed in every root certificate
authority. What happens when a CA maliciously or mistakenly issues
a certificate to the wrong entity? The current PKI logic does not
cover this. Google and others have taken an
initiative to solve this
problem through a log auditing system.

For the controlled internal CA system hosted for Sproute SPAN, this is
less of a problem, since the router-to-account relationship is checked
and established before a certificate is issued.