Things you can learn from a million users

Recently, I had the pleasure of leading a FortiGate-5000 series chassis deployment. While I’ve worked on the 5000 series before, this was my first “soup to nuts” deployment of our 4th-generation chassis. The 5000 series is not merely a very large appliance; it is a vessel for FortiGate and FortiController blades. The firewall is actually a cluster of FG-5001D worker blades that rely on the 5144C chassis for power and connectivity. With two NP6 ASICs, each FG-5001D has throughput characteristics similar to those of the FG-1500D appliance. However, apart from the physical dimensions, there are a few other differences between the two:

More RAM to handle high session counts

Additional CPU power for content/NGFW

40GbE support via QSFP+ on the front panel

In some respects, an individual FG-5001D blade has more in common with the FG-3700D. However, the FG-5001D’s party trick is its potential rack density: up to 12 blades can be deployed in an ActiveN scenario. Achieving a similar capacity with a stack of 2U 1500Ds is likely to give your data center planner a cardiac event.

Clustering and connectivity

The clustering implementation is perhaps the biggest difference between the 5000 series and other FortiGate platforms. While you could treat the 5000 chassis as a convenient box for FG-5001D FGCP clusters, you would be missing the point. The FortiController blade is a dedicated session-aware load balancer (SLBC) with integrated switch fabrics. Each FC-5913 blade has a pair of 100GbE CFP2 interfaces (divisible into 10x10GbE if you are not quite 100GbE ready) as well as regular SFP+ ports for the base interfaces. The controller cards may be deployed singly or in pairs for active/passive or active/active designs. In most cases, the FortiController provides connectivity for the entire cluster: it distributes new sessions across the worker blades and makes sure that every packet of an existing session reaches the blade that owns it. This arrangement puts a fully populated 5000 series chassis in the terabit class of firewalls.
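To illustrate what “session-aware” means, here is a minimal Python sketch of a balancer that assigns each new session to a worker blade and then pins every later packet of that session, in either direction, to the same blade. The class and field names are hypothetical, and the round-robin policy for new sessions is an assumption chosen for simplicity; this is a toy model, not FortiController’s actual distribution algorithm.

```python
from typing import Dict, NamedTuple, Tuple


class FiveTuple(NamedTuple):
    """Hypothetical packet header summary used to identify a session."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


SessionKey = Tuple[Tuple[str, int], Tuple[str, int], int]


def session_key(flow: FiveTuple) -> SessionKey:
    """Direction-independent key, so a request and its reply map to one session."""
    a = (flow.src_ip, flow.src_port)
    b = (flow.dst_ip, flow.dst_port)
    low, high = sorted([a, b])
    return (low, high, flow.protocol)


class SessionAwareBalancer:
    """Toy model of session-aware load balancing across worker blades."""

    def __init__(self, num_blades: int) -> None:
        if num_blades < 1:
            raise ValueError("need at least one worker blade")
        self.num_blades = num_blades
        self.sessions: Dict[SessionKey, int] = {}  # session -> owning blade
        self._next_blade = 0

    def route(self, flow: FiveTuple) -> int:
        key = session_key(flow)
        blade = self.sessions.get(key)
        if blade is None:
            # New session: assign a blade round-robin (assumed policy) and record it.
            blade = self._next_blade
            self._next_blade = (self._next_blade + 1) % self.num_blades
            self.sessions[key] = blade
        # Existing session: always return the blade that already owns it.
        return blade
```

With `SessionAwareBalancer(num_blades=12)`, routing a request and then its reply returns the same blade, while the next new session lands on the following blade. A real controller also has to age out idle sessions and cope with blade failures, both omitted here.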

Firewalls by numbers

It’s unsurprising that few organizations require a firewall that handles multiples of 80Gbps. However, when dealing with millions of concurrent sessions, some interesting statistics float to the top.

Session Concurrency

With a sample size of millions, the average number of concurrent sessions per device drops to three. When calculating session counts in the enterprise, we would normally use a multiplication factor of 10-15 sessions per host. In the mobile era, however, concurrency is actually much lower. This can be attributed to the move away from client-server applications that hold multiple connections open (such as SMB and Exchange) towards web-enabled applications that open a greater number of short-lived connections. While a page may open many connections as its elements load, an individual user won’t continuously generate traffic. When averaged over time, actual session concurrency is much lower than you’d think.
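The arithmetic is simple, but the choice of factor matters a great deal at this scale. The Python sketch below compares the traditional 10-15 sessions-per-host rule with the observed average of three for a million users. The function name is hypothetical, and the factor you pass in should ideally come from measurements of your own user population rather than from either rule.

```python
def estimate_concurrent_sessions(hosts: int, sessions_per_host: float) -> int:
    """Size a state table as hosts multiplied by an assumed per-host concurrency factor."""
    if hosts < 0 or sessions_per_host < 0:
        raise ValueError("inputs must be non-negative")
    return round(hosts * sessions_per_host)


if __name__ == "__main__":
    users = 1_000_000
    # Traditional enterprise rule of thumb: 10-15 sessions per host.
    low = estimate_concurrent_sessions(users, 10)
    high = estimate_concurrent_sessions(users, 15)
    # Average observed across millions of mostly mobile devices: about 3.
    observed = estimate_concurrent_sessions(users, 3)
    print(f"Enterprise rule of thumb: {low:,} to {high:,} sessions")
    print(f"Observed average:         {observed:,} sessions")
```

For a million users the traditional rule sizes the state table at 10 to 15 million sessions, while the observed average suggests about 3 million.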

Traffic Mixture

In the mobile era, IMIX (a blend of packet sizes representative of real Internet traffic) is more important than ever. While individual applications may use a lot of data, the apps used concurrently by millions of users (such as Facebook, Twitter, mobile mail clients, and most mobile browsers) are highly optimized for metered, pay-by-the-byte connections. Billions of tiny application status updates have driven down the average packet size. Smaller packets, however, create greater overhead on state-aware devices such as firewalls, because the per-packet processing cost is paid more often for the same volume of data. This is why published firewall throughput figures are often based on the ideal (i.e., lowest transactional overhead) packet size of ~1500 bytes. While a GIF of a cat will fill many full-size packets, such transactions are comparatively rare.
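To see why packet size matters, the Python sketch below converts a throughput figure into packets per second, both for full-size 1500-byte packets and for the commonly cited “simple IMIX” of 40-, 576- and 1500-byte IP packets in a 7:4:1 ratio. The mix, the 80Gbps line rate and the decision to ignore Ethernet framing overhead are all assumptions for illustration; substitute a packet-size distribution measured on your own network.

```python
from typing import List, Tuple

# Commonly cited "simple IMIX": IP packet sizes in bytes with relative weights 7:4:1.
SIMPLE_IMIX: List[Tuple[int, int]] = [(40, 7), (576, 4), (1500, 1)]


def average_packet_size(mix: List[Tuple[int, int]]) -> float:
    """Weighted mean packet size in bytes for a list of (size, weight) pairs."""
    total_weight = sum(weight for _, weight in mix)
    return sum(size * weight for size, weight in mix) / total_weight


def packets_per_second(throughput_bps: float, packet_bytes: float) -> float:
    """Packets per second needed to carry a given bit rate (L2 overhead ignored)."""
    return throughput_bps / (packet_bytes * 8)


if __name__ == "__main__":
    line_rate = 80e9  # 80 Gbps, an assumed line rate
    imix_avg = average_packet_size(SIMPLE_IMIX)
    for label, size in [("1500-byte packets", 1500.0), ("simple IMIX", imix_avg)]:
        pps = packets_per_second(line_rate, size)
        print(f"{label:>18}: avg {size:7.1f} B -> {pps / 1e6:5.1f} Mpps")
```

At the same bit rate, the simple IMIX (an average of about 340 bytes) requires more than four times as many packets per second as full-size packets, and a stateful device pays its per-packet cost on every one of them.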

Logging is not the problem you thought it was (it’s a different one)

When dealing with millions of sessions, firewall logging has a cumulative impact. Logging sessions (rather than just maintaining the state table) forces a firewall to deal with slow disk storage and log server communication. Just as with a database transaction history, you could operate your firewall without logs, but they are an important safety net. Of all the basic firewall features, logging is notably demanding: it generates many interrupts and consumes a lot of CPU time. However, it is at least measurable, and to some extent tunable. For example, the level of log traffic encryption between a FortiGate and its FortiAnalyzer is adjustable depending on your paranoia/CPU capacity. The standing advice is to log only what is administratively useful and legally required. This may be a broad definition, but it is difficult to be more prescriptive. The good news is that log bandwidth consumption is probably a lot smaller than you would expect: around 1Mbps per 10,000 events per second (EPS).
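Applying that rule of thumb is a one-line calculation. In the Python sketch below, the 1Mbps per 10,000 EPS ratio comes from the text; deriving EPS from a new-session rate with a fixed number of log entries per session is a hypothetical simplification, since the real figure depends on which policies log and whether they log at session start, session end, or both. The function names are illustrative only.

```python
# Rule of thumb from the text: roughly 1 Mbps of log traffic per 10,000 EPS.
MBPS_PER_10K_EPS = 1.0


def log_events_per_second(new_sessions_per_second: float, logs_per_session: float = 1.0) -> float:
    """Estimate EPS, assuming (hypothetically) a fixed number of log entries per session."""
    return new_sessions_per_second * logs_per_session


def log_bandwidth_mbps(events_per_second: float) -> float:
    """Approximate log transport bandwidth in Mbps for a given event rate."""
    return events_per_second / 10_000 * MBPS_PER_10K_EPS


if __name__ == "__main__":
    eps = log_events_per_second(new_sessions_per_second=250_000)
    print(f"{eps:,.0f} EPS -> about {log_bandwidth_mbps(eps):.0f} Mbps of log traffic")
```

At 250,000 EPS the estimate is about 25 Mbps, which is consistent with the point that the cost of logging lies mostly in CPU and interrupt load rather than in bandwidth.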

Testing the rules

While an organization may have firewall capacity equivalent to a single 5000-series chassis distributed across its estate, a million users in a single rack is rare. With a sample size this large, firewall scaling rules of thumb are put to the test and either validated (as with IMIX) or found wanting (as with concurrent session counts). It also throws up some surprises, such as the impact of logging on the infrastructure (or the lack thereof). This leads to the important conclusion that a firewall’s small-packet performance is probably more important than its worst-case maximum session capacity, and that there is no substitute for good policy and logging discipline.

Ultimately, the guidelines above are just that: a starting point for your own capacity calculations.