All Your COTS Multi-Core CPU’s With Non-Optimized Security Software Are Belong To Us…

{No, I didn’t forget to spell-check the title. Thanks for the 20+ emails on the topic, though. For those of you not familiar with the etymology of "..all your base are belong to us," please see here…}

I’ve been in my fair share of "discussions" regarding the perceived value in the security realm of proprietary custom hardware versus that of COTS (Commercial Off The Shelf) multi-core processor-enabled platforms.

Most of these debates center around what can be described as a philosophical design divergence suggesting that given the evolution and availability of multi-core Intel/AMD COTS processors, the need for proprietary hardware is moot.

Advocates of the COTS approach are usually pure software-only vendors who have no hardware acceleration capabilities of their own. They often reminisce about the past industry failures of big-dollar fixed function ASIC-based products (while seeming to ignore re-purposeable FPGA’s) to add weight to the theory that Moore’s Law is all one needs to make security software scream.

What’s really interesting and timely about this discussion is the notion of how commoditized the OEM/ODM appliance market has become when compared to the price/performance ratio offered by COTS hardware platforms such as Dell, HP, etc.

Combine that with the notion of how encapsulated "virtual appliances" provided by the virtualization enablement strategies of folks like VMware will be the next great equalizer in the security industry and it gets much more interesting…

There’s a Moore’s Law for hardware, but there’s also one for software…didja know?

Sadly, one of the most overlooked points in the multi-core argument is that exponentially faster hardware does not necessarily translate to exponentially improved software performance. You may get a performance bump by forking multiple single-threaded instances of the software, or even by pinning them to specific CPUs/cores (for example, networking on one core and firewalling on another), but that’s just masking the issue.
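To put rough numbers on that point, here is a minimal sketch using Amdahl’s law, a standard model that this post does not otherwise name. The parallel fractions are made-up illustrations, not measurements of any vendor’s software, and the function name is my own.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work can use extra cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: mostly serial legacy code vs. a threaded rewrite.
    for label, fraction in (("legacy, 20% parallel", 0.20),
                            ("rewritten, 95% parallel", 0.95)):
        for cores in (2, 4, 8):
            speedup = amdahl_speedup(fraction, cores)
            print(f"{label}: {cores} cores -> {speedup:.2f}x")
```

Under these assumed fractions, the mostly serial workload stays below roughly 1.25x no matter how many cores you throw at it, which is the "masking the issue" problem in numeric form.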

This is just like tossing a 426 Hemi in a Mini Cooper; your ability to use all that horsepower is limited by what torque/power you can get to the ground when the chassis isn’t designed to efficiently harness it.

For the record, as you’ll see below, I advocate a balanced approach: use proprietary, purpose-built hardware for network processing and offload/acceleration functions where appropriate and ride the Intel multi-core curve on compute stacks for general software execution with appropriately architected software crafted to take advantage of the multi-core technology.

Here’s a good example.

In my last position with Crossbeam, the X-Series platform modules relied on a combination of proprietary network processing featuring custom NPU’s, multi-core MIPS security processors and custom FPGA’s paired with Intel reference design multi-core compute blades (basically COTS-based Woodcrest boards) for running various combinations of security software from leading ISV’s.

What’s interesting about the bulk of the security software from these best-of-breed players you all know and love that runs on those Intel compute blades is that, even with two sockets and dual cores per socket, it is difficult to squeeze large performance gains out of the ISV’s software.

Why? There are lots of reasons. Kernel vs. user mode, optimization for specific hardware and kernels, no-packet copy network drivers, memory and data plane architectures and the like. However, one of the most interesting contributors to this problem is the fact that many of the core components of these ISV’s software were written 5+ years ago.

While these applications were born as tenants on single- and dual-processor systems, it has become obvious that developers cannot depend upon the increased clock speeds of processors or the availability of multi-core sockets alone to accelerate their single-threaded applications.

To take advantage of the increase in hardware performance, developers must redesign their applications to run in a threaded environment, as multi-core CPU architectures feature two or more processor compute engines (cores) and provide fully parallelized hyperthreaded execution of multiple software threads.
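As a rough sketch of what that redesign looks like in shape, the example below takes a single-threaded inspection loop and hands independent payloads to a worker pool instead. The signature list, function names and worker count are all hypothetical and not drawn from any ISV’s product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical signature list, for illustration only.
SIGNATURES = (b"evil", b"exploit")


def inspect(payload: bytes) -> bool:
    """Return True if the payload contains any known-bad signature."""
    return any(sig in payload for sig in SIGNATURES)


def inspect_serial(payloads):
    """Original single-threaded shape: one loop, one core."""
    return [inspect(p) for p in payloads]


def inspect_threaded(payloads, workers: int = 4):
    """Threaded shape: independent payloads are handed to a worker pool.

    Executor.map preserves input order, so results line up with payloads.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(inspect, payloads))
```

One caveat on the choice of language: in standard CPython builds the global interpreter lock keeps CPU-bound threads from running in parallel, so treat this as the structure of the redesign rather than a speedup demo. A process pool, or a language with native threads, would be needed to actually occupy multiple cores.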

Enter the Impending Multi-Core Crisis

But there’s a wrinkle with the pairing of this mutually-affected hardware/software growth curve that demonstrates a potential crisis with multi-core evolution. This crisis will affect the way in which developers evaluate how to move forward with both their software and the hardware it runs on.

The Multicore Crisis has to do with a shift in the behavior of Moore’s Law.
The law basically says that we can expect to double the number of
transistors on a chip every 18-24 months. For a long time, it meant
that clock speeds, and hence the ability of the chip to run the same program faster,
would also double along the same timeline. This was a fabulous thing
for software makers and hardware makers alike. Software makers could
write relatively bloated software (we’ve all complained at Microsoft
for that!) and be secure in the knowledge that by the time they
finished it and had it on the market for a short while, computers would
be twice as fast anyway. Hardware makers loved it because with
machines getting so much faster so quickly people always had a good
reason to buy new hardware.

Alas this trend has ground to a halt! It’s easy to see from the
chart above that relatively little progress has been made since the
curve flattens out around 2002. Here we are 5 years later in 2007.
The 3GHz chips of 2002 should be running at about 24 GHz, but in fact,
Intel’s latest Core 2 Extreme is running at about 3 GHz. Doh! I hate
when this happens! In fact, Intel made an announcement in 2003 that
they were moving away from trying to increase the clock speed and over
to adding more cores. Four cores are available today, and soon there
will be 8, 16, 32, or more cores.
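The 24 GHz figure works out to three doublings in five years, or one about every 20 months, which sits inside the 18-24 month range quoted above. A quick check of the arithmetic (the function name and the three doubling periods tried are my own choices):

```python
def projected_clock_ghz(start_ghz: float, years: float, doubling_months: float) -> float:
    """Clock speed if it had kept doubling every `doubling_months` months."""
    doublings = (years * 12.0) / doubling_months
    return start_ghz * 2.0 ** doublings


if __name__ == "__main__":
    # 3 GHz in 2002, projected forward five years to 2007.
    for months in (18, 20, 24):
        ghz = projected_clock_ghz(3.0, 5, months)
        print(f"doubling every {months} months: {ghz:.1f} GHz")
```

Depending on which end of the 18-24 month range you assume, the projection lands anywhere from about 17 GHz to about 30 GHz; either way it is nowhere near the 3 GHz actually shipping.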

What does this mean? First, Moore’s Law didn’t stop working. We
are still getting twice as many transistors. The Core 2 now includes 2
complete CPU’s for the price of one! However, unless you have software
that’s capable of taking advantage of this, it will do you no good. It
turns out there is precious little software that benefits if we look at
articles such as Jeff Atwood’s comparison of 4 core vs 2 core performance. Blah! Intel says that software has to start obeying Moore’s Law.
What they mean is software will have to radically change how it is
written to exploit all these new cores. The software factories are
going to have to retool, in other words.

With more and more computing moving into the cloud on the twin
afterburners of SaaS and Web 2.0, we’re going to see more and more
centralized computing built on utility infrastructure using commodity
hardware. That means we have to learn to use thousands of these
little cores. Google did it, but only with some pretty radical new tooling.

This is fascinating stuff and may explain why many of the emerging appliances from leading network security vendors today that need optimized performance and packet processing do not depend solely on COTS multi-core server platforms.

This is even the case with new solutions that have been written from the ground-up to take advantage of multi-core capabilities; they augment the products (much like the Crossbeam example above) with NPU’s, security processors and acceleration/offload engines.

If you don’t have acceleration hardware, as is the case for most pure software-only vendors, this means that a fundamental re-write is required in order to take advantage of all this horsepower. Check out what Check Point has done with CoreXL which is their "…multi-core acceleration technology
[that] takes advantage of multi-core processors to provide high levels of
security inspection by dynamically sharing the load across all cores of
a CPU."
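For a feel of what spreading inspection load across cores can involve, here is a minimal sketch of one common technique, static flow hashing. To be clear, this is not a description of how CoreXL works, and it is simpler than the dynamic sharing the quote describes; the function name, the 5-tuple key format and the synthetic addresses are all assumptions for illustration.

```python
import zlib
from collections import Counter


def core_for_flow(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                  proto: str, num_cores: int) -> int:
    """Map a flow's 5-tuple to a core index.

    CRC32 is used because it is deterministic across runs, unlike Python's
    built-in hash() for strings. Every packet of a flow lands on the same core.
    """
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{proto}".encode()
    return zlib.crc32(key) % num_cores


if __name__ == "__main__":
    # Synthetic flows, for illustration only.
    flows = [("10.0.0.%d" % (i % 250 + 1), 1024 + i, "192.0.2.10", 443, "tcp")
             for i in range(1000)]
    load = Counter(core_for_flow(*flow, num_cores=4) for flow in flows)
    print(sorted(load.items()))
```

Keeping a flow on one core means per-connection state never has to be locked or shared between cores. The trade-off is that a single heavy flow can still saturate one core while the others sit idle.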

We’ll have to see how much more juice can be squeezed out of software and core stacking without a complete re-tooling of software stacks, versus doing the work in hardware, as per-processor performance gains flatten (see above) and core density climbs.

Otherwise, combined with this smoothing/dipping of the Moore’s Law hardware curve, not retooling software will mean that proprietary processors may play an increasingly important role as the cycle replays.

Oh that is hilarious. I was half tempted not to correct it and claim I was just being "punny." It would have been even more ironic if I made some allusion to software bugs in the post…
Here I am poking at the folks who are concerned I didn't spell check and I can't spend the time to check my grammar.
Buahahaha!
Great catch. Corrected. Thanks.
/Hoff

Rational Security has a great post on the challenges of multicore for vendors of security boxes. You’ll recall that the Multicore Crisis comes about because the impact of Moore’s Law on microprocessor performance has changed. We no long…