Security Decrypted?

As corporate networks expand distributed security functions to keep up with interception and hacker attacks, thin clients are being tasked with many more functions than the simple block encryption that defined a secure transmission in past decades. Depending on the type of network and a client's place in it, a node may need to encrypt data, authenticate a message or a user, create a secure virtual private network to connect a telecommuter to a home office, and enable secure transactions using a socket-layer protocol.

What's true for the client is even more apparent for servers, since a single device at the network edge is now expected to handle multiple functions, including firewalling, compression, encryption, intrusion detection and prevention, user authentication, antivirus scanning and VPN brokering.

When a large rack-mounted server assumes the bulk of the authentication and authorization functions and manages the VPN, the client's share of the work may not burden a security coprocessor. But when critical tasks are passed on to a PDA or a 3G handset, limits on IC real estate and power dissipation may constrain the overall security capabilities of the network.

In the past five years, network-processor vendors that specialize in security functions have augmented core architectures with two classes of tasks: the network-layer duties defined in the Internet Engineering

Vendors with a control-plane core (usually an openly licensed RISC) at their disposal can make it the centerpiece of a complex device with both control- and data-path elements. Some of the first developers to meld security blocks with control RISC platforms were Freescale Semiconductor, with a dual-core PowerPC PowerQuicc, soon to be augmented with security blocks from the MPC184 family; PMC-Sierra Inc., with the dual-core MIPS RM11200; Broadcom Corp., offering a quad-core BCM14xx MIPS-based design; and Cavium Networks Inc., with the Octeon processor, which employs as many as 16 MIPS cores alongside dedicated security processors.

The optimum number of cores in a multicore solution incorporating control-plane RISC can only be assessed alongside latency and code complexity, said Russ Dietz, chief technology officer at security-chip specialist Hifn Inc. Four-core designs tend to show lower latency than dual-core designs, but integrating more than four cores may bring the same falloff in efficiency seen in symmetric-multiprocessing computing.
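The falloff Dietz describes can be illustrated with Amdahl's law, a standard scaling model swapped in here for illustration; it is not Hifn's own analysis. The sketch assumes a hypothetical workload in which a fixed fraction of per-packet work parallelizes perfectly while the rest (for example, updates to shared state) stays serial.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup when only parallel_fraction of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def efficiency(parallel_fraction: float, cores: int) -> float:
    """Speedup delivered per core; 1.0 would mean perfect scaling."""
    return amdahl_speedup(parallel_fraction, cores) / cores


if __name__ == "__main__":
    # Hypothetical workload: 90 percent of the work parallelizes cleanly.
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} cores: speedup {amdahl_speedup(0.9, n):5.2f}, "
              f"efficiency {efficiency(0.9, n):.2f}")
```

With 90 percent parallel work, four cores give roughly a 3.1x speedup (77 percent efficiency), while 16 cores give only 6.4x (40 percent). Real security workloads, with their irregular code paths, may scale worse than this idealized model.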

"Freescale has been good at recognizing the control-plane limits in practical communications processor designs," Dietz said. "There may be room for improvement beyond two-way multicore, but you cannot take the position that more cores means a more-efficient design."

Even for those security coprocessors that leave control-plane blocks off the main design, the layers of possible security solutions make for a large and complex portfolio of design possibilities. Security processor vendors can opt to keep all control-plane processing functions off-chip and design their security device as a coprocessor for a RISC or Pentium-class control processor. Or they can embed a RISC or ARM core alongside single or multiple cores for such hardwired functions as encryption and tunnel creation.

Meanwhile, security processors with or without control cores can be designed to be wide or deep. That is, they can be optimized to perform single functions, such as data encryption or digital signatures using secure-hash algorithms, very fast. Alternatively, they can be designed as multicore, multithreaded monsters capable of performing several multilayer tasks in parallel or pipeline fashion.

"You have to be very careful," Dietz of Hifn said. "We looked at a multicore design two years ago, but the choppy and bizarre code structures you see in many multilayer security concepts don't map too well to any known architecture. There always would be some function that would end up being the bottleneck for the processor overall.

"One strategy we continue to use in both in-band and lookaside architectures is to put boundaries around the particular problem we want to solve, [with] price and performance constraints [based on] what we think the customer will pay for. It makes more sense than adding [every] possible security block we can imagine."
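Dietz's remark that some function always ends up as the bottleneck follows from how a pipelined design works: its throughput is set by its slowest stage. The sketch below uses hypothetical stage names and per-packet timings, not measurements from any real chip.

```python
from typing import Dict, Tuple


def pipeline_throughput(stage_ns: Dict[str, float]) -> Tuple[str, float]:
    """Return (bottleneck stage, packets per second) for a pipeline in which
    every stage works on a different packet at the same time."""
    bottleneck = max(stage_ns, key=stage_ns.get)
    return bottleneck, 1e9 / stage_ns[bottleneck]


# Hypothetical per-packet time, in nanoseconds, for each stage.
stages = {"parse": 40.0, "sa_lookup": 60.0, "encrypt": 120.0, "hash": 80.0}
print(pipeline_throughput(stages))

# Doubling the speed of a stage that is not the bottleneck buys nothing.
faster_hash = dict(stages, hash=40.0)
print(pipeline_throughput(faster_hash))
```

This is one reason a design team might bound the problem first, as Dietz suggests: extra hardware spent on non-bottleneck stages adds cost without adding throughput.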

Multicore designs like Octeon may represent the future for routing and switching boxes that take on multiple network security tasks. But designers at the bleeding edge must realize that the key revenue generators may be simpler, single-thread devices that are subject to commodity pricing. And some simpler encryption processors are down to $10 or less in high volumes.

"Some of our highest-volume sales still come from a generation just dedicated to encryption and hashing, and you'd better believe there's a lot of competition and price pressure out there," said Joe Wallace, director of security marketing at Broadcom. "The bulk of units shipped is in processors capable of handling secure operations at 300 to 500 Mbits per second." The gigabit security processor still counts for just "a tiny percentage of sales," Wallace said, whereas early designs capable of 5 or 10 Gbits per second for secure operations are a downright oddity.

There's definitely market interest in moving to multilayer processors that handle encryption along with IPsec VPNs and SSL VPNs, Wallace said, and Broadcom thus plans to augment its offerings in that area this year. But access systems that combine several security functions in a server or router will have smaller overall sales than simple clients requiring only one or two security functions.

This holds particularly true in mobile client environments, where Israel-based Discretix offers cores for encryption and hashing. "There is certainly interest in adding more authentication or VPN creation within the handset, but you have to deal with the reality of the digital baseband device in an enhanced cell phone," said Jacob Greenblat, director of strategy at Discretix. "There's not a lot to move in that environment, in terms of gate count and power dissipation, so current designs call for the home server to retain many VPN and authentication tasks."

Even at the server end, there is concern among processor vendors that undue design activity is being devoted to standards that are being invoked less and less frequently. Two years ago, the ubiquity of IPsec for multiple-layer security functions made the hardwiring of IPsec protocols seem a necessary evil. Now, the trend of creating VPNs at the SSL transaction layer leaves several designers wondering if the IPsec protocol suite will end up unused in real-world designs. For now, though, enough enterprise intranets and extranets use IPsec that processor designers can't afford to remove embedded IPsec blocks.

Scott Finley, director of marketing at Hifn, observed that SSL VPNs came in from the application layer and thus were oriented to transaction-based businesses for extranets. Those who talk about moving all VPN support to SSL neglect the fact that Layer 2 and 3 VPNs based on IPsec are designed to protect core corporate infrastructures. Trying to accomplish the same thing in a clientless, application-layer SSL VPN may end up costing the enterprise more in the long run, Finley said.

Climbing protocol ladder

At the physical layer, the tamper-proof security of the device itself and the integrity of the nonvolatile memory used in smart-card-like designs have become primary concerns of one subset of developers. The milestone defined by the Trusted Computing Group in 2004 was the Trusted Platform Module, a microcontroller with secure memory responsible for secure key generation and key cache management, using industry-standard cryptographic application programming interfaces.

Kevin Schutz, secure-product manager at Atmel Corp., said the TPM constitutes a "root of trust" that can be extended to desktop computers, mobile computers and small mobile clients by building key-exchange relationships and authentication procedures from the core capabilities of the TPM. The basic encryption keys for single systems and embedded networks, including storage root keys and endorsement keys, are protected in nonvolatile memory that is further guarded by both microcontroller logic and tamper-proof circuitry within the microcontroller.

This implies that the controller used in a TPM must, at a minimum, have a true random-number-generator block; generate key pairs using public-key algorithms, such as Diffie-Hellman or RSA; manage encryption and public-key signatures; store secure hashes; and create endorsement keys, Schutz said. At the same time, he said, such controllers must be priced reasonably enough to become ubiquitous, since TCG expects them to be in every desktop computer and, eventually, every mobile device.
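To make the key-pair requirement concrete, here is a toy Diffie-Hellman exchange. It assumes a small public prime (the Mersenne prime 2**127 - 1) and base 3 purely for illustration; real deployments use standardized groups of 2,048 bits or more. A TPM would draw private exponents from its hardware random-number generator; Python's `secrets` module stands in for it here.

```python
import secrets

# Toy parameters, far too small and structured for real use.
P = 2**127 - 1  # Mersenne prime M127 (public)
G = 3           # hypothetical public base chosen for illustration


def keypair():
    """Return (private, public); the private exponent lies in [2, P-2]."""
    private = secrets.randbelow(P - 3) + 2
    public = pow(G, private, P)
    return private, public


def shared_secret(my_private: int, their_public: int) -> int:
    # (G^b)^a == (G^a)^b mod P, so both sides derive the same value.
    return pow(their_public, my_private, P)


a, A = keypair()  # e.g. the client's key pair
b, B = keypair()  # e.g. the server's key pair
print(shared_secret(a, B) == shared_secret(b, A))
```

Only the public values `A` and `B` cross the network; an eavesdropper would have to solve a discrete logarithm to recover either private exponent.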

As the TPM becomes ubiquitous, however, first-generation encryption processors from the likes of Hifn and SafeNet Inc. must add functionality at higher layers, or optimize public-key or private-key encryption for multigigabit speeds, to avoid being flattened by TPM commoditization.

Public key

Public-key, asymmetric alternatives used computationally hard problems, such as factoring the product of two large primes, so that the keys used to encrypt traffic could be published in the open. The industry regarded the federal government's Data Encryption Standard follow-on, the Advanced Encryption Standard, as far less influenced by the intelligence community than DES had been two decades earlier. The National Security Agency, meanwhile, belatedly decided in October 2003 to promote public-key algorithms by openly licensing Certicom Corp.'s specialized elliptic-curve algorithms.
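The classic textbook RSA example shows why publishing a key is safe only while factoring stays hard: anyone holding the modulus n and exponent e can encrypt, but computing the private exponent d requires the prime factors of n. The primes here are deliberately tiny, so trial division recovers them instantly; for real moduli of 2,048 bits or more, no known classical algorithm does so in practical time.

```python
from math import gcd

p, q = 61, 53            # secret primes (toy sizes)
n = p * q                # public modulus: 3233
e = 17                   # public exponent
phi = (p - 1) * (q - 1)  # 3120, computable only from the factors
assert gcd(e, phi) == 1
d = pow(e, -1, phi)      # private exponent 2753 (Python 3.8+)


def encrypt(m: int) -> int:
    return pow(m, e, n)


def decrypt(c: int) -> int:
    return pow(c, d, n)


def factor_by_trial_division(modulus: int) -> tuple:
    """Attacker's view: feasible here only because the modulus is tiny."""
    f = 2
    while f * f <= modulus:
        if modulus % f == 0:
            return f, modulus // f
        f += 1
    raise ValueError("modulus is prime")


c = encrypt(65)
print(c, decrypt(c))  # 2790 65
r, s = factor_by_trial_division(n)
print(pow(e, -1, (r - 1) * (s - 1)))  # attacker recovers d = 2753
```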

In the 1990s, many crypto-processor vendors opted to use private-key Triple-DES (and later AES) for bulk encryption while supporting public-key algorithms for digital signature and authentication functions.

Scott Vanstone, founder and executive vice president of strategic technology at Certicom and a professor of computer science at Ontario's University of Waterloo, said widespread approval of AES carried a hidden advantage for the adoption of elliptic-curve algorithms in the public-key arena. Because the Federal Information Processing Standard recommends matching security levels for private- and public-key encryption, the 256-bit key size of AES poses a scaling problem for traditional public-key algorithms such as RSA and Diffie-Hellman, whose keys must grow very large to keep pace, Vanstone said. Because the best known attacks on the elliptic-curve discrete logarithm problem are fully exponential, elliptic-curve keys need inherently smaller parameters to scale. That's useful for chips developed for mobile or small-client environments, he said.
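Vanstone's scaling argument can be sketched numerically. The estimate below uses the textbook heuristic running time of the general number field sieve for RSA, exp(c (ln n)^(1/3) (ln ln n)^(2/3)) with c = (64/9)^(1/3) and the o(1) term dropped, and the square-root cost of Pollard's rho for elliptic curves. Both are rough approximations assumed for illustration, not Certicom's or NIST's method; for reference, NIST SP 800-57 pairs 256-bit symmetric security with 15,360-bit RSA and 512-bit elliptic-curve keys.

```python
import math

GNFS_C = (64 / 9) ** (1 / 3)  # ~1.923, heuristic GNFS constant


def rsa_security_bits(modulus_bits: int) -> float:
    """Rough bits of work to factor an RSA modulus with the general number
    field sieve: log2 of exp(c * (ln n)^(1/3) * (ln ln n)^(2/3))."""
    ln_n = modulus_bits * math.log(2)
    ln_work = GNFS_C * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3)
    return ln_work / math.log(2)


def ecc_security_bits(key_bits: int) -> float:
    """Pollard's rho needs about sqrt(group order) steps: half the key size."""
    return key_bits / 2


for rsa_bits, ecc_bits in [(3072, 256), (7680, 384), (15360, 512)]:
    print(f"RSA-{rsa_bits}: ~{rsa_security_bits(rsa_bits):.0f} bits   "
          f"ECC-{ecc_bits}: {ecc_security_bits(ecc_bits):.0f} bits")
```

Because the RSA work factor grows roughly with the cube root of the modulus size, this crude model needs a modulus well over 12,000 bits to reach 256-bit security, while the elliptic-curve key only has to be twice the target.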

"One of the key drivers in getting NSA interested in elliptic curve, particularly the MQV [Menezes Qu Vanstone] algorithm, was this ability to scale," Vanstone said. "The NSA and Canadian Communication Security Establishment have said they need a public-key crypto algorithm that will last the next 50 years."

Certicom has licensed elliptic curve to such vendors as Texas Instruments Inc., which uses the public-key suite as a part of its Omap mobile-processor family.

Tunnel trouble

One tough call for security-processor vendors in recent years has been choosing the appropriate route for creating the packet tunnels used in VPNs. While the IETF worked on IPsec, early implementers developed hardwired implementations of such proprietary VPN protocols as Microsoft's Point-to-Point Tunneling Protocol.

IPsec's advantages lay in its multifaceted capability. It allowed packets to be authenticated through the authentication header, which carries an integrity check value computed with a keyed hash. In addition to the authentication header, IPsec provided an encapsulating security payload for packet encryption and encapsulation, an "IPcomp" compression function and Internet key exchange using the IKE public-key model formerly known as ISAKMP/Oakley.
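The authentication header's integrity check value is typically computed with a keyed hash; one algorithm defined for it is HMAC-SHA1-96 (RFC 2404), a full HMAC-SHA1 truncated to 96 bits. The sketch assumes the caller has already assembled the AH input (the IP header with mutable fields zeroed, the AH header with its ICV field zeroed, and the payload); the key and packet bytes are hypothetical placeholders.

```python
import hashlib
import hmac

ICV_LEN = 12  # HMAC-SHA1-96 keeps the first 96 bits of the 160-bit MAC


def ah_icv(key: bytes, authenticated_bytes: bytes) -> bytes:
    """Compute an HMAC-SHA1-96 integrity check value over prepared AH input."""
    return hmac.new(key, authenticated_bytes, hashlib.sha1).digest()[:ICV_LEN]


def ah_verify(key: bytes, authenticated_bytes: bytes, icv: bytes) -> bool:
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(ah_icv(key, authenticated_bytes), icv)


key = b"\x0b" * 20                     # hypothetical key negotiated via IKE
packet = b"hypothetical packet bytes"  # placeholder, not a real datagram
tag = ah_icv(key, packet)
print(ah_verify(key, packet, tag), ah_verify(key, packet + b"!", tag))
```

Any change to the protected bytes, or a wrong key, makes verification fail; that is all AH provides, since it does not encrypt the payload.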

When IPsec was finalized in the late 1990s, it specified a transport mode for normal traffic and a tunnel mode for VPN use (in which tunnels are defined for both IPv4 and IPv6 traffic). In practice, most, if not all, IPsec network implementations use tunnel mode, and the standard has become synonymous with VPN.
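The difference between the two modes comes down to where the encapsulating security payload (ESP) header goes and what gets encrypted. The sketch lists the fields of an ESP-protected packet in each mode; the host and gateway labels are hypothetical, and padding and option details are simplified.

```python
from typing import List, Tuple

Field = Tuple[str, bool]  # (field name, encrypted?)


def esp_packet(mode: str) -> List[Field]:
    """Field order of an ESP packet in transport or tunnel mode."""
    if mode == "transport":
        # The original IP header stays in front and in the clear.
        return [
            ("IP header (host A -> host B)", False),
            ("ESP header (SPI, sequence number)", False),
            ("TCP/UDP header + data", True),
            ("ESP trailer (padding, next header)", True),
            ("ESP integrity check value", False),
        ]
    if mode == "tunnel":
        # A new outer header addresses the gateways; the whole original
        # packet, including its IP header, is encrypted.
        return [
            ("outer IP header (gateway 1 -> gateway 2)", False),
            ("ESP header (SPI, sequence number)", False),
            ("inner IP header (host A -> host B)", True),
            ("TCP/UDP header + data", True),
            ("ESP trailer (padding, next header)", True),
            ("ESP integrity check value", False),
        ]
    raise ValueError(f"unknown mode: {mode}")


for mode in ("transport", "tunnel"):
    print(mode)
    for name, encrypted in esp_packet(mode):
        print(f"  {'[enc] ' if encrypted else '      '}{name}")
```

Because tunnel mode hides the inner addresses and lets gateways protect traffic on behalf of whole subnets, it is the natural fit for VPNs.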

SafeNet and other developers were early believers in IPsec, and such startups as NetOctave (now owned by CyberGuard) were formed with a single focus on hardwired IPsec processors. Hifn, with roots in Lempel-Ziv compression, has been able to leverage both IPcomp (within IPsec) and TLScomp (used for SSL compression).
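Compression hardware matters inside IPsec because IPcomp has to run before encryption: well-encrypted output looks random and does not compress. The sketch uses Python's zlib (DEFLATE, one of the algorithms defined for IPcomp alongside Hifn's LZS) and random bytes as a stand-in for ciphertext; the sample payload is hypothetical.

```python
import os
import zlib

# Hypothetical repetitive payload, typical of protocol headers.
plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 50
# Random bytes stand in for the output of a good cipher.
ciphertext_like = os.urandom(len(plaintext))

compressed_first = zlib.compress(plaintext)        # compress, then encrypt
compressed_after = zlib.compress(ciphertext_like)  # encrypt, then compress

print(len(plaintext), len(compressed_first), len(compressed_after))
```

Compressing first shrinks the repetitive payload dramatically, while compressing after encryption gains nothing and adds a few bytes of overhead.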

This model was disrupted to a certain extent when Web services-oriented OEMs, including Neoteris Inc. and Aventail Inc., promoted the creation of VPNs based on SSL. Not only could a nontrusted computer log into a corporate SSL VPN, but transaction-based VPNs also required much thinner client software than IPsec VPNs and could use simpler, Web-based management schemes. The popularity of SSL VPNs escalated as Neoteris was acquired by NetScreen, which itself was snapped up by Juniper Networks Inc.

Over the past three years, virtually every security-processor vendor has offered an SSL VPN option to a standard processor line based on IPsec. One startup, Britestream Corp. (formerly Layer N Networks), has designed most of its coprocessor chips specifically for SSL purposes.

Amer Haider, director of strategic marketing at Cavium, said processor vendors cannot afford to neglect IPsec, since it is still demanded by larger corporate networks with more significant client security needs. Rather, the security processor vendor must offer a wealth of IPsec and SSL tools, either in the form of processor products with different capabilities and price points, or as cores that can be enabled or disabled at will within a larger design.

One trend still in its infancy is the migration of specific security features at Layers 2 through 5 from a control processor or coprocessor to other network chips. Mobile-device requirements dictate adding security blocks to a DSP instead of an integer processor. TI is melding such functions in its Omap work with Certicom, while SafeNet is exploring DSP, ARM and CryptoAccelerator block combinations in its SafeXcel 2141 and ISES chip families.

Last week, when its enterprise switch group launched the XGS III switching family, Broadcom gave another preview of where multicore capabilities could go. The 1-Gbit/10-Gbit Ethernet switch chip integrates blocks from the company's BlueSteel security designs, putting hardware-based key generation and denial-of-service blocking circuitry directly on an Ethernet switch. Eric Hayes, marketing director with the group, called the move indicative of the distributed nature of work group security.
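The article does not say how the BlueSteel denial-of-service blocking works. A token bucket, a common rate-limiting technique swapped in here purely for illustration, shows the kind of per-source policing such circuitry might perform; the rate, burst size and source addresses are hypothetical.

```python
from collections import defaultdict


class TokenBucket:
    """Allow `rate` packets per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.capacity = burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last packet, capped at burst.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per source address (hypothetical policy: 10 pkt/s, burst of 5).
buckets = defaultdict(lambda: TokenBucket(rate=10.0, burst=5.0))


def admit(src_ip: str, now: float) -> bool:
    return buckets[src_ip].allow(now)


# A flood of 10 packets at t=0 from one source: only the burst gets through.
print(sum(admit("192.0.2.1", 0.0) for _ in range(10)))  # 5
print(admit("192.0.2.1", 0.5))  # True: half a second refills the bucket
```

In hardware, the same logic reduces to a counter and a timestamp per tracked source, which is part of why such policing can sit directly on a switch chip.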

This raises the question of which security functions belong in which network nodes, a question that perhaps can be settled case by case but defies generalization. In architectures such as its own HIPP-3, Hifn has learned that as security processors move up to deep content analysis and application-layer inspection, a lookaside coprocessor with a search function may be preferable to a flow-through architecture. Dietz said that the latter "only works well when the problem set is clearly defined, with specific performance parameters."

Dietz listed several lessons learned: The spotty success of traditional network processors shows that any designer asking a customer to use proprietary code and a unique development system is asking for trouble. Likewise, if a multicore control-plane security IC design can't use simple tools, or if it has too many complexities in multithreading, it will not be efficient.

Each layer of security processing has its own set of associated values, with corresponding expectations in the average selling price of the device. Flow-through processors are best implemented in network interface cards in familiar environments, such as a network appliance or server. When a router has to add more general stateful-inspection functions at higher layers, such as intrusion detection or firewalls, it is usually best to implement the packet processing in stages, using a lookaside architecture with search engines.
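The staged approach can be sketched as a fast first stage that handles every packet with a fixed rule and a lookaside search engine consulted only for traffic that needs deeper inspection. The port policy and signature strings below are hypothetical, and a simple substring scan stands in for a hardware search engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    dst_port: int
    payload: bytes


INSPECT_PORTS = {25, 80}                   # hypothetical: SMTP and HTTP get deep inspection
SIGNATURES = [b"cmd.exe", b"/etc/passwd"]  # hypothetical intrusion signatures

lookaside_calls = 0


def search_engine(payload: bytes) -> List[bytes]:
    """Stand-in for a lookaside coprocessor: return matching signatures."""
    global lookaside_calls
    lookaside_calls += 1
    return [sig for sig in SIGNATURES if sig in payload]


def process(pkt: Packet) -> str:
    # Stage 1: cheap header check in the fast path.
    if pkt.dst_port not in INSPECT_PORTS:
        return "forward"
    # Stage 2: hand only selected packets to the search engine.
    return "drop" if search_engine(pkt.payload) else "forward"


traffic = [
    Packet(443, b"\x16\x03\x01..."),
    Packet(80, b"GET /index.html"),
    Packet(80, b"GET /../../etc/passwd"),
]
print([process(p) for p in traffic], lookaside_calls)
```

Only two of the three packets reach the search engine, which is the point of the lookaside split: the expensive stage sees a filtered subset of traffic and can be sized accordingly.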

"But this is subject to change," Dietz said. "What happens at Layer 7 with content inspection today is a struggle between customer-defined software and the drive within the processor design team to embed a task in gates or firmware. We've seen those stages happen again and again as a software operation becomes a commodity hardwired function, and we're seeing it happen in higher-layer security tasks today."