Archive for January, 2011

A little more than 10 years ago, in January 2001, I joined a small startup called Topspin Communications. We weren’t saying much publicly about what we were doing, but the idea when I joined was to build a super-high-performance box for dynamic web serving, with web app blades, TCP offload blades, storage blades and SSL blades. I was in charge of the SSL blade. However, early 2001 was when it became clear that the bubble was well and truly bursting, and that we weren’t going to have enough customers even if we actually built our box, so we abandoned that product. Shortly after this decision, I got a call from the salesman at the company whose encryption chip we had selected for the SSL blade, telling me that they had decided not to build the encryption chip after all. I remember thinking how upset I would have been if he had called a week earlier, when we were still planning a product around the chip.

After a few months of flailing around searching for a product direction (not the most fun time in Topspin’s history), we decided to focus on InfiniBand networking gear. Initially, we focused on connections from servers on an InfiniBand fabric to existing Ethernet and Fibre Channel networks, and thus was born the IGR — InfiniBand Gateway Router — aka Buzz (“To InfiniBand and Beyond”):

This first chassis was pretty far from being a shippable product: it had only 1X IB ports (2 Gbps!) and was built using Mellanox MT21108 “Gamla” chips. Heroic hardware reworks and software hacks were needed just to get the system booting; for example, I somehow added enough IB support to PPCBoot for the line cards to load a kernel from the controller over InfiniBand directed route MADs.

Still, it was enough to get companies like Dell and Microsoft to take us seriously (which helped us raise another $30 million in the summer of 2002). Keep in mind that this was during the time when everyone thought InfiniBand was going to be huge, and Microsoft was planning on having IB drivers in Windows Server 2003. In fact, we lugged some prototypes and emulators built on PCs up to Washington State to do interoperability testing and debugging with the Windows driver developers, and even watched Windows kernel developers at work.

When we were designing the next version of this box, one big decision was which 4X IB adapter chip to use inside. The choices were to play it safe with IBM Microelectronics, or to gamble on a startup, Mellanox, which was making bold performance promises. Luckily, we chose Mellanox, since the “safe” choice, IBM, canceled their IB products after struggling to make them work at all. Mellanox’s first spin of their chip worked — it was an amazing experience to have a real 4X adapter that “just worked” in our lab after all the screwing around with half-baked 1X products that we had gone through (although we did spend plenty of time debugging the driver and firmware for that Tavor adapter).

We worked hard on getting to a real product, and in November 2002, we were able to introduce the Topspin 360, which had 24 4X IB ports, 12 standard IB module slots (each could hold either a 4-port 1G Ethernet gateway or a 2-port Fibre Channel gateway) as well as one very cool bezel design:

In engineering, we followed the 360 with the “90 in 90” challenge and built the Topspin 90 in only 90 days. I was able to get IPoIB working on the 90’s controller, in spite of having only a primitive IB switch and no host adapter available. The Topspin 90 was introduced in January 2003:

The engineering team spent the rest of 2003 building the Topspin 120 24-port switch (another switch chip to get IPoIB working on), and a new 6-port Ethernet gateway. The Ethernet gateway was pretty cool. For the first 4-port Ethernet gateway, we used a PowerPC 440GP along with a Mellanox HCA and some Intel NICs and did all the forwarding between Ethernet and IPoIB in software; between PCI-X and CPU bottlenecks, we were somewhat performance-limited. The 6-port gateway used a Xilinx Virtex 2 FPGA with our own InfiniBand logic and did all the forwarding in hardware, so we were able to handle full line rate of minimum-sized packets in both directions on all 6 Ethernet ports, and in 2003, 12 Gbps of traffic was an awful lot!

Somewhere along the way, it became clear that operating systems (aside from borderline-irrelevant proprietary Unixes like Solaris and HP-UX) would not include InfiniBand drivers out of the box; Microsoft dropped their plans for IB drivers, and the open source Linux project stalled. It became clear that if we wanted anyone to buy InfiniBand networking gear, we would have to take care of the server side of things too, and so we started working on a host driver stack. Luckily, at the very beginning of our InfiniBand development in 2001, we made the decision to use Linux on PowerPC rather than VxWorks as our embedded OS. That meant we had a lot of Linux InfiniBand driver code from our switch systems that we could adapt into host drivers.

At first, we distributed our drivers as proprietary binary blobs, which meant a lot of pain for us building our drivers for every different kernel flavor on every distribution our customers used, and which also meant a lot of pain for our customers who wanted to mix and match IB gear from different vendors. Clearly, for IB to work everyone had to agree on an open source stack, and after a lot of arguing and political wrangling that I’ll skip over here, the OpenIB Alliance was formed, and we started working on InfiniBand drivers for upstream inclusion in Linux.

The starting point of all the different vendor stacks that got released as open source was not particularly good, and although a lot of the community was in denial about it, it was clear to me that we would have to start from scratch to get something clean enough to go upstream. Around February 2004, I was trying to optimize IPoIB performance, and I got so frustrated trying to wade through all the abstraction layers of the Mellanox HCA driver that I decided I would try to write my own drastically simpler driver, and I started working on something I called “mthca”.

By May 2004, I had mthca working well enough to run IPoIB, and I decided to announce it publicly. This led to another series of flamewars but also enough encouragement from people I considered sane that I continued working on a stack built around mthca, and by December 2004 we had something good enough to go upstream. That was really the start of a lot of great things, and I’m proud of my role helping to maintain the Linux stack; today we have iWARP support, eight different hardware drivers, IPoIB, storage protocols, network file protocols, RDS, and more, and InfiniBand is used in more than half of the Top 500 supercomputers. I don’t think any of that happens without IB support being upstream.

On the hardware side of things, we continued building things like the Topspin 270 96-port switch (1.5 Tbps of switch capacity!), switches for IBM BladeCenter, and so on. In April 2005, Cisco bought Topspin, and when the deal closed in May 2005, I officially became a Cisco employee. The Topspin IB products became the Cisco SFS product line, and for a brief glorious time, Cisco sold IB gear.

Unfortunately (for the SFS product line, at least), the IB market didn’t grow fast enough to become the billion-dollar market that Cisco looks for, and so Cisco decided to stop selling IB gear. We went from announcing new products to announcing that we wouldn’t sell those products (and I don’t think an SFS 3504 ever actually shipped to a customer). In fact, I personally gummed up the works a bit by putting in an internal order for an SFS 3504 as soon as it was orderable; a year later, the guy responsible for winding down the SFS product line had to track me down and have me cancel the order, which was the last one still on the books.

After we stopped working on InfiniBand stuff, we were bounced around between a few Cisco business units until we ended up working on x86 servers for the Cisco UCS product line. For the past few years, I’ve been helping Cisco build rack servers while continuing to be the InfiniBand/RDMA maintainer for Linux. I’ve helped build cool products such as the Cisco C460 server (some amusing things about the C460 project were debugging UEFI/BIOS that made memtest86+ insta-reboot at a certain memory location, and figuring out why Linux wouldn’t boot on an x86 system with 1TB of RAM). Cisco is a fun, rewarding place to work, and it’s amazing to still work every day with so many people from the old-school Topspin team, who have taught me so much over the years and become good friends along the way.

But since the Cisco acquisition, I’ve always missed the rush of working at a startup (hence my cri de coeur defending startups), and starting on Monday I’ll finally get back to that. My new company is using InfiniBand, and continuing to maintain the upstream stack is part of my official job description, so nothing should be changing about my free software activities. If my next job is half as good as Topspin, it should be an awesome ride.

My new company is still trying to keep things on the down-low, so I’m not going to put a link on my blog. I can say that we still want to hire more great Linux developers, so if you’re interested, please get in touch with me! We’re looking for people to work in person in downtown Mountain View, CA (really downtown: not off in the Shoreline wilderness near the Googleplex, but actually in the same building as the Mozilla Foundation, near the train station, restaurants, etc.). As I said, working remotely isn’t an option, but if you aren’t currently in the area and want to move to Silicon Valley, we can help with relocation and visas (if you’re good enough, of course ;)).