Mellanox ships rounded InfiniBand lineup

More switches, pre-tested cables

InfiniBand and Ethernet hardware maker Mellanox Technologies rounded out its lineup of quad data rate (QDR) InfiniBand switches today as it ramped up to full production on the devices.

The company has also started a more rigorous testing regimen for InfiniBand cables, saving HPC shops a whole lot of headaches down the road.

Mellanox created its own InfiniScale IV silicon for 40 Gb/sec InfiniBand, chips that both Mellanox and Voltaire use inside their IB switches. (QLogic makes its own silicon, too, for its IB switches.) Mellanox announced a 648-port modular InfiniBand director switch, the IS5600, at the International Supercomputing Conference '09 last June, along with two smaller modular switches with 108 and 216 ports, the IS5100 and IS5200. Today, a 324-port IS5300 joins the product line, filling in a gap.

All of the IS5000 modular switches use the same leaf, spine, and management modules, so customers can upgrade to a larger chassis without having to buy new modules. The switches also support fat tree, 3D torus, and mesh network topologies, all of which have been deployed by some of the largest clusters in the world (including the "Roadrunner" Opteron-Cell hybrid petaflops-scale super at Los Alamos National Laboratory).

The company has the same InfiniBand silicon and 40 Gb/sec bandwidth available in its managed and unmanaged 36-port edge switches (the IS5025 with no management module, the IS5030 with one capable of managing 108 nodes, and the IS5035 with two capable of managing 2,000 nodes). All of these products have been shipping to early access customers since last summer, but John Monson, vice president of marketing at Mellanox, says they are shipping in volume today, delivering anywhere from 2.88 Tb/sec to 51.8 Tb/sec of aggregate, non-blocking InfiniBand bandwidth.
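The two bandwidth endpoints Monson cites line up with a simple back-of-the-envelope calculation if "aggregate" counts both directions of every 40 Gb/sec port. That reading is an assumption rather than something Mellanox states, and the function name below is purely illustrative:

```python
# Rough check of the quoted aggregate bandwidth figures.
# Assumption (not stated by Mellanox): "aggregate" means every port
# running at 40 Gb/sec in both directions at once.
QDR_GBPS = 40      # per-port, per-direction rate cited in the article
DIRECTIONS = 2     # send + receive


def aggregate_tbps(ports: int) -> float:
    """Return aggregate bandwidth in Tb/sec for a switch with `ports` ports."""
    return ports * QDR_GBPS * DIRECTIONS / 1000


for label, ports in [("36-port edge switch", 36), ("648-port IS5600", 648)]:
    print(f"{label}: {aggregate_tbps(ports):.2f} Tb/sec")
```

Under that assumption the 36-port edge switches come out at 2.88 Tb/sec and the 648-port IS5600 at 51.84 Tb/sec, which matches the range quoted.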

Monson gets a dig in against QLogic, saying that Mellanox's InfiniBand switches deliver 100 nanosecond latency across the switch and 1 microsecond latency at the ports, compared to around 180 nanoseconds across the switch with the silicon used in the QLogic switches and as high as 300 nanoseconds in other IB switches.

Mellanox added another modular switch, and could yet add more products to the lineup, because it wants to have products that meet a wide variety of scenarios. "We never quite know what people want on modular and edge switches, so we give them lots of options." Sometimes they want managed switches in their clusters, sometimes they don't want to pay the premium.

In a lot of HPC setups, customers put unmanaged edge switches in the top of a rack of servers and then a modular switch, sometimes with management modules and sometimes not, at the end of a row of server racks. The switches need to scale from dozens of server nodes to tens of thousands using a bunch of the 648-port modular switches that are daisy chained together.

When you take into account the cost of cables and switches together, Monson says that 40 Gb/sec InfiniBand can be put into clusters for roughly the same cost as 10 Gigabit Ethernet while delivering four times the bandwidth and lower latency. He says this is one of the reasons why InfiniBand shipments steadily increased in 2009 at Mellanox. In the second quarter of 2009, Mellanox shipped around 50,000 of its Connect-X and Connect-X2 IB adapter cards, and in the third quarter that rose to over 70,000 units.

In the final quarter of 2009, Mellanox shipped more than 90,000 IB adapters, which represented about 150,000 server ports - and about 5 per cent of all server ports shipped in the quarter. (Mellanox did not provide stats on how its IS5000 switch sales were going.) Significantly, in the fourth quarter, about 57 per cent of the InfiniBand adapter revenue stream was for QDR products rather than the earlier 10 Gb/sec and 20 Gb/sec DDR products.

"This is a pretty healthy trajectory," says Monson, adding that Mellanox has shipped more than 5.3 million InfiniBand ports in total in the past decade.

Still, the competition with 10 Gigabit Ethernet (including its own product lines) and with aggressive IB competitors (Voltaire, QLogic, and Cisco Systems), added to the economic downturn, put pressure on Mellanox's profits in recent quarters. In the fourth quarter ended in December, when all those adapter cards and ports were being sold, sales rose 41 per cent to $35.5m, but net earnings fell by 46 per cent to $4.3m.

One of the things that Mellanox is going to do to try to get a leg up on its InfiniBand competitors is be more aggressive about testing IB cables. "You only need one unreliable cable in a cluster and things start to get real flakey," says Monson.

The fastest switch in the world won't matter if the signal integrity in the cables is bad, and Mellanox is going to do the testing to make sure the cables are above spec - and for practical economic reasons. "One cable failure will cost customers a ton of money to fix. And if something isn't working, they call us anyway because customers just assume it is the switch, not the cable."

Mellanox does not manufacture InfiniBand and Ethernet cables, but it has bought, certified, and resold them to give customers some extra degree of comfort. In the past, it has certified passive copper cables up to 7 meters in length, active copper cables ranging in size from 8 to 12 meters, and assembled optical cables that can stretch up to 300 meters.

Now, Mellanox has instituted its own testing regimen that exceeds the mechanical and electrical standards set by the InfiniBand Trade Association. Mellanox and its cable partners are testing cables at the cluster level, using real MPI traffic, and using Mellanox testing equipment and procedures.

In addition to testing the mechanical specs of the cable and its signal quality and performance relative to the IBTA standards (a bit error rate of 10^-12), Mellanox is doing more extensive signal quality tests on cables, which means looking for jitter, eye opening, crosstalk, and temperature interference and ensuring corner, adapter, and switch bit error rates are at 10^-15 or less; it is also doing additional MPI and link testing on systems using cables to ensure a system-level bit error rate of 10^-17 or less for each cable it sells.
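To put those bit error rates in perspective, it helps to turn them into an average time between bit errors on a single link. The sketch below reads the three figures as 10^-12, 10^-15, and 10^-17, and assumes a link running flat out at the 40 Gb/sec signalling rate; both the helper name and the fully loaded link are assumptions for illustration, not Mellanox test parameters:

```python
# Average time between bit errors on one link for a given bit error rate.
# Assumptions: the link is saturated at the 40 Gb/sec QDR signalling rate,
# and errors are spread evenly over time.
LINK_GBPS = 40


def seconds_between_errors(ber: float, gbps: float = LINK_GBPS) -> float:
    """Return the mean number of seconds between bit errors."""
    errors_per_second = ber * gbps * 1e9
    return 1.0 / errors_per_second


for ber in (1e-12, 1e-15, 1e-17):
    secs = seconds_between_errors(ber)
    print(f"BER {ber:.0e}: {secs:,.0f} s ({secs / 3600:,.1f} hours)")
```

On those assumptions, a cable that only just meets the 10^-12 spec would see a bit error about every 25 seconds, while 10^-15 stretches that to roughly seven hours and 10^-17 to about 29 days per link. Multiply by thousands of cables in a cluster and the gap between the levels becomes clear.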

All of this testing adds a small premium to the cost of the cable, of course. But finding a faulty cable in a spaghetti mess of thousands of cables costs a lot more money, as does having a supercomputer sitting there not doing any useful work as you hunt for a bad cable. ®