By virtue of my high-end system integration focus, I have followed Intel's – and other CPU vendors' – attempts to consolidate the high-end system interconnect side of their product lines over the years. After all, seamlessly connecting a large number of high-margin multi-socket server boxes into single system units, whether clusters or even tighter shared-memory systems, helps build a winning proposition for many multi-million-dollar deals in the datacenter, cloud and supercomputing space.

These interconnects vary from plain-vanilla Gigabit Ethernet in simple setups where inter-node communication demands aren't too high, to its faster (but not much lower latency) 10GE and 40GE versions, then on to low – but not lowest – latency Infiniband QDR 40 Gbps and FDR 56 Gbps setups, still limited to message passing between nodes, and finally to specialised low-latency interconnects that allow some sort of memory sharing, either shmem-style or even a NUMA single memory space. Until four years ago, the best of them all that you could buy openly came from the UK: QsNet from Quadrics in Bristol, whose fate deserves a separate story. SGI's NUMAlink and the Cray interconnect fit into this same category.
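To make the message-passing versus shared-memory distinction concrete, here is a minimal, purely illustrative sketch using Python's standard multiprocessing module – not any of the fabrics above, just the two programming models they expose. One worker receives its data explicitly over a channel (the cluster/MPI model), the other reads and writes the same memory directly (the shmem/NUMA model):

```python
from multiprocessing import Process, Pipe, Value

def msg_worker(conn):
    # Message passing: the value must be explicitly sent over a channel,
    # paying serialisation and transport latency on every exchange.
    conn.send(conn.recv() * 2)

def shm_worker(shared):
    # Shared memory: every process addresses the same location; no copies,
    # but now synchronisation (the lock) becomes the programmer's problem.
    with shared.get_lock():
        shared.value *= 2

if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=msg_worker, args=(child,))
    p.start()
    parent.send(21)
    print(parent.recv())
    p.join()

    v = Value("i", 21)
    q = Process(target=shm_worker, args=(v,))
    q.start()
    q.join()
    print(v.value)
```

Both produce the same result; the fabrics differ in how expensive – and how transparent – each exchange is.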

As the server and networking businesses converge, we saw Intel add Fulcrum switch chippery to its existing 10GE node controllers, then acquire QLogic's Infiniband arm, and now, just recently, conclude a long-negotiated deal with Cray to get hold of its high-speed interconnect technology. What's the impact?

Within Intel, this completes full-spectrum coverage of interconnects: all the Ethernet standards, with both node and switch chips; complete Infiniband solutions; and the top-end supercomputer interconnect from Cray. The last of these can be combined with the proposed fibre-optic QPI CPU interconnect, seen in Intel research papers, to scale those Xeons to thousands of sockets sharing a memory space, whether virtual or even physical as in NUMA systems – for both uber-supercomputers and ultra-large clouds, the final piece of the Exascale computing strategy.

Even in message-passing clusters, the mainstream standard of today, a very efficient low-latency interconnect can lift the performance of a supersized system by large chunks, whether it's an extra 10% on the Linpack benchmark for a higher TOP500 standing and the related owner-ego boost, or much more than that in real highly parallel applications sensitive to inter-node communication lag. Think molecular modelling or weather simulation here…
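A back-of-the-envelope model shows why latency-sensitive codes gain far more than Linpack. This is my own toy formula with hypothetical numbers, not a benchmark: split the serial work ideally across nodes, then add a fixed per-message latency cost on top.

```python
def parallel_time(work_s, nodes, msgs_per_node, latency_s):
    """Toy model: ideally divided compute plus a per-message latency tax."""
    return work_s / nodes + msgs_per_node * latency_s

# Hypothetical job: 1000 s of serial work on 1024 nodes,
# each node exchanging 10,000 messages over the run.
for label, latency_us in (("GigE-class", 50.0), ("low-latency fabric", 1.0)):
    t = parallel_time(1000.0, 1024, 10_000, latency_us * 1e-6)
    print(f"{label}: {1000.0 / t:.0f}x speedup")
```

With these made-up figures, the 50-microsecond fabric leaves a third of the ideal speedup on the table, while the 1-microsecond fabric recovers nearly all of it – the communication term, not the compute term, is what the interconnect upgrade buys back.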

Any trouble for AMD, given its close ties with Cray? Not really – the problem there is that Cray needs a new, faster CPU core from AMD, and the CPU vendor will need some time for a full turnaround from Bulldozer to bring in a much faster core that competes with Intel again. On the other hand, AMD has had an ultra-fast, open-spec large-system interconnect for years: the High Node Count extension to HyperTransport, which uses inexpensive Infiniband infrastructure as the physical layer but runs the efficient, low-overhead HyperTransport protocol over it; it is handled by the HyperTransport Consortium and can be used on any HyperTransport-enabled CPU. That includes the Chinese Loongson MIPS CPUs, whose high-speed, multi-teraflop-per-chip derivative is expected to be deployed – as the main CPU, not an accelerator, mind you – some two years from now in one of China's 100 PetaFLOPS systems in the city of Chongqing. The irony is, yes, that non-x86 CPUs may benefit the most from it.

And the others? Mellanox, which with its acquisition of fellow Israeli firm Voltaire became the dominant end-to-end Infiniband solution party, needs to diversify beyond its Infiniband base – where it now competes against Intel, who co-created Infiniband in the first place – and its Ethernet presence, where the likes of Huawei are gaining ground too. To keep their market spread, they may need to look at solutions above Infiniband in both performance and features.

System vendors like SGI and HP will lose yet a bit more of their differentiation, as Intel's new interconnect spread covers more of their niches – SGI's NUMAlink is still the fastest and tightest way to link multiple Xeons, enabling a fully shared single memory space and a single system image across 2,048 Xeon cores and 16 TB of RAM in one block, and even more in the Ivy Bridge EX generation. Such capabilities could now come to Intel interconnects sold openly to any system integrator.

Finally, the smaller fish will likely have to steer clear of the dangerous open seas now – small corners under the coral reefs, with specific niches, whether NUMA single-memory-image supercomputers or application-specific networks, may be the way to survive.