But while Nvidia has a major head start on Intel in this area, Intel's Xeon Phi coprocessors offer a big advantage over Nvidia GPUs in programmability. Xeon Phi uses the same programming languages and models that Intel's Xeon E5 processors do, including C/C++ and Fortran, while Nvidia's GPUs require a proprietary environment called CUDA.

Though Intel is one of the sponsors of Stampede, Boisseau said TACC is a multi-vendor shop that, as a taxpayer-supported organization, is obligated to consider technologies from all vendors. Stampede does employ Kepler2 GPUs in each of its 16 large-memory nodes.

Stampede, which boasts a footprint of 11,000 square feet, uses more than 75 miles of network cables.

But Boisseau said the ease of porting code to Intel's MIC architecture is a substantial advantage. "When you port code to GPUs, you have to put a lot of work in," Boisseau said, adding that porting code to GPUs can take weeks, months or even a year.

"MIC's support of standard programming languages and tools allow[s] almost any code to be compiled for MIC and natively executed on MIC," said Lars Koesterke, a TACC research staff member. Since the Xeon Phi development environment supports native C/C++ and Fortran cross-compilation and direct login access to the coprocessor, "the porting process is generally very straightforward," Koesterke said. He added that code optimization after initial porting could maximize vectorization and parallel efficiencies on the MIC architecture.
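
To make Koesterke's point concrete, here is a minimal sketch of the kind of code he describes. It is not taken from TACC, and the function name `saxpy` is a hypothetical choice for illustration. The loop is ordinary C with a standard OpenMP pragma and contains nothing specific to any one device, so the same source could in principle be built for a Xeon host or, using the Intel compiler's `-mmic` option, natively for the coprocessor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not from TACC): y[i] = a * x[i] + y[i].
 * Plain C with no device-specific code. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* Standard OpenMP pragma: spreads iterations across threads when
     * OpenMP is enabled at compile time; otherwise the compiler ignores it
     * and the loop runs serially. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The tuning Koesterke mentions would come after a port like this one, for example by checking that the compiler actually vectorizes the loop and that there is enough work to keep the coprocessor's many cores busy.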

TACC has already installed more than 2,000 Xeon Phi coprocessors in Stampede. By the launch in January, the system will feature more than 6,000. Boisseau said TACC expects another injection of about 1,600 around 2015, although he added that there is no firm commitment on that yet.

Xeon Phi cards awaiting installation. As of early October, more than 2,000 Xeon Phi coprocessors had been installed in Stampede.