The Open Compute Project Summit 2017: Trip report

As we do every year, we joined the OCP community for its annual summit. This year was slightly different from the last, as the summit was held in Santa Clara instead of San Jose, California. Not a major change in the end: still a long trip for Europeans, and the usual massive jet lag, but we are used to it.

The summit ran, as usual, over a day and a half, which is short and quite dense. The first morning is dedicated to keynotes; the rest is split into three main zones of interest: conference tracks, engineering workshops, and a show floor where you can discover the "new designs" produced by the community.

The show floor was a little bigger than usual this year. The main booths were, as usual, Microsoft's and Facebook's, with plenty of new designs on display. We won't cover all of them in this short report; our intent is to highlight the "small" technical steps forward that we found interesting for improving designs in the near future.

We arrived quite early in the morning on the first day (a side effect of jet lag, and it is always good to have an empty show floor to ourselves).

The first booth is Microsoft's, and the second is Facebook's. The show is still very accessible, without the oversized booths you see at Mobile World Congress. The companies' marketing teams have apparently not yet turned their attention to dressing up the booths, which is far better for engineers, who can get straight to the point.

The big news at the Microsoft booth was the introduction of the Project Olympus reference design. Microsoft displayed boards with a Qualcomm 64-bit ARM processor, a next-generation Intel chip (whose DIMMs were hidden on the first day, though we had the opportunity to photograph them while trying to figure out where the secret trick was), and a Cavium design. Olympus is a big project for OCP: it fits into a standard 19-inch rack and provides building blocks for servers and GPUs.

Microsoft and Facebook both introduced M.2 NVMe carrier boards in two flavors: one with four M.2 slots, which requires a x16 PCIe Gen3 slot on the server side, and one with two M.2 slots, which fits into a 1U server with a x8 PCIe Gen3 slot. This small enhancement could boost the I/O performance of servers in hyperconverged setups, and give refurbished servers a longer lifecycle thanks to a massive performance boost.
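The lane math behind those two flavors can be sanity-checked with a quick back-of-envelope calculation. As a rough sketch (the ~0.985 GB/s per-lane figure comes from the PCIe Gen3 spec's 8 GT/s rate with 128b/130b encoding, not from the summit material):

```python
# PCIe Gen3 runs at 8 GT/s per lane with 128b/130b encoding,
# which leaves roughly 0.985 GB/s of usable bandwidth per lane.
PCIE_GEN3_GBPS_PER_LANE = 8.0 * (128 / 130) / 8  # ~0.985 GB/s

def slot_bandwidth_gbps(lanes: int) -> float:
    """Approximate usable one-way bandwidth of a PCIe Gen3 slot."""
    return lanes * PCIE_GEN3_GBPS_PER_LANE

# Each M.2 NVMe drive uses x4 lanes, so a quad-M.2 carrier needs a
# x16 slot, while the dual-M.2 variant fits a x8 slot.
print(f"x16 slot (4 drives): ~{slot_bandwidth_gbps(16):.1f} GB/s")
print(f"x8 slot (2 drives):  ~{slot_bandwidth_gbps(8):.1f} GB/s")
```

In both cases no lane is wasted, which is why the dual-M.2 board is the one that makes sense for 1U servers with only a x8 slot free.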

On the AI and GPU computing side, the clear winner was NVIDIA, with Pascal GPU reference designs standing on both booths. The new enterprise Pascal GPU seems to be quite a beast, with 16 GB of HBM2 (High Bandwidth Memory), dropping the GDDR5 standard to bring the memory closer to the die, inside the same package. The main benefits of HBM are lower latency, reduced power consumption thanks to the shorter distance between the memory controller and the DRAM, and higher overall bandwidth, breaking the 150 GB/s barrier. The module sits on a mezzanine card far smaller than a standard PCIe card, and the interconnect uses NVLink instead of PCIe.
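To see why HBM breaks that 150 GB/s barrier so easily, it helps to write out the peak-bandwidth formula: bytes per transfer times transfer rate. The figures below are nominal HBM2 and GDDR5 spec values used for illustration, not numbers quoted at the summit:

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps

# One HBM2 stack: a very wide 1024-bit interface at a modest 2 Gb/s
# per pin already yields 256 GB/s.
hbm2_stack = memory_bandwidth_gbs(1024, 2.0)

# A typical high-end GDDR5 card: a 384-bit bus pushed to 7 Gb/s per
# pin reaches 336 GB/s for the whole card.
gddr5_card = memory_bandwidth_gbs(384, 7.0)

# Four HBM2 stacks in one package reach 1 TB/s of peak bandwidth.
four_stacks = 4 * hbm2_stack
```

The point of the comparison: HBM gets its bandwidth from width rather than clock speed, which is exactly what allows the lower power and lower latency mentioned above.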

Facebook introduced plenty of new designs, including one with NVIDIA as well. One of the fun questions we asked was: given that Facebook and Microsoft work within the same collaborative, open organization, what is the point of running two different designs on the same architecture? Why compete instead of sharing and accelerating designs?

The GPU boards are probably close to each other, but the remaining parts are really different.

One of the enhancements we liked most was on the Yosemite v2 server. As a quick reminder, the Yosemite project is a shelf of four single-Xeon nodes that fits into an Open Rack sled. Each node shares a single high-bandwidth Mellanox network card (100 Gbps) with the others, using SR-IOV and the virtualization functions of the PCIe standard. Yosemite v1 had a big design tradeoff: to service one node, all four had to be powered off, because the shelf connected to the bus bar at the back and there was not enough space to integrate a moving arm that would keep the servers powered while the shelf was pulled out.
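The SR-IOV sharing works by having the physical Mellanox card expose per-node virtual functions. On Linux this state is visible in sysfs; as a minimal sketch (the PCI address used in the usage note is hypothetical, and the sysfs layout described is standard Linux behavior, not something specific to Yosemite):

```python
from pathlib import Path

def sriov_vf_count(pci_addr: str) -> int:
    """Return how many SR-IOV virtual functions a PCIe device currently
    exposes. On Linux, each physical function that supports SR-IOV has a
    sriov_numvfs file under /sys/bus/pci/devices/<address>/."""
    vf_file = Path(f"/sys/bus/pci/devices/{pci_addr}/sriov_numvfs")
    # Devices without SR-IOV support (or absent devices) have no such file.
    return int(vf_file.read_text()) if vf_file.exists() else 0
```

On a Yosemite-style host you would expect the NIC's physical function (say, a hypothetical `0000:03:00.0`) to report one or more VFs, each handed to a different node.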

The new design introduces a small trick: power distribution within the shelf is done through a rail with two contacts that stay engaged while the shelf slides out of the rack, so one server can be serviced without powering off the others.

It is a very basic enhancement, but it fixes one of the biggest tradeoffs of the original Yosemite design.

For the first time (we believe), the OpenPOWER Foundation was at the summit with a booth. There was plenty of news, with the introduction of Barreleye G2, or Zaius, a dual-socket POWER9 server that fits into an Open Rack v2 rack.

The server was designed by OpenPOWER Foundation contributors, and we had the chance to discuss this tremendous design with Aaron Sullivan (pictured left) from Rackspace, as well as some Google folks who support the effort. The OpenPOWER chassis is probably the most open server design at the show, with firmware based on FOSS, including remote management software based on OpenBMC.

The chassis supports multiple interesting features, including a storage shelf of 15 drives directly attached to the board through an LSI controller. POWER8 and POWER9 put the emphasis on memory bandwidth, massive caching (up to 200 MB for POWER8), and deep hardware multithreading, with support for up to 192 threads. Barreleye and Zaius are supported under various Linux flavors, and the quality of the contributions is pretty high (from mechanical drawings to electrical files).

In conclusion, the show was much as usual, though more crowded (or so it felt), packed with the booths of core OCP contributors and their associated suppliers. Every year we hope to see a "community" zone where hackers could share their enhancements and ideas. That zone is still missing, and the show remains deeply business-oriented, which is fine but not enough to build a strong community. Some improvements are coming, but the path is still long. What OCP is missing, like every other open hardware project today, is good tools for sharing and collaboration. We introduced some core concepts during our engineering workshop session, but there is still a lot of work ahead. So join and contribute!