It has been only a decade since the first high-definition (HD) videoconferencing systems were unveiled, capable of producing video at 30 frames per second with a resolution of 1280x720 pixels. Today most major suppliers in the videoconferencing market offer HD solutions that can process resolutions up to 1920x1080 pixels at 60 frames per second.
Enterprise-quality videoconferencing has traditionally been very expensive, complex to manage, difficult to use, and hard to scale. First- and second-generation videoconferencing systems were built on custom hardware and digital signal processors (DSPs). Servers could cost as much as $180,000 and could not easily be programmed to support new video compression formats. As a result, they typically faced obsolescence after four to six years.

It’s a new day now. With the latest processors from Intel, off-the-shelf servers can now support media-intensive applications, using instruction set extensions along with hyper-threading and an expanded number of processor cores. Performance of these third-generation, software-based systems can even be better than custom hardware architectures using ASICs (application-specific integrated circuits), FPGAs (field-programmable gate arrays), and DSPs.

Norway-based Pexip has embraced this third-generation approach. The Pexip Infinity videoconferencing solution is a software-based transcoding engine, MCU, and gateway all in one—with a lot of extra functionality. With Pexip Infinity*, everyone in the organization can have their own Virtual Meeting Room at a fraction of the cost of traditional videoconferencing systems—about the cost of a cup of coffee a day. The interface is as easy to use and manage as telephone or email.

Pexip believes there are several fundamental pillars to achieving videoconferencing at scale:

Software-only. Videoconferencing is deployed on a virtualized software platform (VMware ESXi* or Microsoft Hyper-V*) running on existing datacenter infrastructure or in a private or public cloud.

Distributed. You can deploy conferencing capacity where and when you need it, without having to route traffic through a central point, for great flexibility and dramatically less bandwidth usage.

Future-ready. Because Pexip Infinity is virtualized software, it’s easy to scale, deploy, manage, update, and enhance. It is managed just like any other enterprise software.

Flexibly licensed. Since the solution scales by licensing, you pay only for the capacity the organization needs or uses, rather than having to buy hardware to meet anticipated peak usage.

By using Intel Parallel Studio XE, along with the processing power of the Intel Core architecture, Pexip has been able to match, and even exceed, the performance of traditional conferencing systems.

The Challenge

Pexip is on a constant quest for the Holy Grail of enterprise-grade videoconferencing: highest quality and lowest overhead. That means high resolution, good performance, and a compelling user experience, while reducing bandwidth and processing requirements. More for less is a constantly moving target, and hitting it is how Pexip maintains competitive differentiation.

The software-only approach has merit for achieving business goals—it’s cost-efficient, and easy to deploy and scale—but it can be technically challenging to achieve software encoding performance that beats hardware encoding on specialized hardware.

And while Pexip Infinity doesn’t require specialized hardware, it does require industry-standard servers, and the goal is to keep those to a minimum. The more Pexip can reduce processing requirements, the more users can be supported per server, the fewer servers are needed, and the lower the overall cost of the solution. Pexip also needs to ensure a quality user experience on a broad range of devices, some of which may have only moderate computing power.

So the engineering team set out to improve software-based video encoding performance and take advantage of the latest hardware. This optimization effort targeted the new, third-generation Intel Xeon processor E5 v3 family (code-named Haswell), for an implementation using the WebRTC* (Web Real-Time Communication) browser engine and the Google VP8* codec standard.

Initial performance analyses showed there was room for optimization. Baseline performance wasn’t enough to meet Pexip’s high quality standards, and VP8 didn’t support the AVX2 (Advanced Vector Extensions 2) instruction set, so it couldn’t benefit from all the hardware capabilities on Haswell-based servers.
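To illustrate what benefiting from AVX2 involves at the software level, here is a minimal sketch of runtime CPU feature detection and dispatch in C. It is not taken from Pexip or from the VP8 code base. The names cpu_has_avx2, sum_bytes_generic, sum_bytes_avx2, and select_sum_bytes are hypothetical, and the "AVX2" routine is only a placeholder that forwards to the portable one. The sketch assumes a GCC- or Clang-compatible compiler on x86 for the feature check and falls back to the portable path everywhere else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the running CPU reports AVX2 support, 0 otherwise.
 * __builtin_cpu_init / __builtin_cpu_supports are GCC/Clang builtins for
 * x86 targets; on other toolchains this sketch conservatively returns 0. */
static int cpu_has_avx2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Hypothetical kernel signature: sum n bytes. */
typedef uint32_t (*sum_fn)(const uint8_t *data, size_t n);

/* Portable implementation that runs on any CPU. */
static uint32_t sum_bytes_generic(const uint8_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint32_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

/* Placeholder for an AVX2-specific build of the same kernel. A real
 * project would compile this variant with AVX2 enabled; here it simply
 * forwards to the portable version so the sketch stays self-contained. */
static uint32_t sum_bytes_avx2(const uint8_t *data, size_t n)
{
    return sum_bytes_generic(data, n);
}

/* Pick the implementation once, based on what the CPU reports. */
static sum_fn select_sum_bytes(void)
{
    return cpu_has_avx2() ? sum_bytes_avx2 : sum_bytes_generic;
}
```

A caller would invoke select_sum_bytes() once at startup and reuse the returned function pointer. Code that lacks such an AVX2 path keeps running the older routines even on a Haswell-based server.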

To improve performance, the engineers needed to detect and analyze bottlenecks and explore ways to use targeted hardware capabilities—which would be difficult without specialized tools.

The Solution

Pexip’s development cycle consisted of an iterative sequence of analysis and incremental development:

Analyze the code for hotspots using Intel VTune Amplifier XE.

Examine the code in the hotspots and look for ways of improving the algorithms.

Intel engineers provided technical consultation and early access to a new edition of Intel Advisor XE before official release.

Uncovering opportunities for improvement

Intel VTune Amplifier XE provides a rich set of performance profiling insights into hotspots, threading, locks and waits, and OpenCL (Open Computing Language) parallel computing code. It provides an intuitive way to tune for CPU and GPU performance, multicore scalability, and bandwidth efficiency. The tool delivers quick performance insight with the ability to sort, filter, and visualize results on the timeline and source.

Vectorization Advisor is a vectorization design tool to identify loops that will benefit most from vectorization, what is blocking vectorization, and the benefit of alternative data reorganizations—ultimately raising confidence that vectorization is safe.

For Pexip, the preview edition of Intel Advisor XE enabled deep analysis of vectorized loops and SIMD instructions, which helped answer critical questions. Were the hottest loops vectorized? If so, how efficiently was that done, and what could be improved? If not, why not? All vectorization-related diagnostics are displayed in one place: CPU performance data, compiler diagnostics, SIMD instruction set used, source code, and assembly.

Results

Intel tools pointed the way to source code modifications that significantly improved the code generated by the compiler. The tools found the algorithms, functions, and instructions that consumed the most CPU time. The tools also showed that the base version did not use enough SIMD instructions, which would be key for improving multimedia performance.

Code was refactored and reworked to enable more auto-vectorization by the compiler. Pexip achieved efficient usage of one-byte data types without expanding to two or four bytes. Along with compiler unrolling, this efficiency enabled processing of an entire loop in a single SIMD instruction. The result was adoption of an algorithm that is a better fit for business videoconferencing.
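The kind of loop this describes can be sketched as follows. This is an illustrative example only, not Pexip's code: sad_16x16 is a hypothetical name for a sum-of-absolute-differences kernel, a common operation in video encoders. The per-pixel difference is kept in a one-byte value, the pointers are marked restrict, and the trip counts are fixed at 16, which are the sorts of properties that may help a compiler auto-vectorize the inner loop. A 256-bit AVX2 register holds 32 one-byte values, so two 16-pixel rows fit in one register. Whether a given compiler actually vectorizes this code depends on its version and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BLOCK = 16 };  /* block width and height in pixels */

/* Sum of absolute differences between two 16x16 blocks of 8-bit pixels.
 * src_stride and ref_stride are the distances, in bytes, between the
 * starts of consecutive rows in each image. */
static uint32_t sad_16x16(const uint8_t *restrict src, size_t src_stride,
                          const uint8_t *restrict ref, size_t ref_stride)
{
    assert(src != NULL && ref != NULL);
    uint32_t sad = 0;
    for (int y = 0; y < BLOCK; ++y) {
        for (int x = 0; x < BLOCK; ++x) {
            uint8_t a = src[x];
            uint8_t b = ref[x];
            /* The absolute difference of two bytes always fits in a byte,
             * so the per-pixel value is kept in a one-byte type. */
            uint8_t d = (uint8_t)(a > b ? a - b : b - a);
            sad += d;
        }
        src += src_stride;
        ref += ref_stride;
    }
    return sad;
}
```

Only the running total uses a wider type. The worst case is 256 pixels times 255, or 65,280, which fits comfortably in 32 bits.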

Performance gains were impressive. Video encoding performance has increased 2.5 times over the reference implementation, due to algorithm and SIMD optimizations, coupled with efficient use of AVX2 instructions.

Better performance means less CPU resource consumption, which translates into lower hardware requirements. One server can now support the conference capacity that formerly required two servers. That means lower costs for end users and better competitive advantage for Pexip.
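The step from faster encoding to fewer servers can be made concrete with a back-of-the-envelope calculation, sketched below in C. The source does not say what fraction of per-participant CPU time is spent on encoding, so encode_share is a hypothetical input, and capacity_gain and servers_needed are illustrative names. The first function applies Amdahl's law: only the encoding share of the work speeds up. The second rounds the required server count up to a whole number.

```c
#include <assert.h>

/* Overall per-server capacity gain when only the encoding part of the
 * CPU cost gets faster (Amdahl's law).
 *   encode_share   - fraction of CPU time spent encoding, 0.0 to 1.0
 *                    (hypothetical; not stated in the case study)
 *   encode_speedup - how many times faster encoding became, e.g. 2.5 */
static double capacity_gain(double encode_share, double encode_speedup)
{
    assert(encode_share >= 0.0 && encode_share <= 1.0);
    assert(encode_speedup > 0.0);
    return 1.0 / ((1.0 - encode_share) + encode_share / encode_speedup);
}

/* Servers needed for a given number of participants, rounded up. */
static unsigned servers_needed(unsigned participants, unsigned per_server)
{
    assert(per_server > 0);
    return (participants + per_server - 1) / per_server;
}
```

Under these assumptions, a 2.5 times encoding speedup doubles per-server capacity when encoding accounts for five-sixths of the CPU cost, since 1 / (1/6 + (5/6)/2.5) = 2. If encoding were the entire cost, the gain would be the full 2.5 times.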

High-quality WebRTC video is enabled for browser clients, improving the customer experience. End users get premium conference quality with easy deployment and high scalability—all without specialized hardware.

“The 2.5X performance gain we achieved using Intel Parallel Studio XE had a direct impact on our success in the marketplace,” said Endresen. “It allowed us to offer more than double the capacity at the same price point. For us, that translates into a major competitive advantage.”

Conclusion

With Pexip Infinity’s virtualized and distributed software, videoconferencing is made scalable, flexible, and accessible in a realistic way. By using industry-standard IT tools, and solving challenges through IT, everyone can have access to video, audio, and data collaboration. Working with the Intel Compiler team, Pexip was able to implement computationally demanding video algorithms with outstanding performance on servers powered by the Intel Xeon processor E5 family.

Customers recognize the resulting performance edge. Said one, “Pexip does not limit my port count and reduces the bandwidth required for large-scale videoconferencing on my network. The solution makes sense and the model is attractive.”

In LetsDoVideo’s 2014 public voting Readers’ Choice Awards, Pexip Infinity won the “Best Interop for a Cloud Solution” award. Pexip Infinity also earned an ISE 2015 Best of Show Award. Now it’s even faster and more cost-effective.

About Pexip

Pexip overthrows conventional telepresence thinking by providing an affordable, virtualized, simple approach to videoconferencing on any device. Pexip eliminates the hurdles of high hardware costs and bandwidth requirements when deploying large-scale conferencing, making personal virtual meeting spaces for all a reality.

Deployed on premises, in a private or public cloud, or even rented as a service, Pexip’s virtualized software enables organizations to deploy videoconferencing as much and as widely as needed, in minutes rather than days or weeks.

Founded in 2012 by videoconferencing and telepresence experts, Pexip is headquartered in Oslo, Norway, and has offices in New York and London.