The Web is the world’s most universal compute platform and the foundation for the digital economy. Since its birth in the early 1990s, the Web’s capabilities have grown in both quantity and quality. But in spite of all this progress, computer vision isn’t yet mainstream on the Web. The reasons include:

Insufficient performance of JavaScript*, the standard language of the Web

The lack of camera support in the standard Web APIs

The lack of comprehensive computer vision libraries

These problems are about to be solved, opening the way to a more immersive and perceptual Web, with transformational effects on online shopping, education, and entertainment, among other areas.

Over the last decade, tremendous improvements in JavaScript performance, plus the recent emergence of WebAssembly*, have narrowed the performance gap between the Web and native computing. The HTML5 WebRTC* API has also brought camera support to the Open Web Platform*. Even so, a comprehensive library of computer vision algorithms for the Web is still lacking. This article outlines a solution to this last piece of the problem: bringing OpenCV* to the Open Web Platform.

OpenCV is the most popular computer vision library, with a comprehensive set of vision functions and a large developer community. It’s implemented in C++ and, up until now, was not available in Web browsers without the help of unpopular native plugins.

We’ll show how to leverage OpenCV’s efficiency, completeness, API maturity, and its community’s collective knowledge to bring hundreds of OpenCV functions to the Open Web Platform. The result comes in a format that JavaScript engines can optimize easily, with an API that Web programmers can readily adopt to develop applications. On top of that, we’ll show how to port OpenCV parallel implementations that target single instruction, multiple data (SIMD) units and multiple processor cores to equivalent Web primitives, providing the high performance required for real-time and interactive use cases.
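Porting SIMD and multicore code to the Web raises a practical question: not every browser exposes WebAssembly SIMD or the shared memory that WebAssembly threads rely on, so a page shipping several builds may have to pick one at load time. The following is a minimal sketch of such feature detection, assuming a hypothetical set of build files; it is an illustration, not the loader OpenCV.js itself uses. The SIMD probe is the smallest module containing a 128-bit SIMD instruction, checked with `WebAssembly.validate()`.

```javascript
// Minimal sketch (illustrative, not the OpenCV.js loader): choose which build
// of a library to load from the Web primitives the engine exposes.
// The build file names below are hypothetical.

// Smallest valid module that uses 128-bit SIMD: one function () -> v128 whose
// body is i32.const 0; i8x16.splat; i8x16.popcnt. WebAssembly.validate()
// accepts it only if the engine supports WebAssembly SIMD.
const SIMD_PROBE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7b,       // type section: () -> v128
  0x03, 0x02, 0x01, 0x00,                         // function section: 1 function of type 0
  0x0a, 0x0a, 0x01, 0x08, 0x00,                   // code section: 1 body of 8 bytes, no locals
  0x41, 0x00,                                     // i32.const 0
  0xfd, 0x0f,                                     // i8x16.splat
  0xfd, 0x62,                                     // i8x16.popcnt
  0x0b,                                           // end
]);

function supportsWasmSimd() {
  return typeof WebAssembly === "object" && WebAssembly.validate(SIMD_PROBE);
}

// WebAssembly threads share memory between Web Workers through SharedArrayBuffer,
// which current browsers generally expose only on cross-origin-isolated pages.
function supportsSharedMemory() {
  return typeof SharedArrayBuffer === "function";
}

function pickBuild() {
  const simd = supportsWasmSimd();
  const threads = supportsSharedMemory();
  if (simd && threads) return "cv-simd-threads.js"; // hypothetical file names
  if (simd) return "cv-simd.js";
  if (threads) return "cv-threads.js";
  return "cv-baseline.js";
}

console.log(`SIMD: ${supportsWasmSimd()}, shared memory: ${supportsSharedMemory()}`);
console.log(`Would load: ${pickBuild()}`);
```

The check is only a validation pass over a few dozen bytes, so it can run before any large download starts.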

The Open Web Platform

The Open Web Platform is the most universal computing platform, with billions of connected devices. Its popularity in online commerce, entertainment, science, and education has grown exponentially, as has the amount of multimedia content on the Web. Despite this, computer vision processing in Web browsers hasn’t been common practice. The lack of client-side vision processing is due to several limitations:

Lack of standard Web APIs to access and transfer multimedia content

Inferior JavaScript performance

Lack of a comprehensive computer vision library for developing apps

The approach we outline here, along with other recent developments on the Web front, will address those limitations and empower the Web with proper computer vision capabilities.

Adding Camera Support and Plugin-Free Multimedia Delivery

HTML5 introduced several Web APIs to capture, transfer, and present multimedia content in browsers without the need for third-party plugins. One of these, Web Real-Time Communication* (WebRTC*), enables the acquisition and peer-to-peer transport of multimedia content, while HTML5 video elements display the resulting videos.
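In practice, camera capture goes through `navigator.mediaDevices.getUserMedia()`, and the returned `MediaStream` is attached to a `<video>` element through its `srcObject` property. The sketch below assumes a browser page with such an element; `startCamera` and `stopCamera` are illustrative names, and the `mediaDevices` parameter exists only so the function can be tried against a stand-in object outside a browser.

```javascript
// Minimal sketch: show the default camera in a <video> element.
// `startCamera`/`stopCamera` are illustrative names; `mediaDevices` defaults to
// the browser's navigator.mediaDevices and is a parameter only so a stand-in
// object can be passed when trying the function outside a browser.
async function startCamera(videoElement, mediaDevices = globalThis.navigator?.mediaDevices) {
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== "function") {
    throw new Error("getUserMedia() is not available here");
  }
  // Ask for video only; the browser prompts the user for camera permission.
  const stream = await mediaDevices.getUserMedia({ video: true, audio: false });
  videoElement.srcObject = stream; // attach the live MediaStream to the <video>
  await videoElement.play();       // start playback so frames can be displayed
  return stream;
}

// Stopping every track releases the camera.
function stopCamera(stream) {
  for (const track of stream.getTracks()) {
    track.stop();
  }
}
```

In a page, `startCamera(document.querySelector("video"))` would start the preview. Browsers expose `getUserMedia()` only in secure contexts (HTTPS or localhost), and the returned promise settles only after the user answers the permission prompt.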

JavaScript is the dominant language of the Web. Because it’s a dynamically typed scripting language, its performance has traditionally been inferior to that of native languages such as C++, and multimedia processing often involves complex algorithms and massive amounts of computation. With just-in-time (JIT) compilation in JavaScript engines, and with the introduction of WebAssembly* (WASM*), a portable binary format for the Web, Web clients can approach native performance and handle more demanding tasks.
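To make the WebAssembly side concrete, the sketch below compiles a tiny module and calls it from JavaScript. The module is hand-assembled for illustration and exports a single function, `add`; real projects generate their `.wasm` files with a compiler toolchain such as Emscripten, which is what OpenCV.js is built with.

```javascript
// Minimal sketch: a hand-assembled WebAssembly module exporting
// add(i32, i32) -> i32, compiled and called from JavaScript.
const ADD_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code: one 7-byte body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0; local.get 1; i32.add; end
]);

// Synchronous compilation is fine for a module this small; browsers may refuse
// to compile large modules synchronously on the main thread, so real code
// should prefer the async WebAssembly.instantiate() or instantiateStreaming().
const wasmModule = new WebAssembly.Module(ADD_WASM);
const instance = new WebAssembly.Instance(wasmModule, {});
const add = instance.exports.add;

console.log(add(2, 3)); // 5
```

Arguments and results cross the boundary as 32-bit integers, exactly as the `i32` type requires, so `add(2147483647, 1)` wraps around to `-2147483648`.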

A Comprehensive Computer Vision Library

Although several computer vision libraries have been developed in native languages such as C++, they can’t be used in browsers without relying on unpopular browser extensions, which pose security and portability issues. There have been a few efforts to develop computer vision libraries in JavaScript, but these are limited to select categories of vision functions. Expanding those efforts with new algorithms and optimizing their implementations are challenging tasks. Previous work fell short in functionality, performance, or portability.

As an alternative approach, we take an existing comprehensive computer vision library developed in C++ (i.e., OpenCV) and make it work on the Web. This approach works well on the Web for several reasons:

It provides an expansive set of functions with optimized implementations.

It performs more efficiently than typical hand-written JavaScript implementations, and its performance will improve further through parallelism.

Developers can access a large collection of existing resources such as tutorials and examples.
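The parallelism mentioned above usually takes the form of data decomposition: an image is cut into bands of rows that can be processed independently, much as OpenCV’s `parallel_for_()` hands out `cv::Range` slices to worker threads. The following sketch shows only that decomposition, with a binary threshold as the per-pixel operation. The function names are hypothetical, and the bands run one after another here, whereas a real port would dispatch each band to a Web Worker sharing the pixel buffer.

```javascript
// Minimal sketch: row-band decomposition of an image, the kind of split
// OpenCV's parallel_for_() performs with cv::Range. Bands run sequentially
// here; in a browser each band could go to a Web Worker over shared memory.

// Divide [0, rows) into at most `parts` nearly equal [start, end) ranges.
function splitRows(rows, parts) {
  const n = Math.max(1, Math.min(parts, rows));
  const ranges = [];
  for (let i = 0; i < n; i++) {
    const start = Math.floor((i * rows) / n);
    const end = Math.floor(((i + 1) * rows) / n);
    ranges.push([start, end]);
  }
  return ranges;
}

// Binary threshold (src > thresh ? 255 : 0) for one band of a single-channel,
// 8-bit, row-major image.
function thresholdBand(src, dst, cols, [start, end], thresh) {
  for (let i = start * cols; i < end * cols; i++) {
    dst[i] = src[i] > thresh ? 255 : 0;
  }
}

function thresholdParallel(src, rows, cols, thresh, parts) {
  const dst = new Uint8Array(src.length);
  for (const range of splitRows(rows, parts)) {
    // Each band touches only its own pixels, so bands are independent.
    thresholdBand(src, dst, cols, range, thresh);
  }
  return dst;
}

const img = Uint8Array.from({ length: 4 * 3 }, (_, i) => i * 20); // 4 rows x 3 cols
console.log(splitRows(4, 3)); // [[0, 1], [1, 2], [2, 4]]
console.log(thresholdParallel(img, 4, 3, 100, 3));
```

Because no band reads or writes another band’s pixels, the output is identical for any number of parts, which is what makes the split safe to parallelize.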

OpenCV.js*

OpenCV¹ is the de facto library for computer vision development. It’s an open-source library that started at Intel Labs back in 2000. OpenCV is very comprehensive and is implemented as a set of modules (Figure 1). It offers a large number of primitive kernels and vision applications, ranging from image processing, object detection, and tracking to machine learning and deep neural networks (DNN). OpenCV provides efficient implementations for parallel hardware such as multicore processors with vector units. We translate many OpenCV functions into JavaScript and refer to the result as OpenCV.js.
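For a feel of the resulting API, the sketch below converts a canvas image to grayscale using OpenCV.js functions named after their C++ counterparts: `cv.imread()`, `cv.cvtColor()`, `cv.imshow()`, and `cv.Mat`. It assumes OpenCV.js has finished loading in the page and that two canvases with the hypothetical ids `canvasInput` and `canvasOutput` exist; `cv` is taken as a parameter only so the function can be tried without a browser.

```javascript
// Minimal sketch: grayscale conversion between two <canvas> elements with
// OpenCV.js. `cv` is the object OpenCV.js exposes once loaded; it is passed in
// here so a stand-in can be used outside a browser. Canvas ids are hypothetical.
function grayscaleCanvas(cv, inputCanvasId, outputCanvasId) {
  const src = cv.imread(inputCanvasId); // RGBA cv.Mat read from the input canvas
  const dst = new cv.Mat();
  try {
    cv.cvtColor(src, dst, cv.COLOR_RGBA2GRAY);
    cv.imshow(outputCanvasId, dst);
  } finally {
    // cv.Mat data lives in memory managed by the compiled C++ code, outside
    // the JavaScript garbage collector, so each Mat must be released explicitly.
    src.delete();
    dst.delete();
  }
}
```

In a page, `grayscaleCanvas(cv, "canvasInput", "canvasOutput")` would draw the grayscale copy. The explicit `delete()` calls in a `finally` block are the main departure from everyday JavaScript style, and they ensure the Mats are freed even if a call throws.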