Direct Line

The WebRTC protocol converts your web browser into a communications center, supporting video chat over a peer-to-peer connection without the need for helper apps or browser plugins.

People who use video chat and other forms of real-time Internet communication often rely on Skype or similar tools. Web browsers too often depend on Flash or Java plugins for real-time communication. The latest generation of browsers, however, offers a powerful new tool for building real-time communication into scripts and homegrown web applications. WebRTC (Web Real-Time Communication) [1] supplements the new HTML5 standard by bringing native real-time communication to the browser.

WebRTC can handle video chat and similar formats. Communication occurs directly from browser to browser, without the need for an intervening web application. In this article, I show how easy it is to build a homegrown Internet video chat application by integrating WebRTC with the usual collection of web developer tools: HTML, JavaScript, CSS, and Node.js.

WebRTC is jointly promoted by browser vendors such as Google, Mozilla, and Opera. (Microsoft considers WebRTC to be too complicated and has presented CU-RTC-Web [2] as its own design for real-time communication in browsers.) Although the WebRTC specification is not yet complete, the Google Chrome and Mozilla Firefox 22 web browsers already largely support it. WebRTC is a free standard described in a set of IETF documents [3], and the W3C has already accepted a draft for a programming interface [4] for WebRTC in the browser.

You'll find a demo video that describes some of WebRTC's capabilities on YouTube [5]. The demo, which comes from the Mozilla project, shows a video call from a Firefox browser on a cellphone via the public telephone network.

How It Works

Figure 1 shows the data flow for a WebRTC session. Application and configuration data go their separate ways: The server acts as a router that accepts configuration information (i.e., IP addresses or information about video and audio formats) from one browser and forwards it to the other. WebRTC calls this the signaling process. The HTTP or WebSocket protocol is a useful choice for transferring the configuration data, with JavaScript Object Notation (JSON) providing the structure.
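The routing role of the signaling server can be sketched in a few lines of JavaScript. This is a transport-agnostic illustration under stated assumptions, not the signaling server used later in the article: the SignalingHub class, its method names, and the message fields (type, payload, from) are hypothetical, and each client's send function stands in for whatever HTTP or WebSocket connection actually carries the JSON.

```javascript
// Hypothetical in-memory relay: forwards JSON signaling messages
// from one registered client to every other registered client.
class SignalingHub {
  constructor() {
    this.clients = new Map(); // client id -> function that delivers a string
  }

  // Register a client together with the function used to reach it.
  join(id, send) {
    this.clients.set(id, send);
  }

  // Forget a client, for example when its connection closes.
  leave(id) {
    this.clients.delete(id);
  }

  // Parse an incoming JSON message, tag it with the sender's id,
  // and forward it to everyone except the sender.
  relay(fromId, rawMessage) {
    const message = JSON.parse(rawMessage); // throws on malformed JSON
    const forwarded = JSON.stringify({ ...message, from: fromId });
    let delivered = 0;
    for (const [id, send] of this.clients) {
      if (id !== fromId) {
        send(forwarded);
        delivered += 1;
      }
    }
    return delivered; // number of clients the message was forwarded to
  }
}
```

With two browsers registered, a call such as hub.relay("alice", JSON.stringify({ type: "offer", payload: "..." })) would hand the message to the other browser only. The hub never interprets the payload, which matches the idea that the server merely routes configuration data.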

Figure 1: Configuration and application data go separate ways in WebRTC: The configuration data are routed by an intermediate server, whereas the protocol transmits the application data from peer to peer.

After exchanging their IP addresses through the signaling server, the browsers establish a peer-to-peer connection, as shown in Figure 1. The connection is used to transmit the application data directly between browsers via TCP or UDP, saving time and data traffic.

To open the connection, WebRTC works around Network Address Translation (NAT) routers or firewalls through the use of Interactive Connectivity Establishment (ICE) [6]. ICE retrieves usable IP addresses and ports for sending and receiving data over peer-to-peer connections.

ICE first tries to determine the IP address and port using Session Traversal Utilities for NAT (STUN) [7]. To do so, it sends a message to a public STUN server and receives the sender's address in return. However, if the NAT router blocks STUN, the IP address and port are provided by a relay server on the web using Traversal Using Relays around NAT (TURN) [8]. ICE reports the appropriate IP addresses and ports to WebRTC by triggering an onicecandidate event. The data is wrapped, along with the protocol to be used (TCP or UDP), in an ICE candidate object.
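As a rough sketch of how this looks from JavaScript, the following function configures a peer connection with one STUN and one TURN server and forwards every ICE candidate it receives. Several things here are assumptions rather than part of the article: the server addresses and credentials are placeholders, the createPeerConnection() and sendSignal() names are hypothetical, and the code assumes the unprefixed RTCPeerConnection interface of current browsers rather than the vendor-prefixed constructors of early implementations.

```javascript
// Placeholder server addresses and credentials; replace with real ones.
const ICE_CONFIG = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    { urls: "turn:turn.example.org:3478", username: "user", credential: "secret" },
  ],
};

// PeerConnection is injected so the sketch can also run outside a browser;
// in a browser, pass the standard RTCPeerConnection constructor.
// sendSignal is a hypothetical function that delivers a string to the
// signaling server.
function createPeerConnection(PeerConnection, sendSignal) {
  const pc = new PeerConnection(ICE_CONFIG);
  pc.onicecandidate = (event) => {
    // A null candidate means that ICE has finished gathering candidates.
    if (event.candidate) {
      sendSignal(JSON.stringify({ type: "candidate", candidate: event.candidate }));
    }
  };
  return pc;
}
```

In a browser, the call might look like createPeerConnection(RTCPeerConnection, (msg) => socket.send(msg)), where socket is whatever connection the page holds to the signaling server.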

Sample Application

The following sample application uses WebRTC in the browser to implement a video chat. The browser takes the image from the webcam and the sound from the microphone and passes it to a second browser via WebRTC. The client application in this article uses HTML, CSS, and JavaScript for the implementation. Node.js is used for the signaling server.

The example is initially restricted to the Firefox browser, but you can port it to Chrome. Figure 2 shows the sample application running in Firefox on Ubuntu Linux. The other end is using Firefox on Windows 7.

Figure 2: At a glance: On the left is the image from the local webcam; on the right, the image from the remote webcam.

The getUserMedia() programming interface [9] in the latest versions of Firefox, Chrome, and Opera provides access to the webcam and microphone on the local machine, but it first asks the user's permission in a pop-up. WebRTC combines the video and audio streams into a media stream, as shown in Figure 3, which it can then process. A media stream can contain any number of video and audio tracks; a stereo audio track, for example, carries two channels.
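To make the track structure of a media stream more concrete, the following sketch summarizes the tracks a stream contains. The describeStream() helper is hypothetical; it relies only on the getVideoTracks() and getAudioTracks() methods and the label property that the standard MediaStream and MediaStreamTrack interfaces provide.

```javascript
// Summarize the tracks of a MediaStream (or any object with the same methods).
function describeStream(stream) {
  const video = stream.getVideoTracks();
  const audio = stream.getAudioTracks();
  return {
    videoTracks: video.length,
    audioTracks: audio.length,
    // Labels usually name the capture device, such as a webcam model.
    labels: [...video, ...audio].map((track) => track.label),
  };
}
```

For a stream captured from one webcam and one microphone, the result would typically report one video track and one audio track.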

Figure 3: The media stream combines video and audio streams for use in HTML video and audio elements, as well as in peer-to-peer connections.

Video and Audio Signals

Listings 1-3 [10] demonstrate how to integrate the local webcam image into an HTML document (Figure 4). The HTML document in Listing 1 references two JavaScript files in the header area (lines 3-4). The core.js file contains functions shared by several examples in this article, including the one in mediastream.js. In the body of the HTML document in Listing 1, the video element with the ID local in line 8 waits for a connection to a media stream.

Figure 4: Initial success: Firefox playing the image from the local webcam in the browser.

The call to the mozGetUserMedia() JavaScript method in Listing 2 (lines 2-8) grabs the images from the webcam and the sound from the microphone and bundles them into a media stream. The code in lines 4-6 passes the stream as a parameter to the callback function. The connectStream() function in line 5 (and in Listing 3) plays the media stream in the video element. If a problem occurs, a callback function is called in line 7 to handle errors. Firefox makes the specification of a callback function for this case mandatory.
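The following sketch approximates the kind of call described here; it is not a reproduction of Listing 2. The startLocalVideo() name and the injected nav parameter are hypothetical. The sketch prefers the promise-based navigator.mediaDevices.getUserMedia() of current browsers and falls back to the prefixed, callback-based mozGetUserMedia() only where that is all the browser offers.

```javascript
// nav is a navigator-like object; in a browser, pass window.navigator.
// onStream receives the media stream, onError receives the failure reason.
function startLocalVideo(nav, onStream, onError) {
  const constraints = { video: true, audio: true };
  if (nav.mediaDevices && nav.mediaDevices.getUserMedia) {
    // Current standard: returns a promise that resolves with the stream.
    nav.mediaDevices.getUserMedia(constraints).then(onStream, onError);
  } else if (nav.mozGetUserMedia) {
    // Legacy prefixed API: success and error callbacks are both required.
    nav.mozGetUserMedia(constraints, onStream, onError);
  } else {
    onError(new Error("getUserMedia is not available"));
  }
}
```

Either way, the browser asks the user for permission before the success callback receives a stream; a refusal ends up in the error callback.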

Listing 3 shows the JavaScript connectStream() function for the Firefox browser. Line 2 uses the querySelector() method with a CSS selector to select an element from the HTML document – preferably a video element. Line 4 binds the media stream to this element using the Firefox-specific mozSrcObject attribute.
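A function along these lines might look like the following sketch, which is an approximation under stated assumptions rather than Listing 3 itself. The injected doc parameter and the error handling are additions for illustration, and the sketch prefers the standard srcObject attribute of current browsers before falling back to the Firefox-specific mozSrcObject.

```javascript
// doc is a document-like object; in a browser, pass window.document.
function connectStream(doc, selector, stream) {
  const element = doc.querySelector(selector);
  if (!element) {
    throw new Error("No element matches " + selector);
  }
  if ("srcObject" in element) {
    element.srcObject = stream; // standard attribute in current browsers
  } else {
    element.mozSrcObject = stream; // Firefox-specific attribute named in the text
  }
  element.play(); // start playback of the attached stream
  return element;
}
```

For the video element with the ID local, the call would be connectStream(document, "#local", stream).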
