The WebRTC ecosystem is vast and can be a bit scary for newcomers. When I first tried to understand WebRTC, I remember coming across an incredible number of acronyms. This article provides a guide to WebRTC media servers and a few open source options such as Kurento, Janus, Jitsi and more. I will also aim to lower the technical barrier to understanding WebRTC’s business value.

What Is a WebRTC Server?

Since the early days of WebRTC, one of the technology’s main selling points has been that it allows peer-to-peer (browser-to-browser) communication with little intervention from a server, which is usually needed only for signaling. This is why the concept of a WebRTC media server may seem counterintuitive.

Below, I’ll try to illustrate why media servers are useful, what features they normally offer and which open source alternatives users have at their disposal.

Multiple Participants in a Video Call

While it’s true that it is possible to hold video calls with multiple participants using peer-to-peer communication (Fig 1. mesh architecture), this stops being practical as the number of participants increases, since each peer has to send its video/audio stream to every other participant while also receiving a video/audio stream from each of them.

In practice, even under optimal network conditions, a mesh video call doesn’t work well beyond five participants. This is where a media server comes in handy: it reduces the number of streams a client needs to send, usually to one, and can even reduce the number of streams a client needs to receive, depending on the media server’s capabilities.
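To make the stream arithmetic concrete, here is a minimal Python sketch that counts the streams a single client handles in a mesh call versus a call relayed through a media server. The function names are made up for this illustration, and the relay case assumes the simplest behaviour described above: one stream sent, and one stream received per other participant.

```python
def mesh_streams(participants: int) -> tuple[int, int]:
    """Return (sent, received) streams per client in a full mesh call."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    others = participants - 1
    # Each client sends its stream to every other peer and receives one from each.
    return others, others


def relay_streams(participants: int) -> tuple[int, int]:
    """Return (sent, received) streams per client when a media server relays media.

    Assumption: the server forwards every other participant's stream unchanged.
    """
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    # The client sends a single stream to the server, which fans it out.
    return 1, participants - 1


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(f"{n} participants: mesh={mesh_streams(n)} relay={relay_streams(n)}")
```

With ten participants, each mesh client has to send nine copies of its stream, while a relayed client still sends only one; the upload side is usually where a mesh call runs into trouble first.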

When a media server acts as this kind of media relay, it is usually called an SFU (Selective Forwarding Unit), meaning its main purpose is to forward media streams between clients.

There’s also the concept of an MCU (Multipoint Control Unit), which refers to a media server that not only forwards the media streams that go through it, but can also operate on them (e.g. mixing all video or audio streams into a single one).
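As a rough illustration of what "mixing" means, the sketch below combines audio frames from several participants into a single frame. It assumes the audio has already been decoded to 16-bit PCM samples and that all frames have the same length; the function name is hypothetical, and a real server would also have to decode the incoming media and re-encode the result.

```python
PCM_MIN, PCM_MAX = -32768, 32767  # range of a signed 16-bit sample


def mix_audio(frames: list[list[int]]) -> list[int]:
    """Mix equally long frames of 16-bit PCM samples into one frame."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same number of samples")
    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        # Clip so the sum still fits in a 16-bit sample.
        mixed.append(max(PCM_MIN, min(PCM_MAX, total)))
    return mixed


if __name__ == "__main__":
    alice = [100, -200, 30000]
    bob = [50, -100, 10000]
    print(mix_audio([alice, bob]))  # [150, -300, 32767]
```

The client then receives one mixed stream instead of one stream per participant, which is the trade-off the server makes in exchange for doing more work.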

Video Recording

One of the main benefits of having all video streams go through a media server (cluster) is that the media can be recorded and stored for any purpose, something that would be pretty difficult to do on a mesh architecture, if possible at all.

Integration with Other Communication Technologies

Another advantage of using a media server is communicating with systems beyond what web technologies allow, such as the PSTN via SIP trunking or streaming through RTMP to services that support it, like Facebook Live and YouTube Live Streaming.

You can see a sample of this in one of our previous blog posts, in which Kurento Media Server is used to connect a video call between a browser and a SIP phone.

Processing of Media Streams

Some media servers allow processing of the video and audio streams at a very low level, such as running computer vision models on the video or sending the audio stream to a speech recognition engine like Google Speech. These are features that take WebRTC to another level; in my opinion, they allow for richer and more innovative real-time interactions that can add a lot of value to an otherwise ordinary communication platform.

We’ve also discussed this subject before, in the article mentioned in the last point, where Kurento Media Server applies a face detection model on the video stream to put a hat overlay on top of the participant’s head.

Which OSS Media Server Options Are Available?

As mentioned before, the WebRTC ecosystem is vast, and there are quite a few open source options on the market.

Jitsi is not only a WebRTC media server, but has a whole platform built around it. The Jitsi family of products includes Jitsi Videobridge (Media Relay, SFU), Jitsi Meet (Conference web client), Jicofo (Jitsi Conference Focus), Jigasi (Jitsi Gateway to SIP) and Jitsi SIP Phone. The most appealing feature of the Jitsi platform is that it includes everything needed to get a communication platform up and running in a matter of hours. It also implements its own signaling using Jingle (XMPP) and ships with a fully featured web interface. Sadly, one of the biggest pain points is implementing media recording, as there’s no solid, easy-to-use solution.

Kurento is one of the most versatile solutions out there. It is not only a media server, but also a toolkit for building one. Its main advantage is the concept of a Media Workflow, which lets you define, in code, how and where the media flows. This allows a WebRTC developer to compose and integrate very interesting features such as computer vision (e.g. recognizing QR codes, face detection), real-time media modification and interop with RTP (VoIP) services. Kurento can also be configured to function as an SFU or an MCU, or both, in a single instance.
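The idea of defining a Media Workflow in code can be sketched, very loosely, as a chain of processing stages that each frame passes through. The Python below is a toy model and not Kurento’s API: the `Pipeline` class, the stage functions and the dictionary used as a "frame" are all hypothetical stand-ins, and the face detector simply pretends a face was found at a fixed position.

```python
from dataclasses import dataclass, field
from typing import Callable

Frame = dict  # hypothetical stand-in for a decoded video frame
Stage = Callable[[Frame], Frame]


@dataclass
class Pipeline:
    """Toy media workflow: frames pass through the connected stages in order."""
    stages: list[Stage] = field(default_factory=list)

    def connect(self, stage: Stage) -> "Pipeline":
        self.stages.append(stage)
        return self  # returning self allows chaining connect() calls

    def process(self, frame: Frame) -> Frame:
        for stage in self.stages:
            frame = stage(frame)
        return frame


def detect_face(frame: Frame) -> Frame:
    # Placeholder detector: pretend a face was found at a fixed position.
    return {**frame, "face": (120, 80)}


def overlay_hat(frame: Frame) -> Frame:
    face = frame.get("face")
    if face is None:
        return frame  # nothing to decorate
    x, y = face
    overlays = frame.get("overlays", []) + [("hat", x, y - 40)]
    return {**frame, "overlays": overlays}


pipeline = Pipeline().connect(detect_face).connect(overlay_hat)
result = pipeline.process({"id": 1})
print(result["overlays"])  # [('hat', 120, 40)]
```

The point of the sketch is the composition: swapping the order of the stages, or adding a recording or forwarding stage, changes where the media flows without touching the stages themselves.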

While its description doesn’t mention “media server” anywhere, Janus can be set up as an SFU pretty easily. One of its most notable features is its plugin architecture, which makes it possible to extend the service’s core capabilities. There’s a demo page that displays a few interesting use cases of Janus, such as a SIP gateway, screen sharing and others.

This is a relatively new and interesting media server. What makes it different from the rest is that it’s designed to be a library (for Node), which allows it to be integrated into bigger applications.

Final Thoughts

Hopefully this article helped demystify the concept of WebRTC media servers, explained the features they offer and presented a few of the open source options available.

Contact us to build or improve your WebRTC app!

Would your business benefit from a WebRTC real-time video and audio chat-based application? Are you ready to chat about how you can incorporate it into your business? Do you have a current WebRTC application that needs a health check assessment? We have an experienced team ready and happy to help you out. Contact us today.