Should you use Kurento or Jitsi for your multiparty WebRTC video conference product?

Kurento or Jitsi; Kurento vs Jitsi – is this the ultimate head-to-head comparison for open source media servers in WebRTC?

Yes and no. And if you want an easy answer of “Kurento is the way to go” or “Jitsi will solve all of your headaches” then you’ve come to the wrong place. As with everything else here, the answer depends a lot on what it is you are trying to achieve.

Since this is something that gets raised quite often these days by the people I chat with, I decided to share my views here. To do that, the best way I know is to start by explaining how I compartmentalized these two projects in my mind:

Jitsi Videobridge is an XMPP server component that allows for multiuser video communication. Unlike the expensive dedicated hardware videobridges, Jitsi Videobridge does not mix the video channels into a composite video stream, but only relays the received video channels to all call participants. Therefore, while it does need to run on a server with good network bandwidth, CPU horsepower is not that critical for performance.

I emphasized the important parts for you. Here’s what they mean:

XMPP server component – a decision was made as to the signaling of Jitsi. It was made years ago, when the idea was to “compete” head-to-head with Google Hangouts. So the choice was made to use XMPP signaling. This means that if you need/want/desire anything else, you are in for a world of pain – doable, but not fun.

does not mix the video channels – it doesn’t look into the media at all and can’t process raw video in any way

only relays the received video – it is an SFU

Put simply – Jitsi is an SFU with XMPP signaling.
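To make the "relay versus mix" distinction concrete, here is a rough sketch that counts what each kind of server handles in a single room. It rests on simplifying assumptions that are mine, not Jitsi's: every participant sends one video stream, the SFU forwards every stream to every other participant, and the MCU encodes one shared composite for everybody. The names `ServerLoad`, `sfu_load` and `mcu_load` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerLoad:
    streams_in: int   # video streams arriving at the server
    streams_out: int  # video streams leaving the server
    decodes: int      # video decoders the server has to run
    encodes: int      # video encoders the server has to run


def sfu_load(participants: int) -> ServerLoad:
    """SFU: relay each incoming stream to all other participants, untouched."""
    if participants < 1:
        raise ValueError("a room needs at least one participant")
    n = participants
    return ServerLoad(streams_in=n, streams_out=n * (n - 1), decodes=0, encodes=0)


def mcu_load(participants: int) -> ServerLoad:
    """MCU: decode every stream, compose one picture, send it to everyone.

    Assumes a single shared layout, so the composite is encoded only once.
    """
    if participants < 1:
        raise ValueError("a room needs at least one participant")
    n = participants
    return ServerLoad(streams_in=n, streams_out=n, decodes=n, encodes=1)


if __name__ == "__main__":
    for size in (4, 10):
        print(size, "SFU:", sfu_load(size))
        print(size, "MCU:", mcu_load(size))
```

Under these assumptions the SFU's outgoing stream count grows quadratically while its CPU work stays flat, which is the bandwidth-over-CPU trade-off described in the quote above.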

If this is what you’re looking for then this baby is for you. If you don’t want/need an SFU or use another signaling protocol, better start elsewhere.

You can find outsourcing vendors who are happy to use Jitsi and have it customized or integrated to your use case.

Kurento

Kurento is a kind of media server framework. This too is an open source one, but one that is maintained by Kurento Technologies.

With Kurento you can essentially build whatever you want when it comes to backend media processing: SFU, MCU, recording, transcoding, gateway, etc.

This is an advantage and a disadvantage.

An advantage because it means you can practically use it for any type of use case you have.

A disadvantage because there’s more work to be done with it than something that is single purpose and focused.

Kurento has its own set of vendors who are happy to support, customize and integrate it for you, among them the actual authors and maintainers of the Kurento code base.

Which one’s for you? Kurento or Jitsi?

Both frameworks are very popular, with each having at the very least tens of independent installations and integrations done on top of them and running in production services.

Kurento or Jitsi? Not always an easy choice, but here’s where I draw the line:

If what you need is a pure SFU with XMPP on top, then go with Jitsi. Or find some other “out of the box” SFU that you like.

If what you need is more complex, or necessitates more integration points, then you are probably better off using Kurento.

What about Janus?

Their website states that it is a “general purpose WebRTC Gateway”. So in my mind it will mostly fit into the role of a WebRTC-SIP gateway.

That said, I’ve seen more than a single vendor using it in totally other ways – anything from an SFU to an IOT gateway.

I need to see more evidence of use cases where production services end up using it for multiparty as opposed to a gateway component to suggest it as a solid alternative.

Oh – and there are other frameworks out there as well – open source or commercial.

Where can I learn more?

Multiparty and server components are a small part of what is needed when going about building a WebRTC infrastructure for a communication service.

In the past few months, I’ve noticed a growing number of requests that stem from challenges and misunderstandings about how WebRTC works and what it really is. People tend to focus on the obvious side, the browser APIs that WebRTC has, and forget to think about the backend infrastructure for it – something that is just as important, if not more.

It is why I’ve decided to launch an online WebRTC Architecture course that tackles these types of questions.

Course starts October 24, priced at $247 USD per student. If you enroll before October 10, there’s a $50 discount – so why wait? Until I get enrollment automation up, contact me directly.

Slack for now is voice-only. While they might add video now that HipChat relaunched it on their service (via Jitsi), who knows if it will still be Janus or not.

On top of that, Slack is a great reference, but might not be the right one for others. I don’t know how much support they got, how many customizations they made and how much crap they ate along the way – it might be the best thing that happened to Slack – or it might not.

The track record I see and the recommendations I give are based on multiple variables – it relates to the DNA of the vendor adopting the framework, the feature set he needs, the type of support he is looking for, the scale he needs, the direct feedback I get from others on their use of said frameworks and on discussions with the vendors themselves.

For now, I am still waiting for more evidence about Janus besides Slack – and not because I have anything bad to say about it.

Despite the title, it was a more generic overview of Janus in general, and towards the end you can see a (non-exhaustive) list of different products using Janus nowadays, most of them exploiting the SFU plugin. As to Slack, they did everything by themselves without involving us, apart from a couple of open discussions on our Google group.

XMPP (or rather COLIBRI) is just the control layer. You can write a server for the XMPP component connection to the Jitsi Videobridge and translate to whatever you want from there… it just requires skill.

Hi, I’m the Kurento lead, so my opinion is probably biased, but my honest feeling is that Tsahi’s diagnosis in this post is quite accurate. Jitsi was designed with a specific videoconferencing model in mind, and XMPP makes a lot of sense for it. When creating applications that comply with that model, Jitsi does a great job and using it may save lots of development hours. However, this may be too narrow when special requirements need to be satisfied. First, because XMPP-inspired control mechanisms are not appropriate for every type of media control logic one may need. As an example, consider how you would use XMPP for things such as interoperating with IP cameras or smart video devices, controlling computer vision filters, combining media mixing models with SFU models dynamically, or orchestrating a complex dynamic media processing topology. Using XMPP-like control mechanisms for those might be a counterintuitive and complex task for developers. Second, because extending Jitsi Videobridge with further capabilities requires a lot of hacking and deep knowledge of its code internals. Kurento, on the other hand, was designed from the very beginning as a modular media development framework providing full composability and extensibility. Because of this, Kurento developers use consistent APIs, available through programming-language-dependent SDKs, that are designed around software engineering principles (e.g. type protection, efficient management of synchronous/asynchronous calls, efficient use of threads, concurrency control, distributed garbage collection, testability, etc.). These APIs are fully agnostic to any kind of signaling or assumption, with the “call model” being just one of the possibilities.
This provides a lot of flexibility, but it also has the drawback Tsahi mentions: when you just need a standard videoconferencing call flow, you might need to develop your own signaling stack, and then you might feel that you are reinventing the wheel. To minimize this effect, the Kurento team also created several high-level APIs on top of the raw Kurento APIs that provide specific signaling, such as the Kurento Room API and the Kurento Tree API, but this is another story.

Hi, I am currently developing a many-to-many video/audio conference solution for a German company, which should integrate a public phone conference system via SIP in the future. Therefore I played around with Kurento and Janus. From my experience so far I can tell that there are advantages and drawbacks in both media servers. Both of them have interesting approaches regarding the architecture. I like the Kurento way of connecting media endpoints to pipelines. But the plugin-based approach of Janus also offers a lot of possibilities, as long as the developer is able to create those native plugins. In my view the typical use case for multiparty audio/video conferences is an SFU approach for video and an MCU approach (mixing) for audio. Mixing video would eat a lot of server resources, and it’s not the typical use case that all participants in a big conference show video concurrently. But it’s different for audio, especially if a gateway to the public phone world is necessary. Here you can’t get around mixing the audio channels. And this is the point where I see advantages on the Janus side. I compared the server resource consumption of the Janus Audio Bridge to the Kurento composite element. Janus uses libopus in the plugin to decode the Opus audio, mixes the streams and encodes the mixed stream. This implementation limits the usage to the Opus codec, but it saves a lot of resources compared to the Kurento composite. As far as I understand, Kurento uses GStreamer libs and pipelines for all the media processing. This may be much more flexible but leads to much more CPU consumption. My rough measurements showed approx. four times higher CPU load for an audio room with 10 participants.
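For readers unfamiliar with the "mixing" step in the comment above, here is a minimal sketch of a mix-minus stage, where each participant receives the sum of everyone else's audio. It is an illustration only, not code from Janus or Kurento: it assumes the audio has already been decoded to 16-bit PCM frames of equal length, it leaves out the Opus decode and encode steps entirely, and the function name `mix_minus` is hypothetical.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def clamp(sample: int) -> int:
    """Keep a mixed sample inside the signed 16-bit range (hard clipping)."""
    return max(INT16_MIN, min(INT16_MAX, sample))


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return, for each participant, the mix of all the other participants.

    `frames` maps a participant id to one decoded PCM frame. The full sum is
    computed once, then each participant's own samples are subtracted so
    nobody hears an echo of themselves.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    if any(len(frame) != length for frame in frames.values()):
        raise ValueError("all frames must have the same number of samples")
    total = [sum(column) for column in zip(*frames.values())]
    return {
        pid: [clamp(t - own) for t, own in zip(total, frame)]
        for pid, frame in frames.items()
    }


if __name__ == "__main__":
    room = {"alice": [100, -100], "bob": [200, 300], "carol": [-50, 25]}
    print(mix_minus(room))
```

The arithmetic here is cheap compared to decoding and re-encoding every stream, which is where the CPU cost of any audio mixer goes.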

Hi, Tsahi. Nice review of these technologies, but what’s your take on mediasoup? We’re a full JavaScript (Node) team, and while Kurento provides Node.js support, your latest comments indicate that its usage has begun to dwindle. How does mediasoup compare with Kurento?

Thanks for the prompt reply, Tsahi. I looked at Kurento during the weekend, and it appears that work has started on it again after almost a year of inactivity. The JavaScript tutorial repos still appear dated to 6.6.2, as against the current 6.7.*. Moreover, it appears to be overkill for our particular use case. I think I’ll peruse mediasoup, as it appears more lightweight and better suited to our use case. Thanks again.

Tsahi, I agree with you. I haven’t heard of anyone using it for an enterprise-level solution. The main problem for me is that “multi-party seems to be limited to 16 participants” in Intel’s SDK. I want to achieve video calls with as many as 100 participants in a single call. I also went through your blog post about load testing Kurento, and it was a bit disappointing. Do you still think I can make a 100-participant call work with Kurento in MCU mode, with a very high-spec server?

Frankly, I am not aware of anything dominant or typical when it comes to signaling alongside Kurento. My guess is that it takes one of two forms:

1. The developers use the Node.js server coming from Kurento and modify it, making it their de facto signaling server
2. The developers use whatever it is they decided to drive their app interactions with

If other readers here can share their views and experiences that will be appreciated.

Hi, I’m Luis, NUBOMEDIA project coordinator. NUBOMEDIA is a research infrastructure that was created to experiment with novel paradigms for combining WebRTC with advanced media processing capabilities in a scalable way. As a research project, it is not aimed at production use and it lacks features that are probably required for that purpose (e.g. billing, fault-resilience mechanisms, etc.). Hence, NUBOMEDIA is a good starting point for any organization wishing to create a next-generation WebRTC PaaS, but not for direct use in production. In other words, if you are willing to create your very own WebRTC PaaS, evolving NUBOMEDIA may save you thousands of development hours, but further effort needs to be invested before NUBOMEDIA is production ready.

Well, it’s been a few months. It appears one can still select Kurento from the AWS marketplace, though I have not tried it, and officially Twilio has stated (back in ’16!) that they are not taking new elasticRTC clients. Maybe that only means they are not offering support services for new elasticRTC clients, but you are still free to use it. I’m trying to pick the right media server and hosting environment for my needs, and I wish there was some resolution on this. Twilio pricing seems to be $0.001 per participant minute. That’s better than Vidyo.io ($0.01 per participant minute), but I think I can do better by self-hosting.
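The self-hosting question in the comment above comes down to simple arithmetic, sketched here. Only the two per-participant-minute rates come from the comment; the usage figures (4 participants, 30-minute calls, 1,000 calls a month) and the $500 monthly server bill are invented purely for illustration, and the function names are hypothetical.

```python
from decimal import Decimal

# Per-participant-minute rates quoted in the comment above (USD).
RATES = {
    "Twilio": Decimal("0.001"),
    "Vidyo.io": Decimal("0.01"),
}


def monthly_cost(rate: Decimal, participants: int,
                 minutes_per_call: int, calls_per_month: int) -> Decimal:
    """Hosted cost: rate x participants x minutes x number of calls."""
    return rate * participants * minutes_per_call * calls_per_month


def break_even_participant_minutes(server_cost: Decimal, rate: Decimal) -> Decimal:
    """Participant minutes per month at which hosted usage equals a fixed server bill."""
    return server_cost / rate


if __name__ == "__main__":
    # Hypothetical workload and server bill, not taken from the post.
    for vendor, rate in RATES.items():
        cost = monthly_cost(rate, participants=4, minutes_per_call=30,
                            calls_per_month=1000)
        threshold = break_even_participant_minutes(Decimal("500"), rate)
        print(f"{vendor}: ${cost:.2f}/month, break-even at {threshold:,.0f} participant minutes")
```

The sketch ignores bandwidth charges and the engineering time needed to run your own servers, both of which can easily outweigh the raw server bill.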

I love Jitsi’s feature set, but unless I’m missing something, the documentation is nearly nonexistent! We need to get to market quickly. Any suggestions? Anyone I can hire as an advisor/consultant? (It seems like BlueJimp, the original Jitsi consulting team, no longer does consulting since the acquisition.)