webrtcH4cKS: ~ Project WONDER: showing WebRTC NNI does not need SIP

As discussed in previous posts, WebRTC standards do not specify a signaling protocol. In general this decision is positive because it gives developers the freedom to select (or invent) the protocol that best suits the particular WebRTC application’s needs. This can also reduce the time to market since standards compliance-related tasks are minimized. WebRTC media and data protocols from the provider to the user are standardized, so the lack of a standardized signaling protocol does not hurt interoperability of subscribers within the same network service. The calling party just has to have a URL from the called party to download its web app and to establish a WebRTC session with them. They both connect to the same web server and will then utilize the same signaling scheme. This is a new paradigm that is often difficult to embrace for traditional telephony developers who are used to using the SIP protocol for handling all signaling, including the User-to-Network Interface (UNI) and Network-to-Network Interface (NNI).

While this approach is appropriate for many applications, it does leave many open questions for others:

What happens when the caller wishes to retain some control of the call?

Who determines the calling platform?

How do you allow cross-domain calls?

How do you avoid vendor lock-in to proprietary signaling protocols?

SIP-based IMS networks address all these problems by providing a vendor-independent, end-to-end signaling mechanism that works within and across service provider domains. As a result, end-to-end SIP proponents often argue, “why create something new when SIP already exists?”

Is it possible to ensure interoperability between different WebRTC service providers while using application specific signaling?

This is the root question that drove the WONDER (Webrtc interOperability tested in coNtradictive DEployment scenaRios) project, a partnership between Deutsche Telekom and Portugal Telecom that is partially funded by the European Commission.

Is WebRTC’s non-standardized signaling, triangular network model, and minimum network side functionality fundamentally incompatible with the Telco model where distinct apps can natively communicate across any compliant environment? Or is it possible to leverage the current WebRTC model within the Telco model? That is the core concept behind project WONDER. The scientists from Portugal Telecom Inovação and Deutsche Telekom call this concept Signaling on-the-fly, and they explored and tested it as part of this project.

Signaling on-the-fly architecture

Topology

Before we review how the Signaling on-the-fly concept works, let’s start by defining a few of its terms and look at how these relate in a diagram (Figure 1):

Domain Channel: the signaling channel that is established with the domain’s messaging server as soon as the user is registered and is online

Transient Channel: the signaling channel that is established, typically with a foreign messaging server (i.e. from another domain) in the scope of a certain conversation

Messaging Stub: the script containing the protocol stack and all the logic needed to establish a Channel to a certain Messaging Server

Conversation Host: the Messaging Server that is used to support all conversation messages exchanged among peers belonging to different domains

Called-party Domain hosting

Let’s use the classic Alice and Bob example to explain the concept assuming they are registered in different Service Provider domains. Alice wants to talk to Bob by using Bob’s RTC identity e.g. bob@domain.com, so the process as illustrated in Figure 1 is:

Information about the Identity of Bob, including Bob’s Messaging Stub provider, is provided and asserted by Bob’s Identity Provider (IdP).

Alice downloads and instantiates Bob’s Messaging Stub in her browser to setup a Transient Channel with Bob’s domain Messaging Server.

As soon as the Transient Channel is established, Alice can send an Invitation message to Bob containing her SDP offer.

Since Bob is connected to the same Messaging Server via his Domain Channel, he will receive Alice’s invitation in his browser. If Bob accepts the invitation, an Accepted message containing Bob’s SDP answer will be sent to Alice.

As soon as Alice’s browser receives Bob’s SDP, the media and/or data streams can be directly connected between the two browsers.

Figure 1 – Conversation hosted by called party domain
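The called-party hosting flow can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: `MessagingServer`, `resolveIdentity`, `createStub`, the message type strings and the example addresses are hypothetical names, not the WONDER lib API, and the SDP values are placeholder strings rather than real session descriptions.

```javascript
// In-memory sketch of called-party domain hosting.
// All names are hypothetical; this is not the WONDER lib API.

// A messaging server that relays messages between connected users.
class MessagingServer {
  constructor(domain) {
    this.domain = domain;
    this.channels = new Map(); // user id -> message handler
  }
  connect(userId, onMessage) {
    this.channels.set(userId, onMessage);
  }
  relay(message) {
    const handler = this.channels.get(message.to);
    if (!handler) throw new Error(`${message.to} is not connected to ${this.domain}`);
    handler(message);
  }
}

const bobServer = new MessagingServer("domain.com");

// Stand-in for Bob's IdP: asserts his identity and where to get his Messaging Stub.
const identityProvider = {
  "bob@domain.com": {
    // Stands in for "download the stub script": returns a stub bound to Bob's server.
    createStub: () => ({
      connect: (userId, onMessage) => bobServer.connect(userId, onMessage),
      sendMessage: (message) => bobServer.relay(message),
    }),
  },
};

function resolveIdentity(rtcIdentity) {
  const identity = identityProvider[rtcIdentity];
  if (!identity) throw new Error(`unknown identity ${rtcIdentity}`);
  return identity;
}

function runCalledPartyHosting() {
  const log = [];

  // Bob is online: his Domain Channel is already open.
  const bobStub = resolveIdentity("bob@domain.com").createStub();
  bobStub.connect("bob@domain.com", (message) => {
    log.push(`bob received ${message.type}`);
    // Bob accepts and answers over the same server.
    bobStub.sendMessage({
      type: "accepted",
      from: "bob@domain.com",
      to: message.from,
      sdp: "placeholder-answer",
    });
  });

  // Alice resolves Bob's identity and instantiates his stub (Transient Channel).
  const aliceStub = resolveIdentity("bob@domain.com").createStub();
  aliceStub.connect("alice@other.example", (message) => {
    log.push(`alice received ${message.type}`);
  });

  // Alice sends the invitation with her SDP offer.
  aliceStub.sendMessage({
    type: "invitation",
    from: "alice@other.example",
    to: "bob@domain.com",
    sdp: "placeholder-offer",
  });

  return log;
}

console.log(runCalledPartyHosting());
```

Note that Alice never talks to her own domain's server in this sketch: the only signaling path is the one provided by Bob's stub, which is the point of the concept.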

Calling-party Domain hosting

The previous scenario implies that the called party domain is spending more resources than the calling party domain, since the Conversation is hosted by the called party using its Messaging Server. If this is not acceptable, conversations can also be hosted by the calling party. In that case:

A RESTful notification service endpoint is asserted by Bob’s IdP and is used to push an invitation message containing Alice’s offer towards Bob’s device

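The Identity of Alice, including Alice’s MessagingStub URI, is provided and asserted by Alice’s IdP, so Bob can download her Messaging Stub and set up a Transient Channel with Alice’s domain Messaging Server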

As soon as the Transient Channel is established, Bob can send an ACCEPTED message to Alice containing his SDP answer

Since Alice is connected to the same Messaging Server via her Domain Channel, she will receive Bob’s SDP and the media and/or data streams can be directly connected between the two browsers

Figure 2 – Conversation Hosted by Calling Party domain
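The calling-party variant can be sketched the same way. Again, this is a hedged illustration and not WONDER code: `aliceServer`, `stubRegistry`, `loadStub`, the `stubUri` field and the example URLs are assumptions, and the RESTful notification endpoint is reduced to a plain function call.

```javascript
// In-memory sketch of calling-party domain hosting; all names are hypothetical.

// Alice's domain messaging server: connected users mapped to handlers.
const aliceServer = {
  handlers: new Map(),
  connect(userId, onMessage) {
    this.handlers.set(userId, onMessage);
  },
  relay(message) {
    const handler = this.handlers.get(message.to);
    if (!handler) throw new Error(`${message.to} is not connected`);
    handler(message);
  },
};

// Stands in for "download the Messaging Stub from this URI".
const stubRegistry = new Map([
  ["https://alice-domain.example/stub.js", () => aliceServer],
]);

function loadStub(uri) {
  const factory = stubRegistry.get(uri);
  if (!factory) throw new Error(`no stub at ${uri}`);
  return factory();
}

function runCallingPartyHosting() {
  const events = [];

  // Alice is online on her own Domain Channel and waits for the answer.
  aliceServer.connect("alice@alice-domain.example", (message) => {
    events.push(`alice got ${message.type} with ${message.sdp}`);
  });

  // Bob's notification endpoint (a REST service in the text, a function here).
  const bobNotificationEndpoint = (invitation) => {
    events.push(`bob notified of ${invitation.type}`);
    // Bob loads Alice's stub and opens a Transient Channel to her server.
    const stub = loadStub(invitation.stubUri);
    stub.connect("bob@bob-domain.example", () => {});
    stub.relay({
      type: "accepted",
      from: "bob@bob-domain.example",
      to: invitation.from,
      sdp: "placeholder-answer",
    });
  };

  // Alice pushes the invitation, carrying her offer and her stub URI.
  bobNotificationEndpoint({
    type: "invitation",
    from: "alice@alice-domain.example",
    stubUri: "https://alice-domain.example/stub.js",
    sdp: "placeholder-offer",
  });

  return events;
}

console.log(runCallingPartyHosting());
```

Compared with called-party hosting, the roles are mirrored: here Bob is the visitor on Alice's Messaging Server, so the hosting cost falls on the caller's domain.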

Legacy network interoperability

The signaling on-the-fly concept can also be applied to support interoperability with legacy networks (e.g. IMS and PSTN) by using a Messaging Gateway that will convert the signaling protocol used in the WebRTC device into the signaling protocol used in the legacy network (Figure 3).

Figure 3 – Interworking with Legacy networks, e.g. IMS
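To make the gateway's job concrete, here is a minimal sketch of one translation direction, from a JSON invitation to a SIP INVITE. The JSON field names (`type`, `from`, `to`, `sdp`) are assumptions, and the output is deliberately incomplete: mandatory SIP headers such as Via, Call-ID, CSeq and Max-Forwards, as well as all dialog state, are left out, so this is not a request a real IMS core would accept.

```javascript
// Sketch of a Messaging Gateway mapping a JSON invitation to a simplified
// SIP INVITE. Field names are hypothetical and the request is incomplete
// (no Via, Call-ID, CSeq, Max-Forwards or tags).
function toSipInvite(message) {
  if (message.type !== "invitation") {
    throw new Error(`cannot map message type ${message.type}`);
  }
  const body = message.sdp;
  // Content-Length counts bytes, not characters.
  const bodyLength = new TextEncoder().encode(body).length;
  return [
    `INVITE sip:${message.to} SIP/2.0`,
    `From: <sip:${message.from}>`,
    `To: <sip:${message.to}>`,
    "Content-Type: application/sdp",
    `Content-Length: ${bodyLength}`,
    "", // blank line separates headers from the SDP body
    body,
  ].join("\r\n");
}

const invite = toSipInvite({
  type: "invitation",
  from: "alice@web.example",
  to: "bob@ims.example",
  sdp: "v=0\r\n",
});
console.log(invite);
```

A real gateway would also have to translate in the opposite direction (for example a SIP 200 OK into an Accepted message) and keep per-call state to correlate the two.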

Multiparty Conversations

Multiparty Conversations with more than one user coming from different domains are also supported by the signaling on-the-fly concept. Different Network Topologies can be used including:

Mesh Topology with a Hosting peer, where all peers have direct media and data streams established with all remaining peers and a single Hosting Messaging server is used i.e. all peers have a signaling channel established with the same Messaging Server (Figure 4).

Figure 4 – Mesh Multiparty Conversation with Hosting

MCU based Topology with a Hosting peer, where peers have media and data streams established with a central media server that mixes and distributes streams among the peers, and a single Hosting Messaging server is used i.e. all peers have a signaling channel established with the same Messaging Server (Figure 5).

Figure 5 – Multiparty Conversation in a Media Stream Star Topology
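The practical difference between the two topologies is how the number of media connections grows with the number of peers. The helper below is a back-of-the-envelope sketch (the function name and result fields are made up for illustration): in a mesh every peer connects to every other peer, while with an MCU every peer holds a single connection to the media server. In both cases each peer keeps one signaling channel to the hosting Messaging Server.

```javascript
// Rough connection counts for the two multiparty topologies.
// topologyCost and its result fields are illustrative names only.
function topologyCost(peerCount, topology) {
  if (!Number.isInteger(peerCount) || peerCount < 2) {
    throw new RangeError("peerCount must be an integer >= 2");
  }
  // One signaling channel per peer to the single hosting Messaging Server.
  const signalingChannels = peerCount;
  if (topology === "mesh") {
    return {
      mediaConnectionsPerPeer: peerCount - 1,
      totalMediaConnections: (peerCount * (peerCount - 1)) / 2,
      signalingChannels,
    };
  }
  if (topology === "mcu") {
    return {
      mediaConnectionsPerPeer: 1,
      totalMediaConnections: peerCount,
      signalingChannels,
    };
  }
  throw new Error(`unknown topology ${topology}`);
}

console.log(topologyCost(4, "mesh"));
console.log(topologyCost(4, "mcu"));
```

For four peers the mesh needs 3 media connections per peer and 6 in total, while the MCU topology needs 1 per peer and 4 in total; the gap widens quadratically as peers are added.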

JavaScript Framework

A JavaScript framework, the WONDER lib, was designed and implemented to validate the signaling on-the-fly concept. The main WONDER lib classes are (Figure 6):

The Identity represents a user and contains all information needed to support Conversation services, including the service endpoint to retrieve the protocol stack (Messaging Stub) that will be used to establish a signaling channel with the Identity’s domain Messaging Server. The Identity entity extends the current Identity concept defined in the WebRTC specification to support seamless interoperability by using the signaling on-the-fly mechanism.

The MessagingStub implements the protocol stack used to communicate with a certain Messaging Server.

The Conversation class manages all participants including the setup, update or close of media and data connections.

The Participant class handles all operations needed to manage the participation of an Identity (User) in a conversation including the WebRTC PeerConnection functionalities. The Local Participant is associated with the Identity that is using the Browser while the Remote Participant is associated to remote Identities (users) involved in the conversation.

The Resource class represents the digital assets that are shared among participants in the conversation, including participants’ voice, video, screens, photos, video clips, music clips, documents, etc. These assets are usually managed by the Participant that owns them. For local participants assets are sent (e.g. WebRTC outgoing stream tracks), while for remote participants assets are received (e.g. WebRTC incoming stream tracks). Some Resource types like Chat are not managed by a Participant but by the Conversation.

The Data Codec is used by Resources that are shared on top of the Data Channel, like file sharing and Textual Chat, to decode and encode the data in a consistent way by all the peers. The Data Codec may also be downloaded on-the-fly by the peers.

The Message is used to exchange all data needed to set up, update and close media and data connections between peers via the Messaging Server. It may also be used for other purposes, e.g. presence information management. Each message is composed of a Header and a Body. Please see here for details on message types and headers.

Figure 6 Main WONDER Classes
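As a rough illustration of the Message and Data Codec ideas, the sketch below builds a header/body envelope and a tiny codec for textual chat. The header field names (`id`, `type`, `from`, `to`), the two message types and the `chatCodec` object are assumptions made for this example; the actual WONDER message types and headers are documented in the material linked above and may differ.

```javascript
// Hypothetical message envelope with a Header and a Body.
// Field names and message types are assumptions, not the WONDER definitions.
const KNOWN_TYPES = ["invitation", "accepted"];
let nextMessageId = 1;

function createMessage(type, from, to, body = {}) {
  if (!KNOWN_TYPES.includes(type)) throw new Error(`unknown message type ${type}`);
  return { header: { id: nextMessageId++, type, from, to }, body };
}

// Messages travel through the Messaging Server as JSON text.
function serializeMessage(message) {
  return JSON.stringify(message);
}

function parseMessage(text) {
  const message = JSON.parse(text);
  const header = message && message.header;
  if (!header || typeof header.type !== "string" || !("body" in message)) {
    throw new Error("malformed message");
  }
  return message;
}

// A minimal Data Codec for chat text shared over a data channel:
// every peer must encode and decode the payload the same way.
const chatCodec = {
  encode(text) {
    return new TextEncoder().encode(JSON.stringify({ kind: "chat", text }));
  },
  decode(bytes) {
    const payload = JSON.parse(new TextDecoder().decode(bytes));
    if (payload.kind !== "chat") throw new Error("not a chat payload");
    return payload.text;
  },
};

const invitation = createMessage("invitation", "alice@a.example", "bob@b.example", {
  sdp: "placeholder-offer",
});
const received = parseMessage(serializeMessage(invitation));
console.log(received.header.type, received.body.sdp);
console.log(chatCodec.decode(chatCodec.encode("olá Bob")));
```

Keeping the envelope protocol-agnostic is what lets a Messaging Stub translate it into whatever the hosting domain actually speaks.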

Developing a WONDER Application

The WONDER library provides different entry points, which differ in their level of abstraction and complexity. These levels are illustrated in Figure 7. The Conversation layer provides the highest level of abstraction and hides all the complexity of programming a WebRTC application. This includes methods for accessing media sources, establishing and managing RTCPeerConnections, abstracting the call participants and handling the whole signaling between them. This option is therefore most suitable for developers who want to start an application from scratch in the simplest and most straightforward way. It still provides full control of all parameters and flexible ways to modify running conversations.

The other extreme of programming is to use the Core layer directly. This method provides mechanisms for handling identities and for the exchange of standardized messages for the establishment of WebRTC communications. This also includes the described on-the-fly methods for downloading of Messaging Stubs and therefore provides the advantage of cross-domain interoperability. However – all WebRTC related coding and the management of calls and their participants are left to the programmer. This option is intended for developers who already have a WebRTC application and want to make use of the WONDER interoperability features.

There is also a third option – the Participant layer – which is a compromise between both options described above. It provides an abstraction of the participants of a conversation and handles all WebRTC related stuff for them, but it does not provide an abstraction of a conversation itself. So it might be of interest for developers who do not want to struggle with the complex WebRTC coding, but want to keep their own concept of what a Conversation is.

Figure 7 – Different levels of programming with the WONDER API

A complete example of the code necessary for a bidirectional audio/video communication app using the WONDER Conversation layer is provided here.

Tests and Results

The WONDER library was used in different experiments to validate the signaling on-the-fly concept. Experiments were performed by using an OpenIMS based test-bed effectively operated by University of Patras in the OpenLab project. The test-bed was extended and configured to emulate four different WebRTC domains, namely:

IMS – the imsserver.ece.upatras.gr domain uses a WebSocket-based JSON signaling protocol that is translated into the SIP protocol by an IMS-Signaling GW provided by Deutsche Telekom Labs. The IMS-Client acts as a SIP user agent and provides a JSON-based API to the web front end. The main role of the IMS-Client is to map this JSON API to SIP and vice versa. Therefore there is no need for any SIP library in the browser.

node.js – the nodejs.wonder domain uses JSON over WebSockets provided by a Node.js message server.

The experimentation results are summarized in the table below (Figure 8).

In general, the inter-domain experiments were very successful, demonstrating that signaling on-the-fly can be used to enable seamless interoperability between any WebRTC domains without the use of standard NNI protocols.

For IMS-based domains, the tests for multiparty conversations were not performed, since the algorithm used implies the exchange of signaling messages outside conversation SIP dialogs. This requirement is quite challenging for IMS-based architectures. Eventually it may be supported by using SIP PUB-SUB dialogs. Nevertheless, the solution would always demand additional resources that were not available in the project and, in the end, it would also require changes in standard IMS clients in order to work.

Figure 8 – Inter-domain interoperability test results

Conclusions and Future Work

The WONDER project has demonstrated that Network-to-Network Interface (NNI) standard protocols are NOT needed to achieve seamless interoperability between any WebRTC domains. A standard and protocol-agnostic JavaScript API, like the WONDER API, could be used instead, promoting portability of applications among different back-end solutions.

Such an approach also benefits service providers by minimizing dependencies between applications and back-end vendors. Until now, one of the rationales for using IMS-based back-end solutions was the need to have standard NNI interfaces based on SIP to ensure full interoperability between different Service Provider domains. The successful demonstration of the signaling on-the-fly concept means this rationale is not valid anymore. In the end, this means a web-centric delivery approach using more agile and simpler architectures is feasible and paves the way for a future web-centric standard Service Architecture as an alternative to IMS.


Looking at the summary table of experimentation results, we may conclude that the web-centric delivery approach had more success than our IMS tests. This result is somewhat of a surprise since IMS is a mature architecture with a large set of services available, while WebRTC is still in its very early stages and is not a standard yet. In reality, the WONDER experimentation did not take much advantage of existing IMS services, namely Presence and XDMS, due to the amount of integration effort it would demand. Nevertheless, this also indicates that the IMS option implies further integration effort when compared with the web-centric option.

The WONDER JavaScript library has been published in a GitHub repository along with tutorials and live demos. Have a look and try it out. We welcome all feedback to improve it. We are also evaluating the potential of signaling on-the-fly to be adopted by the industry and vendors and to become a standard. We are currently exploring and researching new application domains (e.g. IoT, Content Delivery) for the signaling on-the-fly concept and its usage in any Web Service powered by WebRTC. In particular, we are investigating the design of new p2p service architectures, moving from a Client-Server paradigm (e.g. RESTful architectures) towards a more powerful service concept paradigm that we call Hyperlinked Entities or just Hyperties.

{“authors”, [“Paulo Chainho”, “Steffen Druesedow”, “Kay Haensge”]}

{“developers”, [“Vasco Amaral”, “Miguel Seijo Simo”, “Luis Oliveira”]}


4 comments on “Project WONDER: showing WebRTC NNI does not need SIP”

I am totally missing the point on how WONDER handles inter-domain. In called-party domain hosting (calling-party domain hosting, resp.) Alice (Bob, resp.) is not being authenticated. Since it is inter-domain, I presume the hosting domain can not authenticate the “visitor”.

I agree that NNI is not needed, but there has to be a way for Bob’s server to authenticate Alice, likely via her server or a 3rd party id provider. That critical piece is missing in your framework. In my app ffonio.in, I am using OpenID or an “unauthenticated, dummy id” for this purpose.

I probably don’t follow what you mean by “signaling-on-the fly”, but what you are describing is nothing more than what is derived from WebRTC directly.

Since WebRTC Identity-related functionalities are still not implemented by browsers, we have just used a simple IdP JavaScript class to handle all Identity management aspects, including the instantiation of an Identity. As soon as the standards mentioned above are finalized and fully implemented, I’m expecting the browser to natively support Identity management functionalities.

So, in the end, this means “signaling on-the-fly” does not directly address identity management issues but attempts to be compliant with ongoing standardisation activities in this domain, mainly by extending RTCIdentityAssertion to also include the assertion of MessagingStubs. User authentication should be done outside the “signaling on-the-fly” procedures, which are agnostic of the IdP and authentication protocols used.

“Signaling on-the-fly” only applies WebRTC principles to truly support interoperability between any WebRTC applications. Let’s take as an example a Portugal Telecom WebRTC single-page application featuring a presence-enriched contact list and audio and video conversations. “Signaling on-the-fly” lets me have identities from other domains (e.g. Google Hangout subscribers) in my contact list and set up conversations with them without leaving my web app, i.e. keeping the user experience designed by my service provider.

What does it imply? To have a *standard signaling API* used by WebRTC applications that is able to dynamically select and use different signaling protocol stacks according to the domain the peer belongs to, i.e. “signaling on-the-fly”.

Can you help me understand how WONDER facilitates setting up “conversations without leaving the web app”? I follow that the user can click on the name in the web app’s address book. Subsequent to that, the user experience for that session will be determined by the called domain, no? For example, if the called domain is mine, then you will get a pop-up text chat window, where both sides have to further agree on which mode to use, and only then is the media path set up. Admittedly, another implementation would be different. Still, it is likely that in some conversations the initiators may not continue to use the same UI/UX provided by their SPs.

The dialog user experience for the conversation setup should always be set by your service provider. When you decide to set up a conversation with someone from another domain, you will use the “standard” signaling on-the-fly API to send your offer: MessagingStub.sendMessage(invitation). The invitation message is described in JSON plus the needed SDP collected from the PeerConnection. Then, the MessagingStub will handle this message and translate it, if needed, to something else (e.g. a SIP INVITE message) according to the signaling protocol used by the remote peer (assuming the remote peer is the conversation host).

Currently, we are mixing JSON and SDP but hopefully, with ORTC, we will use pure JSON.