WebRTC Architecture – A Layman’s Guide
Technology has often been the key driver behind changes in our day-to-day lives. One technology that has done a great deal to make remote and online work practical is WebRTC. Many companies have recognised the merits of online work, collaboration and communication, and some have moved to working online permanently. But what is WebRTC, and what architecture behind the scenes makes it all so seamless? Let’s find out.
Simply put, WebRTC (Web Real-Time Communication) is a technology for real-time communication over the web. It is primarily used for peer-to-peer connections, in which a connection is established directly between two browsers. Users can share different forms of data, such as audio and live video streams. Zoom is a classic example of this form of communication. From its inception, WebRTC was designed to enable direct communication between browsers.
On the surface all of this appears seamless, but a complex WebRTC architecture runs the show in the background. Every WebRTC application needs at least a basic server-side infrastructure for exchanging signalling messages. WebRTC apps that support media exchange, like Tragofone, involve a more complex architecture behind the scenes.
The diagram below depicts a typical WebRTC architecture:
WebRTC Architecture: How does it work and what are the components involved?
#1. Peer to peer connection
Many WebRTC apps are based on a P2P (peer-to-peer) architecture. In a P2P connection the participants transfer data directly from one end to the other, in most cases without relying on a middleman. Because media does not pass through a central server, the call does not depend on one. If one participant disconnects or drops off for whatever reason, the other participants can keep sharing data, and an established connection keeps working even if the signalling server goes away. This is an advantage over traditional communication technologies, where users can’t continue sharing data once the server connection is lost. Also, data travels directly between peers instead of detouring through a server. The path is therefore often shorter, which can reduce latency.
#2. Signalling server
Before a call can connect, the peers need a way to find each other. Once the call is in progress, the call initiator or admin needs to keep track of people who join or leave the conversation, creating or disposing of connections accordingly. A signalling server handles these events. It facilitates the initial connection between two or more peers who would like to communicate by relaying the messages they need to exchange. A signalling server is required when the call is initiated, but it is not needed to carry the media of an ongoing call. However, it can still be used to track events such as a peer disconnecting midway. WebRTC does not prescribe how signalling must be implemented, so there are multiple ways to build a signalling server. The only requirement is that it acts as a bridge that relays messages between peers.
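The relay role described above can be sketched in a few lines. This is a minimal in-memory sketch, not a production server. It rests on several assumptions:

- It uses a hypothetical message shape, SignalMessage, with from, to and payload fields.
- The names SignallingServer, join, leave and relay are hypothetical.
- Each connected peer is modelled as a simple callback, where a real deployment would use a transport such as WebSocket.

It covers both jobs mentioned here: relaying offers, answers and candidates between peers, and notifying peers when someone joins or leaves.

```typescript
// Hypothetical message shape: `payload` would carry an SDP offer/answer or an
// ICE candidate. The server never inspects it; it only relays it.
interface SignalMessage {
  from: string;
  to: string;
  payload: string;
}

type Deliver = (msg: SignalMessage) => void;

// Minimal in-memory signalling logic: tracks who is in each room and relays
// messages between members. A real server would put a transport such as
// WebSocket in front of this.
class SignallingServer {
  private rooms: Record<string, Record<string, Deliver>> = {};

  // Adds a peer to a room and returns the ids of peers already there.
  join(room: string, peerId: string, deliver: Deliver): string[] {
    const members = this.rooms[room] ?? (this.rooms[room] = {});
    const existing = Object.keys(members);
    members[peerId] = deliver;
    // Notify existing members so they can start negotiating with the newcomer.
    for (const id of existing) {
      members[id]({ from: "server", to: id, payload: `peer-joined:${peerId}` });
    }
    return existing;
  }

  // Removes a peer and notifies the others so they can dispose of their connection.
  leave(room: string, peerId: string): void {
    const members = this.rooms[room];
    if (!members || !(peerId in members)) return;
    delete members[peerId];
    for (const id of Object.keys(members)) {
      members[id]({ from: "server", to: id, payload: `peer-left:${peerId}` });
    }
  }

  // Forwards a message to its recipient; returns false if the recipient is absent.
  relay(room: string, msg: SignalMessage): boolean {
    const target = this.rooms[room]?.[msg.to];
    if (!target) return false;
    target(msg);
    return true;
  }
}

// Usage: two peers join, Alice relays an (elided) offer to Bob, then Bob leaves.
const inbox: Record<string, string[]> = { alice: [], bob: [] };
const server = new SignallingServer();
server.join("demo", "alice", (m) => inbox.alice.push(m.payload));
server.join("demo", "bob", (m) => inbox.bob.push(m.payload));
server.relay("demo", { from: "alice", to: "bob", payload: "<SDP offer>" });
server.leave("demo", "bob");
// inbox.alice is ["peer-joined:bob", "peer-left:bob"]; inbox.bob is ["<SDP offer>"]
console.log(inbox);
```

Keeping the server ignorant of the payload mirrors the point above: signalling only needs to act as a bridge, so the same relay works for offers, answers and ICE candidates alike.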
#3. SDP (Session Description Protocol)
You initiated a call and a peer joined the conversation. But how do the two establish a connection and exchange information without knowing anything about each other’s systems? SDP comes to the rescue. An SDP message describes details such as the kinds of media a peer would like to exchange (audio, video, data), the codecs it supports, and the network and encryption parameters needed to connect. Each SDP message is either an offer or an answer.
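To make the role of SDP more concrete, the sketch below pulls the media kinds and codec names out of an SDP blob. The example offer is hand-written and heavily trimmed. A real browser offer contains many more lines, including ICE and DTLS fingerprint attributes. summarizeSdp is a hypothetical helper that only understands m= (media) and a=rtpmap: (codec) lines, so treat it as an illustration of the format rather than a real parser.

```typescript
interface MediaSection {
  kind: string;     // "audio", "video" or "application" (data channel)
  codecs: string[]; // codec names taken from a=rtpmap lines
}

// Sketch: extract media kinds and codec names from an SDP blob.
// Only m= and a=rtpmap: lines are handled; everything else is ignored.
function summarizeSdp(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.slice(0, 2) === "m=") {
      // m=<media> <port> <proto> <fmt> ...
      const kind = line.slice(2).split(" ")[0];
      sections.push({ kind, codecs: [] });
    } else if (line.slice(0, 9) === "a=rtpmap:" && sections.length > 0) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const encoding = line.split(" ")[1] ?? "";
      sections[sections.length - 1].codecs.push(encoding.split("/")[0]);
    }
  }
  return sections;
}

// Trimmed, illustrative offer (not a complete offer a browser would produce).
const exampleOffer = [
  "v=0",
  "o=- 0 0 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

// audio: opus, PCMU; video: VP8
console.log(summarizeSdp(exampleOffer));
```

The answering peer’s SDP has the same structure. It lists the subset of these media and codecs it accepts, which is how the two sides agree on what to exchange.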
- Offer/Answer: When initiating a connection we send an offer, and we should get an answer in return. The offer/answer exchange is bi-directional: either side can initiate the connection, and the outcome is the same.
- ICE (Interactive Connectivity Establishment) candidates: A peer may be reachable through multiple communication transports, such as several private or public IPs/ports, different protocols, or one or more relay servers. Once an SDP offer is created, WebRTC tries to find every possible transport through which the browser can be reached. Each of these is called an ICE candidate. Each candidate is described by an a=candidate attribute line that is added to the SDP. There are two ways to do this:
- Wait until WebRTC has gathered every candidate, then send the complete SDP.
- Send each ICE candidate through the signalling server as soon as it is discovered, gradually extending the SDP (known as trickle ICE).
- WebRTC then runs connectivity checks on pairs of local and remote candidates and picks the most viable pair that works.
- NAT (Network Address Translation): As networking has evolved, most machines have come to connect to the internet through a NAT layer. What does this imply? It means that the private IP/port of a machine behind NAT is translated to a different public IP/port when its traffic passes through the router.
WebRTC is designed to establish a direct connection between two parties, but NAT complicates this. A machine behind NAT does not know its own public address, and many NATs drop incoming packets that were not solicited from inside. How easily a direct connection can be established depends on the NAT type: full cone NAT, restricted cone NAT, port-restricted cone NAT or symmetric NAT. WebRTC applications use STUN servers to let a peer discover its public IP/port. When a direct connection cannot be established (for example, behind a symmetric NAT), they fall back to TURN servers, which relay media between the browsers.
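Which candidate is preferred is not arbitrary. As an illustration, the sketch below ranks one peer’s candidates using the priority formula and the recommended type preferences from RFC 8445 (ICE). Under these preferences, a host candidate (a local interface) outranks a server-reflexive one (a public address learned from a STUN server), which in turn outranks a relayed one (an address on a TURN server). This is a simplification:

- Real ICE pairs local and remote candidates and runs connectivity checks on those pairs.
- The helper names are hypothetical.
- The addresses come from documentation ranges.

The computed values match the priority field in the example candidate lines.

```typescript
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Recommended type preferences from RFC 8445.
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,  // address of a local network interface
  prflx: 110, // peer-reflexive: learned during connectivity checks
  srflx: 100, // server-reflexive: public IP/port as seen by a STUN server
  relay: 0,   // relayed: an address allocated on a TURN server
};

// RFC 8445 recommended formula:
// priority = 2^24 * typePref + 2^8 * localPref + (256 - componentId)
function icePriority(type: CandidateType, localPref = 65535, componentId = 1): number {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPref + (256 - componentId);
}

interface Candidate {
  address: string;
  port: number;
  type: CandidateType;
}

// Parses "candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ..."
function parseCandidate(line: string): Candidate {
  const parts = line.replace(/^a=/, "").split(" ");
  const typIndex = parts.indexOf("typ");
  const type = parts[typIndex + 1];
  if (typIndex < 0 || !(type in TYPE_PREFERENCE)) {
    throw new Error(`unsupported candidate line: ${line}`);
  }
  return { address: parts[4], port: Number(parts[5]), type: type as CandidateType };
}

// Orders one peer's candidates by priority, highest first.
function rankCandidates(lines: string[]): Array<Candidate & { priority: number }> {
  return lines
    .map(parseCandidate)
    .map((c) => ({ ...c, priority: icePriority(c.type) }))
    .sort((a, b) => b.priority - a.priority);
}

// Example candidates using documentation address ranges (illustrative only).
const ranked = rankCandidates([
  "candidate:3 1 udp 16777215 198.51.100.7 61000 typ relay raddr 203.0.113.5 rport 50000",
  "candidate:1 1 udp 2130706431 192.168.1.10 50000 typ host",
  "candidate:2 1 udp 1694498815 203.0.113.5 50000 typ srflx raddr 192.168.1.10 rport 50000",
]);
// host 2130706431, then srflx 1694498815, then relay 16777215
console.log(ranked.map((c) => `${c.type} ${c.priority}`));
```

The relayed candidate ranks last because every packet has to travel through the TURN server. It is still what keeps the call working when no direct path exists.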
Types of WebRTC Architecture
There are three main types of WebRTC architecture:
- Peer to peer
- Multi-point conferencing units
- Selective forwarding units
Each architecture fits well in different scenarios and comes with its own set of strengths and weaknesses. Let’s walk through each of them one by one.
#1. Peer to peer architecture
We have spoken at length about peer-to-peer communication and how it is designed for direct communication between two participants. However, this WebRTC architecture has its own merits and demerits.
Advantages of peer to peer architecture
Peer-to-peer architecture is simple to implement and has a low operating cost, as the backend infrastructure is minimal. It also offers strong security between participants. WebRTC mandates encryption (DTLS-SRTP) for all media, and because no media server sits in between, the data is encrypted end to end with no intermediary holding the keys. Even a TURN relay only forwards packets it cannot decrypt.
Disadvantages of peer to peer architecture
The peer-to-peer WebRTC architecture is not a good fit for multiparty calls. In a multiparty call each participant must send its media to every other participant. This takes a significant amount of uplink bandwidth. It also puts a significant computational load on each client device, which typically has to encode the same stream once per recipient. The cost grows with every participant added, which makes peer-to-peer architecture a misfit for larger calls.
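A quick calculation shows why. The sketch assumes a full mesh in which every participant sends a separate stream to every other participant, with no simulcast and no media server. Under that assumption, the number of peer connections grows quadratically and each participant’s upload grows linearly with the size of the call. The 1500 kbps per-stream bitrate is an assumed figure, and meshCost is a hypothetical helper.

```typescript
// Rough cost of a full-mesh P2P call: every participant sends its own stream
// to every other participant (no simulcast, no media server).
interface MeshCost {
  connections: number;   // peer connections in the whole call
  uplinkStreams: number; // streams each participant encodes and sends
  uplinkKbps: number;    // upload bandwidth each participant needs
}

function meshCost(participants: number, streamKbps: number): MeshCost {
  const others = participants - 1;
  return {
    connections: (participants * others) / 2,
    uplinkStreams: others,
    uplinkKbps: others * streamKbps,
  };
}

// 1500 kbps per video stream is an assumed figure for illustration.
for (const n of [2, 4, 8]) {
  const cost = meshCost(n, 1500);
  console.log(
    `${n} participants: ${cost.connections} connections, ` +
      `${cost.uplinkStreams} uplink streams, ${cost.uplinkKbps} kbps up each`,
  );
}
// Last line printed: "8 participants: 28 connections, 7 uplink streams, 10500 kbps up each"
```

Going from two to eight participants multiplies each person’s upload by seven and the number of connections by twenty-eight.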
#2. Multi-point conferencing unit
Multipoint Conferencing Units (MCUs) have been used for years with legacy conferencing systems. In the MCU architecture, each conference participant sends his or her stream to the MCU. The MCU decodes each received stream, rescales it, composes a new stream from all the received streams, encodes it, and sends a single stream to each participant. MCUs are a great fit for multiparty calls in cases where legacy systems are still in use.
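The compose step is where the MCU decides where each participant appears in the single outgoing picture. The sketch below shows one common layout choice, an even grid. The function name and the 1280x720 output size are assumptions for illustration. The decoding, rescaling and encoding steps would be handled by a media library and are not shown.

```typescript
interface Tile {
  participant: string;
  x: number;      // left edge in the output frame, in pixels
  y: number;      // top edge in the output frame, in pixels
  width: number;
  height: number;
}

// Places each decoded input stream on an even grid inside one output frame.
function gridLayout(participants: string[], width: number, height: number): Tile[] {
  const n = participants.length;
  if (n === 0) return [];
  const cols = Math.ceil(Math.sqrt(n)); // e.g. 5 streams -> 3 columns
  const rows = Math.ceil(n / cols);     // ... and 2 rows
  const tileW = Math.floor(width / cols);
  const tileH = Math.floor(height / rows);
  return participants.map((participant, i) => ({
    participant,
    x: (i % cols) * tileW,
    y: Math.floor(i / cols) * tileH,
    width: tileW,
    height: tileH,
  }));
}

// Five participants composed into an assumed 1280x720 output frame:
// 3 columns x 2 rows of 426x360 tiles.
console.log(gridLayout(["a", "b", "c", "d", "e"], 1280, 720));
```

Because the MCU re-encodes the composed frame, it could also compute a different layout for each recipient (for example, leaving out the recipient’s own tile). This is part of why MCUs are flexible, and part of why they are expensive to run.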
Advantages of MCUs
The MCU approach requires little or no intelligence in device endpoints, as the logic is located in the MCU itself. Because the MCU encodes the outgoing streams itself, it can also generate streams of different quality for different participants, depending on their specific downlink conditions. This makes MCUs a reliable choice for low-capacity networks. No wonder the MCU approach has been widely used for many years. It remains a popular choice with organisations that still run part of their communications on legacy systems.
Disadvantages of MCUs
MCUs can introduce higher latency, because decoding, composing and re-encoding media for each participant takes time. Besides, media quality may suffer when packets are lost on one of the links, since the MCU must wait for a complete frame before it can compose and encode.
#3. Selective forwarding unit
In the Selective Forwarding Unit (SFU) architecture, every participant sends its media stream to a central server (the SFU) and receives the streams of all other participants via the same server. A participant can also send multiple versions of its stream to the SFU, for example the same video at several qualities, known as simulcast. The SFU then decides which streams to forward to each of the other call participants. This flexibility has made the SFU one of the most popular WebRTC architectures in the modern business landscape.
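The forwarding decision can be sketched as follows. The sketch rests on two assumptions:

- Each sender publishes its video as three simulcast layers with hypothetical names and bitrates: low at 150 kbps, mid at 500 kbps and high at 1500 kbps.
- The SFU uses a deliberately naive policy. It splits each subscriber’s downlink bandwidth evenly across the streams that subscriber receives, then forwards the best layer that fits.

Real SFUs use bandwidth estimation and richer policies. The point here is that the server only chooses which already-encoded packets to forward and never decodes them.

```typescript
// Hypothetical simulcast layers each sender publishes to the SFU.
interface Layer {
  name: string;
  kbps: number;
}

const PUBLISHED: Layer[] = [
  { name: "low", kbps: 150 },
  { name: "mid", kbps: 500 },
  { name: "high", kbps: 1500 },
];

// Picks the highest-bitrate layer that fits the available bandwidth,
// or null if none fits.
function chooseLayer(layers: Layer[], availableKbps: number): Layer | null {
  let best: Layer | null = null;
  for (const layer of layers) {
    if (layer.kbps <= availableKbps && (best === null || layer.kbps > best.kbps)) {
      best = layer;
    }
  }
  return best;
}

// For every subscriber, decides which layer of each other sender to forward.
// Returns plan[subscriber][sender] = layer name, or "paused" if nothing fits.
function planForwarding(
  senders: string[],
  subscriberDownlinkKbps: Record<string, number>,
): Record<string, Record<string, string>> {
  const plan: Record<string, Record<string, string>> = {};
  for (const subscriber of Object.keys(subscriberDownlinkKbps)) {
    const sources = senders.filter((s) => s !== subscriber);
    if (sources.length === 0) continue;
    // Naive policy: split the subscriber's downlink evenly across its sources.
    const perStream = subscriberDownlinkKbps[subscriber] / sources.length;
    plan[subscriber] = {};
    for (const sender of sources) {
      const layer = chooseLayer(PUBLISHED, perStream);
      plan[subscriber][sender] = layer ? layer.name : "paused";
    }
  }
  return plan;
}

const plan = planForwarding(["alice", "bob", "carol"], { alice: 4000, bob: 1200, carol: 200 });
console.log(plan);
```

In this example Alice has a fast connection and receives the high layer from both peers. Bob receives the mid layer. Carol’s 100 kbps share per stream is too small for any layer, so her video is paused. Each sender still uploads only its own layers, which is why SFUs cope well with asymmetric bandwidth.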
Advantages of SFU
An SFU does not decode and re-encode received streams. It simply forwards them between call participants. As a result, device endpoints need more intelligence and computing power than in the MCU architecture, while the server needs far less. SFU architecture works well with asymmetric bandwidth, since each participant uploads its own stream but downloads several, and adding more streams is fairly easy. Because each client receives separate streams and arranges them itself, SFUs also support a variety of screen layouts.
Disadvantages of SFU
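One of the biggest challenges of SFU architecture is recording. Because the SFU never decodes the streams, producing a single mixed server-side recording requires an additional component to decode and compose them. SFUs also require more downlink bandwidth on each client than MCU-based solutions, since every participant receives a separate stream from each of the others.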
Which WebRTC architecture seems the most relevant in the current business landscape?
Like clothes, when it comes to WebRTC architecture there is no one-size-fits-all. P2P, MCU and SFU each have their own merits and demerits, as discussed at length above. Depending on a business’s requirements, one architecture might suit it better than the others.
- Though P2P architecture is cost-effective and simple to implement, it does not scale well to multiple participants on a single call. P2P-based WebRTC applications provide only direct media communication between WebRTC endpoints and do not work with legacy, non-WebRTC-capable endpoints.
- MCU architecture can act as a WebRTC gateway to legacy systems. Your business may need a service with features such as computer vision, speech analytics or media recording that also works with legacy systems. If so, MCU is the ideal fit, as such capabilities always require a central server that processes the media.
- SFU architecture lets you scale your communication capabilities while needing less computing power on the server, since much of the processing is delegated to the endpoints. However, SFU architecture requires more network bandwidth because of the high number of media streams being exchanged.
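The trade-offs in the list above largely come down to how many streams each client and the server must handle. The sketch below counts them for an n-party video call under three simplifying assumptions:

- Each sender sends one stream.
- No simulcast is used.
- The MCU encodes a separate composite for each participant (a shared composite is also possible).

The function and field names are hypothetical.

```typescript
type Architecture = "p2p-mesh" | "mcu" | "sfu";

interface StreamLoad {
  upPerClient: number;   // streams each client sends
  downPerClient: number; // streams each client receives
  serverDecodes: number; // streams the server must decode
  serverEncodes: number; // streams the server must encode
}

// Simplified stream counts for an n-party video call (assumptions above).
function streamLoad(arch: Architecture, n: number): StreamLoad {
  switch (arch) {
    case "p2p-mesh":
      // No server in the media path; every client talks to every other client.
      return { upPerClient: n - 1, downPerClient: n - 1, serverDecodes: 0, serverEncodes: 0 };
    case "mcu":
      // Server decodes every input and encodes one composite per participant.
      return { upPerClient: 1, downPerClient: 1, serverDecodes: n, serverEncodes: n };
    case "sfu":
      // Server only forwards; each client receives everyone else's stream.
      return { upPerClient: 1, downPerClient: n - 1, serverDecodes: 0, serverEncodes: 0 };
    default:
      throw new Error(`unknown architecture: ${arch}`);
  }
}

for (const arch of ["p2p-mesh", "mcu", "sfu"] as Architecture[]) {
  console.log(arch, streamLoad(arch, 5));
}
```

For a five-person call, each SFU client uploads one stream but downloads four. Each MCU client handles one stream in each direction, at the cost of five decodes and five encodes on the server.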
One thing is clear: none of the WebRTC architectures discussed here is superior to the others in every situation. At the end of the day, choose the one that fits you well, fulfils your requirements, and makes you feel confident and in control. What happens behind the scenes of WebRTC may look complex, but from a user’s standpoint it is a fairly easy-to-use way to start a conversation between two browsers. The telecommunication and IT space is evolving at a rapid pace, and we can expect many path-breaking developments in the future. The future is exciting, with endless possibilities!