WebRTC Architecture and How It Works

Technology has often been the key driver of change in our day-to-day lives. One technology that has helped make remote and online work possible is WebRTC. Many companies have seen the merits of online work, collaboration and communication, and some have moved to working online permanently. But what is WebRTC, and what architecture behind the scenes makes it all so seamless? Let’s find out.

De-coding WebRTC

Simply put, WebRTC (Web Real-Time Communication) is a way to communicate in real time over the web. It is primarily used for peer-to-peer connections, where a connection is established directly between two browsers. Users can share different forms of data such as audio, live video streams, and more. Browser-based video calling, as in Google Meet, is a classic example of this form of communication. From its inception, WebRTC was designed to enable direct communication between browsers.

While all of this appears seamless on the surface, a complex WebRTC architecture runs the show in the background. Any WebRTC application needs at least a basic server-side infrastructure for exchanging signalling messages. WebRTC apps that support media exchange, like Tragofone, involve a more complex behind-the-scenes architecture.

The diagram below depicts a typical WebRTC architecture:

WebRTC Security Architecture: How does it work and what are the components involved?

#1. Peer to peer connection

Most WebRTC apps are based on a P2P (peer-to-peer) architecture. In a P2P connection, participants transfer data directly from one end to the other, in most cases without relying on a middleman. Even if one call participant disconnects or is dropped for whatever reason, the other participants can keep sharing data. This makes the WebRTC architecture popular over traditional communication technologies, where users can’t continue sharing data if the server connection is lost. Also, because data travels directly between peers rather than detouring through a central server, it often covers a shorter network path.

#2. Signalling server

Once a call is connected and in progress, the call initiator or admin needs to keep track of people who join or leave the conversation, and create or dispose of connections respectively. A signalling server helps keep track of these events. It also facilitates the initial connection between two or more peers who would like to communicate with each other. A signalling server is required at the time of call initiation; it is not required during an ongoing communication. However, one may still use it to keep track of events like a peer disconnecting midway. WebRTC does not prescribe how signalling is implemented, so there are multiple ways to build a signalling server; the only prerequisite is that it acts as a bridge between the peers.
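To make the bridging role concrete, here is a minimal in-memory sketch in Python. The class name SignallingRoom, its methods, and the message shapes are hypothetical and only illustrate the idea; a real deployment would typically carry these messages over WebSockets or a similar transport.

```python
from collections import defaultdict


class SignallingRoom:
    """Toy in-memory signalling relay: tracks peers and forwards messages."""

    def __init__(self):
        self.peers = set()
        self.inbox = defaultdict(list)  # peer id -> queued messages

    def join(self, peer_id):
        # Tell existing peers so they can create a connection to the newcomer.
        for other in self.peers:
            self.inbox[other].append({"type": "peer-joined", "from": peer_id})
        self.peers.add(peer_id)

    def leave(self, peer_id):
        self.peers.discard(peer_id)
        self.inbox.pop(peer_id, None)
        # Tell remaining peers so they can dispose of the stale connection.
        for other in self.peers:
            self.inbox[other].append({"type": "peer-left", "from": peer_id})

    def relay(self, sender, recipient, payload):
        # The relay never inspects the payload (offer, answer, ICE candidate).
        if recipient not in self.peers:
            raise KeyError(f"unknown peer: {recipient}")
        self.inbox[recipient].append(
            {"type": "signal", "from": sender, "payload": payload}
        )


room = SignallingRoom()
room.join("alice")
room.join("bob")
room.relay("alice", "bob", {"kind": "offer", "sdp": "v=0"})
room.leave("alice")
print([message["type"] for message in room.inbox["bob"]])
```

Note that media never passes through this object. It only carries the small setup messages and the join/leave events described above.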

#3. SDP (Session Description Protocol)

You initiated a call and a peer joined the conversation, but how do the two sides establish a connection and exchange information without knowing about each other’s systems? SDP comes to the rescue. An SDP message describes details like which agent a peer is using, the codecs and media formats the peer supports, the kind of media it would like to exchange, and more. Each SDP message represents either an offer or an answer.

Offer/Answer: When initiating a connection request, we are actually making an offer, for which we should get an answer in return. Offer/answer is bi-directional, meaning it does not matter which side initiates the connection; the outcome will be the same.
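An SDP message is plain text made of type=value lines. The sketch below parses a shortened, hand-written sample offer and lists the codecs advertised in each media section. The sample text and the function name summarise_sdp are illustrative assumptions. A real browser-generated offer contains many more attributes.

```python
SAMPLE_OFFER = """\
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 VP8/90000
"""


def summarise_sdp(sdp):
    """Return {media kind: [codec names]} for each m= section."""
    media = {}
    current = None
    for line in sdp.splitlines():
        key, _, value = line.partition("=")
        if key == "m":
            current = value.split()[0]  # "audio", "video", ...
            media[current] = []
        elif key == "a" and current and value.startswith("rtpmap:"):
            # a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]
            codec = value.split(" ", 1)[1].split("/")[0]
            media[current].append(codec)
    return media


print(summarise_sdp(SAMPLE_OFFER))  # {'audio': ['opus'], 'video': ['VP8']}
```

The answering peer would reply with an SDP of the same shape, listing only the media and codecs it is willing to use.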

ICE (Interactive Connectivity Establishment) candidates: A peer may be reachable over multiple communication transports, such as multiple private IPs/ports, multiple public IPs/ports, various protocols, or one or more relay servers. Once an SDP offer is created, WebRTC tries to find every possible transport address for the browser; each one is termed an ICE candidate. An ICE candidate is expressed as an attribute line (a=candidate) that should be added to the SDP. There are two ways to do this:
WebRTC finds every possible candidate first and sends a complete SDP.
Each detected ICE candidate is sent through the signalling server as it is found, gradually extending the SDP (known as trickle ICE).
ICE then checks the candidates and picks the most viable option.
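As a rough illustration of how candidates are ranked, the sketch below parses a few hand-written candidate lines and recomputes their priority with the formula and recommended type preferences from RFC 8445. The addresses are documentation-range examples and the helper names are hypothetical. Real ICE goes further and ranks pairs of local and remote candidates, then runs connectivity checks on them, which is not shown here.

```python
# Type preferences recommended by RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_pref=65535, component_id=1):
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def parse_candidate(line):
    """Parse the leading fields of an SDP a=candidate attribute line."""
    fields = line.split(":", 1)[1].split()
    return {
        "transport": fields[2],
        "priority": int(fields[3]),
        "address": fields[4],
        "port": int(fields[5]),
        "type": fields[7],  # fields[6] is the literal "typ"
    }


lines = [
    "a=candidate:3 1 udp 16777215 198.51.100.7 50000 typ relay",
    "a=candidate:1 1 udp 2130706431 192.168.1.10 54321 typ host",
    "a=candidate:2 1 udp 1694498815 203.0.113.5 46154 typ srflx",
]

ranked = sorted((parse_candidate(l) for l in lines),
                key=lambda c: c["priority"], reverse=True)
for cand in ranked:
    print(cand["type"], cand["address"], cand["priority"])
```

With these preferences a direct host address ranks first, a public (server-reflexive) address second, and a relayed address last, so a relay is only chosen when nothing more direct works.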

NAT (Network Address Translation): Internet and networking have evolved over the years, and most machines are now connected to the global network through a NAT layer. What does this imply? It means that the private IP/port of a machine behind NAT is translated to a different public IP/port when its traffic passes through the router.

WebRTC is designed to establish a direct connection between two parties, but because of the NAT layer neither party can simply be reached at its private address, which results in some complications. Different NAT configurations (full cone NAT, restricted cone NAT, port restricted cone NAT, symmetric NAT) affect how, and whether, a direct connection can be established. WebRTC applications use STUN servers to discover a machine’s public IP/port, and TURN servers to relay media data between browsers when a direct connection cannot be established.
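The toy model below sketches why the NAT type matters. It compares the public address a lookup server would see with the address a remote peer would see (the lookup role is what a STUN server plays in practice, reporting back the public address it observes). The class ToyNat, the host name stun.example.org, and all addresses are assumptions for illustration. The model covers only address mapping, not the filtering rules that distinguish the restricted cone variants.

```python
class ToyNat:
    """Toy NAT: maps private (ip, port) to a public port on one public IP."""

    def __init__(self, public_ip, symmetric=False, first_port=40000):
        self.public_ip = public_ip
        self.symmetric = symmetric
        self.next_port = first_port
        self.mappings = {}

    def translate(self, private_addr, destination):
        # A cone NAT reuses one mapping per private address; a symmetric NAT
        # creates a separate mapping for every destination.
        key = (private_addr, destination) if self.symmetric else private_addr
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        return (self.public_ip, self.mappings[key])


def discovered_address_is_reusable(nat):
    """True if the address a lookup server sees is also what a peer would see."""
    private = ("192.168.1.10", 5000)
    seen_by_server = nat.translate(private, ("stun.example.org", 3478))
    seen_by_peer = nat.translate(private, ("198.51.100.7", 6000))
    return seen_by_server == seen_by_peer


for label, nat in [("full cone", ToyNat("203.0.113.5")),
                   ("symmetric", ToyNat("203.0.113.5", symmetric=True))]:
    ok = discovered_address_is_reusable(nat)
    print(label, "-> direct connection plausible" if ok else "-> relay via TURN likely")
```

In the symmetric case the address learned from the lookup is useless to the remote peer, which is the situation where relaying media through a TURN server becomes the fallback.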

Types of WebRTC Architecture

There are three main types of WebRTC architecture.

Peer to peer
Multi-point conferencing units
Selective forwarding units

Each architecture fits well in different scenarios and comes with its own set of strengths and weaknesses. Let’s walk through each of them one by one.

#1. Peer to peer architecture

We have spoken at length about peer-to-peer communication and how it is designed for direct communication between two participants. Like any WebRTC architecture, it has its own merits and demerits.

Advantages of peer to peer architecture

Peer to peer architecture is simple to implement and has a low application operating cost, as the backend infrastructure is minimal. The architecture also provides end-to-end security between participants: WebRTC always encrypts media in transit, and because no intermediary sits between the peers, no server ever needs to decrypt the data being exchanged.

Disadvantages of peer to peer architecture

The peer-to-peer WebRTC architecture is not a good fit for multiparty calls. In a multiparty call, each participant shares media content with all other participants on the call. This requires a significant amount of uplink bandwidth and involves significant computational cost for each client device, as it must encode the same stream multiple times, which makes peer-to-peer architecture a misfit.
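A quick back-of-the-envelope calculation shows how fast this cost grows. The function name mesh_cost and the 1500 kbps per-stream bitrate are illustrative assumptions, not measured figures.

```python
def mesh_cost(participants, stream_kbps):
    """Per-client and whole-call cost of a full-mesh (P2P) call."""
    others = participants - 1
    return {
        "encodes_per_client": others,               # one encode per remote peer
        "uplink_kbps_per_client": others * stream_kbps,
        "connections_in_call": participants * others // 2,
    }


for n in (2, 4, 8):
    print(n, mesh_cost(n, stream_kbps=1500))
```

Under these assumptions a two-party call needs 1500 kbps of uplink per client, while an eight-party call needs 10500 kbps per client across 28 separate connections.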

#2. Multi-point conferencing unit

Multipoint Conferencing Units (MCUs) have been used for years in conjunction with legacy conferencing systems. In the MCU architecture, each conference participant sends a stream to the MCU, which decodes each received stream, rescales it, composes a new stream from all received streams, encodes it, and sends a single stream to all other participants. MCUs are a great fit for multi-party calls in cases where legacy systems are still in use.

Advantages of MCUs

The MCU approach requires little or no intelligence in device endpoints, as the logic is located in the MCU itself. Because the MCU re-encodes media, it can also generate output streams of different quality for different participants depending on their specific downlink conditions, making MCUs a reliable choice for low-capacity networks. No wonder the MCU approach has been widely used for many years and remains a popular choice with establishments that still run part of their communications on legacy systems.

Disadvantages of MCUs

MCUs can lead to higher lag times, as recomposing media to be sent to different channels takes time. Besides, media quality may be compromised by packet loss on one of the links, as the MCU must wait for a complete frame before it can encode.

#3. Selective forwarding unit

In the Selective Forwarding Unit (SFU) architecture, every participant sends a media stream to a centralized server (the SFU) and receives streams from all other participants via the same server. The architecture also enables a participant to send multiple media streams to the SFU, which then decides which of them to forward to the other call participants. The SFU architecture is one of the most popular WebRTC architectures in use in the modern business landscape.
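The forwarding decision can be sketched as follows, assuming a sender uploads the same video at several bitrates and the SFU knows an estimated downlink budget for each receiver. The layer names, bitrates, and function names are hypothetical, and production SFUs use far more elaborate bandwidth estimation.

```python
# Hypothetical layers a sender uploads to the SFU, in kbps.
LAYERS = {"low": 150, "medium": 500, "high": 1500}


def choose_layer(downlink_kbps, layers=LAYERS):
    """Pick the highest-bitrate layer that fits the receiver's downlink budget."""
    fitting = [name for name, kbps in layers.items() if kbps <= downlink_kbps]
    if not fitting:
        return None  # nothing fits; a real SFU might forward audio only
    return max(fitting, key=lambda name: layers[name])


def forwarding_plan(sender, receivers):
    """Map each receiver to the layer of `sender` that would be forwarded."""
    return {
        name: choose_layer(budget)
        for name, budget in receivers.items()
        if name != sender
    }


plan = forwarding_plan(
    "alice", {"alice": 3000, "bob": 2000, "carol": 600, "dave": 100}
)
print(plan)  # {'bob': 'high', 'carol': 'medium', 'dave': None}
```

The server only selects and forwards; it never decodes or re-encodes the media, which is what keeps its computing cost low.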

Advantages of SFU

An SFU does not decode and re-encode received streams; it simply forwards streams between call participants. Compared with the MCU architecture, device endpoints in an SFU setup are more intelligent and do more of the computing. The SFU architecture works well with asymmetric bandwidth, and adding more streams is fairly easy. It also supports various screen layouts.

Disadvantages of SFU

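One of the biggest challenges of SFU architecture is server-side recording: because the SFU never decodes or composes streams, there is no single mixed stream to record. It also requires higher downlink bandwidth on each client than MCU-based video conferencing solutions, since every client receives a separate stream from each participant.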

Which WebRTC architecture seems the most relevant in the current business landscape?

Like clothes, when it comes to WebRTC architecture there is no one size fits all. P2P, MCU, and SFU each have their own merits and demerits, as discussed at length above. Depending on what a particular business requires, an architecture that suits one may not necessarily suit another.

Though P2P architecture is cost-effective and simple to implement, it does not scale well with multiple participants on a single call. P2P-based WebRTC applications provide only direct media communication between two WebRTC endpoints and do not work for legacy, non-WebRTC-capable endpoints.
MCU architecture acts as a WebRTC gateway to legacy systems. If your business requires you to build a service that involves features such as computer vision, speech analytics, or media recording, and that can work in conjunction with legacy systems, then MCU is the ideal fit for you, as such capabilities will always require a central server to provide support.
SFU architecture supports scaling your communication capabilities and needs less computing power on the server, since the computing requirements are delegated to the endpoints. However, SFU architecture requires high network bandwidth because of the high number of media streams being exchanged.
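To compare the three options side by side, the sketch below counts how many media streams one client sends and receives under each architecture. It is a simplified model that assumes every participant publishes exactly one stream; the function name is hypothetical.

```python
def streams_per_client(architecture, participants):
    """Return (streams sent, streams received) by one client."""
    others = participants - 1
    if architecture == "p2p":
        return (others, others)  # one copy to, and one from, each peer
    if architecture == "mcu":
        return (1, 1)            # one up, one composed stream down
    if architecture == "sfu":
        return (1, others)       # one up, every other stream forwarded down
    raise ValueError(f"unknown architecture: {architecture}")


for arch in ("p2p", "mcu", "sfu"):
    sent, received = streams_per_client(arch, participants=5)
    print(f"{arch}: sends {sent}, receives {received}")
```

For a five-party call this gives 4 up and 4 down for P2P, 1 and 1 for MCU, and 1 up and 4 down for SFU, which matches the trade-offs described above: P2P strains the uplink, MCU shifts the work to the server, and SFU relies on client downlink.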

Looking forward

One thing is clear: none of the WebRTC architectures discussed here can be deemed superior to the others. At the end of the day, choose the one that fits you well, fulfils your requirements, and makes you feel confident and in control. Though WebRTC may look complex behind the scenes, from a user’s standpoint it remains a fairly easy-to-use conversation starter between two browsers. The telecommunication and IT space is evolving at a rapid pace, and we can expect a lot of path-breaking discoveries in the future. The future is exciting, with endless possibilities!

Maulik Shah
