This post is also available in 简体中文, 日本語 and Español.
In the last two years, there has been a rapid rise in real-time apps that help groups of people get together virtually with near-zero latency. User expectations have also increased: your users expect real-time video and audio features to work flawlessly. We found that developers building real-time apps want to spend less time building and maintaining low-level infrastructure. Developers also told us they want to spend more time building features that truly make their idea special.
So today, we are announcing a new product that lets developers build real-time audio/video apps. Cloudflare Calls exposes a set of APIs that allows you to build things like:
- A video conferencing app with a custom UI
- An interactive conversation where the moderators can invite select audience members “on stage” as speakers
- A privacy-first group workout app where only the instructor can view all the participants while the participants can only view the instructor
- Remote 'fireside chats' where one or multiple people can have a video call with an audience of 10,000+ people in real time (<100ms delay)
The protocol that makes all this possible is WebRTC. And Cloudflare Calls is the product that abstracts away the complexity by turning the Cloudflare network into a “super peer,” helping you build reliable and secure real-time experiences.
What is WebRTC?
WebRTC is a peer-to-peer protocol that enables two or more users’ devices to talk to each other directly and without leaving the browser. In a native implementation, peer-to-peer typically works well for 1:1 calls with only two participants. But as you add additional participants, it is common for participants to experience reliability issues, including video freezes and participants getting out of sync. Why? Because as the number of participants increases, the coordination overhead between users’ devices also increases. Each participant needs to send media to each other participant, increasing the data consumption from each computer exponentially.
A selective forwarding unit (SFU) solves this problem. An SFU is a system that connects users with each other in real-time apps by intelligently managing and routing video and audio data between the participants. Apps that use an SFU reduce the data capacity required from each user because each user doesn’t have to send data to every other user. SFUs are required parts of a real-time application when the applications need to determine who is currently speaking or when they want to send appropriate resolution video when WebRTC simulcast is used.
The centralized nature of an SFU is also its weakness. A centralized WebRTC server needs a region, which means that it will be slow in most parts of the world for most users while being fast for only a few select regions.
Typically, SFUs are built on public clouds. They consume a lot of bandwidth by both receiving and sending high resolution media to many devices. And they come with significant devops overhead requiring your team to manually configure regions and scalability.
We realized that merely offering an SFU-as-a-service wouldn’t solve the problem of cost and bandwidth efficiency.
Biggest WebRTC server in the world
When you are on a five-person video call powered by a classic WebRTC implementation, each person’s device talks directly with each other. In WebRTC parlance, each of the five participants is called a peer. And the reliability of the five-person call will only be as good as the reliability of the person (or peer) with the weakest Internet connection.
We built Calls with a simple premise: “What if Cloudflare could act as a WebRTC peer?”. Calls is a “super peer” or a “giant server that spans the whole world” allows applications to be built beyond the limitations of the lowest common denominator peer or a centralized SFU. Developers can focus on the strength of their app instead of trying to compensate for the weaknesses of the weakest peer in a p2p topology.
Calls does not use the traditional SFU topology where every participant connects to a centralized server in a single location. Instead, each participant connects to their local Cloudflare data center. When another participant wants to retrieve that media, the datacenter that homes that original media stream is found and the tracks are forwarded between datacenters automatically. If two participants are physically close their media does not travel around the world to a centralized region, instead they use the same datacenter, greatly reducing latency and improving reliability.
Calls is a configurable, global, regionless WebRTC server that is the size of Cloudflare's ever-growing network. The WebRTC protocol enables peers to send and receive media tracks. When you are on a video call, your computer is typically sending two tracks: one that contains the audio of you speaking and another that contains the video stream from your camera. Calls implements the WebRTC RTCPeerConnection API across the Cloudflare Network where users can push media tracks. Calls also exposes an API where other media tracks can be requested within the same Peer Connection context.
Cloudflare Calls will be a good solution if you operate your own WebRTC server such as Janus or MediaSoup. Cloudflare Calls can also replace existing deployments of Janus or MediaSoup, especially in cases where you have clients connecting globally to a single, centralized deployment.
Building and maintaining your own real-time infrastructure comes with unique architecture and scaling challenges. It requires you to answer and constantly revise your answers to thorny questions such as “which regions do we support?”, “how many users do we need to justify spinning up more infrastructure in yet another cloud region?”, “how do we scale for unplanned spikes in usage?” and “how do we not lose money during low-usage hours of our infrastructure?” when you run your own WebRTC server infrastructure.
Cloudflare Calls eliminates the need to answer these questions. Calls uses anycast for every connection, so every packet is always routed to the closest Cloudflare location. It is global by nature: your users are automatically served from a location close to them. Calls scales with your use and your team doesn’t have to build its own auto-scaling logic.
Calls runs on every Cloudflare location and every single Cloudflare server. Because the Cloudflare network is within 10 milliseconds of 90% of the world’s population, it does not add any noticeable latency.
Answer “where’s the problem?”, only faster
When we talk to customers with existing WebRTC workloads, there is one consistent theme: customers wish it was easier to troubleshoot issues. When a group of people are talking over a video call, the stakes are much higher when users experience issues. When a web page fails to load, it is common for users to simply retry after a few minutes. When a video call is disruptive, it is often the end of the call.
Cloudflare Calls’ focus on observability will help customers get to the bottom of the issues faster. Because Calls is built on Cloudflare’s infrastructure, we have end-to-end visibility from all layers of the OSI model.
Calls provides a server side view of the WebRTC Statistics API, so you can drill into issues each Peer Connection and the flow of media within without depending only on data sent from clients. We chose this because the Statistics API is a standardized place developers are used to getting information about their experience. It is the same API available in browsers, and you might already be using it today to gain insight into the performance of your WebRTC connections.
Privacy and security at the core
Calls eliminates the need for participants to share information such as their IP address with each other. Let’s say you are building an app that connects therapists and patients via video calls. With a traditional WebRTC implementation, both the patient and therapist’s devices would talk directly with each other, leading to exposure of potentially sensitive data such as the IP address. Exposure of information such as the IP address can leave your users vulnerable to denial-of-service attacks.
When using Calls, you are still using WebRTC, but the individual participants are connecting to the Cloudflare network. If four people are on a video call powered by Cloudflare Calls, each of the four participants' devices will be talking only with the Cloudflare network. To your end users, the experience will feel just like a peer-to-peer call, only with added security and privacy upside.
Finally, all video and audio traffic that passes through Cloudflare Calls is encrypted by default. Calls leverages existing Cloudflare products including Argo to route the video and audio content in a secure and efficient manner. Calls API enables granular controls that cannot be implemented with vanilla WebRTC alone. When you build using Calls, you are only limited by your imagination; not the technology.
We’re releasing Cloudflare Calls in closed beta today. To try out Cloudflare Calls, request an invitation and check your inbox in coming weeks.
Calls will be free during the beta period. We're looking to work with early customers who want to take Calls from beta to generally available with us. If you are building a real-time video app today, having challenges scaling traditional WebRTC infrastructure, or just have a great idea you want to explore, leave a comment when you are requesting an invitation, and we’ll reach out.