
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sun, 05 Apr 2026 15:54:44 GMT</lastBuildDate>
        <item>
            <title><![CDATA[MoQ: Refactoring the Internet's real-time media stack]]></title>
            <link>https://blog.cloudflare.com/moq/</link>
            <pubDate>Fri, 22 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Media over QUIC (MoQ) is a new protocol being developed at the IETF that resolves the long-standing tension between latency, scale, and complexity, creating a single foundation for sub-second, interactive streaming at a global scale.
 ]]></description>
            <content:encoded><![CDATA[ <p>For over two decades, we've built real-time communication on the Internet using a patchwork of specialized tools. RTMP gave us ingest. <a href="https://www.cloudflare.com/learning/video/what-is-http-live-streaming/"><u>HLS</u></a> and <a href="https://www.mpeg.org/standards/MPEG-DASH/"><u>DASH</u></a> gave us scale. WebRTC gave us interactivity. Each solved a specific problem for its time, and together they power the global streaming ecosystem we rely on today.</p><p>But using them together in 2025 feels like building a modern application with tools from different eras. The seams are starting to show—in complexity, in latency, and in the flexibility needed for the next generation of applications, from sub-second live auctions to massive interactive events. We're often forced to make painful trade-offs between latency, scale, and operational complexity.</p><p>Today Cloudflare is launching the first Media over QUIC (MoQ) relay network, running on every Cloudflare server in datacenters in 330+ cities. MoQ is an open protocol being developed at the <a href="https://www.ietf.org/"><u>IETF</u></a> by engineers from across the industry—not a proprietary Cloudflare technology. MoQ combines the low-latency interactivity of WebRTC, the scalability of HLS/DASH, and the simplicity of a single architecture, all built on a modern transport layer. We're joining Meta, Google, Cisco, and others in building implementations that work seamlessly together, creating a shared foundation for the next generation of real-time applications on the Internet.</p>
    <div>
      <h3><b>An evolutionary ladder of compromise</b></h3>
      <a href="#an-evolutionary-ladder-of-compromise">
        
      </a>
    </div>
    <p>To understand the promise of MoQ, we first have to appreciate the history that led us here—a journey defined by a series of architectural compromises where solving one problem inevitably created another.</p><p><b>The RTMP era: Conquering latency, compromising on scale</b></p><p>In the early 2000s, <b>RTMP (Real-Time Messaging Protocol)</b> was a breakthrough. It solved the frustrating "download and wait" experience of early video playback on the web by creating a persistent, stateful TCP connection between a <a href="https://en.wikipedia.org/wiki/Adobe_Flash"><u>Flash</u></a> client and a server. This enabled low-latency streaming (2-5 seconds), powering the first wave of live platforms like <a href="http://justin.tv"><u>Justin.tv</u></a> (which later became Twitch).</p><p>But its strength was its weakness. That stateful connection, which had to be maintained for every viewer, was architecturally hostile to scale. It required expensive, specialized media servers and couldn't use the commodity HTTP-based <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cdn/"><u>Content Delivery Networks (CDNs)</u></a> that were beginning to power the rest of the web. Its reliance on TCP also meant that a single lost packet could freeze the entire stream—a phenomenon known as <a href="https://blog.cloudflare.com/the-road-to-quic/#head-of-line-blocking"><u>head-of-line blocking</u></a>—creating jarring latency spikes. The industry retained RTMP for the "first mile" from the camera to servers (ingest), but a new solution was needed for the "last mile" from servers to your screen (delivery).</p><p><b>The HLS &amp; DASH era: Solving for scale, compromising on latency</b></p><p>The catalyst for the next era was the iPhone's rejection of Flash. In response, Apple created <a href="https://www.cloudflare.com/learning/video/what-is-http-live-streaming/"><b><u>HLS (HTTP Live Streaming)</u></b></a>. 
HLS, and its open-standard counterpart <b>MPEG-DASH</b>, abandoned stateful connections and treated video as a sequence of small, static files delivered over standard HTTP.</p><p>This enabled much greater scalability. By moving to the interoperable open standard of HTTP for the underlying transport, video could now be distributed by any web server and cached by global CDNs, allowing platforms to reach millions of viewers reliably and relatively inexpensively. The compromise? A <i>significant</i> trade-off in latency. To ensure smooth playback, players needed to buffer at least three video segments before starting. With segment durations of 6-10 seconds, this baked 15-30 seconds of latency directly into the architecture.</p><p>While extensions like <a href="https://developer.apple.com/documentation/http-live-streaming/enabling-low-latency-http-live-streaming-hls"><u>Low-Latency HLS (LL-HLS)</u></a> have more recently emerged to achieve latencies in the 3-second range, they remain complex patches<a href="https://blog.cloudflare.com/the-road-to-quic/#head-of-line-blocking"><u> fighting against the protocol's fundamental design</u></a>. These extensions introduce a layer of stateful, real-time communication—using clever workarounds like holding playlist requests open—that ultimately strains the stateless request-response model central to HTTP's scalability and composability.</p><p><b>The WebRTC era: Conquering conversational latency, compromising on architecture</b></p><p>In parallel, <b>WebRTC (Web Real-Time Communication)</b> emerged to solve a different problem: plugin-free, two-way conversational video with sub-500ms latency within a browser. It worked by creating direct peer-to-peer (P2P) media paths, removing central servers from the equation.</p><p>But this P2P model is fundamentally at odds with broadcast scale.
<a href="https://blog.cloudflare.com/cloudflare-calls-anycast-webrtc/#webrtc-growing-pains"><u>In a mesh network, the number of connections grows quadratically with each new participant</u></a> (the "N-squared problem"). For more than a handful of users, the model collapses under the weight of its own complexity. To work around this, the industry developed server-based topologies like the Selective Forwarding Unit (SFU) and Multipoint Control Unit (MCU). These are effective but require building what is essentially a <a href="https://blog.cloudflare.com/cloudflare-calls-anycast-webrtc/#is-cloudflare-calls-a-real-sfu"><u>private, stateful, real-time CDN</u></a>—a complex and expensive undertaking that is not standardized across infrastructure providers.</p><p>This journey has left us with a fragmented landscape of specialized, non-interoperable silos, forcing developers to stitch together multiple protocols and accept a painful three-way tension between <b>latency, scale, and complexity</b>.</p>
    <div>
      <h3><b>Introducing MoQ</b></h3>
      <a href="#introducing-moq">
        
      </a>
    </div>
    <p>This is the context into which Media over QUIC (MoQ) emerges. It's not just another protocol; it's a new design philosophy built from the ground up to resolve this historical trilemma. Born out of an open, community-driven effort at the IETF, <u>MoQ aims to be a foundational Internet technology, not a proprietary product</u>.</p><p>Its promise is to unify the disparate worlds of streaming by delivering:</p><ol><li><p><b>Sub-second latency at broadcast scale:</b> Combining the latency of WebRTC with the scale of HLS/DASH and the simplicity of RTMP.</p></li><li><p><b>Architectural simplicity:</b> Creating a single, flexible protocol for ingest, distribution, and interactive use cases, eliminating the need to transcode between different technologies.</p></li><li><p><b>Transport efficiency:</b> Building on <a href="https://blog.cloudflare.com/the-road-to-quic/"><u>QUIC</u></a>, a <a href="https://www.cloudflare.com/learning/ddos/glossary/user-datagram-protocol-udp/"><u>UDP</u></a>-based protocol, to eliminate bottlenecks like TCP<a href="https://blog.cloudflare.com/the-road-to-quic/#head-of-line-blocking"><u> head-of-line blocking</u></a>.</p></li></ol><p>The initial focus was "Media" over QUIC, but the core concepts—named tracks of timed, ordered, but independent data—are so flexible that the working group is now simply calling the protocol "MoQ." The name reflects the power of the abstraction: it's a generic transport for any real-time data that needs to be delivered efficiently and at scale.</p><p>MoQ is now generic enough to serve as a general-purpose data fan-out, or pub/sub, system for everything from audio/video (high-bandwidth data) to sports score updates (low-bandwidth data).</p>
    <div>
      <h3><b>A deep dive into the MoQ protocol stack</b></h3>
      <a href="#a-deep-dive-into-the-moq-protocol-stack">
        
      </a>
    </div>
    <p>MoQ's elegance comes from solving the right problem at the right layer. Let's build up from the foundation to see how it achieves sub-second latency at scale.</p><p>The choice of QUIC as MoQ's foundation isn't arbitrary—it addresses issues that have plagued streaming protocols for decades.</p><p>By building on <b>QUIC</b> (the transport protocol that also powers <a href="https://www.cloudflare.com/learning/performance/what-is-http3/"><u>HTTP/3</u></a>), MoQ solves some key streaming problems:</p><ul><li><p><b>No head-of-line blocking:</b> Unlike TCP, where one lost packet blocks everything behind it, QUIC streams are independent. A lost packet on one stream (e.g., an audio track) doesn't block another (e.g., the main video track). This alone eliminates the stuttering that plagued RTMP.</p></li><li><p><b>Connection migration:</b> When your device switches from Wi-Fi to cellular mid-stream, the connection seamlessly migrates without interruption—no rebuffering, no reconnection.</p></li><li><p><b>Fast connection establishment:</b> QUIC's <a href="https://blog.cloudflare.com/even-faster-connection-establishment-with-quic-0-rtt-resumption/"><u>0-RTT resumption</u></a> means returning viewers can start playing instantly.</p></li><li><p><b>Baked-in, mandatory encryption:</b> All QUIC connections are encrypted by default with <a href="https://blog.cloudflare.com/rfc-8446-aka-tls-1-3/"><u>TLS 1.3</u></a>.</p></li></ul>
    <div>
      <h4>The core innovation: Publish/subscribe for media</h4>
      <a href="#the-core-innovation-publish-subscribe-for-media">
        
      </a>
    </div>
    <p>With QUIC solving transport issues, MoQ introduces its key innovation: treating media as subscribable tracks in a publish/subscribe system. But unlike traditional pub/sub, this is designed specifically for real-time media at CDN scale.</p><p>Instead of complex session management (WebRTC) or file-based chunking (HLS), <b>MoQ lets publishers announce named tracks of media that subscribers can request</b>. A relay network handles the distribution without needing to understand the media itself.</p>
    <div>
      <h4>How MoQ organizes media: The data model</h4>
      <a href="#how-moq-organizes-media-the-data-model">
        
      </a>
    </div>
    <p>Before we see how media flows through the network, let's understand how MoQ structures it. MoQ organizes data in a hierarchy:</p><ul><li><p><b>Tracks</b>: Named streams of media, like "video-1080p" or "audio-english". Subscribers request specific tracks by name.</p></li><li><p><b>Groups</b>: Independently decodable chunks of a track. For video, this typically means a GOP (Group of Pictures) starting with a keyframe. New subscribers can join at any Group boundary.</p></li><li><p><b>Objects</b>: The actual packets sent on the wire. Each Object belongs to a Track and has a position within a Group.</p></li></ul><p>This simple hierarchy enables two capabilities:</p><ol><li><p>Subscribers can start playback at <b>Group</b> boundaries without waiting for the next keyframe</p></li><li><p>Relays can forward <b>Objects</b> without parsing or understanding the media format</p></li></ol>
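<p>As a concrete (and purely illustrative) sketch, the hierarchy above could be modeled like this in TypeScript; the type and function names are our own, not part of any official MoQ SDK:</p>

```typescript
// Illustrative model of MoQ's Track/Group/Object hierarchy.
// Names are hypothetical, not from an official MoQ library.

interface MoqObject {
  track: string;      // track name, e.g. "video-1080p"
  group: number;      // Group index; for video, one Group per GOP
  objectId: number;   // position of this Object within its Group
  payload: Uint8Array;
}

// A Group is independently decodable only from its start (the keyframe),
// so new subscribers join at Group boundaries: objectId === 0.
function isGroupBoundary(obj: MoqObject): boolean {
  return obj.objectId === 0;
}
```

<p>Note that nothing here inspects <code>payload</code>: a relay routes purely on the track/group/object envelope, which is what lets it forward media it cannot parse.</p>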
    <div>
      <h5>The network architecture: From publisher to subscriber</h5>
      <a href="#the-network-architecture-from-publisher-to-subscriber">
        
      </a>
    </div>
    <p>MoQ’s network components are also simple:</p><ul><li><p><b>Publishers</b>: Announce track namespaces and send Objects</p></li><li><p><b>Subscribers</b>: Request specific tracks by name</p></li><li><p><b>Relays</b>: Connect publishers to subscribers by forwarding immutable Objects without parsing or <a href="https://www.cloudflare.com/learning/video/video-encoding-formats/"><u>transcoding</u></a> the media</p></li></ul><p>A Relay acts as a subscriber to receive tracks from upstream (like the original publisher) and simultaneously acts as a publisher to forward those same tracks downstream. This model is the key to MoQ's scalability: one upstream subscription can fan out to serve thousands of downstream viewers.</p>
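<p>That fan-out can be sketched in a few lines of TypeScript (a toy model under our own assumptions, not Cloudflare's relay code): the relay keeps at most one upstream subscription per track, no matter how many downstream subscribers attach.</p>

```typescript
// Toy relay fan-out: one upstream subscription per track,
// forwarded to any number of downstream subscribers.

type Deliver = (payload: Uint8Array) => void;

class Relay {
  private upstream = new Set<string>();              // tracks already subscribed upstream
  private downstream = new Map<string, Deliver[]>(); // track -> subscriber callbacks

  // Returns true only when a new upstream subscription was needed.
  subscribe(track: string, deliver: Deliver): boolean {
    const subs = this.downstream.get(track) ?? [];
    subs.push(deliver);
    this.downstream.set(track, subs);
    if (this.upstream.has(track)) return false; // reuse the existing upstream feed
    this.upstream.add(track);                   // would send SUBSCRIBE upstream here
    return true;
  }

  // Objects arriving from upstream are forwarded verbatim, never parsed.
  onObject(track: string, payload: Uint8Array): void {
    for (const deliver of this.downstream.get(track) ?? []) deliver(payload);
  }
}
```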
    <div>
      <h5>The MoQ Stack</h5>
      <a href="#the-moq-stack">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4g2MroH24otkzH3LQsFWZe/84ca43ad6c1c933ac395bf4ac767c584/image1.png" />
          </figure><p>MoQ's architecture can be understood as three distinct layers, each with a clear job:</p><ol><li><p><b>The Transport Foundation (QUIC or WebTransport):</b> This is the modern foundation upon which everything is built. MoQT, the MoQ transport protocol, can run directly over raw <b>QUIC</b>, which is ideal for native applications, or over <b>WebTransport</b>, which is required for use in a web browser. Crucially, the<a href="https://www.ietf.org/archive/id/draft-ietf-webtrans-http3-02.html"> <u>WebTransport protocol</u></a> and its corresponding<a href="https://w3c.github.io/webtransport/"> <u>W3C browser API</u></a> make QUIC's multiplexed reliable streams and unreliable datagrams directly accessible to browser applications. This is a game-changer. Protocols like <a href="https://blog.cloudflare.com/stream-now-supports-srt-as-a-drop-in-replacement-for-rtmp/"><u>SRT</u></a> may be efficient, but their lack of native browser support relegates them to ingest-only roles. WebTransport gives MoQ first-class citizenship on the web, making it suitable for both ingest and massive-scale distribution directly to clients.</p></li><li><p><b>The MoQT Layer:</b> Sitting on top of QUIC (or WebTransport), the MoQT layer provides the signaling and structure for a publish-subscribe system. This is the primary focus of the IETF working group. It defines the core control messages—like <code>ANNOUNCE</code> and <code>SUBSCRIBE</code>—and the basic data model we just covered. MoQT itself is intentionally spartan; it doesn't know or care if the data it's moving is <a href="https://www.cloudflare.com/learning/video/what-is-h264-avc/"><u>H.264</u></a> video, Opus audio, or game state updates.</p></li><li><p><b>The Streaming Format Layer:</b> This is where media-specific logic lives. A streaming format defines things like manifests, codec metadata, and packaging rules.
 <a href="https://datatracker.ietf.org/doc/draft-ietf-moq-warp/"><b><u>WARP</u></b></a> is one such format being developed alongside MoQT at the IETF, but it isn't the only one. Another standards body, like DASH-IF, could define a <a href="https://www.iso.org/standard/85623.html"><u>CMAF</u></a>-based streaming format over MoQT. A company that controls both original publisher and end subscriber can develop its own proprietary streaming format to experiment with new codecs or delivery mechanisms without being constrained by the transport protocol.</p></li></ol><p>This separation of layers is why different organizations can build interoperable implementations while still innovating at the streaming format layer.</p>
    <div>
      <h4>End-to-End Data Flow</h4>
      <a href="#end-to-end-data-flow">
        
      </a>
    </div>
    <p>Now that we understand the architecture and the data model, let's walk through how these pieces come together to deliver a stream. The protocol is flexible, but a typical broadcast flow relies on the <code>ANNOUNCE</code> and <code>SUBSCRIBE</code> messages to establish a data path from a publisher to a subscriber through the relay network.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2iRTJFdtCjIOcyg7ezoYgJ/e303ea8d1eb438328b60fdb28be47e84/image2.png" />
          </figure><p>Here is a step-by-step breakdown of what happens in this flow:</p><ol><li><p><b>Initiating Connections:</b> The process begins when the endpoints, acting as clients, connect to the relay network. The Original Publisher initiates a connection with its nearest relay (we'll call it Relay A). Separately, an End Subscriber initiates a connection with its own local relay (Relay B). These endpoints perform a <code>SETUP</code> handshake with their respective relays to establish a MoQ session and declare supported parameters.</p></li><li><p><b>Announcing a Namespace:</b> To make its content discoverable, the Publisher sends an <code>ANNOUNCE</code> message to Relay A. This message declares that the publisher is the authoritative source for a given <b>track namespace</b>. Relay A receives this and registers in a shared control plane (a conceptual database) that it is now a source for this namespace within the network.</p></li><li><p><b>Subscribing to a Track:</b> When the End Subscriber wants to receive media, it sends a <code>SUBSCRIBE</code> message to its relay, Relay B. This message is a request for a specific <b>track name</b> within a specific <b>track namespace</b>.</p></li><li><p><b>Connecting the Relays:</b> Relay B receives the <code>SUBSCRIBE</code> request and queries the control plane. It looks up the requested namespace and discovers that Relay A is the source. Relay B then initiates a session with Relay A (if it doesn't already have one) and forwards the <code>SUBSCRIBE</code> request upstream.</p></li><li><p><b>Completing the Path and Forwarding Objects:</b> Relay A, having received the subscription request from Relay B, forwards it to the Original Publisher. With the full path now established, the Publisher begins sending the <code>Objects</code> for the requested track. The Objects flow from the Publisher to Relay A, which forwards them to Relay B, which in turn forwards them to the End Subscriber. 
If another subscriber connects to Relay B and requests the same track, Relay B can immediately start sending them the Objects without needing to create a new upstream subscription.</p></li></ol>
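<p>The five steps above can also be written down as a simple message trace. The message names come from MoQT, but the field shapes here are our own invention, chosen only to make the sequence readable:</p>

```typescript
// The broadcast setup flow as a message trace. SETUP, ANNOUNCE, and
// SUBSCRIBE are MoQT control messages; the field layout is illustrative.

type ControlMsg =
  | { type: "SETUP"; from: string; to: string }
  | { type: "ANNOUNCE"; from: string; to: string; namespace: string }
  | { type: "SUBSCRIBE"; from: string; to: string; namespace: string; track: string };

const flow: ControlMsg[] = [
  { type: "SETUP", from: "publisher", to: "relay-a" },                             // step 1
  { type: "SETUP", from: "subscriber", to: "relay-b" },                            // step 1
  { type: "ANNOUNCE", from: "publisher", to: "relay-a", namespace: "live/alice" }, // step 2
  { type: "SUBSCRIBE", from: "subscriber", to: "relay-b", namespace: "live/alice", track: "video" }, // step 3
  { type: "SUBSCRIBE", from: "relay-b", to: "relay-a", namespace: "live/alice", track: "video" },    // step 4
  { type: "SUBSCRIBE", from: "relay-a", to: "publisher", namespace: "live/alice", track: "video" },  // step 5
];
```

<p>Once this exchange completes, Objects flow back along the same path in the opposite direction: publisher to Relay A to Relay B to subscriber.</p>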
    <div>
      <h5>An Alternative Flow: The <code>PUBLISH</code> Model</h5>
      <a href="#an-alternative-flow-the-publish-model">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6KJYU1eWNyuSZEHNYonDDn/3898003d5a7f5904787c7ef009b22fe0/image3.png" />
          </figure><p>More recent drafts of the MoQ specification have introduced an alternative, push-based model using a <code>PUBLISH</code> message. In this flow, a publisher can effectively ask for permission to send a track's objects to a relay <i>without</i> waiting for a <code>SUBSCRIBE</code> request. The publisher sends a <code>PUBLISH</code> message, and the relay's <code>PUBLISH_OK</code> response indicates whether it will accept the objects. This is particularly useful for ingest scenarios, where a publisher wants to send its stream to an entry point in the network immediately, ensuring the media is available the instant the first subscriber connects.</p>
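<p>A minimal sketch of the relay side of that handshake, under our own assumptions: here a namespace allow-list stands in for whatever authentication a real relay would perform.</p>

```typescript
// Sketch of a relay deciding on a push-based PUBLISH. The allow-list is
// an illustrative stand-in for real publisher authentication.

interface PublishMsg { namespace: string; track: string; }
type PublishReply = "PUBLISH_OK" | "PUBLISH_ERROR";

function handlePublish(msg: PublishMsg, allowed: Set<string>): PublishReply {
  // Accepting before any subscriber exists means the media is already
  // flowing the instant the first SUBSCRIBE arrives.
  return allowed.has(msg.namespace) ? "PUBLISH_OK" : "PUBLISH_ERROR";
}
```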
    <div>
      <h4>Advanced capabilities: Prioritization and congestion control</h4>
      <a href="#advanced-capabilities-prioritization-and-congestion-control">
        
      </a>
    </div>
    <p>MoQ’s benefits really shine when networks get congested: the protocol includes mechanisms for handling the reality of lossy, bandwidth-constrained networks. One such mechanism is Subgroups.</p><p><b>Subgroups</b> are subdivisions within a Group that effectively map directly to the underlying QUIC streams. All Objects within the same Subgroup are generally sent on the same QUIC stream, guaranteeing their delivery order. Subgroup numbering also presents an opportunity to encode prioritization: within a Group, lower-numbered Subgroups are considered higher priority.</p><p>This enables intelligent quality degradation, especially with layered codecs (e.g. SVC):</p><ul><li><p><b>Subgroup 0</b>: Base video layer (360p) - must deliver</p></li><li><p><b>Subgroup 1</b>: Enhancement to 720p - deliver if bandwidth allows</p></li><li><p><b>Subgroup 2</b>: Enhancement to 1080p - first to drop under congestion</p></li></ul><p>When a relay detects congestion, it can drop Objects from higher-numbered Subgroups, preserving the base layer. Viewers see reduced quality instead of buffering.</p><p>The MoQ specification defines a scheduling algorithm that determines the order for all objects that are "ready to send." When a relay has multiple objects ready, it prioritizes them first by <b>group order</b> (ascending or descending) and then, within a group, by <b>subgroup id</b>. Our implementation supports the <b>group order</b> preference, which can be useful for low-latency broadcasts. If a viewer falls behind and its subscription uses descending group order, the relay prioritizes sending Objects from the newest "live" Group, potentially canceling unsent Objects from older Groups. This can help viewers catch up to the live edge quickly, a highly desirable feature for many interactive streaming use cases. The optimal strategies for using these features to improve QoE for specific use cases are still an open research question.
We invite developers and researchers to use our network to experiment and help find the answers.</p>
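<p>As a minimal sketch, that scheduling rule might look like the following. This is our own simplification of the draft's algorithm, not the relay's actual code:</p>

```typescript
// Simplified object scheduler: order by group (ascending for in-order
// delivery, descending to chase the live edge), then by subgroup id,
// lower subgroups first.

interface ReadyObject { group: number; subgroup: number; }
type GroupOrder = "ascending" | "descending";

function pickNext(ready: ReadyObject[], order: GroupOrder): ReadyObject | undefined {
  return [...ready].sort((a, b) => {
    if (a.group !== b.group) {
      return order === "ascending" ? a.group - b.group : b.group - a.group;
    }
    return a.subgroup - b.subgroup; // subgroup 0 (the base layer) wins
  })[0];
}

// Under congestion, drop the highest-numbered (enhancement) subgroups
// first, keeping the base layer flowing.
function shed(ready: ReadyObject[], maxSubgroup: number): ReadyObject[] {
  return ready.filter((o) => o.subgroup <= maxSubgroup);
}
```

<p>With descending group order, a backlog of older Groups never starves the newest "live" Group, which is exactly the catch-up behavior described above.</p>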
    <div>
      <h3><b>Implementation: building the Cloudflare MoQ relay</b></h3>
      <a href="#implementation-building-the-cloudflare-moq-relay">
        
      </a>
    </div>
    <p>Theory is one thing; implementation is another. To validate the protocol and understand its real-world challenges, we've been building one of the first global MoQ relay networks. Cloudflare's network, which places compute and logic at the edge, is very well suited for this.</p><p>Our architecture connects the abstract concepts of MoQ to the Cloudflare stack. In our deep dive, we mentioned that when a publisher <code>ANNOUNCE</code>s a namespace, relays need to register this availability in a "shared control plane" so that <code>SUBSCRIBE</code> requests can be routed correctly. For this critical piece of state management, we use <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>.</p><p>When a publisher announces a new namespace to a relay in, say, London, that relay uses a Durable Object—our strongly consistent, single-threaded storage solution—to record that this namespace is now available at that specific location. When a subscriber in Paris wants a track from that namespace, the network can query this distributed state to find the nearest source and route the <code>SUBSCRIBE</code> request accordingly. This architecture builds upon the technology we developed for Cloudflare's real-time services and provides a solution to the challenge of state management at a global scale.</p>
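<p>In spirit, the registry behaves like the sketch below, with a plain in-memory map standing in for the strongly consistent Durable Object storage; the class and method names are hypothetical, not Cloudflare's actual code.</p>

```typescript
// In-memory stand-in for the Durable Object-backed namespace registry.
// In the real system this state lives in a strongly consistent Durable Object.

class NamespaceRegistry {
  private sources = new Map<string, string>(); // namespace -> relay location

  // Called when a relay receives an ANNOUNCE for a namespace.
  recordAnnounce(namespace: string, relayLocation: string): void {
    this.sources.set(namespace, relayLocation);
  }

  // Called when a relay receives a SUBSCRIBE and needs to know
  // which relay to forward the request toward.
  findSource(namespace: string): string | undefined {
    return this.sources.get(namespace);
  }
}
```

<p>In the London/Paris example: the London relay calls <code>recordAnnounce</code> when the publisher connects, and the Paris relay calls <code>findSource</code> to route the <code>SUBSCRIBE</code> upstream.</p>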
    <div>
      <h4>An Evolving Specification</h4>
      <a href="#an-evolving-specification">
        
      </a>
    </div>
    <p>Building on a new protocol in the open means implementing against a moving target. To get MoQ into the hands of the community, we made a deliberate trade-off: our current relay implementation is based on a <b>subset of the features defined in </b><a href="https://www.ietf.org/archive/id/draft-ietf-moq-transport-07.html"><b><u>draft-ietf-moq-transport-07</u></b></a>. This version became a de facto target for interoperability among several open-source projects, and pausing there allowed us to put effort towards other aspects of deploying our relay network.</p><p>This draft of the protocol makes a distinction between accessing "past" and "future" content. <code><b>SUBSCRIBE</b></code> is used to receive <b>future</b> objects for a track as they arrive—like tuning into a live broadcast to get everything from that moment forward. In contrast, <code><b>FETCH</b></code> provides a mechanism for accessing <b>past</b> content that a relay may already have in its cache—like asking for a recording of a song that just played.</p><p>Both are part of the same specification, but for the most pressing low-latency use cases, a performant implementation of <code>SUBSCRIBE</code> is what matters most. For that reason, we have focused our initial efforts there and have not yet implemented <code>FETCH</code>.</p><p>This is where our roadmap is flexible and where the community can have a direct impact. Do you need <code>FETCH</code> to build on-demand or catch-up functionality? Or is more complete support for the prioritization features within <code>SUBSCRIBE</code> more critical for your use case? The feedback we receive from early developers will help us decide what to build next.</p><p>As always, we will announce updates and changes to our implementation on our <a href="https://developers.cloudflare.com/moq"><u>developer docs pages</u></a> as development continues.</p>
    <div>
      <h3>Kick the tires on the future</h3>
      <a href="#kick-the-tires-on-the-future">
        
      </a>
    </div>
    <p>We believe in building in the open and in community-driven interoperability. MoQ is not a Cloudflare technology but a foundational Internet technology. To that end, the first demo client we’re presenting is an open-source, community example.</p><p><b>You can access the demo here: </b><a href="https://moq.dev/publish/"><b><u>https://moq.dev/publish/</u></b></a></p><p>Even though this is a preview release, we are running MoQ relays at Cloudflare’s full scale, as we do for every production service. This means every server that is part of the Cloudflare network in more than 330 cities is now a MoQ relay.</p><p>We invite you to experience the "wow" moment of near-instant, sub-second streaming latency that MoQ enables. How would you use a protocol that offers the speed of a video call with the scale of a global broadcast?</p>
    <div>
      <h3><b>Interoperability</b></h3>
      <a href="#interoperability">
        
      </a>
    </div>
    <p>We’ve been working with others in the IETF WG community and beyond on interoperability of publishers, players and other parts of the MoQ ecosystem. So far, we’ve tested with:</p><ul><li><p>Luke Curley’s <a href="https://moq.dev"><u>moq.dev</u></a></p></li><li><p>Lorenzo Miniero’s <a href="https://github.com/meetecho/imquic"><u>imquic</u></a></p></li><li><p>Meta’s <a href="https://github.com/facebookexperimental/moxygen"><u>Moxygen</u></a> </p></li><li><p><a href="https://github.com/englishm/moq-rs"><u>moq-rs</u></a></p></li><li><p><a href="https://github.com/englishm/moq-js"><u>moq-js</u></a></p></li><li><p><a href="https://norsk.video/"><u>Norsk</u></a></p></li><li><p><a href="https://vindral.com/"><u>Vindral</u></a></p></li></ul>
    <div>
      <h3>The Road Ahead</h3>
      <a href="#the-road-ahead">
        
      </a>
    </div>
    <p>The Internet's media stack is being refactored. For two decades, we've been forced to choose between latency, scale, and complexity. The compromises we made solved some problems, but also led to a fragmented ecosystem.</p><p>MoQ represents a promising new foundation—a chance to unify the silos and build the next generation of real-time applications on a scalable protocol. We're committed to helping build this foundation in the open, and we're just getting started.</p><p>MoQ is a realistic way forward: built on QUIC for future-proofing, easier to understand than WebRTC, and, unlike RTMP, compatible with browsers.</p><p>The protocol is evolving, the implementations are maturing, and the community is growing. Whether you're building the next generation of live streaming, exploring real-time collaboration, or pushing the boundaries of interactive media, consider whether MoQ may provide the foundation you need.</p>
    <div>
      <h3>Availability and pricing</h3>
      <a href="#availability-and-pricing">
        
      </a>
    </div>
    <p>We want developers to start building with MoQ today. To make that possible, MoQ at Cloudflare is in tech preview: it's available free of charge for testing, at any scale. Visit our <a href="https://developers.cloudflare.com/moq/"><u>developer homepage</u></a> for updates and potential breaking changes.</p><p>Indie developers and large enterprises alike ask about pricing early in their adoption of new technologies. We will be transparent and clear about MoQ pricing. In general availability, self-serve customers should expect to pay 5 cents/GB outbound, with no cost for traffic sent towards Cloudflare.</p><p>Enterprise customers can expect pricing in line with regular media delivery, competitive with incumbent protocols. This means that if you’re already using Cloudflare for media delivery, you should not be wary of adopting new technologies because of cost. We will support you.</p><p>If you’re interested in partnering with Cloudflare in adopting the protocol early or contributing to its development, please reach out to us at <a href="mailto:moq@cloudflare.com"><u>moq@cloudflare.com</u></a>! Engineers excited about the future of the Internet are standing by.</p>
    <div>
      <h3>Get involved:</h3>
      <a href="#get-involved">
        
      </a>
    </div>
    <ul><li><p><b>Try the demo:</b> <a href="https://moq.dev/publish/"><u>https://moq.dev/publish/</u></a></p></li><li><p><b>Read the Internet draft:</b> <a href="https://datatracker.ietf.org/doc/draft-ietf-moq-transport/"><u>https://datatracker.ietf.org/doc/draft-ietf-moq-transport/</u></a></p></li><li><p><b>Contribute</b> to the protocol’s development: <a href="https://datatracker.ietf.org/group/moq/documents/"><u>https://datatracker.ietf.org/group/moq/documents/</u></a></p></li><li><p><b>Visit </b>our developer homepage: <a href="https://developers.cloudflare.com/moq/"><u>https://developers.cloudflare.com/moq/</u></a></p></li></ul><p></p> ]]></content:encoded>
            <category><![CDATA[Video]]></category>
            <category><![CDATA[QUIC]]></category>
            <category><![CDATA[Live Streaming]]></category>
            <category><![CDATA[WebRTC]]></category>
            <category><![CDATA[IETF]]></category>
            <category><![CDATA[Standards]]></category>
            <guid isPermaLink="false">2XgF5NjmAy3cqybLPkpMFu</guid>
            <dc:creator>Mike English</dc:creator>
            <dc:creator>Renan Dincer</dc:creator>
        </item>
        <item>
            <title><![CDATA[Make your apps truly interactive with Cloudflare Realtime and RealtimeKit ]]></title>
            <link>https://blog.cloudflare.com/introducing-cloudflare-realtime-and-realtimekit/</link>
            <pubDate>Wed, 09 Apr 2025 14:05:00 GMT</pubDate>
            <description><![CDATA[ Announcing Cloudflare Realtime and RealtimeKit, a complete toolkit for shipping real-time audio and video apps in days with SDKs for Kotlin, React Native, Swift, JavaScript, and Flutter. ]]></description>
            <content:encoded><![CDATA[ <p>Over the past few years, we’ve seen developers push the boundaries of what’s possible with real-time communication — tools for collaborative work, massive online watch parties, and interactive live classrooms are all exploding in popularity.</p><p>We use AI more and more in our daily lives. Text-based interactions are evolving into something more natural: voice and video. When users interact with the applications and tools that AI developers create, they have high expectations for response time and connection quality. Complex AI applications are built not on just one tool, but on a combination of tools, often from different providers, which requires a well-connected cloud in the middle to coordinate them.</p><p>Developers already use <a href="https://developers.cloudflare.com/workers/"><u>Workers</u></a>, <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a>, and our WebRTC <a href="https://developers.cloudflare.com/calls/"><u>SFU</u></a> and <a href="https://developers.cloudflare.com/calls/turn/"><u>TURN</u></a> services to build powerful apps without needing to think about placing compute or media services close to their users. It’s only natural for there to be a singular <a href="https://blog.cloudflare.com/best-place-region-earth-inference/"><u>"Region: Earth"</u></a> for real-time applications.</p><p>We're excited to introduce <a href="https://realtime.cloudflare.com"><u>Cloudflare Realtime</u></a> — a suite of products to help you make your apps truly interactive with real-time audio and video experiences. Cloudflare Realtime now brings together our SFU, STUN, and TURN services, along with the new RealtimeKit.</p>
    <div>
      <h2>Say hello to RealtimeKit</h2>
      <a href="#say-hello-to-realtimekit">
        
      </a>
    </div>
    <p>RealtimeKit is a collection of mobile SDKs (iOS, Android, React Native, Flutter), SDKs for the Web (React, Angular, vanilla JS, WebComponents), and server side services (recording, coordination, transcription) that make it easier than ever to build real-time voice, video, and AI applications. RealtimeKit also includes user interface components to build interfaces quickly. </p><p>The amazing team behind <a href="https://dyte.io/"><u>Dyte</u></a>, a leading company in the real-time ecosystem, joined Cloudflare to accelerate the development of RealtimeKit. The Dyte team spent years focused on making real-time experiences accessible to developers of all skill levels, and had a deep understanding of the developer journey — they built abstractions that hid WebRTC's complexity without removing its power.</p><p>Already a user of Cloudflare’s products, Dyte was a perfect complement to Cloudflare’s existing real-time infrastructure spanning 300+ cities worldwide. They built a developer experience layer that made complex media capabilities accessible. We’re incredibly excited for their team to join Cloudflare as we help developers define the future of user interaction for real-time applications as one team.</p>
    <div>
      <h2>Interactive applications shouldn't require WebRTC expertise </h2>
      <a href="#interactive-applications-shouldnt-require-webrtc-expertise">
        
      </a>
    </div>
    <p>For many developers, what starts as "let's add video chat" can quickly escalate into weeks of technical deep dives into WebSockets and WebRTC. While we are big believers in the <a href="https://blog.cloudflare.com/tag/webrtc/"><u>potential of WebRTC</u></a>, we also know that it comes with real challenges when building for the first time. Debugging WebRTC sessions can require developers to learn about esoteric new concepts such as navigating <a href="https://webrtcforthecurious.com/docs/03-connecting/#ice"><u>ICE candidate failures</u></a>, <a href="https://webrtcforthecurious.com/docs/03-connecting/#turn"><u>TURN server configurations</u></a>, and <a href="https://webrtcforthecurious.com/docs/03-connecting/#turn"><u>SDP negotiation issues</u></a>.</p><p>The challenges of building a WebRTC app for the first time don’t stop there. Device management adds another layer of complexity. Inconsistent camera and microphone APIs across browsers and mobile platforms introduce unexpected behaviors in production. Chrome handles resolution switching one way, Safari another, and Android WebViews break in uniquely frustrating ways. We regularly see applications that function perfectly in testing environments fail mysteriously when deployed to certain devices or browsers.</p><p>Systems that work flawlessly with 5 test users collapse under the load of 50 real-world participants. Bandwidth adaptation falters, connection management becomes unwieldy, and maintaining consistent quality across diverse network conditions proves nearly impossible without specialized expertise. </p><p>What starts as a straightforward feature becomes a multi-month project requiring low-level engineering to solve problems that aren’t core to your business.</p><p>We realized that we needed to extend our products to client devices to help solve these problems.</p>
    <div>
      <h2>RealtimeKit SDKs for Kotlin, React Native, Swift, JavaScript, Flutter</h2>
      <a href="#realtimekit-sdks-for-kotlin-react-native-swift-javascript-flutter">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/20EM65tMDpRznldLcfRYSo/90db0a5576bcecf0eaa3d28f7feaa65e/Final.png" />
          </figure><p>RealtimeKit is our toolkit for building real-time applications without common WebRTC headaches. The core of RealtimeKit is a set of cross-platform SDKs that handle all the low-level complexities, from session establishment and media permissions to NAT traversal and connection management. Instead of spending weeks implementing and debugging these foundations, you can focus entirely on creating unique experiences for your users.</p><p>Recording capabilities come built-in, taking care of one of the most commonly requested yet difficult-to-implement features in real-time applications. Whether you need to capture meetings for compliance, save virtual classroom sessions for students who couldn't attend live, or enable content creators to archive their streams, RealtimeKit handles the entire media pipeline. No more wrestling with MediaRecorder APIs or building custom recording infrastructure — it just works, scaling alongside your user base.</p><p>We've also integrated voice AI capabilities from providers like ElevenLabs directly into the platform. Adding AI participants to conversations becomes as simple as a function call, opening up entirely new interaction models. These AI voices operate with the same low latency as human participants — tens of milliseconds across our global network — creating truly synchronous experiences where AI and humans converse naturally. Combined with RealtimeKit's ability to scale to millions of concurrent participants, this enables entirely new categories of applications that weren't feasible before.</p>
    <div>
      <h2>The Developer Experience</h2>
      <a href="#the-developer-experience">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7GAxgCMn36QUgxSlF7m0xL/34574d1d1ba3da305e46b41bc455e769/2.png" />
          </figure><p>RealtimeKit focuses on what developers want to accomplish, rather than how the underlying protocols work. Adding participants or turning on recording are just an API call away. SDKs handle device enumeration, permission requests, and UI rendering across platforms. Behind the scenes, we’re solving the thorny problems of media orchestration and state management that can be challenging to debug.</p><p>We’ve been quietly working towards launching the Cloudflare RealtimeKit for years. From the very beginning, our global network has been optimized for minimizing latency between our network and end users, which is where the majority of network disruptions are introduced.</p><p>We developed a <a href="https://blog.cloudflare.com/cloudflare-calls-anycast-webrtc/"><u>Selective Forwarding Unit (SFU)</u></a> that intelligently routes media streams between participants, dynamically adjusting quality based on network conditions. Our <a href="https://blog.cloudflare.com/lt-lt/webrtc-turn-using-anycast/"><u>TURN infrastructure</u></a> solves the <a href="https://webrtchacks.com/an-intro-to-webrtcs-natfirewall-problem/"><u>complex problem of NAT traversal</u></a>, allowing connections to be established reliably behind firewalls. With Workers AI, we brought inference capabilities to the edge, minimizing latency for AI-powered interactions. Workers and Durable Objects provided the WebSockets coordination layer necessary for maintaining consistent state across participants.</p>
    <div>
      <h2>SFU and TURN services are now Generally Available</h2>
      <a href="#sfu-and-turn-services-are-now-generally-available">
        
      </a>
    </div>
    <p>We’re also announcing the General Availability of our SFU and TURN services for WebRTC developers who need more control and a low-level integration with the Cloudflare network.</p><p>SFU now supports simulcast, a very common feature request. Simulcast lets a client send multiple encodings of the same video track, similar to the quality levels of an online video, but for WebRTC. Users with different network quality can now receive different quality levels, either chosen automatically by the SFU or selected manually.</p><p>Our TURN service now offers advanced analytics with insight into region-, country-, and city-level usage metrics. Together with <a href="https://developers.cloudflare.com/calls/turn/replacing-existing/#tag-users-with-custom-identifiers"><u>Custom Identifiers</u></a> and revocable tokens, Cloudflare’s TURN service offers an in-depth view into usage and helps avoid abuse.</p><p>Our SFU and TURN products continue to be one of the most affordable ways to build WebRTC apps at scale, at 5 cents per GB after 1,000 GB of free usage each month.</p>
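<p>For developers integrating at the WebRTC layer, simulcast is typically expressed in the browser as multiple send encodings on one video transceiver. A minimal sketch, where the layer names and bitrates are illustrative assumptions rather than values required by the SFU:</p>

```javascript
// Three simulcast layers for one camera track; each `rid` labels an encoding.
// The bitrate and scaling values here are illustrative assumptions.
const simulcastEncodings = [
  { rid: 'f', maxBitrate: 2_500_000 },                            // full resolution
  { rid: 'h', maxBitrate: 1_000_000, scaleResolutionDownBy: 2 },  // half resolution
  { rid: 'q', maxBitrate: 400_000, scaleResolutionDownBy: 4 },    // quarter resolution
];

// In a browser, attach them when adding the video track:
//   pc.addTransceiver(videoTrack, { direction: 'sendonly', sendEncodings: simulcastEncodings });
// The SFU (or the subscriber) can then pick whichever layer fits the viewer's bandwidth.
console.log(simulcastEncodings.map(e => e.rid)); // [ 'f', 'h', 'q' ]
```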
    <div>
      <h2>Partnering with Hugging Face to make realtime AI communication seamless</h2>
      <a href="#partnering-with-hugging-face-to-make-realtime-ai-communication-seamless">
        
      </a>
    </div>
    <p><a href="https://fastrtc.org/"><u>FastRTC</u></a> is a lightweight Python library from Hugging Face that makes it easy to stream real-time audio and video into and out of AI models using WebRTC. TURN servers are a critical part of WebRTC infrastructure and ensure that media streams can reliably connect across firewalls and NATs. For users of FastRTC, setting up a globally distributed TURN server can be complex and expensive.  </p><p>Through our new partnership with Hugging Face, FastRTC users now have free access to Cloudflare’s TURN Server product, giving them reliable connectivity out of the box. Developers get 10 GB of TURN bandwidth each month using just a Hugging Face access token — no setup, no credit card, no servers to manage. As projects grow, they can easily switch to a Cloudflare account for more capacity and a larger free tier.</p><p>This integration allows AI developers to focus on building voice interfaces, video pipelines, and multimodal apps without worrying about NAT traversal or network reliability. FastRTC simplifies the code, and Cloudflare ensures it works everywhere. See these <a href="https://huggingface.co/fastrtc"><u>demos</u></a> to get started.</p>
    <div>
      <h2>Ship AI-powered realtime apps in days, not weeks</h2>
      <a href="#ship-ai-powered-realtime-apps-in-days-not-weeks">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1b4lK5Qvq1ImlBa3lFEH7l/bf212f51a1f178285747e759c1365ec9/3.png" />
          </figure><p>With RealtimeKit, developers can now implement complex real-time experiences in hours. The SDKs abstract away the most time-consuming aspects of WebRTC development while providing APIs tailored to common implementation patterns. Here are a few of the possibilities: </p><ul><li><p><b>Video conferencing</b>: Add multi-participant video calls to your application with just a few lines of code. RealtimeKit handles the connection management, bandwidth adaptation, and device permissions that typically consume weeks of development time.</p></li><li><p><b>Live streaming</b>: Build interactive broadcasts where hosts can stream to thousands of viewers while selectively bringing participants on-screen. The SFU automatically optimizes media routing based on participant roles and network conditions.</p></li><li><p><b>Real-time synchronization</b>: Implement watch parties or collaborative viewing experiences where content playback stays synchronized across all participants. The timing API handles the complex delay calculations and adjustments traditionally required.</p></li><li><p><b>Voice AI integrations</b>: Add transcription and AI voice participants without building custom media pipelines. RealtimeKit's media processing APIs integrate with your existing authentication and storage systems rather than requiring separate infrastructure.</p></li></ul><p>Watching our early testers use RealtimeKit, we’ve seen that it doesn’t just accelerate their existing projects; it fundamentally changes which projects become viable.</p>
    <div>
      <h2>Get started with RealtimeKit</h2>
      <a href="#get-started-with-realtimekit">
        
      </a>
    </div>
    <p>Starting today, you'll notice a new <a href="https://dash.cloudflare.com/?to=/:account/realtime"><u>Realtime section in your Cloudflare Dashboard</u></a>. This section includes our TURN and SFU products alongside our latest product, RealtimeKit. </p><p>RealtimeKit is currently in a closed beta ready for select customers to start kicking the tires. There is currently no cost to test it out during the beta. Request early access <a href="https://www.cloudflare.com/cloudflare-realtimekit-signup/"><u>here</u></a> or via the link in your <a href="https://dash.cloudflare.com/?to=/:account/realtime"><u>Cloudflare dashboard</u></a>. We can’t wait to see what you build. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/53RI7RZhs5Y0zHMHKg6fLh/e155081853355a7714e052ff23db6269/4.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[WebRTC]]></category>
            <category><![CDATA[Cloudflare Calls]]></category>
            <category><![CDATA[Real-time]]></category>
            <category><![CDATA[TURN Server]]></category>
            <guid isPermaLink="false">opC8hYtVRkyCEv7Yze4R0</guid>
            <dc:creator>Zaid Farooqui</dc:creator>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>Abhishek Kankani</dc:creator>
        </item>
        <item>
            <title><![CDATA[Bring multimodal real-time interaction to your AI applications with Cloudflare Calls]]></title>
            <link>https://blog.cloudflare.com/bring-multimodal-real-time-interaction-to-your-ai-applications-with-cloudflare-calls/</link>
            <pubDate>Fri, 20 Dec 2024 14:00:00 GMT</pubDate>
            <description><![CDATA[ Bring ChatGPT to your next video meeting with Cloudflare Calls.  ]]></description>
            <content:encoded><![CDATA[ <p>OpenAI announced support for WebRTC in their <a href="https://platform.openai.com/docs/guides/realtime"><u>Realtime API</u></a> on December 17, 2024. Combining their Realtime API with <a href="https://www.cloudflare.com/developer-platform/products/cloudflare-calls/"><u>Cloudflare Calls</u></a> allows you to build experiences that weren’t possible just a few days earlier.</p><p>Previously, interactions with audio and video AIs were largely <i>single-player</i>: only one person could be interacting with the AI unless you were in the same physical room. Now, applications built using Cloudflare Calls and OpenAI’s Realtime API can support multiple users across the globe simultaneously seeing and interacting with a voice or video AI.</p>
    <div>
      <h2>Have your AI join your video calls </h2>
      <a href="#have-your-ai-join-your-video-calls">
        
      </a>
    </div>
    <p>Here’s what this means in practice: you can now invite ChatGPT to your next video meeting:</p><div>
  
</div><p>We built this into our <a href="https://github.com/cloudflare/orange"><u>Orange Meets</u></a> demo app to serve as an inspiration for what is possible, but the opportunities are much broader.</p><p>In the not-too-distant future, every company could have a 'corporate AI' they invite to their internal meetings that is secure, private, and has access to their company data. Imagine this sort of real-time audio and video interaction with your company’s AI:</p><p>"Hey ChatGPT, do we have any open Jira tickets about this?"</p><p>"Hey Company AI, who are the competitors in the space doing Y?"</p><p>"AI, is XYZ a big customer? How much more did they spend with us vs last year?"</p><p>There are similar opportunities if your application is built for consumers: broadcasts and global livestreams can become much more interactive. The murder mystery game in the video above is just one example: you could build your own to play live with your friends in different cities.</p>
    <div>
      <h2>WebRTC vs. WebSockets</h2>
      <a href="#webrtc-vs-websockets">
        
      </a>
    </div>
    <p>These interactive multimedia experiences are enabled by the industry adoption of <a href="https://www.w3.org/TR/webrtc/"><u>WebRTC</u></a>, which stands for Web Real-Time Communication.</p><p>Many real-time product experiences have historically used <a href="https://developer.mozilla.org/en-US/docs/Glossary/WebSockets"><u>WebSockets</u></a> instead of WebRTC. WebSockets operate over a single, persistent TCP connection established between a client and server. This is useful for maintaining a data sync for text-based chat apps or maintaining the state of gameplay in your favorite video game. Cloudflare has extensive support for WebSockets <a href="https://developers.cloudflare.com/network/websockets/"><u>across our network</u></a> as well as <a href="https://blog.cloudflare.com/do-it-again/"><u>in our AI Gateway</u></a>.</p><p>If you were building a chat application prior to WebSockets, you would likely have your client-side app poll the server every n seconds to see if there were new messages to be displayed. WebSockets eliminated this need for polling. Instead, the client and the server establish a persistent, long-running connection to send and receive messages.</p><p>However, once you have multiple users across geographies simultaneously interacting with voice and video, small delays in the data sync can become unacceptable product experiences. Imagine building an app that does real-time translation of audio. With WebSockets, you would need to chunk the audio input, so each chunk contains 100–500 milliseconds of audio. That chunk size, along with <a href="https://blog.cloudflare.com/cloudflare-calls-anycast-webrtc/#webrtc-growing-pains"><u>head-of-line blocking</u></a>, becomes the latency floor for your ability to deliver a real-time multimodal experience to your users.</p><p>WebRTC solves this problem by having native support for audio and video tracks over UDP-based channels directly between users, eliminating the need for chunking. This lets you stream audio and video data to an AI model from multiple users and receive audio and video data back from the model in real time.</p>
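<p>As a rough illustration of that latency floor (using the 100–500 ms chunk sizes above and an assumed 50 ms one-way network delay):</p>

```javascript
// Hypothetical latency-floor estimate for a WebSocket audio pipeline:
// a chunk cannot be sent until it has been fully captured, so the chunk
// duration adds directly to the one-way network delay.
function websocketLatencyFloorMs(chunkMs, oneWayNetworkMs) {
  return chunkMs + oneWayNetworkMs;
}

// With 100 ms and 500 ms chunks over an assumed 50 ms one-way path:
console.log(websocketLatencyFloorMs(100, 50)); // 150
console.log(websocketLatencyFloorMs(500, 50)); // 550
// WebRTC streams samples continuously, so its floor approaches the network delay alone.
```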
    <div>
      <h2>Realtime AI fanout using Cloudflare Calls</h2>
      <a href="#realtime-ai-fanout-using-cloudflare-calls">
        
      </a>
    </div>
    <p>Historically, setting up the underlying infrastructure for WebRTC — servers for media routing, TURN relays, global availability — could be challenging.</p><p><a href="https://developers.cloudflare.com/calls/introduction/"><u>Cloudflare Calls</u></a> handles the entirety of this complexity for developers, allowing them to leverage WebRTC without needing to worry about servers, regions, or scaling. Cloudflare Calls works as a single mesh network that automatically connects each user to a server close to them. Calls can connect directly with other WebRTC-powered services such as OpenAI’s, letting you deliver the output with near-zero latency to hundreds or thousands of users.</p><p>Privacy and security also come standard: all video and audio traffic that passes through Cloudflare Calls is encrypted by default. In this particular demo, we take it a step further by creating a button that allows you to decide when to allow ChatGPT to listen and interact with the meeting participants, allowing you to be more granular and targeted in your privacy and security posture. </p>
    <div>
      <h2>How we connected Cloudflare Calls to OpenAI’s Realtime API </h2>
      <a href="#how-we-connected-cloudflare-calls-to-openais-realtime-api">
        
      </a>
    </div>
    <p>Cloudflare Calls has three building blocks: <a href="https://developers.cloudflare.com/calls/sessions-tracks/"><u>Applications, Sessions, and Tracks</u></a><b>:</b></p><blockquote><p><i>“A </i><b><i>Session</i></b><i> in Cloudflare Calls correlates directly to a WebRTC PeerConnection. It represents the establishment of a communication channel between a client and the nearest Cloudflare data center, as determined by Cloudflare's anycast routing … </i></p><p><i>Within a Session, there can be one or more </i><b><i>Tracks</i></b><i>. … [which] align with the MediaStreamTrack concept, facilitating audio, video, or data transmission.”</i></p></blockquote><p>To include ChatGPT in our video conferencing demo, we needed to add ChatGPT as a <i>track</i> in an ongoing <i>session. </i>To do this, we connected to the Realtime API in <a href="https://github.com/cloudflare/orange"><u>Orange Meets</u></a>:</p>
            <pre><code>// Connect Cloudflare Calls sessions and tracks like a switchboard.
// Assumes `callsEndpoint` (the Calls API base URL) and `APP_TOKEN`
// (the app's API token) are defined elsewhere in the application.
async function connectHumanAndOpenAI(
	humanSessionId: string,
	openAiSessionId: string
) {
	const callsApiHeaders = {
		Authorization: `Bearer ${APP_TOKEN}`,
		'Content-Type': 'application/json',
	}
	// Pull OpenAI audio track to human's track
	await fetch(`${callsEndpoint}/sessions/${humanSessionId}/tracks/new`, {
		method: 'POST',
		headers: callsApiHeaders,
		body: JSON.stringify({
			tracks: [
				{
					location: 'remote',
					sessionId: openAiSessionId,
					trackName: 'ai-generated-voice',
					mid: '#user-mic',
				},
			],
		}),
	})
	// Pull human's audio track to OpenAI's track
	await fetch(`${callsEndpoint}/sessions/${openAiSessionId}/tracks/new`, {
		method: 'POST',
		headers: callsApiHeaders,
		body: JSON.stringify({
			tracks: [
				{
					location: 'remote',
					sessionId: humanSessionId,
					trackName: 'user-mic',
					mid: '#ai-generated-voice',
				},
			],
		}),
	})
}</code></pre>
            <p>This code sets up the bidirectional routing between the human’s session and ChatGPT, which would allow the humans to hear ChatGPT and ChatGPT to hear the humans.</p><p>You can review all the code for this demo app on <a href="https://github.com/cloudflare/orange?tab=readme-ov-file#readme"><u>GitHub</u></a>. </p>
    <div>
      <h2>Get started today </h2>
      <a href="#get-started-today">
        
      </a>
    </div>
    <p>Give the Cloudflare Calls + OpenAI Realtime API <a href="https://demo.orange.cloudflare.dev/"><u>demo a try</u></a> for yourself and review how it was built via <a href="https://github.com/cloudflare/orange?tab=readme-ov-file#readme"><u>the source code on GitHub</u></a>. Then get started today with <a href="https://developers.cloudflare.com/calls/introduction/"><u>Cloudflare Calls </u></a>to bring real-time, interactive AI to your apps and services.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Cloudflare Calls]]></category>
            <category><![CDATA[WebRTC]]></category>
            <guid isPermaLink="false">HTZlONeYfVQ79aKvAsgxI</guid>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>Felipe Astroza Araya</dc:creator>
            <dc:creator>Kevin Kipp</dc:creator>
        </item>
        <item>
            <title><![CDATA[TURN and anycast: making peer connections work globally]]></title>
            <link>https://blog.cloudflare.com/webrtc-turn-using-anycast/</link>
            <pubDate>Wed, 25 Sep 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ TURN servers relay media and data between devices when direct P2P connections are blocked or fail. Cloudflare Calls' TURN server uses anycast to eliminate the need to think about regions or scaling. ]]></description>
            <content:encoded><![CDATA[ <p>A <a href="https://www.cloudflare.com/learning/video/turn-server/"><u>TURN server</u></a> helps maintain connections during video calls when local networking conditions prevent participants from connecting directly to other participants. It acts as an intermediary, passing data between users when their networks block direct communication. TURN servers ensure that peer-to-peer calls go smoothly, even in less-than-ideal network conditions.</p><p>When building their own TURN infrastructure, developers often have to answer a few critical questions:</p><ol><li><p>“How do we build and maintain a mesh network that achieves near-zero latency to all our users?”</p></li><li><p>“Where should we spin up our servers?”</p></li><li><p>“Can we auto-scale reliably to be cost-efficient without hurting performance?”
</p></li></ol><p>In April, we launched Cloudflare Calls TURN in <a href="https://blog.cloudflare.com/cloudflare-calls-anycast-webrtc/"><u>open beta</u></a> to help answer these questions. Starting today, <a href="https://developers.cloudflare.com/calls/turn/"><u>Cloudflare Calls’ TURN service</u></a> is generally available to all Cloudflare accounts. Our TURN server runs on our anycast network, which helps deliver the global coverage and near-zero latency required by real-time applications.</p>
    <div>
      <h2>TURN solves connectivity and privacy problems for real time apps</h2>
      <a href="#turn-solves-connectivity-and-privacy-problems-for-real-time-apps">
        
      </a>
    </div>
    <p>When Internet Protocol version 4 (IPv4, <a href="https://datatracker.ietf.org/doc/html/rfc791"><u>RFC 791</u></a>) was designed back in 1981, it was assumed that the 32-bit address space was big enough for all computers to be able to connect to each other. When IPv4 was created, billions of people didn’t have smartphones in their pockets and the idea of the Internet of Things didn’t exist yet. It didn’t take long for companies, ISPs, and even entire countries to realize they didn’t have enough IPv4 address space to meet their needs.</p>
    <div>
      <h3>NATs are unpredictable</h3>
      <a href="#nats-are-unpredictable">
        
      </a>
    </div>
    <p>Fortunately, you can have multiple devices share the same IP address because the most common protocols that run on top of IP are TCP and UDP, both of which support up to 65,535 port numbers. (Think of port numbers on an IP address as extensions behind a single phone number.) To solve this problem of IP scarcity, network engineers developed a way to share a single IP address across multiple devices by exploiting the port numbers. This is called Network Address Translation (NAT), and it is the process through which your router knows which packets to send to your smartphone versus your laptop or other devices, all of which are connecting to the public Internet through the IP address assigned to the router.</p><p>In a typical NAT setup, when a device sends a packet to the Internet, the NAT assigns a random, unused port to track it, keeping a forwarding table to map the device to the port. This allows NAT to direct responses back to the correct device, even if the source IP address and port vary across different destinations. The system works as long as the internal device initiates the connection and waits for the response.</p><p>However, real-time apps like video or audio calls are more challenging with NAT. Since NATs don't reveal how they assign ports, devices can't pre-communicate where to send responses, making it difficult to establish reliable connections. Earlier solutions like STUN (<a href="https://datatracker.ietf.org/doc/html/rfc3489"><u>RFC 3489</u></a>) couldn't fully solve this, which gave rise to the TURN protocol.</p><p>TURN predictably relays traffic between devices while ensuring minimal delay, which is crucial for real-time communication where even a second of lag can disrupt the experience.</p>
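<p>A NAT's forwarding table can be sketched as a mapping from an assigned public port back to the internal device that opened the flow. A toy model, not a real router implementation:</p>

```javascript
// Toy model of a NAT forwarding table: outbound packets get a public port,
// and replies to that port are routed back to the originating device.
class ToyNat {
  constructor(publicIp) {
    this.publicIp = publicIp;
    this.table = new Map();   // public port -> { internalIp, internalPort }
    this.nextPort = 50000;    // arbitrary starting point for assigned ports
  }

  // Device sends a packet out: allocate (or reuse) a public port for the flow.
  outbound(internalIp, internalPort) {
    for (const [port, entry] of this.table) {
      if (entry.internalIp === internalIp && entry.internalPort === internalPort) {
        return { ip: this.publicIp, port }; // existing mapping reused
      }
    }
    const port = this.nextPort++;
    this.table.set(port, { internalIp, internalPort });
    return { ip: this.publicIp, port };
  }

  // Reply arrives at a public port: look up which device it belongs to.
  inbound(publicPort) {
    return this.table.get(publicPort) ?? null; // unknown port: packet dropped
  }
}

const nat = new ToyNat('203.0.113.7');
const mapped = nat.outbound('192.168.0.10', 4444);
console.log(nat.inbound(mapped.port)); // { internalIp: '192.168.0.10', internalPort: 4444 }
console.log(nat.inbound(1234));        // null — unsolicited inbound traffic has no mapping
```

<p>The second lookup failing is exactly why two devices behind different NATs cannot simply address each other: neither side has a mapping until it sends traffic out first, which is the gap STUN and TURN exist to fill.</p>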
    <div>
      <h3>ICE to determine if a relay server is needed</h3>
      <a href="#ice-to-determine-if-a-relay-server-is-needed">
        
      </a>
    </div>
    <p>The <a href="https://datatracker.ietf.org/doc/html/rfc8445"><u>ICE (Interactive Connectivity Establishment) protocol</u></a> was designed to find the fastest communication path between devices. It works by testing multiple routes and choosing the one with the least delay. ICE determines whether a TURN server is needed to relay the connection when a direct peer-to-peer path cannot be established or is not performant enough.</p>
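<p>Concretely, ICE ranks candidates using the recommended priority formula from RFC 8445, with direct (host) paths preferred and relayed (TURN) paths tried last. A sketch of that computation:</p>

```javascript
// ICE candidate priority per RFC 8445's recommended formula:
//   priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
// Recommended type preferences: host 126, peer-reflexive 110,
// server-reflexive (STUN) 100, relayed (TURN) 0 — so TURN is tried last.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

function candidatePriority(type, localPreference = 65535, componentId = 1) {
  return (
    2 ** 24 * TYPE_PREFERENCE[type] +
    2 ** 8 * localPreference +
    (256 - componentId)
  );
}

console.log(candidatePriority('host') > candidatePriority('srflx'));  // true
console.log(candidatePriority('relay') < candidatePriority('srflx')); // true
```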
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/71IX3n5RLM24rwRhwrpM2E/1ef5ecbf98cc85a46e385f333d6cb90c/image3.png" />
            
            </figure><p><sup><i>How two peers (A and B) try to connect directly by sharing their public and local IP addresses using the ICE protocol. If the direct connection fails, both peers use the TURN server to relay their connection and communicate with each other.</i></sup></p><p>While ICE is designed to find the most efficient connection path between peers, it can inadvertently expose sensitive information, creating privacy concerns. During the ICE process, endpoints exchange a list of all possible network addresses, including local IP addresses, NAT IP addresses, and TURN server addresses. This comprehensive sharing of network details can reveal information about a user's network topology, potentially exposing their approximate geographic location or details about their local network setup.</p><p>The "brute force" nature of ICE, where it attempts connections on all possible paths, can create distinctive network traffic patterns that sophisticated observers might use to infer the use of specific applications or communication protocols. </p>
    <div>
      <h2>TURN solves privacy problems</h2>
      <a href="#turn-solves-privacy-problems">
        
      </a>
    </div>
    <p>The risk of exposing sensitive information while using real-time applications matters most for people who rely on end-to-end encrypted messaging apps for sensitive conversations — for example, journalists who need to communicate with unknown sources without revealing their location.</p><p>With Cloudflare TURN in place, traffic is proxied through Cloudflare, preventing either party in the call from seeing client IP addresses or associated metadata. Cloudflare simply forwards the calls to their intended recipients, but never inspects the contents — the underlying call data is always end-to-end encrypted. This masking of network traffic is an added layer of privacy.</p><p>Cloudflare is a trusted third party when it comes to operating these types of services: we have experience operating privacy-preserving proxies at scale for our <a href="https://blog.cloudflare.com/1111-warp-better-vpn/"><u>Consumer WARP</u></a> product, <a href="https://blog.cloudflare.com/icloud-private-relay/"><u>Apple’s Private Relay</u></a>, and <a href="https://blog.cloudflare.com/cloudflare-now-powering-microsoft-edge-secure-network/"><u>Microsoft Edge’s Secure Network</u></a>, preserving end-user privacy without sacrificing performance.</p>
    <div>
      <h2>Cloudflare’s TURN is the fastest because of Anycast</h2>
      <a href="#cloudflares-turn-is-the-fastest-because-of-anycast">
        
      </a>
    </div>
    <p>Many real-time communication services run their own TURN servers on a commercial cloud provider because they don’t want to leave a certain percentage of their customers with non-working communication. This results in additional costs for DevOps, egress bandwidth, and more. And honestly, just deploying and running a TURN server, like <a href="https://github.com/coturn/coturn"><u>CoTURN</u></a>, on a VPS isn’t an interesting project for most engineers.</p><p>Because a TURN relay adds extra delay as packets travel between the peers, relays should be located as close as possible to those peers. Cloudflare’s TURN service avoids all these headaches by simply running in all of the <a href="https://www.cloudflare.com/network"><u>330 cities where Cloudflare has data centers</u></a>. And any time Cloudflare adds another city, the TURN service automatically becomes available there as well.</p>
    <div>
      <h3>Anycast is the perfect network topology for TURN</h3>
      <a href="#anycast-is-the-perfect-network-topology-for-turn">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/learning/cdn/glossary/anycast-network/"><u>Anycast</u></a> is a network addressing and routing methodology in which a single IP address is shared by multiple servers in different locations. When a client sends a request to an anycast address, the network automatically routes the request via <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/"><u>BGP</u></a> to the topologically nearest server. This is in contrast to unicast, where each destination has a unique IP address. Anycast allows multiple servers to have the same IP address, and enables clients to automatically connect to a server close to them. This is similar to emergency phone networks (911, 112, etc.), which connect you to the closest emergency communications center in your area.</p><p>Anycast allows for lower latency because of the sheer number of locations available around the world. About 95% of the Internet-connected population globally is within approximately 50 ms of a Cloudflare location. For real-time communication applications that use TURN, this leads to improved call quality and user experience.</p>
    <div>
      <h3>Auto-scaling and inherently global</h3>
      <a href="#auto-scaling-and-inherently-global">
        
      </a>
    </div>
    <p>Running TURN over anycast allows for better scalability and global distribution. By naturally distributing load across multiple servers based on network topology, this setup helps balance traffic and improve performance. When you use Cloudflare’s TURN service, you don’t need to manage a list of servers for different parts of the world. And you don’t need to write custom scaling logic to scale VMs up or down based on your traffic.</p><p>Anycast allows TURN to use fewer IP addresses, making it easier to allowlist in restrictive networks. Stateless protocols like DNS over UDP work well with anycast. This includes stateless STUN binding requests used to determine a system's external IP address behind a NAT.</p><p>However, stateful protocols over UDP, like QUIC or TURN, are more challenging with anycast. QUIC handles this better due to its stable connection ID, which load balancers can use to consistently route traffic. TURN and STUN lack a similar connection ID, so when a TURN client sends requests to the Cloudflare TURN service, the <a href="https://blog.cloudflare.com/unimog-cloudflares-edge-load-balancer/"><u>Unimog load balancer</u></a> ensures that all of its requests get routed to the same server within a data center. The challenges of communication between a client on the Internet and Cloudflare services listening on an anycast IP address have been described <a href="https://blog.cloudflare.com/tag/loadbalancing/"><u>multiple times before</u></a>.</p>
    <div>
      <h3>How does Cloudflare's TURN server receive packets?</h3>
      <a href="#how-does-cloudflares-turn-server-receive-packets">
        
      </a>
    </div>
    <p>TURN servers act as relay points to help connect clients. This process involves two types of connections: the client-server connection and the third-party connection (relayed address).</p><p>The client-server connection uses <a href="https://developers.cloudflare.com/calls/turn/#_top"><u>published</u></a> IP and port information to communicate with TURN clients using anycast.</p><p>For the relayed address, using anycast poses a challenge. The TURN protocol requires that packets reach the specific Cloudflare server handling the client connection. If we used anycast for relay addresses, packets might not arrive at the correct data center or server.
</p><p>One alternative is to use unicast addresses for relay candidates. However, this approach has drawbacks, including making servers vulnerable to attacks and requiring many IP addresses.</p><p>To solve these issues, we've developed a middle-ground solution, previously discussed in “<a href="https://blog.cloudflare.com/cloudflare-servers-dont-own-ips-anymore/"><u>Cloudflare servers don't own IPs anymore – so how do they connect to the Internet?</u></a>”. We use anycast addresses but add extra handling for packets that reach incorrect servers. If a packet arrives at the wrong Cloudflare location, we forward it over our backbone to the correct data center, rather than sending it back over the public Internet.</p><p>This approach not only resolves routing issues but also improves TURN connection speed. Packets meant for the relay address enter the Cloudflare network as close to the sender as possible, optimizing the routing process.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2JxesKGit6hdK0NzaEduEk/5b248cbc27293b9dc1ccb1a5b5f7b615/image1.png" />
            
            </figure><p><sup><i>In this non-ideal setup, a TURN client connects to Cloudflare using Anycast, while a direct client uses Unicast, which would expose the TURN server to potential DDoS attacks.</i></sup></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/CI9Q5qMDC7xXcifH5l898/975e7f52416e0b737c250433aa68ee82/image2.png" />
            
            </figure><p><sup><i>The optimized setup uses Anycast for all TURN clients, allowing for dynamic load distribution across Cloudflare's globally distributed TURN servers.</i></sup></p>
    <div>
      <h2>Try Cloudflare Calls TURN today</h2>
      <a href="#try-cloudflare-calls-turn-today">
        
      </a>
    </div>
    <p>The new TURN feature of Cloudflare Calls addresses critical challenges in real-time communication:</p><ul><li><p><b>Connectivity</b>: By solving NAT traversal issues, TURN ensures reliable connections even in complex network environments.</p></li><li><p><b>Privacy</b>: Acting as an intermediary, TURN enhances user privacy by masking IP addresses and network details.</p></li><li><p><b>Performance</b>: Leveraging Cloudflare's global anycast network, our TURN service offers unparalleled speed and near-zero latency.</p></li><li><p><b>Scalability</b>: With presence in over 330 cities, Cloudflare Calls TURN grows with your needs.</p></li></ul><p>Cloudflare Calls TURN service is billed on a usage basis. It is available to self-serve and Enterprise customers alike. There is no cost for the first 1,000 GB (one terabyte) of Cloudflare Calls usage each month. It costs five cents per GB after your first terabyte of usage on self-serve. Volume pricing is available for Enterprise customers through your account team.</p><p>Switching TURN providers is likely as simple as changing a single configuration in your real-time app. To get started with Cloudflare’s TURN service, create a TURN app from your <a href="https://dash.cloudflare.com/?to=/:account/calls"><u>Cloudflare Calls Dashboard</u></a> or read the <a href="https://developers.cloudflare.com/calls/turn/"><u>Developer Docs</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Cloudflare Calls]]></category>
            <category><![CDATA[Anycast]]></category>
            <category><![CDATA[Load Balancing]]></category>
            <category><![CDATA[WebRTC]]></category>
            <category><![CDATA[TURN]]></category>
            <guid isPermaLink="false">EkJICbovEPPuOSElg8poy</guid>
            <dc:creator>Nils Ohlmeier</dc:creator>
            <dc:creator>Renan Dincer</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Calls: millions of cascading trees all the way down]]></title>
            <link>https://blog.cloudflare.com/cloudflare-calls-anycast-webrtc/</link>
            <pubDate>Thu, 04 Apr 2024 13:00:07 GMT</pubDate>
            <description><![CDATA[ Cloudflare Calls is a serverless SFU and TURN service running at Cloudflare’s edge. It’s now in open beta and costs $0.05/ real-time GB. It’s 100% anycast WebRTC ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Following its initial <a href="/announcing-cloudflare-calls">announcement</a> in September 2022, Cloudflare Calls is now in open beta and available in your <a href="https://dash.cloudflare.com/?to=/:account/calls">Cloudflare Dashboard</a>. Cloudflare Calls lets developers build real-time audio/video apps using <a href="https://webrtc.org/">WebRTC</a>, and it abstracts away the complexity by turning the Cloudflare network into a singular SFU. In this post, we dig into how we make this possible.</p>
    <div>
      <h2>WebRTC growing pains</h2>
      <a href="#webrtc-growing-pains">
        
      </a>
    </div>
    <p>WebRTC is the <a href="https://caniuse.com/webtransport">only</a> way to send UDP traffic out of a web browser – everything else uses TCP.</p><p>As a developer, you need a UDP-based transport layer for applications demanding low latency and real-time feedback, such as audio/video conferencing and interactive gaming. This is because unlike WebSocket and other TCP-based solutions, UDP is not subject to head-of-line blocking, <a href="/the-quicening">a</a> <a href="/a-primer-on-proxies">frequent</a> <a href="/stream-now-supports-srt-as-a-drop-in-replacement-for-rtmp">topic</a> on the Cloudflare Blog.</p><p>When building a new video conferencing app, you typically start with a peer-to-peer web application using WebRTC, where clients exchange data directly. This approach is efficient for small-scale demos, but scalability issues arise as the number of participants increases. This is because the total amount of traffic grows quadratically with the number of participants: each of the n clients needs to send its data to the other n-1 clients.</p><p>Selective Forwarding Units (SFUs) play a pivotal role in scaling WebRTC applications. An SFU functions by receiving multiple media or data flows from participants and deciding which streams should be forwarded to other participants, thus acting as a media stream routing hub. This mechanism significantly reduces bandwidth requirements and improves scalability by managing stream distribution based on network conditions and participant needs. Even though <a href="https://arstechnica.com/information-technology/2012/05/skype-replaces-p2p-supernodes-with-linux-boxes-hosted-by-microsoft/">it hasn’t always been this way</a> since video calling on computers first became popular, SFUs are usually found in the cloud, rather than on the home computers of clients, because of the superior connectivity offered in a data center.</p>
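    <p>The scaling difference is easy to put in numbers. A minimal, illustrative sketch comparing the stream counts in a full mesh versus an SFU topology:</p>

```javascript
// Sketch: why mesh calls stop scaling. In a full mesh each of the n clients
// uploads its stream to the other n-1 clients, so total streams grow
// quadratically; with an SFU each client uploads once and the server fans out.
function meshStreams(n) {
  return n * (n - 1); // every client sends to every other client
}

function sfuUploadStreams(n) {
  return n; // every client sends one stream to the SFU
}

console.log(meshStreams(4));      // 12 streams criss-crossing the call
console.log(sfuUploadStreams(4)); // 4 uploads; the SFU handles distribution
```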
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1eHksyrQ7iybPx8c9iTqOq/f795cf163c06ea3cd89a74f82fc50f7d/P2P-vs.-SFU.png" />
            
            </figure><p>A modern audio/video application thus quickly becomes complicated with the addition of this server-side element. Since all clients connect to this central SFU server, there are numerous things to consider when you’re architecting and scaling a real-time application:</p><ul><li><p>How close are the SFU server location(s) to the end-user clients, and how is a client assigned to a server?</p></li><li><p>Where is the SFU hosted, and if it’s hosted in the cloud, what are the egress costs from VMs?</p></li><li><p>How many participants can fit in a “room”? Are all participants sending and receiving data? With cameras on? Audio only?</p></li><li><p>Some SFUs require the use of custom SDKs. Which platforms do these run on and are they compatible with the application you’re trying to build?</p></li><li><p>Monitoring, reliability, and the other issues that come with running infrastructure.</p></li></ul><p>Some of these concerns, and the complexity of WebRTC infrastructure in general, have made the community look in <a href="https://datatracker.ietf.org/group/moq/about/">different</a> directions. However, it is clear that in 2024, WebRTC is alive and well with plenty of new and old uses. AI startups build characters that converse in real time, cars leverage WebRTC to stream live footage of their cameras to smartphones, and video conferencing tools are going strong.</p><p>WebRTC has been interesting to us for a while. Cloudflare Stream implemented the <a href="/webrtc-whip-whep-cloudflare-stream">WHIP and WHEP</a> WebRTC video streaming protocols in 2022, which remain the lowest latency way to broadcast video. OBS Studio <a href="https://github.com/obsproject/obs-studio/commit/851a8c216e14617fb523951839f3bdb240e85141">implemented</a> WHIP broadcasting support, as have a variety of <a href="https://softvelum.com/nimble/webrtc/">software</a> and <a href="https://www.ospreyvideo.com/talon-encoders">hardware</a> vendors alongside Cloudflare. 
In late 2022, we launched <a href="/announcing-cloudflare-calls">Cloudflare Calls</a> in closed beta. When we blogged about it back then, we were very impressed with how WebRTC fared, and spoke to many customers about their pain points as well as creative ideas the existing browser APIs can foster. We also saw other WebRTC-based apps like <a href="https://www.nytimes.com/2021/02/15/business/clubhouse.html">Clubhouse</a> rise in popularity and <a href="https://blog.x.com/en_us/topics/product/2021/spaces-is-here">Twitter Spaces</a> play a role in popular culture. Today, we see real-time applications of a different sort. Many AI projects <a href="https://blog.character.ai/new-feature-announcement-character-group-chat/">have impressive demos</a> with voice/video interactions. All of these apps are built with the same WebRTC APIs and system architectures.</p><p>We are confident that Cloudflare Calls is a new kind of WebRTC infrastructure you should try. When we set out to build Cloudflare Calls, we had a few ideas that we weren’t sure would work, but were worth trying:</p><ul><li><p>Build every WebRTC component on Anycast with a single IP address for DTLS, ICE, STUN, SRTP, SCTP, etc.</p></li><li><p>Don’t force an SDK – WebRTC APIs by themselves are enough, and allow for the most novel uses to shine, because the best developers always find ways to hit the limits of SDKs.</p></li><li><p>Deploy in all <a href="https://www.cloudflare.com/network">310+ cities</a> Cloudflare operates in – use every Cloudflare server, not just a subset.</p></li><li><p>Exchange <a href="https://developers.cloudflare.com/calls/https-api/">offer and answer over HTTP</a> between Cloudflare and the WebRTC client. This way there is only a single PeerConnection to manage.</p></li></ul><p>Now we know this is all possible, because we made it happen, and we think it’s the best experience a developer can get with pure WebRTC.</p>
    <div>
      <h2>Is Cloudflare Calls a real SFU?</h2>
      <a href="#is-cloudflare-calls-a-real-sfu">
        
      </a>
    </div>
    <p>Cloudflare is in the business of having computers in numerous places. Historically, our core competency was operating a caching HTTP reverse proxy, and we are <a href="/network-performance-update-security-week-2024">very good</a> at this. With Cloudflare Calls, we asked ourselves “how can we build a large distributed system that brings together our global network to form one giant <i>stateful</i> system that feels like a single machine?”</p><p>When using Calls, every PeerConnection automatically connects to the closest Cloudflare data center instead of a single server. Rather than connecting every client that needs to communicate with each other to a single server, anycast spreads out connections as much as possible to minimize last mile latency sourced from your ISP between your client and Cloudflare.</p><p>It’s good to minimize last mile latency because after the data enters Cloudflare’s control, the underlying media can be managed carefully and routed through the Cloudflare <a href="/250-cities-is-just-the-start">backbone</a>. This is crucial for WebRTC applications where millisecond delays can significantly impact user experience. To give you a sense of the latency between Cloudflare’s data centers and end-users, about 95% of the Internet-connected population is within 50ms of a Cloudflare data center. As I write this, I am about 20ms away, but in the past, I have been lucky enough to be connected to a <i>great</i> home Wi-Fi network less than 1ms away in Manhattan. “But you are just one user!” you might be thinking, so here is a chart from <a href="https://radar.cloudflare.com/quality/">Cloudflare Radar</a> showing recent global latency measurements:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5r4NGPsXDGl4e0FdhghTzC/533446ea771a5a0d0436c7646282fead/pasted-image-0-1.png" />
            
            </figure><p>This setup creates more opportunities for lost packets to be recovered with retransmissions closer to users, and more opportunities for bandwidth adjustments.</p>
    <div>
      <h2>Eliminating SFU region selection</h2>
      <a href="#eliminating-sfu-region-selection">
        
      </a>
    </div>
    <p>A traditional challenge in WebRTC infrastructure involves the manual selection of Selective Forwarding Units (SFUs) based on geographic location to minimize latency. Some systems solve this problem by selecting a location for the SFU after the first user joins the “room”. This makes routing inefficient when the rest of the participants in the conversation are clustered elsewhere. The anycast architecture of Calls eliminates this issue. When a client initiates a connection, <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">BGP</a> dynamically determines the closest data center. Each selected server only becomes responsible for the PeerConnections of the clients closest to it.</p><p>One might see this as actually a simpler way of managing servers, as there is no need to maintain a layer of WebRTC load balancing for traffic or CPU capacity between servers. However, anycast has its own challenges, and we couldn’t take a laissez-faire approach.</p>
    <div>
      <h2>Steps to establishing a PeerConnection</h2>
      <a href="#steps-to-establishing-a-peerconnection">
        
      </a>
    </div>
    <p>One of the challenging parts in assigning a server to a client PeerConnection is supporting dual stack networking for backwards compatibility with clients that only support the old version of the Internet Protocol, IPv4.</p><p>Cloudflare Calls uses a single IP address per protocol, and our L4 <a href="/unimog-cloudflares-edge-load-balancer">load balancer</a> directs packets to a single server per client by hashing the 4-tuple {client IP, client port, destination IP, destination port}. This means that a dual-stack client’s <a href="https://webrtcforthecurious.com/docs/03-connecting/#connectivity-checks">ICE connectivity check</a> packets arrive at two different servers: one for IPv4 and one for IPv6.</p><p>ICE is not the only protocol used for WebRTC; there are also STUN and TURN for connectivity establishment. The actual media is encrypted using keys negotiated over DTLS, which carries most of the data during a session.</p><p>DTLS packets don’t have any identifiers in them that would indicate they belong to a specific connection (unlike QUIC’s <a href="https://datatracker.ietf.org/doc/html/rfc9000">connection ID</a> field), so every server should be able to handle DTLS packets and get the necessary certificates to be able to decrypt them for processing. DTLS encryption is negotiated at the <a href="https://webrtcforthecurious.com/docs/02-signaling/#what-is-the-session-description-protocol-sdp">SDP layer</a> using the HTTPS API.</p><p>The HTTPS API for Calls also lands on a different server than DTLS and ICE connectivity checks. Since DTLS packets need information from the SDP exchanged using the HTTPS API, and ICE connectivity checks depend on the HTTPS API for the userFragment and password fields in the connectivity check packets, it would be very useful for all of these to be available on one server. Yet in our setup, they’re not.</p>
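    <p>To illustrate the routing described above, here is a toy sketch of 4-tuple hashing — not Unimog’s actual algorithm — showing why every packet of one flow lands on the same server, while the same client arriving over IPv6 is a different 4-tuple and can land elsewhere:</p>

```javascript
// Toy sketch of L4 load balancing by 4-tuple hash: all packets of one flow
// {client IP, client port, destination IP, destination port} map to the same
// server. Illustrative FNV-1a-style mixing, not Unimog's actual algorithm.
function pickServer(clientIP, clientPort, destIP, destPort, numServers) {
  const key = `${clientIP}|${clientPort}|${destIP}|${destPort}`;
  let h = 2166136261;
  for (const ch of key) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h % numServers;
}

// The same flow always hashes to the same server...
const a = pickServer("203.0.113.7", 51000, "198.51.100.1", 3478, 16);
const b = pickServer("203.0.113.7", 51000, "198.51.100.1", 3478, 16);
// ...but the same client over IPv6 is a different 4-tuple, so its ICE
// checks can land on a different server — the situation the post describes.
const c = pickServer("2001:db8::7", 51000, "2001:db8::1", 3478, 16);
console.log(a === b); // true
```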
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/44z2v70arwk48CPkKgJXKq/bd7611bc51a989c0662665e374ed9a50/Signaling.png" />
            
            </figure><p>Fippo and Gustavo of WebRTCHacks <a href="https://webrtchacks.com/how-cloudflare-glares-at-webrtc-with-whip-and-whep/">complained (gracefully noted)</a> about slow replies to ICE connectivity checks in their great article as they were digging into our WHIP implementation right around our announcement in 2022:</p><blockquote><p>Looking at the Wireshark dumps we see a surprisingly large amount of time pass between the first STUN request and the first STUN response – it was 1.8 seconds in the screenshot below.</p><p>In other tests, it was shorter, but still 600ms long.</p><p>After that, the DTLS packets do not get an immediate response, requiring multiple attempts. This ultimately leads to a call setup time of almost three seconds – way above the global average of 800ms <a href="https://medium.com/@fippo/how-long-does-the-dtls-handshake-take-86718dd966bf">Fippo has measured previously</a> (for the complete handshake, 200ms for the DTLS handshake). For Cloudflare with their extensive network, we expected this to be way below that average.</p></blockquote><p>Gustavo and Fippo observed our solution to this problem of different parts of the WebRTC negotiation landing on different servers. Since Cloudflare Calls unbundles the WebRTC protocol to make the entire network act like a single computer, at this critical moment, we need to form consensus across the network. We form consensus by configuring every server to handle any incoming PeerConnection just in time. When a packet arrives, if the server doesn’t know about it, it quickly learns about the negotiated parameters from another server, such as the ufrag and the DTLS fingerprint from the SDP, and responds with the appropriate response.</p>
    <div>
      <h2>Getting faster</h2>
      <a href="#getting-faster">
        
      </a>
    </div>
    <p>Even though we've sped up the process of forming consensus across the Cloudflare network, any delays incurred can still have weird side effects. For example, up until a few months ago, delays of a few hundred milliseconds caused slow connections in Chrome.</p><p>A connectivity check packet delayed by a few hundred milliseconds signals to Chrome that this is a high latency network, even though every other STUN message after that was replied to in less than 5-10ms. Chrome thus delays sending a USE-CANDIDATE attribute in the responses for a few seconds, degrading the user experience.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1sRoG86lCdvzcKJKEQof4D/8fa91aa893417bd1876d71fef8b52db7/image4-8.png" />
            
            </figure><p>Fortunately, Chrome also <a href="https://bugs.chromium.org/p/webrtc/issues/detail?id=3661">sends</a> DTLS ClientHello before USE-CANDIDATE (behavior we’ve seen only on Chrome), so to help speed up Chrome, Calls uses DTLS packets in place of STUN packets with USE-CANDIDATE attributes.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Xhno0lLZDftcAHXLz9Btj/c5b7fad53f9bd3005b5f759067b4847c/image1-5.png" />
            
            </figure><p>After solving this issue with Chrome, PeerConnections globally now take about 100-250ms to get connected. This includes all consensus management, STUN packets, and a complete DTLS handshake.</p>
    <div>
      <h2>Sessions and Tracks are the building blocks of Cloudflare’s SFU, not rooms</h2>
      <a href="#sessions-and-tracks-are-the-building-blocks-of-cloudflares-sfu-not-rooms">
        
      </a>
    </div>
    <p>Once a PeerConnection is established to Cloudflare, we call this a Session. Many media Tracks or DataChannels can be published using a single Session, which returns a unique ID for each. These then can be subscribed to over any other PeerConnection anywhere around the world using the unique ID. The tracks can be published or subscribed anytime during the lifecycle of the PeerConnection.</p><p>In the background, Cloudflare takes care of scaling through a fan-out architecture with cascading trees that are unique per track. This structure works by creating a hierarchy of nodes where the root node distributes the stream to intermediate nodes, which then fan out to end-users. This significantly reduces the bandwidth required at the source and ensures scalability by distributing the load across the network. This simple but powerful architecture allows developers to build anything from 1:1 video calls to large 1:many or many:many broadcasting scenarios with Calls.</p>
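    <p>The scalability of such cascading trees can be sketched with a few lines of arithmetic. Assuming an illustrative fan-out factor (not Cloudflare’s actual number), the depth of the tree grows only logarithmically with the audience size:</p>

```javascript
// Sketch of cascading-tree fan-out: if each node forwards a track to at most
// `fanout` downstream nodes, how many relay hops are needed to reach n
// subscribers? The fan-out factor below is illustrative, not Cloudflare's.
function treeDepth(subscribers, fanout) {
  let depth = 0;
  let reachable = 1; // the root node holding the published track
  while (reachable < subscribers) {
    reachable *= fanout; // each added level multiplies the reach
    depth += 1;
  }
  return depth;
}

console.log(treeDepth(1000000, 50)); // 4 hops reach a million subscribers
```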
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2yHByC3CWSsXH4ljdDsFqH/6f1d625c9e6b4e921b8fdb54a30cf843/Fan-out-architecutre.png" />
            
            </figure><p>There is no “room” concept in Cloudflare Calls. Each client can add as many tracks into a PeerConnection as they’d like. The limit is the bandwidth available between Cloudflare and the client, which in practice is constrained on the client side. The signaling or the concept of a “room” is left to the application developer, who can choose to pull as many tracks as they’d like from the tracks they have pushed elsewhere into a PeerConnection. This allows developers to move participants into breakout rooms and then back into a plenary room, and then into 1:1 rooms, while keeping the same PeerConnection and MediaTracks active.</p><p>Cloudflare offers an unopinionated approach to bandwidth management, allowing for greater control in customizing logic to suit your business needs. There is no active bandwidth management or restriction on the number of tracks. The <a href="https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/getStats">WebRTC Stats API</a> provides a standardized way to access data on packet loss and possible congestion, enabling you to incorporate client-side logic based on this information. For instance, if poor Wi-Fi connectivity leads to degraded service, your front-end could inform the user through a notice and automatically reduce the number of video tracks for that client.</p>
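    <p>As a sketch of what such client-side logic might look like, the following computes a packet-loss fraction from inbound-rtp stats. In a browser this would be fed the report returned by RTCPeerConnection.getStats(); here the stats are mocked so the logic is self-contained, and the 5% threshold is an arbitrary example:</p>

```javascript
// Sketch: client-side bandwidth logic driven by the WebRTC Stats API.
// In a browser, pass the RTCStatsReport from pc.getStats(); here the stats
// objects are mocked. The 5% threshold is an arbitrary example value.
function packetLossFraction(statsReport) {
  let lost = 0, received = 0;
  statsReport.forEach((stat) => {
    if (stat.type === "inbound-rtp") {
      lost += stat.packetsLost;
      received += stat.packetsReceived;
    }
  });
  const total = lost + received;
  return total === 0 ? 0 : lost / total;
}

// e.g. drop video tracks when loss exceeds 5%:
function shouldReduceVideo(statsReport) {
  return packetLossFraction(statsReport) > 0.05;
}

const mock = [
  { type: "inbound-rtp", packetsLost: 40, packetsReceived: 460 },
  { type: "outbound-rtp", bytesSent: 123456 }, // ignored by the calculation
];
console.log(packetLossFraction(mock)); // 0.08
```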
    <div>
      <h2>“NACK shield” at the edge</h2>
      <a href="#nack-shield-at-the-edge">
        
      </a>
    </div>
    <p>The Internet can't guarantee timely and orderly delivery of packets, leading to the necessity of retransmission mechanisms, particularly in protocols like TCP. This ensures data eventually reaches its destination, despite possible delays. Real-time systems, however, need special consideration of these delays. A packet that is delayed past its deadline for rendering on the screen is worthless, but a packet that is lost can be recovered if it can be retransmitted within a very short period of time, on the order of milliseconds. This is where NACKs come into play.</p><p>A WebRTC client receiving data constantly checks for packet loss. When one or more packets don’t arrive at the expected time or a sequence number discontinuity is seen on the receiving buffer, a special NACK packet is sent back to the source in order to ask for a packet retransmission.</p><p>In a peer-to-peer topology, the source of the data has to handle NACKs itself, retransmitting packets for every participant that requests them. When an SFU is used, the SFU could send NACKs back to the source, or keep a complex buffer for each client to handle retransmissions.</p><p>This gets more complicated with Cloudflare Calls, since both the publisher and the subscriber connect to Cloudflare, likely to different servers and also probably in different locations. In addition, there is a possibility of other Cloudflare data centers in the middle, either through <a href="/argo-v2">Argo</a>, or just as part of scaling to many subscribers on the same track.</p><p>It is common for SFUs to backpropagate NACK packets back to the source, losing valuable time to recover packets. Calls goes beyond this and can handle NACK packets in the location closest to the user, which decreases overall latency. This latency advantage gives the packet a better chance of being recovered than with a centralized SFU or no NACK handling at all.</p><p>Since there may be a number of Cloudflare data centers between clients, packet loss within the Cloudflare network is also possible. We handle this by generating NACK packets in the network. With each hop that the packets take, the receiving end can generate NACK packets. Lost packets are then either recovered at that hop or the NACK is backpropagated to the publisher for recovery.</p>
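    <p>A toy sketch of the receiver-side detection that produces NACKs: scan the RTP sequence numbers as packets arrive and report the gaps. Real implementations also handle 16-bit sequence wraparound and reordering windows; this version assumes in-order, monotonically increasing sequence numbers:</p>

```javascript
// Toy sketch of receiver-side loss detection: find the sequence-number gaps
// that would each be requested in a NACK. Real implementations also handle
// 16-bit wraparound and reordering; this assumes monotonic sequence numbers.
function missingSeqNums(receivedSeqs) {
  const missing = [];
  for (let i = 1; i < receivedSeqs.length; i++) {
    // A jump larger than 1 means the packets in between never arrived.
    for (let s = receivedSeqs[i - 1] + 1; s < receivedSeqs[i]; s++) {
      missing.push(s); // each of these would go into a NACK packet
    }
  }
  return missing;
}

console.log(missingSeqNums([100, 101, 104, 105])); // [ 102, 103 ]
```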
    <div>
      <h2>Cloudflare Calls does TURN over Anycast too</h2>
      <a href="#cloudflare-calls-does-turn-over-anycast-too">
        
      </a>
    </div>
    <p>Separately from the SFU, Calls also offers a TURN service. A TURN server acts as a relay point for traffic between WebRTC clients, like the browser, and SFUs, particularly in scenarios where <a href="https://webrtcforthecurious.com/docs/03-connecting/#turn">direct communication is obstructed</a> by NATs or firewalls. TURN maintains an allocation of public IP addresses and ports for each session, ensuring connectivity even in restrictive network environments.</p><p>Cloudflare Calls’ TURN service supports a few ports to help with misbehaving middleboxes and firewalls:</p><ul><li><p>TURN-over-UDP over port 3478 (standard), and also port 53</p></li><li><p>TURN-over-TCP over ports 3478 and 80</p></li><li><p>TURN-over-TLS over ports 5349 and 443</p></li></ul><p>TURN works the same way as Calls, available over anycast and always connecting to the closest data center.</p>
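    <p>On the client side, these ports map directly onto TURN URIs in an iceServers list, using the URI syntax from RFC 7065. A sketch — the hostname below is a placeholder, and you would use the URIs and credentials from your Cloudflare dashboard:</p>

```javascript
// Sketch: an iceServers entry covering every transport/port combination the
// post lists, in RFC 7065 TURN URI syntax. The hostname is a placeholder —
// substitute the URIs and credentials from your Cloudflare dashboard.
function turnServers(host, username, credential) {
  return [{
    urls: [
      `turn:${host}:3478?transport=udp`, // standard UDP port
      `turn:${host}:53?transport=udp`,   // DNS port, for strict firewalls
      `turn:${host}:3478?transport=tcp`, // standard TCP port
      `turn:${host}:80?transport=tcp`,   // blends in with HTTP
      `turns:${host}:5349`,              // TURN over TLS, standard port
      `turns:${host}:443`,               // blends in with HTTPS
    ],
    username,
    credential,
  }];
}

// In a browser: new RTCPeerConnection({ iceServers: turnServers(...) })
const servers = turnServers("turn.example.com", "user", "secret");
console.log(servers[0].urls.length); // 6
```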
    <div>
      <h2>Pricing and how to get started</h2>
      <a href="#pricing-and-how-to-get-started">
        
      </a>
    </div>
    <p>Cloudflare Calls is now in open beta and available in your <a href="https://dash.cloudflare.com/?to=/:account/calls">Cloudflare Dashboard</a>. Depending on your use case, you can set up an SFU application and/or a TURN service with only a few clicks.</p><p>To kick off its open beta phase, Calls is available at no cost for a limited time. Starting May 15, 2024, customers will receive the first terabyte each month for free, with any usage beyond that charged at $0.05 per real-time gigabyte. Beta customers will be provided at least 30 days to upgrade from the free beta to a paid subscription. Additionally, there are no charges for in-bound traffic to Cloudflare. For volume pricing, talk to your account manager.</p><p>Cloudflare Calls is ideal if you are building new WebRTC apps. If you have existing SFUs or TURN infrastructure, you may still consider using Calls alongside your existing infrastructure. Building a bridge to Calls from other places is not difficult as Cloudflare Calls supports standard WebRTC APIs and acts like just another WebRTC peer.</p>
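    <p>To make the pricing above concrete, a small sketch of the math for self-serve usage (first terabyte free each month, then $0.05 per real-time GB):</p>

```javascript
// Sketch of the self-serve pricing described above: the first terabyte each
// month is free, then $0.05 per real-time GB.
function monthlyCostUSD(gigabytes) {
  const FREE_GB = 1000;     // first 1 TB is free
  const RATE_PER_GB = 0.05; // self-serve rate after the free tier
  return Math.max(0, gigabytes - FREE_GB) * RATE_PER_GB;
}

console.log(monthlyCostUSD(800));  // 0   — within the free terabyte
console.log(monthlyCostUSD(3000)); // 100 — 2000 billable GB at $0.05
```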
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6aUAwCF6AWChLdjGL00wno/5cb404a2ebb199ea980fba92e0a57298/image6-2.png" />
            
            </figure><p>We understand that getting started with a new platform is difficult, so we’re also open sourcing our internal video conferencing app, Orange Meets. Orange Meets supports small and large conference calls by maintaining room state in Workers Durable Objects. It has screen sharing, client-side noise-canceling, and background blur. It is written with TypeScript and React and is <a href="https://github.com/cloudflare/orange">available on GitHub</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7k2EN8juPmF4nzC090wEiP/85261afe40dd5695165729525bf2824a/image5-3.png" />
            
            </figure>
    <div>
      <h2>We’re hiring</h2>
      <a href="#were-hiring">
        
      </a>
    </div>
    <p>We think the current state of Cloudflare Calls enables many use cases. Calls already supports publishing and subscribing to media tracks and DataChannels. Soon, it will support features like simulcasting.</p><p>But we’re just scratching the surface and there is so much more to build on top of this foundation.</p><p>If you are passionate about WebRTC (and <a href="https://datatracker.ietf.org/group/moq/about/">other</a> real-time protocols!), the Media Platform team building the Calls product at Cloudflare is <a href="https://boards.greenhouse.io/cloudflare/jobs/5709759?gh_jid=5709759">hiring</a> and would love to talk to you.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Cloudflare Calls]]></category>
            <category><![CDATA[WebRTC]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">5PAERmEc4TrWEUDjtsd9dU</guid>
            <dc:creator>Renan Dincer</dc:creator>
            <dc:creator>Rachel Chen</dc:creator>
            <dc:creator>Felipe Astroza Araya</dc:creator>
            <dc:creator>Kevin Kipp</dc:creator>
            <dc:creator>Kazi Najib</dc:creator>
        </item>
        <item>
            <title><![CDATA[Build real-time video and audio apps on the world’s most interconnected network]]></title>
            <link>https://blog.cloudflare.com/announcing-cloudflare-calls/</link>
            <pubDate>Tue, 27 Sep 2022 13:00:00 GMT</pubDate>
            <description><![CDATA[ We are announcing Cloudflare Calls, a new product that lets developers build real-time audio and video apps ]]></description>
            <content:encoded><![CDATA[ <p></p><p>In the last two years, there has been a rapid rise in real-time apps that help groups of people get together virtually with near-zero latency. User expectations have also increased: your users expect real-time video and audio features to work flawlessly. We found that developers building real-time apps want to spend less time building and maintaining low-level infrastructure. Developers also told us they want to spend more time building features that truly make their idea special.</p><p>So today, we are announcing a new product that lets developers build real-time audio/video apps. Cloudflare Calls exposes a set of APIs that allows you to build things like:</p><ul><li><p>A video conferencing app with a custom UI</p></li><li><p>An interactive conversation where the moderators can invite select audience members “on stage” as speakers</p></li><li><p>A privacy-first group workout app where only the instructor can view all the participants while the participants can only view the instructor</p></li><li><p>Remote 'fireside chats' where one or multiple people can have a video call with an audience of 10,000+ people in real time (&lt;100ms delay)</p></li></ul><p>The protocol that makes all this possible is WebRTC. And Cloudflare Calls is the product that abstracts away the complexity by turning the Cloudflare network into a “super peer,” helping you build reliable and secure real-time experiences.</p>
    <div>
      <h3>What is WebRTC?</h3>
      <a href="#what-is-webrtc">
        
      </a>
    </div>
    <p>WebRTC is a peer-to-peer protocol that enables two or more users’ devices to talk to each other <i>directly</i> and without leaving the browser. In a native implementation, peer-to-peer typically works well for 1:1 calls with only two participants. But as you add additional participants, it is common for participants to experience reliability issues, including video freezes and participants getting out of sync. Why? Because as the number of participants increases, the coordination overhead between users’ devices also increases. Each participant needs to send media to every other participant, so each device’s bandwidth use grows linearly with the number of participants, and the call’s total traffic grows quadratically.</p><p>A selective forwarding unit (SFU) solves this problem. An SFU is a system that connects users with each other in real-time apps by intelligently managing and routing video and audio data between the participants. Apps that use an SFU reduce the data capacity required from each user because each user doesn’t have to send data to every other user. An SFU becomes a required part of a real-time application when it needs to determine who is currently speaking, or when it needs to send each viewer the appropriate video resolution using WebRTC simulcast.</p>
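<p>The scaling difference is easy to see with a back-of-the-envelope count of media streams (illustrative math, not Cloudflare code):</p>

```javascript
// Stream counts for an n-person call, assuming one media stream per
// sender (audio and video counted together for simplicity).

// Full mesh: every peer uploads to, and downloads from, every other peer.
function meshStreamsPerPeer(n) {
  return { up: n - 1, down: n - 1 };
}

// SFU: every peer uploads once; the SFU fans the stream out.
function sfuStreamsPerPeer(n) {
  return { up: 1, down: n - 1 };
}

// Total streams crossing the network in a full mesh: n * (n - 1),
// i.e. quadratic growth: 20 streams for 5 people, 90 for 10.
function meshTotalStreams(n) {
  return n * (n - 1);
}
```

For a five-person call, a mesh peer uploads four copies of its media while an SFU peer uploads one; on an asymmetric home connection, that uplink difference is usually what decides whether the call stays watchable.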
    <div>
      <h3>Beyond SFUs</h3>
      <a href="#beyond-sfus">
        
      </a>
    </div>
    <p>The centralized nature of an SFU is also its weakness. A centralized WebRTC server needs a home region, which means it is fast for users near that region and slow for everyone else.</p><p>Typically, SFUs are built on public clouds. They consume a lot of bandwidth by both receiving and sending high resolution media to many devices. And they come with significant DevOps overhead, requiring your team to manually configure regions and scalability.</p><p>We realized that merely offering an SFU-as-a-service wouldn’t solve the problem of cost and bandwidth efficiency.</p>
    <div>
      <h3>Biggest WebRTC server in the world</h3>
      <a href="#biggest-webrtc-server-in-the-world">
        
      </a>
    </div>
    <p>When you are on a five-person video call powered by a classic WebRTC implementation, each person’s device talks directly with the others. In WebRTC parlance, each of the five participants is called a <i>peer.</i> And the reliability of the five-person call will only be as good as the reliability of the person (or peer) with the weakest Internet connection.</p><p>We built Calls with a simple premise: <i>“What if Cloudflare could act as a WebRTC peer?”</i> Calls, a “super peer” or a “giant server that spans the whole world”, allows applications to be built beyond the limitations of the lowest common denominator peer or a centralized SFU. Developers can focus on the strength of their app instead of trying to compensate for the weaknesses of the weakest peer in a p2p topology.</p><p>Calls does not use the traditional SFU topology where every participant connects to a centralized server in a single location. Instead, each participant connects to their local Cloudflare data center. When another participant wants to retrieve that media, the datacenter that hosts the original media stream is found and the tracks are forwarded between datacenters automatically. If two participants are physically close, their media does not travel around the world to a centralized region; instead, they use the same datacenter, greatly reducing latency and improving reliability.</p><p>Calls is a configurable, global, regionless WebRTC server that is the size of Cloudflare's ever-growing network. The WebRTC protocol enables peers to send and receive <i>media tracks.</i> When you are on a video call, your computer is typically sending <i>two</i> tracks: one that contains the audio of you speaking and another that contains the video stream from your camera. Calls implements the WebRTC <a href="https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection">RTCPeerConnection</a> API across the Cloudflare Network where users can push media tracks. 
Calls also exposes an API where other media tracks can be requested within the same Peer Connection context.</p>
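<p>A publish flow built on these standard browser APIs might look like the following sketch. The session endpoint and request shape here are hypothetical placeholders for illustration, not the actual Calls API:</p>

```javascript
// Hypothetical request body for pushing an SDP offer to a session API.
// The field names are illustrative, not the real Calls API schema.
function buildPublishRequest(offerSdp) {
  return { sessionDescription: { type: "offer", sdp: offerSdp } };
}

// Browser-side sketch (not executed here): capture mic and camera, add
// the tracks to a peer connection, and exchange SDP with the server.
async function publishTracks(apiUrl) {
  const pc = new RTCPeerConnection();
  const media = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  for (const track of media.getTracks()) pc.addTrack(track, media);

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // Send the offer to a placeholder session endpoint and apply the answer.
  const res = await fetch(apiUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildPublishRequest(offer.sdp)),
  });
  const { sessionDescription } = await res.json();
  await pc.setRemoteDescription(sessionDescription);
  return pc;
}
```

Because Calls behaves like a standard WebRTC peer, the client side stays plain `RTCPeerConnection` code; only the signaling exchange with the server is product-specific.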
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/55LIoL4VrFNTx2wTimkShk/473c3054a27deac146389dc237c45a08/image2-41.png" />
            
            </figure><p>Cloudflare Calls will be a good solution if you operate your own WebRTC server such as Janus or MediaSoup. Cloudflare Calls can also replace existing deployments of Janus or MediaSoup, especially in cases where you have clients connecting globally to a single, centralized deployment.</p>
    <div>
      <h2>Region: Earth</h2>
      <a href="#region-earth">
        
      </a>
    </div>
    <p>Building and maintaining your own real-time infrastructure comes with unique architecture and scaling challenges. It requires you to answer and constantly revise your answers to thorny questions such as <i>“which regions do we support?”</i>, “<i>how many users do we need to justify spinning up more infrastructure in yet another cloud region?</i>”, <i>“how do we scale for unplanned spikes in usage?”</i> and <i>“how do we not lose money during low-usage hours of our infrastructure?”</i></p><p>Cloudflare Calls eliminates the need to answer these questions. Calls uses <a href="https://www.cloudflare.com/learning/cdn/glossary/anycast-network/">anycast</a> for every connection, so every packet is always routed to the closest Cloudflare location. It is global by nature: your users are automatically served from a location close to them. Calls scales with your use and your team doesn’t have to build its own auto-scaling logic.</p><p>Calls runs on every Cloudflare location and every single Cloudflare server. Because the Cloudflare network is within 10 milliseconds of 90% of the world’s population, it does not add any noticeable latency.</p>
    <div>
      <h2>Answer “where’s the problem?”, only faster</h2>
      <a href="#answer-wheres-the-problem-only-faster">
        
      </a>
    </div>
    <p>When we talk to customers with existing WebRTC workloads, there is one consistent theme: customers wish it was easier to troubleshoot issues. When a group of people are talking over a video call, the stakes are much higher when users experience issues. When a web page fails to load, it is common for users to simply retry after a few minutes. When a video call is disrupted, it is often the end of the call.</p><p>Cloudflare Calls’ focus on observability will help customers get to the bottom of the issues faster. Because Calls is built on Cloudflare’s infrastructure, we have end-to-end visibility from all layers of the OSI model.</p><p>Calls provides a server-side view of the <a href="https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_Statistics_API">WebRTC Statistics API</a>, so you can drill into issues with each Peer Connection and the flow of media within it, without depending only on data sent from clients. We chose this because the Statistics API is a standardized place developers are used to getting information about their experience. It is the same API available in browsers, and you might already be using it today to gain insight into the performance of your WebRTC connections.</p>
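<p>Clients can gather the same class of numbers locally with the browser’s <code>getStats()</code>. The helper below is a client-side sketch that tallies inbound packet loss from stats entries; it is not the server-side API:</p>

```javascript
// Sketch: summarize packet loss from WebRTC stats entries. In the
// browser, the entries would come from `await pc.getStats()`, which
// yields report objects shaped like the mocked ones below.
function summarizeInboundRtp(reports) {
  let packetsReceived = 0;
  let packetsLost = 0;
  for (const r of reports) {
    if (r.type !== "inbound-rtp") continue; // ignore other report types
    packetsReceived += r.packetsReceived ?? 0;
    packetsLost += r.packetsLost ?? 0;
  }
  const total = packetsReceived + packetsLost;
  return {
    packetsReceived,
    packetsLost,
    lossRate: total === 0 ? 0 : packetsLost / total,
  };
}

// Usage with mocked report entries:
const summary = summarizeInboundRtp([
  { type: "inbound-rtp", kind: "video", packetsReceived: 970, packetsLost: 30 },
  { type: "inbound-rtp", kind: "audio", packetsReceived: 1000, packetsLost: 0 },
  { type: "candidate-pair" }, // skipped by the type filter
]);
```

Comparing a client-side summary like this with the server-side view of the same connection is what makes "is the problem on my network or theirs?" answerable.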
    <div>
      <h3>Privacy and security at the core</h3>
      <a href="#privacy-and-security-at-the-core">
        
      </a>
    </div>
    <p>Calls eliminates the need for participants to share information such as their IP address with each other. Let’s say you are building an app that connects therapists and patients via video calls. With a traditional WebRTC implementation, both the patient and therapist’s devices would talk directly with each other, leading to exposure of potentially sensitive data such as the IP address. Exposure of information such as the IP address can leave your users vulnerable to denial-of-service attacks.</p><p>When using Calls, you are still using WebRTC, but the individual participants are connecting to the Cloudflare network. If four people are on a video call powered by Cloudflare Calls, each of the four participants' devices will be talking only with the Cloudflare network. To your end users, the experience will feel just like a peer-to-peer call, only with added security and privacy upside.</p><p>Finally, all video and audio traffic that passes through Cloudflare Calls is encrypted by default. Calls leverages existing Cloudflare products including Argo to route the video and audio content in a secure and efficient manner. The Calls API enables granular controls that cannot be implemented with vanilla WebRTC alone. When you build using Calls, you are only limited by your imagination, not the technology.</p>
    <div>
      <h3>What’s next</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>We’re releasing Cloudflare Calls in closed beta today. To try out Cloudflare Calls, <a href="https://www.cloudflare.com/cloudflare-calls-signup-page">request an invitation</a> and check your inbox in the coming weeks. Calls will be free during the beta period. We're looking to work with early customers who want to take Calls from beta to generally available with us. If you are building a real-time video app today, having challenges scaling traditional WebRTC infrastructure, or just have a great idea you want to explore, <a href="https://www.cloudflare.com/cloudflare-calls-signup-page">leave a comment</a> when you are requesting an invitation, and we’ll reach out.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[WebRTC]]></category>
            <category><![CDATA[Video]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">4PEbQjLrYQwj2Hj7O4b3ah</guid>
            <dc:creator>Zaid Farooqui</dc:creator>
            <dc:creator>Renan Dincer</dc:creator>
        </item>
        <item>
            <title><![CDATA[Real-Time Communications at Scale]]></title>
            <link>https://blog.cloudflare.com/announcing-our-real-time-communications-platform/</link>
            <pubDate>Thu, 30 Sep 2021 12:59:36 GMT</pubDate>
            <description><![CDATA[ We’re making it easier to build and scale real-time communications applications around open technologies, starting with WebRTC Components. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>For every successful technology, there is a moment where its time comes. Something happens, usually external, to catalyze it — shifting it from being a good idea with promise, to a reality that we can’t imagine living without. Perhaps the best recent example was what happened to the cloud as a result of the introduction of the iPhone in 2007. Smartphones created a huge addressable market for small developers; and even big developers found their customer base could explode in a way that they couldn’t handle without access to public cloud infrastructure. Both wanted to be able to focus on building amazing applications, without having to worry about what lay underneath.</p><p>Last year, during the outbreak of COVID-19, a similar moment happened to real time communication. Being able to communicate is the lifeblood of any organization. Before 2020, much of it happened in meeting rooms in offices all around the world. But in March last year — that changed dramatically. Those meeting rooms suddenly were emptied. Fast-forward 18 months, and that massive shift in how we work has persisted.</p><p>While, undoubtedly, many organizations would not have been able to get by without the likes of Slack, Zoom and Teams as real time collaboration tools, we think today’s iteration of communication tools is just the tip of the iceberg. Looking around, it’s hard to escape the feeling there is going to be an explosion in innovation that is about to take place to enable organizations to communicate in a remote, or at least hybrid, world.</p><p>With this in mind, today we’re excited to be introducing Cloudflare’s Real Time Communications platform. This is a new suite of products designed to help you build the next generation of real-time, interactive applications. 
Whether it’s one-to-one video calling, group audio or video-conferencing, the demand for real-time communications only continues to grow.</p><p>Running a reliable and scalable real-time communications platform requires building out a large-scale network. You need to <a href="/250-cities-is-just-the-start/">get your network edge within milliseconds of your users</a> in multiple geographies to make sure everyone can always connect with low latency, low packet loss and low jitter. A <a href="/cloudflare-backbone-internet-fast-lane/">backbone to route around</a> Internet traffic jams. <a href="/designing-edge-servers-with-arm-cpus/">Infrastructure that can efficiently scale</a> to serve thousands of participants at once. And then you need to deploy media servers, write business logic, manage multiple client platforms, and keep it all running smoothly. We think we can help with this.</p><p>Launching today, you will be able to leverage Cloudflare’s global edge network to improve connectivity for any existing WebRTC-based video and audio application, with what we’re calling “WebRTC Components”. This includes scaling to (tens of) thousands of participants, leveraging our <a href="/cloudflare-thwarts-17-2m-rps-ddos-attack-the-largest-ever-reported/">DDoS mitigation</a> to protect your services from attacks, and enforcing <a href="https://developers.cloudflare.com/spectrum/reference/configuration-options#ip-access-rules">IP and ASN-based access policies</a> in just a few clicks.</p>
    <div>
      <h3>How Real Time is “Real Time”?</h3>
      <a href="#how-real-time-is-real-time">
        
      </a>
    </div>
    <p>Real-time typically refers to communication that happens in under 500ms: that is, as fast as packets can traverse the fibre optic networks that connect the world together. In 2021, most real-time audio and video applications use <a href="https://webrtcforthecurious.com/docs/01-what-why-and-how/">WebRTC</a>, a set of open standards and browser APIs that define how to connect, secure, and transfer both media and data over UDP. It was designed to bring better, more flexible bi-directional communication when compared to the primary browser-based communication protocol we rely on today, HTTP. And because WebRTC is supported in the browser, it means that users don’t need custom clients, nor do developers need to build them: all they need is a browser.</p><p>Importantly, we’ve seen the need for reliable, real-time communication across time-zones and geographies increase dramatically, as organizations change the way they work (<a href="/the-future-of-work-at-cloudflare/">yes, including us</a>).</p><p>So where is real-time important in practice?</p><ul><li><p>One-to-one calls (think FaceTime). We’re used to almost instantaneous communication over traditional telephone lines, and there’s no reason for us to head backwards.</p></li><li><p>Group calling and conferencing (Zoom or Google Meet), where even just a few seconds of delay results in everyone talking over each other.</p></li><li><p>Social video, gaming and sports. 
You don’t want to be 10 seconds behind the action or miss that key moment in a game because the stream dropped a few frames or decided to buffer.</p></li><li><p>Interactive applications: from 3D modeling in the browser, Augmented Reality on your phone, and even game streaming need to be in real-time.</p></li></ul><p>We believe that we’ve only collectively scratched the surface when it comes to real-time applications — and part of that is because scaling real-time applications to even thousands of users requires new infrastructure paradigms and demands more from the network than traditional HTTP-based communication.</p>
    <div>
      <h3>Enter: WebRTC Components</h3>
      <a href="#enter-webrtc-components">
        
      </a>
    </div>
    <p>Today, we’re launching the closed beta of <i>WebRTC Components</i>, allowing teams running centralized <a href="https://www.cloudflare.com/learning/video/turn-server/">WebRTC TURN servers</a> to offload that work to Cloudflare’s distributed, global network, improve reliability, scale to more users, and spend less time managing infrastructure.</p><p><a href="https://webrtcforthecurious.com/docs/03-connecting/#turn">TURN</a>, or Traversal Using Relays Around NAT (Network Address Translation), was designed to navigate the practical shortcomings of WebRTC’s peer-to-peer origins. WebRTC was (and is!) a peer-to-peer technology, but in practice, establishing reliable peer-to-peer connections remains hard due to Carrier-Grade NAT, corporate NATs and firewalls. Further, each peer is limited by its own network connectivity — in a traditional <a href="https://webrtcforthecurious.com/docs/08-applied-webrtc/#full-mesh">peer-to-peer mesh</a>, participants can quickly find their network connections saturated because they have to receive data from every other peer. In a mixed environment with different devices (mobile, desktops), networks (high-latency 3G through to fast fiber), scaling to more than a handful of peers becomes extremely challenging.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/XY5oJWURkYZEvmSeSXGax/8dc00dc851aaa722ed75b8e53df26d87/Before.png" />
            
            </figure><p>Running a TURN service at the edge instead of your own infrastructure gets you a better connection. Cloudflare operates an anycast network spanning <a href="/250-cities-is-just-the-start/">250+ cities</a>, meaning we’re very close to wherever your users are. This means that when users connect to Cloudflare’s TURN service, they get a really good connection to the Cloudflare network. Once it’s on there, we leverage our network and <a href="/250-cities-is-just-the-start/">private backbone</a> to get you superior connectivity, all the way back to the other user on the call.</p><p>But even better: stop worrying about scale. WebRTC infrastructure is notoriously difficult to scale: you need to make sure you have the right capacity in the right location. Cloudflare’s TURN service scales automatically and if you want more endpoints they’re just an API call away.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Qt7hVKP49sXw4ceYyA2Q2/73c56a80d17827050b8f90a37a7382ee/unnamed--1--1.png" />
            
            </figure><p>Of course, WebRTC Components is built on the Cloudflare network, benefiting from the DDoS protection that its 100 Tbps network offers. From now on, deploying scalable, secure, production-grade WebRTC relays globally is only a couple of API calls away.</p>
    <div>
      <h3>A Developer First Real-Time Platform</h3>
      <a href="#a-developer-first-real-time-platform">
        
      </a>
    </div>
    <p>But, as we like to say at Cloudflare: we’re just getting started. Managed, scalable TURN infrastructure is a critical building block for building real-time services for one-to-one and small group calling, especially for teams who have been managing their own infrastructure, but things become rapidly more complex when you start adding more participants.</p><p>Whether that’s managing the quality of the streams (“tracks”, in WebRTC parlance) each client sends and receives to keep call quality up, building permissions systems to determine who can speak or broadcast in large-scale events, and/or building signalling infrastructure with support for chat and interactivity on top of the media experience, one thing is clear: there’s a lot to bite off.</p><p>With that in mind, here’s a sneak peek at where we’re headed:</p><ul><li><p>Developer-first APIs that abstract the need to manage and configure low-level infrastructure, authentication, authorization and participant permissions. Think in terms of your participants, rooms and channels, without having to learn the intricacies of ICE, peer connections and media tracks.</p></li><li><p>Integration with <a href="https://www.cloudflare.com/teams/access/">Cloudflare for Teams</a> to support organizational access policies: great for when your company town hall meetings are now conducted remotely.</p></li><li><p>Making it easy to connect any input and output source, including broadcasting to traditional HTTP streaming clients and recording for on-demand playback with <a href="/stream-live/">Stream Live</a>, and ingesting from RTMP sources with <a href="/restream-with-stream-connect/">Stream Connect</a>, or future protocols such as <a href="https://datatracker.ietf.org/doc/html/draft-murillo-whip-02">WHIP</a>.</p></li><li><p>Embedded serverless capabilities via <a href="https://workers.cloudflare.com/">Cloudflare Workers</a>, from triggering Workers on participant events (e.g. 
join, leave) through to building stateful chat and collaboration tools with <a href="/introducing-workers-durable-objects/">Durable Objects</a> and WebSockets.</p></li></ul><p>… and this is just the beginning.</p><p>We’re also looking for ambitious engineers who want to play a role in building our RTC platform. If you’re an engineer interested in building the next generation of real-time, interactive applications, <a href="https://boards.greenhouse.io/cloudflare/jobs/3523616?gh_jid=3523616&amp;gh_src=9b769b781us">join</a> <a href="https://boards.greenhouse.io/cloudflare/jobs/3523626?gh_jid=3523626&amp;gh_src=4bdb03661us">us</a>!</p><p>If you’re interested in working with us to help connect more of the world together, and are struggling with scaling your existing 1-to-1 real-time video &amp; audio platform beyond a few hundred or thousand concurrent users, <a href="https://docs.google.com/forms/d/e/1FAIpQLSeGvMJPTmsdWXq1rSCGHzszce5RdM5iYHxsQQfPk8Kt5rkaKQ/viewform?usp=sf_link">sign up for the closed beta</a> of WebRTC Components. We’re especially interested in partnering with teams at the beginning of their real-time journeys and who are keen to iterate closely with us.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Video]]></category>
            <category><![CDATA[Cloudflare Stream]]></category>
            <category><![CDATA[WebRTC]]></category>
            <guid isPermaLink="false">29oyPijBN1jb64XSQsGHLy</guid>
            <dc:creator>Matt Silverlock</dc:creator>
            <dc:creator>Achiel van der Mandele</dc:creator>
            <dc:creator>James Allworth</dc:creator>
        </item>
        <item>
            <title><![CDATA[Serverless Live Streaming with Cloudflare Stream]]></title>
            <link>https://blog.cloudflare.com/stream-live/</link>
            <pubDate>Thu, 30 Sep 2021 12:59:23 GMT</pubDate>
            <description><![CDATA[ You can now use Cloudflare to do serverless end-to-end live-streaming. Stream Live offers video ingestion, encoding, recording and a player in a single product. ]]></description>
            <content:encoded><![CDATA[ <p>We’re excited to introduce the open beta of Stream Live, an end-to-end scalable <a href="https://www.cloudflare.com/developer-platform/solutions/live-streaming/">live-streaming platform</a> that allows you to focus on growing your live video apps, not your codebase.</p><p>With Stream Live, you can painlessly scale your streaming app to millions of concurrent broadcasters and millions of concurrent users. Start sending live video from mobile or desktop using the industry-standard RTMPS protocol to millions of viewers instantly. Stream Live works with the most popular live video broadcasting software you already use, including ffmpeg, <a href="https://obsproject.com/">OBS</a> or Zoom. Your broadcasts are automatically recorded, optimized and delivered using the Stream player.</p><p>When you are building your live infrastructure from scratch, you have to answer a few critical questions:</p><ol><li><p><i>“Which codec(s) are we going to use to encode the videos?”</i></p></li><li><p><i>“Which protocols are we going to use to ingest and deliver videos?”</i></p></li><li><p><i>“How are the different components going to impact latency?”</i></p></li></ol><p>We built Stream Live, so you don’t have to think about these questions and spend considerable engineering effort answering them. Stream Live abstracts these pesky yet important implementation details by automatically choosing the most compatible codec and streaming protocol for the client device. There is no limit to the number of live broadcasts you can start and viewers you can have on Stream Live. Whether you want to make the next viral video sharing app or securely broadcast all-hands meetings to your company, Stream will scale with you, so you don’t have to spend months building and maintaining video infrastructure.</p>
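<p>As an illustration, a broadcast can be pushed over RTMPS from a server-side script. The ingest URL and stream key below are placeholders (real values come from your Stream Live configuration), and the flags show one common pass-through setup:</p>

```javascript
// Sketch: build an ffmpeg argument list that re-streams a local file to
// an RTMPS ingest endpoint. The URL and key are placeholders; real
// values come from your own Stream Live configuration.
function buildFfmpegArgs(inputFile, ingestUrl, streamKey) {
  return [
    "-re",           // read input at its native frame rate (live pacing)
    "-i", inputFile, // source: a local file, device, or capture input
    "-c", "copy",    // pass codecs through without re-encoding
    "-f", "flv",     // RTMP(S) carries an FLV container
    `${ingestUrl}/${streamKey}`,
  ];
}

// In Node, these args could be handed to child_process.spawn("ffmpeg", args).
const args = buildFfmpegArgs(
  "demo.mp4",
  "rtmps://ingest.example.com:443/live",
  "STREAM_KEY_PLACEHOLDER"
);
```

Using `-c copy` keeps the broadcaster's CPU cost near zero; the platform's encoders take care of producing the additional quality levels downstream.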
    <div>
      <h3>Built-in Player and Access Control</h3>
      <a href="#built-in-player-and-access-control">
        
      </a>
    </div>
    <p>Every live video gets an embed code that can be placed inside your app, enabling your users to watch the live stream. You can also use your own player with included support for the two major HTTP streaming formats — HLS and DASH — for granular control over the user experience.</p><p>You can limit who can view your live videos with self-expiring tokenized links for each viewer. When generating the tokenized links, you can define constraints <a href="https://developers.cloudflare.com/stream/viewing-videos/securing-your-stream">including time-based expiration, geo-fencing and IP restrictions</a>. When building an online learning site or a video sharing app, you can put videos behind authentication, so only logged-in users can view your videos. Or if you are building a live concert platform, you may have agreements to only allow viewers from specific countries or regions. Stream’s signed tokens help you comply with complex and custom rulesets.</p>
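<p>One way to picture such a token is as a claims object that gets signed (for example, as a JWT) before being appended to the playback URL. The field names below are illustrative, not Stream’s exact token schema:</p>

```javascript
// Sketch: build the constraint claims for a self-expiring viewer token.
// Field names are illustrative, not Stream's exact schema; a real token
// would be signed with your key before being handed to the viewer.
function buildViewerClaims(videoId, { ttlSeconds, allowedCountries, allowedIp }, nowSeconds) {
  const claims = {
    sub: videoId,                  // which video the token unlocks
    exp: nowSeconds + ttlSeconds,  // time-based expiration
  };
  if (allowedCountries) claims.geo = { allow: allowedCountries }; // geo-fencing
  if (allowedIp) claims.ip = allowedIp;                           // IP restriction
  return claims;
}

// Example: a one-hour token restricted to viewers in the US and Canada.
const claims = buildViewerClaims(
  "abc123",
  { ttlSeconds: 3600, allowedCountries: ["US", "CA"] },
  1_700_000_000
);
```

Because the constraints live inside a signed token rather than server-side session state, the edge can enforce them on every playback request without a database lookup.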
    <div>
      <h3>Instant Recordings</h3>
      <a href="#instant-recordings">
        
      </a>
    </div>
    <p>With Stream Live, you don’t have to wait for a recording to be available after the live broadcast ends. Live videos automatically get converted to recordings in less than a second. Viewers get access to the recording instantly, allowing them to catch up on what they missed.</p>
    <div>
      <h3>Instant Scale</h3>
      <a href="#instant-scale">
        
      </a>
    </div>
    <p>Whether your platform has one active broadcaster or ten thousand, Stream Live scales with your use case. You don’t have to worry about adding new compute instances, setting up availability zones or negotiating additional software licenses.</p><p>Legacy live video pipelines built in-house typically ingest and encode the live stream continents away in a single location. Video that is ingested far away makes video streaming unreliable, especially for global audiences. All Cloudflare locations run the necessary software to ingest live video <i>in</i> and deliver video <i>out</i>. Once your video broadcast is in the Cloudflare network, Stream Live uses the <a href="/250-cities-is-just-the-start/">Cloudflare backbone</a> and <a href="https://www.cloudflare.com/products/argo-smart-routing/">Argo</a> to transmit your live video with increased reliability.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3aUAddowRRZXNaqyepq9ey/8509043e6a1a063ee30152199de0e287/unnamed--1--2.png" />
            
            </figure>
    <div>
      <h3>Broadcast with 15-second latency</h3>
      <a href="#broadcast-with-15-second-latency">
        
      </a>
    </div>
    <p>Depending on your video encoder settings, the time between you broadcasting and the video displaying on your viewers’ screens can be as low as fifteen seconds with Stream Live. Low latency allows you to build interactive features such as chat and Q&amp;A into your application. This latency is good for broadcasting meetings, sports, concerts, and worship services, but we know it doesn’t cover all uses for live video.</p><p>We’re on a mission to reduce the latency Stream Live adds to near-zero. The Cloudflare network is now <a href="/250-cities-is-just-the-start/">within 50ms of 95% of the world’s population</a>. We believe we can significantly reduce the delay from the broadcaster to the viewer in the coming months. Finally, in the world of live-streaming, latency is only meaningful once you can assume reliability. By using the Cloudflare network spanning over 250 locations, you get unparalleled reliability that is critical for live events.</p>
    <div>
      <h3>Simple and predictable pricing</h3>
      <a href="#simple-and-predictable-pricing">
        
      </a>
    </div>
    <p>Stream Live is available as a pay-as-you-go service based on the duration of videos recorded and duration of video viewed.</p><ul><li><p>It costs $5 per 1,000 minutes of video storage capacity per month. Live-streamed videos are automatically recorded. There is no additional cost for ingesting the live stream.</p></li><li><p>It costs $1 per 1,000 minutes of video viewed.</p></li><li><p>There are no surprises. You never have to pay the hidden costs for video ingest, compute (encoding), egress, or storage that are common in legacy video pipelines.</p></li><li><p>You can control how much you spend with Stream using billing alerts and restrict viewing by creating signed tokens that only work for authorized viewers.</p></li></ul><p>Cloudflare Stream encodes the live stream in multiple quality levels at no additional cost. This ensures smooth playback for your viewers with varying Internet speeds. As your viewers move from Wi-Fi to mobile networks, videos continue playing without interruption. Other platforms that offer live-streaming infrastructure tend to add extra fees for quality levels that cater to a global audience.</p><p>If your use case consists of thousands of concurrent broadcasters or millions of concurrent viewers, <a href="https://www.cloudflare.com/plans/enterprise/contact/">reach out</a> to us for volume pricing.</p>
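    <p>To make the math concrete, here is a small sketch of the pay-as-you-go formula above. The two rates are the published ones; the example volumes are made up:</p>

```python
STORAGE_RATE_PER_1000_MIN = 5.00  # $5 per 1,000 minutes of storage capacity / month
VIEWING_RATE_PER_1000_MIN = 1.00  # $1 per 1,000 minutes of video viewed

def monthly_cost(storage_minutes: float, viewed_minutes: float) -> float:
    """Estimated monthly Stream Live bill; ingest itself costs nothing."""
    storage = storage_minutes / 1000 * STORAGE_RATE_PER_1000_MIN
    viewing = viewed_minutes / 1000 * VIEWING_RATE_PER_1000_MIN
    return storage + viewing

# e.g. 10,000 minutes of recordings stored and 500,000 minutes watched:
cost = monthly_cost(10_000, 500_000)  # → 550.0, i.e. $50 storage + $500 viewing
```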
    <div>
      <h3>Go live with Stream</h3>
      <a href="#go-live-with-stream">
        
      </a>
    </div>
    <p>Stream works independently of any domain on Cloudflare. If you already have a Cloudflare account with a Stream subscription, you can begin using Stream Live by clicking on the “Live Input” tab on the <a href="https://dash.cloudflare.com/?to=/:account/stream">Stream Dashboard</a> and creating a new input:</p>
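    <p>Alternatively, live inputs can be created programmatically. A hedged sketch that only builds the request: the <code>live_inputs</code> endpoint and the <code>meta</code>/<code>recording</code> fields follow Cloudflare’s public API documentation, while the account ID and API token are placeholders you supply yourself:</p>

```python
import json

API_BASE = "https://api.cloudflare.com/client/v4"

def build_live_input_request(account_id: str, api_token: str, name: str):
    """Return (url, headers, body) for creating a Stream live input.

    Sketch only: endpoint and field names are taken from Cloudflare's
    API docs; account_id and api_token are placeholder values.
    """
    url = f"{API_BASE}/accounts/{account_id}/stream/live_inputs"
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "meta": {"name": name},
        "recording": {"mode": "automatic"},  # record broadcasts automatically
    })
    return url, headers, body

url, headers, body = build_live_input_request(
    "YOUR_ACCOUNT_ID", "YOUR_API_TOKEN", "my first live input"
)
```

    <p>Send the request with any HTTP client; the response includes the ingest details (such as the RTMPS URL and stream key) to plug into your broadcast software.</p>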
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1iLtxKiyHyGsDb9zkLra2e/804f1abf6a30a86cd214dcc048685705/Stream-Screen.png" />
            
            </figure><p>If you are new to Cloudflare, <a href="https://dash.cloudflare.com/sign-up/stream">sign up for Cloudflare Stream</a>.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Live Streaming]]></category>
            <category><![CDATA[WebRTC]]></category>
            <category><![CDATA[Video]]></category>
            <category><![CDATA[Cloudflare Stream]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">18F1qaXrHhAozgDsfcwjHJ</guid>
            <dc:creator>Zaid Farooqui</dc:creator>
        </item>
    </channel>
</rss>