How Cloudflare uses the world’s greatest collection of performance data to make the world’s fastest global network even faster

2025-09-26

7 min read

Cloudflare operates the fastest network on the planet. We’ve shared an update today about how we are overhauling the software technology that accelerates every server in our fleet, improving speed globally.

That is not where the work stops, though. To improve speed even further, we have to also make sure that our network swiftly handles the Internet-scale congestion that hits it every day, routing traffic to our now-faster servers.

We have invested in congestion control for years. Today, we are excited to share how we are applying a superpower of our network, our massive Free Plan user base, to optimize performance and find the best way to route traffic across our network for all our customers globally.

Early results show performance improvements averaging 10% over the prior baseline. We achieved this by applying different algorithmic methods, informed by the data we observe about the Internet each day. We are excited to begin rolling out these improvements to all customers.

How does traffic arrive in our network?

The Internet is a massive collection of interconnected networks, each composed of many machines (“nodes”). Data is transmitted by breaking it up into small packets, and passing them from one machine to another (over a “link”). Each one of these machines is linked to many others, and each link has limited capacity.

When we send a packet over the Internet, it will travel in a series of “hops” over the links from A to B.  At any given time, there will be one link (one “hop”) with the least available capacity for that path. It doesn’t matter where in the connection this hop is — it will be the bottleneck.
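
To make that concrete, here is a tiny sketch (with invented link capacities) showing that a path's usable capacity is simply the minimum capacity of the links along it:

```rust
fn main() {
    // Illustrative per-hop capacities in Mbps; real paths and numbers vary.
    let link_capacities_mbps = [1_000.0_f64, 400.0, 10_000.0, 25.0, 800.0];

    // The path can never move data faster than its slowest link: that link is
    // the bottleneck, wherever it sits in the chain.
    let bottleneck = link_capacities_mbps
        .iter()
        .cloned()
        .fold(f64::INFINITY, f64::min);

    println!("End-to-end capacity is limited to {bottleneck} Mbps by the bottleneck hop");
}
```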

But there’s a challenge — when you’re sending data over the Internet, you don’t know what route it’s going to take. In fact, each node decides for itself which route to send the traffic through, and different packets going from A to B can take entirely different routes. The dynamic and decentralized nature of the system is what makes the Internet so effective, but it also makes it very hard to work out how much data can be sent.  So — how can a sender know where the bottleneck is, and how fast to send data?

Between Cloudflare nodes, our Argo Smart Routing product takes advantage of our visibility into the global network to speed up communication. Similarly, when we initiate connections to customer origins, we can leverage Argo and other insights to optimize them. However, the speed of a connection from your phone or laptop (the Client below) to the nearest Cloudflare datacenter will depend on the capacity of the bottleneck hop in the chain from you to Cloudflare, which happens outside our network.

What happens when too much data arrives at once?

If too much data arrives at any one node along the path of a request, the requestor will experience delays due to congestion. The data will either be queued for a while (risking bufferbloat), or some of it will simply get dropped. Protocols like TCP and QUIC respond to dropped packets by retransmitting the data, but this introduces a delay, and can even make the problem worse by further overloading the limited capacity.
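
As a rough illustration of those two outcomes, here is a toy drop-tail buffer (not how our edge actually queues packets): arrivals that fit in the buffer wait in line, adding latency, and arrivals that don't are dropped and must be retransmitted.

```rust
use std::collections::VecDeque;

/// A toy drop-tail buffer: illustrative only.
struct Buffer {
    queue: VecDeque<u32>, // packet ids waiting to be forwarded
    capacity: usize,
    dropped: usize,
}

impl Buffer {
    fn new(capacity: usize) -> Self {
        Self { queue: VecDeque::new(), capacity, dropped: 0 }
    }

    /// Either queue the packet (adding latency) or drop it (forcing a retransmit later).
    fn arrive(&mut self, packet_id: u32) {
        if self.queue.len() < self.capacity {
            self.queue.push_back(packet_id);
        } else {
            self.dropped += 1;
        }
    }
}

fn main() {
    let mut buffer = Buffer::new(4);
    for packet_id in 0..10 {
        buffer.arrive(packet_id);
    }
    println!(
        "{} packets queued (added latency), {} dropped (will need retransmission)",
        buffer.queue.len(),
        buffer.dropped
    );
}
```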

If cloud infrastructure providers like Cloudflare don’t manage congestion carefully, we risk overloading the system and slowing down the rate of data getting through. This actually happened in the early days of the Internet. To avoid it, the Internet infrastructure community has developed systems for controlling congestion, which give everyone a turn to send their data without overloading the network. This is an evolving challenge: as the network grows ever more complicated, finding the best way to implement congestion control is a constant pursuit. Many different algorithms have been developed, which draw on different sources of information and signals, optimize for different goals, and respond to congestion in different ways.

Congestion control algorithms use a number of signals to estimate the right rate to send traffic, without knowing how the network is set up. One important signal has been loss. When a packet is received, the receiver sends an “ACK,” telling the sender the packet got through. If it’s dropped somewhere along the way, the sender never gets the receipt, and after a timeout will treat the packet as having been lost.
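
A classic way that loss feeds back into the send rate is additive-increase/multiplicative-decrease (AIMD), the pattern behind traditional loss-based algorithms. The sketch below is schematic, with invented constants, rather than any production logic:

```rust
/// Schematic AIMD congestion window, measured in packets. Constants are illustrative.
struct AimdWindow {
    cwnd: f64,
}

impl AimdWindow {
    fn new() -> Self {
        Self { cwnd: 10.0 }
    }

    /// Each acknowledged round trip without loss, grow the window a little.
    fn on_ack_round(&mut self) {
        self.cwnd += 1.0;
    }

    /// A lost packet is treated as a congestion signal: back off sharply.
    fn on_loss(&mut self) {
        self.cwnd = (self.cwnd / 2.0).max(2.0);
    }
}

fn main() {
    let mut window = AimdWindow::new();
    for round in 1..=20 {
        // Pretend every 7th round sees a loss somewhere on the path.
        if round % 7 == 0 {
            window.on_loss();
        } else {
            window.on_ack_round();
        }
        println!("round {round:2}: cwnd = {:.1} packets", window.cwnd);
    }
}
```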

More recent algorithms have used additional data. For example, a popular algorithm called BBR (Bottleneck Bandwidth and Round-trip propagation time), which we have been using for much of our traffic, attempts to build a model during each connection of the maximum amount of data that can be transmitted in a given time period, using estimates of the round trip time as well as loss information.
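
The core quantity BBR keeps re-estimating is the bandwidth-delay product: the bottleneck bandwidth multiplied by the minimum round-trip time, which is roughly how much data can be in flight without building a queue. Here is a deliberately simplified sketch of that bookkeeping; the real BBR state machine also handles pacing, probing phases, and aging out old samples.

```rust
use std::time::Duration;

/// Simplified per-connection model in the spirit of BBR: track the highest
/// delivery rate and the lowest RTT seen, and derive the bandwidth-delay product.
/// This is a sketch, not the real BBR state machine.
struct PathModel {
    max_delivery_rate_bps: f64,
    min_rtt: Duration,
}

impl PathModel {
    fn new() -> Self {
        Self { max_delivery_rate_bps: 0.0, min_rtt: Duration::MAX }
    }

    fn on_ack_sample(&mut self, delivery_rate_bps: f64, rtt: Duration) {
        self.max_delivery_rate_bps = self.max_delivery_rate_bps.max(delivery_rate_bps);
        self.min_rtt = self.min_rtt.min(rtt);
    }

    /// Bandwidth-delay product in bytes: how much data can be in flight
    /// without queueing at the bottleneck.
    fn bdp_bytes(&self) -> f64 {
        self.max_delivery_rate_bps * self.min_rtt.as_secs_f64() / 8.0
    }
}

fn main() {
    let mut model = PathModel::new();
    model.on_ack_sample(50_000_000.0, Duration::from_millis(40)); // 50 Mbps, 40 ms
    model.on_ack_sample(80_000_000.0, Duration::from_millis(35)); // 80 Mbps, 35 ms
    println!("estimated BDP ≈ {:.0} bytes", model.bdp_bytes());
}
```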

The best algorithm to use often depends on the workload. For example, for interactive traffic like a video call, an algorithm that biases towards sending too much traffic can cause queues to build up, leading to high latency and a poor video experience. But if one were to optimize solely for that use case, and avoid the queueing by sending less traffic, the network would not make the best use of the connection for clients doing bulk downloads. The performance optimization outcome varies depending on a lot of different factors. But – we have visibility into many of them!

BBR was an exciting development in congestion control, moving from reactive loss-based approaches to proactive model-based optimization, and resulting in significantly better performance on modern networks. Our data gives us an opportunity to go further, applying different algorithmic methods to improve performance.

How can we do better?

All the existing algorithms are constrained to use only information gathered during the lifetime of the current connection. Thankfully, we know far more about the Internet at any given moment than this! With Cloudflare’s perspective on traffic, we see much more than any single customer or ISP can.

Every day, we see traffic from essentially every major network on the planet. When a request comes into our system, we know what client device we’re talking to, what type of network is enabling the connection, and whether we’re talking to consumer ISPs or cloud infrastructure providers.

We know about the patterns of load across the global Internet, and the locations where we believe systems are overloaded, within our network, or externally. We know about the networks that have stable properties, which have high packet loss due to cellular data connections, and the ones that traverse low earth orbit satellite links and radically change their routes every 15 seconds. 
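
To sketch how that knowledge could feed into per-connection decisions (all of the types, field names, and parameter values below are hypothetical, not our actual internals), one could imagine choosing an algorithm and its tuning before the first data packet is even sent:

```rust
/// Hypothetical illustration of per-connection tuning: none of these names or
/// numbers reflect Cloudflare's real internals, but they show the shape of the idea.

#[derive(Debug)]
enum Algorithm {
    Cubic,
    Bbr { probe_rtt_interval_secs: u64, pacing_gain: f64 },
}

#[derive(Debug)]
struct ConnectionContext {
    network_is_cellular: bool,
    network_is_satellite: bool,
}

/// Choose an algorithm and parameters from what we already know about the
/// network before the connection has sent a single data packet.
fn choose_algorithm(ctx: &ConnectionContext) -> Algorithm {
    if ctx.network_is_satellite {
        // Paths that reroute frequently benefit from re-probing more often.
        Algorithm::Bbr { probe_rtt_interval_secs: 5, pacing_gain: 1.1 }
    } else if ctx.network_is_cellular {
        // Lossy-but-not-congested links: avoid overreacting to loss.
        Algorithm::Bbr { probe_rtt_interval_secs: 10, pacing_gain: 1.25 }
    } else {
        Algorithm::Cubic
    }
}

fn main() {
    let ctx = ConnectionContext { network_is_cellular: true, network_is_satellite: false };
    println!("selected: {:?}", choose_algorithm(&ctx));
}
```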

How does this work?

We have been migrating our network technology stack to a new platform, powered by Rust, that gives us far more flexibility to experiment with the congestion control algorithms we use and the parameters that tune them. Then we needed data.

The data powering these experiments needs to reflect the measure we’re trying to optimize: the user experience. It’s not enough that we’re sending data to nearly all the networks on the planet; we have to be able to see what experience users actually have. So how do we do that, at our scale?

First, we have detailed “passive” logs of the rate at which data is able to be sent from our network, and how long it takes for the destination to acknowledge receipt. This covers all our traffic, and gives us an idea of how quickly data was received by the client, but it doesn’t necessarily tell us about the user experience.
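
A passive sample of this kind boils down to simple arithmetic (the field names below are illustrative; the real logs carry far more detail): the bytes acknowledged divided by the time it took to acknowledge them gives a delivery rate, and the ACK timing gives a round-trip estimate.

```rust
use std::time::Duration;

/// Illustrative shape of a passive log record.
struct PassiveSample {
    bytes_acked: u64,
    elapsed: Duration,
    smoothed_rtt: Duration,
}

/// Delivery rate in bits per second, derived purely from what the server can
/// observe: how many bytes were acknowledged, and how long that took.
fn delivery_rate_bps(sample: &PassiveSample) -> f64 {
    (sample.bytes_acked as f64 * 8.0) / sample.elapsed.as_secs_f64()
}

fn main() {
    let sample = PassiveSample {
        bytes_acked: 1_500_000,
        elapsed: Duration::from_millis(300),
        smoothed_rtt: Duration::from_millis(45),
    };
    println!(
        "delivery rate ≈ {:.1} Mbps, RTT ≈ {} ms",
        delivery_rate_bps(&sample) / 1e6,
        sample.smoothed_rtt.as_millis()
    );
}
```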

Next, we have a system for gathering Real User Measurement (RUM) data, which records information in supported web browsers about metrics such as Page Load Time (PLT). Any Cloudflare customer can enable this and will receive detailed insights in their dashboard. In addition, we use this metadata in aggregate across all our customers and networks to understand what customers are really experiencing. 

However, RUM data is only going to be present for a small proportion of connections across our network. So, we’ve been working to find a way to predict the RUM measures by extrapolating from the data we see in passive logs alone. For example, here are the results of an experiment we performed comparing two different algorithms against the CUBIC baseline.

Now, here’s the same timescale, observed through the prediction based on our passive logs. The curves are very similar - but even more importantly, the ratio between the curves is very similar. This is huge! We can use a relatively small amount of RUM data to validate our findings, but optimize our network in a much more fine-grained way by using the full firehose of our passive logs.
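
Conceptually, the prediction is a mapping from passive features such as delivery rate, RTT, and loss to an estimated RUM metric like Page Load Time. The sketch below uses a hand-rolled linear formula with invented coefficients purely to show the shape of the idea; it is not the model we actually fit.

```rust
/// Purely illustrative: a linear formula mapping passive-log features to a
/// predicted Page Load Time. The coefficients are invented for this sketch.
struct PassiveFeatures {
    delivery_rate_mbps: f64,
    rtt_ms: f64,
    loss_rate: f64,
}

fn predict_plt_ms(f: &PassiveFeatures) -> f64 {
    // Slower delivery, higher RTT, and more loss all push the estimate up.
    400.0 + 12.0 * f.rtt_ms + 8_000.0 * f.loss_rate + 2_000.0 / f.delivery_rate_mbps.max(0.1)
}

fn main() {
    let good = PassiveFeatures { delivery_rate_mbps: 40.0, rtt_ms: 30.0, loss_rate: 0.001 };
    let poor = PassiveFeatures { delivery_rate_mbps: 2.0, rtt_ms: 120.0, loss_rate: 0.02 };
    println!("predicted PLT (good path) ≈ {:.0} ms", predict_plt_ms(&good));
    println!("predicted PLT (poor path) ≈ {:.0} ms", predict_plt_ms(&poor));
}
```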

Extrapolating too far becomes unreliable, so we’re also working with some of our largest customers to improve our visibility into how the network behaves from their clients’ point of view, which allows us to extend this predictive model even further. In return, we’ll be able to give our customers insights into the true experience of their clients, in a way that no other platform can offer.

What is next?

We’re currently running our experiments and improved congestion control algorithms on all of our free tier QUIC traffic. As we learn more, validate with more complex customer workloads, and expand to TCP traffic, we’ll gradually roll this out to all our customers, for all traffic, over 2026 and beyond. The results so far have shown as much as a 10% improvement compared to the baseline!
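
Rolling an experiment out to a slice of traffic is, in spirit, a matter of deterministically hashing each connection into an arm, so that treatment stays consistent and each arm receives a fixed share. The sketch below uses hypothetical names and splits to show that pattern.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical experiment arms; the real experiments vary algorithms and parameters.
#[derive(Debug)]
enum Arm {
    Baseline,
    CandidateA,
    CandidateB,
}

/// Deterministically assign a connection to an arm so that the same connection
/// always gets the same treatment, and each arm receives a fixed share of traffic.
fn assign_arm(connection_id: u64) -> Arm {
    let mut hasher = DefaultHasher::new();
    connection_id.hash(&mut hasher);
    match hasher.finish() % 100 {
        0..=79 => Arm::Baseline,    // 80% stay on the current default
        80..=89 => Arm::CandidateA, // 10% try candidate A
        _ => Arm::CandidateB,       // 10% try candidate B
    }
}

fn main() {
    for connection_id in [1_u64, 42, 4096, 123_456_789] {
        println!("connection {connection_id} -> {:?}", assign_arm(connection_id));
    }
}
```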

We’re working with a select group of enterprises to test this in an early access program. If you’re interested in learning more, contact us.

Cloudflare's connectivity cloud protects entire corporate networks, helps customers build Internet-scale applications efficiently, accelerates any website or Internet application, wards off DDoS attacks, keeps hackers at bay, and can help you on your journey to Zero Trust.

Visit 1.1.1.1 from any device to get started with our free app that makes your Internet faster and safer.

To learn more about our mission to help build a better Internet, start here. If you're looking for a new career direction, check out our open positions.