
I Wanna Go Fast - Load Balancing Dynamic Steering

2018-07-21

Ricky Bobby

Earlier this month we released Dynamic Steering for Load Balancing, which lets your Cloudflare load balancer direct traffic to the fastest pool for a given Cloudflare region or colo (per-colo steering is Enterprise only).

To build this feature, we had to solve two key problems: 1) How to decide which pool of origins was the fastest and 2) How to distribute this decision to a growing group of 151 locations around the world.


Distance, Approximate Latency, and a Better Way

As my math teacher taught me, the shortest distance between two points is a straight line. This is also typically true on the internet: the shorter the approximate distance from a user, through a Cloudflare location, to a customer origin, the better the experience is for the user. Geography is one way to approximate speed, and we included the Geo Steering function when we initially introduced the Cloudflare Load Balancer. It is powerful, but manual; it’s not the best way. A customer on Twitter said it best:

@Cloudflare #FeatureRequest why can’t your load balancers determine which server is closest to the user then direct them to that one?

I don't want to have configure 10+ regions manually. This feels like something that should be built in? Am I missing it?

cc: @eastdakota

- Adam Evers, OAK / SFO (@adamevers), March 30, 2018

A Brief Refresher on Cloudflare Load Balancing

Cloudflare’s Load Balancers consist of a combination of origins, pools, and health checks. Origins are IPs or hostnames from which our customers serve content. Pools are collections of origins, usually grouped along some dimension, like geography, cloud service provider, or a combination thereof (e.g. a pool named GCP-West-1 may contain a customer’s origins in Google Cloud’s Oregon west1 region). Finally, there are health checks: probes that our customers configure against their pools and origins to identify whether a given pool or origin is up or down. These health checks run from a network of systems that can map to the customer’s user base, which allows Cloudflare load balancers to quickly identify downed origins and fail over from them.
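To make the relationship between these three pieces concrete, here is a minimal sketch in Python. The class names, the `healthy` flag, and the rule that a pool counts as up when at least one origin is healthy are illustrative assumptions, not Cloudflare's actual data model or failover logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Origin:
    """An IP or hostname that serves content (hypothetical shape)."""
    address: str
    healthy: bool = True  # assumed to be set by the latest health check


@dataclass
class Pool:
    """A named collection of origins, e.g. grouped by region or provider."""
    name: str
    origins: List[Origin] = field(default_factory=list)

    def is_up(self) -> bool:
        # Assumption: one healthy origin is enough to keep the pool up.
        return any(origin.healthy for origin in self.origins)


@dataclass
class LoadBalancer:
    """A hostname backed by an ordered list of pools."""
    hostname: str
    pools: List[Pool] = field(default_factory=list)

    def first_available_pool(self) -> Optional[Pool]:
        # Simple failover: walk the pools in order and skip the downed ones.
        for pool in self.pools:
            if pool.is_up():
                return pool
        return None
```

With a downed first pool, `first_available_pool()` returns the next pool that still has a healthy origin, which is the failover behaviour that health checks make possible.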

Measuring and Determining “Fast”

The first decision we faced was when and how to measure speed. We already probe at regular intervals for uptime from the Cloudflare locations that our customers tell us are relevant for their setup. It was an obvious choice to use our existing health checks and gather the round trip time (RTT) from there.

As pool origins are probed periodically, we get RTT information from the edge. The next question was how to use this data to decide which pool is the fastest: we decided to calculate the pool RTT using an Exponentially Weighted Moving Average (EWMA).

Why did we choose EWMA?

We considered other ways to calculate the RTT, such as a Simple Moving Average (SMA). Although the RTT calculation is much simpler using SMA, we chose EWMA because it responds to RTT changes faster than SMA, since it applies more weight to the most recent RTT. It can also reduce noise and make the trend clearer in a dataset with large variance. Another benefit of EWMA is that it stays truer to the trend than other types of moving averages, some of which over- or under-correct, and others of which smooth things out too much.
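A small Python sketch can illustrate the difference in responsiveness. The sample RTTs, the window size of 10, and the smoothing factor of 0.5 are arbitrary values picked for illustration, not Cloudflare's settings, and how large the gap is depends on those choices.

```python
from collections import deque


def sma_series(samples, window):
    """Simple moving average over the last `window` samples."""
    recent = deque(maxlen=window)
    averages = []
    for sample in samples:
        recent.append(sample)
        averages.append(sum(recent) / len(recent))
    return averages


def ewma_series(samples, alpha):
    """EWMA where `alpha` is the weight given to each new sample."""
    averages = []
    current = None
    for sample in samples:
        if current is None:
            current = sample  # seed with the first observation
        else:
            current = current + alpha * (sample - current)
        averages.append(current)
    return averages


# Ten steady probes at 20 ms, then the RTT jumps to 80 ms for three probes.
rtts = [20.0] * 10 + [80.0] * 3
sma = sma_series(rtts, window=10)
ewma = ewma_series(rtts, alpha=0.5)
print(sma[-1], ewma[-1])
```

Three probes after the jump, the SMA has only moved to 38 ms because seven old samples still count equally, while the EWMA has already reached 72.5 ms.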

How does EWMA work?

EWMA works by applying weights to the data in such a way that older data weighs less (and therefore has less impact on the result) than more recent data. The weight for a datapoint decreases exponentially for each time period further in the past. The rate of decay is determined by the time bias parameter. When the time bias is set to 1 minute, about 63.2% of the value comes from the last minute's measurements, 23.3% from the minute before that (0.233 ≈ (1 - 0.632) * 0.632), and so on. Put another way, historical data older than t minutes carries a combined weight of 1 / exp(t), so the most recent minute has weight 1 - 1 / exp(1) ≈ 63.2%.
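The percentages above can be reproduced with a few lines of Python. This is a sketch of a time-biased EWMA under the stated decay rule, not the production code; the function names and the 60-second default are assumptions made for the example.

```python
import math


def window_weight(start_min, end_min, time_bias_min=1.0):
    """Share of the average contributed by data between `start_min`
    and `end_min` minutes old, given the time bias in minutes."""
    return math.exp(-start_min / time_bias_min) - math.exp(-end_min / time_bias_min)


def ewma_update(previous, sample, elapsed_seconds, time_bias_seconds=60.0):
    """Fold one RTT sample into the running average.

    `elapsed_seconds` is the time since the previous sample; the longer
    the gap, the more the old average has decayed.
    """
    if previous is None:
        return sample  # first observation seeds the average
    alpha = 1.0 - math.exp(-elapsed_seconds / time_bias_seconds)
    return previous + alpha * (sample - previous)


print(round(window_weight(0, 1), 3))  # most recent minute
print(round(window_weight(1, 2), 3))  # the minute before that
```

With a 1-minute time bias, `window_weight(0, 1)` is about 0.632 and `window_weight(1, 2)` is about 0.233, matching the figures in the text. A sample that arrives a full time-bias period after the previous one pulls the average about 63.2% of the way toward the new value.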

Actual Implementation

For every load balancer that has Dynamic Steering enabled, the RTT is calculated independently for each of its pools using an EWMA. We wait for a period of time (10 minutes by default, but this is configurable) before writing the calculated pool RTT values to our internal key-value store, QuickSilver (QS). This warm-up builds the RTT profile, which helps reduce noise when the data has large variance. From then on, we write the values periodically (again every 10 minutes by default, and again tunable), and only if the RTT value has changed, to avoid unnecessary writes to QuickSilver.
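The write policy described above (a warm-up period, then periodic writes that are skipped when nothing changed) might be sketched as follows. The class and method names are hypothetical, a plain dict stands in for QuickSilver, and comparing RTT values for exact equality is a simplification; a real system would probably round or apply a threshold.

```python
class PoolRttPublisher:
    """Decides when one pool's EWMA RTT is written to the key-value store."""

    def __init__(self, store, pool_id, warmup_seconds=600.0, interval_seconds=600.0):
        self.store = store            # dict standing in for the key-value store
        self.pool_id = pool_id
        self.warmup_seconds = warmup_seconds
        self.interval_seconds = interval_seconds
        self.next_due = None          # time of the next write opportunity

    def maybe_publish(self, rtt_ms, now):
        """Return True if `rtt_ms` was written at time `now` (in seconds)."""
        if self.next_due is None:
            # First sample: start the warm-up clock, write nothing yet.
            self.next_due = now + self.warmup_seconds
        if now < self.next_due:
            return False
        # A write opportunity has arrived; schedule the next one.
        self.next_due = now + self.interval_seconds
        if self.store.get(self.pool_id) == rtt_ms:
            return False              # unchanged value, skip the write
        self.store[self.pool_id] = rtt_ms
        return True
```

In this sketch, samples seen during the first 10 minutes never reach the store, the first write happens once the warm-up has elapsed, and an unchanged value at a later interval produces no write at all.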

Data Propagation

To make sure that Dynamic Steering is as performant as possible, all data we use for steering decisions needs to be as close as possible to every machine serving requests. When it comes down to delivering responses as fast as possible, requesting data from another machine - even in the same datacenter - can add non-trivial overhead.

We run a custom in-house key-value store on every machine servicing requests. The main advantage of this datastore lies in how its replication logic takes advantage of the hierarchical nature of our network layout to replicate faster while transferring less data.

[Figure: data model]

Since we keep a copy of the data on every machine in every data center, we need to make sure our dataset is as small as possible. We evaluated what additional data we actually needed to select a pool inside a Load Balancer configured with Dynamic Steering. Currently the only information we propagate is a map of the pool identifier to the EWMA.

Eyeball Experience

Internally at Cloudflare we often talk about eyeballs, the actual visitors of a site clicking away in their browsers, and their experience of the process. Let's say you’ve set up three pools around the world: North America, Europe, and Australia. With Dynamic Steering, we will route your traffic to the pool with the lowest EWMA. Assuming all your pools are in good health and reporting expected RTT values, an eyeball's experience should look like this:

[Figure: IMG_1101]
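The routing rule in this section, picking the healthy pool with the lowest EWMA from the propagated map, can be sketched in a few lines of Python. The function name, the pool identifiers, the RTT values, and the way health is passed in as a set are assumptions for illustration only.

```python
def pick_pool(ewma_by_pool, healthy_pools):
    """Return the healthy pool with the lowest EWMA RTT, or None.

    `ewma_by_pool` maps pool identifier -> EWMA RTT in milliseconds,
    mirroring the map that is propagated to every machine.
    """
    candidates = {
        pool: rtt for pool, rtt in ewma_by_pool.items() if pool in healthy_pools
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical view from a colo in Europe.
ewma_seen_here = {"north-america": 120.0, "europe": 35.0, "australia": 280.0}
print(pick_pool(ewma_seen_here, {"north-america", "europe", "australia"}))
```

From this vantage point the Europe pool wins; if it were marked unhealthy, the same call would fall back to North America as the next-lowest RTT.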

Trying It Out

All Enterprise customers and customers with the Geo Routing add-on for Load Balancing have access to Dynamic Steering. To enable Dynamic Steering, select the option in your Load Balancing traffic steering configuration. Please see the KB article or your Cloudflare account team for more information.

[Figure: Dynamic Steering configuration]

Interested in helping us go faster?

The Cloudflare Load Balancing and DNS Engineering teams are hiring in San Francisco and London.

Backend Systems Engineer (San Francisco), Backend Systems Engineer (London), Software Engineer (London)


Follow on X

Sergi Isasi|@sgisasi
Cloudflare|@cloudflare
