Subscribe to receive notifications of new posts:

Blue Light Special: Ensuring fast global configuration changes

2015-07-03

2 min read

CloudFlare operates a huge global network of servers that proxy our customers' web sites, operate as caches, inspect requests to ensure they are not malicious, deflect DDoS attacks and handle one of the largest authoritative DNS systems in the world. And where there's software there's configuration information.

CloudFlare is highly customisable. Each customer has a unique configuration consisting of DNS records, all manner of settings (such as minification, image recompression, IP-based blocking, which individual WAF rules to execute) and per-URL rules. And the configuration changes constantly.

Warp speed configuration

We offer almost instant configuration changes. If a user adds a DNS record it should be globally resolvable in seconds. If a user enables a CloudFlare WAF rule it should happen very, very fast to protect a site. This presents a challenge because those configuration changes need to be pushed across the globe very quickly.

We've written in the past about the underlying technology we use: Kyoto Tycoon and how we secured it from eavesdroppers. We also monitor its performance.

DNS records are currently changing at a rate of around 40 per second, 24 hours a day. All those changes need to be propagated in seconds.

So we take propagation times very seriously.

Keep a close eye on this light of mine

For this we need to keep a close eye on how long it takes a change to reach every one of our data centers. Whilst we have in-depth metrics for our operations team to look at it's sometimes useful (and fun) to have something more visceral.

We also want developers and operations people to equally be aware of some critical metrics, and developers are spending their time observing the metrics and alerts aimed at operations.

On some rare occasions, perhaps due to routing problems on the wider Internet, we may find that our ability to push changes at the required velocity becomes impractical. To ensure that we know about this as soon as possible and know when to take action we've built a custom alert system that everyone in the office can see.

From an external global collection of machines we monitor propagation time for DNS records and trigger an alert if propagation time exceeds a pre-set threshold. The alert comes in the form of a blue rotating 'police light'.

We had joked about having a "red alert" alarm when we fall behind on propagation and so I turned that joke into reality.

Hawaii Pi-O

A Raspberry Pi hidden in an old hard drive case connects to our global monitors and obtains the current propagation time (as measured from outside our network). The Pi is connected (via a transistor acting as a switch) to a cheap mini police light that's visible throughout the office.

PS: All the puns in this post were added by John Graham-Cumming. I disclaim all responsibility.

Cloudflare's connectivity cloud protects entire corporate networks, helps customers build Internet-scale applications efficiently, accelerates any website or Internet application, wards off DDoS attacks, keeps hackers at bay, and can help you on your journey to Zero Trust.

Visit 1.1.1.1 from any device to get started with our free app that makes your Internet faster and safer.

To learn more about our mission to help build a better Internet, start here. If you're looking for a new career direction, check out our open positions.
Speed & ReliabilityReliabilityDNSWAFRaspberry Pi

Follow on X

Cloudflare|@cloudflare

Related posts

October 09, 2024 1:00 PM

Improving platform resilience at Cloudflare through automation

We realized that we need a way to automatically heal our platform from an operations perspective, and designed and built a workflow orchestration platform to provide these self-healing capabilities across our global network. We explore how this has helped us to reduce the impact on our customers due to operational issues, and the rich variety of similar problems it has empowered us to solve....

September 25, 2024 1:00 PM

Introducing Speed Brain: helping web pages load 45% faster

We are excited to announce the latest leap forward in speed – Speed Brain. Speed Brain uses the Speculation Rules API to prefetch content for the user's likely next navigations. The goal is to download a web page to the browser before a user navigates to it, allowing pages to load instantly. ...