Always Online v.2

by Matthew Prince.

Always Online  
v.2

The video on CloudFlare's home page promises that we will keep your web page online "even if your server goes down." It's a feature we dubbed "Always Online" and, when it works, it's magical. The problem is, Always Online doesn't always work.

This blog post is to announce that we've just released a new version of Always Online which we believe will make the feature significantly better. But, before I get to that, let me tell you a bit about the history of Always Online, how it has worked up until recently, and why it didn't always work. Then I'll turn to what we've done to create Always Online v.2.

An Accidental Feature

Prior to starting CloudFlare, Lee and I ran Project Honey Pot. The Project Honey Pot website is database driven and contains a virtually infinite number of pages. One of the biggest challenges we had wasn't human traffic, which followed a predictable browsing pattern and could therefore reliably be cached, but instead dealing with traffic from automated crawlers.

These crawlers, whether legitimate (e.g., Google's bot) or illegitimate (e.g., spam harvesters), tend to crawl very "deep" into sites. As a result, they hit pages that are unlikely to have been crawled in a while and, in doing so, can impose significant load on a database. I've previously written about the hidden tax web crawlers impose on web performance.

Always Online  
v.2

At Project Honey Pot, Lee built a number of sophisticated caching strategies in order to help lessen the load of automated crawlers on the site's database. At CloudFlare, he realized that we could provide the same type of caching in order to cut the burden bots placed on backends. In essence, we automatically cache content for a short amount of time and, if it hasn't changed since the last request from a bot, deliver it without having to burden your web application. It works great.

In the process of building the bot content cache, Lee realized he could implement something else: a system to serve static versions of pages if an origin server fails. Using human traffic to build such a cache is dangerous because you don't want to expose one user's private information to another user (e.g., we can't cache when one user visits their bank's website to view their statement and then show that statement to another user). However, search engine crawlers are the perfect anonymous user to build a site's cache. The logic was: if it's in Google, then it's already effectively cached.

Good, Not Perfect

The approach of using known search engine bot traffic to build CloudFlare's cache was clever, but it had some problems. The first was that CloudFlare runs multiple data centers around the world and the cache in each is different. The solution was to find the data center with the most search engine crawler traffic and, if a copy of the page didn't exist in the local data center's cache, fall back on the "master" data center. In our case, our Ashburn, Virginia data center received the most crawl traffic so we added a lot more disks there and used it to build up the Always Online cache.

That worked great for some sites, but for others we still would not have content in our cache when the server went offline. Seemingly bizarrely, the more static the page the less likely it was to be in our cache. The explanation was the source of the cache data: search engine crawlers. These crawlers are generally setup to visit pages that change regularly more often, and for pages that rarely change only occasionally. If the page returned a 304 "Not Modified" response then the content didn't get recached. We didn't help things by automatically expiring items in our cache after a period of time.

Always Online  
v.2

The net result was, far too often, when someone's site would go offline their visitors wouldn't see a cached version of the page but, instead, a CloudFlare error page telling them that the site was offline and no cached version was available. This became one of the top complaints from our users and the visitors to their sites. When our support team dubbed the feature "Always Offline" we knew it was time to make it better.

Version 2

We made a number of improvements in how we cache pages in order to improve Always Online, but the biggest change we made was to begin to actively crawl pages ourselves. CloudFlare now runs a crawler which periodically crawls our customers' pages if they have the Always Online feature enabled. The crawler's useragent is:

Mozilla/5.0 (compatible; CloudFlare-AlwaysOnline/1.0; +http://www.cloudflare.com/always-online )

You can learn more about the crawler's behavior by visiting: www.cloudflare.com/always-online. The frequency that we refresh pages in the Always Online depends on your plan. We crawl free customers once every 9 days, Pro customers once every 3 days, and Business and Enterprise customers daily. We are tinkering with the amount of time we spend crawling each site as well as tuning the crawler to ensure it doesn't visit sites when they're under load or otherwise impose any additional burden.

Given that we can now control exactly what is in our Always Online cache, our next iteration will be to turn that control over to our users and allow you to both "pin" the pages you want to ensure are always available and "exclude" any pages you never want cached. In the meantime, we're using data we have about the most popular portions of each site in order to choose what pages to prioritize in the cache.

Our goal is to make the Site Offline error a thing of the past. We started building the new cache a couple days ago and expect everyone with Always Online to have a more robust cache available within the next few days. While everyone hopes their origin server will never go down, with Always Online v.2 we're happy to provide better peace of mind in case it ever does.

comments powered by Disqus