Today we’re excited to announce Smart Edge Revalidation. It is designed to keep cached objects synchronized efficiently between our edge and a browser. Right now, as many as 30% of objects cached on Cloudflare’s edge do not have the HTTP response headers required for revalidation, which can result in unnecessary origin calls. Smart Edge Revalidation fixes this: it ensures that these headers are present, even when an origin doesn’t send them to us. The advantage? Less wasted bandwidth and compute for objects that do not need to be redownloaded, and faster browser page loads for users.
So What Is Revalidation?
Revalidation is one part of a longer story about efficiently serving objects that live on an origin server from an intermediary cache. Visitors to a website want it to be fast. One foundational way to make sure that a website is fast for visitors is to serve objects from cache. In this way, requests and responses do not need to transit unnecessary parts of the Internet back to an origin and, instead, can be served from a data center that is closer to the visitor. As such, website operators generally only want to serve content from an origin when content has changed. So how do objects stay in cache for as long as necessary?
One way to do that is with HTTP response headers.
When Cloudflare gets a response from an origin, that response includes a number of headers. You can see these headers by opening any webpage, inspecting the page, going to the Network tab, and clicking any file. In the response headers section there will generally be a header known as “Cache-Control.” This header is a way for origins to answer caching intermediaries’ questions like: Is this object eligible for cache? How long should this object be in cache? And what should the caching intermediary do after that time expires?
How long something should be in cache can be specified through the max-age and s-maxage directives. These directives specify a TTL, or time-to-live, for the object in seconds. Once the object has been in cache for the requisite TTL, the clock hits 0 (zero) and it is marked as expired. The cache can no longer safely serve expired content to requests without figuring out whether the object has changed on the origin or is still the same.
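To make the TTL mechanics concrete, here is a minimal Python sketch of how a shared cache might read a TTL from a Cache-Control value and decide whether an object has expired. The function names are hypothetical and this is not Cloudflare’s implementation. The parsing is deliberately simplified: it splits on commas, so a quoted argument that itself contains a comma would be mishandled. It prefers s-maxage over max-age, since s-maxage applies to shared caches such as a CDN edge.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict[str, Optional[str]]:
    """Split a Cache-Control value into a {directive: argument} mapping.

    Directive names are case-insensitive, so they are lower-cased.
    Directives without an argument (e.g. "no-cache") map to None.
    """
    directives: dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def shared_cache_ttl(value: str) -> Optional[int]:
    """Return the TTL in seconds a shared cache would use, or None if unset.

    s-maxage applies only to shared caches and overrides max-age there.
    """
    directives = parse_cache_control(value)
    for name in ("s-maxage", "max-age"):
        arg = directives.get(name)
        if arg is not None and arg.isdigit():
            return int(arg)
    return None


def is_expired(age_seconds: int, ttl_seconds: int) -> bool:
    """An object is expired once its age in cache reaches its TTL."""
    return age_seconds >= ttl_seconds
```

For example, `shared_cache_ttl("public, max-age=60, s-maxage=3600")` yields 3600, so an object that has sat in the edge cache for an hour would be considered expired and due for revalidation.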
If it has changed, it must be redownloaded from the origin. If it hasn’t changed, it can be marked as fresh and continue to be served. This check is known as revalidation.
We’re excited that Smart Edge Revalidation extends the efficiency of revalidation to everyone, regardless of whether an origin sends the necessary response headers.
How is Revalidation Accomplished?
Two additional headers, Last-Modified and ETag, are set by an origin in order to distinguish different versions of the same URL/object across modifications. After the object expires and the revalidation check occurs, if the ETag value hasn’t changed or a more recent Last-Modified timestamp isn’t present, the object is marked “revalidated” and the expired object can continue to be served from cache. If there has been a change, as indicated by the ETag value or Last-Modified timestamp, then the new object is downloaded and the old object is removed from cache.
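The revalidation decision described above can be sketched as follows. This is an illustrative model with hypothetical names, assuming the cache can compare the validators stored with its copy against the ones the origin reports now. Real HTTP caches also distinguish strong and weak ETags, which this sketch ignores.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Validators:
    """Validators for one version of an object; either may be missing."""
    etag: Optional[str] = None
    last_modified: Optional[datetime] = None


def has_changed(cached: Validators, current: Validators) -> bool:
    """Return True if the origin's current version differs from the cached copy.

    ETag is checked first because it names an exact version; otherwise a more
    recent Last-Modified timestamp signals a change. With no comparable
    validator, the cache cannot prove the object is unchanged.
    """
    if cached.etag is not None and current.etag is not None:
        return cached.etag != current.etag
    if cached.last_modified is not None and current.last_modified is not None:
        return current.last_modified > cached.last_modified
    return True  # no validators: assume a change and redownload


def revalidate(cached: Validators, current: Validators) -> str:
    """Label the outcome the way the text does."""
    return "replaced" if has_changed(cached, current) else "revalidated"
```

The final fallback in `has_changed` is the case this post is about: with neither header present, every expiry forces a full redownload.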
Revalidation checks occur when a browser sends a request to a cache server using If-Modified-Since or If-None-Match headers. These request headers ask the cache server whether the copy the browser already holds is out of date, either since a given time (If-Modified-Since) or relative to a given version (If-None-Match), and they are answered using the ETag or Last-Modified response headers on the cache server. For example, if the browser sends a request to a cache server with If-Modified-Since: Mon, 08 Nov 2021 07:28:00 GMT, the cache server must look at the object being asked about; if it has not changed since November 8 at 7:28 AM GMT, it will respond with a 304 (Not Modified) status code indicating it’s unchanged. If the object has changed, the cache server will respond with the new object.
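A minimal sketch of how a cache server might answer such a conditional request, under simplifying assumptions: header names arrive with the exact casing shown, If-None-Match takes precedence over If-Modified-Since (as HTTP specifies), ETags are compared as plain strings, and the stored Last-Modified is a timezone-aware datetime with whole-second precision. The function name `conditional_status` is hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(
    request_headers: dict[str, str],
    etag: Optional[str],
    last_modified: Optional[datetime],
) -> int:
    """Return 304 if the browser's cached copy is still current, else 200.

    Simplified from the HTTP rules: If-None-Match takes precedence over
    If-Modified-Since, and the ETag check is a plain string match
    (weak/strong comparison and the "*" form are not handled).
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        if etag is None:
            return 200  # nothing to match against, so send the full object
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None and last_modified is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # an unparseable date is ignored, so send the full object
        return 304 if last_modified <= since else 200

    return 200
```

With the example above, an object whose Last-Modified is exactly `Mon, 08 Nov 2021 07:28:00 GMT` gets a 304, while one modified a second later gets a 200 and the full body.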
Sending a 304 status code that indicates an object can be reused is much more efficient than sending the entire object. Think of a news website that updates every 24 hours. Once the content is updated for the day, you wouldn’t want to keep redownloading the same unchanged content from the origin; instead, you would prefer that the day’s content simply be reused, confirmed by a lightweight signal, until the site changes the next day.
The problem with this system of browser questions and revalidation responses is that sometimes origins don’t set ETag or Last-Modified headers, or they aren’t configured by the website’s admin, making revalidation impossible. This means that every time an object expires, it must be redownloaded regardless of whether there has been a change, because we have to assume that the asset has been updated or else risk serving stale content.
This is an incredible waste of resources, costing hundreds of GB/sec of needless bandwidth between the edge and the visitor: browsers are downloading hundreds of GB/sec of content they may already have. Our baseline is that around 10% of all traffic is revalidated, and in initial tests Smart Edge Revalidation increased revalidation by just under 50%. This means that, without a user needing to configure anything, we can increase revalidations by around 5 percentage points of all traffic (from roughly 10% to roughly 15%)!
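The estimate is the product of two rounded figures: the baseline share of traffic that is revalidated and the relative increase seen in initial tests. Because the baseline is itself a share of all traffic, the result is best read as percentage points of all traffic, and since the increase was just under 50%, the gain is just under 5 points.

```latex
\underbrace{0.10}_{\text{baseline revalidation share}}
\times
\underbrace{0.50}_{\text{relative increase}}
= 0.05
\;\approx\; 5 \text{ percentage points of all traffic}
```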
Such a large reduction in bandwidth use also comes with potential environmental benefits. Based on Cloudflare's carbon emissions per byte, the needless bandwidth being used could amount to 2000+ metric tons CO2e/year, the equivalent of the CO2 emissions from more than 400 cars in a year.
Revalidation also comes with a performance improvement, because it usually means a browser downloads less than 1KB of data to check whether the asset has changed, while pulling the full asset can mean hundreds of KB. This improves performance and reduces the bandwidth between the visitor and our edge.
How Smart Edge Revalidation Works
When both Last-Modified and ETag headers are absent from the origin server response, Smart Edge Revalidation will use the time the object was cached on Cloudflare’s edge as the Last-Modified header value. When a browser sends a revalidation request to Cloudflare using If-Modified-Since or If-None-Match, our edge can answer those revalidation questions using the Last-Modified header generated by Smart Edge Revalidation. In this way, our edge can ensure efficient revalidation even if the headers are not sent from the origin.
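The fallback described above can be sketched in a few lines. This is a hypothetical illustration of the described behavior, not Cloudflare’s code: `add_fallback_last_modified` and its `cached_at` parameter (a Unix timestamp for when the object entered the cache) are assumptions. It uses the standard library’s `email.utils.formatdate` with `usegmt=True` to produce an HTTP-date.

```python
import time
from email.utils import formatdate
from typing import Optional


def add_fallback_last_modified(
    headers: dict[str, str], cached_at: Optional[float] = None
) -> dict[str, str]:
    """Return a copy of origin response headers with a synthetic Last-Modified.

    Only when the origin sent neither ETag nor Last-Modified is the time the
    object entered the cache used as Last-Modified. Header names are matched
    case-insensitively; the input dict is not modified.
    """
    present = {name.lower() for name in headers}
    result = dict(headers)
    if "etag" not in present and "last-modified" not in present:
        timestamp = time.time() if cached_at is None else cached_at
        result["Last-Modified"] = formatdate(timestamp, usegmt=True)
    return result
```

For example, an origin response carrying only `Content-Type`, cached at Unix time 1636356480, gains `Last-Modified: Mon, 08 Nov 2021 07:28:00 GMT`, which the edge can then compare against a browser’s If-Modified-Since.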
Smart Edge Revalidation will be enabled automatically for all Cloudflare customers over the coming weeks. If this behavior is undesired, you can ensure that Smart Edge Revalidation is not activated by confirming your origin sends ETag or Last-Modified headers when you want to indicate changed content. Additionally, you can have your origin direct your desired revalidation behavior by making sure it sets appropriate Cache-Control headers.
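For origins that prefer to send their own validators, one common approach is sketched below: derive an ETag from a hash of the response body and set Last-Modified from when the content last changed. The helper `validator_headers` is hypothetical, and the truncated SHA-256 digest and the `max_age` value are illustrative choices rather than requirements.

```python
import hashlib
from email.utils import formatdate


def validator_headers(body: bytes, modified_at: float, max_age: int) -> dict[str, str]:
    """Build response headers that let caches revalidate without guessing.

    ETag: a quoted hash of the body, so it changes whenever the bytes change.
    Last-Modified: when the content last changed, as a Unix timestamp.
    Cache-Control: how long caches may serve the object before revalidating.
    """
    digest = hashlib.sha256(body).hexdigest()[:16]
    return {
        "ETag": f'"{digest}"',
        "Last-Modified": formatdate(modified_at, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
    }
```

In a file-serving origin, `modified_at` could come from `os.stat(path).st_mtime`.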
Smart Edge Revalidation is a win for everyone: visitors will get more content faster from cache, website owners can serve and revalidate additional content from Cloudflare efficiently, and the Internet will get a bit greener and more efficient.
Smart Edge Revalidation is the latest announcement to join the list of ways we're making our network more sustainable to help build a greener Internet — check out posts from earlier this week to learn about our climate commitments, Green Compute with Workers, Carbon Impact Report, Pages x Green Web Foundation partnership, and crawler hints.