
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sun, 05 Apr 2026 16:42:09 GMT</lastBuildDate>
        <item>
            <title><![CDATA[R2 SQL: a deep dive into our new distributed query engine]]></title>
            <link>https://blog.cloudflare.com/r2-sql-deep-dive/</link>
            <pubDate>Thu, 25 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ R2 SQL provides a built-in, serverless way to run ad-hoc analytic queries against your R2 Data Catalog. This post dives deep under the Iceberg into how we built this distributed engine. ]]></description>
            <content:encoded><![CDATA[ <p>How do you run SQL queries over petabytes of data… without a server?</p><p>We have an answer for that: <a href="https://developers.cloudflare.com/r2-sql/"><u>R2 SQL</u></a>, a serverless query engine that can sift through enormous datasets and return results in seconds.</p><p>This post details the architecture and techniques that make this possible. We'll walk through our Query Planner, which uses <a href="https://developers.cloudflare.com/r2/data-catalog/"><u>R2 Data Catalog</u></a> to prune terabytes of data before reading a single byte, and explain how we distribute the work across Cloudflare’s <a href="https://www.cloudflare.com/network"><u>global network</u></a>, <a href="https://developers.cloudflare.com/workers/"><u>Workers</u></a> and <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2</u></a> for massively parallel execution.</p>
    <div>
      <h3>From catalog to query</h3>
      <a href="#from-catalog-to-query">
        
      </a>
    </div>
    <p>During Developer Week 2025, we <a href="https://blog.cloudflare.com/r2-data-catalog-public-beta/"><u>launched</u></a> R2 Data Catalog, a managed <a href="https://iceberg.apache.org/"><u>Apache Iceberg</u></a> catalog built directly into your Cloudflare R2 bucket. Iceberg is an open table format that provides critical database features like transactions and schema evolution for petabyte-scale <a href="https://www.cloudflare.com/learning/cloud/what-is-object-storage/">object storage</a>. It gives you a reliable catalog of your data, but it doesn’t provide a way to query it.</p><p>Until now, reading your R2 Data Catalog required setting up a separate service like <a href="https://spark.apache.org/"><u>Apache Spark</u></a> or <a href="https://trino.io/"><u>Trino</u></a>. Operating these engines at scale is not easy: you need to provision clusters, manage resource usage, and be responsible for their availability, none of which contributes to the primary goal of getting value from your data.</p><p><a href="https://developers.cloudflare.com/r2-sql/"><u>R2 SQL</u></a> removes that step entirely. It’s a serverless query engine that executes retrieval SQL queries against your Iceberg tables, right where your data lives.</p>
    <div>
      <h3>Designing a query engine for petabytes</h3>
      <a href="#designing-a-query-engine-for-petabytes">
        
      </a>
    </div>
    <p>Object storage is fundamentally different from a traditional database’s storage. A database is structured by design; R2 is an ocean of objects, where a single logical table can be composed of potentially millions of individual files, large and small, with more arriving every second.</p><p>Apache Iceberg provides a powerful layer of logical organization on top of this reality. It works by managing the table's state as an immutable series of snapshots, creating a reliable, structured view of the table by manipulating lightweight metadata files instead of rewriting the data files themselves.</p><p>However, this logical structure doesn't change the underlying physical challenge: an efficient query engine must still find the specific data it needs within that vast collection of files, and this requires overcoming two major technical hurdles:</p><p><b>The I/O problem</b>: A core challenge for query efficiency is minimizing the amount of data read from storage. A brute-force approach of reading every object is simply not viable. The primary goal is to read only the data that is absolutely necessary.</p><p><b>The Compute problem</b>: The amount of data that does need to be read can still be enormous. We need a way to give the right amount of compute power to a query, which might be massive, for just a few seconds, and then scale it down to zero instantly to avoid waste.</p><p>Our architecture for R2 SQL is designed to solve these two problems with a two-phase approach: a <b>Query Planner</b> that uses metadata to intelligently prune the search space, and a <b>Query Execution</b> system that distributes the work across Cloudflare's global network to process the data in parallel.</p>
    <div>
      <h2>Query Planner</h2>
      <a href="#query-planner">
        
      </a>
    </div>
    <p>The most efficient way to process data is to avoid reading it in the first place. This is the core strategy of the R2 SQL Query Planner. Instead of exhaustively scanning every file, the planner makes use of the metadata structure provided by R2 Data Catalog to prune the search space, that is, to avoid reading huge swathes of data irrelevant to a query.</p><p>This is a top-down investigation where the planner navigates the hierarchy of Iceberg metadata layers, using <b>stats</b> at each level to build a fast plan, specifying exactly which byte ranges the query engine needs to read.</p>
    <div>
      <h3>What do we mean by “stats”?</h3>
      <a href="#what-do-we-mean-by-stats">
        
      </a>
    </div>
    <p>When we say the planner uses "stats", we are referring to summary metadata that Iceberg stores about the contents of the data files. These statistics create a coarse map of the data, allowing the planner to decide which files to read, and which to ignore, without opening them.</p><p>There are two primary levels of statistics the planner uses for pruning:</p><p><b>Partition-level stats</b>: Stored in the Iceberg manifest list, these stats describe the range of partition values for all the data in a given Iceberg manifest file. For a partition on <code>day(event_timestamp)</code>, this would be the earliest and latest day present in the files tracked by that manifest.</p><p><b>Column-level stats</b>: Stored in the manifest files, these are more granular stats about each individual data file. Data files in R2 Data Catalog are formatted using <a href="https://parquet.apache.org/"><u>Apache Parquet</u></a>. For every column of a Parquet file, the manifest stores key information like:</p><ul><li><p>The minimum and maximum values. If a query asks for <code>http_status = 500</code>, and a file’s stats show its <code>http_status</code> column has a min of 200 and a max of 404, that entire file can be skipped.</p></li><li><p>A count of null values. This allows the planner to skip files when a query specifically looks for non-null values (e.g., <code>WHERE error_code IS NOT NULL</code>) and the file's metadata reports that all values for <code>error_code</code> are null.</p></li></ul><p>Now, let's see how the planner uses these stats as it walks through the metadata layers.</p>
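    <p>As a toy illustration of those two checks, here is how min/max ranges and null counts let a planner reject a file from stats alone. The structure and function names below are ours, for illustration only, not R2 SQL internals:</p>

```python
# Toy sketch of stats-based file pruning. The field names (min_value,
# max_value, null_count, row_count) are illustrative, not the Iceberg spec's.
from dataclasses import dataclass

@dataclass
class ColumnStats:
    min_value: int
    max_value: int
    null_count: int
    row_count: int

def can_skip_for_equality(stats: ColumnStats, literal: int) -> bool:
    """True if `column = literal` cannot match any row in the file."""
    return literal < stats.min_value or literal > stats.max_value

def can_skip_for_is_not_null(stats: ColumnStats) -> bool:
    """True if `column IS NOT NULL` cannot match: every value is null."""
    return stats.null_count == stats.row_count

# A file whose http_status column spans 200..404 can never satisfy
# http_status = 500, so the planner drops it without any I/O.
stats = ColumnStats(min_value=200, max_value=404, null_count=0, row_count=10_000)
print(can_skip_for_equality(stats, 500))   # True: skip the whole file
print(can_skip_for_is_not_null(stats))     # False: some values are non-null
```

    <p>The real planner evaluates arbitrary filter expressions against these stats, but every decision reduces to range and null-count checks of this shape.</p>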
    <div>
      <h3>Pruning the search space</h3>
      <a href="#pruning-the-search-space">
        
      </a>
    </div>
    <p>The pruning process is a top-down investigation that happens in four main steps:</p><ol><li><p><b>Table metadata and the current snapshot</b></p><p>The planner begins by asking the catalog for the location of the current table metadata. This is a JSON file containing the table's current schema, partition specs, and a log of all historical snapshots. The planner then fetches the latest snapshot to work with.</p></li><li><p><b>Manifest list and partition pruning</b></p><p>The current snapshot points to a single Iceberg manifest list. The planner reads this file and uses the partition-level stats for each entry to perform the first, most powerful pruning step, discarding any manifests whose partition value ranges don't satisfy the query. For a table partitioned by <code>day(event_timestamp)</code>, the planner can use the min/max values in the manifest list to immediately discard any manifests that don't contain data for the days relevant to the query.</p></li><li><p><b>Manifests and file-level pruning</b></p><p>For the remaining manifests, the planner reads each one to get a list of the actual Parquet data files. These manifest files contain more granular, column-level stats for each individual data file they track. This allows for a second pruning step, discarding entire data files that cannot possibly contain rows matching the query's filters.</p></li><li><p><b>File row-group pruning</b></p><p>Finally, for the specific data files that are still candidates, the Query Planner uses statistics stored inside each Parquet file's footer to skip over entire row groups.</p></li></ol><p>The result of this multi-layer pruning is a precise list of Parquet files, and of row groups within those Parquet files. These become the query work units that are dispatched to the Query Execution system for processing.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7GKvgbex2vhIBqQ1G5UFjQ/2a99db7ae786b8e22a326bac0c9037d9/1.png" />
          </figure>
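    <p>The top-down walk above can be sketched in a few lines. This is a toy in-memory model with illustrative field names (<code>min_day</code>, <code>max_day</code>, <code>files</code>), not the actual Iceberg metadata schema, and it stops at file-level pruning (row-group pruning would be one more nested layer):</p>

```python
# Toy model of the Iceberg metadata hierarchy: a snapshot points at
# manifests, each manifest tracks data files, and stats at each level
# let the planner discard whole subtrees without any data I/O.
snapshot = {
    "manifests": [
        {"min_day": 1, "max_day": 10,
         "files": [{"path": "a.parquet", "min_day": 1, "max_day": 3},
                   {"path": "b.parquet", "min_day": 7, "max_day": 10}]},
        {"min_day": 11, "max_day": 20,  # pruned entirely for day 8
         "files": [{"path": "c.parquet", "min_day": 11, "max_day": 20}]},
    ]
}

def plan(snapshot, day):
    """Return the data files that might contain rows for `day`."""
    candidates = []
    for manifest in snapshot["manifests"]:            # partition-level pruning
        if not (manifest["min_day"] <= day <= manifest["max_day"]):
            continue
        for f in manifest["files"]:                   # file-level pruning
            if f["min_day"] <= day <= f["max_day"]:
                candidates.append(f["path"])
    return candidates                                 # row groups omitted here

print(plan(snapshot, 8))  # ['b.parquet']
```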
    <div>
      <h3>The Planning pipeline</h3>
      <a href="#the-planning-pipeline">
        
      </a>
    </div>
    <p>In R2 SQL, the multi-layer pruning we've described so far isn't a monolithic process. For a table with millions of files, the metadata can be too large to process before starting any real work. Waiting for a complete plan would introduce significant latency.</p><p>Instead, R2 SQL treats planning and execution together as a concurrent pipeline. The planner's job is to produce a stream of work units for the executor to consume as soon as they are available.</p><p>The planner’s investigation begins with two fetches to get a map of the table's structure: one for the table’s snapshot and another for the manifest list.</p>
    <div>
      <h4>Starting execution as early as possible</h4>
      <a href="#starting-execution-as-early-as-possible">
        
      </a>
    </div>
    <p>From that point on, the query is processed in a streaming fashion. As the Query Planner reads through the manifest files and subsequently the data files they point to and prunes them, it immediately emits any matching data files/row groups as work units to the execution queue.</p><p>This pipeline structure ensures the compute nodes can begin the expensive work of data I/O almost instantly, long before the planner has finished its full investigation.</p><p>On top of this pipeline model, the planner adds a crucial optimization: <b>deliberate ordering</b>. The manifest files are not streamed in an arbitrary sequence. Instead, the planner processes them in an order matching the query's <code>ORDER BY</code> clause, guided by the metadata stats. This ensures that the data most likely to contain the desired results is processed first.</p><p>These two concepts work together to address query latency from both ends of the query pipeline.</p><p>The streamed planning pipeline lets us start crunching data as soon as possible, minimizing the delay before the first byte is processed. At the other end of the pipeline, the deliberate ordering of that work lets us finish early by finding a definitive result without scanning the entire dataset.</p><p>The next section explains the mechanics behind this "finish early" strategy.</p>
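    <p>A minimal sketch of this deliberate ordering, with made-up structures standing in for the manifest list and its stats. Since the manifest list is a single small metadata file, ordering it up front is cheap, and each surviving file can be emitted the moment it is found:</p>

```python
# Sketch of a planner streaming work units in ORDER BY timestamp DESC order.
# `manifests` stands in for the parsed manifest list; max_ts plays the role
# of the partition/column stats. All names are illustrative.
def stream_work_units(manifests):
    # The manifest list is one small metadata file, so sorting it is cheap.
    for manifest in sorted(manifests, key=lambda m: m["max_ts"], reverse=True):
        # Within a manifest, order individual files by their column stats.
        for f in sorted(manifest["files"], key=lambda f: f["max_ts"], reverse=True):
            yield f  # emitted immediately: execution starts before planning ends

manifests = [
    {"max_ts": 50, "files": [{"path": "old.parquet", "max_ts": 50}]},
    {"max_ts": 90, "files": [{"path": "new.parquet", "max_ts": 90},
                             {"path": "mid.parquet", "max_ts": 70}]},
]
print([u["path"] for u in stream_work_units(manifests)])
# ['new.parquet', 'mid.parquet', 'old.parquet']
```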
    <div>
      <h4>Stopping early: how to finish without reading everything</h4>
      <a href="#stopping-early-how-to-finish-without-reading-everything">
        
      </a>
    </div>
    <p>Thanks to the Query Planner streaming work units in an order matching the <code>ORDER BY</code> clause, the Query Execution system first processes the data that is most likely to be in the final result set.</p><p>This prioritization happens at two levels of the metadata hierarchy:</p><p><b>Manifest ordering</b>: The planner first inspects the manifest list. Using the partition stats for each manifest (e.g., the latest timestamp in that group of files), it decides which entire manifest files to stream first.</p><p><b>Parquet file ordering</b>: As it reads each manifest, it then uses the more granular column-level stats to decide the processing order of the individual Parquet files within that manifest.</p><p>This ensures a constantly prioritized stream of work units is sent to the execution engine, and this prioritized stream is what allows us to stop the query early.</p><p>For instance, with a query like ... <code>ORDER BY timestamp DESC LIMIT 5</code>, as the execution engine processes work units and sends back results, the planner does two things concurrently:</p><ul><li><p>It maintains a bounded heap of the best 5 results seen so far, constantly comparing new results to the oldest timestamp in the heap.</p></li><li><p>It keeps a "high-water mark" on the stream itself. Thanks to the metadata, it always knows the absolute latest timestamp of any data file that has not yet been processed.</p></li></ul><p>The planner constantly compares the state of the heap to the high-water mark of the remaining stream. The moment the oldest timestamp in our top-5 heap is newer than the high-water mark of the remaining stream, the entire query can be stopped.</p><p>At that point, we can prove that no remaining work unit could possibly contain a result that would make it into the top 5. The pipeline is halted, and a complete, correct result is returned to the user, often after reading only a fraction of the potentially matching data.</p><p>Currently, R2 SQL supports ordering only on columns that are part of the table's partition key. This is a limitation we are working to lift in the future.</p>
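    <p>The heap-plus-high-water-mark check can be sketched as follows. This is an illustrative model, not R2 SQL's implementation: work units arrive newest-first, and each unit's <code>max_ts</code> (from metadata stats) doubles as the high-water mark of the remaining stream:</p>

```python
# Sketch of the early-stop check for a query like
#   SELECT * FROM logs ORDER BY timestamp DESC LIMIT 5;
# assuming work units arrive ordered by max_ts descending.
import heapq

def top_k_newest(ordered_units, k=5):
    heap = []  # min-heap holding the k newest timestamps seen so far
    for unit in ordered_units:
        # The stream is ordered, so no remaining unit can contain a
        # timestamp newer than this unit's max_ts. If even that cannot
        # beat the oldest entry in our top-k heap, we are provably done.
        if len(heap) == k and heap[0] >= unit["max_ts"]:
            break
        for ts in unit["rows"]:  # "read" the work unit
            if len(heap) < k:
                heapq.heappush(heap, ts)
            elif ts > heap[0]:
                heapq.heapreplace(heap, ts)
    return sorted(heap, reverse=True)

units = [
    {"max_ts": 100, "rows": [100, 95, 90]},
    {"max_ts": 80,  "rows": [80, 75]},
    {"max_ts": 50,  "rows": [50, 40]},  # never read: 50 < oldest of top 5
]
print(top_k_newest(units, k=5))  # [100, 95, 90, 80, 75]
```

    <p>The third unit is skipped entirely: its high-water mark (50) is older than everything already in the heap, which is exactly the proof the planner relies on.</p>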
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5qN9TeEuRZJIidYXFictG/8a55cc6088be3abdc3b27878daa76e40/image4.png" />
          </figure>
    <div>
      <h3>Architecture</h3>
      <a href="#architecture">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3wkvnT24y5E0k5064cqu0T/939402d16583647986eec87617379900/image3.png" />
          </figure>
    <div>
      <h2>Query Execution</h2>
      <a href="#query-execution">
        
      </a>
    </div>
    <p>The Query Planner streams the query work in bite-sized pieces called row groups. A single Parquet file usually contains multiple row groups, but most of the time only a few of them contain relevant data. Splitting query work into row groups allows R2 SQL to read only small parts of potentially multi-GB Parquet files.</p><p>The server that receives the user’s request and performs query planning assumes the role of query coordinator. It distributes the work across query workers and aggregates results before returning them to the user.</p><p>Cloudflare’s network is vast, and many servers can be under maintenance at the same time. The query coordinator contacts Cloudflare’s internal API to make sure only healthy, fully functioning servers are picked for query execution. Connections between the coordinator and query workers go through <a href="https://www.cloudflare.com/en-gb/application-services/products/argo-smart-routing/"><u>Cloudflare Argo Smart Routing</u></a> to ensure fast, reliable connectivity.</p><p>Servers that receive query execution requests from the coordinator assume the role of query workers. Query workers are the point of horizontal scalability in R2 SQL: with more query workers, R2 SQL can process queries faster by distributing the work among many servers. That’s especially true for queries covering large numbers of files.</p><p>Both the coordinator and query workers run on Cloudflare’s distributed network, ensuring R2 SQL has plenty of compute power and I/O throughput to handle analytical workloads.</p><p>Each query worker receives a batch of row groups from the coordinator, as well as an SQL query to run on it. Additionally, the coordinator sends serialized metadata about the Parquet files containing the row groups. Thanks to that, query workers know the exact byte offsets where each row group is located in the Parquet file without needing to read this information from R2.</p>
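    <p>A sketch of how a coordinator might package row-group work units with the byte ranges workers need, so a worker can issue a ranged read against R2 without first fetching the Parquet footer itself. The field names and the round-robin scheduling are illustrative, not R2 SQL's actual scheduling:</p>

```python
# Toy work-unit model: each unit carries enough metadata for a worker
# to issue a single ranged read for one row group.
from dataclasses import dataclass

@dataclass
class WorkUnit:
    file_path: str    # object key in the R2 bucket
    row_group: int    # row-group index within the Parquet file
    byte_offset: int  # where that row group starts in the file
    byte_length: int  # how many bytes the ranged read should fetch

def assign(units, n_workers):
    """Naive round-robin batching across workers (real scheduling is smarter)."""
    batches = [[] for _ in range(n_workers)]
    for i, unit in enumerate(units):
        batches[i % n_workers].append(unit)
    return batches

units = [WorkUnit("logs/00.parquet", rg, rg * 4_000_000, 4_000_000)
         for rg in range(5)]
print([len(b) for b in assign(units, 2)])  # [3, 2]
```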
    <div>
      <h3>Apache DataFusion</h3>
      <a href="#apache-datafusion">
        
      </a>
    </div>
    <p>Internally, each query worker uses <a href="https://github.com/apache/datafusion"><u>Apache DataFusion</u></a> to run SQL queries against row groups. DataFusion is an open-source analytical query engine written in Rust. It is built around the concept of partitions: a query is split into multiple concurrent, independent streams, each working on its own partition of data.</p><p>Partitions in DataFusion are similar to partitions in Iceberg, but serve a different purpose. In Iceberg, partitions are a way to physically organize data on object storage. In DataFusion, partitions organize in-memory data for query processing. While logically they are similar – rows grouped together based on some logic – in practice, a partition in Iceberg doesn’t always correspond to a partition in DataFusion.</p><p>DataFusion partitions map perfectly to the R2 SQL query worker’s data model because each row group can be considered its own independent partition. Thanks to that, each row group is processed in parallel.</p><p>At the same time, since row groups usually contain at least 1,000 rows, R2 SQL benefits from vectorized execution. Each DataFusion partition stream can execute the SQL query on multiple rows in one go, amortizing the overhead of query interpretation.</p><p>There are two ends of the spectrum when it comes to query execution: processing all rows sequentially in one big batch, and processing each individual row in parallel. Sequential processing creates a so-called “tight loop”, which is usually more CPU-cache-friendly. In addition, processing a large number of rows at a time in batches significantly reduces interpretation overhead, as we go through the query plan less often. Fully parallel processing gives up these benefits, but makes use of multiple CPU cores to finish the query faster.</p><p>DataFusion’s architecture allows us to strike a balance between these two extremes, reaping the benefits of both ends. 
For each data partition, we gain better CPU cache locality and amortized interpretation overhead. At the same time, since many partitions are processed in parallel, we distribute the workload between multiple CPUs, cutting the execution time further.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Tis1F5C1x3x6sIyJLL8ju/aae094818b1b7f6f8d6f857305948fbd/image1.png" />
    </figure><p>In addition to the smart query execution model, DataFusion also provides first-class Parquet support.</p><p>As a file format, Parquet has multiple optimizations designed specifically for query engines. Parquet is a column-based format, meaning that each column is physically separated from the others. This separation enables better compression ratios, but it also lets the query engine read columns selectively. If a query only ever uses five columns, we can read just those and skip the remaining fifty. This massively reduces the amount of data we need to read from R2 and the CPU time spent on decompression.</p><p>DataFusion does exactly that. Using R2 ranged reads, it reads only the parts of the Parquet files containing the requested columns, skipping the rest.</p><p>DataFusion’s optimizer also allows us to push down filters to the lowest levels of the query plan. In other words, we can apply filters right as we are reading values from Parquet files. This lets us skip materializing results we know for sure won’t be returned to the user, cutting query execution time further.</p>
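    <p>A toy model of those two ideas together, selective column reads and filter pushdown, under illustrative names (this is not how DataFusion is implemented, just the shape of the technique):</p>

```python
# Toy columnar scan: decode only the columns the query needs, and apply
# the filter while scanning so non-matching rows are never materialized.
def scan(column_chunks, needed, predicate_col, predicate):
    cols = {name: column_chunks[name] for name in needed}  # selective "read"
    out = []
    for i, v in enumerate(cols[predicate_col]):
        if predicate(v):                                   # pushed-down filter
            out.append({name: cols[name][i] for name in needed})
    return out

chunks = {
    "http_status": [200, 500, 404, 500],
    "url": ["/a", "/b", "/c", "/d"],
    "body": ["...", "...", "...", "..."],  # never touched if not requested
}
rows = scan(chunks, ["http_status", "url"], "http_status", lambda s: s == 500)
print(rows)  # [{'http_status': 500, 'url': '/b'}, {'http_status': 500, 'url': '/d'}]
```

    <p>In the real engine the "chunks" are Parquet column chunks fetched via ranged reads, and the predicate is evaluated vectorized over Arrow arrays rather than row by row.</p>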
    <div>
      <h3>Returning query results</h3>
      <a href="#returning-query-results">
        
      </a>
    </div>
    <p>Once a query worker finishes computing results, it returns them to the coordinator through <a href="https://grpc.io/"><u>the gRPC protocol</u></a>.</p><p>R2 SQL uses <a href="https://arrow.apache.org/"><u>Apache Arrow</u></a> for the internal representation of query results. Arrow is an in-memory format that efficiently represents arrays of structured data. It is also used by DataFusion during query execution to represent partitions of data.</p><p>In addition to being an in-memory format, Arrow also defines the <a href="https://arrow.apache.org/docs/format/Columnar.html#format-ipc"><u>Arrow IPC</u></a> serialization format. Arrow IPC isn’t designed for long-term storage of data, but for inter-process communication, which is exactly what query workers and the coordinator do over the network. The query worker serializes all the results into the Arrow IPC format and embeds them in the gRPC response. The coordinator in turn deserializes the results and continues working with Arrow arrays.</p>
    <div>
      <h2>Future plans</h2>
      <a href="#future-plans">
        
      </a>
    </div>
    <p>While R2 SQL is currently quite good at executing filter queries, we also plan to rapidly add new capabilities over the coming months. This includes, but is not limited to, adding:</p><ul><li><p>Support for complex aggregations in a distributed and scalable fashion;</p></li><li><p>Tools that provide visibility into query execution, helping developers improve performance;</p></li><li><p>Support for many of the configuration options Apache Iceberg offers.</p></li></ul><p>In addition, we plan to improve the developer experience by letting users query their R2 Data Catalogs with R2 SQL directly from the Cloudflare Dashboard.</p><p>Given Cloudflare’s distributed compute, network capabilities, and ecosystem of developer tools, we have the opportunity to build something truly unique here. We are exploring different kinds of indexes to make R2 SQL queries even faster and to provide more functionality, such as full-text search, geospatial queries, and more.</p>
    <div>
      <h2>Try it now!</h2>
      <a href="#try-it-now">
        
      </a>
    </div>
    <p>It’s early days for R2 SQL, but we’re excited for users to get their hands on it. R2 SQL is available in open beta today! Head over to our <a href="https://developers.cloudflare.com/r2-sql/get-started/"><u>getting started guide</u></a> to learn how to create an end-to-end data pipeline that processes and delivers events to an R2 Data Catalog table, which can then be queried with R2 SQL.</p><p>We’re excited to see what you build! Come share your feedback with us on our <a href="http://discord.cloudflare.com/"><u>Developer Discord</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[R2]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Data]]></category>
            <category><![CDATA[Deep Dive]]></category>
            <category><![CDATA[Edge Computing]]></category>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[SQL]]></category>
            <guid isPermaLink="false">7znvjodLkg1AxYlR992it2</guid>
            <dc:creator>Yevgen Safronov</dc:creator>
            <dc:creator>Nikita Lapkov</dc:creator>
            <dc:creator>Jérôme Schneider</dc:creator>
        </item>
        <item>
            <title><![CDATA[SVG support in Cloudflare Images]]></title>
            <link>https://blog.cloudflare.com/svg-support-in-cloudflare-images/</link>
            <pubDate>Wed, 21 Sep 2022 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare Images now supports storing and delivering SVG files ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare Images was announced one year ago <a href="/announcing-cloudflare-images/">on this very blog</a> to help you solve the problem of delivering images in the right size, right quality and fast. Very fast.</p><p>It doesn’t really matter if you only run a personal blog, or a portal with thousands of vendors and millions of end-users. Doesn’t matter if you need one hundred images to be served one thousand times each at most, or if you deal with tens of millions of new, unoptimized, images that you deliver billions of times per month.</p><p>We want to remove the complexity of having to store, process, resize, re-encode, and serve images across multiple platforms and vendors.</p><p>At the time we wrote:</p><blockquote><p><i>Images is a single product that stores, resizes, optimizes and serves images. We built Cloudflare Images, so customers of all sizes can build a scalable and affordable image pipeline in minutes.</i></p></blockquote><p>We supported the most common formats, such as JPG, WebP, PNG and GIF.</p><p>We did not feel the need to support SVG files. SVG files are inherently scalable, so there is nothing to resize on the server side before serving them to your audience. One can even argue that SVG files are documents that can generate images through mathematical formulas of vectors and nodes, but are not images <i>per se.</i></p><p>There was also the clear notion that SVG files were a potential risk due to known and <a href="https://www.fortinet.com/blog/threat-research/scalable-vector-graphics-attack-surface-anatomy">well-documented</a> vulnerabilities. We knew we could do something from the security angle, but still, why take on that work if it <i>didn’t make sense</i> in the first place to consider SVG a supported format?</p><p>Not supporting SVG files, though, did bring a set of challenges to an increasing number of our customers. 
<a href="https://w3techs.com/technologies/details/im-svg">Some stats already show that around 50% of websites serve SVG files</a>, which matches the pulse we got from talking with many of you, customers and community.</p><p>If you relied on SVGs, you had to select a second storage location or a second image platform elsewhere. That commonly resulted in an egress fee when serving an uncached file from that source, and it goes against what we want for our product: one image pipeline to cover all your needs.</p><p>We heard you loud and clear, and starting today, you can store and serve SVG files, safely, with Cloudflare Images.</p>
    <div>
      <h3>SVG, what is so special about them?</h3>
      <a href="#svg-what-is-so-special-about-them">
        
      </a>
    </div>
    <p>The Scalable Vector Graphics file type is great for serving all kinds of illustrations, charts, logos, and icons.</p><p>SVG files don't represent images as pixels, but as geometric shapes (lines, arcs, polygons) that can be drawn with perfect sharpness at any resolution.</p><p>Let’s now use a complex image as an example, filled with more than four hundred paths and ten thousand nodes:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ruJDon2gjvBXwHi9DGsA7/997727a99a00188695871c37f08adf46/uHWAmWDUYVNmDByskHnBsSf_-poXNMAz7sxTw-bjNYHldqbU5ecTj_upSCKIHoXRnolnrlpPqvyDbBray-TaRkDJcGOO9CKQUdY3CpvwmaNn0rRkqqnPLAJJaE0D.png" />
            
    </figure><p>Unlike bitmaps, where pixels arrange together to create the visual perception of an image, a vector image can be resized with no quality loss. That happens because resizing the SVG to 300% of its original size redefines the size of the vectors to 300%, rather than expanding pixels to 300%.</p><p>This becomes evident when we’re dealing with small-resolution images.</p><p>Here is the 100px-wide SVG of the Toroid shown above:</p><p><img src="http://staging.blog.mrk.cfdata.org/content/images/2022/09/Toroid.svg" /></p><p>and the corresponding 100px-wide PNG:</p><p><img src="http://staging.blog.mrk.cfdata.org/content/images/2022/09/image3-18.png" /></p><p>Now here is the same SVG with the HTML width attribute set at 300px:</p><p><img src="http://staging.blog.mrk.cfdata.org/content/images/2022/09/Toroid.svg" /></p><p>and the same PNG you saw before, but upscaled by 3x, so the width is also 300px:</p><p><img src="http://staging.blog.mrk.cfdata.org/content/images/2022/09/unnamed.png" /></p><p>The visual quality loss on the PNG is obvious when it gets scaled up.</p><p>Keep in mind: the Toroid shown above is stored in an SVG file of 142 KB, and that is already a very complex and heavy SVG file.</p><p>Now, if you do want to display a PNG with an original width of 1024px to present a high-quality image of the same Toroid, the size becomes an issue:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1haST58mkysZs17Fn2dNv4/d297d11a7aef91c24c8fcda6a89ddf4f/unnamed--1-.png" />
            
    </figure><p>The new 1024px PNG, however, weighs 344 KB. That’s about 2.4 times the weight of the single SVG that you can use at any size.</p><p>Think about the storage and bandwidth savings: to display the exact same image from an SVG, all you need is a <code>width=”1024”</code> in your HTML, at less than half the kilobytes used on the PNG.</p><p>Couple all of this with the flexibility of attributes like <code>viewBox</code> in your HTML code, and you can pan, zoom, crop, and scale, all without ever needing anything other than the one original SVG file.</p><p>Here’s an example of an SVG being resized on the client side, with no visual quality loss:</p><div></div>
<p>Let’s quickly summarize what we’ve covered so far:</p><ul><li><p>SVG files are wonderful for vector images like illustrations, charts, and logos, and are infinitely scalable with no need to resize on the server side;</p></li><li><p>the same image as a bitmap is either heavier than the SVG when used at high resolutions, or shows very noticeable loss of visual quality when scaled up from a lower resolution.</p></li></ul>
    <div>
      <h3>So, what are the downsides of using SVG files?</h3>
      <a href="#so-what-are-the-downsides-of-using-svg-files">
        
      </a>
    </div>
    <p>SVG files aren't just images. They are XML-based documents that are as powerful as HTML pages. They can contain arbitrary JavaScript, fetch external content from other URLs, or embed HTML elements. This gives SVG files much more power than expected from a simple image.</p><p>Throughout the years, numerous exploits have been known, identified, and corrected.</p><p>Some old attacks were very rudimentary, yet effective. The famous <a href="https://en.wikipedia.org/wiki/Billion_laughs_attack">Billion Laughs</a> exploited how <a href="https://www.w3resource.com/xml/entities.php">XML uses Entities and declares them in the Document Type Definition</a>, and how it handles recursion.</p><p>Entities can be something as simple as a declaration of a text string, or a nested reference to other, previously defined entities.</p><p>If you define a first entity as a simple string, then a second entity that references the first one 10 times, then a third that references the second 10 times, and so on up to a 10th of the same kind, you force the parser to generate an output of a billion copies of that very simple first string. This would most commonly exhaust resources on the server parsing the XML and result in a <a href="https://www.cloudflare.com/en-gb/learning/ddos/what-is-a-ddos-attack/">DoS</a>. While that particular XML parsing weakness has been widely addressed through parser memory caps and lazy loading of entities, more complex attacks have become a regular occurrence in recent years.</p><p>The common themes in these more recent attacks have been <a href="https://www.cloudflare.com/learning/security/how-to-prevent-xss-attacks/">XSS (cross-site scripting)</a> and foreign objects referenced in the XML content. In both cases, using SVG inside <code>&lt;img&gt;</code> tags in your HTML is an invitation for any ill-intended file to reach your end-users. So, what exactly can we do about it to make you trust any SVG file you serve?</p>
    <div>
      <h3>The SVG filter</h3>
      <a href="#the-svg-filter">
        
      </a>
    </div>
    <p>We've developed a filter that simplifies SVG files down to only the features used for images, so that serving SVG images from any source is just as safe as serving a JPEG or PNG, while preserving SVG's vector graphics capabilities.</p><ul><li><p>We remove scripting. This prevents SVG files from being used for cross-site scripting attacks. Although browsers don't allow scripts in &lt;img&gt;, they would run scripts when SVG files are opened directly as a top-level document.</p></li><li><p>We remove hyperlinks to other documents. This makes SVG files less attractive for SEO spam and phishing.</p></li><li><p>We remove references to cross-origin resources. This stops 3rd parties from tracking who is viewing the image.</p></li></ul><p>What's left is just an image.</p><p>SVG files can also contain embedded images in other formats, like JPEG and PNG, in the form of <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs">Data URLs</a>. We treat these embedded images just like other images that we process, and optimize them too. We don't support SVG files recursively embedded in SVG, though: recursive parsing opens the door to resource exhaustion in the parser. While the most common browsers already limit SVG recursion to one level, the potential for abuse led us to leave this capability out of the filter, at least for now.</p><p>We do set Content-Security-Policy (CSP) headers in all our HTTP response headers to disable unwanted features, and that alone acts as a first line of defense, but filtering defends in more depth in case these headers are lost (e.g. if the image was saved as a file and served elsewhere).</p><p>Our tool is <a href="https://github.com/cloudflare/svg-hush">open-source</a>. It's written in Rust and can filter SVG files in a streaming fashion without buffering, so it's fast enough for filtering on the fly.</p><p>The SVG format is pretty complex, with lots of features. If there is safe SVG functionality that we don't support yet, you can report issues and contribute to the development of the filter.</p><p>You can see how the tool actually works by looking at the tests folder in the open-source repository, where a sample unfiltered XML and the already filtered version are present.</p><p>Here's what a diff of those files looks like:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/61edBaHRCc923sZoBnknKB/dd590659eaab4dafb9c637434e3411a4/image5-8.png" />
            
            </figure><p>Removed are the external references, foreignObjects and any other potential threats.</p>
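<p>To make the before/after transformation concrete, here is a toy sketch of the kinds of rewrites such a filter performs. This is not the svg-hush implementation (which properly parses the XML in Rust); naive regexes like these must never be used as a real sanitizer:</p>

```javascript
// Toy illustration only: the categories of content the filter strips,
// demonstrated with naive string rewrites. A real filter must operate on a
// parsed XML tree, not regexes.
function toySvgFilter(svg) {
  return svg
    // drop <script> elements entirely
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    // drop inline event handlers such as onclick="..."
    .replace(/\son\w+="[^"]*"/gi, "")
    // drop hyperlinks and cross-origin references to http(s) URLs
    .replace(/\s(?:xlink:)?href="https?:\/\/[^"]*"/gi, "");
}

const dirty =
  '<svg xmlns="http://www.w3.org/2000/svg">' +
  "<script>alert(1)</script>" +
  '<a href="https://evil.example/phish"><circle r="5" onclick="steal()"/></a>' +
  "</svg>";
console.log(toySvgFilter(dirty));
// → <svg xmlns="http://www.w3.org/2000/svg"><a><circle r="5"/></a></svg>
```

<p>After filtering, the scripting, the external hyperlink, and the event handler are gone, and only drawable content remains.</p>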
    <div>
      <h3>How you can use SVG files in Cloudflare Images</h3>
      <a href="#how-you-can-use-svg-files-in-cloudflare-images">
        
      </a>
    </div>
    <p>Starting now, you can upload SVG files to Cloudflare Images and serve them at will. Uploading the images can be done as for any other supported format, <a href="https://developers.cloudflare.com/images/cloudflare-images/upload-images/dashboard-upload/">via UI</a> or <a href="https://developers.cloudflare.com/images/cloudflare-images/upload-images/upload-via-url/">API</a>.</p><div></div><p>Variants, <a href="https://developers.cloudflare.com/images/cloudflare-images/transform/resize-images/">named</a> or <a href="https://developers.cloudflare.com/images/cloudflare-images/transform/flexible-variants/">flexible</a>, are intended to transform bitmap (raster) images into whatever size you want to serve them at.</p><p>SVG files, as vector images, do not require resizing inside the Images pipeline.</p><p>This results in a banner with the following message when you’re previewing an SVG in the UI:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7DBn12LoNf1pQdIV0yxlN3/737bebe7ab5bb17f2883448aaea23524/image1-30.png" />
            
            </figure><p>And as a result, all variants listed will show the exact same image in the exact same dimensions.</p><p>Because an image is worth a thousand words, especially when trying to describe behaviors, here is what it will look like if you scroll through the variants preview:</p><div></div><p>With Cloudflare Images you get a default Public Variant listed when you start using the product, so you can immediately start serving your SVG files using it, just like this:</p><p><a href="https://imagedelivery.net/">https://imagedelivery.net/</a>&lt;your_account_hash&gt;/&lt;your_SVG_ID&gt;/public</p><p>And, as shown above, you can use any of your variant names to serve the image, as it won’t affect the output at all.</p><p>If you’re an Image Resizing customer, you can also benefit from serving your files with our tool. Make sure you head to the <a href="https://developers.cloudflare.com/images/image-resizing/">Developer Documentation</a> pages to see how.</p>
    <div>
      <h3>What’s next?</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>You can subscribe to Cloudflare Images <a href="https://dash.cloudflare.com/?to=/:account/images">directly in the dashboard</a>, and starting from today you can use the product to store and serve SVG files.</p><p>If you want to contribute to further development of the filtering tool and help expand its abilities, check out our <a href="https://github.com/cloudflare/svg-hush">SVG-Hush Tool repo</a>.</p><p>You can also connect directly with the team in our <a href="https://discord.com/invite/cloudflaredev">Cloudflare Developers Discord Server</a>.</p>
            <category><![CDATA[GA Week]]></category>
            <category><![CDATA[General Availability]]></category>
            <category><![CDATA[Cloudflare Images]]></category>
            <guid isPermaLink="false">5Z8tlaSgZifZHEK46BkW2r</guid>
            <dc:creator>Paulo Costa</dc:creator>
            <dc:creator>Yevgen Safronov</dc:creator>
            <dc:creator>Kornel Lesiński</dc:creator>
        </item>
        <item>
            <title><![CDATA[Announcing the Cloudflare Images Sourcing Kit]]></title>
            <link>https://blog.cloudflare.com/cloudflare-images-sourcing-kit/</link>
            <pubDate>Fri, 13 May 2022 12:59:25 GMT</pubDate>
            <description><![CDATA[ Migrating millions of images into Cloudflare is now simple, fast, and just a few clicks away. The new Cloudflare Images Sourcing Kit allows you to define your image sources and reuse them when you need to add new images or refresh existing ones ]]></description>
            <content:encoded><![CDATA[ <p>When we announced <a href="/announcing-cloudflare-images-beta/">Cloudflare Images to the world</a>, we introduced a way to store images within the product and help customers move away from the egress fees incurred when using remote sources for their deliveries via Cloudflare.</p><p>To <a href="https://www.cloudflare.com/products/cloudflare-images/">store the images in Cloudflare</a>, customers can upload them <a href="https://developers.cloudflare.com/images/cloudflare-images/upload-images/dashboard-upload/">via UI</a> with a simple drag and drop, or <a href="https://developers.cloudflare.com/images/cloudflare-images/api-request/">via API</a> for scenarios with a high number of objects, where scripting the upload process makes more sense.</p><p>To create flexibility in how to import the images, we’ve recently also included the ability to <a href="https://developers.cloudflare.com/images/cloudflare-images/upload-images/upload-via-url/">upload via URL</a> or <a href="https://developers.cloudflare.com/images/cloudflare-images/upload-images/custom-id/">define custom names and paths for your images</a> to allow a simple mapping between customer repositories and the objects in Cloudflare. It's also possible to <a href="https://developers.cloudflare.com/images/cloudflare-images/serve-images/#serving-images-from-custom-domains">serve from a custom hostname</a> to control how your end-users see the path, to improve delivery performance by removing additional TLS negotiations, or to improve your brand recognition through URL consistency.</p><p>Still, there was no simple way to tell our product: <i>“Tens of millions of images are in this repository URL. Go and grab them all for me”</i>.</p><p>In some scenarios, our customers have buckets with millions of images to upload to Cloudflare Images. Their goal is to migrate all objects to Cloudflare through a one-time process, allowing them to drop the external storage altogether.</p><p>In another common scenario, different departments in larger companies use independent systems configured with varying storage repositories, all of which they feed at specific times with uneven upload volumes. And it would be best if they could reuse definitions to get all those new images into Cloudflare to ensure the portfolio is up-to-date while not paying egregious egress fees by serving the public directly from those multiple storage providers.</p><p>These situations required the upload process to Cloudflare Images to include logistical coordination and scripting knowledge. Until now.</p>
    <div>
      <h3>Announcing the Cloudflare Images Sourcing Kit</h3>
      <a href="#announcing-the-cloudflare-images-sourcing-kit">
        
      </a>
    </div>
    <p>Today, we are happy to share with you our Sourcing Kit, where you can define one or more sources containing the objects you want to migrate to Cloudflare Images.</p><p>But what exactly is Sourcing? In industries like manufacturing, it implies a number of operations, from selecting suppliers, to vetting raw materials, to delivering reports to the process owners.</p><p>So, we borrowed that definition and translated it into a set of Cloudflare Images capabilities allowing you to:</p><ol><li><p>Define one or multiple repositories of images to bulk import;</p></li><li><p>Reuse those sources and import only new images;</p></li><li><p>Make sure that only actually usable images are imported, and not other objects or file types that exist in that source;</p></li><li><p>Define the target path and filename for imported images;</p></li><li><p>Obtain logs for the bulk operations.</p></li></ol><p>The new kit does it all. So let's go through it.</p>
    <div>
      <h3>How the Cloudflare Images Sourcing Kit works</h3>
      <a href="#how-the-cloudflare-images-sourcing-kit-works">
        
      </a>
    </div>
    <p>In the <a href="https://dash.cloudflare.com/?to=/:account/images">Cloudflare Dashboard</a>, you will soon find the Sourcing Kit under Images.</p><p>In it, you will be able to create a new source definition, view existing ones, and view the status of the last operations.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4SZEMoU2nrlZPvDawlEpXZ/f14ac5bdf189995fa2f5c0429811cf6e/image5-12.png" />
            
            </figure><p>Clicking on the create button will launch the wizard that will guide you through the first bulk import from your defined source:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5W4ZRJaDMUAzHpxq3j0Nkc/1401923c512b249b716ddac271ee1ff2/image2-32.png" />
            
            </figure><p>First, you will need to input the Name of the Source and the URL for accessing it. You’ll be able to save the definitions and reuse the source whenever you wish. After running the necessary validations, you’ll be able to define the rules for the import process.</p><p>The first option allows an optional Path Prefix. Defining a prefix gives images uploaded from this particular source a unique identifier, differentiating them from images imported from other sources.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5syJ0nK1aQtBS1O65s9ao7/1dad4362abe8b7eba03664816090212e/image4-19.png" />
            
            </figure><p>The naming rule in place already respects the source image name and path, so let's assume there's a puppy image to be retrieved at:</p><p><code>https://my-bucket.s3.us-west-2.amazonaws.com/folderA/puppy.png</code></p><p>When imported without any Path Prefix, you’ll find the image at:</p><p><code>https://imagedelivery.net/&lt;AccountId&gt;/folderA/puppy.png</code></p><p>Now, you might want to create an additional Path Prefix to identify the source, for example by mentioning that this bucket is from the Technical Writing department. In the puppy case, the result would be:</p><p><code>https://imagedelivery.net/&lt;AccountId&gt;/techwriting/folderA/puppy.png</code></p><p>Custom Path Prefixes also provide a way to prevent name clashes coming from other sources.</p><p>Still, there will be times when customers don't want to use them. And, when reusing a source to import images, identical path and filename destinations might clash.</p><p>By default, we don’t overwrite existing images, but we allow you to select that option and refresh your catalog in the Cloudflare pipeline.</p>
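<p>The path mapping described above can be sketched as a small helper (the function is hypothetical, not part of the Sourcing Kit API; it just illustrates how the prefix slots into the delivery URL):</p>

```javascript
// Hypothetical sketch of the Sourcing Kit path mapping: source key and
// filename are preserved, with an optional prefix identifying the source.
function deliveryUrl(accountId, sourceKey, pathPrefix) {
  const parts = ["https://imagedelivery.net", accountId];
  if (pathPrefix) parts.push(pathPrefix); // e.g. "techwriting"
  parts.push(sourceKey); // e.g. "folderA/puppy.png"
  return parts.join("/");
}

// Without a prefix, the source path and name are preserved as-is:
console.log(deliveryUrl("<AccountId>", "folderA/puppy.png"));
// → https://imagedelivery.net/<AccountId>/folderA/puppy.png

// With a prefix identifying the Technical Writing department's bucket:
console.log(deliveryUrl("<AccountId>", "folderA/puppy.png", "techwriting"));
// → https://imagedelivery.net/<AccountId>/techwriting/folderA/puppy.png
```
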
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5xRY7HGX666fWFuaKU8bgm/6f0c97e2c024965bf4129d56c57b224a/image6-12.png" />
            
            </figure><p>Once these inputs are defined, a click on the Create and start migration button at the bottom will trigger the upload process.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/301MOQX804MipTdEAzO3Wz/367288704deb02c6a2d5eb284f214867/image10.png" />
            
            </figure><p>This action will show the final wizard screen, where the migration status is displayed. The progress log will report any errors obtained during the upload and is also available to download.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5qz31zhtbNMiIAiRceslTp/7d2e1128ffd82e86cd4752157fdeaaf9/image7-6.png" />
            
            </figure><p>You can reuse, edit or delete source definitions when no operations are running, and at any point, from the home page of the kit, it's possible to access the status and return to the ongoing or last migration report.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/13JDKst8pmw43G21YQ1yi2/7a5c8ee33a2e3ab31675a4caf97a8e4c/image3-24.png" />
            
            </figure>
    <div>
      <h3>What’s next?</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>With the Beta version of the Cloudflare Images Sourcing Kit, we will allow you to define AWS S3 buckets as a source for the imports. In the following versions, we will enable definitions for other common repositories, such as the ones from Azure Storage Accounts or Google Cloud Storage.</p><p>And while we're aiming for this to be a simple UI, we also plan to make everything available through CLI: from defining the repository URL to starting the upload process and retrieving a final report.</p>
    <div>
      <h3>Apply for the Beta version</h3>
      <a href="#apply-for-the-beta-version">
        
      </a>
    </div>
    <p>We will be releasing the Beta version of this kit in the following weeks, allowing you to source your images from third party repositories to Cloudflare.</p><p>If you want to be the first to use Sourcing Kit, request to join the waitlist on the <a href="https://dash.cloudflare.com/?to=/:account/images">Cloudflare Images dashboard</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/75BuXJQtfqBDv8SOx3JKzs/3b765c95ac58e7bb7a8eeade89d323ea/image1-39.png" />
            
            </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Platform Week]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Cloudflare Images]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">72mbRGNXN4aGuGsSccsIqq</guid>
            <dc:creator>Paulo Costa</dc:creator>
            <dc:creator>Natalie Yeh</dc:creator>
            <dc:creator>Yevgen Safronov</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Images introduces AVIF, Blur and Bundle with Stream]]></title>
            <link>https://blog.cloudflare.com/images-avif-blur-bundle/</link>
            <pubDate>Thu, 18 Nov 2021 14:00:10 GMT</pubDate>
            <description><![CDATA[ Two months ago we launched Cloudflare Images for everyone and we are amazed about the adoption and the feedback we received. Today we are announcing AVIF and Blur support for Cloudflare Images and give you a preview of the upcoming functionality. ]]></description>
            <content:encoded><![CDATA[ <p>Two months ago we <a href="/announcing-cloudflare-images/">launched</a> Cloudflare Images for everyone, and we are amazed by the adoption and the feedback we received.</p><p>Let’s start with some numbers:</p><p>More than <b>70 million</b> images delivered per day on average in the week of November 5 to 12.</p><p>More than <b>1.5 million</b> images have been uploaded so far, growing faster every day.</p><p>But we are just getting started, and we are happy to announce the release of the most requested features. First, we cover AVIF support for Images: converting as many images as possible to <b>AVIF</b> results in highly compressed, quickly delivered images without compromising on quality.</p><p>Second, we introduce <b>blur</b>. By blurring an image, in combination with the already supported protection of private images via <a href="https://developers.cloudflare.com/images/cloudflare-images/serve-images/serve-private-images-using-signed-url-tokens">signed URL</a>, we make Cloudflare Images a great solution for previews of paid content.</p><p>For many of our customers it is important to be able to serve images from their <b>own domain</b>, not only via imagedelivery.net. Here we show an easy solution for this using a custom Worker or a special URL.</p><p>Last but not least, we announce the launch of new, attractively priced <b>bundles</b> for both Cloudflare Images and Stream.</p>
    <div>
      <h3>Images supports AVIF</h3>
      <a href="#images-supports-avif">
        
      </a>
    </div>
    <p>We <a href="/generate-avif-images-with-image-resizing/">announced support</a> for the new AVIF image format in the Image Resizing product last year.</p><p>Last month, we added AVIF support to Cloudflare Images. AVIF compresses images significantly better than older-generation formats such as WebP and JPEG. Today, the AVIF image format is supported in both Chrome and Firefox. <a href="https://caniuse.com/avif">Globally, almost 70%</a> of users have a web browser that supports AVIF.</p>
    <div>
      <h3>What is AVIF</h3>
      <a href="#what-is-avif">
        
      </a>
    </div>
    <p>As we <a href="/generate-avif-images-with-image-resizing/#what-is-avif">explained previously</a>, AVIF is a combination of the HEIF ISO standard, and a royalty-free AV1 codec by <a href="https://aomedia.org/">Mozilla, Xiph, Google, Cisco, and many others</a>.</p><p>“Currently, JPEG is the most popular image format on the web. It's doing remarkably well for its age, and it will likely remain popular for years to come thanks to its excellent compatibility. There have been many previous attempts at replacing JPEG, such as JPEG 2000, JPEG XR, and WebP. However, these formats offered only modest compression improvements and didn't always beat JPEG on image quality. Compression and image quality in <a href="https://netflixtechblog.com/avif-for-next-generation-image-coding-b1d75675fe4">AVIF is better than in all of them, and by a wide margin</a>.”<sup>1</sup></p>
    <div>
      <h3>How Cloudflare Images supports AVIF</h3>
      <a href="#how-cloudflare-images-supports-avif">
        
      </a>
    </div>
    <p>As a reminder, <a href="/building-cloudflare-images-in-rust-and-cloudflare-workers/#image-delivery">image delivery</a> is done through the Cloudflare managed imagedelivery.net domain. It is powered by Cloudflare Workers. We have the following logic to request the AVIF format based on the Accept HTTP request header:</p>
            <pre><code>const WEBP_ACCEPT_HEADER = /image\/webp/i;
const AVIF_ACCEPT_HEADER = /image\/avif/i;

addEventListener("fetch", (event) =&gt; {
  event.respondWith(handleRequest(event));
});

async function handleRequest(event) {
  const request = event.request;
  const url = new URL(request.url);
  
  const headers = new Headers(request.headers);

  const accept = headers.get("accept") || "";

  let format = undefined;

  if (WEBP_ACCEPT_HEADER.test(accept)) {
    format = "webp";
  }

  if (AVIF_ACCEPT_HEADER.test(accept)) {
    format = "avif";
  }

  const resizingReq = new Request(url, {
    headers,
    cf: {
      // other resizing options elided in this example
      image: { format },
    },
  });

  return fetch(resizingReq);
}</code></pre>
            <p>Based on the Accept header, the logic in the Worker detects whether the WebP or AVIF format can be served. The request is passed to Image Resizing. If the image is available in the Cloudflare cache, it will be served immediately; otherwise the image will be resized, transformed, and cached. This approach ensures that for clients without AVIF support we deliver images in WebP or JPEG formats.</p><p>The benefit of the Cloudflare Images product is that we added AVIF support without customers needing to change a single line of code on their side.</p><p>The transformation of an image to AVIF is compute-intensive, but leads to a significant benefit in file size. We always weigh the costs and benefits when deciding which format to serve.</p><p>It is worth noting that all conversions to WebP and AVIF formats currently happen at the request phase of image delivery. We will add the ability to convert images at the upload phase in the future.</p>
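<p>The Accept-header negotiation from the Worker above can also be factored into a pure helper (our refactoring, not code from the post), which makes the format decision easy to unit-test outside the Workers runtime:</p>

```javascript
// Pure version of the Accept-header negotiation shown above: prefer AVIF,
// then WebP; otherwise return undefined so Image Resizing falls back to
// JPEG/PNG for clients without modern format support.
function preferredFormat(acceptHeader) {
  const accept = acceptHeader || "";
  if (/image\/avif/i.test(accept)) return "avif";
  if (/image\/webp/i.test(accept)) return "webp";
  return undefined;
}

console.log(preferredFormat("image/avif,image/webp,*/*")); // → "avif"
console.log(preferredFormat("image/webp,*/*")); // → "webp"
console.log(preferredFormat(null)); // → undefined
```
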
    <div>
      <h3>Introducing Blur</h3>
      <a href="#introducing-blur">
        
      </a>
    </div>
    <p>One of the most requested features for Images and Image Resizing was adding support for blur. We recently added the support for blur both via <a href="https://developers.cloudflare.com/images/image-resizing/url-format">URL format</a> and <a href="https://developers.cloudflare.com/images/image-resizing/resize-with-workers">with Cloudflare Workers</a>.</p><p>Cloudflare Images uses variants. When you create a variant, you can define properties including variant name, width, height, and whether the variant should be publicly accessible. Blur will be available as a new option for variants via <a href="https://api.cloudflare.com/#cloudflare-images-variants-create-a-variant">variant API</a>:</p>
            <pre><code>curl -X POST "https://api.cloudflare.com/client/v4/accounts/9a7806061c88ada191ed06f989cc3dac/images/v1/variants" \
     -H "Authorization: Bearer &lt;api_token&gt;" \
     -H "Content-Type: application/json" \
     --data '{"id":"blur","options":{"metadata":"none","blur":20},"neverRequireSignedURLs":true}'</code></pre>
            <p>One of the use cases for blur with Cloudflare Images is to control access to premium content.</p><p>The customer uploads an image that requires an access token:</p>
            <pre><code>curl -X POST "https://api.cloudflare.com/client/v4/accounts/9a7806061c88ada191ed06f989cc3dac/images/v1" \
     -H "Authorization: Bearer &lt;api_token&gt;"
     --form 'file=@./&lt;file_name&gt;' \
     --form 'requireSignedURLs=true'</code></pre>
            <p>Using the variant we defined via API we can fetch the image without providing a signature:</p><img src="https://imagedelivery.net/r1xBEzoDl4p34DP7QLrECw/dfc72df8-863f-46e3-7bba-a21f9795e401/blur20" /><p>To access the protected image a <a href="https://developers.cloudflare.com/images/cloudflare-images/serve-images/serve-private-images-using-signed-url-tokens">valid signed URL</a> will be required:</p><img src="https://imagedelivery.net/r1xBEzoDl4p34DP7QLrECw/dfc72df8-863f-46e3-7bba-a21f9795e401/public?sig=d67d49055d652b8fb2575b3ec11f0e1a8fae3932d3e516d381e49e498dd4a96e" />
Lava lamps in the Cloudflare lobby. Courtesy of <a href="https://twitter.com/mahtin/status/888251632550424577">@mahtin</a>
<br /><p>The combination of image blurring and restricted access to images could be integrated into many scenarios and provides a powerful tool set for content publishers.</p><p>The functionality to define a variant with a blur option is coming soon in the Cloudflare dashboard.</p>
    <div>
      <h3>Serving images from custom domains</h3>
      <a href="#serving-images-from-custom-domains">
        
      </a>
    </div>
    <p>One important use case for Cloudflare Images customers is to serve images from custom domains. It could improve latency and loading performance by not requiring additional TLS negotiations on the client. Using Cloudflare Workers customers can add this functionality today using the following example:</p>
            <pre><code>const IMAGE_DELIVERY_HOST = "https://imagedelivery.net";

addEventListener("fetch", (event) =&gt; {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
  const url = new URL(request.url);
  const { pathname, search } = url;

  const destinationURL = IMAGE_DELIVERY_HOST + pathname + search;
  return fetch(new Request(destinationURL));
}</code></pre>
            <p>For simplicity, the Worker script forwards requests from the domain where it’s deployed to imagedelivery.net. We assume the same format as Cloudflare Images URLs:</p>
            <pre><code>https://&lt;customdomain.net&gt;/&lt;encoded account id&gt;/&lt;image id&gt;/&lt;variant name&gt;</code></pre>
            <p>The Worker could be adjusted to fit customer needs, such as:</p><ul><li><p>Serving images only from a specific path on the domain, e.g. /images/</p></li><li><p>Populating the account ID or variant name automatically</p></li><li><p>Mapping Cloudflare Images to custom URLs altogether</p></li></ul><p>For customers who just want the simplicity of serving Cloudflare Images from their domains on Cloudflare, we will be adding the ability to serve Cloudflare Images using the following format:</p>
            <pre><code>https://&lt;customdomain.net&gt;/cdn-cgi/imagedelivery/&lt;encrypted_account_id&gt;/&lt;_image_id&gt;/&lt;variant_name&gt;</code></pre>
            <p>Image delivery will be supported from all customer domains under the same Cloudflare account where the Cloudflare Images subscription is activated. This will be available to all Cloudflare Images customers before the holidays.</p>
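<p>As a sketch of the first adjustment mentioned above, serving images only from a specific path such as /images/, the URL rewrite could look like this (a hypothetical helper of ours, not an official example; the account and image IDs shown are placeholders):</p>

```javascript
// Hypothetical variation on the Worker above: only forward requests under
// /images/, stripping that prefix before handing off to imagedelivery.net.
const IMAGE_DELIVERY_HOST = "https://imagedelivery.net";
const IMAGE_PATH_PREFIX = "/images";

function rewriteToImageDelivery(requestUrl) {
  const url = new URL(requestUrl);
  if (!url.pathname.startsWith(IMAGE_PATH_PREFIX + "/")) {
    return null; // not an image URL; let the origin handle it
  }
  // Keep the account/image/variant part and any query string (e.g. ?sig=)
  const rest = url.pathname.slice(IMAGE_PATH_PREFIX.length);
  return IMAGE_DELIVERY_HOST + rest + url.search;
}

console.log(rewriteToImageDelivery("https://example.com/images/abc123/imgid42/public?sig=x"));
// → https://imagedelivery.net/abc123/imgid42/public?sig=x
console.log(rewriteToImageDelivery("https://example.com/about"));
// → null
```

<p>Inside the Worker, a null result would mean the request falls through to the origin instead of being proxied to image delivery.</p>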
    <div>
      <h3>Images and Stream Bundle</h3>
      <a href="#images-and-stream-bundle">
        
      </a>
    </div>
    <p>Creator platforms, eCommerce, and many other products have one thing in common: an easy, accessible, and affordable way to upload, store, and deliver images and videos is vital.</p><p>We teamed up with the Stream team to create a set of bundles that make it super easy to get started with your product.</p><p>The Starter bundle is perfect for experimenting and a first MVP. For just $10 per month it is 50% cheaper than the unbundled option, and includes enough to get started:</p><ul><li><p>Stream: 1,000 stored minutes and 5,000 minutes served</p></li><li><p>Images: 100,000 stored images and 500,000 images served</p></li></ul><p>For larger and fast-scaling applications we have the Creator Bundle for $50 per month, which saves over 60% compared to the unbundled products. It includes everything to start scaling:</p><ul><li><p>Stream: 10,000 stored minutes and 50,000 minutes served</p></li><li><p>Images: 500,000 stored images and 1,000,000 images served</p></li></ul><img src="https://imagedelivery.net/r1xBEzoDl4p34DP7QLrECw/fb149b8a-8d93-494d-74da-0a88b8ffd600/public" /><p>These new bundles will be available to all customers from the end of November.</p>
    <div>
      <h3>What’s next</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>We are not stopping here, and we already have the next features for Images lined up. One of them is Images Analytics. Great analytics are vital for any product, so we will introduce analytics functionality that lets all Cloudflare Images customers keep track of their images and their usage.</p>
    <div>
      <h3>Watch on Cloudflare TV</h3>
      <a href="#watch-on-cloudflare-tv">
        
      </a>
    </div>
<hr /><p><sup>1</sup><a href="/generate-avif-images-with-image-resizing/#what-is-avif">http://blog.cloudflare.com/generate-avif-images-with-image-resizing/#what-is-avif</a></p> ]]></content:encoded>
            <category><![CDATA[Full Stack Week]]></category>
            <category><![CDATA[Image Resizing]]></category>
            <category><![CDATA[Image Optimization]]></category>
            <guid isPermaLink="false">5fZMmfFfa85XpTYFNaU5S2</guid>
            <dc:creator>Marc Lamik</dc:creator>
            <dc:creator>Yevgen Safronov</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building Cloudflare Images in Rust and Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/building-cloudflare-images-in-rust-and-cloudflare-workers/</link>
            <pubDate>Wed, 15 Sep 2021 12:59:28 GMT</pubDate>
            <description><![CDATA[ Using Rust and Cloudflare Workers helps us quickly iterate and deliver product improvements over the coming weeks and months. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/vts7EELfGtvXeEkzKzM3B/0bed2d7c96c8cdb043a810e2c64c96e9/image3-14.png" />
            
            </figure><p>This post explains how we implemented the Cloudflare Images product with reusable Rust libraries and Cloudflare Workers. It covers the technical design of <a href="https://developers.cloudflare.com/image-resizing/">Cloudflare Image Resizing</a> and Cloudflare Images. Using Rust and Cloudflare Workers helps us quickly iterate and deliver product improvements over the coming weeks and months.</p>
    <div>
      <h3>Reuse of code in Rusty image projects</h3>
      <a href="#reuse-of-code-in-rusty-image-projects">
        
      </a>
    </div>
    <p>We developed <a href="https://developers.cloudflare.com/image-resizing/">Image Resizing</a> in Rust. It's a web server that receives HTTP requests for images along with resizing options, fetches the full-size images from the origin, applies resizing and other image processing operations, compresses, and returns the HTTP response with the optimized image.</p><p>Rust makes it easy to split projects into libraries (called crates). The image processing and compression parts of Image Resizing are usable as libraries.</p><p>We also have a product called  <a href="/introducing-polish-automatic-image-optimizati/">Polish</a>, which is a Golang-based service that recompresses images in our cache. Polish was initially designed to run command-line programs like <code>jpegtran</code> and <code>pngcrush</code>. We took the core of Image Resizing and wrapped it in a command-line executable. This way, when Polish needs to apply lossy compression or generate WebP images or animations, it can use Image Resizing via a command-line tool instead of a third-party tool.</p><p>Reusing libraries has allowed us to easily unify processing between Image Resizing and Polish (for example, to ensure that both handle metadata and color profiles in the same way).</p><p>Cloudflare Images is another product we've built in Rust. It added support for a custom storage back-end, variants (size presets), support for signing URLs and more. We made it as a collection of Rust crates, so we can reuse pieces of it in other services running anywhere in our network. Image Resizing provides image processing for Cloudflare Images and shares libraries with Images to understand the new URL scheme, access the storage back-end, and database for variants.</p>
    <div>
      <h3>How Image Resizing works</h3>
      <a href="#how-image-resizing-works">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6LlxMguVop5gSnpNNOszTn/b14d2635eabab9622b6b7dcbada2f408/image-resizing-diagram.png" />
            
            </figure><p>The Image Resizing service runs at the edge and is deployed on every server of the Cloudflare global network. Thanks to Cloudflare's global Anycast network, the closest Cloudflare data center will handle eyeball image resizing requests. Image Resizing is tightly integrated with the Cloudflare cache and handles eyeball requests only on a cache miss.</p><p>There are two ways to use Image Resizing. The default <a href="https://developers.cloudflare.com/image-resizing/url-format">URL scheme</a> provides an easy, declarative way of specifying image dimensions and other options. The other way is to use a JavaScript API in a <a href="https://developers.cloudflare.com/image-resizing/resizing-with-workers">Worker</a>. Cloudflare Workers give powerful programmatic control over every image resizing request.</p>
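<p>As a sketch of the Workers approach, the snippet below translates query parameters into the documented <code>cf.image</code> fetch options. The helper name and the small parameter subset are illustrative, not Cloudflare's actual implementation:</p>

```javascript
// Sketch: build the `cf.image` options a Worker passes to fetch() when
// requesting a resized image. Helper name and parameter whitelist are ours.
function imageOptionsFromUrl(urlString) {
  const url = new URL(urlString);
  const options = {};
  const width = url.searchParams.get("width");
  if (width) options.width = Number(width);
  const height = url.searchParams.get("height");
  if (height) options.height = Number(height);
  const fit = url.searchParams.get("fit");
  if (fit) options.fit = fit; // e.g. "scale-down", "cover"
  return options;
}

// Inside a Worker, the options would be used roughly like:
//   fetch(request, { cf: { image: imageOptionsFromUrl(request.url) } })
```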
    <div>
      <h3>How Cloudflare Images work</h3>
      <a href="#how-cloudflare-images-work">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5uCiQpHMXkjdKzBhvIruxh/1c1930777fee1539e201b30eb9bf530a/DES-3375-2.png" />
            
            </figure><p>Cloudflare Images consists of the following components:</p><ul><li><p>The Images core service that powers the public API to manage image assets.</p></li><li><p>The Image Resizing service responsible for image transformations and caching.</p></li><li><p>The Image delivery Cloudflare Worker responsible for serving images and passing the corresponding parameters through to the Image Resizing service.</p></li><li><p>Image storage that provides access and storage for original image assets.</p></li></ul><p>To support Cloudflare Images scenarios for image transformations, we made several changes to the Image Resizing service:</p><ul><li><p>Added access to Cloudflare storage with original image assets.</p></li><li><p>Added access to variant definitions (size presets).</p></li><li><p>Added support for signing URLs.</p></li></ul>
    <div>
      <h3>Image delivery</h3>
      <a href="#image-delivery">
        
      </a>
    </div>
    <p>The primary use case for Cloudflare Images is to provide a simple and easy-to-use way of managing image assets. To cover egress costs, we provide image delivery through the Cloudflare-managed imagedelivery.net domain. It is configured with <a href="/tiered-cache-smart-topology/">Tiered Caching</a> to maximize the cache hit ratio for image assets. imagedelivery.net provides <a href="https://www.cloudflare.com/developer-platform/solutions/hosting/">image hosting</a> without needing to configure a custom domain to proxy through Cloudflare.</p><p>A Cloudflare Worker powers image delivery. It parses image URLs and passes the corresponding parameters to the image resizing service.</p>
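<p>A minimal sketch of that URL parsing, assuming the delivery path has the shape <i>account-id</i>/<i>image-id</i>/<i>variant-name</i> (the function name is ours):</p>

```javascript
// Sketch: split a delivery request path into the pieces the resizing
// service needs. Names and the assumed path shape are illustrative.
function parseDeliveryUrl(urlString) {
  const url = new URL(urlString);
  // Expected shape: /<account-id>/<image-id>/<variant-name>
  const [accountId, imageId, variantName] = url.pathname
    .split("/")
    .filter(Boolean);
  if (!accountId || !imageId || !variantName) {
    return null; // not a valid delivery URL
  }
  return { accountId, imageId, variantName };
}
```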
    <div>
      <h3>How we store Cloudflare Images</h3>
      <a href="#how-we-store-cloudflare-images">
        
      </a>
    </div>
    <p>There are several places we store information on Cloudflare Images:</p><ul><li><p>image metadata in Cloudflare's core data centers</p></li><li><p>variant definitions in Cloudflare's edge data centers</p></li><li><p>original images in core data centers</p></li><li><p>optimized images in Cloudflare cache, physically close to eyeballs.</p></li></ul><p>Image variant definitions are stored and delivered to the edge using Cloudflare's distributed key-value store called <a href="/introducing-quicksilver-configuration-distribution-at-internet-scale/">Quicksilver</a>. We use a single source of truth for variants. The Images core service makes calls to Quicksilver to read and update variant definitions.</p><p>The rest of the information about the image is stored in the image URL itself: <a href="https://imagedelivery.net/">https://imagedelivery.net/</a><i>account-id</i>/<i>image-id</i>/<i>variant-name</i></p><p>The <i>image-id</i> contains a flag indicating whether the image is publicly available or requires access verification. It's not feasible to store any image metadata in Quicksilver, as the data volume would increase linearly with the number of images we host. Instead, we only allow a finite number of variants per account, so we use the available disk space on the edge responsibly. The downside of storing access metadata as part of the <i>image-id</i> is that the <i>image-id</i> changes whenever the image's access settings change.</p>
    <div>
      <h3>How we keep Cloudflare Images up to date</h3>
      <a href="#how-we-keep-cloudflare-images-up-to-date">
        
      </a>
    </div>
    <p>The only way to access images is through the use of variants. Each variant is a named <a href="https://developers.cloudflare.com/image-resizing/resizing-with-workers#fetch-options">image resizing configuration</a>. Once the image asset is fetched, we cache the transformed image in the Cloudflare cache. The critical question is how we keep processed images up to date. The answer is by purging the Cloudflare cache when necessary. There are two use cases:</p><ul><li><p>access to the image is changed</p></li><li><p>the variant definition is updated</p></li></ul><p>In the first case, we purge the cache by calling a URL: <a href="https://imagedelivery.net/"><i>https://imagedelivery.net/</i></a><i>account-id</i>/<i>image-id</i></p><p>When the customer updates a variant, we issue a cache purge request by tag: <i>account-id/variant-name</i></p><p>To support cache purge by tag, the image resizing service adds the necessary tags to all transformed images.</p>
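<p>Purge by tag works because each transformed response carries a tag. A minimal sketch of attaching one, assuming a Map-like header collection and the <code>Cache-Tag</code> header Cloudflare uses for purge by tag (the function name is ours):</p>

```javascript
// Sketch: tag a transformed image response so it can be purged when its
// variant definition changes. Tag format "account-id/variant-name" follows
// the post; the helper name is illustrative.
function withCacheTag(headers, accountId, variantName) {
  const tagged = new Map(headers); // copy, leaving the input untouched
  tagged.set("Cache-Tag", `${accountId}/${variantName}`);
  return tagged;
}
```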
    <div>
      <h3>How we restrict access to Cloudflare Images</h3>
      <a href="#how-we-restrict-access-to-cloudflare-images">
        
      </a>
    </div>
    <p>The Image Resizing service supports restricted access to images by using URL signatures with expiration. URLs are signed using an HMAC-SHA256 key. The steps to produce a valid signature are:</p><ol><li><p>Take the path and query string (the path starts with /).</p></li><li><p>Compute the SHA-256 HMAC of the path with the query string, using the Images URL signing key as the secret. The key is configured in the Dashboard.</p></li><li><p>If the URL is meant to expire, compute the Unix timestamp (number of seconds since 1970) of the expiration time, and append <code>exp=</code> and the timestamp as an integer to the query string.</p></li><li><p>Append <code>?</code> or <code>&amp;</code> to the URL as appropriate (<code>?</code> if it had no query string; <code>&amp;</code> if it did).</p></li><li><p>Append <code>sig=</code> and the HMAC as 64 hex-encoded characters.</p></li></ol><p>A signed URL looks like this:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5M87aEfnYeWRuAJk9MJaI0/c8d289f755d39a2885fc247358f9a32a/signed-1.png" />
            
            </figure><p>A signed URL with an expiration timestamp looks like this:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6cVOACVdzpkmfB6BHXsxk/23c586690ee6d72a0460f866b8f7bcdf/signed-2.png" />
            
            </figure><p>The signature of the <code>/hello/world</code> path with the secret ‘this is a secret’ is <code>6293f9144b4e9adc83416d1b059abcac750bf05b2c5c99ea72fd47cc9c2ace34</code>.</p><p><code>https://imagedelivery.net/hello/world?sig=6293f9144b4e9adc83416d1b059abcac750bf05b2c5c99ea72fd47cc9c2ace34</code></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5SZF6WjgpV8GjeqEFbT9Yk/516d7148a25154db914bd6f41f685a3b/JS.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7nkNNkzLLNwZAYLSzt6NPI/3696a549ae7cee402d4d335419b88c19/Rust.png" />
            
            </figure>
    <div>
      <h3>Direct creator uploads with Cloudflare Worker and KV</h3>
      <a href="#direct-creator-uploads-with-cloudflare-worker-and-kv">
        
      </a>
    </div>
    <p>Similar to <a href="https://developers.cloudflare.com/stream/uploading-videos/direct-creator-uploads">Cloudflare Stream</a>, Images supports direct creator uploads, which allow users to upload images without API tokens. Direct creator uploads are typically used by web apps, client-side applications, or mobile apps where users upload content directly to Cloudflare Images.</p><p>Once again, we used our serverless platform to support direct creator uploads. A successful API call stores the account's information in Workers KV with the specified expiration date. A simple Cloudflare Worker handles the upload URL: it reads the KV value and grants upload access only on a successful KV lookup.</p>
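<p>A minimal sketch of that gatekeeping logic. Real Workers KV is asynchronous (<code>await env.KV.get(...)</code>) and expires entries via TTL; a plain Map stands in here so the flow is easy to follow, and all names are illustrative:</p>

```javascript
// Sketch: grant a direct creator upload only when the one-time token
// exists in the KV store. A Map stands in for the KV binding.
function authorizeUpload(kvStore, token) {
  const record = kvStore.get(token); // KV entries expire automatically via TTL
  if (record === undefined) {
    return { ok: false, status: 404 }; // unknown or expired token
  }
  return { ok: true, account: record };
}
```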
    <div>
      <h3>Future Work</h3>
      <a href="#future-work">
        
      </a>
    </div>
    <p>The Cloudflare Images product has an exciting roadmap. Let’s review what’s possible with its current architecture.</p>
    <div>
      <h4>Resizing hints on upload</h4>
      <a href="#resizing-hints-on-upload">
        
      </a>
    </div>
    <p>At the moment, no image transformations happen on upload. That means we can serve the image globally once it is uploaded to Image storage. We are considering adding resizing hints on image upload. That won't necessarily schedule image processing in all cases but could provide a valuable signal to resize the most critical image variants. An example could be to <a href="/generate-avif-images-with-image-resizing/">generate an AVIF</a> variant for the most vital image assets.</p>
    <div>
      <h4>Serving images from custom domains</h4>
      <a href="#serving-images-from-custom-domains">
        
      </a>
    </div>
    <p>We think serving images from a domain we manage (with Tiered Caching) is a great default option for many customers. The downside is that loading Cloudflare images requires additional TLS negotiations on the client side, adding latency and impacting loading performance. On the other hand, serving Cloudflare Images from custom domains will be a viable option for customers who set up a website through Cloudflare. The good news is that we can support this functionality without radical changes to the current architecture.</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>The Cloudflare Images product runs on top of the Cloudflare global network. We built Cloudflare Images with Rust and Cloudflare Workers, which lets us reuse Rust libraries across several products such as Cloudflare Images, Image Resizing, and Polish. Cloudflare’s serverless platform is an indispensable tool for building Cloudflare products internally. If you are interested in building innovative products in Rust and Cloudflare Workers, <a href="https://www.cloudflare.com/careers/jobs/?department=Emerging%20Technology%20and%20Incubation&amp;location=default">we're hiring</a>.</p>
    <div>
      <h3>Watch on Cloudflare TV</h3>
      <a href="#watch-on-cloudflare-tv">
        
      </a>
    </div>
    ]]></content:encoded>
            <category><![CDATA[Speed Week]]></category>
            <category><![CDATA[Cloudflare Images]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">21W0eo2QBR2GXgCMaVjCuY</guid>
            <dc:creator>Yevgen Safronov</dc:creator>
        </item>
        <item>
            <title><![CDATA[Automatic Platform Optimization post-launch report]]></title>
            <link>https://blog.cloudflare.com/apo-post-launch-report/</link>
            <pubDate>Tue, 16 Mar 2021 12:00:00 GMT</pubDate>
            <description><![CDATA[ We explored almost 200 websites with the activated Automatic Platform Optimization feature in Chrome User Experience Report data. Automatic Platform Optimization consistently demonstrated better aggregate performance among sites we analyzed in TTFB, First Paint, FCP, and LCP metrics. ]]></description>
            <content:encoded><![CDATA[ <p>Last year during <a href="/automatic-platform-optimizations-starting-with-wordpress/">Birthday Week</a>, we announced Automatic Platform Optimization for WordPress (APO): smart HTML caching for WordPress sites using Cloudflare. Initial testing across various WordPress sites demonstrated significant improvements in performance metrics like Time to First Byte (TTFB), First Contentful Paint (FCP), and Speed Index. We wanted to measure how APO impacted web performance for our customers since the launch.</p><p>In this blog post, we answer the following questions:</p><ul><li><p>How fast is Automatic Platform Optimization? Can you demonstrate it with data?</p></li></ul><p>We will show real-world improvements for several performance metrics.</p><ul><li><p>Is Automatic Platform Optimization flexible enough to integrate smoothly with my WordPress site?</p></li></ul><p>We have added and improved lots of features since the initial launch.</p><ul><li><p>Will Automatic Platform Optimization work when used with other plugins?</p></li></ul><p>We will cover the most common use cases and explain how Automatic Platform Optimization can be fine-tuned.</p>
    <div>
      <h2>Measuring performance with WebPageTest</h2>
      <a href="#measuring-performance-with-webpagetest">
        
      </a>
    </div>
    <p>We use WebPageTest as a go-to tool for <a href="/new-speed-page/">synthetic testing at Cloudflare</a>. It measures web performance metrics in real browsers, is highly programmable, and can scale to test millions of sites per day. Among the benefits of synthetic testing are easily produced results and their relatively high reproducibility.</p><p>Automatic Platform Optimization internal testing with WebPageTest demonstrated a very promising 72% reduction in Time to First Byte (TTFB) and a 23% reduction in <a href="https://web.dev/fcp/">First Contentful Paint</a>. See <a href="/automatic-platform-optimizations-starting-with-wordpress/#the-benefits-of-automatic-platform-optimization">the original blog post</a> to learn more about our test setup and results analysis.</p>
    <div>
      <h2>Measuring performance with the Chrome User Experience Report</h2>
      <a href="#measuring-performance-with-the-chrome-user-experience-report">
        
      </a>
    </div>
    <p>In comparison to synthetic testing, Real User Monitoring (RUM) is invaluable in painting the picture of how a website performs in real-world conditions: on different form factors and variable connection types. Although the noise-to-signal ratio can be high in RUM data, there is no substitute for measuring web performance in the wild.</p><p>We analyzed Google's <a href="https://developers.google.com/web/tools/chrome-user-experience-report">Chrome User Experience Report</a> data for Automatic Platform Optimization websites and compared results from the two months before enabling APO with the two months after. We present results for Time To First Byte, First Paint, First Contentful Paint, and Largest Contentful Paint.</p>
    <div>
      <h3>Time To First Byte by device</h3>
      <a href="#time-to-first-byte-by-device">
        
      </a>
    </div>
    <p>Time To First Byte (TTFB) is the time taken from the user or client making an HTTP request until the first byte arrives back to the browser.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7ukvHUfnTwdV6G9O2tInJK/da54e8a03550745d5c304c3664cacdb0/image1-15.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/nDmdI6LfhEhCmwrw8YAdE/6f4d30993e12970e71a0ad6efb0499b6/image6-5.png" />
            
            </figure><p>Among all metrics, Automatic Platform Optimization improvements in TTFB demonstrated the largest increase in the 'good' bucket and the largest decrease in the 'poor' bucket on both desktop and phone form factors. The improvement in the 'poor' bucket on mobile demonstrates how Automatic Platform Optimization makes a difference even on slow connection types like 3G, 2G, and slow 2G. Faster response times from edge servers translate directly into quicker TTFB timings, positively affecting all other performance measurements.</p>
    <div>
      <h3>First Paint by device</h3>
      <a href="#first-paint-by-device">
        
      </a>
    </div>
    <p>First Paint measures the first time the browser rendered any content. First Paint signifies the earliest point when something happens on the screen after a user requests a page. It is a good proxy for a user believing the website is not broken.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5crFujbvMLpxg5SWITmSm5/c8159070f9395e2b2a22b44bd860efad/image3-12.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6aaGYS8TwMdLUbyI75g7jA/69edacd19a63b2f3bbc013e056537fb8/image7-4.png" />
            
            </figure><p>Almost a 10% increase in the 'good' bucket on desktop is the highlight for the First Paint metric. It's nice to see a clear trend of improvement in both desktop and phone data. It's also worth mentioning that the exact values used to define the 'good', 'moderate', and 'poor' buckets are picked arbitrarily for each timing metric, so it's more important to look at the percentage of improvement rather than the absolute values for each bucket.</p>
    <div>
      <h3>First Contentful Paint by device</h3>
      <a href="#first-contentful-paint-by-device">
        
      </a>
    </div>
    <p>The First Contentful Paint (FCP) metric measures how long a page takes to start rendering any text, image, non-white canvas, or SVG content. FCP is a good indicator of perceived speed as it portrays how long people wait to see the first signs of a site loading.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4DmAXL8e0YkFDplrWevrkS/f4f720ab35931d8b6afd5f7ff034f54d/image8-4.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6SPbEFcNeYbmiKYAkGnLZk/1b9f60e6c36257cb45ff879498aaa0ee/image5-12.png" />
            
            </figure><p>This is the third straight metric that improved on both form factors after customers activated Automatic Platform Optimization. FCP happens even later than the First Paint event. We can draw a hypothesis that Automatic Platform Optimization positively impacts the loading performance metrics of a site, but the later the event happens, the less impact Automatic Platform Optimization has on that particular metric.</p>
    <div>
      <h3>Largest Contentful Paint by device</h3>
      <a href="#largest-contentful-paint-by-device">
        
      </a>
    </div>
    <p>The Largest Contentful Paint (LCP) metric reports the render time for the largest image or text block visible within the viewport.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5VPFYu1C2Z4QJGbIUVuOTn/c41163eabc93b087c93a6da4a2f2a329/image4-11.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1UvAakum79HmgCxkdj9Hw3/4223477af4ccc5ca6dbf702ce2d3da5d/image2-10.png" />
            
            </figure><p>Our hypothesis holds ground with LCP as well. Interestingly enough, the positive impact of Automatic Platform Optimization activation is relatively equal on desktops and phones.</p>
    <div>
      <h3>Summary</h3>
      <a href="#summary">
        
      </a>
    </div>
    <p>Overall, Automatic Platform Optimization consistently demonstrated better aggregate performance among the sites we analyzed in the TTFB, First Paint, FCP, and LCP metrics. Even more impressive are improvements on both desktop and phone form factors. It’s worth pointing out that, apart from noticeable differences in hardware characteristics, phone data captures all mobile connection types from slow 2G to fast 4G.</p><p>We explored almost 200 websites with the activated Automatic Platform Optimization feature in Chrome User Experience Report data. To smooth the variance, we combined two months of data before and after Automatic Platform Optimization activation. To further decrease inaccuracy, we dropped a month’s worth of data that included the activation period. For example, for a website that activated Automatic Platform Optimization last October, we used Chrome User Experience Report measurements from August and September as the <code>before</code> bucket. The <code>after</code> bucket combined data from November and December.</p><p>It is important to note that due to the limitations of iOS, Chrome User Experience Report mobile results don't include devices running Apple's mobile operating system.</p><p>Chrome User Experience Report data provides performance metrics per geographic location, form factor, or connection type. We analyzed aggregated data across all countries and connection types to focus on overall performance.</p>
    <div>
      <h2>Extended Automatic Platform Optimization Functionality</h2>
      <a href="#extended-automatic-platform-optimization-functionality">
        
      </a>
    </div>
    <p>Since the product launch, we have been listening carefully to the customers' reports of Automatic Platform Optimization’s missing functionality or unexpected behavior. The number of different use cases our customers have underlines how broad the WordPress ecosystem is. One of the advantages of Automatic Platform Optimization utilizing the Workers platform is that we can quickly iterate and release in a matter of hours instead of days or weeks. Granted, some features like Cache By Device Type or subdomains support took us longer to build. Still, for apparent bugs or missing functionality, the ability to release on demand made all the difference for the team and our customers.</p><p>We start the second part of the report with a brief description of the features we have released since October. Afterward, we will paint a bigger picture of how Automatic Platform Optimization fits together with a broad WordPress plugins ecosystem.</p>
    <div>
      <h3>Smart caching for marketing query parameters</h3>
      <a href="#smart-caching-for-marketing-query-parameters">
        
      </a>
    </div>
    <p>By default Automatic Platform Optimization doesn’t cache pages with query parameters. One of the first feature requests from the community was to add caching support for marketing attribution (for example, UTMs) query parameters. We did exactly that, and the full list of the supported parameters is in <a href="https://support.cloudflare.com/hc/en-us/articles/360049822312#h_01EN42YA87TDRX47MSB2XC61Q9">the documentation</a>.</p>
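<p>A minimal sketch of the idea, assuming a small illustrative subset of the supported parameters (the full list is in the linked documentation; the function name is ours):</p>

```javascript
// Sketch: normalize a URL into a cache key by dropping marketing
// attribution parameters, so pages that differ only in UTMs share one
// cached copy. The parameter list is an illustrative subset.
const MARKETING_PARAMS = new Set([
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
]);

function cacheKeyFor(urlString) {
  const url = new URL(urlString);
  for (const name of [...url.searchParams.keys()]) {
    if (MARKETING_PARAMS.has(name)) url.searchParams.delete(name);
  }
  return url.toString();
}
```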
    <div>
      <h3>Improved cache hit ratio</h3>
      <a href="#improved-cache-hit-ratio">
        
      </a>
    </div>
    <p>Cloudflare provides <a href="https://support.cloudflare.com/hc/en-us/articles/200172516">static caching</a> out of the box by default. The caching system for static content relies on file extensions to determine the content type. In contrast, HTML pages don't always have file extensions in the URL. That's why Automatic Platform Optimization caching relies on HTTP's <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation">content negotiation mechanism</a> for HTML content.</p><p>We check both the request's Accept and the response's Content-Type headers for the 'text/html' substring. When humans access sites on the Internet, browsers send correct Accept headers behind the scenes. When bots access the sites, they don't always send Accept headers. Initially, the Automatic Platform Optimization cache passed all requests without a valid Accept header to the origin servers. When customers tried to migrate from using the "Cache Everything" page rule to only using Automatic Platform Optimization, they noticed extra load on the origin servers. Now all GET and HEAD requests are checked against the Automatic Platform Optimization cache. The change noticeably improved the cache hit ratio for all Automatic Platform Optimization customers and enhanced page loads for good bots.</p>
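<p>The request-side half of that check can be sketched as follows; the function name and the exact handling of a missing Accept header are our assumptions based on the behavior described above:</p>

```javascript
// Sketch: is a request eligible for the HTML cache? GET/HEAD only, and if
// an Accept header is present it must mention text/html. Requests without
// an Accept header (common for bots) are still checked against the cache.
function eligibleForApoCache(method, acceptHeader) {
  if (method !== "GET" && method !== "HEAD") return false;
  if (acceptHeader === undefined || acceptHeader === null) return true;
  return acceptHeader.includes("text/html");
}
```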
    <div>
      <h3>Improved security</h3>
      <a href="#improved-security">
        
      </a>
    </div>
    <p><a href="https://portswigger.net/research/practical-web-cache-poisoning">Cache poisoning</a> is a common attack vector against any caching system. One of the benefits of Automatic Platform Optimization is that most of the logic runs on edge servers, and we can update it without any changes on the origin server. To mitigate potential cache poisoning, we released a new version to bypass caching if any of the following request headers are present:</p><ul><li><p>X-Host</p></li><li><p>X-Forwarded-Host</p></li><li><p>X-Original-URL</p></li><li><p>X-Rewrite-URL</p></li></ul><p>Additionally, any GET request <a href="https://portswigger.net/research/web-cache-entanglement#fatget">with a body</a> will bypass Automatic Platform Optimization caching.</p>
    <div>
      <h3>Page Rules integration</h3>
      <a href="#page-rules-integration">
        
      </a>
    </div>
    <p>Automatic Platform Optimization's primary goal is to improve page load performance while keeping the configuration as simple as possible. On the other hand, the Automatic Platform Optimization service should allow fine-tuning for advanced use cases. One such mechanism is Cloudflare's <a href="https://support.cloudflare.com/hc/en-us/articles/218411427-Understanding-and-Configuring-Cloudflare-Page-Rules-Page-Rules-Tutorial-">Page Rules</a>. As of today, Automatic Platform Optimization supports the following rules:</p><ul><li><p>Cache Level: Bypass</p></li><li><p>Cache Level: Ignore Query String</p></li><li><p>Cache Level: Cache Everything</p></li><li><p>Bypass Cache on Cookie (Biz and Ent plans only)</p></li><li><p>Edge Cache TTL</p></li><li><p>Browser Cache TTL</p></li></ul><p>For a detailed description, refer to the <a href="https://support.cloudflare.com/hc/en-us/articles/360049822312-Understanding-Automatic-Platform-Optimization-APO-with-WordPress#h_01ER098Z0DRS95ERCMZRV3022S">official documentation</a>.</p>
    <div>
      <h3>Subdomain Support</h3>
      <a href="#subdomain-support">
        
      </a>
    </div>
    <p>Automatic Platform Optimization aims to provide seamless integration with the WordPress ecosystem. It recognizes specific cookies for the most popular plugins like WooCommerce, JetPack, BigCommerce, Easy Digital Downloads, etc.</p><p>Currently, we limit Automatic Platform Optimization usage to WordPress sites. During the initial launch, we restricted Automatic Platform Optimization to run against root domains only, but we learned later of the high demand to run Automatic Platform Optimization on subdomains. To make it possible, we updated both the plugin and <a href="https://api.cloudflare.com/#zone-settings-change-automatic-platform-optimization-for-wordpress-setting">API</a> to allow Automatic Platform Optimization to run on subdomains. Three steps are required to enable Automatic Platform Optimization on a subdomain:</p><ul><li><p>Install version 3.8.7 or later of the Cloudflare WordPress plugin.</p></li><li><p>Log in using your Global API key. (An API token can only be used for the root domain.)</p></li><li><p>Enable APO. The subdomain displays in the list of hostnames in the card.</p></li></ul><p>The initial cost of $5 per month for Free plans includes running Automatic Platform Optimization against any number of subdomains.</p>
    <div>
      <h3>Caching by Device Type</h3>
      <a href="#caching-by-device-type">
        
      </a>
    </div>
    <p>The majority of site visits come from users accessing the web on <a href="https://www.perficient.com/insights/research-hub/mobile-vs-desktop-usage-study">mobile devices</a>. And website visitors expect sites to work well on mobile devices. In fact, responsive design is a recommended approach for websites. APO works well for responsive design websites because the cached content adjusts to the client's screen size seamlessly. The alternative method is to serve different markup on mobile browsers.</p><p>Many popular WordPress plugins add a mobile-friendly theme to the site. For sites with such plugins installed, Automatic Platform Optimization breaks functionality by serving the same cached version to all users. As we learned about the growing number of customers with this problem, we looked for a solution. Cloudflare’s caching already supports <a href="https://support.cloudflare.com/hc/en-us/articles/229373388-Understand-Cache-by-Device-Type-Enterprise-plans-only-">cache by device type</a> functionality, but it's only available to customers on the Enterprise plan. As was the case for the <a href="https://support.cloudflare.com/hc/en-us/articles/236166048">Bypass Cache on Cookie</a> page rule, we decided to include the functionality as part of the Automatic Platform Optimization offering. As a recap, Caching by Device Type relies on the User-Agent request header for detection. There are three types:</p><ul><li><p>Mobile</p></li><li><p>Tablet</p></li><li><p>Everything else</p></li></ul><p>For each type, the Cloudflare cache stores content in a separate bucket. To enable caching by device type, navigate either to the dashboard's Automatic Platform Optimization card or the Cloudflare WordPress plugin. We recommend using a single cache whenever possible because caching by device type can decrease the cache hit ratio and increase the load on origin servers.</p>
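<p>The three buckets can be sketched as a classifier over the User-Agent header. The regular expressions below are illustrative, not Cloudflare's actual detection rules:</p>

```javascript
// Sketch: map a User-Agent header to one of the three Cache by Device Type
// buckets. Tablet is checked before mobile because tablet UAs often also
// match mobile patterns.
function deviceTypeBucket(userAgent) {
  const ua = (userAgent || "").toLowerCase();
  if (/ipad|tablet/.test(ua)) return "tablet";
  if (/mobi|iphone|android/.test(ua)) return "mobile";
  return "desktop"; // "everything else"
}
```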
    <div>
      <h3>Other noticeable changes</h3>
      <a href="#other-noticeable-changes">
        
      </a>
    </div>
    <p>There are many improvements including:</p><ul><li><p>Improved purging infrastructure of content from KV.</p></li><li><p>Extended automatic cache purging for categories and tags.</p></li><li><p>Addressed many edge cases for Google Fonts optimization.</p></li><li><p>Added support for HEAD requests.</p></li><li><p>Automated release pipeline for the Cloudflare WordPress plugin.</p></li></ul>
    <div>
      <h2>Improved WordPress plugins compatibility</h2>
      <a href="#improved-wordpress-plugins-compatibility">
        
      </a>
    </div>
    <p>There are over 50,000 WordPress plugins currently available for download, and because there are so many, we can't test the compatibility of Automatic Platform Optimization with each one individually. We do know, however, that it's vital to provide compatibility for the most popular plugins. Thanks to the vibrant community, we quickly learned about the most widespread issues with Automatic Platform Optimization caching. The plugins that experienced problems can be grouped into the following categories:</p><ul><li><p>Plugins with custom cookies</p></li><li><p>Plugins with geolocation functionality</p></li><li><p>Plugins with mobile themes</p></li><li><p>Plugins with AMP support</p></li><li><p>Plugins that generate HTML</p></li><li><p>Caching and optimization plugins</p></li></ul><p>Let's review those categories and the available solutions for improved compatibility.</p>
    <div>
      <h3>Plugins with custom cookies</h3>
      <a href="#plugins-with-custom-cookies">
        
      </a>
    </div>
    <p>One of the critical features Automatic Platform Optimization provides out of the box is cookie-based rules for bypassing APO caching. For any plugin that uses custom cookies, Automatic Platform Optimization requires extending those rules. We maintain a list of <a href="https://support.cloudflare.com/hc/en-us/articles/360049822312#h_01EQ44V6KRFM6Z0F06MM0EJGJ5">supported plugins</a> that use our cookie bypass logic.</p>
    <div>
      <h3>Plugins with geolocation functionality</h3>
      <a href="#plugins-with-geolocation-functionality">
        
      </a>
    </div>
    <p>This broad category of plugins relies on geolocation information derived from the client's (visitor's) IP address (connecting to Cloudflare) to provide its functionality. Early on, a misconfiguration in Automatic Platform Optimization resulted in a dummy IP address being sent in the CF-Connecting-IP request header forwarded to the origin server.</p><p>This behavior effectively broke the widely used Wordfence Security plugin, and we promptly released a fix. Because we use Cloudflare Workers internally, Automatic Platform Optimization requests sent to the origin were treated as cross-zone requests, and for security reasons the <a href="https://support.cloudflare.com/hc/en-us/articles/200170986-How-does-Cloudflare-handle-HTTP-Request-headers-">CF-Connecting-IP</a> header value was replaced with a dummy IP address. The fix was to send the Automatic Platform Optimization worker's subrequest to the origin as a same-zone request, which passes the real client IP without the security concern. Automatic Platform Optimization now also sends the client's IP to the origin via the X-Forwarded-For request header, to further improve plugin compatibility.</p>
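<p>The header forwarding described above can be sketched as a Worker-side helper that builds the origin subrequest with the real client IP attached. The handler shape and function name are illustrative assumptions, not Cloudflare's internal code:</p>

```javascript
// Hedged sketch: build the APO worker's subrequest to the origin,
// forwarding the visitor's real IP. originRequest is an illustrative
// name; the header names follow the post.
function originRequest(request) {
  const clientIP = request.headers.get("CF-Connecting-IP") || "";
  const headers = new Headers(request.headers);
  // Pass the visitor's IP explicitly so origin-side plugins
  // (geolocation, security) see the real address, not a dummy one.
  headers.set("X-Forwarded-For", clientIP);
  return new Request(request.url, { method: request.method, headers });
}
```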
    <div>
      <h3>Plugins with mobile themes</h3>
      <a href="#plugins-with-mobile-themes">
        
      </a>
    </div>
    <p>There are several WordPress plugins that render custom themes for mobile visitors. Those plugins rely on the browser's User-Agent to detect visitors from mobile devices. In December, we released Automatic Platform Optimization support for the "Cache by device type" feature. With a single configuration change, you can activate a separate cache based on the device type: mobile, tablet, and everything else. You can learn more about the feature in the <a href="https://support.cloudflare.com/hc/en-us/articles/360049822312#h_01ERZ6QHBGFVPSC44SJAC1YM6Q">official documentation</a>.</p>
    <div>
      <h3>Plugins with AMP support</h3>
      <a href="#plugins-with-amp-support">
        
      </a>
    </div>
    <p>The AMP (<a href="https://amp.dev/">Accelerated Mobile Pages</a>) project's goal is to make the web, and in particular the mobile web, much more pleasant to surf. The AMP HTML framework is designed to help web pages load quickly and avoid distracting the user with irrelevant content.</p><p>The most popular AMP WordPress plugins render AMP-compatible markup when the page URL contains the amp= query parameter. AMP markup is a subset of HTML with several restrictions, and we looked into possible solutions for providing Automatic Platform Optimization caching for AMP pages. It would require a separate cache for AMP pages, similar to the "Cache by device type" feature. Considering Google's recent push with Core Web Vitals, the AMP format's importance will decrease going forward. Based on the complexity of supporting a dedicated AMP cache and Google's deprioritization of the AMP format, we decided to bypass Automatic Platform Optimization caching of AMP pages.</p><p>There are two possible approaches for caching AMP pages. The first is to change the URL schema for AMP pages from, for example, site.com/page/?amp= to site.com/amp/page/. With this option, Automatic Platform Optimization caches AMP pages out of the box. The other is to activate the "Cache Everything" Page Rule for AMP pages served with the amp= query parameter. Note that in both cases, AMP pages will require manual cache purging on content changes.</p>
    <div>
      <h3>Plugins that generate HTML</h3>
      <a href="#plugins-that-generate-html">
        
      </a>
    </div>
    <p>Using Automatic Platform Optimization with Page Rules makes it possible to:</p><ul><li><p>Bypass caching pages that contain CAPTCHAs.</p></li><li><p>Set Edge TTL for pages that contain nonces or server-rendered ads to six hours or shorter.</p></li></ul>
    <div>
      <h3>Caching and optimizations plugins</h3>
      <a href="#caching-and-optimizations-plugins">
        
      </a>
    </div>
    <p>Among the most popular caching and optimization WordPress plugins are LiteSpeed Cache, W3 Total Cache, WP Rocket, WP Fastest Cache, WP Super Cache, and Autoptimize. To successfully activate Automatic Platform Optimization when any of the plugins above is already present, follow these steps:</p><ul><li><p><a href="https://support.cloudflare.com/hc/en-us/articles/360049822312#h_01EKKWPM2KC6H9CD0ATX98KG1C">Install</a> and activate the Cloudflare WordPress plugin.</p></li><li><p>Enable Automatic Platform Optimization in the plugin.</p></li><li><p>Clear any server cache used via other plugins.</p></li><li><p>Verify that your origin starts serving the response header "cf-edge-cache: cache,platform=wordpress".</p></li></ul><p>That should make caching plugins and Automatic Platform Optimization compatible.</p><p>If you use the optimization features inside these plugins, additional steps are necessary to integrate with Automatic Platform Optimization. Any of the following plugin optimizations requires a subsequent purge of the Cloudflare cache:</p><ul><li><p>JavaScript minification and async-loading</p></li><li><p>CSS minification, inlining, and aggregation</p></li><li><p>HTML minification</p></li><li><p>Image optimization and lazy-loading</p></li><li><p>Google Fonts optimization</p></li></ul><p>There are three potential solutions, discussed in order of preference.</p>
    <div>
      <h4>1. Use Cloudflare products</h4>
      <a href="#1-use-cloudflare-products">
        
      </a>
    </div>
    <p>Most of the optimizations are possible with Cloudflare today:</p><ul><li><p>Auto Minify provides minification for HTML, CSS, and JavaScript</p></li><li><p>Rocket Loader provides JavaScript lazy loading</p></li><li><p>Mirage and Image Resizing allow image optimization and lazy-loading</p></li><li><p>Automatic Platform Optimization optimizes Google Fonts out of the box</p></li></ul>
    <div>
      <h4>2. Activate plugins integration with Cloudflare</h4>
      <a href="#2-activate-plugins-integration-with-cloudflare">
        
      </a>
    </div>
    <ul><li><p>WP Rocket <a href="https://docs.wp-rocket.me/article/18-using-wp-rocket-with-cloudflare">integrates</a> with Cloudflare API.</p></li><li><p>WP Fastest Cache <a href="https://www.wpfastestcache.com/tutorial/wp-fastest-cache-cloudflare/">integrates</a> with Cloudflare API.</p></li><li><p>W3 Total Cache <a href="https://kinsta.com/blog/w3-total-cache/#how-to-set-up-w3-total-cache-with-the-cloudflare-extension">integrates</a> with Cloudflare API. Make sure to enable the Page Caching option.</p></li></ul>
    <div>
      <h4>3. Integration with Cloudflare cache purging</h4>
      <a href="#3-integration-with-cloudflare-cache-purging">
        
      </a>
    </div>
    <p>The rest of the plugins in the list require Cloudflare cache purging to be triggered manually or via the <a href="https://api.cloudflare.com/#zone-purge-files-by-url">API</a> whenever they produce content optimizations:</p><ul><li><p>LiteSpeed Cache</p></li><li><p>WP Super Cache</p></li><li><p>Autoptimize</p></li></ul>
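<p>A purge via the API is a POST of the affected URLs to Cloudflare's purge-files-by-URL endpoint for the zone. The following is a minimal sketch: the helper name is ours, and authentication with an API token (sent in an Authorization header) is assumed:</p>

```javascript
// Sketch of triggering a purge through Cloudflare's purge-files-by-URL
// API endpoint. buildPurgeRequest is an illustrative helper name.
function buildPurgeRequest(zoneId, urls) {
  return {
    url: `https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ files: urls }),
    },
  };
}
```

In use, you would add an <code>Authorization: Bearer &lt;token&gt;</code> header to the options and pass both to <code>fetch</code> after the plugin rewrites its optimized assets.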
    <div>
      <h3>Summary</h3>
      <a href="#summary">
        
      </a>
    </div>
    <p>Automatic Platform Optimization is compatible with the most popular caching plugins. Content optimizations should preferably be migrated to Cloudflare's offerings; alternatively, plugins can trigger Cloudflare cache purging via API integration. As a last resort, disable the plugins' optimizations but keep their caching functionality.</p><p>We work closely with the WordPress plugin community to improve compatibility with Automatic Platform Optimization.</p>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Automatic Platform Optimization demonstrated improved performance metrics in both synthetic and real-world settings. The public Chrome User Experience Report dataset proved to be an invaluable source of RUM metrics for web performance analysis. Automatic Platform Optimization showed noticeable improvements on desktops and phones. TTFB is the most improved metric, but we also noticed positive changes in the First Paint, First Contentful Paint, and Largest Contentful Paint metrics.</p><p>It has been an intensive and rewarding several months since the Automatic Platform Optimization launch, and we have greatly increased its applicability based on customer feedback. Our <a href="https://community.cloudflare.com/">community forum</a> is a great place to get help and ask questions about Cloudflare products, including Automatic Platform Optimization.</p><p>There are more exciting Automatic Platform Optimization improvements the team is actively working on, and we can't wait to share them. Stay tuned!</p> ]]></content:encoded>
            <category><![CDATA[Automatic Platform Optimization]]></category>
            <category><![CDATA[WordPress]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[Cache]]></category>
            <category><![CDATA[Performance]]></category>
            <guid isPermaLink="false">3f2zJqUMBglhwX15U1lOPm</guid>
            <dc:creator>Yevgen Safronov</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building Automatic Platform Optimization for WordPress using Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/building-automatic-platform-optimization-for-wordpress-using-cloudflare-workers/</link>
            <pubDate>Fri, 02 Oct 2020 13:00:00 GMT</pubDate>
            <description><![CDATA[ zero-config edge caching and optimizations for improved performance for WordPress. Reduce WordPress plugin burden. ]]></description>
            <content:encoded><![CDATA[ <p>This post explains how we implemented <a href="/automatic-platform-optimizations-starting-with-wordpress/">Automatic Platform Optimization</a> for WordPress. In doing so, we have defined a new place to run WordPress plugins: at the edge, written with Cloudflare Workers. We provide the feature as a Cloudflare service, but what’s exciting is that anyone could build this using the Workers platform.</p><p>The service is an evolution of the ideas explained in an earlier <a href="/improving-html-time-to-first-byte/">zero-config edge caching of HTML</a> blog post. This post explains how Automatic Platform Optimization combines the best qualities of the regular Cloudflare cache with Workers KV to improve cache cold starts globally.</p><p>The optimization works both with and without the <a href="https://support.cloudflare.com/hc/en-us/articles/227634427-Using-Cloudflare-with-WordPress">Cloudflare for WordPress plugin</a> integration. Not only have we provided a zero-config edge HTML caching solution, but by using the Workers platform we were also able to improve the performance of <a href="/fast-google-fonts-with-cloudflare-workers/">Google font loading</a> for all pages.</p><p>We are launching the feature first for WordPress specifically, but the concept can be applied to any website and/or content management system (CMS).</p>
    <div>
      <h3>A new place to run WordPress plugins?</h3>
      <a href="#a-new-place-to-run-wordpress-plugins">
        
      </a>
    </div>
    <p>There are many individual WordPress performance plugins that use optimizations similar to existing Cloudflare services. Automatic Platform Optimization brings them all together into one easy-to-use solution, deployed at the edge.</p><p>Traditionally, you have to maintain server plugins with your WordPress installation. This comes with maintenance costs and can require a deep understanding of how to fine-tune performance and security for each and every plugin. Providing the optimizations on the client side can also lead to performance problems due to the <a href="https://v8.dev/blog/cost-of-javascript-2019">costs of JavaScript</a> execution. In contrast, most of these optimizations can be built into Cloudflare’s edge rather than running on the server or the client. Automatic Platform Optimization will always be up to date with the latest performance and <a href="https://www.cloudflare.com/learning/security/how-to-improve-wordpress-security/">security best practices</a>.</p>
    <div>
      <h3>How to optimize for WordPress</h3>
      <a href="#how-to-optimize-for-wordpress">
        
      </a>
    </div>
    <p>By default, the Cloudflare CDN caches assets based on <a href="https://support.cloudflare.com/hc/en-us/articles/200172516#h_a01982d4-d5b6-4744-bb9b-a71da62c160a">file extension</a> and doesn’t cache HTML content. It is possible to configure HTML caching with a <a href="https://support.cloudflare.com/hc/en-us/articles/202775670#3SVKvGhbS9BNT34zRCsPJ7">Cache Everything Page</a> rule, but it is a manual process and often requires additional features only available on the Business and Enterprise plans. So for the majority of WordPress websites, even with a <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cdn/">CDN</a> in front of them, HTML content is not cached. Requests for an HTML document have to go all the way to the origin.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/14SelFUXAc9SBPwqea8OUD/92f9e753055a0759bc0f87ef7c021bca/image3-2.png" />
            
            </figure><p>Even if a CDN optimizes the connection between the closest edge and the website’s origin, the origin could be located far away and also be <a href="https://www.cloudflare.com/learning/cdn/common-cdn-issues/">slow to respond</a>, especially under load.</p>
    <div>
      <h3>Move content closer to the user</h3>
      <a href="#move-content-closer-to-the-user">
        
      </a>
    </div>
    <p>One of the primary recommendations for speeding up websites is to move content closer to the end-user. This reduces the amount of time it takes for packets to travel between the end-user and the web server - the <a href="https://www.cloudflare.com/learning/cdn/glossary/round-trip-time-rtt/">round-trip time (RTT)</a>. This improves the speed of establishing a connection as well as serving content from a closer location.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/VJdt6X0xF6xuDyMI0CHXu/4e17456b4b864eb1b5eabc08713a4438/image6-1.png" />
            
            </figure><p>We have previously blogged about the <a href="/improving-html-time-to-first-byte/">benefits of edge caching HTML</a>. Caching and serving HTML from the Cloudflare edge greatly improves the time to first byte (TTFB) by optimizing DNS, connection setup, and SSL negotiation, and by removing the origin server’s response time. If your origin is slow to generate HTML and/or your user is far from the origin server, all your performance metrics will be affected.</p><p>Most HTML isn’t really dynamic. It needs to be able to change relatively quickly when the site is updated, but for a huge portion of the web, the content is static for months or years at a time. There are special cases, like when a user is logged in (as the admin or otherwise), where the content needs to differ, but the vast majority of visits are from anonymous users.</p>
    <div>
      <h3>Zero config edge caching revisited</h3>
      <a href="#zero-config-edge-caching-revisited">
        
      </a>
    </div>
    <p>The goal is to make updating content at the edge happen automatically. The edge caches and serves the previous version of the content until new content is available. This is usually achieved by triggering a cache purge to remove the existing content. In fact, using a combination of our <a href="https://www.cloudflare.com/integrations/wordpress/">WordPress plugin</a> and the <a href="https://api.cloudflare.com/#zone-purge-files-by-url">Cloudflare cache purge API</a>, we already support <a href="https://support.cloudflare.com/hc/en-us/articles/115002708027-Cloudflare-WordPress-Plugin-Automatic-Cache-Management-">Automatic Cache Purge on Website Updates</a>. This feature has been in use for many years.</p><p>Building automatic HTML edge caching is more nuanced than caching traditional static content like images, styles, or scripts. It requires defining rules on what to cache and when to update the content. To help with that task we introduced a custom header to communicate caching rules between the Cloudflare edge and origin servers.</p><p>The Cloudflare Worker runs in every edge data center, and the serverless platform takes care of scaling to our needs. Based on the request type, it either returns HTML content from the Cloudflare cache using the <a href="https://developers.cloudflare.com/workers/runtime-apis/cache">Worker’s Cache API</a> or serves a response directly from the origin. A specifically designed custom header provides information from the origin on how the script should handle the response. For example, the worker script will never cache responses for authenticated users.</p>
    <div>
      <h3>HTML Caching rules</h3>
      <a href="#html-caching-rules">
        
      </a>
    </div>
    <p>With or without the Cloudflare for WordPress plugin, HTML edge caching requires all of the following conditions to be met:</p><ul><li><p>Origin responds with a 200 status.</p></li><li><p>Origin responds with a "text/html" content type.</p></li><li><p>Request method is GET.</p></li><li><p>Request path doesn’t contain query strings.</p></li><li><p>Request doesn’t contain any WordPress-specific cookies: "wp-*", "wordpress*", "comment_*", "woocommerce_*", unless it’s "wordpress_eli" or "wordpress_test_cookie".</p></li><li><p>Request doesn’t contain any of the following headers:</p><ul><li><p>"Cache-Control: no-cache"</p></li><li><p>"Cache-Control: private"</p></li><li><p>"Pragma: no-cache"</p></li><li><p>"Vary: *"</p></li></ul></li></ul><p>Note that caching is bypassed if devtools are open and the “Disable cache” option is active.</p>
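<p>The conditions above can be sketched as a pair of predicates a worker might apply before caching. This is an illustrative sketch (the function names, parameter shapes, and simplified header checks are our assumptions), not Cloudflare's production code:</p>

```javascript
// Cookie and header names come from the rules listed above;
// the function names are illustrative.
const BYPASS_COOKIES = [/^wp-/, /^wordpress/, /^comment_/, /^woocommerce_/];
const ALLOWED_COOKIES = ["wordpress_eli", "wordpress_test_cookie"];

// True if any WordPress-specific cookie (other than the allowed two)
// is present in the Cookie request header.
function shouldBypassCookies(cookieHeader) {
  const names = (cookieHeader || "")
    .split(";")
    .map((c) => c.trim().split("=")[0].toLowerCase())
    .filter(Boolean);
  return names.some(
    (name) =>
      !ALLOWED_COOKIES.includes(name) &&
      BYPASS_COOKIES.some((re) => re.test(name))
  );
}

function isCacheableRequest(method, url, cookieHeader, cacheControl, pragma) {
  if (method !== "GET") return false;
  if (new URL(url).search !== "") return false; // no query strings
  if (shouldBypassCookies(cookieHeader)) return false;
  if (/no-cache|private/.test(cacheControl || "")) return false;
  if ((pragma || "").includes("no-cache")) return false;
  return true;
}

function isCacheableResponse(status, contentType) {
  return status === 200 && (contentType || "").includes("text/html");
}
```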
    <div>
      <h3>Edge caching with plugin</h3>
      <a href="#edge-caching-with-plugin">
        
      </a>
    </div>
    <p>The preferred solution requires a <a href="https://support.cloudflare.com/hc/en-us/articles/227634427-Using-Cloudflare-with-WordPress">configured Cloudflare for WordPress plugin</a>. We provide the following feature set when the plugin is activated:</p><ul><li><p>HTML edge caching with 30 days TTL</p></li><li><p>30 seconds or faster cache invalidation</p></li><li><p>Bypass HTML caching for logged-in users</p></li><li><p>Bypass HTML caching based on the presence of WordPress-specific cookies</p></li><li><p>Decreased load on origin servers. If a request is fetched from the Cloudflare CDN cache, we skip the request to the origin server.</p></li></ul>
    <div>
      <h3>How is this implemented?</h3>
      <a href="#how-is-this-implemented">
        
      </a>
    </div>
    <p>When an eyeball requests a page from a website and Cloudflare doesn’t have a copy of the content, it is fetched from the origin. As the response is sent from the origin and passes through Cloudflare’s edge, the Cloudflare for WordPress plugin adds a custom header: <code>cf-edge-cache</code>. It allows an origin to configure the caching rules applied to responses.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6COdQPrdIxdu1UiJgsdyVd/015cf3f6aa598f70cfe089d74292a1b9/image4-1.png" />
            
            </figure><p>Based on the <a href="/improving-html-time-to-first-byte/">X-HTML-Edge-Cache</a> proposal, the plugin adds a cf-edge-cache header to every origin response. There are two possible values:</p><ul><li><p>cf-edge-cache: no-cache</p></li></ul><p>The page contains private information that shouldn’t be cached by the edge. For example, an active session exists on the server.</p><ul><li><p>cf-edge-cache: cache, platform=wordpress</p></li></ul><p>This combination of cache and platform ensures that the HTML page is cached. In addition, we run a number of checks against the presence of WordPress-specific cookies to make sure we either bypass or allow caching on the edge.</p><p>If the header isn’t present, we assume that the Cloudflare for WordPress plugin is not installed or not up to date. In this case, the feature operates without plugin support.</p>
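<p>A minimal parser for these header values might look like the following sketch; the function name and return shape are our assumptions for illustration, not the actual implementation:</p>

```javascript
// Parse the cf-edge-cache header values described above.
function parseEdgeCacheHeader(value) {
  if (value == null) {
    // No header: assume the plugin is missing or out of date.
    return { pluginDetected: false, cache: false };
  }
  const parts = value.split(",").map((p) => p.trim().toLowerCase());
  if (parts.includes("no-cache")) {
    return { pluginDetected: true, cache: false };
  }
  // Both tokens must be present for the page to be cached.
  const cache = parts.includes("cache") && parts.includes("platform=wordpress");
  return { pluginDetected: true, cache };
}
```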
    <div>
      <h3>Edge caching without plugin</h3>
      <a href="#edge-caching-without-plugin">
        
      </a>
    </div>
    <p>Using the Automatic Platform Optimization feature in combination with the Cloudflare for WordPress plugin is our recommended solution. It provides the best feature set together with almost instant cache invalidation. Still, we wanted to provide performance improvements without the need for any installation on the origin server.</p><p>We provide the following feature set when the plugin is not activated:</p><ul><li><p>HTML edge caching with 30 days TTL</p></li><li><p>Cache invalidation may take up to 30 minutes. A manual cache purge can be triggered to speed up cache invalidation</p></li><li><p>Bypass HTML caching based on the presence of WordPress-specific cookies</p></li><li><p>No decreased load on origin servers. If a request is fetched from the Cloudflare CDN cache, we still require an origin response to apply the cache invalidation logic.</p></li></ul><p>Without the Cloudflare for WordPress plugin we still cache HTML on the edge and serve the content from the cache when possible. The cache revalidation logic runs after serving the response to the eyeball. The <a href="https://developers.cloudflare.com/workers/learning/fetch-event-lifecycle#waituntil">Worker’s waitUntil() callback</a> allows code to run in the background without affecting the response to the eyeball.</p><p>We rely on the following headers to detect whether the content is stale and requires a cache update:</p><ul><li><p>ETag. If the cached version and origin response both include an ETag and they differ, we replace the cached version with the origin response. The behavior is the same for strong and weak ETag values.</p></li><li><p>Last-Modified. If the cached version and origin response both include Last-Modified and the origin has a later Last-Modified date, we replace the cached version with the origin response.</p></li><li><p>Date. If no ETag or Last-Modified header is available, we compare the cached version’s and origin response’s Date values. If there is more than a 30-minute difference, we replace the cached version with the origin response.</p></li></ul>
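<p>The three checks above can be sketched as a single staleness predicate over response headers. Headers are shown as plain lowercase-keyed objects for clarity; this is an illustrative sketch, not the production implementation:</p>

```javascript
const MAX_DATE_SKEW_MS = 30 * 60 * 1000; // 30 minutes

// True if the cached copy should be replaced with the origin response.
function isStale(cached, origin) {
  if (cached.etag && origin.etag) {
    // Strong and weak ETags are treated the same way.
    return cached.etag !== origin.etag;
  }
  if (cached["last-modified"] && origin["last-modified"]) {
    return Date.parse(origin["last-modified"]) > Date.parse(cached["last-modified"]);
  }
  if (cached.date && origin.date) {
    return Math.abs(Date.parse(origin.date) - Date.parse(cached.date)) > MAX_DATE_SKEW_MS;
  }
  return false; // nothing to compare; keep the cached copy
}
```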
    <div>
      <h3>Getting content everywhere</h3>
      <a href="#getting-content-everywhere">
        
      </a>
    </div>
    <p>Cloudflare Cache works great for frequently requested content: regular requests to the site keep the content in cache. For a typical personal blog, however, the content is more likely to stay in cache in only some parts of our vast edge network. With the Automatic Platform Optimization release we wanted to improve loading times for a cache cold start from any location in the world. We explored different approaches and decided to use <a href="/introducing-workers-kv/">Workers KV</a> to improve edge caching.</p><p>In addition to Cloudflare's CDN cache, we put the content into Workers KV. It only takes a single request to a page to cache it, and within a minute it is available to be read back from KV from any Cloudflare data center.</p>
    <div>
      <h3>Updating content</h3>
      <a href="#updating-content">
        
      </a>
    </div>
    <p>After an update has been made to the WordPress website, the plugin makes a request to Cloudflare’s API which both purges the cache and marks the content as stale in KV. The next request for the asset triggers revalidation of the content. If the plugin is not enabled, the cache revalidation logic is triggered as detailed previously.</p><p>We serve the stale copy of the content still present in KV and asynchronously fetch new content from the origin, apply possible optimizations, and then cache it (both in the regular local CDN cache and globally in KV).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1UGnbwtDfcNEmZXwV12urw/b1b58beeb7a4c5fee4ad8d5a8d755035/image5-2.png" />
            
            </figure><p>To store the content in KV we use a single namespace. It’s keyed with a combination of a zone identifier and the URL. For instance:</p>
            <pre><code>1:example.com/blog-post-1.html =&gt; "transformed &amp; cached content"</code></pre>
            <p>For marking content as stale in KV we write a new key which will be read from the edge. If the key is present we will revalidate the content.</p>
            <pre><code>stale:1:example.com/blog-post-1.html =&gt; ""</code></pre>
            <p>Once the content was revalidated the stale marker key is deleted.</p>
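<p>Putting the keying scheme together, the read and revalidation paths might look like the following sketch. The helper names are ours, and the <code>get</code>/<code>put</code>/<code>delete</code> calls follow the Workers KV binding interface:</p>

```javascript
// Keys follow the scheme above: "zoneId:url" for content,
// "stale:zoneId:url" as the revalidation marker.
const contentKey = (zoneId, url) => `${zoneId}:${url}`;
const staleKey = (zoneId, url) => `stale:${zoneId}:${url}`;

// Read the cached HTML and whether it has been marked stale.
async function readCached(kv, zoneId, url) {
  const html = await kv.get(contentKey(zoneId, url));
  const stale = (await kv.get(staleKey(zoneId, url))) !== null;
  return { html, stale }; // serve html; if stale, revalidate in background
}

// After revalidation: store fresh content and clear the stale marker.
async function storeRevalidated(kv, zoneId, url, freshHtml) {
  await kv.put(contentKey(zoneId, url), freshHtml);
  await kv.delete(staleKey(zoneId, url));
}
```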
    <div>
      <h3>Moving optimizations to the edge</h3>
      <a href="#moving-optimizations-to-the-edge">
        
      </a>
    </div>
    <p>On top of caching HTML at the edge, we can pre-process and transform the HTML to make loading websites even faster for the user. Moving the development of this feature to our Cloudflare Workers environment makes it easy to add performance features such as improving <a href="/fast-google-fonts-with-cloudflare-workers/">Google Font loading</a>. Using Google Fonts can cause significant performance issues, as loading a font requires loading the HTML page, then a CSS file, and finally the font itself, with each step using a different domain.</p><p>The solution is for the worker to inline the CSS and serve the font directly from the edge, minimizing the number of connections required.</p><p>If you read through the previous blog post’s implementation, it required a lot of manual work to support streaming HTML processing and character encodings. As the set of Worker APIs has improved over time, it is now much simpler to implement. Specifically, the addition of a streaming <a href="/html-parsing-2/">HTML rewriter/parser with CSS-selector based API</a> and the ability to suspend parsing to <a href="/asynchronous-htmlrewriter-for-cloudflare-workers/">asynchronously fetch a resource</a> have reduced the code required from ~600 lines of source code to under 200.</p>
            <pre><code>export function transform(request, res) {
  return new HTMLRewriter()
    .on("link", {
      async element(e) {
        const src = e.getAttribute("href");
        const rel = e.getAttribute("rel");
        // getAttribute returns null when the attribute is absent
        const isGoogleFont =
          src !== null &amp;&amp; src.startsWith("https://fonts.googleapis.com");

        if (isGoogleFont &amp;&amp; rel === "stylesheet") {
          const media = e.getAttribute("media") || "all";
          const id = e.getAttribute("id") || "";
          try {
            const content = await fetchCSS(src, request);
            e.replace(styleTag({ media, id }, content), {
              html: true
            });
          } catch (e) {
            console.error(e);
          }
        }
      }
    })
    .transform(res);
}</code></pre>
            <p>The HTML transformation doesn’t block the response to the user. It runs as a background task which, when complete, updates KV and replaces the globally cached version.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1ShahwSObRage83CdWSfD0/528448594de96d64c0bb442f7e2db875/image1-6.png" />
            
            </figure>
    <div>
      <h3>Making edge publishing generic</h3>
      <a href="#making-edge-publishing-generic">
        
      </a>
    </div>
    <p>We are launching the feature for WordPress specifically, but the concept can be applied to any website and content management system (CMS).</p> ]]></content:encoded>
            <category><![CDATA[Automatic Platform Optimization]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[WordPress]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">TmM1U951RKeF4WEDa64px</guid>
            <dc:creator>Yevgen Safronov</dc:creator>
            <dc:creator>Sven Sauleau</dc:creator>
        </item>
    </channel>
</rss>