
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sun, 12 Apr 2026 17:05:50 GMT</lastBuildDate>
        <item>
            <title><![CDATA[How Cloudflare runs more AI models on fewer GPUs: A technical deep-dive ]]></title>
            <link>https://blog.cloudflare.com/how-cloudflare-runs-more-ai-models-on-fewer-gpus/</link>
            <pubDate>Wed, 27 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare built an internal platform called Omni. This platform uses lightweight isolation and memory over-commitment to run multiple AI models on a single GPU. ]]></description>
            <content:encoded><![CDATA[ <p>As the demand for AI products grows, developers are creating and tuning a wider variety of models. While adding new models to our <a href="https://developers.cloudflare.com/workers-ai/models/"><u>growing catalog</u></a> on Workers AI, we noticed that not all of them are used equally – leaving infrequently used models occupying valuable GPU space. Efficiency is a core value at Cloudflare, and with GPUs being the scarce commodity they are, we realized that we needed to build something to fully maximize our GPU usage.</p><p>Omni is an internal platform we’ve built for running and managing AI models on Cloudflare’s edge nodes. It does so by spawning and managing multiple models on a single machine and GPU using lightweight isolation. Omni makes it easy and efficient to run many small and/or low-volume models, combining multiple capabilities by:  </p><ul><li><p>Spawning multiple models from a single control plane,</p></li><li><p>Implementing lightweight process isolation, allowing models to spin up and down quickly,</p></li><li><p>Isolating the file system between models to easily manage per-model dependencies, and</p></li><li><p>Over-committing GPU memory to run more models on a single GPU.</p></li></ul><p>Cloudflare aims to place GPUs as close as we possibly can to people and applications that are using them. With Omni in place, we’re now able to run more models on every node in our network, improving model availability, minimizing latency, and reducing power consumed by idle GPUs.</p><p>Here’s how. </p>
    <div>
      <h2>Omni’s architecture – at a glance</h2>
      <a href="#omnis-architecture-at-a-glance">
        
      </a>
    </div>
    <p>At a high level, Omni is a platform to run AI models. When an <a href="https://www.cloudflare.com/learning/ai/inference-vs-training/"><u>inference</u></a> request is made on Workers AI, we load the model’s configuration from <a href="https://developers.cloudflare.com/kv/"><u>Workers KV</u></a> and our routing layer forwards it to the closest Omni instance that has available capacity. For inferences using the <a href="https://developers.cloudflare.com/workers-ai/features/batch-api/"><u>Asynchronous Batch API</u></a>, we route to an Omni instance that is idle, typically in a location where it’s night.</p><p>Omni runs a few checks on the inference request, performs model-specific pre- and post-processing, then hands the request over to the model.</p>
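<p>The routing decision described above can be sketched as follows. This is an illustrative toy, not Cloudflare's routing code: the <code>Instance</code> type, the <code>route</code> function, and the load threshold are all assumptions.</p>

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    distance_ms: float  # network distance from the requester
    load: float         # 0.0 (idle) .. 1.0 (saturated)


def route(instances, batch=False):
    if batch:
        # Async batch work goes wherever capacity sits idle
        # (often a location where it's currently night).
        return min(instances, key=lambda i: i.load)
    # Interactive inference: closest instance that still has headroom.
    candidates = [i for i in instances if i.load < 0.9]
    return min(candidates or instances, key=lambda i: i.distance_ms)


instances = [
    Instance("lhr", 10, 0.95),   # close but saturated
    Instance("cdg", 20, 0.40),
    Instance("sin", 180, 0.05),  # far away and idle
]
print(route(instances).name)              # -> cdg
print(route(instances, batch=True).name)  # -> sin
```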
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4zlObplZsGgpxPyUoD5NXe/ddd1cb8af444460d54fa5e0ab6e58c87/1.png" />
          </figure>
    <div>
      <h2>Elastic scaling by spawning multiple models from a single control plane</h2>
      <a href="#elastic-scaling-by-spawning-multiple-models-from-a-single-control-plane">
        
      </a>
    </div>
    <p>If you’re developing an AI application, a typical setup is having a container or a VM dedicated to running a single model with a GPU attached to it. This is simple. But it’s also heavy-handed, because it requires managing the entire stack: provisioning the VM, installing GPU drivers, downloading model weights, and managing the Python environment. At scale, managing infrastructure this way is incredibly time consuming and often requires an entire team. </p><p>If you’re using Workers AI, we handle all of this for you. Omni uses a single control plane for running multiple models, called the scheduler, which automatically provisions models and spawns new instances as your traffic scales. When starting a new model instance, it downloads model weights, Python code, and any other dependencies. Omni’s scheduler provides fine-grained control and visibility over the model’s lifecycle: it receives incoming inference requests and routes them to the corresponding model processes, distributing the load between multiple GPUs. It makes sure the model processes are running, rolls out new versions as they are released, and restarts processes when it detects errors or failure states. It also collects metrics for billing and emits logs.</p><p>The inference itself is done by a per-model process, supervised by the scheduler. It receives the inference request and some metadata, then sends back a response. Depending on the model, the response can take various forms: for instance, a JSON object or an SSE stream for text generation, or binary data for image generation.</p><p>The scheduler and the child processes communicate by passing messages over Inter-Process Communication (IPC). Usually the inference request is buffered in the scheduler so that features like prompt templating or tool calling can be applied before the request is passed to the child process. For potentially large binary requests, the scheduler hands over the underlying TCP connection to the child process so it can consume the request body directly.</p>
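<p>The scheduler/model-process split can be sketched with a standard multiprocessing pipe as the IPC channel. Omni's actual protocol and message shapes are not public; the <code>model_process</code> function and tuple messages below are assumptions for illustration only.</p>

```python
from multiprocessing import Pipe, Process


def model_process(conn):
    # Per-model child process: receive inference requests from the
    # scheduler over IPC and send responses back.
    while True:
        msg = conn.recv()
        if msg is None:  # shutdown signal from the scheduler
            break
        req_id, prompt = msg
        # Stand-in for real inference work:
        conn.send((req_id, f"echo: {prompt}"))


if __name__ == "__main__":
    scheduler_end, model_end = Pipe()
    child = Process(target=model_process, args=(model_end,))
    child.start()

    # Scheduler side: buffer the request (where prompt templating or
    # tool calling would be applied), then forward it to the child.
    scheduler_end.send((1, "hello"))
    print(scheduler_end.recv())  # (1, 'echo: hello')

    scheduler_end.send(None)
    child.join()
```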
    <div>
      <h2>Implementing lightweight process and Python isolation</h2>
      <a href="#implementing-lightweight-process-and-python-isolation">
        
      </a>
    </div>
    <p>Typically, deploying a model requires its own dedicated container, but we want to colocate more models on a single container to conserve memory and GPU capacity. To do so, we needed finer-grained control over CPU memory and the ability to isolate a model from its dependencies and environment. We deploy Omni in two configurations: a container running multiple models, or bare metal running a single model. In both cases, process isolation and Python virtual environments allow us to isolate models with different dependencies by creating namespaces, with resource usage limited by <a href="https://en.wikipedia.org/wiki/Cgroups"><u>cgroups</u></a>. </p><p>Python doesn’t take cgroup memory limits into account when allocating memory, which can lead to OOM errors. Many AI Python libraries rely on <a href="https://pypi.org/project/psutil/"><u>psutil</u></a> for pre-allocating CPU memory, and psutil reads /proc/meminfo to determine how much memory is available. Since in Omni each model has its own configurable memory limits, we need psutil to reflect the current usage and limits for a given model, not for the entire system.</p><p>The solution for us was to create a virtual file system, using <a href="https://en.wikipedia.org/wiki/Filesystem_in_Userspace"><u>FUSE</u></a>, to mount our own version of /proc/meminfo which reflects the model’s current usage and limits.</p><p>To illustrate this, here’s an Omni instance running a model (running as pid 8). If we enter the mount namespace and look at /proc/meminfo, it will reflect the model’s configuration:</p>
            <pre><code># Enter the mount (file system) namespace of a child process
$ nsenter -t 8 -m

$ mount
...
none /proc/meminfo fuse ...

$ cat /proc/meminfo
MemTotal:     7340032 kB
MemFree:     7316388 kB
MemAvailable:     7316388 kB</code></pre>
            <p>In this case the model has 7 GiB of memory available, while the entire container has 15 GiB. If the model tries to allocate more than 7 GiB of memory, it will be OOM killed and restarted by the scheduler’s process manager, without causing any problems for the other models.</p><p>For isolating Python and some system dependencies, each model runs in a Python virtual environment, managed by <a href="https://docs.astral.sh/uv/"><u>uv</u></a>. Dependencies are cached on the machine and, if possible, shared between models (uv uses symbolic links between its cache and virtual environments).</p><p>Running each model in a separate process also gives each model its own CUDA context and isolates failures for error recovery. </p>
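<p>A minimal sketch of the numbers served through such a FUSE mount, assuming the file contents are derived from the model's configured limit and current usage (as a real implementation might read from the model's cgroup). The <code>render_meminfo</code> helper is hypothetical; Omni's implementation is not public.</p>

```python
def render_meminfo(limit_bytes: int, usage_bytes: int) -> str:
    # Synthesize a /proc/meminfo that reflects one model's limits
    # instead of the whole machine's memory.
    free_kb = (limit_bytes - usage_bytes) // 1024
    return (
        f"MemTotal:     {limit_bytes // 1024} kB\n"
        f"MemFree:     {free_kb} kB\n"
        f"MemAvailable:     {free_kb} kB\n"
    )


# A model limited to 7 GiB with ~23 MiB in use reproduces the
# numbers shown in the shell session above:
print(render_meminfo(7 * 2**30, 24_211_456))
```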
    <div>
      <h2>Over-committing memory to run more models on a single GPU</h2>
      <a href="#over-committing-memory-to-run-more-models-on-a-single-gpu">
        
      </a>
    </div>
    <p>Some models don’t receive enough traffic to fully utilize a GPU, and with Omni we can pack more models on a single GPU, freeing up capacity for other workloads. When it comes to GPU memory management, Omni has two main jobs: safely over-commit GPU memory, so that more models than normal can share a single GPU, and enforce memory limits, to prevent any single model from running out of memory while running.</p><p>Over-committing memory means allocating more memory than is physically available on the device. For example, if a GPU has 10 GiB of memory, Omni would allow 2 models of 10 GiB each on that GPU.</p><p>Right now, Omni is configured to run 13 models and allocates about 400% of GPU memory on a single GPU, saving 4 GPUs. Omni does this by injecting a CUDA stub library that intercepts CUDA memory allocation (cuMemAlloc* or cudaMalloc*) calls and forces memory allocations to be performed in <a href="https://developer.nvidia.com/blog/unified-memory-in-cuda-6/"><u>unified memory mode</u></a>.</p><p>In unified memory mode, CUDA shares the same memory address space between the GPU and the CPU:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2G5zd0TDi15ZeFAmcJy812/1b292429140ec2c4bd0a81bee4954150/2.png" />
          </figure><p><sup><i>CUDA’s </i></sup><a href="https://developer.nvidia.com/blog/maximizing-unified-memory-performance-cuda/"><sup><i><u>unified memory mode</u></i></sup></a><sup><i> </i></sup></p><p>In practice this is what memory over-commitment looks like: imagine 3 models (A, B and C). Models A+B fit in the GPU’s memory but C takes up the entire memory.</p><ol><li><p>Models A+B are loaded first and are in GPU memory, while model C is in CPU memory</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/xU141x0PaZRp83XlF6hWz/527915ee03309f619a64e6b43c62cd92/3.png" />
          </figure></li><li><p>Omni receives a request for model C so models A+B are swapped out and C is swapped in.
</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4fD3Y2xyyawGmo1gpdLsQz/1cd36ebaed6b7f9e95b3d31ead1c1098/4.png" />
          </figure></li><li><p>Omni receives a request for model B, so model C is partly swapped out and model B is swapped back in.
</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2v5JjDW0NCkVUfEBXwIgpL/62009bc970b0967a850cb31ef87be44b/5.png" />
          </figure></li><li><p>Omni receives a request for model A, so model A is swapped back in and model C is completely swapped out.</p></li></ol>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3cWGbEGgv3QckT7jgUIs9d/2c500a432be451a83dce0c71ccdcb89f/6.png" />
          </figure><p>The trade-off is added latency: if performing an inference requires memory that is currently on the host system, it must be transferred to the GPU. For smaller models, this latency is minimal, because PCIe 4.0, the physical bus between the GPU and the system, provides 32 GB/s of bandwidth. On the other hand, if a model needs to be “cold started”, i.e. it’s been swapped out because it hasn’t been used in a while, the system may need to swap back the entire model – a larger model, for example, might use 5 GB of GPU memory for weights and caches, and would take ~156 ms to be swapped back into the GPU. Naturally, over time, inactive models are moved into CPU memory, while active models stay hot in the GPU.</p><p>Rather than allowing the model to choose how much GPU memory it uses, AI frameworks tend to pre-allocate as much GPU memory as possible for performance reasons, making co-locating models more complicated. Omni allows us to control how much memory is actually exposed to any given model, to prevent a greedy model from over-using the GPU allocated to it. We do this by overriding the CUDA runtime and driver APIs (<a href="https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g376b97f5ab20321ca46f7cfa9511b978"><u>cudaMemGetInfo</u></a> and <a href="https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g808f555540d0143a331cc42aa98835c0"><u>cuMemGetInfo</u></a>). Instead of exposing the entire GPU memory, we only expose a subset of memory to each model.</p>
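<p>Two back-of-the-envelope sketches for the numbers above. The first checks the swap-in latency figure, treating the blog's sizes as decimal gigabytes; the second is a Python stand-in for what an intercepted <code>cudaMemGetInfo</code> might report (the real interception happens in a native CUDA stub library, and <code>mem_get_info</code> here is purely illustrative).</p>

```python
# 1. Swap-in latency: moving ~5 GB of weights and caches over
#    PCIe 4.0 at 32 GB/s.
swap_ms = 5e9 / 32e9 * 1000
print(round(swap_ms))  # 156


# 2. Limiting visible memory: report each model's configured subset
#    of the GPU instead of the whole device. CUDA's memGetInfo calls
#    return a (free, total) pair.
def mem_get_info(model_limit: int, model_used: int):
    return model_limit - model_used, model_limit


# A model capped at 2 GiB that has allocated 512 MiB sees a 2 GiB "GPU"
# with 1536 MiB free, regardless of the device's real capacity:
free, total = mem_get_info(2 * 2**30, 512 * 2**20)
print(total // 2**30, free // 2**20)  # 2 1536
```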
    <div>
      <h2>How Omni runs multiple models for Workers AI </h2>
      <a href="#how-omni-runs-multiple-models-for-workers-ai">
        
      </a>
    </div>
    <p>AI models can run in a variety of inference engines or backends: <a href="https://github.com/vllm-project/vllm"><u>vLLM</u></a>, Python, and now our very own inference engine, <a href="http://blog.cloudflare.com/cloudflares-most-efficient-ai-inference-engine/"><u>Infire</u></a>. While models have different capabilities, each model needs to support <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI features</u></a>, like batching and function calling. Omni acts as a unified layer for integrating these systems. It integrates into our internal routing and scheduling systems, and provides a Python API for our engineering team to add new models more easily. Let’s take a closer look at how Omni does this in practice:</p>
            <pre><code>from omni import Response
import cowsay


def handle_request(request, context):
    try:
        json = request.body.json
        text = json["text"]
    except Exception as err:
        return Response.error(...)

    return cowsay.get_output_string('cow', text)</code></pre>
            <p>Similar to how a JavaScript Worker works, Omni calls a request handler, which runs the model’s logic and returns a response. </p><p>Omni installs Python dependencies at model startup. We run an internal Python registry and mirror the public registry. In either case, we declare dependencies in requirements.txt:</p>
            <pre><code>cowsay==6.1</code></pre>
            <p>The handle_request function can be async and return different Python types, including <a href="https://docs.pydantic.dev/latest/"><u>pydantic</u></a> objects. Omni will convert the return value into a Workers AI response for the eyeball.</p><p>An injected Python package, named omni, contains all the Python APIs to interact with the request and the Workers AI systems, build Responses, handle errors, etc. Internally we publish it as a regular Python package so it can be used standalone, for instance in unit tests:</p>
            <pre><code>from omni import Context, Request
from model import handle_request


def test_basic():
    ctx = Context.inactive()
    req = Request(json={"text": "my dog is cooler than you!"})
    out = handle_request(req, ctx)
    assert out == """  __________________________
| my dog is cooler than you! |
  ==========================
                          \\
                           \\
                             ^__^
                             (oo)\\_______
                             (__)\\       )\\/\\
                                 ||----w |
                                 ||     ||"""</code></pre>
            
    <div>
      <h2>What’s next </h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Omni allows us to run models more efficiently by spawning them from a single control plane and implementing lightweight process isolation. This enables quick starting and stopping of models, isolated file systems for managing Python and system dependencies, and over-committing GPU memory to run more models on a single GPU. This improves the performance for our entire Workers AI stack, reduces the cost of running GPUs, and allows us to ship new models and features quickly and safely.</p><p>Right now, Omni is running in production on a handful of models in the Workers AI catalog, and we’re adding more every week. Check out <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a> today to experience Omni’s performance benefits on your AI application. </p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">KjxPspfQBaaHQ5K8ALjv8</guid>
            <dc:creator>Sven Sauleau</dc:creator>
            <dc:creator>Mari Galicer</dc:creator>
        </item>
        <item>
            <title><![CDATA[Using Fortran on Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/using-fortran-on-cloudflare-workers/</link>
            <pubDate>Tue, 07 May 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ Work on LLVM has enabled Fortran to compile to WebAssembly. So, today, we’re writing about running Fortran code on Cloudflare Workers ]]></description>
            <content:encoded><![CDATA[ <p></p><p>In April 2020, we blogged about how to get <a href="/cloudflare-workers-now-support-cobol">COBOL running on Cloudflare Workers</a> by compiling to WebAssembly. The ecosystem around WebAssembly has grown significantly since then, and it has become a solid foundation for all types of projects, be they client-side or server-side.</p><p>As WebAssembly support has grown, more and more languages are able to compile to WebAssembly for execution on servers and in browsers. As Cloudflare Workers uses the V8 engine and supports WebAssembly natively, we’re able to support languages that compile to WebAssembly on the platform.</p><p>Recently, work on LLVM has enabled Fortran to compile to WebAssembly. So, today, we’re writing about running Fortran code on Cloudflare Workers.</p><p>Before we dive into how to do this, here’s a little demonstration of number recognition in Fortran. Draw a number from 0 to 9 and Fortran code running somewhere on Cloudflare’s network will predict the number you drew.</p><p>This is taken from the wonderful <a href="https://gws.phd/posts/fortran_wasm/">Fortran on WebAssembly</a> post but instead of running client-side, the Fortran code is running on Cloudflare Workers. Read on to find out how you can use Fortran on Cloudflare Workers and how that demonstration works.</p>
    <div>
      <h3>Wait, Fortran? No one uses that!</h3>
      <a href="#wait-fortran-no-one-uses-that">
        
      </a>
    </div>
    <p>Not so fast! Or rather, actually pretty darn fast if you’re doing a lot of numerical programming or have scientific data to work with. Fortran <a href="https://en.wikipedia.org/wiki/Fortran#Naming">(originally FORmula TRANslator)</a> is very well suited for scientific workloads because of its native functionality for things like arithmetic and handling large arrays and matrices.</p><p>If you look at the <a href="https://en.wikipedia.org/wiki/TOP500">ranking</a> of the fastest supercomputers in the world you’ll discover that the measurement of “fast” is based on these supercomputers running a piece of software called <a href="https://en.wikipedia.org/wiki/LINPACK">LINPACK</a> that was originally written in Fortran. LINPACK is designed to help with problems solvable using linear algebra.</p><p>The <a href="https://en.wikipedia.org/wiki/LINPACK_benchmarks">LINPACK benchmarks</a> use LINPACK to solve an n x n system of linear equations using matrix operations and, in doing so, determine how fast supercomputers are. The code is available in <a href="https://www.netlib.org/benchmark/linpackd">Fortran</a>, <a href="https://www.netlib.org/benchmark/linpackc">C</a> and <a href="https://www.netlib.org/benchmark/linpackjava/">Java</a>.</p><p>A related Fortran package, <a href="https://www.netlib.org/blas/">BLAS</a>, also does linear algebra and forms the basis of the number identifying code above. But other Fortran packages are still relevant. Back in 2017, NASA ran a <a href="https://www.bbc.com/news/technology-39803425">competition</a> to make FUN3D (used to perform calculations of airflow over simulated aircraft). <a href="https://fun3d.larc.nasa.gov/">FUN3D</a> is written in Fortran.</p><p>So, although Fortran (or at the time FORTRAN) first came to life in 1957, it’s alive and well and being used widely for scientific applications (there’s even <a href="https://developer.nvidia.com/cuda-fortran">Fortran for CUDA</a>). 
One particular application left Earth 20 years after Fortran was born: Voyager. The Voyager probes use a combination of <a href="https://www.popularmechanics.com/space/a17991/voyager-1-voyager-2-retiring-engineer/">assembly language and Fortran</a> to keep chugging along.</p><p>But back in our solar system, and back on Region: Earth, you can now use Fortran on Cloudflare Workers. Here’s how.</p>
    <div>
      <h3>How to get your Fortran code running on Cloudflare Workers</h3>
      <a href="#how-to-get-your-fortran-code-running-on-cloudflare-workers">
        
      </a>
    </div>
    <p>To make it easy to run your Fortran code on Cloudflare Workers, we created a tool called <a href="https://github.com/cloudflare/fortiche">Fortiche</a> (translates to smart in French). It uses <a href="https://flang.llvm.org/docs/">Flang</a> and <a href="https://emscripten.org">Emscripten</a> under the hood.</p><p><a href="https://flang.llvm.org/docs/">Flang</a> is a frontend in <a href="https://en.wikipedia.org/wiki/LLVM">LLVM</a> and, if you read the <a href="https://gws.phd/posts/fortran_wasm/">Fortran on WebAssembly</a> blog post, we currently have to patch <a href="https://en.wikipedia.org/wiki/LLVM">LLVM</a> to work around a few issues.</p><p><a href="https://emscripten.org">Emscripten</a> is used to compile <a href="https://en.wikipedia.org/wiki/LLVM">LLVM</a> output and produce code that is compatible with Cloudflare Workers.</p><p>This is all packaged in the <a href="https://github.com/cloudflare/fortiche">Fortiche</a> Docker image. Let’s see a simple example.</p><p>add.f90:</p>
            <pre><code>SUBROUTINE add(a, b, res)
    INTEGER, INTENT(IN) :: a, b
    INTEGER, INTENT(OUT) :: res

    res = a + b
END</code></pre>
            <p>Here we define a subroutine called add that takes <code>a</code> and <code>b</code>, sums them together and places the result in <code>res</code>.</p><p>Compile with <a href="https://github.com/cloudflare/fortiche">Fortiche</a>:</p>
            <pre><code>docker run -v $PWD:/input -v $PWD/output:/output xtuc/fortiche --export-func=add add.f90</code></pre>
            <p>Passing <code>--export-func=add</code> to <a href="https://github.com/cloudflare/fortiche">Fortiche</a> makes the Fortran <code>add</code> subroutine available to JavaScript.</p><p>The output folder contains the compiled WebAssembly module and JavaScript from <a href="https://emscripten.org">Emscripten</a>, and a JavaScript endpoint generated by <a href="https://github.com/cloudflare/fortiche">Fortiche</a>:</p>
            <pre><code>$ ls -lh ./output
total 84K
-rw-r--r-- 1 root root 392 avril 22 12:00 index.mjs
-rw-r--r-- 1 root root 27K avril 22 12:00 out.mjs
-rwxr-xr-x 1 root root 49K avril 22 12:00 out.wasm</code></pre>
            <p>And finally the Cloudflare Worker:</p>
            <pre><code>// Import what Fortiche generated
import {load} from "../output/index.mjs"

export default {
    async fetch(request: Request): Promise&lt;Response&gt; {
        // Load the Fortran program
        const program = await load();

        // Allocate space in memory for the arguments and result
        const aPtr = program.malloc(4);
        const bPtr = program.malloc(4);
        const outPtr = program.malloc(4);

        // Set argument values
        program.HEAP32[aPtr / 4] = 123;
        program.HEAP32[bPtr / 4] = 321;

        // Run the Fortran add subroutine
        program.add(aPtr, bPtr, outPtr);

        // Read the result
        const res = program.HEAP32[outPtr / 4];

        // Free everything
        program.free(aPtr);
        program.free(bPtr);
        program.free(outPtr);

        return Response.json({ res });
    },
};</code></pre>
            <p>Interestingly, the values we pass to Fortran are all pointers; therefore, we have to allocate space for each argument and result (the Fortran integer type is four bytes wide), and pass the pointers to the <code>add</code> subroutine.</p><p>Running the Worker gives us the right answer:</p>
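<p>The Worker indexes <code>HEAP32</code> with <code>ptr / 4</code> because Emscripten's heap is a single byte-addressed buffer and <code>HEAP32</code> is a 32-bit view of it, so a byte pointer must be divided by 4 to become a word index. This small Python stand-in mimics that layout for illustration; it is not the Fortiche runtime.</p>

```python
heap = bytearray(64)                 # the WebAssembly linear memory
heap32 = memoryview(heap).cast("i")  # 32-bit signed view, like HEAP32

a_ptr, b_ptr, out_ptr = 0, 4, 8      # byte offsets, as malloc would return
heap32[a_ptr // 4] = 123
heap32[b_ptr // 4] = 321

# What the Fortran add subroutine does through its pointer arguments:
heap32[out_ptr // 4] = heap32[a_ptr // 4] + heap32[b_ptr // 4]
print(heap32[out_ptr // 4])  # 444
```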
            <pre><code>$ curl https://fortran-add.cfdemos.workers.dev

{"res":444}</code></pre>
            <p>You can find the <a href="https://github.com/cloudflare/fortiche/tree/main/examples/add">full example here</a>.</p>
    <div>
      <h3>Handwritten digit classifier</h3>
      <a href="#handwritten-digit-classifier">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4PKFBCv6bCWa20pqrFTEnV/58813ef64d5d8a8aa728ba4bcfde6840/image1-2.png" />
            
            </figure><p>This example is taken from <a href="https://gws.phd/posts/fortran_wasm/#mnist">https://gws.phd/posts/fortran_wasm/#mnist</a>. It relies on the <a href="https://www.netlib.org/blas/">BLAS</a> library, which is available in Fortiche with the flag: <code>--with-BLAS-3-12-0</code>.</p><p>Note that the <a href="https://en.wikipedia.org/wiki/LAPACK">LAPACK</a> library is also available in <a href="https://github.com/cloudflare/fortiche">Fortiche</a> with the flag: <code>--with-LAPACK-3-12-0</code>.</p><p>You can try it below:</p><p>And you can access the <a href="https://github.com/cloudflare/fortiche/tree/main/examples/handwritten-digit-classifier">source code here</a>.</p><p>Let us know what you write using Fortran and Cloudflare Workers!</p><p><code>END</code></p> ]]></content:encoded>
            <category><![CDATA[Fortran]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">4iGQQWfQlU6k8DiQ5ZO900</guid>
            <dc:creator>John Graham-Cumming</dc:creator>
            <dc:creator>Sven Sauleau</dc:creator>
        </item>
        <item>
            <title><![CDATA[polyfill.io now available on cdnjs: reduce your supply chain risk]]></title>
            <link>https://blog.cloudflare.com/polyfill-io-now-available-on-cdnjs-reduce-your-supply-chain-risk/</link>
            <pubDate>Thu, 29 Feb 2024 17:51:32 GMT</pubDate>
            <description><![CDATA[ Polyfill.io is now available on cdnjs to reduce the risk of supply chain attacks. Replace your polyfill.io links today for a seamless experience ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4nxvskYVuRnRJHoLUMRY4l/e26d2d6b57eb9a286238ab84863dc947/image1-17.png" />
            
            </figure><p>Polyfill.io is a popular JavaScript library that smooths over differences across old browser versions. These differences often take up substantial development time.</p><p>It does this by adding support for modern functions (via <a href="https://developer.mozilla.org/en-US/docs/Glossary/Polyfill">polyfilling</a>), ultimately letting developers work against a uniform environment, simplifying development. The tool is historically loaded by linking to the endpoint provided under the domain polyfill.io.</p><p>In the interest of providing developers with additional options to use polyfill, today we are launching an <a href="https://cdnjs.cloudflare.com/polyfill">alternative endpoint under cdnjs</a>. You can replace links to polyfill.io “as is” with our new endpoint. You will then rely on the same service and reputation that <a href="https://cdnjs.com/">cdnjs</a> has built over the years for your polyfill needs.</p><p>Our interest in creating an alternative endpoint was also sparked by some <a href="https://github.com/polyfillpolyfill/polyfill-service/issues/2834">concerns raised by the community</a>, and <a href="https://twitter.com/triblondon/status/1761852117579427975">main contributors</a>, following the transition of the domain polyfill.io to a new provider (Funnull).</p><p>The concern is that any website embedding a link to the original polyfill.io domain will now be relying on Funnull to maintain and secure the underlying project to avoid the risk of a supply chain attack. Such an attack occurs when the underlying third party is compromised or alters the code being served to end users in nefarious ways, consequently compromising all websites using the tool.</p><p>Supply chain attacks, in the context of web applications, are a growing concern for security teams, and also led us to build a client-side security product to detect and mitigate these attack vectors: <a href="https://developers.cloudflare.com/page-shield/">Page Shield</a>.</p><p>Irrespective of the scenario described above, this is a timely reminder of the complexities and risks tied to modern web applications. As maintainers and contributors of <a href="https://cdnjs.com/">cdnjs</a>, currently used by <a href="https://w3techs.com/technologies/overview/content_delivery">more than 12% of all sites</a>, this reinforces our commitment to help keep the Internet safe.</p>
    <div>
      <h3>polyfill.io on cdnjs</h3>
      <a href="#polyfill-io-on-cdnjs">
        
      </a>
    </div>
    <p>The full polyfill.io implementation has been deployed at the following URL:</p><p><a href="https://cdnjs.cloudflare.com/polyfill/"><code>https://cdnjs.cloudflare.com/polyfill/</code></a></p><p>The underlying bundle links are:</p><p>Minified: <a href="https://cdnjs.cloudflare.com/polyfill/v3/polyfill.min.js">https://cdnjs.cloudflare.com/polyfill/v3/polyfill.min.js</a></p><p>Unminified: <a href="https://cdnjs.cloudflare.com/polyfill/v3/polyfill.js">https://cdnjs.cloudflare.com/polyfill/v3/polyfill.js</a></p><p>Usage and deployment is intended to be identical to the original polyfill.io site. As a developer, you should be able to simply “replace” the old link with the new cdnjs-hosted link without observing any side effects, besides a possible improvement in performance and reliability.</p><p>If you don’t have access to the underlying website code, but your website is behind Cloudflare, replacing the links is even easier, as you can deploy a Cloudflare Worker to update the links for you:</p>
            <pre><code>export interface Env {}

export default {
    async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise&lt;Response&gt; {
        ctx.passThroughOnException();

        const response = await fetch(request);

        if ((response.headers.get('content-type') || '').includes('text/html')) {
            const rewriter = new HTMLRewriter()
                .on('link', {
                    element(element) {
                        const rel = element.getAttribute('rel');
                        if (rel === 'preconnect') {
                            const href = new URL(element.getAttribute('href') || '', request.url);

                            if (href.hostname === 'polyfill.io') {
                                href.hostname = 'cdnjs.cloudflare.com';
                                element.setAttribute('href', href.toString());
                            }
                        }
                    },
                })

                .on('script', {
                    element(element) {
                        if (element.hasAttribute('src')) {
                            const src = new URL(element.getAttribute('src') || '', request.url);
                            if (src.hostname === 'polyfill.io') {
                                src.hostname = 'cdnjs.cloudflare.com';
                                src.pathname = '/polyfill' + src.pathname;

                                element.setAttribute('src', src.toString());
                            }
                        }
                    },
                });

            return rewriter.transform(response);
        } else {
            return response;
        }
    },
};</code></pre>
            <p>Instructions on how to deploy a Worker can be found in our <a href="https://developers.cloudflare.com/workers/get-started/">developer documentation</a>.</p><p>You can also test the Worker on your website without deploying it. You can find instructions on how to do this in <a href="/workers-and-webpagetest/">another blog post we wrote in the past</a>.</p>
    <div>
      <h3>Implemented with Rust on Cloudflare Workers</h3>
      <a href="#implemented-with-rust-on-cloudflare-workers">
        
      </a>
    </div>
    <p>We were happy to discover that polyfill.io is a <a href="https://github.com/polyfillpolyfill/polyfill-service">Rust project</a>. As you might know, Rust has been a <a href="/workers-rust-sdk">first-class citizen on Cloudflare Workers</a> from the start.</p><p>The polyfill.io service was hosted on Fastly and used their Rust library. We forked the project to add compatibility with Cloudflare Workers, and plan to make the fork publicly accessible in the near future.</p>
    <div>
      <h3>Worker</h3>
      <a href="#worker">
        
      </a>
    </div>
    <p>The <code>https://cdnjs.cloudflare.com/polyfill/[...].js</code> endpoints are also implemented in a Cloudflare Worker that wraps our Polyfill.io fork. The wrapper uses <a href="https://github.com/cloudflare/workers-rs">Cloudflare’s Rust API</a> and looks like the following:</p>
            <pre><code>#[event(fetch)]
async fn main(req: Request, env: Env, ctx: Context) -&gt; Result&lt;Response&gt; {
    let metrics = {...};

    let polyfill_store = get_d1(&amp;req, &amp;env)?;
    let polyfill_env = Arc::new(service::Env { polyfill_store, metrics: metrics.clone() });
    
    // Run the polyfill.io entrypoint
    let res = service::handle_request(req, polyfill_env).await;

    let status_code = if let Ok(res) = &amp;res {
        res.status_code()
    } else {
        500
    };
    metrics
        .requests
        .with_label_values(&amp;[&amp;status_code.to_string()])
        .inc();

    ctx.wait_until(async move {
        if let Err(err) = metrics.report_metrics().await {
            console_error!("failed to report metrics: {err}");
        }
    });

    res
}</code></pre>
            <p>The wrapper only sets up our internal <a href="https://developers.cloudflare.com/workers/observability/metrics-and-analytics/">metrics</a> and <a href="https://developers.cloudflare.com/workers/observability/logging/logpush/">logging</a> tools, so we can monitor uptime and performance of the underlying logic while calling the Polyfill.io entrypoint.</p>
    <div>
      <h3>Storage for the Polyfill files</h3>
      <a href="#storage-for-the-polyfill-files">
        
      </a>
    </div>
    <p>All the polyfill files are stored in a key-value store powered by <a href="https://developers.cloudflare.com/d1/">Cloudflare D1</a>. This allows us to fetch as many polyfill files as we need with a single SQL query, as opposed to the original implementation doing one KV get() per file.</p><p>For performance, we have one Cloudflare D1 instance per region and the SQL queries are routed to the nearest database.</p>
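<p>As an illustration of the batching idea (a hedged sketch, not the actual service code; the <code>polyfills</code> table and its columns are hypothetical), a single <code>IN (...)</code> query can replace one lookup per file:</p>

```rust
/// Sketch: compose one batched SQL lookup for several polyfill files,
/// instead of one get() per file. Table and column names are invented
/// for illustration.
fn batched_lookup(names: &[&str]) -> (String, Vec<String>) {
    // One "?" placeholder per requested file.
    let placeholders = vec!["?"; names.len()].join(", ");
    let sql = format!("SELECT name, content FROM polyfills WHERE name IN ({placeholders})");
    (sql, names.iter().map(|s| s.to_string()).collect())
}

fn main() {
    let (sql, params) = batched_lookup(&["fetch", "Promise", "URL"]);
    // One round trip fetches all three files.
    assert_eq!(sql, "SELECT name, content FROM polyfills WHERE name IN (?, ?, ?)");
    assert_eq!(params.len(), 3);
}
```

<p>The nearest regional D1 instance can then answer the whole request with that single query.</p>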
    <div>
      <h3>cdnjs for your JavaScript libraries</h3>
      <a href="#cdnjs-for-your-javascript-libraries">
        
      </a>
    </div>
    <p>cdnjs hosts over 6,000 JavaScript libraries as of today. We are always looking for ways to improve the service and provide new content. We listen to community feedback and welcome suggestions on our <a href="https://community.cloudflare.com/">community forum</a> or on <a href="https://github.com/cdnjs">cdnjs on GitHub</a>.</p><p><a href="https://developers.cloudflare.com/page-shield/">Page Shield</a> is also available on all paid plans. Log in and turn it on with a single click to increase visibility and security for your third-party assets.</p>
            <category><![CDATA[CDNJS]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Supply Chain Attacks]]></category>
            <guid isPermaLink="false">64KHlCJKCdI1JNa3sCAqpL</guid>
            <dc:creator>Sven Sauleau</dc:creator>
            <dc:creator>Michael Tremante</dc:creator>
        </item>
        <item>
            <title><![CDATA[Wasm core dumps and debugging Rust in Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/wasm-coredumps/</link>
            <pubDate>Mon, 14 Aug 2023 13:00:33 GMT</pubDate>
            <description><![CDATA[ Debugging Rust and Wasm with Cloudflare Workers involves a lot of the good old time-consuming and nerve-wracking printf'ing strategy. What if there’s a better way? This blog is about enabling and using Wasm core dumps and how you can easily debug Rust in Cloudflare Workers ]]></description>
            <content:encoded><![CDATA[ <p>A clear sign of maturity for any new programming language or environment is how easily and efficiently it can be debugged. Programming, like any other complex task, involves various challenges and potential pitfalls. Logic errors, off-by-ones, null pointer dereferences, and memory leaks are some examples of things that can drive software developers to despair if their workflows and tools don't let them pinpoint and fix these issues quickly.</p><p><a href="https://webassembly.org/">WebAssembly</a> (Wasm) is a binary instruction format designed to be a portable and efficient target for the compilation of high-level languages like <a href="https://www.rust-lang.org/">Rust</a>, C, C++, and others. In recent years, it has gained significant traction for building high-performance applications in web and serverless environments.</p><p>Cloudflare Workers has had <a href="https://github.com/cloudflare/workers-rs">first-party support for Rust and Wasm</a> for quite some time. We've been using this powerful combination to bootstrap and build some of our most recent services, like <a href="/introducing-d1/">D1</a>, <a href="/introducing-constellation/">Constellation</a>, and <a href="/automatic-signed-exchanges/">Signed Exchanges</a>, to name a few.</p><p>Using tools like <a href="https://github.com/cloudflare/workers-sdk">Wrangler</a>, our command-line tool for building with Cloudflare developer products, makes streaming real-time logs from our applications running remotely easy. Still, to be honest, debugging Rust and Wasm with Cloudflare Workers involves a lot of the good old time-consuming and nerve-wracking <a href="https://news.ycombinator.com/item?id=26925570">printf'ing</a> strategy.</p><p>What if there’s a better way? This blog is about enabling and using Wasm core dumps and how you can easily debug Rust in Cloudflare Workers.</p>
    <div>
      <h3>What are core dumps?</h3>
      <a href="#what-are-core-dumps">
        
      </a>
    </div>
    <p>In computing, a <a href="https://en.wikipedia.org/wiki/Core_dump">core dump</a> consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has crashed or otherwise terminated abnormally. Core dumps also include things like the processor registers, stack pointer, program counter, and other information that may be relevant to fully understanding why the program crashed.</p><p>Depending on the system’s configuration, core dumps are usually initiated by the operating system in response to a program crash. You can then use a debugger like <a href="https://linux.die.net/man/1/gdb">gdb</a> to examine what happened and hopefully determine the cause of the crash. <a href="https://linux.die.net/man/1/gdb">gdb</a> allows you to run the executable to try to replicate the crash in a more controlled environment, inspect variables, and much more. The Windows equivalent of a core dump is a <a href="https://learn.microsoft.com/en-us/troubleshoot/windows-client/performance/read-small-memory-dump-file">minidump</a>. Other mature languages that are interpreted, like Python, or languages that run inside a virtual machine, like <a href="https://docs.oracle.com/javase/8/docs/technotes/guides/visualvm/coredumps.html">Java</a>, also have their own ways of generating core dumps for post-mortem analysis.</p><p>Core dumps are particularly useful for post-mortem debugging: determining the conditions that led to a failure after it has occurred.</p>
    <div>
      <h3>WebAssembly core dumps</h3>
      <a href="#webassembly-core-dumps">
        
      </a>
    </div>
    <p>WebAssembly has had a <a href="https://github.com/WebAssembly/tool-conventions/blob/main/Coredump.md">proposal for implementing core dumps</a> in discussion for a while. It's a work-in-progress experimental specification, but it provides basic support for the main ideas of post-mortem debugging, including using the <a href="https://yurydelendik.github.io/webassembly-dwarf/">DWARF</a> (debugging with attributed record formats) debug format, the same format that Linux and gdb use. Some of the most popular Wasm runtimes, like <a href="https://github.com/bytecodealliance/wasmtime/pull/5868">Wasmtime</a> and <a href="https://github.com/wasmerio/wasmer/pull/3626">Wasmer</a>, have experimental flags that you can enable to start playing with Wasm core dumps today.</p><p>If you run Wasmtime or Wasmer with the flag:</p>
            <pre><code>--coredump-on-trap=/path/to/coredump/file</code></pre>
            <p>The core dump file will be emitted at that path if a crash happens. You can then use tools like <a href="https://github.com/xtuc/wasm-coredump/tree/main/bin/wasmgdb">wasmgdb</a> to inspect the file and debug the crash.</p><p>But let's dig into how the core dumps are generated in WebAssembly, and what’s inside them.</p>
    <div>
      <h3>How are Wasm core dumps generated</h3>
      <a href="#how-are-wasm-core-dumps-generated">
        
      </a>
    </div>
    <p>(and what’s inside them)</p><p>When WebAssembly terminates execution due to abnormal behavior, we say that it entered a trap. In Rust, examples of operations that can trap are accessing out-of-bounds addresses or dividing by zero. You can read about the <a href="https://webassembly.org/docs/security/">security model of WebAssembly</a> to learn more about traps.</p><p>The core dump specification plugs into the trap workflow. When WebAssembly crashes and enters a trap, core dumping support kicks in and starts unwinding the call <a href="https://webassembly.github.io/spec/core/exec/runtime.html#stack">stack</a>, gathering debugging information. For each frame in the stack, it collects the <a href="https://webassembly.github.io/spec/core/syntax/modules.html#syntax-func">function</a> parameters and the values stored in locals and on the stack, along with binary offsets that help us map to exact locations in the source code. Finally, it snapshots the <a href="https://webassembly.github.io/spec/core/syntax/modules.html#syntax-mem">memory</a> and captures information like the <a href="https://webassembly.github.io/spec/core/syntax/modules.html#syntax-table">tables</a> and the <a href="https://webassembly.github.io/spec/core/syntax/modules.html#syntax-global">global variables</a>.</p><p><a href="https://dwarfstd.org/">DWARF</a> is used by many mature languages like C, C++, Rust, Java, or Go. By emitting DWARF information into the binary at compile time, a debugger can provide information such as the source name and the line number where the exception occurred, function and argument names, and more.
Without DWARF, the core dumps would be just pure assembly code without any contextual information or metadata related to the source code that generated it before compilation, and they would be much harder to debug.</p><p>WebAssembly <a href="https://webassembly.github.io/spec/core/appendix/custom.html#name-section">uses a (lighter) version of DWARF</a> that maps functions, or a module and local variables, to their names in the source code (you can read about the <a href="https://webassembly.github.io/spec/core/appendix/custom.html#name-section">WebAssembly name section</a> for more information), and naturally core dumps use this information.</p><p>All this information for debugging is then bundled together and saved to the file, the core dump file.</p><p>The <a href="https://github.com/WebAssembly/tool-conventions/blob/main/Coredump.md#coredump-file-format">core dump structure</a> has multiple sections, but the most important are:</p><ul><li><p>General information about the process;</p></li><li><p>The <a href="https://webassembly.github.io/threads/core/">threads</a> and their stack frames (note that WebAssembly is <a href="https://developers.cloudflare.com/workers/runtime-apis/webassembly/#threading">single threaded</a> in Cloudflare Workers);</p></li><li><p>A snapshot of the WebAssembly linear memory or only the relevant regions;</p></li><li><p>Optionally, other sections like globals, data, or table.</p></li></ul><p>Here’s the thread definition from the core dump specification:</p>
            <pre><code>corestack   ::= customsec(thread-info vec(frame))
thread-info ::= 0x0 thread-name:name ...
frame       ::= 0x0 ... funcidx:u32 codeoffset:u32 locals:vec(value)
                stack:vec(value)</code></pre>
            <p>A thread is a custom section called <code>corestack</code>. A corestack section contains the thread name and a vector (or array) of frames. Each frame contains the function index in the WebAssembly module (<code>funcidx</code>), the code offset relative to the function's start (<code>codeoffset</code>), the list of locals, and the list of values in the stack.</p><p>Values are defined as follows:</p>
            <pre><code>value ::= 0x01       =&gt; ∅
        | 0x7F n:i32 =&gt; n
        | 0x7E n:i64 =&gt; n
        | 0x7D n:f32 =&gt; n
        | 0x7C n:f64 =&gt; n</code></pre>
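<p>As a reading aid, the grammar above maps naturally onto a small data model. The following is a sketch for illustration only, not the spec’s binary reader:</p>

```rust
/// Sketch of the corestack data model. Tag bytes follow the value
/// grammar above: 0x01 = not captured, 0x7F = i32, 0x7E = i64,
/// 0x7D = f32, 0x7C = f64.
#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum Value {
    Missing,  // 0x01: the value was not captured
    I32(i32), // 0x7F
    I64(i64), // 0x7E
    F32(f32), // 0x7D
    F64(f64), // 0x7C
}

/// One frame: function index, code offset relative to the function's
/// start, and the captured locals and stack values.
#[allow(dead_code)]
struct Frame {
    funcidx: u32,
    codeoffset: u32,
    locals: Vec<Value>,
    stack: Vec<Value>,
}

/// Map a tag byte to its value type name, per the grammar above.
fn value_type(tag: u8) -> Option<&'static str> {
    match tag {
        0x01 => Some("missing"),
        0x7F => Some("i32"),
        0x7E => Some("i64"),
        0x7D => Some("f32"),
        0x7C => Some("f64"),
        _ => None,
    }
}

fn main() {
    assert_eq!(value_type(0x7F), Some("i32"));
    assert_eq!(value_type(0x42), None);
    let frame = Frame {
        funcidx: 9,
        codeoffset: 11,
        locals: vec![Value::I32(3)],
        stack: vec![],
    };
    assert_eq!(frame.locals[0], Value::I32(3));
}
```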
            <p>At the time of this writing, these are the possible <a href="https://webassembly.github.io/spec/core/binary/types.html#binary-numtype">number types</a> in a value. Again, we wanted to describe the basics; you should track the full <a href="https://github.com/WebAssembly/tool-conventions/blob/main/Coredump.md#coredump-file-format">specification</a> to get more detail or find information about future changes. WebAssembly core dump support is in its early stages of specification and implementation; things will get better, and things might change.</p><p>This is all great news. Unfortunately, however, the Cloudflare Workers <a href="https://github.com/cloudflare/workerd">runtime</a> doesn’t support WebAssembly core dumps yet. There is no technical impediment to adding this feature to <a href="https://github.com/cloudflare/workerd">workerd</a>; after all, it's <a href="https://developers.cloudflare.com/workers/learning/how-workers-works/">based on V8</a>. But since it powers a critical part of our production infrastructure and products, we tend to be conservative when it comes to adopting specifications or standards that are still considered experimental and are still going through the definition phase.</p><p>So, how do we get Wasm core dumps in Cloudflare Workers today?</p>
    <div>
      <h3>Polyfilling</h3>
      <a href="#polyfilling">
        
      </a>
    </div>
    <p>Polyfilling means using userland code to provide modern functionality in older environments that do not natively support it. <a href="https://developer.mozilla.org/en-US/docs/Glossary/Polyfill">Polyfills</a> are widely popular in the JavaScript community and the browser environment; they've been used extensively to address cases where browser vendors hadn't yet caught up with the latest standards, where they implemented the same features in different ways, or where old browsers could never support a new standard.</p><p>Meet <a href="https://github.com/xtuc/wasm-coredump/tree/main/bin/rewriter">wasm-coredump-rewriter</a>, a tool that you can use to rewrite a Wasm module and inject the core dump runtime functionality into the binary. This runtime code will catch most traps (exceptions in host functions are not yet caught, and memory violations are not caught by default) and generate a standard core dump file. To some degree, this is similar to how Binaryen's <a href="https://github.com/WebAssembly/binaryen/blob/main/src/passes/Asyncify.cpp">Asyncify</a> <a href="https://kripken.github.io/blog/wasm/2019/07/16/asyncify.html">works</a>.</p><p>Let’s look at code and see how this works. Here’s some simple pseudocode:</p>
            <pre><code>export function entry(v1, v2) {
    return addTwo(v1, v2)
}

function addTwo(v1, v2) {
  res = v1 + v2;
  throw "something went wrong";

  return res
}</code></pre>
            <p>An imaginary compiler could take that source and generate the following Wasm binary code:</p>
            <pre><code>  (func $entry (param i32 i32) (result i32)
    (local.get 0)
    (local.get 1)
    (call $addTwo)
  )

  (func $addTwo (param i32 i32) (result i32)
    (local.get 0)
    (local.get 1)
    (i32.add)
    (unreachable) ;; something went wrong
  )

  (export "entry" (func $entry))</code></pre>
            <p><i>“;;” is used to denote a comment.</i></p><p><code>entry()</code> is the Wasm function <a href="https://webassembly.github.io/spec/core/exec/runtime.html#syntax-hostfunc">exported to the host</a>. In an environment like the browser, JavaScript (being the host) can call <code>entry()</code>.</p><p>Irrelevant parts of the code have been snipped for brevity, but this is what the Wasm code will look like after <a href="https://github.com/xtuc/wasm-coredump/tree/main/bin/rewriter">wasm-coredump-rewriter</a> rewrites it:</p>
            <pre><code>  (func $entry (type 0) (param i32 i32) (result i32)
    ...
    local.get 0
    local.get 1
    call $addTwo ;; see the addTwo function below
    global.get 2 ;; is unwinding?
    if  ;; label = @1
      i32.const x ;; code offset
      i32.const 0 ;; function index
      i32.const 2 ;; local_count
      call $coredump/start_frame
      local.get 0
      call $coredump/add_i32_local
      local.get 1
      call $coredump/add_i32_local
      ...
      call $coredump/write_coredump
      unreachable
    end)

  (func $addTwo (type 0) (param i32 i32) (result i32)
    local.get 0
    local.get 1
    i32.add
    ;; the unreachable instruction was here before
    call $coredump/unreachable_shim
    i32.const 1 ;; funcidx
    i32.const 2 ;; local_count
    call $coredump/start_frame
    local.get 0
    call $coredump/add_i32_local
    local.get 1
    call $coredump/add_i32_local
    ...
    return)

  (export "entry" (func $entry))</code></pre>
            <p>As you can see, a few things changed:</p><ol><li><p>The (unreachable) instruction in <code>addTwo()</code> was replaced by a call to <code>$coredump/unreachable_shim</code>, which starts the unwinding process. Then, the location and debugging data are captured, and the function returns normally to the <code>entry()</code> caller.</p></li><li><p>Code has been added after the <code>addTwo()</code> call instruction in <code>entry()</code> that detects whether an unwinding process is in progress. If it is, it also captures the local debugging data, writes the core dump file, and then, finally, moves to the unconditional trap unreachable.</p></li></ol><p>In short, we unwind until the host function <code>entry()</code> gets destroyed by calling unreachable.</p><p>Let’s go over the runtime functions that we inject for more clarity; stay with us:</p><ul><li><p><code>$coredump/start_frame(funcidx, local_count)</code> starts a new frame in the coredump.</p></li><li><p><code>$coredump/add_*_local(value)</code> captures the values of function arguments and locals (currently, capturing values from the stack isn’t implemented).</p></li><li><p><code>$coredump/write_coredump</code> is used at the end and writes the core dump in memory. We take advantage of the first 1 KiB of the Wasm linear memory, which is unused, to store our core dump.</p></li></ul><p>A diagram is worth a thousand words:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/27DxZQioAhBsBiltjwIiyL/2dc57b370b6741120a5bb263c2795652/image2-7.png" />
            
            </figure><p>Wait, what’s this about the first 1 KiB of the memory being unused, you ask? Well, it turns out that most WebAssembly linters and tools, including <a href="https://github.com/emscripten-core/emscripten/issues/5775#issuecomment-344049528">Emscripten</a> and <a href="https://github.com/llvm/llvm-project/blob/121e15f96ce401c875e717992a4d054e308ba775/lld/wasm/Writer.cpp#L366">WebAssembly’s LLVM</a>, don’t use the first 1 KiB of memory. <a href="https://github.com/rust-lang/rust/issues/50543">Rust</a> and <a href="https://github.com/ziglang/zig/issues/4496">Zig</a> also use LLVM, but they changed the default. This isn’t pretty, but the hugely popular Asyncify polyfill relies on the same trick, so there’s reasonable support until we find a better way.</p><p>But we digress; let’s continue. After the crash, the host, typically JavaScript in the browser, can now catch the exception and extract the core dump from the Wasm instance’s memory:</p>
            <pre><code>try {
    wasmInstance.exports.someExportedFunction();
} catch(err) {
    const image = new Uint8Array(wasmInstance.exports.memory.buffer);
    writeFile("coredump." + Date.now(), image);
}</code></pre>
            <p>If you're curious about the actual details of the core dump implementation, you can find the <a href="https://github.com/xtuc/wasm-coredump/blob/main/lib/asc-coredump/assembly/coredump.ts">source code here</a>. It was written in <a href="https://www.assemblyscript.org/">AssemblyScript</a>, a TypeScript-like language for WebAssembly.</p><p>This is how we use the polyfilling technique to implement Wasm core dumps when the runtime doesn’t support them yet. Interestingly, some Wasm runtimes, being optimizing compilers, are likely to make debugging more difficult because function arguments, locals, or functions themselves can be optimized away. Polyfilling or rewriting the binary could actually preserve more source-level information for debugging.</p><p>You might be asking: what about performance? We did some testing and found that the <a href="https://github.com/xtuc/wasm-coredump-bench/blob/main/results.md">impact is negligible</a>; the cost-benefit of being able to debug our crashes is positive. Also, you can easily turn Wasm core dumps on or off for specific builds or environments; deciding when you need them is up to you.</p>
    <div>
      <h3>Debugging from a core dump</h3>
      <a href="#debugging-from-a-core-dump">
        
      </a>
    </div>
    <p>We now know how to generate a core dump, but how do we use it to diagnose and debug a software crash?</p><p>Similarly to <a href="https://en.wikipedia.org/wiki/GNU_Debugger">gdb</a> (GNU Project Debugger) on Linux, <a href="https://github.com/xtuc/wasm-coredump/blob/main/bin/wasmgdb/README.md">wasmgdb</a> is the tool you can use to parse and make sense of core dumps in WebAssembly; it understands the file structure, uses DWARF to provide naming and contextual information, and offers interactive commands to navigate the data. To exemplify how it works, <a href="https://github.com/xtuc/wasm-coredump/blob/main/bin/wasmgdb/demo.md">wasmgdb has a demo</a> of a Rust application that deliberately crashes; we will use it.</p><p>Let's imagine that our Wasm program crashed, wrote a core dump file, and we want to debug it.</p>
            <pre><code>$ wasmgdb source-program.wasm /path/to/coredump
wasmgdb&gt;</code></pre>
            <p>When you fire up wasmgdb, you enter a <a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">REPL (Read-Eval-Print Loop)</a> interface, and you can start typing commands. The tool tries to mimic the gdb command syntax; you can find the <a href="https://github.com/xtuc/wasm-coredump/blob/main/bin/wasmgdb/README.md#commands">list here</a>.</p><p>Let's examine the backtrace using the <code>bt</code> command:</p>
            <pre><code>wasmgdb&gt; bt
#18     000137 as __rust_start_panic () at library/panic_abort/src/lib.rs
#17     000129 as rust_panic () at library/std/src/panicking.rs
#16     000128 as rust_panic_with_hook () at library/std/src/panicking.rs
#15     000117 as {closure#0} () at library/std/src/panicking.rs
#14     000116 as __rust_end_short_backtrace&lt;std::panicking::begin_panic_handler::{closure_env#0}, !&gt; () at library/std/src/sys_common/backtrace.rs
#13     000123 as begin_panic_handler () at library/std/src/panicking.rs
#12     000194 as panic_fmt () at library/core/src/panicking.rs
#11     000198 as panic () at library/core/src/panicking.rs
#10     000012 as calculate (value=0x03000000) at src/main.rs
#9      000011 as process_thing (thing=0x2cff0f00) at src/main.rs
#8      000010 as main () at src/main.rs
#7      000008 as call_once&lt;fn(), ()&gt; (???=0x01000000, ???=0x00000000) at /rustc/b833ad56f46a0bbe0e8729512812a161e7dae28a/library/core/src/ops/function.rs
#6      000020 as __rust_begin_short_backtrace&lt;fn(), ()&gt; (f=0x01000000) at /rustc/b833ad56f46a0bbe0e8729512812a161e7dae28a/library/std/src/sys_common/backtrace.rs
#5      000016 as {closure#0}&lt;()&gt; () at /rustc/b833ad56f46a0bbe0e8729512812a161e7dae28a/library/std/src/rt.rs
#4      000077 as lang_start_internal () at library/std/src/rt.rs
#3      000015 as lang_start&lt;()&gt; (main=0x01000000, argc=0x00000000, argv=0x00000000, sigpipe=0x00620000) at /rustc/b833ad56f46a0bbe0e8729512812a161e7dae28a/library/std/src/rt.rs
#2      000013 as __original_main () at &lt;directory not found&gt;/&lt;file not found&gt;
#1      000005 as _start () at &lt;directory not found&gt;/&lt;file not found&gt;
#0      000264 as _start.command_export at &lt;no location&gt;</code></pre>
            <p>Each line represents a frame from the program's call <a href="https://webassembly.github.io/spec/core/exec/runtime.html#stack">stack</a>; see frame #3:</p>
            <pre><code>#3      000015 as lang_start&lt;()&gt; (main=0x01000000, argc=0x00000000, argv=0x00000000, sigpipe=0x00620000) at /rustc/b833ad56f46a0bbe0e8729512812a161e7dae28a/library/std/src/rt.rs</code></pre>
            <p>As you can read, the funcidx, function name, argument names and values, and source location are all present. Let's select frame #9 now and inspect the locals, which include the function arguments:</p>
            <pre><code>wasmgdb&gt; f 9
000011 as process_thing (thing=0x2cff0f00) at src/main.rs
wasmgdb&gt; info locals
thing: *MyThing = 0xfff1c</code></pre>
            <p>Let’s use the <code>p</code> command to inspect the content of the thing argument:</p>
            <pre><code>wasmgdb&gt; p (*thing)
thing (0xfff2c): MyThing = {
    value (0xfff2c): usize = 0x00000003
}</code></pre>
            <p>You can also use the <code>p</code> command to inspect a specific field of the variable, which is useful for nested structures:</p>
            <pre><code>wasmgdb&gt; p (*thing)-&gt;value
value (0xfff2c): usize = 0x00000003</code></pre>
            <p>And you can use <code>p</code> to inspect memory addresses. Let’s point at <code>0xfff2c</code>, the start of the <code>MyThing</code> structure, and inspect:</p>
            <pre><code>wasmgdb&gt; p (MyThing) 0xfff2c
0xfff2c (0xfff2c): MyThing = {
    value (0xfff2c): usize = 0x00000003
}</code></pre>
            <p>All this information at every step of the stack is very helpful for determining the cause of a crash. In our test case, if you look at frame #10, we triggered an integer overflow. Once you get comfortable walking through wasmgdb and using its commands to inspect the data, debugging core dumps will be another powerful skill under your belt.</p>
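<p>For reference, the failure behind frame #10 can be reproduced with plain <code>usize</code> arithmetic. The sketch below is hypothetical (the demo’s actual <code>calculate</code> body isn’t shown here); it uses <code>checked_sub</code> so the overflow condition can be observed without panicking:</p>

```rust
/// Hypothetical stand-in for the demo's `calculate`: in a debug build,
/// an expression like `value - 5` with value = 3 panics with "attempt
/// to subtract with overflow", which in Wasm becomes a trap (and, with
/// the rewriter in place, a core dump). checked_sub surfaces the same
/// condition as a None instead.
fn calculate(value: usize) -> Option<usize> {
    value.checked_sub(5)
}

fn main() {
    // value = 3, as seen in the backtrace above, underflows.
    assert_eq!(calculate(3), None);
    assert_eq!(calculate(10), Some(5));
}
```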
    <div>
      <h3>Tidying up everything in Cloudflare Workers</h3>
      <a href="#tidying-up-everything-in-cloudflare-workers">
        
      </a>
    </div>
    <p>We learned about core dumps and how they work, and we know how to make Cloudflare Workers generate them using the wasm-coredump-rewriter polyfill, but how does all this work in practice, end to end?</p><p>We've been dogfooding the technique described in this blog at Cloudflare for a while now. Wasm core dumps have been invaluable in helping us debug Rust-based services running on top of Cloudflare Workers like <a href="/introducing-d1/">D1</a>, <a href="/privacy-edge-making-building-privacy-first-apps-easier/">Privacy Edge</a>, <a href="/announcing-amp-real-url/">AMP</a>, or <a href="/introducing-constellation/">Constellation</a>.</p><p>Today we're open-sourcing the <a href="https://github.com/cloudflare/wasm-coredump">Wasm Coredump Service</a> and enabling anyone to deploy it. This service collects the Wasm core dumps originating from your projects and applications when they crash, parses them, prints an exception with the stack information in the logs, and can optionally store the full core dump in a file in an <a href="https://developers.cloudflare.com/r2/">R2 bucket</a> (which you can then use with wasmgdb) or send the exception to <a href="https://sentry.io/">Sentry</a>.</p><p>We use a <a href="https://developers.cloudflare.com/workers/configuration/bindings/about-service-bindings/">service binding</a> to facilitate the communication between your application Worker and the Coredump service Worker. A service binding allows you to send HTTP requests to another Worker without those requests going over the Internet, thus avoiding network latency or having to deal with authentication. Here’s a diagram of how it works:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/gntGbV7rjDOncMZhFP7x1/3429a64f297c0edbf3327c677d56e0d3/image1-12.png" />
            
            </figure><p>Using it is as simple as npm/yarn installing <code>@cloudflare/wasm-coredump</code>, configuring a few options, and then adding a few lines of code to your other applications running in Cloudflare Workers, in the exception handling logic:</p>
            <pre><code>import shim, { getMemory, wasmModule } from "../build/worker/shim.mjs"
import { recordCoredump } from "@cloudflare/wasm-coredump"

const timeoutSecs = 20;

async function fetch(request, env, ctx) {
    try {
        // see https://github.com/rustwasm/wasm-bindgen/issues/2724.
        return await Promise.race([
            shim.fetch(request, env, ctx),
            new Promise((r, e) =&gt; setTimeout(() =&gt; e("timeout"), timeoutSecs * 1000))
        ]);
    } catch (err) {
      const memory = getMemory();
      const coredumpService = env.COREDUMP_SERVICE;
      await recordCoredump({ memory, wasmModule, request, coredumpService });
      throw err;
    }
}</code></pre>
            <p>The <code>../build/worker/shim.mjs</code> import comes from the <a href="https://github.com/cloudflare/workers-rs/tree/main/worker-build">worker-build</a> tool, from the <a href="https://github.com/cloudflare/workers-rs/tree/main">workers-rs</a> packages, and is automatically generated when <a href="https://developers.cloudflare.com/workers/wrangler/install-and-update/">wrangler</a> builds your Rust-based Cloudflare Workers project. If the Wasm throws an exception, we catch it, extract the core dump from memory, and send it to our Core dump service.</p><p>You might have noticed that we race the <a href="https://github.com/cloudflare/workers-rs">workers-rs</a> <code>shim.fetch()</code> entry point with another Promise to generate a timeout exception if the Rust code doesn't respond in time. This is because currently, <a href="https://github.com/rustwasm/wasm-bindgen/">wasm-bindgen</a>, which generates the glue between the JavaScript and Rust worlds and is used by workers-rs, has <a href="https://github.com/rustwasm/wasm-bindgen/issues/2724">an issue</a> where a Promise might not be rejected if Rust panics asynchronously (leading to the Worker runtime killing the Worker with “Error: The script will never generate a response”). This can block the wasm-coredump code and make the core dump generation flaky.</p><p>We are working to improve this, but in the meantime, make sure to adjust <code>timeoutSecs</code> to something slightly bigger than the typical response time of your application.</p><p>Here’s an example of a Wasm core dump exception in Sentry:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/gqZyPFslc9uqCV7jEgaqW/9425701b4209952518e3aef155d9b572/image3-4.png" />
            
            </figure><p>You can find a <a href="https://github.com/cloudflare/wasm-coredump/tree/main/example">working example</a>, the Sentry and R2 configuration options, and more details in the <a href="https://github.com/cloudflare/wasm-coredump">@cloudflare/wasm-coredump</a> GitHub repository.</p>
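            <p>The race between <code>shim.fetch()</code> and the timeout Promise can be factored into a small generic helper; here is a sketch (the <code>withTimeout</code> name is ours, not part of workers-rs):</p>

```typescript
// Race a promise against a timer: resolves with the promise's value,
// or rejects with a "timeout" error if it takes longer than timeoutSecs.
// Sketch only; the `withTimeout` helper is not part of workers-rs.
function withTimeout<T>(promise: Promise<T>, timeoutSecs: number): Promise<T> {
    return Promise.race([
        promise,
        new Promise<T>((_, reject) =>
            setTimeout(() => reject(new Error("timeout")), timeoutSecs * 1000)
        ),
    ]);
}
```

            <p>The try block in the shim then becomes <code>return await withTimeout(shim.fetch(request, env, ctx), timeoutSecs)</code>, with the catch clause unchanged.</p>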
    <div>
      <h3>Too big to fail</h3>
      <a href="#too-big-to-fail">
        
      </a>
    </div>
    <p>It's worth mentioning one corner case of this debugging technique, and its solution: sometimes your codebase is so big that adding core dump and DWARF debugging information results in a Wasm binary too large to fit in a Cloudflare Worker. Worry not; we have a solution for that too.</p><p>Fortunately, the DWARF for WebAssembly specification also supports <a href="https://yurydelendik.github.io/webassembly-dwarf/#external-DWARF">external DWARF files</a>. To make this work, we have a tool called <a href="https://github.com/xtuc/wasm-coredump/tree/main/bin/debuginfo-split">debuginfo-split</a> that you can add to the build command in the <code>wrangler.toml</code> configuration:</p>
            <pre><code>command = "... &amp;&amp; debuginfo-split ./build/worker/index.wasm"</code></pre>
            <p>This strips the debugging information from the Wasm binary and writes it to a separate file called <code>debug-{UUID}.wasm</code>. You then need to upload this file to the same R2 bucket used by the Wasm Coredump Service (you can automate this as part of your CI or build scripts). The same UUID is also injected into the main Wasm binary, which allows us to correlate the binary with its corresponding DWARF debugging information. Problem solved.</p><p>Binaries without DWARF information can be significantly smaller. Here’s our example:</p>
<table>
<thead>
  <tr>
    <th><span>Size</span></th>
    <th><span>File</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>313 KiB</span></td>
    <td><span>build/worker/index.wasm</span></td>
  </tr>
  <tr>
    <td><span>4.5 MiB</span></td>
    <td><span>debug-63372dbe-41e6-447d-9c2e-e37b98e4c656.wasm</span></td>
  </tr>
</tbody>
</table>
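            <p>The correlation between the stripped binary and its external DWARF file hinges on that shared UUID. A hypothetical helper (the function name and regex are ours, not part of the tooling) to recover the UUID from a split filename could look like:</p>

```typescript
// Extract the UUID from a `debug-{UUID}.wasm` filename produced by
// debuginfo-split, so the matching DWARF file can be looked up in R2.
// Hypothetical helper: the name and regex are ours, not the tooling's.
const DEBUG_FILE =
    /^debug-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\.wasm$/;

function debugInfoUuid(filename: string): string | null {
    const match = DEBUG_FILE.exec(filename);
    return match ? match[1] : null;
}
```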
    <div>
      <h3>Final words</h3>
      <a href="#final-words">
        
      </a>
    </div>
    <p>We hope you enjoyed reading this blog as much as we did writing it, and that it helps take your Wasm debugging journeys, on Cloudflare Workers or elsewhere, to another level.</p><p>Note that while the examples here used Rust and WebAssembly, because that's a common pattern, you can apply the same techniques when compiling WebAssembly from other languages like C or C++.</p><p>Also, note that the WebAssembly core dump standard is a hot topic, and its implementations and adoption are evolving quickly. We will continue improving the <a href="https://github.com/xtuc/wasm-coredump/tree/main/bin/rewriter">wasm-coredump-rewriter</a>, <a href="https://github.com/xtuc/wasm-coredump/tree/main/bin/debuginfo-split">debuginfo-split</a>, and <a href="https://github.com/xtuc/wasm-coredump/tree/main/bin/wasmgdb">wasmgdb</a> tools and the <a href="https://github.com/cloudflare/wasm-coredump">wasm-coredump service</a>. More and more runtimes, including V8, will eventually support core dumps natively, eliminating the need for polyfills, and the tooling in general will get better; that's a certainty. For now, we present you with a solution that works today, and we have strong incentives to keep supporting it.</p><p>As usual, you can talk to us on our <a href="https://discord.cloudflare.com/">Developers Discord</a> or the <a href="https://community.cloudflare.com/c/developers/constellation/97">Community forum</a>, or open issues or PRs in our GitHub repositories; the team will be listening.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[WASM]]></category>
            <category><![CDATA[WebAssembly]]></category>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">7xtevgzV4ycZa3fIFTQOP5</guid>
            <dc:creator>Sven Sauleau</dc:creator>
            <dc:creator>Celso Martinho</dc:creator>
        </item>
        <item>
            <title><![CDATA[Welcome to Wildebeest: the Fediverse on Cloudflare]]></title>
            <link>https://blog.cloudflare.com/welcome-to-wildebeest-the-fediverse-on-cloudflare/</link>
            <pubDate>Wed, 08 Feb 2023 19:00:00 GMT</pubDate>
            <description><![CDATA[ Today we're announcing Wildebeest, an open-source, easy-to-deploy ActivityPub and Mastodon-compatible server built entirely on top of Cloudflare's Supercloud. ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5BbRRixkMxcIiNYgdA9go/f2d1e27e932958951271d36ccffa2c16/Wildebeest.png" />
            
            </figure><p><a href="https://en.wikipedia.org/wiki/Fediverse">The Fediverse</a> has been a hot topic of discussion lately, with thousands, if not <a href="https://bitcoinhackers.org/@mastodonusercount">millions</a>, of new users creating accounts on platforms like <a href="https://joinmastodon.org/">Mastodon</a> to either move entirely to "the other side" or experiment and learn about this new social network.</p><p>Today we're introducing <a href="https://github.com/cloudflare/wildebeest">Wildebeest</a>, an open-source, easy-to-deploy ActivityPub and Mastodon-compatible server built entirely on top of Cloudflare's Supercloud. If you want to run your own spot in the Fediverse you can now do it entirely on Cloudflare.</p>
    <div>
      <h2>The Fediverse, built on Cloudflare</h2>
      <a href="#the-fediverse-built-on-cloudflare">
        
      </a>
    </div>
    <p>Today you're left with two options if you want to join the Mastodon federated network: either you join one of the <a href="https://joinmastodon.org/servers">existing servers</a> (servers are also called communities, and each one has its own infrastructure and rules), or you can run your own self-hosted server.</p><p>There are a few reasons why you'd want to run your own server:</p><ul><li><p>You want to create a new community and attract other users over a common theme and usage rules.</p></li><li><p>You don't want to have to trust third-party servers or abide by their policies and want your server, under your domain, for your personal account.</p></li><li><p>You want complete control over your data, personal information, and content, and visibility over what happens with your instance.</p></li></ul><p>The Mastodon gGmbH non-profit organization provides a server implementation using Ruby, Node.js, PostgreSQL, and Redis. Running the <a href="https://github.com/mastodon/mastodon">official server</a> can be challenging, though. You need to own or rent a server or VPS somewhere; you have to install and configure the software, set up the database and public-facing web server, and configure and protect your network against attacks or abuse. And then you have to maintain all of that and deal with constant updates. It's a lot of scripting and technical work before you can get it up and running; definitely not something for less technical enthusiasts.</p><p>Wildebeest serves two purposes: you can quickly deploy your Mastodon-compatible server on top of Cloudflare and connect it to the Fediverse in minutes, and you don't need to worry about maintaining or protecting it from abuse or attacks; Cloudflare will do it for you automatically.</p><p>Wildebeest is not a managed service. It's your instance, data, and code running in our cloud under your Cloudflare account. 
Furthermore, it's <a href="https://github.com/cloudflare/wildebeest">open-sourced</a>, which means it keeps evolving with more features, and anyone can <a href="https://github.com/cloudflare/wildebeest/pulls">extend</a> and improve it.</p><p>Here's what we support today:</p><ul><li><p><a href="https://www.w3.org/TR/activitypub/">ActivityPub</a>, <a href="https://www.rfc-editor.org/rfc/rfc7033">WebFinger</a>, <a href="https://github.com/cloudflare/wildebeest/tree/main/functions/nodeinfo">NodeInfo</a>, <a href="https://datatracker.ietf.org/doc/html/rfc8030">WebPush</a>, and <a href="https://docs.joinmastodon.org/api/">Mastodon-compatible</a> APIs. Wildebeest can connect to or receive connections from other Fediverse servers.</p></li><li><p>Compatible with the most popular Mastodon web clients (like <a href="https://github.com/nolanlawson/pinafore">Pinafore</a>), desktop, and <a href="https://joinmastodon.org/apps">mobile clients</a>. We also provide a simple read-only web interface to explore the timelines and user profiles.</p></li><li><p>You can publish, edit, boost, or delete posts, sorry, toots. We support text, images, and (soon) video.</p></li><li><p>Anyone can follow you; you can follow anyone.</p></li><li><p>You can search for content.</p></li><li><p>You can register one or multiple accounts under your instance. Authentication can be email-based or use any Cloudflare Access-compatible IdP, like GitHub or Google.</p></li><li><p>You can edit your profile information, avatar, and header image.</p></li></ul>
    <div>
      <h2>How we built it</h2>
      <a href="#how-we-built-it">
        
      </a>
    </div>
    <p>Our implementation is built entirely on top of our <a href="https://www.cloudflare.com/cloudflare-product-portfolio/">products</a> and <a href="https://developers.cloudflare.com/">APIs</a>. Building Wildebeest was another excellent opportunity to showcase our technology stack's power and versatility and prove how anyone can also use Cloudflare to build larger applications that involve multiple systems and complex requirements.</p><p>Here's a birds-eye diagram of Wildebeest's architecture:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/33R5UHXYSBDBUsoFLMkoC8/0304880c93af0a41d168616da4c73b90/Screenshot-2023-02-08-at-10.58.01-AM.png" />
            
            </figure><p>Let's get into the details and get technical now.</p>
    <div>
      <h3>Cloudflare Pages</h3>
      <a href="#cloudflare-pages">
        
      </a>
    </div>
    <p>At the core, Wildebeest is a <a href="https://pages.cloudflare.com/">Cloudflare Pages</a> project running its code using <a href="https://developers.cloudflare.com/pages/platform/functions/">Pages Functions</a>. Cloudflare Pages provides an excellent foundation for building and deploying your application and serving your bundled assets, while Functions gives you full access to the Workers ecosystem, where you can run any code.</p><p>Functions has a built-in <a href="https://developers.cloudflare.com/pages/platform/functions/routing/">file-based router</a>. The <a href="https://github.com/cloudflare/wildebeest/tree/main/functions">/functions</a> directory structure, which is uploaded by Wildebeest’s continuous deployment builds, defines your application routes and what files and code will process each HTTP endpoint request. This routing technique is similar to what other frameworks like Next.js <a href="https://nextjs.org/docs/routing/introduction">use</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5NsNlFYtyqKVzhFFBuGeRW/767c9b102b9d97ad067c343df387c5db/2b.png" />
            
            </figure><p>For example, Mastodon’s <a href="https://docs.joinmastodon.org/methods/timelines/#public">/api/v1/timelines/public</a> API endpoint is handled by <a href="https://github.com/cloudflare/wildebeest/blob/main/functions/api/v1/timelines/public.ts">/functions/api/v1/timelines/public.ts</a> with the onRequest method.</p>
            <pre><code>export const onRequest = async ({ request, env }) =&gt; {
	const { searchParams } = new URL(request.url)
	const domain = new URL(request.url).hostname
...
	return handleRequest(domain, env.DATABASE, {})
}

export async function handleRequest(
    …
): Promise&lt;Response&gt; {
    …
}
</code></pre>
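            <p>The file-based routing convention can be sketched as a pure mapping from a file path under <code>/functions</code> to the route it serves; this is a simplification of ours that ignores dynamic <code>[param]</code> and catch-all <code>[[path]]</code> segments:</p>

```typescript
// Map a Pages Functions file path to the HTTP route it serves.
// Simplified sketch: ignores dynamic [param] and catch-all [[path]] routes.
function fileToRoute(filePath: string): string {
    return filePath
        .replace(/^\/functions/, "")   // strip the /functions prefix
        .replace(/\.(ts|js)$/, "")     // drop the extension
        .replace(/\/index$/, "/");     // index files map to the directory root
}
```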
            <p>Unit testing these endpoints becomes easier too, since we only have to call the handleRequest() function from the testing framework. Check one of our <a href="https://jestjs.io/">Jest</a> tests, <a href="https://github.com/cloudflare/wildebeest/blob/main/backend/test/mastodon.spec.ts">mastodon.spec.ts</a>:</p>
            <pre><code>import * as v1_instance from 'wildebeest/functions/api/v1/instance'

describe('Mastodon APIs', () =&gt; {
	describe('instance', () =&gt; {
		test('return the instance infos v1', async () =&gt; {
			const res = await v1_instance.handleRequest(domain, env)
			assert.equal(res.status, 200)
			assertCORS(res)

			const data = await res.json&lt;Data&gt;()
			assert.equal(data.rules.length, 0)
			assert(data.version.includes('Wildebeest'))
		})
	})
})
</code></pre>
            <p>As with any other regular Worker, Functions also lets you set up <a href="https://developers.cloudflare.com/pages/platform/functions/bindings/">bindings</a> to interact with other Cloudflare products and features like <a href="https://developers.cloudflare.com/workers/runtime-apis/kv/">KV</a>, <a href="https://developers.cloudflare.com/r2/data-access/workers-api/workers-api-reference/">R2</a>, <a href="https://developers.cloudflare.com/d1/">D1</a>, <a href="https://developers.cloudflare.com/workers/runtime-apis/durable-objects/">Durable Objects</a>, and more. The list keeps growing.</p><p>We use Functions to implement a large portion of the official <a href="https://docs.joinmastodon.org/api/">Mastodon API</a> specification, making Wildebeest compatible with the existing ecosystem of other servers and client applications, and also to run our own read-only web frontend under the same project codebase.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Wz8EZKQyMvyEfDvH7cOV9/02183c976fe7c619c2fc4f8e99795463/3b.png" />
            
            </figure><p>Wildebeest’s web frontend uses <a href="https://qwik.builder.io/">Qwik</a>, a general-purpose web framework that is optimized for speed, uses modern concepts like the JSX JavaScript syntax extension and supports server-side-rendering (SSR) and static site generation (SSG).</p><p>Qwik provides a <a href="https://qwik.builder.io/integrations/deployments/cloudflare-pages/">Cloudflare Pages Adaptor</a> out of the box, so we use that (check our <a href="https://developers.cloudflare.com/pages/framework-guides/deploy-a-qwik-site/">framework guide</a> to know more about how to deploy a Qwik site on Cloudflare Pages). For styling we use the <a href="https://tailwindcss.com/">Tailwind CSS</a> framework, which Qwik supports natively.</p><p>Our frontend website code and static assets can be found under the <a href="https://github.com/cloudflare/wildebeest/tree/main/frontend">/frontend</a> directory. The application is handled by the <a href="https://github.com/cloudflare/wildebeest/blob/main/functions/%5B%5Bpath%5D%5D.ts">/functions/[[path]].js</a> dynamic route, which basically catches all the non-API requests, and then <a href="https://github.com/cloudflare/wildebeest/blob/main/frontend/src/entry.cloudflare-pages.tsx">invokes</a> Qwik’s own internal router, <a href="https://qwik.builder.io/qwikcity/routing/overview/">Qwik City</a>, which takes over everything else after that.</p><p>The power and versatility of Pages and Functions routes make it possible to run both the backend APIs and a server-side-rendered dynamic client, effectively a full-stack app, under the same project.</p><p>Let's dig even deeper now, and understand how the server interacts with the other components in our architecture.</p>
    <div>
      <h3>D1</h3>
      <a href="#d1">
        
      </a>
    </div>
    <p>Wildebeest uses <a href="https://developers.cloudflare.com/d1/">D1</a>, <a href="https://www.cloudflare.com/developer-platform/products/d1/">Cloudflare’s first SQL database</a> for the Workers platform built on top of SQLite, now open to everyone in <a href="/d1-open-alpha/">alpha</a>, to store and query data. Here’s our schema:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/67Xq9kbn6qh2XgRveGSdHt/3a020d6c71a89f0020b8fb7e87433601/4b.png" />
            
            </figure><p>The schema will probably change in the future, as we add more features. That’s fine: D1 supports <a href="https://developers.cloudflare.com/d1/platform/migrations/">migrations</a>, which are great when you need to update your database schema without losing your data. With each new Wildebeest version, we can create a <a href="https://github.com/cloudflare/wildebeest/blob/main/migrations/0001_add-unique-following.sql">new migration</a> file if it requires database schema changes.</p>
            <pre><code>-- Migration number: 0001 	 2023-01-16T13:09:04.033Z

CREATE UNIQUE INDEX unique_actor_following ON actor_following (actor_id, target_actor_id);
</code></pre>
            <p>D1 exposes a powerful <a href="https://developers.cloudflare.com/d1/platform/client-api/">client API</a> that developers can use to manipulate and query data from Worker scripts, or in our case, Pages Functions.</p><p>Here’s a simplified example of how we interact with D1 when you start following someone on the Fediverse:</p>
            <pre><code>export async function addFollowing(db, actor, target, targetAcct): Promise&lt;UUID&gt; {
	const id = crypto.randomUUID()
	const query = `INSERT OR IGNORE INTO actor_following (id, actor_id, target_actor_id, state, target_actor_acct) VALUES (?, ?, ?, ?, ?)`
	const out = await db
		.prepare(query)
		.bind(id, actor.id.toString(), target.id.toString(), STATE_PENDING, targetAcct)
		.run()
	return id
}
</code></pre>
            <p>Cloudflare’s culture of dogfooding and building on top of our own products means that we sometimes experience their shortcomings before our users. We did face a few challenges using D1, which is built on SQLite, to store our data. Here are two examples.</p><p><a href="https://www.w3.org/TR/activitypub/">ActivityPub</a> uses <a href="https://www.rfc-editor.org/rfc/rfc4122.txt">UUIDs</a> extensively to identify objects and reference them in URIs. These objects need to be stored in the database. Other databases like PostgreSQL provide built-in functions to <a href="https://www.postgresql.org/docs/current/functions-uuid.html">generate unique identifiers</a>. SQLite and D1 don't have that yet; it’s on our roadmap.</p><p>Worry not, though: the Workers runtime supports <a href="https://developers.cloudflare.com/workers/runtime-apis/web-crypto/">Web Crypto</a>, so we use crypto.randomUUID() to get our unique identifiers. Check <a href="https://github.com/cloudflare/wildebeest/blob/main/backend/src/activitypub/actors/inbox.ts">/backend/src/activitypub/actors/inbox.ts</a>:</p>
            <pre><code>export async function addObjectInInbox(db, actor, obj) {
	const id = crypto.randomUUID()
	const out = await db
		.prepare('INSERT INTO inbox_objects(id, actor_id, object_id) VALUES(?, ?, ?)')
		.bind(id, actor.id.toString(), obj.id.toString())
		.run()
}</code></pre>
            <p>Problem solved.</p><p>The other example is that we need to store dates with sub-second resolution. Again, databases like PostgreSQL have that:</p>
            <pre><code>psql&gt; select now();
2023-02-01 11:45:17.425563+00</code></pre>
            <p>However SQLite falls short with:</p>
            <pre><code>sqlite&gt; select datetime();
2023-02-01 11:44:02</code></pre>
            <p>We worked around this problem with a small hack using <a href="https://www.sqlite.org/lang_datefunc.html">strftime()</a>:</p>
            <pre><code>sqlite&gt; select strftime('%Y-%m-%d %H:%M:%f', 'NOW');
2023-02-01 11:49:35.624</code></pre>
            <p>See our <a href="https://github.com/cloudflare/wildebeest/blob/main/migrations/0000_initial.sql">initial SQL schema</a>, look for the <i>cdate</i> defaults.</p>
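            <p>If you ever need the same sub-second format from the Workers side, a JavaScript <code>Date</code> can produce it directly. Here’s a sketch (the <code>sqliteTimestamp</code> helper is ours; Wildebeest itself relies on the SQL column default):</p>

```typescript
// Format a Date as `YYYY-MM-DD HH:MM:SS.SSS` in UTC, matching the output
// of SQLite's strftime('%Y-%m-%d %H:%M:%f', 'NOW').
// Sketch only; this helper is not part of the Wildebeest codebase.
function sqliteTimestamp(date: Date): string {
    return date.toISOString().replace("T", " ").replace("Z", "");
}
```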
    <div>
      <h3>Images</h3>
      <a href="#images">
        
      </a>
    </div>
    <p>Mastodon content has a lot of rich media. We don't need to reinvent the wheel and build an image pipeline; Cloudflare Images <a href="https://developers.cloudflare.com/images/">provides APIs</a> to upload, transform, and serve optimized images from our global CDN, so it's the perfect fit for Wildebeest's requirements.</p><p>Things like posting content images, the profile avatar, or headers, all use the Images APIs. See <a href="https://github.com/cloudflare/wildebeest/blob/main/backend/src/media/image.ts">/backend/src/media/image.ts</a> to understand how we interface with Images.</p>
            <pre><code>async function upload(file: File, config: Config): Promise&lt;UploadResult&gt; {
	const formData = new FormData()
	const url = `https://api.cloudflare.com/client/v4/accounts/${config.accountId}/images/v1`

	formData.set('file', file)

	const res = await fetch(url, {
		method: 'POST',
		body: formData,
		headers: {
			authorization: 'Bearer ' + config.apiToken,
		},
	})

	const data = await res.json()
	return data.result
}</code></pre>
            <p>If you're curious about Images for your next project, here's a tutorial on <a href="https://developers.cloudflare.com/images/cloudflare-images/tutorials/integrate-cloudflare-images/">how to integrate Cloudflare Images</a> on your website.</p><p>Cloudflare Images is also available from the dashboard. You can use it to browse or manage your catalog quickly.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1A4gwBFdbSGDvS4DAJRyhR/95849178c4b10c82d5f619ffc1153ba0/5b.png" />
            
            </figure>
    <div>
      <h3>Queues</h3>
      <a href="#queues">
        
      </a>
    </div>
    <p>The <a href="https://www.w3.org/TR/activitypub/">ActivityPub</a> protocol is chatty by design. Depending on the size of your social graph, there might be a lot of back-and-forth HTTP traffic. We can’t have the clients blocked waiting for hundreds of Fediverse message deliveries every time someone posts something.</p><p>We needed a way to work asynchronously and launch background jobs to offload data processing away from the main app and keep the clients snappy. The official Mastodon server has a similar strategy using <a href="https://docs.joinmastodon.org/admin/scaling/#sidekiq">Sidekiq</a> to do background processing.</p><p>Fortunately, we don't need to worry about any of this complexity either. <a href="https://developers.cloudflare.com/queues/">Cloudflare Queues</a> allows developers to send and receive messages with guaranteed delivery, and offload work from your Workers' requests, effectively providing you with asynchronous batch job capabilities.</p><p>To put it simply: you have a queue topic identifier, which is basically a buffered list that scales automatically; one or more producers that create structured messages (JSON objects in our case; you define their schema) and put them in the queue; and one or more consumers that subscribe to that queue, receive its messages, and process them at their own speed.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5m1TSSTZesMX1jt7K7YpHS/c192aa543426e12c03b2c753f4e4b8c4/6b.png" />
            
            </figure><p>Here’s the <a href="https://developers.cloudflare.com/queues/learning/how-queues-works/">How Queues works</a> page for more information.</p><p>In our case, the main application produces queue jobs whenever any incoming API call requires long, expensive operations. For example, when someone posts, sorry, <i>toots</i> something, we need to broadcast that to their followers' inboxes, potentially triggering many requests to remote servers. <a href="https://github.com/cloudflare/wildebeest/blob/main/backend/src/activitypub/deliver.ts">Here we are</a> queueing a job for that, thus freeing the APIs to keep responding:</p>
            <pre><code>export async function deliverFollowers(
	db: D1Database,
	from: Actor,
	activity: Activity,
	queue: Queue
) {
	const followers = await getFollowers(db, from)

	const messages = followers.map((id) =&gt; {
		const body = {
			activity: JSON.parse(JSON.stringify(activity)),
			actorId: from.id.toString(),
			toActorId: id,
		}
		return { body }
	})

	await queue.sendBatch(messages)
}</code></pre>
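            <p>Note that <code>sendBatch()</code> accepts a limited number of messages per call (see the Queues limits documentation for the current cap), so an account with a very large follower list may need to split the batch. Here’s a sketch of a generic chunking helper (the helper is ours, not Wildebeest code):</p>

```typescript
// Split an array into chunks of at most `size` elements, so each
// chunk can be passed to a separate queue.sendBatch() call.
// Sketch; the actual per-call limit is defined by the Queues docs.
function chunk<T>(items: T[], size: number): T[][] {
    const chunks: T[][] = [];
    for (let i = 0; i < items.length; i += size) {
        chunks.push(items.slice(i, i + size));
    }
    return chunks;
}
```

            <p>The delivery step would then become <code>for (const part of chunk(messages, BATCH_LIMIT)) await queue.sendBatch(part)</code>, with <code>BATCH_LIMIT</code> set according to the documented cap.</p>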
            <p>Similarly, we don't want to stop the main APIs when remote servers deliver messages to our instance inboxes. Here's Wildebeest creating asynchronous jobs when it <a href="https://github.com/cloudflare/wildebeest/blob/main/functions/ap/users/%5Bid%5D/inbox.ts">receives messages</a> in the inbox:</p>
            <pre><code>export async function handleRequest(
	domain: string,
	db: D1Database,
	id: string,
	activity: Activity,
	queue: Queue,
): Promise&lt;Response&gt; {
	const handle = parseHandle(id)

	const actorId = actorURL(domain, handle.localPart)
	const actor = await actors.getPersonById(db, actorId)

	// creates job
	await queue.send({
		type: MessageType.Inbox,
		actorId: actor.id.toString(),
		activity,
	})

	// frees the API
	return new Response('', { status: 200 })
}</code></pre>
            <p>And the final piece of the puzzle, our <a href="https://github.com/cloudflare/wildebeest/tree/main/consumer">queue consumer</a> runs in a separate Worker, independently from the Pages project. The consumer listens for new messages and processes them sequentially, at its rhythm, freeing everyone else from blocking. When things get busy, the queue grows its buffer. Still, things keep running, and the jobs will eventually get dispatched, freeing the main APIs for the critical stuff: responding to remote servers and clients as quickly as possible.</p>
            <pre><code>export default {
	async queue(batch, env, ctx) {
		for (const message of batch.messages) {
			…

			switch (message.body.type) {
				case MessageType.Inbox: {
					await handleInboxMessage(...)
					break
				}
				case MessageType.Deliver: {
					await handleDeliverMessage(...)
					break
				}
			}
		}
	},
}</code></pre>
            <p>If you want to get your hands dirty with Queues, here’s a simple example on <a href="https://developers.cloudflare.com/queues/examples/send-errors-to-r2/">Using Queues to store data in R2</a>.</p>
    <div>
      <h3>Caching and Durable Objects</h3>
      <a href="#caching-and-durable-objects">
        
      </a>
    </div>
    <p>Caching repetitive operations is yet another strategy for improving performance in complex applications that require data processing. A famous Netscape developer, Phil Karlton, once said: "There are only two hard things in Computer Science: <b>cache invalidation</b> and naming things."</p><p>Cloudflare obviously knows a lot about caching since <a href="https://developers.cloudflare.com/cache/">it's a core feature</a> of our global CDN. We also provide <a href="https://developers.cloudflare.com/workers/learning/how-kv-works/">Workers KV</a> to our customers, a global, low-latency, key-value data store that anyone can use to cache data objects in our data centers and build fast websites and applications.</p><p>However, KV achieves its performance by being eventually consistent. While this is fine for many applications and use cases, it's not ideal for others.</p><p>The ActivityPub protocol is highly transactional and can't afford eventual consistency. Here's an example: generating complete timelines is expensive, so we cache that operation. However, when you post something, we need to invalidate that cache before we reply to the client. Otherwise, the new post won't be in the timeline and the client can fail with an error because it doesn’t see it. This actually happened to us with one of the most popular clients.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7mBZfs5UZumkHzh9ITpUSn/1f9d5e53e7d61417d962a9fd566df9e6/7b.png" />
            
            </figure><p>We needed to get clever. The team discussed a few options. Fortunately, our API catalog has plenty of options. Meet <a href="https://developers.cloudflare.com/workers/learning/using-durable-objects/">Durable Objects</a>.</p><p>Durable Objects are single-instance Workers that provide a transactional storage API. They're ideal when you need central coordination, strong consistency, and state persistence. You can use Durable Objects in cases like handling the state of <a href="https://developers.cloudflare.com/workers/learning/using-websockets/#durable-objects-and-websocket-state">multiple WebSocket</a> connections, coordinating and routing messages in a <a href="https://github.com/cloudflare/workers-chat-demo">chatroom</a>, or even <a href="/doom-multiplayer-workers/">running a multiplayer game like Doom</a>.</p><p>You know where this is going now. Yes, we implemented our key-value caching subsystem for Wildebeest <a href="https://github.com/cloudflare/wildebeest/tree/main/do">on top of a Durable Object</a>. By taking advantage of the DO's native transactional storage API, we can have strong guarantees that whenever we create or change a key, the next read will always return the latest version.</p><p>The idea is so simple and effective that it took us literally a <a href="https://github.com/cloudflare/wildebeest/blob/main/do/src/index.ts">few lines of code</a> to implement a key-value cache with two primitives: HTTP PUT and GET.</p>
            <pre><code>export class WildebeestCache {
	constructor(state: DurableObjectState) {
		this.storage = state.storage
	}

	async fetch(request: Request) {
		if (request.method === 'GET') {
			const { pathname } = new URL(request.url)
			const key = pathname.slice(1)
			const value = await this.storage.get(key)
			return new Response(JSON.stringify(value))
		}

		if (request.method === 'PUT') {
			const { key, value } = await request.json()
			await this.storage.put(key, value)
			return new Response('', { status: 201 })
		}
	}
}</code></pre>
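            <p>Callers reach this Durable Object through its stub’s <code>fetch()</code> interface. Here’s a sketch of a small client wrapper (the helper names and the simplified <code>Fetcher</code> type are ours, not from the Wildebeest codebase):</p>

```typescript
// Minimal client for the WildebeestCache Durable Object shown above.
// `stub` is anything exposing fetch(), e.g. a DurableObjectStub.
// Sketch only; helper names and the Fetcher type are ours.
type Fetcher = {
    fetch(input: string, init?: { method?: string; body?: string }): Promise<Response>;
};

async function cachePut(stub: Fetcher, key: string, value: unknown): Promise<void> {
    await stub.fetch("https://cache/", {
        method: "PUT",
        body: JSON.stringify({ key, value }),
    });
}

async function cacheGet<T>(stub: Fetcher, key: string): Promise<T> {
    const res = await stub.fetch(`https://cache/${key}`);
    return (await res.json()) as T;
}
```

            <p>A Pages Function would obtain the stub from its Durable Object binding (for example, <code>env.DO_CACHE.get(id)</code>; the binding name here is hypothetical) and pass it to these helpers.</p>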
            <p>Strong consistency it is. Let's move to user registration and authentication now.</p>
    <div>
      <h3>Zero Trust Access</h3>
      <a href="#zero-trust-access">
        
      </a>
    </div>
    <p>The official Mastodon server <a href="https://docs.joinmastodon.org/user/signup/">handles user registrations</a>, typically using email, before you can choose your local username and start using the service. Building user registration and authentication from scratch, though, would be daunting and time-consuming.</p><p>Furthermore, people don't want to create new credentials for every new service they want to use and instead want more convenient OAuth-like authorization and authentication methods so that they can reuse their existing Apple, Google, or GitHub accounts.</p><p>We wanted to simplify things using Cloudflare’s built-in features. Needless to say, we have a product that handles user onboarding, authentication, and <a href="https://developers.cloudflare.com/cloudflare-one/policies/access/policy-management/">access policies</a> to any application behind Cloudflare; it's called <a href="https://developers.cloudflare.com/cloudflare-one/">Zero Trust</a>. So we put Wildebeest behind it.</p><p>Zero Trust Access can either do one-time PIN (<a href="https://en.wikipedia.org/wiki/One-time_password">OTP</a>) authentication using email or single sign-on (SSO) with many identity providers (for example, Google, Facebook, GitHub, or LinkedIn), including any generic one supporting <a href="https://developers.cloudflare.com/cloudflare-one/identity/idp-integration/generic-saml/">SAML 2.0</a>.</p><p>When you start using Wildebeest with a client, you don't need to register at all. Instead, you go straight to log in, which will redirect you to the Access page and handle the authentication according to the policy that you, the owner of your instance, configured.</p><p>The policy defines who can authenticate, and how.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1zDpfgueYrKRmhmNvHCBGX/68b6b579fcb33110566b07ea6e5a3d3e/8b.png" />
            
            </figure><p>When authenticated, Access will redirect you back to Wildebeest. The first time this happens, we will detect that we don't have information about the user and ask for your Username and Display Name. You will be asked only once, and this is what will be used to create your public Mastodon profile.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/76J7DmTtShD7slpawYXNAE/ccc908ed0dffb75a7ce6afc7b0b55510/9b.png" />
            
            </figure><p>Technically, Wildebeest implements the <a href="https://docs.joinmastodon.org/spec/oauth/#implementation">OAuth 2 specification</a>. <a href="https://www.cloudflare.com/learning/security/glossary/what-is-zero-trust/">Zero Trust</a> protects the <a href="https://github.com/cloudflare/wildebeest/blob/main/functions/oauth/authorize.ts">/oauth/authorize</a> endpoint and issues a valid <a href="https://developers.cloudflare.com/cloudflare-one/identity/authorization-cookie/validating-json/">JWT token</a> in the request headers when the user is authenticated. Wildebeest then reads and verifies the JWT and returns an authorization code in the URL redirect.</p><p>Once the client has an authorization code, it can use the <a href="https://github.com/cloudflare/wildebeest/blob/main/functions/oauth/token.ts">/oauth/token</a> endpoint to obtain an API access token. Subsequent API calls inject a bearer token in the Authorization header:</p><p><code>Authorization: Bearer access_token</code></p>
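<p>As an illustration of that last step, here's a hypothetical helper (not Wildebeest's actual code) showing how a server-side handler might extract the bearer token from the Authorization header:</p>

```typescript
// Hypothetical sketch: parse an "Authorization: Bearer <access_token>" header.
// Returns the access token, or null when the header is missing or malformed.
export function parseBearerToken(header: string | null): string | null {
	if (header === null) return null
	const [scheme, token, ...rest] = header.trim().split(/\s+/)
	if (scheme !== 'Bearer' || !token || rest.length > 0) return null
	return token
}
```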
    <div>
      <h3>Deployment and Continuous Integration</h3>
      <a href="#deployment-and-continuous-integration">
        
      </a>
    </div>
    <p>We didn't want to run a managed service for Mastodon as it would somewhat diminish the concepts of federation and data ownership. Also, we recognize that ActivityPub and Mastodon are emerging, fast-paced technologies that will evolve quickly and in ways that are difficult to predict just yet.</p><p>For these reasons, we thought the best way to help the ecosystem right now would be to provide an open-source software package that anyone could use, customize, improve, and deploy on top of our cloud. Cloudflare will obviously keep improving Wildebeest and support the community, but we want to give our Fediverse maintainers complete control and ownership of their instances and data.</p><p>The remaining question was, how do we distribute the Wildebeest bundle and make it easy to deploy into someone's account when it requires configuring so many Cloudflare features, and how do we facilitate updating the software over time?</p><p>The solution ended up being a clever mix of using GitHub with <a href="https://github.com/features/actions">GitHub Actions</a>, <a href="https://developers.cloudflare.com/workers/platform/deploy-button/">Deploy with Workers</a>, and <a href="https://github.com/cloudflare/terraform-provider-cloudflare">Terraform</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5V8fRfu3U03n2ZVNtsh01L/404637763a8476b425a562ff5bbf8739/Screenshot-2023-02-08-at-11.13.05-AM-1.png" />
            
            </figure><p>The Deploy with Workers button is a specially crafted link that auto-generates a workflow page where the user gets asked some questions, and Cloudflare handles authorizing GitHub to deploy to Workers, automatically forks the Wildebeest repository into the user's account, and then configures and deploys the project using a <a href="https://github.com/marketplace/actions/deploy-to-cloudflare-workers-with-wrangler">GitHub Actions</a> workflow.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3MhyoOAbQEjlNnEhwl70Jm/5000c8c1dc1dfea549ee6ca62f8460b4/10b.png" />
            
            </figure><p>A GitHub Actions <a href="https://docs.github.com/en/actions/using-workflows/about-workflows">workflow</a> is a YAML file that declares what to do in every step. Here’s the <a href="https://github.com/cloudflare/wildebeest/blob/main/.github/workflows/deploy.yml">Wildebeest workflow</a> (simplified):</p>
            <pre><code>name: Deploy
on:
  push:
    branches:
      - main
  repository_dispatch:
jobs:
  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - name: Ensure CF_DEPLOY_DOMAIN and CF_ZONE_ID are defined
        ...
      - name: Create D1 database
        uses: cloudflare/wrangler-action@2.0.0
        with:
          command: d1 create wildebeest-${{ env.OWNER_LOWER }}
        ...
      - name: retrieve Zero Trust organization
        ...
      - name: retrieve Terraform state KV namespace
        ...
      - name: download VAPID keys
        ...
      - name: Publish DO
        ...
      - name: Configure
        run: terraform plan &amp;&amp; terraform apply -auto-approve
      - name: Create Queue
        ...
      - name: Publish consumer
        ...
      - name: Publish
        uses: cloudflare/wrangler-action@2.0.0
        with:
          command: pages publish --project-name=wildebeest-${{ env.OWNER_LOWER }} .</code></pre>
            
    <div>
      <h4>Updating Wildebeest</h4>
      <a href="#updating-wildebeest">
        
      </a>
    </div>
    <p>This workflow runs automatically every time the main branch changes, so updating Wildebeest is as easy as synchronizing the upstream official repository with the fork. You don't even need to use git commands for that; GitHub provides a convenient Sync button in the UI that you can simply click.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6vkcs7XzLMZdihq7z5n2L5/b83e8499970012ebcf0b47e686b6518a/11b.png" />
            
            </figure><p>What's more, updates are incremental and non-destructive. When the GitHub Actions workflow redeploys Wildebeest, we only make the necessary changes to your configuration and nothing else. You don't lose your data; we don't need to delete your existing configurations. Here’s how we achieved this:</p><p>We use <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs">Terraform</a>, a declarative configuration language and tool that interacts with our APIs and can query and configure your Cloudflare features. Here's the trick: whenever we apply a new configuration, we keep a copy of the Terraform state for Wildebeest in a <a href="https://developers.cloudflare.com/workers/learning/how-kv-works/">Cloudflare KV</a> key. When a new deployment is triggered, we get that state from the KV copy, calculate the differences, then change only what's necessary.</p><p>Data loss is not a problem either because, as you read above, D1 supports <a href="https://developers.cloudflare.com/d1/platform/migrations/">migrations</a>. If we need to add a new column or a new table, we don't need to destroy the database and recreate it; we just apply the SQL for that change.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3UW6Qm1KE662wiFVgrWFPZ/ba80730c09161abc81c85bb89fd758b3/12b.png" />
            
            </figure>
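<p>The incremental-update idea can be sketched in a few lines. This is an illustrative toy, not the actual workflow code; Terraform does the real diffing against the state we keep in KV:</p>

```typescript
// Toy illustration of incremental, non-destructive updates: compare the
// previously-applied configuration (loaded from KV) with the desired one,
// and report only the resources that need to change.
type Config = { [key: string]: string }

export function diffConfig(previous: Config, desired: Config) {
	const create: string[] = []
	const update: string[] = []
	const remove: string[] = []
	for (const key of Object.keys(desired)) {
		if (!(key in previous)) create.push(key)
		else if (previous[key] !== desired[key]) update.push(key)
	}
	for (const key of Object.keys(previous)) {
		if (!(key in desired)) remove.push(key)
	}
	return { create, update, remove }
}
```

Only the resources reported in <code>create</code>, <code>update</code>, and <code>remove</code> are touched; anything unchanged is left alone, which is why existing configuration and data survive a redeploy.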
    <div>
      <h3>Protection, optimization and observability, naturally</h3>
      <a href="#protection-optimization-and-observability-naturally">
        
      </a>
    </div>
    <p>Once Wildebeest is up and running, you can protect it from bad traffic and malicious actors. Cloudflare offers you <a href="https://www.cloudflare.com/ddos/">DDoS</a>, <a href="https://www.cloudflare.com/waf/">WAF</a>, and <a href="https://www.cloudflare.com/products/bot-management/">Bot Management</a> protection out of the box at a click's distance.</p><p>Likewise, you'll get instant network and content delivery optimizations from our products and <a href="https://www.cloudflare.com/analytics/">analytics</a> on how your Wildebeest instance is performing and being used.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4EYUh9pE5NNPnpj9mwVSfz/7d97cf99ad29cd9436b8e48c7918cd24/13b.png" />
            
            </figure>
    <div>
      <h3>ActivityPub, WebFinger, NodeInfo and Mastodon APIs</h3>
      <a href="#activitypub-webfinger-nodeinfo-and-mastodon-apis">
        
      </a>
    </div>
    <p>Mastodon popularized the Fediverse concept, but many of the underlying technologies used have been around for quite a while. This is one of those rare moments when everything finally comes together to create a working platform that answers an actual use case for Internet users. Let's quickly go through the protocols that Wildebeest had to implement:</p>
    <div>
      <h4>ActivityPub</h4>
      <a href="#activitypub">
        
      </a>
    </div>
    <p><a href="https://www.w3.org/TR/activitypub/">ActivityPub</a> is a decentralized social networking protocol and has been around as a W3C recommendation since at least 2018. It defines client APIs for creating and manipulating content and server-to-server APIs for content exchange and notifications, also known as federation. ActivityPub uses <a href="https://www.w3.org/TR/activitystreams-core/">ActivityStreams</a>, an even older W3C protocol, for its vocabulary.</p><p>The concepts of <a href="https://www.w3.org/TR/activitypub/#actors">Actors</a> (profiles), messages or <a href="https://www.w3.org/TR/activitypub/#obj">Objects</a> (the toots), <a href="https://www.w3.org/TR/activitypub/#inbox">inbox</a> (where you receive toots from people you follow), and <a href="https://www.w3.org/TR/activitypub/#outbox">outbox</a> (where you send your toots to the people you follow), to name a few of many other actions and activities, are all defined on the ActivityPub specification.</p><p>Here’s our folder with the <a href="https://github.com/cloudflare/wildebeest/tree/main/backend/src/activitypub">ActivityPub implementation</a>.</p>
            <pre><code>import type { APObject } from 'wildebeest/backend/src/activitypub/objects'
import type { Actor } from 'wildebeest/backend/src/activitypub/actors'

export async function addObjectInInbox(db, actor: Actor, obj: APObject) {
	const id = crypto.randomUUID()
	await db
		.prepare('INSERT INTO inbox_objects(id, actor_id, object_id) VALUES(?, ?, ?)')
		.bind(id, actor.id.toString(), obj.id.toString())
		.run()
}
</code></pre>
            
    <div>
      <h4>WebFinger</h4>
      <a href="#webfinger">
        
      </a>
    </div>
    <p>WebFinger is a simple HTTP protocol used to discover information about any entity, like a profile, a server, or a specific feature. It resolves URIs to resource objects.</p><p>Mastodon uses <a href="https://www.rfc-editor.org/rfc/rfc7033">WebFinger</a> lookups to discover information about remote users. For example, say you want to interact with @<a>user@example.com</a>. Your local server would <a href="https://github.com/cloudflare/wildebeest/blob/main/backend/src/webfinger/index.ts">request</a> <a href="https://example.com/.well-known/webfinger?resource=acct:user@example.com">https://example.com/.well-known/webfinger?resource=acct:user@example.com</a> (using the <a href="https://www.rfc-editor.org/rfc/rfc7565">acct scheme</a>) and get something like this:</p>
            <pre><code>{
    "subject": "acct:user@example.com",
    "aliases": [
        "https://example.com/ap/users/user"
    ],
    "links": [
        {
            "rel": "self",
            "type": "application/activity+json",
            "href": "https://example.com/ap/users/user"
        }
    ]
}
</code></pre>
            <p>Now we know how to interact with <code>@user@example.com</code>, using the <code>https://example.com/ap/users/user</code> endpoint.</p><p>Here’s our WebFinger <a href="https://github.com/cloudflare/wildebeest/blob/main/functions/.well-known/webfinger.ts">response</a> handler:</p>
            <pre><code>export async function handleRequest(request, db): Promise&lt;Response&gt; {
	…
	const jsonLink = /* … link to actor */

	const res: WebFingerResponse = {
		subject: `acct:...`,
		aliases: [jsonLink],
		links: [
			{
				rel: 'self',
				type: 'application/activity+json',
				href: jsonLink,
			},
		],
	}
	return new Response(JSON.stringify(res), { headers })
}</code></pre>
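<p>For the other side of the exchange, here's a hypothetical client-side helper (not part of Wildebeest) that builds the lookup URL for a handle:</p>

```typescript
// Hypothetical sketch: turn a handle such as "@user@example.com" into the
// WebFinger URL a server would request to discover that user.
export function webfingerURL(handle: string): string {
	// strip an optional leading @, then split into user and domain
	const [user, domain] = handle.replace(/^@/, '').split('@')
	if (!user || !domain) throw new Error(`invalid handle: ${handle}`)
	const url = new URL(`https://${domain}/.well-known/webfinger`)
	url.searchParams.set('resource', `acct:${user}@${domain}`)
	return url.toString()
}
```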
            
    <div>
      <h4>Mastodon API</h4>
      <a href="#mastodon-api">
        
      </a>
    </div>
    <p>Finally, things like setting your server information, profile information, generating timelines, notifications, and searches, are all Mastodon-specific APIs. The Mastodon open-source project defines a catalog of REST APIs, and you can find all the documentation for them on <a href="https://docs.joinmastodon.org/api/">their website</a>.</p><p>Our Mastodon API implementation can be found <a href="https://github.com/cloudflare/wildebeest/tree/main/functions/api">here</a> (REST endpoints) and <a href="https://github.com/cloudflare/wildebeest/tree/main/backend/src/mastodon">here</a> (backend primitives). Here’s an example of Mastodon’s server information <a href="https://docs.joinmastodon.org/methods/instance/#v2">/api/v2/instance</a> implemented by <a href="https://github.com/cloudflare/wildebeest/blob/main/functions/api/v2/instance.ts">Wildebeest</a>:</p>
            <pre><code>export async function handleRequest(domain, db, env) {

	const res: InstanceConfigV2 = {
		domain,
		title: env.INSTANCE_TITLE,
		version: getVersion(),
		source_url: 'https://github.com/cloudflare/wildebeest',
		description: env.INSTANCE_DESCR,
		thumbnail: {
			url: DEFAULT_THUMBNAIL,
		},
		languages: ['en'],
		registrations: {
			enabled: false,
		},
		contact: {
			email: env.ADMIN_EMAIL,
		},
		rules: [],
	}

	return new Response(JSON.stringify(res), { headers })
}</code></pre>
            <p>Wildebeest also implements <a href="https://github.com/cloudflare/wildebeest/tree/main/backend/src/webpush">WebPush</a> for client notifications and <a href="https://github.com/cloudflare/wildebeest/tree/main/functions/nodeinfo">NodeInfo</a> for server information.</p><p>Other Mastodon-compatible servers had to implement all these protocols <a href="https://pleroma.social/">too</a>; Wildebeest is one of them. The community is very active in discussing future enhancements; we will keep improving our compatibility and adding support for more features over time, ensuring that Wildebeest plays well with the emerging Fediverse ecosystem of servers and clients.</p>
    <div>
      <h3>Get started now</h3>
      <a href="#get-started-now">
        
      </a>
    </div>
    <p>Enough about technology; let's get you into the Fediverse. We tried to detail all the steps to deploy your server. To start using Wildebeest, head to the public GitHub repository and check our <a href="https://github.com/cloudflare/wildebeest/blob/main/README.md">Get Started tutorial</a>.</p><p>Most of Wildebeest's dependencies offer a generous free plan that allows you to try them for personal or hobby projects that aren't business-critical; however, you will need to subscribe to an <a href="https://www.cloudflare.com/products/cloudflare-images/">Images</a> plan (the lowest tier should be enough for most needs) and, depending on your server load, <a href="https://developers.cloudflare.com/workers/platform/limits/#unbound-usage-model">Workers Unbound</a> (again, the minimum cost should be plenty for most use cases).</p><p>Following our dogfooding mantra, Cloudflare is also officially joining the Fediverse today. You can start following our Mastodon accounts and get the same experience of having regular updates from Cloudflare as you get from us on other social platforms, using your favorite Mastodon apps. These accounts run entirely on top of a Wildebeest server:</p><ul><li><p><a href="https://cloudflare.social/@cloudflare">@cloudflare@cloudflare.social</a> - Our main account</p></li><li><p><a href="https://cloudflare.social/@radar">@radar@cloudflare.social</a> - Cloudflare Radar</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2QJsY2PkGqLfVQCDJc1AlH/d52823cfd8d2d9e7de9845686790a3bf/14b.png" />
            
            </figure><p>Wildebeest is compatible with most client apps; it is confirmed to work with the official Mastodon <a href="https://play.google.com/store/apps/details?id=org.joinmastodon.android">Android</a> and <a href="https://apps.apple.com/us/app/mastodon-for-iphone/id1571998974">iOS</a> apps, <a href="https://pinafore.social/">Pinafore</a>, <a href="https://mastodon.social/@JPEGuin/109315609418460036">Mammoth</a>, and <a href="https://tooot.app/">tooot</a>, and we are looking into others like <a href="https://tapbots.com/ivory/">Ivory</a>. If your favorite isn’t working, please submit an <a href="https://github.com/cloudflare/wildebeest/issues">issue here</a>, and we’ll do our best to support it.</p>
    <div>
      <h3>Final words</h3>
      <a href="#final-words">
        
      </a>
    </div>
    <p>Wildebeest was built entirely on top of our <a href="/welcome-to-the-supercloud-and-developer-week-2022/">Supercloud</a> stack. It was one of the most complete and complex projects we have created, drawing on a wide range of Cloudflare products and features.</p><p>We hope this write-up inspires you not only to try deploying Wildebeest and joining the Fediverse, but also to build your next application, however demanding it may be, on top of Cloudflare.</p><p>Wildebeest is a minimally viable Mastodon-compatible server right now, but we will keep improving it with more features and supporting it over time; after all, we're using it for our official accounts. It is also open source, meaning you are more than welcome to contribute pull requests or feedback.</p><p>In the meantime, we opened a <a href="https://discord.com/channels/595317990191398933/1064925651464896552">Wildebeest room</a> on our <a href="https://discord.gg/cloudflaredev">Developers Discord Server</a> and are keeping an eye on the GitHub repo <a href="https://github.com/cloudflare/wildebeest/issues">issues</a> tab. Feel free to engage with us; the team is eager to know how you use Wildebeest and to answer your questions.</p><p><i>PS: The code snippets in this blog were simplified to benefit readability and space (the TypeScript types and error handling code were removed, for example). Please refer to the GitHub repo links for the complete versions.</i></p>
            <category><![CDATA[Wildebeest]]></category>
            <category><![CDATA[Cloudflare Pages]]></category>
            <category><![CDATA[D1]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Cloudflare Zero Trust]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[SASE]]></category>
            <guid isPermaLink="false">5dmHcGVas7xv8tKbRbWLWN</guid>
            <dc:creator>Celso Martinho</dc:creator>
            <dc:creator>Sven Sauleau</dc:creator>
        </item>
        <item>
            <title><![CDATA[UPDATE Supercloud SET status = 'open alpha' WHERE product = 'D1';]]></title>
            <link>https://blog.cloudflare.com/d1-open-alpha/</link>
            <pubDate>Wed, 16 Nov 2022 14:01:00 GMT</pubDate>
            <description><![CDATA[ As we continue down the road to making D1 production ready, it wouldn’t be “the Cloudflare way” unless we stopped for feedback first. D1 is now in Open Alpha! ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7w9UvQOVgrNbxPrz1tOWJz/611cdc1253d0c6971709f5dddacc0811/image1-48.png" />
            
            </figure><p>In May 2022, we <a href="/introducing-d1/">announced</a> our quest to simplify databases – building them, maintaining them, integrating them. Our goal is to empower you with the tools to run a database that is <a href="https://www.cloudflare.com/developer-platform/products/d1/">powerful, scalable, with world-beating performance</a> without any hassle. And we first set our sights on reimagining the database development experience for every type of user – not just database experts.</p><p>Over the past couple of months, we’ve <a href="/whats-new-with-d1/">been working</a> to create just that, while learning some very important lessons along the way. As it turns out, building a global relational database product on top of Workers pushes the boundaries of the developer platform to their absolute limit, and often beyond them, but in a way that’s absolutely thrilling to us at Cloudflare. It means that while our progress might seem slow from outside, every improvement, bug fix or stress test helps lay down a path for <i>all</i> of our customers to build the world’s most <a href="/welcome-to-the-supercloud-and-developer-week-2022/">ambitious serverless application</a>.</p><p>However, as we continue down the road to making D1 production ready, it wouldn’t be “the Cloudflare way” unless we stopped for feedback first – even though it’s not <i>quite</i> finished yet. In the spirit of Developer Week, <b>there is no better time to introduce the D1 open alpha</b>!</p><p>An “open alpha” is a new concept for us. You'll likely hear the term “open beta” on various announcements at Cloudflare, and while it makes sense for many products here, it wasn’t quite right for D1. 
Some crucial pieces are still in active development and testing, so before we release the fully-formed D1 as a public beta for you to start building real-world apps with, we want to make sure everybody can get a feel for the product on their hobby apps or side projects.</p>
    <div>
      <h2>What’s included in the alpha?</h2>
      <a href="#whats-included-in-the-alpha">
        
      </a>
    </div>
    <p>While a lot is still changing behind the scenes with D1, we’ve put a lot of thought into how you, as a developer, interact with it – even if you’re new to databases.</p>
    <div>
      <h3>Using the D1 dashboard</h3>
      <a href="#using-the-d1-dashboard">
        
      </a>
    </div>
    <p>In a few clicks you can get your D1 database up and running right from within your dashboard. In our D1 interface, you can create, maintain and view your database as you please. Changes made in the UI are instantly available to your Worker - no redeploy required!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6vOzmnP9cvUYbJanSvprvl/b4a01d4edcc3dcada5a326e352b5f0e2/image2-30.png" />
            
            </figure>
    <div>
      <h3>Use Wrangler</h3>
      <a href="#use-wrangler">
        
      </a>
    </div>
    <p>If you’re looking to get your hands a little dirty, you can also work with your database using our Wrangler CLI. Create your database and begin adding your data manually, or bootstrap your database in one of two ways:</p><p><b>1. Execute an SQL file</b></p>
            <pre><code>$ wrangler d1 execute my-database-name --file ./customers.sql</code></pre>
            <p>where your <code>.sql</code> file looks something like this:</p><p>customers.sql</p>
            <pre><code>DROP TABLE IF EXISTS Customers;
CREATE TABLE Customers (CustomerID INT, CompanyName TEXT, ContactName TEXT, PRIMARY KEY (`CustomerID`));
INSERT INTO Customers (CustomerID, CompanyName, ContactName) 
VALUES (1, 'Alfreds Futterkiste', 'Maria Anders'),(4, 'Around the Horn', 'Thomas Hardy'),(11, 'Bs Beverages', 'Victoria Ashworth'),(13, 'Bs Beverages', 'Random Name');</code></pre>
            <p><b>2. Create and run migrations</b></p><p>Migrations are a way to version your database changes. With D1, you can <a href="https://developers.cloudflare.com/d1/migrations/">create a migration</a> and then apply it to your database.</p><p>To create the migration, execute:</p>
            <pre><code>wrangler d1 migrations create &lt;my-database-name&gt; &lt;short description of migration&gt;</code></pre>
            <p>This will create an SQL file in a <code>migrations</code> folder where you can then go ahead and add your queries. Then apply the migrations to your database by executing:</p>
            <pre><code>wrangler d1 migrations apply &lt;my-database-name&gt;</code></pre>
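<p>A migration file is plain SQL; an illustrative example (the file name and schema change below are hypothetical):</p>

```sql
-- migrations/0001_add_contact_title.sql (illustrative)
-- Adds a column without destroying existing data.
ALTER TABLE Customers ADD COLUMN ContactTitle TEXT;
```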
            
    <div>
      <h3>Access D1 from within your Worker</h3>
      <a href="#access-d1-from-within-your-worker">
        
      </a>
    </div>
    <p>You can attach your D1 to a Worker by adding the D1 binding to your <code>wrangler.toml</code> configuration file. Then interact with D1 by executing queries inside your Worker like so:</p>
            <pre><code>export default {
 async fetch(request, env) {
   const { pathname } = new URL(request.url);

   if (pathname === "/api/beverages") {
     const { results } = await env.DB.prepare(
       "SELECT * FROM Customers WHERE CompanyName = ?"
     )
       .bind("Bs Beverages")
       .all();
     return Response.json(results);
   }

   return new Response("Call /api/beverages to see Bs Beverages customers");
 },
};</code></pre>
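<p>For reference, the binding added to <code>wrangler.toml</code> looks something like this (the name and ID below are placeholders for your own database's values):</p>

```toml
# Illustrative wrangler.toml fragment: exposes the database to the
# Worker as env.DB.
[[d1_databases]]
binding = "DB"
database_name = "my-database-name"
database_id = "<your-database-id>"
```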
            
    <div>
      <h3>Or access D1 from within your Pages Function</h3>
      <a href="#or-access-d1-from-within-your-pages-function">
        
      </a>
    </div>
    <p>In this Alpha launch, D1 also supports integration with <a href="https://pages.cloudflare.com/">Cloudflare Pages</a>! You can add a D1 binding inside the Pages dashboard, and write your queries inside a Pages Function to build a full-stack application! Check out the <a href="https://developers.cloudflare.com/pages/platform/functions/bindings/#d1-database">full documentation</a> to get started with Pages and D1.</p>
    <div>
      <h2>Community built tooling</h2>
      <a href="#community-built-tooling">
        
      </a>
    </div>
    <p>During our private alpha period, the excitement behind D1 led to some valuable contributions to the D1 ecosystem and developer experience by members of the community. Here are some of our favorite projects to date:</p>
    <div>
      <h3>d1-orm</h3>
      <a href="#d1-orm">
        
      </a>
    </div>
    <p>An Object Relational Mapping (ORM) is a way for you to query and manipulate data by using JavaScript. Created by a Cloudflare Discord Community Champion, the <code>d1-orm</code> seeks to provide a strictly typed experience while using D1:</p>
            <pre><code>const users = new Model(
    // table name, primary keys, indexes etc
    tableDefinition,
    // column types, default values, nullable etc
    columnDefinitions
)

// TS helper for typed queries
type User = Infer&lt;typeof users&gt;;

// ORM-style query builder
const user = await users.First({
    where: {
        id: 1,
    },
});</code></pre>
            <p>You can check out the <a href="https://docs.interactions.rest/d1-orm/">full documentation</a>, and provide feedback by making an issue on the <a href="https://github.com/Interactions-as-a-Service/d1-orm/issues">GitHub repository</a>.</p>
    <div>
      <h3>workers-qb</h3>
      <a href="#workers-qb">
        
      </a>
    </div>
    <p>This is a zero-dependency query builder that provides a simple standardized interface while keeping the benefits and speed of using raw queries over a traditional ORM. While not intended to provide ORM-like functionality, <code>workers-qb</code> makes it easier to interact with the database from code for direct SQL access:</p>
            <pre><code>const qb = new D1QB(env.DB)

const fetched = await qb.fetchOne({
  tableName: 'employees',
  fields: 'count(*) as count',
  where: {
    conditions: 'department = ?1',
    params: ['HQ'],
  },
})</code></pre>
            <p>You can read more about the query builder <a href="https://workers-qb.massadas.com/">here</a>.</p>
    <div>
      <h3>d1-console</h3>
      <a href="#d1-console">
        
      </a>
    </div>
    <p>Instead of running the <code>wrangler d1 execute</code> command in your terminal every time you want to interact with your database, you can interact with D1 from within the <code>d1-console</code>. Created by a Discord Community Champion, this gives the benefit of executing multi-line queries, obtaining command history, and viewing a cleanly formatted table output.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4QR9Tf5DXnp3brBVvlvgJq/7f5b5083198492190dfc9f24e4fb70e0/image3-23.png" />
            
            </figure><p>While this is a community project today, we plan to natively support a “D1 Console” in the future. For now, get started by checking out the <code>d1-console</code> package <a href="https://github.com/isaac-mcfadyen/d1-console">here</a>.</p>
    <div>
      <h3>D1 adapter for <a href="https://github.com/koskimas/kysely">Kysely</a></h3>
      <a href="#d1-adapter-for">
        
      </a>
    </div>
    <p>Kysely is a type-safe and autocompletion-friendly TypeScript SQL query builder. With this adapter, you can interact with D1 using the familiar Kysely interface:</p>
            <pre><code>// Create Kysely instance with kysely-d1
const db = new Kysely&lt;Database&gt;({ 
  dialect: new D1Dialect({ database: env.DB })
});
    
// Read row from D1 table
const result = await db
  .selectFrom('kv')
  .selectAll()
  .where('key', '=', key)
  .executeTakeFirst();</code></pre>
            <p>Check out the project <a href="https://github.com/aidenwallis/kysely-d1">here</a>.</p>
    <div>
      <h2>What’s still in testing?</h2>
      <a href="#whats-still-in-testing">
        
      </a>
    </div>
    <p>The biggest pieces that have been disabled for this alpha release are replication and JavaScript transaction support. While we’ll be rolling out these changes gradually, we want to call out some limitations that exist today that we’re actively working on testing:</p><ul><li><p><b>Database location:</b> Each D1 database only runs a single instance. It’s created close to where you, as the developer, create the database, and does not currently move regions based on access patterns. Workers running elsewhere in the world will see higher latency as a result.</p></li><li><p><b>Concurrency limitations:</b> Under high load, read and write queries may be queued rather than triggering new replicas to be created. As a result, the performance &amp; throughput characteristics of the open alpha won’t be representative of the final product.</p></li><li><p><b>Availability limitations:</b> Backups will block access to the DB while they’re running. In most cases this should only be a second or two, and any requests that arrive during the backup will be queued.</p></li></ul><p>You can also check out a more detailed, up-to-date list on <a href="https://developers.cloudflare.com/d1/platform/limits/">D1 alpha Limitations</a>.</p>
    <div>
      <h2>Request for feedback</h2>
      <a href="#request-for-feedback">
        
      </a>
    </div>
    <p>While we can make all sorts of guesses and bets on the kind of databases you want to use D1 for, we are not the users – you are! We want developers from all backgrounds to preview the D1 tech at its early stages, and let us know where we need to improve to make it suitable for your production apps.</p><p>For general feedback about your experience and to interact with other folks in the alpha, join our <a href="https://discord.com/channels/595317990191398933/992060581832032316">#d1-open-alpha</a> channel in the <a href="https://discord.gg/cloudflaredev">Cloudflare Developers Discord</a>. We plan to make any important announcements and changes in this channel as well as on our <a href="https://discord.com/channels/595317990191398933/832698219824807956">monthly community calls</a>.</p><p>To file more specific feature requests (no matter how wacky) and report any bugs, create a thread in the <a href="https://community.cloudflare.com/c/developers/d1">Cloudflare Community forum</a> under the D1 category. We will be maintaining this forum as a way to plan for the months ahead!</p>
    <div>
      <h2>Get started</h2>
      <a href="#get-started">
        
      </a>
    </div>
    <p>Want to get started right away? Check out our <a href="https://developers.cloudflare.com/d1/">D1 documentation</a>, then <a href="https://github.com/cloudflare/d1-northwind">build</a> our classic <a href="https://northwind.d1sql.com/">Northwind Traders demo</a> to explore the D1 experience and deploy your first D1 database!</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Database]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Supercloud]]></category>
            <category><![CDATA[D1]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">1rFO7pAwS1HGnsa6rhrIXa</guid>
            <dc:creator>Nevi Shah</dc:creator>
            <dc:creator>Glen Maddern</dc:creator>
            <dc:creator>Sven Sauleau</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare's Handling of an RCE Vulnerability in cdnjs]]></title>
            <link>https://blog.cloudflare.com/cloudflares-handling-of-an-rce-vulnerability-in-cdnjs/</link>
            <pubDate>Sat, 24 Jul 2021 12:57:57 GMT</pubDate>
            <description><![CDATA[ Recently, an RCE vulnerability in the way cdnjs’ backend automatically keeps web resources up to date was disclosed. Read about how Cloudflare handled the security incident and what will prevent similar exploits in the future. ]]></description>
            <content:encoded><![CDATA[ <p></p><p><a href="https://cdnjs.com/">cdnjs</a> provides JavaScript, CSS, images, and fonts assets for websites to reference with more than 4,000 libraries available. By utilizing cdnjs, websites can load faster with less strain on one’s own origin server as files are served directly from Cloudflare’s edge. Recently, a <a href="https://blog.ryotak.me/post/cdnjs-remote-code-execution-en/">blog post</a> detailed a vulnerability in the way cdnjs’ backend automatically keeps the libraries up to date.</p><p>This vulnerability allowed the researcher to execute arbitrary code, granting the ability to modify assets. This blog post details how Cloudflare responded to this report, including the steps we took to block exploitation, investigate potential abuse, and remediate the vulnerability.</p><p>This vulnerability is not related to Cloudflare CDN. The <i>cdnjs</i> project is a platform that leverages Cloudflare’s services, but the vulnerability described below relates to <i>cdnjs</i>’ platform only. To be clear, no existing libraries were modified using this exploit. The researcher published a new package which demonstrated the vulnerability and our investigation concluded that the integrity of all assets hosted on cdnjs remained intact.</p>
    <div>
      <h3>Disclosure Timeline</h3>
      <a href="#disclosure-timeline">
        
      </a>
    </div>
    <p>As outlined in RyotaK’s blog post, the incident began on 2021-04-06. At around 1100 GMT, RyotaK published a package to npm exploiting the vulnerability. At 1129 GMT, cdnjs processed this package, resulting in a leak of credentials. This triggered GitHub alerting which notified Cloudflare of the exposed secrets.</p><p>Cloudflare disabled the auto-update service and revoked all credentials within an hour. In the meantime, our security team received RyotaK’s remote code execution report through HackerOne. A new version of the auto-update tool which prevents exploitation of the vulnerability RyotaK reported was released within 24 hours.</p><p>Having taken action immediately to prevent exploitation, we then proceeded to redesign the auto-update pipeline. Work to completely redesign it was completed on 2021-06-03.</p>
    <div>
      <h3>Blocking Exploitation</h3>
      <a href="#blocking-exploitation">
        
      </a>
    </div>
    <p>Before RyotaK reported the vulnerability via HackerOne, Cloudflare had already taken action. When GitHub notified us that credentials were leaked, one of our engineers took immediate action and revoked them all. Additionally, the GitHub token associated with this service was automatically revoked by GitHub.</p><p>The second step was to bring the vulnerable service offline to prevent further abuse while we investigated the incident. This prevented exploitation but also made it impossible for legitimate developers to publish updates to their libraries. We wanted to release a fixed version of the pipeline used for retrieving and hosting new library versions so that developers could continue to benefit from caching. However, we understood that a stopgap was not a long term fix, and we decided to review the entire current solution to identify a better design that would improve the overall security of cdnjs.</p>
    <div>
      <h3>Investigation</h3>
      <a href="#investigation">
        
      </a>
    </div>
    <p>Any sort of investigation requires access to logs and all components of our pipeline generate extensive logs that prove valuable for forensics efforts. Logs produced by the auto-update process are collected in a <a href="https://github.com/cdnjs/logs">GitHub repository</a> and sent to our logging pipeline. We also collect and retain logs from cdnjs’ Cloudflare account. Our security team began reviewing this information as soon as we received RyotaK’s initial report. Based on access logs, API token usage, and file modification metadata, we are confident that only RyotaK exploited this vulnerability during his research and only on test files. To rule out abuse, we reviewed the list of source IP addresses that accessed the Workers KV token prior to revoking it and only found one, which belongs to the cdnjs auto-update bot.</p><p>The cdnjs team also reviewed files that were pushed to the <a href="https://github.com/cdnjs/cdnjs">cdnjs/cdnjs GitHub repository</a> around that time and found no evidence of any other abuse across cdnjs.</p>
    <div>
      <h3>Remediating the Vulnerability</h3>
      <a href="#remediating-the-vulnerability">
        
      </a>
    </div>
    <p>Around half of the libraries on cdnjs use <a href="https://www.npmjs.com/">npm</a> to auto-update. The primary vector in this attack was the ability to craft a <code>.tar.gz</code> archive with a symbolic link and publish it to the npm registry. When our pipeline extracted the content it would follow symlinks and overwrite local files using the pipeline user privileges. There are two fundamental issues at play here: an attacker can perform <a href="https://en.wikipedia.org/wiki/Directory_traversal_attack">path traversal</a> on the host processing untrusted files, and the process handling the compressed file is <a href="https://en.wikipedia.org/wiki/Principle_of_least_privilege">overly privileged</a>.</p><p>We addressed the path traversal issue by checking that the destination of each file in the tarball will be contained within the target directory that the update process has designated for that package. If the file’s <a href="https://en.wikipedia.org/wiki/Canonicalization">full canonical path</a> doesn’t begin with the destination directory’s full path, we log this as a warning and skip extracting that file. This works fairly well, but as noted in the <a href="https://github.com/cdnjs/tools/pull/220/files#diff-0df1c31d75e0fcc18581b25b3e3e8f7584e0c6acdf38eef60ddcd06d01ac3734R59-R63">comment</a> above this check, if the compressed file uses UTF-8 encoding for filenames, this check may not properly canonicalize the path. If this canonicalization does not occur, the path may contain path traversal, even though it starts with the correct destination path.</p><p>To ensure that other vulnerabilities in cdnjs’ publication pipeline cannot be exploited, we configured an <a href="https://github.com/cdnjs/bot-ansible/pull/24/files">AppArmor profile</a> for it. 
This limits the <a href="https://man7.org/linux/man-pages/man7/capabilities.7.html">capabilities</a> of the service, so even if an attacker successfully instructs the process to perform an action, the operating system will deny anything outside of what the profile allows.</p><p>For illustration, here’s an example:</p>
            <pre><code>/path/to/bin {
  network,
  signal,
  /path/to/child ix,
  /tmp/ r,
  /tmp/cache** rw,
  ...
}</code></pre>
            <p>In this example, we only allow the binary (/path/to/bin) to:</p><ul><li><p>access all networking</p></li><li><p>use all signals</p></li><li><p>execute /path/to/child (which will inherit the AppArmor profile)</p></li><li><p>read from /tmp</p></li><li><p>read+write under /tmp/cache</p></li></ul><p>Any attempt to access anything else will be denied. You can find the complete list of capabilities and more information on <a href="https://manpages.ubuntu.com/manpages/precise/en/man5/apparmor.d.5.html">AppArmor’s manual page</a>.</p><p>In the case of cdnjs’ auto-update tool, we limit execution of applications to a very specific set, and we limit where files can be written.</p><p>Fixing the path traversal and implementing the AppArmor profile prevents similar issues from being exploited. However, having a single layer of defense wasn’t enough. We decided to redesign the auto-update process entirely, isolating each step as well as each library it processes, and preventing this entire class of attacks.</p>
    <div>
      <h3>Redesigning the system</h3>
      <a href="#redesigning-the-system">
        
      </a>
    </div>
    <p>The main idea behind the redesign of the pipeline was to move away from the monolithic auto-update process. Instead, various operations are done using microservices or daemons which have well-defined scopes. Here’s an overview of the steps:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/NaGyDtiWkMV4J2XvfbReq/4f4aafe30c6fbe07bfd684845dafef3d/image1-18.png" />
            
            </figure><p>First, to detect new library versions, two daemons (for both npm and git based updates) are regularly running. Once a new version has been detected, the files will be downloaded as an archive and placed into the incoming storage bucket.</p><p>Writing a new version in the incoming bucket triggers a function that adds all the information we need to update the library. The function also generates a signed URL allowing for writing in the outgoing bucket, but only in a specific folder for a given library, reducing the blast radius. Finally, a message is placed into a queue to indicate that the new version of the given library is ready to be published.</p><p>A daemon listens for incoming messages and spawns an unprivileged Docker container to handle dangerous operations (archive extraction, minifications, and compression). After the sandbox exits, the daemon will use the signed URL to store the processed files in the outgoing storage bucket.</p><p>Finally, multiple daemons are triggered when the finalized package is written to the outgoing bucket. These daemons publish the assets to cdnjs.cloudflare.com and to the main <a href="https://github.com/cdnjs/cdnjs">cdnjs repository</a>. The daemons also publish the version specific URL, cryptographic hash, and other information to Workers KV, cdnjs.com, and the <a href="https://cdnjs.com/api">API</a>.</p><p>In this revised design, exploiting a similar vulnerability would happen in the sandbox (Docker container) context. The attacker would have access to container files, but nothing else. The container is minimal, ephemeral, has no secrets included, and is dedicated to a single library update, so it cannot affect other libraries’ files.</p>
    <div>
      <h3>Our Commitment to Security</h3>
      <a href="#our-commitment-to-security">
        
      </a>
    </div>
    <p>Beyond maintaining a vulnerability disclosure program, we regularly perform internal security reviews and hire third-party firms to audit the software we develop. But it is through our vulnerability disclosure program that we receive some of the most interesting and creative reports. Each report has helped us improve the security of our services. In this case, we worked with RyotaK to not only address the vulnerability, but to also ensure that their blog post was detailed and accurate. We invite those that find a security issue in any of Cloudflare’s services to report it to us through <a href="https://hackerone.com/cloudflare">HackerOne</a>.</p> ]]></content:encoded>
            <category><![CDATA[CDNJS]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Bug Bounty]]></category>
            <guid isPermaLink="false">666ctQ9OY5rCiI6GZyKm9G</guid>
            <dc:creator>Jonathan Ganz</dc:creator>
            <dc:creator>Thomas Calderon</dc:creator>
            <dc:creator>Sven Sauleau</dc:creator>
        </item>
        <item>
            <title><![CDATA[Random Employee Chats at Cloudflare]]></title>
            <link>https://blog.cloudflare.com/random-employee-chats-cloudflare/</link>
            <pubDate>Sat, 20 Mar 2021 12:00:00 GMT</pubDate>
            <description><![CDATA[ We developed the Random Employee Chats application internally, with the goal of recreating the pre-pandemic informal interactions. Here's how we moved from a shared spreadsheet to Cloudflare Workers to automate the entire process. ]]></description>
            <content:encoded><![CDATA[ <p>Due to the COVID-19 pandemic, most Cloudflare offices closed in March 2020, and employees began working from home. Having online meetings presented its own challenges, but preserving the benefits of casual encounters in physical offices was something we struggled with. Those informal interactions, like teams talking next to the coffee machine, help form the social glue that holds companies together.</p><p>In an attempt to recreate that experience, David Wragg, an engineer at Cloudflare, introduced “Random Engineer Chats” (We’re calling them “Random Employee Chats” here since this can be applied to any team). The idea is that participants are randomly paired, and the pairs then schedule a 30-minute video call. There’s no fixed agenda for these conversations, but the participants might learn what is going on in other teams, gain new perspectives on their own work by discussing it, or meet new people.</p><p>The first iteration of Random Employee Chats used a shared spreadsheet to coordinate the process. People would sign up by adding themselves to the spreadsheet, and once a week, David would randomly form pairs from the list and send out emails with the results. Then, each pair would schedule a call at their convenience. This process was the minimum viable implementation of the idea, but it meant that the process relied on a single person.</p>
    <div>
      <h3>Moving to Cloudflare Workers</h3>
      <a href="#moving-to-cloudflare-workers">
        
      </a>
    </div>
    <p>We wanted to automate these repetitive manual tasks, and naturally, we wanted to use <a href="https://workers.cloudflare.com/">Cloudflare Workers</a> to do it. This is a great example of a complete application that runs entirely in Cloudflare Workers on the edge with no backend or origin server.</p><p>The technical requirements included:</p><ul><li><p>A user interface so people can sign up</p></li><li><p>Storage to keep track of the participants</p></li><li><p>A program that automatically pairs participants and notifies each pair</p></li><li><p>A program that reminds people to register for the next sessions</p></li></ul><p>Workers met all of these requirements, and the resulting application runs in Cloudflare's edge network without any need to run code or store data on other platforms. The Workers script supplies the UI, returning static HTML and JavaScript assets, and for storage, Workers KV keeps track of people who have signed up.</p><p>We also recently announced <a href="https://developers.cloudflare.com/workers/platform/cron-triggers">Workers Cron Triggers</a>, which allow us to run a Cloudflare Workers script on a defined schedule. Cron Triggers are perfect for pairing people up before the sessions and reminding users to register for the next session.</p>
    <div>
      <h3>The User Interface</h3>
      <a href="#the-user-interface">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/26ih1wWQgNslI8kNjBYq1d/20676f1eef4c8052cc6ec2dda8c06eb7/Random-Engineer-Chat-Dashboard-1.png" />
            
            </figure><p>The interface is very simple. It shows the list of participants and allows users to register for the next session.</p><p>When a user clicks the register button, the interface calls an API that adds a key to Workers KV:</p>
            <pre><code>key: register:ID
value: {"name":"Sven Sauleau","picture":"picture.jpg","email":"example@cloudflare.com"}</code></pre>
            <p>User information is stored in Workers KV and displayed in the interface to build the list of participants. The user information is deleted during pairing so the list is ready for the next round of chats. To confirm availability, anyone who wants to take part in the chats must sign up again each week.</p><p>The code for the interface can be found <a href="https://github.com/cloudflare/random-employee-chat/tree/master/src/workers/randengchat/public">here</a> and the API is <a href="https://github.com/cloudflare/random-employee-chat/blob/master/src/workers/randengchat/server/index.js">here</a>.</p>
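<p>The register API essentially just writes that key/value pair. A minimal sketch of how the entry could be built; the helper name is ours, not the application’s actual code:</p>

```javascript
// Build the Workers KV entry for a registration, mirroring the key/value
// shape shown above. Hypothetical helper for illustration only.
function buildRegisterEntry(user) {
  return {
    key: `register:${user.email}`,
    value: JSON.stringify({ name: user.name, picture: user.picture, email: user.email }),
  };
}

// In a Workers handler this would be followed by something like:
//   const { key, value } = buildRegisterEntry(await request.json());
//   await KV_NAMESPACE.put(key, value);
```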
    <div>
      <h3>Forming the pairs</h3>
      <a href="#forming-the-pairs">
        
      </a>
    </div>
    <p>A Random Employee Chat is a one-on-one conversation, so at a set time, the application puts participants into pairs. Each Tuesday morning at 0800 UTC, a Workers cron job runs the pairing script, which is deployed using <a href="https://developers.cloudflare.com/workers/cli-wrangler">Wrangler</a>.</p><p>Wrangler supports configuring the schedule for a job using the familiar cron notation. For instance, our wrangler.toml has:</p>
            <pre><code>name = "randengchat-cron-pair"
type = "webpack"
account_id = "..."
webpack_config = "webpack.config.js"
…

kv_namespaces = [...]

[triggers]
crons = ["0 8 * * 2"]</code></pre>
            <p>The pairing script is the most intricate part of the application, so let’s run through its code. First, we list the users that are currently registered. This is done using the <a href="https://developers.cloudflare.com/workers/runtime-apis/kv#listing-by-prefix">list</a> function in Workers KV extracting keys with the prefix <code>register:</code>.</p>
            <pre><code>const list = await KV_NAMESPACE.list({ prefix: "register:" });</code></pre>
            <p>If we don’t have an even number of participants, we remove one person from the list (David!).</p><p>Then, we create all possible pairs and attach a weight to them.</p>
            <pre><code>async function createWeightedPairs() {
  const pairs = [];
  for (let i = 0; i &lt; keys.length - 1; i++) {
    for (let j = i + 1; j &lt; keys.length; j++) {
      const weight = (await countTimesPaired(...)) * -1;
      pairs.push([i, j, weight]);
    }
  }
  return pairs;
}</code></pre>
            <p>For example, suppose four people have registered (Tom, Edie, Ivie and Ada), that’s 6 possible pairs (<a href="https://www.wolframalpha.com/input/?i=4+choose+2">4 choose 2</a>). We might end up with the following pairs and their associated weights:</p>
            <pre><code>(Tom, Edie, 1)
(Tom, Ivie, 0)
(Tom, Ada, 1)
(Edie, Ivie, 2)
(Edie, Ada, 0)
(Ivie, Ada, 2)</code></pre>
            <p>The weight is calculated using the number of times a pair matched in the past, to avoid scheduling chats between people who have already met. More sophisticated factors could be taken into account, such as whether the two people share an office or timezone, or when they last met.</p><p>We keep track of how many times a pair has matched using a count kept in KV:</p>
            <pre><code>async function countTimesPaired(key) {
  const v = await DB.get(key, "json");
  if (v !== null &amp;&amp; v.count) {
    return v.count;
  }
  return 0;
}</code></pre>
            <p>The participants form a complete graph, with people as nodes and each edge weighted by the number of times the two people it connects have met.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/anoEru6iUYXNlVFxpCk4w/67d7587b1440044c1a9072545dc8cce6/image5-20.png" />
            
            </figure><p>Next, we run a weighted matching algorithm, in our case the <a href="https://en.wikipedia.org/wiki/Blossom_algorithm">Blossom algorithm</a>, which finds a maximum matching on the graph: a set of edges that pairs up as many people as possible, with each person appearing exactly once. Because we use the weighted form of the Blossom algorithm, we also minimize the total edge weight. This finds the set of pairs that minimizes the number of times people have met previously.</p><p>In the case above, the algorithm suggests the optimal pairs are (Tom, Ivie) and (Edie, Ada); those pairs have never met before.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Fi6gV8b7uZHZMVR7jouFF/0e7ef8ef4cdcd6dde01e4fb637bc964c/image2-18.png" />
            
            </figure><p>The pairs are recorded in Workers KV with their updated matching count to refine the weights at future sessions:</p>
            <pre><code>key: paired:ID
value: {"emails":["left@cloudflare.com","right@cloudflare.com"],"count":1}</code></pre>
            <p>A notification is sent to each pair of users to notify them that they matched.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7xVSUWDwEFKqY2KS9Mupwj/0e7e502307c3a67fc6e7d124a08a5101/image4-17.png" />
            
            </figure><p>Once the pairing is done, all <code>register:</code> keys are deleted from KV.</p><p>All the pairing logic is <a href="https://github.com/cloudflare/random-employee-chat/blob/master/src/workers/cron-pair/index.js">here</a>.</p>
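<p>For a small round like the four-person example, the effect of the matching step can be reproduced with a brute-force minimum-weight perfect matching. This is an illustration only, and it minimizes the meeting counts directly rather than maximizing their negated weights as the real script does (the two are equivalent); the real script uses the Blossom algorithm, which scales to larger groups:</p>

```javascript
// Brute-force minimum-weight perfect matching for illustration only.
// pairs is a list of [i, j, weight] with weight = times previously matched.
function bestMatching(n, pairs) {
  const w = new Map(pairs.map(([i, j, c]) => [`${i},${j}`, c]));
  const weight = (i, j) => w.get(`${i},${j}`) ?? w.get(`${j},${i}`) ?? 0;
  let best = { cost: Infinity, matching: null };
  const search = (free, chosen, cost) => {
    if (free.length === 0) {
      if (cost < best.cost) best = { cost, matching: chosen };
      return;
    }
    // Pair the first free person with each remaining candidate in turn.
    const [first, ...rest] = free;
    for (const partner of rest) {
      search(
        rest.filter((k) => k !== partner),
        [...chosen, [first, partner]],
        cost + weight(first, partner)
      );
    }
  };
  search([...Array(n).keys()], [], 0);
  return best.matching;
}

// Tom=0, Edie=1, Ivie=2, Ada=3 with the weights from the example above:
bestMatching(4, [[0, 1, 1], [0, 2, 0], [0, 3, 1], [1, 2, 2], [1, 3, 0], [2, 3, 2]]);
// → [[0, 2], [1, 3]]: (Tom, Ivie) and (Edie, Ada)
```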
    <div>
      <h3>Reminders</h3>
      <a href="#reminders">
        
      </a>
    </div>
    <p>The application sends users a reminder to sign up every week. For the reminder, we use another Workers cron job that runs every Friday at 1300 UTC. The schedule in Wrangler is:</p>
            <pre><code>[triggers]
crons = ["0 13 * * 5"]</code></pre>
            <p>This script is much simpler than the pairing script: it posts a message to a room in our company messaging platform, notifying all members of the channel.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7ACIIWbP4COwxXbImfqiQE/78488aeabc86a3a1b6d382ffe32f42a2/image3-19.png" />
            
            </figure><p>All the reminder code is <a href="https://github.com/cloudflare/random-employee-chat/blob/master/src/workers/cron-reminder/index.js">here</a>.</p><p>We hope you find this code useful and that it inspires you to use Workers, Workers KV, Workers Unbound and Workers Cron Triggers to write large, real applications that run entirely without a backend server.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Cloudflare Workers KV]]></category>
            <category><![CDATA[Wrangler]]></category>
            <guid isPermaLink="false">3MzecqWNHhc0wbjnxE2kiX</guid>
            <dc:creator>Sven Sauleau</dc:creator>
            <dc:creator>David Wragg</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building Automatic Platform Optimization for WordPress using Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/building-automatic-platform-optimization-for-wordpress-using-cloudflare-workers/</link>
            <pubDate>Fri, 02 Oct 2020 13:00:00 GMT</pubDate>
            <description><![CDATA[ Zero-config edge caching and optimizations for improved WordPress performance. Reduce the WordPress plugin burden. ]]></description>
            <content:encoded><![CDATA[ <p>This post explains how we implemented the <a href="/automatic-platform-optimizations-starting-with-wordpress/">Automatic Platform Optimization</a> for WordPress. In doing so, we have defined a new place to run WordPress plugins, at the edge written with Cloudflare Workers. We provide the feature as a Cloudflare service but what’s exciting is that anyone could build this using the Workers platform.</p><p>The service is an evolution of the ideas explained in an earlier <a href="/improving-html-time-to-first-byte/">zero-config edge caching of HTML</a> blog post. The post will explain how Automatic Platform Optimization combines the best qualities of the regular Cloudflare cache with Workers KV to improve cache cold starts globally.</p><p>The optimization will work both with and without the <a href="https://support.cloudflare.com/hc/en-us/articles/227634427-Using-Cloudflare-with-WordPress">Cloudflare for WordPress plugin</a> integration. Not only have we provided a zero config edge HTML caching solution but by using the Workers platform we were also able to improve the performance of <a href="/fast-google-fonts-with-cloudflare-workers/">Google font loading</a> for all pages.</p><p>We are launching the feature first for WordPress specifically but the concept can be applied to any website and/or content management system (CMS).</p>
    <div>
      <h3>A new place to run WordPress plugins?</h3>
      <a href="#a-new-place-to-run-wordpress-plugins">
        
      </a>
    </div>
    <p>There are many individual WordPress plugins for performance that use similar optimizations to existing Cloudflare services. Automatic Platform Optimization brings them all together into one easy-to-use solution, deployed at the edge.</p><p>Traditionally, you have to maintain server plugins with your WordPress installation. This comes with maintenance costs and can require a deep understanding of how to fine-tune performance and security for each and every plugin. Providing the optimizations on the client side can also lead to performance problems due to the <a href="https://v8.dev/blog/cost-of-javascript-2019">costs of JavaScript</a> execution. In contrast, most of these optimizations can be built into Cloudflare’s edge rather than running on the server or the client. Automatic Platform Optimization will always be up to date with the latest performance and <a href="https://www.cloudflare.com/learning/security/how-to-improve-wordpress-security/">security best practices</a>.</p>
    <div>
      <h3>How to optimize for WordPress</h3>
      <a href="#how-to-optimize-for-wordpress">
        
      </a>
    </div>
    <p>By default, the Cloudflare CDN caches assets based on <a href="https://support.cloudflare.com/hc/en-us/articles/200172516#h_a01982d4-d5b6-4744-bb9b-a71da62c160a">file extension</a> and doesn’t cache HTML content. It is possible to configure HTML caching with a <a href="https://support.cloudflare.com/hc/en-us/articles/202775670#3SVKvGhbS9BNT34zRCsPJ7">Cache Everything Page</a> rule, but it is a manual process and often requires additional features only available on the Business and Enterprise plans. So for the majority of WordPress websites, even with a <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cdn/">CDN</a> in front of them, HTML content is not cached. Requests for an HTML document have to go all the way to the origin.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/14SelFUXAc9SBPwqea8OUD/92f9e753055a0759bc0f87ef7c021bca/image3-2.png" />
            
            </figure><p>Even if a CDN optimizes the connection between the closest edge and the website’s origin, the origin could be located far away and also be <a href="https://www.cloudflare.com/learning/cdn/common-cdn-issues/">slow to respond</a>, especially under load.</p>
    <div>
      <h3>Move content closer to the user</h3>
      <a href="#move-content-closer-to-the-user">
        
      </a>
    </div>
    <p>One of the primary recommendations for speeding up websites is to move content closer to the end-user. This reduces the amount of time it takes for packets to travel between the end-user and the web server - the <a href="https://www.cloudflare.com/learning/cdn/glossary/round-trip-time-rtt/">round-trip time (RTT)</a>. This improves the speed of establishing a connection as well as serving content from a closer location.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/VJdt6X0xF6xuDyMI0CHXu/4e17456b4b864eb1b5eabc08713a4438/image6-1.png" />
            
            </figure><p>We have previously blogged about the <a href="/improving-html-time-to-first-byte/">benefits of edge caching HTML</a>. Caching and serving HTML from the Cloudflare edge will greatly improve the time to first byte (TTFB) by optimizing DNS, connection setup, and SSL negotiation, and by removing the origin server response time. If your origin is slow to generate HTML and/or your user is far from the origin server, all your performance metrics will be affected.</p><p>Most HTML isn’t really dynamic. It needs to be able to change relatively quickly when the site is updated but for a huge portion of the web, the content is static for months or years at a time. There are special cases like when a user is logged-in (as the admin or otherwise) where the content needs to differ but the vast majority of visits are of anonymous users.</p>
    <div>
      <h3>Zero config edge caching revisited</h3>
      <a href="#zero-config-edge-caching-revisited">
        
      </a>
    </div>
    <p>The goal is to make updating content at the edge happen automatically. The edge will cache and serve the previous version of the content until new content is available. This is usually achieved by triggering a cache purge to remove existing content. In fact, using a combination of our <a href="https://www.cloudflare.com/integrations/wordpress/">WordPress plugin</a> and the <a href="https://api.cloudflare.com/#zone-purge-files-by-url">Cloudflare cache purge API</a>, we already support <a href="https://support.cloudflare.com/hc/en-us/articles/115002708027-Cloudflare-WordPress-Plugin-Automatic-Cache-Management-">Automatic Cache Purge on Website Updates</a>. This feature has been in use for many years.</p><p>Building automatic HTML edge caching is more nuanced than caching traditional static content like images, styles or scripts. It requires defining rules on what to cache and when to update the content. To help with that task we introduced a custom header to communicate caching rules between the Cloudflare edge and origin servers.</p><p>The Cloudflare Worker runs in every edge data center, and the serverless platform takes care of scaling to our needs. Based on the request type, it will return HTML content from the Cloudflare cache using the <a href="https://developers.cloudflare.com/workers/runtime-apis/cache">Worker’s Cache API</a> or serve a response directly from the origin. A specially designed custom header provides information from the origin on how the script should handle the response. For example, the Worker script will never cache responses for authenticated users.</p>
    <div>
      <h3>HTML Caching rules</h3>
      <a href="#html-caching-rules">
        
      </a>
    </div>
    <p>With or without the Cloudflare for WordPress plugin, HTML edge caching requires all of the following conditions to be met:</p><ul><li><p>Origin responds with a 200 status</p></li><li><p>Origin responds with the "text/html" content type</p></li><li><p>Request method is GET</p></li><li><p>Request path doesn’t contain query strings</p></li><li><p>Request doesn’t contain any WordPress-specific cookies: "wp-*", "wordpress*", "comment_*", "woocommerce_*", unless the cookie is "wordpress_eli" or "wordpress_test_cookie"</p></li><li><p>Request doesn’t contain any of the following headers:</p><ul><li><p>"Cache-Control: no-cache"</p></li><li><p>"Cache-Control: private"</p></li><li><p>"Pragma: no-cache"</p></li><li><p>"Vary: *"</p></li></ul></li></ul><p>Note that the caching is bypassed if the devtools are open and the “Disable cache” option is active.</p>
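<p>Taken together, the rules above can be expressed as a single predicate. The following is a hypothetical sketch only, with plain objects standing in for the Workers Request/Response types and names that are ours, not Cloudflare’s implementation:</p>

```javascript
// A cookie forces a cache bypass if it matches one of the WordPress prefixes,
// unless it is one of the two exempted cookie names.
function cookieForcesBypass(name) {
  if (name === "wordpress_eli" || name === "wordpress_test_cookie") return false;
  return ["wp-", "wordpress", "comment_", "woocommerce_"].some((p) => name.startsWith(p));
}

// req: { method, url, cookies: [names], headers: {lowercase-name: value} }
// res: { status, contentType }
function isHTMLCacheable(req, res) {
  if (req.method !== "GET") return false;
  if (new URL(req.url).search !== "") return false;              // no query strings
  if ((req.cookies || []).some(cookieForcesBypass)) return false;
  const h = req.headers || {};
  if (h["cache-control"] === "no-cache" || h["cache-control"] === "private") return false;
  if (h["pragma"] === "no-cache" || h["vary"] === "*") return false;
  if (res.status !== 200) return false;
  return (res.contentType || "").includes("text/html");
}

isHTMLCacheable(
  { method: "GET", url: "https://example.com/post/", cookies: ["wordpress_test_cookie"] },
  { status: 200, contentType: "text/html; charset=UTF-8" }
); // → true
```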
    <div>
      <h3>Edge caching with plugin</h3>
      <a href="#edge-caching-with-plugin">
        
      </a>
    </div>
    <p>The preferred solution requires a <a href="https://support.cloudflare.com/hc/en-us/articles/227634427-Using-Cloudflare-with-WordPress">configured Cloudflare for WordPress plugin</a>. We provide the following feature set when the plugin is activated:</p><ul><li><p>HTML edge caching with 30 days TTL</p></li><li><p>30 seconds or faster cache invalidation</p></li><li><p>Bypass HTML caching for logged-in users</p></li><li><p>Bypass HTML caching based on the presence of WordPress-specific cookies</p></li><li><p>Decreased load on origin servers: if a request is served from the Cloudflare CDN cache, we skip the request to the origin server.</p></li></ul>
    <div>
      <h3>How is this implemented?</h3>
      <a href="#how-is-this-implemented">
        
      </a>
    </div>
    <p>When an eyeball requests a page from a website and Cloudflare doesn’t have a copy of the content, it is fetched from the origin. As the response is sent from the origin and passes through Cloudflare’s edge, the Cloudflare for WordPress plugin adds a custom header: <code>cf-edge-cache</code>. It allows an origin to configure the caching rules applied to responses.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6COdQPrdIxdu1UiJgsdyVd/015cf3f6aa598f70cfe089d74292a1b9/image4-1.png" />
            
            </figure><p>Based on the <a href="/improving-html-time-to-first-byte/">X-HTML-Edge-Cache</a> proposal, the plugin adds a cf-edge-cache header to every origin response. There are two possible values:</p><ul><li><p>cf-edge-cache: no-cache</p></li></ul><p>The page contains private information that shouldn’t be cached by the edge. For example, an active session exists on the server.</p><ul><li><p>cf-edge-cache: cache, platform=wordpress</p></li></ul><p>This combination of cache and platform ensures that the HTML page is cached. In addition, we run a number of checks for the presence of WordPress-specific cookies to decide whether to bypass or allow caching on the edge.</p><p>If the header isn’t present, we assume that the Cloudflare for WordPress plugin is not installed or not up to date. In this case the feature operates without plugin support.</p>
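            <p>Interpreting the header on the edge might look like the following sketch. The three-way outcome and the parsing details are illustrative assumptions; only the header name and its two documented values come from the text:</p>

```javascript
// Hypothetical interpretation of the cf-edge-cache response header.
// Returns "cache", "bypass", or "no-plugin" (plugin missing or outdated).
function edgeCacheDirective(response) {
  const header = response.headers.get("cf-edge-cache");
  if (header === null) return "no-plugin";
  const parts = header.split(",").map(part => part.trim());
  if (parts.includes("no-cache")) return "bypass"; // private content
  if (parts.includes("cache")) {
    if (parts.includes("platform=wordpress")) return "cache";
  }
  return "bypass";
}
```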
    <div>
      <h3>Edge caching without plugin</h3>
      <a href="#edge-caching-without-plugin">
        
      </a>
    </div>
    <p>Using the Automatic Platform Optimization feature in combination with the Cloudflare for WordPress plugin is our recommended solution. It provides the best feature set together with almost instant cache invalidation. Still, we wanted to provide performance improvements without the need for any installation on the origin server.</p><p>We provide the following feature set when the plugin is not activated:</p><ul><li><p>HTML edge caching with 30 days TTL</p></li><li><p>Cache invalidation may take up to 30 minutes. A manual cache purge can be triggered to speed up cache invalidation</p></li><li><p>Bypass HTML caching based on the presence of WordPress-specific cookies</p></li><li><p>No decreased load on origin servers. If a request is served from the Cloudflare CDN cache, we still require an origin response to apply the cache invalidation logic.</p></li></ul><p>Without the Cloudflare for WordPress plugin we still cache HTML on the edge and serve the content from the cache when possible. The cache revalidation logic runs after serving the response to the eyeball. The <a href="https://developers.cloudflare.com/workers/learning/fetch-event-lifecycle#waituntil">Worker’s waitUntil() callback</a> allows code to run in the background without affecting the response to the eyeball.</p><p>We rely on the following headers to detect whether the content is stale and requires a cache update:</p><ul><li><p>ETag. If the cached version and the origin response both include an ETag and they differ, we replace the cached version with the origin response. The behavior is the same for strong and weak ETag values.</p></li><li><p>Last-Modified. If the cached version and the origin response both include Last-Modified and the origin has a later Last-Modified date, we replace the cached version with the origin response.</p></li><li><p>Date. If no ETag or Last-Modified header is available, we compare the Date values of the cached version and the origin response. If there is more than a 30-minute difference, we replace the cached version with the origin response.</p></li></ul>
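    <p>The three comparisons above can be sketched as a single check. Only the header names and the 30-minute threshold come from the text; the function name and fallback behavior when headers are missing are illustrative assumptions:</p>

```javascript
// Minimal sketch of the staleness comparison: ETag first, then
// Last-Modified, then the Date header as a last resort.
const MAX_DATE_SKEW_MS = 30 * 60 * 1000; // 30 minutes

function needsUpdate(cachedHeaders, originHeaders) {
  const cachedEtag = cachedHeaders.get("ETag");
  const originEtag = originHeaders.get("ETag");
  if (cachedEtag !== null && originEtag !== null) {
    // Strong and weak ETag values are compared the same way.
    return cachedEtag !== originEtag;
  }

  const cachedLm = cachedHeaders.get("Last-Modified");
  const originLm = originHeaders.get("Last-Modified");
  if (cachedLm !== null && originLm !== null) {
    return Date.parse(originLm) > Date.parse(cachedLm);
  }

  // Fall back to Date: replace if the versions differ by more than 30 minutes.
  const cachedDate = Date.parse(cachedHeaders.get("Date") || "");
  const originDate = Date.parse(originHeaders.get("Date") || "");
  return Math.abs(originDate - cachedDate) > MAX_DATE_SKEW_MS;
}
```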
    <div>
      <h3>Getting content everywhere</h3>
      <a href="#getting-content-everywhere">
        
      </a>
    </div>
    <p>Cloudflare’s cache works great for frequently requested content: regular requests to the site keep the content in cache. For a typical personal blog, however, it is more common that the content stays in cache only in some parts of our vast edge network. With the Automatic Platform Optimization release we wanted to improve loading time for a cache cold start from any location in the world. We explored different approaches and decided to use <a href="/introducing-workers-kv/">Workers KV</a> to improve edge caching.</p><p>In addition to Cloudflare's CDN cache, we put the content into Workers KV. It only requires a single request to the page to cache it, and within a minute it is available to be read back from KV from any Cloudflare data center.</p>
    <div>
      <h3>Updating content</h3>
      <a href="#updating-content">
        
      </a>
    </div>
    <p>After an update has been made to the WordPress website, the plugin makes a request to Cloudflare’s API which both purges the cache and marks the content as stale in KV. The next request for the asset triggers revalidation of the content. If the plugin is not enabled, the cache revalidation logic is triggered as detailed previously.</p><p>We serve the stale copy of the content still present in KV, asynchronously fetch new content from the origin, apply possible optimizations, and then cache it (both in the regular local CDN cache and globally in KV).</p>
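    <p>The serve-stale-then-revalidate flow can be sketched as below. The KV store, origin fetch, and waitUntil scheduler are injected as plain functions so the logic is visible outside the Workers runtime; all names and the exact refresh steps are illustrative assumptions:</p>

```javascript
// Sketch of serving a stale KV copy while refreshing it in the background.
// kv is assumed to expose get/put/delete; waitUntil schedules a promise
// to run after the response is returned (as the Workers API does).
async function serveWithRevalidate(key, kv, fetchOrigin, waitUntil) {
  const staleMarker = await kv.get(`stale:${key}`);
  const cached = await kv.get(key);
  if (cached !== null) {
    if (staleMarker !== null) {
      // Serve the stale copy immediately; refresh the cache asynchronously.
      waitUntil((async () => {
        const fresh = await fetchOrigin();
        await kv.put(key, fresh);
        await kv.delete(`stale:${key}`); // content revalidated
      })());
    }
    return cached;
  }
  // Nothing cached yet: fetch from the origin and store it.
  const fresh = await fetchOrigin();
  await kv.put(key, fresh);
  return fresh;
}
```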
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1UGnbwtDfcNEmZXwV12urw/b1b58beeb7a4c5fee4ad8d5a8d755035/image5-2.png" />
            
            </figure><p>To store the content in KV we use a single namespace. It’s keyed with a combination of a zone identifier and the URL. For instance:</p>
            <pre><code>1:example.com/blog-post-1.html =&gt; "transformed &amp; cached content"</code></pre>
            <p>To mark content as stale in KV, we write a new key which will be read from the edge. If the key is present, we revalidate the content.</p>
            <pre><code>stale:1:example.com/blog-post-1.html =&gt; ""</code></pre>
            <p>Once the content has been revalidated, the stale marker key is deleted.</p>
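            <p>The key layout from the examples above can be expressed as two small helpers; the function names are hypothetical:</p>

```javascript
// Build the KV keys used in the examples above: a content key of
// "<zoneId>:<host><path>" and its corresponding stale marker.
function contentKey(zoneId, url) {
  const u = new URL(url);
  return `${zoneId}:${u.hostname}${u.pathname}`;
}

function staleKey(zoneId, url) {
  return `stale:${contentKey(zoneId, url)}`;
}
```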
    <div>
      <h3>Moving optimizations to the edge</h3>
      <a href="#moving-optimizations-to-the-edge">
        
      </a>
    </div>
    <p>On top of caching HTML at the edge, we can pre-process and transform the HTML to make websites load even faster for the user. Moving the development of this feature to our Cloudflare Workers environment makes it easy to add performance features such as improved <a href="/fast-google-fonts-with-cloudflare-workers/">Google Font loading</a>. Using Google Fonts can cause significant performance issues, as loading a font requires first loading the HTML page, then a CSS file, and finally the font itself, with each step using a different domain.</p><p>The solution is for the worker to inline the CSS and serve the font directly from the edge, minimizing the number of connections required.</p><p>If you read through the previous blog post’s implementation, it required a lot of manual work to support streaming HTML processing and character encodings. As the set of worker APIs has improved over time, it is now much simpler to implement. Specifically, the addition of a streaming <a href="/html-parsing-2/">HTML rewriter/parser with a CSS-selector based API</a> and the ability to suspend parsing to <a href="/asynchronous-htmlrewriter-for-cloudflare-workers/">asynchronously fetch a resource</a> has reduced the code required from ~600 lines of source to under 200.</p>
            <pre><code>export function transform(request, res) {
  return new HTMLRewriter()
    .on("link", {
      async element(element) {
        const src = element.getAttribute("href");
        const rel = element.getAttribute("rel");
        // getAttribute() returns null when the attribute is missing,
        // so guard before calling startsWith().
        const isGoogleFont =
          src !== null &amp;&amp; src.startsWith("https://fonts.googleapis.com");

        if (isGoogleFont &amp;&amp; rel === "stylesheet") {
          const media = element.getAttribute("media") || "all";
          const id = element.getAttribute("id") || "";
          try {
            // Inline the fetched stylesheet in place of the link element.
            const content = await fetchCSS(src, request);
            element.replace(styleTag({ media, id }, content), {
              html: true
            });
          } catch (err) {
            // On failure, leave the original link element untouched.
            console.error(err);
          }
        }
      }
    })
    .transform(res);
}</code></pre>
            <p>The HTML transformation doesn’t block the response to the user. It runs as a background task which, when complete, updates KV and replaces the globally cached version.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1ShahwSObRage83CdWSfD0/528448594de96d64c0bb442f7e2db875/image1-6.png" />
            
            </figure>
    <div>
      <h3>Making edge publishing generic</h3>
      <a href="#making-edge-publishing-generic">
        
      </a>
    </div>
    <p>We are launching the feature for WordPress specifically, but the concept can be applied to any website and content management system (CMS).</p> ]]></content:encoded>
            <category><![CDATA[Automatic Platform Optimization]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[WordPress]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">TmM1U951RKeF4WEDa64px</guid>
            <dc:creator>Yevgen Safronov</dc:creator>
            <dc:creator>Sven Sauleau</dc:creator>
        </item>
    </channel>
</rss>