Last year we announced basic support for Python Workers, allowing Python developers to ship Python to region: Earth in a single command and take advantage of the Workers platform.
Since then, we’ve been hard at work making the Python experience on Workers feel great. Our focus has been bringing package support to the platform, and that work has now landed, with exceptionally fast cold starts and a Python-native developer experience.
This means a change in how packages are incorporated into a Python Worker. Instead of offering a limited set of built-in packages, we now support any package supported by Pyodide, the WebAssembly runtime powering Python Workers. This includes all pure Python packages, as well as many packages that rely on dynamic libraries. We also built tooling around uv to make package installation easy.
We’ve also implemented dedicated memory snapshots to reduce cold start times. These snapshots result in serious speed improvements over other serverless Python vendors. In cold start tests using common packages, Cloudflare Workers start over 2.4x faster than AWS Lambda and 3x faster than Google Cloud Run.
In this blog post, we’ll explain what makes Python Workers unique and share some of the technical details of how we’ve achieved the wins described above. But first, for those who may not be familiar with Workers or serverless platforms, and especially those coming from a Python background, let us share why you might want to use Workers at all.
Deploying Python globally in 2 minutes
Part of the magic of Workers is simple code and easy global deployments. Let's start by showing how you can deploy a FastAPI app across the world with fast cold starts in less than two minutes.
A simple Worker using FastAPI can be implemented in a handful of lines:
from fastapi import FastAPI
from workers import WorkerEntrypoint

import asgi

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "This is FastAPI on Workers"}

class Default(WorkerEntrypoint):
    async def fetch(self, request):
        return await asgi.fetch(app, request.js_object, self.env)
To deploy something similar, just make sure you have uv and npm installed, then run the following:
$ uv tool install workers-py
$ pywrangler init --template \
https://github.com/cloudflare/python-workers-examples/03-fastapi
$ pywrangler deploy
With just a little code and a pywrangler deploy, you’ve now deployed your application across Cloudflare’s edge network, which extends to 330 locations across 125 countries. No need to worry about infrastructure or scaling.
And for many use cases, Python Workers are completely free. Our free tier offers 100,000 requests per day and 10ms CPU time per invocation. For more information, check out the pricing page in our documentation.
For more examples, check out the repo in GitHub. And read on to find out more about Python Workers.
So what can you do with Python Workers?
Now that you’ve got a Worker, just about anything is possible. You write the code, so you get to decide. Your Python Worker receives HTTP requests and can make requests to any server on the public Internet.
You can set up cron triggers, so your Worker runs on a regular schedule. Plus, if you have more complex requirements, you can make use of Workflows for Python Workers, or even long-running WebSocket servers and clients using Durable Objects.
And those are just a few examples of the sorts of things you can do using Python Workers.
Faster package cold starts than Lambda and Cloud Run
Serverless platforms like Workers save you money by only running your code when it’s necessary to do so. This means that if your Worker isn’t receiving requests, it may be shut down and will need to be restarted once a new request comes in. This typically incurs a resource overhead we refer to as the “cold start.” It’s important to keep these as short as possible to minimize latency for end users.
In standard Python, booting the runtime is expensive, and our initial implementation of Python Workers focused on making the runtime boot fast. However, we quickly realized that this wasn’t enough. Even if the Python runtime boots quickly, in real-world scenarios the initial startup usually includes loading modules from packages, and unfortunately, in Python many popular packages can take several seconds to load.
We set out to make cold starts fast, regardless of whether packages were loaded.
To measure realistic cold start performance, we set up a benchmark that imports common packages, as well as a benchmark running a “hello world” using a bare Python runtime. While Lambda is able to start just the runtime quickly, once you need to import packages, the cold start times shoot up.
Here are the average cold start times when loading three common packages (httpx, fastapi, and pydantic):

| Platform | Mean cold start (secs) |
| --- | --- |
| Cloudflare Python Workers | 1.027 |
| AWS Lambda | 2.502 |
| Google Cloud Run | 3.069 |
In this case, Cloudflare Python Workers have 2.4x faster cold starts than AWS Lambda and 3x faster cold starts than Google Cloud Run. We achieved these low cold start numbers by using memory snapshots, and in a later section we explain how we did so.
We are regularly running these benchmarks. Go here for up-to-date data and more info on our testing methodology.
We’re architecturally different from these other platforms — namely, Workers is isolate-based. Because of that, our aims are high, and we are planning for a zero cold start future.
A Python-native developer experience

The diverse package ecosystem is a large part of what makes Python so amazing. That’s why we’ve been hard at work ensuring that using packages in Workers is as easy as possible.
We realised that working with the existing Python tooling is the best path towards a great development experience. So we picked the uv package and project manager, as it’s fast, mature, and gaining momentum in the Python ecosystem.
We built our own tooling around uv, called pywrangler. Essentially, pywrangler calls out to uv to install your dependencies in a way that is compatible with Python Workers, and calls out to wrangler when developing locally or deploying Workers.
Effectively this means that you just need to run pywrangler dev and pywrangler deploy to test your Worker locally and deploy it.
You can generate type hints for all of the bindings defined in your wrangler config using pywrangler types. These type hints will work with Pylance or with recent versions of mypy.
To generate the types, we use wrangler types to create TypeScript type hints, then we use the TypeScript compiler to generate an abstract syntax tree for the types. Finally, we use the TypeScript hints, such as whether a JS object has an iterator field, to generate mypy type hints that work with the Pyodide foreign function interface.
Decreasing cold start duration using snapshots
Python startup is generally quite slow, and importing a Python module can trigger a large amount of work. We avoid running Python startup during a cold start by using memory snapshots.
When a Worker is deployed, we execute the Worker’s top-level scope and then take a memory snapshot and store it alongside your Worker. Whenever we are starting a new isolate for the Worker, we restore the memory snapshot and the Worker is ready to handle requests, with no need to execute any Python code in preparation. This improves cold start times considerably. For instance, starting a Worker that imports fastapi, httpx and pydantic without snapshots takes around 10 seconds. With snapshots, it takes 1 second.
The fact that Pyodide is built on WebAssembly enables this. We can easily capture the full linear memory of the runtime and restore it.
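As an illustrative sketch (a toy model, not the actual Workers implementation), this is why WebAssembly makes snapshotting tractable: linear memory is a flat byte array, so a snapshot is just a copy of those bytes, and a restore is a copy back into a fresh instance:

```python
# Toy model of a linear-memory snapshot (illustrative only, not the real
# Pyodide/Workers machinery). WebAssembly linear memory is a flat byte
# array, so "snapshotting" is copying the bytes out, and "restoring" is
# copying them into a fresh instance's memory.
def run_startup(mem: bytearray) -> None:
    # stands in for executing the interpreter's and packages' startup code
    mem[0:5] = b"ready"

memory = bytearray(16)
run_startup(memory)                # expensive work, done once at deploy time
snapshot = bytes(memory)           # stored alongside the Worker

fresh = bytearray(len(snapshot))   # a new isolate's empty linear memory
fresh[:] = snapshot                # restore: no startup code runs at all
assert fresh == memory
```

The real system captures and restores the Pyodide runtime's entire linear memory the same way, which is what lets a restored isolate skip interpreter boot and package imports entirely.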
Memory snapshots and Entropy
WebAssembly runtimes do not require features like address space layout randomization for security, so most of the difficulties with memory snapshots on a modern operating system do not arise. Just like with native memory snapshots, we still have to carefully handle entropy at startup to avoid using the XKCD random number generator (we’re very into actual randomness).
By snapshotting memory, we might inadvertently lock in a seed value for randomness. In this case, future calls for “random” numbers would consistently return the same sequence of values across many requests.
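The failure mode is easy to demonstrate with Python’s random module: two isolates restored from the same snapshot behave exactly like two interpreters seeded with the same value:

```python
import random

# Two isolates restored from the same snapshot share identical PRNG
# state -- exactly as if both had been seeded with the same value.
random.seed(1234)                  # state baked into the "snapshot"
first_isolate = [random.random() for _ in range(3)]

random.seed(1234)                  # a second restore of the same snapshot
second_isolate = [random.random() for _ in range(3)]

# every "random" sequence repeats across requests
assert first_isolate == second_isolate
```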
Avoiding this is particularly challenging because Python uses a lot of entropy at startup, via the libc functions getentropy() and getrandom(), and by reading from /dev/random and /dev/urandom. In Pyodide, all of these share the same implementation in terms of the JavaScript crypto.getRandomValues() function.
In Cloudflare Workers, crypto.getRandomValues() has always been disabled at startup in order to allow us to switch to using memory snapshots in the future. Unfortunately, the Python interpreter cannot bootstrap without calling this function. And many packages also require entropy at startup time. There are essentially two purposes for this entropy:
For hash randomization, we seed at snapshot time and accept the cost that each specific Worker has a fixed hash seed; Python has no mechanism to allow replacing the hash seed after startup.
For pseudorandom number generators (PRNG), we take the following approach:
At deploy time:
Seed the PRNG with a fixed “poison seed”, then record the PRNG state.
Replace all APIs that call into the PRNG with an overlay that fails the deployment with a user error.
Execute the top level scope of user code.
Capture the snapshot.
At run time:
Assert that the PRNG state is unchanged. If it changed, we forgot the overlay for some method. Fail the deployment with an internal error.
After restoring the snapshot, reseed the random number generator before executing any handlers.
With this, we can ensure that PRNGs can be used while the Worker is running, but stop Workers from using them during initialization and pre-snapshot.
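The scheme above can be sketched in plain Python. Everything here is illustrative: POISON_SEED and the overlay function are hypothetical names, and the real system hooks the runtime at a much lower level (and overlays every PRNG entry point, not just one):

```python
import os
import random

POISON_SEED = 0  # hypothetical fixed seed used while taking the snapshot

def before_snapshot():
    # seed with the poison seed and record the resulting PRNG state
    random.seed(POISON_SEED)
    recorded_state = random.getstate()
    original_random = random.random

    def overlay(*args, **kwargs):
        # in the real system this fails the deployment with a user error
        raise RuntimeError("randomness is not available before the snapshot")

    random.random = overlay  # overlay one API as a stand-in for all of them
    return recorded_state, original_random

def after_restore(recorded_state, original_random):
    # if the state moved, some entry point was missed by the overlay
    assert random.getstate() == recorded_state
    random.random = original_random
    random.seed(os.urandom(16))  # reseed with real entropy before handlers

state, orig = before_snapshot()
try:
    random.random()  # top-level user code touching the PRNG is rejected
except RuntimeError:
    pass
after_restore(state, orig)
assert 0.0 <= random.random() < 1.0  # PRNG works normally at run time
```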
Memory snapshots and WebAssembly state
An additional difficulty arises when creating memory snapshots on WebAssembly: the memory snapshot we save consists only of the WebAssembly linear memory, but the full state of the Pyodide WebAssembly instance is not contained in that linear memory.
There are two tables outside of this memory.
One table holds the values of function pointers. Traditional computers use a “Von Neumann” architecture, which means that code exists in the same memory space as data, so that calling a function pointer is a jump to some memory address. WebAssembly has a “Harvard architecture” where code lives in a separate address space. This is key to most of the security guarantees of WebAssembly and in particular why WebAssembly does not need address space layout randomization. A function pointer in WebAssembly is an index into the function pointer table.
A second table holds all JavaScript objects referenced from Python. JavaScript objects cannot be directly stored into memory because the JavaScript virtual machine forbids directly obtaining a pointer to a JavaScript object. Instead, they are stored into a table and represented in WebAssembly as an index into the table.
We need to ensure that both of these tables are in exactly the same state after we restore a snapshot as they were when we captured the snapshot.
The function pointer table is always in the same state when the WebAssembly instance is initialized and is updated by the dynamic loader when we load dynamic libraries — native Python packages like numpy.
To handle dynamic loading:
When taking the snapshot, we patch the loader to record the load order of dynamic libraries, the address in memory where the metadata for each library is allocated, and the function pointer table base address for relocations.
When restoring the snapshot, we reload the dynamic libraries in the same order, and we use a patched memory allocator to place the metadata in the same locations. We assert that the current size of the function pointer table matches the function pointer table base we recorded for the dynamic library.
All of this ensures that each function pointer has the same meaning after we’ve restored the snapshot as it had when we took the snapshot.
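A toy model of the bookkeeping (illustrative, not Pyodide’s actual dynamic loader) shows why replaying the recorded load order reproduces identical function-pointer indices:

```python
# Toy model of the WebAssembly function-pointer table (illustrative, not
# the real Pyodide dynamic loader). Each loaded library appends its
# functions to the table; its "base" index is recorded at snapshot time
# and verified when the snapshot is restored.
def load_libraries(libraries, recorded_bases=None):
    table, bases = [], []
    for index, (name, functions) in enumerate(libraries):
        base = len(table)
        if recorded_bases is not None:
            # restore path: the table must be exactly where it was before
            assert base == recorded_bases[index], f"{name} base mismatch"
        bases.append(base)
        table.extend(functions)
    return table, bases

load_order = [("numpy", ["sin", "cos"]), ("pandas", ["merge"])]
table, bases = load_libraries(load_order)           # snapshot time
table2, bases2 = load_libraries(load_order, bases)  # restore time
assert table == table2 and bases == bases2
assert bases == [0, 2]  # function pointers keep the same indices
```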
To handle the JavaScript references, we implemented a fairly limited system. If a JavaScript object is accessible from globalThis by a series of property accesses, we record those property accesses and replay them when restoring the snapshot. If any reference exists to a JavaScript object that is not accessible in this way, we fail deployment of the Worker. This is good enough to deal with all the existing Python packages with Pyodide support, which do top level imports like:
from js import fetch
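The record-and-replay idea can be sketched in plain Python. The names global_this, record_path, and replay are hypothetical stand-ins for the real mechanism:

```python
from functools import reduce
from types import SimpleNamespace

# Hypothetical sketch of the replayable-reference scheme: a JS object
# reachable from globalThis by property accesses is recorded as its
# access path at snapshot time, and re-resolved after the restore.
global_this = SimpleNamespace(
    js=SimpleNamespace(fetch=lambda url: f"GET {url}")
)

def record_path(*names):
    # snapshot time: store only the property path, never the object itself
    return names

def replay(path):
    # restore time: walk the same properties to recover the reference
    return reduce(getattr, path, global_this)

path = record_path("js", "fetch")
fetch = replay(path)
assert fetch("https://example.com") == "GET https://example.com"
```

Any reference that cannot be expressed as such a path (for example, a closure captured at import time) has no replayable recipe, which is why deployment fails in that case.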
Reducing cold start frequency using sharding
Another important characteristic of our performance strategy for Python Workers is sharding. There is a very detailed description of what went into its implementation here. In short, we now route requests to existing Worker instances, whereas before we might have chosen to start a new instance.
Sharding was actually enabled for Python Workers first and proved to be a great test bed for it. A cold start is far more expensive in Python than in JavaScript, so ensuring requests are routed to an already-running isolate is especially important.
Where do we go from here?
This is just the start. We have many plans to make Python Workers better:
More developer-friendly tooling
Even faster cold starts by utilising our isolate architecture
Support for more packages
Support for native TCP sockets, native WebSockets, and more bindings
To learn more about Python Workers, check out the documentation available here. To get help, be sure to join our Discord.