
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Mon, 06 Apr 2026 10:57:25 GMT</lastBuildDate>
        <item>
            <title><![CDATA[R2 Data Catalog: Managed Apache Iceberg tables with zero egress fees]]></title>
            <link>https://blog.cloudflare.com/r2-data-catalog-public-beta/</link>
            <pubDate>Thu, 10 Apr 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ R2 Data Catalog is now in public beta: a managed Apache Iceberg data catalog built directly into your R2 bucket. ]]></description>
            <content:encoded><![CDATA[ <p><a href="https://iceberg.apache.org/"><u>Apache Iceberg</u></a> is quickly becoming the standard table format for querying large analytic datasets in <a href="https://www.cloudflare.com/learning/cloud/what-is-object-storage/">object storage</a>. We’re seeing this trend firsthand as more and more developers and data teams adopt Iceberg on <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>Cloudflare R2</u></a>. But until now, using Iceberg with R2 meant managing additional infrastructure or relying on external data catalogs.</p><p>So we’re fixing this. Today, we’re launching the <a href="https://developers.cloudflare.com/r2/data-catalog/"><u>R2 Data Catalog</u></a> in open beta, a managed Apache Iceberg catalog built directly into your Cloudflare R2 bucket.</p><p>If you’re not already familiar with it, Iceberg is an open table format built for large-scale analytics on datasets stored in object storage. With R2 Data Catalog, you get the database-like capabilities Iceberg is known for – <a href="https://en.wikipedia.org/wiki/ACID"><u>ACID</u></a> transactions, schema evolution, and efficient querying – without the overhead of managing your own external catalog.</p><p>R2 Data Catalog exposes a standard Iceberg REST catalog interface, so you can connect the engines you already use, like <a href="https://py.iceberg.apache.org/"><u>PyIceberg</u></a>, <a href="https://www.snowflake.com/"><u>Snowflake</u></a>, and <a href="https://spark.apache.org/"><u>Spark</u></a>. And, as always with R2, there are no egress fees, meaning that no matter which cloud or region your data is consumed from, you won’t have to worry about growing data transfer costs.</p><p>Ready to query data in R2 right now? Jump into the <a href="https://developers.cloudflare.com/r2/data-catalog/"><u>developer docs</u></a> and enable a data catalog on your R2 bucket in just a few clicks. 
Or keep reading to learn more about Iceberg, data catalogs, how metadata files work under the hood, and how to create your first Iceberg table.</p>
    <div>
      <h2>What is Apache Iceberg?</h2>
      <a href="#what-is-apache-iceberg">
        
      </a>
    </div>
    <p><a href="https://iceberg.apache.org/"><u>Apache Iceberg</u></a> is an open table format for analyzing large datasets in object storage. It brings database-like features – ACID transactions, time travel, and schema evolution – to files stored in formats like <a href="https://parquet.apache.org/"><u>Parquet</u></a> or <a href="https://orc.apache.org/"><u>ORC</u></a>.</p><p>Historically, data lakes were just collections of raw files in object storage. However, without a unified metadata layer, datasets could easily become corrupted, were difficult to evolve, and queries often required expensive full-table scans.</p><p>Iceberg solves these problems by:</p><ul><li><p>Providing ACID transactions for reliable, concurrent reads and writes.</p></li><li><p>Maintaining optimized metadata, so engines can skip irrelevant files and avoid unnecessary full-table scans.</p></li><li><p>Supporting schema evolution, allowing columns to be added, renamed, or dropped without rewriting existing data.</p></li></ul><p>Iceberg is already <a href="https://iceberg.apache.org/vendors/"><u>widely supported</u></a> by engines like Apache Spark, Trino, Snowflake, DuckDB, and ClickHouse, with a fast-growing community behind it.</p>
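    <p>The following toy Python model makes the schema-evolution point concrete. It is not a real Iceberg reader. It assumes rows are stored as plain dictionaries keyed by field ID. These dictionaries stand in for Iceberg data files, which identify columns by a stable field ID rather than by name. A rename or a drop changes only the schema, not the stored values, so old files stay readable under every later schema:</p>
            <pre><code># Toy model (not a real Iceberg reader): each stored row keys its values by
# field ID, standing in for data files that identify columns by ID, not name.
data_file_rows = [
    {1: 1, 2: "a"},
    {1: 2, 2: "b"},
]

# Original schema: field ID mapped to column name.
schema_v0 = {1: "id", 2: "data"}

# Evolved schema: field 2 is renamed and field 3 is added.
# Existing field IDs never change, so the stored rows stay valid.
schema_v1 = {1: "id", 2: "payload", 3: "email"}

# Dropping a column just removes its ID from the schema.
schema_v2 = {1: "id", 3: "email"}


def read_rows(rows, schema):
    """Project stored rows onto a schema by field ID.

    Columns added after a row was written have no stored value and read as None.
    Dropped columns are ignored, even though their values are still stored.
    """
    return [{name: row.get(field_id) for field_id, name in schema.items()} for row in rows]


print(read_rows(data_file_rows, schema_v0))
print(read_rows(data_file_rows, schema_v1))
# [{'id': 1, 'payload': 'a', 'email': None}, {'id': 2, 'payload': 'b', 'email': None}]
print(read_rows(data_file_rows, schema_v2))</code></pre>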
    <div>
      <h3>How Iceberg tables are stored</h3>
      <a href="#how-iceberg-tables-are-stored">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/779M4zsH5QnpDwlTORk1fo/38e7732ca0e20645507bdc0c628f671b/1.png" />
          </figure><p>Internally, an Iceberg table is a collection of data files (typically stored in columnar formats like Parquet or ORC) and metadata files (typically stored in JSON or <a href="https://avro.apache.org/"><u>Avro</u></a>) that describe table snapshots, schemas, and partition layouts.</p><p>To understand how query engines interact efficiently with Iceberg tables, it helps to look at an Iceberg metadata file (simplified):</p>
            <pre><code>{
  "format-version": 2,
  "table-uuid": "0195e49b-8f7c-7933-8b43-d2902c72720a",
  "location": "s3://my-bucket/warehouse/0195e49b-79ca/table",
  "current-schema-id": 0,
  "schemas": [
    {
      "schema-id": 0,
      "type": "struct",
      "fields": [
        { "id": 1, "name": "id", "required": false, "type": "long" },
        { "id": 2, "name": "data", "required": false, "type": "string" }
      ]
    }
  ],
  "current-snapshot-id": 3567362634015106507,
  "snapshots": [
    {
      "snapshot-id": 3567362634015106507,
      "sequence-number": 1,
      "timestamp-ms": 1743297158403,
      "manifest-list": "s3://my-bucket/warehouse/0195e49b-79ca/table/metadata/snap-3567362634015106507-0.avro",
      "summary": {},
      "schema-id": 0
    }
  ],
  "partition-specs": [{ "spec-id": 0, "fields": [] }]
}</code></pre>
            <p>A few of the important components are:</p><ul><li><p><code>schemas</code>: Iceberg tracks schema changes over time. Engines use schema information to safely read and write data without needing to rewrite underlying files.</p></li><li><p><code>snapshots</code>: Each snapshot references a specific set of data files that represent the state of the table at a point in time. This enables features like time travel.</p></li><li><p><code>partition-specs</code>: These define how the table is logically partitioned. Query engines use this information during planning to skip unnecessary partitions, greatly improving query performance.</p></li></ul><p>By reading Iceberg metadata, query engines can efficiently prune partitions, load only the relevant snapshots, and fetch only the data files they need, resulting in faster queries.</p>
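            <p>The sketch below illustrates how an engine might use this metadata. It parses a trimmed, hypothetical metadata document with the same shape as the example above. A second snapshot has been added so there is something to travel back to. The helper names are made up for this example. A real engine would go on to read the manifest list and manifests, which this sketch skips:</p>
            <pre><code>import json

# Trimmed, hypothetical metadata in the same shape as the example above,
# with a second snapshot added to demonstrate time travel.
METADATA_JSON = """
{
  "current-schema-id": 0,
  "schemas": [
    {"schema-id": 0, "fields": [{"id": 1, "name": "id"}, {"id": 2, "name": "data"}]}
  ],
  "current-snapshot-id": 2,
  "snapshots": [
    {"snapshot-id": 1, "timestamp-ms": 1000, "manifest-list": "metadata/snap-1.avro"},
    {"snapshot-id": 2, "timestamp-ms": 2000, "manifest-list": "metadata/snap-2.avro"}
  ]
}
"""


def current_snapshot(meta):
    """Return the snapshot that current-snapshot-id points at."""
    return next(s for s in meta["snapshots"] if s["snapshot-id"] == meta["current-snapshot-id"])


def snapshot_as_of(meta, timestamp_ms):
    """Time travel: the latest snapshot committed at or before timestamp_ms, or None."""
    candidates = [s for s in meta["snapshots"] if timestamp_ms >= s["timestamp-ms"]]
    return max(candidates, key=lambda s: s["timestamp-ms"], default=None)


def current_column_names(meta):
    """Look up the current schema by ID and list its column names."""
    schema = next(s for s in meta["schemas"] if s["schema-id"] == meta["current-schema-id"])
    return [field["name"] for field in schema["fields"]]


meta = json.loads(METADATA_JSON)
print(current_snapshot(meta)["manifest-list"])      # metadata/snap-2.avro
print(snapshot_as_of(meta, 1500)["manifest-list"])  # metadata/snap-1.avro
print(current_column_names(meta))                   # ['id', 'data']</code></pre>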
    <div>
      <h3>Why do you need a data catalog?</h3>
      <a href="#why-do-you-need-a-data-catalog">
        
      </a>
    </div>
    <p>Although the Iceberg data and metadata files themselves live directly in object storage (like <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a>), the list of tables and pointers to the current metadata need to be tracked centrally by a data catalog.</p><p>Think of a data catalog as a library's index system. While books (your data) are physically distributed across shelves (object storage), the index provides a single source of truth about what books exist, their locations, and their latest editions. Without this index, readers (query engines) would waste time searching for books, might access outdated versions, or could accidentally shelve new books in ways that make them unfindable.</p><p>Similarly, data catalogs ensure consistent, coordinated access, allowing multiple query engines to safely read from and write to the same tables without conflicts or data corruption.</p>
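    <p>One way to picture the coordination a catalog provides is as an atomic compare-and-swap on each table's pointer to its current metadata file. The <code>ToyCatalog</code> class below is a hypothetical in-memory model. It loosely follows the optimistic-concurrency commits that Iceberg catalogs perform, but it does not describe how R2 Data Catalog is implemented:</p>
            <pre><code>import threading

TABLE = "default.my_table"


class ToyCatalog:
    """Hypothetical in-memory catalog: maps each table name to the location of
    its current metadata file and commits changes with compare-and-swap."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pointers = {}

    def load(self, table):
        """Return the current metadata location for a table (None if absent)."""
        return self._pointers.get(table)

    def commit(self, table, expected, new_location):
        """Point the table at new_location only if it still points at expected."""
        with self._lock:
            if self._pointers.get(table) != expected:
                return False  # another writer committed first; caller must retry
            self._pointers[table] = new_location
            return True


catalog = ToyCatalog()
catalog.commit(TABLE, None, "metadata/v1.metadata.json")

# Two writers both start from v1 and race to commit their own changes.
base = catalog.load(TABLE)
ok_a = catalog.commit(TABLE, base, "metadata/v2-writer-a.metadata.json")
ok_b = catalog.commit(TABLE, base, "metadata/v2-writer-b.metadata.json")
print(ok_a, ok_b)  # True False

# Writer B reloads the latest pointer and retries on top of writer A's commit.
ok_retry = catalog.commit(TABLE, catalog.load(TABLE), "metadata/v3-writer-b.metadata.json")
print(ok_retry, catalog.load(TABLE))  # True metadata/v3-writer-b.metadata.json</code></pre>
    <p>Before retrying, a real writer would also re-apply its changes against the newer metadata. The sketch shows only the pointer swap that keeps concurrent writers from overwriting each other.</p>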
    <div>
      <h2>Create your first Iceberg table on R2</h2>
      <a href="#create-your-first-iceberg-table-on-r2">
        
      </a>
    </div>
    <p>Ready to try it out? Here’s a quick example using <a href="https://py.iceberg.apache.org/"><u>PyIceberg</u></a> and Python to get you started. For a detailed step-by-step guide, check out our <a href="https://developers.cloudflare.com/r2/data-catalog/get-started/"><u>developer docs</u></a>.</p><p>1. Enable R2 Data Catalog on your bucket:
</p>
            <pre><code>npx wrangler r2 bucket catalog enable my-bucket</code></pre>
            <p>Or use the Cloudflare dashboard: Navigate to <b>R2 Object Storage</b> &gt; <b>Settings</b> &gt; <b>R2 Data Catalog</b> and click <b>Enable</b>.</p><p>2. Create a <a href="https://developers.cloudflare.com/r2/api/s3/tokens/"><u>Cloudflare API token</u></a> with permissions for both R2 storage and the data catalog.</p><p>3. Install <a href="https://py.iceberg.apache.org/"><u>PyIceberg</u></a> and <a href="https://arrow.apache.org/docs/index.html"><u>PyArrow</u></a>, then open a Python shell or notebook:</p>
            <pre><code>pip install pyiceberg pyarrow</code></pre>
            <p>4. Connect to the catalog and create a table:</p>
            <pre><code>import pyarrow as pa
from pyiceberg.catalog.rest import RestCatalog

# Define catalog connection details (replace variables)
WAREHOUSE = "&lt;WAREHOUSE&gt;"
TOKEN = "&lt;TOKEN&gt;"
CATALOG_URI = "&lt;CATALOG_URI&gt;"

# Connect to R2 Data Catalog
catalog = RestCatalog(
    name="my_catalog",
    warehouse=WAREHOUSE,
    uri=CATALOG_URI,
    token=TOKEN,
)

# Create default namespace
catalog.create_namespace("default")

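# Create a simple PyArrow table
df = pa.table({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
})

# Create an Iceberg table with the same schema
table = catalog.create_table(
    ("default", "my_table"),
    schema=df.schema,
)

# Append the rows, then read the table back to confirm the write
table.append(df)
print(table.scan().to_arrow())</code></pre>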
            <p>You can now append more data or run queries, just as you would with any Apache Iceberg table.</p>
    <div>
      <h2>Pricing</h2>
      <a href="#pricing">
        
      </a>
    </div>
    <p>While R2 Data Catalog is in open beta, there will be no additional charges beyond standard R2 storage and operations costs incurred by query engines accessing data. <a href="https://r2-calculator.cloudflare.com/"><u>Storage pricing</u></a> for buckets with R2 Data Catalog enabled remains the same as standard R2 buckets – $0.015 per GB-month. As always, egress directly from R2 buckets remains $0.</p><p>In the future, we plan to introduce pricing for catalog operations (e.g., creating tables, retrieving table metadata, etc.) and data compaction.</p><p>Below is our current thinking on future pricing. We’ll communicate more details around timing well before billing begins, so you can confidently plan your workloads.</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td> </td>
                    <td>
                        <p><span><span><strong>Pricing</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>R2 storage</span></span></p>
                        <p><span><span>For standard storage class</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.015 per GB-month (no change)</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>R2 Class A operations</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$4.50 per million operations (no change)</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>R2 Class B operations</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.36 per million operations (no change)</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Data Catalog operations</span></span></p>
                        <p><span><span>e.g., create table, get table metadata, update table properties</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$9.00 per million catalog operations</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Data Catalog compaction data processed</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.05 per GB processed</span></span></p>
                        <p><span><span>$4.00 per million objects processed</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Data egress</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0 (no change, always free)</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div>
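    <p>To show how the rates in the table combine, here is a rough monthly estimate for a made-up workload. It applies the planned catalog and compaction prices, which are not billed during the open beta, and it ignores any free-tier allowances. For this workload, the components are $7.50 (storage), $9.00 (Class A), $7.20 (Class B), $9.00 (catalog operations), and $5.00 + $2.00 (compaction), for a total of $39.70:</p>
            <pre><code>from decimal import Decimal

# Rates from the table above, in USD. Catalog-operation and compaction rates
# are planned future prices, not what is billed during the open beta.
STORAGE_PER_GB_MONTH = Decimal("0.015")
CLASS_A_PER_MILLION = Decimal("4.50")
CLASS_B_PER_MILLION = Decimal("0.36")
CATALOG_OPS_PER_MILLION = Decimal("9.00")
COMPACTION_PER_GB = Decimal("0.05")
COMPACTION_PER_MILLION_OBJECTS = Decimal("4.00")
MILLION = Decimal(1_000_000)


def estimate_monthly_cost(storage_gb, class_a_ops, class_b_ops, catalog_ops,
                          compacted_gb, compacted_objects):
    """Rough monthly estimate; egress is $0 and free-tier allowances are ignored."""
    return (
        Decimal(storage_gb) * STORAGE_PER_GB_MONTH
        + Decimal(class_a_ops) / MILLION * CLASS_A_PER_MILLION
        + Decimal(class_b_ops) / MILLION * CLASS_B_PER_MILLION
        + Decimal(catalog_ops) / MILLION * CATALOG_OPS_PER_MILLION
        + Decimal(compacted_gb) * COMPACTION_PER_GB
        + Decimal(compacted_objects) / MILLION * COMPACTION_PER_MILLION_OBJECTS
    )


# Hypothetical workload, chosen only to keep the arithmetic easy to follow.
cost = estimate_monthly_cost(
    storage_gb=500,
    class_a_ops=2_000_000,
    class_b_ops=20_000_000,
    catalog_ops=1_000_000,
    compacted_gb=100,
    compacted_objects=500_000,
)
print(f"${cost:.2f}")  # $39.70</code></pre>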
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>We’re excited to see how you use R2 Data Catalog! If you’ve never worked with Iceberg – or even analytics data – before, we think this is the easiest way to get started.</p><p>Next on our roadmap is tackling compaction and table optimization. Query engines typically perform better with fewer, larger data files. We will automatically rewrite collections of small data files into larger files to deliver even faster query performance.</p><p>We’re also collaborating with the broad Apache Iceberg community to expand query-engine compatibility with the Iceberg REST Catalog spec.</p><p>We’d love your feedback. Join the <a href="https://discord.cloudflare.com/"><u>Cloudflare Developer Discord</u></a> to ask questions and share your thoughts during the public beta. For more details, examples, and guides, visit our <a href="https://developers.cloudflare.com/r2/data-catalog/get-started/"><u>developer documentation</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[R2]]></category>
            <category><![CDATA[Data Catalog]]></category>
            <category><![CDATA[Storage]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">6JFB9cHUOoMZnVmYIuTLzd</guid>
            <dc:creator>Phillip Jones</dc:creator>
            <dc:creator>Garvit Gupta</dc:creator>
            <dc:creator>Alex Graham</dc:creator>
            <dc:creator>Garrett Gu</dc:creator>
        </item>
        <item>
            <title><![CDATA[Bringing Python to Workers using Pyodide and WebAssembly]]></title>
            <link>https://blog.cloudflare.com/python-workers/</link>
            <pubDate>Tue, 02 Apr 2024 13:00:45 GMT</pubDate>
            <description><![CDATA[ Introducing Cloudflare Workers in Python, now in open beta! We've revamped our systems to support Python, from the runtime to deployment. Learn about Python Worker's lifecycle, dynamic linking, and memory snapshots in this post ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4BCvEmK49aK7qLQuTkUsI1/0aecc2333aabe7e94ee99cdf9f830ef6/pythonweba.png" />
            
            </figure><p>Starting today, in open beta, you can now <a href="https://developers.cloudflare.com/workers/languages/python/">write Cloudflare Workers in Python</a>.</p><p>This new support for Python is different from how Workers have historically supported languages beyond JavaScript. In this case, we have directly integrated a Python implementation into <a href="https://github.com/cloudflare/workerd">workerd</a>, the open-source Workers runtime. All <a href="https://developers.cloudflare.com/workers/configuration/bindings/">bindings</a>, including bindings to <a href="https://developers.cloudflare.com/vectorize/">Vectorize</a>, <a href="https://developers.cloudflare.com/workers-ai/">Workers AI</a>, <a href="https://developers.cloudflare.com/r2/">R2</a>, <a href="https://developers.cloudflare.com/durable-objects/">Durable Objects</a>, and more, are supported on day one. Python Workers can import a subset of popular Python <a href="https://developers.cloudflare.com/workers/languages/python/packages/">packages</a>, including <a href="https://fastapi.tiangolo.com/">FastAPI</a>, <a href="https://python.langchain.com/docs/get_started/introduction">LangChain</a>, <a href="https://numpy.org/">NumPy</a>, and more. There are no extra build steps or external toolchains.</p><p>To do this, we’ve had to push the limits of all of our systems, from the runtime itself to our deployment system to the contents of the Worker bundle that is published across our <a href="https://www.cloudflare.com/network/">network</a>. You can <a href="https://developers.cloudflare.com/workers/languages/python/">read the docs</a> and start using it today.</p><p>We want to use this post to pull back the curtain on the internal lifecycle of a Python Worker, share what we’ve learned in the process, and highlight where we’re going next.</p>
    <div>
      <h2>Beyond “Just compile to WebAssembly”</h2>
      <a href="#beyond-just-compile-to-webassembly">
        
      </a>
    </div>
    <p>Cloudflare Workers have supported WebAssembly <a href="/webassembly-on-cloudflare-workers">since 2018</a> — each Worker is a <a href="https://developers.cloudflare.com/workers/reference/how-workers-works/">V8 isolate</a>, powered by the same JavaScript engine as the Chrome web browser. In principle, it’s been <a href="/webassembly-on-cloudflare-workers">possible</a> for years to write Workers in any language — including Python — so long as it first compiles to WebAssembly or to JavaScript.</p><p>In practice, just because something is possible doesn’t mean it’s simple. And just because “hello world” works doesn’t mean you can reliably build an application. Building full applications requires supporting an ecosystem of packages that developers are used to building with. For a platform to truly support a programming language, it’s necessary to go much further than showing how to compile code using external toolchains.</p><p>Python Workers are different from what we’ve done in the past. It’s early, and still in beta, but we think it shows what providing first-class support for programming languages beyond JavaScript can look like on Workers.</p>
    <div>
      <h2>The lifecycle of a Python Worker</h2>
      <a href="#the-lifecycle-of-a-python-worker">
        
      </a>
    </div>
    <p>With Pyodide now <a href="https://github.com/cloudflare/workerd/tree/main/src/pyodide">built into workerd</a>, you can write a Worker like this:</p>
            <pre><code>from js import Response

async def on_fetch(request, env):
    return Response.new("Hello world!")</code></pre>
            <p>...with a wrangler.toml file that points to a .py file:</p>
            <pre><code>name = "hello-world-python-worker"
main = "src/entry.py"
compatibility_date = "2024-03-18"
compatibility_flags = ["python_workers"]</code></pre>
            <p>…and when you run <a href="https://developers.cloudflare.com/workers/wrangler/commands/#dev">npx wrangler@latest dev</a>, the Workers runtime will:</p><ol><li><p>Determine which <a href="https://developers.cloudflare.com/workers/languages/python/packages/">version of Pyodide</a> is required, based on your <a href="https://developers.cloudflare.com/workers/configuration/compatibility-dates/">compatibility date</a></p></li><li><p>Create an isolate for your Worker, and automatically inject Pyodide</p></li><li><p>Serve your Python code using Pyodide</p></li></ol><p>This all happens under the hood — no extra toolchain or precompilation steps needed. The Python execution environment is provided for you, mirroring how Workers written in JavaScript already work.</p>
    <div>
      <h2>A Python interpreter built into the Workers runtime</h2>
      <a href="#a-python-interpreter-built-into-the-workers-runtime">
        
      </a>
    </div>
    <p>Just as JavaScript has <a href="https://en.wikipedia.org/wiki/List_of_ECMAScript_engines">many engines</a>, Python has <a href="https://wiki.python.org/moin/PythonImplementations">many implementations</a> that can execute Python code. <a href="https://github.com/python/cpython">CPython</a> is the reference implementation of Python. If you’ve used Python before, this is almost certainly what you’ve used, and is commonly referred to as just “Python”.</p><p><a href="https://pyodide.org/en/stable/">Pyodide</a> is a port of CPython to WebAssembly. It interprets Python code, without any need to precompile the Python code itself to any other format. It runs in a web browser — check out this <a href="https://pyodide-console.pages.dev/">REPL</a>. It is true to the CPython that Python developers know and expect, providing <a href="https://developers.cloudflare.com/workers/languages/python/stdlib/">most of the Python Standard Library</a>. It provides a foreign function interface (FFI) to JavaScript, allowing you to call JavaScript APIs directly from Python — more on this below. It provides popular open-source <a href="https://developers.cloudflare.com/workers/languages/python/packages/">packages</a>, and can import pure Python packages directly from PyPI.</p><p>Pyodide struck us as the perfect fit for Workers. It is designed to allow the core interpreter and each native Python module to be built as separate WebAssembly modules, dynamically linked at runtime. This allows the code footprint for these modules to be shared among all Workers running on the same machine, rather than requiring each Worker to bring its own copy. This is essential to making WebAssembly work well in the Workers environment, where we often run <a href="https://www.infoq.com/presentations/cloudflare-v8/">thousands of Workers per machine</a> — we need Workers using the same programming language to share their runtime code footprint. 
Running thousands of Workers on every machine is what makes it possible for us to deploy every application in every location at a <a href="/workers-pricing-scale-to-zero">reasonable price</a>.</p><p>Just like with JavaScript Workers, with Python Workers we provide the runtime for you:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4eCllqbVYDOzao6swhou7t/c4731707e15b9b37b4198f05a88682ee/VMs--Containers--ans-Isolates-comparison.png" />
            
            </figure><p>Pyodide is currently the exception — most languages that target WebAssembly do not yet support dynamic linking, so each application ends up bringing its own copy of its language runtime. We hope to see more languages support dynamic linking in the future, so that we can more effectively bring them to Workers.</p>
    <div>
      <h3>How Pyodide works</h3>
      <a href="#how-pyodide-works">
        
      </a>
    </div>
    <p>Pyodide executes Python code in WebAssembly, a sandboxed environment separated from the host runtime. Unlike native code, WebAssembly cannot perform any operation beyond pure computation (such as a file read) on its own. Those operations must be provided by a runtime environment and then <i>imported</i> by the WebAssembly module.</p><p><a href="https://llvm.org/">LLVM</a> provides three target triples for WebAssembly:</p><ol><li><p><b>wasm32-unknown-unknown</b> – this backend provides no C standard library or system call interface; to support this backend, we would need to manually rewrite every system or library call to make use of imports we would define ourselves in the runtime.</p></li><li><p><b>wasm32-wasi</b> – WASI is a standardized system interface that defines a standard set of imports, implemented in WASI runtimes such as <a href="https://github.com/bytecodealliance/wasmtime/">wasmtime</a>.</p></li><li><p><b>wasm32-unknown-emscripten</b> – Like WASI, Emscripten defines the imports that a WebAssembly program needs to execute, but it also outputs an accompanying JavaScript library that implements these imported functions.</p></li></ol><p>Pyodide uses Emscripten and provides three things:</p><ol><li><p>A distribution of the CPython interpreter, compiled using Emscripten</p></li><li><p>A foreign function interface (FFI) between Python and JavaScript</p></li><li><p>A set of third-party Python packages, compiled to WebAssembly using Emscripten’s compiler</p></li></ol><p>Of the three LLVM targets above, only Emscripten currently supports dynamic linking, which, as we noted above, is essential to providing a Python language runtime that is shared across isolates. Emscripten does this by <a href="https://emscripten.org/docs/compiling/Dynamic-Linking.html">providing implementations of dlopen and dlsym,</a> which use the accompanying JavaScript library to modify the WebAssembly program’s table to link additional WebAssembly-compiled modules at runtime. 
WASI <a href="https://github.com/WebAssembly/component-model/blob/main/design/mvp/examples/SharedEverythingDynamicLinking.md#runtime-dynamic-linking">does not yet support</a> the dlopen/dlsym dynamic linking abstractions used by CPython.</p>
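    <p>For comparison, here is the same dlopen/dlsym pattern in native CPython. It uses the standard-library <code>ctypes</code> module, not Pyodide. This sketch assumes a POSIX system where the C math library is available. In Pyodide, Emscripten supplies its own implementations of these two calls instead:</p>
            <pre><code>import ctypes
import ctypes.util

# find_library maps the short name "m" (the C math library) to a platform file
# name such as "libm.so.6" on glibc-based Linux. It can return None; in that
# case ctypes.CDLL(None) uses the symbols already loaded into the running
# process, which on typical Linux builds of Python include libm.
libm_path = ctypes.util.find_library("m")

# On POSIX systems, ctypes.CDLL loads the shared library with dlopen().
libm = ctypes.CDLL(libm_path)

# Attribute access resolves the symbol with dlsym(); we then declare the C
# signature double cos(double) so ctypes converts values correctly.
cos = libm.cos
cos.argtypes = [ctypes.c_double]
cos.restype = ctypes.c_double

print(cos(0.0))  # 1.0</code></pre>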
    <div>
      <h2>Pyodide and the magic of foreign function interfaces (FFI)</h2>
      <a href="#pyodide-and-the-magic-of-foreign-function-interfaces-ffi">
        
      </a>
    </div>
    <p>You might have noticed that in our Hello World Python Worker, we import Response from the js module:</p>
            <pre><code>from js import Response

async def on_fetch(request, env):
    return Response.new("Hello world!")</code></pre>
            <p>Why is that?</p><p>Most Workers are written in JavaScript, and most of our engineering effort on the Workers runtime goes into improving JavaScript Workers. When adding a second language, there is a risk that it never reaches feature parity with the first and remains a second-class citizen. Pyodide’s foreign function interface (FFI) is critical to avoiding this, because it provides access to all JavaScript functionality from Python. Worker authors can use it directly, and it also makes packages like <a href="https://developers.cloudflare.com/workers/languages/python/packages/fastapi/">FastAPI</a> and <a href="https://developers.cloudflare.com/workers/languages/python/packages/langchain/">LangChain</a> work out of the box, as we’ll show later in this post.</p><p>An FFI is a system for calling functions in one language that are implemented in another language. In most cases, an FFI is defined by a "higher-level" language in order to call functions implemented in a systems language, often C. Python’s <a href="https://docs.python.org/3/library/ctypes.html#module-ctypes">ctypes module</a> is such a system. These FFIs are often difficult to use because of the nature of C APIs: callers must describe C types explicitly and manage memory by hand.</p><p>Pyodide’s foreign function interface is an interface between Python and JavaScript, two high-level object-oriented languages with many design similarities. When passed from one language to the other, immutable types such as strings and numbers are transparently translated. All mutable objects are wrapped in an appropriate proxy.</p><p>When a JavaScript object is passed into Python, Pyodide determines which JavaScript protocols the object supports and <a href="https://github.com/pyodide/pyodide/blob/main/src/core/jsproxy.c#L3781-L3791">dynamically constructs</a> an appropriate Python class that implements the corresponding Python protocols. 
For example, if the JavaScript object supports the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Iteration_protocols">JavaScript iteration protocol</a> then the proxy will support the <a href="https://docs.python.org/3/library/stdtypes.html#iterator-types">Python iteration protocol</a>. If the JavaScript object is a Promise or other <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise#thenables">thenable</a>, the Python object will be an <a href="https://docs.python.org/3/reference/datamodel.html#awaitable-objects">awaitable</a>.</p>
            <pre><code>from js import JSON

js_array = JSON.parse("[1,2,3]")

for entry in js_array:
    print(entry)</code></pre>
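            <p>The protocol-to-protocol mapping can be imitated in plain Python. In the sketch below, <code>FakeJsObject</code>, <code>make_proxy_class</code>, and <code>to_python</code> are hypothetical names. The code builds a class at runtime whose special methods match the protocols an object supports. It only mimics that idea and does not reflect Pyodide’s actual <code>JsProxy</code> implementation, which is written in C:</p>
            <pre><code>import asyncio


class FakeJsObject:
    """Hypothetical stand-in for a JavaScript object. No JS engine is involved;
    it only records which JS protocols it supports, plus some data."""

    def __init__(self, protocols, payload):
        self.protocols = set(protocols)  # e.g. {"iterable"} or {"thenable"}
        self.payload = payload


def make_proxy_class(js_obj):
    """Build a Python class whose special methods match the object's protocols."""
    namespace = {"__init__": lambda self, target: setattr(self, "_target", target)}

    if "iterable" in js_obj.protocols:
        # JS iteration protocol maps to the Python iteration protocol.
        namespace["__iter__"] = lambda self: iter(self._target.payload)

    if "thenable" in js_obj.protocols:
        # A JS thenable maps to a Python awaitable.
        async def resolve(self):
            return self._target.payload

        namespace["__await__"] = lambda self: resolve(self).__await__()

    return type("JsProxySketch", (), namespace)


def to_python(js_obj):
    """Wrap a fake JS object in a freshly constructed proxy class."""
    return make_proxy_class(js_obj)(js_obj)


array_proxy = to_python(FakeJsObject({"iterable"}, [1, 2, 3]))
print(list(array_proxy))  # [1, 2, 3]

promise_proxy = to_python(FakeJsObject({"thenable"}, "resolved!"))


async def main():
    return await promise_proxy


print(asyncio.run(main()))  # resolved!</code></pre>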
            <p>The lifecycle of a request to a Python Worker makes use of Pyodide’s FFI, wrapping the incoming JavaScript <a href="https://developers.cloudflare.com/workers/runtime-apis/request/">Request</a> object in a <a href="https://pyodide.org/en/stable/usage/api/python-api/ffi.html#pyodide.ffi.JsProxy">JsProxy</a> object that is accessible in your Python code. It then converts the value returned by the Python Worker’s <a href="https://developers.cloudflare.com/workers/runtime-apis/handlers/">handler</a> into a JavaScript <a href="https://developers.cloudflare.com/workers/runtime-apis/response/">Response</a> object that can be delivered back to the client:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/736nXjbNS1gxb4dm8xaptB/9f4dd0232d22e7d7070b1f716813a7e6/Python-Worker-Request-Lifecycle.png" />
            
            </figure>
    <div>
      <h2>Why dynamic linking is essential, and static linking isn’t enough</h2>
      <a href="#why-dynamic-linking-is-essential-and-static-linking-isnt-enough">
        
      </a>
    </div>
    <p>Python can call native code through C foreign function interfaces such as <a href="https://cffi.readthedocs.io/en/stable/">cffi</a>, and many Python packages use such an FFI to import native libraries. These libraries are typically written in C, so they must first be compiled down to WebAssembly in order to work on the Workers runtime. As we noted above, Pyodide is built with Emscripten, which overrides Python’s C FFI — any time a package tries to load a native library, it is instead loaded from a WebAssembly module that is provided by the Workers runtime. Dynamic linking is what makes this possible — it is what lets us override Python’s C FFI, allowing Pyodide to support many <a href="https://developers.cloudflare.com/workers/languages/python/packages/">Python packages</a> that have native library dependencies.</p><p>Dynamic linking is “pay as you go”, while static linking is “pay upfront” — if code is statically linked into your binary, it must be loaded upfront in order for the binary to run, even if this code is never used.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JiF8v11hINyO6CpNnuP6a/52fad68dedc7a4d1c6beba46eb13f964/Python-Workers---Runtime.png" />
            
            </figure><p>Dynamic linking enables the Workers runtime to share the underlying WebAssembly modules of packages across different Workers that are running on the same machine.</p><p>We won’t go too much into detail on <a href="https://emscripten.org/docs/compiling/Dynamic-Linking.html#runtime-dynamic-linking-with-dlopen">how dynamic linking works in Emscripten</a>, but the main takeaway is that the Emscripten runtime fetches WebAssembly modules from a filesystem abstraction provided in JavaScript. For each Worker, we generate a filesystem at runtime, whose structure mimics a Python distribution that has the Worker’s dependencies installed, but whose underlying files are shared between Workers. This makes it possible to share Python and WebAssembly files between multiple Workers that import the same dependency. Today, we’re able to share these files across Workers, but copy them into each new isolate. We think we can go even further, by employing <a href="https://en.wikipedia.org/wiki/Copy-on-write">copy-on-write</a> techniques to share the underlying resource across many Workers.</p>
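    <p>The following is a toy model of that sharing. Each Worker gets its own mapping from paths to file contents, but the values are references to one shared object rather than copies. The paths and placeholder contents below are hypothetical. The sketch does not describe how the runtime actually mounts files into an isolate:</p>
            <pre><code># Hypothetical per-machine store: each dependency's files are loaded once.
# The byte strings are placeholders, not real package contents.
SHARED_FILES = {
    "numpy": b"placeholder: numpy Python and WebAssembly files",
    "fastapi": b"placeholder: fastapi Python files",
}


def build_worker_fs(dependencies):
    """Build one Worker's virtual filesystem. Paths mimic a site-packages
    layout (the exact layout here is made up), but each value is the shared
    object itself rather than a per-Worker copy."""
    return {f"/site-packages/{name}": SHARED_FILES[name] for name in dependencies}


fs_a = build_worker_fs(["numpy"])
fs_b = build_worker_fs(["numpy", "fastapi"])

# Both Workers see the same path, backed by the very same bytes object.
print(fs_a["/site-packages/numpy"] is fs_b["/site-packages/numpy"])  # True
print(sorted(fs_b))  # ['/site-packages/fastapi', '/site-packages/numpy']</code></pre>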
    <div>
      <h2>Supporting Server and Client libraries</h2>
      <a href="#supporting-server-and-client-libraries">
        
      </a>
    </div>
    <p>Python has a wide variety of popular HTTP client libraries, including <a href="https://www.python-httpx.org/">httpx</a>, <a href="https://pypi.org/project/urllib3/">urllib3</a>, <a href="https://pypi.org/project/requests/">requests</a>, and more. Unfortunately, none of them work out of the box in Pyodide, and adding support for them has been one of the longest-running user requests for the Pyodide project. These libraries all work with raw sockets, which the browser security model and CORS do not allow, so we needed another way to make them work in the Workers runtime.</p>
    <div>
      <h3>Async Client libraries</h3>
      <a href="#async-client-libraries">
        
      </a>
    </div>
    <p>For libraries that can make requests asynchronously, including <a href="https://docs.aiohttp.org/en/stable/index.html">aiohttp</a> and <a href="https://www.python-httpx.org/">httpx</a>, we can use the <a href="https://developers.cloudflare.com/workers/runtime-apis/fetch/">Fetch API</a> to make requests. We do this by patching the library to use the Fetch API from JavaScript, taking advantage of Pyodide’s FFI. <a href="https://github.com/cloudflare/pyodide/blob/main/packages/httpx/httpx_patch.py">The httpx patch</a> is quite simple: fewer than 100 lines of code. Simplified even further, it looks like this:</p>
            <pre><code>from js import Headers, Request, fetch

def py_request_to_js_request(py_request):
    # Copy the Python request's method, URL and headers into a JavaScript Request
    js_headers = Headers.new(py_request.headers)
    return Request.new(py_request.url, method=py_request.method, headers=js_headers)

def js_response_to_py_response(js_response):
    ...  # omitted: build the library's response object from the JavaScript Response

async def do_request(py_request):
    js_request = py_request_to_js_request(py_request)
    # Await the JavaScript fetch() promise directly from Python
    js_response = await fetch(js_request)
    py_response = js_response_to_py_response(js_response)
    return py_response</code></pre>
            
    <div>
      <h3>Synchronous Client libraries</h3>
      <a href="#synchronous-client-libraries">
        
      </a>
    </div>
    <p>Another challenge in supporting Python HTTP client libraries is that many Python APIs are synchronous. For these libraries, we cannot use the <a href="https://developers.cloudflare.com/workers/runtime-apis/fetch/">fetch API</a> directly, because it is asynchronous.</p><p>Thankfully, Joe Marshall recently landed <a href="https://urllib3.readthedocs.io/en/stable/reference/contrib/emscripten.html">a contribution to urllib3</a> that adds Pyodide support in web browsers by:</p><ol><li><p>Checking whether blocking with <code>Atomics.wait()</code> is possible</p><ol><li><p>If so, starting a fetch worker thread</p></li><li><p>Delegating the fetch operation to the worker thread, which serializes the response into a <code>SharedArrayBuffer</code></p></li><li><p>In the Python thread, using <code>Atomics.wait()</code> to block until the response is available in the <code>SharedArrayBuffer</code></p></li></ol></li><li><p>If <code>Atomics.wait()</code> doesn’t work, falling back to a synchronous XMLHttpRequest</p></li></ol><p>However, Cloudflare Workers do not currently support <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers">worker threads</a> or synchronous <a href="https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest">XMLHttpRequest</a>, so neither approach works in Python Workers. We do not support synchronous requests today, but there is a way forward…</p>
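    <p>Stripped of the browser specifics, the <code>Atomics.wait()</code> strategy is a “block on a helper thread” pattern. The sketch below models it with Python’s <code>threading</code> module purely as an analogy: it is not urllib3’s implementation, <code>blocking_fetch</code> and <code>do_fetch</code> are hypothetical names, and, as noted above, Workers do not offer worker threads, so this is not something a Python Worker can do.</p>
            <pre><code>import threading
from typing import Callable


def blocking_fetch(url: str, do_fetch: Callable[[str], bytes], timeout: float = 5.0) -> bytes:
    """Block the calling thread until a helper thread has finished a fetch.

    Analogy only: threading.Event plays the role of Atomics.wait()/notify(),
    and the shared dict plays the role of the SharedArrayBuffer that carries
    the serialized response back to the blocked thread.
    """
    shared: dict = {}          # stands in for the SharedArrayBuffer
    done = threading.Event()   # stands in for the Atomics wait/notify pair

    def worker() -> None:
        try:
            shared["body"] = do_fetch(url)  # the fetch happens on the helper thread
        except Exception as exc:            # hand errors back to the caller too
            shared["error"] = exc
        finally:
            done.set()                      # wake the blocked caller

    threading.Thread(target=worker, daemon=True).start()
    if not done.wait(timeout):              # block, like Atomics.wait()
        raise TimeoutError(f"fetch of {url} timed out")
    if "error" in shared:
        raise shared["error"]
    return shared["body"]


# Usage with a stand-in for the real network call.
body = blocking_fetch("https://example.com/", lambda url: b"response for " + url.encode())</code></pre>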
    <div>
      <h3>WebAssembly Stack Switching</h3>
      <a href="#webassembly-stack-switching">
        
      </a>
    </div>
    <p>There is an approach that will allow us to support synchronous requests. WebAssembly has <a href="https://github.com/WebAssembly/js-promise-integration">a stage 3 proposal adding support for stack switching</a>, which <a href="https://v8.dev/blog/jspi">V8 has an implementation of</a>. Pyodide contributors have been working on adding stack switching support to Pyodide since September 2022, and it is almost ready.</p><p>With this support, Pyodide exposes a function called <code>run_sync</code> that can block until an awaitable completes:</p>
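            <pre><code>from js import fetch
from pyodide.ffi import run_sync

def sync_fetch(py_request):
    # py_request_to_js_request and js_response_to_py_response are the helpers
    # from the httpx example above
    js_request = py_request_to_js_request(py_request)
    # Block on the fetch() promise without making sync_fetch itself async
    js_response = run_sync(fetch(js_request))
    return js_response_to_py_response(js_response)</code></pre>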
            
    <div>
      <h3>FastAPI and Python’s Asynchronous Server Gateway Interface</h3>
      <a href="#fastapi-and-pythons-asynchronous-server-gateway-interface">
        
      </a>
    </div>
    <p><a href="https://fastapi.tiangolo.com/">FastAPI</a> is one of the most popular libraries for defining Python servers. FastAPI applications use a protocol called the <a href="https://asgi.readthedocs.io/en/latest/">Asynchronous Server Gateway Interface</a> (ASGI). This means that FastAPI never reads from or writes to a socket itself. An ASGI application expects to be hooked up to an ASGI server, typically <a href="https://www.uvicorn.org/">uvicorn</a>. The ASGI server handles all of the raw sockets on the application’s behalf.</p><p>Conveniently for us, this means that FastAPI works in Cloudflare Workers without any patches or changes to FastAPI itself. We simply need to replace <a href="https://www.uvicorn.org/">uvicorn</a> with an appropriate ASGI server that can run within a Worker. Our initial implementation lives <a href="https://github.com/cloudflare/workerd/blob/main/src/pyodide/internal/asgi.py">here</a>, in <a href="https://github.com/cloudflare/pyodide">the fork of Pyodide</a> that we maintain. We hope to add a more comprehensive feature set, add test coverage, and then upstream this implementation into Pyodide.</p><p>You can try this yourself by cloning <a href="https://github.com/cloudflare/python-workers-examples">cloudflare/python-workers-examples</a>, and running <code>npx wrangler@latest dev</code> in the directory of the FastAPI example.</p>
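    <p>To see why FastAPI itself needs no changes, here is a minimal sketch of the server side of ASGI in plain Python. It builds an HTTP <code>scope</code>, supplies <code>receive</code> and <code>send</code> callables, and collects the response without ever touching a socket. <code>call_asgi</code> and <code>hello_app</code> are hypothetical names; this is not Cloudflare’s <code>asgi.py</code>, and it only handles a single, non-streaming request body and does not model client disconnects.</p>
            <pre><code>import asyncio
from typing import Any, Awaitable, Callable

Message = dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]


async def call_asgi(app: Callable[[dict, Receive, Send], Awaitable[None]],
                    method: str, path: str,
                    headers: list[tuple[bytes, bytes]],
                    body: bytes = b"") -> tuple[int, list, bytes]:
    """Run a single HTTP request through an ASGI app, with no sockets involved."""
    scope = {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": method,
        "scheme": "https",
        "path": path,
        "raw_path": path.encode(),
        "query_string": b"",
        "root_path": "",
        "headers": headers,  # list of (name, value) byte pairs, names lowercased
        "client": None,
        "server": None,
    }
    body_sent = False
    never = asyncio.Event()  # never set: this sketch does not model disconnects
    response = {"status": 500, "headers": [], "chunks": []}

    async def receive() -> Message:
        nonlocal body_sent
        if not body_sent:
            body_sent = True
            # The whole request body is delivered in one message.
            return {"type": "http.request", "body": body, "more_body": False}
        await never.wait()  # a real server would report a disconnect here
        return {"type": "http.disconnect"}

    async def send(message: Message) -> None:
        if message["type"] == "http.response.start":
            response["status"] = message["status"]
            response["headers"] = message.get("headers", [])
        elif message["type"] == "http.response.body":
            response["chunks"].append(message.get("body", b""))

    await app(scope, receive, send)
    return response["status"], response["headers"], b"".join(response["chunks"])


# Usage: a tiny hand-written ASGI app.
async def hello_app(scope, receive, send):
    await receive()  # read the (empty) request body
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"hello from " + scope["path"].encode()})


status, resp_headers, resp_body = asyncio.run(call_asgi(hello_app, "GET", "/hi", []))</code></pre>
    <p>In a Worker, an adapter of this shape would translate the incoming request into the <code>scope</code> and body, and turn the collected status, headers, and body into a JavaScript <code>Response</code>. A FastAPI application is also an ASGI callable, so it should be usable in place of <code>hello_app</code>.</p>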
    <div>
      <h2>Importing Python Packages</h2>
      <a href="#importing-python-packages">
        
      </a>
    </div>
    <p>Python Workers support <a href="https://developers.cloudflare.com/workers/languages/python/packages/">a subset of Python packages</a>, which are <a href="https://github.com/cloudflare/pyodide/tree/main/packages">provided directly by Pyodide</a>, including <a href="https://numpy.org/">numpy</a>, <a href="https://www.python-httpx.org/">httpx</a>, <a href="https://developers.cloudflare.com/workers/languages/python/packages/fastapi/">FastAPI</a>, <a href="https://developers.cloudflare.com/workers/languages/python/packages/langchain/">Langchain</a>, and more. This ensures compatibility with the Pyodide runtime by pinning package versions to Pyodide versions, and allows Pyodide to patch internal implementations, as we showed above in the case of httpx.</p><p>To import a package, simply add it to your <code>requirements.txt</code> file, without adding a version number. A specific version of the package is provided directly by Pyodide. Today, you can use packages in local development, and in the coming weeks, you will be able to deploy Workers that define dependencies in a <code>requirements.txt</code> file. Later in this post, we’ll show how we’re thinking about managing new versions of Pyodide and packages.</p><p>We maintain our own fork of Pyodide, which allows us to provide patches specific to the Workers runtime and to quickly expand our support for packages in Python Workers, while also committing to upstreaming our changes back to Pyodide so that the whole ecosystem of developers can benefit.</p><p>However, Python packages are often large and memory-hungry, and they can do a lot of work at import time. How can we ensure that you can bring in the packages you need, while mitigating long cold start times?</p>
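    <p>One way to picture the no-version-number rule is as a lookup: the Worker lists bare package names, and each version comes from the Pyodide release. The sketch below is a hypothetical model of that rule; <code>resolve_requirements</code>, <code>normalize</code>, and the <code>pinned</code> table are illustrative, not part of Wrangler or the Workers runtime, and the version strings are placeholders.</p>
            <pre><code>import re

# A bare distribution name, with no version specifier, extras, or markers.
_BARE_NAME = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?$")


def normalize(name: str) -> str:
    # PEP 503-style normalization: "Foo_Bar" and "foo-bar" name the same package.
    return re.sub(r"[-_.]+", "-", name).lower()


def resolve_requirements(requirements_txt: str, pinned: dict[str, str]) -> dict[str, str]:
    """Map each requirement to the version pinned by the Pyodide release.

    Hypothetical model: ``pinned`` holds the one version of each package that a
    given Pyodide release ships, so requirements list names only.
    """
    resolved: dict[str, str] = {}
    for raw_line in requirements_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if not _BARE_NAME.match(line):
            raise ValueError(f"{line!r}: list the package name without a version")
        name = normalize(line)
        if name not in pinned:
            raise LookupError(f"{name!r} is not provided by this Pyodide release")
        resolved[name] = pinned[name]  # the version comes from Pyodide, not the user
    return resolved


# Placeholder pins for illustration only; these are not real Pyodide versions.
EXAMPLE_PINS = {"fastapi": "1.0.0-example", "numpy": "2.0.0-example"}
versions = resolve_requirements("fastapi\nNumPy  # numerics\n", EXAMPLE_PINS)</code></pre>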
    <div>
      <h2>Making cold starts faster with memory snapshots</h2>
      <a href="#making-cold-starts-faster-with-memory-snapshots">
        
      </a>
    </div>
    <p>In the local development example at the start of this post, we mentioned injecting Pyodide into your Worker. Pyodide itself is 6.4 MB, and Python packages can also be quite large.</p><p>If we simply shoved Pyodide into your Worker and uploaded it to Cloudflare, that would be quite a large Worker to load into a new isolate, and cold starts would be slow. On a fast computer with a good network connection, Pyodide takes about two seconds to initialize in a web browser: one second of network time and one second of CPU time. It wouldn’t be acceptable to pay that cost every time you update your code, in every isolate your Worker runs in across <a href="https://www.cloudflare.com/network/">Cloudflare’s network</a>.</p><p>Instead, when you run <a href="https://developers.cloudflare.com/workers/wrangler/commands/#deploy">npx wrangler@latest deploy</a>, the following happens:</p><ol><li><p>Wrangler uploads your Python code and your <code>requirements.txt</code> file to the Workers API.</p></li><li><p>We send your Python code and your <code>requirements.txt</code> file to the Workers runtime to be validated.</p></li><li><p>We create a new isolate for your Worker, and automatically inject Pyodide plus any <a href="https://developers.cloudflare.com/workers/languages/python/packages/">packages</a> you’ve specified in your <code>requirements.txt</code> file.</p></li><li><p>We scan the Worker’s code for import statements, execute them, and then take a snapshot of the Worker’s WebAssembly linear memory. Effectively, we perform the expensive work of importing packages at deploy time, rather than at runtime.</p></li><li><p>We deploy this snapshot alongside your Worker’s Python code to Cloudflare’s network.</p></li><li><p>Just as with a JavaScript Worker, we execute the Worker’s <a href="https://developers.cloudflare.com/workers/platform/limits/#worker-startup-time">top-level scope</a>.</p></li></ol><p>When a request comes in to your Worker, we load this snapshot and use it to bootstrap your Worker in an isolate, avoiding expensive initialization time:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2q0ztvdb60NUqlAsOWWFX4/203e421e31e25c5794f5fada1ad94c40/apipyth.png" />
            
            </figure><p>This brings cold starts for a basic Python Worker below one second. We’re not satisfied with this yet, though: we’re confident that we can drive it down much, much further. How? By reusing memory snapshots.</p>
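    <p>Step 4 of the deploy process listed above, scanning the Worker’s code for imports and executing them before the snapshot is taken, can be sketched with Python’s standard <code>ast</code> and <code>importlib</code> modules. This assumes only module-level imports need to run, and <code>top_level_imports</code> and <code>preload</code> are hypothetical names; taking the actual snapshot of WebAssembly linear memory is outside the scope of this sketch.</p>
            <pre><code>import ast
import importlib
from types import ModuleType


def top_level_imports(source: str) -> list[str]:
    """List modules imported at the top level of a Worker's source code."""
    modules: list[str] = []
    for node in ast.parse(source).body:  # module-level statements only
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules.append(node.module)  # relative imports are skipped
    return list(dict.fromkeys(modules))  # de-duplicate, keep first-seen order


def preload(modules: list[str]) -> list[ModuleType]:
    """Import each module so its import-time work is done before the snapshot."""
    return [importlib.import_module(name) for name in modules]


worker_source = '''
import json
from collections import OrderedDict

async def on_fetch(request, env):
    import math  # runs per request, so it is not preloaded
    return json.dumps({"pi": math.pi})
'''
loaded = preload(top_level_imports(worker_source))
# A deploy step would now snapshot memory; that part is not modeled here.</code></pre>
    <p>The import inside <code>on_fetch</code> is skipped because it only runs when a request arrives, so its cost would not be captured by a snapshot of the top-level scope.</p>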
    <div>
      <h3>Reusing Memory Snapshots</h3>
      <a href="#reusing-memory-snapshots">
        
      </a>
    </div>
    <p>When you upload a Python Worker, we generate a single memory snapshot of the Worker’s top-level imports, including both Pyodide and any dependencies. This snapshot is specific to your Worker. It can’t be shared, even though most of its contents are the same as those of other Python Workers.</p><p>Instead, we can create a single shared snapshot ahead of time and preload it into a pool of “pre-warmed” isolates. These isolates would already have the Pyodide runtime loaded and ready, making a Python Worker work just like a JavaScript Worker. In both cases, the underlying interpreter and execution environment are provided by the Workers runtime and available on demand without delay. The only difference is that with Python, the interpreter runs in WebAssembly, within the Worker.</p><p>Snapshots are a common pattern across runtimes and execution environments. Node.js <a href="https://docs.google.com/document/d/1YEIBdH7ocJfm6PWISKw03szNAgnstA2B3e8PZr_-Gp4/edit#heading=h.1v0pvnoifuah">uses V8 snapshots to speed up startup time</a>. You can take <a href="https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md">snapshots of Firecracker microVMs</a> and resume execution in a different process. There’s much more we can do here, not just for Python Workers but for JavaScript Workers as well, such as caching snapshots of compiled code from the top-level scope and of the state of the isolate itself. Workers are so fast and efficient that to date we haven’t needed to take snapshots in this way, but we think there are still big performance gains to be had.</p><p>This is our biggest lever for driving cold start times down over the rest of 2024.</p>
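    <p>The pre-warmed pool idea can be modeled in a few lines of Python: one base snapshot (the Pyodide runtime, with nothing Worker-specific) is produced once, and each isolate is restored by copying it rather than by re-running initialization. <code>PrewarmedPool</code> and <code>Isolate</code> are hypothetical, and a real isolate holds far more state than a <code>bytearray</code>.</p>
            <pre><code>from collections import deque
from dataclasses import dataclass


@dataclass
class Isolate:
    """Toy stand-in for an isolate: only a private copy of linear memory."""
    memory: bytearray


class PrewarmedPool:
    """Pool of isolates restored ahead of time from one shared base snapshot."""

    def __init__(self, base_snapshot: bytes, size: int) -> None:
        self._snapshot = base_snapshot  # created once, shared by every isolate
        self._size = size
        self._ready = deque(self._restore() for _ in range(size))

    def _restore(self) -> Isolate:
        # Restoring copies the snapshot into fresh memory instead of re-running
        # interpreter initialization.
        return Isolate(memory=bytearray(self._snapshot))

    def acquire(self) -> Isolate:
        # Hand out a pre-warmed isolate, or restore one on demand if the pool is empty.
        return self._ready.popleft() if self._ready else self._restore()

    def refill(self) -> None:
        # Top the pool back up, for example in the background between requests.
        while self._size > len(self._ready):
            self._ready.append(self._restore())

    def __len__(self) -> int:
        return len(self._ready)


pool = PrewarmedPool(b"pyodide-base-snapshot", size=2)
isolate = pool.acquire()
isolate.memory[:7] = b"worker1"  # Worker-specific state lives only in this copy</code></pre>
    <p>Each isolate gets its own mutable copy, so Worker-specific state written into one never leaks into another, while the expensive part, building the snapshot, happens once for all of them.</p>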
    <div>
      <h2>Future proofing compatibility with Pyodide versions and Compatibility Dates</h2>
      <a href="#future-proofing-compatibility-with-pyodide-versions-and-compatibility-dates">
        
      </a>
    </div>
    <p>When you deploy a Worker to Cloudflare, you expect it to keep running indefinitely, even if you never update it again. There are Workers deployed in 2018 that are still running just fine in production.</p><p>We achieve this using <a href="https://developers.cloudflare.com/workers/configuration/compatibility-dates/">Compatibility Dates</a> and <a href="https://developers.cloudflare.com/workers/configuration/compatibility-dates/#compatibility-flags">Compatibility Flags</a>, which provide explicit opt-in mechanisms for new behavior and potentially backwards-incompatible changes, without impacting existing Workers.</p><p>This works in part because it mirrors how the Internet and web browsers work. You publish a web page with some JavaScript, and rightly expect it to work forever. Web browsers and Cloudflare Workers make the same commitment to stability.</p><p>There is a challenge with Python, though: both Pyodide and CPython are <a href="https://devguide.python.org/versions/">versioned</a>. Updated versions are published regularly and can contain breaking changes. And Pyodide provides a set of <a href="https://developers.cloudflare.com/workers/languages/python/packages/">built-in packages</a>, each with a pinned version number. This raises a question: how should we let you update your Worker to a newer version of Pyodide?</p><p>The answer is <a href="https://developers.cloudflare.com/workers/configuration/compatibility-dates/">Compatibility Dates</a> and <a href="https://developers.cloudflare.com/workers/configuration/compatibility-dates/#compatibility-flags">Compatibility Flags</a>.</p><p>A new version of Python is released every year in October, and a new version of Pyodide is released six (6) months later. When this new version of Pyodide is published, we will add it to Workers by gating it behind a Compatibility Flag, which is only enabled after a specified Compatibility Date. This lets us continually provide updates without risk of breaking changes, extending to Python the commitment we’ve made for JavaScript.</p><p>Each Python release has a <a href="https://devguide.python.org/versions/">five (5) year support window</a>. Once this support window has passed for a given version of Python, security patches are no longer applied, making that version unsafe to rely on. To mitigate this risk while holding as closely as possible to our commitment to stability and long-term support, any Python Worker still on a Python release that is outside its support window will automatically be moved forward to the oldest Python release that is still supported. Python is a mature and stable language, so we expect that in most cases, your Python Worker will continue running without issue. But we recommend updating the compatibility date of your Worker regularly, to stay within the support window.</p><p>In between Python releases, we also expect to update and add additional <a href="https://developers.cloudflare.com/workers/languages/python/packages/">Python packages</a>, using the same opt-in mechanism. A Compatibility Flag will combine the Python version with the release date of a set of packages, for example <b>python_3.17_packages_2025_03_01</b>.</p>
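    <p>The opt-in mechanism can be sketched as two small functions, assuming the flag naming scheme from the example above and a hypothetical table that records the compatibility date from which each flag is on by default. <code>parse_flag</code>, <code>select_flag</code>, and the table contents are illustrative only, not the Workers runtime’s actual logic or release schedule.</p>
            <pre><code>import re
from dataclasses import dataclass
from datetime import date

# Flag format from the example in the text: python_MAJOR.MINOR_packages_YYYY_MM_DD
_FLAG = re.compile(r"^python_(\d+)\.(\d+)_packages_(\d{4})_(\d{2})_(\d{2})$")


@dataclass(frozen=True)
class PythonFlag:
    python_version: tuple[int, int]
    packages_date: date


def parse_flag(flag: str) -> PythonFlag:
    match = _FLAG.match(flag)
    if match is None:
        raise ValueError(f"not a Python packages flag: {flag!r}")
    major, minor, year, month, day = (int(g) for g in match.groups())
    return PythonFlag((major, minor), date(year, month, day))


def select_flag(compat_date: date, enabled_on: dict[str, date]) -> str:
    """Return the newest flag whose enable date is on or before compat_date.

    ``enabled_on`` maps each flag to the compatibility date from which it is
    on by default (hypothetical data). Workers with older compatibility dates
    keep their older flag, so nothing changes underneath them.
    """
    eligible = [flag for flag, on in enabled_on.items() if compat_date >= on]
    if not eligible:
        raise LookupError(f"no Python release is enabled as of {compat_date}")
    return max(eligible, key=lambda flag: enabled_on[flag])


# Hypothetical release table, for illustration only.
ENABLED_ON = {
    "python_3.12_packages_2024_04_01": date(2024, 4, 1),
    "python_3.13_packages_2025_04_01": date(2025, 4, 1),
}
chosen = select_flag(date(2024, 9, 15), ENABLED_ON)</code></pre>
    <p>A Worker pinned to a 2024 compatibility date keeps resolving to the older flag, even after a newer one is added to the table; only moving the compatibility date forward opts it in.</p>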
    <div>
      <h2>How bindings work in Python Workers</h2>
      <a href="#how-bindings-work-in-python-workers">
        
      </a>
    </div>
    <p>We mentioned earlier that Pyodide provides a foreign function interface (FFI) to JavaScript, meaning that you can use JavaScript objects, methods, functions, and more directly from Python.</p><p>This means that from day one, all <a href="https://developers.cloudflare.com/workers/configuration/bindings/">binding</a> APIs to other Cloudflare resources are supported in Python Workers. The env object that is passed to handlers in Python Workers is a JavaScript object that Pyodide exposes through a proxy API, handling <a href="https://pyodide.org/en/stable/usage/type-conversions.html">type translations</a> across languages automatically.</p><p>For example, to write to and read from a <a href="https://developers.cloudflare.com/kv/">KV</a> namespace from a Python Worker, you would write:</p>
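            <pre><code>from js import Response

async def on_fetch(request, env):
    # env.FOO is a KV namespace binding; its put() and get() methods return
    # JavaScript promises, which Pyodide lets you await from Python
    await env.FOO.put("bar", "baz")
    bar = await env.FOO.get("bar")
    return Response.new(bar) # returns "baz"</code></pre>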
            <p>This works for Web APIs too: notice how <code>Response</code> is imported from the <code>js</code> module? You can import any JavaScript global this way.</p>
    <div>
      <h2>Get this JavaScript out of my Python!</h2>
      <a href="#get-this-javascript-out-of-my-python">
        
      </a>
    </div>
    <p>You’re probably reading this post because you want to write Python <i>instead</i> of JavaScript. <code>from js import Response</code> just isn’t Pythonic. We know, and we have tackled this challenge before for another language (<a href="/workers-rust-sdk">Rust</a>). We think we can do even better for Python.</p><p>We launched <a href="https://github.com/cloudflare/workers-rs">workers-rs</a> in 2021 to make it possible to write Workers in <a href="https://www.rust-lang.org/">Rust</a>. For each JavaScript API in Workers, we, alongside open-source contributors, have written bindings that expose a more idiomatic Rust API.</p><p>We plan to do the same for Python Workers, starting with the bindings to <a href="https://developers.cloudflare.com/workers-ai/">Workers AI</a> and <a href="https://developers.cloudflare.com/vectorize/">Vectorize</a>. But while workers-rs requires that you use and update an external dependency, the APIs we provide with Python Workers will be built into the Workers runtime directly. Just update your compatibility date to get the latest, most Pythonic APIs.</p><p>This is about more than just making bindings to resources on Cloudflare more Pythonic, though. It’s about compatibility with the ecosystem.</p><p>Just as we <a href="https://github.com/cloudflare/workers-rs/pull/477">recently converted</a> workers-rs to use types from the <a href="https://crates.io/crates/http">http</a> crate, which makes it easy to use the <a href="https://docs.rs/axum/latest/axum/">axum</a> crate for routing, we aim to do the same for Python Workers. For example, the Python standard library provides a <a href="https://docs.python.org/3/library/socket.html">raw socket API</a>, which many Python packages depend on. Workers already provides <a href="https://developers.cloudflare.com/workers/runtime-apis/tcp-sockets/"><code>connect()</code></a>, a JavaScript API for working with raw sockets. We see ways to provide at least a subset of the Python standard library’s socket API in Workers, enabling a broader set of Python packages to work on Workers with fewer patches.</p><p>Ultimately, we hope to kick-start an effort to create a standardized serverless API for Python: one that is easy for any Python developer to use and offers the same capabilities as JavaScript.</p>
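    <p>As a rough illustration of what a more Pythonic binding could look like, here is a hypothetical thin wrapper that gives a KV binding a <code>dict.get</code>-style <code>default</code>. It is not the planned Python Workers API: <code>KVStore</code> and <code>FakeKVBinding</code> are illustrative names, and the sketch assumes the binding’s <code>get()</code> and <code>put()</code> are awaitable from Python, as in the KV example earlier in this post.</p>
            <pre><code>import asyncio
from typing import Any, Optional


class KVStore:
    """Hypothetical Pythonic wrapper around a KV namespace binding such as env.FOO.

    Assumes the binding's get() and put() can be awaited from Python. How a
    missing key arrives from JavaScript depends on the Pyodide version; this
    sketch assumes it is converted to None.
    """

    def __init__(self, binding: Any) -> None:
        self._binding = binding

    async def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        value = await self._binding.get(key)
        return default if value is None else value  # dict.get-style default

    async def put(self, key: str, value: str) -> None:
        await self._binding.put(key, value)


# Usage with an in-memory stand-in for the JavaScript binding.
class FakeKVBinding:
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    async def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    async def put(self, key: str, value: str) -> None:
        self._data[key] = value


async def demo() -> tuple[Optional[str], Optional[str]]:
    kv = KVStore(FakeKVBinding())
    await kv.put("bar", "baz")
    return await kv.get("bar"), await kv.get("missing", default="fallback")


pair = asyncio.run(demo())</code></pre>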
    <div>
      <h2>We’re just getting started with Python Workers</h2>
      <a href="#were-just-getting-started-with-python-workers">
        
      </a>
    </div>
    <p>Providing true support for a new programming language is a big investment that goes far beyond making “hello world” work. We chose Python very intentionally: it’s the <a href="https://survey.stackoverflow.co/2023/#technology-most-popular-technologies">second most popular programming language after JavaScript</a>, and we are committed to continuing to improve performance and widen our support for Python packages.</p><p>We’re grateful to the Pyodide maintainers and the broader Python community, and we’d love to hear from you. Drop into the Python Workers channel in the <a href="https://discord.cloudflare.com/">Cloudflare Developers Discord</a>, or <a href="https://github.com/cloudflare/workerd/discussions/categories/python-packages">start a discussion on GitHub</a> about what you’d like to see next and which Python packages you’d like us to support.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1nmt4B6AYocqmJCw21v5pL/112dcd395906643cbf8a67de22470e13/Workers-and-Python.png" />
            
            </figure> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[WebAssembly]]></category>
            <category><![CDATA[Python]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[WASM]]></category>
            <category><![CDATA[Developer Week]]></category>
            <guid isPermaLink="false">3Gqu0zcjgdix3M03fXEu8V</guid>
            <dc:creator>Hood Chatham</dc:creator>
            <dc:creator>Garrett Gu</dc:creator>
            <dc:creator>Dominik Picheta</dc:creator>
        </item>
    </channel>
</rss>