
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built and the technologies we use, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Fri, 10 Apr 2026 00:27:23 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Announcing the Cloudflare Data Platform: ingest, store, and query your data directly on Cloudflare]]></title>
            <link>https://blog.cloudflare.com/cloudflare-data-platform/</link>
            <pubDate>Thu, 25 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ The Cloudflare Data Platform, launching today, is a fully-managed suite of products for ingesting, transforming, storing, and querying analytical data, built on Apache Iceberg and R2 storage. ]]></description>
            <content:encoded><![CDATA[ <p>For Developer Week in April 2025, we announced <a href="https://blog.cloudflare.com/r2-data-catalog-public-beta/"><u>the public beta of R2 Data Catalog</u></a>, a fully managed <a href="https://iceberg.apache.org/docs/nightly/"><u>Apache Iceberg</u></a> catalog on top of <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>Cloudflare R2 object storage</u></a>. Today, we are building on that foundation with three launches:</p><ul><li><p><b>Cloudflare Pipelines</b> receives events sent via Workers or HTTP, transforms them with SQL, and ingests them into Iceberg or as files on R2</p></li><li><p><b>R2 Data Catalog</b> manages the Iceberg metadata and now performs ongoing maintenance, including compaction, to improve query performance</p></li><li><p><b>R2 SQL</b> is our in-house distributed SQL engine, designed to perform petabyte-scale queries over your data in R2</p></li></ul><p>Together, these products make up the <b>Cloudflare Data Platform</b>, a complete solution for ingesting, storing, and querying analytical data tables.</p><p>Like all <a href="https://www.cloudflare.com/developer-platform/products/"><u>Cloudflare Developer Platform products</u></a>, they run on our global compute infrastructure. They’re built around open standards and interoperability. That means you can bring your own Iceberg query engine (whether that's PyIceberg, DuckDB, or Spark), connect with other platforms like Databricks and Snowflake, and pay no egress fees to access your data.</p><p>Analytical data is critical for modern companies. It lets you understand your users’ behavior and your company’s performance, and it alerts you to issues. But traditional data infrastructure is expensive and hard to operate, requiring fixed cloud infrastructure and in-house expertise. 
We built the Cloudflare Data Platform to be easy enough for anyone to use with affordable, usage-based pricing.</p><p>If you're ready to get started now, follow the <a href="https://developers.cloudflare.com/pipelines/getting-started/"><u>Data Platform tutorial</u></a> for a step-by-step guide through creating a <a href="https://developers.cloudflare.com/pipelines/"><u>Pipeline</u></a> that processes and delivers events to an <a href="https://developers.cloudflare.com/r2/data-catalog/"><u>R2 Data Catalog</u></a> table, which can then be queried with <a href="https://developers.cloudflare.com/r2-sql/"><u>R2 SQL</u></a>. Or read on to learn about how we got here and how all of this works.</p>
    <div>
      <h3>How did we end up building a Data Platform?</h3>
      <a href="#how-did-we-end-up-building-a-data-platform">
        
      </a>
    </div>
    <p>We <a href="https://blog.cloudflare.com/introducing-r2-object-storage/"><u>launched R2 Object Storage in 2021</u></a> with a radical pricing strategy: no <a href="https://www.cloudflare.com/learning/cloud/what-are-data-egress-fees/"><u>egress fees</u></a> — the bandwidth costs that traditional cloud providers charge to get data out, effectively holding your data for ransom. This was possible because we had already built one of the largest global networks, interconnecting with thousands of ISPs, cloud services, and other enterprises.</p><p>Object storage powers a wide range of use cases, from media to static assets to AI training data. But over time, we've seen an increasing number of companies using open data and table formats to store their analytical data warehouses in R2.</p><p>The technology that enables this is <a href="https://iceberg.apache.org/"><u>Apache Iceberg</u></a>. Iceberg is a <i>table format</i>, which provides database-like capabilities (including updates, ACID transactions, and schema evolution) on top of data files in object storage. In other words, it’s a metadata layer that tells clients which data files make up a particular logical table, what the schemas are, and how to efficiently query them.</p><p>The adoption of Iceberg across the industry meant users were no longer locked in to one query engine. But egress fees still made it cost-prohibitive to query data across regions and clouds. R2, with <a href="https://www.cloudflare.com/the-net/egress-fees-exit/"><u>zero-cost egress</u></a>, solves that problem — users would no longer be locked in to their clouds either. They could store their data in a vendor-neutral location and let teams use whatever query engine made sense for their data and query patterns.</p><p>But users still had to manage all of the metadata and other infrastructure themselves. We realized there was an opportunity for us to solve a major pain point and reduce the friction of storing data lakes on R2. 
This became R2 Data Catalog, our managed Iceberg catalog.</p><p>With the data stored on R2 and metadata managed, that still left a few gaps for users to solve.</p><p>How do you get data into your Iceberg tables? Once it's there, how do you optimize for query performance? And how do you actually get value from your data without needing to self-host a query engine or use another cloud platform?</p><p>In the rest of this post, we'll walk through how the three products that make up the Data Platform solve these challenges.</p>
    <div>
      <h3>Cloudflare Pipelines</h3>
      <a href="#cloudflare-pipelines">
        
      </a>
    </div>
    <p>Analytical data tables are made up of <i>events</i>, things that happened at a particular point in time. They might come from server logs, mobile applications, or IoT devices, and are encoded in data formats like JSON, Avro, or Protobuf. They ideally have a schema — a standardized set of fields — but might just be whatever a particular team thought to throw in there.</p><p>But before you can query your events with Iceberg, they need to be ingested, structured according to a schema, and written into object storage. This is the role of <a href="https://developers.cloudflare.com/pipelines/"><u>Cloudflare Pipelines</u></a>.</p><p>Built on top of <a href="https://www.arroyo.dev"><u>Arroyo</u></a>, a stream processing engine we acquired earlier this year, Pipelines receives events, transforms them with SQL queries, and sinks them to R2 and R2 Data Catalog.</p><p>Pipelines is organized around three central objects:</p><p><a href="https://developers.cloudflare.com/pipelines/streams/"><b><u>Streams</u></b></a> are how you get data into Cloudflare. They're durable, buffered queues that receive events and store them for processing. Streams can accept events in two ways: via an HTTP endpoint or from a <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/"><u>Cloudflare Worker binding</u></a>.</p><p><a href="https://developers.cloudflare.com/pipelines/sinks/"><b><u>Sinks</u></b></a> define the destination for your data. We support ingesting into R2 Data Catalog, as well as writing raw files to R2 as JSON or <a href="https://parquet.apache.org/"><u>Apache Parquet</u></a>. Sinks can be configured to write files frequently, prioritizing low-latency ingestion, or to write larger files less frequently for better query performance. 
In either case, ingestion is <i>exactly-once</i>, which means that we will never duplicate or drop events on their way to R2.</p><p><a href="https://developers.cloudflare.com/pipelines/pipelines/"><b><u>Pipelines</u></b></a> connect streams and sinks via <a href="https://developers.cloudflare.com/pipelines/sql-reference/"><u>SQL transformations</u></a>, which can modify events before writing them to storage. This enables you to <i>shift left</i>, pushing validation, schematization, and processing to your ingestion layer to make your queries easy, fast, and correct.</p>
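<p>As a concrete illustration, here is a minimal Python client that batches clickstream events and sends them to a stream's HTTP endpoint. The endpoint URL and request shape below are placeholders for illustration; consult the Pipelines documentation for the exact format your stream expects:</p>

```python
import json
import time
import urllib.request

# Hypothetical ingest endpoint: the real URL is shown when you create the
# stream (it is NOT this placeholder).
STREAM_ENDPOINT = "https://example-stream.pipelines.cloudflare.com"

def make_event(user_id, url, referrer, user_agent):
    """Build one clickstream event matching the fields consumed by the
    SQL transformation shown below (user_id, event, ts_us, url, ...)."""
    return {
        "user_id": user_id,
        "event": "page_view",
        "ts_us": int(time.time() * 1_000_000),  # microsecond timestamp
        "url": url,
        "referrer": referrer,
        "user_agent": user_agent,
    }

def encode_batch(events):
    """Serialize a batch of events as a JSON array (assumed wire format)."""
    return json.dumps(events).encode("utf-8")

if __name__ == "__main__":
    batch = [make_event("u123", "https://mywebsite.com/pricing", "", "Mozilla/5.0")]
    req = urllib.request.Request(
        STREAM_ENDPOINT,
        data=encode_batch(batch),
        headers={"Content-Type": "application/json"},
    )
    # urllib.request.urlopen(req)  # uncomment with a real endpoint and auth
```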
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7ExEHrwqgUuYLCm2q6yUkN/bf31d44dd7b97666af37cb2c25f12808/unnamed__33_.png" />
          </figure><p>For example, here's a pipeline that ingests events from a clickstream data source and writes them to Iceberg:</p>
            <pre><code>INSERT into events_table
SELECT
  user_id,
  lower(event) AS event_type,
  to_timestamp_micros(ts_us) AS event_time,
  regexp_match(url, '^https?://([^/]+)')[1]  AS domain,
  url,
  referrer,
  user_agent
FROM events_json
WHERE event = 'page_view'
  AND NOT regexp_like(user_agent, '(?i)bot|spider');</code></pre>
            <p>SQL transformations are very powerful and give you full control over how data is structured and written into the table. For example, you can</p><ul><li><p>Schematize and normalize your data, even using <a href="https://developers.cloudflare.com/pipelines/sql-reference/scalar-functions/json/"><u>JSON functions</u></a> to extract fields from arbitrary JSON</p></li><li><p>Filter out events or split them into separate tables with their own schemas</p></li><li><p>Redact sensitive information before storage with regexes</p></li><li><p>Unroll nested arrays and objects into separate events</p></li></ul><p>Initially, Pipelines supports stateless transformations. In the future, we'll leverage more of <a href="https://www.arroyo.dev/blog/stateful-stream-processing/"><u>Arroyo's stateful processing capabilities</u></a> to support aggregations, incrementally-updated materialized views, and joins.</p><p>Cloudflare Pipelines is available today in open beta. You can create a pipeline using the dashboard, Wrangler, or the REST API. To get started, check out our <a href="https://developers.cloudflare.com/pipelines/getting-started/"><u>developer docs.</u></a></p><p>We aren’t currently billing for Pipelines during the open beta. However, R2 storage and operations incurred by sinks writing data to R2 are billed at <a href="https://developers.cloudflare.com/r2/pricing/"><u>standard rates</u></a>. When we start billing, we anticipate charging based on the amount of data read, the amount of data processed via SQL transformations, and data delivered.</p>
    <div>
      <h3>R2 Data Catalog</h3>
      <a href="#r2-data-catalog">
        
      </a>
    </div>
    <p>We launched the open beta of <a href="https://developers.cloudflare.com/r2/data-catalog/"><u>R2 Data Catalog</u></a> in April and have been amazed by the response. Query engines like DuckDB <a href="https://duckdb.org/docs/stable/guides/network_cloud_storage/cloudflare_r2_import.html"><u>have added native support</u></a>, and we've seen useful integrations like <a href="https://blog.cloudflare.com/marimo-cloudflare-notebooks/"><u>marimo notebooks</u></a>.</p><p>It makes getting started with Iceberg easy. There’s no need to set up a database cluster, connect to object storage, or manage any infrastructure. You can create a catalog with a couple of <a href="https://developers.cloudflare.com/workers/wrangler/"><u>Wrangler</u></a> commands:</p>
            <pre><code>$ npx wrangler r2 bucket create mycatalog
$ npx wrangler r2 bucket catalog enable mycatalog</code></pre>
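<p>Because the catalog speaks the standard Iceberg REST protocol, any compatible client can connect. As a rough sketch (the URI layout, warehouse name, and token below are placeholders; your dashboard shows the real values for your catalog), reading a table with PyIceberg might look like this:</p>

```python
# Sketch: connecting to R2 Data Catalog from PyIceberg (`pip install pyiceberg`).
# All identifiers below are illustrative placeholders, not real credentials.

def catalog_uri(account_id: str, bucket: str) -> str:
    """Build the REST catalog URI for an R2 bucket (layout assumed)."""
    return f"https://catalog.cloudflarestorage.com/{account_id}/{bucket}"

def read_events(account_id: str, token: str):
    """Connect and scan a table -- not executed here; needs real credentials."""
    from pyiceberg.catalog.rest import RestCatalog

    catalog = RestCatalog(
        name="mycatalog",
        uri=catalog_uri(account_id, "mycatalog"),
        warehouse=f"{account_id}_mycatalog",
        token=token,
    )
    table = catalog.load_table(("default", "events_table"))
    # Return the first rows as an Arrow table
    return table.scan(limit=10).to_arrow()
```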
            <p>This provisions a data lake that can scale to petabytes of storage, queryable by whatever engine you want to use with zero egress fees.</p><p>But just storing the data isn't enough. Over time, as data is ingested, the number of underlying data files that make up a table will grow, leading to slower and slower query performance.</p><p>This is a particular problem with low-latency ingestion, where the goal is to have events queryable as quickly as possible. Writing data frequently means the files are smaller, and there are more of them. Each file needed for a query has to be listed, downloaded, and read. The overhead of too many small files can dominate the total query time.</p><p>The solution is <i>compaction</i>, a periodic maintenance operation performed automatically by the catalog. Compaction rewrites small files into larger files, which reduces metadata overhead and improves query performance.</p><p>Today we are launching compaction support in R2 Data Catalog. Enabling it for your catalog is as easy as:
</p>
            <pre><code>$ npx wrangler r2 bucket catalog compaction enable mycatalog</code></pre>
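<p>To build intuition for why this helps, here is a toy Python model (not R2's actual algorithm) that greedily packs small files into larger rewrite groups, collapsing many per-file operations into a few:</p>

```python
# Toy model of small-file compaction: greedily pack small data files into
# groups up to a target size, so each group can be rewritten as one larger
# file. Real Iceberg compaction also rewrites manifests and respects
# partitioning; this only illustrates why file counts drop.

TARGET_BYTES = 128 * 1024 * 1024  # illustrative 128 MiB target file size

def plan_compaction(file_sizes, target=TARGET_BYTES):
    """Group files (sizes in bytes) into rewrite groups of at most `target`."""
    groups, current, current_size = [], [], 0
    for size in sorted(file_sizes):
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

if __name__ == "__main__":
    # 1,000 one-MiB files collapse into a handful of rewrite groups
    sizes = [1024 * 1024] * 1000
    print(len(plan_compaction(sizes)))
```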
            <p>We're starting with support for small-file compaction, and will expand to additional compaction strategies in the future. Check out the <a href="https://developers.cloudflare.com/r2/data-catalog/about-compaction/"><u>compaction documentation</u></a> to learn more about how it works and how to enable it.</p><p>At this time, during open beta, we aren’t billing for R2 Data Catalog. Below is our current thinking on future pricing:</p><table><tr><td><p>
</p></td><td><p><b>Pricing*</b></p></td></tr><tr><td><p>R2 storage</p><p>For standard storage class</p></td><td><p>$0.015 per GB-month (no change)</p></td></tr><tr><td><p>R2 Class A operations</p></td><td><p>$4.50 per million operations (no change)</p></td></tr><tr><td><p>R2 Class B operations</p></td><td><p>$0.36 per million operations (no change)</p></td></tr><tr><td><p>Data Catalog operations</p><p>e.g., create table, get table metadata, update table properties</p></td><td><p>$9.00 per million catalog operations</p></td></tr><tr><td><p>Data Catalog compaction data processed</p></td><td><p>$0.005 per GB processed</p><p>$2.00 per million objects processed</p></td></tr><tr><td><p>Data egress</p></td><td><p>$0 (no change, always free)</p></td></tr></table><p><i>*prices subject to change prior to General Availability</i></p><p>We will provide at least 30 days notice before billing starts or if anything changes.</p>
    <div>
      <h3>R2 SQL</h3>
      <a href="#r2-sql">
        
      </a>
    </div>
    <p>Having data in R2 Data Catalog is only the first step; the real goal is getting insights and value from it. Traditionally, that means setting up and managing DuckDB, Spark, Trino, or another query engine, adding a layer of operational overhead between you and those insights. What if instead you could run queries directly on Cloudflare?</p><p>Now you can. We’ve built a query engine specifically designed for R2 Data Catalog and Cloudflare’s edge infrastructure. We call it <a href="https://developers.cloudflare.com/r2-sql/"><u>R2 SQL</u></a>, and it’s available today as an open beta.</p><p>With Wrangler, running a query on an R2 Data Catalog table is as easy as</p>
            <pre><code>$ npx wrangler r2 sql query "{WAREHOUSE}" "\
  SELECT user_id, url FROM events \
  WHERE domain = 'mywebsite.com'"</code></pre>
            <p>Cloudflare's ability to schedule compute anywhere on its global network is the foundation of R2 SQL's design. This lets us process data directly where it lives, instead of requiring you to manage centralized clusters for your analytical workloads.</p><p>R2 SQL is tightly integrated with R2 Data Catalog and R2, which allows the query planner to go beyond simple storage scanning and make deep use of the rich statistics stored in the R2 Data Catalog metadata. This provides a powerful foundation for a new class of query optimizations, such as auxiliary indexes or enabling more complex analytical functions in the future.</p><p>The result is a fully serverless experience for users. You can focus on your SQL without needing a deep understanding of how the engine operates. If you are interested in how R2 SQL works, the team has written <a href="https://blog.cloudflare.com/r2-sql-deep-dive"><u>a deep dive into how R2 SQL’s distributed query engine works at scale.</u></a></p><p>The open beta is an early preview of R2 SQL querying capabilities, and is initially focused on filter queries. Over time, we will be expanding its capabilities to cover more SQL features, like complex aggregations.</p><p>We're excited to see what our users do with R2 SQL. To try it out, see <a href="https://developers.cloudflare.com/r2-sql/"><u>the documentation</u></a> and <a href="https://developers.cloudflare.com/r2-sql/get-started/"><u>tutorials</u></a>. During the beta, R2 SQL usage is not billed, but R2 storage and operations incurred by queries are billed at standard rates. We plan to charge for the volume of data scanned by queries in the future and will provide notice before billing begins.</p>
    <div>
      <h3>Wrapping up</h3>
      <a href="#wrapping-up">
        
      </a>
    </div>
    <p>Today, you can use the Cloudflare Data Platform to ingest events into R2 Data Catalog and query them via R2 SQL. In the first half of 2026, we’ll be expanding the capabilities of all of these products, including:</p><ul><li><p>Integration with <a href="https://developers.cloudflare.com/logs/logpush/"><u>Logpush</u></a>, so you can transform, store, and query your logs directly within Cloudflare</p></li><li><p>User-defined functions via Workers, and stateful processing support for streaming transformations</p></li><li><p>Expanding R2 SQL’s feature set to cover aggregations and joins</p></li></ul><p>In the meantime, you can get started with the Cloudflare Data Platform by following <a href="https://developers.cloudflare.com/pipelines/getting-started/"><u>the tutorial</u></a> to create an end-to-end analytical data system, from ingestion with Pipelines, through storage in R2 Data Catalog, to querying with R2 SQL.</p><p>We’re excited to see what you build! Come share your feedback with us on our <a href="http://discord.cloudflare.com/"><u>Developer Discord</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Data Catalog]]></category>
            <category><![CDATA[Pipelines]]></category>
            <guid isPermaLink="false">1InN6nunuaGKjLU7DcoArr</guid>
            <dc:creator>Micah Wylde</dc:creator>
            <dc:creator>Alex Graham</dc:creator>
            <dc:creator>Jérôme Schneider</dc:creator>
        </item>
        <item>
            <title><![CDATA[Explore your Cloudflare data with Python notebooks, powered by marimo]]></title>
            <link>https://blog.cloudflare.com/marimo-cloudflare-notebooks/</link>
            <pubDate>Wed, 16 Jul 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ We’ve partnered with marimo to bring their best-in-class Python notebook experience to your Cloudflare data. ]]></description>
            <content:encoded><![CDATA[ <p>Many developers, data scientists, and researchers do much of their work in Python notebooks: they’ve been the de facto standard for data science and sharing for well over a decade. Notebooks are popular because they make it easy to code, explore data, prototype ideas, and share results. We use them heavily at Cloudflare, and we’re seeing more and more developers use notebooks to work with data – from analyzing trends in HTTP traffic and querying <a href="https://developers.cloudflare.com/analytics/analytics-engine/"><u>Workers Analytics Engine</u></a>, through to querying their own <a href="https://blog.cloudflare.com/r2-data-catalog-public-beta/"><u>Iceberg tables stored in R2</u></a>.</p><p>Traditional notebooks are incredibly powerful — but they were not built with collaboration, reproducibility, or deployment as data apps in mind. As usage grows across teams and workflows, these limitations run up against the reality of work at scale.</p><p><a href="https://marimo.io/"><b><u>marimo</u></b></a> reimagines the notebook experience with these <a href="https://marimo.io/blog/lessons-learned"><u>challenges in mind</u></a>. It’s an <a href="https://github.com/marimo-team/marimo"><u>open-source</u></a> reactive Python notebook that’s built to be reproducible, easy to track in Git, executable as a standalone script, and deployable. We have partnered with the marimo team to bring this streamlined, production-friendly experience to Cloudflare developers. 
Spend less time wrestling with tools and more time exploring your data.</p><p>Today, we’re excited to announce three things:</p><ul><li><p><a href="https://notebooks.cloudflare.com/html-wasm/_start"><u>Cloudflare auth built into marimo notebooks</u></a> – Sign in with your Cloudflare account directly from a notebook and use Cloudflare APIs without needing to create API tokens</p></li><li><p><a href="https://github.com/cloudflare/notebook-examples"><u>Open-source notebook examples</u></a> – Explore your Cloudflare data with ready-to-run notebook examples for services like <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a>, <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a>, <a href="https://developers.cloudflare.com/d1/"><u>D1</u></a>, and more</p></li><li><p><a href="https://github.com/cloudflare/containers-demos"><u>Run marimo on Cloudflare Containers</u></a> – Easily deploy marimo notebooks to Cloudflare Containers for scalable, long-running data workflows</p></li></ul><p>Want to start exploring your Cloudflare data with marimo right now? Head over to <a href="http://notebooks.cloudflare.com"><u>notebooks.cloudflare.com</u></a>. Or, keep reading to learn more about marimo, how we’ve made authentication easy from within notebooks, and how you can use marimo to explore and share notebooks and apps on Cloudflare.</p>
    <div>
      <h3>Why marimo?</h3>
      <a href="#why-marimo">
        
      </a>
    </div>
    <p>marimo is an <a href="https://docs.marimo.io/"><u>open-source</u></a> reactive Python notebook designed specifically for working with data, built from the ground up to solve many problems with traditional notebooks.</p><p>The core feature that sets marimo apart from traditional notebooks is its <a href="https://marimo.io/blog/lessons-learned"><u>reactive execution model</u></a>, powered by a statically inferred dataflow graph on cells. Run a cell or interact with a <a href="https://docs.marimo.io/guides/interactivity/"><u>UI element</u></a>, and marimo either runs dependent cells or marks them as stale (your choice). This keeps code and outputs consistent, prevents bugs before they happen, and dramatically increases the speed at which you can experiment with data. </p><p>Thanks to reactive execution, notebooks are also deployable as data applications, making them easy to share. While you can run marimo notebooks locally, on cloud servers, GPUs — anywhere you can traditionally run software — you can also run them entirely in the browser <a href="https://docs.marimo.io/guides/wasm/"><u>with WebAssembly</u></a>, bringing the cost of sharing down to zero.</p><p>Because marimo notebooks are stored as Python, they <a href="https://marimo.io/blog/python-not-json"><u>enjoy all the benefits of software</u></a>: version with Git, execute as a script or pipeline, test with pytest, inline package requirements with uv, and import symbols from your notebook into other Python modules. Though stored as Python, marimo also <a href="https://docs.marimo.io/guides/working_with_data/sql/"><u>supports SQL</u></a> and data sources like DuckDB, Postgres, and Iceberg-based data catalogs (which marimo's <a href="https://docs.marimo.io/guides/generate_with_ai/"><u>AI assistant</u></a> can access, in addition to data in RAM).</p><p>To get an idea of what a marimo notebook is like, check out the embedded example notebook below:</p><div>
   <div>
       
   </div>
</div>
<p></p>
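<p>The reactive model described above can be sketched in a few lines of plain Python. This toy is not marimo's implementation; it only illustrates the idea of a dataflow graph over cells that re-runs dependents when a cell runs:</p>

```python
# Toy sketch of reactive execution: cells form a dataflow graph, and
# re-running a cell re-runs its dependents in topological order.

from graphlib import TopologicalSorter

class Notebook:
    def __init__(self):
        self.cells = {}   # name -> (fn, list of dependency names)
        self.values = {}  # name -> last computed value

    def cell(self, name, deps=()):
        """Decorator registering a cell and the cells it reads from."""
        def register(fn):
            self.cells[name] = (fn, list(deps))
            return fn
        return register

    def run(self, changed):
        """Re-run `changed` and every cell downstream of it."""
        graph = {name: set(deps) for name, (_, deps) in self.cells.items()}
        order = list(TopologicalSorter(graph).static_order())
        downstream = {changed}
        for name in order:
            if name == changed or downstream & graph[name]:
                downstream.add(name)
                fn, deps = self.cells[name]
                self.values[name] = fn(*(self.values[d] for d in deps))
        return self.values

nb = Notebook()

@nb.cell("x")
def _():
    return 2

@nb.cell("doubled", deps=["x"])
def _(x):
    return x * 2

if __name__ == "__main__":
    print(nb.run("x"))  # running "x" also recomputes "doubled"
```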
    <div>
      <h3>Exploring your Cloudflare data with marimo</h3>
      <a href="#exploring-your-cloudflare-data-with-marimo">
        
      </a>
    </div>
    <p>Ready to explore your own Cloudflare data in a marimo notebook? The easiest way to begin is to visit <a href="http://notebooks.cloudflare.com"><u>notebooks.cloudflare.com</u></a> and run one of our example notebooks directly in your browser via <a href="https://webassembly.org/"><u>WebAssembly (Wasm)</u></a>. You can also browse the source in our <a href="https://github.com/cloudflare/notebook-examples"><u>notebook examples GitHub repo</u></a>.</p><p>Want to create your own notebook to run locally instead? Here’s a quick example that shows you how to authenticate with your Cloudflare account and list the zones you have access to:</p><ol><li><p>Install <a href="https://docs.astral.sh/uv/"><u>uv</u></a> if you haven’t already by following the <a href="https://docs.astral.sh/uv/getting-started/installation/"><u>installation guide</u></a>.</p></li><li><p>Create a new project directory for your notebook:</p></li></ol>
            <pre><code>mkdir cloudflare-zones-notebook
cd cloudflare-zones-notebook</code></pre>
            <p>3. Initialize a new uv project (this creates a <code>.venv</code> and a <code>pyproject.toml</code>):</p>
            <pre><code>uv init</code></pre>
            <p>4. Add marimo and required dependencies:</p>
            <pre><code>uv add marimo</code></pre>
            <p>5. Create a file called <code>list-zones.py</code> and paste in the following notebook:</p>
            <pre><code>import marimo

__generated_with = "0.14.10"
app = marimo.App(width="full", auto_download=["ipynb", "html"])


@app.cell
def _():
    from moutils.oauth import PKCEFlow
    import requests

    # Start OAuth PKCE flow to authenticate with Cloudflare
    auth = PKCEFlow(provider="cloudflare")

    # Renders login UI in notebook
    auth
    return (auth,)


@app.cell
def _(auth):
    import marimo as mo
    from cloudflare import Cloudflare

    mo.stop(not auth.access_token, mo.md("Please **sign in** using the button above."))
    client = Cloudflare(api_token=auth.access_token)

    zones = client.zones.list()
    [zone.name for zone in zones.result]
    return


if __name__ == "__main__":
    app.run()</code></pre>
            <p>6. Open the notebook editor:</p>
            <pre><code>uv run marimo edit list-zones.py --sandbox</code></pre>
            <p>7. Log in via the OAuth prompt in the notebook. Once authenticated, you’ll see a list of your Cloudflare zones in the final cell.</p><p>That’s it! From here, you can expand the notebook to call <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a> models, query Iceberg tables in <a href="https://developers.cloudflare.com/r2/data-catalog/"><u>R2 Data Catalog</u></a>, or interact with any Cloudflare API.</p>
    <div>
      <h3>How OAuth works in notebooks</h3>
      <a href="#how-oauth-works-in-notebooks">
        
      </a>
    </div>
    <p>Think of OAuth like a secure handshake between your notebook and Cloudflare. Instead of copying and pasting API tokens, you just click “Sign in with Cloudflare” and the notebook handles the rest.</p><p>We built this experience using PKCE (Proof Key for Code Exchange), a secure OAuth 2.0 flow that avoids client secrets and protects against code interception attacks. PKCE works by generating a one-time code that’s exchanged for a token after login, without ever sharing a client secret. <a href="https://auth0.com/docs/get-started/authentication-and-authorization-flow/authorization-code-flow-with-pkce"><u>Learn more about how PKCE works</u></a>.</p><p>The login widget lives in <a href="https://github.com/marimo-team/moutils/blob/main/notebooks/pkceflow_login.py"><u>moutils.oauth</u></a>, a collaboration between Cloudflare and marimo to make OAuth authentication simple and secure in notebooks. To use it, just create a cell like this:</p>
            <pre><code>auth = PKCEFlow(provider="cloudflare")

# Renders login UI in notebook
auth</code></pre>
            <p>When you run the cell, you’ll see a Sign in with Cloudflare button:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2r3Dmuwcm4AZrhV39Gkhyl/c3f98a3780bc29f1c01ea945621fc005/image2.png" />
          </figure><p>Once logged in, you’ll have a read-only access token you can pass when using the Cloudflare API.</p>
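<p>The verifier/challenge pair at the heart of PKCE is easy to sketch in plain Python. This shows the standard S256 method from the OAuth PKCE spec (RFC 7636), not moutils' internal code:</p>

```python
# Client side of the PKCE handshake (RFC 7636, S256 method). The provider
# endpoints and redirects are handled by moutils; this only shows how the
# verifier and challenge relate.

import base64
import hashlib
import secrets

def make_verifier() -> str:
    """Random, high-entropy one-time secret kept by the client."""
    return secrets.token_urlsafe(64)

def make_challenge(verifier: str) -> str:
    """Challenge sent in the authorization request: the server later checks
    that sha256(verifier) matches it, so no client secret is ever shared."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

if __name__ == "__main__":
    v = make_verifier()
    print("code_challenge:", make_challenge(v))
```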
    <div>
      <h3>Running marimo on Cloudflare: Workers and Containers</h3>
      <a href="#running-marimo-on-cloudflare-workers-and-containers">
        
      </a>
    </div>
    <p>In addition to running marimo notebooks locally, you can use Cloudflare to share and run them via <a href="https://developers.cloudflare.com/workers/static-assets/"><u>Workers Static Assets</u></a> or <a href="https://developers.cloudflare.com/containers/"><u>Cloudflare Containers</u></a>.</p><p>If you have a local notebook you want to share, you can publish it to Workers. This works because marimo can export notebooks to WebAssembly, allowing them to run entirely in the browser. You can get started with just two commands:</p>
            <pre><code>marimo export html-wasm notebook.py -o output_dir --mode edit --include-cloudflare
npx wrangler deploy
</code></pre>
            <p>If your notebook needs authentication, you can layer in <a href="https://developers.cloudflare.com/cloudflare-one/policies/access/"><u>Cloudflare Access</u></a> for secure, authenticated access.</p><p>For notebooks that require more compute, persistent sessions, or long-running tasks, you can deploy marimo on our <a href="https://blog.cloudflare.com/containers-are-available-in-public-beta-for-simple-global-and-programmable/"><u>new container platform</u></a>. To get started, check out our <a href="https://github.com/cloudflare/containers-demos/tree/main/marimo"><u>marimo container example</u></a> on GitHub.</p>
    <div>
      <h3>What’s next for Cloudflare + marimo</h3>
      <a href="#whats-next-for-cloudflare-marimo">
        
      </a>
    </div>
    <p>This blog post marks just the beginning of Cloudflare's partnership with marimo. While we're excited to see how you use our joint WebAssembly-based notebook platform to explore your Cloudflare data, we also want to help you bring serious compute to bear on your data — to empower you to run large scale analyses and batch jobs straight from marimo notebooks. Stay tuned!</p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[R2]]></category>
            <category><![CDATA[Data Catalog]]></category>
            <category><![CDATA[Notebooks]]></category>
            <guid isPermaLink="false">1oYZ3vFOAUy5PhZyKNm286</guid>
            <dc:creator>Carlos Rodrigues</dc:creator>
            <dc:creator>Jorge Pacheco</dc:creator>
            <dc:creator>Keith Adler</dc:creator>
            <dc:creator>Akshay Agrawal (Guest Author)</dc:creator>
            <dc:creator>Myles Scolnick (Guest Author)</dc:creator>
        </item>
        <item>
            <title><![CDATA[R2 Data Catalog: Managed Apache Iceberg tables with zero egress fees]]></title>
            <link>https://blog.cloudflare.com/r2-data-catalog-public-beta/</link>
            <pubDate>Thu, 10 Apr 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ R2 Data Catalog is now in public beta: a managed Apache Iceberg data catalog built directly into your R2 bucket. ]]></description>
<content:encoded><![CDATA[ <p><a href="https://iceberg.apache.org/"><u>Apache Iceberg</u></a> is quickly becoming the standard table format for querying large analytic datasets in <a href="https://www.cloudflare.com/learning/cloud/what-is-object-storage/">object storage</a>. We’re seeing this trend firsthand as more and more developers and data teams adopt Iceberg on <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>Cloudflare R2</u></a>. But until now, using Iceberg with R2 meant managing additional infrastructure or relying on external data catalogs.</p><p>So we’re fixing this. Today, we’re launching the <a href="https://developers.cloudflare.com/r2/data-catalog/"><u>R2 Data Catalog</u></a> in public beta, a managed Apache Iceberg catalog built directly into your Cloudflare R2 bucket.</p><p>If you’re not already familiar with it, Iceberg is an open table format built for large-scale analytics on datasets stored in object storage. With R2 Data Catalog, you get the database-like capabilities Iceberg is known for – <a href="https://en.wikipedia.org/wiki/ACID"><u>ACID</u></a> transactions, schema evolution, and efficient querying – without the overhead of managing your own external catalog.</p><p>R2 Data Catalog exposes a standard Iceberg REST catalog interface, so you can connect the engines you already use, like <a href="https://py.iceberg.apache.org/"><u>PyIceberg</u></a>, <a href="https://www.snowflake.com/"><u>Snowflake</u></a>, and <a href="https://spark.apache.org/"><u>Spark</u></a>. And, as always with R2, there are no egress fees, meaning that no matter which cloud or region your data is consumed from, you won’t have to worry about growing data transfer costs.</p><p>Ready to query data in R2 right now? Jump into the <a href="https://developers.cloudflare.com/r2/data-catalog/"><u>developer docs</u></a> and enable a data catalog on your R2 bucket in just a few clicks. 
Or keep reading to learn more about Iceberg, data catalogs, how metadata files work under the hood, and how to create your first Iceberg table.</p>
    <div>
      <h2>What is Apache Iceberg?</h2>
      <a href="#what-is-apache-iceberg">
        
      </a>
    </div>
    <p><a href="https://iceberg.apache.org/"><u>Apache Iceberg</u></a> is an open table format for analyzing large datasets in object storage. It brings database-like features – ACID transactions, time travel, and schema evolution – to files stored in formats like <a href="https://parquet.apache.org/"><u>Parquet</u></a> or <a href="https://orc.apache.org/"><u>ORC</u></a>.</p><p>Historically, data lakes were just collections of raw files in object storage. However, without a unified metadata layer, datasets could easily become corrupted, were difficult to evolve, and queries often required expensive full-table scans.</p><p>Iceberg solves these problems by:</p><ul><li><p>Providing ACID transactions for reliable, concurrent reads and writes.</p></li><li><p>Maintaining optimized metadata, so engines can skip irrelevant files and avoid unnecessary full-table scans.</p></li><li><p>Supporting schema evolution, allowing columns to be added, renamed, or dropped without rewriting existing data.</p></li></ul><p>Iceberg is already <a href="https://iceberg.apache.org/vendors/"><u>widely supported</u></a> by engines like Apache Spark, Trino, Snowflake, DuckDB, and ClickHouse, with a fast-growing community behind it.</p>
    <div>
      <h3>How Iceberg tables are stored</h3>
      <a href="#how-iceberg-tables-are-stored">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/779M4zsH5QnpDwlTORk1fo/38e7732ca0e20645507bdc0c628f671b/1.png" />
          </figure><p>Internally, an Iceberg table is a collection of data files (typically stored in columnar formats like Parquet or ORC) and metadata files (typically stored in JSON or <a href="https://avro.apache.org/"><u>Avro</u></a>) that describe table snapshots, schemas, and partition layouts.</p><p>To understand how query engines interact efficiently with Iceberg tables, it helps to look at an Iceberg metadata file (simplified):</p>
            <pre><code>{
  "format-version": 2,
  "table-uuid": "0195e49b-8f7c-7933-8b43-d2902c72720a",
  "location": "s3://my-bucket/warehouse/0195e49b-79ca/table",
  "current-schema-id": 0,
  "schemas": [
    {
      "schema-id": 0,
      "type": "struct",
      "fields": [
        { "id": 1, "name": "id", "required": false, "type": "long" },
        { "id": 2, "name": "data", "required": false, "type": "string" }
      ]
    }
  ],
  "current-snapshot-id": 3567362634015106507,
  "snapshots": [
    {
      "snapshot-id": 3567362634015106507,
      "sequence-number": 1,
      "timestamp-ms": 1743297158403,
      "manifest-list": "s3://my-bucket/warehouse/0195e49b-79ca/table/metadata/snap-3567362634015106507-0.avro",
      "summary": {},
      "schema-id": 0
    }
  ],
  "partition-specs": [{ "spec-id": 0, "fields": [] }]
}</code></pre>
<p>A few of the important components are:</p><ul><li><p><code>schemas</code>: Iceberg tracks schema changes over time. Engines use schema information to safely read and write data without needing to rewrite underlying files.</p></li><li><p><code>snapshots</code>: Each snapshot references a specific set of data files that represent the state of the table at a point in time. This enables features like time travel.</p></li><li><p><code>partition-specs</code>: These define how the table is logically partitioned. Query engines leverage this information during planning to skip unnecessary partitions, greatly improving query performance.</p></li></ul><p>By reading Iceberg metadata, query engines can efficiently prune partitions, load only the relevant snapshots, and fetch only the data files they need, resulting in faster queries.</p>
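<p>To make that planning flow concrete, here is a small, self-contained Python sketch. It hard-codes the simplified metadata shown above and resolves the current schema and snapshot the way a query engine does before touching any data files:</p>

```python
import json

# Simplified Iceberg table metadata, as shown above
metadata = json.loads("""
{
  "format-version": 2,
  "current-schema-id": 0,
  "schemas": [
    {
      "schema-id": 0,
      "type": "struct",
      "fields": [
        { "id": 1, "name": "id", "required": false, "type": "long" },
        { "id": 2, "name": "data", "required": false, "type": "string" }
      ]
    }
  ],
  "current-snapshot-id": 3567362634015106507,
  "snapshots": [
    {
      "snapshot-id": 3567362634015106507,
      "manifest-list": "s3://my-bucket/warehouse/0195e49b-79ca/table/metadata/snap-3567362634015106507-0.avro",
      "schema-id": 0
    }
  ]
}
""")

# 1. Resolve the schema in effect for the current snapshot
schema = next(s for s in metadata["schemas"]
              if s["schema-id"] == metadata["current-schema-id"])
columns = [f["name"] for f in schema["fields"]]

# 2. Resolve the current snapshot; its manifest list tells the engine
#    exactly which data files make up this version of the table
snapshot = next(s for s in metadata["snapshots"]
                if s["snapshot-id"] == metadata["current-snapshot-id"])
manifest_list = snapshot["manifest-list"]
```

<p>From the manifest list, the engine then reads manifests to prune data files by partition and column statistics, so only the relevant Parquet or ORC files are ever fetched.</p>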
    <div>
      <h3>Why do you need a data catalog?</h3>
      <a href="#why-do-you-need-a-data-catalog">
        
      </a>
    </div>
    <p>Although the Iceberg data and metadata files themselves live directly in object storage (like <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a>), the list of tables and pointers to the current metadata need to be tracked centrally by a data catalog.</p><p>Think of a data catalog as a library's index system. While books (your data) are physically distributed across shelves (object storage), the index provides a single source of truth about what books exist, their locations, and their latest editions. Without this index, readers (query engines) would waste time searching for books, might access outdated versions, or could accidentally shelve new books in ways that make them unfindable.</p><p>Similarly, data catalogs ensure consistent, coordinated access, allowing multiple query engines to safely read from and write to the same tables without conflicts or data corruption.</p>
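<p>The “index system” above boils down to a mapping from table name to the current metadata file location, plus an atomic pointer swap on commit. The toy class below is purely illustrative (not how R2 Data Catalog is implemented) and shows why that swap keeps two concurrent writers from silently overwriting each other:</p>

```python
class ToyCatalog:
    """Toy illustration of what an Iceberg catalog tracks: for each
    table, one pointer to the current metadata file. Real catalogs make
    the pointer swap atomic so concurrent commits cannot be lost."""

    def __init__(self):
        self._tables = {}  # table name -> current metadata location

    def register_table(self, name, metadata_location):
        self._tables[name] = metadata_location

    def current_metadata(self, name):
        return self._tables[name]

    def commit(self, name, expected, new_metadata_location):
        # Compare-and-swap: the commit succeeds only if the writer saw
        # the latest metadata; otherwise another writer got there first.
        if self._tables[name] != expected:
            raise RuntimeError("concurrent update detected; retry")
        self._tables[name] = new_metadata_location


catalog = ToyCatalog()
catalog.register_table("default.my_table", "metadata/v1.json")
catalog.commit("default.my_table", "metadata/v1.json", "metadata/v2.json")
```

<p>A second writer still holding <code>metadata/v1.json</code> as its expected pointer would now fail its commit and retry on top of <code>v2</code>, instead of clobbering it.</p>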
    <div>
      <h2>Create your first Iceberg table on R2</h2>
      <a href="#create-your-first-iceberg-table-on-r2">
        
      </a>
    </div>
    <p>Ready to try it out? Here’s a quick example using <a href="https://py.iceberg.apache.org/"><u>PyIceberg</u></a> and Python to get you started. For a detailed step-by-step guide, check out our <a href="https://developers.cloudflare.com/r2/data-catalog/get-started/"><u>developer docs</u></a>.</p><p>1. Enable R2 Data Catalog on your bucket:
</p>
            <pre><code>npx wrangler r2 bucket catalog enable my-bucket</code></pre>
            <p>Or use the Cloudflare dashboard: Navigate to <b>R2 Object Storage</b> &gt; <b>Settings</b> &gt; <b>R2 Data Catalog</b> and click <b>Enable</b>.</p><p>2. Create a <a href="https://developers.cloudflare.com/r2/api/s3/tokens/"><u>Cloudflare API token</u></a> with permissions for both R2 storage and the data catalog.</p><p>3. Install <a href="https://py.iceberg.apache.org/"><u>PyIceberg</u></a> and <a href="https://arrow.apache.org/docs/index.html"><u>PyArrow</u></a>, then open a Python shell or notebook:</p>
            <pre><code>pip install pyiceberg pyarrow</code></pre>
            <p>4. Connect to the catalog and create a table:</p>
            <pre><code>import pyarrow as pa
from pyiceberg.catalog.rest import RestCatalog

# Define catalog connection details (replace variables)
WAREHOUSE = "&lt;WAREHOUSE&gt;"
TOKEN = "&lt;TOKEN&gt;"
CATALOG_URI = "&lt;CATALOG_URI&gt;"

# Connect to R2 Data Catalog
catalog = RestCatalog(
    name="my_catalog",
    warehouse=WAREHOUSE,
    uri=CATALOG_URI,
    token=TOKEN,
)

# Create default namespace
catalog.create_namespace("default")

# Create simple PyArrow table
df = pa.table({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
})

# Create an Iceberg table from the PyArrow schema
table = catalog.create_table(
    ("default", "my_table"),
    schema=df.schema,
)

# Write the PyArrow data into the new table
table.append(df)</code></pre>
            <p>You can now append more data or run queries, just as you would with any Apache Iceberg table.</p>
    <div>
      <h2>Pricing</h2>
      <a href="#pricing">
        
      </a>
    </div>
<p>While R2 Data Catalog is in public beta, there will be no additional charges beyond standard R2 storage and operations costs incurred by query engines accessing data. <a href="https://r2-calculator.cloudflare.com/"><u>Storage pricing</u></a> for buckets with R2 Data Catalog enabled remains the same as standard R2 buckets – $0.015 per GB-month. As always, egress directly from R2 buckets remains $0.</p><p>In the future, we plan to introduce pricing for catalog operations (e.g., creating tables, retrieving table metadata, etc.) and data compaction.</p><p>Below is our current thinking on future pricing. We’ll communicate more details around timing well before billing begins, so you can confidently plan your workloads.</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td> </td>
                    <td>
                        <p><span><span><strong>Pricing</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>R2 storage</span></span></p>
                        <p><span><span>For standard storage class</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.015 per GB-month (no change)</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>R2 Class A operations</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$4.50 per million operations (no change)</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>R2 Class B operations</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.36 per million operations (no change)</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Data Catalog operations</span></span></p>
                        <p><span><span>e.g., create table, get table metadata, update table properties</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$9.00 per million catalog operations</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Data Catalog compaction data processed</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.05 per GB processed</span></span></p>
                        <p><span><span>$4.00 per million objects processed</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Data egress</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0 (no change, always free)</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div>
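<p>With the proposed rates, a rough monthly estimate is simple arithmetic. The sketch below encodes the numbers from the table; the workload figures are invented for illustration, and beta pricing may change before billing begins:</p>

```python
# Proposed rates from the pricing table above (USD)
STORAGE_PER_GB_MONTH = 0.015
CLASS_A_PER_MILLION = 4.50
CLASS_B_PER_MILLION = 0.36
CATALOG_OPS_PER_MILLION = 9.00
COMPACTION_PER_GB = 0.05
COMPACTION_PER_MILLION_OBJECTS = 4.00

def monthly_cost(storage_gb, class_a_ops, class_b_ops,
                 catalog_ops, compacted_gb, compacted_objects):
    """Estimate a month's bill; egress is always $0, so it never appears."""
    return (
        storage_gb * STORAGE_PER_GB_MONTH
        + class_a_ops / 1e6 * CLASS_A_PER_MILLION
        + class_b_ops / 1e6 * CLASS_B_PER_MILLION
        + catalog_ops / 1e6 * CATALOG_OPS_PER_MILLION
        + compacted_gb * COMPACTION_PER_GB
        + compacted_objects / 1e6 * COMPACTION_PER_MILLION_OBJECTS
    )

# Hypothetical workload: 1 TB stored, 1M writes, 10M reads,
# 100k catalog operations, 200 GB / 500k objects compacted
cost = monthly_cost(1000, 1_000_000, 10_000_000, 100_000, 200, 500_000)
```

<p>For this hypothetical workload the estimate works out to $36.00 for the month, with storage and compaction dominating the bill.</p>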
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
<p>We’re excited to see how you use R2 Data Catalog! If you’ve never worked with Iceberg – or even analytics data – before, we think this is the easiest way to get started.</p><p>Next on our roadmap is tackling compaction and table optimization. Query engines typically perform better when dealing with fewer, larger data files. We will automatically rewrite collections of small data files into larger files to deliver even faster query performance.</p><p>We’re also collaborating with the broader Apache Iceberg community to expand query-engine compatibility with the Iceberg REST Catalog spec.</p><p>We’d love your feedback. Join the <a href="https://discord.cloudflare.com/"><u>Cloudflare Developer Discord</u></a> to ask questions and share your thoughts during the public beta. For more details, examples, and guides, visit our <a href="https://developers.cloudflare.com/r2/data-catalog/get-started/"><u>developer documentation</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[R2]]></category>
            <category><![CDATA[Data Catalog]]></category>
            <category><![CDATA[Storage]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">6JFB9cHUOoMZnVmYIuTLzd</guid>
            <dc:creator>Phillip Jones</dc:creator>
            <dc:creator>Garvit Gupta</dc:creator>
            <dc:creator>Alex Graham</dc:creator>
            <dc:creator>Garrett Gu</dc:creator>
        </item>
    </channel>
</rss>