
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Tue, 07 Apr 2026 13:01:56 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Partnering to make full-stack fast: deploy PlanetScale databases directly from Workers]]></title>
            <link>https://blog.cloudflare.com/planetscale-postgres-workers/</link>
            <pubDate>Thu, 25 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ We’ve teamed up with PlanetScale to make shipping full-stack applications on Cloudflare Workers even easier.  ]]></description>
            <content:encoded><![CDATA[ <p>We’re not burying the lede on this one: you can now connect <a href="https://www.cloudflare.com/developer-platform/products/workers/"><u>Cloudflare Workers</u></a> to your PlanetScale databases directly and ship full-stack applications backed by Postgres or MySQL. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3tcLGobPxPIHoDYEiGcY0X/d970a4a6b8a9e6ebc7d06ab57b168007/Frame_1321317798__1_.png" />
          </figure><p>We’ve teamed up with <a href="https://planetscale.com/"><u>PlanetScale</u></a> because we wanted to partner with a database provider that we could confidently recommend to our users: one that shares our obsession with performance, reliability and developer experience. These are all critical factors for any development team building a serious application. </p><p>Now, when connecting to PlanetScale databases, your connections are automatically configured for optimal performance with <a href="https://www.cloudflare.com/developer-platform/products/hyperdrive/"><u>Hyperdrive</u></a>, ensuring that you have the fastest access from your Workers to your databases, regardless of where your Workers are running.</p>
    <div>
      <h3>Building full-stack</h3>
      <a href="#building-full-stack">
        
      </a>
    </div>
    <p>As Workers has matured into a full-stack platform, we’ve introduced more options to facilitate your connectivity to data. With <a href="https://developers.cloudflare.com/kv/"><u>Workers KV</u></a>, we made it easy to store configuration and cache unstructured data on the edge. With <a href="https://www.cloudflare.com/developer-platform/products/d1/"><u>D1</u></a> and <a href="https://www.cloudflare.com/developer-platform/products/durable-objects/"><u>Durable Objects</u></a>, we made it possible to build multi-tenant apps with simple, isolated SQL databases. And with Hyperdrive, we made connecting to external databases fast and scalable from Workers.</p><p>Today, we’re introducing a new choice for building on Cloudflare: Postgres and MySQL PlanetScale databases, directly accessible from within the Cloudflare dashboard. Link your Cloudflare and PlanetScale accounts, stop manually copying API keys back and forth, and connect Workers to any of your PlanetScale databases (production or otherwise!).</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/71rXsGZgXWem4yvkhdtHsP/55f9433b5447c09703ef39a547881497/image3.png" />
          </figure><p><sup><i>Connect to a PlanetScale database — no figuring things out on your own</i></sup></p><p>Postgres and MySQL are the most popular options for building applications, and with good reason. Many large companies (like Cloudflare!) have built and scaled on these databases, supporting a robust ecosystem. And you may want access to the power, familiarity, and functionality that these databases provide. </p><p>Importantly, all of this builds on <a href="https://blog.cloudflare.com/how-hyperdrive-speeds-up-database-access/"><u>Hyperdrive</u></a>, our distributed connection pooler and query caching infrastructure. Hyperdrive keeps connections to your databases warm to avoid incurring latency penalties for every new request, reduces the CPU load on your database by managing a connection pool, and can cache the results of your most frequent queries, removing load from your database altogether. Given that about 80% of queries for a typical transactional database are read-only, this can be substantial — we’ve observed this in reality!</p>
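<p>To make the caching idea concrete, here is a minimal sketch of TTL-based query-result caching in TypeScript. This is illustrative only, with hypothetical names throughout; it is not Hyperdrive’s actual implementation:</p>

```typescript
// Minimal sketch of TTL-based query-result caching, the idea behind cached
// reads: repeated identical queries are served without a database round trip.
type CacheEntry = { rows: unknown[]; expiresAt: number };

class QueryCache {
  private entries = new Map<string, CacheEntry>();
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  // Serve rows from cache while fresh; otherwise run the query and cache it.
  get(sql: string, run: (sql: string) => unknown[]): unknown[] {
    const hit = this.entries.get(sql);
    if (hit && hit.expiresAt > this.now()) return hit.rows; // no round trip
    const rows = run(sql);
    this.entries.set(sql, { rows, expiresAt: this.now() + this.ttlMs });
    return rows;
  }
}
```

<p>With a real pooler the cache sits close to your Workers, so a cache hit also avoids the network hop to the origin database entirely.</p>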
    <div>
      <h3>No more copying credentials around</h3>
      <a href="#no-more-copying-credentials-around">
        
      </a>
    </div>
    <p>Starting today, you can <a href="https://dash.cloudflare.com/?to=/:account/workers/hyperdrive?step=1&amp;modal=1"><u>connect to your PlanetScale databases from the Cloudflare dashboard</u></a> in just a few clicks. Connecting is now secure by default with a one-click password rotation option, without needing to copy and manage credentials back and forth. A Hyperdrive configuration will be created for your PlanetScale database, providing you with the optimal setup to start building on Workers.</p><p>And the experience spans both Cloudflare and PlanetScale dashboards: you can also create and view attached Hyperdrive configurations for your databases from the PlanetScale dashboard.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3I7WyAGXCLY8xhugPlIhl5/0ec38f0248140a628d805df7bb62dcc3/image2.png" />
          </figure><p>By automatically integrating with Hyperdrive, your PlanetScale databases are optimally configured for access from Workers. When you connect your database via Hyperdrive, Hyperdrive’s Placement system automatically determines the location of the database and places its pool of database connections in Cloudflare data centers with the lowest possible latency. </p><p>When one of your Workers connects to your Hyperdrive configuration for your PlanetScale database, Hyperdrive will ensure the fastest access to your database by eliminating the unnecessary roundtrips included in a typical database connection setup. Hyperdrive will resolve connection setup within the Hyperdrive client and use existing connections from the pool to quickly serve your queries. Better yet, Hyperdrive allows you to cache your query results when you need to scale read-heavy workloads. </p><p>This is a peek under the hood of how Hyperdrive makes access to PlanetScale as fast as possible. We’ve previously blogged about <a href="https://blog.cloudflare.com/how-hyperdrive-speeds-up-database-access/"><u>Hyperdrive’s technical underpinnings</u></a> — it’s worth a read. And with this integration with Hyperdrive, you can easily connect to your databases across different Workers applications or environments, without having to reconfigure your credentials. All in all, a perfect match.</p>
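<p>The warm-pool idea can be sketched in a few lines of TypeScript. This is purely illustrative (Hyperdrive’s real pooler is far more sophisticated), but it shows why a reused connection skips the setup cost entirely:</p>

```typescript
// Sketch of warm-connection reuse: the pool pays connection setup once,
// then serves later queries from already-established connections.
type Conn = { id: number };

class Pool {
  private idle: Conn[] = [];
  created = 0; // how many times we paid the full TCP/TLS/auth setup cost

  acquire(): Conn {
    const warm = this.idle.pop();
    if (warm) return warm;          // reuse: no setup round trips
    this.created += 1;              // cold path: establish a new connection
    return { id: this.created };
  }

  release(conn: Conn): void {
    this.idle.push(conn);           // keep the connection warm for next time
  }
}
```
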
    <div>
      <h3>Get started with PlanetScale and Workers</h3>
      <a href="#get-started-with-planetscale-and-workers">
        
      </a>
    </div>
    <p>With this partnership, we’re making it trivially easy to build on Workers with PlanetScale. Want to build a new application on Workers that connects to your existing PlanetScale cluster? With just a few clicks, you can create a globally deployed app that can query your database, cache your hottest queries, and keep your database connections warmed for fast access from Workers.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3eTtJKz4sxeNvClVQMWIFg/9c91fb02b1cd4eca7ad5ef013e7ab0f0/image4.png" />
          </figure><p><sup><i>Connect directly to your PlanetScale MySQL or Postgres databases from the Cloudflare dashboard, for optimal configuration with Hyperdrive.</i></sup></p><p>To get started, you can:</p><ul><li><p>Head to the <a href="https://dash.cloudflare.com/?to=/:account/workers/hyperdrive?step=1&amp;modal=1"><u>Cloudflare dashboard</u></a> and connect your PlanetScale account</p></li><li><p>… or head to <a href="https://app.planetscale.com/"><u>PlanetScale</u></a> and connect your Cloudflare account</p></li><li><p>… and then deploy a Worker</p></li></ul><p>Review the <a href="https://developers.cloudflare.com/hyperdrive/"><u>Hyperdrive docs</u></a> and/or the <a href="https://planetscale.com/docs"><u>PlanetScale docs</u></a> to learn more about how to connect Workers to PlanetScale and start shipping.</p> ]]></content:encoded>
            <category><![CDATA[Hyperdrive]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Partnership]]></category>
            <category><![CDATA[Database]]></category>
            <guid isPermaLink="false">7ibt13YouHX6Ew1wLZn5pi</guid>
            <dc:creator>Matt Silverlock</dc:creator>
            <dc:creator>Thomas Gauvin</dc:creator>
            <dc:creator>Adrian Gracia</dc:creator>
        </item>
        <item>
            <title><![CDATA[Just landed: streaming ingestion on Cloudflare with Arroyo and Pipelines]]></title>
            <link>https://blog.cloudflare.com/cloudflare-acquires-arroyo-pipelines-streaming-ingestion-beta/</link>
            <pubDate>Thu, 10 Apr 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ We’ve just shipped our new streaming ingestion service, Pipelines — and we’ve acquired Arroyo, enabling us to bring new SQL-based, stateful transformations to Pipelines and R2. ]]></description>
            <content:encoded><![CDATA[ <p>Today, we’re launching the open beta of Pipelines, our streaming ingestion product. Pipelines allows you to ingest high volumes of structured, real-time data, and load it into our <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>object storage service, R2</u></a>. You don’t have to manage any of the underlying infrastructure, worry about scaling shards or metadata services, and you pay for the data processed (and not by the hour). Anyone on a Workers paid plan can start using it to ingest and batch data — at tens of thousands of requests per second (RPS) — directly into R2.</p><p>But this is just the tip of the iceberg: you often want to transform the data you’re ingesting, hydrate it on-the-fly from other sources, and write it to an open table format (such as Apache Iceberg), so that you can efficiently query that data once you’ve landed it in object storage.</p><p>The good news is that we’ve thought about that too, and we’re excited to announce that we’ve acquired <a href="https://www.arroyo.dev/"><u>Arroyo</u></a>, a cloud-native, distributed stream processing engine, to make that happen.</p><p>With Arroyo <i>and </i>our just announced <a href="https://blog.cloudflare.com/r2-data-catalog-public-beta/">R2 Data Catalog</a>, we’re getting increasingly serious about building a data platform that allows you to ingest data across the planet, store it at scale, and <i>run compute over it</i>. </p><p>To get started, you can dive into the <a href="http://developers.cloudflare.com/pipelines/"><u>Pipelines developer docs</u></a> or just run this <a href="https://developers.cloudflare.com/workers/wrangler/"><u>Wrangler</u></a> command to create your first pipeline:</p>
            <pre><code>$ npx wrangler@latest pipelines create my-clickstream-pipeline --r2-bucket my-bucket

...
✅ Successfully created pipeline my-clickstream-pipeline with ID 0e00c5ff09b34d018152af98d06f5a1xvc</code></pre>
            <p>… and then write your first record(s):</p>
            <pre><code>$ curl -d '[{"payload": [],"id":"abc-def"}]' \
"https://0e00c5ff09b34d018152af98d06f5a1xvc.pipelines.cloudflare.com/"</code></pre>
            <p>However, the true power comes from the processing of data streams between ingestion and when they’re written to sinks like R2. Being able to write SQL that acts on windows of data <i>as it’s being ingested</i>, that can transform &amp; aggregate it, and even extract insights from the data in real-time, turns out to be extremely powerful.</p><p>This is where Arroyo comes in, and we’re going to be bringing the best parts of Arroyo into Pipelines and deeply integrate it with Workers, R2, and the rest of our Developer Platform.</p>
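<p>As a rough illustration of what “SQL that acts on windows of data” computes, here is a tumbling-window aggregation sketched in TypeScript. The event shape is hypothetical and the sketch is illustrative only; an engine like Arroyo evaluates this kind of logic as SQL over live streams, incrementally and fault-tolerantly:</p>

```typescript
// Sketch of a tumbling-window aggregation: sum `value` over fixed,
// non-overlapping windows of `windowMs` milliseconds, keyed by each
// window's start time (the streaming equivalent of GROUP BY a time bucket).
type Event = { timestampMs: number; value: number };

function tumblingSum(events: Event[], windowMs: number): Map<number, number> {
  const windows = new Map<number, number>();
  for (const e of events) {
    const start = Math.floor(e.timestampMs / windowMs) * windowMs;
    windows.set(start, (windows.get(start) ?? 0) + e.value);
  }
  return windows;
}
```

<p>The hard parts a streaming engine adds on top of this, like late data, checkpointed state, and emitting results as windows close, are exactly what made Flink-style systems operationally heavy.</p>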
    <div>
      <h2>The Arroyo origin story </h2>
      <a href="#the-arroyo-origin-story">
        
      </a>
    </div>
    <p><i>(By Micah Wylde, founder of Arroyo)</i></p><p>We started Arroyo in 2023 to bring real-time (<i>stream</i>) processing to everyone who works with data. Modern companies rely on data pipelines to power their applications and businesses — from user customization, recommendations, and anti-fraud, to the emerging world of AI agents.</p><p>But today, most of these pipelines operate in batch, running once per hour, day, or even month. After spending many years working on stream processing at companies like Lyft and Splunk, it was no mystery why: it was just too hard for developers and data scientists to build correct, performant, and reliable pipelines. Large tech companies hire streaming experts to build and operate these systems, but everyone else is stuck waiting for batches to arrive. </p><p>When we started, the dominant solution for streaming pipelines — and what we ran at Lyft and Splunk — was Apache Flink. Flink was the first system that successfully combined a fault-tolerant (able to recover consistently from failures), distributed (across multiple machines), stateful (remembering data about past events) dataflow with a graph-construction API. This combination of features meant that we could finally build powerful real-time data applications, with capabilities like windows, aggregations, and joins. But while Flink had the necessary power, in practice the API proved too hard and low-level for non-expert users, and the stateful nature of the resulting services required endless operations.</p><p>We realized we would need to build a new streaming engine — one with the power of Flink, but designed for product engineers and data scientists and to run on modern cloud infrastructure. We started with SQL as our API because it’s easy to use, widely known, and declarative. We built it in Rust for speed and operational simplicity (no JVM tuning required!). 
We constructed an object-storage-native state backend, simplifying the challenge of running stateful pipelines — each of which is like a weird, specialized database. And then in the summer of 2023, we open-sourced it. Today, dozens of companies are running Arroyo pipelines with use cases including data ingestion, anti-fraud, IoT observability, and financial trading. </p><p>But we always knew that the engine was just one piece of the puzzle. To make streaming as easy as batch, users need to be able to develop and test query logic, backfill on historical data, and deploy serverlessly without having to worry about cluster sizing or ongoing operations. Democratizing streaming ultimately meant building a complete data platform. And when we started talking with Cloudflare, we realized they already had all of the pieces in place: R2 provides object storage for state and data at rest, Cloudflare <a href="https://developers.cloudflare.com/queues/"><u>Queues</u></a> for data in transit, and Workers to safely and efficiently run user code. And Cloudflare, uniquely, allows us to push these systems all the way to the edge, enabling a new paradigm of local stream processing that will be key for a future of data sovereignty and AI.</p><p>That’s why we’re incredibly excited to join the Cloudflare team to make this vision a reality.</p>
    <div>
      <h2>Ingestion at scale</h2>
      <a href="#ingestion-at-scale">
        
      </a>
    </div>
    <p>While transformations and a streaming SQL API are on the way for Pipelines, it already solves two critical parts of the data journey: globally distributed, high-throughput ingestion and efficient loading into object storage. </p><p>Creating a pipeline is as simple as running one command: </p>
            <pre><code>$ npx wrangler@latest pipelines create my-clickstream-pipeline --r2-bucket my-bucket

🌀 Creating pipeline named "my-clickstream-pipeline"
✅ Successfully created pipeline my-clickstream-pipeline with ID 
0e00c5ff09b34d018152af98d06f5a1xvc

Id:    0e00c5ff09b34d018152af98d06f5a1xvc
Name:  my-clickstream-pipeline
Sources:
  HTTP:
    Endpoint:        https://0e00c5ff09b34d018152af98d06f5a1xvc.pipelines.cloudflare.com/
    Authentication:  off
    Format:          JSON
  Worker:
    Format:  JSON
Destination:
  Type:         R2
  Bucket:       my-bucket
  Format:       newline-delimited JSON
  Compression:  GZIP
Batch hints:
  Max bytes:     100 MB
  Max duration:  300 seconds
  Max records:   100,000

🎉 You can now send data to your pipeline!

Send data to your pipeline's HTTP endpoint:
curl "https://0e00c5ff09b34d018152af98d06f5a1xvc.pipelines.cloudflare.com/" -d '[{ ...JSON_DATA... }]'</code></pre>
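<p>The batch hints in the output above suggest a simple flush rule: write a batch out as soon as any one of the size, count, or age limits is reached. A sketch of that decision in TypeScript (illustrative only, not the actual Pipelines implementation):</p>

```typescript
// Sketch of batch-hint gating: a buffered batch is flushed to storage once
// any of the three thresholds (bytes, records, age) is crossed.
type BatchHints = { maxBytes: number; maxRecords: number; maxDurationMs: number };

function shouldFlush(
  pendingBytes: number,
  pendingRecords: number,
  batchAgeMs: number,
  hints: BatchHints,
): boolean {
  return (
    pendingBytes >= hints.maxBytes ||
    pendingRecords >= hints.maxRecords ||
    batchAgeMs >= hints.maxDurationMs
  );
}
```

<p>The duration bound caps latency (a batch never waits longer than 300 seconds with the defaults above), while the byte and record bounds cap file size.</p>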
    <p>By default, a pipeline can ingest data from two sources – Workers and an HTTP endpoint – and load batched events into an R2 bucket. This gives you an out-of-the-box solution for streaming raw event data into object storage. If the defaults don’t work, you can configure pipelines during creation or anytime after. Options include: adding authentication to the HTTP endpoint, configuring CORS to allow browsers to make cross-origin requests, and specifying output file compression and batch settings.</p><p>We’ve built Pipelines for high ingestion volumes from day 1. Each pipeline can scale to ~100,000 records per second (and we’re just getting started here). Once records are written to a Pipeline, they are then durably stored, batched, and written out as files in an R2 bucket. Batching is critical here: if you’re going to act on and query that data, you don’t want your query engine querying millions (or tens of millions) of tiny files. It’s slow (per-file &amp; request overheads), inefficient (more files to read), and costly (more operations). Instead, you want to find the right balance between batch size for your query engine and latency (not waiting too long for a batch): Pipelines allows you to configure this.</p><p>To further optimize queries, output files are partitioned by date and time, using the standard Hive partitioning scheme. Your query engine can then skip partitions that are irrelevant to the query you’re running. The output in your R2 bucket might look like this:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7q63u2kRoYBAZJtgfcF874/2a7341e1cba6e371e0eed311e89fec6a/image1.png" />
          </figure><p><sup><i>Hive-partitioned files from Pipelines in an R2 bucket</i></sup></p><p>Output files are stored as newline-delimited JSON (NDJSON) — which makes it easy to materialize a stream from these files (hint: in the future you’ll be able to use R2 as a pipeline source too). Finally, the file names are <a href="https://github.com/ulid/spec"><u>ULIDs</u></a>, so they’re sorted by time by default.</p>
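<p>The partitioned layout above can be approximated with a small path builder. This TypeScript sketch is illustrative only; the exact key format Pipelines writes may differ:</p>

```typescript
// Sketch of a Hive-style partitioned object key for a batch file.
// Query engines can prune whole event_date=/hr= partitions that a query's
// time predicate rules out, instead of listing and reading every file.
function partitionKey(prefix: string, ts: Date, fileId: string): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  const date = `${ts.getUTCFullYear()}-${pad(ts.getUTCMonth() + 1)}-${pad(ts.getUTCDate())}`;
  const hr = pad(ts.getUTCHours());
  return `${prefix}/event_date=${date}/hr=${hr}/${fileId}.json.gz`;
}
```

<p>Using a ULID as the <code>fileId</code> keeps files within each partition sorted by creation time as well.</p>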
    <div>
      <h2>First you shard, then you shard some more</h2>
      <a href="#first-you-shard-then-you-shard-some-more">
        
      </a>
    </div>
    <p>What makes Pipelines so horizontally scalable <i>and</i> able to acknowledge writes quickly is how we built it: we use Durable Objects and the <a href="https://blog.cloudflare.com/sqlite-in-durable-objects/"><u>embedded, zero-latency SQLite</u></a> storage within each Durable Object to immediately persist data as it’s written, before then processing it and writing it to R2.</p><p>For example: imagine you’re an e-commerce or SaaS site and need to ingest website usage data (known as <i>clickstream data</i>), and make it available to your data science team to query. The infrastructure that handles this workload has to be resilient to several failure scenarios. The ingestion service needs to maintain high availability in the face of bursts in traffic. Once ingested, the data needs to be buffered, to minimize downstream invocations and thus downstream cost. Finally, the buffered data needs to be delivered to a sink, with appropriate retry &amp; failure handling if the sink is unavailable. Each step of this process needs to signal backpressure upstream when overloaded. It also needs to scale: up during major sales or events, and down during the quieter periods of the day.</p><p>Data engineers reading this post might be familiar with the status quo of using Kafka and the associated ecosystem to handle this. But if you’re an application engineer, you can use Pipelines to build an ingestion service <i>without </i>learning about Kafka, ZooKeeper, and Kafka Streams.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/eRIUocbyvY2oHwEK34pzE/e2ef72b2858c02e890446cfd34accb45/image3.png" />
          </figure><p><sup><i>Pipelines horizontal sharding</i></sup></p><p>The diagram above shows how Pipelines splits the control plane, which is responsible for accounting, tracking shards, and Pipelines lifecycle events, from the data path, which is a scalable group of Durable Object shards.</p><p>When a record (or batch of records) is written to Pipelines:</p><ol><li><p>The Pipelines Worker receives the records either through the fetch handler or a Worker binding.</p></li><li><p>It contacts the Coordinator, based on the <code>pipeline_id</code>, to get the execution plan; subsequent reads are cached to reduce pressure on the Coordinator.</p></li><li><p>It executes the plan, which first shards to a set of Executors that primarily serve to scale request handling.</p></li><li><p>These then re-shard to another set of Executors that actually handle the writes, beginning with persisting to Durable Object storage, which is replicated for durability and availability by the <a href="https://blog.cloudflare.com/sqlite-in-durable-objects/#under-the-hood-storage-relay-service"><u>Storage Relay Service</u></a> (SRS). </p></li><li><p>After SRS, we pass the data to any configured Transform Workers to customize it.</p></li><li><p>The data is batched and packaged into output files.</p></li><li><p>The files are compressed (if applicable) and written to the configured R2 bucket.</p></li></ol><p>Each step of this pipeline can signal backpressure upstream. We do this by leveraging <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream"><u>ReadableStreams</u></a> and responding with <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/429"><u>429s</u></a> when the total number of bytes awaiting write exceeds a threshold. 
Each ReadableStream is able to cross Durable Object boundaries by using <a href="https://developers.cloudflare.com/workers/runtime-apis/rpc/"><u>JSRPC</u></a> calls between Durable Objects. To improve performance, we use RPC stubs for connection reuse between Durable Objects. Each step is also able to retry operations, to handle any temporary unavailability in the Durable Objects or R2.</p><p>We also guarantee delivery even while updating an existing pipeline. When you update an existing pipeline, we create a new deployment, including all the shards and Durable Objects described above. Requests are gracefully re-routed to the new pipeline. The old pipeline continues to write data into R2, until all the Durable Object storage is drained. We spin down the old pipeline only after all the data has been written out. This way, you won’t lose data even while updating a pipeline.</p><p>You’ll notice there’s one interesting part in here — the Transform Workers — which we haven’t yet exposed. As we work to integrate Arroyo’s streaming engine with Pipelines, this will be a key part of how we hand over data for Arroyo to process.</p>
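<p>The byte-threshold backpressure described above can be modeled in a few lines. This TypeScript sketch is illustrative only; the real system works over streams and replicated storage rather than a synchronous counter:</p>

```typescript
// Sketch of byte-threshold backpressure: accept writes until the bytes
// awaiting durable write exceed a limit, then signal 429 so upstream
// clients back off and retry.
class WriteBuffer {
  private pendingBytes = 0;
  constructor(private maxPendingBytes: number) {}

  // Returns an HTTP-style status: 200 = accepted, 429 = back off and retry.
  offer(recordBytes: number): 200 | 429 {
    if (this.pendingBytes + recordBytes > this.maxPendingBytes) return 429;
    this.pendingBytes += recordBytes;
    return 200;
  }

  // Called after a batch is flushed downstream, freeing buffer capacity.
  drained(bytes: number): void {
    this.pendingBytes = Math.max(0, this.pendingBytes - bytes);
  }
}
```
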
    <div>
      <h2>So, what’s it cost?</h2>
      <a href="#so-whats-it-cost">
        
      </a>
    </div>
    <p>During the first phase of the open beta, there will be no additional charges beyond standard R2 storage and operation costs incurred when loading and accessing data. And as always, egress directly from R2 buckets is free, so you can process and query your data from any cloud or region without worrying about data transfer costs adding up.</p><p>In the future, we plan to introduce pricing based on volume of data ingested into Pipelines and delivered from Pipelines:</p><table><tr><td><p>
</p></td><td><p><b>Workers Paid ($5 / month)</b></p></td></tr><tr><td><p><b>Ingestion</b></p></td><td><p>First 50 GB per month included</p><p>$0.02 per additional GB</p></td></tr><tr><td><p><b>Delivery to R2</b></p></td><td><p>First 50 GB per month included</p><p>$0.02 per additional GB</p></td></tr></table><p>We’re also planning to make Pipelines available on the Workers Free plan as the beta progresses.</p><p>We’ll be sharing more as we bring transformations and additional sinks to Pipelines. We’ll provide at least 30 days’ notice before we make any changes or start charging for usage, which we expect to do by September 15, 2025.</p>
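<p>As a worked example of the planned pricing above (which may change before we start charging), each metered dimension works out to:</p>

```typescript
// Worked example of the planned per-dimension pricing: first 50 GB per
// month included, then $0.02 per additional GB. Illustrative only; the
// beta is free and final pricing may differ.
function monthlyCostUSD(gbProcessed: number): number {
  const includedGb = 50;
  const perGb = 0.02;
  return Math.max(0, gbProcessed - includedGb) * perGb;
}
```

<p>So ingesting 150 GB in a month would cost about $2 for ingestion, plus the same calculation again for delivery to R2.</p>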
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>There’s a lot to build here, and we’re keen to build on the powerful components that Arroyo has already created: integrating Workers as UDFs (User-Defined Functions), adding new sources like Kafka clients, and extending Pipelines with new sinks (beyond R2).</p><p>We’ll also be integrating Pipelines with our just-launched <a href="https://blog.cloudflare.com/r2-data-catalog-public-beta/">R2 Data Catalog</a>: enabling you to ingest streams of data directly into Iceberg tables and immediately query them, without needing to rely on other systems.</p><p>In the meantime, you can:</p><ul><li><p>Get started and <a href="http://developers.cloudflare.com/pipelines/getting-started/"><u>create your first Pipeline</u></a></p></li><li><p><a href="http://developers.cloudflare.com/pipelines/"><u>Read the docs</u></a></p></li><li><p>Join the <code>#pipelines-beta</code> channel on <a href="http://discord.cloudflare.com/"><u>our Developer Discord</u></a></p></li></ul><p>… or deploy the example project directly: </p>
            <pre><code>$ npm create cloudflare@latest -- pipelines-starter \
--template="cloudflare/pipelines-starter"</code></pre>
             ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[R2]]></category>
            <category><![CDATA[Pipelines]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">7rKz4iUFCDuhtjGXVbgFzl</guid>
            <dc:creator>Micah Wylde</dc:creator>
            <dc:creator>Matt Silverlock</dc:creator>
            <dc:creator>Pranshu Maheshwari</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare acquires Outerbase to expand database and agent developer experience capabilities]]></title>
            <link>https://blog.cloudflare.com/cloudflare-acquires-outerbase-database-dx/</link>
            <pubDate>Mon, 07 Apr 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare has acquired Outerbase, expanding our database and agent developer experience capabilities. ]]></description>
            <content:encoded><![CDATA[ <p>I’m thrilled to share that Cloudflare has acquired <a href="https://www.outerbase.com/"><u>Outerbase</u></a>. This is such an amazing opportunity for us, and I want to explain how we got here, what we’ve built so far, and why we are so excited about becoming part of the Cloudflare team.</p><p>Databases are key to building almost any production application: you need to persist state for your users (or agents), be able to query it from a number of different clients, and you want it to be fast. But databases aren’t always easy to use: designing a good schema, writing performant queries, creating indexes, and optimizing your access patterns tends to require a lot of experience. Add that to exposing your data through easy-to-grok APIs that make the ‘right’ way to do things obvious, a great developer experience (from dashboard to CLI), and well… there’s a lot of work involved.</p><p>The Outerbase team is already getting to work on some big changes to how databases (and your data) are viewed, edited, and visualized from within <a href="https://developers.cloudflare.com/workers/"><u>Workers</u></a>, and we’re excited to give you a few sneak peeks into what we’ll be landing as we get to work.</p>
    <div>
      <h3>Database DX</h3>
      <a href="#database-dx">
        
      </a>
    </div>
    <p>When we first started Outerbase, we saw how complicated databases could be. Even experienced developers struggled with writing queries, indexing data, and locking down their data. Meanwhile, non-developers often felt locked out, unable to access the data they needed. We believed there had to be a better way. From day one, our goal was to make data accessible to everyone, no matter their skill level. While it started out as simply a better database interface, it quickly evolved into something much more special.</p><p>Outerbase became a platform that helps you manage data in a way that feels natural. You can browse tables, edit rows, and run queries without having to memorize SQL syntax. Even if you do know SQL, you can use Outerbase to dive in deeper and share your knowledge with your team. We also added visualization features so entire teams, both technical and not, could see what’s happening with their data at a glance. Then, with the growth of AI, we realized we could use it to handle many of the more complicated tasks.</p><p>One of our more exciting offerings is Starbase, a <a href="https://www.cloudflare.com/developer-platform/products/d1/">SQLite-compatible database</a> built on top of Cloudflare’s <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>. Our goal was never to simply wrap a legacy system in a shiny interface; we wanted it to be easy to start from nothing on day one, and Cloudflare’s Durable Objects gave us a way to easily manage and spin up databases for anyone who needed one. On top of them, we provided automatic REST APIs, row-level security, WebSocket support for streaming queries, and much more.</p>
    <div>
      <h3>1 + 1 = 3</h3>
      <a href="#1-1-3">
        
      </a>
    </div>
    <p>Our collaboration with Cloudflare first started last year, when we introduced a way for developers to import and manage their <a href="https://developers.cloudflare.com/d1/"><u>D1</u></a> databases inside Outerbase. We were impressed with how powerful Cloudflare’s tools are for deploying and scaling applications. As we worked together, we quickly saw how well our missions aligned. Cloudflare was building the infrastructure we wished we’d had when we first started, and we were building the data experience that many Cloudflare developers were asking for. This eventually led to the seemingly obvious decision of Outerbase joining Cloudflare — it just made so much sense.</p><p>Going forward, we’ll integrate Outerbase’s core features into Cloudflare’s platform. If you’re a developer using D1 or Durable Objects, you’ll start seeing features from Outerbase show up in the Cloudflare dashboard. Expect to see our data explorer for browsing and editing tables, new REST APIs, query editor with type-ahead functionality, real-time data capture, and more of the other tooling we’ve been refining over the last couple of years show up inside the Cloudflare dashboard.</p><p>As part of this transition, the hosted Outerbase cloud will shut down on October 15, 2025, which is about six months from now. We know some of you rely on Outerbase as it stands today, so we’re leaving the open-source repositories as they are.</p><p>You will still be able to self-host Outerbase if you prefer, and we’ll provide guidance on how to do that within your own Cloudflare account. Our main goal will be to ensure that the best parts of Outerbase become part of the Cloudflare developer experience, so you no longer have to make a choice (it’ll be obvious!).</p>
    <div>
      <h3>Sneak peek</h3>
      <a href="#sneak-peek">
        
      </a>
    </div>
    <p>We’ve already done a lot of thinking about how we’re going to bring the best parts of Outerbase into D1, Durable Objects, Workflows, and Agents, and we’re going to share a little about what will be landing over the course of Q2 2025 as the Outerbase team gets to work.</p><p>Specifically, we’ll be heads-down focusing on:</p><ul><li><p>Adapting the powerful table viewer and query runner experiences to D1 and Durable Objects (amongst many other things!)</p></li><li><p>Making it easier to get started with Durable Objects: improving the experience in Wrangler (our CLI tooling), the Cloudflare dashboard, and how you plug into them from your client applications</p></li><li><p>Improving how you visualize the state of a Workflow and the thousands (or millions!) of Workflow instances you might have at any point in time</p></li><li><p>Pre- and post-query hooks for D1 that allow you to automatically register handlers that can act on your data</p></li><li><p>Bringing the <a href="https://starbasedb.com/"><u>Starbase</u></a> API to D1, expanding D1’s existing REST API, and adding WebSockets support — making it easier to use D1, even for applications hosted outside of Workers.</p></li></ul><p>We have already started laying the groundwork for these changes. In the coming weeks, we’ll release a unified data explorer for D1 and Durable Objects that borrows heavily from the Outerbase interface you know. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/FHinAqMr5I8ukmIZLln3a/a34734a3ed680556b01794c6de5e1f63/image2.png" />
          </figure><p><i><sup>Bringing Outerbase’s Data Explorer into the Cloudflare Dashboard</sup></i></p><p>We’ll also tie some of Starbase’s features directly into Cloudflare’s platform, so you can tap into its unique offerings like pre- and post-query hooks or row-level security right from your existing D1 databases and Durable Objects:</p>
            <pre><code>const beforeQuery = ({ sql, params }) =&gt; {
    // Prevent unauthorized queries
    if (!isAllowedQuery(sql)) throw new Error('Query not allowed');
};

const afterQuery = ({ sql, result }) =&gt; {
    // Basic PII masking example
    for (const row of result) {
        if ('email' in row) row.email = '[redacted]';
    }
};

// Execute the query with pre- and post-query hooks attached
const { results } = await env.DB.prepare("SELECT * FROM users;", beforeQuery, afterQuery).all();</code></pre>
            <p><i><sup>Define hooks on your D1 queries that can be re-used, shared and automatically executed before or after your queries run.</sup></i></p><p>This should give you more clarity and control over your data, as well as new ways to secure and optimize it.</p>
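<p>Outside of D1, the hook pattern itself is easy to model. Below is a small, standalone sketch in plain TypeScript of a wrapper that runs <code>beforeQuery</code>/<code>afterQuery</code> hooks around a query function — the wrapper and its signature are our own illustration, not a shipped API:</p>

```typescript
// Standalone sketch of the pre-/post-query hook pattern (illustrative;
// not a shipped Cloudflare or Starbase API).
type Row = Record<string, unknown>;
type Hooks = {
  beforeQuery?: (ctx: { sql: string; params?: unknown[] }) => void;
  afterQuery?: (ctx: { sql: string; result: Row[] }) => void;
};

// Wrap a query function so hooks run before and after every call.
function withHooks(
  run: (sql: string, params?: unknown[]) => Row[],
  hooks: Hooks
): (sql: string, params?: unknown[]) => Row[] {
  return (sql, params) => {
    hooks.beforeQuery?.({ sql, params }); // may throw to block the query
    const result = run(sql, params);
    hooks.afterQuery?.({ sql, result }); // may mutate rows, e.g. mask PII
    return result;
  };
}

// Usage: mask emails on every read.
const query = withHooks(() => [{ id: 1, email: "user@example.com" }], {
  afterQuery: ({ result }) => {
    for (const row of result) if ("email" in row) row.email = "[redacted]";
  },
});
console.log(query("SELECT * FROM users;")); // → [{ id: 1, email: "[redacted]" }]
```

<p>Because the hooks travel with the wrapped function, they can be re-used and shared across call sites — the same property the D1 integration above is after.</p>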
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6W2C3WRqP13ghnHYnZsJHl/fecc1a6f8e92b6cac9499716ab5d7bc4/image1.png" />
          </figure><p><sup><i>Rethinking the Durable Objects getting started experience</i></sup></p><p>We have even begun optimizing the Cloudflare dashboard experience around Durable Objects and D1 to improve the empty state, provide more Getting Started resources, and overall make managing and tracking your database resources even easier.</p><p>For those of you who’ve supported us, given us feedback, and stuck with us as we grew: thank you. You have helped shape Outerbase into what it is today. This acquisition means we can pour even more resources and attention into building the data experience we’ve always wanted to deliver. Our hope is that, by working as part of Cloudflare, we can help reach even more developers by building intuitive experiences, accelerating the speed of innovation, and creating tools that naturally fit into your workflows.</p><p>This is a big step for Outerbase, and we couldn’t be more excited. Thank you for being part of our journey so far. We can’t wait to show you what we’ve got in store as we continue to make data more accessible, intuitive, and powerful — together with Cloudflare.</p>
    <div>
      <h3>What’s next?</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>We’re planning to get to work on some of the big changes to how you interact with your data on Cloudflare, starting with D1 and Durable Objects.</p><p>We’ll also be ensuring we bring a great developer experience to the broader database &amp; storage platform on Cloudflare, including how you access data in <a href="https://developers.cloudflare.com/kv/"><u>Workers KV</u></a>, <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a>, <a href="https://developers.cloudflare.com/workflows/"><u>Workflows</u></a> and even your <a href="https://developers.cloudflare.com/agents/"><u>AI Agents</u></a> (just to name a few).</p><p>To keep up, follow the new <a href="https://developers.cloudflare.com/changelog/"><u>Cloudflare Changelog</u></a> and join our <a href="http://discord.cloudflare.com/"><u>Developer Discord</u></a> to chat with the team and see early previews before they land.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[D1]]></category>
            <category><![CDATA[Durable Objects]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">4Epls86yTVhCR1tmlP4u67</guid>
            <dc:creator>Brandon Strittmatter</dc:creator>
            <dc:creator>Matt Silverlock</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Workflows is now GA: production-ready durable execution]]></title>
            <link>https://blog.cloudflare.com/workflows-ga-production-ready-durable-execution/</link>
            <pubDate>Mon, 07 Apr 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Workflows — a durable execution engine built directly on top of Workers — is now Generally Available. We’ve landed new human-in-the-loop capabilities, more scale, and more metrics. ]]></description>
            <content:encoded><![CDATA[ <p>Betas are useful for feedback and iteration, but at the end of the day, not everyone is willing to be a guinea pig or can tolerate the occasional sharp edge that comes along with beta software. Sometimes you need that big, shiny “Generally Available” label (or blog post), and now it’s Workflows’ turn.</p><p><a href="https://developers.cloudflare.com/workflows/"><u>Workflows</u></a>, our serverless durable execution engine that allows you to build long-running, multi-step applications (some call them “step functions”) on Workers, is now GA.</p><p>In short, that means it’s <i>production ready</i> —  but it also doesn’t mean Workflows is going to ossify. We’re continuing to scale Workflows (including more concurrent instances), bring new capabilities (like the new <code>waitForEvent</code> API), and make it easier to build <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/">AI agents</a> with <a href="https://developers.cloudflare.com/agents/api-reference/run-workflows/"><u>our Agents SDK and Workflows</u></a>.</p><p>If you prefer code to prose, you can quickly install the Workflows starter project and start exploring the code and the API with a single command:</p>
            <pre><code>npm create cloudflare@latest workflows-starter -- \
  --template="cloudflare/workflows-starter"</code></pre>
            <p>How does Workflows work? What can I build with it? How do I think about building AI agents with Workflows and the <a href="https://developers.cloudflare.com/agents/"><u>Agents SDK</u></a>? Well, read on.</p>
    <div>
      <h2>Building with Workflows</h2>
      <a href="#building-with-workflows">
        
      </a>
    </div>
    <p>Workflows is a durable execution engine built on Cloudflare Workers that allows you to build resilient, multi-step applications.</p><p>At its core, Workflows implements a step-based architecture where each step in your application is independently retriable, with state automatically persisted between steps. This means that even if a step fails due to a transient error or network issue, Workflows can retry just that step without needing to restart your entire application from the beginning.</p><p>When you define a Workflow, you break your application into logical steps.</p><ul><li><p>Each step can either execute code (<code>step.do</code>), put your Workflow to sleep (<code>step.sleep</code> or <code>step.sleepUntil</code>), or wait on an event (<code>step.waitForEvent</code>).</p></li><li><p>As your Workflow executes, it automatically persists the state returned from each step, ensuring that your application can continue exactly where it left off, even after failures or hibernation periods. </p></li><li><p>This durable execution model is particularly powerful for applications that coordinate between multiple systems, process data in sequence, or need to handle long-running tasks that might span minutes, hours, or even days.</p></li></ul><p>Workflows are particularly useful at handling complex business processes that traditional stateless functions struggle with.</p><p>For example, an e-commerce order processing workflow might check inventory, charge a payment method, send an email confirmation, and update a database — all as separate steps. If the payment processing step fails due to a temporary outage, Workflows will automatically retry just that step when the payment service is available again, without duplicating the inventory check or restarting the entire process. </p><p>You can see how this works below: each call to a service can be modelled as a step, independently retried, and if needed, recovered from that step onwards:</p>
            <pre><code>import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from 'cloudflare:workers';

// The params we expect when triggering this Workflow
type OrderParams = {
	orderId: string;
	customerId: string;
	items: Array&lt;{ productId: string; quantity: number }&gt;;
	paymentMethod: {
		type: string;
		id: string;
	};
};

// Our Workflow definition
export class OrderProcessingWorkflow extends WorkflowEntrypoint&lt;Env, OrderParams&gt; {
	async run(event: WorkflowEvent&lt;OrderParams&gt;, step: WorkflowStep) {
		// Step 1: Check inventory
		const inventoryResult = await step.do('check-inventory', async () =&gt; {
			console.log(`Checking inventory for order ${event.payload.orderId}`);

			// Mock: In a real workflow, you'd query your inventory system
			const inventoryCheck = await this.env.INVENTORY_SERVICE.checkAvailability(event.payload.items);

			// Return inventory status as state for the next step
			return {
				inStock: true,
				reservationId: 'inv-123456',
				itemsChecked: event.payload.items.length,
			};
		});

		// Exit workflow if items aren't in stock
		if (!inventoryResult.inStock) {
			return { status: 'failed', reason: 'out-of-stock' };
		}

		// Step 2: Process payment
		// Configure specific retry logic for payment processing
		const paymentResult = await step.do(
			'process-payment',
			{
				retries: {
					limit: 3,
					delay: '30 seconds',
					backoff: 'exponential',
				},
				timeout: '2 minutes',
			},
			async () =&gt; {
				console.log(`Processing payment for order ${event.payload.orderId}`);

				// Mock: In a real workflow, you'd call your payment processor
				const paymentResponse = await this.env.PAYMENT_SERVICE.processPayment({
					customerId: event.payload.customerId,
					orderId: event.payload.orderId,
					amount: calculateTotal(event.payload.items),
					paymentMethodId: event.payload.paymentMethod.id,
				});

				// If payment failed, throw an error that will trigger retry logic
				if (paymentResponse.status !== 'success') {
					throw new Error(`Payment failed: ${paymentResponse.message}`);
				}

				// Return payment info as state for the next step
				return {
					transactionId: 'txn-789012',
					amount: 129.99,
					timestamp: new Date().toISOString(),
				};
			},
		);

		// Step 3: Send email confirmation
		await step.do('send-confirmation-email', async () =&gt; {
			console.log(`Sending confirmation email for order ${event.payload.orderId}`);
			console.log(`Including payment confirmation ${paymentResult.transactionId}`);
			return await this.env.EMAIL_SERVICE.sendOrderConfirmation({ ... })
		});

		// Step 4: Update database
		const dbResult = await step.do('update-database', async () =&gt; {
			console.log(`Updating database for order ${event.payload.orderId}`);
			await this.updateOrderStatus(...)

			return { dbUpdated: true };
		});

		// Return final workflow state
		return {
			orderId: event.payload.orderId,
			processedAt: new Date().toISOString(),
		};
	}
}</code></pre>
            <p>This combination of durability, automatic retries, and state persistence makes Workflows ideal for building reliable distributed applications that can handle real-world failures gracefully.</p>
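<p>To build intuition for that model, here is a deliberately simplified sketch in plain TypeScript: completed step results are checkpointed in a map, so a retry re-runs only the step that failed. (The real engine persists state durably, across machines and hibernation; this toy only illustrates the skip-completed-steps behavior.)</p>

```typescript
// Toy model of durable execution: results of completed steps are
// checkpointed, so a retry resumes from the failed step instead of
// re-running everything. Illustrative only — not the Workflows engine.
type StepFn = () => Promise<unknown>;

async function runWithCheckpoints(
  checkpoints: Map<string, unknown>,
  steps: Array<[name: string, fn: StepFn]>
): Promise<Record<string, unknown>> {
  for (const [name, fn] of steps) {
    if (checkpoints.has(name)) continue; // completed on a prior attempt: skip
    checkpoints.set(name, await fn()); // persist the result before moving on
  }
  return Object.fromEntries(checkpoints);
}

// Usage: the payment step fails once, then succeeds on retry — and the
// inventory step is not re-executed.
const checkpoints = new Map<string, unknown>();
let paymentAttempts = 0;
const steps: Array<[string, StepFn]> = [
  ["check-inventory", async () => "inv-123456"],
  ["process-payment", async () => {
    paymentAttempts++;
    if (paymentAttempts < 2) throw new Error("transient outage");
    return "txn-789012";
  }],
];

(async () => {
  await runWithCheckpoints(checkpoints, steps).catch(() => {}); // first attempt fails
  const result = await runWithCheckpoints(checkpoints, steps); // retry resumes
  console.log(result, `payment attempts: ${paymentAttempts}`);
})();
```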
    <div>
      <h2>Human-in-the-loop</h2>
      <a href="#human-in-the-loop">
        
      </a>
    </div>
    <p>Workflows are just code, and that makes them extremely powerful: you can define steps dynamically and on-the-fly, conditionally branch, and make API calls to any system you need. But sometimes you also need a Workflow to wait for something to happen in the real world.</p><p>For example:</p><ul><li><p>Approval from a human to progress.</p></li><li><p>An incoming webhook, like from a Stripe payment or a GitHub event. </p></li><li><p>A state change, such as a file upload to R2 that triggers an <a href="https://developers.cloudflare.com/r2/buckets/event-notifications/"><u>Event Notification</u></a>, and then pushes a reference to the file to the Workflow, so it can process the file (or run it through an AI model).</p></li></ul><p>The new <code>waitForEvent</code> API in Workflows allows you to do just that: </p>
            <pre><code>let event = await step.waitForEvent&lt;IncomingStripeWebhook&gt;("receive invoice paid webhook from Stripe", { type: "stripe-webhook", timeout: "1 hour" }) </code></pre>
            <p>You can then send an event to a specific instance from any external service that can make a HTTP request:</p>
            <pre><code>curl -d '{"transaction":"complete","id":"1234-6789"}' \
  -H "Authorization: Bearer ${CF_TOKEN}" \
  "https://api.cloudflare.com/client/v4/accounts/{account_id}/workflows/{workflow_name}/instances/{instance_id}/events/{event_type}"</code></pre>
            <p>… or via the <a href="https://developers.cloudflare.com/workflows/build/workers-api/#workflowinstance"><u>Workers API</u></a> within a Worker itself:</p>
            <pre><code>interface Env {
  MY_WORKFLOW: Workflow;
}

interface Payload {
  transaction: string;
  id: string;
}

export default {
  async fetch(req: Request, env: Env) {
    const instanceId = new URL(req.url).searchParams.get("instanceId")
    if (!instanceId) {
      return new Response("Missing instanceId query parameter", { status: 400 })
    }
    const webhookPayload = await req.json&lt;Payload&gt;()

    let instance = await env.MY_WORKFLOW.get(instanceId);
    // Send our event, with `type` matching the event type defined in
    // our step.waitForEvent call
    await instance.sendEvent({type: "stripe-webhook", payload: webhookPayload})
    
    return Response.json({
      status: await instance.status(),
    });
  },
};</code></pre>
            <p>You can even wait for multiple events, using the <code>type</code> parameter, and/or race multiple events using <code>Promise.race</code> to continue on depending on which event was received first:</p>
            <pre><code>export class MyWorkflow extends WorkflowEntrypoint&lt;Env, Params&gt; {
	async run(event: WorkflowEvent&lt;Params&gt;, step: WorkflowStep) {
		let state = await step.do("get some data", () =&gt; { /* step call here */ })
		// Race the events, resolving the Promise based on which
		// event we receive first
		let value = await Promise.race([
			step.waitForEvent("payment success", { type: "payment-success-webhook", timeout: "4 hours" }),
			step.waitForEvent("payment failure", { type: "payment-failure-webhook", timeout: "4 hours" }),
		])
		// Continue on based on the value and event received
	}
}</code></pre>
            <p>To visualize <code>waitForEvent</code> in a bit more detail, let’s assume we have a Workflow that is triggered by a code review agent that watches a GitHub repository.</p><p>Without the ability to wait on events, our Workflow can’t easily get human approval to write suggestions back (or even submit a PR of its own). It <i>could</i> potentially poll for some state that was updated, but that means we have to call <code>step.sleep</code> for arbitrary periods of time, poll a storage service for an updated value, and repeat if it’s not there. That’s a lot of code and room for error:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/64dgTwe9V6bAfKUDQgJ1z3/e0a897623a8ca452139f00dd2cff9733/1.png" />
          </figure><p><sup><i>Without waitForEvent, it’s harder to send data to a Workflow instance that’s running</i></sup></p><p>If we modified that same example to incorporate the new <code>waitForEvent</code> API, we could use it to wait for human approval before making a mutating change:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2BIuiSytb7roytyDhHVioz/0e005829fea9e60d772dcb6888acac2c/2.png" />
          </figure><p><sup><i>Adding waitForEvent to our code review Workflow, so it can seek explicit approval.</i></sup></p><p>You could even imagine an AI agent itself sending and/or acting on behalf of a human here: <code>waitForEvent</code> simply gives a Workflow a way to pause until something in the world changes before it continues (or not).</p><p>Critically, you can call <code>waitForEvent</code> just like any other step in Workflows: you can call it conditionally, and/or multiple times, and/or in a loop. Workflows are just Workers: you have the full power of a programming language and are not restricted by a <a href="https://en.wikipedia.org/wiki/Domain-specific_language"><u>domain specific language (DSL)</u></a> or config language.</p>
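<p>Under the hood, this loop-and-race composition is ordinary JavaScript promises. Here is a tiny, in-memory stand-in — our own toy class, not the Workflows runtime — that shows how waiting on typed events and racing them fits together:</p>

```typescript
// Toy in-memory stand-in for waitForEvent/sendEvent (not the Workflows
// runtime — just a sketch of the promise semantics, without timeouts or
// durability).
class EventHub {
  private waiters = new Map<string, Array<(payload: unknown) => void>>();

  // Resolves when an event of the given type is sent.
  waitForEvent<T>(type: string): Promise<T> {
    return new Promise<T>((resolve) => {
      const list = this.waiters.get(type) ?? [];
      list.push(resolve as (payload: unknown) => void);
      this.waiters.set(type, list);
    });
  }

  sendEvent(type: string, payload: unknown): void {
    for (const resolve of this.waiters.get(type) ?? []) resolve(payload);
    this.waiters.delete(type);
  }
}

// Race a success webhook against a failure webhook, as in the example above.
const hub = new EventHub();
const raced = Promise.race([
  hub.waitForEvent<{ status: string }>("payment-success-webhook"),
  hub.waitForEvent<{ status: string }>("payment-failure-webhook"),
]);
hub.sendEvent("payment-success-webhook", { status: "complete" });
raced.then((event) => console.log(event.status)); // "complete"
```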
    <div>
      <h2>Pricing</h2>
      <a href="#pricing">
        
      </a>
    </div>
    <p>Good news: we haven’t changed much since our original beta announcement! We’re adding storage pricing for state stored by your Workflows, and retaining our CPU-based and request (invocation) based pricing as follows:</p><table><tr><td><p><b>Unit</b></p></td><td><p><b>Workers Free</b></p></td><td><p><b>Workers Paid</b></p></td></tr><tr><td><p><b>CPU time (ms)</b></p></td><td><p>10 ms per Workflow</p></td><td><p>30 million CPU milliseconds included per month</p><p>+$0.02 per additional million CPU milliseconds</p></td></tr><tr><td><p><b>Requests</b></p></td><td><p>100,000 Workflow invocations per day (<a href="https://developers.cloudflare.com/workers/platform/pricing/#workers"><u>shared with Workers</u></a>)</p></td><td><p>10 million included per month</p><p>+$0.30 per additional million</p></td></tr><tr><td><p><b>Storage (GB)</b></p></td><td><p>1 GB</p></td><td><p>1 GB included per month</p><p>+$0.20/GB-month</p></td></tr></table><p>Because the storage pricing is new, we will not actively bill for storage until September 15, 2025. We will notify users above the included 1 GB limit ahead of charging for storage, and by default, Workflows will expire stored state after three (3) days (Free plan) or thirty (30) days (Paid plan).</p><p>If you’re wondering what “CPU time” is here: it’s the time your Workflow is actively consuming compute resources. It <i>doesn’t</i> include time spent waiting on API calls, reasoning LLMs, or other I/O (like writing to a database). That might seem like a small thing, but in practice it matters: most applications have single-digit milliseconds of CPU time and multiple seconds of wall time; an API or two taking 100–250 ms to respond adds up!</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6zRZ3gFQ0TrCetwlW0bqWG/87e41b7ab75ae48a4f2a6655d8ac2a86/3.png" />
          </figure><p><sup><i>Bill for CPU, not for time spent when a Workflow is idle or waiting.</i></sup></p><p>Workflow engines, especially, tend to spend a lot of time waiting: reading data from <a href="https://www.cloudflare.com/learning/cloud/what-is-object-storage/">object storage</a> (like <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>Cloudflare R2</u></a>), calling third-party APIs or LLMs like o3-mini or Claude 3.7, even querying databases like <a href="https://developers.cloudflare.com/d1/"><u>D1</u></a>, Postgres, or MySQL. With Workflows, just like Workers: you don’t pay for time your application is just waiting.</p>
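<p>As a rough, illustrative translation of the pricing table above into code (example numbers are our own; check the pricing docs for the authoritative rates):</p>

```typescript
// Rough estimator for Workflows on the Workers Paid plan, using the
// rates from the table above (illustrative, not a billing guarantee;
// storage is omitted).
function estimateMonthlyCostUSD(invocations: number, cpuMs: number): number {
  const includedInvocations = 10_000_000; // 10M requests included per month
  const includedCpuMs = 30_000_000; // 30M CPU milliseconds included per month
  const extraInvocations = Math.max(0, invocations - includedInvocations);
  const extraCpuMs = Math.max(0, cpuMs - includedCpuMs);
  // $0.30 per additional million requests, $0.02 per additional million CPU ms
  return (extraInvocations / 1_000_000) * 0.30 + (extraCpuMs / 1_000_000) * 0.02;
}

// 15M invocations averaging 5 ms CPU each (75M CPU ms total):
// 5M extra invocations ($1.50) + 45M extra CPU ms ($0.90)
console.log(estimateMonthlyCostUSD(15_000_000, 75_000_000)); // ≈ 2.4
```

<p>Note that <code>cpuMs</code> is active compute only — the seconds your Workflow spends sleeping or waiting on I/O don’t appear anywhere in the formula.</p>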
    <div>
      <h2>Start building</h2>
      <a href="#start-building">
        
      </a>
    </div>
    <p>So you’ve got a good handle on Workflows, how it works, and want to get building. What next?</p><ol><li><p><a href="https://developers.cloudflare.com/workflows/"><u>Visit the Workflows documentation</u></a> to learn how it works, understand the Workflows API, and best practices</p></li><li><p>Review the code in the <a href="https://github.com/cloudflare/workflows-starter"><u>starter project</u></a></p></li><li><p>And lastly, deploy the starter to your own Cloudflare account with a few clicks:</p></li></ol><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/workflows-starter"><img src="https://deploy.workers.cloudflare.com/button" /></a><p></p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workflows]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">7ju3oFGzR3iR8gO2TmMleF</guid>
            <dc:creator>Sid Chatterjee</dc:creator>
            <dc:creator>Matt Silverlock</dc:creator>
        </item>
        <item>
            <title><![CDATA[Making Cloudflare the best platform for building AI Agents]]></title>
            <link>https://blog.cloudflare.com/build-ai-agents-on-cloudflare/</link>
            <pubDate>Tue, 25 Feb 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Today we’re excited to share a few announcements on how we’re making it even easier to build AI agents on Cloudflare. ]]></description>
            <content:encoded><![CDATA[ <p>As engineers, we’re obsessed with efficiency and automating anything we find ourselves doing more than twice. If you’ve ever done this, you know that the happy path is always easy, but the second the inputs get complex, automation becomes really hard. This is because computers have traditionally required extremely specific instructions in order to execute.</p><p>The state of AI models available to us today has changed that. We now have access to computers that can reason, and make judgement calls in lieu of specifying every edge case under the sun.</p><p>That’s what <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/">AI agents</a> are all about.</p><p>Today we’re excited to share a few announcements on how we’re making it <i>even</i> <i>easier</i> to build AI agents on Cloudflare, including:</p><ul><li><p><code>agents-sdk</code> — a new JavaScript framework for building AI agents</p></li><li><p>Updates to Workers AI: structured outputs, tool calling, and longer context windows for <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a>, Cloudflare’s serverless inference engine</p></li><li><p>An update to the <a href="https://github.com/cloudflare/workers-ai-provider"><u>workers-ai-provider</u></a> for the AI SDK</p></li></ul><p>We truly believe that Cloudflare is the ideal platform for building Agents and AI applications (more on why below), and we’re constantly working to make it better — you can expect to see more announcements from us in this space in the future.</p><p>Before we dive deep into the announcements, we wanted to give you a quick primer on agents. If you are familiar with agents, feel free to skip ahead. </p>
    <div>
      <h2>What are agents?</h2>
      <a href="#what-are-agents">
        
      </a>
    </div>
    <p>Agents are AI systems that can autonomously execute tasks by making decisions about tool usage and process flow. Unlike traditional automation that follows predefined paths, agents can dynamically adapt their approach based on context and intermediate results. Agents are also distinct from co-pilots (e.g. traditional chat applications) in that they can fully automate a task, as opposed to simply augmenting and extending human input.</p><ul><li><p>Agents → non-linear, non-deterministic (can change from run to run)</p></li><li><p>Workflows → linear, deterministic execution paths</p></li><li><p>Co-pilots → augmentative AI assistance requiring human intervention</p></li></ul>
    <div>
      <h3>Example: booking vacations</h3>
      <a href="#example-booking-vacations">
        
      </a>
    </div>
    <p>If this is your first time working with, or interacting with agents, this example will illustrate how an agent works within a context like booking a vacation.</p><p>Imagine you're trying to book a vacation. You need to research flights, find hotels, check restaurant reviews, and keep track of your budget.</p><p><b>Traditional workflow automation</b></p><p>A traditional automation system follows a predetermined sequence: it can take inputs such as dates, location, and budget, and make calls to predefined APIs in a fixed order. However, if any unexpected situations arise, such as flights being sold out, or the specified hotels being unavailable, it cannot adapt. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7fHwj0r4JgRDawOQnNN618/2f369a5224dee288d3baf656d5952469/image1.png" />
          </figure><p><b>AI co-pilot</b></p><p>A co-pilot acts as an intelligent assistant that can provide hotel and itinerary recommendations based on your preferences. If you have questions, it can understand and respond to natural language queries and offer guidance and suggestions. However, it is unable to take the next steps to execute the end-to-end action on its own. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/24e3EQSLKo3CJsKv0gFban/6a23620857c6bca8a873da185ee5be56/image2.png" />
          </figure><p><b>Agent</b></p><p>An agent combines AI's ability to make judgements with the ability to call the relevant tools to execute the task. An agent's output will be nondeterministic given real-time availability and pricing changes, dynamic prioritization of constraints, the ability to recover from failures, and adaptive decision-making based on intermediate results. In other words, if flights or hotels are unavailable, an agent can reassess and suggest a new itinerary with altered dates or locations, and continue executing your travel booking.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/30QFnfkVyFm1tyV9B2QvXU/ac79ff6ac70ba609d4ecf714d34f0146/image3.png" />
          </figure>
    <div>
      <h2>agents-sdk — the framework for building agents</h2>
      <a href="#agents-sdk-the-framework-for-building-agents">
        
      </a>
    </div>
    <p>You can now add agent powers to any existing Workers project with just one command:</p>
            <pre><code>$ npm i agents-sdk</code></pre>
            <p>… or if you want to build something from scratch, you can bootstrap your project with the <a href="https://github.com/cloudflare/agents-starter"><u>agents-starter template</u></a>:</p>
            <pre><code>$ npm create cloudflare@latest -- --template cloudflare/agents-starter
# ... and then deploy it
$ npm run deploy</code></pre>
            <p><code>agents-sdk</code> is a framework that allows you to build agents —  software that can autonomously execute tasks — and deploy them directly into production on Cloudflare Workers.</p><p>Your agent can start with the basics and act on HTTP requests…</p>
            <pre><code>import { Agent } from "agents-sdk";

export class IntelligentAgent extends Agent {
  async onRequest(request) {
    // Transform intention into response
    return new Response("Ready to assist.");
  }
}</code></pre>
            <p>Although this is just the initial release of <code>agents-sdk</code>, we wanted to ship more than just a thin wrapper over an existing library. Agents can communicate with clients in real time, persist state, execute long-running tasks on a schedule, send emails, run asynchronous workflows, browse the web, query data from your Postgres database, call AI models, and support human-in-the-loop use-cases. All of this works today, out of the box.</p><p>For example, you can build a powerful chat agent with the <code>AIChatAgent</code> class:</p>
            <pre><code>// src/index.ts
// NOTE: import paths follow the agents-starter template layout and may
// drift as the SDK evolves.
import { AsyncLocalStorage } from "node:async_hooks";
import { AIChatAgent } from "agents-sdk/ai-chat-agent";
import { routeAgentRequest, type Schedule } from "agents-sdk";
import {
  createDataStreamResponse,
  generateId,
  streamText,
  type StreamTextOnFinishCallback,
} from "ai";
import { createOpenAI } from "@ai-sdk/openai";
// Local modules from the agents-starter template
import { tools, executions } from "./tools";
import { processToolCalls } from "./utils";

// Used by tools to look up the Chat instance handling the current
// request (requires the nodejs_compat flag)
export const agentContext = new AsyncLocalStorage&lt;Chat&gt;();

export class Chat extends AIChatAgent&lt;Env&gt; {
  /**
   * Handles incoming chat messages and manages the response stream
   * @param onFinish - Callback function executed when streaming completes
   */
  async onChatMessage(onFinish: StreamTextOnFinishCallback&lt;any&gt;) {
    // Create a streaming response that handles both text and tool outputs
    return agentContext.run(this, async () =&gt; {
      const dataStreamResponse = createDataStreamResponse({
        execute: async (dataStream) =&gt; {
          // Process any pending tool calls from previous messages
          // This handles human-in-the-loop confirmations for tools
          const processedMessages = await processToolCalls({
            messages: this.messages,
            dataStream,
            tools,
            executions,
          });

          // Initialize OpenAI client with API key from environment
          const openai = createOpenAI({
            apiKey: this.env.OPENAI_API_KEY,
          });

          // Cloudflare AI Gateway
          // const openai = createOpenAI({
          //   apiKey: this.env.OPENAI_API_KEY,
          //   baseURL: this.env.GATEWAY_BASE_URL,
          // });

          // Stream the AI response using GPT-4
          const result = streamText({
            model: openai("gpt-4o-2024-11-20"),
            system: `
              You are a helpful assistant that can do various tasks. If the user asks, then you can also schedule tasks to be executed later. The input may have a date/time/cron pattern to be input as an object into a scheduler. The time is now: ${new Date().toISOString()}.
              `,
            messages: processedMessages,
            tools,
            onFinish,
            maxSteps: 10,
          });

          // Merge the AI response stream with tool execution outputs
          result.mergeIntoDataStream(dataStream);
        },
      });

      return dataStreamResponse;
    });
  }
  async executeTask(description: string, task: Schedule&lt;string&gt;) {
    await this.saveMessages([
      ...this.messages,
      {
        id: generateId(),
        role: "user",
        content: `scheduled message: ${description}`,
      },
    ]);
  }
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    if (!env.OPENAI_API_KEY) {
      console.error(
        "OPENAI_API_KEY is not set, don't forget to set it locally in .dev.vars, and use `wrangler secret bulk .dev.vars` to upload it to production"
      );
      return new Response("OPENAI_API_KEY is not set", { status: 500 });
    }
    return (
      // Route the request to our agent or return 404 if not found
      (await routeAgentRequest(request, env)) ||
      new Response("Not found", { status: 404 })
    );
  },
} satisfies ExportedHandler&lt;Env&gt;;</code></pre>
            <p>… and connect to your Agent from any React-based front-end with the <a href="https://github.com/cloudflare/agents-starter/blob/main/src/app.tsx"><code><u>useAgent</u></code></a> hook, which can automatically establish a bidirectional WebSocket, sync client state, and allow you to build Agent-based applications without a mountain of bespoke code:</p>
            <pre><code>// src/app.tsx
import { useAgent } from "agents-sdk/react";

function Chat() {
  // useAgent is a React hook, so it must be called inside a component:
  // it opens a WebSocket to the Agent named "chat" and keeps state in sync
  const agent = useAgent({
    agent: "chat",
  });
  // ... render your chat UI using `agent`
}</code></pre>
            <p>We spent some time thinking about the production story here too: an agent framework that absolves itself of the hard parts — durably persisting state, handling long-running tasks &amp; loops, and horizontal scale — is only going to get you so far. Agents built with <code>agents-sdk</code> can be deployed directly to Cloudflare and run on top of <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a> — which you can think of as stateful micro-servers that can scale to tens of millions — and are able to run wherever they need to. Close to a user for low-latency, close to your data, and/or anywhere in between.</p><p><code>agents-sdk</code> also exposes:</p><ul><li><p>Integration with React applications via a <code>useAgent</code> hook that can automatically set up a WebSocket connection between your app and an agent</p></li><li><p>An <code>AIChatAgent</code> extension that makes it easier to build intelligent chat agents</p></li><li><p>State management APIs via <code>this.setState</code> as well as a native <code>sql</code> API for writing and querying data within each Agent</p></li><li><p>State synchronization between frontend applications and the agent state</p></li><li><p>Agent routing, enabling agent-per-user or agent-per-workflow use-cases. Spawn millions (or tens of millions) of agents without having to think about how to make the infrastructure work, provision CPU, or scale out storage.</p></li></ul><p>Over the coming weeks, expect to see even more here: tighter integration with email APIs to enable more human-in-the-loop use-cases, hooks into WebRTC for voice &amp; video interactivity, a built-in evaluation (evals) framework, and the ability to self-host agents on your own infrastructure.</p><p>We’re aiming high here: we think this is just the beginning of what agents are capable of, and we think we can make Workers the best place (but not the only place) to build &amp; run them.</p>
    <div>
      <h2>JSON mode, longer context windows, and improved tool calling in Workers AI</h2>
      <a href="#json-mode-longer-context-windows-and-improved-tool-calling-in-workers-ai">
        
      </a>
    </div>
    <p>When users express needs conversationally, tool calling converts these requests into structured formats like JSON that APIs can understand and process, allowing the AI to interact with databases, services, and external systems. This is essential for building agents, as it allows users to express complex intentions in natural language, and AI to decompose these requests, call appropriate tools, evaluate responses and deliver meaningful outcomes.</p><p>When using tool calling or building AI agents, the text generation model must respond with valid JSON objects rather than natural language. Today, we're adding JSON mode support to Workers AI, enabling applications to request a structured output response when interacting with AI models. Here's a request to <code>@cf/meta/llama-3.1-8b-instruct-fp8-fast</code> using JSON mode:</p>
            <pre><code>{
  "messages": [
    {
      "role": "system",
      "content": "Extract data about a country."
    },
    {
      "role": "user",
      "content": "Tell me about India."
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "capital": {
          "type": "string"
        },
        "languages": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      },
      "required": [
        "name",
        "capital",
        "languages"
      ]
    }
  }
}</code></pre>
            <p>And here’s how the model will respond:</p>
            <pre><code>{
  "response": {
    "name": "India",
    "capital": "New Delhi",
    "languages": [
      "Hindi",
      "English",
      "Bengali",
      "Telugu",
      "Marathi",
      "Tamil",
      "Gujarati",
      "Urdu",
      "Kannada",
      "Odia",
      "Malayalam",
      "Punjabi",
      "Sanskrit"
    ]
  }
}</code></pre>
            <p>As you can see, the model complies with the JSON schema defined in the request and responds with a valid JSON object. JSON mode is compatible with OpenAI’s <code>response_format</code> implementation:</p>
            <pre><code>response_format: {
  title: "JSON Mode",
  type: "object",
  properties: {
    type: {
      type: "string",
      enum: ["json_object", "json_schema"],
    },
    json_schema: {},
  }
}</code></pre>
            <p>This is the list of models that now support JSON mode:</p><ul><li><p><a href="https://developers.cloudflare.com/workers-ai/models/llama-3.1-8b-instruct-fast/"><u>@cf/meta/llama-3.1-8b-instruct-fast</u></a></p></li><li><p><a href="https://developers.cloudflare.com/workers-ai/models/llama-3.1-70b-instruct/"><u>@cf/meta/llama-3.1-70b-instruct</u></a></p></li><li><p><a href="https://developers.cloudflare.com/workers-ai/models/llama-3.3-70b-instruct-fp8-fast/"><u>@cf/meta/llama-3.3-70b-instruct-fp8-fast</u></a></p></li><li><p><a href="https://developers.cloudflare.com/workers-ai/models/deepseek-r1-distill-qwen-32b/"><u>@cf/deepseek-ai/deepseek-r1-distill-qwen-32b</u></a></p></li><li><p><a href="https://developers.cloudflare.com/workers-ai/models/llama-3-8b-instruct/"><u>@cf/meta/llama-3-8b-instruct</u></a></p></li><li><p><a href="https://developers.cloudflare.com/workers-ai/models/llama-3.1-8b-instruct/"><u>@cf/meta/llama-3.1-8b-instruct</u></a></p></li><li><p><a href="https://developers.cloudflare.com/workers-ai/models/hermes-2-pro-mistral-7b/"><u>@hf/nousresearch/hermes-2-pro-mistral-7b</u></a></p></li></ul><p>We will continue extending this list to keep up with new and requested models.</p><p>Lastly, we are changing how we restrict the size of AI requests to text generation models, moving from byte counts to token counts, introducing the concept of <b>context window</b> and raising the limits of the models in our catalog.</p><p>In generative AI, the context window is the sum of the number of input, reasoning, and completion or response tokens a model supports. You can now find the context window limit on each <a href="https://developers.cloudflare.com/workers-ai/models/llama-3.1-70b-instruct/"><u>model page</u></a> in our developer documentation and decide which model suits your requirements and use case.</p><p>JSON mode is also the perfect companion when using function calling. 
You can use structured JSON outputs with traditional function calling or the Vercel AI SDK via the <code>workers-ai-provider</code>.</p>
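<p>As a rough sketch of how this looks from a Worker (illustrative assumptions: the Workers AI binding is named <code>AI</code> and the model returns its structured output in a <code>response</code> field, as in the example above), you pass the same <code>response_format</code> block alongside your messages when calling the model:</p>

```typescript
// Illustrative sketch, not a definitive implementation: sending the JSON
// mode request above from a Worker via the Workers AI binding.

// JSON Schema for the structured output we want back (same as the example above).
export const countrySchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    capital: { type: "string" },
    languages: { type: "array", items: { type: "string" } },
  },
  required: ["name", "capital", "languages"],
};

// Pure helper: assemble the model input, including the response_format block.
export function buildJsonModeRequest(question: string) {
  return {
    messages: [
      { role: "system", content: "Extract data about a country." },
      { role: "user", content: question },
    ],
    response_format: { type: "json_schema", json_schema: countrySchema },
  };
}

// Minimal shape of the Workers AI binding; just enough for this sketch.
interface Env {
  AI: { run(model: string, input: unknown): Promise<{ response: unknown }> };
}

export default {
  async fetch(_request: Request, env: Env): Promise<Response> {
    const result = await env.AI.run(
      "@cf/meta/llama-3.1-8b-instruct-fp8-fast",
      buildJsonModeRequest("Tell me about India.")
    );
    // `result.response` should already conform to countrySchema.
    return Response.json(result.response);
  },
};
```

<p>Because the schema travels with the request, the same helper works unchanged across any of the JSON-mode-capable models listed above.</p>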
    <div>
      <h2><a href="https://github.com/cloudflare/workers-ai-provider">workers-ai-provider</a> 0.1.1</h2>
      <a href="#0-1-1">
        
      </a>
    </div>
    <p>One of the most common ways to build with AI tooling today is by using the popular <a href="https://sdk.vercel.ai/docs/introduction"><u>AI SDK</u></a>. <a href="https://github.com/cloudflare/workers-ai-provider"><u>Cloudflare’s provider</u></a> for the AI SDK makes it easy to use Workers AI the same way you would call any other LLM, directly from your code.</p><p>In the <a href="https://github.com/cloudflare/workers-ai-provider/tree/workers-ai-provider%400.1.1"><u>most recent version</u></a>, we’ve shipped the following improvements: </p><ul><li><p>Tool calling enabled for generateText</p></li><li><p>Streaming now works out of the box</p></li><li><p>Usage statistics are now enabled</p></li><li><p>You can now use AI Gateway, even when streaming</p></li></ul><p>A key part of building agents is using LLMs for routing, and making decisions on which tools to call next, and summarizing structured and unstructured data. All of these things need to happen quickly, as they are on the critical path of the user-facing experience.</p><p>Workers AI, with its globally distributed fleet of GPUs, is a perfect fit for smaller, low-latency LLMs, so we’re excited to make it easy to use with tools developers are already familiar with. </p>
    <div>
      <h2>Why build agents on Cloudflare? </h2>
      <a href="#why-build-agents-on-cloudflare">
        
      </a>
    </div>
    <p>Since launching Workers in 2017, we’ve been building a platform to allow developers to build applications that are fast, scalable, and cost-efficient from day one. We took a fundamentally different approach from the way code was previously run on servers, making a bet about what the future of applications was going to look like — isolates running on a global network, in a way that was truly serverless. No regions, no concurrency management, no managing or scaling infrastructure. </p><p>The release of Workers was just the beginning, and we continued shipping primitives to extend what developers could build. Some more familiar, like a key-value store (<a href="https://developers.cloudflare.com/kv/"><u>Workers KV</u></a>), and some that we thought would play a role in enabling net new use cases like <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>. While we didn’t quite predict AI agents (though “Agents” was one of the proposed names for Durable Objects), we inadvertently created the perfect platform for building them. </p><p>What do we mean by that? </p>
    <div>
      <h3>A platform that only charges you for what you use (regardless of how long it takes)</h3>
      <a href="#a-platform-that-only-charges-you-for-what-you-use-regardless-of-how-long-it-takes">
        
      </a>
    </div>
    <p>To be able to run agents efficiently, you need a system that can seamlessly scale up and down to support the constant stop, go, wait patterns. Agents are basically long-running tasks, sometimes waiting on slow reasoning LLMs and external tools to execute. With Cloudflare, you don’t have to pay for long-running processes when your code is not executing. Cloudflare Workers is designed to scale down and <a href="https://blog.cloudflare.com/workers-pricing-scale-to-zero/"><u>only charge you for CPU time</u></a>, as opposed to wall-clock time. </p><p>In many cases, especially when calling LLMs, the difference can be in orders of magnitude — e.g. 2–3 milliseconds of CPU vs. 10 seconds of wall-clock time. When building on Workers, we pass that difference on to you as cost savings. </p>
    <div>
      <h3>Serverless AI Inference</h3>
      <a href="#serverless-ai-inference">
        
      </a>
    </div>
    <p>We took a similar serverless approach when it comes to inference itself. When you need to call an AI model, you need it to be instantaneously available. While the foundation model providers offer APIs that make it possible to just call the LLM, if you’re running open-source models, <a href="https://www.cloudflare.com/learning/ai/what-is-lora/"><u>LoRAs</u></a>, or self-trained models, most cloud providers today require you to pre-provision resources for what your peak traffic will look like. This means that the rest of the time, you’re still paying for GPUs to sit there idle. With Workers AI, you can pay only when you’re calling our inference APIs, as opposed to unused infrastructure. In fact, you don’t have to think about infrastructure at all, which is the principle at the core of everything we do. </p>
    <div>
      <h3>A platform designed for durable execution</h3>
      <a href="#a-platform-designed-for-durable-execution">
        
      </a>
    </div>
    <p><a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a> and <a href="https://developers.cloudflare.com/workflows"><u>Workflows</u></a> provide a robust programming model that ensures guaranteed execution for asynchronous tasks that require persistence and reliability. This makes them ideal for handling complex operations like long-running deep thinking LLM calls, human-in-the-loop approval processes, or interactions with unreliable third-party APIs. By maintaining state across requests and automatically handling retries, these tools create a resilient foundation for building sophisticated AI agents that can perform complex, multistep tasks without losing context or progress, even when operations take significant time to complete.</p>
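<p>To make the durable execution model concrete, here’s a small, self-contained sketch (illustrative only, not the actual Workflows API): each step’s result is persisted under a name, so resuming after a crash skips completed steps and retries only the step that failed.</p>

```typescript
// Illustrative sketch of durable execution, not the Workflows API: step
// results are persisted by name, so re-running after a failure resumes
// where the run left off instead of redoing completed work.

type StepFn<T> = () => Promise<T>;

export class DurableRun {
  // Stand-in for durable storage (a real engine persists this for you).
  private results = new Map<string, unknown>();

  async do<T>(name: string, fn: StepFn<T>, retries = 3): Promise<T> {
    // A step that already completed is never re-executed on resume.
    if (this.results.has(name)) return this.results.get(name) as T;
    let lastError: unknown;
    for (let attempt = 0; attempt < retries; attempt++) {
      try {
        const value = await fn();
        this.results.set(name, value); // persist before moving on
        return value;
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  }
}
```

<p>Workflows applies this same idea with real durable storage and automatic retries, which is what lets a multistep task survive restarts without losing context or progress.</p>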
    <div>
      <h2>Lastly, new and updated agents documentation</h2>
      <a href="#lastly-new-and-updated-agents-documentation">
        
      </a>
    </div>
    <p>Did you catch all of that?</p><p>No worries if not: we’ve updated our <a href="https://developers.cloudflare.com/agents"><u>agents documentation</u></a> to include everything we talked about above, from breaking down the basics of agents, to showing you how to tackle foundational examples of building with agents.</p><p>We’ve also updated our <a href="https://developers.cloudflare.com/workers/get-started/prompting/"><u>Workers prompt</u></a> with knowledge of the agents-sdk library, so you can use Cursor, Windsurf, Zed, ChatGPT or Claude to help you build AI Agents and deploy them to Cloudflare.</p>
    <div>
      <h2>Can’t wait to see what you build! </h2>
      <a href="#cant-wait-to-see-what-you-build">
        
      </a>
    </div>
    <p>We’re just getting started, and we’d love to see what you build. Please join our <a href="https://discord.com/invite/cloudflaredev"><u>Discord</u></a>, ask questions, and tell us what you’re building.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Durable Objects]]></category>
            <guid isPermaLink="false">1k3ytqqRxQ9SsiYLMSBDfO</guid>
            <dc:creator>Rita Kozlov</dc:creator>
            <dc:creator>Sunil Pai</dc:creator>
            <dc:creator>Matt Silverlock</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare incident on February 6, 2025]]></title>
            <link>https://blog.cloudflare.com/cloudflare-incident-on-february-6-2025/</link>
            <pubDate>Fri, 07 Feb 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[ On Thursday, February 6, 2025, we experienced an outage with our object storage service (R2) and products that rely on it. Here's what happened and what we're doing to fix this going forward. ]]></description>
            <content:encoded><![CDATA[ <p>Multiple Cloudflare services, including our <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2 object storage</u></a>, were unavailable for 59 minutes on Thursday, February 6, 2025. This caused all operations against R2 to fail for the duration of the incident, and caused a number of other Cloudflare services that depend on R2 — including <a href="https://www.cloudflare.com/developer-platform/products/cloudflare-stream/"><u>Stream</u></a>, <a href="https://www.cloudflare.com/developer-platform/products/cloudflare-images/"><u>Images</u></a>, <a href="https://www.cloudflare.com/developer-platform/products/cache-reserve/"><u>Cache Reserve</u></a>, <a href="https://www.cloudflare.com/developer-platform/products/vectorize/"><u>Vectorize</u></a> and <a href="https://developers.cloudflare.com/logs/edge-log-delivery/"><u>Log Delivery</u></a> — to suffer significant failures.</p><p>The incident occurred due to human error and insufficient validation safeguards during a routine abuse remediation for a report about a phishing site hosted on R2. The action taken on the complaint resulted in an advanced product disablement action on the site that led to disabling the production R2 Gateway service responsible for the R2 API.  </p><p>Critically, this incident did <b>not</b> result in the loss or corruption of any data stored on R2. </p><p>We’re deeply sorry for this incident: this was a failure of a number of controls, and we are prioritizing work to implement additional system-level controls, not only in our abuse processing systems, but also to reduce the blast radius of <i>any</i> system or human action that could result in disabling any production service at Cloudflare.</p>
    <div>
      <h2>What was impacted?</h2>
      <a href="#what-was-impacted">
        
      </a>
    </div>
    <p>All customers using Cloudflare R2 would have observed a 100% failure rate against their R2 buckets and objects during the primary incident window. Services that depend on R2 (detailed in the table below) observed heightened error rates and failure modes depending on their usage of R2.</p><p>The primary incident window occurred between 08:14 UTC to 09:13 UTC, when operations against R2 had a 100% error rate. Dependent services (detailed below) observed increased failure rates for operations that relied on R2.</p><p>From 09:13 UTC to 09:36 UTC, as R2 recovered and clients reconnected, the backlog and resulting spike in client operations caused load issues with R2's metadata layer (built on Durable Objects). This impact was significantly more isolated: we observed a 0.09% increase in error rates in calls to Durable Objects running in North America during this window. </p><p>The following table details the impacted services, including the user-facing impact, operation failures, and increases in error rates observed:</p><table><tr><td><p><b>Product/Service</b></p></td><td><p><b>Impact</b></p></td></tr><tr><td><p><b>R2</b></p></td><td><p>100% of operations against R2 buckets and objects, including uploads, downloads, and associated metadata operations were impacted during the primary incident window. During the secondary incident window, we observed a &lt;1% increase in errors as clients reconnected and increased pressure on R2's metadata layer.</p><p>There was no data loss within the R2 storage subsystem: this incident impacted the HTTP frontend of R2. 
Separation of concerns and blast radius management meant that the underlying R2 infrastructure was unaffected by this.</p></td></tr><tr><td><p><b>Stream</b></p></td><td><p>100% of operations (upload &amp; streaming delivery) against assets managed by Stream were impacted during the primary incident window.</p></td></tr><tr><td><p><b>Images</b></p></td><td><p>100% of operations (uploads &amp; downloads) against assets managed by Images were impacted during the primary incident window.</p><p>Impact to Image Delivery was minor: success rate dropped to 97% as these assets are fetched from existing customer backends and do not rely on intermediate storage.</p></td></tr><tr><td><p><b>Cache Reserve</b></p></td><td><p>Cache Reserve customers observed an increase in requests to their origin during the incident window as 100% of operations failed. This resulted in an increase in requests to origins to fetch assets unavailable in Cache Reserve during this period. This impacted less than 0.049% of all cacheable requests served during the incident window.</p><p>User-facing requests for assets to sites with Cache Reserve did not observe failures as cache misses failed over to the origin.</p></td></tr><tr><td><p><b>Log Delivery</b></p></td><td><p>Log delivery was delayed during the primary incident window, resulting in significant delays (up to an hour) in log processing, as well as some dropped logs. </p><p>Specifically:</p><p>Non-R2 delivery jobs would have experienced up to 4.5% data loss during the incident. This level of data loss could have been different between jobs depending on log volume and buffer capacity in a given location.</p><p>R2 delivery jobs would have experienced up to 13.6% data loss during the incident. </p><p>R2 is a major destination for Cloudflare Logs. During the primary incident window, all available resources became saturated attempting to buffer and deliver data to R2. This prevented other jobs from acquiring resources to process their queues. 
Data loss (dropped logs) occurred when the job queues expired their data (to allow for new, incoming data). The system recovered when we enabled a kill switch to stop processing jobs sending data to R2.</p></td></tr><tr><td><p><b>Durable Objects</b></p></td><td><p>Durable Objects, and services that rely on it for coordination &amp; storage, were impacted as the stampeding horde of clients re-connecting to R2 drove an increase in load.</p><p>We observed a 0.09% actual increase in error rates in calls to Durable Objects running in North America, starting at 09:13 UTC and recovering by 09:36 UTC.</p></td></tr><tr><td><p><b>Cache Purge</b></p></td><td><p>Requests to the Cache Purge API saw a 1.8% error rate (HTTP 5xx) increase and a 10x increase in p90 latency for purge operations during the primary incident window. Error rates returned to normal immediately after this.</p></td></tr><tr><td><p><b>Vectorize</b></p></td><td><p>Queries and operations against Vectorize indexes were impacted during the primary incident window. 75% of queries to indexes failed (the remainder were served out of cache) and 100% of insert, upsert, and delete operations failed during the incident window as Vectorize depends on R2 for persistent storage. Once R2 recovered, Vectorize systems recovered in full.</p><p>We observed no continued impact during the secondary incident window, and we have not observed any index corruption as the Vectorize system has protections in place for this.</p></td></tr><tr><td><p><b>Key Transparency Auditor</b></p></td><td><p>100% of signature publish &amp; read operations to the KT auditor service failed during the primary incident window. No third party reads occurred during this window and thus were not impacted by the incident.</p></td></tr><tr><td><p><b>Workers &amp; Pages</b></p></td><td><p>A small volume (0.002%) of deployments to Workers and Pages projects failed during the primary incident window. 
These failures were limited to services with bindings to R2, as our control plane was unable to communicate with the R2 service during this period.</p></td></tr></table>
    <div>
      <h2>Incident timeline and impact</h2>
      <a href="#incident-timeline-and-impact">
        
      </a>
    </div>
    <p>The incident timeline, including the initial impact, investigation, root cause, and remediation, are detailed below.</p><p><b>All timestamps referenced are in Coordinated Universal Time (UTC).</b></p>
<div><table><colgroup>
<col></col>
<col></col>
</colgroup>
<thead>
  <tr>
    <th><span>Time</span></th>
    <th><span>Event</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>2025-02-06 08:12</span></td>
    <td><span>The R2 Gateway service is inadvertently disabled while responding to an abuse report.</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 08:14</span></td>
    <td><span>-- IMPACT BEGINS --</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 08:15</span></td>
    <td><span>R2 service metrics begin to show signs of service degradation.</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 08:17</span></td>
    <td><span>Critical R2 alerts begin to fire due to our service no longer responding to our health checks.</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 08:18</span></td>
    <td><span>R2 on-call engaged and began looking at our operational dashboards and service logs to understand impact to availability.</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 08:23</span></td>
    <td><span>Sales engineering escalated to the R2 engineering team that customers were experiencing a rapid increase in HTTP 500s from all R2 APIs.</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 08:25 </span></td>
    <td><span>Internal incident declared.</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 08:33</span></td>
    <td><span>R2 on-call was unable to identify the root cause and escalated to the lead on-call for assistance.</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 08:42</span></td>
    <td><span>Root cause identified as R2 team reviews service deployment history and configuration, which surfaces the action and the validation gap that allowed this to impact a production service.</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 08:46</span></td>
    <td><span>On-call attempts to re-enable the R2 Gateway service using our internal admin tooling, however this tooling was unavailable because it relies on R2.</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 08:49</span></td>
    <td><span>On-call escalates to an operations team who has lower level system access and can re-enable the R2 Gateway service. </span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 08:57</span></td>
    <td><span>The operations team engaged and began to re-enable the R2 Gateway service.</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 09:09</span></td>
    <td><span>R2 team triggers a redeployment of the R2 Gateway service.</span></td>
  </tr>
  <tr>
    <td><span> 2025-02-06 09:10</span></td>
    <td><span>R2 began to recover as the forced re-deployment rolled out and clients were able to reconnect.</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 09:13</span></td>
    <td><span>-- IMPACT ENDS --</span><br /><span>R2 availability recovers to within its service-level objective (SLO). Durable Objects begins to observe a slight increase in error rate (0.09%) for Durable Objects running in North America due to the spike in R2 clients reconnecting.</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 09:36</span></td>
    <td><span>The Durable Objects error rate recovers.</span></td>
  </tr>
  <tr>
    <td><span>2025-02-06 10:29</span></td>
    <td><span>The incident is closed after monitoring error rates.</span></td>
  </tr>
</tbody></table></div><p>At the R2 service level, our internal Prometheus metrics showed R2’s SLO drop almost immediately to 0% as R2’s Gateway service stopped serving all requests and terminated in-flight requests.</p><p>The slight delay before failures became visible was due to the product disablement action taking 1–2 minutes to take effect, as well as our configured metrics aggregation intervals:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4pbONRcG99RWttIUyGqnI6/bad397f73762a706285ea143ed2418b3/BLOG-2685_2.png" />
          </figure><p>For context, R2’s architecture separates the Gateway service (responsible for authenticating and serving requests to R2’s S3 &amp; REST APIs, and the “front door” for R2) from its metadata store (built on Durable Objects), our intermediate caches, and the underlying, distributed storage subsystem responsible for durably storing objects. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/E2cgDKA2zGwaQDBs31tPk/4272c94625fd788148d16a90cc7cceaa/Image_20250206_172217_707.png" />
          </figure><p>During the incident, all other components of R2 remained up: this is what allowed the service to recover so quickly once the R2 Gateway service was restored and re-deployed. The R2 Gateway acts as the coordinator for all work when operations are made against R2. During the request lifecycle, we validate authentication and authorization, write any new data to a new immutable key in our object store, then update our metadata layer to point to the new object. When the service was disabled, all running processes stopped.</p><p>While this means that all in-flight and subsequent requests failed, anything that had received an HTTP 200 response had already succeeded, with no risk of reverting to a prior version when the service recovered. This is critical to R2’s consistency guarantees and mitigates the chance of a client receiving a successful API response without the underlying metadata <i>and </i>storage infrastructure having persisted the change.  </p>
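<p>That write ordering is what makes a successful response trustworthy. Here’s a small, hypothetical sketch of the idea (not R2’s actual implementation): data lands under a fresh immutable key first, and only then does the metadata pointer move, so readers see either the old complete object or the new complete object, never a partial write.</p>

```typescript
// Hypothetical sketch of the write ordering described above; not R2's
// actual implementation. The blob is written under a new immutable key
// before the metadata pointer is updated, so a crash between the two
// steps leaves the previous version fully intact and readable.

export class TinyObjectStore {
  private blobs = new Map<string, Uint8Array>(); // immutable keys -> data
  private metadata = new Map<string, string>();  // object name -> current key
  private nextKey = 0;

  put(name: string, data: Uint8Array): void {
    // 1. Write the data under a fresh, never-reused key.
    const key = `blob-${this.nextKey++}`;
    this.blobs.set(key, data);
    // 2. Only once the data is stored, point the name at the new key.
    this.metadata.set(name, key);
  }

  get(name: string): Uint8Array | undefined {
    const key = this.metadata.get(name);
    return key === undefined ? undefined : this.blobs.get(key);
  }
}
```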
    <div>
      <h2>Deep dive </h2>
      <a href="#deep-dive">
        
      </a>
    </div>
    <p><b>Due to human error and insufficient validation safeguards in our admin tooling, the R2 Gateway service was taken down as part of a routine remediation for a phishing URL.</b></p><p>During a routine abuse remediation, action was taken on a complaint that inadvertently disabled the R2 Gateway service instead of the specific endpoint/bucket associated with the report. This was a failure of multiple system-level controls (first and foremost) and operator training. </p><p>A key system-level gap that led to this incident was in how we identify (or "tag") internal accounts used by our teams. Teams typically have multiple accounts (dev, staging, prod) to reduce the blast radius of any configuration changes or deployments, but our abuse processing systems were not explicitly configured to identify these accounts and block disablement actions against them. Instead of disabling the specific endpoint associated with the abuse report, the system allowed the operator to (incorrectly) disable the R2 Gateway service. </p><p>Once we identified this as the cause of the outage, remediation and recovery were inhibited by the lack of direct controls to revert the product disablement action and the need to engage an operations team with lower-level access than is routine. The R2 Gateway service then required a re-deployment in order to rebuild its routing pipeline across our edge network.</p><p>Once re-deployed, clients were able to re-connect to R2, and error rates for dependent services (including Stream, Images, Cache Reserve and Vectorize) returned to normal levels.</p>
    <div>
      <h2>Remediation and follow-up steps</h2>
      <a href="#remediation-and-follow-up-steps">
        
      </a>
    </div>
    <p>We have taken immediate steps to resolve the validation gaps in our tooling to prevent this specific failure from occurring in the future.</p><p>We are prioritizing several work-streams to implement stronger, system-wide controls (defense-in-depth) to prevent this, including how we provision internal accounts so that we are not relying on our teams to correctly and reliably tag accounts. A key theme to our remediation efforts here is around removing the need to rely on training or process, and instead ensuring that our systems have the right guardrails and controls built-in to prevent operator errors.</p><p>These work-streams include (but are not limited to) the following:</p><ul><li><p><b>Actioned: </b>deployed additional guardrails implemented in the Admin API to prevent product disablement of services running in internal accounts.</p></li><li><p><b>Actioned</b>: Product disablement actions in the abuse review UI have been disabled while we add more robust safeguards. This will prevent us from inadvertently repeating similar high-risk manual actions.</p></li><li><p><b>In-flight</b>: Changing how we create all internal accounts (staging, dev, production) to ensure that all accounts are correctly provisioned into the correct organization. This must include protections against creating standalone accounts to avoid re-occurrence of this incident (or similar) in the future.</p></li><li><p><b>In-flight: </b>Further restricting access to product disablement actions beyond the remediations recommended by the system to a smaller group of senior operators.</p></li><li><p><b>In-flight</b>: Two-party approval required for ad-hoc product disablement actions. Going forward, if an investigator requires additional remediations, they must be submitted to a manager or a person on our approved remediation acceptance list to approve their additional actions on an abuse report. 
</p></li><li><p><b>In-flight</b>: Expand existing abuse checks that prevent accidental blocking of internal hostnames to also prevent any product disablement action against products associated with an internal Cloudflare account.  </p></li><li><p><b>In-flight</b>: Internal accounts are being moved to our new Organizations model ahead of the public release of this feature. The R2 production account was a member of this organization, but our abuse remediation engine did not have the necessary protections to prevent acting against accounts within this organization.</p></li></ul><p>We’re continuing to discuss &amp; review additional steps that can further reduce the blast radius of any system or human action that could result in disabling any production service at Cloudflare.</p>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>We understand this was a serious incident, and we are painfully aware of — and extremely sorry for — the impact it caused to customers and teams building and running their businesses on Cloudflare.</p><p>This is the first (and ideally, the last) incident of this kind and duration for R2, and we’re committed to improving controls across our systems and workflows to prevent this in the future.</p> ]]></content:encoded>
            <category><![CDATA[Post Mortem]]></category>
            <category><![CDATA[Outage]]></category>
            <guid isPermaLink="false">mDiwAePfMfpVHMlYrfrFu</guid>
            <dc:creator>Matt Silverlock</dc:creator>
            <dc:creator>Javier Castro</dc:creator>
        </item>
        <item>
            <title><![CDATA[Build durable applications on Cloudflare Workers: you write the Workflows, we take care of the rest]]></title>
            <link>https://blog.cloudflare.com/building-workflows-durable-execution-on-workers/</link>
            <pubDate>Thu, 24 Oct 2024 13:05:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare Workflows is now in open beta! Workflows allows you to build reliable, repeatable, long-lived multi-step applications that can automatically retry, persist state, and scale out. Read on to learn how Workflows works, how we built it on top of Durable Objects, and how you can deploy your first Workflows application. ]]></description>
            <content:encoded><![CDATA[ <p>Workflows, Cloudflare’s durable execution engine that allows you to build reliable, repeatable multi-step applications that scale for you, is now in open beta. Any developer with a free or paid <a href="https://workers.cloudflare.com/"><u>Workers</u></a> plan can build and deploy a Workflow right now: no waitlist, no sign-up form, no fake line around the block.</p><p>If you learn by doing, you can create your first Workflow via a single command (or <a href="https://developers.cloudflare.com/workflows/get-started/guide/"><u>visit the docs for the full guide</u></a>):</p>
            <pre><code>npm create cloudflare@latest workflows-starter -- \
  --template "cloudflare/workflows-starter"</code></pre>
            <p>Open the <code>src/index.ts</code> file, poke around, start extending it, and deploy it with a quick <code>wrangler deploy</code>.</p><p>If you want to learn more about how Workflows works, how you can use it to build applications, and how we built it, read on.</p>
    <div>
      <h2>Workflows? Durable Execution?</h2>
      <a href="#workflows-durable-execution">
        
      </a>
    </div>
    <p>Workflows—which we <a href="https://blog.cloudflare.com/data-anywhere-events-pipelines-durable-execution-workflows/#durable-execution"><u>announced back during Developer Week</u></a> earlier this year—is our take on the concept of “Durable Execution”: the ability to build and execute applications that are <i>durable</i> in the face of errors, network issues, upstream API outages, rate limits, and (most importantly) infrastructure failure.</p><p>As <a href="https://cloudflare.tv/event/xvm4qdgm?startTime=8m5s"><u>over 2.4 million developers</u></a> continue to build applications on top of Cloudflare Workers, R2, and Workers AI, we’ve noticed more developers building multi-step applications and workflows that process user data, transform unstructured data into structured data, export metrics, persist state as they progress, and automatically retry &amp; restart. But writing any non-trivial application and making it <i>durable</i> in the face of failure is hard: this is where Workflows comes in. Workflows manages the retries, emits the metrics, and durably stores the state (without you having to stand up your own database) as the Workflow progresses.</p><p>What makes Workflows different from other takes on “Durable Execution” is that we manage the underlying compute and storage infrastructure for you. You’re not left managing a compute cluster and hoping it scales both up (on a Monday morning) and down (during quieter periods) to manage costs, or ensuring that you have compute running in the right locations. Workflows is built on Cloudflare Workers — our job is to run your code and operate the infrastructure for you.</p><p>As an example of how Workflows can help you build durable applications, assume you want to post-process file uploads from your users that were uploaded to an R2 bucket directly via <a href="https://developers.cloudflare.com/r2/api/s3/presigned-urls/"><u>a pre-signed URL</u></a>. 
That post-processing could involve multiple actions: text extraction via a <a href="https://developers.cloudflare.com/workers-ai/models/"><u>Workers AI model</u></a>, calls to a third-party API to validate data, updating or querying rows in a database once the file has been processed… the list goes on.</p><p>But what each of these actions has in common is that it could <i>fail</i>. Maybe that upstream API is unavailable, maybe you get rate-limited, maybe your database is down. Having to write extensive retry logic around each action, manage backoffs, and (importantly) ensure your application doesn’t have to start from scratch when a later <i>step</i> fails is more boilerplate to write and more code to test and debug.</p><p>What’s a <i>step</i>, you ask? The core building block of every Workflow is the step: an individually retriable component of your application that can optionally emit state. That state is then persisted, even if subsequent steps were to fail. This means that your application doesn’t have to restart, allowing it not only to recover more quickly from failure scenarios but also to avoid redundant work. You don’t want your application hammering an expensive third-party API (or getting you rate limited) because it’s naively retrying an API call that it doesn’t need to make.</p>
            <pre><code>export class MyWorkflow extends WorkflowEntrypoint&lt;Env, Params&gt; {
	async run(event: WorkflowEvent&lt;Params&gt;, step: WorkflowStep) {
		const files = await step.do('my first step', async () =&gt; {
			return {
				inputParams: event,
				files: [
					'doc_7392_rev3.pdf',
					'report_x29_final.pdf',
					'memo_2024_05_12.pdf',
					'file_089_update.pdf',
					'proj_alpha_v2.pdf',
					'data_analysis_q2.pdf',
					'notes_meeting_52.pdf',
					'summary_fy24_draft.pdf',
				],
			};
		});

		// Other steps...
	}
}
</code></pre>
            <p>Notably, a Workflow can have hundreds of steps: one of the <a href="https://developers.cloudflare.com/workflows/build/rules-of-workflows/"><u>Rules of Workflows</u></a> is to encapsulate every API call or stateful action within your application into its own step. Each step can also define its own retry strategy, automatically backing off, adding a delay and/or (eventually) giving up after a set number of attempts.</p>
            <pre><code>await step.do(
	'make a call to write that could maybe, just might, fail',
	// Define a retry strategy
	{
		retries: {
			limit: 5,
			delay: '5 seconds',
			backoff: 'exponential',
		},
		timeout: '15 minutes',
	},
	async () =&gt; {
		// Do stuff here, with access to the state from our previous steps
		if (Math.random() &gt; 0.5) {
			throw new Error('API call to $STORAGE_SYSTEM failed');
		}
	},
);
</code></pre>
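<p>With the configuration above (and assuming each retry doubles the previous delay, which is what an exponential backoff implies), the waits between attempts would grow as 5, 10, 20, 40, and 80 seconds. A small illustrative sketch of that arithmetic, not Workflows’ internal scheduler:</p>

```typescript
// Illustrative only: the delay sequence implied by an exponential
// backoff with a fixed base delay. Not Workflows' internal scheduler.
function backoffDelays(baseSeconds: number, limit: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < limit; attempt++) {
    // attempt 0 waits the base delay; each subsequent retry doubles it
    delays.push(baseSeconds * 2 ** attempt);
  }
  return delays;
}

// For retries { limit: 5, delay: '5 seconds', backoff: 'exponential' }:
// backoffDelays(5, 5) returns [5, 10, 20, 40, 80]
```

<p>The <code>timeout</code> in the snippet above caps how long a single attempt may run, independently of how many retries remain.</p>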
            <p>To illustrate this further, imagine you have an application that reads text files from an R2 storage bucket, pre-processes the text into chunks, generates text embeddings <a href="https://developers.cloudflare.com/workers-ai/models/bge-large-en-v1.5/"><u>using Workers AI</u></a>, and then inserts those into a vector database (like <a href="https://developers.cloudflare.com/vectorize/"><u>Vectorize</u></a>) for semantic search.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7b9m0rPDlGvIiTnhguyvzI/3f27678b141ce600f1f54eb999e9d671/WORKFLOWS.png" />
          </figure><p>In the Workflows programming model, each of those is a discrete step, and each can emit state. For example, each of the four actions below can be a discrete <code>step.do</code> call in a Workflow:</p><ol><li><p>Reading the files from storage and emitting the list of filenames</p></li><li><p>Chunking the text and emitting the results</p></li><li><p>Generating text embeddings</p></li><li><p>Upserting them into Vectorize and capturing the result of a test query</p></li></ol><p>You can also start to imagine that some steps, such as chunking text or generating text embeddings, can be broken down into even more steps — a step per file that we chunk, or a step per API call to our text embedding model, so that our application is even more resilient to failure.</p><p>Steps can be created programmatically or conditionally based on input, allowing you to dynamically create steps based on the number of inputs your application needs to process. You do not need to define all steps ahead of time, and each instance of a Workflow may choose to conditionally create steps on the fly.</p>
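<p>To sketch what dynamically created steps could look like, here is a self-contained example. <code>MiniStep</code> is a hypothetical stand-in for the real <code>WorkflowStep</code> (just enough of <code>step.do</code> to run standalone); in an actual Workflow you would use the <code>step</code> object passed to <code>run()</code>:</p>

```typescript
// MiniStep is a hypothetical stand-in for WorkflowStep: it memoizes each
// named step's result, mimicking how completed steps are not re-executed.
class MiniStep {
  private cache = new Map<string, unknown>();

  async do<T>(name: string, callback: () => Promise<T>): Promise<T> {
    const cached = this.cache.get(name);
    if (cached !== undefined) return cached as T;
    const result = await callback();
    this.cache.set(name, result);
    return result;
  }
}

// One dynamically created step per input file: the number of steps
// depends on the input, and each file's chunking retries in isolation.
async function chunkAll(step: MiniStep, files: string[]): Promise<string[][]> {
  const allChunks: string[][] = [];
  for (const file of files) {
    allChunks.push(
      await step.do(`chunk: ${file}`, async () => [`${file}#part0`, `${file}#part1`])
    );
  }
  return allChunks;
}
```

<p>Because each file is its own step, a failure while chunking the fifth file would retry only that step; the first four results would be replayed from the memoized cache.</p>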
    <div>
      <h2>Building Cloudflare on Cloudflare</h2>
      <a href="#building-cloudflare-on-cloudflare">
        
      </a>
    </div>
    <p>As the Cloudflare Developer platform <a href="https://www.cloudflare.com/birthday-week/"><u>continues to grow</u></a>, almost all of our own products are built on top of it. Workflows is yet another example of how we built a new product from scratch using nothing but Workers and its vast catalog of features and APIs. This section of the blog has two goals: to explain how we built it, and to demonstrate that anyone can create a complex application or platform with demanding requirements and multiple architectural layers on our stack, too.</p><p>If you’re wondering how Workflows manages to make durable execution easy, how it persists state, and how it automatically scales: it’s because we built it on Cloudflare Workers, including the brand-new <a href="https://blog.cloudflare.com/sqlite-in-durable-objects/"><u>zero-latency SQLite storage we recently introduced to Durable Objects</u></a>.
</p><p>To understand how Workflows uses Workers &amp; Durable Objects, here’s the high-level overview of our architecture:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7pknYk0Sshxka3iPbxBCRj/bb8b75986601e38b6b69fe8d849c0cbe/image9.png" />
          </figure><p>There are three main blocks in this diagram:</p><p>The user-facing APIs are where the user interacts with the platform, creating and deploying new workflows or instances, controlling them, and accessing their state and activity logs. These operations can be executed through our public <a href="https://developers.cloudflare.com/api/"><u>API gateway</u></a> using REST calls, a Worker script using bindings, <a href="https://blog.cloudflare.com/wrangler3"><u>Wrangler</u></a> (Cloudflare's developer platform command line tool), or via the <a href="https://dash.cloudflare.com/"><u>Dashboard</u></a> user interface.</p><p>The managed platform holds the internal configuration APIs (a Worker implementing a catalog of REST endpoints), the binding shim (another dedicated Worker), and the account controllers with their corresponding workflow engines, all powered by SQLite-backed Durable Objects. This is where all the magic happens, and it is what we share more details about in this technical blog.</p><p>Finally, there are the workflow instances, essentially independent clones of the workflow application. Instances are owned by user accounts and have a one-to-one relationship with the managed engines that power them. You can run as many instances and engines as you want concurrently.</p><p>Let's get into more detail…</p>
    <div>
      <h3>Configuration API and Binding Shim</h3>
      <a href="#configuration-api-and-binding-shim">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2qEGr9M8KwgPS66Ju8mELL/189db9764392c00ae34dd3a44eeb1ed7/image6.png" />
          </figure><p>The Configuration API and the Binding Shim are two stateless Workers; one receives REST API calls from clients calling our <a href="https://developers.cloudflare.com/api/"><u>API Gateway</u></a> directly, using <a href="https://developers.cloudflare.com/workers/wrangler/"><u>Wrangler</u></a>, or navigating the <a href="https://dash.cloudflare.com/"><u>Dashboard</u></a> UI, and the other is the endpoint for the Workflows <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/"><u>binding</u></a>, an efficient and authenticated interface to interact with the Cloudflare Developer Platform resources from a Workers script.</p><p>The configuration API worker uses <a href="https://hono.dev/docs/getting-started/cloudflare-workers"><u>HonoJS</u></a> and <a href="https://hono.dev/examples/zod-openapi"><u>Zod</u></a> to implement the REST endpoints, which are declared in an <a href="https://swagger.io/specification/"><u>OpenAPI</u></a> schema and exported to our API Gateway, thus adding our methods to the Cloudflare API <a href="https://developers.cloudflare.com/api/"><u>catalog</u></a>.</p>
            <pre><code>import { swaggerUI } from '@hono/swagger-ui';
import { createRoute, OpenAPIHono, z } from '@hono/zod-openapi';
import { Hono } from 'hono';

...

api.openapi(
  createRoute({
    method: 'get',
    path: '/',
    request: {
      query: PaginationParams,
    },
    responses: {
      200: {
        content: {
          'application/json': {
             schema: APISchemaSuccess(z.array(WorkflowWithInstancesCountSchema)),
          },
        },
        description: 'List of all Workflows belonging to an account.',
      },
    },
  }),
  async (ctx) =&gt; {
    ...
  },
);

...

api.route('/:workflow_name', routes.workflows);
api.route('/:workflow_name/instances', routes.instances);
api.route('/:workflow_name/versions', routes.versions);</code></pre>
            <p>These Workers perform two different functions, but they share a large portion of their code and implement similar logic; once the request is authenticated and ready to travel to the next stage, they use the account ID to delegate the operation to a Durable Object called Account Controller.</p>
            <pre><code>// env.ACCOUNTS is the Account Controllers Durable Objects namespace
const accountStubId = c.env.ACCOUNTS.idFromName(accountId.toString());
const accountStub = c.env.ACCOUNTS.get(accountStubId);</code></pre>
            <p>As you can see, every account has its own Account Controller Durable Object.</p>
    <div>
      <h3>Account Controllers</h3>
      <a href="#account-controllers">
        
      </a>
    </div>
    <p>The Account Controller is a dedicated persisted database that stores the list of all the account’s workflows, versions, and instances. We scale to millions of account controllers, one per every Cloudflare account using Workflows, by leveraging the power of <a href="https://developers.cloudflare.com/durable-objects/best-practices/access-durable-objects-storage/#sqlite-storage-backend"><u>Durable Objects with SQLite backend</u></a>.</p><p><a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a> (DOs) are single-threaded singletons that run in our data centers and are bound to a stateful storage API, in this case, SQLite. They are also Workers, just a special kind, and have access to all of our other APIs. This makes it easy to build consistent, highly available distributed applications with them.</p><p>Here’s what we get for free by using one Durable Object per Workflows account:</p><ul><li><p>Sharding based on account boundaries aligns perfectly with the way we manage resources at Cloudflare internally. 
Also, due to the nature of DOs, this model gets us other things for free: not that we expect them, but any bugs or state inconsistencies during the beta are confined to the affected account and don’t impact everyone else.</p></li><li><p>DO instances run close to the end user; Alice is in London and will call the config API through our <a href="https://www.cloudflare.com/en-gb/network/"><u>LHR data center</u></a>, while Bob is in Lisbon and will connect to LIS.</p></li><li><p>Because every Account Controller is a Worker, we can gradually upgrade them to new versions, starting with internal accounts, thus derisking changes before they reach real customers.</p></li></ul><p>Before SQLite, our only option was to use the Durable Object's <a href="https://developers.cloudflare.com/durable-objects/api/storage-api/#get"><u>key-value</u></a> storage API, but having a relational database at our fingertips and being able to create tables and run complex queries is a significant enabler. For example, take a look at how we implement the internal method <code>getWorkflow()</code>:</p>
            <pre><code>async function getWorkflow(accountId: number, workflowName: string) {
  const begin = Date.now(); // start timestamp for the latency analytics below

  try {
    const res = this.ctx.storage.transactionSync(() =&gt; {
      const cursor = Array.from(
        this.ctx.storage.sql.exec(
          `SELECT *,
                  (SELECT class_name
                   FROM   versions
                   WHERE  workflow_id = w.id
                   ORDER  BY created_on DESC
                   LIMIT  1) AS class_name
           FROM   workflows w
           WHERE  w.name = ?`,
          workflowName
        )
      )[0] as Workflow;

      return cursor;
    });

    this.sendAnalytics(accountId, begin, "getWorkflow");
    return res as Workflow | undefined;
  } catch (err) {
    this.sendErrorAnalytics(accountId, begin, "getWorkflow");
    throw err;
  }
}
</code></pre>
            <p>The other thing we take advantage of in Workflows is the recently <a href="https://blog.cloudflare.com/javascript-native-rpc/"><u>announced</u></a> JavaScript-native RPC feature for communicating between components.</p><p>Before <a href="https://developers.cloudflare.com/workers/runtime-apis/rpc/"><u>RPC</u></a>, we had to <code>fetch()</code> between components, make HTTP requests, and serialize and deserialize the parameters and the payload. Now, we can asynchronously call the remote object's method as if it were local. Not only does this feel more natural and simplify our logic, but it's also more efficient, and we can take advantage of TypeScript type-checking when writing code.</p><p>This is how the Configuration API would call the Account Controller’s <code>countWorkflows()</code> method before:</p>
            <pre><code>const resp = await accountStub.fetch(
      "https://controller/count-workflows",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/json; charset=utf-8",
        },
        body: JSON.stringify({ accountId }),
      },
    );

if (!resp.ok) {
  return new Response("Internal Server Error", { status: 500 });
}

const result = await resp.json();
const total_count = result.total_count;</code></pre>
            <p>This is how we do it using RPC:</p>
            <pre><code>const total_count = await accountStub.countWorkflows(accountId);</code></pre>
            <p>The other powerful feature of our RPC system is that it supports passing not only <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Structured_clone_algorithm#supported_types"><u>Structured Cloneable</u></a> objects back and forth but also entire classes. More on this later.</p><p>Let’s move on to Engine.</p>
    <div>
      <h3>Engine and instance</h3>
      <a href="#engine-and-instance">
        
      </a>
    </div>
    <p>Every instance of a workflow runs alongside an Engine instance. The Engine is responsible for starting up the user’s workflow entry point, executing the steps on behalf of the user, handling their results, and tracking the workflow state until completion.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6yrKsuF501oRCDujckr3yM/bde40097ec5bedda07793375e53e99b9/image1.png" />
          </figure><p>When we started thinking about the Engine, we thought about modeling it after a <a href="https://en.wikipedia.org/wiki/Finite-state_machine"><u>state machine</u></a>, and that was what our initial prototypes looked like. However, state machines require an ahead-of-time understanding of the userland code, which implies having a build step before running them. This is costly at scale and introduces additional complexity.</p><p>A few iterations later, we had another idea. What if we could model the engine as a game loop?</p><p>Unlike other computer programs, games operate regardless of a user's input. The game loop is essentially a sequence of tasks that implement the game's logic and update the display, typically one loop per video frame. Here’s an example of a game loop in pseudo-code:</p>
            <pre><code>while (game is running)
    check for user input
    move graphics
    play sounds
end while</code></pre>
            <p>Well, an oversimplified version of our Workflow engine would look like this:</p>
            <pre><code>while (last step not completed)
    iterate every step
       use memoized cache as response if the step has run already
       continue running step or timer if it hasn't finished yet
end while</code></pre>
            <p>A workflow is indeed a loop that keeps on going, performing the same sequence of logical tasks until the last step completes.</p><p>The Engine and the instance run hand-in-hand in a one-to-one relationship. The first is managed and part of the platform: it uses SQLite and other platform APIs internally, and we can constantly add new features, fix bugs, and deploy new versions, all transparently to the end user. The second is the actual account-owned Worker script that declares the Workflow steps.</p><p>For example, when someone passes a callback into <code>step.do()</code>:</p>
            <pre><code>export class MyWorkflow extends WorkflowEntrypoint&lt;Env, Params&gt; {
  async run(event: WorkflowEvent&lt;Params&gt;, step: WorkflowStep) {
    step.do('step1', () =&gt; { ... });
  }
}</code></pre>
            <p>We switch execution over to the Engine. Again, this is possible because of the power of JS RPC. Besides passing Structured Cloneable objects back and forth, JS RPC allows us to <a href="https://developers.cloudflare.com/workers/runtime-apis/rpc/#send-functions-as-parameters-of-rpc-methods"><u>create and pass entire application-defined classes</u></a> that extend the built-in RpcTarget. So this is what happens behind the scenes when your Instance calls <code>step.do()</code> (simplified):</p>
            <pre><code>export class Context extends RpcTarget {

  async do&lt;T&gt;(name: string, callback: () =&gt; Promise&lt;T&gt;): Promise&lt;T&gt; {

    // First we check whether we already have a cached result for this step
    const maybeResult = await this.#state.storage.get(name);

    // We return the cached result if it exists
    if (maybeResult !== undefined) { return maybeResult as T; }

    // Else we run the user callback (doWrapper handles retries and timeouts)
    return doWrapper(callback);
  }

}
</code></pre>
            <p>Here’s a more complete diagram of the Engine’s <code>step.do()</code> lifecycle:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4MymVGS7BxwityCRlWcBOX/136d4dcf0affce04164f87b6bbe8b12a/image5.png" />
          </figure><p>Again, this diagram only partially represents everything we do in the Engine; things like logging for observability or handling exceptions are missing, and we don't get into the details of how queuing is implemented. However, it gives you a good idea of how the Engine abstracts and handles all the complexities of completing a step under the hood, allowing us to expose a simple-to-use API to end users.</p><p>Also, it's worth reiterating that every workflow instance is an Engine behind the scenes, and every Engine is an SQLite-backed Durable Object. This ensures that every instance's runtime and state are isolated and independent of each other and that we can effortlessly scale to run billions of workflow instances, a solved problem for Durable Objects.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4uEoEAtsjNquPCD3F50S9d/006556baf2a0478d1de10e4514843baa/image3.png" />
          </figure>
    <div>
      <h3>Durability</h3>
      <a href="#durability">
        
      </a>
    </div>
    <p>Durable Execution is all the rage now when we talk about workflow engines, and ours is no exception. Workflows are typically long-lived processes that run multiple functions in sequence where anything can happen. Those functions can time out or fail because of a remote server error or a network issue and need to be retried. A workflow engine ensures that your application runs smoothly and completes regardless of the problems it encounters.</p><p>Durability means that if and when a workflow fails, the Engine can re-run it, resume from the last recorded step, and deterministically re-calculate the state from all the successful steps' cached responses. This is possible because steps are stateful and idempotent; they produce the same result no matter how many times we run them, thus not causing unintended duplicate effects like sending the same invoice to a customer multiple times.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1R5UfQfNMKI7hB6QXJfCUr/242e85f2b5287394871e916844359bd4/image7.png" />
          </figure><p>We ensure durability and handle failures and retries using the same technique we use for a <code>step.sleep()</code> that requires sleeping for days or months: a combination of <code>scheduler.wait()</code>, a method of the <a href="https://github.com/WICG/scheduling-apis"><u>upcoming WICG Scheduling API</u></a> that we already <a href="https://developers.cloudflare.com/workers/platform/changelog/historical-changelog/#2021-12-10"><u>support</u></a>, and <a href="https://developers.cloudflare.com/durable-objects/api/alarms/"><u>Durable Objects alarms</u></a>, which allow you to schedule the Durable Object to be woken up at a time in the future.</p><p>These two APIs allow us to overcome the lack of guarantees that a Durable Object runs forever, giving us complete control of its lifecycle. Since every state transition through userland code persists in the Engine’s strongly consistent SQLite, we track timestamps when a step begins execution, its attempts (if it needs retries), and its completion.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6FSCXRt9fO4EaaBP7hLV8x/a59de27dfbe18f39addd4eb8240b9df9/image10.png" />
          </figure><p>This means that steps still pending when a Durable Object is <a href="https://developers.cloudflare.com/durable-objects/reference/in-memory-state/"><u>evicted</u></a> (perhaps due to a two-month-long timer) are rerun in the Engine's next lifetime, with the cache from the previous lifetime hydrated. That next lifetime is triggered by an alarm set to the timestamp of the next expected state transition.</p>
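<p>The replay behavior described in this section can be modeled in a few lines. This is an illustrative sketch only (the real Engine persists to the Durable Object's SQLite and is woken by alarms): because completed steps' results survive in storage, a re-run after a failure re-executes only the step that failed:</p>

```typescript
// Illustrative replay model: `storage` stands in for the Engine's SQLite
// and survives across runs, so a re-run replays completed steps from
// cache instead of executing them again.
const storage = new Map<string, unknown>();
let emailAttempts = 0;

async function doStep<T>(name: string, cb: () => Promise<T>): Promise<T> {
  if (storage.has(name)) return storage.get(name) as T; // replay from cache
  const result = await cb();
  storage.set(name, result); // persist before moving on
  return result;
}

async function runWorkflow(): Promise<string> {
  const cart = await doStep("load cart", async () => ({ id: "cart-1" }));
  return doStep("send email", async () => {
    emailAttempts += 1;
    if (emailAttempts === 1) throw new Error("upstream email API down");
    return `sent for ${cart.id}`;
  });
}
```

<p>The first call to <code>runWorkflow()</code> fails inside the second step; a second call replays the first step from storage and re-executes only the failed one.</p>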
    <div>
      <h2>Real-life workflow, step by step</h2>
      <a href="#real-life-workflow-step-by-step">
        
      </a>
    </div>
    <p>Let's walk through an example of a real-life application. You run an e-commerce website and would like to send email reminders to your customers for forgotten carts that haven't been checked out in a few days.</p><p>What would typically require a combination of a queue, a cron job, and periodic database queries can now simply be a Workflow that we start for every new cart:</p>
            <pre><code>import {
  WorkflowEntrypoint,
  WorkflowEvent,
  WorkflowStep,
} from "cloudflare:workers";
import { sendEmail } from "./legacy-email-provider";

type Params = {
  cartId: string;
};

type Env = {
  DB: D1Database;
};

export class Purchase extends WorkflowEntrypoint&lt;Env, Params&gt; {
  async run(
    event: WorkflowEvent&lt;Params&gt;,
    step: WorkflowStep
  ): Promise&lt;unknown&gt; {
    await step.sleep("wait for three days", "3 days");

    // Retrieve cart from D1
    const cart = await step.do("retrieve cart from database", async () =&gt; {
      const { results } = await this.env.DB.prepare(`SELECT * FROM cart WHERE id = ?`)
        .bind(event.payload.cartId)
        .all();
      return results[0];
    });

    if (!cart.checkedOut) {
      await step.do("send an email", async () =&gt; {
        await sendEmail("reminder", cart);
      });
    }
  }
}
</code></pre>
            <p>This works great. However, sometimes the <code>sendEmail</code> function fails due to an upstream provider erroring out. While <code>step.do</code> automatically retries with a reasonable default configuration, we can define our own settings:</p>
            <pre><code>if (!cart.checkedOut) {
  await step.do(
    "send an email",
    {
      retries: {
        limit: 5,
        delay: "1 min",
        backoff: "exponential",
      },
    },
    async () =&gt; {
      await sendEmail("reminder", cart);
    }
  );
}
</code></pre>
            
    <div>
      <h3>Managing Workflows</h3>
      <a href="#managing-workflows">
        
      </a>
    </div>
    <p>Workflows allows us to create and manage workflows using four different interfaces:</p><ul><li><p>Using our REST HTTP API available on <a href="https://developers.cloudflare.com/api/"><u>Cloudflare’s API catalog</u></a></p></li><li><p>Using <a href="https://developers.cloudflare.com/workers/wrangler/"><u>Wrangler</u></a>, Cloudflare's developer platform command-line tool</p></li><li><p>Programmatically inside a Worker using <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/"><u>bindings</u></a></p></li><li><p>Using our Web UI in the <a href="https://dash.cloudflare.com/"><u>dashboard</u></a></p></li></ul><p>The HTTP API makes it easy to trigger new instances of workflows from any system, even if it isn’t on Cloudflare, or from the command line. For example:</p>
            <pre><code>curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/workflows/purchase-workflow/instances/$CART_INSTANCE_ID \
  --header "Authorization: Bearer $ACCOUNT_TOKEN" \
  --header 'Content-Type: application/json' \
  --data '{
	"id": "$CART_INSTANCE_ID",
	"params": {
		"cartId": "f3bcc11b-2833-41fb-847f-1b19469139d1"
	}
  }'</code></pre>
            <p>Wrangler goes one step further and gives us a friendlier set of commands to interact with workflows, complete with nicely formatted output and no need to authenticate with tokens. Type <code>npx wrangler workflows</code> for help, or:</p>
            <pre><code>npx wrangler workflows trigger purchase-workflow '{ "cartId": "f3bcc11b-2833-41fb-847f-1b19469139d1" }'</code></pre>
            <p>Furthermore, Workflows has first-party support in Wrangler, and you can test your instances locally. A Workflow is similar to a regular <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/service-bindings/rpc/"><u>WorkerEntrypoint</u></a> in your Worker, which means that <code>wrangler dev</code> just naturally works.</p>
            <pre><code>❯ npx wrangler dev

 ⛅️ wrangler 3.82.0
----------------------------

Your worker has access to the following bindings:
- Workflows:
  - CART_WORKFLOW: EcommerceCartWorkflow
⎔ Starting local server...
[wrangler:inf] Ready on http://localhost:8787
╭───────────────────────────────────────────────╮
│  [b] open a browser, [d] open devtools        │
╰───────────────────────────────────────────────╯
</code></pre>
            <p>Workflow APIs are also available as a Worker binding. You can interact with the platform programmatically from another Worker script in the same account without worrying about permissions or authentication. You can even have workflows that call and interact with other workflows.</p>
            <pre><code>import { WorkerEntrypoint } from "cloudflare:workers";

type Env = { DEMO_WORKFLOW: Workflow };
export default class extends WorkerEntrypoint&lt;Env&gt; {
  async fetch() {
    // Pass in a user defined name for this instance
    // In this case, we use the same as the cartId
    const instance = await this.env.DEMO_WORKFLOW.create({
      id: "f3bcc11b-2833-41fb-847f-1b19469139d1",
      params: {
          cartId: "f3bcc11b-2833-41fb-847f-1b19469139d1",
      }
    });
  }
  async scheduled() {
    // Restart errored out instances in a cron
    const instance = await this.env.DEMO_WORKFLOW.get(
      "f3bcc11b-2833-41fb-847f-1b19469139d1"
    );
    const status = await instance.status();
    if (status.error) {
      await instance.restart();
    }
  }
}</code></pre>
            
    <div>
      <h3>Observability</h3>
      <a href="#observability">
        
      </a>
    </div>
    <p>Having good <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">observability</a> and data on often long-lived asynchronous tasks is crucial to understanding how we're doing under normal operation and, more importantly, when things go south and we need to troubleshoot problems or iterate on code changes.</p><p>We designed Workflows around the philosophy that there is no such thing as too much logging. You can get all the SQLite data for your workflow and its instances by calling the REST APIs. Here is the output for one instance:</p>
            <pre><code>{
  "success": true,
  "errors": [],
  "messages": [],
  "result": {
    "status": "running",
    "params": {},
    "trigger": { "source": "api" },
    "versionId": "ae042999-39ff-4d27-bbcd-22e03c7c4d02",
    "queued": "2024-10-21 17:15:09.350",
    "start": "2024-10-21 17:15:09.350",
    "end": null,
    "success": null,
    "steps": [
      {
        "name": "send email",
        "start": "2024-10-21 17:15:09.411",
        "end": "2024-10-21 17:15:09.678",
        "attempts": [
          {
            "start": "2024-10-21 17:15:09.411",
            "end": "2024-10-21 17:15:09.678",
            "success": true,
            "error": null
          }
        ],
        "config": {
          "retries": { "limit": 5, "delay": 1000, "backoff": "constant" },
          "timeout": "15 minutes"
        },
        "output": "celso@example.com",
        "success": true,
        "type": "step"
      },
      {
        "name": "sleep-1",
        "start": "2024-10-21 17:15:09.763",
        "end": "2024-10-21 17:17:09.763",
        "finished": false,
        "type": "sleep",
        "error": null
      }
    ],
    "error": null,
    "output": null
  }
}</code></pre>
            <p>As you can see, this is essentially a dump of the instance engine’s SQLite state as JSON. You have the <b>errors</b>, <b>messages</b>, current <b>status</b>, and what happened with <b>every step</b>, all timestamped to the millisecond.</p><p>It's one thing to get data about a specific workflow instance, but it's another to zoom out and look at aggregated statistics of all your workflows and instances over time. Workflows data is available through our <a href="https://developers.cloudflare.com/analytics/graphql-api/"><u>GraphQL Analytics API</u></a>, so you can query it in aggregate and generate valuable insights and reports. In this example, we ask for aggregated analytics about the wall time of all the instances of the “e-commerce-carts” workflow:</p>
            <pre><code>{
  viewer {
    accounts(filter: { accountTag: "febf0b1a15b0ec222a614a1f9ac0f0123" }) {
      wallTime: workflowsAdaptiveGroups(
        limit: 10000
        filter: {
          datetimeHour_geq: "2024-10-20T12:00:00.000Z"
          datetimeHour_leq: "2024-10-21T12:00:00.000Z"
          workflowName: "e-commerce-carts"
        }
        orderBy: [count_DESC]
      ) {
        count
        sum {
          wallTime
        }
        dimensions {
          date: datetimeHour
        }
      }
    }
  }
}
</code></pre>
            <p>For convenience, you can of course also use Wrangler to describe a workflow or an instance and get an instant, nicely formatted response:</p>
            <pre><code>sid ~ npx wrangler workflows instances describe purchase-workflow latest

 ⛅️ wrangler 3.80.4

Workflow Name:         purchase-workflow
Instance Id:           d4280218-7756-41d2-bccd-8d647b82d7ce
Version Id:            0c07dbc4-aaf3-44a9-9fd0-29437ed11ff6
Status:                ✅ Completed
Trigger:               🌎 API
Queued:                14/10/2024, 16:25:17
Success:               ✅ Yes
Start:                 14/10/2024, 16:25:17
End:                   14/10/2024, 16:26:17
Duration:              1 minute
Last Successful Step:  wait for three days
Output:                false
Steps:

  Name:      wait for three days
  Type:      💤 Sleeping
  Start:     14/10/2024, 16:25:17
  End:       17/10/2024, 16:25:17
  Duration:  3 day</code></pre>
            <p>And finally, we worked really hard to get you the best dashboard UI experience when navigating Workflows data.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/64XUtBwldkSXUTJ5xEJBgo/2aa861583c8c56c19194cb0869a15a2a/image8.png" />
          </figure>
    <div>
      <h2>So, how much does it cost?</h2>
      <a href="#so-how-much-does-it-cost">
        
      </a>
    </div>
    <p>It’d be painful if we introduced a powerful new way to build Workers applications but made it cost-prohibitive.</p><p>Workflows is <a href="https://developers.cloudflare.com/workers/platform/pricing/#workers"><u>priced</u></a> just like Cloudflare Workers, for which we <a href="https://blog.cloudflare.com/workers-pricing-scale-to-zero/"><u>introduced CPU-based pricing</u></a>: you pay only for active CPU time and requests, not duration (i.e., wall time).</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/11WroT4xt0zPj6bsou4u3X/8f2775569f280107345322cb97603b3e/image4.png" />
          </figure><p><sup><i>Workers Standard pricing model</i></sup></p><p>This is especially advantageous when building the long-running, multi-step applications that Workflows enables: if you had to pay while your Workflow was sleeping, waiting on an event, or making a network call to an API, writing the “right” code would be at odds with writing affordable code.</p><p>There’s also no need to keep a Kubernetes cluster or a group of virtual machines running (and burning a hole in your wallet): we manage the infrastructure, and you only pay for the compute your Workflows consume.   </p>
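<p>To make the difference concrete, here’s a back-of-the-envelope sketch (the per-millisecond rate below is a hypothetical placeholder, not a quoted price) comparing what CPU-based billing and wall-time billing would charge for a workflow instance that sleeps for three days but only computes for 50 ms:</p>

```typescript
// Back-of-the-envelope: CPU-time billing vs wall-time billing for a
// workflow instance that mostly sleeps. RATE_PER_MS is a hypothetical
// placeholder rate, for illustration only.
const RATE_PER_MS = 0.02 / 1_000_000; // $ per billed millisecond (hypothetical)

function billed(ms: number): number {
  return ms * RATE_PER_MS;
}

const cpuMs = 50; // active CPU time actually spent computing
const wallMs = 3 * 24 * 60 * 60 * 1000; // instance lifetime: 3 days, mostly asleep

// Billing on CPU time charges for the 50 ms of work; billing on wall
// time would charge for the full three days instead.
const ratio = billed(wallMs) / billed(cpuMs);
console.log(ratio); // how many times more wall-time billing would cost
```

<p>With equal per-millisecond rates, wall-time billing here would cost over five million times more than CPU-time billing, which is exactly why sleeping, waiting, and slow network calls shouldn’t count against you in a Workflow.</p>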
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Today, after months of developing the platform, we are announcing the open beta program, and we couldn't be more excited to see how you will use Workflows. Looking ahead, we want to support things like triggering instances from queue messages, and we have other ideas too, but we are certain that your feedback will help us shape the roadmap.</p><p>We hope that this blog post gets you thinking about how to use Workflows for your next application, and that it inspires you with what you can build on top of Workers. Workflows as a platform is built entirely on top of Workers, its resources, and its APIs. Anyone can do it, too.</p><p>To chat with the team and other developers building on Workflows, join the #workflows-beta channel on the <a href="https://discord.cloudflare.com/"><u>Cloudflare Developer Discord</u></a>, and keep an eye on the <a href="https://developers.cloudflare.com/workflows/reference/changelog/"><u>Workflows changelog</u></a> during the beta. Otherwise, <a href="https://developers.cloudflare.com/workflows/get-started/guide/">visit the Workflows tutorial</a> to get started.</p><p>If you're an engineer, <a href="https://www.cloudflare.com/en-gb/careers/jobs/"><u>look for opportunities</u></a> to work with us and help us improve Workflows or build other products.</p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Durable Objects]]></category>
            <category><![CDATA[Workflows]]></category>
            <guid isPermaLink="false">1YRfz7LKvAGrEMbRGhNrFP</guid>
            <dc:creator>Sid Chatterjee</dc:creator>
            <dc:creator>Matt Silverlock</dc:creator>
            <dc:creator>Celso Martinho</dc:creator>
        </item>
        <item>
            <title><![CDATA[Data Anywhere with Pipelines, Event Notifications, and Workflows]]></title>
            <link>https://blog.cloudflare.com/data-anywhere-events-pipelines-durable-execution-workflows/</link>
            <pubDate>Wed, 03 Apr 2024 13:00:17 GMT</pubDate>
            <description><![CDATA[ We make it easy to build scalable, reliable, data-driven applications, so we’re announcing a new Event Notifications framework; our take on durable execution; and a streaming ingestion service. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Data is fundamental to any real-world application: the database storing your user data and inventory, the analytics tracking sales events and/or error rates, the object storage with your web assets and/or the Parquet files driving your data science team, and the vector database enabling semantic search or AI-powered recommendations for your users.</p><p>When we first announced Workers <a href="/introducing-cloudflare-workers">back in 2017</a>, and then <a href="/introducing-workers-kv">Workers KV</a>, <a href="https://www.cloudflare.com/developer-platform/r2/">Cloudflare R2</a>, and <a href="https://www.cloudflare.com/developer-platform/products/d1/">D1</a>, it was obvious that the next big challenge to solve for developers would be in making it easier to ingest, store, and query the data needed to build scalable, full-stack applications.</p><p>To that end, as part of our quest to make building stateful, distributed-by-default applications even easier, we’re launching our new Event Notifications service; a preview of our upcoming streaming ingestion product, Pipelines; and a sneak peek into our take on durable execution, Workflows.</p>
    <div>
      <h3>Event-based architectures</h3>
      <a href="#event-based-architectures">
        
      </a>
    </div>
    <p>When you’re writing data — whether that’s new data, changing existing data, or deleting old data — you often want to trigger other, asynchronous work to run in response. That could be processing user-driven uploads, updating search indexes as the underlying data changes, or removing associated rows in your SQL database when content is removed.</p><p>In order to make these event-driven workflows far easier to build, we’re launching the first step towards a wider Event Notifications platform across Cloudflare, starting with notifications support in R2.</p><p>You can read more in the deep-dive on <a href="/r2-events-gcs-migration-infrequent-access/">Event Notifications for R2</a>, but in a nutshell: you can configure changes to content in any R2 bucket to write directly to a <a href="https://developers.cloudflare.com/queues/">Queue</a>, allowing you to reliably consume those events in a Worker or to <a href="https://developers.cloudflare.com/queues/reference/pull-consumers/">pull from compute</a> in a legacy cloud.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/63wR0NuQIFaZUo5TvDvS29/79b6444d125a9c373f59d6cbdf1bb9c3/image2-3.png" />
            
            </figure><p>Event Notifications for R2 are just the beginning, though. There are many kinds of events you might want to trigger as a developer — these are just some of the event types we’re planning to support:</p><ul><li><p>Changes (writes) to key-value pairs in your <a href="https://developers.cloudflare.com/kv/">Workers KV</a> namespaces.</p></li><li><p>Updates to your <a href="https://developers.cloudflare.com/d1/">D1 databases</a>, including changed rows or triggers.</p></li><li><p><a href="https://developers.cloudflare.com/workers/configuration/versions-and-deployments/gradual-deployments/">Deployments</a> to your Cloudflare Workers.</p></li></ul><p>Consuming event notifications from a single Worker is just <i>one</i> approach. As you start to consume events, you may want to trigger multi-step <i>workflows</i> that execute reliably, resume from errors or exceptions, and ensure that previous steps aren’t duplicated or repeated unnecessarily. An event notification framework turns out to be just the thing needed to drive <a href="#durable-execution">a workflow engine that <i>executes durably</i>…</a></p>
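<p>To make this concrete, here’s a minimal sketch of the kind of decision a Worker consuming these notifications might make. The notification shape below (an <code>action</code> plus an object key) is a simplified assumption for illustration, not the exact schema; check the R2 event notifications documentation for the real message format:</p>

```typescript
// Sketch: deciding what follow-up work an R2 event notification should
// trigger. The notification shape here is a simplified assumption.
type R2Notification = {
  action: "PutObject" | "DeleteObject";
  object: { key: string };
};

function planWork(n: R2Notification): string {
  switch (n.action) {
    case "PutObject":
      // New or updated content: refresh the search index for this key.
      return `reindex:${n.object.key}`;
    case "DeleteObject":
      // Content removed: drop the associated index entries and DB rows.
      return `purge:${n.object.key}`;
  }
}
```

<p>In a queue consumer Worker, you would call something like <code>planWork()</code> for each message in the batch and acknowledge the message once the follow-up work is safely enqueued.</p>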
    <div>
      <h3>Making it even easier to ingest data</h3>
      <a href="#making-it-even-easier-to-ingest-data">
        
      </a>
    </div>
    <p>When we launched <a href="https://developers.cloudflare.com/r2/">Cloudflare R2</a>, our <a href="https://www.cloudflare.com/developer-platform/products/r2/">object storage service</a>, we knew that supporting the de facto standard <a href="https://developers.cloudflare.com/r2/api/s3/api/">S3 API</a> was critical in order to allow developers to bring the tooling and services they already had over to R2. But the S3 API is designed to be simple: at its core, it provides APIs for upload, download, multipart, and metadata operations, and many tools <i>don’t</i> support the S3 API.</p><p>What if you want to batch clickstream data from your web services so that it’s efficient (and cost-effective) for your analytics team to query? Or partition data by customer ID, merchant ID, or locale within a structured data format like JSON?</p><p>Well, we want to help solve this problem too, and so we’re announcing Pipelines, an upcoming streaming ingestion service designed to ingest data at scale, aggregate it, and write it directly to R2, without you having to manage infrastructure, partitions, or runners, or worry about durability.</p><p>With Pipelines, creating a globally scalable ingestion endpoint that can ingest tens of thousands of events per second doesn’t require any code:</p>
            <pre><code>$ wrangler pipelines create clickstream-ingest-prod --batch-size="1MB" --batch-timeout-secs=120 --batch-on-json-key=".merchantId" --destination-bucket="prod-cs-data"

✅ Successfully created new pipeline "clickstream-ingest-prod"
📥 Created endpoints:
➡ HTTPS: https://d458dbe698b8eef41837f941d73bc5b3.pipelines.cloudflarestorage.com/clickstream-ingest-prod
➡ WebSocket: wss://d458dbe698b8eef41837f941d73bc5b3.pipelines.cloudflarestorage.com:8443/clickstream-ingest-prod
➡ Kafka: d458dbe698b8eef41837f941d73bc5b3.pipelines.cloudflarestorage.com:9092 (topic: clickstream-ingest-prod)</code></pre>
            <p>As you can see here, we’re already thinking about how to make Pipelines protocol-agnostic: write from an HTTP client, stream events over a WebSocket, and/or redirect your existing Kafka producer (and stop having to manage and scale Kafka) directly to Pipelines.</p><p>But that’s just the beginning of our vision here. Scalable ingestion and simple batching is one thing, but what if you have more complex needs? Well, we have a massively scalable compute platform (<a href="https://developers.cloudflare.com/workers/">Cloudflare Workers</a>) that can help address this too.</p><p>The code below is just an initial exploration of how we’re thinking about an API for running transforms over streaming data. If you’re aware of projects like <a href="https://beam.apache.org/documentation/programming-guide/">Apache Beam</a> or <a href="https://flink.apache.org/">Flink</a>, this programming model might even look familiar:</p>
            <pre><code>export default {
   // Pipeline handler is invoked when batch criteria are met
   async pipeline(stream: StreamPipeline, env: Env, ctx: ExecutionContext): Promise&lt;StreamPipeline&gt; {
      // ...
      return stream
         // Type: transform(label: string, transformFunc: TransformFunction): Promise&lt;StreamPipeline&gt;
         // Each transform has a label that is used in metrics to provide
         // per-transform observability and debugging
         .transform("human readable label", (events: Array&lt;StreamEvent&gt;) =&gt; {
            return events.map((e) =&gt; ...)
         })
         .transform("another transform", (events: Array&lt;StreamEvent&gt;) =&gt; {
            return events.map((e) =&gt; ...)
         })
         .writeToR2({
            format: "json",
            bucket: "MY_BUCKET_NAME",
            prefix: somePrefix,
            batchSize: "10MB"
         })
   }
}</code></pre>
            <p>Specifically:</p><ul><li><p>The Worker describes a pipeline of transformations (mapping, reducing, filtering) that operates over each subset of events (records)</p></li><li><p>You can call out to other services — including D1 or KV — in order to synchronously or asynchronously hydrate data or look up values during your stream processing</p></li><li><p>We take care of scaling horizontally based on records-per-second and/or any concurrency settings you configure to meet your processing latency requirements.</p></li></ul><p><b>We’ll be bringing Pipelines into open beta later in 2024</b>, and it will initially launch with support for HTTP ingestion and R2 as a destination (sink), but we’re already thinking bigger.</p><p>We’ll be sharing more as Pipelines gets closer to release. In the meantime, you can <a href="https://docs.google.com/forms/d/e/1FAIpQLSeuaQ5YZoXJej5h5KoEz6LNrVb7gASJ8msahJg8VmBeC0HEYQ/viewform?usp=sf_link">register your interest and share your use-case</a>, and we’ll reach out when Pipelines reaches open beta.</p>
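<p>The <code>--batch-on-json-key=".merchantId"</code> flag in the example earlier suggests partitioning records by a JSON field before each batch is written. Here’s a minimal sketch of what that grouping step could look like; the event shape and field name are just the values from the example, not a real Pipelines API:</p>

```typescript
// Sketch: partitioning incoming JSON events by a key (here, merchantId)
// so that each batch written to R2 contains one partition's records.
type ClickEvent = { merchantId: string; [k: string]: unknown };

function partitionByMerchant(
  events: ClickEvent[]
): Map<string, ClickEvent[]> {
  const partitions = new Map<string, ClickEvent[]>();
  for (const e of events) {
    const bucket = partitions.get(e.merchantId) ?? [];
    bucket.push(e);
    partitions.set(e.merchantId, bucket);
  }
  return partitions;
}
```

<p>A real ingestion service would additionally flush each partition when it crosses the configured batch size (e.g. 1 MB) or batch timeout (e.g. 120 seconds), whichever comes first.</p>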
    <div>
      <h3>Durable Execution</h3>
      <a href="#durable-execution">
        
      </a>
    </div>
    <p>If the term “Durable Execution” is new to you, don’t worry: the term comes from the desire to run applications that can resume execution from where they left off, even if the underlying host or compute fails (which is where the “durable” part comes from).</p><p>As we’ve continued to build out our data and AI platforms, we’ve been acutely aware that developers need ways to create reliable, repeatable workflows that operate over that data, turn unstructured data into structured data, trigger on fresh data (or periodically), and automatically retry, restart, and export metrics for each step along the way. The industry calls this Durable Execution: we’re just calling it <i>Workflows</i>.</p><p>What makes Workflows different from other takes on Durable Execution is that we provide the underlying compute as part of the platform. You don’t have to bring your own compute, or worry about scaling and provisioning it in the right locations. Workflows runs on top of <a href="https://developers.cloudflare.com/workers/">Cloudflare Workers</a> – you write the workflow, and we take care of the rest.</p><p>Here’s an early example of writing a Workflow that generates text embeddings using Workers AI and stores them (ready to query) in Vectorize as new content is written to (or updated within) R2.</p><ul><li><p>Each Workflow <i>run</i> is triggered by an Event Notification consumed from a Queue, but could also be triggered by an HTTP request, another Worker, or even scheduled on a timer.</p></li><li><p>Individual <i>steps</i> within the Workflow allow us to define individually retriable units of work: in this case, we’re reading the new objects from R2, creating text embeddings using Workers AI, and then inserting them into Vectorize.</p></li><li><p>State is <i>durably</i> persisted between steps: each step can emit state, and Workflows will automatically persist it so that any underlying failures, uncaught exceptions, or network retries can resume execution from the last successful step.</p></li><li><p>Every call to step() automatically emits metrics associated with the unique Workflow run, making it easier to debug within each step and/or break down your application into its smallest units of execution, without having to worry about <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">observability</a>.</p></li></ul><p>Step-by-step, it looks like this:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6OCCJMjL6NCXeeVTD4aNeb/43cda373e2969d263d03226b19244429/image4-5.png" />
            
            </figure><p>Transforming this series of steps into real code, here’s what this would look like with Workflows:</p>
            <pre><code>import { Ai } from "@cloudflare/ai";
import { Workflow } from "cloudflare:workers";

export interface Env {
  R2: R2Bucket;
  AI: any;
  VECTOR_INDEX: VectorizeIndex;
}

export default class extends Workflow {
  async run(event: Event) {
    const ai = new Ai(this.env.AI);

    // List of keys to fetch from our incoming event notification
    const keysToFetch = event.messages.map((val) =&gt; {
      return val.object.key;
    });

    // The return value of each step is stored (the "durable" part
    // of "durable execution")
    // This ensures that state can be persisted between steps, reducing
    // the need to recompute results ($$, time) should subsequent
    // steps fail.
    const inputs = await this.ctx.run(
      // Each step has a user-defined label
      // Metrics are emitted as each step runs (to success or failure)
      // with this label attached and available within per-Workflow
      // analytics in near-real-time.
      "read objects from R2", async () =&gt; {
      const objects = [];

      for (const key of keysToFetch) {
        const object = await this.env.R2.get(key);
        objects.push(await object.text());
      }

      return objects;
    });


    // Persist the output of this step.
    const embeddings = await this.ctx.run(
      "generate embeddings",
      async () =&gt; {
        const { data } = await ai.run("@cf/baai/bge-small-en-v1.5", {
          text: inputs,
        });

        if (data.length) {
          return data;
        } else {
          // Uncaught exceptions trigger an automatic retry of the step
          // Retries and timeouts have sane defaults and can be overridden
          // per step
          throw new Error("Failed to generate embeddings");
        }
      },
      {
        retries: {
          limit: 5,
          delayMs: 1000,
          backoff: "exponential",
        },
      }
    );

    await this.ctx.run("insert vectors", async () =&gt; {
      const vectors = [];

      keysToFetch.forEach((key, index) =&gt; {
        vectors.push({
          id: crypto.randomUUID(),
          // Our embeddings from the previous step
          values: embeddings[index].values,
          // The path to each R2 object to map back to during
          // vector search
          metadata: { r2Path: key },
        });
      });

      return this.env.VECTOR_INDEX.upsert(vectors);
    });
  }
}</code></pre>
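<p>The retry policy in the “generate embeddings” step above (limit 5, 1000 ms delay, exponential backoff) implies a concrete delay schedule. As a sketch, assuming exponential backoff doubles the delay on each attempt (a common convention; the platform’s exact curve may differ), the delays could be computed like this:</p>

```typescript
// Sketch: computing retry delays for a step's retry configuration.
// "constant" repeats the same delay; "exponential" is assumed here to
// double the delay on each successive attempt.
function retryDelays(
  limit: number,
  delayMs: number,
  backoff: "constant" | "exponential"
): number[] {
  return Array.from({ length: limit }, (_, attempt) =>
    backoff === "constant" ? delayMs : delayMs * 2 ** attempt
  );
}

console.log(retryDelays(5, 1000, "exponential")); // [1000, 2000, 4000, 8000, 16000]
```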
            <p>This is just one example of what a Workflow can do. The ability to durably execute an application, modeled as a series of steps, applies to a wide range of domains. This model of execution fits a number of use-cases, including:</p><ul><li><p>Deploying software: each step can define a build step and subsequent health check, gating further progress until your deployment meets your criteria for “healthy”.</p></li><li><p>Post-processing user data: triggering a workflow based on user uploads (e.g. to Cloudflare R2) that then parses that data asynchronously, redacts PII or sensitive data, writes the sanitized output, and triggers a notification via email, webhook, or mobile push.</p></li><li><p>Payment and batch workflows: aggregating raw customer usage data on a periodic schedule by querying your data warehouse (or <a href="https://developers.cloudflare.com/analytics/analytics-engine/">Workers Analytics Engine</a>), triggering usage or spend alerts, and/or generating PDF invoices.</p></li></ul><p>Each of these use cases models tasks that you want to run to completion, minimize redundant retries by persisting intermediate state, and (importantly) easily observe success and failure.</p><p><b>We’ll be sharing more about Workflows during the second quarter of 2024 as we work towards an open (public!) beta</b>. This includes how we’re thinking about idempotency and interactions with our storage, per-instance observability and metrics, local development, and templates to bootstrap common workflows.</p>
    <div>
      <h3>Putting it together</h3>
      <a href="#putting-it-together">
        
      </a>
    </div>
    <p>We’ve often thought of Cloudflare’s own network as one massively scalable parallel data processing cluster: <a href="https://www.cloudflare.com/network/">data centers in 310+ cities</a>, with the ability to run compute close to users and/or <a href="https://smart-placement-demo.pages.dev/">close to data</a>, keep it within the bounds of regulatory or compliance requirements, and most importantly, use our massive scale to enable our customers to scale as well.</p><p>Recapping, a fully-fledged data platform needs to enable three things:</p><ol><li><p>Ingesting data: getting data into the platform (in the right format, from the right sources)</p></li><li><p>Storing data: securely, reliably, and durably.</p></li><li><p>Querying data: understanding and extracting insights from the data, and/or transforming it for use by other tools.</p></li></ol><p>When we launched R2 we tackled the second part, but knew that we’d need to follow up with the first and third parts in order to make it easier for developers to get data in and make use of it.</p><p>If we look at how we can build a system that helps us solve each of these three parts together with Pipelines, Event Notifications, R2, and Workflows, we end up with an architecture that resembles this:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/15ue6XZBrDyTtfja0z8VS5/57140a77d3881b3b3c8c5a0186c2a84f/image1-2.png" />
            
            </figure><p>Specifically, we have Pipelines (1) scaling out to ingest data, batch it, filter it, and then durably store it in R2 (2) in a format that’s ready and optimized for querying. Workflows, ClickHouse, Databricks, or the query engine of your choice can then query (3) that data as soon as it’s ready — with “ready” being automatically triggered by an Event Notification <i>as soon as the data is ingested and written to R2</i>.</p><p>There’s no need to poll, no need to batch after the fact, no need to have your query engine slow down on data that wasn’t pre-aggregated or filtered, and no need to manage and scale infrastructure in order to keep up with load or data jurisdiction requirements. Create a Pipeline, write your data directly to R2, and query directly from it.</p><p>If you’re also looking at this and wondering about the costs of moving this data around, then we’re holding to one important principle: <a href="https://www.cloudflare.com/the-net/cloud-egress-fees-challenge-future-ai/">zero egress fees</a> across all of our data products. Just as we set the stage for this with <a href="/introducing-r2-object-storage">our R2 object storage</a>, we intend to apply this to every data product we’re building, Pipelines included.</p>
    <div>
      <h3>Start Building</h3>
      <a href="#start-building">
        
      </a>
    </div>
    <p>We’ve shared a lot of what we’re building so that developers have an opportunity to provide feedback (including via our <a href="https://discord.cloudflare.com/">Developer Discord</a>), share use-cases, and think about how to build their <i>next</i> application on Cloudflare.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Connectivity Cloud]]></category>
            <guid isPermaLink="false">28tFcr3KkFpN9Y4ogpxvHX</guid>
            <dc:creator>Matt Silverlock</dc:creator>
        </item>
        <item>
            <title><![CDATA[Making state easy with D1 GA, Hyperdrive, Queues and Workers Analytics Engine updates]]></title>
            <link>https://blog.cloudflare.com/making-full-stack-easier-d1-ga-hyperdrive-queues/</link>
            <pubDate>Mon, 01 Apr 2024 13:00:06 GMT</pubDate>
            <description><![CDATA[ We kick off the week with announcements that help developers build stateful applications on top of Cloudflare, including making D1, our SQL database and Hyperdrive, our database accelerating service, generally available ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4BKrpfqvHnl6yaHdXXsCoc/70280206c43fc4ecfa026968440f52f0/image4-31.png" />
            
            </figure>
    <div>
      <h3>Making full-stack easier</h3>
      <a href="#making-full-stack-easier">
        
      </a>
    </div>
    <p>Today might be April Fools’ Day, and while we like to have fun as much as anyone else, we prefer to use this day for serious announcements. In fact, as of today, there are over 2 million developers building on top of Cloudflare’s platform — that’s no joke!</p><p>To kick off this Developer Week, we’re flipping the big “production ready” switch on three products: <a href="https://developers.cloudflare.com/d1/">D1, our serverless SQL database</a>; <a href="https://developers.cloudflare.com/hyperdrive/">Hyperdrive</a>, which makes your <i>existing</i> databases feel like they’re distributed (and faster!); and <a href="https://developers.cloudflare.com/analytics/analytics-engine/">Workers Analytics Engine</a>, our time-series database.</p><p>We’ve been on a mission to allow developers to bring their entire stack to Cloudflare for some time, but what might an application built on Cloudflare look like?</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5D3F21rYXhLv0bI6FID3Kc/4b0ca6dfc52e168a852599345e111a02/image6-11.png" />
            
            </figure><p>The diagram itself shouldn’t look too different from the tools you’re already familiar with: you want a <a href="https://developers.cloudflare.com/d1/">database</a> for your core user data. <a href="https://www.cloudflare.com/learning/cloud/what-is-object-storage/">Object storage</a> for assets and user content. Maybe a <a href="https://developers.cloudflare.com/queues/">queue</a> for background tasks, like email or upload processing. A <a href="https://developers.cloudflare.com/kv/">fast key-value store</a> for runtime configuration. Maybe even a <a href="https://developers.cloudflare.com/analytics/analytics-engine/">time-series database</a> for aggregating user events and/or performance data. And that’s before we get to <a href="https://developers.cloudflare.com/workers-ai/">AI</a>, which is increasingly becoming a core part of many applications in search, recommendation and/or image analysis tasks (at the very least!).</p><p>Yet, without having to think about it, this architecture runs on Region: Earth, which means it’s scalable, reliable and fast — all out of the box.</p>
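<p>In Worker terms, that architecture boils down to a set of bindings on your Worker’s environment. The binding names below are hypothetical, chosen only to map each line back to a box in the diagram:</p>

```typescript
// Sketch: hypothetical bindings for the full-stack architecture above.
// Each binding name is illustrative, not prescriptive.
interface Env {
  DB: unknown;        // D1: core relational user data
  ASSETS: unknown;    // R2: object storage for assets and user content
  TASKS: unknown;     // Queues: background work (email, upload processing)
  CONFIG: unknown;    // KV: fast runtime configuration
  ANALYTICS: unknown; // Workers Analytics Engine: time-series events
  AI: unknown;        // Workers AI: search, recommendations, image analysis
}

// The same names as a plain list, e.g. for generating configuration.
const bindingNames = ["DB", "ASSETS", "TASKS", "CONFIG", "ANALYTICS", "AI"];
```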
    <div>
      <h3>D1 GA: Production Ready</h3>
      <a href="#d1-ga-production-ready">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6FBwcKFjSHTCL2LcJRtNCo/46c6a403e7f8c743ac8a4dff252d85e4/image2-35.png" />
            
            </figure><p>Your core database is one of the most critical pieces of your infrastructure. It needs to be ultra-reliable. It can’t lose data. It needs to scale. And so we’ve been heads down over the last year getting the pieces into place to make sure D1 is production-ready, and we’re extremely excited to say that D1 — our <a href="https://www.cloudflare.com/developer-platform/products/d1/">global, serverless SQL database</a> — is now Generally Available.</p><p>The GA for D1 lands some of the most asked-for features, including:</p><ul><li><p>Support for 10GB databases — and 50,000 databases per account;</p></li><li><p>New data export capabilities; and</p></li><li><p>Enhanced query debugging (we call it “D1 Insights”) — that allows you to understand what queries are consuming the most time, cost, or that are just plain inefficient…  </p></li></ul><p>… to empower developers to build production-ready applications with D1 to meet all their relational SQL needs. And importantly, in an era where the concept of a “<a href="https://www.cloudflare.com/plans/free/">free plan</a>” or “hobby plan” is seemingly at risk, we have no intention of removing the free tier for D1 or reducing the <i>25 billion row reads</i> included in the $5/mo Workers Paid plan:</p><table><colgroup><col></col><col></col><col></col><col></col></colgroup><tbody><tr><td><p><span>Plan</span></p></td><td><p><span>Rows Read</span></p></td><td><p><span>Rows Written</span></p></td><td><p><span>Storage</span></p></td></tr><tr><td><p><span>Workers</span><span> </span><span>Paid</span></p></td><td><p><span>First 25 billion / month included</span><span><br /></span><span><br /></span><span>+ $0.001 / million rows</span></p></td><td><p><span>First 50 million / month included</span><span><br /></span><span><br /></span><span>+ $1.00 / million rows</span></p></td><td><p><span>First 5 GB included</span></p><br /><p><span>+ $0.75 / GB-mo</span></p></td></tr><tr><td><p><span>Workers 
Free</span></p></td><td><p><span>5 million / day</span></p></td><td><p><span>100,000 / day</span></p></td><td><p><span>5 GB (total)</span></p></td></tr></tbody></table><p><i>For those who’ve been following D1 since the start: this is the same pricing we announced at </i><a href="/d1-open-beta-is-here"><i>open beta</i></a><i>.</i></p><p>But things don’t just stop at GA: we have some major new features lined up for D1, including global read replication, even larger databases, more <a href="https://developers.cloudflare.com/d1/reference/time-travel/">Time Travel</a> capabilities that will allow you to branch your database, and new APIs for dynamically querying and/or creating new databases-on-the-fly from within a Worker.</p><p>D1’s read replication will automatically deploy read replicas as needed to get data closer to your users: without you having to spin up replicas, manage scaling, or run into consistency (replication lag) issues. Here’s a sneak preview of what D1’s upcoming Replication API looks like:</p>
            <pre><code>export default {
  async fetch(request: Request, env: Env) {
    const {pathname} = new URL(request.url);
    let resp = null;

    // An optional commit token (or mode), passed back by the client
    // from a previous response so the session can be resumed.
    const token = request.headers.get("x-d1-token") ?? undefined;
    const session = env.DB.withSession(token);

    // Handle requests within the session.
    if (pathname === "/api/orders/list") {
      // This statement is a read query, so it will work against any
      // replica that has a commit equal or later than `token`.
      const { results } = await session.prepare("SELECT * FROM Orders").run();
      resp = Response.json(results);
    } else if (pathname === "/api/orders/add") {
      const order = await request.json();

      // This statement is a write query, so D1 will send the query to
      // the primary, which always has the latest commit token.
      await session.prepare("INSERT INTO Orders VALUES (?, ?, ?)")
        .bind(order.orderName, order.customer, order.value)
        .run();

      // In order for the application to be correct, this SELECT
      // statement must see the results of the INSERT statement above.
      //
      // D1's new Session API keeps track of commit tokens for queries
      // within the session and will ensure that we won't execute this
      // query until whatever replica we're using has seen the results
      // of the INSERT.
      const { results } = await session.prepare("SELECT COUNT(*) FROM Orders")
        .run();
      resp = Response.json(results);
    } else {
      return new Response("Not found", { status: 404 });
    }

    // Set the token so we can continue the session in another request.
    resp.headers.set("x-d1-token", session.latestCommitToken);
    return resp;
  }
}</code></pre>
            <p>Importantly, we will give developers the ability to maintain session-based consistency, so that users still see their own changes reflected, whilst still benefiting from the performance and latency gains that replication can bring.</p><p>You can learn more about how D1’s read replication works under the hood <a href="/building-d1-a-global-database/">in our deep-dive post</a>, and if you want to start building on D1 today, <a href="https://developers.cloudflare.com/d1/">head to our developer docs</a> to create your first database.</p>
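<p>On the client side, continuing a session is just a matter of echoing the commit token back. Here is a hypothetical sketch (the endpoint and the <code>x-d1-token</code> header name follow the Worker example above; nothing here is a published client API):</p>

```javascript
// Hypothetical client-side sketch: remember the commit token from each
// response and send it with the next request, so reads within the
// session always observe the session's latest write.
let commitToken = null;

async function api(path, init = {}) {
  const headers = { ...(init.headers || {}) };
  if (commitToken) headers["x-d1-token"] = commitToken;
  const resp = await fetch("https://example.com" + path, { ...init, headers });
  commitToken = resp.headers.get("x-d1-token") || commitToken;
  return resp;
}
```

<p>With this in place, a write to <code>/api/orders/add</code> followed by a read from <code>/api/orders/list</code> is guaranteed to see the new order, even if the two requests land on different replicas.</p>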
    <div>
      <h3>Hyperdrive: GA</h3>
      <a href="#hyperdrive-ga">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/47WBGHvqFpRkza2ldA5RBi/7f7f47055e1f4f066e213b88e9e98737/image1-37.png" />
            
            </figure><p>We launched Hyperdrive into open beta <a href="/hyperdrive-making-regional-databases-feel-distributed">last September during Birthday Week</a>, and it’s now Generally Available — or in other words, battle-tested and production-ready.</p><p>If you’re not caught up on what Hyperdrive is, it’s designed to make the centralized databases you already have feel like they’re global. We use our <a href="https://www.cloudflare.com/network/">global network</a> to get faster routes to your database, keep connection pools primed, and cache your most frequently run queries as close to users as possible.</p><p>Importantly, Hyperdrive supports the most popular drivers and ORM (Object Relational Mapper) libraries out of the box, so you don’t have to re-learn or re-write your queries:</p>
            <pre><code>// Use the popular 'pg' driver? Easy. Hyperdrive just exposes a connection string
// to your Worker.
const client = new Client({ connectionString: env.HYPERDRIVE.connectionString });
await client.connect();

// Prefer using an ORM like Drizzle? Use it with Hyperdrive too: wrap the
// same connected client.
// https://orm.drizzle.team/docs/get-started-postgresql#node-postgres
const db = drizzle(client);</code></pre>
            <p>But the work on Hyperdrive doesn’t stop just because it’s now “GA”. Over the next few months, we’ll be bringing support for the <i>other</i> most widely deployed database engine there is: MySQL. We’ll also be bringing support for connecting to databases inside private networks (including cloud VPC networks) via <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/">Cloudflare Tunnel</a> and <a href="https://developers.cloudflare.com/magic-wan/">Magic WAN</a>. On top of that, we plan to bring more configurability around invalidation and caching strategies, so that you can make more fine-grained decisions around performance vs. data freshness.</p><p>As we thought about how we wanted to price Hyperdrive, we realized that it just didn’t seem right to charge for it. After all, the performance benefits from Hyperdrive are not only significant, but essential to connecting to traditional database engines. Without Hyperdrive, paying the latency overhead of 6+ round-trips to connect &amp; query your database per request just isn’t right.</p><p>And so we’re happy to announce that <b>for any developer on a Workers Paid plan, Hyperdrive is free</b>. That includes both query caching and connection pooling, as well as the ability to create multiple Hyperdrives — to separate different applications, prod vs. staging, or to provide different configurations (cached vs. 
uncached, for example).</p><table><colgroup><col></col><col></col><col></col></colgroup><tbody><tr><td><p><span>Plan</span></p></td><td><p><span>Price per query</span></p></td><td><p><span>Connection Pooling</span></p></td></tr><tr><td><p><span>Workers</span><span> </span><span>Paid</span></p></td><td><p><span>$0 </span></p></td><td><p><span>$0</span></p></td></tr></tbody></table><p>To get started with Hyperdrive, <a href="https://developers.cloudflare.com/hyperdrive/">head over to the docs</a> to learn how to connect your existing database and start querying it from your Workers.</p>
    <div>
      <h3>Queues: Pull From Anywhere</h3>
      <a href="#queues-pull-from-anywhere">
        
      </a>
    </div>
    <p>The task queue is an increasingly critical part of building a modern, full-stack application, and this is what we had in mind when we <a href="/cloudflare-queues-open-beta">originally announced</a> the open beta of <a href="https://developers.cloudflare.com/queues/">Queues</a>. We’ve since been working on several major Queues features, and we’re launching two of them this week: pull-based consumers and new message delivery controls.</p><p>Any HTTP-speaking client <a href="https://developers.cloudflare.com/queues/reference/pull-consumers/">can now pull messages from a queue</a>: call the new /pull endpoint on a queue to request a batch of messages, and call the /ack endpoint to acknowledge each message (or batch of messages) as you successfully process them:</p>
            <pre><code># Pull and acknowledge messages from a Queue using any HTTP client
$ curl "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/queues/${QUEUE_ID}/messages/pull" -X POST --data '{"visibilityTimeout":10000,"batchSize":100}' \
     -H "Authorization: Bearer ${QUEUES_TOKEN}" \
     -H "Content-Type: application/json"

# Ack the messages you processed successfully; mark others to be retried.
$ curl "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/queues/${QUEUE_ID}/messages/ack" -X POST --data '{"acks":["lease-id-1", "lease-id-2"],"retries":["lease-id-100"]}' \
     -H "Authorization: Bearer ${QUEUES_TOKEN}" \
     -H "Content-Type: application/json"</code></pre>
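<p>Stringing the two calls together gives a minimal consumer loop. The sketch below is ours, not official client code: the endpoint paths mirror the curl examples above, but treat the response field names (such as <code>lease_id</code>) and the <code>handle()</code> callback as assumptions to verify against the Queues documentation.</p>

```javascript
// Minimal pull-consumer sketch (unofficial): pull a batch, process each
// message, then ack the successes and mark the failures for retry.
// The placeholders stand in for your real account ID, queue ID and token.
const BASE = "https://api.cloudflare.com/client/v4/accounts/<account-id>/queues/<queue-id>/messages";
const HEADERS = {
  "Authorization": "Bearer <queues-token>",
  "Content-Type": "application/json",
};

async function pullOnce(handle) {
  const pull = await fetch(`${BASE}/pull`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ visibilityTimeout: 10000, batchSize: 100 }),
  });
  const { result } = await pull.json();

  const acks = [];
  const retries = [];
  for (const msg of result.messages ?? []) {
    try {
      await handle(msg.body);  // your processing logic
      acks.push(msg.lease_id); // assumed field name; check the docs
    } catch {
      retries.push(msg.lease_id); // redelivered after the visibility timeout
    }
  }

  await fetch(`${BASE}/ack`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ acks, retries }),
  });
}
```

<p>Run <code>pullOnce</code> on whatever schedule suits your infrastructure (a cron job, a Kubernetes worker, a loop with a sleep); anything not acked before the visibility timeout expires simply becomes available to pull again.</p>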
            <p>A pull-based consumer can run anywhere, allowing you to run queue consumers alongside your existing legacy cloud infrastructure. Teams inside Cloudflare adopted this early on, with one use case focused on writing device telemetry to a queue from our <a href="https://www.cloudflare.com/network/">310+ data centers</a> and consuming it within some of our back-of-house infrastructure running on Kubernetes. Importantly, our globally distributed queue infrastructure means that messages are retained within the queue until the consumer is ready to process them.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2UUkrE3bqqIdQiemV49Hal/496c2d539b366a794d58479c99b1c9ec/image5-19.png" />
            
            </figure><p>Queues also <a href="https://developers.cloudflare.com/queues/reference/batching-retries/#delay-messages">now supports delaying messages</a>, both when sending to a queue and when marking a message for retry. This can be useful to queue (pun intended) tasks for the future, as well as to apply a backoff mechanism when an upstream API or infrastructure has rate limits that require you to pace how quickly you are processing messages.</p>
            <pre><code>// Apply a delay to a message when sending it
await env.YOUR_QUEUE.send(msg, { delaySeconds: 3600 })

// Delay a message (or a batch of messages) when marking it for retry
for (const msg of batch.messages) {
  msg.retry({ delaySeconds: 300 })
}</code></pre>
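<p>A common pattern the retry delay enables is exponential backoff keyed off how many times a message has been delivered. A minimal sketch (the helper and its defaults are our own illustration, not part of the Queues API):</p>

```javascript
// Hypothetical helper (not part of the Queues API): compute an
// exponential backoff, capped at one hour, to pass as delaySeconds.
function backoffSeconds(attempt, base = 30, cap = 3600) {
  // attempt 1 -> 30s, attempt 2 -> 60s, attempt 3 -> 120s, ... up to cap
  return Math.min(cap, base * 2 ** (attempt - 1));
}

// In a consumer, assuming the message exposes its delivery count:
// msg.retry({ delaySeconds: backoffSeconds(msg.attempts) })
```

<p>Each failed delivery then waits twice as long as the last, which gives a rate-limited upstream time to recover without you dropping the message.</p>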
            <p>We’ll also be bringing substantially increased per-queue throughput over the coming months on the path to getting Queues to GA. It’s important to us that Queues is <i>extremely</i> reliable: a lost or dropped message means that a user doesn’t get their order confirmation email, that password reset notification, or their uploads processed — each of those is user-impacting and hard to recover from.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1RhxWjKGRmoJtgQ4toybvY/57469d1ee721096a3c2b7551bbd277a4/image3-35.png" />
            
            </figure>
    <div>
      <h3>Workers Analytics Engine is GA</h3>
      <a href="#workers-analytics-engine-is-ga">
        
      </a>
    </div>
    <p><a href="https://developers.cloudflare.com/analytics/analytics-engine/">Workers Analytics Engine</a> provides unlimited-cardinality analytics at scale, via a built-in API to write data points from Workers, and a SQL API to query that data.</p><p>Workers Analytics Engine is backed by the same ClickHouse-based system we have depended on for years at Cloudflare. We use it ourselves to observe the health of our own services, to capture product usage data for billing, and to answer questions about specific customers’ usage patterns. At least one data point is written to this system on nearly every request to Cloudflare’s network. Workers Analytics Engine lets you build your own custom analytics using this same infrastructure, while we manage the hard parts for you.</p><p>Since <a href="/workers-analytics-engine">launching in beta</a>, developers have started depending on Workers Analytics Engine for these same use cases and more, from large enterprises to open-source projects like <a href="https://github.com/benvinegar/counterscale/">Counterscale</a>. Workers Analytics Engine has been operating at production scale with mission-critical workloads for years — but we hadn’t shared anything about pricing, until today.</p><p>We are keeping Workers Analytics Engine pricing simple, and based on two metrics:</p><ol><li><p><b>Data points written</b> — every time you call <a href="https://developers.cloudflare.com/analytics/analytics-engine/get-started/#3-write-data-from-your-worker">writeDataPoint()</a> in a Worker, this counts as one data point written. Every data point costs the same amount — unlike other platforms, there is no penalty for adding dimensions or cardinality, and no need to predict what the size and cost of a compressed data point might be.</p></li><li><p><b>Read queries</b> — every time you post to the Workers Analytics Engine <a href="https://developers.cloudflare.com/analytics/analytics-engine/sql-api/">SQL API</a>, this counts as one read query. 
Every query costs the same amount — unlike other platforms, there is no penalty for query complexity, and no need to reason about the number of rows of data that will be read by each query.</p></li></ol><p>Both the Workers Free and Workers Paid plans will include an allocation of data points written and read queries, with pricing for additional usage as follows:</p><table><colgroup><col></col><col></col><col></col></colgroup><tbody><tr><td><p><span>Plan</span></p></td><td><p><span>Data points written</span></p></td><td><p><span>Read queries</span></p></td></tr><tr><td><p><span>Workers</span><span> </span><span>Paid</span></p></td><td><p><span>10 million included per month</span></p><p><span><br /></span><span>+$0.25 per additional million</span></p></td><td><p><span>1 million included per month</span></p><p><span><br /></span><span>+$1.00 per additional million</span></p></td></tr><tr><td><p><span>Workers Free</span></p></td><td><p><span>100,000 included per day</span></p></td><td><p><span>10,000 included per day</span></p></td></tr></tbody></table><p>With this pricing, you can answer, “how much will Workers Analytics Engine cost me?” by counting the number of times you call a function in your Worker, and how many times you make a request to an HTTP API endpoint. Napkin math, rather than spreadsheet math.</p><p>This pricing will be made available to everyone in the coming months. Between now and then, Workers Analytics Engine continues to be available at no cost. You can <a href="https://developers.cloudflare.com/analytics/analytics-engine/get-started/#limits">start writing data points from your Worker today</a> — it takes just a few minutes and less than 10 lines of code to start capturing data. We’d love to hear what you think.</p>
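<p>To make the napkin math concrete, here is a small sketch (our own illustration, using the Workers Paid quotas and rates from the table above) of what a month's bill would come to:</p>

```javascript
// Illustrative only: estimate a monthly Workers Analytics Engine bill on
// the Workers Paid plan, using the included quotas and rates above.
function analyticsEngineCostUSD(dataPointsWritten, readQueries) {
  const billableWrites = Math.max(0, dataPointsWritten - 10_000_000); // 10M included
  const billableReads = Math.max(0, readQueries - 1_000_000);         // 1M included
  return (billableWrites / 1_000_000) * 0.25 + (billableReads / 1_000_000) * 1.0;
}

// 50M data points written and 2M read queries in a month:
// 40 extra millions written at $0.25, plus 1 extra million read at $1.00 = $11
```

<p>Since a data point is one <code>writeDataPoint()</code> call and a read query is one SQL API request, both inputs are things you can count directly from your code paths.</p>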
    <div>
      <h3>The week is just getting started</h3>
      <a href="#the-week-is-just-getting-started">
        
      </a>
    </div>
    <p>Tune in to what we have in store for you tomorrow on our second day of Developer Week. If you have questions or want to show off something cool you already built, please join our developer <a href="https://discord.cloudflare.com/"><i>Discord</i></a>.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[D1]]></category>
            <category><![CDATA[Hyperdrive]]></category>
            <category><![CDATA[Queues]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">5O8kPvrc2dyHIwmf2c0shv</guid>
            <dc:creator>Rita Kozlov</dc:creator>
            <dc:creator>Matt Silverlock</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare incident on October 30, 2023]]></title>
            <link>https://blog.cloudflare.com/cloudflare-incident-on-october-30-2023/</link>
            <pubDate>Wed, 01 Nov 2023 16:39:43 GMT</pubDate>
            <description><![CDATA[ Multiple Cloudflare services were unavailable for 37 minutes on October 30, 2023, due to the misconfiguration of a deployment tool used by Workers KV. ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1fuf2Zu6hQEVim57ZP2cze/0c4311c85148c448749069bf6cb900a4/Vulnerabilitiy-1.png" />
            
            </figure><p>Multiple Cloudflare services were unavailable for 37 minutes on October 30, 2023. This was due to the misconfiguration of a deployment tool used by Workers KV. This was a frustrating incident, made more difficult by Cloudflare’s reliance on our own suite of products. We are deeply sorry for the impact it had on customers. What follows is a discussion of what went wrong, how the incident was resolved, and the work we are undertaking to ensure it does not happen again.</p><p>Workers KV is our globally distributed key-value store. It is used by both customers and Cloudflare teams alike to manage configuration data, routing lookups, static asset bundles, authentication tokens, and other data that needs low-latency access.</p><p>During this incident, KV returned what it believed was a valid HTTP 401 (Unauthorized) status code instead of the requested key-value pair(s) due to a bug in a new deployment tool used by KV.</p><p>These errors manifested differently for each product depending on how KV is used by each service, with their impact detailed below.</p>
    <div>
      <h3>What was impacted</h3>
      <a href="#what-was-impacted">
        
      </a>
    </div>
    <p>A number of Cloudflare services depend on Workers KV for distributing configuration, routing information, static asset serving, and authentication state globally. These services instead received an HTTP 401 (Unauthorized) error when performing any get, put, delete, or list operation against a KV namespace.</p><p>Customers using the following Cloudflare products would have observed heightened error rates and/or would have been unable to access some or all features for the duration of the incident:</p>
<table>
<thead>
  <tr>
    <th><span>Product</span></th>
    <th><span>Impact</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>Workers KV</span></td>
    <td><span>Customers with applications leveraging KV saw those applications fail during the duration of this incident, including both the KV API within Workers, and the REST API.</span><br /><span>Workers applications not using KV were not impacted.</span></td>
  </tr>
  <tr>
    <td><span>Pages</span></td>
    <td><span>Applications hosted on Pages were unreachable for the duration of the incident and returned HTTP 500 errors to users. New Pages deployments also returned HTTP 500 errors to users for the duration.</span></td>
  </tr>
  <tr>
    <td><span>Access</span></td>
    <td><span>Users who were unauthenticated could not log in; any origin attempting to validate the JWT using the /certs endpoint would fail; any application with a device posture policy failed for all users.</span><br /><span>Existing logged-in sessions that did not use the /certs endpoint or posture checks were unaffected. Overall, a large percentage of existing sessions were still affected.</span></td>
  </tr>
  <tr>
    <td><span>WARP / Zero Trust</span></td>
    <td><span>Users were unable to register new devices or connect to resources subject to policies that enforce Device Posture checks or WARP Session timeouts.</span><br /><span>Devices already enrolled, resources not relying on device posture, or that had re-authorized outside of this window were unaffected.</span></td>
  </tr>
  <tr>
    <td><span>Images</span></td>
    <td><span>The Images API returned errors during the incident. Existing image delivery was not impacted.</span></td>
  </tr>
  <tr>
    <td><span>Cache Purge (single file)</span></td>
    <td><span>Single file purge was partially unavailable for the duration of the incident as some data centers could not access configuration data in KV. Data centers that had existing configuration data locally cached were unaffected.</span><br /><span>Other cache purge mechanisms, including purge by tag, were unaffected.</span></td>
  </tr>
  <tr>
    <td><span>Workers</span></td>
    <td><span>Uploading or editing Workers through the dashboard, wrangler or API returned errors during the incident. Deployed Workers were not impacted, unless they used KV. </span></td>
  </tr>
  <tr>
    <td><span>AI Gateway</span></td>
    <td><span>AI Gateway was not able to proxy requests for the duration of the incident.</span></td>
  </tr>
  <tr>
    <td><span>Waiting Room</span></td>
    <td><span>Waiting Room configuration is stored at the edge in Workers KV. Waiting Room configurations, and configuration changes, were unavailable and the service failed open.</span><br /><span>When access to KV was restored, some Waiting Room users would have experienced queuing as the service came back up. </span></td>
  </tr>
  <tr>
    <td><span>Turnstile and Challenge Pages</span></td>
    <td><span>Turnstile's JavaScript assets are stored in KV, and the entry point for Turnstile (api.js) was not able to be served. Clients accessing pages using Turnstile could not initialize the Turnstile widget and would have failed closed during the incident window.</span><br /><span>Challenge Pages (which products like Custom, Managed and Rate Limiting rules use) also use Turnstile infrastructure for presenting challenge pages to users under specific conditions, and would have blocked users who were presented with a challenge during that period.</span></td>
  </tr>
  <tr>
    <td><span>Cloudflare Dashboard</span></td>
    <td><span>Parts of the Cloudflare dashboard that rely on Turnstile and/or our internal feature flag tooling (which uses KV for configuration) returned errors to users for the duration. </span></td>
  </tr>
</tbody>
</table>
    <div>
      <h3>Timeline</h3>
      <a href="#timeline">
        
      </a>
    </div>
    <p><i>All timestamps referenced are in Coordinated Universal Time (UTC).</i></p>
<table>
<thead>
  <tr>
    <th><span>Time</span></th>
    <th><span>Description</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>2023-10-30 18:58 UTC</span></td>
    <td><span>The Workers KV team began a progressive deployment of a new KV build to production.</span></td>
  </tr>
  <tr>
    <td><span>2023-10-30 19:29 UTC</span></td>
    <td><span>The internal progressive deployment API returned a staging build GUID in response to a call to list production builds.</span></td>
  </tr>
  <tr>
    <td><span>2023-10-30 19:40 UTC</span></td>
    <td><span>The progressive deployment API was used to continue rolling out the release. This routed a percentage of traffic to the wrong destination, triggering alerting and leading to the decision to roll back.</span></td>
  </tr>
  <tr>
    <td><span>2023-10-30 19:54 UTC</span></td>
    <td><span>Rollback via progressive deployment API attempted, traffic starts to fail at scale. </span><span>— IMPACT START —</span></td>
  </tr>
  <tr>
    <td><span>2023-10-30 20:15 UTC</span></td>
    <td><span>Cloudflare engineers manually edit (via break glass mechanisms) deployment routes to revert to last known good build for the majority of traffic.</span></td>
  </tr>
  <tr>
    <td><span>2023-10-30 20:29 UTC</span></td>
    <td><span>Workers KV error rates return to normal pre-incident levels, and impacted services recover within the following minute.</span></td>
  </tr>
  <tr>
    <td><span>2023-10-30 20:31 UTC</span></td>
    <td><span>Impact resolved </span><span>— IMPACT END — </span></td>
  </tr>
</tbody>
</table><p>As shown in the above timeline, there was a delay between the time we realized we were having an issue at 19:54 UTC and the time we were actually able to perform the rollback at 20:15 UTC.</p><p>This was caused by the fact that multiple tools within Cloudflare rely on Workers KV including Cloudflare Access. Access leverages Workers KV as part of its request verification process. Due to this, we were unable to leverage our internal tooling and had to use break-glass mechanisms to bypass the normal tooling. As described below, we had not spent sufficient time testing the rollback mechanisms. We plan to harden this moving forward.</p>
    <div>
      <h3>Resolution</h3>
      <a href="#resolution">
        
      </a>
    </div>
    <p>Cloudflare engineers manually switched (via break glass mechanism) the production route to the previous working version of Workers KV, which immediately eliminated the failing request path and subsequently resolved the issue with the Workers KV deployment.</p>
    <div>
      <h3>Analysis</h3>
      <a href="#analysis">
        
      </a>
    </div>
    <p>Workers KV is a low-latency key-value store that allows users to store persistent data on Cloudflare's network, as close to the users as possible. This distributed key-value store is used in many applications, some of which are first-party Cloudflare products like Pages, Access, and Zero Trust.</p><p>The Workers KV team was progressively deploying a new release using a specialized deployment tool. The deployment mechanism contains a staging and a production environment, and utilizes a process where the production environment is upgraded to the new version at progressive percentages until all production environments are upgraded to the most recent production build. The deployment tool had a latent bug with how it returns releases and their respective versions. Instead of returning releases from a single environment, the tool returned a broader list of releases than intended, resulting in production and staging releases being returned together.</p><p>In this incident, the service was deployed and tested in staging. But because of the deployment automation bug, when promoting to production, a script that had been deployed to the staging account was incorrectly referenced instead of the pre-production version on the production account. As a result, the deployment mechanism pointed the production environment to a version that was not running anywhere in the production environment, effectively black-holing traffic.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1YKI1LYglUMvikcDlkZF41/ff8ba4a17059a4139884ee71524a8209/image1.png" />
            
            </figure><p>When this happened, Workers KV became unreachable in production, as calls to the product were directed to a version that was not authorized for production access, returning an HTTP 401 error code. This caused dependent products which stored key-value pairs in KV to fail, regardless of whether the key-value pair was cached locally or not.</p><p>Although automated alerting detected the issue immediately, there was a delay between the time we realized we were having an issue and the time we were actually able to perform the rollback. This was caused by the fact that multiple tools within Cloudflare rely on Workers KV, including Cloudflare Access. Access uses Workers KV as part of the verification process for user JWTs (JSON Web Tokens).</p><p>These tools include the dashboard, which was used to revert the change, and the authentication mechanism used to access our continuous integration (CI) system. As Workers KV was down, so too were these services. Automatic rollbacks via our CI system had been successfully tested previously, but the authentication issues (Access relies on KV) caused by the incident made accessing the necessary secrets to roll back the deploy impossible.</p><p>The fix ultimately was a manual change of the production build path to a previous and known good state. This path was known to have been deployed and was the previous production build before the attempted deployment.</p>
    <div>
      <h3>Next steps</h3>
      <a href="#next-steps">
        
      </a>
    </div>
    <p>As more teams at Cloudflare have built on Workers, we have "organically" ended up in a place where Workers KV now underpins a tremendous amount of our products and services. This incident has continued to reinforce the need for us to revisit how we can reduce the blast radius of critical dependencies, which includes improving the sophistication of our deployment tooling, its ease-of-use for our internal teams, and product-level controls for these dependencies. We’re prioritizing these efforts to ensure that there is not a repeat of this incident.</p><p>This also reinforces the need for Cloudflare to improve the tooling, and the safety of said tooling, around progressive deployments of Workers applications internally and for customers.</p><p>This includes (but is not limited to) the below list of key follow-up actions (in no specific order) this quarter:</p><ol><li><p>Onboard KV deployments to standardized Workers deployment models which use automated systems for impact detection and recovery.</p></li><li><p>Ensure that the rollback process has access to a known good deployment identifier and that it works when Cloudflare Access is down.</p></li><li><p>Add pre-checks to deployments which will validate input parameters to ensure version mismatches don't propagate to production environments.</p></li><li><p>Harden the progressive deployment tooling to operate in a way that is designed for multi-tenancy. The current design assumes a single-tenant model.</p></li><li><p>Add additional validation to progressive deployment scripts to verify that the deployment matches the app environment (production, staging, etc.).</p></li></ol><p>Again, we’re extremely sorry this incident occurred, and take the impact of this incident on our customers extremely seriously.</p> ]]></content:encoded>
            <category><![CDATA[Post Mortem]]></category>
            <guid isPermaLink="false">2RLr0QNONtOjY9xl3wKG1G</guid>
            <dc:creator>Matt Silverlock</dc:creator>
            <dc:creator>Kris Evans</dc:creator>
        </item>
        <item>
            <title><![CDATA[Hyperdrive: making databases feel like they’re global]]></title>
            <link>https://blog.cloudflare.com/hyperdrive-making-regional-databases-feel-distributed/</link>
            <pubDate>Thu, 28 Sep 2023 13:02:00 GMT</pubDate>
            <description><![CDATA[ Hyperdrive makes accessing your existing databases from Cloudflare Workers, wherever they are running, hyper fast ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/38AL8kYOfUuUOlGD2p2Wfw/596490c5c841fb416154cbce56cc830b/image1-33.png" />
            
            </figure><p>Hyperdrive makes accessing your existing databases from Cloudflare Workers, wherever they are running, hyper fast. You connect Hyperdrive to your database, change one line of code to connect through Hyperdrive, and voilà: connections and queries get faster (and spoiler: <a href="https://developers.cloudflare.com/hyperdrive/">you can use it today</a>).</p><p>In a nutshell, Hyperdrive uses our global network to speed up queries to your existing databases, whether they’re in a legacy cloud provider or with <a href="https://www.cloudflare.com/developer-platform/products/d1/">your favorite serverless database provider;</a> dramatically reduces the <a href="https://www.cloudflare.com/learning/performance/glossary/what-is-latency/">latency</a> incurred from repeatedly setting up new database connections; and caches the most popular read queries against your database, often avoiding the need to go back to your database at all.</p><p>Without Hyperdrive, that core database — the one with your user profiles, product inventory, or running your critical web app — sitting in the us-east1 region of a legacy cloud provider is going to be really slow to access for users in Paris, Singapore and Dubai and slower than it should be for users in Los Angeles or Vancouver. With each round trip taking up to 200ms, it’s easy to burn up to a second (or more!) on the multiple round-trips needed just to set up a connection, before you’ve even made the query for your data. Hyperdrive is designed to fix this.</p><p>To demonstrate Hyperdrive’s performance, we built a <a href="https://hyperdrive-demo.pages.dev/">demo application</a> that makes back-to-back queries against the same database: both with Hyperdrive and without Hyperdrive (directly). 
The app selects a database in a neighboring continent: if you’re in Europe, it selects a database in the US — an all-too-common experience for many European Internet users — and if you’re in Africa, it selects a database in Europe (and so on). It returns raw results from a straightforward <code>SELECT</code> query, with no carefully selected averages or cherry-picked metrics.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3VWco8QZERMlkgBpiilOyA/99245c5a6ce8a208e7fc793121aeaef3/image2-25.png" />
            
            </figure><p><i>We</i> <a href="https://hyperdrive-demo.pages.dev/"><i>built a demo app</i></a> <i>that makes real queries to a PostgreSQL database, with and without Hyperdrive</i></p><p>Across internal testing, initial user reports, and multiple runs of our benchmark, Hyperdrive delivers a 17-25x performance improvement vs. going direct to the database for cached queries, and a 6-8x improvement for uncached queries and writes. The cached latency might not surprise you, but we think that being 6-8x faster on uncached queries changes “I can’t query a centralized database from Cloudflare Workers” to “where has this been all my life?!”. We’re also continuing to work on performance improvements: we’ve already identified additional latency savings, and we’ll be pushing those out in the coming weeks.</p><p>The best part? Developers with a Workers paid plan can <a href="https://developers.cloudflare.com/hyperdrive/">start using the Hyperdrive open beta immediately</a>: there are no waiting lists or special sign-up forms to navigate.</p>
    <div>
      <h3>Hyperdrive? Never heard of it?</h3>
      <a href="#hyperdrive-never-heard-of-it">
        
      </a>
    </div>
    <p>We’ve been working on Hyperdrive in secret for a short while, but allowing developers to connect to databases they already have — with their existing data, queries and tooling — has been something on our minds for quite some time.</p><p>In a modern distributed cloud environment like Workers, where compute is globally distributed (so it’s close to users) and functions are short-lived (so you’re billed no more than is needed), connecting to traditional databases has been both slow and unscalable. Slow because it takes upwards of seven round-trips (<a href="https://www.cloudflare.com/learning/ddos/glossary/tcp-ip/">TCP handshake</a>; <a href="https://www.cloudflare.com/learning/ssl/what-happens-in-a-tls-handshake/">TLS negotiation</a>; then auth) to establish the connection, and unscalable because databases like PostgreSQL have a <a href="https://www.postgresql.org/message-id/flat/31cc6df9-53fe-3cd9-af5b-ac0d801163f4%40iki.fi">high resource cost per connection</a>. Even just a couple of hundred connections to a database can consume non-negligible memory, separate from any memory needed for queries.</p><p>Our friends over at Neon (a popular serverless Postgres provider) wrote about this, and <a href="https://neon.tech/blog/serverless-driver-for-postgres">even released a WebSocket proxy and driver to reduce the</a> connection overhead, but are still fighting uphill in the snow: even with a custom driver, we’re down to four round-trips, each still potentially taking 50-200 milliseconds or more. When those connections are long-lived, that’s OK — it might happen once every few hours at most. But when they’re scoped to an individual function invocation, and are only useful for a few milliseconds to minutes at best — your code spends more time waiting than working. 
It’s effectively another kind of cold start: having to initiate a fresh connection to your database before making a query means that using a traditional database in a distributed or serverless environment is (to put it lightly) <i>really slow</i>.</p><p>To combat this, Hyperdrive does two things.</p><p>First, it maintains a set of regional database connection pools across Cloudflare’s network, so a Cloudflare Worker avoids making a fresh connection to a database on every request. Instead, the Worker can establish a connection to Hyperdrive (fast!), with Hyperdrive maintaining a pool of ready-to-go connections back to the database. Since a database can be anywhere from 30ms to (often) 300ms away over a <i>single</i> round-trip (let alone the seven or more you need for a new connection), having a pool of available connections dramatically reduces the latency issue that short-lived connections would otherwise suffer.</p><p>Second, it understands the difference between read (non-mutating) and write (mutating) queries and transactions, and can automatically cache your most popular read queries, which represent over 80% of the queries made to databases in typical web applications. That product listing page that tens of thousands of users visit every hour; open jobs on a major careers site; or even queries for config data that changes occasionally: a tremendous amount of what is queried does not change often, and caching it closer to where the user is querying it from can dramatically speed up access to that data for the next ten thousand users. Write queries, which can’t be safely cached, still get to benefit from both Hyperdrive’s connection pooling <i>and</i> Cloudflare’s <a href="https://www.cloudflare.com/network/">global network</a>: being able to take the fastest routes across the Internet over our backbone cuts down latency there, too.</p>
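<p>To put rough numbers on the connection-setup cost, here is a back-of-the-envelope sketch. The figures (seven round-trips, a 70ms RTT to the origin database, ~10ms to a nearby pooled connection) are illustrative assumptions, not benchmark results:</p>

```typescript
// Illustration only: connection setup cost at an assumed round-trip time,
// versus reusing a warm, pooled connection.
function setupCostMs(roundTrips: number, rttMs: number): number {
  return roundTrips * rttMs;
}

// A fresh connection: TCP handshake + TLS negotiation + auth, ~7 round-trips
const directMs = setupCostMs(7, 70); // 490ms before the first query is even sent

// A pooled connection: the Worker only pays the short hop to Hyperdrive
// (~10ms assumed here), which holds warm connections back to the database.
const pooledMs = setupCostMs(1, 10);

console.log(`direct: ${directMs}ms, pooled: ${pooledMs}ms`);
```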
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2ZPuh9S7KNGNfcIOW1FSqF/07cd95b1d45dd66d7b13fcab51cf4189/image4-16.png" />
            
            </figure><p><i>Even if your database is on the other side of the country, 70ms x 6 round-trips is a lot of time for a user to be waiting for a query response.</i></p><p>Hyperdrive works not only with PostgreSQL databases, including <a href="https://neon.tech/">Neon</a>, Google Cloud SQL, AWS RDS, and <a href="https://www.timescale.com/">Timescale</a>, but also with PostgreSQL-compatible databases like <a href="https://materialize.com/">Materialize</a> (a powerful stream-processing database), <a href="https://www.cockroachlabs.com/">CockroachDB</a> (a major distributed database), Google Cloud’s <a href="https://cloud.google.com/alloydb">AlloyDB</a>, and AWS Aurora Postgres.</p><p>We’re also working on bringing support for MySQL, including providers like PlanetScale, by the end of the year, with more database engines planned in the future.</p>
    <div>
      <h3>The magic connection string</h3>
      <a href="#the-magic-connection-string">
        
      </a>
    </div>
    <p>One of the major design goals for Hyperdrive was the need for developers to keep using their existing drivers, query builder and ORM (Object-Relational Mapper) libraries. It wouldn’t have mattered how fast Hyperdrive was if we required you to migrate away from your favorite ORM and/or rewrite hundreds (or more) of lines of code &amp; tests to benefit from Hyperdrive’s performance.</p><p>To achieve this, we worked with the maintainers of popular open-source drivers — including <a href="https://node-postgres.com/">node-postgres</a> and <a href="https://github.com/porsager/postgres">Postgres.js</a> — to help their libraries support <a href="/workers-tcp-socket-api-connect-databases/">Workers’ new TCP socket API</a>, which is going through the <a href="https://github.com/wintercg/proposal-sockets-api">standardization process</a> and which we expect to land in Node.js, Deno and Bun as well.</p><p>The humble database connection string is the shared language of database drivers, and typically takes on this format:</p>
            <pre><code>postgres://user:password@some.database.host.example.com:5432/postgres</code></pre>
            <p>The magic behind Hyperdrive is that you can start using it in your existing Workers applications, with your existing queries, just by swapping out your connection string for the one Hyperdrive generates instead.</p>
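<p>A connection string is just a URL, so you can see its moving parts with the standard WHATWG URL API (available in Workers, Node.js, Deno and Bun). This is purely illustrative — your driver does this parsing for you, and swapping in the Hyperdrive-generated string changes only these values:</p>

```typescript
// Decompose the example connection string from above into driver config.
const connectionString =
  "postgres://user:password@some.database.host.example.com:5432/postgres";

const url = new URL(connectionString);

const config = {
  user: url.username,              // "user"
  password: url.password,          // "password"
  host: url.hostname,              // "some.database.host.example.com"
  port: Number(url.port),          // 5432
  database: url.pathname.slice(1), // "postgres" (pathname minus leading "/")
};

console.log(config);
```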
    <div>
      <h3>Creating a Hyperdrive</h3>
      <a href="#creating-a-hyperdrive">
        
      </a>
    </div>
    <p>With an existing database ready to go — in this example, we’ll use a Postgres database from <a href="https://neon.tech/">Neon</a> — it takes less than a minute to get Hyperdrive running (yes, we timed it).</p><p>If you don’t have an existing Cloudflare Workers project, you can quickly create one:</p>
            <pre><code>$ npm create cloudflare@latest
# Call the application "hyperdrive-demo"
# Choose "Hello World Worker" as your template</code></pre>
            <p>From here, we just need the database connection string for our database and a quick <a href="https://developers.cloudflare.com/workers/wrangler/install-and-update/">wrangler command-line</a> invocation to have Hyperdrive connect to it.</p>
            <pre><code># Using wrangler v3.10.0 or above
wrangler hyperdrive create a-faster-database --connection-string="postgres://user:password@neon.tech:5432/neondb"

# This will return an ID: we'll use this in the next step</code></pre>
            <p>Add our Hyperdrive to the <a href="https://developers.cloudflare.com/workers/configuration/bindings/">wrangler.toml configuration</a> file for our Worker:</p>
            <pre><code>[[hyperdrive]]
binding = "HYPERDRIVE"
id = "cdb28782-0dfc-4aca-a445-a2c318fb26fd"</code></pre>
            <p>We can now write a <a href="https://developers.cloudflare.com/workers/">Worker</a> — or take an existing Worker script — and use Hyperdrive to speed up connections and queries to our existing database. We use <a href="https://node-postgres.com/">node-postgres</a> here, but we could just as easily use <a href="https://orm.drizzle.team/">Drizzle ORM</a>.</p>
            <pre><code>import { Client } from 'pg';

export interface Env {
	HYPERDRIVE: Hyperdrive;
}

export default {
	async fetch(request: Request, env: Env, ctx: ExecutionContext) {
		// Create a database client that connects to our database via Hyperdrive
		//
		// Hyperdrive generates a unique connection string you can pass to
		// supported drivers, including node-postgres, Postgres.js, and the many
		// ORMs and query builders that use these drivers.
		const client = new Client({ connectionString: env.HYPERDRIVE.connectionString });

		try {
			// Connect to our database
			await client.connect();

			// A very simple test query
			let result = await client.query({ text: 'SELECT * FROM pg_tables' });

			// Return the connection to the pool after the response is sent
			ctx.waitUntil(client.end());

			// Return our result rows as JSON
			return Response.json({ result: result });
		} catch (e) {
			console.log(e);
			// Note: JSON.stringify on an Error yields "{}", so stringify the message
			return Response.json({ error: String(e) }, { status: 500 });
		}
	},
};</code></pre>
            <p>The code above is intentionally simple, but hopefully you can see the magic: our database driver gets a connection string from Hyperdrive, and is none the wiser. It doesn’t need to know anything about Hyperdrive, we don’t have to toss out our favorite query builder library, and we can immediately realize the speed benefits when making queries.</p><p>Connections are automatically pooled and kept warm, our most popular queries are cached, and our entire application gets faster.</p><p>We’ve also built out <a href="https://developers.cloudflare.com/hyperdrive/examples/">guides for every major database provider</a> to make it easy to get what you need from them (a connection string) into Hyperdrive.</p>
    <div>
      <h3>Going fast can’t be cheap, right?</h3>
      <a href="#going-fast-cant-be-cheap-right">
        
      </a>
    </div>
    <p>We think Hyperdrive is critical to accessing your existing databases when building on Cloudflare Workers: traditional databases were just never designed for a world where clients are globally distributed.</p><p><b>Hyperdrive’s connection pooling will always be free</b>, for both database protocols we support today and new database protocols we add in the future. Just like <a href="https://www.cloudflare.com/ddos/">DDoS protection</a> and our global <a href="https://www.cloudflare.com/application-services/products/cdn/">CDN</a>, we think access to Hyperdrive’s core feature is too useful to hold back.</p><p>During the open beta, Hyperdrive itself will not incur any charges for usage, regardless of how you use it. We’ll be announcing more details on how Hyperdrive will be priced closer to GA (early in 2024), with plenty of notice.</p>
    <div>
      <h3>Time to query</h3>
      <a href="#time-to-query">
        
      </a>
    </div>
    <p>So where to from here for Hyperdrive?</p><p>We’re planning on bringing Hyperdrive to GA in early 2024 — and we’re focused on landing more controls over how we cache &amp; automatically invalidate based on writes, detailed query and performance analytics (soon!), support for more database engines (including MySQL) as well as continuing to work on making it even faster.</p><p>We’re also working to enable private network connectivity via <a href="https://developers.cloudflare.com/magic-wan/">Magic WAN</a> and Cloudflare Tunnel, so that you can connect to databases that aren’t (or can’t be) exposed to the public Internet.</p><p>To connect Hyperdrive to your existing database, visit our <a href="https://developers.cloudflare.com/hyperdrive/">developer docs</a> — it takes less than a minute to create a Hyperdrive and update existing code to use it. Join the <i>#hyperdrive-beta</i> channel in our <a href="https://discord.cloudflare.com/">Developer Discord</a> to ask questions, surface bugs, and talk to our Product &amp; Engineering teams directly.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1BbZolVBokU7h0EedVawor/e98def78df39541f03a595ef2e083387/image3-31.png" />
            
            </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Database]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">3buVHDU7WwOkwln1S36n02</guid>
            <dc:creator>Matt Silverlock</dc:creator>
            <dc:creator>Alex Robinson</dc:creator>
        </item>
        <item>
            <title><![CDATA[D1: open beta is here]]></title>
            <link>https://blog.cloudflare.com/d1-open-beta-is-here/</link>
            <pubDate>Thu, 28 Sep 2023 13:00:14 GMT</pubDate>
            <description><![CDATA[ D1 is now in open beta, and the theme is “scale”: with higher per-database storage limits and the ability to create more databases, we’re unlocking the ability for developers to build production-scale applications on D1 ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4sioTCCEWQ0hiLg5ZSCD46/c53658e14bc379bea56cd0f3fed1d42b/image1-37.png" />
            
            </figure><p><b>D1 is now in open beta</b>, and the theme is “scale”: with higher per-database storage limits <i>and</i> the ability to create more databases, we’re unlocking the ability for developers to build production-scale applications on D1. Any developers with an existing paid Workers plan don’t need to lift a finger to benefit: we’ve retroactively applied this to all existing D1 databases.</p><p>If you missed the <a href="/d1-turning-it-up-to-11/">last D1 update</a> back during Developer Week, the <a href="https://developers.cloudflare.com/d1/changelog/">multitude of updates in the changelog</a>, or are just new to D1 in general: read on.</p>
    <div>
      <h3>Remind me: D1? Databases?</h3>
      <a href="#remind-me-d1-databases">
        
      </a>
    </div>
    <p>D1 is our <a href="https://www.cloudflare.com/developer-platform/products/d1/">native serverless database</a>, which we launched into alpha in November last year: the queryable database complement to <a href="https://developers.cloudflare.com/kv/">Workers KV</a>, <a href="https://developers.cloudflare.com/durable-objects/">Durable Objects</a> and <a href="https://developers.cloudflare.com/r2/">R2</a>.</p><p>When we set out to build D1, we knew a few things for certain: it needed to be fast, it needed to be incredibly easy to create a database, and it needed to be SQL-based.</p><p>That last one was critical: so that developers could a) avoid learning another custom query language and b) make it easier for existing query builders, ORM (object relational mapper) libraries and other tools to connect to D1 with minimal effort. From this, we’ve seen a huge number of projects build support in for D1: from support for D1 in the <a href="https://github.com/drizzle-team/drizzle-orm/blob/main/examples/cloudflare-d1/README.md">Drizzle ORM</a> and <a href="https://developers.cloudflare.com/d1/platform/community-projects/#d1-adapter-for-kysely-orm">Kysely</a>, to the <a href="https://t4stack.com/">T4 App</a>, a full-stack toolkit that uses D1 as its database.</p><p>We also knew that D1 couldn’t be the only way to query a database from Workers: for teams with existing databases and thousands of lines of SQL or existing ORM code, migrating across to D1 isn’t going to be an afternoon’s work. For those teams, we built <a href="/hyperdrive-making-regional-databases-feel-distributed/">Hyperdrive</a>, allowing you to connect to your existing databases and make them feel global. We think this gives teams flexibility: combine D1 and Workers for globally distributed apps, and use Hyperdrive for querying the databases you have in legacy clouds and just can’t get rid of overnight.</p>
    <div>
      <h3>Larger databases, and more of them</h3>
      <a href="#larger-databases-and-more-of-them">
        
      </a>
    </div>
    <p>This has been the biggest ask from the thousands of D1 users throughout the alpha: not just more databases, but also <i>bigger</i> databases.</p><p><b>Developers on the Workers paid plan will now be able to grow each database up to 2GB and create 50,000 databases (up from 500MB and 10). Yes, you read that right: 50,000 databases per account. This unlocks a whole raft of database-per-user use-cases and enables true isolation between customers, something that traditional relational database deployments can’t match.</b></p><p>We’ll be continuing to work on unlocking even larger databases over the coming weeks and months: developers using the D1 beta will see automatic increases to these limits published on <a href="https://developers.cloudflare.com/d1/changelog/">D1’s public changelog</a>.</p><p>One of the biggest impediments to double-digit-gigabyte databases is performance: we want to ensure that a database can load in and be ready <i>really</i> quickly — cold starts of seconds (or more) just aren’t acceptable. A 10GB or 20GB database that takes 15 seconds before it can answer a query ends up being pretty frustrating to use.</p><p>Users on the <a href="https://www.cloudflare.com/plans/free/">Workers free plan</a> will keep the ten 500MB databases (<a href="https://developers.cloudflare.com/d1/changelog/#per-database-limit-now-500-mb">changelog</a>) forever: we want to give more developers the room to experiment with D1 and Workers before jumping in.</p>
    <div>
      <h3>Time Travel is here</h3>
      <a href="#time-travel-is-here">
        
      </a>
    </div>
    <p><a href="https://developers.cloudflare.com/d1/learning/time-travel/">Time Travel</a> allows you to roll your database back to a specific point in time: specifically, any minute in the last 30 days. And it’s enabled by default for every D1 database, doesn’t cost any more, and doesn’t count against your storage limit.</p><p>For those who have been keeping tabs: we originally announced Time Travel earlier this year, and made it <a href="https://developers.cloudflare.com/d1/changelog/#time-travel">available to all D1 users in July</a>. At its core, it’s deceptively simple: Time Travel introduces the concept of a “bookmark” to D1. A bookmark represents the state of a database at a specific point in its history, which is effectively an append-only log of changes. Time Travel can take a timestamp and turn it into a bookmark, or accept a bookmark directly, allowing you to restore back to that point. Even better: restoring doesn’t prevent you from going back further.</p><p>We think Time Travel works best with an example, so let’s make a change to a database: one with an Order table that stores every order made against our e-commerce store:</p>
            <pre><code># To illustrate: we have 89,185 unique addresses in our order database. 
➜  wrangler d1 execute northwind --command "SELECT count(distinct ShipAddress) FROM [Order]" 
┌─────────────────────────────┐
│ count(distinct ShipAddress) │
├─────────────────────────────┤
│ 89185                       │
└─────────────────────────────┘</code></pre>
            <p>OK, great. Now what if we wanted to make a change to a specific set of orders: an address change or freight company change?</p>
            <pre><code># I think we might be forgetting something here...
➜  wrangler d1 execute northwind --command "UPDATE [Order] SET ShipAddress = 'Av. Veracruz 38, Roma Nte., Cuauhtémoc, 06700 Ciudad de México, CDMX, Mexico'" </code></pre>
            <p>Wait: we’ve made a mistake that many, many folks have made before: we forgot the WHERE clause on our UPDATE query. Instead of updating a specific order ID, we’ve updated the ShipAddress for every order in our table.</p>
            <pre><code># Every order is now going to a wine bar in Mexico City. 
➜  wrangler d1 execute northwind --command "SELECT count(distinct ShipAddress) FROM [Order]" 
┌─────────────────────────────┐
│ count(distinct ShipAddress) │
├─────────────────────────────┤
│ 1                           │
└─────────────────────────────┘</code></pre>
            <p>Panic sets in. Did we remember to make a backup before we did this? How long ago was it? Did we turn on point-in-time recovery? It seemed potentially expensive at the time…</p><p>It’s OK. We’re using D1. We can Time Travel. It’s on by default: let’s fix this and travel back a few minutes.</p>
            <pre><code># Let's go back in time.
➜  wrangler d1 time-travel restore northwind --timestamp="2023-09-23T14:20:00Z"

🚧 Restoring database northwind from bookmark 0000000b-00000002-00004ca7-9f3dba64bda132e1c1706a4b9d44c3c9
✔ OK to proceed (y/N) … yes

⚡️ Time travel in progress...
✅ Database northwind restored back to bookmark 00000000-00000004-00004ca7-97a8857d35583887de16219c766c0785
↩️ To undo this operation, you can restore to the previous bookmark: 00000013-ffffffff-00004ca7-90b029f26ab5bd88843c55c87b26f497</code></pre>
            <p>Let's check if it worked:</p>
            <pre><code># Phew. We're good. 
➜  wrangler d1 execute northwind --command "SELECT count(distinct ShipAddress) FROM [Order]" 
┌─────────────────────────────┐
│ count(distinct ShipAddress) │
├─────────────────────────────┤
│ 89185                       │
└─────────────────────────────┘</code></pre>
    <p>We think that Time Travel becomes even more powerful when you have many smaller databases, too: the downside of any restore operation is reduced further and scoped to a single user or tenant.</p><p>This is also just the beginning for Time Travel: we’re working to support not just restoring a database, but also the ability to fork from and overwrite existing databases. If you can fork a database with a single command and/or test migrations and schema changes against real data, you can de-risk a lot of the traditional challenges that working with databases has historically involved.</p>
    <div>
      <h3>Row-based pricing</h3>
      <a href="#row-based-pricing">
        
      </a>
    </div>
    <p><a href="/d1-turning-it-up-to-11/#not-going-to-burn-a-hole-in-your-wallet">Back in May</a> we announced pricing for D1, to a lot of positive feedback around how much we’d included in our Free and Paid plans. In August, we published a new row-based model, replacing the prior byte-units, that makes it easier to predict and quantify your usage. Specifically, we moved to rows as it’s easier to reason about: if you’re writing a row, it doesn’t matter if it’s 1KB or 1MB. If your read query uses an indexed column to filter on, you’ll see not only performance benefits, but cost savings too.</p><p>Here’s D1’s pricing — almost everything has stayed the same, with the added benefit of charging based on rows:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4053N3dvxuEp46TQG6xec9/74244f620374666d3b8fcbcf5d0016bb/Screenshot-2023-09-29-at-09.33.51.png" />
            
            </figure><p>D1’s pricing — you can find more details in <a href="https://developers.cloudflare.com/d1/platform/pricing/">D1’s public documentation</a>.</p><p>As before, D1 does not charge you for “database hours”, the number of databases, or point-in-time recovery (<a href="https://developers.cloudflare.com/d1/learning/time-travel/">Time Travel</a>) — just query D1 and pay for your reads, writes, and storage — that’s it.</p><p>We believe this not only makes D1 far more cost-efficient, but also makes it easier to manage multiple databases to isolate customer data or prod vs. staging: we don’t care <i>which</i> database you query. Manage your data how you like, separate your customer data, and avoid falling into the trap of “Billing Based Architecture”, where you build solely around how you’re charged, even if it’s not intuitive or what makes sense for your team.</p><p>To make it easier to both see how much a given query costs <i>and</i> when to <a href="https://developers.cloudflare.com/d1/learning/using-indexes/">optimize your queries with indexes</a>, D1 also returns the number of rows a query read or wrote (or both) so that you can understand what it’s costing you in both cents and speed.</p><p>For example, the following query filters over orders based on date:</p>
            <pre><code>SELECT * FROM [Order] WHERE ShippedDate &gt; '2016-01-22'

[
  {
    "results": [],
    "success": true,
    "meta": {
      "duration": 5.032,
      "size_after": 33067008,
      "rows_read": 16818,
      "rows_written": 0
    }
  }
]</code></pre>
            <p>The unindexed query above scans 16,818 rows. Even if we don’t optimize it, D1 includes 25 billion rows read per month on the Workers paid plan, meaning we could run this query 1.4 million times in a month before having to worry about extra costs.</p><p>But we can do better with an index:</p>
            <pre><code>CREATE INDEX IF NOT EXISTS idx_orders_date ON [Order](ShippedDate)</code></pre>
            <p>With the index created, let’s see how many rows our query needs to read now:</p>
            <pre><code>SELECT * FROM [Order] WHERE ShippedDate &gt; '2016-01-22'

[
  {
    "results": [],
    "success": true,
    "meta": {
      "duration": 3.793,
      "size_after": 33067008,
      "rows_read": 417,
      "rows_written": 0
    }
  }
]</code></pre>
            <p>The same query with an index on the ShippedDate column reads just 417 rows: not only is it faster (duration is in milliseconds!), but it costs us less: we could run this query 59 million times per month before we’d have to pay any more than what the $5 Workers plan gives us.</p><p>D1 also <a href="https://developers.cloudflare.com/d1/platform/metrics-analytics/#metrics">exposes row counts</a> via both the Cloudflare dashboard and our GraphQL analytics API: so not only can you look at this per-query when you’re tuning performance, but also break down query patterns across all of your databases.</p>
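<p>The arithmetic behind those figures is straightforward: divide the monthly included rows read by the <code>rows_read</code> each query reports. This sketch assumes the paid plan’s 25 billion included rows read; check the pricing docs for current limits:</p>

```typescript
// Rough math, not a billing calculator: how many times a query fits inside
// the monthly included rows-read allowance, given its reported rows_read.
const INCLUDED_ROWS_READ = 25_000_000_000; // assumed paid-plan monthly allowance

function queriesWithinAllowance(rowsReadPerQuery: number): number {
  return Math.floor(INCLUDED_ROWS_READ / rowsReadPerQuery);
}

const unindexed = queriesWithinAllowance(16_818); // 1,486,502 — ~1.4 million queries
const indexed = queriesWithinAllowance(417);      // 59,952,038 — ~59 million queries

console.log({ unindexed, indexed });
```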
    <div>
      <h3>D1 for Platforms</h3>
      <a href="#d1-for-platforms">
        
      </a>
    </div>
    <p>Throughout D1’s alpha period, we’ve both heard from and worked with teams who are excited about D1’s ability to scale out horizontally: the ability to deploy a database-per-customer (or user!) in order to keep data closer to where teams access it <i>and</i> more strongly isolate that data from their other users.</p><p>Teams building the next big thing on <a href="https://developers.cloudflare.com/cloudflare-for-platforms/workers-for-platforms/">Workers for Platforms</a> — think of it as “Functions as a Service, as a Service” — can use D1 to deploy a <b>database per user</b> — keeping customer data strongly separated from each other.</p><p>For example, and as one of the early adopters of D1, <a href="https://twitter.com/roninapp">RONIN</a> is building an edge-first content &amp; data platform backed by a dedicated D1 database per customer, which allows customers to place data closer to users and provides each customer isolation from the queries of others.</p><p>Instead of spinning up and managing countless traditional database instances, RONIN uses D1 for Platforms to offer automatic infinite scalability at the edge. This allows RONIN to focus on providing an intuitive editing experience for your content.</p><p>When it comes to enabling “D1 for Platforms”, we’ve thought about this in a few ways from the very beginning:</p><ul><li><p><b>Support for 100,000+ databases for Workers for Platforms users — there’s no limit, but if we said “unlimited” you might not believe us — on top of the 50,000 databases per account that D1 already enables.</b></p></li><li><p>D1’s pricing: you don’t pay per-database or for “idle databases”. 
If you have a range of users, from thousands of QPS down to 1-2 every 10 minutes — you aren’t paying more for “database hours” on the less trafficked databases, or having to plan around spiky workloads across your user-base.</p></li><li><p>The ability to programmatically configure more databases via <a href="https://developers.cloudflare.com/api/operations/cloudflare-d1-create-database">D1’s HTTP API</a> <i>and</i> <a href="https://developers.cloudflare.com/api/operations/worker-script-patch-settings">attach them to your Worker</a> without re-deploying. There’s no “provisioning” delay, either: you create the database, and it’s immediately ready to query by you or your users.</p></li><li><p>Detailed <a href="https://developers.cloudflare.com/d1/platform/metrics-analytics/">per-database analytics</a>, so you can understand which databases are being used and how they’re being queried via D1’s GraphQL analytics API.</p></li></ul><p>If you’re building the next big platform on top of Workers &amp; want to use D1 at scale — whether you’re part of the <a href="https://www.cloudflare.com/lp/workers-launchpad/">Workers Launchpad program</a> or not — reach out.</p>
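<p>For example, a platform’s signup path might create a database per new tenant over the HTTP API. A minimal sketch that only builds the request — the endpoint path and payload shape follow the v4 API convention, but treat them as assumptions and confirm against the linked API docs before relying on them:</p>

```typescript
// Sketch: construct the request to create a D1 database for a new tenant.
// Path and payload are assumptions based on Cloudflare's v4 API conventions.
const API_BASE = "https://api.cloudflare.com/client/v4";

function createDatabaseRequest(accountId: string, apiToken: string, name: string) {
  return {
    url: `${API_BASE}/accounts/${accountId}/d1/database`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ name }),
    },
  };
}

// One database per customer, created at signup time:
const req = createDatabaseRequest("<account-id>", "<api-token>", "customer-1234");
// const res = await fetch(req.url, req.init); // then attach it to your Worker
console.log(req.url);
```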
    <div>
      <h3>What’s next for D1?</h3>
      <a href="#whats-next-for-d1">
        
      </a>
    </div>
    <p><b>We’re setting a clear goal: we want to make D1 “generally available” (GA) for production use-cases by early next year</b> <b>(Q1 2024)</b>. Although you can already use D1 without a waitlist or approval process, we understand that the GA label is an important one for many when it comes to a database (as it is for us).</p><p>Between now and GA, we’re working on some really key parts of the D1 vision, with a continued focus on reliability and performance.</p><p>One of the biggest remaining pieces of that vision is global read replication, which we <a href="/d1-turning-it-up-to-11/">wrote about earlier this year</a>. Importantly, replication will be free, won’t multiply your storage consumption, and will still enable session consistency (read-your-writes). Part of D1’s mission is about getting data closer to where users are, and we’re excited to land it.</p><p>We’re also working to expand <a href="https://developers.cloudflare.com/d1/learning/time-travel/">Time Travel</a>, D1’s built-in point-in-time recovery capabilities, so that you can branch and/or clone a database from a specific point-in-time on the fly.</p><p>We’ll also <b>be progressively opening up our limits around per-database storage, unlocking more storage per account, and the number of databases you can create over the rest of this year</b>, so keep an eye on the D1 <a href="https://developers.cloudflare.com/d1/changelog/">changelog</a> (or your inbox).</p><p>In the meantime, if you haven’t yet used D1, you can <a href="https://developers.cloudflare.com/d1/get-started/">get started</a> right now, visit D1’s <a href="https://developers.cloudflare.com/d1/">developer documentation</a> to spark some ideas, or <a href="https://discord.cloudflare.com/">join the #d1-beta channel</a> on our Developer Discord to talk to other D1 developers and our product-engineering team.</p>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Database]]></category>
            <category><![CDATA[D1]]></category>
            <guid isPermaLink="false">5I0knbF5YIn2PbvvOTa1q2</guid>
            <dc:creator>Matt Silverlock</dc:creator>
            <dc:creator>Ben Yule</dc:creator>
        </item>
        <item>
            <title><![CDATA[Vectorize: a vector database for shipping AI-powered applications to production, fast]]></title>
            <link>https://blog.cloudflare.com/vectorize-vector-database-open-beta/</link>
            <pubDate>Wed, 27 Sep 2023 13:00:31 GMT</pubDate>
            <description><![CDATA[ Vectorize is our brand-new vector database offering, designed to let you build full-stack, AI-powered applications entirely on Cloudflare’s global network: and you can start building with it right away ]]></description>
            <content:encoded><![CDATA[
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4UEegJQ4EtPbnJwh7UQZcd/2221274c908415bce2e1eba81a115d90/image2-21.png" />
            
            </figure><p>Vectorize is our brand-new <a href="https://www.cloudflare.com/learning/ai/what-is-vector-database/">vector database</a> offering, designed to let you build full-stack, AI-powered applications entirely on Cloudflare’s global network: and you can start building with it right away. Vectorize is in open beta, and is available to any developer using <a href="https://workers.cloudflare.com/">Cloudflare Workers</a>.</p><p>You can use Vectorize with <a href="/workers-ai">Workers AI</a> to power semantic search, classification, recommendation and anomaly detection use-cases directly with Workers, improve the accuracy and context of answers from <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/">LLMs (Large Language Models)</a>, and/or bring-your-own <a href="https://www.cloudflare.com/learning/ai/what-are-embeddings/">embeddings</a> from popular platforms, including OpenAI and Cohere.</p><p>Visit <a href="https://developers.cloudflare.com/vectorize/get-started/">Vectorize’s developer documentation</a> to get started, or read on if you want to better understand what vector databases do and how Vectorize is different.</p>
    <div>
      <h2>Why do I need a vector database?</h2>
      <a href="#why-do-i-need-a-vector-database">
        
      </a>
    </div>
    
    <div>
      <h3>Machine learning models can’t remember anything: only what they were trained on.</h3>
      <a href="#machine-learning-models-cant-remember-anything-only-what-they-were-trained-on">
        
      </a>
    </div>
    <p>Vector databases are designed to solve this, by capturing how an ML model represents data — including structured and unstructured text, images and audio — and storing it in a way that allows you to compare against <i>future</i> inputs. This allows us to leverage the power of existing machine-learning models and LLMs (Large Language Models) for content they haven’t been trained on: which, given the tremendous cost of training models, turns out to be extremely powerful.</p><p>To better illustrate why a vector database like Vectorize is useful, let’s pretend they don’t exist, and see how painful it is to give context to an ML model or LLM for a semantic search or recommendation task. Our goal is to understand what content is similar to our query and return it: based on our own dataset.</p><ol><li><p>Our user query comes in: they’re searching for “how to write to R2 from Cloudflare Workers”</p></li><li><p>We load up our entire documentation dataset — a thankfully “small” dataset at about 65,000 sentences, or 2.1 GB — and provide it alongside the query from our user. 
This allows the model to have the context it needs, based on our data.</p></li><li><p><b>We wait.</b></p></li><li><p><b>(A long time)</b></p></li><li><p>We get our similarity scores back, with the sentences most similar to the user’s query, and then work to map those back to URLs before we return our search results.</p></li></ol><p>… and then another query comes in, and we have to start this all over again.</p><p>In practice, this isn’t really possible: we can’t pass that much context in an API call (prompt) to most <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning models</a>, and even if we could, it’d take tremendous amounts of memory and time to process our dataset over and over again.</p><p>With a vector database, we don’t have to repeat step 2: we perform it once, or as our dataset updates, and use our vector database to provide a form of long-term memory for our machine learning model. Our workflow looks a little more like this:</p><ol><li><p>We load up our entire documentation dataset, run it through our model, and store the resulting vector embeddings in our vector database (just once).</p></li><li><p>For each user query (and only the query), we run it through the same model and retrieve a vector representation.</p></li><li><p>We query our vector database with that query vector, which returns the vectors closest to our query vector.</p></li></ol><p>If we look at these two flows side by side, we can quickly see how inefficient and impractical it is to use our own dataset with an existing model without a vector database:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6nVc2lsVxxlTjVWb5fF8Gn/df03f68c7792ece281f887608f0bad2f/image4-11.png" />
            
            </figure><p><i>Using a vector database to help machine learning models remember.</i></p><p>From this simple example, it’s probably starting to make some sense, but you might also be wondering why you need a vector database instead of just a regular database.</p><p>Vectors are the model’s representation of an input: how it maps that input to its internal structure, or “features”. Broadly, the more similar two vectors are, the more similar the model believes their inputs to be, based on how it extracts features from each input.</p><p>This is seemingly easy when we look at example vectors of only a handful of dimensions. But with real-world outputs, searching across 10,000 to 250,000 vectors, each potentially 1,536 dimensions wide, is non-trivial. This is where vector databases come in: to make search work at scale, vector databases use a specific class of algorithms, such as k-nearest neighbors (<a href="https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm">kNN</a>) or other approximate nearest neighbor (ANN) <a href="https://arxiv.org/abs/1603.09320">algorithms</a>, to determine vector similarity.</p><p>And although vector databases are extremely useful when building <a href="https://www.cloudflare.com/learning/ai/what-is-artificial-intelligence/">AI</a> and machine learning powered applications, they’re not <i>only</i> useful in those use-cases: they can be used for a multitude of classification and anomaly detection tasks. Knowing whether a query input is similar to — or potentially dissimilar from — other inputs can power content moderation (does this match known-bad content?) and security alerting (have I seen this before?) tasks as well.</p>
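<p>To make the search step concrete, here is a minimal brute-force kNN sketch in TypeScript. It is illustrative only: a real vector database replaces this linear scan with ANN index structures so it never has to compare the query against every stored vector.</p>

```typescript
// Brute-force k-nearest-neighbors over cosine similarity: a naive
// stand-in for the ANN index structures a real vector database uses.
type StoredVector = { id: string; values: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored vector against the query and keep the topK results.
// This is O(n * dimensions) per query, which is exactly why vector
// databases use approximate indexes once n reaches the tens of thousands.
function knn(
  query: number[],
  stored: StoredVector[],
  topK: number
): { id: string; score: number }[] {
  return stored
    .map((v) => ({ id: v.id, score: cosineSimilarity(query, v.values) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```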
    <div>
      <h2>Building a recommendation engine with vector search</h2>
      <a href="#building-a-recommendation-engine-with-vector-search">
        
      </a>
    </div>
    <p>We built Vectorize to be a powerful partner to <a href="https://developers.cloudflare.com/workers-ai/">Workers AI</a>: enabling you to run vector search tasks as close to users as possible, and without having to think about how to scale it for production.</p><p>We’re going to take a real world example — building a (product) recommendation engine for an e-commerce store — and simplify a few things.</p><p>Our goal is to show a list of “relevant products” on each product listing page: a perfect use-case for vector search. Our input vectors in the example are placeholders, but in a real world application we would generate them based on product descriptions and/or cart data by passing them through a sentence similarity model (such as Worker’s AI’s <a href="https://developers.cloudflare.com/workers-ai/models/embedding/">text embedding model</a>)</p><p>Each vector represents a product across our store, and we associate the URL of the product with it. We could also set the ID of each vector to the product ID: both approaches are valid. Our query — vector search — represents the product description and content for the product user is currently viewing.</p><p>Let’s step through what this looks like in code: this example is pulled straight from our <a href="https://developers.cloudflare.com/vectorize/get-started/">developer documentation</a>:</p>
            <pre><code>export interface Env {
	// This makes our vector index methods available on env.TUTORIAL_INDEX.*
	// e.g. env.TUTORIAL_INDEX.insert() or .query()
	TUTORIAL_INDEX: VectorizeIndex;
}

// Sample vectors: 3 dimensions wide.
//
// Vectors from a machine-learning model are typically ~100 to 1536 dimensions
// wide (or wider still).
const sampleVectors: Array&lt;VectorizeVector&gt; = [
	{ id: '1', values: [32.4, 74.1, 3.2], metadata: { url: '/products/sku/13913913' } },
	{ id: '2', values: [15.1, 19.2, 15.8], metadata: { url: '/products/sku/10148191' } },
	{ id: '3', values: [0.16, 1.2, 3.8], metadata: { url: '/products/sku/97913813' } },
	{ id: '4', values: [75.1, 67.1, 29.9], metadata: { url: '/products/sku/418313' } },
	{ id: '5', values: [58.8, 6.7, 3.4], metadata: { url: '/products/sku/55519183' } },
];

export default {
	async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise&lt;Response&gt; {
		if (new URL(request.url).pathname !== '/') {
			return new Response('', { status: 404 });
		}
		// Insert some sample vectors into our index
		// In a real application, these vectors would be the output of a machine learning (ML) model,
		// such as Workers AI, OpenAI, or Cohere.
		let inserted = await env.TUTORIAL_INDEX.insert(sampleVectors);

		// Log the number of IDs we successfully inserted
		console.info(`inserted ${inserted.count} vectors into the index`);

		// In a real application, we would take a user query - e.g. "durable
		// objects" - and transform it into a vector embedding first.
		//
		// In our example, we're going to construct a simple vector that should
		// match vector id #5
		let queryVector: Array&lt;number&gt; = [54.8, 5.5, 3.1];

		// Query our index and return the three (topK = 3) most similar vector
		// IDs with their similarity score.
		//
		// By default, vector values are not returned, as in many cases the
		// vectorId and scores are sufficient to map the vector back to the
		// original content it represents.
		let matches = await env.TUTORIAL_INDEX.query(queryVector, { topK: 3, returnVectors: true });

		// We could map over our results to find the single most similar
		// vector, as sketched (commented out) below.
		//
		// Since our index uses the 'cosine' distance metric, scores range
		// from -1 to 1. A score of 1 means the vectors are identical; the
		// closer to 1, the more similar. A score of 0 means no correlation,
		// and -1 means the vectors are opposite (least similar).
		// let closestScore = 0;
		// let mostSimilarId = '';
		// matches.matches.map((match) =&gt; {
		// 	if (match.score &gt; closestScore) {
		// 		closestScore = match.score;
		// 		mostSimilarId = match.vectorId;
		// 	}
		// });

		return Response.json({
			// This will return the closest vectors: we'll see that the vector
			// with id = 5 has the highest score (closest to 1.0) as the
			// distance between it and our query vector is the smallest.
			// Return the full set of matches so we can see the possible scores.
			matches: matches,
		});
	},
};</code></pre>
            <p>The code above is intentionally simple, but illustrates vector search at its core: we insert vectors into our database, and query it for vectors with the smallest distance to our query vector.</p><p>Here are the results, with the values included, so we can visually observe that our query vector <code>[54.8, 5.5, 3.1]</code> is similar to our highest scoring match: <code>[58.799, 6.699, 3.400]</code> returned from our search. This index uses <a href="https://en.wikipedia.org/wiki/Cosine_similarity">cosine</a> similarity to calculate the distance between vectors, which means that the closer the score is to 1, the more similar a match is to our query vector.</p>
            <pre><code>{
  "matches": {
    "count": 3,
    "matches": [
      {
        "score": 0.999909,
        "vectorId": "5",
        "vector": {
          "id": "5",
          "values": [
            58.79999923706055,
            6.699999809265137,
            3.4000000953674316
          ],
          "metadata": {
            "url": "/products/sku/55519183"
          }
        }
      },
      {
        "score": 0.789848,
        "vectorId": "4",
        "vector": {
          "id": "4",
          "values": [
            75.0999984741211,
            67.0999984741211,
            29.899999618530273
          ],
          "metadata": {
            "url": "/products/sku/418313"
          }
        }
      },
      {
        "score": 0.611976,
        "vectorId": "2",
        "vector": {
          "id": "2",
          "values": [
            15.100000381469727,
            19.200000762939453,
            15.800000190734863
          ],
          "metadata": {
            "url": "/products/sku/10148191"
          }
        }
      }
    ]
  }
}</code></pre>
            <p>In a real application, we could now quickly return product recommendation URLs based on the most similar products, sorting them by their score (highest to lowest), and increasing the topK value if we want to show more. The metadata stored alongside each vector could also embed a path to an <a href="https://developers.cloudflare.com/r2/">R2 object</a>, a UUID for a row in a <a href="https://www.cloudflare.com/developer-platform/products/d1/">D1 database</a>, or a key-value pair from <a href="https://developers.cloudflare.com/kv/">Workers KV</a>.</p>
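<p>We can also sanity-check the top score by hand. A short TypeScript sketch, using the rounded three-dimensional vectors from the example above, reproduces the reported cosine similarity:</p>

```typescript
// Recompute the top match's cosine similarity from the example above,
// using the rounded three-dimensional vectors.
const queryVec = [54.8, 5.5, 3.1];
const topMatch = [58.8, 6.7, 3.4]; // vector id "5"

const dot = queryVec.reduce((sum, x, i) => sum + x * topMatch[i], 0);
const norm = (v: number[]): number =>
  Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
const score = dot / (norm(queryVec) * norm(topMatch));
// score ≈ 0.99991: in line with the 0.999909 Vectorize reported above.
```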
    <div>
      <h3>Workers AI + Vectorize: full stack vector search on Cloudflare</h3>
      <a href="#workers-ai-vectorize-full-stack-vector-search-on-cloudflare">
        
      </a>
    </div>
    <p>In a real application, we need a machine learning model that can both generate vector embeddings from our original dataset (to seed our database) and <i>quickly</i> turn user queries into vector embeddings too. These need to be from the same model, as each model represents features differently.</p><p>Here’s a compact example building an entire end-to-end vector search pipeline on Cloudflare:</p>
            <pre><code>import { Ai } from '@cloudflare/ai';
export interface Env {
	TEXT_EMBEDDINGS: VectorizeIndex;
	AI: any;
}
interface EmbeddingResponse {
	shape: number[];
	data: number[][];
}

export default {
	async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise&lt;Response&gt; {
		const ai = new Ai(env.AI);
		let path = new URL(request.url).pathname;
		if (path.startsWith('/favicon')) {
			return new Response('', { status: 404 });
		}

		// We only need to generate vector embeddings once (or as our
		// data changes), not on every request
		if (path === '/insert') {
			// In a real-world application, we could read in content from R2 or
			// a SQL database (like D1) and pass it to Workers AI
			const stories = ['This is a story about an orange cloud', 'This is a story about a llama', 'This is a story about a hugging emoji'];
			const modelResp: EmbeddingResponse = await ai.run('@cf/baai/bge-base-en-v1.5', {
				text: stories,
			});

			// We need to convert the vector embeddings into a format Vectorize can accept.
			// Each vector needs an id, a value (the vector) and optional metadata.
			// In a real app, our ID would typically be bound to the ID of the source
			// document.
			let vectors: VectorizeVector[] = [];
			let id = 1;
			modelResp.data.forEach((vector) =&gt; {
				vectors.push({ id: `${id}`, values: vector });
				id++;
			});

			await env.TEXT_EMBEDDINGS.upsert(vectors);
		}

		// Our query: we expect this to match vector id: 1 in this simple example
		let userQuery = 'orange cloud';
		const queryVector: EmbeddingResponse = await ai.run('@cf/baai/bge-base-en-v1.5', {
			text: [userQuery],
		});

		let matches = await env.TEXT_EMBEDDINGS.query(queryVector.data[0], { topK: 1 });
		return Response.json({
			// We expect vector id: 1 to be our top match with a score of
			// ~0.896888444
			// We are using a cosine distance metric, where the closer to one,
			// the more similar.
			matches: matches,
		});
	},
};</code></pre>
            <p>The code above does four things:</p><ol><li><p>It passes the three sentences to Workers AI’s <a href="https://developers.cloudflare.com/workers-ai/models/embedding/">text embedding model</a> (<code>@cf/baai/bge-base-en-v1.5</code>) and retrieves their vector embeddings.</p></li><li><p>It inserts those vectors into our Vectorize index.</p></li><li><p>It takes the user query and transforms it into a vector embedding via the same Workers AI model.</p></li><li><p>It queries our Vectorize index for matches.</p></li></ol><p>This example might look “too” simple, but in a production application, we’d only have to change two things: insert our vectors once (or periodically via <a href="https://developers.cloudflare.com/workers/configuration/cron-triggers/">Cron Triggers</a>), and replace our three example sentences with real data stored in R2, a D1 database, or another storage provider.</p><p>In fact, this is incredibly similar to how we run <a href="https://developers.cloudflare.com/workers/ai/">Cursor</a>, the AI assistant that can answer questions about Cloudflare Workers: we migrated Cursor to run on Workers AI and Vectorize. We generate text embeddings from our developer documentation using its built-in text embedding model, insert them into a Vectorize index, and transform user queries on the fly via that same model.</p>
    <div>
      <h2>BYO embeddings from your favorite AI API</h2>
      <a href="#byo-embeddings-from-your-favorite-ai-api">
        
      </a>
    </div>
    <p>Vectorize isn’t just limited to Workers AI, though: it’s a fully-fledged, standalone vector database.</p><p>If you’re already using <a href="https://platform.openai.com/docs/guides/embeddings">OpenAI’s Embedding API</a>, Cohere’s <a href="https://docs.cohere.com/reference/embed">multilingual model</a>, or any other embedding API, then you can easily bring-your-own (BYO) vectors to Vectorize.</p><p>It works just the same: generate your embeddings, insert them into Vectorize, and pass your queries through the model before you query your index. Vectorize includes a few shortcuts for some of the most popular embedding models.</p>
            <pre><code># Vectorize has ready-to-go presets that set the dimensions and distance metric for popular embeddings models
$ wrangler vectorize create openai-index-example --preset=openai-text-embedding-ada-002</code></pre>
            <p>This can be particularly useful if you already have an existing workflow around an existing embeddings API, and/or have validated a specific multimodal or multilingual embeddings model for your use-case.</p>
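<p>As a sketch of what BYO looks like in practice, the adapter below maps an OpenAI-style embeddings response into the <code>id</code> / <code>values</code> / <code>metadata</code> shape that Vectorize accepts. The <code>ExternalEmbeddingResponse</code> shape and <code>toVectorizeVectors()</code> helper are illustrative assumptions, not part of any API; check your embedding provider’s reference for the exact response format.</p>

```typescript
// Adapt an OpenAI-style embeddings response into the vector shape that
// Vectorize accepts. The response interface below is an illustrative
// assumption: check your embedding provider's API reference for the
// exact shape it returns.
interface ExternalEmbeddingResponse {
  data: { index: number; embedding: number[] }[];
}

interface VectorizeVectorShape {
  id: string;
  values: number[];
  metadata?: Record<string, string>;
}

function toVectorizeVectors(
  resp: ExternalEmbeddingResponse,
  sourceIds: string[]
): VectorizeVectorShape[] {
  return resp.data.map((d) => ({
    // Tie each vector back to the source document it was generated from.
    id: sourceIds[d.index],
    values: d.embedding,
    metadata: { source: sourceIds[d.index] },
  }));
}
```

<p>From there, the flow is the same as the Workers AI example above: insert (or upsert) the adapted vectors once, and run every future query through the same external model before querying the index.</p>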
    <div>
      <h2>Making the cost of AI predictable</h2>
      <a href="#making-the-cost-of-ai-predictable">
        
      </a>
    </div>
    <p>There’s a tremendous amount of excitement around AI and ML, but there’s also one big concern: that it’s too expensive to experiment with, and hard to predict at scale.</p><p>With Vectorize, we wanted to bring a simpler pricing model to vector databases. Have an idea for a proof-of-concept at work? That should fit into our free-tier limits. Scaling up and optimizing your embedding dimensions for performance vs. accuracy? It shouldn’t break the bank.</p><p>Importantly, Vectorize aims to be predictable: you don’t need to estimate CPU and memory consumption, which can be hard when you’re just starting out, and made even harder when trying to plan for your peak vs. off-peak hours in production for a brand new use-case. Instead, you’re charged based on the total number of vector dimensions you store, and the number of queries against them each month. It’s our job to take care of scaling up to meet your query patterns.</p><p>Here’s the pricing for Vectorize — and if you have a Workers paid plan now, Vectorize is entirely free to use until 2024:</p>
<table>
<thead>
  <tr>
    <th></th>
    <th><span>Workers Free (coming soon)</span></th>
    <th><span>Workers Paid ($5/month)</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>Queried vector dimensions included</span></td>
    <td><span>30M total queried dimensions / month</span></td>
    <td><span>50M total queried dimensions / month</span></td>
  </tr>
  <tr>
    <td><span>Stored vector dimensions included</span></td>
    <td><span>5M stored dimensions / month</span></td>
    <td><span>10M stored dimensions / month</span></td>
  </tr>
  <tr>
    <td><span>Additional cost </span></td>
    <td><span>$0.04 / 1M vector dimensions queried or stored</span></td>
    <td><span>$0.04 / 1M vector dimensions queried or stored</span></td>
  </tr>
</tbody>
</table><p>Pricing is based entirely on what you store and query: <code>(total vectors queried + stored) * dimensions_per_vector * price</code>. Query more? Easy to predict. Optimizing for smaller dimensions per vector to improve speed and reduce overall latency? Cost goes down. Have a few indexes for prototyping or experimenting with new use-cases? We don’t charge per-index.</p>
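<p>As a sketch of how that plays out, under one reading of the table above (the Workers Paid allowances, with queried and stored dimensions metered separately), and remembering that pricing isn’t final:</p>

```typescript
// Illustrative cost estimate for the Workers Paid plan, reading the
// included allowances from the table above: 50M queried dimensions and
// 10M stored dimensions per month, then $0.04 per additional 1M
// dimensions. Pricing isn't final; treat this as a sketch.
const PRICE_PER_MILLION_DIMS = 0.04;
const INCLUDED_QUERIED_DIMS = 50_000_000;
const INCLUDED_STORED_DIMS = 10_000_000;

function monthlyOverageUSD(queriedDims: number, storedDims: number): number {
  const extraQueried = Math.max(0, queriedDims - INCLUDED_QUERIED_DIMS);
  const extraStored = Math.max(0, storedDims - INCLUDED_STORED_DIMS);
  return ((extraQueried + extraStored) / 1_000_000) * PRICE_PER_MILLION_DIMS;
}

// Example: 60M queried + 12M stored dimensions in a month is 10M + 2M
// over the included allowances: 12 * $0.04 = $0.48 of overage.
const overage = monthlyOverageUSD(60_000_000, 12_000_000);
```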
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/9i10jyPHmjy6FTjqtCD2S/8362250de55ae98d45068fc5d37dc7e4/image1-25.png" />
            
            </figure><p><i>Create as many indexes as you need to prototype new ideas and/or separate production from dev.</i></p><p>As an example: if you load 10,000 Workers AI vectors (384 dimensions each) and make 5,000 queries against your index each day, it’d result in 49 million total vector dimensions queried and <i>still</i> fit into what we include in the Workers Paid plan ($5/month). Better still: we don’t delete your indexes due to inactivity.</p><p>Note that while this pricing isn’t final, we expect few changes going forward. We want to avoid the element of surprise: there’s nothing worse than starting to build on a platform and realizing the pricing is untenable <i>after</i> you’ve invested the time writing code and tests, and learning the nuances of a technology.</p>
    <div>
      <h2>Vectorize!</h2>
      <a href="#vectorize">
        
      </a>
    </div>
    <p>Every Workers developer on a paid plan can start using Vectorize immediately: the open beta is available right now, and you can <a href="https://developers.cloudflare.com/vectorize/">visit our developer documentation to get started</a>.</p><p>This is also just the beginning of the vector database story for us at Cloudflare. Over the next few weeks and months, we intend to land a new query engine that should further improve query performance, support even larger indexes, introduce sub-index filtering capabilities, increase metadata limits, and add per-index analytics.</p><p>If you’re looking for inspiration on what to build, <a href="http://developers.cloudflare.com/vectorize/get-started/embeddings/">see the semantic search tutorial</a> that combines Workers AI and Vectorize for document search, running entirely on Cloudflare. Or an example of <a href="https://developers.cloudflare.com/workers-ai/tutorials/build-a-retrieval-augmented-generation-ai/">how to combine OpenAI and Vectorize</a> to give an LLM more context and dramatically improve the accuracy of its answers.</p><p>And if you have questions for our product &amp; engineering teams about how to use Vectorize, or just want to bounce an idea off of other developers building on Workers AI, join the #vectorize and #workers-ai channels on our <a href="https://discord.cloudflare.com/">Developer Discord</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/V5sZHDJiYORdAiY3o6K6U/cd72b9e7eb6715300ce2b1afe4b7b26a/image6-3.png" />
            
            </figure> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Vectorize]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Database]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">5I4TqJNTxn1vCQd79HEUoZ</guid>
            <dc:creator>Matt Silverlock</dc:creator>
            <dc:creator>Jérôme Schneider</dc:creator>
        </item>
        <item>
            <title><![CDATA[Hardening Workers KV]]></title>
            <link>https://blog.cloudflare.com/workers-kv-restoring-reliability/</link>
            <pubDate>Wed, 02 Aug 2023 13:05:42 GMT</pubDate>
            <description><![CDATA[ A deep dive into the recent incidents relating to Workers KV, and how we’re going to fix them ]]></description>
            <content:encoded><![CDATA[ <p>Over the last couple of months, Workers KV has suffered from a series of incidents, culminating in three back-to-back incidents during the week of July 17th, 2023. These incidents have directly impacted customers that rely on KV — and this isn’t good enough.</p><p>We’re going to share the work we have done to understand why KV has had such a spate of incidents and, more importantly, share in depth what we’re doing to dramatically improve how we deploy changes to KV going forward.</p>
    <div>
      <h3>Workers KV?</h3>
      <a href="#workers-kv">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/developer-platform/workers-kv/">Workers KV</a> — or just “KV” — is a key-value service for storing data: specifically, data with high read throughput requirements. It’s especially useful for user configuration, service routing, small assets and/or authentication data.</p><p>We use KV extensively inside Cloudflare too, with <a href="https://www.cloudflare.com/zero-trust/products/access/">Cloudflare Access</a> (part of our Zero Trust suite) and <a href="https://pages.cloudflare.com/">Cloudflare Pages</a> being some of our highest profile internal customers. Both teams benefit from KV’s ability to keep regularly accessed key-value pairs close to where they’re accessed, as well as its ability to scale out horizontally without any need to become an expert in operating KV.</p><p>Given Cloudflare’s extensive use of KV, it wasn’t just external customers who were impacted. Our own internal teams felt the pain of these incidents, too.</p>
    <div>
      <h3>The summary of the post-mortem</h3>
      <a href="#the-summary-of-the-post-mortem">
        
      </a>
    </div>
    <p>Back in June 2023, we announced the move to a new architecture for KV, which is designed to address two major points of customer feedback we’ve had around KV: high latency for infrequently accessed keys (or keys accessed from different regions), and ensuring the upper bound on KV’s eventual consistency model for writes is 60 seconds — not “mostly 60 seconds”.</p><p>At the time of that blog post, we’d already been testing this internally, including early access with our community champions, and serving a small % of production traffic on it to validate stability and performance expectations beyond what we could emulate within a staging environment.</p><p>However, in the weeks between mid-June and the week of July 17th, when the series of incidents culminated, we continued to increase the volume of traffic onto the new architecture. When we did, we would encounter previously unseen problems (many of them customer-impacting) — then immediately roll back, fix bugs, and repeat. Internally, we’d begun to identify that this pattern was becoming unsustainable — each attempt to cut traffic onto the new architecture would surface errors or behaviors we hadn’t seen before and couldn’t immediately explain, and thus we would roll back and assess.</p><p>The issues at the root of this series of incidents proved significantly challenging to track and observe. Once identified, the two causes themselves were quick to fix, but (1) an observability gap in our error reporting and (2) a mutation to local state that resulted in an unexpected mutation of global state were both hard to observe and reproduce in the days after the customer-facing impact ended.</p>
    <div>
      <h3>The detail</h3>
      <a href="#the-detail">
        
      </a>
    </div>
    <p>One important piece of context to understand before we go into detail on the post-mortem: Workers KV is composed of two separate Workers scripts – internally referred to as the Storage Gateway Worker and SuperCache. SuperCache is an optional path in the Storage Gateway Worker workflow, and is the basis for KV's new (faster) backend (refer to the blog).</p><p>Here is a timeline of events:</p>
<table>
<thead>
  <tr>
    <th><span>Time</span></th>
    <th><span>Description</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>2023-07-17 21:52 UTC</span></td>
    <td><span>Cloudflare observes alerts showing 500 HTTP status codes in the MEL01 data-center (Melbourne, AU) and begins investigating.</span><br /><span>We also begin to see a small set of customers reporting HTTP 500s being returned via multiple channels. It is not immediately clear if this is a data-center-wide issue or KV specific, as there had not been a recent KV deployment, and the issue directly correlated with three data-centers being brought back online.</span></td>
  </tr>
  <tr>
    <td><span>2023-07-18 00:09 UTC</span></td>
    <td><span>We disable the new backend for KV in MEL01 in an attempt to mitigate the issue (noting that there had not been a recent deployment or change to the % of users on the new backend).</span></td>
  </tr>
  <tr>
    <td><span>2023-07-18 05:42 UTC</span></td>
    <td><span>Investigating alerts showing 500 HTTP status codes in VIE02 (Vienna, AT) and JNB01 (Johannesburg, SA).</span></td>
  </tr>
  <tr>
    <td><span>2023-07-18 13:51 UTC</span></td>
    <td><span>The new backend is disabled globally after seeing issues in VIE02 (Vienna, AT) and JNB01 (Johannesburg, SA) data-centers, similar to MEL01. In both cases, they had also recently come back online after maintenance, but it remained unclear as to why KV was failing.</span></td>
  </tr>
  <tr>
    <td><span>2023-07-20 19:12 UTC</span></td>
    <td><span>The new backend is inadvertently re-enabled while deploying the update due to a misconfiguration in a deployment script. </span></td>
  </tr>
  <tr>
    <td><span>2023-07-20 19:33 UTC</span></td>
    <td><span>The new backend is (re-) disabled globally as HTTP 500 errors return.</span></td>
  </tr>
  <tr>
    <td><span>2023-07-20 23:46 UTC</span></td>
    <td><span>Broken Workers script pipeline deployed as part of gradual rollout due to incorrectly defined pipeline configuration in the deployment script.</span><br /><span>Metrics begin to report that a subset of traffic is being black-holed.</span></td>
  </tr>
  <tr>
    <td><span>2023-07-20 23:56 UTC</span></td>
    <td><span>Broken pipeline rolled back; error rates return to pre-incident (normal) levels.</span></td>
  </tr>
</tbody>
</table><p><i>All timestamps referenced are in Coordinated Universal Time (UTC).</i></p><p>We initially observed alerts showing 500 HTTP status codes in the MEL01 data-center (Melbourne, AU) at 21:52 UTC on July 17th, and began investigating. We also received reports from a small set of customers reporting HTTP 500s being returned via multiple channels. This correlated with three data centers being brought back online, and it was not immediately clear if it related to the data centers or was KV-specific — especially given there had not been a recent KV deployment. At 05:42 UTC on July 18th, we began investigating alerts showing 500 HTTP status codes in the VIE02 (Vienna) and JNB01 (Johannesburg) data-centers; while both had recently come back online after maintenance, it was still unclear why KV was failing. At 13:51 UTC, we made the decision to disable the new backend globally.</p><p>Following the incident on July 18th, we attempted to deploy an allow-list configuration to reduce the scope of impacted accounts. However, while attempting to roll out a change for the Storage Gateway Worker at 19:12 UTC on July 20th, an older configuration was progressed, causing the new backend to be enabled again and leading to the third event. As the team worked to fix this and deploy the corrected configuration, they attempted to manually progress the deployment at 23:46 UTC, which passed a malformed configuration value and caused traffic to be sent to an invalid Workers script configuration.</p><p>After all deployments and the broken Workers configuration (pipeline) had been rolled back at 23:56 UTC on July 20th, we spent the following three days working to identify the root cause of the issue. We lacked <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">observability</a> as KV's Worker script (responsible for much of KV's logic) was throwing an unhandled exception very early on in the request handling process. 
This was further exacerbated by prior work to disable error reporting in a disabled data center due to the noise generated, which had previously resulted in logs being rate-limited upstream from our service.</p><p>This previous mitigation prevented us from capturing meaningful logs from the Worker, including identifying the exception itself, as an uncaught exception terminates request processing. This has raised the priority of improving how unhandled exceptions are reported and surfaced in a Worker (see Recommendations, below, for further details). This issue was compounded by the fact that KV's Worker script would fail to re-enter its "healthy" state when a Cloudflare data center was brought back online, as the Worker was mutating an environment variable it perceived to be request-scoped, but which was actually in global scope and persisted across requests. This effectively left the Worker “frozen” with the previous, invalid configuration for the affected locations.</p><p>Further, the introduction of a new progressive release process for Workers KV, designed to de-risk rollouts (as an action from a prior incident), prolonged the incident. We found a bug in the deployment logic that led to a broader outage due to an incorrectly defined configuration.</p><p>This configuration effectively caused us to drop a single-digit percentage of traffic until it was rolled back 10 minutes later. This code is untested at scale, and we need to spend more time hardening it before using it as the default path in production.</p><p>Additionally: although the root cause of the incidents was limited to three Cloudflare data centers (Melbourne, Vienna, and Johannesburg), traffic across these regions still uses these data centers to route reads and writes to our system of record. Because these three data centers participate in KV’s new backend as regional tiers, a portion of traffic across the Oceania, Europe, and Africa regions was affected. 
Only a portion of keys from enrolled namespaces use any given data center as a regional tier, in order to avoid a single (regional) point of failure, so while traffic across <i>all</i> data centers in the region was impacted, nowhere was <i>all</i> traffic in a given data center affected.</p><p>We estimated the affected traffic to be 0.2-0.5% of KV's global traffic (based on our error reporting); however, we observed some customers with error rates approaching 20% of their total KV operations. The impact was spread across KV namespaces and keys for customers within the scope of this incident.</p><p>Both KV’s high total traffic volume and its role as a critical dependency for many customers amplify the impact of even small error rates. In all cases, once the changes were rolled back, errors returned to normal levels and did not persist.</p>
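The global-scope pitfall described above is easy to reproduce in any runtime that reuses state across requests. Below is a simplified, hypothetical Python sketch (the names and configuration values are illustrative, not KV's actual code): mutating what looks like per-request configuration actually mutates state shared by every request the same instance serves.

```python
# "Environment" initialized once per instance (analogous to a Worker
# isolate), NOT once per request.
ENV = {"backend": "stable"}

def handle_request(override=None):
    # BUG: mutating ENV here changes it for every future request served
    # by this instance, not just the current one.
    if override is not None:
        ENV["backend"] = override
    return ENV["backend"]

def handle_request_fixed(override=None):
    # Fix: derive a per-request copy instead of mutating shared state,
    # so later requests still start from the original configuration.
    env = dict(ENV)
    if override is not None:
        env["backend"] = override
    return env["backend"]
```

The buggy handler leaves the override in place for all later requests, which is the "frozen with invalid configuration" behaviour described above; the fixed handler keeps the shared environment untouched.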
    <div>
      <h3>Thinking about risks in building software</h3>
      <a href="#thinking-about-risks-in-building-software">
        
      </a>
    </div>
    <p>Before we dive into what we’re doing to significantly improve how we build, test, deploy and observe Workers KV going forward, we think there are lessons from the real world that can equally apply to how we improve the safety factor of the software we ship.</p><p>In traditional engineering and construction, there is an extremely common procedure known as a “JSEA”, or <a href="https://en.wikipedia.org/wiki/Job_safety_analysis">Job Safety and Environmental Analysis</a> (sometimes just “JSA”). A JSEA is designed to help you iterate through a list of tasks, the potential hazards, and most importantly, the controls that will be applied to prevent those hazards from damaging equipment, injuring people, or worse.</p><p>One of the most critical concepts is the “hierarchy of controls” — that is, what controls should be applied to mitigate these hazards. In most practices, these are elimination, substitution, engineering, administration and personal protective equipment. Elimination and substitution are fairly self-explanatory: is there a different way to achieve this goal? Can we eliminate that task completely? Engineering and administration ask us whether there is additional engineering work that can mitigate the hazard, such as changing the placement of a panel, or using a horizontal boring machine to lay an underground pipe vs. opening up a trench that people can fall into.</p><p>The last and lowest on the hierarchy is personal protective equipment (PPE). A hard hat can protect you from severe injury from something falling from above, but it’s a last resort, and it certainly isn’t guaranteed. In engineering practice, any hazard that <i>only</i> lists PPE as a mitigating factor is unsatisfactory: there must be additional controls in place. For example, instead of only wearing a hard hat, we should <i>engineer</i> the floor of scaffolding so that large objects (such as a wrench) cannot fall through in the first place. 
Further, if we require that all tools are attached to the wearer, we significantly reduce the chance that a tool is dropped in the first place. These controls ensure that there are multiple degrees of mitigation — defense in depth — before your hard hat has to come into play.</p><p>Coming back to software, we can draw parallels between these controls: engineering can be likened to improving automation, gradual rollouts, and detailed metrics. Similarly, personal protective equipment can be likened to code review: useful, but code review cannot be the only thing protecting you from shipping bugs or untested code. Automation with linters, more robust testing, and new metrics are all vastly <i>safer</i> ways of shipping software.</p><p>As we spent time assessing where to improve our existing controls and how to put new controls in place to mitigate risks and improve the reliability (safety) of Workers KV, we took a similar approach: eliminating unnecessary changes; engineering more resilience into our codebase, automation, and deployment tooling; and only then looking at human processes.</p>
    <div>
      <h3>How we plan to get better</h3>
      <a href="#how-we-plan-to-get-better">
        
      </a>
    </div>
    <p>Cloudflare is undertaking a larger, more structured review of KV's observability tooling, release infrastructure and processes to mitigate not only the contributing factors to the incidents within this report, but also those behind other recent incidents related to KV. Critically, we see tooling and automation as the most powerful mechanisms for preventing incidents, with process improvements designed to provide an additional layer of protection. Process improvements alone cannot be the only mitigation.</p><p>Specifically, we have identified and prioritized the below efforts as the most important next steps towards meeting our own availability SLOs, and (above all) making KV a service that customers building on Workers can rely on for storing configuration and service data in the hot path of their traffic:</p><ul><li><p>Substantially improve the existing observability tooling for unhandled exceptions, both for internal teams and customers building on Workers. This is especially critical for high-volume services, where traditional logging alone can be too noisy (and not specific enough) to aid in tracking down these cases. The existing ongoing work to land this will be prioritized further. In the meantime, we have directly addressed the specific uncaught exception in KV's primary Worker script.</p></li><li><p>Improve the safety around the mutation of environment variables in a Worker, which currently operate at "global" (per-isolate) scope, but can appear to be per-request. Mutating an environment variable in request scope mutates the value for all requests transiting that same isolate (in a given location), which can be unexpected. Changes here will need to take backwards compatibility into account.</p></li><li><p>Continue to expand KV’s test coverage to better address the above issues, in parallel with the aforementioned observability and tooling improvements, as an additional layer of defense. 
This includes allowing our test infrastructure to simulate traffic from any source data center, which would have allowed us to more quickly reproduce the issue and identify a root cause.</p></li><li><p>Improvements to our release process, including how KV changes and releases are reviewed and approved going forward. We will enforce a higher level of scrutiny for future changes, and where possible, reduce the number of changes deployed at once. This includes taking on new infrastructure dependencies, which will have a higher bar for both design and testing.</p></li><li><p>Additional logging improvements, including sampling, throughout our request handling process to improve troubleshooting &amp; debugging. A significant amount of the challenge related to these incidents was due to the lack of logging around specific requests (especially non-2xx requests).</p></li><li><p>Review and, where applicable, improve alerting thresholds surrounding error rates. As mentioned previously in this report, sub-1% error rates at a global scale can have a severe negative impact on specific users and/or locations: ensuring that errors are caught and not lost in the noise is an ongoing effort.</p></li><li><p>Address maturity issues with our progressive deployment tooling for Workers, which is net-new (and will eventually be exposed to customers directly).</p></li></ul><p>This is not an exhaustive list: we're continuing to expand on preventative measures associated with these and other incidents. These changes will not only improve KV's reliability, but also that of other services across Cloudflare that KV relies on, or that rely on KV.</p><p>We recognize that KV hasn’t lived up to our customers’ expectations recently. Because we rely on KV so heavily internally, we’ve felt that pain first hand as well. The work to fix the issues that led to this cycle of incidents is already underway. 
That work will not only improve KV’s reliability but also improve the reliability of any software written on the Cloudflare Workers developer platform, whether by our customers or by ourselves.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Post Mortem]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">6sRjpTRuwGjPJmHgwHlg7u</guid>
            <dc:creator>Matt Silverlock</dc:creator>
            <dc:creator>Charles Burnett</dc:creator>
            <dc:creator>Rob Sutter</dc:creator>
            <dc:creator>Kris Evans</dc:creator>
        </item>
        <item>
            <title><![CDATA[D1: We turned it up to 11]]></title>
            <link>https://blog.cloudflare.com/d1-turning-it-up-to-11/</link>
            <pubDate>Fri, 19 May 2023 13:05:00 GMT</pubDate>
            <description><![CDATA[ We've been heads down iterating on D1, and we've just shipped a major new version that's substantially faster, more reliable, and introduces Time Travel: the ability to restore a D1 database to any point in time. ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4yviS5FtF4KbuDnCWmxXmU/a99a32be9b00040a1d2cd20dc284e364/image1-51.png" />
            
            </figure><p>We’re not going to bury the lede: we’re excited to launch a major update to our D1 database, with dramatic improvements to performance and scalability. Alpha users (which includes <i>any</i> Workers user) can create new databases using the new storage backend right now with the following command:</p>
            <pre><code>$ wrangler d1 create your-database --experimental-backend</code></pre>
            <p>In the coming weeks, it’ll be the default experience for everyone, but we want to invite developers to start experimenting with the new version of D1 immediately. We’ll also be sharing more about how we built D1’s new storage subsystem, and how it benefits from Cloudflare’s distributed network, very soon.</p>
    <div>
      <h3>Remind me: What’s D1?</h3>
      <a href="#remind-me-whats-d1">
        
      </a>
    </div>
    <p>D1 is Cloudflare’s <a href="https://www.cloudflare.com/developer-platform/products/d1/">native serverless database</a>, which we <a href="/d1-open-alpha/">launched into alpha</a> in November last year. Developers have been building complex applications with Workers, KV, Durable Objects, and more recently, Queues &amp; R2, but they’ve also been consistently asking us for one thing: a database they can query.</p><p>We also heard consistent feedback that it should be SQL-based, scale-to-zero, and (just like Workers itself) take a Region: Earth approach to replication. And so we took that feedback and set out to build D1, with SQLite giving us a familiar SQL dialect, a robust query engine, and one of the most battle-tested codebases to build on.</p><p>We shipped the first version of D1 as a “real” alpha: a way for us to develop in the open, gather feedback directly from developers, and better prioritize what matters. And living up to the alpha moniker, there were bugs, performance issues and a fairly narrow “happy path”.</p><p>Despite that, we’ve seen developers spin up thousands of databases, make billions of queries, popular ORMs like <a href="https://github.com/drizzle-team/drizzle-orm">Drizzle</a> and <a href="https://github.com/kysely-org/kysely">Kysely</a> add support for D1 (already!), and <a href="https://github.com/jose-donato/race-stack">Remix</a> and <a href="https://github.com/Atinux/nuxt-todos-edge">Nuxt</a> templates build directly around it, as well.</p>
    <div>
      <h3>Turning it up to 11</h3>
      <a href="#turning-it-up-to-11">
        
      </a>
    </div>
    <p>If you’ve used D1 in its alpha state to date: forget everything you know. D1 is now substantially faster: up to 20x faster on the well-known <a href="https://github.com/cloudflare/d1-northwind">Northwind Traders Demo</a>, which we’ve <a href="https://northwind.d1sql.com/">just migrated</a> to use our new storage backend:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4sD4jP1YsKJQN81B4R2Mfv/2747c22b32fc38d1f02b4e47c61ed683/image5-12.png" />
            
            </figure><p>Our new architecture also increases write performance: a simple benchmark inserting 1,000 rows (each row about 200 bytes wide) is approximately 6.8x faster than the previous version of D1.</p><p>Larger batches (10,000 rows at ~200 bytes wide) see an even larger improvement: between 10-11x, with the new storage backend’s <a href="https://www.cloudflare.com/learning/performance/glossary/what-is-latency/">latency</a> also being significantly more consistent. We’ve also not yet started to optimize our overall write throughput, and so expect D1 to only get faster here.</p><p>With our new storage backend, we also want to make clear that D1 is not a toy, and we’re constantly benchmarking our performance against other serverless databases. A query against a 500,000 row key-value table (recognizing that benchmarks are inherently synthetic) sees D1 perform about 3.2x faster than a popular serverless Postgres provider:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2RQ3n0MEB7ZOGlrsg7rU0S/d536f7c25b66099897be571baa2be372/download-14.png" />
            
            </figure><p>We ran the Postgres queries several times to prime the page cache and then took the median query time, as measured by the server. We’ll continue to sharpen our performance edge as we go forward.</p><p>Developers with existing databases can import data into a new database backed by the storage engine by following the steps to <a href="https://developers.cloudflare.com/d1/learning/backups/#downloading-a-backup-locally">export their database</a> and then <a href="https://developers.cloudflare.com/d1/learning/importing-data/#import-an-existing-database">import it</a> in our docs.</p>
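As a rough local analogue of the batch-write benchmark shape described above (this is a local SQLite sketch, not D1's actual benchmark harness), batching rows into a single transaction is what keeps per-row cost low; the row size and count below mirror the post's ~200-byte, 1,000-row example:

```python
import sqlite3

# In-memory SQLite database standing in for a D1 database locally.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE kv (id INTEGER PRIMARY KEY, value TEXT)")

# 1,000 rows of roughly 200 bytes each, as in the benchmark described above.
rows = [(i, "x" * 200) for i in range(1000)]

# Batching all inserts into one transaction avoids paying commit
# overhead 1,000 times; "with con" commits once at the end.
with con:
    con.executemany("INSERT INTO kv (id, value) VALUES (?, ?)", rows)

count = con.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
```

The same principle applies when writing to D1 from a Worker: grouping statements into batches reduces round-trip and commit overhead.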
    <div>
      <h3>What did I miss?</h3>
      <a href="#what-did-i-miss">
        
      </a>
    </div>
    <p>We’ve also been working on a number of improvements to D1’s developer experience:</p><ul><li><p>A new console interface that allows you to issue queries directly from the dashboard, making it easier to get started and/or issue one-shot queries.</p></li><li><p>Formal <a href="https://developers.cloudflare.com/d1/learning/querying-json/">support for JSON functions</a> that query over JSON directly in your database.</p></li><li><p><a href="https://developers.cloudflare.com/d1/learning/data-location/">Location Hints</a>, allowing you to influence where your leader (which is responsible for writes) is located globally.</p></li></ul><p>Although D1 is designed to work natively within Cloudflare Workers, we realize that there’s often a need to quickly issue one-shot queries via CLI or a web editor when prototyping or just exploring a database. On top of the <a href="https://developers.cloudflare.com/workers/wrangler/commands/#execute">support in wrangler for executing queries</a> (and files), we’ve also introduced a console editor that allows you to issue queries, inspect tables, and even edit data on the fly:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2rZ9d0Hqrq10a8tfkJ4atE/b541e1c27513569c560d4da2a26b8d16/image3-20.png" />
            
            </figure><p><a href="https://developers.cloudflare.com/d1/learning/querying-json/#extracting-values">JSON functions</a> allow you to query JSON stored in TEXT columns in D1, giving you flexibility about what data is strictly associated with your relational database schema and what isn’t, whilst still being able to query all of it via SQL (before it reaches your app).</p><p>For example, suppose you store the last login timestamps as a JSON array in a login_history TEXT column: you can query (and extract) sub-objects or array items directly by providing a path to their key:</p>
            <pre><code>SELECT user_id, json_extract(login_history, '$[0]') as latest_login FROM users</code></pre>
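Because D1 builds on SQLite, queries like this can be prototyped locally, for example with Python's built-in sqlite3 module (the table and sample data below are illustrative, not part of the post):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (user_id TEXT, login_history TEXT)")

# login_history is plain TEXT holding a JSON array, newest login first.
con.execute(
    "INSERT INTO users VALUES (?, ?)",
    ("abc123", '["2023-05-19T10:00:00Z", "2023-05-18T09:30:00Z"]'),
)

# json_extract pulls the first array element out of the TEXT column.
row = con.execute(
    "SELECT user_id, json_extract(login_history, '$[0]') AS latest_login FROM users"
).fetchone()
```

SQLite returns the extracted string without its JSON quotes, so the result is usable directly in application code.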
            <p>D1’s support for JSON functions is extremely flexible, and leverages the SQLite core that D1 builds on.</p><p>When you create a database for the first time with D1, we automatically infer the location based on where you’re currently connecting from. There are some cases, however, where you might want to influence that — maybe you’re traveling, or you have a distributed team that’s distinct from the region you expect the majority of your writes to come from.</p><p>D1’s support for Location Hints makes that easy:</p>
            <pre><code># Automatically inferred based on your location
$ wrangler d1 create user-prod-db --experimental-backend

# Indicate a preferred location to create your database
$ wrangler d1 create eu-users-db --location=weur --experimental-backend</code></pre>
            <p><a href="https://developers.cloudflare.com/r2/buckets/data-location/">Location Hints</a> are also now available in the Cloudflare dashboard:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/12JpcimRiOP51cuhDFpmg4/a71d83d15d3277152775cb5a96f963dc/image4-20.png" />
            
            </figure><p>We’ve also published <a href="https://developers.cloudflare.com/d1/">more documentation</a> to help developers not only get started, but make use of D1’s advanced features. Expect D1’s documentation to continue to grow substantially over the coming months.</p>
    <div>
      <h3>Not going to burn a hole in your wallet</h3>
      <a href="#not-going-to-burn-a-hole-in-your-wallet">
        
      </a>
    </div>
    <p>We’ve had many, many developers ask us about how we’ll be pricing D1 since we announced the alpha, and we’re ready to share what it’s going to look like. We know it’s important to understand what something might cost <i>before</i> you start building on it, so you’re not surprised six months later.</p><p>In a nutshell:</p><ul><li><p>We’re announcing pricing so that you can start to model how much D1 will cost for your use-case ahead of time. Final pricing may be subject to change, although we expect changes to be relatively minor.</p></li><li><p>We won’t be enabling billing until later this year, and we’ll notify existing D1 users via email ahead of that change. Until then, D1 will remain free to use.</p></li><li><p>D1 will include an always-free tier, included usage as part of our $5/mo Workers subscription, and charge based on reads, writes and storage.</p></li></ul><p>If you’re already subscribed to Workers, then you don’t have to lift a finger: your existing subscription will have D1 usage included when we enable billing in the future.</p><p>Here’s a summary (we’re keeping it intentionally simple):</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3R6k7Iu1ARM90hAytxmGkK/7a909968043acb71cec31f73d7485484/Screenshot-2023-05-19-at-10.14.58.png" />
            
            </figure><p>Importantly, <b>when we enable global read replication, you won’t have to pay extra for it, nor will replication multiply your storage consumption</b>. We think built-in, automatic replication is important, and we don’t think developers should have to pay multiplicative costs (replicas x storage fees) in order to make their database fast <i>everywhere</i>.</p><p>Beyond that, we wanted to ensure D1 took the best parts of serverless pricing — scale-to-zero and pay-for-what-you-use — so that you’re not trying to figure out how many CPUs and/or how much memory you need for your workload or writing scripts to scale down your infrastructure during quieter hours.</p><p>D1’s read pricing is based on the familiar concept of a read unit (per 4KB read), and a write unit (per 1KB written). A query that reads (scans) ~10,000 rows of 64 bytes each would consume 160 read units. Write a big 3KB row in a “blog_posts” table that has a lot of <a href="https://blog.cloudflare.com/markdown-for-agents/">Markdown</a>, and that’s three write units. And if you <a href="https://developers.cloudflare.com/d1/learning/using-indexes/">create indexes for your most popular queries</a> to improve performance and reduce how much data those queries need to scan, you’ll also reduce how much we bill you. We think making the fast path more cost-efficient by default is the right approach.</p><p>Importantly: we’ll continue to take feedback on our pricing before we flip the switch.</p>
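The unit arithmetic above can be sketched as a back-of-the-envelope model. This is not Cloudflare's billing code, and it treats 1 KB as 1,000 bytes to match the worked examples in the post (D1's actual accounting may round differently):

```python
import math

READ_UNIT_BYTES = 4 * 1000   # one read unit per 4KB scanned (decimal KB assumed)
WRITE_UNIT_BYTES = 1 * 1000  # one write unit per 1KB written

def read_units(bytes_scanned: int) -> int:
    # Partial units round up to a whole unit.
    return math.ceil(bytes_scanned / READ_UNIT_BYTES)

def write_units(bytes_written: int) -> int:
    return math.ceil(bytes_written / WRITE_UNIT_BYTES)

# Scanning ~10,000 rows of 64 bytes each: 640KB -> 160 read units.
scan_units = read_units(10_000 * 64)

# Writing one 3KB row: 3 write units.
post_units = write_units(3 * 1000)
```

A model like this also makes the indexing point concrete: an index that lets a query scan 1,000 rows instead of 10,000 cuts the read units for that query by roughly 10x.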
    <div>
      <h3>Time Travel</h3>
      <a href="#time-travel">
        
      </a>
    </div>
    <p>We’re also introducing new backup functionality: point-in-time recovery, and we’re calling this Time Travel, because it feels just like it. <b>Time Travel allows you to restore your D1 database to any minute within the last 30 days, and will be built into D1 databases using our new storage system</b>. We expect to turn on Time Travel for new D1 databases in the very near future.</p><p>What makes Time Travel really powerful is that you <i>no longer need to panic</i> and wonder “oh wait, did I take a backup before I made this major change?!” — because we do it for you. We retain a stream of all changes to your database (the <a href="https://www.sqlite.org/wal.html">Write-Ahead Log</a>), allowing us to restore your database to a <i>point in time</i> by replaying those changes in sequence up until that point.</p><p>Here’s an example (subject to some minor API changes):</p>
            <pre><code># Using a precise Unix timestamp (in UTC):
$ wrangler d1 time-travel my-database --before-timestamp=1683570504

# Alternatively, restore prior to a specific transaction ID:
$ wrangler d1 time-travel my-database --before-tx-id=01H0FM2XHKACETEFQK2P5T6BWD</code></pre>
            <p>And although the idea of point-in-time recovery is not new, it’s often a paid add-on, if it is available at all. If you only realize you should have enabled it after you’ve deleted data or otherwise made a mistake, it’s often too late.</p><p>For example, imagine if I made the classic mistake of forgetting a WHERE on an UPDATE statement:</p>
            <pre><code>-- Don't do this at home
UPDATE users SET email = 'matt@example.com' -- missing: WHERE id = "abc123"</code></pre>
            <p>Without Time Travel, I’d have to hope that either a scheduled backup ran recently, or that I remembered to make a manual backup just prior. With Time Travel, I can restore to a point a minute or so before that mistake (and hopefully learn a lesson for next time).</p><p>We’re also exploring features that can surface larger changes to your database state, including making it easier to identify schema changes, the number of tables, large deltas in data stored <i>and even specific queries</i> (via transaction IDs) — to help you better understand exactly what point in time to restore your database to.</p>
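To see concretely why the missing WHERE clause is so destructive, here is a small local SQLite reproduction (the rows are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT)")
con.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("abc123", "matt@example.com"), ("def456", "dana@example.com")],
)

# The mistake: without a WHERE clause, the UPDATE touches every row.
con.execute("UPDATE users SET email = 'matt@example.com'")

# Every user now shares a single email address.
distinct_emails = con.execute(
    "SELECT COUNT(DISTINCT email) FROM users"
).fetchone()[0]
```

With every row overwritten in one statement, there is nothing left in the table to recover from; only a backup, or a point-in-time restore like Time Travel, can get the original values back.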
    <div>
      <h3>On the roadmap</h3>
      <a href="#on-the-roadmap">
        
      </a>
    </div>
    <p>So what’s next for D1?</p><ul><li><p><b>Open beta</b>: we’re ensuring we’ve observed our new storage subsystem under load (and real-world usage) prior to making it the default for all `d1 create` commands. We hold a high bar for durability and availability, even for a “beta”, and we also recognize that access to backups (Time Travel) is important for folks to trust a new database. Keep an eye on the Cloudflare blog in the coming weeks for more news here!</p></li><li><p><b>Bigger databases</b>: we know this is a big ask from many, and we’re extremely close. Developers on the <a href="https://developers.cloudflare.com/workers/platform/pricing/#workers">Workers Paid plan</a> will get access to 1GB databases in the very near future, and we’ll be continuing to ramp up the maximum per-database size over time.</p></li><li><p><b>Metrics &amp; observability</b>: you’ll be able to inspect overall query volume by database, failing queries, storage consumed and read/write units via both the D1 dashboard and <a href="https://developers.cloudflare.com/analytics/graphql-api/">our GraphQL API</a>, so that it’s easier to debug issues and track spend.</p></li><li><p><b>Automatic read replication</b>: our new storage subsystem is built with replication in mind, and we’re working on ensuring our replication layer is both fast &amp; reliable before we roll it out to developers. 
Read replication is not only designed to improve query latency by storing copies — replicas — of your data in multiple locations, close to your users, but will also allow us to scale out D1 databases horizontally for those with larger workloads.</p></li></ul><p>In the meantime, you can <a href="https://developers.cloudflare.com/d1/get-started/">start prototyping and experimenting with D1</a> right now, explore our D1 + Drizzle + Remix <a href="https://github.com/rozenmd/d1-drizzle-remix-example">example project</a>, or join the <a href="https://discord.cloudflare.com/">#d1 channel</a> on the Cloudflare Developers Discord server to engage directly with the D1 team and others building on D1.</p>
    <div>
      <h3>Watch on Cloudflare TV</h3>
      <a href="#watch-on-cloudflare-tv">
        
      </a>
    </div>
    <div></div><p></p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">7Cf3wNyDzlMT9quIsuxTYs</guid>
            <dc:creator>Matt Silverlock</dc:creator>
            <dc:creator>Glen Maddern</dc:creator>
        </item>
        <item>
            <title><![CDATA[Announcing connect() — a new API for creating TCP sockets from Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/workers-tcp-socket-api-connect-databases/</link>
            <pubDate>Tue, 16 May 2023 13:00:13 GMT</pubDate>
            <description><![CDATA[ Today, we are excited to announce a new API in Cloudflare Workers for creating outbound TCP sockets, making it possible to connect directly to databases and any TCP-based service from Workers ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1CjlPkdLJUXlfgIKgq2Jvy/d2e17e3027c02f82e191007561640f79/image2-12.png" />
            
            </figure><p>Today, we are excited to announce a new API in Cloudflare Workers for creating outbound TCP sockets, making it possible to connect directly to any TCP-based service from Workers.</p><p>Standard protocols including <a href="https://www.cloudflare.com/learning/access-management/what-is-ssh/">SSH</a>, MQTT, SMTP, FTP, and IRC are all built on top of TCP. Most importantly, nearly all applications need to connect to databases, and most databases speak TCP. And while <a href="https://developers.cloudflare.com/d1/">Cloudflare D1</a> works seamlessly on Workers, and some <a href="https://developers.cloudflare.com/workers/learning/integrations/databases/">hosted database providers</a> allow connections over HTTP or WebSockets, the vast majority of databases, both relational (SQL) and document-oriented (NoSQL), require clients to connect by opening a direct TCP “socket”, an ongoing two-way connection that is used to send queries and receive data. Now, Workers provides an API for this, the first of many steps to come in allowing you to use any database or infrastructure you choose when building full-stack applications on Workers.</p><p>Database drivers, the client code used to connect to databases and execute queries, are already using this new API. <a href="https://github.com/brianc/node-postgres">pg</a>, the most widely used JavaScript database driver for PostgreSQL, works on Cloudflare Workers today, with more database drivers to come.</p><p>The TCP Socket API is available today to everyone. Get started by reading the <a href="https://developers.cloudflare.com/workers/runtime-apis/tcp-sockets">TCP Socket API docs</a>, or connect directly to any PostgreSQL database from your Worker by following <a href="https://developers.cloudflare.com/workers/databases/connect-to-postgres/">this guide</a>.</p>
    <div>
      <h2>First — what is a TCP Socket?</h2>
      <a href="#first-what-is-a-tcp-socket">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/learning/ddos/glossary/tcp-ip/">TCP (Transmission Control Protocol)</a> is a foundational networking protocol of the Internet. It is the underlying protocol that is used to make HTTP requests (prior to <a href="https://www.cloudflare.com/learning/performance/what-is-http3/">HTTP/3</a>, which uses <a href="https://cloudflare-quic.com/">QUIC</a>), to send email over <a href="https://www.cloudflare.com/learning/email-security/what-is-smtp/">SMTP</a>, to query databases using database-specific protocols like MySQL, and many other application-layer protocols.</p><p>A TCP socket is a programming interface that represents a two-way communication connection between two applications that have both agreed to “speak” over TCP. One application (ex: a Cloudflare Worker) initiates an outbound TCP connection to another (ex: a database server) that is listening for inbound TCP connections. Connections are established by negotiating a three-way handshake, and after the handshake is complete, data can be sent bi-directionally.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6xxArl43DbexJUoRmw8JrG/0ad545bb25f002a4598d387aca491997/image1-30.png" />
            
            </figure><p>A socket is the programming interface for a single TCP connection — it has both a readable and writable “stream” of data, allowing applications to read and write data on an ongoing basis, as long as the connection remains open.</p>
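That lifecycle (listen, connect, handshake, then bidirectional reads and writes on the same connection) can be illustrated with Python's standard socket module over the loopback interface; this is a generic TCP sketch, not Workers code:

```python
import socket
import threading

# One side listens for inbound TCP connections; the OS picks a free port.
server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]

def serve_one():
    conn, _ = server.accept()          # inbound connection established
    with conn:
        data = conn.recv(1024)         # read from the socket's stream
        conn.sendall(b"echo:" + data)  # write back on the same socket

t = threading.Thread(target=serve_one)
t.start()

# The other side initiates an outbound connection (the three-way
# handshake happens inside create_connection), then writes and reads.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
t.join()
server.close()
```

The same two-way stream model is what a database driver uses: queries are written to the socket, and result rows are read back over the same connection for as long as it stays open.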
    <div>
      <h2>connect() — A simpler socket API</h2>
      <a href="#connect-a-simpler-socket-api">
        
      </a>
    </div>
    <p>With Workers, we aim to support standard APIs that are supported across browsers and non-browser environments wherever possible, so that as many NPM packages as possible work on Workers without changes, and package authors don’t have to write runtime-specific code. But for TCP sockets, we faced a challenge — there was no clear shared standard across runtimes. Node.js provides the <a href="https://nodejs.org/api/net.html">net</a> and <a href="https://nodejs.org/api/tls.html">tls</a> APIs, but Deno implements a different API — <a href="https://deno.land/api@v1.33.1?s=Deno.connect">Deno.connect</a>. And web browsers do not provide a raw TCP socket API, though a <a href="https://github.com/WICG/direct-sockets/blob/main/docs/explainer.md">WICG proposal</a> does exist, and it is different from both Node.js and Deno.</p><p>We also considered how a TCP socket API could be designed to maximize performance and ergonomics in a serverless environment. Most networking APIs were designed well before serverless emerged, with the assumption that the developer’s application is also the server, responsible for directly configuring TLS options and credentials.</p><p>With this backdrop, we reached out to the community, with a focus on maintainers of database drivers, ORMs and other libraries that create outbound TCP connections. Using this feedback, we’ve tried to incorporate the best elements of existing APIs and proposals, and intend to contribute back to future standards, as part of the <a href="/introducing-the-wintercg/">Web-interoperable Runtimes Community Group (WinterCG)</a>.</p><p>The API we landed on is a simple function, connect(), imported from the new cloudflare:sockets module, that returns an instance of a Socket. Here’s a simple example showing it used to connect to a <a href="https://www.w3.org/People/Bos/PROSA/rep-protocols.html#gopher">Gopher</a> server. Gopher was one of the Internet’s early protocols that relied on TCP/IP, and still works today:</p>
            <pre><code>import { connect } from 'cloudflare:sockets';

export default {
  async fetch(req: Request) {
    const gopherAddr = "gopher.floodgap.com:70";
    const url = new URL(req.url);

    try {
      const socket = connect(gopherAddr);

      const writer = socket.writable.getWriter()
      const encoder = new TextEncoder();
      const encoded = encoder.encode(url.pathname + "\r\n");
      await writer.write(encoded);

      return new Response(socket.readable, { headers: { "Content-Type": "text/plain" } });
    } catch (error) {
      return new Response("Socket connection failed: " + error, { status: 500 });
    }
  }
};</code></pre>
            <p>We think this API design has many benefits that can be realized not just on Cloudflare, but in any serverless environment that adopts this design:</p>
            <pre><code>connect(address: SocketAddress | string, options?: SocketOptions): Socket

declare interface Socket {
  get readable(): ReadableStream;
  get writable(): WritableStream;
  get closed(): Promise&lt;void&gt;;
  close(): Promise&lt;void&gt;;
  startTls(): Socket;
}

declare interface SocketOptions {
  secureTransport?: string;
  allowHalfOpen: boolean;
}

declare interface SocketAddress {
  hostname: string;
  port: number;
}</code></pre>
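            <p>Note that connect() accepts either a SocketAddress or a plain "hostname:port" string, as in the Gopher example above. As a rough illustration of how the string form maps onto SocketAddress, here is a hypothetical helper (not part of the Workers API):</p>

```typescript
// Hypothetical helper: normalize connect()'s address argument into a
// SocketAddress. This is illustration only, not Workers runtime code;
// the interface below mirrors the declaration in the text.
interface SocketAddress {
  hostname: string;
  port: number;
}

function parseAddress(address: string | SocketAddress): SocketAddress {
  if (typeof address !== "string") return address;
  // Split on the last colon, so hostnames containing colons keep them.
  const idx = address.lastIndexOf(":");
  if (idx === -1) throw new TypeError(`address "${address}" is missing a port`);
  const hostname = address.slice(0, idx);
  const port = Number(address.slice(idx + 1));
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new TypeError(`invalid port in address "${address}"`);
  }
  return { hostname, port };
}
```

<p>With this, "gopher.floodgap.com:70" and { hostname: "gopher.floodgap.com", port: 70 } describe the same endpoint.</p>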
            
    <div>
      <h3>Opportunistic TLS (StartTLS), without separate APIs</h3>
      <a href="#opportunistic-tls-starttls-without-separate-apis">
        
      </a>
    </div>
    <p>Opportunistic TLS, a pattern of creating an initial insecure connection, and then upgrading it to a secure one that uses TLS, remains common, particularly with database drivers. In Node.js, you must use the <a href="https://nodejs.org/api/net.html#class-netsocket">net</a> API to create the initial connection, and then use the <a href="https://nodejs.org/api/tls.html">tls</a> API to create a new, upgraded connection. In Deno, you pass the original socket to <a href="https://deno.land/api@v1.33.1?s=Deno.startTls">Deno.startTls()</a>, which creates a new, upgraded connection.</p><p>Drawing on a <a href="https://www.w3.org/TR/tcp-udp-sockets/#idl-def-TCPOptions">previous W3C proposal</a> for a TCP Socket API, we’ve simplified this into a single API that lets a socket be created with TLS off, with TLS on, or ready to upgrade, and exposes a simple method, startTls(), for performing that upgrade.</p>
            <pre><code>// Create a new socket without TLS. secureTransport defaults to "off" if not specified.
const socket = connect("address:port", { secureTransport: "off" })

// Create a new socket, then upgrade it to use TLS.
// Once startTls() is called, only the newly created socket can be used.
const socket = connect("address:port", { secureTransport: "starttls" })
const secureSocket = socket.startTls();

// Create a new socket with TLS
const socket = connect("address:port", { secureTransport: "use" })</code></pre>
            
    <div>
      <h3>TLS configuration — a concern of host infrastructure, not application code</h3>
      <a href="#tls-configuration-a-concern-of-host-infrastructure-not-application-code">
        
      </a>
    </div>
    <p>Existing APIs for creating TCP sockets treat TLS as a library that you interact with in your application code. The <a href="https://nodejs.org/api/tls.html#tlscreatesecurecontextoptions">tls.createSecureContext()</a> API from Node.js has a plethora of advanced configuration options that are mostly environment specific. If you use custom certificates when connecting to a particular service, you likely use a different set of credentials and options in production, staging and development. Managing direct file paths to credentials across environments and swapping out .env files in production build steps are common pain points.</p><p>Host infrastructure is best positioned to manage this on your behalf, and similar to Workers support for <a href="/mtls-workers/">making subrequests using mTLS</a>, TLS configuration and credentials for the socket API will be managed via Wrangler, and a connect() function provided via a <a href="https://developers.cloudflare.com/workers/platform/bindings/">capability binding</a>. Currently, custom TLS credentials and configuration are not supported, but are coming soon.</p>
    <div>
      <h3>Start writing data immediately, before the TLS handshake finishes</h3>
      <a href="#start-writing-data-immediately-before-the-tls-handshake-finishes">
        
      </a>
    </div>
    <p>Because the connect() API synchronously returns a new socket, one can start writing to the socket immediately, without waiting for the TCP handshake to first complete. This means that once the handshake completes, data is already available to send immediately, and host platforms can make use of pipelining to optimize performance.</p>
    <div>
      <h2>connect() API + DB drivers = Connect directly to databases</h2>
      <a href="#connect-api-db-drivers-connect-directly-to-databases">
        
      </a>
    </div>
    <p>Many <a href="https://www.cloudflare.com/developer-platform/products/d1/">serverless databases</a> already work on Workers, allowing clients to connect over HTTP or over <a href="/neon-postgres-database-from-workers/">WebSockets</a>. But most databases don’t “speak” HTTP, including databases hosted on most cloud providers.</p><p>Databases each have their own “wire protocol”, and open-source database “drivers” that speak this protocol, sending and receiving data over a TCP socket. Developers rely on these drivers in their own code, as do database ORMs. Our goal is to make sure that you can use the same drivers and ORMs you might use in other runtimes and on other platforms on Workers.</p>
    <div>
      <h2>Try it now — connect to PostgreSQL from Workers</h2>
      <a href="#try-it-now-connect-to-postgresql-from-workers">
        
      </a>
    </div>
    <p>We’ve worked with the maintainers of <a href="https://www.npmjs.com/package/pg">pg</a>, one of the most popular database drivers in the JavaScript ecosystem, used by ORMs including <a href="https://sequelize.org/docs/v6/getting-started/">Sequelize</a> and <a href="https://knexjs.org/">knex.js</a>, to add support for connect().</p><p>You can try this right now. First, create a new Worker and install pg:</p>
            <pre><code>wrangler init
npm install --save pg</code></pre>
            <p>As of this writing, you’ll need to <a href="https://developers.cloudflare.com/workers/wrangler/configuration/#add-polyfills-using-wrangler">enable the node_compat</a> option in wrangler.toml:</p><p><b>wrangler.toml</b></p>
            <pre><code>name = "my-worker"
main = "src/index.ts"
compatibility_date = "2023-05-15"
node_compat = true</code></pre>
            <p>In just 20 lines of TypeScript, you can create a connection to a Postgres database, execute a query, return results in the response, and close the connection:</p><p><b>index.ts</b></p>
            <pre><code>import { Client } from "pg";

export interface Env {
  DB: string;
}

export default {
  async fetch(
    request: Request,
    env: Env,
    ctx: ExecutionContext
  ): Promise&lt;Response&gt; {
    const client = new Client(env.DB);
    await client.connect();
    const result = await client.query({
      text: "SELECT * from customers",
    });
    console.log(JSON.stringify(result.rows));
    const resp = Response.json(result.rows);
    // Close the database connection, but don't block returning the response
    ctx.waitUntil(client.end());
    return resp;
  },
};</code></pre>
            <p>To test this in local development, use the <code>--experimental-local</code> flag (instead of <code>--local</code>), which <a href="/miniflare-and-workerd/">uses the open-source Workers runtime</a>, ensuring that what you see locally mirrors behavior in production:</p>
            <pre><code>wrangler dev --experimental-local</code></pre>
            
    <div>
      <h2>What’s next for connecting to databases from Workers?</h2>
      <a href="#whats-next-for-connecting-to-databases-from-workers">
        
      </a>
    </div>
    <p>This is only the beginning. We’re aiming for the two popular MySQL drivers, <a href="https://github.com/mysqljs/mysql">mysql</a> and <a href="https://github.com/sidorares/node-mysql2">mysql2</a>, to work on Workers soon, with more to follow. If you work on a database driver or ORM, we’d love to help make your library work on Workers.</p><p>If you’ve worked more closely with database scaling and performance, you might have noticed that in the example above, a new connection is created for every request. This is one of the biggest current challenges of connecting to databases from serverless functions, across all platforms. With typical client connection pooling, you maintain a local pool of database connections that remain open. This approach of storing a reference to a connection or connection pool in global scope will not work, and is a poor fit for serverless. Managing individual pools of client connections on a per-isolate basis creates other headaches — when and how should connections be terminated? How can you limit the total number of concurrent connections across many isolates and locations?</p><p>Instead, we’re already working on simpler approaches to connection pooling for the most popular databases. We see a path to a future where you don’t have to think about or manage client connection pooling on your own. We’re also working on a brand new approach to making your database reads lightning fast.</p>
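    <p>To make the pitfalls above concrete, here is a deliberately naive per-isolate pool, sketched only for illustration. None of this is Cloudflare code; the class and its names are hypothetical, and the comments mark the open questions the text raises:</p>

```typescript
// A deliberately naive connection pool, sketched to show why per-isolate
// pooling is a poor fit for serverless. Illustration only.
class NaivePool<T> {
  private idle: T[] = [];
  private total = 0;

  constructor(
    private create: () => T,
    // Caps connections in THIS isolate only. Nothing here can limit the
    // total number of concurrent connections across many isolates and
    // locations, which is exactly the problem described above.
    private max: number,
  ) {}

  acquire(): T {
    const conn = this.idle.pop();
    if (conn !== undefined) return conn;
    if (this.total >= this.max) throw new Error("pool exhausted");
    this.total++;
    return this.create();
  }

  release(conn: T): void {
    // Open question: when and how should this idle connection actually be
    // terminated? The isolate holding it may be evicted at any time.
    this.idle.push(conn);
  }
}
```

<p>Every isolate that instantiates a pool like this opens its own set of connections, which is why platform-managed pooling is the more promising direction.</p>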
    <div>
      <h2>What’s next for sockets on Workers?</h2>
      <a href="#whats-next-for-sockets-on-workers">
        
      </a>
    </div>
    <p>Supporting outbound TCP connections is only one half of the story — we plan to support inbound TCP and UDP connections, as well as new emerging application protocols based on QUIC, so that you can build applications beyond HTTP with <a href="/introducing-socket-workers/">Socket Workers</a>.</p><p>Earlier today we also announced <a href="/announcing-workers-smart-placement">Smart Placement</a>, which improves performance by running any Worker that makes multiple HTTP requests to an origin as close to that origin as possible, reducing round-trip time. We’re working on making this work with Workers that open TCP connections, so that if your Worker connects to a database in Virginia and makes many queries over a TCP connection, each query is lightning fast and comes from the nearest location on <a href="https://www.cloudflare.com/network/">Cloudflare’s global network</a>.</p><p>We also plan to support custom certificates and other TLS configuration options in the coming months — tell us what is a must-have in order to connect to the services you need to connect to from Workers.</p>
    <div>
      <h2>Get started, and share your feedback</h2>
      <a href="#get-started-and-share-your-feedback">
        
      </a>
    </div>
    <p>The TCP Socket API is available today to everyone. Get started by reading the <a href="https://developers.cloudflare.com/workers/runtime-apis/tcp-sockets">TCP Socket API docs</a>, or connect directly to any PostgreSQL database from your Worker by following <a href="https://developers.cloudflare.com/workers/databases/connect-to-postgres/">this guide</a>.</p><p>We want to hear your feedback, what you’d like to see next, and more about what you’re building. Join the <a href="https://discord.cloudflare.com/">Cloudflare Developers Discord</a>.</p>
    <div>
      <h3>Watch on Cloudflare TV</h3>
      <a href="#watch-on-cloudflare-tv">
        
      </a>
    </div>
    ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[TCP]]></category>
            <category><![CDATA[Database]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">14RexUSLCzOVnpWl5DkGIq</guid>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
            <dc:creator>Matt Silverlock</dc:creator>
        </item>
        <item>
            <title><![CDATA[Private by design: building privacy-preserving products with Cloudflare's Privacy Edge]]></title>
            <link>https://blog.cloudflare.com/privacy-edge-making-building-privacy-first-apps-easier/</link>
            <pubDate>Wed, 28 Sep 2022 13:00:00 GMT</pubDate>
            <description><![CDATA[ Introducing Privacy Edge – a collection of products that make it easier for site owners and developers to protect their users’ privacy by default.  ]]></description>
            <content:encoded><![CDATA[ <p>When Cloudflare was founded, our value proposition had three pillars: more secure, more reliable, and more performant. Over time, we’ve realized that a better Internet is also a more private Internet, and we want to play a role in building it.</p><p>User awareness and expectations of privacy are higher than ever, but we believe that application developers and platforms shouldn’t have to start from scratch. We’re excited to introduce Privacy Edge – Code Auditability, Privacy Gateway, Privacy Proxy, and Cooperative Analytics – a suite of products that make it easy for site owners and developers to build privacy into their products, by default.</p>
    <div>
      <h3>Building network-level privacy into the foundations of app infrastructure</h3>
      <a href="#building-network-level-privacy-into-the-foundations-of-app-infrastructure">
        
      </a>
    </div>
    <p>As you’re browsing the web every day, information from the networks and apps you use can expose more information than you intend. When accumulated over time, <a href="https://coveryourtracks.eff.org/">identifiers</a> like your IP address, cookies, browser and device characteristics create a unique profile that can be used to track your browsing activity. We don’t think this status quo is right for the Internet, or that consumers should have to understand the complex ecosystem of third-party trackers to maintain privacy. Instead, we’ve been working on technologies that encourage and enable website operators and app developers to build privacy into their products at the protocol level.</p><p>Getting privacy right is hard. We figured we’d start in the area we know best: building privacy into our network infrastructure. Like other work we’ve done in this space – offering <a href="https://www.cloudflare.com/application-services/products/ssl/">free SSL certificates</a> to make encrypted HTTP requests the norm, and <a href="/announcing-1111/">launching 1.1.1.1</a>, a privacy-respecting DNS resolver, for example – the products we’re announcing today are built upon the foundations of open Internet standards, many of which are co-authored by members of our <a href="https://research.cloudflare.com/">Research Team</a>.</p><p>Privacy Edge, the collection of products we’re announcing today, includes:</p><ul><li><p><b>Privacy Gateway:</b> A lightweight proxy that encrypts request data and forwards it through an IP-blinding relay</p></li><li><p><b>Code Auditability:</b> A solution for verifying that code delivered in your browser hasn’t been tampered with</p></li><li><p><b>Privacy Proxy:</b> A proxy that offers the protection of a VPN, built natively into application architecture</p></li><li><p><b>Cooperative Analytics:</b> A multi-party computation approach to measurement and analytics based on an emerging distributed aggregation protocol.</p></li></ul><p>Today’s announcement of Privacy Edge isn’t exhaustive. We’re continuing to explore, research and develop new privacy-enabling technologies, and we’re excited about all of them.</p>
    <div>
      <h3>Privacy Gateway: IP address privacy for your users</h3>
      <a href="#privacy-gateway-ip-address-privacy-for-your-users">
        
      </a>
    </div>
    <p>There are situations in which applications only need to receive certain HTTP requests for app functionality, but linking that data with who or where it came from creates a privacy concern.</p><p>We recently partnered with <a href="https://www.theverge.com/2022/9/14/23351957/flo-period-tracker-privacy-anonymous-mode">Flo Health</a>, a period tracking app, to solve exactly that privacy concern: for users that have turned on “Anonymous mode,” Flo encrypts and forwards traffic through Privacy Gateway so that the network-level request information (most importantly, users’ IP addresses) is replaced by the Cloudflare network.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6ZyPg6bstt6kfkkKdgLrxX/635d48b71118bdb22b9fd8b2d6cb5adf/Screen-Shot-2022-10-10-at-3.23.56-PM.png" />
            
            </figure><p>How data is encapsulated, forwarded, and decapsulated in the Privacy Gateway system.</p><p>So how does it work? <b>Privacy Gateway</b> is based on Oblivious HTTP, an <a href="https://datatracker.ietf.org/doc/draft-ietf-ohai-ohttp/">emerging IETF standard</a>, and at a high level describes the following data flow:</p><ol><li><p>The client <a href="https://datatracker.ietf.org/doc/html/draft-ietf-ohai-ohttp-04#section-4.3">encapsulates an HTTP request</a> using the public key of the customer’s gateway server, and sends it to the relay over a client&lt;&gt;relay HTTPS connection.</p></li><li><p>The relay forwards the request to the server over its own relay&lt;&gt;gateway HTTPS connection.</p></li><li><p>The gateway server decapsulates the request, forwarding it to the application server.</p></li><li><p>The gateway server returns an <a href="https://datatracker.ietf.org/doc/html/draft-thomson-http-oblivious-02#section-5.2">encapsulated response</a> to the relay, which then forwards the result to the client.</p></li></ol><p>The novel feature Privacy Gateway implements from the OHTTP specification is that messages sent through the relay are encrypted (via <a href="/hybrid-public-key-encryption/">HPKE</a>) <i>to the application server</i>, so that the relay learns nothing of the application data beyond the source and destination of each message.</p><p>The end result is that the relay will know where the data request is coming from (i.e. users’ IP addresses) but not what it contains (i.e. contents of the request), and the application can see what the data contains but won’t know where it comes from. A win for end-user privacy.</p>
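<p>As a toy model of this split, the sketch below stands in for OHTTP's real HPKE encapsulation with a plain closure. It only illustrates who learns what in the relay/gateway flow, not the actual cryptography; all names are hypothetical:</p>

```typescript
// Toy model of the Oblivious HTTP split of knowledge. Illustration only:
// real OHTTP encapsulates requests with HPKE to the gateway's public key.
type Sealed = { open: (key: string) => string };

// The client seals the request body so that only the gateway can read it.
function encapsulate(body: string, gatewayKey: string): Sealed {
  return {
    open: (key: string) => {
      if (key !== gatewayKey) throw new Error("not the gateway");
      return body;
    },
  };
}

// The relay forwards the sealed request. It learns who sent it (the IP),
// but cannot open the body.
function relay(clientIp: string, sealed: Sealed): { sawIp: string; sealed: Sealed } {
  return { sawIp: clientIp, sealed };
}

// The gateway decapsulates the body, but the client IP never reaches it.
function gateway(sealed: Sealed, gatewayKey: string): string {
  return sealed.open(gatewayKey);
}
```

<p>The relay sees the source, the gateway sees the contents, and neither sees both.</p>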
    <div>
      <h3>Delivering verifiable and authentic code for privacy-critical applications</h3>
      <a href="#delivering-verifiable-and-authentic-code-for-privacy-critical-applications">
        
      </a>
    </div>
    <p>How can you ensure that the code — the JavaScript, CSS or even HTML — delivered to a browser hasn’t been tampered with?</p><p>One way is to generate a hash (a consistent, unique, and shorter representation) of the code, and have two independent parties compare those hashes when delivered to the user's browser.</p><p>Our <b>Code Auditability</b> service does exactly that, and our recent <a href="/cloudflare-verifies-code-whatsapp-web-serves-users/">partnership with Meta</a> deployed it at scale to WhatsApp Web. Installing their <a href="https://chrome.google.com/webstore/detail/code-verify/llohflklppcaghdpehpbklhlfebooeog/">Code Verify browser extension</a> ensures users can be sure that they receive the code they’re meant to run – free of tampering or corrupted files.</p><p>With WhatsApp Web:</p><ol><li><p>WhatsApp publishes the latest version of their JavaScript libraries to their servers, and the corresponding hash for that version to Cloudflare’s audit endpoint.</p></li><li><p>A WhatsApp web client fetches the latest libraries from WhatsApp.</p></li><li><p>The Code Verify browser extension subsequently fetches the hash for that version from Cloudflare over a separate, secure connection.</p></li><li><p>Code Verify compares the “known good” hash from Cloudflare with the hash of the libraries it locally computed.</p></li></ol><p>If the hashes match, as they should under almost any circumstance, the code is “verified” from the perspective of the extension. If the hashes don’t match, it indicates that the code running on the user's browser is different from the code WhatsApp intended to run on all its users' browsers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7J7iewJRkCFkCVS83Vz78z/78ab22efc4567616ce4710193358d18f/image4-22.png" />
            
            </figure><p>How Cloudflare and WhatsApp Web verify that code shipped to users isn't tampered with.</p><p>Right now, we call this "Code Auditability", and we see a ton of other potential use cases, including password managers, email applications, certificate issuance – all technologies that are potentially targets of tampering or security threats because of the sensitive data they handle.</p><p>In the near term, we’re working with other app developers to co-design solutions that meet their needs for privacy-critical products. In the long term, we’re working on standardizing the approach, including building on existing <a href="https://w3c.github.io/webappsec-csp">Content Security Policy</a> standards, or the <a href="https://github.com/WICG/isolated-web-apps">Isolated Web Apps</a> proposal, and even an approach towards building Code Auditability natively into the browser so that a browser extension (existing or new) isn't required.</p>
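<p>The comparison at the heart of this flow is straightforward to sketch. The snippet below is an illustrative reconstruction, not the extension's actual code; how the published hash is fetched and which hash algorithm is used are assumptions (SHA-256 here):</p>

```typescript
import { createHash } from "node:crypto";

// Hash the locally received code, as the extension would after step 2.
function sha256Hex(code: string): string {
  return createHash("sha256").update(code).digest("hex");
}

// Compare against the "known good" hash fetched from the audit endpoint
// over a separate connection (steps 3 and 4 above).
function verifyCode(deliveredCode: string, publishedHash: string): boolean {
  return sha256Hex(deliveredCode) === publishedHash;
}
```

<p>Any modification to the delivered code, however small, produces a different hash and fails the comparison.</p>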
    <div>
      <h3>Privacy-preserving proxying – built into applications</h3>
      <a href="#privacy-preserving-proxying-built-into-applications">
        
      </a>
    </div>
    <p>What if applications could build the protection of a VPN into their products, by default?</p><p><b>Privacy Proxy</b> is our platform to proxy traffic through Cloudflare using a combination of privacy protocols that make it much more difficult to track users’ web browsing activity over time. At a high level, the Privacy Proxy Platform encrypts browsing traffic, replaces a device’s IP address with one from the Cloudflare network, and then forwards it onto its destination.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7o0fEdQW7qgzJugHzkc4qO/cc8b78a91f186f4706ba163cd9f31d2b/image5-21.png" />
            
            </figure><p>System architecture for Privacy Proxy.</p><p>The Privacy Proxy platform consists of several pieces and protocols to make it work:</p><ol><li><p>Privacy API: a service that issues unique <a href="https://www.ietf.org/archive/id/draft-private-access-tokens-01.html">cryptographic tokens</a>, later redeemed against the proxy service to ensure that only valid clients are able to connect to the service.</p></li><li><p>Geolocated IP assignment: a service that assigns each connection a new Cloudflare IP address based on the client’s <a href="/geoexit-improving-warp-user-experience-larger-network/">approximate location</a>.</p></li><li><p>Privacy Proxy: the <a href="https://datatracker.ietf.org/wg/masque/about/?cf_target_id=FFEC349381334FBB00C45C937C7B2088">HTTP CONNECT</a>-based service running on Cloudflare’s network that handles the proxying of traffic. This service validates the privacy token passed by the client and enforces any double-spend prevention necessary for the token.</p></li></ol>
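<p>A toy version of that issue-and-redeem flow might look like the following. Real privacy tokens are verified cryptographically (blind-signature style, so the issuer cannot link issuance to redemption); here a token is just a string, so this only illustrates the double-spend bookkeeping:</p>

```typescript
// Toy double-spend prevention for privacy tokens. Illustration only:
// all names are hypothetical and "validity" is simply set membership.
class TokenRedeemer {
  private issued = new Set<string>();
  private spent = new Set<string>();

  issue(token: string): void {
    this.issued.add(token);
  }

  redeem(token: string): boolean {
    if (!this.issued.has(token)) return false; // never issued: reject
    if (this.spent.has(token)) return false;   // double spend: reject
    this.spent.add(token);                     // first redemption succeeds
    return true;
  }
}
```

<p>Each token is accepted exactly once, so a captured token cannot be replayed to proxy additional traffic.</p>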
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7mtDkUW6hrd81lfVtxZJMw/5d5d85c528cc01e7b2d8be25c39fe000/image3-41.png" />
            
            </figure><p>We’re working on several partnerships to provide network-level protection for users’ browsing traffic, most recently with Apple for <a href="/icloud-private-relay/">Private Relay</a>. Private Relay improves on the traditional proxy design by adding an additional hop – an ingress proxy, operated by Apple – that separates handling users’ identities (i.e., whether they’re a valid iCloud+ user) from the proxying of traffic, which is handled by the egress proxy, operated by Cloudflare.</p>
    <div>
      <h3>Measurements and analytics without seeing individual inputs</h3>
      <a href="#measurements-and-analytics-without-seeing-individual-inputs">
        
      </a>
    </div>
    <p>What if you could calculate the results of a poll, without seeing individuals' votes, or update inputs to a <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning model</a> that predicted COVID-19 exposure without seeing who was exposed?</p><p>It might seem like magic, but it's actually just cryptography. <b>Cooperative Analytics</b> is a multi-party computation system for aggregating privacy-sensitive user measurements that doesn’t reveal individual inputs, based on the <a href="https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap">Distributed Aggregation Protocol</a> (DAP).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/l4oSgLQa7hJYT0ZDAR1b1/86d7511c982984e782f84a3e10e8e1c6/image6-11.png" />
            
            </figure><p>How data flows through the Cooperative Analytics system.</p><p>At a high level, DAP takes the core concept behind <a href="https://en.wikipedia.org/wiki/MapReduce">MapReduce</a> — what became a fundamental way to aggregate large amounts of data — and rethinks how it would work with privacy built in, so that each individual input cannot be (provably) mapped back to the original user.</p><p>Specifically:</p><ol><li><p>Measurements are first "secret shared," or split into multiple pieces. For example, if a user's input is the number 5, her input could be split into two shares of [10,-5].</p></li><li><p>The input share pieces are then distributed between different, non-colluding servers for aggregation (in this example, simply summed up). Similar to Privacy Gateway or Privacy Proxy, no one party has all the information needed to reconstruct any user's input.</p></li><li><p>Depending on the use case, the servers will then communicate with one another in order to verify that the input is "valid" – so that no one can insert an input that throws off the entire result. The magic of multi-party computation is that the servers can perform this computation without learning anything about the input beyond its validity.</p></li><li><p>Once enough input shares have been aggregated to ensure strong anonymity and a statistically significant sample size, each server sends its sum of the input shares to the overall consumer of this service to then compute the final result.</p></li></ol><p>For simplicity, the above example talks about measurements as summed-up numbers, but DAP describes algorithms for multiple different types of inputs: the most common string among a set of inputs, or a linear regression, for example.</p><p>Early iterations of this system have been implemented by Apple and Google for COVID-19 <a href="https://www.abetterinternet.org/post/prio-services-for-covid-en/">exposure notifications</a>, but there are many other potential use cases for a system like this: think sensitive browser telemetry, geolocation data – any situation where one has a question about a population of users, but doesn't want to have to measure them directly.</p><p>Because this system requires different parties to operate separate aggregation servers, Cloudflare is working with several partners to act as one of the aggregation servers for DAP. We’re calling our implementation <a href="https://github.com/cloudflare/daphne">Daphne</a>, and it’s built on top of Cloudflare Workers.</p>
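<p>The additive secret-sharing arithmetic in the steps above can be sketched directly, following the same example of an input 5 split into shares like [10, -5]. This is illustration only, with toy randomness, and omits the validity checks of step 3:</p>

```typescript
// Additive secret sharing: an input x becomes two shares [r, x - r] that
// individually reveal nothing about x but sum back to it.
function share(x: number): [number, number] {
  const r = Math.floor(Math.random() * 201) - 100; // toy randomness in [-100, 100]
  return [r, x - r];
}

// Each non-colluding server sums only the shares it received...
function aggregate(shares: number[]): number {
  return shares.reduce((a, b) => a + b, 0);
}

// ...and the consumer combines the two per-server totals into the result.
function combine(totalA: number, totalB: number): number {
  return totalA + totalB;
}
```

<p>Neither server's running total reveals any individual input, yet the combined totals equal the sum of all inputs.</p>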
    <div>
      <h3>Privacy still requires trust</h3>
      <a href="#privacy-still-requires-trust">
        
      </a>
    </div>
    <p>Part of what's cool about these systems is that they distribute information — whether user data, network traffic, or both — amongst multiple parties.</p><p>While we think that products included in Privacy Edge are moving the Internet in the right direction, we understand that trust only goes so far. To that end, we're trying to be as transparent as possible.</p><ul><li><p>We've open sourced the code for Privacy Gateway's server and DAP's aggregation server, and all the standards work we're doing is in public with the IETF.</p></li><li><p>We're also working on detailed and accessible privacy notices for each product that describe exactly what kind of network data Cloudflare sees, doesn't see, and how long we retain it for.</p></li><li><p>And, most importantly, we’re continuing to develop new protocols (like <a href="https://datatracker.ietf.org/doc/draft-ietf-ohai-ohttp/">Oblivious HTTP</a>) and technologies that don’t just require trust, but that can provably minimize the data observed or logged.</p></li></ul><p>We'd love to see more folks get involved in the standards space, and we welcome feedback from privacy experts and potential customers on how we can improve the integrity of these systems.</p>
    <div>
      <h3>We’re looking for collaborators</h3>
      <a href="#were-looking-for-collaborators">
        
      </a>
    </div>
    <p>Privacy Edge products are currently in early access.</p><p>We're looking for application developers who want to build more private user-facing apps with Privacy Gateway; browser and existing VPN vendors looking to improve network-level security for their users via Privacy Proxy; and anyone shipping sensitive software on the Internet that is looking to iterate with us on code auditability and web app signing.</p><p>If you're interested in working with us on furthering privacy on the Internet, then <a href="https://www.cloudflare.com/lp/privacy-edge/">please reach out</a>, and we’ll be in touch!</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Privacy]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">1xemlVnKaLgTuKXb1dHZc9</guid>
            <dc:creator>Mari Galicer</dc:creator>
            <dc:creator>Matt Silverlock</dc:creator>
        </item>
        <item>
            <title><![CDATA[The first Zero Trust SIM]]></title>
            <link>https://blog.cloudflare.com/the-first-zero-trust-sim/</link>
            <pubDate>Mon, 26 Sep 2022 13:40:00 GMT</pubDate>
            <description><![CDATA[ We’re announcing the first Zero Trust SIM: the next major part of Cloudflare One, combining both software and hardware layers to rethink mobile device security for organizations ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The humble cell phone is now a critical tool in the modern workplace; even more so as the modern workplace has shifted out of the office. Given the billions of mobile devices on the planet — they now outnumber PCs by an order of magnitude — it should come as no surprise that they have become the threat vector of choice for those attempting to break through corporate defenses.</p><p>The problem you face in defending against such attacks is that for most <a href="https://www.cloudflare.com/learning/security/glossary/what-is-zero-trust/">Zero Trust</a> solutions, mobile is often a second-class citizen. Those solutions are typically hard to install and manage. And they only work at the software layer, such as with <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/warp/">WARP</a>, the mobile (and desktop) apps that connect devices directly into our Zero Trust network. And all this is before you add in the further complication of Bring Your Own Device (BYOD) that more employees are using — you’re trying to deploy Zero Trust on a device that doesn’t belong to the company.</p><p>It’s a tricky — and increasingly critical — problem to solve. But it’s also a problem which we think we can help with.</p><p>What if employers could offer their employees a deal: we'll cover your monthly data costs if you agree to let us direct your work-related traffic through a network that has Zero Trust protections built right in? And what’s more, we’ll make it super easy to install — in fact, to take advantage of it, all you need to do is scan a QR code — which can be embedded in an employee's onboarding material — from your phone's camera.</p><p>Well, we’d like to introduce you to the Cloudflare SIM: the world’s first Zero Trust SIM.</p><p>In true Cloudflare fashion, we think that combining the software layer and the network layer enables better security, performance, and reliability. 
By targeting a foundational piece of technology that underpins every mobile device — the (not so) humble <a href="https://en.wikipedia.org/wiki/SIM_card">SIM card</a> — we’re aiming to bring an unprecedented level of security (and performance) to the mobile world.</p>
    <div>
      <h3>The threat is increasingly mobile</h3>
      <a href="#the-threat-is-increasingly-mobile">
        
      </a>
    </div>
    <p>When we say that mobile is the new threat vector, we’re not talking in the abstract. Last month, Cloudflare was one of 130 companies that were targeted by <a href="/2022-07-sms-phishing-attacks/">a sophisticated phishing attack</a>. Mobile was the cornerstone of the attack — employees were initially reached by SMS, and the attack relied heavily on compromising 2FA codes.</p><p>So far as we’re aware, we were the only company to not be compromised.</p><p>A big part of that was because we’re continuously pushing multi-layered Zero Trust defenses. Given how foundational mobile is to how companies operate today, we’ve been working hard to further shore up Zero Trust defenses in this sphere. And this is how we think about Zero Trust SIM: another layer of defense at a different level of the stack, making life even harder for those who are trying to penetrate your organization. With the Zero Trust SIM, you get the benefits of:</p><ul><li><p>Preventing employees from visiting phishing and malware sites: DNS requests leaving the device can automatically and implicitly use Cloudflare Gateway for DNS filtering.</p></li><li><p>Mitigating common SIM attacks: an eSIM-first approach allows us to prevent SIM-swapping or cloning attacks and, by locking SIMs to individual employee devices, to bring the same protections to physical SIMs.</p></li><li><p>Enabling secure, identity-based private connectivity to cloud services, on-premise infrastructure and even other devices (think: fleets of IoT devices) via Magic WAN. Each SIM can be strongly tied to a specific employee, and treated as an identity signal in conjunction with other device posture signals already supported by WARP.</p></li></ul><p>By integrating Cloudflare’s security capabilities at the SIM level, teams can better secure their fleets of mobile devices, especially in a world where BYOD is the norm and no longer the exception.</p>
    <div>
      <h3>Zero Trust works better when it’s software + on-ramps</h3>
      <a href="#zero-trust-works-better-when-its-software-on-ramps">
        
      </a>
    </div>
    <p>Beyond all the security benefits that we get for mobile devices, the Zero Trust SIM transforms mobile into another on-ramp pillar into the Cloudflare One platform.</p><p>Cloudflare One presents a single, unified control plane: allowing organizations to apply security controls across all the traffic coming to, and leaving from, their networks, devices and infrastructure. It’s the same with logging: you want one place to get your logs, and one location for all of your security analysis. With the Cloudflare SIM, mobile is now treated as just one more way that traffic gets passed around your corporate network.</p><p>Working at the on-ramp rather than the software level has another big benefit — it grants the flexibility to allow devices to reach services <i>not</i> on the Internet, including cloud infrastructure, data centers and branch offices connected into <a href="https://www.cloudflare.com/magic-wan/">Magic WAN</a>, our <a href="https://www.cloudflare.com/learning/network-layer/network-as-a-service-naas/">Network-as-a-Service</a> platform. In fact, under the covers, we’re using the same software networking foundations that our customers use to build out the connectivity layer behind the Zero Trust SIM. This will also allow us to support new capabilities like <a href="https://www.rfc-editor.org/rfc/rfc8926.html">Geneve</a>, a new network tunneling protocol, further expanding how customers can connect their infrastructure into Cloudflare One.</p><p>We’re following efforts like <a href="https://www.gsma.com/iot/iot-safe/">IoT SAFE</a> (and parallel, non-IoT standards) that enable SIM cards to be used as a root-of-trust, which will enable a stronger association between the Zero Trust SIM, employee identity, and the potential to act as a trusted hardware token.</p>
    <div>
      <h3>Get Zero Trust up and running on mobile immediately (and easily)</h3>
      <a href="#get-zero-trust-up-and-running-on-mobile-immediately-and-easily">
        
      </a>
    </div>
    <p>Of course, every Zero Trust solutions provider promises protection for mobile. But especially in the case of BYOD, getting employees up and running can be tough. Onboarding a device typically means a deep tour through your phone’s Settings app: accepting profiles, trusting certificates, and (in most cases) a mature mobile device management (MDM) solution.</p><p>It’s a pain to install.</p><p>Now, we’re not advocating the elimination of the client software on the phone any more than we would on the PC. More layers of defense are always better than fewer. And it remains necessary to secure Wi-Fi connections that are established on the phone. But a big advantage is that the Cloudflare SIM gets employees protected behind Cloudflare’s Zero Trust platform immediately for all mobile traffic.</p><p>It’s not just the on-device installation we wanted to simplify, however. It’s companies' IT supply chains, as well.</p><p>One of the traditional challenges with SIM cards is that they have been, until recently, a physical card. A card that you have to mail to employees (a supply chain risk in modern times), that can be lost, stolen, and that can still fail. With a distributed workforce, all of this is made even harder. We know that whilst security is critical, security that is hard to deploy tends to be deployed haphazardly, ad-hoc, and often, not at all.</p><p>But nearly every modern phone shipped today has an eSIM — or more precisely, <a href="https://www.emnify.com/iot-glossary/esim">an eUICC (Embedded Universal Integrated Circuit Card)</a> — that can be re-programmed dynamically. 
This is a huge advancement, for two major reasons:</p><ol><li><p>You avoid all the logistical issues of a physical SIM (mailing them; supply chain risk; getting users to install them!)</p></li><li><p>You can deploy them automatically, either via QR codes, <a href="https://support.apple.com/guide/deployment/deploy-devices-with-cellular-connections-dep36c581d6x/web">Mobile Device Management</a> (MDM) features built into mobile devices today, or via an app (for example, <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/warp/">our WARP mobile app</a>).</p></li></ol><p>We’re also exploring introducing physical SIMs (just like the ones above): although we believe eSIMs are the future, especially given their deployment &amp; security advantages, we understand that the future is not always evenly distributed. We’ll be working to make sure that the physical SIMs we ship are as secure as possible, and we’ll be sharing more of how this works in the coming months.</p>
    <div>
      <h3>Privacy and transparency for employees</h3>
      <a href="#privacy-and-transparency-for-employees">
        
      </a>
    </div>
    <p>Of course, more and more of the devices that employees use for work are their own. And while employers want to make sure their corporate resources are secure, employees also have privacy concerns when work and private life are blended on the same device. You don’t want your boss knowing that you’re swiping on Tinder.</p><p>We want to be thoughtful about how we approach this, from the perspective of both sides. We have sophisticated logging set up as part of Cloudflare One, and this will extend to Cloudflare SIM. Today, Cloudflare One can be explicitly configured to log only the resources it blocks — the threats it’s protecting employees from — without logging every domain visited beyond that. We’re working to make this as obvious and transparent as possible to both employers and employees so that, in true Cloudflare fashion, security does not have to compromise privacy.</p>
    <div>
      <h3>What’s next?</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Like any product at Cloudflare, we’re testing this on ourselves first (or “dogfooding”, to those in the know). Given the services we provide for over 30% of the Fortune 1000, we continue to observe, and be the target of, increasingly sophisticated cybersecurity attacks. We believe that running the service first is an important step in ensuring we make the Zero Trust SIM both secure and as easy to deploy and manage across thousands of employees as possible.</p><p>We’re also bringing the Zero Trust SIM to <a href="/rethinking-internet-of-things-security/">the Internet of Things</a>: almost every vehicle shipped today has an expectation of cellular connectivity; an increasing number of payment terminals have a SIM card; and a growing number of industrial devices across manufacturing and logistics ship with one, too. IoT device security is under <a href="https://www.nist.gov/itl/applied-cybersecurity/nist-cybersecurity-iot-program">increasing levels of scrutiny</a>, and ensuring that the only way a device can connect is a secure one — protected by Cloudflare’s Zero Trust capabilities — can directly prevent devices from becoming part of the next big DDoS botnet.</p><p>We'll be rolling the Zero Trust SIM out to customers on a regional basis as we build our regional connectivity across the globe (if you’re an operator: <a href="/zero-trust-for-mobile-operators/">reach out</a>). We’d especially love to talk to organizations who don’t have an existing mobile device solution in place at all, or who are struggling to make things work today. If you're interested, then <a href="https://www.cloudflare.com/announcing-the-zero-trust-sim-register-interest/">sign up here</a>.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Zero Trust]]></category>
            <category><![CDATA[SIM]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Connectivity]]></category>
            <category><![CDATA[Mobile]]></category>
            <guid isPermaLink="false">5pjvwtb0IhWZzXBphArYtT</guid>
            <dc:creator>Matt Silverlock</dc:creator>
            <dc:creator>James Allworth</dc:creator>
        </item>
        <item>
            <title><![CDATA[Bringing Zero Trust to mobile network operators]]></title>
            <link>https://blog.cloudflare.com/zero-trust-for-mobile-operators/</link>
            <pubDate>Mon, 26 Sep 2022 13:19:00 GMT</pubDate>
            <description><![CDATA[ Better together: 5G mobile networks and Cloudflare’s all-in-one SASE platform ]]></description>
            <content:encoded><![CDATA[ <p></p><p>At Cloudflare, we’re excited about the quickly-approaching 5G future. Increasingly, we’ll have access to high throughput and low-latency wireless networks wherever we are. It will make the Internet feel instantaneous, and we’ll find new uses for this connectivity such as sensors that will help us be more productive and energy-efficient. However, this type of connectivity doesn’t have to come at the expense of security, a concern raised in <a href="https://www.wired.com/story/5g-api-flaws/">this</a> recent Wired article. Today we’re announcing the creation of a new partnership program for mobile networks—Zero Trust for Mobile Operators—to jointly solve the biggest security and performance challenges.</p>
    <div>
      <h3>SASE for Mobile Networks</h3>
      <a href="#sase-for-mobile-networks">
        
      </a>
    </div>
    <p>Every network is different, and the key to managing the complicated security environment of an <a href="https://www.cloudflare.com/learning/network-layer/enterprise-networking/">enterprise network</a> is having lots of tools in the toolbox. Most of these functions fall under the industry buzzword <a href="https://www.cloudflare.com/learning/access-management/what-is-sase/">SASE</a>, which stands for Secure Access Service Edge. Cloudflare’s SASE product is Cloudflare One, and it’s a comprehensive platform for network operators.  It includes:</p><ul><li><p>Magic WAN, which offers secure <a href="https://www.cloudflare.com/learning/network-layer/network-as-a-service-naas/">Network-as-a-Service (NaaS)</a> connectivity for your data centers, branch offices and cloud VPCs and integrates with your legacy <a href="https://www.cloudflare.com/learning/network-layer/what-is-mpls/">MPLS networks</a></p></li><li><p>Cloudflare Access, which is a <a href="https://www.cloudflare.com/learning/access-management/what-is-ztna/">Zero Trust Network Access</a> (ZTNA) service requiring strict verification for every user and every device before authorizing them to access internal resources.</p></li><li><p>Gateway, our <a href="https://www.cloudflare.com/learning/access-management/what-is-a-secure-web-gateway/">Secure Web Gateway</a>, which operates between a corporate network and the Internet to enforce security policies and protect company data.</p></li><li><p>A <a href="https://www.cloudflare.com/learning/access-management/what-is-a-casb/">Cloud Access Security Broker</a>, which monitors the network and external cloud services for security threats.</p></li><li><p>Cloudflare Area 1, an email threat detection tool to scan email for phishing, malware, and other threats.</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7MCaC864eESptQqX4WWJio/7467a87fd86762e054c1645d1b404565/image1-44.png" />
            
            </figure><p>We’re excited to partner with mobile network operators for these services because our networks and services are tremendously complementary. Let’s first think about <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-sd-wan/">SD-WAN (Software-Defined Wide Area Network)</a> connectivity, which is the foundation on which much of the SASE framework rests. As an example, imagine a developer working from home developing a solution with a Mobile Network Operator’s (MNO) Internet of Things APIs. Maybe they’re developing tracking software for the number of drinks left in a soda machine, or want to track the routes for delivery trucks.</p><p>The developer at home and their fleet of devices should be on the same <a href="https://www.cloudflare.com/learning/network-layer/what-is-a-wan/">wide area network</a>, securely, and at reasonable cost. What Cloudflare provides is the programmable software layer that enables this secure connectivity. The developer and the developer’s employer still need to have connectivity to the Internet at home, and for the fleet of devices. The ability to make a secure connection to your fleet of devices doesn’t do any good without enterprise connectivity, and the enterprise connectivity is only more valuable with the secure connection running on top of it. They’re the perfect match.</p><p>Once the connectivity is established, we can layer on a Zero Trust platform to ensure every user can only access a resource to which they’ve been explicitly granted permission. Any time a user wants to access a protected resource – via ssh, to a cloud service, etc. – they’re challenged to authenticate with their single-sign-on credentials before being allowed access. The networks we use are growing and becoming more distributed. A <a href="https://www.cloudflare.com/learning/security/glossary/what-is-zero-trust/">Zero Trust architecture</a> enables that growth while protecting against known risks.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4SToDV0xk97pkwZJN8E1UM/84604973c0e206f240ea11db14711f7c/9FF77BA6-4FFD-4FE4-9510-BF0831F1EFFC.png" />
            
            </figure>
    <div>
      <h3>Edge Computing</h3>
      <a href="#edge-computing">
        
      </a>
    </div>
    <p>Given the potential of low-latency 5G networks, consumers and operators are both waiting for a “killer 5G app”. Maybe it will be autonomous vehicles and virtual reality, but our bet is on a quieter revolution: moving compute – the “work” that a server needs to do to respond to a request – from big regional data centers to small city-level data centers, embedding the compute capacity inside wireless networks, and eventually even to the base of cell towers.</p><p>Cloudflare’s edge compute platform is called Workers, and it does exactly this – execute code at the edge. It’s designed to be simple. When a developer is building an API to support their product or service, they don’t want to worry about regions and availability zones. With Workers, a developer writes code they want executed at the edge, deploys it, and within seconds it’s running at every Cloudflare data center globally.</p><p>Some workloads we already see, and expect to see more of, include:</p><ul><li><p>IoT (Internet of Things) companies implementing complex device logic and security features directly at the edge, letting them add cutting-edge capabilities without adding cost or latency to their devices.</p></li><li><p><a href="https://www.cloudflare.com/ecommerce/">eCommerce</a> platforms storing and caching customized assets close to their visitors for <a href="https://www.cloudflare.com/solutions/ecommerce/optimization/">improved customer experience and great conversion rates</a>.</p></li><li><p>Financial data platforms, including new Web3 players, providing near real-time information and transactions to their users.</p></li><li><p>A/B testing and experimentation run at the edge without adding latency or introducing dependencies on the client-side.</p></li><li><p>Fitness-type devices tracking a user’s movement and health statistics can offload compute-heavy workloads while maintaining great speed/latency.</p></li><li><p>Retail applications providing fast service and a customized experience for 
each customer without an expensive on-prem solution.</p></li></ul><p>The Cloudflare Case Studies <a href="https://www.cloudflare.com/case-studies?product=Workers">section</a> has additional examples from <a href="https://www.cloudflare.com/case-studies/ncr/">NCR</a>, <a href="https://www.cloudflare.com/case-studies/edgemesh/">Edgemesh</a>, <a href="https://www.cloudflare.com/case-studies/blockfi/">BlockFi</a>, and others on how they’re using the Workers platform. While these examples are exciting, we’re most excited about providing the platform for new innovation.</p><p>You may have seen last week we <a href="/workers-for-platforms-ga/">announced</a> <a href="https://developers.cloudflare.com/cloudflare-for-platforms/workers-for-platforms/">Workers for Platforms</a> is now in General Availability. Workers for Platforms is an umbrella-like structure that allows a parent organization to enable Workers for their own customers. As an MNO, your focus is on providing the means for devices to send communication to clients. For IoT use cases, sending data is the first step, but the exciting potential of this connectivity is the applications it enables. With Workers for Platforms, MNOs can expose an embedded product that allows customers to access compute power at the edge.</p>
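<p>For a sense of how little code is involved, here is a minimal Worker in the modules syntax. The <code>/health</code> route is an illustrative placeholder, not a real API:</p>

```javascript
// A minimal Cloudflare Worker in the modules syntax. Once deployed
// (e.g. with `wrangler deploy`), it runs in every Cloudflare data center.
// The /health route is an illustrative placeholder.
const worker = {
  fetch(request) {
    const url = new URL(request.url);
    if (url.pathname === "/health") {
      return new Response(JSON.stringify({ ok: true }), {
        headers: { "content-type": "application/json" },
      });
    }
    return new Response("Not found", { status: 404 });
  },
};

// In a real Workers project this object is the module's default export:
// export default worker;
```

<p>There are no regions or availability zones to pick: the same handler serves each request from whichever Cloudflare location is closest to the caller.</p>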
    <div>
      <h3>Network Infrastructure</h3>
      <a href="#network-infrastructure">
        
      </a>
    </div>
    <p>The <a href="https://www.cloudflare.com/the-net/network-infrastructure/">complementary networks</a> of mobile operators and Cloudflare are another area of opportunity. When a user is interacting with the Internet, one of the most important factors for the speed of their connection is the physical distance from their handset to the content and services they’re trying to access. If the data request from a user in Denver needs to wind its way to one of the major Internet hubs in Dallas, San Jose, or Chicago (and then all the way back!), that is going to be slow. But if the MNO can link to the service locally in Denver, the connection will be much faster.</p><p>One of the exciting developments with new 5G networks is the ability of MNOs to do more “local breakout”. Many MNOs are moving towards cloud-native and distributed radio access networks (RANs), which provide more flexibility to move and multiply packet cores. These packet cores are the heart of a mobile network and all of a subscriber’s data flows through one.</p><p>For Cloudflare – with a data center presence in 275+ cities globally – a user never has to wait long for our services. We can also take it a step further. In some cases, our services are embedded within the MNO or ISP’s own network. The traffic which connects a user to a device, authorizes the connection, and securely transmits data is all within the network boundary of the MNO – it never needs to touch the public Internet, incur added latency, or otherwise compromise the performance for your subscribers.</p><p>We’re excited to partner with mobile networks because our security services work best when our customers have excellent enterprise connectivity underneath. Likewise, we think mobile networks can offer more value to their customers with our security software added on top. 
If you’d like to talk about how to integrate Cloudflare One into your offerings, please email us at <a>mobile-operator-program@cloudflare.com</a>, and we’ll be in touch!</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Zero Trust]]></category>
            <category><![CDATA[Mobile]]></category>
            <category><![CDATA[Network]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Cloudflare Zero Trust]]></category>
            <category><![CDATA[Connectivity]]></category>
            <guid isPermaLink="false">2qVewZ6ySZGdeXugWvkJ0y</guid>
            <dc:creator>Mike Conlow</dc:creator>
            <dc:creator>Matt Silverlock</dc:creator>
        </item>
        <item>
            <title><![CDATA[Securing the Internet of Things]]></title>
            <link>https://blog.cloudflare.com/rethinking-internet-of-things-security/</link>
            <pubDate>Mon, 26 Sep 2022 13:15:00 GMT</pubDate>
            <description><![CDATA[ We’ve been defending customers from Internet of Things botnets for years now, and it’s time to turn the tides: we’re bringing the same security behind our Zero Trust platform to IoT ]]></description>
            <content:encoded><![CDATA[ <p></p><p>It’s hard to imagine life without our smartphones. Whereas computers were mostly fixed and often shared, smartphones meant that every individual on the planet became a permanent, mobile node on the Internet — with some 6.5B smartphones on the planet today.</p><p>While that represents an explosion of devices on the Internet, it will be dwarfed by the next stage of the Internet’s evolution: connecting devices to give them intelligence. Already, Internet of Things (IoT) devices represent somewhere in the order of double the number of smartphones connected to the Internet today — and unlike smartphones, this number is expected to continue to grow tremendously, since they aren’t bound to the number of humans that can carry them.</p><p>But the exponential growth in devices has brought with it an explosion in risk. We’ve been defending against DDoS attacks from Internet of Things (IoT) driven botnets like <a href="/tag/mirai/">Mirai</a> and <a href="/meris-botnet/">Meris</a> for years now. They keep <a href="https://www.securityweek.com/cloudflare-mitigates-2-tbps-ddos-attack-launched-mirai-botnet">growing</a>, because securing IoT devices still remains challenging, and manufacturers are often not incentivized to secure them. This has driven NIST (the U.S. National Institute of Standards and Technology) <a href="https://csrc.nist.gov/publications/detail/nistir/8349/draft">to actively define requirements</a> to address the (lack of) IoT device security, and the EU isn’t far behind.</p><p>It’s also the type of problem that Cloudflare solves best.</p><p>Today, we’re excited to announce our Internet of Things platform: with the goal to provide a single pane-of-glass view over your IoT devices, provision connectivity for new devices, and critically, secure every device from the moment it powers on.</p>
    <div>
      <h3>Not just lightbulbs</h3>
      <a href="#not-just-lightbulbs">
        
      </a>
    </div>
    <p>It’s common to immediately think of lightbulbs or simple motion sensors when you read “IoT”, but that’s because we often don’t consider many of the devices we interact with on a daily basis as an IoT device.</p><p>Think about:</p><ul><li><p>Almost every payment terminal</p></li><li><p>Any modern car with an infotainment or GPS system</p></li><li><p>Millions of industrial devices that power — and are critical to — logistics services, industrial processes, and manufacturing businesses</p></li></ul><p><i>You especially may not realize that nearly every one of these devices has a SIM card, and connects over a cellular network.</i></p><p>Cellular connectivity has become increasingly ubiquitous, and if the device can connect independently of Wi-Fi network configuration (and work out of the box), you’ve immediately avoided a whole class of operational support challenges. If you’ve just read our earlier announcement about <a href="/the-first-zero-trust-sim/">the Zero Trust SIM</a>, you’re probably already seeing where we’re headed.</p><p>Hundreds of thousands of IoT devices already securely connect to our network today using mutual TLS and our <a href="https://developers.cloudflare.com/api-shield/">API Shield</a> product. Major device manufacturers use <a href="https://developers.cloudflare.com/workers/">Workers</a> and our Developer Platform to offload authentication, compute and most importantly, reduce the compute needed on the device itself. <a href="https://developers.cloudflare.com/pub-sub/">Cloudflare Pub/Sub</a>, our programmable, MQTT-based messaging service, is yet another building block.</p><p>But we realized there were still a few missing pieces: device management, analytics and anomaly detection. There are a <i>lot</i> of “IoT SIM” providers out there, but the clear majority are focused on shipping SIM cards at scale (great!) and less so on the security side (not so great) or the developer side (also not great). 
Customers have been telling us that they wanted a way to easily secure their IoT devices, just as they secure their employees with our Zero Trust platform.</p><p><b>Cloudflare’s IoT Platform will build in support for provisioning cellular connectivity at scale</b>: we’ll support ordering, provisioning and managing cellular connectivity for your devices. Every packet that leaves each IoT device can be inspected, approved or rejected by policies you create <i>before</i> it reaches the Internet, your cloud infrastructure, or your other devices.</p><p>Emerging standards like <a href="https://www.gsma.com/iot/iot-safe/">IoT SAFE</a> will also allow us to use the SIM card as a root-of-trust, storing device secrets (and API keys) securely on the device, whilst raising the bar to compromise.</p><p>This also doesn’t mean we’re leaving the world of mutual TLS behind: we understand that not every device makes sense to connect solely over a cellular network, be it due to per-device costs, lack of coverage, or the need to support an existing deployment that can’t just be re-deployed.</p>
    <div>
      <h3>Bringing Zero Trust security to IoT</h3>
      <a href="#bringing-zero-trust-security-to-iot">
        
      </a>
    </div>
    <p>Unlike humans, who need to be able to access a potentially unbounded number of destinations (websites), the endpoints that an IoT device needs to speak to are typically far more bounded. But in practice, there are often few controls in place (or available) to ensure that a device only speaks to your API backend, your storage bucket, and/or your telemetry endpoint.</p><p>Our Zero Trust platform, however, has a solution for this: <a href="https://www.cloudflare.com/products/zero-trust/gateway/">Cloudflare Gateway</a>. You can create DNS, network or HTTP policies, and allow or deny traffic based not only on the source or destination, but on richer identity- and location- based controls. It seemed obvious that we could bring these same capabilities to IoT devices, and allow developers to better <a href="https://www.cloudflare.com/learning/security/glossary/what-is-zero-trust/">restrict and control</a> what endpoints their devices talk to (so they don’t become part of a botnet).</p>
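<p>Conceptually, the policy model for a device is much simpler than for a person: a small, explicit allowlist of destinations with a default-deny fallback. A toy sketch of that idea (the hostnames are placeholders, and this is not how Gateway's policy engine is actually implemented):</p>

```javascript
// Toy sketch of the bounded-destination idea behind an IoT egress policy:
// a device may only reach an explicit set of endpoints, and everything
// else is denied by default. Hostnames are illustrative placeholders.
const devicePolicy = {
  allowedHosts: new Set(["api.example.com", "telemetry.example.com"]),
};

function egressDecision(policy, destinationHost) {
  // Default-deny: anything not explicitly allowed is blocked.
  return policy.allowedHosts.has(destinationHost) ? "allow" : "block";
}

egressDecision(devicePolicy, "api.example.com"); // "allow"
egressDecision(devicePolicy, "c2.badhost.example"); // "block"
```

<p>Because the set of legitimate destinations is so small, a default-deny posture is practical for devices in a way it rarely is for people browsing the web.</p>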
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Z8zzOG3weGSYnfNH966TQ/c28bf1d7b37a3a9d39b3d4d7cfe61d90/image2-38.png" />
            
            </figure><p>At the same time, we also identified ways to extend Gateway to be aware of IoT device specifics. For example, imagine you’ve provisioned 5,000 IoT devices, all connected over cellular directly into Cloudflare’s network. You can then choose to lock these devices to a specific geography if there’s no need for them to “travel”; ensure they can only speak to your API backend and/or metrics provider; and even ensure that if the SIM is lifted from the device it no longer functions by locking it to the IMEI (the serial of the modem).</p><p>Building these controls at the network layer raises the bar on IoT device security and reduces the risk that your fleet of devices becomes the tool of a bad actor.</p>
    <div>
      <h3>Get the compute off the device</h3>
      <a href="#get-the-compute-off-the-device">
        
      </a>
    </div>
    <p>We’ve talked a lot about security, but what about compute and storage? A device can be extremely secure if it doesn’t have to do anything or communicate anywhere, but clearly that’s not practical.</p><p>Meanwhile, doing non-trivial amounts of compute “on-device” has a number of major challenges:</p><ul><li><p>It requires a more powerful (and thus, more expensive) device. Moderately powerful (e.g. ARMv8-based) devices with a few gigabytes of RAM might be getting cheaper, but they’re always going to be more expensive than a lower-powered device, and that adds up quickly at IoT scale.</p></li><li><p>You can’t guarantee (or expect) that your device fleet is homogeneous: the devices you deployed three years ago can easily be several times slower than what you’re deploying today. Do you leave those devices behind?</p></li><li><p>The more business logic you have on the device, the greater the operational and deployment risk. Change management becomes critical, and the risk of “bricking” — rendering a device non-functional in a way you can’t fix remotely — is never zero. It also becomes harder to iterate and add new features when you’re deploying to a device on the other side of the world.</p></li><li><p>Security remains a concern: if your device needs to talk to external APIs, you have to ensure the credentials it uses are explicitly scoped, so they can’t be pulled from the device and used in ways you don’t expect.</p></li></ul><p>We’ve heard other platforms talk about “edge compute”, but in practice they either mean “run the compute on the device” or “in a small handful of cloud regions” (introducing latency) — neither of which fully addresses the problems highlighted above.</p><p>Instead, by enabling secure access to <a href="https://workers.cloudflare.com/">Cloudflare Workers</a> for compute, <a href="/workers-analytics-engine/">Analytics Engine</a> for device telemetry, <a href="/introducing-d1/">D1</a> as a SQL database, and <a href="https://developers.cloudflare.com/pub-sub/">Pub/Sub</a> for massively scalable messaging — IoT developers can keep the compute off the device while still keeping it <i>close</i> to the device, thanks to our <a href="https://www.cloudflare.com/network/">global network</a> (275+ cities and counting).</p><p>On top of that, developers can use modern tooling like <a href="https://developers.cloudflare.com/workers/wrangler/get-started/">Wrangler</a> to both iterate more rapidly <i>and</i> deploy software more safely, avoiding the risk of bricking or otherwise breaking part of your IoT fleet.</p>
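As a sketch of this “thin device” pattern: the device does nothing but package a sensor reading and ship it upstream, while aggregation, alerting, and business logic live server-side (for example, in a Worker). The endpoint URL and payload field names below are assumptions for illustration, not a real API:

```python
import json
import urllib.request

# Hypothetical Worker endpoint -- replace with your own deployment.
WORKER_URL = "https://telemetry.example.workers.dev/ingest"

def build_payload(device_id: str, reading: float) -> bytes:
    """Package a single sensor reading; the device does no further processing."""
    return json.dumps({
        "device_id": device_id,
        "reading": reading,
    }).encode("utf-8")

def send(payload: bytes) -> None:
    """POST the reading upstream (network call, shown for completeness)."""
    req = urllib.request.Request(
        WORKER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # All aggregation, alerting, and storage happen server-side,
    # so shipping a new feature means redeploying the Worker, not the fleet.
    urllib.request.urlopen(req)
```

Because the device-side code is this small and stable, the change-management and bricking risks above shrink accordingly: new features land server-side.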
    <div>
      <h3>Where do I sign up?</h3>
      <a href="#where-do-i-sign-up">
        
      </a>
    </div>
    <p>You can <a href="https://www.cloudflare.com/register-for-our-iot-platform/">register your interest in our IoT Platform today</a>: we’ll be reaching out over the coming weeks to better understand the problems teams are facing, and working to get our closed beta into the hands of customers in the coming months. We’re especially interested in teams who are in the throes of figuring out how to deploy a new set of IoT devices and/or expand an existing fleet, no matter the use case.</p><p>In the meantime, you can start building on <a href="https://developers.cloudflare.com/api-shield/security/mtls/">API Shield</a> and <a href="https://developers.cloudflare.com/pub-sub/">Pub/Sub</a> (MQTT) if you need to start securing IoT devices today.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Zero Trust]]></category>
            <category><![CDATA[IoT]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Connectivity]]></category>
            <category><![CDATA[SIM]]></category>
            <guid isPermaLink="false">4XIkTS7tUujtr46Q9l1lUN</guid>
            <dc:creator>Matt Silverlock</dc:creator>
        </item>
    </channel>
</rss>