
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sun, 05 Apr 2026 16:08:32 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Active defense: introducing a stateful vulnerability scanner for APIs]]></title>
            <link>https://blog.cloudflare.com/vulnerability-scanner/</link>
            <pubDate>Mon, 09 Mar 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare’s new Web and API Vulnerability Scanner helps teams proactively find logic flaws. By using AI to build API call graphs, we identify vulnerabilities that standard defensive tools miss. ]]></description>
            <content:encoded><![CDATA[ <p>Security is traditionally a game of defense. You build walls, set up gates, and write rules to block traffic that looks suspicious. For years, Cloudflare has been a leader in this space: our <a href="https://www.cloudflare.com/application-services/products/"><u>Application Security platform</u></a> is designed to catch attacks in flight, dropping malicious requests at the edge before they ever reach your origin. But for <a href="https://www.cloudflare.com/learning/security/api/what-is-api-security/"><u>API security</u></a>, defensive posturing isn’t enough. </p><p>That’s why today, we are launching the beta of Cloudflare’s Web and API Vulnerability Scanner. </p><p>We are starting with the most pervasive and difficult-to-catch threat on the OWASP API Top 10: <a href="https://owasp.org/API-Security/editions/2023/en/0xa1-broken-object-level-authorization/"><u>Broken Object Level Authorization</u>, or BOLA.</a> We will add more vulnerability scan types over time, including both API and web application threats.</p><p>The most dangerous API vulnerabilities today aren’t generic injection attacks or malformed requests that a WAF can easily spot. They are logic flaws—perfectly valid HTTP requests that meet the protocol and application spec but defy the business logic.</p><p>To find these, you can’t just wait for an attack. You have to actively hunt for them.</p><p>The Web and API Vulnerability Scanner will be available first for <a href="https://developers.cloudflare.com/api-shield/"><u>API Shield</u></a> customers. Read on to learn why we are focused on API security scans for this first release.</p>
    <div>
      <h2>Why purely defensive security misses the mark</h2>
      <a href="#why-purely-defensive-security-misses-the-mark">
        
      </a>
    </div>
    <p>In the web application world, vulnerabilities often look like syntax errors. A <a href="https://www.cloudflare.com/learning/security/threats/sql-injection/"><u>SQL injection</u></a> attempt looks like code where data should be. A <a href="https://www.cloudflare.com/learning/security/threats/cross-site-scripting/"><u>cross-site scripting (XSS)</u></a> attack looks like a script tag in a form field. These have signatures.</p><p>API vulnerabilities are different. To illustrate, let’s imagine a food delivery mobile app that communicates solely with an API on the backend. Let’s take the orders endpoint:</p><p><b>Endpoint Definition: </b><code><b>/api/v1/orders</b></code></p><table><tr><td><p><b>Method</b></p></td><td><p><b>Resource Path</b></p></td><td><p><b>Description</b></p></td></tr><tr><td><p><b>GET</b></p></td><td><p>/api/v1/orders/{order_id}</p></td><td><p><b>Check Status.</b> Returns the tracking status of a specific order (e.g., "Kitchen is preparing").</p></td></tr><tr><td><p><b>PATCH</b></p></td><td><p>/api/v1/orders/{order_id}</p></td><td><p><b>Update Order.</b> Allows the user to modify the drop-off location or add delivery instructions.</p></td></tr></table><p>In a broken authorization attack like BOLA, User A (the attacker) requests to update the delivery address of a paid-for order belonging to User B (the victim). The attacker simply inserts User B’s <code>{order_id}</code> in the <code>PATCH</code> request.</p><p>Here is what that request looks like, with ‘8821’ as User B’s order ID. Notice that User A is fully authenticated with their own valid token:</p>
            <pre><code>PATCH /api/v1/orders/8821 HTTP/1.1
Host: api.example.com
Authorization: Bearer &lt;User_A_Valid_Token&gt;
Content-Type: application/json

{
  "delivery_address": "123 Attacker Way, Apt 4",
  "instructions": "Leave at front door, ring bell"
}
</code></pre>
            <p>The request headers are valid. The authentication token is valid. The schema is correct. To a standard WAF, this request looks perfect. A bot management offering may even be fooled if a human is manually sending the attack requests.</p><p>User A will now get User B’s food delivered to them! The vulnerability exists because the API endpoint fails to validate whether User A actually has permission to view or update User B’s data. This is a failure of logic, not syntax. To fix this, the API developer could implement a simple check: <code>if (order.userID != user.ID) throw Unauthorized;</code></p>
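<p>That ownership check is the crux of the fix. As a sketch in JavaScript (the handler and field names here are illustrative, not the example app’s actual code), it might look like:</p>
<pre><code>// Hypothetical ownership check for the food delivery API above.
// A valid bearer token only proves *who* the caller is; object-level
// authorization must additionally prove the caller owns the object.
function authorizeOrderUpdate(order, user) {
  if (order.userID !== user.id) {
    const err = new Error("Unauthorized");
    err.status = 403; // reject cross-user access: this is the BOLA defense
    throw err;
  }
}

// Illustrative PATCH handler: authorize first, then apply the update.
function handlePatchOrder(order, user, patch) {
  authorizeOrderUpdate(order, user);
  return { ...order, ...patch };
}</code></pre>
<p>With this check in place, User A’s otherwise well-formed <code>PATCH</code> against order 8821 fails with a 403 instead of silently rerouting User B’s delivery.</p>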
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ZOOAjAYcZqzDYg9snYASL/65940a740ba7ef294b719e76d37f3cdd/BLOG-3161_2.png" />
          </figure><p>You can detect these types of vulnerabilities by actively sending API test traffic or passively listening to existing API traffic. Finding these vulnerabilities through passive scanning requires context. Last year <a href="https://developers.cloudflare.com/changelog/2025-11-12-bola-attack-detection/"><u>we launched BOLA vulnerability detection</u></a> for API Shield. This detection automatically finds these vulnerabilities by passively scanning customer traffic for usage anomalies. To be successful with this type of scanning, you need to know what a "valid" API call looks like, what the variable parameters are, how a typical user behaves, and how the API behaves when those parameters are manipulated.</p><p>Yet there are reasons security teams may not have any of that context, even with access to API Shield’s BOLA vulnerability detection. Development environments may need to be tested but lack user traffic. Production environments may (thankfully) lack attack traffic yet still need analysis, and so on. In these circumstances, and to be proactive in general, teams can turn to Dynamic Application Security Testing (DAST). By creating net-new traffic profiles intended specifically for security testing, DAST tools can look for vulnerabilities in any environment at any time.</p><p>Unfortunately, traditional DAST tools have a high barrier to entry. They are often difficult to configure, require you to manually upload and maintain Swagger/OpenAPI files, struggle to authenticate correctly against modern, complex login flows, and can simply lack any API-specific security tests (e.g. BOLA).</p>
    <div>
      <h2>Cloudflare’s API scanning advantage</h2>
      <a href="#cloudflares-api-scanning-advantage">
        
      </a>
    </div>
    <p>In the food delivery order example above, we assumed the attacker could find a valid order to modify. While there are often avenues for attackers to gather this type of intelligence in a live production environment, in a security testing exercise you must create your own objects before testing the API’s authorization controls. For typical DAST scans, this can be a problem, because many scanners treat each individual request on its own. This method fails to chain requests together in the logical pattern necessary to find broken authorization vulnerabilities. Legacy DAST scanners can also exist as an island within your security tooling and orchestration environment, preventing their findings from being shared or viewed in context.</p><p>Vulnerability scanning from Cloudflare is different for a few key reasons. </p><p>First, <a href="https://developers.cloudflare.com/security-center/security-insights/"><u>Security Insights</u></a> will list results from our new scans alongside any existing Cloudflare security findings for added context. You’ll see all your posture management information in one place. </p><p>Second, we already know your API’s inputs and outputs. If you are an API Shield customer, Cloudflare already understands your API. Our <a href="https://developers.cloudflare.com/api-shield/security/api-discovery/"><u>API Discovery</u></a> and <a href="https://developers.cloudflare.com/api-shield/management-and-monitoring/endpoint-management/schema-learning/"><u>Schema Learning</u></a> features passively catalog your endpoints and learn your traffic patterns. While our initial release requires you to manually upload an OpenAPI spec, a future release will let you get started without one.</p><p>Third, because we sit at the edge, we can turn passive traffic inspection knowledge into active intelligence. It will be easy to verify BOLA vulnerability detection risks (found via traffic inspection) by sending net-new HTTP requests with the vulnerability scanner.</p><p>And finally, we have built a new, stateful DAST platform, as we detail below. Most scanners require hours of setup to "teach" the tool how to talk to your API. With Cloudflare, you can effectively skip that step and get started quickly. You provide the API credentials, and we’ll use your API schemas to automatically construct a scan plan.</p>
    <div>
      <h2>Building automatic scan plans</h2>
      <a href="#building-automatic-scan-plans">
        
      </a>
    </div>
    <p>APIs are commonly documented using <a href="https://www.openapis.org/what-is-openapi"><u>OpenAPI schemas</u></a>. These schemas denote the host, method, and path (commonly, an “endpoint”) along with the expected parameters of incoming requests and resulting responses. In order to automatically build a scan plan, we must first make sense of these API specifications for any given API to be scanned.</p><p>Our scanner works by building up an API call graph from an OpenAPI document and subsequently walking it, using attacker and owner contexts. Owners create resources, attackers subsequently try to access them. Attackers are fully authenticated with their own set of valid credentials. If an attacker successfully reads, modifies or deletes an unowned resource, an authorization vulnerability is found.</p><p>Consider, for example, the above delivery order with ID 8821. For the server-side resource to exist, it needed to be originally created by an owner, most likely in a “genesis” <code>POST</code> request with no or minimal dependencies (previous necessary calls and resulting data). Modeling the API as a call graph, such an endpoint constitutes a node with no or few incoming edges (dependencies). Any subsequent request, such as the attacker’s <code>PATCH</code> above, then has a <i>data dependency</i> (the data is <code>order_id</code>) on the genesis request (the <code>POST</code>). Without all data provided, the <code>PATCH</code> cannot proceed.</p>
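<p>The owner/attacker walk over one such node can be sketched as a tiny routine, where <code>sendAs</code> plays requests under a given identity. All names here are hypothetical, and the real scanner also resolves each node’s data dependencies first:</p>
<pre><code>// Sketch of the per-node BOLA check: owner creates the resource via the
// genesis call, then a fully authenticated attacker replays the dependent
// call against the owner's resource ID. Node shape and sendAs are
// illustrative, not the scanner's actual interfaces.
function checkNodeForBola(node, sendAs) {
  // 1. Owner issues the genesis request (e.g. POST /api/v1/orders).
  const resource = sendAs("owner", node.genesis);
  // 2. Attacker issues the dependent request (e.g. PATCH /api/v1/orders/{id})
  //    with the owner's resource ID injected.
  const res = sendAs("attacker", { ...node.dependent, id: resource.id });
  // 3. Any 2xx means authorization was not enforced: a BOLA finding.
  return res.status >= 200 && res.status < 300;
}</code></pre>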
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7q0y7XZJE411gzhRuo9UjG/f7722c58e6cac751a1db44b612098a7b/BLOG-3161_3.png" />
          </figure><p>The purple arrows show the nodes in this API graph that must be visited to add a note to an order via the POST <code>/api/v1/orders/{order_id}/note/{note_id}</code> endpoint. <b>Importantly, none of the steps or logic shown in the diagram is available in the OpenAPI specification!</b> It must be inferred logically through some other means, and that is exactly what our vulnerability scanner will do automatically.</p><p>In order to reliably and automatically plan scans across a variety of APIs, we must accurately model these endpoint relationships from scratch. However, two problems arise: data quality of API specifications is not guaranteed, and even functionally complete schemas can have ambiguous naming schemes. Consider a simplified OpenAPI specification for the above API, which might look like this:</p>
            <pre><code>openapi: 3.0.0
info:
  title: Order API
  version: 1.0.0
paths:
  /api/v1/orders:
    post:
      summary: Create an order
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                product:
                  type: string
                count:
                  type: integer
              required:
                - product
                - count
      responses:
        '201':
          description: Item created successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  result:
                    type: object
                    properties:
                      id:
                        type: integer
                      created_at:
                        type: integer
                  errors:
                    type: array
                    items:
                      type: string
  /api/v1/orders/{order_id}:
    patch:
      summary: Modify an order by ID
      parameters:
        - name: order_id
          in: path
          required: true
          schema:
            type: integer
</code></pre>
            <p>We can see that the <code>POST</code> endpoint returns responses such as</p>
            <pre><code>{
    "result": {
        "id": 8821,
        "created_at": 1741476777
    },
   "errors": []
}
</code></pre>
            <p>To a human observer, it is quickly evident that <code>$.result.id</code> is the value to be injected in <code>order_id</code> for the <code>PATCH</code> endpoint. The <code>id</code> property might also be called <code>orderId</code>, <code>value</code>, or something else, and be nested arbitrarily. These subtle inconsistencies in OpenAPI documents of arbitrary shape are intractable for heuristics-based approaches.</p><p>Our scanner uses Cloudflare’s own <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a> platform to tackle this fuzzy problem space. Models such as <a href="https://developers.cloudflare.com/workers-ai/models/gpt-oss-120b/"><u>OpenAI’s open-weight gpt-oss-120b</u></a> are powerful enough to match data dependencies reliably, and to generate realistic fake data where necessary, essentially filling in the blanks of OpenAPI specifications. Leveraging <a href="https://platform.openai.com/docs/guides/structured-outputs"><u>structured outputs</u></a>, the model produces a representation of the API call graph for our scanner to walk, injecting attacker and owner credentials appropriately.</p><p>This approach substitutes artificial intelligence for the human intelligence otherwise needed to infer authorization and data relationships in OpenAPI schemas. Structured outputs bridge the gap from the natural language world of gpt-oss back to machine-executable instructions. In addition to Workers AI solving the planning problem, self-hosting on Workers AI means our system automatically benefits from Cloudflare’s highly available, globally distributed architecture.</p>
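<p>Once the model has matched a dependency (say, “<code>$.result.id</code> in the POST response feeds <code>order_id</code> in the PATCH path”), applying it is mechanical. A simplified, hypothetical resolver for such extraction paths might look like:</p>
<pre><code>// Minimal JSONPath-style resolver, for illustration only: the real scanner
// consumes a full call-graph representation emitted via structured outputs.
function resolvePath(obj, path) {
  // "$.result.id" -> ["result", "id"] -> walk the response object
  return path.replace(/^\$\.?/, "").split(".").reduce((o, k) => o?.[k], obj);
}

// Inject the extracted value into a templated path like
// "/api/v1/orders/{order_id}".
function bindDependency(templatePath, paramName, response, extractionPath) {
  const value = resolvePath(response, extractionPath);
  return templatePath.replace(`{${paramName}}`, String(value));
}</code></pre>
<p>The hard part is not this mechanical step; it is deciding that <code>$.result.id</code> (and not, say, <code>created_at</code>) is the right source, which is exactly the fuzzy matching delegated to the model.</p>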
    <div>
      <h3>Built on proven foundations</h3>
      <a href="#built-on-proven-foundations">
        
      </a>
    </div>
    <p>Building a vulnerability scanner that customers will trust with their API credentials demands proven infrastructure. We did not reinvent the wheel here. Instead, we integrated services that have been validated and deployed across Cloudflare for two crucial components of our scanner platform: the scanner’s control plane and the scanner’s secrets store.</p><p>The scanner's control plane integrates with <a href="https://github.com/temporalio/temporal"><u>Temporal</u></a>, on which other internal services at Cloudflare already rely, for scan orchestration. The complexity of the numerous test plans executed in each scan is effectively managed by Temporal's durable execution framework.</p><p>The entire backend is written in Rust, which is widely adopted at Cloudflare for infrastructure services. This lets us reuse internal libraries and share architectural patterns across teams. It also positions our scanner for potential future integration with other Cloudflare systems like FL2 or our test framework <a href="https://blog.cloudflare.com/20-percent-internet-upgrade/#step-2-testing-and-automated-rollouts"><u>Flamingo</u></a> – enabling scenarios where scanning could coordinate more tightly with edge request handling or testing infrastructure.</p>
    <div>
      <h4>Credential security through HashiCorp’s Vault Transit Secret Engine</h4>
      <a href="#credential-security-through-hashicorps-vault-transit-secret-engine">
        
      </a>
    </div>
    <p>Scanning for broken authentication and broken authorization vulnerabilities requires handling API user credentials. Cloudflare takes this responsibility very seriously.</p><p>We ensure that our public API layer has minimal access to unencrypted customer credentials by using HashiCorp's <a href="https://developer.hashicorp.com/vault/docs/secrets/transit"><u>Vault Transit Secret Engine</u></a> (TSE) for encryption-as-a-service. Immediately upon submission, credentials are encrypted by TSE—which handles the encryption but does not store the ciphertext—and are subsequently stored on Cloudflare infrastructure. </p><p>Our API is not authorized to decrypt this data. Instead, decryption occurs only at the last stage when a TestPlan makes a request to the customer's infrastructure. Only the Worker executing the test is authorized to request decryption, a restriction we strengthen using strict typing with additional safety rails inside Rust to enforce minimal access to decryption methods.</p><p>We further secure our customers’ credentials through regular rotation and periodic rewraps using TSE to mitigate risk. This process means we only interact with the new ciphertext, and the original secret is kept unviewable.</p>
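<p>The transit engine’s documented request shapes make this division of labor concrete. The sketch below only builds the payloads, it does not talk to a real Vault; the paths follow Vault’s documented transit API, while the key name and wiring are illustrative, not Cloudflare’s actual configuration:</p>
<pre><code>// Hedged sketch of Vault Transit "encryption as a service" payloads.
// Transit returns ciphertext (e.g. "vault:v1:...") without storing it;
// the ciphertext is persisted by the caller, and only specifically
// authorized callers may ask Transit to decrypt it again.
function transitEncryptRequest(keyName, plaintext) {
  return {
    method: "POST",
    path: `/v1/transit/encrypt/${keyName}`,
    // Transit expects base64-encoded plaintext.
    body: { plaintext: Buffer.from(plaintext, "utf8").toString("base64") },
  };
}

function transitDecryptRequest(keyName, ciphertext) {
  return {
    method: "POST",
    path: `/v1/transit/decrypt/${keyName}`,
    body: { ciphertext },
  };
}</code></pre>
<p>The key property is visible in the shapes themselves: after the initial encrypt call, every other component in the pipeline handles only the opaque ciphertext string.</p>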
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>We are releasing BOLA vulnerability scanning starting today as an Open Beta for all API Shield customers, with additional API threat scans planned for future releases. Via the Cloudflare API, you can trigger scans, manage configuration, and retrieve results programmatically to integrate directly into your <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/"><u>CI/CD pipelines</u></a> or security dashboards. For API Shield customers: check the <a href="https://developers.cloudflare.com/api-shield/security/vulnerability-scanner/"><u>developer docs</u></a> to start scanning your endpoints for BOLA vulnerabilities today.</p><p>We are starting with BOLA vulnerabilities because they are the hardest API vulnerability to solve and the highest risk for our customers. However, this scanning engine is built to be extensible.</p><p>In the near future, we plan to expand the scanner’s capabilities to cover the most popular of the <a href="https://owasp.org/www-project-top-ten/"><u>OWASP </u><i><u>Web</u></i><u> Top 10</u></a> as well: classic web vulnerabilities like SQL injection (SQLi) and cross-site scripting (XSS). To be notified upon release, <a href="https://www.cloudflare.com/lp/security-week/vulnerability-scanner/"><u>sign up for the waitlist here</u></a>, and you’ll be first to learn when we expand the engine to general web application vulnerabilities.</p> ]]></content:encoded>
            <category><![CDATA[Application Services]]></category>
            <category><![CDATA[Application Security]]></category>
            <category><![CDATA[Vulnerabilities]]></category>
            <category><![CDATA[API Security]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">7yIVIjWT0unNpdtbhOCVnh</guid>
            <dc:creator>John Cosgrove</dc:creator>
            <dc:creator>Alex Povel</dc:creator>
            <dc:creator>Malte Reddig</dc:creator>
        </item>
        <item>
            <title><![CDATA[We deserve a better streams API for JavaScript]]></title>
            <link>https://blog.cloudflare.com/a-better-web-streams-api/</link>
            <pubDate>Fri, 27 Feb 2026 06:00:00 GMT</pubDate>
            <description><![CDATA[ The Web streams API has become ubiquitous in JavaScript runtimes but was designed for a different era. Here's what a modern streaming API could (should?) look like. ]]></description>
            <content:encoded><![CDATA[ <p>Handling data in streams is fundamental to how we build applications. To make streaming work everywhere, the <a href="https://streams.spec.whatwg.org/"><u>WHATWG Streams Standard</u></a> (informally known as "Web streams") was designed to establish a common API to work across browsers and servers. It shipped in browsers, was adopted by Cloudflare Workers, Node.js, Deno, and Bun, and became the foundation for APIs like <a href="https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API"><u>fetch()</u></a>. It's a significant undertaking, and the people who designed it were solving hard problems with the constraints and tools they had at the time.</p><p>But after years of building on Web streams – implementing them in both Node.js and Cloudflare Workers, debugging production issues for customers and runtimes, and helping developers work through far too many common pitfalls – I've come to believe that the standard API has fundamental usability and performance issues that cannot be fixed easily with incremental improvements alone. The problems aren't bugs; they're consequences of design decisions that may have made sense a decade ago, but don't align with how JavaScript developers write code today.</p><p>This post explores some of the fundamental issues I see with Web streams and presents an alternative approach built around JavaScript language primitives that demonstrates something better is possible. </p><p>In benchmarks, this alternative can run anywhere between 2x to <i>120x</i> faster than Web streams in every runtime I've tested it on (including Cloudflare Workers, Node.js, Deno, Bun, and every major browser). The improvements are not due to clever optimizations, but fundamentally different design choices that more effectively leverage modern JavaScript language features. I'm not here to disparage the work that came before; I'm here to start a conversation about what can potentially come next.</p>
    <div>
      <h2>Where we're coming from</h2>
      <a href="#where-were-coming-from">
        
      </a>
    </div>
    <p>The Streams Standard was developed between 2014 and 2016 with an ambitious goal to provide "APIs for creating, composing, and consuming streams of data that map efficiently to low-level I/O primitives." Before Web streams, the web platform had no standard way to work with streaming data.</p><p>Node.js already had its own <a href="https://nodejs.org/api/stream.html"><u>streaming API</u></a> at the time that was ported to also work in browsers, but WHATWG chose not to use it as a starting point given that it is chartered to only consider the needs of Web browsers. Server-side runtimes only adopted Web streams later, after Cloudflare Workers and Deno each emerged with first-class Web streams support and cross-runtime compatibility became a priority.</p><p>The design of Web streams predates async iteration in JavaScript. The <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for-await...of"><code><u>for await...of</u></code></a> syntax didn't land until <a href="https://262.ecma-international.org/9.0/"><u>ES2018</u></a>, two years after the Streams Standard was initially finalized. This timing meant the API couldn't initially leverage what would eventually become the idiomatic way to consume asynchronous sequences in JavaScript. Instead, the spec introduced its own reader/writer acquisition model, and that decision rippled through every aspect of the API.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3X0niHShBlgF4LlpWYB7eC/f0bbf35f12ecc98a3888e6e3835acf3a/1.png" />
          </figure>
    <div>
      <h4>Excessive ceremony for common operations</h4>
      <a href="#excessive-ceremony-for-common-operations">
        
      </a>
    </div>
    <p>The most common task with streams is reading them to completion. Here's what that looks like with Web streams:</p>
            <pre><code>// First, we acquire a reader that gives an exclusive lock
// on the stream...
const reader = stream.getReader();
const chunks = [];
try {
  // Second, we repeatedly call read and await on the returned
  // promise to either yield a chunk of data or indicate we're
  // done.
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
} finally {
  // Finally, we release the lock on the stream
  reader.releaseLock();
}</code></pre>
            <p>You might assume this pattern is inherent to streaming. It isn't. The reader acquisition, the lock management, and the <code>{ value, done }</code> protocol are all just design choices, not requirements. They are artifacts of how and when the Web streams spec was written. Async iteration exists precisely to handle sequences that arrive over time, but it had not yet landed in the language when the streams specification was written. The complexity here is pure API overhead, not fundamental necessity.</p><p>Consider the alternative approach now that Web streams do support <code>for await...of</code>:</p>
            <pre><code>const chunks = [];
for await (const chunk of stream) {
  chunks.push(chunk);
}</code></pre>
            <p>This is better in that there is far less boilerplate, but it doesn't solve everything. Async iteration was retrofitted onto an API that wasn't designed for it, and it shows. Features like <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamBYOBReader"><u>BYOB (bring your own buffer)</u></a> reads aren't accessible through iteration. The underlying complexity of readers, locks, and controllers is still there, just hidden. When something does go wrong, or when additional features of the API are needed, developers find themselves back in the weeds of the original API, trying to understand why their stream is "locked" or why <code>releaseLock()</code> didn't do what they expected or hunting down bottlenecks in code they don't control.</p>
    <div>
      <h4>The locking problem</h4>
      <a href="#the-locking-problem">
        
      </a>
    </div>
    <p>Web streams use a locking model to prevent multiple consumers from interleaving reads. When you call <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/getReader"><code><u>getReader()</u></code></a>, the stream becomes locked. While locked, nothing else can read from the stream directly, pipe it, or even cancel it – only the code that is actually holding the reader can.</p><p>This sounds reasonable until you see how easily it goes wrong:</p>
            <pre><code>async function peekFirstChunk(stream) {
  const reader = stream.getReader();
  const { value } = await reader.read();
  // Oops — forgot to call reader.releaseLock()
  // And the reader is no longer available when we return
  return value;
}

const first = await peekFirstChunk(stream);
// TypeError: Cannot obtain lock — stream is permanently locked
for await (const chunk of stream) { /* never runs */ }</code></pre>
            <p>Forgetting <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultReader/releaseLock"><code><u>releaseLock()</u></code></a> permanently breaks the stream. The <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/locked"><code><u>locked</u></code></a><code> </code>property tells you that a stream is locked, but not why, by whom, or whether the lock is even still usable. <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/pipeTo"><u>Piping</u></a> internally acquires locks, making streams unusable during pipe operations in ways that aren't obvious.</p><p>The semantics around releasing locks with pending reads were also unclear for years. If you called <code>read()</code> but didn't await it, then called <code>releaseLock()</code>, what happened? The spec was recently clarified to cancel pending reads on lock release – but implementations varied, and code that relied on the previous unspecified behavior can break.</p><p>That said, it's important to recognize that locking in itself is not bad. It does, in fact, serve an important purpose to ensure that applications properly and orderly consume or produce data. The key challenge is with the original manual implementation of it using APIs like <code>getReader()</code> and <code>releaseLock()</code>. With the arrival of automatic lock and reader management with async iterables, dealing with locks from the user's point of view became a lot easier.</p><p>For implementers, the locking model adds a fair amount of non-trivial internal bookkeeping. Every operation must check lock state, readers must be tracked, and the interplay between locks, cancellation, and error states creates a matrix of edge cases that must all be handled correctly.</p>
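<p>For contrast, a leak-free variant of the earlier peek helper releases the lock in a <code>finally</code> block so the stream remains usable afterward. Note that this still consumes the first chunk rather than putting it back; a true peek would need extra buffering on top:</p>
<pre><code>// Always release the lock, even if read() throws, so later consumers
// (iteration, piping, cancellation) can still use the stream.
async function readFirstChunk(stream) {
  const reader = stream.getReader();
  try {
    const { value, done } = await reader.read();
    return done ? undefined : value;
  } finally {
    reader.releaseLock(); // without this, the stream is locked forever
  }
}</code></pre>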
    <div>
      <h4>BYOB: complexity without payoff</h4>
      <a href="#byob-complexity-without-payoff">
        
      </a>
    </div>
            <p><a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamBYOBReader"><u>BYOB (bring your own buffer)</u></a> reads were designed to let developers reuse memory buffers when reading from streams, an important optimization intended for high-throughput scenarios. The idea is sound: instead of allocating new buffers for each chunk, you provide your own buffer and the stream fills it.</p><p>In practice (and yes, there are always exceptions to be found), BYOB is rarely used to any measurable benefit. The API is substantially more complex than default reads, requiring a separate reader type (<code>ReadableStreamBYOBReader</code>) and other specialized classes (e.g. <code>ReadableStreamBYOBRequest</code>), careful buffer lifecycle management, and understanding of <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer#transferring_arraybuffers"><code><u>ArrayBuffer</u></code><u> detachment</u></a> semantics. When you pass a buffer to a BYOB read, the buffer becomes detached – transferred to the stream – and you get back a different view over potentially different memory. This transfer-based model is error-prone and confusing:</p>
            <pre><code>const reader = stream.getReader({ mode: 'byob' });
const buffer = new ArrayBuffer(1024);
let view = new Uint8Array(buffer);

const result = await reader.read(view);
// 'view' should now be detached and unusable
// (it isn't always in every impl)
// result.value is a NEW view, possibly over different memory
view = result.value; // Must reassign</code></pre>
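            <p>In a loop, that reassignment dance looks like the following sketch, which reads an entire byte stream while reusing a single buffer:</p>
            <pre><code>// Drain a byte stream with BYOB reads, recycling one buffer throughout.
// Each read detaches the view we pass in and hands back a new view over
// the (transferred) memory, so the view must be rebuilt every iteration.
async function readAllByob(stream, chunkSize = 1024) {
  const reader = stream.getReader({ mode: "byob" });
  const chunks = [];
  let view = new Uint8Array(chunkSize);
  try {
    while (true) {
      const { value, done } = await reader.read(view);
      if (done) break;
      chunks.push(value.slice()); // copy out before the next read detaches it
      view = new Uint8Array(value.buffer, 0, value.buffer.byteLength); // reuse
    }
  } finally {
    reader.releaseLock();
  }
  return chunks;
}</code></pre>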
            <p>BYOB also can't be used with async iteration or TransformStreams, so developers who want zero-copy reads are forced back into the manual reader loop.</p><p>For implementers, BYOB adds significant complexity. The stream must track pending BYOB requests, handle partial fills, manage buffer detachment correctly, and coordinate between the BYOB reader and the underlying source. The <a href="https://github.com/web-platform-tests/wpt/tree/master/streams/readable-byte-streams"><u>Web Platform Tests for readable byte streams</u></a> include dedicated test files just for BYOB edge cases: detached buffers, bad views, response-after-enqueue ordering, and more.</p><p>BYOB ends up being complex for both users and implementers, yet sees little adoption in practice. Most developers stick with default reads and accept the allocation overhead.</p><p>Most userland implementations of custom ReadableStream instances do not typically bother with all the ceremony required to correctly implement both default and BYOB read support in a single stream – and for good reason. It's difficult to get right, and consuming code will typically fall back on the default read path anyway. The example below shows what a "correct" implementation would need to do. It's big, complex, and error-prone, and not something the typical developer wants to deal with:</p>
            <pre><code>// Illustrative state for this example source
let offset = 0;
const totalBytes = 64 * 1024;

new ReadableStream({
    type: 'bytes',
    
    async pull(controller: ReadableByteStreamController) {      
      if (offset &gt;= totalBytes) {
        controller.close();
        return;
      }
      
      // Check for BYOB request FIRST
      const byobRequest = controller.byobRequest;
      
      if (byobRequest) {
        // === BYOB PATH ===
        // Consumer provided a buffer - we MUST fill it (or part of it)
        const view = byobRequest.view!;
        const bytesAvailable = totalBytes - offset;
        const bytesToWrite = Math.min(view.byteLength, bytesAvailable);
        
        // Create a view into the consumer's buffer and fill it
        // not critical but safer when bytesToWrite != view.byteLength
        const dest = new Uint8Array(
          view.buffer,
          view.byteOffset,
          bytesToWrite
        );
        
        // Fill with sequential bytes (our "data source")
        // Can be anything here that writes into the view
        for (let i = 0; i &lt; bytesToWrite; i++) {
          dest[i] = (offset + i) &amp; 0xFF;
        }
        
        offset += bytesToWrite;
        
        // Signal how many bytes we wrote
        byobRequest.respond(bytesToWrite);
        
      } else {
        // === DEFAULT READER PATH ===
        // No BYOB request - allocate and enqueue a chunk
        const bytesAvailable = totalBytes - offset;
        const chunkSize = Math.min(1024, bytesAvailable);
        
        const chunk = new Uint8Array(chunkSize);
        for (let i = 0; i &lt; chunkSize; i++) {
          chunk[i] = (offset + i) &amp; 0xFF;
        }
        
        offset += chunkSize;
        controller.enqueue(chunk);
      }
    },
    
    cancel(reason) {
      console.log('Stream canceled:', reason);
    }
  });</code></pre>
            <p>When a host runtime provides a byte-oriented ReadableStream itself – for instance, as the <code>body</code> of a fetch <code>Response</code> – it is often far easier for the runtime to provide an optimized implementation of BYOB reads. But even that implementation still needs to handle both default and BYOB reading patterns, and that requirement brings a fair amount of complexity with it.</p>
    <div>
      <h4>Backpressure: good in theory, broken in practice</h4>
      <a href="#backpressure-good-in-theory-broken-in-practice">
        
      </a>
    </div>
    <p>Backpressure – the ability for a slow consumer to signal a fast producer to slow down – is a first-class concept in Web streams. In theory. In practice, the model has some serious flaws.</p><p>The primary signal is <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultController/desiredSize"><code><u>desiredSize</u></code></a> on the controller. It can be positive (wants data), zero (at capacity), negative (over capacity), or null (closed). Producers are supposed to check this value and stop enqueueing when it's not positive. But there's nothing enforcing this: <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultController/enqueue"><code><u>controller.enqueue()</u></code></a> always succeeds, even when desiredSize is deeply negative.</p>
            <pre><code>new ReadableStream({
  start(controller) {
    // Nothing stops you from doing this
    while (true) {
      controller.enqueue(generateData()); // desiredSize: -999999
    }
  }
});</code></pre>
            <p>Stream implementations can and do ignore backpressure, and some spec-defined features explicitly break it. <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/tee"><code><u>tee()</u></code></a>, for instance, creates two branches from a single stream. If one branch reads faster than the other, data accumulates in an internal buffer with no limit. A fast consumer can cause unbounded memory growth while the slow consumer catches up, and there's no way to configure this or opt out beyond canceling the slower branch.</p><p>Web streams do provide clear mechanisms for tuning backpressure behavior in the form of the <code>highWaterMark</code> option and customizable size calculations, but these are just as easy to ignore as <code>desiredSize</code>, and many applications simply fail to pay attention to them.</p><p>The same issues exist on the <code>WritableStream</code> side. A <code>WritableStream</code> has a <code>highWaterMark</code> and <code>desiredSize</code>. There is a <code>writer.ready</code> promise that producers of data are supposed to pay attention to but often don't.</p>
            <pre><code>const writable = getWritableStreamSomehow();
const writer = writable.getWriter();

// Producers are supposed to wait for the writer.ready
// It is a promise that, when it resolves, indicates that
// the writable's internal backpressure is cleared and
// it is ok to write more data
await writer.ready;
await writer.write(...);</code></pre>
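By contrast, a producer that routes all of its work through `pull()` cooperates with backpressure automatically, because the stream only invokes `pull()` while `desiredSize` is positive. A minimal sketch (the 16 KB high-water mark is an arbitrary choice for illustration):

```javascript
// Enqueue only from pull(), which the stream calls only while
// desiredSize > 0 — so the high-water mark is respected for free
const stream = new ReadableStream({
  pull(controller) {
    controller.enqueue(new Uint8Array(1024)); // 1 KB per pull
  }
}, new ByteLengthQueuingStrategy({ highWaterMark: 16 * 1024 }));
```

If the consumer stops reading, the internal queue fills to roughly the high-water mark and `pull()` simply stops being called – no `desiredSize` checks required in user code.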
            <p>For implementers, backpressure adds complexity without providing guarantees. The machinery to track queue sizes, compute <code>desiredSize</code>, and invoke <code>pull()</code> at the right times must all be implemented correctly. However, since these signals are advisory, all that work doesn't actually prevent the problems backpressure is supposed to solve.</p>
    <div>
      <h4>The hidden cost of promises</h4>
      <a href="#the-hidden-cost-of-promises">
        
      </a>
    </div>
    <p>The Web streams spec requires promise creation at numerous points, often in hot paths and often invisible to users. Each <code>read()</code> call doesn't just return a promise; internally, the implementation creates additional promises for queue management, <code>pull()</code> coordination, and backpressure signaling.</p><p>This overhead is mandated by the spec's reliance on promises for buffer management, completion, and backpressure signals. While some of it is implementation-specific, much of it is unavoidable if you're following the spec as written. For high-frequency streaming – video frames, network packets, real-time data – this overhead is significant.</p><p>The problem compounds in pipelines. Each <code>TransformStream</code> adds another layer of promise machinery between source and sink. The spec doesn't define synchronous fast paths, so even when data is available immediately, the promise machinery still runs.</p><p>For implementers, this promise-heavy design constrains optimization opportunities. The spec mandates specific promise resolution ordering, making it difficult to batch operations or skip unnecessary async boundaries without risking subtle compliance failures. There are many hidden internal optimizations that implementers do make but these can be complicated and difficult to get right.</p><p>While I was writing this blog post, Vercel's Malte Ubl published their own <a href="https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster"><u>blog post</u></a> describing some research work Vercel has been doing around improving the performance of Node.js' Web streams implementation. In that post they discuss the same fundamental performance optimization problem that every implementation of Web streams face:</p><blockquote><p>"Or consider pipeTo(). Each chunk passes through a full Promise chain: read, write, check backpressure, repeat. An {value, done} result object is allocated per read. 
Error propagation creates additional Promise branches.</p><p>None of this is wrong. These guarantees matter in the browser where streams cross security boundaries, where cancellation semantics need to be airtight, where you do not control both ends of a pipe. But on the server, when you are piping React Server Components through three transforms at 1KB chunks, the cost adds up.</p><p>We benchmarked native WebStream pipeThrough at 630 MB/s for 1KB chunks. Node.js pipeline() with the same passthrough transform: ~7,900 MB/s. That is a 12x gap, and the difference is almost entirely Promise and object allocation overhead." 
- Malte Ubl, <a href="https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster"><u>https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster</u></a></p></blockquote><p>As part of their research, they have put together a set of proposed improvements for Node.js' Web streams implementation that eliminate promises in certain code paths, yielding performance improvements of up to 10x – which only goes to prove the point: promises, while useful, add significant overhead. As one of the core maintainers of Node.js, I am looking forward to helping Malte and the folks at Vercel get their proposed improvements landed!</p><p>In a recent update to Cloudflare Workers, I made similar modifications to an internal data pipeline, reducing the number of JavaScript promises created in certain application scenarios by up to 200x. The result is a performance improvement of several orders of magnitude in those applications.</p>
    <div>
      <h3>Real-world failures</h3>
      <a href="#real-world-failures">
        
      </a>
    </div>
    
    <div>
      <h4>Exhausting resources with unconsumed bodies</h4>
      <a href="#exhausting-resources-with-unconsumed-bodies">
        
      </a>
    </div>
    <p>When <code>fetch()</code> returns a response, the body is a <a href="https://developer.mozilla.org/en-US/docs/Web/API/Response/body"><code><u>ReadableStream</u></code></a>. If you only check the status and don't consume or cancel the body, what happens? The answer varies by implementation, but a common outcome is resource leakage.</p>
            <pre><code>async function checkEndpoint(url) {
  const response = await fetch(url);
  return response.ok; // Body is never consumed or cancelled
}

// In a loop, this can exhaust connection pools
for (const url of urls) {
  await checkEndpoint(url);
}</code></pre>
            <p>This pattern has caused connection pool exhaustion in Node.js applications using <a href="https://nodejs.org/api/globals.html#fetch"><u>undici</u></a> (the <code>fetch() </code>implementation built into Node.js), and similar issues have appeared in other runtimes. The stream holds a reference to the underlying connection, and without explicit consumption or cancellation, the connection may linger until garbage collection – which may not happen soon enough under load.</p><p>The problem is compounded by APIs that implicitly create stream branches. <a href="https://developer.mozilla.org/en-US/docs/Web/API/Request/clone"><code><u>Request.clone()</u></code></a> and <a href="https://developer.mozilla.org/en-US/docs/Web/API/Response/clone"><code><u>Response.clone()</u></code></a> perform implicit <code>tee()</code> operations on the body stream – a detail that's easy to miss. Code that clones a request for logging or retry logic may unknowingly create branched streams that need independent consumption, multiplying the resource management burden.</p><p>Now, to be certain, these types of issues <i>are</i> implementation bugs. The connection leak was definitely something that undici needed to fix in its own implementation, but the complexity of the specification does not make dealing with these types of issues easy.</p><blockquote><p>"Cloning streams in Node.js's fetch() implementation is harder than it looks. When you clone a request or response body, you're calling tee() - which splits a single stream into two branches that both need to be consumed. If one consumer reads faster than the other, data buffers unbounded in memory waiting for the slow branch. If you don't properly consume both branches, the underlying connection leaks. The coordination required between two readers sharing one source makes it easy to accidentally break the original request or exhaust connection pools. 
It's a simple API call with complex underlying mechanics that are difficult to get right." - Matteo Collina, Ph.D. - Platformatic Co-Founder &amp; CTO, Node.js Technical Steering Committee Chair</p></blockquote>
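A defensive version of the status-check helper above avoids the leak by explicitly cancelling the body it never intends to read. A sketch (whether cancellation is strictly required depends on the runtime, but it is the safe pattern):

```javascript
async function checkEndpoint(url) {
  const response = await fetch(url);
  // Cancel the unread body so the underlying connection
  // can be released back to the pool promptly
  if (response.body) {
    await response.body.cancel();
  }
  return response.ok;
}
```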
    <div>
      <h4>Falling headlong off the tee() memory cliff</h4>
      <a href="#falling-headlong-off-the-tee-memory-cliff">
        
      </a>
    </div>
    <p><a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/tee"><code><u>tee()</u></code></a> splits a stream into two branches. It seems straightforward, but the implementation requires buffering: if one branch is read faster than the other, the data must be held somewhere until the slower branch catches up.</p>
            <pre><code>const [forHash, forStorage] = response.body.tee();

// Hash computation is fast
const hash = await computeHash(forHash);

// Storage write is slow — meanwhile, the entire stream
// may be buffered in memory waiting for this branch
await writeToStorage(forStorage);</code></pre>
            <p>The spec does not mandate buffer limits for <code>tee()</code>. And to be fair, the spec allows implementations to implement the actual internal mechanisms for <code>tee()</code> and other APIs in any way they see fit so long as the observable normative requirements of the specification are met. But if an implementation chooses to implement <code>tee()</code> in the specific way described by the streams specification, then <code>tee()</code> will come with a built-in memory management issue that is difficult to work around.</p><p>Implementations have had to develop their own strategies for dealing with this. Firefox initially used a linked-list approach that led to <code>O(n)</code> memory growth proportional to the consumption rate difference. In Cloudflare Workers, we opted to implement a shared buffer model where backpressure is signaled by the slowest consumer rather than the fastest.</p>
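In application code, one way to stay off the memory cliff is to drive both branches concurrently, so the internal buffer only ever holds the lag between them rather than the whole stream. A sketch – `computeHash` and `writeToStorage` here are stand-in placeholders for real work:

```javascript
// Placeholder consumers: real code would hash / persist the bytes
async function computeHash(stream) {
  let byteCount = 0;
  for await (const chunk of stream) byteCount += chunk.byteLength;
  return byteCount; // stand-in for a real digest
}
async function writeToStorage(stream) {
  for await (const chunk of stream) { /* persist chunk somewhere */ }
}

async function hashAndStore(response) {
  const [forHash, forStorage] = response.body.tee();
  // Consume both branches at once rather than sequentially,
  // bounding tee()'s buffering to the inter-branch lag
  const [hash] = await Promise.all([
    computeHash(forHash),
    writeToStorage(forStorage),
  ]);
  return hash;
}
```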
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5cl4vqYfaHaVXiHjLSXv0a/03a0b9fe4c9c0594e181ffee43b63998/2.png" />
          </figure>
    <div>
      <h4>Transform backpressure gaps</h4>
      <a href="#transform-backpressure-gaps">
        
      </a>
    </div>
    <p><code>TransformStream</code> creates a <code>readable/writable</code> pair with processing logic in between. The <code>transform()</code> function executes on <i>write</i>, not on read. Processing of the transform happens eagerly as data arrives, regardless of whether any consumer is ready. This causes unnecessary work when consumers are slow, and the backpressure signaling between the two sides has gaps that can cause unbounded buffering under load. The expectation in the spec is that the producer of the data being transformed is paying attention to the <code>writer.ready</code> signal on the writable side of the transform, but quite often producers simply ignore it.</p><p>If the transform's <code>transform()</code> operation is synchronous and always enqueues output immediately, it never signals backpressure back to the writable side even when the downstream consumer is slow. This is a consequence of the spec design that many developers completely overlook. In browsers, where there's only a single user and typically only a small number of stream pipelines active at any given time, this type of foot gun is often of no consequence, but it has a major impact on server-side or edge performance in runtimes that serve thousands of concurrent requests.</p>
            <pre><code>const fastTransform = new TransformStream({
  transform(chunk, controller) {
    // Synchronously enqueue — this never applies backpressure
    // Even if the readable side's buffer is full, this succeeds
    controller.enqueue(processChunk(chunk));
  }
});

// Pipe a fast source through the transform to a slow sink
fastSource
  .pipeThrough(fastTransform)
  .pipeTo(slowSink);  // Buffer grows without bound</code></pre>
            <p>What TransformStreams are supposed to do is check for backpressure on the controller and use promises to communicate that back to the writer:</p>
            <pre><code>const fastTransform = new TransformStream({
  async transform(chunk, controller) {
    if (controller.desiredSize &lt;= 0) {
      // Wait on the backpressure to clear somehow
    }

    controller.enqueue(processChunk(chunk));
  }
});</code></pre>
            <p>A difficulty here, however, is that the <code>TransformStreamDefaultController</code> does not have a ready promise mechanism like Writers do, so the <code>TransformStream</code> implementation would need to implement a polling mechanism to periodically check when <code>controller.desiredSize</code> becomes positive again.</p><p>The problem gets worse in pipelines. When you chain multiple transforms – say, parse, transform, then serialize – each <code>TransformStream</code> has its own internal readable and writable buffers. If implementers follow the spec strictly, data cascades through these buffers in a push-oriented fashion: the source pushes to transform A, which pushes to transform B, which pushes to transform C, each accumulating data in intermediate buffers before the final consumer has even started pulling. With three transforms, you can have six internal buffers filling up simultaneously.</p><p>Developers using the streams API are expected to remember to use options like <code>highWaterMark</code> when creating their sources, transforms, and writable destinations, but often they either forget or simply choose to ignore it.</p>
            <pre><code>source
  .pipeThrough(parse)      // buffers filling...
  .pipeThrough(transform)  // more buffers filling...
  .pipeThrough(serialize)  // even more buffers...
  .pipeTo(destination);    // consumer hasn't started yet</code></pre>
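Because the transform controller offers no ready promise, a backpressure-aware transform has to improvise. One possible (admittedly clunky) approach is a polling helper like the hypothetical `waitForCapacity` below – this is a sketch of the workaround, not a standard API:

```javascript
// Resolve once desiredSize is positive again (or the stream is closed).
// Polling is ugly, but the controller gives us no promise to await.
function waitForCapacity(controller, intervalMs = 5) {
  return new Promise((resolve) => {
    const check = () => {
      if (controller.desiredSize === null || controller.desiredSize > 0) {
        resolve();
      } else {
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}

const pacedTransform = new TransformStream({
  async transform(chunk, controller) {
    if (controller.desiredSize <= 0) {
      await waitForCapacity(controller);
    }
    controller.enqueue(chunk);
  }
});
```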
            <p>Implementations have found ways to optimize transform pipelines by collapsing identity transforms, short-circuiting non-observable paths, deferring buffer allocation, or falling back to native code that does not run JavaScript at all. Deno, Bun, and Cloudflare Workers have all successfully implemented "native path" optimizations that can help eliminate much of the overhead, and Vercel's recent <a href="https://vercel.com/blog/we-ralph-wiggumed-webstreams-to-make-them-10x-faster"><u>fast-webstreams</u></a> research is working on similar optimizations for Node.js. But the optimizations themselves add significant complexity and still can't fully escape the inherently push-oriented model that TransformStream uses.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/64FcAUPYrTvOSYOPoT2FkR/cc91e0d32dd47320e8ac9d6f431a2fda/3.png" />
          </figure>
    <div>
      <h4>GC thrashing in server-side rendering</h4>
      <a href="#gc-thrashing-in-server-side-rendering">
        
      </a>
    </div>
    <p>Streaming server-side rendering (SSR) is a particularly painful case. A typical SSR stream might render thousands of small HTML fragments, each passing through the streams machinery:</p>
            <pre><code>// Each component enqueues a small chunk
function renderComponent(controller, content) {
  controller.enqueue(encoder.encode(`&lt;div&gt;${content}&lt;/div&gt;`));
}

// Hundreds of components = hundreds of enqueue calls
// Each one triggers promise machinery internally
for (const component of components) {
  renderComponent(controller, component);  // Promises created, objects allocated
}
}</code></pre>
            <p>Every fragment means promises created for <code>read()</code> calls, promises for backpressure coordination, intermediate buffer allocations, and <code>{ value, done } </code>result objects – most of which become garbage almost immediately.</p><p>Under load, this creates GC pressure that can devastate throughput. The JavaScript engine spends significant time collecting short-lived objects instead of doing useful work. Latency becomes unpredictable as GC pauses interrupt request handling. I've seen SSR workloads where garbage collection accounts for a substantial portion (up to and beyond 50%) of total CPU time per request. That's time that could be spent actually rendering content.</p><p>The irony is that streaming SSR is supposed to improve performance by sending content incrementally. But the overhead of the streams machinery can negate those gains, especially for pages with many small components. Developers sometimes find that buffering the entire response is actually faster than streaming through Web streams, defeating the purpose entirely.</p>
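One common mitigation is to coalesce fragments before they hit the stream, amortizing the per-enqueue overhead across many components. A sketch – `renderBatched` and the batch size are illustrative, not a real framework API:

```javascript
const encoder = new TextEncoder();

// Join many small HTML fragments into fewer, larger chunks
// so each enqueue (and its promise machinery) carries more data
function renderBatched(components, controller, batchSize = 64) {
  let parts = [];
  for (const content of components) {
    parts.push(`<div>${content}</div>`);
    if (parts.length >= batchSize) {
      controller.enqueue(encoder.encode(parts.join('')));
      parts = [];
    }
  }
  if (parts.length > 0) {
    controller.enqueue(encoder.encode(parts.join('')));
  }
}
```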
    <div>
      <h3>The optimization treadmill</h3>
      <a href="#the-optimization-treadmill">
        
      </a>
    </div>
    <p>To achieve usable performance, every major runtime has resorted to non-standard internal optimizations for Web streams. Node.js, Deno, Bun, and Cloudflare Workers have all developed their own workarounds. This is particularly true for streams wired up to system-level I/O, where much of the machinery is non-observable and can be short-circuited.</p><p>Finding these optimization opportunities can itself be a significant undertaking. It requires end-to-end understanding of the spec to identify which behaviors are observable and which can safely be elided. Even then, whether a given optimization is actually spec-compliant is often unclear. Implementers must make judgment calls about which semantics they can relax without breaking compatibility. This puts enormous pressure on runtime teams to become spec experts just to achieve acceptable performance.</p><p>These optimizations are difficult to implement, frequently error-prone, and lead to inconsistent behavior across runtimes. Bun's "<a href="https://bun.sh/docs/api/streams#direct-readablestream"><u>Direct Streams</u></a>" optimization takes a deliberately and observably non-standard approach, bypassing much of the spec's machinery entirely. Cloudflare Workers' <a href="https://developers.cloudflare.com/workers/runtime-apis/streams/transformstream/"><code><u>IdentityTransformStream</u></code></a> provides a fast-path for pass-through transforms but is Workers-specific and implements behaviors that are not standard for a <code>TransformStream</code>. Each runtime has its own set of tricks and the natural tendency is toward non-standard solutions, because that's often the only way to make things fast.</p><p>This fragmentation hurts portability. Code that performs well on one runtime may behave differently (or poorly) on another, even though it's using "standard" APIs. 
The complexity burden on runtime implementers is substantial, and the subtle behavioral differences create friction for developers trying to write cross-runtime code, particularly those maintaining frameworks that must be able to run efficiently across many runtime environments.</p><p>It is also necessary to emphasize that many optimizations are only possible in parts of the spec that are unobservable to user code. The alternative, as with Bun's "Direct Streams", is to intentionally diverge from the spec-defined observable behaviors. This means optimizations often feel "incomplete". They work in some scenarios but not in others, in some runtimes but not others, etc. Every such case adds to the overall unsustainable complexity of the Web streams approach, which is why most runtime implementers rarely put significant effort into further improvements to their streams implementations once the conformance tests are passing.</p><p>Implementers shouldn't need to jump through these hoops. When you find yourself needing to relax or bypass spec semantics just to achieve reasonable performance, that's a sign something is wrong with the spec itself. A well-designed streaming API should be efficient by default, not require each runtime to invent its own escape hatches.</p>
    <div>
      <h3>The compliance burden</h3>
      <a href="#the-compliance-burden">
        
      </a>
    </div>
    <p>A complex spec creates complex edge cases. The <a href="https://github.com/web-platform-tests/wpt/tree/master/streams"><u>Web Platform Tests for streams</u></a> span over 70 test files, and while comprehensive testing is a good thing, what's telling is what needs to be tested.</p><p>Consider some of the more obscure tests that implementations must pass:</p><ul><li><p>Prototype pollution defense: One test patches <code>Object.prototype.then</code> to intercept promise resolutions, then verifies that <code>pipeTo()</code> and <code>tee()</code> operations don't leak internal values through the prototype chain. This tests a security property that only exists because the spec's promise-heavy internals create an attack surface.</p></li><li><p>WebAssembly memory rejection: BYOB reads must explicitly reject ArrayBuffers backed by WebAssembly memory, which look like regular buffers but can't be transferred. This edge case exists because of the spec's buffer detachment model – a simpler API wouldn't need to handle it.</p></li><li><p>Crash regression for state machine conflicts: A test specifically checks that calling <code>byobRequest.respond()</code> after <code>enqueue()</code> doesn't crash the runtime. This sequence creates a conflict in the internal state machine – the <code>enqueue()</code> fulfills the pending read and should invalidate the <code>byobRequest</code>, but implementations must gracefully handle the subsequent <code>respond()</code> rather than corrupting memory in order to cover the very likely possibility that developers are not using the complex API correctly.</p></li></ul><p>These aren't contrived scenarios invented by test authors in a vacuum. They're consequences of the spec's design and reflect real-world bugs.</p><p>For runtime implementers, passing the WPT suite means handling intricate corner cases that most application code will never encounter. 
The tests encode not just the happy path but the full matrix of interactions between readers, writers, controllers, queues, strategies, and the promise machinery that connects them all.</p><p>A simpler API would mean fewer concepts, fewer interactions between concepts, and fewer edge cases to get right, resulting in more confidence that implementations actually behave consistently.</p>
    <div>
      <h3>The takeaway</h3>
      <a href="#the-takeaway">
        
      </a>
    </div>
    <p>Web streams are complex for users and implementers alike. The problems with the spec aren't bugs. They emerge from using the API exactly as designed. They aren't issues that can be fixed solely through incremental improvements. They're consequences of fundamental design choices. To improve things we need different foundations.</p>
    <div>
      <h2>A better streams API is possible</h2>
      <a href="#a-better-streams-api-is-possible">
        
      </a>
    </div>
    <p>After implementing the Web streams spec multiple times across different runtimes and seeing the pain points firsthand, I decided it was time to explore what a better, alternative streaming API could look like if designed from first principles today.</p><p>What follows is a proof of concept: it's not a finished standard, not a production-ready library, not even necessarily a concrete proposal for something new, but a starting point for discussion that demonstrates the problems with Web streams aren't inherent to streaming itself; they're consequences of specific design choices that could be made differently. Whether this exact API is the right answer is less important than whether it sparks a productive conversation about what we actually need from a streaming primitive.</p>
    <div>
      <h3>What is a stream?</h3>
      <a href="#what-is-a-stream">
        
      </a>
    </div>
    <p>Before diving into API design, it's worth asking: what is a stream?</p><p>At its core, a stream is just a sequence of data that arrives over time. You don't have all of it at once. You process it incrementally as it becomes available.</p><p>Unix pipes are perhaps the purest expression of this idea:</p>
            <pre><code>cat access.log | grep "error" | sort | uniq -c</code></pre>
            <p>
Data flows left to right. Each stage reads input, does its work, writes output. There's no pipe reader to acquire, no controller lock to manage. If a downstream stage is slow, upstream stages naturally slow down as well. Backpressure is implicit in the model, not a separate mechanism to learn (or ignore).</p><p>In JavaScript, the natural primitive for "a sequence of things that arrive over time" is already in the language: the async iterable. You consume it with <code>for await...of</code>. You stop consuming by stopping iteration.</p><p>This is the intuition the new API tries to preserve: streams should feel like iteration, because that's what they are. The complexity of Web streams – readers, writers, controllers, locks, queuing strategies – obscures this fundamental simplicity. A better API should make the simple case simple and only add complexity where it's genuinely needed.</p>
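That intuition is easy to demonstrate with nothing but plain async generators – a pull-based pipeline with implicit backpressure, no readers or locks in sight. The `grep` and `count` helpers below are illustrative, not part of any API:

```javascript
// Filter an async sequence of lines, unix-pipe style
async function* grep(pattern, source) {
  for await (const line of source) {
    if (line.includes(pattern)) yield line;
  }
}

// Terminal stage: drain the pipeline, counting what arrives
async function count(source) {
  let n = 0;
  for await (const _ of source) n++;
  return n;
}
```

Each stage pulls from the one before it, so a slow final consumer naturally slows the whole chain – backpressure without any dedicated mechanism.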
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3AUAA4bitbTOVSQg7Pd7fv/0856b44d78899dcffc4493f4146fb64f/4.png" />
          </figure>
    <div>
      <h3>Design principles</h3>
      <a href="#design-principles">
        
      </a>
    </div>
    <p>I built the proof-of-concept alternative around a different set of principles.</p>
    <div>
      <h4>Streams are iterables.</h4>
      <a href="#streams-are-iterables">
        
      </a>
    </div>
    <p>No custom <code>ReadableStream</code> class with hidden internal state. A readable stream is just an <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Iteration_protocols#the_async_iterator_and_async_iterable_protocols"><code><u>AsyncIterable&lt;Uint8Array[]&gt;</u></code></a>. You consume it with <code>for await...of</code>. No readers to acquire, no locks to manage.</p>
    <div>
      <h4>Pull-through transforms</h4>
      <a href="#pull-through-transforms">
        
      </a>
    </div>
    <p>Transforms don't execute until the consumer pulls. There's no eager evaluation, no hidden buffering. Data flows on-demand from source, through transforms, to the consumer. If you stop iterating, processing stops.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4bEXBTEOHBMnCRKGA7odt5/cf51074cce3bb8b2ec1b5158c7560b68/5.png" />
          </figure>
    <div>
      <h4>Explicit backpressure</h4>
      <a href="#explicit-backpressure">
        
      </a>
    </div>
    <p>Backpressure is strict by default. When a buffer is full, writes reject rather than silently accumulating. You can configure alternative policies – block until space is available, drop oldest, drop newest – but you have to choose explicitly. No more silent memory growth.</p>
    <div>
      <h4>Batched chunks</h4>
      <a href="#batched-chunks">
        
      </a>
    </div>
    <p>Instead of yielding one chunk per iteration, streams yield <code>Uint8Array[]</code>: arrays of chunks. This amortizes the async overhead across multiple chunks, reducing promise creation and microtask latency in hot paths.</p>
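The batched shape is easy to simulate with a plain async generator: each iteration yields an array of chunks, so a single await is amortized over several chunks. The helper names here are illustrative only:

```javascript
// A source that yields batches of chunks: one await per batch,
// not one await per chunk
async function* batchedSource() {
  const enc = new TextEncoder();
  yield [enc.encode("Hello, "), enc.encode("World")];
  yield [enc.encode("!")];
}

// Decode a batched byte stream into a string
async function collectText(source) {
  const dec = new TextDecoder();
  let text = "";
  for await (const chunks of source) {
    for (const chunk of chunks) {
      text += dec.decode(chunk, { stream: true });
    }
  }
  return text + dec.decode();
}
```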
    <div>
      <h4>Bytes only</h4>
      <a href="#bytes-only">
        
      </a>
    </div>
    <p>The API deals exclusively with bytes (<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array"><code><u>Uint8Array</u></code></a>). Strings are UTF-8 encoded automatically. There's no "value stream" vs "byte stream" dichotomy. If you want to stream arbitrary JavaScript values, use async iterables directly. While the API uses <code>Uint8Array</code>, it treats chunks as opaque. There is no partial consumption, no BYOB patterns, no byte-level operations within the streaming machinery itself. Chunks go in, chunks come out, unchanged unless a transform explicitly modifies them.</p>
    <div>
      <h4>Synchronous fast paths matter</h4>
      <a href="#synchronous-fast-paths-matter">
        
      </a>
    </div>
    <p>The API recognizes that synchronous data sources are both necessary and common. The application should not be forced to always accept the performance cost of asynchronous scheduling simply because that's the only option provided. At the same time, mixing sync and async processing can be dangerous. Synchronous paths should always be an option and should always be explicit.</p>
    <div>
      <h3>The new API in action</h3>
      <a href="#the-new-api-in-action">
        
      </a>
    </div>
    
    <div>
      <h4>Creating and consuming streams</h4>
      <a href="#creating-and-consuming-streams">
        
      </a>
    </div>
    <p>In Web streams, creating a simple producer/consumer pair requires <code>TransformStream</code>, manual encoding, and careful lock management:</p>
            <pre><code>const { readable, writable } = new TransformStream();
const enc = new TextEncoder();
const writer = writable.getWriter();
await writer.write(enc.encode("Hello, World!"));
await writer.close();
writer.releaseLock();

const dec = new TextDecoder();
let text = '';
for await (const chunk of readable) {
  text += dec.decode(chunk, { stream: true });
}
text += dec.decode();</code></pre>
            <p>Even this relatively clean version requires a <code>TransformStream</code>, a manual <code>TextEncoder</code> and <code>TextDecoder</code>, and an explicit lock release.</p><p>Here's the equivalent with the new API:</p>
            <pre><code>import { Stream } from 'new-streams';

// Create a push stream
const { writer, readable } = Stream.push();

// Write data — backpressure is enforced
await writer.write("Hello, World!");
await writer.end();

// Consume as text
const text = await Stream.text(readable);</code></pre>
            <p>The readable is just an async iterable. You can pass it to any function that expects one, including <code>Stream.text()</code>, which collects and decodes the entire stream.</p><p>The writer has a simple interface: <code>write()</code> for single chunks, <code>writev()</code> for batched writes, <code>end()</code> to signal completion, and <code>abort()</code> for errors. That's essentially it.</p><p>The Writer is not a concrete class. Any object that implements <code>write()</code>, <code>end()</code>, and <code>abort()</code> can be a writer, making it easy to adapt existing APIs or create specialized implementations without subclassing. There's no complex <code>UnderlyingSink</code> protocol with <code>start()</code>, <code>write()</code>, <code>close()</code>, and <code>abort()</code> callbacks that must coordinate through a controller whose lifecycle and state are independent of the <code>WritableStream</code> it is bound to.</p><p>Here's a simple in-memory writer that collects all written data:</p>
            <pre><code>// A minimal writer implementation — just an object with methods
function createBufferWriter() {
  const chunks = [];
  let totalBytes = 0;
  let closed = false;

  const addChunk = (chunk) =&gt; {
    chunks.push(chunk);
    totalBytes += chunk.byteLength;
  };

  return {
    get desiredSize() { return closed ? null : 1; },

    // Async variants
    write(chunk) { addChunk(chunk); },
    writev(batch) { for (const c of batch) addChunk(c); },
    end() { closed = true; return totalBytes; },
    abort(reason) { closed = true; chunks.length = 0; },

    // Sync variants return boolean (true = accepted)
    writeSync(chunk) { addChunk(chunk); return true; },
    writevSync(batch) { for (const c of batch) addChunk(c); return true; },
    endSync() { closed = true; return totalBytes; },
    abortSync(reason) { closed = true; chunks.length = 0; return true; },

    getChunks() { return chunks; }
  };
}

// Use it
const writer = createBufferWriter();
await Stream.pipeTo(source, writer);
const allData = writer.getChunks();</code></pre>
            <p>No base class to extend, no abstract methods to implement, no controller to coordinate with. Just an object with the right shape.</p>
    <div>
      <h4>Pull-through transforms</h4>
      <a href="#pull-through-transforms">
        
      </a>
    </div>
    <p>Under the new API design, transforms should not perform any work until the data is being consumed. This is a fundamental principle.</p>
            <pre><code>// Nothing executes until iteration begins
const output = Stream.pull(source, compress, encrypt);

// Transforms execute as we iterate
for await (const chunks of output) {
  for (const chunk of chunks) {
    process(chunk);
  }
}</code></pre>
            <p><code>Stream.pull()</code> creates a lazy pipeline. The <code>compress</code> and <code>encrypt</code> transforms don't run until you start iterating output. Each iteration pulls data through the pipeline on demand.</p><p>This is fundamentally different from Web streams' <a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/pipeThrough"><code><u>pipeThrough()</u></code></a>, which starts actively pumping data from the source to the transform as soon as you set up the pipe. Pull semantics mean you control when processing happens, and stopping iteration stops processing.</p><p>Transforms can be stateless or stateful. A stateless transform is just a function that takes chunks and returns transformed chunks:</p>
            <pre><code>// Stateless transform — a pure function
// Receives chunks or null (flush signal)
const toUpperCase = (chunks) =&gt; {
  if (chunks === null) return null; // End of stream
  return chunks.map(chunk =&gt; {
    const str = new TextDecoder().decode(chunk);
    return new TextEncoder().encode(str.toUpperCase());
  });
};

// Use it directly
const output = Stream.pull(source, toUpperCase);</code></pre>
            <p>Stateful transforms are plain objects whose <code>transform()</code> generator wraps the source, maintaining state across chunks:</p>
            <pre><code>// Stateful transform — a generator that wraps the source
function createLineParser() {
  // Helper to concatenate Uint8Arrays
  const concat = (...arrays) =&gt; {
    const result = new Uint8Array(arrays.reduce((n, a) =&gt; n + a.length, 0));
    let offset = 0;
    for (const arr of arrays) { result.set(arr, offset); offset += arr.length; }
    return result;
  };

  return {
    async *transform(source) {
      let pending = new Uint8Array(0);
      
      for await (const chunks of source) {
        if (chunks === null) {
          // Flush: yield any remaining data
          if (pending.length &gt; 0) yield [pending];
          continue;
        }
        
        // Concatenate pending data with new chunks
        const combined = concat(pending, ...chunks);
        const lines = [];
        let start = 0;

        for (let i = 0; i &lt; combined.length; i++) {
          if (combined[i] === 0x0a) { // newline
            lines.push(combined.slice(start, i));
            start = i + 1;
          }
        }

        pending = combined.slice(start);
        if (lines.length &gt; 0) yield lines;
      }
    }
  };
}

const output = Stream.pull(source, createLineParser());</code></pre>
            <p>For transforms that need cleanup on abort, add an abort handler:</p>
            <pre><code>// Stateful transform with resource cleanup
function createGzipCompressor() {
  // Hypothetical compression API...
  const deflate = new Deflater({ gzip: true });

  return {
    async *transform(source) {
      for await (const chunks of source) {
        if (chunks === null) {
          // Flush: finalize compression
          deflate.push(new Uint8Array(0), true);
          if (deflate.result) yield [deflate.result];
        } else {
          for (const chunk of chunks) {
            deflate.push(chunk, false);
            if (deflate.result) yield [deflate.result];
          }
        }
      }
    },
    abort(reason) {
      // Clean up compressor resources on error/cancellation
    }
  };
}</code></pre>
            <p>For implementers, there's no Transformer protocol with <code>start()</code>, <code>transform()</code>, <code>flush()</code> methods and controller coordination passed into a <code>TransformStream</code> class that has its own hidden state machine and buffering mechanisms. Transforms are just functions or simple objects: far simpler to implement and test.</p>
    <div>
      <h4>Explicit backpressure policies</h4>
      <a href="#explicit-backpressure-policies">
        
      </a>
    </div>
    <p>When a bounded buffer fills up and a producer wants to write more, there are only a few things you can do:</p><ol><li><p>Reject the write: refuse to accept more data</p></li><li><p>Wait: block until space becomes available</p></li><li><p>Discard old data: evict what's already buffered to make room</p></li><li><p>Discard new data: drop what's incoming</p></li></ol><p>That's it. Any other response is either a variation of these (like "resize the buffer," which is really just deferring the choice) or domain-specific logic that doesn't belong in a general streaming primitive. Web streams always chooses Wait.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/68339c8QsvNmb7JcZ2lSDO/e52a86a9b8f52b52eb9328d5ee58f23a/6.png" />
          </figure><p>The new API makes you choose one of these four explicitly:</p><ul><li><p><code>strict</code> (default): Rejects writes when the buffer is full and too many writes are pending. Catches "fire-and-forget" patterns where producers ignore backpressure.</p></li><li><p><code>block</code>: Writes wait until buffer space is available. Use when you trust the producer to await writes properly.</p></li><li><p><code>drop-oldest</code>: Drops the oldest buffered data to make room. Useful for live feeds where stale data loses value.</p></li><li><p><code>drop-newest</code>: Discards incoming data when full. Useful when you want to process what you have without being overwhelmed.</p></li></ul>
            <pre><code>const { writer, readable } = Stream.push({
  highWaterMark: 10,
  backpressure: 'strict' // or 'block', 'drop-oldest', 'drop-newest'
});</code></pre>
            <p>No more hoping producers cooperate. The policy you choose determines what happens when the buffer fills.</p><p>Here's how each policy behaves when a producer writes faster than the consumer reads:</p>
            <pre><code>// strict: Catches fire-and-forget writes that ignore backpressure
const strict = Stream.push({ highWaterMark: 2, backpressure: 'strict' });
strict.writer.write(chunk1);  // ok (not awaited)
strict.writer.write(chunk2);  // ok (fills slots buffer)
strict.writer.write(chunk3);  // ok (queued in pending)
strict.writer.write(chunk4);  // ok (pending buffer fills)
strict.writer.write(chunk5);  // throws! too many pending writes

// block: Wait for space (unbounded pending queue)
const blocking = Stream.push({ highWaterMark: 2, backpressure: 'block' });
await blocking.writer.write(chunk1);  // ok
await blocking.writer.write(chunk2);  // ok
await blocking.writer.write(chunk3);  // waits until consumer reads
await blocking.writer.write(chunk4);  // waits until consumer reads
await blocking.writer.write(chunk5);  // waits until consumer reads

// drop-oldest: Discard old data to make room
const dropOld = Stream.push({ highWaterMark: 2, backpressure: 'drop-oldest' });
await dropOld.writer.write(chunk1);  // ok
await dropOld.writer.write(chunk2);  // ok
await dropOld.writer.write(chunk3);  // ok, chunk1 discarded

// drop-newest: Discard incoming data when full
const dropNew = Stream.push({ highWaterMark: 2, backpressure: 'drop-newest' });
await dropNew.writer.write(chunk1);  // ok
await dropNew.writer.write(chunk2);  // ok
await dropNew.writer.write(chunk3);  // silently dropped</code></pre>
            
    <div>
      <h4>Explicit multi-consumer patterns</h4>
      <a href="#explicit-multi-consumer-patterns">
        
      </a>
    </div>
    
            <pre><code>// Share with explicit buffer management
const shared = Stream.share(source, {
  highWaterMark: 100,
  backpressure: 'strict'
});

const consumer1 = shared.pull();
const consumer2 = shared.pull(decompress);</code></pre>
            <p>Instead of <code>tee()</code> with its hidden unbounded buffer, you get explicit multi-consumer primitives. <code>Stream.share()</code> is pull-based: consumers pull from a shared source, and you configure the buffer limits and backpressure policy upfront.</p><p>There's also <code>Stream.broadcast()</code> for push-based multi-consumer scenarios. Both require you to think about what happens when consumers run at different speeds, because that's a real concern that shouldn't be hidden.</p>
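To make the push-based variant concrete, here is a minimal broadcast sketch built on plain async iterators. It is illustrative only: the real `Stream.broadcast()` shape may differ, and the queue size and drop-newest policy used here are assumptions for the sketch.

```javascript
// Minimal push-based broadcast (illustrative sketch). Each subscriber
// gets its own bounded queue; when a queue is full this sketch drops
// the newest data — one of the four policies discussed above.
function createBroadcast({ highWaterMark = 16 } = {}) {
  const subscribers = new Set();
  return {
    subscribe() {
      const queue = [];
      let notify = null;
      let done = false;
      const sub = {
        push(chunk) {
          if (queue.length >= highWaterMark) return false; // drop-newest
          queue.push(chunk);
          if (notify) { notify(); notify = null; }
          return true;
        },
        end() {
          done = true;
          if (notify) { notify(); notify = null; }
        },
        // Each subscriber is itself just an async iterable.
        async *[Symbol.asyncIterator]() {
          for (;;) {
            while (queue.length > 0) yield queue.shift();
            if (done) return;
            await new Promise(resolve => { notify = resolve; });
          }
        },
      };
      subscribers.add(sub);
      return sub;
    },
    write(chunk) { for (const sub of subscribers) sub.push(chunk); },
    end() { for (const sub of subscribers) sub.end(); },
  };
}
```

Each subscriber iterates independently; a slow subscriber loses the newest data rather than forcing unbounded buffering on everyone else.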
    <div>
      <h4>Sync/async separation</h4>
      <a href="#sync-async-separation">
        
      </a>
    </div>
    <p>Not all streaming workloads involve I/O. When your source is in-memory and your transforms are pure functions, async machinery adds overhead without benefit: you're paying to coordinate "waiting" that never happens.</p><p>The new API has complete parallel sync versions: <code>Stream.pullSync()</code>, <code>Stream.bytesSync()</code>, <code>Stream.textSync()</code>, and so on. If your source and transforms are all synchronous, you can process the entire pipeline without a single promise.</p>
            <pre><code>// Async — when source or transforms may be asynchronous
const textAsync = await Stream.text(source);

// Sync — when all components are synchronous
const textSync = Stream.textSync(source);</code></pre>
            <p>Here's a complete synchronous pipeline – compression, transformation, and consumption with zero async overhead:</p>
            <pre><code>// Synchronous source from in-memory data
const source = Stream.fromSync([inputBuffer]);

// Synchronous transforms
const compressed = Stream.pullSync(source, zlibCompressSync);
const encrypted = Stream.pullSync(compressed, aesEncryptSync);

// Synchronous consumption — no promises, no event loop trips
const result = Stream.bytesSync(encrypted);</code></pre>
            <p>The entire pipeline executes in a single call stack. No promises are created, no microtask queue scheduling occurs, and no GC pressure from short-lived async machinery. For CPU-bound workloads like parsing, compression, or transformation of in-memory data, this can be significantly faster than the equivalent Web streams code – which would force async boundaries even when every component is synchronous.</p><p>Web streams has no synchronous path. Even if your source has data ready and your transform is a pure function, you still pay for promise creation and microtask scheduling on every operation. Promises are fantastic for cases in which waiting is actually necessary, but they aren't always necessary. The new API lets you stay in sync-land when that's what you need.</p>
    <div>
      <h4>Bridging the gap between this and web streams</h4>
      <a href="#bridging-the-gap-between-this-and-web-streams">
        
      </a>
    </div>
    <p>The async iterator based approach provides a natural bridge between this alternative and Web streams. Coming from a <code>ReadableStream</code>, simply passing the readable in as input works as expected, as long as the <code>ReadableStream</code> yields bytes:</p>
            <pre><code>const readable = getWebReadableStreamSomehow();
const input = Stream.pull(readable, transform1, transform2);
for await (const chunks of input) {
  // process chunks
}</code></pre>
            <p>Going the other way, adapting to a <code>ReadableStream</code> takes a bit more work since the alternative approach yields batches of chunks, but the adaptation layer is just as straightforward:</p>
            <pre><code>async function* adapt(input) {
  for await (const chunks of input) {
    for (const chunk of chunks) {
      yield chunk;
    }
  }
}

const input = Stream.pull(source, transform1, transform2);
const readable = ReadableStream.from(adapt(input));</code></pre>
            
    <div>
      <h4>How this addresses the real-world failures from earlier</h4>
      <a href="#how-this-addresses-the-real-world-failures-from-earlier">
        
      </a>
    </div>
    <ul><li><p>Unconsumed bodies: Pull semantics mean nothing happens until you iterate. No hidden resource retention. If you don't consume a stream, there's no background machinery holding connections open.</p></li><li><p>The <code>tee()</code> memory cliff: <code>Stream.share()</code> requires explicit buffer configuration. You choose the <code>highWaterMark</code> and backpressure policy upfront: no more silent unbounded growth when consumers run at different speeds.</p></li><li><p>Transform backpressure gaps: Pull-through transforms execute on-demand. Data doesn't cascade through intermediate buffers; it flows only when the consumer pulls. Stop iterating, stop processing.</p></li><li><p>GC thrashing in SSR: Batched chunks (<code>Uint8Array[]</code>) amortize async overhead. Sync pipelines via <code>Stream.pullSync()</code> eliminate promise allocation entirely for CPU-bound workloads.</p></li></ul>
    <div>
      <h3>Performance</h3>
      <a href="#performance">
        
      </a>
    </div>
    <p>The design choices have performance implications. Here are benchmarks from the reference implementation of this possible alternative compared to Web streams (Node.js v24.x, Apple M1 Pro, averaged over 10 runs):</p><table><tr><td><p><b>Scenario</b></p></td><td><p><b>Alternative</b></p></td><td><p><b>Web streams</b></p></td><td><p><b>Difference</b></p></td></tr><tr><td><p>Small chunks (1KB × 5000)</p></td><td><p>~13 GB/s</p></td><td><p>~4 GB/s</p></td><td><p>~3× faster</p></td></tr><tr><td><p>Tiny chunks (100B × 10000)</p></td><td><p>~4 GB/s</p></td><td><p>~450 MB/s</p></td><td><p>~8× faster</p></td></tr><tr><td><p>Async iteration (8KB × 1000)</p></td><td><p>~530 GB/s</p></td><td><p>~35 GB/s</p></td><td><p>~15× faster</p></td></tr><tr><td><p>Chained 3× transforms (8KB × 500)</p></td><td><p>~275 GB/s</p></td><td><p>~3 GB/s</p></td><td><p><b>~80–90× faster</b></p></td></tr><tr><td><p>High-frequency (64B × 20000)</p></td><td><p>~7.5 GB/s</p></td><td><p>~280 MB/s</p></td><td><p>~25× faster</p></td></tr></table><p>The chained transform result is particularly striking: pull-through semantics eliminate the intermediate buffering that plagues Web streams pipelines. Instead of each <code>TransformStream</code> eagerly filling its internal buffers, data flows on-demand from consumer to source.</p><p>Now, to be fair, Node.js really has not yet put significant effort into fully optimizing the performance of its Web streams implementation. There's likely significant room for improvement in Node.js' performance results through a bit of applied effort to optimize the hot paths there. 
That said, running these benchmarks in Deno and Bun also shows a significant performance improvement for this alternative iterator-based approach over either of their Web streams implementations.</p><p>Browser benchmarks (Chrome/Blink, averaged over 3 runs) show consistent gains as well:</p><table><tr><td><p><b>Scenario</b></p></td><td><p><b>Alternative</b></p></td><td><p><b>Web streams</b></p></td><td><p><b>Difference</b></p></td></tr><tr><td><p>Push 3KB chunks</p></td><td><p>~135k ops/s</p></td><td><p>~24k ops/s</p></td><td><p>~5–6× faster</p></td></tr><tr><td><p>Push 100KB chunks</p></td><td><p>~24k ops/s</p></td><td><p>~3k ops/s</p></td><td><p>~7–8× faster</p></td></tr><tr><td><p>3 transform chain</p></td><td><p>~4.6k ops/s</p></td><td><p>~880 ops/s</p></td><td><p>~5× faster</p></td></tr><tr><td><p>5 transform chain</p></td><td><p>~2.4k ops/s</p></td><td><p>~550 ops/s</p></td><td><p>~4× faster</p></td></tr><tr><td><p>bytes() consumption</p></td><td><p>~73k ops/s</p></td><td><p>~11k ops/s</p></td><td><p>~6–7× faster</p></td></tr><tr><td><p>Async iteration</p></td><td><p>~1.1M ops/s</p></td><td><p>~10k ops/s</p></td><td><p><b>~40–100× faster</b></p></td></tr></table><p>These benchmarks measure throughput in controlled scenarios; real-world performance depends on your specific use case. The difference between Node.js and browser gains reflects the distinct optimization paths each environment takes for Web streams.</p><p>It's worth noting that these benchmarks compare a pure TypeScript/JavaScript implementation of the new API against the native (JavaScript/C++/Rust) implementations of Web streams in each runtime. The new API's reference implementation has had no performance optimization work; the gains come entirely from the design. 
A native implementation would likely show further improvement.</p><p>The gains illustrate how fundamental design choices compound: batching amortizes async overhead, pull semantics eliminate intermediate buffering, and the freedom for implementations to use synchronous fast paths when data is available immediately all contribute.</p><blockquote><p>"We’ve done a lot to improve performance and consistency in Node streams, but there’s something uniquely powerful about starting from scratch. New streams’ approach embraces modern runtime realities without legacy baggage, and that opens the door to a simpler, performant and more coherent streams model." 
- Robert Nagy, Node.js TSC member and Node.js streams contributor</p></blockquote>
    <div>
      <h2>What's next</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>I'm publishing this to start a conversation. What did I get right? What did I miss? Are there use cases that don't fit this model? What would a migration path for this approach look like? The goal is to gather feedback from developers who've felt the pain of Web streams and have opinions about what a better API should look like.</p>
    <div>
      <h3>Try it yourself</h3>
      <a href="#try-it-yourself">
        
      </a>
    </div>
    <p>A reference implementation for this alternative approach is available now and can be found at <a href="https://github.com/jasnell/new-streams"><u>https://github.com/jasnell/new-streams</u></a>.</p><ul><li><p>API Reference: See the <a href="https://github.com/jasnell/new-streams/blob/main/API.md"><u>API.md</u></a> for complete documentation</p></li><li><p>Examples: The <a href="https://github.com/jasnell/new-streams/tree/main/samples"><u>samples directory</u></a> has working code for common patterns</p></li></ul><p>I welcome issues, discussions, and pull requests. If you've run into Web streams problems I haven't covered, or if you see gaps in this approach, let me know. But again, the idea here is not to say "Let's all use this shiny new object!"; it is to kick off a discussion that looks beyond the current status quo of Web streams and returns to first principles.</p><p>Web streams was an ambitious project that brought streaming to the web platform when nothing else existed. The people who designed it made reasonable choices given the constraints of 2014 – before async iteration, before years of production experience revealed the edge cases.</p><p>But we've learned a lot since then. JavaScript has evolved. A streaming API designed today can be simpler, more aligned with the language, and more explicit about the things that matter, like backpressure and multi-consumer behavior.</p><p>We deserve a better stream API. So let's talk about what that could look like.</p> ]]></content:encoded>
            <category><![CDATA[Standards]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[TypeScript]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Node.js]]></category>
            <category><![CDATA[Performance]]></category>
            <category><![CDATA[API]]></category>
            <guid isPermaLink="false">37h1uszA2vuOfmXb3oAnZr</guid>
            <dc:creator>James M Snell</dc:creator>
        </item>
        <item>
            <title><![CDATA[DIY BYOIP: a new way to Bring Your Own IP prefixes to Cloudflare]]></title>
            <link>https://blog.cloudflare.com/diy-byoip/</link>
            <pubDate>Fri, 07 Nov 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Announcing a new self-serve API for Bring Your Own IP (BYOIP), giving customers unprecedented control and flexibility to onboard, manage, and use their own IP prefixes with Cloudflare's services. ]]></description>
<content:encoded><![CDATA[ <p>When a customer wants to <a href="https://blog.cloudflare.com/bringing-your-own-ips-to-cloudflare-byoip/"><u>bring IP address space to</u></a> Cloudflare, they’ve always had to reach out to their account team to put in a request. This request would then be sent to various Cloudflare engineering teams such as addressing and network engineering — and then to the team responsible for the particular service they wanted to use the prefix with (e.g., CDN, Magic Transit, Spectrum, Egress). In addition, they had to work with their own legal teams, and potentially another organization if they did not have primary ownership of an IP prefix, jumping through hoops of approvals to get a <a href="https://developers.cloudflare.com/byoip/concepts/loa/"><u>Letter of Agency (LOA)</u></a> issued. This process is complex, manual, and time-consuming for all parties involved — sometimes taking 4–6 weeks depending on various approvals. </p><p>Well, no longer! Today, we are pleased to announce the launch of our self-serve BYOIP API, which enables our customers to onboard and set up their BYOIP prefixes themselves.</p><p>With self-serve, we handle the bureaucracy for you. We have automated this process using the gold standard for routing security — the Resource Public Key Infrastructure, RPKI. All the while, we continue to ensure the best quality of service by generating LOAs on our customers’ behalf, based on the security guarantees of our new ownership validation process. This ensures that customer routes continue to be accepted in every corner of the Internet.</p><p>Cloudflare takes the security and stability of the whole Internet very seriously. RPKI is a cryptographically-strong authorization mechanism and is, we believe, substantially more reliable than the common practice of relying upon human review of scanned documents. 
However, deployment and availability of some RPKI-signed artifacts like the AS Path Authorisation (ASPA) object remains limited, and for that reason we are limiting the initial scope of self-serve onboarding to BYOIP prefixes originated from Cloudflare's autonomous system number (ASN) AS 13335. By doing this, we only need to rely on the publication of Route Origin Authorisation (ROA) objects, which are widely available. This approach has the advantage of being safe for the Internet and also meeting the needs of most of our BYOIP customers. </p><p>Today, we take a major step forward in offering customers a more comprehensive IP address management (IPAM) platform. With the recent update to <a href="https://blog.cloudflare.com/your-ips-your-rules-enabling-more-efficient-address-space-usage/"><u>enable multiple services on a single BYOIP prefix</u></a> and this latest advancement to enable self-serve onboarding via our API, we hope customers feel empowered to take control of their IPs on our network.</p>
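The origin-validation check that a ROA enables can be sketched in a few lines. This is an illustrative, IPv4-only simplification in the spirit of RFC 6811, not Cloudflare's implementation: an announced route is valid when some ROA covers the prefix, the origin ASN matches, and the announced length stays within the ROA's maxLength.

```javascript
// Simplified route-origin validation sketch (IPv4 only, illustrative).
function ipToInt(ip) {
  return ip.split('.').reduce((n, octet) => (n << 8 | Number(octet)) >>> 0, 0);
}

// Does a ROA prefix cover an announced prefix? (announcement must be at
// least as specific, with matching network bits up to the ROA's length)
function covers(roaPrefix, roaLen, prefix, len) {
  if (len < roaLen) return false;
  const mask = roaLen === 0 ? 0 : (~0 << (32 - roaLen)) >>> 0;
  return (ipToInt(roaPrefix) & mask) === (ipToInt(prefix) & mask);
}

// RFC 6811 semantics: 'valid' if any covering ROA matches ASN and length,
// 'invalid' if covered but no ROA matches, 'not-found' otherwise.
function validateOrigin(roas, announcement) {
  const [prefix, lenStr] = announcement.prefix.split('/');
  const len = Number(lenStr);
  let covered = false;
  for (const roa of roas) {
    const [rPrefix, rLenStr] = roa.prefix.split('/');
    const rLen = Number(rLenStr);
    if (!covers(rPrefix, rLen, prefix, len)) continue;
    covered = true;
    if (roa.asn === announcement.originAsn && len <= (roa.maxLength ?? rLen)) {
      return 'valid';
    }
  }
  return covered ? 'invalid' : 'not-found';
}
```

A hijacker originating a covered prefix from the wrong ASN produces an 'invalid' result, which RPKI-validating networks can reject outright.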
    <div>
      <h2>An evolution of Cloudflare BYOIP</h2>
      <a href="#an-evolution-of-cloudflare-byoip">
        
      </a>
    </div>
    <p>We want Cloudflare to feel like an extension of your infrastructure, which is why we <a href="https://blog.cloudflare.com/bringing-your-own-ips-to-cloudflare-byoip/"><u>originally launched Bring-Your-Own-IP (BYOIP) back in 2020</u></a>. </p><p>A quick refresher: Bring-your-own-IP is named for exactly what it does: it allows customers to bring their own IP space to Cloudflare. Customers choose BYOIP for a number of reasons, chief among them control and configurability. An IP prefix is a range or block of IP addresses. Routers create a table of reachable prefixes, known as a routing table, to ensure that packets are delivered correctly across the Internet. When a customer's Cloudflare services are configured to use the customer's own addresses, onboarded to Cloudflare as BYOIP, a packet with a corresponding destination address will be routed across the Internet to Cloudflare's global edge network, where it will be received and processed. BYOIP can be used with our Layer 7 services, Spectrum, or Magic Transit. </p>
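To make the routing-table behavior concrete, here's an illustrative, IPv4-only longest-prefix-match lookup. No real router works this naively, but the selection rule is the same: forward to the most specific prefix that covers the destination.

```javascript
// Illustrative longest-prefix match over a tiny routing table.
function ipToInt(ip) {
  return ip.split('.').reduce((n, octet) => (n << 8 | Number(octet)) >>> 0, 0);
}

function lookup(routes, destIp) {
  const dest = ipToInt(destIp);
  let best = null;
  for (const route of routes) {
    const [net, lenStr] = route.prefix.split('/');
    const len = Number(lenStr);
    const mask = len === 0 ? 0 : (~0 << (32 - len)) >>> 0;
    // A route matches when the destination's network bits equal the prefix's;
    // among matches, keep the longest (most specific) prefix.
    if ((dest & mask) === (ipToInt(net) & mask) && (!best || len > best.len)) {
      best = { len, nextHop: route.nextHop };
    }
  }
  return best ? best.nextHop : null;
}
```

Onboarding a BYOIP prefix effectively makes Cloudflare the next hop for that prefix in tables like this across the Internet.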
    <div>
      <h2>A look under the hood: How it works</h2>
      <a href="#a-look-under-the-hood-how-it-works">
        
      </a>
    </div>
    
    <div>
      <h3>Today’s world of prefix validation</h3>
      <a href="#todays-world-of-prefix-validation">
        
      </a>
    </div>
    <p>Let’s take a step back and take a look at the state of the BYOIP world right now. Let’s say a customer has authority over a range of IP addresses, and they’d like to bring them to Cloudflare. We require customers to provide us with a Letter of Authorization (LOA) and have an Internet Routing Registry (IRR) record matching their prefix and ASN. Once we have this, we require manual review by a Cloudflare engineer. There are a few issues with this process:</p><ul><li><p>Insecure: The LOA is just a document—a piece of paper. The security of this method rests entirely on the diligence of the engineer reviewing the document. If the review is not able to detect that a document is fraudulent or inaccurate, it is possible for a prefix or ASN to be hijacked.</p></li><li><p>Time-consuming: Generating a single LOA is not always sufficient. If you are leasing IP space, we will ask you to provide documentation confirming that relationship as well, so that we can see a clear chain of authorisation from the original assignment or allocation of addresses to you. Getting all the paper documents to verify this chain of ownership, combined with having to wait for manual review can result in weeks of waiting to deploy a prefix!</p></li></ul>
    <div>
      <h3>Automating trust: How Cloudflare verifies your BYOIP prefix ownership in minutes</h3>
      <a href="#automating-trust-how-cloudflare-verifies-your-byoip-prefix-ownership-in-minutes">
        
      </a>
    </div>
    <p>Moving to a self-serve model allowed us to rethink the manner in which we conduct prefix ownership checks. We asked ourselves: How can we quickly, securely, and automatically prove you are authorized to use your IP prefix and intend to route it through Cloudflare?</p><p>We ended up killing two birds with one stone, thanks to our two-step process involving the creation of an RPKI ROA (verification of intent) and modification of IRR or rDNS records (verification of ownership). Self-serve unlocks the ability not only to onboard prefixes more quickly and without human intervention, but also to apply more rigorous ownership checks than a simple scanned document ever could. While not 100% foolproof, it is a significant improvement in the way we verify ownership.</p>
    <div>
      <h3>Tapping into the authorities</h3>
      <a href="#tapping-into-the-authorities">
        
      </a>
    </div>
    <p>Regional Internet Registries (RIRs) are the organizations responsible for distributing and managing Internet number resources like IP addresses. They are composed of 5 different entities operating in different regions of the world (<a href="https://developers.cloudflare.com/byoip/get-started/#:~:text=Your%20prefix%20must%20be%20registered%20under%20one%20of%20the%20Regional%20Internet%20Registries%20(RIRs)%3A"><u>RIRs</u></a>). Originally allocated address space from the Internet Assigned Numbers Authority (IANA), they in turn assign and allocate that IP space to Local Internet Registries (LIRs) like ISPs.</p><p>This process is based on RIR policies which generally look at things like legal documentation, existing database/registry records, technical contacts, and BGP information. End-users can obtain addresses from an LIR, or in some cases through an RIR directly. As IPv4 addresses have become more scarce, brokerage services have been launched to allow addresses to be leased for fixed periods from their original assignees.</p><p>The Internet Routing Registry (IRR) is a separate system that focuses on routing rather than address assignment. Many organisations operate IRR instances and allow routing information to be published, including all five RIRs. While most IRR instances impose few barriers to the publication of routing data, those that are operated by RIRs are capable of linking the ability to publish routing information with the organisations to which the corresponding addresses have been assigned. We believe that being able to modify an IRR record protected in this way provides a good signal that a user has the rights to use a prefix.</p><p>Example of a route object containing validation token (using the documentation-only address 192.0.2.0/24):</p>
            <pre><code>% whois -h rr.arin.net 192.0.2.0/24

route:          192.0.2.0/24
origin:         AS13335
descr:          Example Company, Inc.
                cf-validation: 9477b6c3-4344-4ceb-85c4-6463e7d2453f
admin-c:        ADMIN2521-ARIN
tech-c:         ADMIN2521-ARIN
tech-c:         CLOUD146-ARIN
mnt-by:         MNT-CLOUD14
created:        2025-07-29T10:52:27Z
last-modified:  2025-07-29T10:52:27Z
source:         ARIN</code></pre>
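            <p>The verification step itself can be sketched in a few lines: scan the fetched route object for the validation token shown above. This is an illustrative approximation only (the <code>token_present</code> helper is hypothetical, not Cloudflare’s implementation):</p>

```python
# Illustrative sketch: check whether a fetched route object carries the
# expected cf-validation token. The record text mirrors the documentation
# example above; token_present is a hypothetical helper.

def token_present(route_object_text, token):
    """Return True if a cf-validation line carrying the token is found."""
    for line in route_object_text.splitlines():
        if "cf-validation:" in line and token in line:
            return True
    return False

record = """\
route:          192.0.2.0/24
origin:         AS13335
descr:          Example Company, Inc.
                cf-validation: 9477b6c3-4344-4ceb-85c4-6463e7d2453f
source:         ARIN"""

print(token_present(record, "9477b6c3-4344-4ceb-85c4-6463e7d2453f"))  # True
```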
            <p>For those who don’t want to go through the process of IRR-based validation, reverse DNS (rDNS) is provided as another secure method of verification. To manage rDNS for a prefix — whether it's creating a PTR record or a security TXT record — you must be granted permission by the entity that allocated the IP block in the first place (usually your ISP or the RIR).</p><p>This permission is demonstrated in one of two ways:</p><ul><li><p>Directly through the IP owner’s authenticated customer portal (ISP/RIR).</p></li><li><p>By the IP owner delegating authority to your third-party DNS provider via an NS record for your reverse zone.</p></li></ul><p>Example of a reverse domain lookup using the dig command (using the documentation-only address 192.0.2.0/24):</p>
            <pre><code>% dig cf-validation.2.0.192.in-addr.arpa TXT

; &lt;&lt;&gt;&gt; DiG 9.10.6 &lt;&lt;&gt;&gt; cf-validation.2.0.192.in-addr.arpa TXT
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 16686
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cf-validation.2.0.192.in-addr.arpa. IN TXT

;; ANSWER SECTION:
cf-validation.2.0.192.in-addr.arpa. 300 IN TXT "b2f8af96-d32d-4c46-a886-f97d925d7977"

;; Query time: 35 msec
;; SERVER: 127.0.2.2#53(127.0.2.2)
;; WHEN: Fri Oct 24 10:43:52 EDT 2025
;; MSG SIZE  rcvd: 150</code></pre>
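            <p>For an IPv4 /24, the name of the TXT record being queried follows mechanically from the prefix. A small illustrative sketch (the <code>validation_record_name</code> helper is hypothetical, but the derived name matches the dig example above):</p>

```python
import ipaddress

def validation_record_name(prefix):
    """Derive the rDNS TXT record name for an IPv4 /24: the octets are
    reversed and placed under in-addr.arpa, with the cf-validation label
    prepended (as in the dig example above)."""
    net = ipaddress.ip_network(prefix)
    if net.version != 4 or net.prefixlen != 24:
        raise ValueError("this sketch handles IPv4 /24 prefixes only")
    o1, o2, o3, _ = str(net.network_address).split(".")
    return "cf-validation.{}.{}.{}.in-addr.arpa".format(o3, o2, o1)

print(validation_record_name("192.0.2.0/24"))
# cf-validation.2.0.192.in-addr.arpa
```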
            <p>So how exactly is one supposed to modify these records? That’s where the validation token comes into play. Once you choose either the IRR or Reverse DNS method, we provide a unique, single-use validation token. You must add this token to the content of the relevant record, either in the IRR or in the DNS. Our system then looks for the presence of the token as evidence that the request is being made by someone with authorization to make the requested modification. If the token is found, verification is complete and your ownership is confirmed!</p>
    <div>
      <h3>The digital passport 🛂</h3>
      <a href="#the-digital-passport">
        
      </a>
    </div>
    <p>Ownership is only half the battle; we also need to confirm that you intend for Cloudflare to advertise your prefix. For this, we rely on the gold standard for routing security: the Resource Public Key Infrastructure (RPKI), and in particular Route Origin Authorization (ROA) objects.</p><p>A ROA is a cryptographically-signed document that specifies which Autonomous System Number (ASN) is authorized to originate your IP prefix. You can think of a ROA as the digital equivalent of a certified, signed, and notarized contract from the owner of the prefix.</p><p>Relying parties can validate the signatures in a ROA using the RPKI. You simply create a ROA that specifies Cloudflare's ASN (AS13335) as an authorized originator and arrange for it to be signed. Many of our customers use the hosted RPKI systems available through RIR portals for this. When our systems detect this signed authorization, your routing intention is instantly confirmed.</p><p>Many other companies that support BYOIP require a complex workflow involving creating self-signed certificates and manually modifying RDAP (Registration Data Access Protocol) records, a heavy administrative lift. By offering a choice of IRR object modification and reverse DNS TXT records, combined with RPKI, we provide a verification process that is much more familiar and straightforward for existing network operators.</p>
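    <p>To illustrate what a ROA buys you, here is a minimal sketch of the Route Origin Validation logic (RFC 6811 semantics) that relying parties apply. It assumes the ROAs have already been fetched and cryptographically verified; real validators do far more:</p>

```python
import ipaddress

def rov_state(prefix, origin_asn, roas):
    """Classify a BGP announcement against a list of verified ROAs,
    each given as (prefix, max_length, authorized_asn)."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        if net.subnet_of(ipaddress.ip_network(roa_prefix)):
            covered = True  # some ROA covers this announcement
            if asn == origin_asn and net.prefixlen <= max_length:
                return "valid"
    # covered by a ROA but no matching origin/length -> invalid;
    # not covered by any ROA -> not-found
    return "invalid" if covered else "not-found"

# A ROA authorizing Cloudflare's AS13335 to originate 192.0.2.0/24
roas = [("192.0.2.0/24", 24, 13335)]
print(rov_state("192.0.2.0/24", 13335, roas))     # valid
print(rov_state("192.0.2.0/24", 64512, roas))     # invalid
print(rov_state("198.51.100.0/24", 13335, roas))  # not-found
```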
    <div>
      <h3>The global reach guarantee</h3>
      <a href="#the-global-reach-guarantee">
        
      </a>
    </div>
    <p>While the new self-serve flow ditches the need for the "dinosaur relic" that is the LOA, many network operators around the world still rely on it as part of the process of accepting prefixes from other networks.</p><p>To help ensure your prefix is accepted by adjacent networks globally, Cloudflare automatically generates a document on your behalf to be distributed in place of a LOA. This document describes the checks we have carried out to confirm that we are authorized to originate the customer prefix, and confirms the presence of valid ROAs authorizing our origination of it. This lets us support the workflows of the network operators we connect to who rely upon LOAs, without our customers bearing the burden of generating them.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1GimIe80gJn5PrRUGkEMpF/130d2590e45088d58ac62ab2240f4d5c/image1.png" />
          </figure>
    <div>
      <h2>Staying away from black holes</h2>
      <a href="#staying-away-from-black-holes">
        
      </a>
    </div>
    <p>One concern in designing the self-serve API was balancing customer flexibility against the safeguards needed to ensure that an IP prefix is never advertised without a matching service binding. If that were to happen, Cloudflare would be advertising a prefix with no idea what to do with the traffic when we receive it! We call this “blackholing” traffic. To prevent it, we require a default service binding, i.e., a service binding that spans the entire range of the onboarded IP prefix.</p><p>A customer can later layer different service bindings on top of their default service binding via <a href="https://blog.cloudflare.com/your-ips-your-rules-enabling-more-efficient-address-space-usage/"><u>multiple service bindings</u></a>, like putting CDN on top of a default Spectrum service binding. This way, a prefix can never be advertised without a service binding, so our customers’ traffic is never blackholed.</p>
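    <p>The invariant can be modeled with a little subnet arithmetic. This is an illustrative sketch with hypothetical names, not how our control plane is implemented:</p>

```python
import ipaddress

def fully_covered(prefix, bindings):
    """True if every address in `prefix` falls under at least one
    service binding; bindings is a list of (cidr, service) pairs."""
    uncovered = [ipaddress.ip_network(prefix)]
    for cidr, _service in bindings:
        b = ipaddress.ip_network(cidr)
        remaining = []
        for net in uncovered:
            if net.subnet_of(b):
                continue  # this binding covers all of net
            if b.subnet_of(net):
                remaining.extend(net.address_exclude(b))  # carve out b
            else:
                remaining.append(net)
        uncovered = remaining
    return not uncovered

default_only = [("203.0.113.0/24", "spectrum")]       # default binding
layered = default_only + [("203.0.113.0/26", "cdn")]  # CDN on a subset
print(fully_covered("203.0.113.0/24", layered))       # True
print(fully_covered("203.0.113.0/24", [("203.0.113.0/26", "cdn")]))  # False
```

<p>With a default binding in place, any additional, narrower bindings can never leave a gap, which is exactly the property that prevents blackholing.</p>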
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/20QAM5GITJ5m5kYkNlh701/82812d202ffa7b9a4e46838aa6c04937/image2.png" />
          </figure>
    <div>
      <h2>Getting started</h2>
      <a href="#getting-started">
        
      </a>
    </div>
    <p>Check out our <a href="https://developers.cloudflare.com/byoip/get-started/"><u>developer docs</u></a> for the most up-to-date documentation on how to onboard, advertise, and add services to your IP prefixes via our API. Remember that onboardings can be complex, so don’t hesitate to ask questions or reach out to our <a href="https://www.cloudflare.com/professional-services/"><u>professional services</u></a> team if you’d like us to do it for you.</p>
    <div>
      <h2>The future of network control</h2>
      <a href="#the-future-of-network-control">
        
      </a>
    </div>
    <p>The ability to script and integrate BYOIP management into existing workflows is a game-changer for modern network operations, and we’re only just getting started. In the months ahead, look for self-serve BYOIP in the dashboard, as well as self-serve BYOIP offboarding, to give customers even more control.</p><p>Cloudflare's self-serve BYOIP API onboarding gives customers unprecedented control and flexibility over their IP assets. Automating onboarding also strengthens security posture, moving away from manually-reviewed PDFs and driving <a href="https://rpki.cloudflare.com/"><u>RPKI adoption</u></a>. By using these API calls, organizations can automate complex network tasks, streamline migrations, and build more resilient and agile network infrastructures.</p> ]]></content:encoded>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Addressing]]></category>
            <category><![CDATA[BYOIP]]></category>
            <category><![CDATA[IPv4]]></category>
            <category><![CDATA[IPv6]]></category>
            <category><![CDATA[Spectrum]]></category>
            <category><![CDATA[CDN]]></category>
            <category><![CDATA[Magic Transit]]></category>
            <category><![CDATA[Egress]]></category>
            <category><![CDATA[Cloudflare Gateway]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Aegis]]></category>
            <category><![CDATA[Smart Shield]]></category>
            <guid isPermaLink="false">4usaEaUwShJ04VKzlMV0V9</guid>
            <dc:creator>Ash Pallarito</dc:creator>
            <dc:creator>Lynsey Haynes</dc:creator>
            <dc:creator>Gokul Unnikrishnan</dc:creator>
        </item>
        <item>
            <title><![CDATA[Making the Internet observable: the evolution of Cloudflare Radar]]></title>
            <link>https://blog.cloudflare.com/evolution-of-cloudflare-radar/</link>
            <pubDate>Mon, 27 Oct 2025 12:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare Radar has evolved significantly since its 2020 launch, offering deeper insights into Internet security, routing, and traffic with new tools and data that help anyone understand what's happening online. ]]></description>
<content:encoded><![CDATA[ <p>The Internet is constantly changing in ways that are difficult to see. How do we measure its health, spot new threats, and track the adoption of new technologies? When we <a href="https://blog.cloudflare.com/introducing-cloudflare-radar/"><u>launched Cloudflare Radar in 2020</u></a>, our goal was to illuminate the Internet's patterns, helping anyone understand what was happening from a security, performance, and usage perspective, based on aggregated data from Cloudflare services. From the start, Internet measurement, transparency, and resilience have been at the core of our mission.</p><p>The launch blog post noted, “<i>There are three key components that we’re launching today: Radar </i><a href="https://blog.cloudflare.com/introducing-cloudflare-radar/#radar-internet-insights"><i><u>Internet Insights</u></i></a><i>, Radar </i><a href="https://blog.cloudflare.com/introducing-cloudflare-radar/#radar-domain-insights"><i><u>Domain Insights</u></i></a><i> and Radar </i><a href="https://blog.cloudflare.com/introducing-cloudflare-radar/#radar-ip-insights"><i><u>IP Insights</u></i></a><i>.</i>” These components have remained at the core of Radar, and they have been continuously expanded and complemented by other data sets and capabilities to support that mission. By shining a brighter light on Internet security, routing, traffic disruptions, protocol adoption, DNS, and now AI, Cloudflare Radar has become an increasingly comprehensive source of information and insights. And despite our expanding scope, we’ve focused on maintaining Radar’s “easy access” by evolving our information architecture, making our search capabilities more powerful, and building everything on top of a powerful, publicly-accessible API.</p><p>Now more than ever, Internet observability matters. New protocols and use cases emerge alongside new security threats. 
Connectivity is threatened not only by errant construction equipment, but also by governments practicing targeted content blocking. Cloudflare Radar is uniquely positioned to provide actionable visibility into these trends, threats, and events with local, network, and global level insights, spanning multiple data sets. Below, we explore some highlights of Radar’s evolution over the five years since its launch, looking at how Cloudflare Radar is building one of the industry’s most comprehensive views of what is happening on the Internet.</p>
    <div>
      <h2>Making Internet security more transparent</h2>
      <a href="#making-internet-security-more-transparent">
        
      </a>
    </div>
    <p>The <a href="https://research.cloudflare.com/"><u>Cloudflare Research</u></a> team takes a practical <a href="https://research.cloudflare.com/about/approach/"><u>approach</u></a> to research, tackling projects that have the potential to make a big impact. A number of these projects have been in the security space, and for three of them, we’ve collaborated to bring associated data sets to Radar, highlighting the impact of these projects.</p><p>The 2025 <a href="https://blog.cloudflare.com/new-regional-internet-traffic-and-certificate-transparency-insights-on-radar/#introducing-certificate-transparency-insights-on-radar"><u>launch</u></a> of the <a href="https://radar.cloudflare.com/certificate-transparency"><u>Certificate Transparency (CT) section on Radar</u></a> was the culmination of several months of collaborative work to expand visibility into key metrics for the Certificate Transparency ecosystem, enabling us to deprecate the original <a href="https://blog.cloudflare.com/a-tour-through-merkle-town-cloudflares-ct-ecosystem-dashboard/"><u>Merkle Town CT dashboard</u></a>, which was launched in 2018. Digital certificates are the foundation of trust on the modern Internet, and Certificate Authorities (CAs) serve as trusted gatekeepers, issuing those certificates. CT logs provide a public, auditable record of every certificate issued, making it possible to detect fraudulent or mis-issued certificates. The new CT section allows users to explore information about these certificates and CAs, as well as about the CT logs that capture information about every issued certificate.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7peWlbK1j0Da36jqjlD6rV/4fd7ef53247992078bbc89bd34f18fa9/image3.png" />
          </figure><p>In 2024, members of Cloudflare’s Research team collaborated with outside researchers to publish a paper titled “<a href="https://research.cloudflare.com/publications/SundaraRaman2023/"><u>Global, Passive Detection of Connection Tampering</u></a>”. Among its findings, the paper noted that globally, about 20% of all connections to Cloudflare close unexpectedly before any useful data exchange occurs. This unexpected closure is consistent with connection tampering by a third party, which may occur, for instance, when repressive governments seek to block access to websites or applications. Working with the Research team, we added visibility into <a href="https://blog.cloudflare.com/tcp-resets-timeouts/"><u>TCP resets and timeouts</u></a> to the <a href="https://radar.cloudflare.com/security/network-layer#tcp-resets-and-timeouts"><u>Network Layer Security page</u></a> on Radar. This graph (shown below for Turkmenistan) provides a perspective on potential connection tampering activity globally and at a country level. Changes and trends visible in this graph can be used to corroborate reports of content blocking and other local restrictions on Internet connectivity.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/lxyCbxlW0mUHP9cU0n3Dp/a27081a3926ac4b0917fef1870197fce/image6.png" />
          </figure><p>The research team has been working on post-quantum encryption <a href="https://blog.cloudflare.com/tag/post-quantum/page/2/"><u>since 2017</u></a>, racing improvements in quantum computing to help ensure that today’s encrypted data and communications are resistant to being decrypted in the future. They have led the drive to incorporate post-quantum encryption across Cloudflare’s infrastructure and services, and in 2023 <a href="https://blog.cloudflare.com/post-quantum-crypto-should-be-free/"><u>we announced that it would be included in our delivery services</u></a>, available to everyone and free of charge, forever. However, to take full advantage, support is needed on the client side as well, so to track that, we worked together to add a <a href="https://radar.cloudflare.com/adoption-and-usage#post-quantum-encryption-adoption"><u>graph</u></a> to Radar’s <a href="https://radar.cloudflare.com/adoption-and-usage"><u>Adoption &amp; Usage</u></a> page that tracks the post-quantum encrypted share of HTTPS request traffic. <a href="https://radar.cloudflare.com/adoption-and-usage?dateStart=2024-01-01&amp;dateEnd=2024-01-28#post-quantum-encryption-adoption"><u>Starting 2024 at under 3%</u></a>, it has <a href="https://radar.cloudflare.com/adoption-and-usage?dateStart=2025-10-10&amp;dateEnd=2025-10-16#post-quantum-encryption-adoption"><u>grown to just over 47%</u></a>, thanks to major browsers and code libraries <a href="https://developers.cloudflare.com/ssl/post-quantum-cryptography/pqc-support/"><u>activating post-quantum support by default</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3l2ceulOBO9S3Yytv7wIUr/c24b02ee132b7ced328993e2557cf765/image11.png" />
          </figure>
    <div>
      <h2>Measuring AI bot &amp; crawler activity</h2>
      <a href="#measuring-ai-bot-crawler-activity">
        
      </a>
    </div>
    <p>The rapid proliferation and growth of AI platforms since the launch of OpenAI’s ChatGPT in November 2022 have upended multiple industries. This is especially true for content creators. Over the last several decades, they generally allowed their sites to be crawled in exchange for the traffic that the search engines would send back to them — traffic that could be monetized in various ways. However, two developments have changed this dynamic. First, AI platforms began aggressively crawling these sites to vacuum up content to use for training their models (with no compensation to content creators). Second, search engines have evolved into answer engines, drastically reducing the amount of traffic they send back to sites. This has led content owners to demand <a href="https://blog.cloudflare.com/content-independence-day-no-ai-crawl-without-compensation/"><u>solutions</u></a>.</p><p>Among these solutions is providing customers with increased visibility into how frequently AI crawlers are <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">scraping their content</a>, and Radar has built on that to provide aggregated perspectives on this activity. Radar’s <a href="https://radar.cloudflare.com/ai-insights"><u>AI Insights page</u></a> provides graphs based on crawling traffic, including <a href="https://radar.cloudflare.com/ai-insights?industrySet=Finance#http-traffic-by-bot"><u>traffic trends by bot</u></a> and <a href="https://radar.cloudflare.com/ai-insights?industrySet=Finance#crawl-purpose"><u>traffic trends by crawl purpose</u></a>, both of which can be broken out by industry set as well. Customers can compare the traffic trends we show on the dashboard with trends across their industry.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4wNMjFo5eR2gBV78u2ITuD/833d83029224095d22fa0ad96aff9356/image1.png" />
          </figure><p>One key insight is the crawl-to-refer ratio: a measure of how many HTML pages a crawler consumes compared to the number of page visits it refers back to the crawled site. A view into these ratios by platform, and how they change over time, gives content creators insight into just how significant the reciprocal traffic imbalances are, and the impact of the ongoing transition of search engines into answer engines.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7fGlbFfPnuhaizCNZ5Wlr5/4e75c7fbb317428bffa5ea915d2ca428/image5.png" />
          </figure><p>Over the past three decades, the humble <a href="https://www.robotstxt.org/robotstxt.html"><u>robots.txt file</u></a> has served as something of a gatekeeper for websites, letting crawlers know <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">if they are allowed to access content</a> on the site, and if so, which content. Well-behaved crawlers read and parse the file, and adjust their crawling activity accordingly. Based on the robots.txt files found across Radar’s top 10,000 domains, Radar’s AI Insights page shows <a href="https://radar.cloudflare.com/ai-insights#ai-user-agents-found-in-robotstxt"><u>how many of these sites explicitly allow or disallow these AI crawlers to access content</u></a>, and how complete that access/restriction is. With the ability to filter the data by domain category, this graph can provide site owners with visibility into how their peers may be dealing with these AI crawlers.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5r1iT7cCKr1OeCx3XlsFVq/78d101224e54673cfd84e513c73f6527/image8.png" />
          </figure>
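          <p>The underlying robots.txt check is the same one any well-behaved crawler performs. A sketch using Python’s standard-library parser, with an illustrative robots.txt and user agents:</p>

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: this site disallows one AI crawler entirely
# while allowing everything else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/post"))    # False
print(rp.can_fetch("OtherBot", "https://example.com/post"))  # True
```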
    <div>
      <h2>Improving Internet resilience with routing visibility</h2>
      <a href="#improving-internet-resilience-with-routing-visibility">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/learning/network-layer/what-is-routing/"><u>Routing</u></a> is the process of selecting a path across one or more networks, and in the context of the Internet, routing selects the paths for <a href="https://www.cloudflare.com/learning/ddos/glossary/internet-protocol/"><u>Internet Protocol (IP)</u></a> packets to travel from their origin to their destination. It is absolutely critical to the functioning of the Internet, but lots of things can go wrong, and when they do, they can take a whole network offline. (And depending on the network, a <a href="https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/"><u>larger blast radius</u></a> of sites, applications, and other service providers may be impacted.)</p><p>Routing visibility provides insights into the health of a network, and its relationship to other networks. These insights can help identify or troubleshoot problems when they occur. Among the more significant things that can go wrong are route leaks and origin hijacks. <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/#about-bgp-and-route-leaks"><u>Route leaks</u></a> occur when a routing announcement propagates beyond its intended scope — that is, when the announcement reaches networks that it shouldn’t. 
An <a href="https://blog.cloudflare.com/bgp-hijack-detection/#what-is-bgp-origin-hijacking"><u>origin hijack</u></a> occurs when an attacker creates fake announcements for a targeted prefix, falsely identifying an <a href="https://developers.cloudflare.com/radar/glossary/#autonomous-systems"><u>autonomous system (AS)</u></a> under their control as the origin of the prefix — in other words, the attacker claims that their network is responsible for a given set of IP addresses, which would cause traffic to those addresses to be routed to them.</p><p>In 2022 and 2023 respectively, we added <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/"><u>route leak</u></a> and <a href="https://blog.cloudflare.com/bgp-hijack-detection/"><u>origin hijack</u></a> detection to <a href="https://radar.cloudflare.com/routing#routing-anomalies"><u>Radar</u></a>, providing network operators and other interested groups (such as researchers) with information to help identify which networks may be party to such events, whether as a leaker/hijacker, or a victim. And perhaps more importantly, in 2023 we also <a href="https://blog.cloudflare.com/traffic-anomalies-notifications-radar/#notifications-overview"><u>launched notifications</u></a> for route leaks and origin hijacks, automatically notifying subscribers via email or webhook when such an event is detected, enabling them to take immediate action.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/31Q0SVrOitlfKiw4jkT0Po/298ad31e36807c3ebc89aa1adfb149f8/image7.png" />
          </figure><p>In 2025, we further improved this visibility by adding two additional capabilities. The first was <a href="https://blog.cloudflare.com/bringing-connections-into-view-real-time-bgp-route-visibility-on-cloudflare/"><u>real-time BGP route visibility</u></a>, which illustrates how a given network prefix is connected to other networks — what is the route that packets take to get from that set of IP addresses to the large “tier 1” network providers? Network administrators can use this information when facing network outages, implementing new deployments, or investigating route leaks.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/69wWKFXVxN91YMHydKj8P9/858e3d8d0b90fbf737bb3b0b195b4885/image4.png" />
          </figure><p>An <a href="https://www.apnic.net/manage-ip/using-whois/guide/as-set/"><u>AS-SET</u></a> is a grouping of related networks, historically used for multiple purposes such as grouping together a list of downstream customers of a particular network provider. Our recently announced <a href="https://blog.cloudflare.com/monitoring-as-sets-and-why-they-matter/"><u>AS-SET monitoring</u></a> enables network operators to monitor valid and invalid AS-SET memberships for their networks, which can help prevent misuse and issues like route leaks.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5sydRT2tCT7VDJ84S87z7t/1386688e7fb96b477dcf56fbaae090ca/image10.png" />
          </figure>
    <div>
      <h2>Not just pretty pictures</h2>
      <a href="#not-just-pretty-pictures">
        
      </a>
    </div>
    <p>While Radar has historically focused on providing clear, informative visualizations, we have also launched capabilities that let users get at the underlying data directly and use it programmatically. The most important one is the <a href="https://developers.cloudflare.com/api/resources/radar/"><u>Radar API</u></a>, <a href="https://blog.cloudflare.com/radar2/#sharing-insights"><u>launched in 2022</u></a>. With just an access token, users can access all the data shown on Radar, along with more advanced filters that return more specific data, and incorporate it into their own tools, websites, and applications. The example below shows a simple API call that returns the global distribution of human and bot traffic observed over the last 24 hours.</p>
            <pre><code>curl -X 'GET' \
'https://api.cloudflare.com/client/v4/radar/http/summary/bot_class?name=main&amp;dateRange=1d' \
-H 'accept: application/json' \
-H "Authorization: Bearer $TOKEN"</code></pre>
            
            <pre><code>{
  "success": true,
  "errors": [],
  "result": {
    "main": {
      "human": "72.520636",
      "bot": "27.479364"
    },
    "meta": {
      "dateRange": [
        {
          "startTime": "2025-10-19T19:00:00Z",
          "endTime": "2025-10-20T19:00:00Z"
        }
      ],
      "confidenceInfo": {
        "level": null,
        "annotations": []
      },
      "normalization": "PERCENTAGE",
      "lastUpdated": "2025-10-20T19:45:00Z",
      "units": [
        {
          "name": "*",
          "value": "requests"
        }
      ]
    }
  }
}</code></pre>
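            <p>Consuming the response programmatically is straightforward. In this sketch the example response above is pasted in as a string, abbreviated to the fields we use; a real client would fetch it from the API with an access token:</p>

```python
import json

# The example Radar API response from above, abbreviated.
payload = """{
  "success": true,
  "result": {
    "main": {"human": "72.520636", "bot": "27.479364"},
    "meta": {"normalization": "PERCENTAGE"}
  }
}"""

response = json.loads(payload)
# The percentages arrive as strings; convert them to floats.
shares = {name: float(pct) for name, pct in response["result"]["main"].items()}
print(shares)
# {'human': 72.520636, 'bot': 27.479364}
print(round(sum(shares.values()), 6))  # 100.0
```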
            <p>The <a href="https://www.cloudflare.com/learning/ai/what-is-model-context-protocol-mcp/"><u>Model Context Protocol</u></a> is a standard way to make information available to <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/"><u>large language models (LLMs)</u></a>. Somewhat similar to the way an application programming interface (API) works, MCP offers a documented, standardized way for a computer program to integrate services from an external source. It essentially allows <a href="https://www.cloudflare.com/learning/ai/what-is-artificial-intelligence/"><u>AI</u></a> programs to exceed their training, enabling them to incorporate new sources of information into their decision-making and content generation, and helps them connect to external tools. The <a href="https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/radar#cloudflare-radar-mcp-server-"><u>Radar MCP server</u></a> allows MCP clients to gain access to Radar data and tools, enabling exploration using natural language queries.</p><p>Radar’s <a href="https://radar.cloudflare.com/scan"><u>URL Scanner</u></a> has proven to be one of its most popular tools, <a href="https://blog.cloudflare.com/building-urlscanner/"><u>scanning millions of sites</u></a> since launching in 2023. It allows users to safely determine whether a site may contain malicious content, as well as providing information on technologies used and insights into the site’s headers, cookies, and links. In addition to being available on Radar, it is also accessible through the API and MCP server.</p><p>Finally, Radar’s user interface has seen a number of improvements over the last several years, in service of improved usability and a better user experience. As new data sets and capabilities are launched, they are added to the search bar, allowing users to search not only for countries and ASNs, but also IP address prefixes, certificate authorities, bot names, IP addresses, and more. 
Initially launching with just a few default date ranges (such as last 24 hours, last 7 days, etc.), we’ve expanded the number of default options, as well as enabling the user to select custom date ranges of up to one year in length. And because the Internet is global, Radar should be too. In 2024, we <a href="https://blog.cloudflare.com/cloudflare-radar-localization-journey/"><u>launched internationalized versions of Radar</u></a>, marking availability of the site in 14 languages/dialects, including downloaded and embedded content.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5zVcB5Wy98ekCJY0wAsx8e/77857f7fe3519a508c3db50a19432e08/image9.png" />
          </figure><p>This is a sampling of the updates and enhancements that we have made to Radar over the last five years in support of Internet measurement, transparency, and resilience. These individual data sets and tools combine to provide one of the most comprehensive views of the Internet available. And we’re not close to being done. We’ll continue to bring additional visibility to the unseen ways that the Internet is changing by adding more tools, data sets, and visualizations, to help users answer more questions in areas including AI, performance, adoption and usage, and security.</p><p>Visit <a href="http://radar.cloudflare.com"><u>radar.cloudflare.com</u></a> to explore all the great data sets, capabilities, and tools for yourself, and to use the Radar <a href="https://developers.cloudflare.com/api/resources/radar/"><u>API</u></a> or <a href="https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/radar#cloudflare-radar-mcp-server-"><u>MCP server</u></a> to incorporate Radar data into your own tools, sites, and applications. Keep an eye on the <a href="https://developers.cloudflare.com/changelog/?product=radar"><u>Radar changelog feed</u></a>, <a href="https://developers.cloudflare.com/radar/release-notes/"><u>Radar release notes</u></a>, and the <a href="https://blog.cloudflare.com/tag/cloudflare-radar/"><u>Cloudflare blog</u></a> for news about the latest changes and launches, and don’t hesitate to <a><u>reach out to us</u></a> with feedback, suggestions, and feature requests.</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Routing]]></category>
            <category><![CDATA[API]]></category>
            <guid isPermaLink="false">4hyomcz7ZJG76L799PaqhJ</guid>
            <dc:creator>David Belson</dc:creator>
        </item>
        <item>
            <title><![CDATA[Explore your Cloudflare data with Python notebooks, powered by marimo]]></title>
            <link>https://blog.cloudflare.com/marimo-cloudflare-notebooks/</link>
            <pubDate>Wed, 16 Jul 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ We’ve partnered with marimo to bring their best-in-class Python notebook experience to your Cloudflare data. ]]></description>
            <content:encoded><![CDATA[ <p>Many developers, data scientists, and researchers do much of their work in Python notebooks: they’ve been the de facto standard for data science and sharing for well over a decade. Notebooks are popular because they make it easy to code, explore data, prototype ideas, and share results. We use them heavily at Cloudflare, and we’re seeing more and more developers use notebooks to work with data – from analyzing trends in HTTP traffic and querying <a href="https://developers.cloudflare.com/analytics/analytics-engine/"><u>Workers Analytics Engine</u></a>, through to querying their own <a href="https://blog.cloudflare.com/r2-data-catalog-public-beta/"><u>Iceberg tables stored in R2</u></a>.</p><p>Traditional notebooks are incredibly powerful — but they were not built with collaboration, reproducibility, or deployment as data apps in mind. As usage grows across teams and workflows, these limitations run up against the reality of work at scale.</p><p><a href="https://marimo.io/"><b><u>marimo</u></b></a> reimagines the notebook experience with these <a href="https://marimo.io/blog/lessons-learned"><u>challenges in mind</u></a>. It’s an <a href="https://github.com/marimo-team/marimo"><u>open-source</u></a> reactive Python notebook that’s built to be reproducible, easy to track in Git, executable as a standalone script, and deployable. We have partnered with the marimo team to bring this streamlined, production-friendly experience to Cloudflare developers. 
Spend less time wrestling with tools and more time exploring your data.</p><p>Today, we’re excited to announce three things:</p><ul><li><p><a href="https://notebooks.cloudflare.com/html-wasm/_start"><u>Cloudflare auth built into marimo notebooks</u></a> – Sign in with your Cloudflare account directly from a notebook and use Cloudflare APIs without needing to create API tokens</p></li><li><p><a href="https://github.com/cloudflare/notebook-examples"><u>Open-source notebook examples</u></a> – Explore your Cloudflare data with ready-to-run notebook examples for services like <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a>, <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a>, <a href="https://developers.cloudflare.com/d1/"><u>D1</u></a>, and more</p></li><li><p><a href="https://github.com/cloudflare/containers-demos"><u>Run marimo on Cloudflare Containers</u></a> – Easily deploy marimo notebooks to Cloudflare Containers for scalable, long-running data workflows</p></li></ul><p>Want to start exploring your Cloudflare data with marimo right now? Head over to <a href="http://notebooks.cloudflare.com"><u>notebooks.cloudflare.com</u></a>. Or, keep reading to learn more about marimo, how we’ve made authentication easy from within notebooks, and how you can use marimo to explore and share notebooks and apps on Cloudflare.</p>
    <div>
      <h3>Why marimo?</h3>
      <a href="#why-marimo">
        
      </a>
    </div>
    <p>marimo is an <a href="https://docs.marimo.io/"><u>open-source</u></a> reactive Python notebook designed specifically for working with data, built from the ground up to solve many problems with traditional notebooks.</p><p>The core feature that sets marimo apart from traditional notebooks is its <a href="https://marimo.io/blog/lessons-learned"><u>reactive execution model</u></a>, powered by a statically inferred dataflow graph on cells. Run a cell or interact with a <a href="https://docs.marimo.io/guides/interactivity/"><u>UI element</u></a>, and marimo either runs dependent cells or marks them as stale (your choice). This keeps code and outputs consistent, prevents bugs before they happen, and dramatically increases the speed at which you can experiment with data. </p><p>Thanks to reactive execution, notebooks are also deployable as data applications, making them easy to share. While you can run marimo notebooks locally, on cloud servers, GPUs — anywhere you can traditionally run software — you can also run them entirely in the browser <a href="https://docs.marimo.io/guides/wasm/"><u>with WebAssembly</u></a>, bringing the cost of sharing down to zero.</p><p>Because marimo notebooks are stored as Python, they <a href="https://marimo.io/blog/python-not-json"><u>enjoy all the benefits of software</u></a>: version with Git, execute as a script or pipeline, test with pytest, inline package requirements with uv, and import symbols from your notebook into other Python modules. Though stored as Python, marimo also <a href="https://docs.marimo.io/guides/working_with_data/sql/"><u>supports SQL</u></a> and data sources like DuckDB, Postgres, and Iceberg-based data catalogs (which marimo's <a href="https://docs.marimo.io/guides/generate_with_ai/"><u>AI assistant</u></a> can access, in addition to data in RAM).</p><p>To get an idea of what a marimo notebook is like, check out the embedded example notebook below:</p><div>
   <div>
       
   </div>
</div>
<p></p>
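<p>To make the reactive model concrete, here is a toy sketch of dataflow-driven re-execution in plain Python. It is only an illustration of the idea (the <code>Reactive</code> class is invented for this example; marimo's real engine statically analyzes cell code and is far more sophisticated):</p>

```python
# Toy illustration of reactive, dataflow-ordered execution:
# when a value changes, every cell that reads it is re-run.
# (A sketch of the idea only, not marimo's actual engine.)
class Reactive:
    def __init__(self):
        self.values = {}  # cell name -> current value
        self.cells = {}   # cell name -> (input names, function)

    def cell(self, name, inputs, fn):
        self.cells[name] = (inputs, fn)
        self._run(name)

    def set(self, name, value):
        self.values[name] = value
        # Re-run every cell that depends on `name`.
        for cell, (inputs, _) in self.cells.items():
            if name in inputs:
                self._run(cell)

    def _run(self, name):
        inputs, fn = self.cells[name]
        self.values[name] = fn(*(self.values[i] for i in inputs))
        # Cascade to cells that depend on this one, too.
        for cell, (ins, _) in self.cells.items():
            if name in ins:
                self._run(cell)

nb = Reactive()
nb.set("x", 2)
nb.cell("doubled", ["x"], lambda x: x * 2)
nb.cell("label", ["doubled"], lambda d: f"result: {d}")
nb.set("x", 10)            # updating x reruns both dependent cells
print(nb.values["label"])  # -> result: 20
```

Because outputs are recomputed from the dataflow graph rather than from the order you happened to run cells in, a notebook like this can never show stale results.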
    <div>
      <h3>Exploring your Cloudflare data with marimo</h3>
      <a href="#exploring-your-cloudflare-data-with-marimo">
        
      </a>
    </div>
    <p>Ready to explore your own Cloudflare data in a marimo notebook? The easiest way to begin is to visit <a href="http://notebooks.cloudflare.com"><u>notebooks.cloudflare.com</u></a> and run one of our example notebooks directly in your browser via <a href="https://webassembly.org/"><u>WebAssembly (Wasm)</u></a>. You can also browse the source in our <a href="https://github.com/cloudflare/notebook-examples"><u>notebook examples GitHub repo</u></a>.</p><p>Want to create your own notebook to run locally instead? Here’s a quick example that shows you how to authenticate with your Cloudflare account and list the zones you have access to:</p><ol><li><p>Install <a href="https://docs.astral.sh/uv/"><u>uv</u></a> if you haven’t already by following the <a href="https://docs.astral.sh/uv/getting-started/installation/"><u>installation guide</u></a>.</p></li><li><p>Create a new project directory for your notebook:</p></li></ol>
            <pre><code>mkdir cloudflare-zones-notebook
cd cloudflare-zones-notebook</code></pre>
            <p>3. Initialize a new uv project (this creates a <code>.venv</code> and a <code>pyproject.toml</code>):</p>
            <pre><code>uv init</code></pre>
            <p>4. Add marimo and the other packages the notebook imports:</p>
            <pre><code>uv add marimo moutils requests cloudflare</code></pre>
            <p>5. Create a file called <code>list-zones.py</code> and paste in the following notebook:</p>
            <pre><code>import marimo

__generated_with = "0.14.10"
app = marimo.App(width="full", auto_download=["ipynb", "html"])


@app.cell
def _():
    from moutils.oauth import PKCEFlow
    import requests

    # Start OAuth PKCE flow to authenticate with Cloudflare
    auth = PKCEFlow(provider="cloudflare")

    # Renders login UI in notebook
    auth
    return (auth,)


@app.cell
def _(auth):
    import marimo as mo
    from cloudflare import Cloudflare

    mo.stop(not auth.access_token, mo.md("Please **sign in** using the button above."))
    client = Cloudflare(api_token=auth.access_token)

    zones = client.zones.list()
    [zone.name for zone in zones.result]
    return


if __name__ == "__main__":
    app.run()</code></pre>
            <p>6. Open the notebook editor:</p>
            <pre><code>uv run marimo edit list-zones.py --sandbox</code></pre>
            <p>7. Log in via the OAuth prompt in the notebook. Once authenticated, you’ll see a list of your Cloudflare zones in the final cell.</p><p>That’s it! From here, you can expand the notebook to call <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a> models, query Iceberg tables in <a href="https://developers.cloudflare.com/r2/data-catalog/"><u>R2 Data Catalog</u></a>, or interact with any Cloudflare API.</p>
    <div>
      <h3>How OAuth works in notebooks</h3>
      <a href="#how-oauth-works-in-notebooks">
        
      </a>
    </div>
    <p>Think of OAuth like a secure handshake between your notebook and Cloudflare. Instead of copying and pasting API tokens, you just click “Sign in with Cloudflare” and the notebook handles the rest.</p><p>We built this experience using PKCE (Proof Key for Code Exchange), a secure OAuth 2.0 flow that avoids client secrets and protects against code interception attacks. PKCE works by generating a one-time code that’s exchanged for a token after login, without ever sharing a client secret. <a href="https://auth0.com/docs/get-started/authentication-and-authorization-flow/authorization-code-flow-with-pkce"><u>Learn more about how PKCE works</u></a>.</p><p>The login widget lives in <a href="https://github.com/marimo-team/moutils/blob/main/notebooks/pkceflow_login.py"><u>moutils.oauth</u></a>, a collaboration between Cloudflare and marimo to make OAuth authentication simple and secure in notebooks. To use it, just create a cell like this:</p>
            <pre><code>auth = PKCEFlow(provider="cloudflare")

# Renders login UI in notebook
auth</code></pre>
            <p>When you run the cell, you’ll see a Sign in with Cloudflare button:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2r3Dmuwcm4AZrhV39Gkhyl/c3f98a3780bc29f1c01ea945621fc005/image2.png" />
          </figure><p>Once logged in, you’ll have a read-only access token you can pass when using the Cloudflare API.</p>
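<p>For the curious, the “S256” derivation at the heart of PKCE is small enough to sketch in a few lines of Python. This is just an illustration of the RFC 7636 mechanics, not the <code>moutils</code> implementation:</p>

```python
import base64
import hashlib
import secrets

# PKCE (RFC 7636, "S256" method): the client keeps a random
# code_verifier secret and sends only its SHA-256 hash, base64url-
# encoded without padding, as the code_challenge. During token
# exchange, the server checks that the presented verifier hashes
# to the same challenge, so an intercepted authorization code is
# useless on its own.
def make_pkce_pair():
    code_verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    code_challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return code_verifier, code_challenge

verifier, challenge = make_pkce_pair()
print(len(verifier), challenge[:8])
```

No client secret ever exists, which is exactly why this flow is safe to run inside a notebook in the browser.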
    <div>
      <h3>Running marimo on Cloudflare: Workers and Containers</h3>
      <a href="#running-marimo-on-cloudflare-workers-and-containers">
        
      </a>
    </div>
    <p>In addition to running marimo notebooks locally, you can use Cloudflare to share and run them via <a href="https://developers.cloudflare.com/workers/static-assets/"><u>Workers Static Assets</u></a> or <a href="https://developers.cloudflare.com/containers/"><u>Cloudflare Containers</u></a>.</p><p>If you have a local notebook you want to share, you can publish it to Workers. This works because marimo can export notebooks to WebAssembly, allowing them to run entirely in the browser. You can get started with just two commands:</p>
            <pre><code>marimo export html-wasm notebook.py -o output_dir --mode edit --include-cloudflare
npx wrangler deploy
</code></pre>
            <p>If your notebook needs authentication, you can layer in <a href="https://developers.cloudflare.com/cloudflare-one/policies/access/"><u>Cloudflare Access</u></a> for secure, authenticated access.</p><p>For notebooks that require more compute, persistent sessions, or long-running tasks, you can deploy marimo on our <a href="https://blog.cloudflare.com/containers-are-available-in-public-beta-for-simple-global-and-programmable/"><u>new container platform</u></a>. To get started, check out our <a href="https://github.com/cloudflare/containers-demos/tree/main/marimo"><u>marimo container example</u></a> on GitHub.</p>
    <div>
      <h3>What’s next for Cloudflare + marimo</h3>
      <a href="#whats-next-for-cloudflare-marimo">
        
      </a>
    </div>
    <p>This blog post marks just the beginning of Cloudflare's partnership with marimo. While we're excited to see how you use our joint WebAssembly-based notebook platform to explore your Cloudflare data, we also want to help you bring serious compute to bear on your data — to empower you to run large scale analyses and batch jobs straight from marimo notebooks. Stay tuned!</p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[R2]]></category>
            <category><![CDATA[Data Catalog]]></category>
            <category><![CDATA[Notebooks]]></category>
            <guid isPermaLink="false">1oYZ3vFOAUy5PhZyKNm286</guid>
            <dc:creator>Carlos Rodrigues</dc:creator>
            <dc:creator>Jorge Pacheco</dc:creator>
            <dc:creator>Keith Adler</dc:creator>
            <dc:creator>Akshay Agrawal (Guest Author)</dc:creator>
            <dc:creator>Myles Scolnick (Guest Author)</dc:creator>
        </item>
        <item>
            <title><![CDATA[Your IPs, your rules: enabling more efficient address space usage]]></title>
            <link>https://blog.cloudflare.com/your-ips-your-rules-enabling-more-efficient-address-space-usage/</link>
            <pubDate>Mon, 19 May 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ IPv4 is expensive, and moving network resources around is hard. Previously, when customers wanted to use multiple Cloudflare services, they had to bring a new address range. ]]></description>
            <content:encoded><![CDATA[ <p>IPv4 addresses have become a costly commodity, driven by their growing scarcity. With the original pool of 4.3 billion addresses long exhausted, organizations must now rely on the secondary market to acquire them. Over the years, prices have surged, often exceeding $30–$50 USD per address, with <a href="https://auctions.ipv4.global/?cf_history_state=%7B%22guid%22%3A%22C255D9FF78CD46CDA4F76812EA68C350%22%2C%22historyId%22%3A6%2C%22targetId%22%3A%22B695D806845101070936062659E97ADD%22%7D"><u>costs</u></a> varying based on block size and demand. Given the scarcity, these prices are only going to rise, particularly for businesses that haven’t transitioned to <a href="https://blog.cloudflare.com/amazon-2bn-ipv4-tax-how-avoid-paying/"><u>IPv6</u></a>. This rising cost and limited availability have made efficient IP address management more critical than ever. In response, we’ve evolved how we handle BYOIP (<a href="https://blog.cloudflare.com/bringing-your-own-ips-to-cloudflare-byoip/"><u>Bring Your Own IP</u></a>) prefixes to give customers greater flexibility.</p><p>Historically, when customers onboarded a BYOIP prefix, they were required to assign it to a single service, binding all IP addresses within that prefix to one service before it was advertised. Once set, the prefix's destination was fixed — to direct traffic exclusively to that service. If a customer wanted to use a different service, they had to onboard a new prefix or go through the cumbersome process of offboarding and re-onboarding the existing one.</p><p>As a step towards addressing this limitation, we’ve introduced a new level of flexibility: customers can now use parts of any prefix — whether it’s bound to Cloudflare CDN, Spectrum, or Magic Transit — for additional use with CDN or Spectrum. This enhancement provides much-needed flexibility, enabling businesses to optimize their IP address usage while keeping costs under control. </p>
    <div>
      <h2>The challenges of moving onboarded BYOIP prefixes between services</h2>
      <a href="#the-challenges-of-moving-onboarded-byoip-prefixes-between-services">
        
      </a>
    </div>
    <p>Migrating BYOIP prefixes dynamically between Cloudflare services is no trivial task, especially with thousands of servers capable of accepting and processing connections. The problem required overcoming several technical challenges related to IP address management, kernel-level bindings, and orchestration. </p>
    <div>
      <h3>Dynamic reallocation of prefixes across services</h3>
      <a href="#dynamic-reallocation-of-prefixes-across-services">
        
      </a>
    </div>
    <p>When configuring an IP prefix for a service, we need to update IP address lists and firewall rules on each of our servers to allow only the traffic we expect for that service, such as opening ports 80 and 443 to allow HTTP and HTTPS traffic for the Cloudflare CDN. We use Linux <a href="https://en.wikipedia.org/wiki/Iptables#:~:text=iptables%20is%20a%20user%2Dspace,to%20treat%20network%20traffic%20packets."><u>iptables</u></a> and <a href="https://en.wikipedia.org/wiki/Iptables"><u>IP sets</u></a> for this.</p><p>Migrating IP prefixes to a different service involves dynamically reassigning them to different IP sets and iptables rules. This requires automated updates across a large-scale distributed environment.</p><p>As prefixes shift between services, it is critical that servers update their IP sets and iptables rules dynamically to ensure traffic is correctly routed. Failure to do so could lead to routing loops or dropped connections.  </p>
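<p>As a rough illustration of the kind of per-service rules involved, the sketch below renders ipset and iptables commands from a service-to-ports mapping. The set names, chain, and mapping are invented for this example and are not Cloudflare's actual configuration:</p>

```python
# Sketch: render the ipset/iptables commands that admit only the
# ports a service expects for its prefixes. Set names, the chain,
# and the service->ports mapping are invented for illustration.
SERVICE_PORTS = {"cdn": [80, 443], "spectrum": [0]}  # 0 = any port

def rules_for(service, prefixes):
    setname = f"{service}-prefixes"
    cmds = [f"ipset create {setname} hash:net"]
    cmds += [f"ipset add {setname} {p}" for p in prefixes]
    for port in SERVICE_PORTS[service]:
        match = f"-m set --match-set {setname} dst"
        if port:  # port 0 means "any port": no --dport match needed
            match += f" -p tcp --dport {port}"
        cmds.append(f"iptables -A INPUT {match} -j ACCEPT")
    return cmds

for cmd in rules_for("cdn", ["192.0.2.0/24"]):
    print(cmd)
```

Moving a prefix between services then reduces to moving its entry from one IP set to another, which is far cheaper than rewriting per-address firewall rules across thousands of servers.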
    <div>
      <h3>Updating Tubular – an eBPF-based IP and port binding service</h3>
      <a href="#updating-tubular-an-ebpf-based-ip-and-port-binding-service">
        
      </a>
    </div>
    <p>Most web applications bind to a list of IP addresses at startup, and listen on only those IPs until shutdown. To allow customers to change the IPs bound to each service dynamically, we needed a way to add and remove IPs from a running service, without restarting it. <a href="https://blog.cloudflare.com/tubular-fixing-the-socket-api-with-ebpf/"><u>Tubular</u></a> is a <a href="https://blog.cloudflare.com/cloudflare-architecture-and-how-bpf-eats-the-world/"><u>BPF</u></a> program we wrote that runs on Cloudflare servers and allows services to listen on a single socket, dynamically updating the list of addresses that are routed to that socket over the lifetime of the service, without requiring it to restart when those addresses change.</p><p>A significant engineering challenge was extending Tubular to support traffic destined for Cloudflare’s CDN.  Without this enhancement, customers would be unable to leverage dynamic reassignment to bind prefixes onboarded through Spectrum to the Cloudflare CDN, limiting flexibility across services.</p><p>Cloudflare’s CDN depends on each server running an NGINX ingress proxy to terminate incoming connections. Due to the <a href="https://blog.cloudflare.com/how-we-built-pingora-the-proxy-that-connects-cloudflare-to-the-internet/"><u>scale and performance limitations of NGINX</u></a>, we are actively working to replace it by 2026. In the interim, however, we still depend on the current ingress proxy to reliably handle incoming connections.</p><p>One limitation is that this ingress proxy does not support <a href="https://systemd.io/"><u>systemd</u></a> socket activation, a mechanism Tubular relies on to integrate with other Cloudflare services on each server. 
For services that do support systemd socket activation, systemd independently starts the sockets for the owning service and passes them to Tubular, allowing Tubular to easily detect and route traffic to the correct terminating service.</p><p>Since this integration model is not feasible, an alternative solution was required. This was addressed by introducing a shared Unix domain socket between Tubular and the ingress proxy service on each server. Through this channel,  the ingress proxy service explicitly transmits socket information to Tubular, enabling it to correctly register the sockets in its datapath.</p><p>The final challenge was deploying the Tubular-ingress proxy integration across the fleet of servers without disrupting active connections. As of April 2025, Cloudflare handles an average of 71 million HTTP requests per second, peaking at 100 million. To safely deploy at this scale, the necessary Tubular and ingress proxy configuration changes were staged across all Cloudflare servers without disrupting existing connections. The final step involved adding bindings — IP addresses and ports corresponding to Cloudflare CDN prefixes — to the Tubular configuration. These bindings direct connections through Tubular via the Unix sockets registered during the previous integration step. To minimize risk, bindings were gradually enabled in a controlled rollout across the global fleet.</p>
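<p>The underlying primitive for handing a socket between processes, sending an open file descriptor as SCM_RIGHTS ancillary data over a Unix domain socket, can be sketched in Python. This is a minimal illustration of the mechanism only, not the actual Tubular and ingress proxy protocol:</p>

```python
import socket

# Minimal illustration of SCM_RIGHTS fd passing over a Unix domain
# socket: one side sends an open listening socket's file descriptor,
# the other receives it and can serve connections on the same socket.
# (Requires a Unix platform and Python 3.9+ for send_fds/recv_fds.)
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# The "ingress proxy" side: create a listener and send its fd.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
socket.send_fds(parent, [b"tcp-listener"], [listener.fileno()])

# The "Tubular" side: receive the fd and rebuild a usable socket object.
msg, fds, _flags, _addr = socket.recv_fds(child, 1024, maxfds=1)
received = socket.socket(fileno=fds[0])

# Both objects now refer to the same underlying kernel socket.
print(msg, received.getsockname() == listener.getsockname())
```

Once the descriptor crosses the boundary, both processes share one kernel socket, which is what lets traffic be steered to the proxy without it restarting.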
    <div>
      <h4>Tubular data plane in action</h4>
      <a href="#tubular-data-plane-in-action">
        
      </a>
    </div>
    <p>This high-level representation of the Tubular data plane shows a binding that ties together the Layer 4 protocol (TCP), the prefix (192.0.2.0/24, which spans 254 usable IP addresses), and port number 0 (any port). When incoming packets match this combination, they are directed to the correct socket for the service — in this case, Spectrum.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5yQpYeTxPM7B8DZwLsQATs/3f488c5b37ef2358eacf779a42ac59d5/image4.png" />
          </figure><p>In the following example, TCP 192.0.2.200/32 has been upgraded to the Cloudflare CDN via the edge <a href="https://developers.cloudflare.com/api/resources/addressing/subresources/prefixes/subresources/service_bindings/"><u>Service Bindings API</u></a>. Tubular dynamically consumes this information, adding a new entry to its data plane bindings and socket table. Using Longest Prefix Match, all packets within the 192.0.2.0/24 range, on any port, will be routed to Spectrum, except those destined for 192.0.2.200 on port 443, which will be directed to the Cloudflare CDN.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6wWlR9gWb6JEoyZm4iOpgQ/4a59bcab4a6731a53ea235500596c7f5/image1.png" />
          </figure>
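<p>The lookup described above can be sketched with Python's <code>ipaddress</code> module. This illustrates the routing decision only; Tubular's real data plane implements it in BPF:</p>

```python
import ipaddress

# Bindings as in the example above: (prefix, port, service).
# Port 0 means "any port". The most specific prefix wins; at equal
# prefix length, an exact port match beats the wildcard.
BINDINGS = [
    (ipaddress.ip_network("192.0.2.0/24"), 0, "spectrum"),
    (ipaddress.ip_network("192.0.2.200/32"), 443, "cdn"),
]

def lookup(addr, port):
    addr = ipaddress.ip_address(addr)
    candidates = [
        (net.prefixlen, p != 0, svc)
        for net, p, svc in BINDINGS
        if addr in net and p in (0, port)
    ]
    return max(candidates)[2]  # longest prefix first, then exact port

print(lookup("192.0.2.200", 443))  # -> cdn
print(lookup("192.0.2.200", 80))   # -> spectrum
print(lookup("192.0.2.7", 443))    # -> spectrum
```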
    <div>
      <h4>Coordination and orchestration at scale </h4>
      <a href="#coordination-and-orchestration-at-scale">
        
      </a>
    </div>
    <p>Our goal is to achieve a quick transition of IP address prefixes between services when initiated by customers, which requires a high level of coordination. We need to ensure that changes propagate correctly across all servers to maintain stability. Currently, when a customer migrates a prefix between services, there is a 4-6 hour window of uncertainty where incoming packets may be dropped due to a lack of guaranteed routing. To address this, we are actively implementing systems that will reduce this transition time from hours to just a matter of minutes, significantly improving reliability and minimizing disruptions.</p>
    <div>
      <h2>Smarter IP address management</h2>
      <a href="#smarter-ip-address-management">
        
      </a>
    </div>
    <p>Service Bindings are mappings that control whether traffic destined for a given IP address is routed to Magic Transit, the CDN pipeline, or the Spectrum pipeline.</p><p>Consider the example in the diagram below. One of our customers, a global finance infrastructure platform, is using BYOIP and has a /24 range bound to <a href="https://developers.cloudflare.com/spectrum/"><u>Spectrum</u></a> for DDoS protection of their TCP and UDP traffic. However, they are only using a few addresses in that range for their Spectrum applications, while the rest go unused. In addition, the customer is using Cloudflare’s CDN for their Layer 7 traffic and wants to set up <a href="https://developers.cloudflare.com/byoip/concepts/static-ips/"><u>Static IPs</u></a>, so that their customers can allowlist a consistent set of IP addresses owned and controlled by their own network infrastructure team. Instead of using up another block of address space, they asked us whether they could carve out those unused sub-ranges of the /24 prefix.</p><p>From there, we set out to determine how to selectively map sub-ranges of the onboarded prefix to different services using service bindings:</p><ul><li><p>192.0.2.0/24 is already bound to <b>Spectrum</b></p><ul><li><p>192.0.2.0/25 is updated and bound to <b>CDN</b></p></li><li><p>192.0.2.200/32 is also updated and bound to <b>CDN</b></p></li></ul></li></ul><p>Both the /25 and /32 are sub-ranges within the /24 prefix and will receive traffic directed to the CDN. All remaining IP addresses within the /24 prefix, unless explicitly bound, will continue to use the default Spectrum service binding.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/uwhMHBEuI1NHfp9qD9IFM/d2dcea59a8d9f962f03389831fd73851/image3.png" />
          </figure><p>As you can see in this example, this approach provides customers with greater control and agility over how their IP address space is allocated. Instead of rigidly assigning an entire prefix to a single service, users can now tailor their IP address usage to match specific workloads or deployment needs. Setting this up is straightforward — all it takes is a few HTTP requests to the <a href="https://developers.cloudflare.com/api/resources/addressing/subresources/prefixes/subresources/service_bindings/"><u>Cloudflare API</u></a>. You can define service bindings by specifying which IP addresses or subnets should be routed to CDN, Spectrum, or Magic Transit. This allows you to tailor traffic routing to match your architecture without needing to restructure your entire IP address allocation. The process remains consistent whether you're configuring a single IP address or splitting up larger subnets, making it easy to apply across different parts of your network. The foundational technical work addressing the underlying architectural challenges outlined above made it possible to streamline what could have been a complex setup into a straightforward series of API interactions.</p>
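<p>As a sketch of what those HTTP requests look like, the snippet below builds the request payloads for the two CDN sub-ranges. The account ID, prefix ID, service ID, and field names here are placeholders; consult the linked Service Bindings API documentation for the exact schema:</p>

```python
# Sketch of the service-binding calls for the example above. The URL
# shape follows the Service Bindings API docs, but the account ID,
# prefix ID, service ID, and exact field names are placeholders;
# check the API reference for the current schema.
ACCOUNT_ID = "0123456789abcdef"  # placeholder
PREFIX_ID = "prefix-id"          # placeholder

def binding_request(cidr, service_id):
    return {
        "method": "POST",
        "url": (
            "https://api.cloudflare.com/client/v4/accounts/"
            f"{ACCOUNT_ID}/addressing/prefixes/{PREFIX_ID}/bindings"
        ),
        "json": {"cidr": cidr, "service_id": service_id},
    }

# Bind the two CDN sub-ranges; the rest of the /24 keeps its
# default Spectrum binding.
requests_to_send = [
    binding_request("192.0.2.0/25", "cdn-service-id"),
    binding_request("192.0.2.200/32", "cdn-service-id"),
]
for r in requests_to_send:
    print(r["method"], r["json"]["cidr"])
```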
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>We envision a future where customers have granular control over how their traffic moves through Cloudflare’s global network, not just by service, but down to the port level. A single prefix could simultaneously power web applications on CDN, protect infrastructure through Magic Transit, and much more. This isn't just flexible routing, but programmable traffic orchestration across different services. What was once rigid and static becomes dynamic and fully programmable to meet each customer’s unique needs. </p><p>If you are an existing BYOIP customer using Magic Transit, CDN, or Spectrum, check out our <a href="https://developers.cloudflare.com/byoip/service-bindings/magic-transit-with-cdn/"><u>configuration guide here</u></a>. If you are interested in bringing your own IP address space and using multiple Cloudflare services on it, please reach out to your account team to enable setting up this configuration via <a href="https://developers.cloudflare.com/api/resources/addressing/subresources/prefixes/subresources/service_bindings/"><u>API</u></a> or reach out to sales@cloudflare.com if you’re new to Cloudflare.</p> ]]></content:encoded>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Addressing]]></category>
            <category><![CDATA[IPv4]]></category>
            <category><![CDATA[BYOIP]]></category>
            <category><![CDATA[Spectrum]]></category>
            <category><![CDATA[CDN]]></category>
            <category><![CDATA[Magic Transit]]></category>
            <guid isPermaLink="false">7FAYMppkyZG4CEGdLEcLlR</guid>
            <dc:creator>Mark Rodgers</dc:creator>
            <dc:creator>Sphoorti Metri</dc:creator>
            <dc:creator>Ash Pallarito</dc:creator>
        </item>
        <item>
            <title><![CDATA[HTTPS-only for Cloudflare APIs: shutting the door on cleartext traffic]]></title>
            <link>https://blog.cloudflare.com/https-only-for-cloudflare-apis-shutting-the-door-on-cleartext-traffic/</link>
            <pubDate>Thu, 20 Mar 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ We are closing the cleartext HTTP ports entirely for Cloudflare API traffic. This prevents the risk of clients unintentionally leaking their secret API keys in cleartext during the initial request.  ]]></description>
            <content:encoded><![CDATA[ <p>Connections made over cleartext HTTP ports risk exposing sensitive information because the data is transmitted unencrypted and can be intercepted by network intermediaries, such as ISPs, Wi-Fi hotspot providers, or malicious actors on the same network. It’s common for servers to either <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Redirections"><u>redirect</u></a> or return a <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403"><u>403 (Forbidden)</u></a> response to close the HTTP connection and enforce the use of HTTPS by clients. However, by the time this occurs, it may be too late, because sensitive information, such as an API token, may have already been <a href="https://jviide.iki.fi/http-redirects"><u>transmitted in cleartext</u></a> in the initial client request. This data is exposed before the server has a chance to redirect the client or reject the connection.</p><p>A better approach is to refuse the underlying cleartext connection by closing the <a href="https://developers.cloudflare.com/fundamentals/reference/network-ports/"><u>network ports</u></a> used for plaintext HTTP, and that’s exactly what we’re going to do for our customers.</p><p><b>Today we’re announcing that we’re closing all of the </b><a href="https://developers.cloudflare.com/fundamentals/reference/network-ports/#network-ports-compatible-with-cloudflares-proxy:~:text=HTTP%20ports%20supported%20by%20Cloudflare"><b><u>HTTP ports</u></b></a><b> on api.cloudflare.com.</b> We’re also making changes so that api.cloudflare.com can change IP addresses dynamically, in line with ongoing efforts to <a href="https://blog.cloudflare.com/addressing-agility/"><u>decouple names from IP addresses</u></a> and to reliably <a href="https://blog.cloudflare.com/topaz-policy-engine-design/"><u>manage</u></a> addresses in our authoritative DNS. This will enhance the agility and flexibility of our API endpoint management. 
Customers relying on static IP addresses for our API endpoints will be notified in advance to prevent any potential availability issues.</p><p>In addition to taking this first step to secure Cloudflare API traffic, we’ll release the ability for customers to opt-in to safely disabling all HTTP port traffic for their websites on Cloudflare. We expect to make this free security feature available in the last quarter of 2025.</p><p>We have <a href="https://blog.cloudflare.com/introducing-universal-ssl/"><u>consistently</u></a> <a href="https://blog.cloudflare.com/enforce-web-policy-with-hypertext-strict-transport-security-hsts/"><u>advocated</u></a> for <a href="https://blog.cloudflare.com/post-quantum-for-all/"><u>strong encryption standards</u></a> to safeguard users’ data and privacy online. As part of our ongoing commitment to enhancing Internet security, this blog post details our efforts to <i>enforce</i> HTTPS-only connections across our global network. </p>
    <div>
      <h3>Understanding the problem</h3>
      <a href="#understanding-the-problem">
        
      </a>
    </div>
    <p>We already provide an “<a href="https://developers.cloudflare.com/ssl/edge-certificates/additional-options/always-use-https/"><u>Always Use HTTPS</u></a>” setting that can be used to redirect all visitor traffic on our customers’ domains (and subdomains) from HTTP (plaintext) to HTTPS (encrypted). For instance, when a user clicks on an HTTP version of the URL on the site (http://www.example.com), we issue an HTTP 3XX redirection status code to immediately redirect the request to the corresponding HTTPS version (https://www.example.com) of the page. While this works well for most scenarios, there’s a subtle but important risk factor: What happens if the initial plaintext HTTP request (before the redirection) contains sensitive user information?</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2mXYZL0JRZOb8J6Tqm4mCj/1b24b76335ad9cf3f3b630ef31868f6c/1.png" />
          </figure><p><sup><i>Initial plaintext HTTP request is exposed to the network before the server can redirect to the secure HTTPS connection.</i></sup></p><p>Third parties or intermediaries on shared networks could intercept sensitive data from the first plaintext HTTP request, or even carry out a <a href="https://blog.cloudflare.com/monsters-in-the-middleboxes/"><u>Monster-in-the-Middle (MITM)</u></a> attack by impersonating the web server.</p><p>One may ask if <a href="https://developers.cloudflare.com/ssl/edge-certificates/additional-options/http-strict-transport-security/"><u>HTTP Strict Transport Security (HSTS)</u></a> would partially alleviate this concern by ensuring that, after the first request, visitors can only access the website over HTTPS without needing a redirect. While this does reduce the window of opportunity for an adversary, the first request still remains exposed. Additionally, HSTS is not applicable by default for most non-user-facing use cases, such as API traffic from stateless clients. Many API clients don’t retain browser-like state or remember HSTS headers they've encountered. It is quite <a href="https://jviide.iki.fi/http-redirects"><u>common practice</u></a> for API calls to be redirected from HTTP to HTTPS, and hence have their initial request exposed to the network.</p><p>Therefore, in line with our <a href="https://blog.cloudflare.com/dogfooding-from-home/"><u>culture of dogfooding</u></a>, we evaluated the accessibility of the Cloudflare API (<a href="https://api.cloudflare.com"><u>api.cloudflare.com</u></a>) over <a href="https://developers.cloudflare.com/fundamentals/reference/network-ports/#:~:text=ports%20listed%20below.-,HTTP,-ports%20supported%20by"><u>HTTP ports (80, and others)</u></a>. In that regard, imagine a client making an initial request to our API endpoint that includes their <i>secret API key</i>. 
While we outright reject all plaintext connections for API traffic with a 403 Forbidden response instead of redirecting — clearly indicating that “<i>Cloudflare API is only accessible over TLS</i>” — this rejection still happens at the <a href="https://www.cloudflare.com/learning/ddos/what-is-layer-7/">application layer</a>. By that point, the API key may already have been exposed over the network before we can even reject the request. We do have a notification mechanism in place to alert customers and rotate their API keys accordingly, but a stronger approach would be to eliminate the exposure entirely. We have an opportunity to improve!</p>
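<p>To make the exposure concrete, the sketch below (with a hypothetical hostname and key) assembles the raw bytes a client emits for its first plaintext request: the secret is on the wire before any server-side redirect or rejection can occur.</p>

```python
# Sketch: what actually leaves the client on a plaintext HTTP request.
# The hostname and API key are illustrative placeholders.
SECRET_KEY = "sk_live_example_12345"  # hypothetical secret

def build_plaintext_request(host: str, path: str, api_key: str) -> bytes:
    """Assemble the raw bytes of an HTTP/1.1 request as sent over port 80."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"Authorization: Bearer {api_key}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode()

wire_bytes = build_plaintext_request("api.example.com", "/v1/user", SECRET_KEY)

# Any on-path observer sees the key verbatim; no decryption is needed.
assert SECRET_KEY.encode() in wire_bytes
```

<p>No amount of server-side handling can undo this: by the time the request reaches the origin, the key has already crossed the network in the clear.</p>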
    <div>
      <h3>A better approach to API security</h3>
      <a href="#a-better-approach-to-api-security">
        
      </a>
    </div>
    <p>Any API key or token exposed in plaintext on the public Internet should be considered compromised. We can either address exposure after it occurs or prevent it entirely. The reactive approach involves continuously tracking and revoking compromised credentials, requiring active management to rotate each one. For example, when a plaintext HTTP request is made to our API endpoints, we detect exposed tokens by scanning for 'Authorization' header values.</p><p>In contrast, a preventive approach is stronger and more effective, stopping exposure before it happens. Instead of relying on the API service application to react after receiving potentially sensitive cleartext data, we can preemptively refuse the underlying connection at the <a href="https://www.cloudflare.com/learning/ddos/glossary/open-systems-interconnection-model-osi/"><u>transport layer</u></a>, before any HTTP or application-layer data is exchanged. The <i>preventive</i> approach can be achieved by closing all plaintext HTTP ports for API traffic on our global network. The added benefit is that this is operationally much simpler: by eliminating cleartext traffic, there's no need for key rotation.</p>
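<p>The reactive path described above can be sketched in a few lines: inspect the headers of a plaintext request for credential-bearing values so the matching keys can be queued for rotation. The header names and token format here are illustrative, not Cloudflare's actual detection logic.</p>

```python
# Sketch of the reactive approach: scan plaintext requests for exposed
# credentials so they can be flagged for rotation. Header names and the
# token format are illustrative placeholders.
SENSITIVE_HEADERS = {"authorization", "x-auth-key", "x-auth-email"}

def find_exposed_credentials(headers: dict) -> list:
    """Return the values of any credential-bearing headers seen in a
    plaintext request, so they can be queued for rotation."""
    return [
        value
        for name, value in headers.items()
        if name.lower() in SENSITIVE_HEADERS
    ]

request_headers = {
    "Host": "api.example.com",
    "Authorization": "Bearer sk_live_example_12345",  # hypothetical token
    "User-Agent": "legacy-client/1.0",
}

exposed = find_exposed_credentials(request_headers)
assert exposed == ["Bearer sk_live_example_12345"]
```

<p>The weakness is inherent: this code only runs after the secret has already crossed the network, which is exactly why the preventive approach is stronger.</p>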
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1PYo3GMZjQ4LbfUHXNOVpj/2341da1d926e077624563358bd5034ef/2.png" />
          </figure><p><sup><i>The transport layer carries the application layer data on top.</i></sup></p><p>To explain why this works: an application-layer request requires an underlying transport connection, like TCP or QUIC, to be established first. The combination of a port number and an IP address serves as a transport layer identifier for creating the underlying transport channel. Ports direct network traffic to the correct application-layer process — for example, port 80 is designated for plaintext HTTP, while port 443 is used for encrypted HTTPS. By disabling the HTTP cleartext server-side port, we prevent that transport channel from being established during the initial "<a href="https://www.cloudflare.com/learning/ddos/glossary/tcp-ip/"><u>handshake</u></a>" phase of the connection — before any application data, such as a secret API key, leaves the client’s machine.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/13fKw8cHkHlsLOzlXJYenr/c9156f67ae99cfdc74dc5917ebc1e5bb/3.png" />
          </figure><p><sup><i>Both TCP and QUIC transport layer handshakes are a pre-requisite for HTTPS application data exchange on the web.</i></sup></p><p>Therefore, closing the HTTP interface entirely for API traffic gives a strong and visible <b>fast-failure</b> signal to developers who might mistakenly be accessing <code>http://… </code>instead of <code>https://…</code> with their secret API keys in the first request — a simple one-letter omission, but one with serious implications.</p><p>In theory, this is a simple change, but at Cloudflare’s global scale, implementing it required careful planning and execution. We’d like to share the steps we took to make this transition.</p>
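<p>From a client's point of view, that fast failure is a connection error raised during the transport handshake, before the request, and the API key inside it, is ever sent. A minimal local sketch, using an ephemeral port that nothing is listening on to stand in for a closed HTTP port:</p>

```python
# Sketch: a client connecting to a port with no listener (or one that is
# rejected with a TCP RST) fails during the handshake, before any
# application data, such as an Authorization header, leaves the machine.
import socket

def try_connect(host: str, port: int) -> str:
    try:
        with socket.create_connection((host, port), timeout=2):
            return "connected"
    except ConnectionRefusedError:
        return "refused"

# Find a local port that is almost certainly closed: bind to an ephemeral
# port to learn its number, then close it again before connecting.
probe = socket.socket()
probe.bind(("127.0.0.1", 0))
closed_port = probe.getsockname()[1]
probe.close()

# The failure is immediate and happens below the application layer.
assert try_connect("127.0.0.1", closed_port) == "refused"
```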
    <div>
      <h3>Understanding the scope</h3>
      <a href="#understanding-the-scope">
        
      </a>
    </div>
    <p>In an ideal scenario, we could simply close all cleartext HTTP ports on our network. However, two key challenges prevent this. First, as shown in the <a href="https://radar.cloudflare.com/adoption-and-usage#http-vs-https"><u>Cloudflare Radar</u></a> figure below, about 2-3% of requests from “likely human” clients to our global network are over plaintext HTTP. While modern browsers prominently warn users about insecure HTTP connections and <a href="https://support.mozilla.org/en-US/kb/https-only-prefs"><u>offer features to silently upgrade to HTTPS</u></a>, this protection doesn't extend to the broader ecosystem of connected devices. IoT devices with limited processing power, automated API clients, or legacy software stacks often lack such safeguards entirely. In fact, when filtering on plaintext HTTP traffic that is “likely automated”, the share <a href="https://radar.cloudflare.com/explorer?dataSet=http&amp;groupBy=http_protocol&amp;filters=botClass%253DLikely_Automated"><u>rises to over 16%</u></a>! We continue to see a wide variety of legacy clients accessing resources over plaintext connections. This trend is not confined to specific networks, but is observable globally.</p><p>Closing HTTP ports, like port 80, across our entire IP address space would block such clients entirely, causing a major disruption in services. While we plan to cautiously start by implementing the change on Cloudflare's API IP addresses, it’s not enough. Therefore, our goal is to ensure all of our customers’ API traffic benefits from this change as well.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1OfUjwkP9iMdjymjtJX7tL/4cd278faf71f610c43239cc41d8f6fba/4.png" />
          </figure><p><sup><i>Breakdown of HTTP and HTTPS for ‘human’ connections</i></sup></p><p>The second challenge relates to limitations posed by the longstanding <a href="https://en.wikipedia.org/wiki/Berkeley_sockets"><u>BSD Sockets API</u></a> at the server-side, which we have addressed using <a href="https://blog.cloudflare.com/tubular-fixing-the-socket-api-with-ebpf/"><u>Tubular</u></a>, a tool that inspects every connection terminated by a server and decides which application should receive it. Operators historically have faced a challenging dilemma: either listen to the same ports across many IP addresses using a single socket (scalable but inflexible), or maintain individual sockets for each IP address (flexible but unscalable). Luckily, Tubular has allowed us to <a href="https://blog.cloudflare.com/its-crowded-in-here/"><u>resolve this using 'bindings'</u></a>, which decouples sockets from specific IP:port pairs. This creates efficient pathways for managing endpoints throughout our systems at scale, enabling us to handle both HTTP and HTTPS traffic intelligently without the traditional limitations of socket architecture.</p><p>Step 0, then, is about provisioning both IPv4 and IPv6 address space on our network that by default has all HTTP ports closed. Tubular enables us to configure and manage these IP addresses differently than others for our endpoints. Additionally, <a href="https://blog.cloudflare.com/addressing-agility/"><u>Addressing Agility</u></a> and <a href="https://blog.cloudflare.com/topaz-policy-engine-design/"><u>Topaz</u></a> enable us to assign these addresses dynamically, and safely, for opted-in domains.</p>
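<p>As a toy illustration of the bindings idea (not Tubular's actual implementation), incoming connections can be dispatched to applications by matching the destination (address, port) against prefix-based rules, with the most specific prefix winning; note that the example HTTPS-only range simply has no port-80 binding at all. All prefixes and labels below are made up.</p>

```python
# Toy model of socket "bindings": route an incoming (ip, port) pair to an
# application by prefix match, instead of one listening socket per IP:port.
# Prefixes and application labels are illustrative, not Tubular's real data.
import ipaddress
from typing import Optional

BINDINGS = [
    # (prefix, port, application)
    (ipaddress.ip_network("192.0.2.0/28"), 443, "tls-proxy"),    # HTTPS-only range
    (ipaddress.ip_network("198.51.100.0/24"), 443, "tls-proxy"),
    (ipaddress.ip_network("198.51.100.0/24"), 80, "http-proxy"),
]

def dispatch(ip: str, port: int) -> Optional[str]:
    """Pick the most specific matching binding, like a routing table."""
    addr = ipaddress.ip_address(ip)
    matches = [
        (net.prefixlen, app)
        for net, p, app in BINDINGS
        if p == port and addr in net
    ]
    return max(matches)[1] if matches else None  # longest prefix wins

assert dispatch("192.0.2.5", 443) == "tls-proxy"
assert dispatch("192.0.2.5", 80) is None      # no plaintext binding here
assert dispatch("198.51.100.9", 80) == "http-proxy"
```

<p>Because the HTTPS-only prefix carries no port-80 binding, plaintext connections to it have nowhere to go; in practice the firewall rejects them outright at the transport layer.</p>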
    <div>
      <h3>Moving from strategy to execution</h3>
      <a href="#moving-from-strategy-to-execution">
        
      </a>
    </div>
    <p>In the past, our legacy stack would have made this transition challenging, but today’s Cloudflare possesses the appropriate tools to deliver a scalable solution, rather than addressing it on a domain-by-domain basis.</p><p>Using Tubular, we were able to bind our new set of anycast IP prefixes to our TLS-terminating proxies across the globe. To ensure that no plaintext HTTP traffic is served on these IP addresses, we extended our global <a href="https://en.wikipedia.org/wiki/Iptables"><u>iptables</u></a> firewall configuration to reject any inbound packets on HTTP ports.</p>
            <pre><code>iptables -A INPUT -p tcp -d &lt;IP_ADDRESS_BLOCK&gt; --dport &lt;HTTP_PORT&gt; \
  -j REJECT --reject-with tcp-reset

iptables -A INPUT -p udp -d &lt;IP_ADDRESS_BLOCK&gt; --dport &lt;HTTP_PORT&gt; \
  -j REJECT --reject-with icmp-port-unreachable</code></pre>
            <p>As a result, any connections to these IP addresses on HTTP ports are filtered and rejected at the transport layer, eliminating the need for state management at the application layer by our web proxies.</p><p>The next logical step is to update the <a href="https://www.cloudflare.com/learning/dns/what-is-dns/"><u>DNS assignments</u></a> so that API traffic is routed over the <i>correct</i> IP addresses. In our case, we encoded a new DNS policy for API traffic for the HTTPS-only interface as a declarative <a href="https://blog.cloudflare.com/topaz-policy-engine-design/"><u>Topaz program</u></a> in our authoritative DNS server:</p>
            <pre><code>- name: https_only
  exclusive: true
  config: |
    (config
      ([traffic_class "API"]
       [ipv4 (ipv4_address "192.0.2.1")] # Example IPv4 address
       [ipv6 (ipv6_address "2001:DB8::1:1")] # Example IPv6 address
       [t (ttl 300)]))
  match: |
    (= query_domain_class traffic_class)
  response: |
    (response (list ipv4) (list ipv6) t)</code></pre>
            <p>The above policy encodes that for any DNS query targeting the ‘API traffic’ class, we return the respective HTTPS-only interface IP addresses. Topaz’s safety guarantees ensure <i>exclusivity</i>, preventing other DNS policies from inadvertently matching the same queries and misrouting domains that still expect plaintext HTTP to HTTPS-only IPs.</p><p>api.cloudflare.com is the first domain to be added to our HTTPS-only API traffic class, with other applicable endpoints to follow.</p>
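<p>Functionally, the policy behaves like the following toy lookup (a sketch only; Topaz itself is a declarative policy engine, and the addresses are the documentation examples from the snippet above):</p>

```python
# Toy sketch of what the Topaz policy encodes: DNS queries for domains in
# the "API" traffic class resolve to the HTTPS-only addresses. The domain
# classification table and addresses are illustrative.
HTTPS_ONLY = {
    "ipv4": ["192.0.2.1"],
    "ipv6": ["2001:DB8::1:1"],
    "ttl": 300,
}

DOMAIN_CLASS = {
    "api.cloudflare.com": "API",  # first domain in the HTTPS-only class
}

def resolve(domain: str):
    """Return (ipv4 list, ipv6 list, ttl) for API-class domains, or None
    to let some other, non-conflicting DNS policy answer instead."""
    if DOMAIN_CLASS.get(domain) == "API":
        return HTTPS_ONLY["ipv4"], HTTPS_ONLY["ipv6"], HTTPS_ONLY["ttl"]
    return None

assert resolve("api.cloudflare.com") == (["192.0.2.1"], ["2001:DB8::1:1"], 300)
assert resolve("blog.cloudflare.com") is None
```

<p>The exclusivity guarantee corresponds to the fact that at most one policy ever answers for a given query class.</p>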
    <div>
      <h3>Opting-in your API endpoints</h3>
      <a href="#opting-in-your-api-endpoints">
        
      </a>
    </div>
    <p>As we said above, we've started with api.cloudflare.com and our internal API endpoints to thoroughly monitor any side effects on our own systems before extending this feature to customer domains. We have deployed these changes gradually across all data centers, leveraging Topaz’s flexibility to target subsets of traffic, minimizing disruptions, and ensuring a smooth transition.</p><p>To monitor unencrypted connections for your domains before blocking access with this feature, you can review the relevant analytics on the Cloudflare dashboard. Log in, select your account and domain, and navigate to the "<a href="https://developers.cloudflare.com/analytics/types-of-analytics/#account-analytics-beta"><u>Analytics &amp; Logs</u></a>" section. There, under the "<i>Traffic Served Over SSL</i>" subsection, you will find a breakdown of encrypted and unencrypted traffic for your site. That data can help provide a baseline for assessing the volume of plaintext HTTP connections for your site that will be blocked when you opt in. After opting in, no traffic for your site should be served over plaintext HTTP, so that number should drop to zero.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4YjvOU3XQqj1Y2Kfv2jIL3/97178a99d17f8938bc3ec53704bbc4b8/5.png" />
          </figure><p><sup><i>Snapshot of ‘Traffic Served Over SSL’ section on Cloudflare dashboard</i></sup></p><p>Towards the last quarter of 2025, we will provide customers the ability to opt in their domains using the dashboard or API (similar to enabling the Always Use HTTPS feature). Stay tuned!</p>
    <div>
      <h3>Wrapping up</h3>
      <a href="#wrapping-up">
        
      </a>
    </div>
    <p>Starting today, any unencrypted connection to api.cloudflare.com will be completely rejected. Developers should <b>not</b> expect a 403 Forbidden response any longer for HTTP connections, as we will prevent the underlying connection from being established by closing the HTTP interface entirely. Only secure HTTPS connections can be established.</p><p>We are also making updates to transition api.cloudflare.com away from its static IP addresses in the future. As part of that change, we will be discontinuing support for <a href="https://developers.cloudflare.com/ssl/reference/browser-compatibility/#non-sni-support"><u>non-SNI</u></a> legacy clients for Cloudflare API specifically — currently, an average of just 0.55% of TLS connections to the Cloudflare API do not include an <a href="https://www.cloudflare.com/learning/ssl/what-is-sni/">SNI</a> value. These non-SNI connections are initiated by a small number of accounts. We are committed to coordinating this transition and will work closely with the affected customers before implementing the change. This initiative aligns with our goal of enhancing the agility and reliability of our API endpoints.</p><p>Beyond the Cloudflare API use case, we're also exploring other areas where it's safe to close plaintext traffic ports. While the long tail of unencrypted traffic may persist for a while, it shouldn’t be forced on every site.</p><p>In the meantime, a small step like this can have a big impact in helping make a better Internet, and we are working hard to reliably bring this feature to your domains. We believe security should be free for all!</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[HTTPS]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Addressing]]></category>
            <category><![CDATA[Research]]></category>
            <guid isPermaLink="false">RqjV9vQoPNX8txlmrju6d</guid>
            <dc:creator>Suleman Ahmad</dc:creator>
            <dc:creator>Ash Pallarito</dc:creator>
            <dc:creator>Algin Martin</dc:creator>
        </item>
        <item>
            <title><![CDATA[Migrating billions of records: moving our active DNS database while it’s in use]]></title>
            <link>https://blog.cloudflare.com/migrating-billions-of-records-moving-our-active-dns-database-while-in-use/</link>
            <pubDate>Tue, 29 Oct 2024 14:00:00 GMT</pubDate>
            <description><![CDATA[ DNS records have moved to a new database, bringing improved performance and reliability to all customers. ]]></description>
            <content:encoded><![CDATA[ <p>According to a survey done by <a href="https://w3techs.com/technologies/overview/dns_server"><u>W3Techs</u></a>, as of October 2024, Cloudflare is used as an <a href="https://www.cloudflare.com/en-gb/learning/dns/dns-server-types/"><u>authoritative DNS</u></a> provider by 14.5% of all websites. As an authoritative DNS provider, we are responsible for managing and serving all the DNS records for our clients’ domains. This means we have an enormous responsibility to provide the best service possible, starting at the data plane. As such, we are constantly investing in our infrastructure to ensure the reliability and performance of our systems.</p><p><a href="https://www.cloudflare.com/learning/dns/what-is-dns/"><u>DNS</u></a> is often referred to as the phone book of the Internet, and is a key component of the Internet. If you have ever used a phone book, you know that they can become extremely large depending on the size of the physical area it covers. A <a href="https://www.cloudflare.com/en-gb/learning/dns/glossary/dns-zone/#:~:text=What%20is%20a%20DNS%20zone%20file%3F"><u>zone file</u></a> in DNS is no different from a phone book. It has a list of records that provide details about a domain, usually including critical information like what IP address(es) each hostname is associated with. For example:</p>
            <pre><code>example.com      59 IN A 198.51.100.0
blog.example.com 59 IN A 198.51.100.1
ask.example.com  59 IN A 198.51.100.2</code></pre>
            <p>It is not unusual for these zone files to reach millions of records in size, just for a single domain. The biggest single zone on Cloudflare holds roughly 4 million DNS records, but the vast majority of zones hold fewer than 100 DNS records. Given our scale, you can imagine how much DNS data alone Cloudflare is responsible for. With this volume of data, and all the complexities that come at that scale, there needs to be a very good reason to move it from one database cluster to another. </p>
    <div>
      <h2>Why migrate </h2>
      <a href="#why-migrate">
        
      </a>
    </div>
    <p>When initially measured in 2022, DNS data took up approximately 40% of the storage capacity in Cloudflare’s main database cluster (<b>cfdb</b>). This database cluster, consisting of a primary system and multiple replicas, is responsible for storing DNS zones, propagated to our <a href="https://www.cloudflare.com/network/"><u>data centers in over 330 cities</u></a> via our distributed KV store <a href="https://blog.cloudflare.com/introducing-quicksilver-configuration-distribution-at-internet-scale/"><u>Quicksilver</u></a>. <b>cfdb</b> is accessed by most of Cloudflare's APIs, including the <a href="https://developers.cloudflare.com/dns/manage-dns-records/how-to/create-dns-records/"><u>DNS Records API</u></a>. Today, the DNS Records API is the API most used by our customers, with each request resulting in a query to the database. As such, it’s always been important to optimize the DNS Records API and its surrounding infrastructure to ensure we can successfully serve every request that comes in.</p><p>As Cloudflare scaled, <b>cfdb</b> was becoming increasingly strained under the pressures of several services, many unrelated to DNS. During spikes of requests to our DNS systems, other Cloudflare services experienced degradation in the database performance. It was understood that in order to properly scale, we needed to optimize our database access and improve the systems that interact with it. However, it was evident that system-level improvements could only go so far, and the growing pains were becoming unbearable. In late 2022, the DNS team decided, along with the help of 25 other teams, to detach itself from <b>cfdb</b> and move our DNS records data to another database cluster.</p>
    <div>
      <h2>Pre-migration</h2>
      <a href="#pre-migration">
        
      </a>
    </div>
    <p>From a DNS perspective, this migration to an improved database cluster was in the works for several years. Cloudflare initially relied on a single <a href="https://www.postgresql.org/"><u>Postgres</u></a> database cluster, <b>cfdb</b>. At Cloudflare's inception, <b>cfdb</b> was responsible for storing information about zones and accounts and the majority of services on the Cloudflare control plane depended on it. Since around 2017, as Cloudflare grew, many services moved their data out of <b>cfdb</b> to be served by a <a href="https://en.wikipedia.org/wiki/Microservices"><u>microservice</u></a>. Unfortunately, the difficulty of these migrations is directly proportional to the number of services that depend on the data being migrated, and in this case, most services require knowledge of both zones and DNS records.</p><p>Although the term “zone” was born from the DNS point of view, it has since evolved into something more. Today, zones on Cloudflare store many different types of non-DNS related settings and help link several non-DNS related products to customers' websites. Therefore, it didn’t make sense to move both zone data and DNS record data together. This separation of two historically tightly coupled DNS concepts proved to be an incredibly challenging problem, involving many engineers and systems. In addition, it was clear that if we were going to dedicate the resources to solving this problem, we should also remove some of the legacy issues that came along with the original solution. </p><p>One of the main issues with the legacy database was that the DNS team had little control over which systems accessed exactly what data and at what rate. Moving to a new database gave us the opportunity to create a more tightly controlled interface to the DNS data. 
This was manifested as an internal DNS Records <a href="https://blog.cloudflare.com/moving-k8s-communication-to-grpc/"><u>gRPC API</u></a> which allows us to make sweeping changes to our data while only requiring a single change to the API, rather than coordinating with other systems.  For example, the DNS team can alter access logic and auditing procedures under the hood. In addition, it allows us to appropriately rate-limit and cache data depending on our needs. The move to this new API itself was no small feat, and with the help of several teams, we managed to migrate over 20 services, using 5 different programming languages, from direct database access to using our managed gRPC API. Many of these services touch very important areas such as <a href="https://developers.cloudflare.com/dns/dnssec/"><u>DNSSEC</u></a>, <a href="https://developers.cloudflare.com/ssl/"><u>TLS</u></a>, <a href="https://developers.cloudflare.com/email-routing/"><u>Email</u></a>, <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/"><u>Tunnels</u></a>, <a href="https://developers.cloudflare.com/workers/"><u>Workers</u></a>, <a href="https://developers.cloudflare.com/spectrum/"><u>Spectrum</u></a>, and <a href="https://developers.cloudflare.com/r2/"><u>R2 storage</u></a>. Therefore, it was important to get it right. </p><p>One of the last issues to tackle was the logical decoupling of common DNS database functions from zone data. Many of these functions expect to be able to access both DNS record data and DNS zone data at the same time. For example, at record creation time, our API needs to check that the zone is not over its maximum record allowance. Originally this check occurred at the SQL level by verifying that the record count was lower than the record limit for the zone. However, once you remove access to the zone itself, you are no longer able to confirm this. 
Our DNS Records API also made use of SQL functions to audit record changes, which requires access to both DNS record and zone data. Luckily, over the past several years, we have migrated this functionality out of our monolithic API and into separate microservices. This allowed us to move the auditing and zone setting logic to the application level rather than the database level. Ultimately, we are still taking advantage of SQL functions in the new database cluster, but they are fully independent of any other legacy systems, and are able to take advantage of the latest Postgres version.</p><p>Now that Cloudflare DNS was mostly decoupled from the zones database, it was time to proceed with the data migration. For this, we built what would become our <b>Change Data Capture and Transfer Service (CDCTS).</b></p>
    <div>
      <h2>Requirements for the Change Data Capture and Transfer Service</h2>
      <a href="#requirements-for-the-change-data-capture-and-transfer-service">
        
      </a>
    </div>
    <p>The Database team is responsible for all Postgres clusters within Cloudflare, and was tasked with executing the data migration of two tables that store DNS data: <i>cf_rec</i> and <i>cf_archived_rec</i>, from the original <b>cfdb </b>cluster to a new cluster we called <b>dnsdb</b>.  We had several key requirements that drove our design:</p><ul><li><p><b>Don’t lose data. </b>This is the number one priority when handling any sort of data. Losing data means losing trust, and it is incredibly difficult to regain that trust once it’s lost.  A key part of this is the ability to prove that no data has been lost; ideally, the migration process would be easily auditable.</p></li><li><p><b>Minimize downtime</b>.  We wanted a solution with less than a minute of downtime during the migration, and ideally with just a few seconds of delay.</p></li></ul><p>These two requirements meant that we had to be able to migrate data changes in near real-time, meaning we either needed to implement logical replication, or some custom method to capture changes, migrate them, and apply them in a table in a separate Postgres cluster.</p><p>We first looked at implementing logical replication with <a href="https://github.com/2ndQuadrant/pglogical"><u>pgLogical</u></a>, but had concerns about its performance and our ability to audit its correctness.  Then some additional requirements emerged that made a pgLogical implementation of logical replication impossible:</p><ul><li><p><b>The ability to move data must be bidirectional.</b> We had to have the ability to switch back to <b>cfdb</b> without significant downtime in case of unforeseen problems with the new implementation. 
</p></li><li><p><b>Partition the </b><b><i>cf_rec</i></b><b> table in the new database.</b> This was a long-desired improvement and since most access to <i>cf_rec</i> is by zone_id, it was decided that <b>mod(zone_id, num_partitions)</b> would be the partition key.</p></li><li><p><b>Transferred data accessible from original database.  </b>In case we had functionality that still needed access to data, a foreign table pointing to <b>dnsdb</b> would be available in <b>cfdb</b>. This could be used as emergency access to avoid needing to roll back the entire migration for a single missed process.</p></li><li><p><b>Only allow writes in one database. </b> Applications should know where the primary database is, and should be blocked from writing to both databases at the same time.</p></li></ul>
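<p>The partitioning requirement is easy to illustrate: with <b>mod(zone_id, num_partitions)</b> as the partition key, every record of a zone lands in the same partition, so zone-scoped queries touch exactly one partition. The partition count below is a made-up example.</p>

```python
# Sketch of the partitioning scheme for cf_rec: partition by
# mod(zone_id, num_partitions), so all records of a given zone land in
# the same partition. The partition count here is illustrative.
NUM_PARTITIONS = 16  # hypothetical; the real count isn't given in the post

def partition_for(zone_id: int) -> int:
    return zone_id % NUM_PARTITIONS

# All records of zone 299 hash to one partition, so a lookup by zone_id
# only ever touches a single partition.
records = [(13, 299), (14, 299), (15, 7)]  # (rec_id, zone_id)
assert {partition_for(z) for _, z in records if z == 299} == {299 % NUM_PARTITIONS}
assert partition_for(7) == 7
```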
    <div>
      <h2>Details about the tables being migrated</h2>
      <a href="#details-about-the-tables-being-migrated">
        
      </a>
    </div>
    <p>The primary table, <i>cf_rec</i>, stores DNS record information, and its rows are regularly inserted, updated, and deleted. At the time of the migration, this table had 1.7 billion records, and with several indexes took up 1.5 TB of disk. Typical daily usage would observe 3-5 million inserts, 1 million updates, and 3-5 million deletes.</p><p>The second table, <i>cf_archived_rec</i>, stores copies of <i>cf_rec</i> that are obsolete — this table generally only has records inserted and is never updated or deleted.  As such, it would see roughly 3-5 million inserts per day, corresponding to the records deleted from <i>cf_rec</i>. At the time of the migration, this table had roughly 4.3 billion records.</p><p>Fortunately, neither table made use of database triggers or foreign keys, which meant that we could insert/update/delete records in this table without triggering changes or worrying about dependencies on other tables.</p><p>Ultimately, both of these tables are highly active and are the source of truth for many highly critical systems at Cloudflare.</p>
    <div>
      <h2>Designing the Change Data Capture and Transfer Service</h2>
      <a href="#designing-the-change-data-capture-and-transfer-service">
        
      </a>
    </div>
    <p>There were two main parts to this database migration:</p><ol><li><p><b>Initial copy:</b> Take all the data from <b>cfdb </b>and put it in <b>dnsdb.</b></p></li><li><p><b>Change copy:</b> Take all the changes in <b>cfdb </b>since the initial copy and update <b>dnsdb</b> to reflect them. This is the more involved part of the process.</p></li></ol><p>Normally, logical replication replays every insert, update, and delete on a copy of the data in the same transaction order, making a single-threaded pipeline.  We considered using a queue-based system but again, speed and auditability were both concerns as any queue would typically replay one change at a time.  We wanted to be able to apply large sets of changes, so that after an initial dump and restore, we could quickly catch up with the changed data. For the rest of the blog, we will only speak about <i>cf_rec</i> for simplicity, but the process for <i>cf_archived_rec</i> is the same.</p><p>What we decided on was a simple change capture table. Rows from this capture table would be loaded in real-time by a database trigger, with a transfer service that could migrate and apply thousands of changed records to <b>dnsdb</b> in each batch. Lastly, we added some auditing logic on top to ensure that we could easily verify that all data was safely transferred without downtime.</p>
    <div>
      <h3>Basic model of change data capture </h3>
      <a href="#basic-model-of-change-data-capture">
        
      </a>
    </div>
    <p>For <i>cf_rec</i> to be migrated, we would create a change logging table, along with a trigger function and a table trigger to capture the new state of the record after any insert/update/delete.</p><p>The change logging table named <i>log_cf_rec</i> had the same columns as <i>cf_rec</i>, as well as four new columns:</p><ul><li><p><b>change_id</b>: a sequence-generated unique identifier of the record</p></li><li><p><b>action</b>: a single character indicating whether this record represents an [i]nsert, [u]pdate, or [d]elete</p></li><li><p><b>change_timestamp</b>: the date/time when the change record was created</p></li><li><p><b>change_user:</b> the database user that made the change.</p></li></ul><p>A trigger was placed on the <i>cf_rec</i> table so that each insert/update would copy the new values of the record into the change table, and for deletes, create a 'D' record with the primary key value. </p><p>Here is an example of the change logging where we delete, re-insert, update, and finally select from the <i>log_cf_rec</i> table. Note that the actual <i>cf_rec</i> and <i>log_cf_rec</i> tables have many more columns, but have been edited for simplicity.</p>
            <pre><code>dns_records=# DELETE FROM  cf_rec WHERE rec_id = 13;

dns_records=# SELECT * from log_cf_rec;
change_id | action | rec_id | zone_id | name
----------------------------------------------
1         | D      | 13     |         |   

dns_records=# INSERT INTO cf_rec VALUES(13,299,'cloudflare.example.com');  

dns_records=# UPDATE cf_rec SET name = 'test.example.com' WHERE rec_id = 13;

dns_records=# SELECT * from log_cf_rec;
change_id | action | rec_id | zone_id | name
----------------------------------------------
1         | D      | 13     |         |  
2         | I      | 13     | 299     | cloudflare.example.com
3         | U      | 13     | 299     | test.example.com </code></pre>
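<p>The trigger semantics can be modeled with a small in-memory sketch: inserts and updates log the record's new state, while deletes log only the primary key. This toy stands in for the Postgres trigger function; the column set and values mirror the simplified example above.</p>

```python
# In-memory model of the change-capture trigger on cf_rec: inserts and
# updates log the new row state, deletes log only the primary key, and
# change_id is a monotonically increasing sequence.
cf_rec = {}       # rec_id -> (zone_id, name)
log_cf_rec = []   # (change_id, action, rec_id, zone_id, name)
_seq = 0

def _log(action, rec_id, zone_id=None, name=None):
    global _seq
    _seq += 1
    log_cf_rec.append((_seq, action, rec_id, zone_id, name))

def delete(rec_id):
    cf_rec.pop(rec_id, None)
    _log("D", rec_id)                    # deletes capture only the key

def upsert(rec_id, zone_id, name):
    action = "U" if rec_id in cf_rec else "I"
    cf_rec[rec_id] = (zone_id, name)
    _log(action, rec_id, zone_id, name)  # inserts/updates capture new state

# Replay the session from the SQL example above.
cf_rec[13] = (299, "old.example.com")    # pre-existing row, not logged
delete(13)
upsert(13, 299, "cloudflare.example.com")
upsert(13, 299, "test.example.com")

assert [row[1] for row in log_cf_rec] == ["D", "I", "U"]
assert log_cf_rec[-1] == (3, "U", 13, 299, "test.example.com")
```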
            <p>In addition to <i>log_cf_rec</i>, we also introduced 2 more tables in <b>cfdb </b>and 3 more tables in <b>dnsdb:</b></p><p><b>cfdb</b></p><ol><li><p><i>transferred_log_cf_rec</i>: Responsible for auditing the batches transferred to <b>dnsdb</b>.</p></li><li><p><i>log_change_action</i>:<i> </i>Responsible for summarizing the transfer size in order to compare with the <i>log_change_action </i>in <b>dnsdb.</b></p></li></ol><p><b>dnsdb</b></p><ol><li><p><i>migrate_log_cf_rec</i>:<i> </i>Responsible for collecting batch changes in <b>dnsdb</b>, which would later be applied to <i>cf_rec </i>in <b>dnsdb</b><i>.</i></p></li><li><p><i>applied_migrate_log_cf_rec</i>:<i> </i>Responsible for auditing the batches that had been successfully applied to cf_rec in <b>dnsdb.</b></p></li><li><p><i>log_change_action</i>:<i> </i>Responsible for summarizing the transfer size in order to compare with the <i>log_change_action </i>in <b>cfdb.</b></p></li></ol>
    <div>
      <h3>Initial copy</h3>
      <a href="#initial-copy">
        
      </a>
    </div>
    <p>With change logging in place, we were now ready to do the initial copy of the tables from <b>cfdb</b> to <b>dnsdb</b>. Because we were changing the structure of the tables in the destination database and because of network timeouts, we wanted to bring the data over in small pieces and validate that it was brought over accurately, rather than doing a single multi-hour copy or <a href="https://www.postgresql.org/docs/current/app-pgdump.html"><u>pg_dump</u></a>.  We also wanted to ensure a long-running read could not impact production and that the process could be paused and resumed at any time.  The basic model to transfer data was done with a simple psql copy statement piped into another psql copy statement.  No intermediate files were used.</p><p><code>psql_cfdb -c "COPY (SELECT * FROM cf_rec WHERE id BETWEEN n AND n+1000000) TO STDOUT" | </code></p><p><code>psql_dnsdb -c "COPY cf_rec FROM STDIN"</code></p><p>Prior to a batch being moved, the count of records to be moved was recorded in <b>cfdb</b>, and after each batch was moved, a count was recorded in <b>dnsdb</b> and compared to the count in <b>cfdb</b> to ensure that a network interruption or other unforeseen error did not cause data to be lost. The bash script to copy data looked like this, where we included files that could be touched to pause or end the copy (if they caused load on production or there was an incident).  Once again, this code below has been heavily simplified.</p>
            <pre><code>#!/bin/bash
# Heavily simplified. psql_cfdb and psql_dnsdb are wrappers around psql
# with the right connection settings; each argument is a batch start id.
for i in "$@"; do
   # Allow user to control whether this is paused or not via pause_copy file
   while [ -f pause_copy ]; do
      sleep 1
   done
   # Allow user to end migration by creating end_copy file
   if [ ! -f end_copy ]; then
      # Copy a batch of records from cfdb to dnsdb
      psql_cfdb -c "COPY (SELECT * FROM cf_rec WHERE id BETWEEN $i AND $i+1000000) TO STDOUT" |
         psql_dnsdb -c "COPY cf_rec FROM STDIN"
      # Get counts of records for this batch from cfdb and dnsdb
      src=$(psql_cfdb -tAc "SELECT count(*) FROM cf_rec WHERE id BETWEEN $i AND $i+1000000")
      dst=$(psql_dnsdb -tAc "SELECT count(*) FROM cf_rec WHERE id BETWEEN $i AND $i+1000000")
      # Compare cfdb count with dnsdb count and alert if different
      [ "$src" = "$dst" ] || echo "count mismatch for batch starting at $i" &gt;&amp;2
   fi
done
</code></pre>
            <p><sup><i>Bash copy script</i></sup></p>
    <div>
      <h3>Change copy</h3>
      <a href="#change-copy">
        
      </a>
    </div>
    <p>Once the initial copy was completed, we needed to update <b>dnsdb</b> with any changes that had occurred in <b>cfdb</b> since the start of the initial copy. To implement this change copy, we created a function <i>fn_log_change_transfer_log_cf_rec</i> that could be passed a <i>batch_id</i> and <i>batch_size</i> and did 5 things, all executed in a single database <a href="https://www.postgresql.org/docs/current/tutorial-transactions.html"><u>transaction</u></a>:</p><ol><li><p>Select a <i>batch_size</i> of records from <i>log_cf_rec</i> in <b>cfdb</b>.</p></li><li><p>Copy the batch to <i>transferred_log_cf_rec</i> in <b>cfdb</b> to mark it as transferred.</p></li><li><p>Delete the batch from <i>log_cf_rec</i>.</p></li><li><p>Write a summary of the action to the <i>log_change_action</i> table. This was later used to compare the records received in <b>dnsdb</b> against what <b>cfdb</b> sent.</p></li><li><p>Return the batch of records.</p></li></ol><p>We then took the returned batch of records and copied them to <i>migrate_log_cf_rec</i> in <b>dnsdb</b>. We used the same bash script as above, except this time the copy command looked like this:</p><p><code>psql_cfdb -c "COPY (SELECT * FROM </code><code><i>fn_log_change_transfer_log_cf_rec(&lt;batch_id&gt;, &lt;batch_size&gt;)</i></code><code>) TO STDOUT" | </code></p><p><code>psql_dnsdb -c "COPY migrate_log_cf_rec FROM STDIN"</code></p>
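<p>The transactional hand-off can be sketched in-memory in Go. This is a minimal illustration with hypothetical types and names; the real <i>fn_log_change_transfer_log_cf_rec</i> is a Postgres function operating on the tables described above, with all five steps inside one transaction:</p>

```go
package main

import "fmt"

// Change is a hypothetical stand-in for a row in log_cf_rec.
type Change struct {
	ID     int
	Action string // "i", "u", or "d"
}

// DB models just enough of cfdb for the sketch: the pending change log,
// the transferred audit table, and per-batch summary counts.
type DB struct {
	LogCfRec            []Change    // log_cf_rec
	TransferredLogCfRec []Change    // transferred_log_cf_rec
	LogChangeAction     map[int]int // batch_id -> record count
}

// transferBatch mirrors fn_log_change_transfer_log_cf_rec: select a batch,
// copy it to the transferred table, delete it from the pending log,
// record the summary count, and return the batch to the caller.
func (db *DB) transferBatch(batchID, batchSize int) []Change {
	n := batchSize
	if n > len(db.LogCfRec) {
		n = len(db.LogCfRec)
	}
	batch := db.LogCfRec[:n]
	db.TransferredLogCfRec = append(db.TransferredLogCfRec, batch...)
	db.LogCfRec = db.LogCfRec[n:]
	db.LogChangeAction[batchID] = len(batch)
	return batch
}

func main() {
	db := &DB{
		LogCfRec:        []Change{{1, "i"}, {1, "u"}, {2, "d"}},
		LogChangeAction: map[int]int{},
	}
	batch := db.transferBatch(1, 2)
	fmt.Println(len(batch), len(db.LogCfRec), db.LogChangeAction[1]) // 2 1 2
}
```

<p>In Postgres, the atomicity of the transaction is what guarantees a batch is never both in <i>log_cf_rec</i> and <i>transferred_log_cf_rec</i>, or lost from both.</p>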
    <div>
      <h3>Applying changes in the destination database</h3>
      <a href="#applying-changes-in-the-destination-database">
        
      </a>
    </div>
    <p>Now, with a batch of data in the <i>migrate_log_cf_rec</i> table, we called a newly created function <i>log_change_apply</i> to apply and audit the changes. Once again, this was all executed within a single database transaction. The function did the following:</p><ol><li><p>Move a batch from the <i>migrate_log_cf_rec</i> table to a new temporary table.</p></li><li><p>Write the counts for the <i>batch_id</i> to the <i>log_change_action</i> table.</p></li><li><p>Delete from the temporary table all but the latest record for each unique id (the last action). For example, an insert followed by 30 updates would leave a single record, the final update; there is no need to apply all the intermediate updates.</p></li><li><p>Delete any record from <i>cf_rec</i> that has a corresponding change in the batch.</p></li><li><p>Insert the final version of any [i]nsert or [u]pdate records into <i>cf_rec</i>.</p></li><li><p>Copy the batch to <i>applied_migrate_log_cf_rec</i> for a full audit trail.</p></li></ol>
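<p>The collapse-then-apply logic at the heart of those steps can be sketched in Go. The types and names here are hypothetical; the real <i>log_change_apply</i> is a database function working on the temporary table:</p>

```go
package main

import "fmt"

// Change is a hypothetical stand-in for a row in migrate_log_cf_rec.
type Change struct {
	Seq    int    // position in the change log
	ID     int    // cf_rec record id
	Action string // "i"nsert, "u"pdate, or "d"elete
	Row    string // final row contents for i/u actions
}

// collapse keeps only the latest change per record id, so an insert
// followed by many updates leaves a single record: the final update.
func collapse(batch []Change) map[int]Change {
	last := map[int]Change{}
	for _, c := range batch {
		if prev, ok := last[c.ID]; !ok || c.Seq > prev.Seq {
			last[c.ID] = c
		}
	}
	return last
}

// apply deletes every changed record from the table, then re-inserts
// the final version for insert/update actions ("d" stays deleted).
func apply(table map[int]string, collapsed map[int]Change) {
	for id, c := range collapsed {
		delete(table, id)
		if c.Action == "i" || c.Action == "u" {
			table[id] = c.Row
		}
	}
}

func main() {
	table := map[int]string{9: "old"}
	batch := []Change{
		{1, 7, "i", "v1"},
		{2, 7, "u", "v2"},
		{3, 9, "d", ""},
	}
	apply(table, collapse(batch))
	fmt.Println(table[7], len(table)) // record 7 holds v2; record 9 is gone
}
```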
    <div>
      <h3>Putting it all together</h3>
      <a href="#putting-it-all-together">
        
      </a>
    </div>
    <p>There were 4 distinct phases, each of which was part of a different database transaction:</p><ol><li><p>Call <i>fn_log_change_transfer_log_cf_rec </i>in <b>cfdb </b>to get a batch of records.</p></li><li><p>Copy the batch of records to <b>dnsdb.</b></p></li><li><p>Call <i>log_change_apply </i>in <b>dnsdb </b>to apply the batch of records.</p></li><li><p>Compare the <i>log_change_action</i> table in each respective database to ensure counts match.</p></li></ol>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2REIq71tc7M4jKPLZSJzS9/11f22f700300f2ad3a5ee5ca85a75480/Applying_changes_in_the_destination_database.png" />
          </figure><p>This process was run every 3 seconds for several weeks before the migration to ensure that we could keep <b>dnsdb</b> in sync with <b>cfdb</b>.</p>
    <div>
      <h2>Managing which database is live</h2>
      <a href="#managing-which-database-is-live">
        
      </a>
    </div>
    <p>The last major pre-migration task was the construction of the request locking system that would be used throughout the actual migration. The aim was to let the database signal state changes to the DNS Records API, so that the API could handle in-flight HTTP requests gracefully. Done correctly, this would reduce downtime for DNS Records API users to nearly zero.</p><p>To facilitate this, a new table called <i>cf_migration_manager</i> was created. The DNS Records API would periodically poll the table, which communicated two critical pieces of information:</p><ol><li><p><b>Which database was active.</b> Here we just used a simple A or B naming convention.</p></li><li><p><b>Whether the database was locked for writing.</b> If the database was locked for writing, the DNS Records API would hold HTTP requests until the lock was released by the database.</p></li></ol><p>Both pieces of information were controlled by a migration manager script.</p><p>The benefit of migrating the 20+ internal services from direct database access to our internal DNS Records gRPC API was that we could control access to the database and ensure that no one else would be writing without going through the <i>cf_migration_manager</i>.</p>
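<p>The polling contract can be sketched in Go. This is an illustrative in-memory model, not the real implementation; in production, <i>cf_migration_manager</i> is a database table that the DNS Records API polls:</p>

```go
package main

import "fmt"

// MigrationManager mirrors the two fields the DNS Records API polls
// from the cf_migration_manager table.
type MigrationManager struct {
	ActiveDB    string // "A" (cfdb) or "B" (dnsdb)
	WriteLocked bool
}

// TryWrite reports whether a write may proceed right now; while the
// manager is locked, the API holds the HTTP request and polls again.
func (m *MigrationManager) TryWrite() (db string, ok bool) {
	if m.WriteLocked {
		return "", false
	}
	return m.ActiveDB, true
}

func main() {
	m := &MigrationManager{ActiveDB: "A"}

	m.WriteLocked = true // migration script locks writes
	if _, ok := m.TryWrite(); !ok {
		fmt.Println("holding request until lock is released")
	}

	m.ActiveDB = "B"      // flip the primary to dnsdb
	m.WriteLocked = false // release the lock
	db, _ := m.TryWrite()
	fmt.Println("writes proceed against", db)
}
```

<p>Holding requests rather than rejecting them is what let the cutover appear as a brief latency spike instead of a window of failed writes.</p>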
    <div>
      <h2>During the migration </h2>
      <a href="#during-the-migration">
        
      </a>
    </div>
    <p>Although we aimed to complete this migration in a matter of seconds, we announced a DNS maintenance window that could last a couple of hours, just to be safe. Now that everything was set up, and both <b>cfdb</b> and <b>dnsdb</b> were roughly in sync, it was time to proceed with the migration. The steps were as follows:</p><ol><li><p>Lower the time between copies from 3s to 0.5s.</p></li><li><p>Lock <b>cfdb</b> for writes via the <i>cf_migration_manager</i>. This told the DNS Records API to hold write connections.</p></li><li><p>Make <b>cfdb</b> read-only and migrate the last logged changes to <b>dnsdb</b>.</p></li><li><p>Enable writes to <b>dnsdb</b>.</p></li><li><p>Tell the DNS Records API via the <i>cf_migration_manager</i> that <b>dnsdb</b> was the new primary database and that write connections could proceed.</p></li></ol><p>Even including the wait for the last changes to be copied to <b>dnsdb</b> before enabling writes, this entire process took no more than 2 seconds. During the migration we saw a spike in API latency as the migration manager locked writes and then worked through a backlog of queries, but latencies recovered to normal within several minutes. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6agUpD8BQVxgDupBrwtTw3/38c96f91879c6539011866821ad6f11a/image3.png" />
          </figure><p><sup><i>DNS Records API latency and requests during the migration</i></sup></p><p>Unfortunately, due to the far-reaching impact that DNS has at Cloudflare, this was not the end of the migration. Three lesser-used services had slipped past our scan of services accessing DNS records via <b>cfdb</b>. Fortunately, the foreign table setup meant we could very quickly fix any residual issues by simply changing the table name. </p>
    <div>
      <h2>Post-migration</h2>
      <a href="#post-migration">
        
      </a>
    </div>
    <p>Almost immediately, as expected, we saw a steep drop in usage across <b>cfdb</b>. This freed up a lot of resources for other services to take advantage of.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Xfnbc9MZLwJB91ypItWsi/1eb21362893b31a1e3c846d1076a9f5b/image6.jpg" />
          </figure><p><sup><i><b>cfdb</b></i></sup><sup><i> usage dropped significantly after the migration period.</i></sup></p><p>Since the migration, the average number of <b>requests</b> per second to the DNS Records API has more than <b>doubled</b>. At the same time, CPU usage across both <b>cfdb</b> and <b>dnsdb</b> has settled below 10%, as seen below, giving us room for spikes and future growth. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/39su35dkb5Pl8uwYfYjHLg/0eb26ced30b44efb71abb73830e01f3a/image2.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5AdlLKXtD68QWCsMVLKnkt/9137beee9c941827eb57c53825ffe209/image4.png" />
          </figure><p><sup><i><b>cfdb</b></i></sup><sup><i> and </i></sup><sup><i><b>dnsdb</b></i></sup><sup><i> CPU usage now</i></sup></p><p>As a result of this improved capacity, our database-related incident rate dropped dramatically.</p><p>As for query latencies, our average latency post-migration is slightly lower, with fewer sustained spikes above 500ms. The improvement is most noticeable during high-load periods, when the database now handles spikes without significant issues. Many of these spikes come from clients fetching a large number of DNS records or making several changes to their zone in short bursts, both common use cases for large customers onboarding zones.</p><p>In addition to these improvements, the DNS team also has more granular control over <b>dnsdb</b> cluster-specific settings, which can be tuned for our needs rather than catering to all the other services. For example, we made custom changes to replication lag limits so that services using replicas could read with confidence that the data would be consistent. Measures like this reduce overall load on the primary because almost all read queries can now go to the replicas.</p><p>Although this migration was a resounding success, we are always working to improve our systems. As we grow, so do our customers, which means the need to scale never really ends. We have more exciting improvements on the roadmap, and we are looking forward to sharing more details in the future.</p><p>The DNS team at Cloudflare isn’t the only team solving challenging problems like the one above. If this sounds interesting to you, we have many more tech deep dives on our blog, and we are always looking for curious engineers to join our team — see open opportunities <a href="https://www.cloudflare.com/en-gb/careers/jobs/"><u>here</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Database]]></category>
            <category><![CDATA[Kafka]]></category>
            <category><![CDATA[Postgres]]></category>
            <category><![CDATA[Tracing]]></category>
            <category><![CDATA[Quicksilver]]></category>
            <guid isPermaLink="false">24rozMdbFQ7jmUgRNMF4RU</guid>
            <dc:creator>Alex Fattouche</dc:creator>
            <dc:creator>Corey Horton</dc:creator>
        </item>
        <item>
            <title><![CDATA[Automatically generating Cloudflare’s Terraform provider]]></title>
            <link>https://blog.cloudflare.com/automatically-generating-cloudflares-terraform-provider/</link>
            <pubDate>Tue, 24 Sep 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ The Cloudflare Terraform provider used to be manually maintained. With the help of our existing OpenAPI code generation pipeline, we’re now automatically generating the provider for better  ]]></description>
            <content:encoded><![CDATA[ <p>In November 2022, we announced the transition to <a href="https://blog.cloudflare.com/open-api-transition/"><u>OpenAPI Schemas for the Cloudflare API</u></a>. Back then, we had an audacious goal to make the OpenAPI schemas the source of truth for our SDK ecosystem and reference documentation. During 2024’s Developer Week, we backed this up by <a href="https://blog.cloudflare.com/workers-production-safety/"><u>announcing that our SDK libraries are now automatically generated</u></a> from these OpenAPI schemas. Today, we’re excited to announce the latest pieces of the ecosystem to now be automatically generated — the Terraform provider and API reference documentation.</p><p>This means that the moment a new feature or attribute is added to our products and the team documents it, you’ll be able to see how it’s meant to be used across our SDK ecosystem <i>and</i> make use of it immediately. No more delays. No more lacking coverage of API endpoints.</p><p>You can find the new documentation site at <a href="https://developers.cloudflare.com/api-next/"><u>https://developers.cloudflare.com/api-next/</u></a>, and you can try the preview release candidate of the Terraform provider by <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/5.0.0-alpha1"><u>installing 5.0.0-alpha1</u></a>.</p>
    <div>
      <h2>Why Terraform? </h2>
      <a href="#why-terraform">
        
      </a>
    </div>
    <p>For anyone who is unfamiliar with <a href="https://www.terraform.io/"><u>Terraform</u></a>, it is a tool for managing your infrastructure as code, much like you would with your application code. Many of our customers (big and small) rely on Terraform to orchestrate their infrastructure in a technology-agnostic way. Under the hood, it is essentially an HTTP client with lifecycle management built in, which means it makes use of our publicly documented APIs in a way that understands how to create, read, update and delete for the life of the resource. </p>
    <div>
      <h2>Keeping Terraform updated — the old way</h2>
      <a href="#keeping-terraform-updated-the-old-way">
        
      </a>
    </div>
    <p>Historically, Cloudflare manually maintained its Terraform provider, but because the provider internals require their own unique way of doing things, responsibility for maintenance and support landed on the shoulders of a handful of individuals. Service teams always had difficulty keeping up with the number of changes, due to the cognitive overhead required to ship a single change in the provider. Getting a change into the provider took a minimum of 3 pull requests (4 if you were adding support to <a href="https://github.com/cloudflare/cf-terraforming"><u>cf-terraforming</u></a>).</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6spvs4QAkY7BXLNfABDSQs/838f9b224838cd174376eb413cce7848/image6.png" />
          </figure><p>Even with the 4 pull requests completed, there were no guarantees of coverage for all available attributes, which meant small yet important details could be forgotten and not exposed to customers, causing frustration when trying to configure a resource.</p><p>To address this, our Terraform provider needed to rely on the same OpenAPI schemas that the rest of our SDK ecosystem was <a href="https://blog.cloudflare.com/lessons-from-building-an-automated-sdk-pipeline/"><u>already benefiting from</u></a>.</p>
    <div>
      <h2>Updating Terraform automatically</h2>
      <a href="#updating-terraform-automatically">
        
      </a>
    </div>
    <p>The thing that differentiates Terraform from our SDKs is that it manages the lifecycle of resources. With that comes a new range of problems related to known values and managing differences in the request and response payloads. Let’s compare the two different approaches of creating a new DNS record and fetching it back.</p><p>With our Go SDK:</p>
            <pre><code>// Create the new record
record, _ := client.DNS.Records.New(context.TODO(), dns.RecordNewParams{
	ZoneID: cloudflare.F("023e105f4ecef8ad9ca31a8372d0c353"),
	Record: dns.RecordParam{
		Name:    cloudflare.String("@"),
		Type:    cloudflare.String("CNAME"),
		Content: cloudflare.String("example.com"),
	},
})


// Wasteful fetch, but shows the point
client.DNS.Records.Get(
	context.Background(),
	record.ID,
	dns.RecordGetParams{
		ZoneID: cloudflare.String("023e105f4ecef8ad9ca31a8372d0c353"),
	},
)
</code></pre>
            <p>And with Terraform:</p>
            <pre><code>resource "cloudflare_dns_record" "example" {
  zone_id = "023e105f4ecef8ad9ca31a8372d0c353"
  name    = "@"
  content = "example.com"
  type    = "CNAME"
}</code></pre>
            <p>On the surface, it looks like the Terraform approach is simpler, and you would be correct. The complexity of knowing how to create a new resource and maintain changes is handled for you. However, for Terraform to offer this abstraction and data guarantee, all values must be known at apply time. That means that even if you’re not using the <code>proxied</code> value, Terraform needs to know what the value is in order to save it in the state file and manage that attribute going forward. The error below is what Terraform operators commonly see from providers when a value isn’t known at apply time.</p>
            <pre><code>Error: Provider produced inconsistent result after apply

When applying changes to example_thing.foo, provider "provider[\"registry.terraform.io/example/example\"]"
produced an unexpected new value: .foo: was null, but now cty.StringVal("").</code></pre>
            <p>Whereas when using the SDKs, if you don’t need a field, you just omit it and never need to worry about maintaining known values.</p><p>Tackling this for our OpenAPI schemas was no small feat. Since introducing Terraform generation support, the quality of our schemas has improved by an order of magnitude. Now we are explicitly calling out all default values that are present, variable response properties based on the request payload, and any server-side computed attributes. All of this means a better experience for anyone that interacts with our APIs.</p>
    <div>
      <h3>Making the jump from terraform-plugin-sdk to terraform-plugin-framework</h3>
      <a href="#making-the-jump-from-terraform-plugin-sdk-to-terraform-plugin-framework">
        
      </a>
    </div>
    <p>To build a Terraform provider and expose resources or data sources to operators, you need two main things: a provider server and a provider.</p><p>The provider server takes care of exposing a <a href="https://github.com/hashicorp/terraform/blob/main/docs/plugin-protocol/README.md"><u>gRPC server</u></a> that Terraform core (via the CLI) uses to communicate when managing resources or reading data sources from the operator provided configuration.</p><p>The provider is responsible for wrapping the resources and data sources, communicating with the remote services, and managing the state file. To do this, you either rely on the <a href="https://github.com/hashicorp/terraform-plugin-sdk"><u>terraform-plugin-sdk</u></a> (commonly referred to as SDKv2) or <a href="https://github.com/hashicorp/terraform-plugin-framework"><u>terraform-plugin-framework</u></a>, which includes all the interfaces and methods provided by Terraform in order to manage the internals correctly. The decision as to which plugin you use depends on the age of your provider. SDKv2 has been around longer and is what most Terraform providers use, but due to the age and complexity, it has many core unresolved issues that must remain in order to facilitate backwards compatibility for those who rely on it. 
<code>terraform-plugin-framework</code> is the new version that, while lacking the breadth of features SDKv2 has, provides a more Go-like approach to building providers and addresses many of the underlying bugs in SDKv2.</p><p><i>(For a deeper comparison between SDKv2 and the framework, you can check out a </i><a href="https://www.youtube.com/watch?v=4P69E44mJGo"><i><u>conversation between myself and John Bristowe from Octopus Deploy</u></i></a><i>.)</i></p><p>The majority of the Cloudflare Terraform provider is built using SDKv2, but at the beginning of 2023, we <a href="https://github.com/cloudflare/terraform-provider-cloudflare/pull/2170"><u>took the plunge to multiplex</u></a> and offer both in our provider. To understand why this was needed, we have to understand a little about SDKv2. The way SDKv2 is structured isn't really conducive to representing null or "unset" values consistently and reliably. You can use the <a href="https://pkg.go.dev/github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema#ResourceData.GetRawConfig"><u>experimental ResourceData.GetRawConfig</u></a> to check whether the value is set, null, or unknown in the config, but writing it back as null isn't really supported.</p><p>This caveat first popped up for us when the Edge Rules Engine (Rulesets) started onboarding new services and those services needed to support API responses that contained booleans in an unset (or missing), <code>true</code>, or <code>false</code> state each with their own reasoning and purpose. While this isn’t a conventional API design at Cloudflare, it is a valid way to do things that we should be able to work with. However, as mentioned above, the SDKv2 provider couldn't. This is because when a value isn't present in the response or read into state, it gets a Go-compatible zero value for the default. 
This showed up as the inability to unset values after they had been written to state as false (and vice versa).</p><p>The only reliable way to support all three states of those boolean values was to migrate to the <code>terraform-plugin-framework</code>, which has the <a href="https://github.com/hashicorp/terraform-plugin-framework/blob/main/types/bool_value.go"><u>correct implementation of writing back unset values</u></a>.</p><p>Once we started adding more functionality using <code>terraform-plugin-framework</code> in the old provider, it was clear that it was a better developer experience, so we <a href="https://github.com/cloudflare/terraform-provider-cloudflare/pull/2871"><u>added a ratchet</u></a> to prevent SDKv2 usage going forward, to get ahead of anyone unknowingly setting themselves up to hit this issue.</p><p>When we decided that we would be automatically generating the Terraform provider, it was only fitting that we also brought all the resources over to be based on the <code>terraform-plugin-framework</code> and leave the issues from SDKv2 behind for good. This did complicate the migration: with the improved internals came changes to major components, like the schema and <a href="https://en.wikipedia.org/wiki/Create,_read,_update_and_delete"><u>CRUD operations</u></a>, that we needed to familiarize ourselves with. However, it has been a worthwhile investment because by doing so, we’ve future-proofed the foundations of the provider and are now making fewer compromises on a great Terraform experience due to buggy, legacy internals.</p>
    <div>
      <h3>Iteratively finding bugs </h3>
      <a href="#iteratively-finding-bugs">
        
      </a>
    </div>
    <p>One of the common struggles with code generation pipelines is that unless you have existing tools that implement your new thing, it’s hard to know if it works or is reasonable to use. Sure, you can also generate your tests to exercise the new thing, but if there is a bug in the pipeline, you are very likely to not see it as a bug as you will be generating test assertions that show the bug is expected behavior.</p><p>One of the essential feedback loops we have had is the existing acceptance test suite. All resources within the existing provider had a mix of regression and functionality tests. Best of all, as the test suite is creating and managing real resources, it was very easy to know whether the outcome was a working implementation or not by looking at the HTTP traffic to see whether the API calls were accepted by the remote endpoints. Getting the test suite ported over was only a matter of copying over all the existing tests and checking for any type assertion differences (such as list to single nested list) before kicking off a test run to determine whether the resource was working correctly.</p><p>While the centralized schema pipeline was a huge quality of life improvement for having schema fixes propagate to the whole ecosystem almost instantly, it couldn’t help us solve the largest hurdle, which was surfacing bugs that hide other bugs. 
This was time-consuming because, when fixing a problem in Terraform, there are three places where you can hit an error:</p><ol><li><p>Before any API calls are made, Terraform performs logical schema validation, and when it encounters validation errors, it immediately halts.</p></li><li><p>If any API call fails, it stops at the CRUD operation and returns the diagnostics, immediately halting.</p></li><li><p>After the CRUD operation has run, Terraform checks that all values are known.</p></li></ol><p>That means that if we hit a bug at step 1 and fixed it, there was no guarantee that two more weren’t waiting for us at the later steps. Nor was there any guarantee that a fix for a bug found in step 2 wouldn’t surface a new bug back in step 1 on the next round of testing.</p><p>There is no silver bullet here, and our workaround was instead to notice patterns of problems in the schema behaviors and apply CI lint rules to the OpenAPI schemas before they got into the code generation pipeline. Taking this approach incrementally cut down the number of bugs in steps 1 and 2 until we were largely only dealing with the third class of error, in step 3.</p>
    <div>
      <h3>A more reusable approach to model and struct conversion </h3>
      <a href="#a-more-reusable-approach-to-model-and-struct-conversion">
        
      </a>
    </div>
    <p>Within Terraform provider CRUD operations, it is fairly common to see boilerplate like the following:</p>
            <pre><code>var plan ThingModel
diags := req.Plan.Get(ctx, &amp;plan)
resp.Diagnostics.Append(diags...)
if resp.Diagnostics.HasError() {
	return
}

out, err := r.client.UpdateThingModel(ctx, client.ThingModelRequest{
	AttrA: plan.AttrA.ValueString(),
	AttrB: plan.AttrB.ValueString(),
	AttrC: plan.AttrC.ValueString(),
})
if err != nil {
	resp.Diagnostics.AddError(
		"Error updating project Thing",
		"Could not update Thing, unexpected error: "+err.Error(),
	)
	return
}

result := convertResponseToThingModel(out)
tflog.Info(ctx, "created thing", map[string]interface{}{
	"attr_a": result.AttrA.ValueString(),
	"attr_b": result.AttrB.ValueString(),
	"attr_c": result.AttrC.ValueString(),
})

diags = resp.State.Set(ctx, result)
resp.Diagnostics.Append(diags...)
if resp.Diagnostics.HasError() {
	return
}</code></pre>
            <p>At a high level:</p><ul><li><p>We fetch the proposed updates (known as a plan) using <code>req.Plan.Get()</code></p></li><li><p>Perform the update API call with the new values</p></li><li><p>Manipulate the data from a Go type into a Terraform model (<code>convertResponseToThingModel</code>)</p></li><li><p>Set the state by calling <code>resp.State.Set()</code></p></li></ul><p>Initially, this doesn’t seem too problematic. However, the third step where we manipulate the Go type into the Terraform model quickly becomes cumbersome, error-prone, and complex because all of your resources need to do this in order to swap between the type and associated Terraform models.</p><p>To avoid generating more complex code than needed, one of the improvements featured in our provider is that all CRUD methods use unified <code>apijson.Marshal, apijson.Unmarshal</code>, and <code>apijson.UnmarshalComputed</code> methods that solve this problem by centralizing the conversion and handling logic based on the struct tags.</p>
            <pre><code>var data *ThingModel

resp.Diagnostics.Append(req.Plan.Get(ctx, &amp;data)...)
if resp.Diagnostics.HasError() {
	return
}

dataBytes, err := apijson.Marshal(data)
if err != nil {
	resp.Diagnostics.AddError("failed to serialize http request", err.Error())
	return
}
res := new(http.Response)
env := ThingResultEnvelope{*data}
_, err = r.client.Thing.Update(
	// ...
)
if err != nil {
	resp.Diagnostics.AddError("failed to make http request", err.Error())
	return
}

bytes, _ := io.ReadAll(res.Body)
err = apijson.UnmarshalComputed(bytes, &amp;env)
if err != nil {
	resp.Diagnostics.AddError("failed to deserialize http request", err.Error())
	return
}
data = &amp;env.Result

resp.Diagnostics.Append(resp.State.Set(ctx, &amp;data)...)</code></pre>
            <p>Instead of needing to generate hundreds of instances of type-to-model converter methods, we can instead decorate the Terraform model with the correct tags and handle marshaling and unmarshaling of the data consistently. It’s a minor change to the code that in the long run makes the generation more reusable and readable. As an added benefit, this approach is great for bug fixing as once you identify a bug with a particular type of field, fixing that in the unified interface fixes it for other occurrences you may not yet have found.</p>
    <div>
      <h2>But wait, there’s more (docs)!</h2>
      <a href="#but-wait-theres-more-docs">
        
      </a>
    </div>
    <p>To top off our OpenAPI schema usage, we’re tightening the SDK integration with our <a href="https://developers.cloudflare.com/api-next/"><u>new API documentation site</u></a>. It’s using the same pipeline we’ve invested in for the last two years while addressing some of the common usage issues.</p>
    <div>
      <h3>SDK aware </h3>
      <a href="#sdk-aware">
        
      </a>
    </div>
    <p>If you’ve used our API documentation site, you know we give you examples of interacting with the API using command line tools like curl. This is a great starting point, but if you’re using one of the SDK libraries, you need to do the mental gymnastics to convert it to the method or type definition you want to use. Now that we’re using the same pipeline to generate the SDKs <b>and</b> the documentation, we’re solving that by providing examples in all the libraries you <i>could</i> use — not just curl.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2SNCehksc30kXXQvVKYC47/a3a6071be64d006a2da9b2e615d143ae/image2.png" />
            
            </figure><p><sup><i>Example using cURL to fetch all zones.</i></sup></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/50PeyK8oOLb51mCLF4ikds/764db96a24232b611ec88d5ff8f8844f/image4.png" />
            
            </figure><p><sup><i>Example using the Typescript library to fetch all zones.</i></sup></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5rQn6OY3R1yi5iot1oxti4/09cf62ea46ede21d1541b5012497efdb/image5.png" />
            
            </figure><p><sup><i>Example using the Python library to fetch all zones.</i></sup></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Na9y9ta3fLBMEAvJK4uaH/41ecf061a5a088f4bdb313d70b173a9a/image7.png" />
            
            </figure><p><sup><i>Example using the Go library to fetch all zones.</i></sup></p><p>With this improvement, we also remember the language selection so if you’ve selected to view the documentation using our Typescript library and keep clicking around, we keep showing you examples using Typescript until it is swapped out.</p><p>Best of all, when we introduce new attributes to existing endpoints or add SDK languages, this documentation site is automatically kept in sync with the pipeline. It is no longer a huge effort to keep it all up to date.</p>
    <div>
      <h3>Faster and more efficient rendering</h3>
      <a href="#faster-and-more-efficient-rendering">
        
      </a>
    </div>
    <p>A problem we’ve always struggled with is the sheer number of API endpoints and how to represent them. As of this post, we have 1,330 endpoints, and each of those endpoints has a request payload, a response payload, and multiple types associated with it. When it comes to rendering this much information, the solutions we’ve used in the past have had to make tradeoffs to make parts of the representation work.</p><p>This next iteration of the API documentation site addresses this in a couple of ways:</p><ul><li><p>It’s implemented as a modern React application that pairs an interactive client-side experience with static pre-rendered content, resulting in a quick initial load and fast navigation. (Yes, it even works without JavaScript enabled!)</p></li><li><p>It fetches the underlying data incrementally as you navigate.</p></li></ul><p>By solving this foundational issue, we’ve unlocked other planned improvements to the documentation site and SDK ecosystem to improve the user experience without making tradeoffs like we’ve needed to in the past. </p>
    <div>
      <h3>Permissions</h3>
      <a href="#permissions">
        
      </a>
    </div>
    <p>One of the most requested features to be re-implemented into the documentation site has been minimum required permissions for API endpoints. One of the previous iterations of the documentation site had this available. However, unknown to most who used it, the values were manually maintained and were regularly incorrect, causing support tickets to be raised and frustration for users.</p><p>Inside Cloudflare's identity and access management system, the question “what do I need to access this endpoint?” isn’t a simple one to answer. The reason for this is that in the normal flow of a request to the control plane, two different systems each provide part of the answer, which must then be combined to give you the full picture. As we couldn’t initially automate this as part of the OpenAPI pipeline, we opted to leave it out instead of having it be incorrect with no way of verifying it.</p><p>Fast-forward to today, and we’re excited to say endpoint permissions are back! We built some new tooling that abstracts answering this question in a way that we can integrate into our code generation pipeline, so all endpoints automatically get this information. Much like the rest of the code generation platform, it is focused on having service teams own and maintain high quality schemas that can be reused, with value adds introduced without any extra work on their part.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/641gSS5MLQpCvEANYXcVK6/447cf0b873ecb60fdbbc415df0424363/image3.png" />
            
            </figure>
    <div>
      <h2>Stop waiting for updates</h2>
      <a href="#stop-waiting-for-updates">
        
      </a>
    </div>
    <p>With these announcements, we’re putting an end to waiting for updates to land in the SDK ecosystem. These new improvements allow us to ship new attributes and endpoints the moment teams document them. So what are you waiting for? Check out the <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/5.0.0-alpha1"><u>Terraform provider</u></a> and <a href="https://developers.cloudflare.com/api-next/"><u>API documentation site</u></a> today.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[SDK]]></category>
            <category><![CDATA[Terraform]]></category>
            <category><![CDATA[Open API]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">1M8zVthnUiMpJpGylQuptu</guid>
            <dc:creator>Jacob Bednarz</dc:creator>
        </item>
        <item>
            <title><![CDATA[Making zone management more efficient with batch DNS record updates]]></title>
            <link>https://blog.cloudflare.com/batched-dns-changes/</link>
            <pubDate>Mon, 23 Sep 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ In response to customer demand, we now support the ability to DELETE, PATCH, PUT and POST multiple DNS records in a single API call, enabling more efficient and reliable zone management.
 ]]></description>
            <content:encoded><![CDATA[ <p>Customers that use Cloudflare to manage their DNS often need to create a whole batch of records, enable <a href="https://developers.cloudflare.com/dns/manage-dns-records/reference/proxied-dns-records/"><u>proxying</u></a> on many records, update many records to point to a new target at the same time, or even delete all of their records. Historically, customers had to resort to bespoke scripts to make these changes, which came with their own set of issues. In response to customer demand, we are excited to announce support for batched API calls to the <a href="https://developers.cloudflare.com/dns/manage-dns-records/how-to/create-dns-records/"><u>DNS records API</u></a> starting today. This lets customers make large changes to their zones much more efficiently than before. Users can now combine all four <a href="https://en.wikipedia.org/wiki/HTTP#Request_methods"><u>HTTP methods</u></a> (POST, PUT, PATCH, and DELETE), and what would otherwise be many separate HTTP requests, in a single API call.</p>
    <div>
      <h2>Efficient zone management matters</h2>
      <a href="#efficient-zone-management-matters">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/en-gb/learning/dns/dns-records/"><u>DNS records</u></a> are an essential part of most web applications and websites, and they serve many different purposes. The most common use case for a DNS record is to have a hostname point to an <a href="https://en.wikipedia.org/wiki/IPv4"><u>IPv4</u></a> address; this is called an <a href="https://www.cloudflare.com/en-gb/learning/dns/dns-records/dns-a-record/"><u>A record</u></a>:</p><p><b>example.com</b> 59 IN A <b>198.51.100.0</b></p><p><b>blog.example.com</b> 59 IN A <b>198.51.100.1</b></p><p><b>ask.example.com</b> 59 IN A <b>198.51.100.2</b></p><p>In its simplest form, this enables Internet users to connect to websites without needing to memorize their IP address. </p><p>Often, our customers need to be able to do things like create a whole batch of records, or enable <a href="https://developers.cloudflare.com/dns/manage-dns-records/reference/proxied-dns-records/"><u>proxying</u></a> on many records, or update many records to point to a new target at the same time, or even delete all of their records. Unfortunately, for most of these cases, we were asking customers to write their own custom scripts or programs to do these tasks for them, a number of which are open sourced and whose content has not been checked by us. These scripts are often used to avoid needing to repeatedly make the same API calls manually. This takes time, not only for the development of the scripts, but also to simply execute all the API calls, not to mention it can leave the zone in a bad state if some changes fail while others succeed.</p>
    <div>
      <h2>Introducing /batch</h2>
      <a href="#introducing-batch">
        
      </a>
    </div>
    <p>Starting today, everyone with a <a href="https://developers.cloudflare.com/dns/zone-setups/"><u>Cloudflare zone</u></a> will have access to this endpoint, with free tier customers getting access to 200 changes in one batch, and paid plans getting access to 3,500 changes in one batch. We have successfully tested up to 100,000 changes in one call. The API is simple, expecting a POST request to be made to the <a href="https://developers.cloudflare.com/api/operations/dns-records-for-a-zone-batch-dns-records"><u>new API endpoint</u></a> /dns_records/batch, which passes in a JSON object in the body in the format:</p>
            <pre><code>{
    deletes:[]Record
    patches:[]Record
    puts:[]Record
    posts:[]Record
}
</code></pre>
            <p>Each list of records []Record will follow the same requirements as the regular API, except that the record ID on deletes, patches, and puts will be required within the Record object itself. Here is a simple example:</p>
            <pre><code>{
    "deletes": [
        {
            "id": "143004ef463b464a504bde5a5be9f94a"
        },
        {
            "id": "165e9ef6f325460c9ca0eca6170a7a23"
        }
    ],
    "patches": [
        {
            "id": "16ac0161141a4e62a79c50e0341de5c6",
            "content": "192.0.2.45"
        },
        {
            "id": "6c929ea329514731bcd8384dd05e3a55",
            "name": "update.example.com",
            "proxied": true
        }
    ],
    "puts": [
        {
            "id": "ee93eec55e9e45f4ae3cb6941ffd6064",
            "content": "192.0.2.50",
            "name": "no-change.example.com",
            "proxied": false,
            "ttl": 1
        },
        {
            "id": "eab237b5a67e41319159660bc6cfd80b",
            "content": "192.0.2.45",
            "name": "no-change.example.com",
            "proxied": false,
            "ttl": 3000
        }
    ],
    "posts": [
        {
            "name": "@",
            "type": "A",
            "content": "192.0.2.45",
            "proxied": false,
            "ttl": 3000
        },
        {
            "name": "a.example.com",
            "type": "A",
            "content": "192.0.2.45",
            "proxied": true
        }
    ]
}</code></pre>
            <p>Our API will then parse this and execute these calls in the following order: </p><ol><li><p>deletes</p></li><li><p>patches</p></li><li><p>puts</p></li><li><p>posts</p></li></ol><p>Each of these respective lists will be executed in the order given. This ordering system is important because it removes the need for our clients to worry about conflicts, such as if they need to create a CNAME on the same hostname as a to-be-deleted A record, which is not allowed in <a href="https://datatracker.ietf.org/doc/html/rfc1912#section-2.4"><u>RFC 1912</u></a>. In the event that any of these individual actions fail, the entire API call will fail and return the first error it sees. The batch request will also be executed inside a single database <a href="https://en.wikipedia.org/wiki/Database_transaction"><u>transaction</u></a>, which will roll back in the event of failure.</p><p>After the batch request has been successfully executed in our database, we then propagate the changes to our edge via <a href="https://blog.cloudflare.com/introducing-quicksilver-configuration-distribution-at-internet-scale"><u>Quicksilver</u></a>, our distributed KV store. Each of the individual record changes inside the batch request is treated as a single key-value pair, and database transactions are not supported. As such, <b>we cannot guarantee that the propagation to our edge servers will be atomic</b>. For example, if replacing a <a href="https://developers.cloudflare.com/dns/manage-dns-records/how-to/subdomains-outside-cloudflare/"><u>delegation</u></a> with an A record, some resolvers may see the <a href="https://www.cloudflare.com/en-gb/learning/dns/dns-records/dns-ns-record/"><u>NS</u></a> record removed before the A record is added. </p><p>The response will follow the same format as the request. 
Patches and puts that result in no changes will be placed at the end of their respective lists.</p><p>We are also introducing some new changes to the Cloudflare dashboard, allowing users to select multiple records and subsequently:</p><ol><li><p>Delete all selected records</p></li><li><p>Change the proxy status of all selected records</p></li></ol>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1ZU7nvMlcH2L51IqJrS1zC/db7ac600e503a72bb0c25679d63394e7/BLOG-2495_2.png" />
          </figure><p>We plan to continue improving the dashboard to support more batch actions based on your feedback.</p>
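<p>For illustration, here is a minimal Python sketch of sending a batch from a script. The payload shape follows the example above; the zone ID, record IDs, and token are placeholders:</p>

```python
import json
import urllib.request

def batch_dns_records(zone_id: str, api_token: str, payload: dict) -> dict:
    """POST one batch of DNS record changes in a single API call.
    The API applies deletes first, then patches, puts, and posts."""
    req = urllib.request.Request(
        f"https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records/batch",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example payload: delete one record and create one in the same atomic call.
payload = {
    "deletes": [{"id": "143004ef463b464a504bde5a5be9f94a"}],
    "posts": [{"name": "a.example.com", "type": "A",
               "content": "192.0.2.45", "proxied": True}],
}
```

Because the whole batch succeeds or fails together, a script like this no longer needs its own rollback logic for partially applied changes.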
    <div>
      <h2>The journey</h2>
      <a href="#the-journey">
        
      </a>
    </div>
    <p>Although on the surface this batch endpoint may seem like a fairly simple change, behind the scenes it is the culmination of a multi-year, multi-team effort. Over the past several years, we have been working hard to improve the DNS pipeline that takes our customers' records and pushes them to <a href="https://blog.cloudflare.com/introducing-quicksilver-configuration-distribution-at-internet-scale"><u>Quicksilver</u></a>, our distributed database. As part of this effort, we have been improving our <a href="https://developers.cloudflare.com/api/operations/dns-records-for-a-zone-list-dns-records"><u>DNS Records API</u></a> to reduce the overall latency. The DNS Records API is Cloudflare's most used API externally, serving twice as many requests as any other API at peak. In addition, the DNS Records API supports over 20 internal services, many of which touch very important areas such as DNSSEC, TLS, Email, Tunnels, Workers, Spectrum, and R2 storage. Therefore, it was important to build something that scales. </p><p>To improve API performance, we first needed to understand the complexities of the entire stack. At Cloudflare, we use <a href="https://www.jaegertracing.io/"><u>Jaeger tracing</u></a> to debug our systems. It gives us granular insights into a sample of requests that are coming into our APIs. When looking at API request latency, the <a href="https://www.jaegertracing.io/docs/1.23/architecture/#span"><u>span</u></a> that stood out was the time spent on each individual database lookup. The latency here can vary anywhere from ~1ms to ~5ms. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/61f3sKGUs9oWMPT9P4au6R/a91d8291b626f4bab3ac1c69adf62a5d/BLOG-2495_3.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3L3OaTb9cTKKKcIjCm1RLq/86ffd63116988025fd52105e316c5b5a/BLOG-2495_4.png" />
          </figure><p><sub><i>Jaeger trace showing variable database latency</i></sub></p><p>Given this variability in database query latency, we wanted to understand exactly what was going on within each DNS Records API request. When we first started on this journey, the breakdown of database lookups for each action was as follows:</p><table><tr><th><p><b>Action</b></p></th><th><p><b>Database Queries</b></p></th><th><p><b>Reason</b></p></th></tr><tr><td><p>POST</p></td><td><p>2 </p></td><td><p>One to write and one to read the new record.</p></td></tr><tr><td><p>PUT</p></td><td><p>3</p></td><td><p>One to collect, one to write, and one to read back the new record.</p></td></tr><tr><td><p>PATCH</p></td><td><p>3</p></td><td><p>One to collect, one to write, and one to read back the new record.</p></td></tr><tr><td><p>DELETE</p></td><td><p>2</p></td><td><p>One to read and one to delete.</p></td></tr></table><p>The reason we needed to read the newly created records on POST, PUT, and PATCH was because the record contains information filled in by the database which we cannot infer in the API. </p><p>Let’s imagine that a customer needed to edit 1,000 records. If each database lookup took 3ms to complete, that was 3ms * 3 lookups * 1,000 records = 9 seconds spent on database queries alone, not taking into account the round trip time to and from our API or any other processing latency. It’s clear that we needed to reduce the number of overall queries and ideally minimize per query latency variation. Let’s tackle the variation in latency first.</p><p>Each of these calls is not a simple INSERT, UPDATE, or DELETE, because we have functions wrapping these database calls for sanitization purposes. In order to understand the variable latency, we enlisted the help of <a href="https://www.postgresql.org/docs/current/auto-explain.html"><u>PostgreSQL’s “auto_explain”</u></a>. 
This module gives a breakdown of execution times for each statement without needing to EXPLAIN each one by hand. We used the following settings:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2myvmIREh2Q9yl30HbRus/29f085d40ba7dde34e9a46c27e3c6ba2/BLOG-2495_5.png" />
          </figure><p>A handful of queries showed durations like the one below, which took an order of magnitude longer than other queries.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/557xg66x8OiHM6pcAG4svk/56157cd0e5b6d7fd47f0152798598729/BLOG-2495_6.png" />
          </figure><p>We noticed that in several locations we were doing queries like:</p><p><code>IF (EXISTS (SELECT id FROM table WHERE row_hash = __new_row_hash))</code></p><p>If you are trying to insert into very large zones, such queries could mean even longer database query times, potentially explaining the discrepancy between 1ms and 5ms in our tracing images above. Upon further investigation, we found that we already had a unique index on that exact hash. <a href="https://www.postgresql.org/docs/current/indexes-unique.html"><u>Unique indexes</u></a> in PostgreSQL enforce the uniqueness of one or more column values, which means we can safely remove those existence checks without risk of inserting duplicate rows.</p><p>The next task was to introduce database batching into our DNS Records API. In any API, external calls such as SQL queries are going to add substantial latency to the request. Database batching allows the DNS Records API to execute multiple SQL queries within a single network call, subsequently lowering the number of database round trips our system needs to make. </p><p>According to the table above, each database write was also followed by a read once the query completed. This was needed to collect information like creation/modification timestamps and new IDs. To improve this, we tweaked our database functions to return the newly created DNS record itself, removing a full round trip to the database. 
Here is the updated table:</p><table><tr><th><p><b>Action</b></p></th><th><p><b>Database Queries</b></p></th><th><p><b>Reason</b></p></th></tr><tr><td><p>POST</p></td><td><p>1</p></td><td><p>One to write.</p></td></tr><tr><td><p>PUT</p></td><td><p>2</p></td><td><p>One to read, one to write.</p></td></tr><tr><td><p>PATCH</p></td><td><p>2</p></td><td><p>One to read, one to write.</p></td></tr><tr><td><p>DELETE</p></td><td><p>2</p></td><td><p>One to read, one to delete.</p></td></tr></table><p>We have room for improvement here; however, we cannot easily reduce this further due to some restrictions around auditing and other sanitization logic.</p><p><b>Results:</b></p><table><tr><th><p><b>Action</b></p></th><th><p><b>Average database time before</b></p></th><th><p><b>Average database time after</b></p></th><th><p><b>Percentage Decrease</b></p></th></tr><tr><td><p>POST</p></td><td><p>3.38ms</p></td><td><p>0.967ms</p></td><td><p>71.4%</p></td></tr><tr><td><p>PUT</p></td><td><p>4.47ms</p></td><td><p>2.31ms</p></td><td><p>48.4%</p></td></tr><tr><td><p>PATCH</p></td><td><p>4.41ms</p></td><td><p>2.24ms</p></td><td><p>49.3%</p></td></tr><tr><td><p>DELETE</p></td><td><p>1.21ms</p></td><td><p>1.21ms</p></td><td><p>0%</p></td></tr></table><p>These are some pretty good improvements! Not only did we reduce the API latency, we also reduced the database query load, benefiting other systems as well.</p>
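<p>The change that removed the read-back is standard SQL: have the write itself return the database-generated fields via a <code>RETURNING</code> clause. A minimal sketch of the idea, using an in-memory SQLite database (3.35+ supports <code>RETURNING</code>) and an invented table in place of our PostgreSQL schema:</p>

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the table and columns are invented.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dns_records (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        content TEXT NOT NULL,
        created_on TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")

# One round trip: the INSERT itself hands back the DB-generated fields,
# so no follow-up SELECT is needed to learn the new id and timestamp.
row = conn.execute(
    "INSERT INTO dns_records (name, content) VALUES (?, ?) "
    "RETURNING id, created_on",
    ("a.example.com", "192.0.2.45"),
).fetchone()
record_id, created_on = row
```

Without <code>RETURNING</code>, the same information would cost a second query per write, which is exactly the extra round trip the table above shows being eliminated for POST.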
    <div>
      <h2>Weren’t we talking about batching?</h2>
      <a href="#werent-we-talking-about-batching">
        
      </a>
    </div>
    <p>I previously mentioned that the /batch endpoint is fully atomic, making use of a single database transaction. However, a single transaction may still require multiple database network calls, and from the table above, that can add up to a significant amount of time when dealing with large batches. To optimize this, we are making use of <a href="https://pkg.go.dev/github.com/jackc/pgx/v4#Batch"><u>pgx/batch</u></a>, a Go type that allows us to write and subsequently read multiple queries in a single network call. Here is a high-level view of how the batch endpoint works:</p><ol><li><p>Collect all the records for the PUTs, PATCHes and DELETEs.</p></li><li><p>Apply any per-record differences as requested by the PATCHes and PUTs.</p></li><li><p>Format the batch SQL query to include each of the actions.</p></li><li><p>Execute the batch SQL query in the database.</p></li><li><p>Parse each database response and return any errors if needed.</p></li><li><p>Audit each change.</p></li></ol><p>This takes at most two database calls per batch: one to fetch, and one to write/delete. If the batch contains only POSTs, this is further reduced to a single database call. Given all of this, we should expect to see a significant improvement in latency when making multiple changes, which we do when observing how these various endpoints perform: </p><p><i>Note: Each of these queries was run from multiple locations around the world and the median of the response times is shown here. The server responding to queries is located in Portland, Oregon, United States. Latencies are subject to change depending on geographical location.</i></p><p><b>Create only:</b></p><table><tr><th><p>
</p></th><th><p><b>10 Records</b></p></th><th><p><b>100 Records</b></p></th><th><p><b>1,000 Records</b></p></th><th><p><b>10,000 Records</b></p></th></tr><tr><td><p><b>Regular API</b></p></td><td><p>7.55s</p></td><td><p>74.23s</p></td><td><p>757.32s</p></td><td><p>7,877.14s</p></td></tr><tr><td><p><b>Batch API - Without database batching</b></p></td><td><p>0.85s</p></td><td><p>1.47s</p></td><td><p>4.32s</p></td><td><p>16.58s</p></td></tr><tr><td><p><b>Batch API - with database batching</b></p></td><td><p>0.67s</p></td><td><p>1.21s</p></td><td><p>3.09s</p></td><td><p>10.33s</p></td></tr></table><p><b>Delete only:</b></p><table><tr><th><p>
</p></th><th><p><b>10 Records</b></p></th><th><p><b>100 Records</b></p></th><th><p><b>1,000 Records</b></p></th><th><p><b>10,000 Records</b></p></th></tr><tr><td><p><b>Regular API</b></p></td><td><p>7.28s</p></td><td><p>67.35s</p></td><td><p>658.11s</p></td><td><p>7,471.30s</p></td></tr><tr><td><p><b>Batch API - without database batching</b></p></td><td><p>0.79s</p></td><td><p>1.32s</p></td><td><p>3.18s</p></td><td><p>17.49s</p></td></tr><tr><td><p><b>Batch API - with database batching</b></p></td><td><p>0.66s</p></td><td><p>0.78s</p></td><td><p>1.68s</p></td><td><p>7.73s</p></td></tr></table><p><b>Create/Update/Delete:</b></p><table><tr><th><p>
</p></th><th><p><b>10 Records</b></p></th><th><p><b>100 Records</b></p></th><th><p><b>1,000 Records</b></p></th><th><p><b>10,000 Records</b></p></th></tr><tr><td><p><b>Regular API</b></p></td><td><p>7.11s</p></td><td><p>72.41s</p></td><td><p>715.36s</p></td><td><p>7,298.17s</p></td></tr><tr><td><p><b>Batch API - without database batching</b></p></td><td><p>0.79s</p></td><td><p>1.36s</p></td><td><p>3.05s</p></td><td><p>18.27s</p></td></tr><tr><td><p><b>Batch API - with database batching</b></p></td><td><p>0.74s</p></td><td><p>1.06s</p></td><td><p>2.17s</p></td><td><p>8.48s</p></td></tr></table><p><b>Overall Average:</b></p><table><tr><th><p>
</p></th><th><p><b>10 Records</b></p></th><th><p><b>100 Records</b></p></th><th><p><b>1,000 Records</b></p></th><th><p><b>10,000 Records</b></p></th></tr><tr><td><p><b>Regular API</b></p></td><td><p>7.31s</p></td><td><p>71.33s</p></td><td><p>710.26s</p></td><td><p>7,548.87s</p></td></tr><tr><td><p><b>Batch API - without database batching</b></p></td><td><p>0.81s</p></td><td><p>1.38s</p></td><td><p>3.51s</p></td><td><p>17.44s</p></td></tr><tr><td><p><b>Batch API - with database batching</b></p></td><td><p>0.69s</p></td><td><p>1.02s</p></td><td><p>2.31s</p></td><td><p>8.85s</p></td></tr></table><p>We can see that on average, the new batching API is significantly faster than the regular API trying to do the same actions, and it’s also nearly twice as fast as the batching API without batched database calls. We can see that at 10,000 records, the batching API is a staggering 850x faster than the regular API. As mentioned above, these numbers are likely to change for a number of different reasons, but it’s clear that making several round trips to and from the API adds substantial latency, regardless of the region.</p>
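<p>Conceptually, the database batching behind these numbers is a queue-then-flush pattern: statements accumulate locally and are sent in one network round trip. The following is an illustrative Python sketch of that idea only; Cloudflare's implementation is in Go using pgx, and the executor here is a stand-in for a single pipelined driver call:</p>

```python
from typing import Any, Callable, List, Tuple

Statement = Tuple[str, Tuple[Any, ...]]

class QueryBatch:
    """Queue parameterized statements, then send them all in one round trip.
    An illustrative sketch of the pattern Go's pgx.Batch provides."""

    def __init__(self) -> None:
        self._queued: List[Statement] = []

    def queue(self, sql: str, *params: Any) -> None:
        self._queued.append((sql, params))

    def send(self, execute_all: Callable[[List[Statement]], List[Any]]) -> List[Any]:
        # A real executor would push every queued statement to the database
        # in a single network call and read all responses back; it is
        # injected here so the sketch stays self-contained and runnable.
        results = execute_all(self._queued)
        self._queued.clear()
        return results

# Stand-in executor that "executes" each queued statement in order.
def fake_executor(stmts):
    return ["ok: " + sql for sql, _ in stmts]

batch = QueryBatch()
batch.queue("DELETE FROM dns_records WHERE id = $1", "abc123")
batch.queue("INSERT INTO dns_records (name) VALUES ($1)", "a.example.com")
results = batch.send(fake_executor)
```

The win is that latency scales with one round trip per batch instead of one per statement, which is why the "with database batching" rows above pull further ahead as the batch size grows.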
    <div>
      <h2>Batch overload</h2>
      <a href="#batch-overload">
        
      </a>
    </div>
    <p>Making our API faster is awesome, but we don’t operate in an isolated environment. Each of these records needs to be processed and pushed to <a href="https://blog.cloudflare.com/introducing-quicksilver-configuration-distribution-at-internet-scale"><u>Quicksilver</u></a>, our distributed database. If we have customers creating tens of thousands of records every 10 seconds, we need to be able to handle this downstream so that we don’t overwhelm our system. In a May 2022 blog post titled <a href="https://blog.cloudflare.com/dns-build-improvement"><i><u>How we improved DNS record build speed by more than 4,000x</u></i></a>, I noted<i> </i>that:</p><blockquote><p><i>We plan to introduce a batching system that will collect record changes into groups to minimize the number of queries we make to our database and Quicksilver.</i></p></blockquote><p>This task has since been completed, and our propagation pipeline is now able to batch thousands of record changes into a single database query which can then be published to Quicksilver in order to be propagated to our global network. </p>
    <div>
      <h2>Next steps</h2>
      <a href="#next-steps">
        
      </a>
    </div>
    <p>We have a couple more improvements we may want to bring into the API. We also intend to bring more usability improvements to the dashboard, making it easier to manage zones. <a href="https://research.rallyuxr.com/cloudflare/lp/cm0zu2ma7017j1al98l1m8a7n?channel=share&amp;studyId=cm0zu2ma4017h1al9byak79iw"><u>We would love to hear your feedback</u></a>, so please let us know what you think and if you have any suggestions for improvements.</p><p>For more details on how to use the new /batch API endpoint, head over to our <a href="https://developers.cloudflare.com/dns/manage-dns-records/how-to/batch-record-changes/"><u>developer documentation</u></a> and <a href="https://developers.cloudflare.com/api/operations/dns-records-for-a-zone-batch-dns-records"><u>API reference</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Kafka]]></category>
            <category><![CDATA[Database]]></category>
            <guid isPermaLink="false">op0CI3wllMcGjptdRb2Ce</guid>
            <dc:creator>Alex Fattouche</dc:creator>
        </item>
        <item>
            <title><![CDATA[Application Security report: 2024 update]]></title>
            <link>https://blog.cloudflare.com/application-security-report-2024-update/</link>
            <pubDate>Thu, 11 Jul 2024 17:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare’s updated 2024 view on Internet cyber security trends spanning global traffic insights, bot traffic insights, API traffic insights, and client-side risks ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/QISWKhi85GFq3Aqcj9NSX/a983a69c4df14e83712ecc0beb71117a/AD_4nXftYZ9tWp6nRYAEltNHH2LVZZDWKRMZn4Y8oTwdLKuFY-wcPHiULhXzJouGXdjVVDpCeR9T63J_cCxqSzKoq4QsgeXVxQ7MmkL5GS0muw5jhWFRr1fhfpVoH314" />
            
            </figure><p>Over the last twelve months, the Internet security landscape has changed dramatically. Geopolitical uncertainty, coupled with an active 2024 voting season in many countries across the world, has led to a substantial increase in malicious traffic activity across the Internet. In this report, we take a look at Cloudflare’s perspective on Internet application security.</p><p>This report is the fourth edition of our Application Security Report and is an official update to our <a href="/application-security-report-q2-2023">Q2 2023 report</a>. New in this report is a section focused on client-side security within the context of web applications.</p><p>Throughout the report we discuss various insights. From a global standpoint, mitigated traffic across the whole network now averages 7%, and WAF and Bot mitigations are the source of over half of that. While DDoS attacks remain the number one attack vector used against web applications, targeted CVE attacks are also worth keeping an eye on, as we have seen exploits as fast as 22 minutes after a proof of concept was released.</p><p>Focusing on bots, about a third of all traffic we observe is automated, and of that, the vast majority (93%) is not generated by bots in Cloudflare’s verified list and is potentially malicious.</p><p>API traffic is also still growing, now accounting for 60% of all traffic, and, perhaps more concerning, organizations have up to a quarter of their API endpoints unaccounted for.</p><p>We also touch on client-side security and the proliferation of third-party integrations in web applications. On average, enterprise sites integrate 47 third-party endpoints according to Page Shield data.</p><p>It is also worth mentioning that since the last report, our network, from which we gather the data and insights, is bigger and faster: we are now processing an average of 57 million HTTP requests/second (<b>+23.9%</b> YoY) and 77 million at peak (<b>+22.2%</b> YoY). 
From a DNS perspective, we are handling 35 million DNS queries per second (<b>+40%</b> YoY). This is the sum of authoritative and resolver requests served by our infrastructure.</p><p>Maybe even more noteworthy: focusing on HTTP requests only, in Q1 2024 Cloudflare blocked an average of 209 billion cyber threats each day (<b>+86.6%</b> YoY). That is a substantial increase in relative terms compared to the same time last year.</p><p>As usual, before we dive in, we need to define our terms.</p>
    <div>
      <h2>Definitions</h2>
      <a href="#definitions">
        
      </a>
    </div>
    <p>Throughout this report, we will refer to the following terms:</p><ul><li><p><b>Mitigated traffic:</b> any eyeball HTTP* request that had a “terminating” action applied to it by the Cloudflare platform. These include the following actions: <code>BLOCK</code>, <a href="https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/#legacy-captcha-challenge"><code>CHALLENGE</code></a>, <a href="https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/#js-challenge"><code>JS_CHALLENGE</code></a> and <a href="https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/#managed-challenge-recommended"><code>MANAGED_CHALLENGE</code></a>. This does not include requests that had the following actions applied: <code>LOG</code>, <code>SKIP</code>, <code>ALLOW</code>. They also accounted for a relatively small percentage of requests. Additionally, we improved our calculation regarding the <code>CHALLENGE</code> type actions to ensure that only unsolved challenges are counted as mitigated. A detailed <a href="https://developers.cloudflare.com/ruleset-engine/rules-language/actions/">description of actions</a> can be found in our developer documentation. This has not changed from last year’s report.</p></li><li><p><b>Bot traffic/automated traffic</b>: any HTTP* request identified by Cloudflare’s <a href="https://www.cloudflare.com/products/bot-management/">Bot Management</a> system as being generated by a bot. This includes requests with a <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">bot score</a> between <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">1 and 29</a> inclusive. This has not changed from last year’s report.</p></li><li><p><b>API traffic</b>: any HTTP* request with a response content type of XML or JSON. 
Where the response content type is not available, such as for mitigated requests, the equivalent Accept content type (specified by the user agent) is used instead. In this latter case, API traffic won’t be fully accounted for, but it still provides a good representation for the purposes of gaining insights. This has not changed from last year’s report.</p></li></ul><p>Unless otherwise stated, the time frame evaluated in this post is the period from April 1, 2023, through March 31, 2024, inclusive.</p><p>Finally, please note that the data is calculated based only on traffic observed across the Cloudflare network and does not necessarily represent overall HTTP traffic patterns across the Internet.</p><p><sup>*When referring to HTTP traffic we mean both HTTP and HTTPS.</sup></p>
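<p>Taken literally, the three definitions above reduce to simple predicates. The following Python sketch is a simplified transcription only; the real calculations (exact content-type matching and the unsolved-challenge refinement) are more involved:</p>

```python
from typing import Optional

TERMINATING_ACTIONS = {"BLOCK", "CHALLENGE", "JS_CHALLENGE", "MANAGED_CHALLENGE"}

def is_mitigated(action: str) -> bool:
    """Terminating actions count as mitigated; LOG, SKIP, and ALLOW do not.
    (The real calculation also excludes solved challenges.)"""
    return action in TERMINATING_ACTIONS

def is_automated(bot_score: int) -> bool:
    """Bot scores from 1 to 29 inclusive are treated as bot traffic."""
    return 1 <= bot_score <= 29

def is_api_traffic(response_type: Optional[str], accept: Optional[str]) -> bool:
    """XML or JSON response content type marks API traffic; when the response
    type is unavailable (e.g. mitigated requests), fall back to the request's
    Accept header."""
    content_type = response_type if response_type is not None else accept
    return content_type in {"application/json", "application/xml", "text/xml"}
```

<p>Every chart that follows is a ratio of requests matching one of these predicates to total HTTP(S) requests observed.</p>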
    <div>
      <h2>Global traffic insights</h2>
      <a href="#global-traffic-insights">
        
      </a>
    </div>
    
    <div>
      <h3>Average mitigated daily traffic increases to nearly 7%</h3>
      <a href="#average-mitigated-daily-traffic-increases-to-nearly-7">
        
      </a>
    </div>
    <p>Between Q2 2023 and Q1 2024, Cloudflare mitigated a higher percentage of application layer traffic, including layer 7 (L7) DDoS attacks, than in the prior 12-month period: the daily average grew from 6% to 6.8%.</p><p><b>Figure 1:</b> Percent of mitigated HTTP traffic increasing over the last 12 months</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5HrbJsLZMv12tBdVLAJEwk/56519fe3c06a1996324ba7a0e710fe5e/unnamed-6.png" />
            
            </figure><p>During large global attack events, we observe spikes of mitigated traffic approaching 12% of all HTTP traffic, far larger than any we had previously seen across our entire network.</p>
    <div>
      <h3>WAF and Bot mitigations accounted for 53.9% of all mitigated traffic</h3>
      <a href="#waf-and-bot-mitigations-accounted-for-53-9-of-all-mitigated-traffic">
        
      </a>
    </div>
    <p>As the Cloudflare platform continues to expose additional signals to identify potentially malicious traffic, customers have been actively using these signals in WAF Custom Rules to improve their security posture. Example signals include our <a href="https://developers.cloudflare.com/waf/about/waf-attack-score/">WAF Attack Score</a>, which identifies malicious payloads, and our <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">Bot Score</a>, which identifies automated traffic.</p><p>After WAF and Bot mitigations, HTTP DDoS rules are the second-largest contributor to mitigated traffic. IP reputation, which uses our <a href="https://developers.cloudflare.com/waf/tools/security-level/">IP threat score</a> to block traffic, and access rules, which are simply IP and country blocks, follow in third and fourth place.</p><p><b>Figure 2:</b> Mitigated traffic by Cloudflare product group</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/l9emHl05MUfrpqsLehyMr/51e8d8d327a5d78d90126de82bebcc38/unnamed--5--3.png" />
            
            </figure>
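As an illustration, a WAF Custom Rule combining these two signals might use an expression like the following. The <code>cf.waf.score</code>, <code>cf.bot_management.score</code>, and <code>cf.bot_management.verified_bot</code> fields exist in the Cloudflare Rules language, but the thresholds shown here are example values, not recommendations:

```
(cf.waf.score lt 20) or (cf.bot_management.score lt 30 and not cf.bot_management.verified_bot)
```

Lower scores indicate higher likelihood of attack (WAF Attack Score) or automation (Bot Score), so a rule like this blocks likely-malicious payloads plus unverified automated traffic in a single expression.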
    <div>
      <h3>CVEs exploited as fast as 22 minutes after proof-of-concept published</h3>
      <a href="#cves-exploited-as-fast-as-22-minutes-after-proof-of-concept-published">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/learning/security/threats/zero-day-exploit/">Zero-day exploits</a> (also called zero-day threats) are increasing, as is the speed of weaponization of disclosed CVEs. In 2023, 97 zero-days were <a href="https://cloud.google.com/blog/topics/threat-intelligence/2023-zero-day-trends">exploited in the wild</a>, alongside a 15% increase in disclosed <a href="https://www.cve.org/About/Overview">CVEs</a> between 2022 and 2023.</p><p>Looking at CVE exploitation attempts against customers, Cloudflare mostly observed scanning activity, followed by command injections, and some exploitation attempts of vulnerabilities that had PoCs available online, including Apache <a href="https://nvd.nist.gov/vuln/detail/CVE-2023-50164">CVE-2023-50164</a> and <a href="https://nvd.nist.gov/vuln/detail/cve-2022-33891">CVE-2022-33891</a>, ColdFusion <a href="https://nvd.nist.gov/vuln/detail/CVE-2023-29298">CVE-2023-29298</a>, <a href="https://nvd.nist.gov/vuln/detail/CVE-2023-38203">CVE-2023-38203</a>, and <a href="https://nvd.nist.gov/vuln/detail/CVE-2023-26360">CVE-2023-26360</a>, and MobileIron <a href="https://nvd.nist.gov/vuln/detail/CVE-2023-35082">CVE-2023-35082</a>.</p><p>This trend in CVE exploitation attempt activity indicates that attackers go for the easiest targets first, and are likely having success in some instances, given the continued activity around old vulnerabilities.</p><p>As just one example, Cloudflare observed exploitation attempts of <a href="https://nvd.nist.gov/vuln/detail/CVE-2024-27198">CVE-2024-27198</a> (JetBrains TeamCity authentication bypass) at 19:45 UTC on March 4, just 22 minutes after proof-of-concept code was published.</p><p><b>Figure 3:</b> JetBrains TeamCity authentication bypass timeline</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2193b87OL8QLpaE3hxF7RP/06b61f3bdcac2d4ce8364a4b408e35f4/image8-2.png" />
            
            </figure><p>Disclosed CVEs are often exploited faster than humans can write WAF rules or create and deploy patches to mitigate attacks. This also applies to our own internal security analyst team that maintains the WAF Managed Ruleset, which has led us to <a href="/detecting-zero-days-before-zero-day">combine human-written signatures with an ML-based approach</a> to achieve the best balance between low false positives and speed of response.</p><p>CVE exploitation campaigns from specific threat actors are clearly visible when we focus on a subset of CVE categories. For example, if we filter on CVEs that result in remote code execution (RCE), we see clear attempts to exploit Apache and Adobe installations towards the end of 2023 and the start of 2024, along with a notable campaign targeting Citrix in May of this year.</p><p><b>Figure 4:</b> Worldwide daily number of requests for Code Execution CVEs</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5A2f3Shrcp7rw6zmE9VoNa/90dd0ab1a3e9a80dddfc3c893bebb283/unnamed--1--4.png" />
            
            </figure><p>Similar patterns emerge when focusing on other CVEs or specific attack categories.</p>
    <div>
      <h3>DDoS attacks remain the most common attack against web applications</h3>
      <a href="#ddos-attacks-remain-the-most-common-attack-against-web-applications">
        
      </a>
    </div>
    <p>DDoS attacks remain the most common attack type against web applications, with DDoS comprising 37.1% of all mitigated application traffic over the time period considered.</p><p><b>Figure 5:</b> Volume of HTTP DDoS attacks over time</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3578XXGADnTHyYT6Kdf6ez/19dc83ad5c2b4989d0542f39fb755fa5/unnamed--6--3.png" />
            
            </figure><p>We saw a large increase in volumetric attacks in February and March 2024. This was partly the result of improved detections deployed by our teams, in addition to increased attack activity. In the first quarter of 2024 alone, Cloudflare’s automated defenses mitigated 4.5 million unique DDoS attacks, an amount equivalent to 32% of all the DDoS attacks Cloudflare mitigated in 2023. Specifically, application layer HTTP DDoS attacks increased by 93% year-over-year (YoY) and 51% quarter-over-quarter (QoQ).</p><p>Cloudflare correlates DDoS attack traffic and defines unique attacks by looking at event start and end times along with target destination.</p><p>Motives for launching DDoS attacks range from targeting specific organizations for financial gain (ransom), to testing the capacity of botnets, to targeting institutions and countries for political reasons. As an example, Cloudflare observed a 466% increase in DDoS attacks on Sweden after its acceptance into the NATO alliance on March 7, 2024. This mirrored the DDoS pattern observed during Finland’s NATO acceptance in 2023. The size of DDoS attacks themselves is also increasing.</p><p>In August 2023, Cloudflare mitigated a hyper-volumetric <a href="/zero-day-rapid-reset-http2-record-breaking-ddos-attack">HTTP/2 Rapid Reset</a> DDoS attack that peaked at 201 million requests per second (rps) – three times larger than any previously observed attack. In the attack, threat actors exploited a zero-day vulnerability in the HTTP/2 protocol that had the potential to incapacitate nearly any server or application supporting HTTP/2. This underscores how menacing DDoS vulnerabilities are for unprotected organizations.</p><p>Gaming and gambling became the sector most targeted by DDoS attacks, followed by Internet technology companies and cryptomining.</p><p><b>Figure 6:</b> Largest HTTP DDoS attacks as seen by Cloudflare, by year</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1FYxByBvHz2MQQ8WhKErWk/a7f757447c04820ea5a642838b3e5e10/image1.jpg" />
            
            </figure>
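A toy sketch of that correlation idea: treat mitigation events against the same destination whose time windows overlap (or touch) as one attack. This is a simplification for illustration, not Cloudflare's actual correlation logic.

```python
# Toy sketch: merge overlapping mitigation-event windows per destination
# into "unique attacks". Illustrative only; not Cloudflare's real algorithm.

from collections import defaultdict

def count_unique_attacks(events):
    """events: iterable of (destination, start, end) tuples; times are numbers."""
    by_dest = defaultdict(list)
    for dest, start, end in events:
        by_dest[dest].append((start, end))
    total = 0
    for windows in by_dest.values():
        windows.sort()  # sort each destination's events by start time
        merged_end = None
        for start, end in windows:
            if merged_end is None or start > merged_end:
                total += 1                       # gap before this event: a new attack
                merged_end = end
            else:
                merged_end = max(merged_end, end)  # overlap: extend the current attack
    return total

events = [
    ("example.com", 0, 10), ("example.com", 5, 20),  # overlapping: one attack
    ("example.com", 40, 50),                         # later window: a second attack
    ("example.net", 0, 10),                          # different target: a third
]
```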
    <div>
      <h2>Bot traffic insights</h2>
      <a href="#bot-traffic-insights">
        
      </a>
    </div>
    <p>Cloudflare has continued to invest heavily in our bot detection systems. In early July, we declared <a href="/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click">AIndependence</a> to help preserve a safe Internet for content creators, offering a brand new “easy button” to <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">block all AI bots</a>. It’s available for all customers, including those on our free tier.</p><p>Major progress has also been made in other complementary systems such as our Turnstile offering, a user-friendly, privacy-preserving alternative to CAPTCHA.</p><p>All these systems and technologies help us better identify and differentiate human traffic from automated bot traffic.</p>
    <div>
      <h3>On average, bots comprise one-third of all application traffic</h3>
      <a href="#on-average-bots-comprise-one-third-of-all-application-traffic">
        
      </a>
    </div>
    <p>31.2% of all application traffic processed by Cloudflare is bot traffic. This percentage has stayed relatively consistent (hovering at about 30%) over the past three years.</p><p>The term bot traffic may carry a negative connotation, but in reality bot traffic is not necessarily good or bad; it all depends on the purpose of the bots. Some are “good” and perform a needed service, such as customer service chatbots and authorized search engine crawlers. But some bots misuse an online product or service and need to be blocked.</p><p>Different application owners may have different criteria for what they deem a “bad” bot. For example, some organizations may want to <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">block a content scraping bot</a> that is being deployed by a competitor to undercut them on price, whereas an organization that does not sell products or services may not be as concerned with content scraping. Known, good bots are classified by Cloudflare as “verified bots.”</p>
    <div>
      <h3>93% of bots we identified were unverified bots, and potentially malicious</h3>
      <a href="#93-of-bots-we-identified-were-unverified-bots-and-potentially-malicious">
        
      </a>
    </div>
    <p>Unverified bots are often created for disruptive and harmful purposes, such as hoarding inventory, launching DDoS attacks, or attempting to take over an account via brute force or credential stuffing. Verified bots are those that are known to be safe, such as search engine crawlers, and Cloudflare aims to verify all major legitimate bot operators. <a href="https://radar.cloudflare.com/traffic/verified-bots">A list of all verified bots</a> can be found in our documentation.</p><p>Attackers leveraging bots focus most on industries that could bring them large financial gains. For example, consumer goods websites are often the target of inventory hoarding, price scraping run by competitors, or automated applications aimed at exploiting some form of arbitrage (for example, sneaker bots). This type of abuse can have a significant financial impact on the target organization.</p><p><b>Figure 8:</b> Industries with the highest median daily share of bot traffic</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/XIeyHN59gsaqxq0OQvCKV/4e23e2b081263ae09c6ca3de6aac2cdd/unnamed--7--3.png" />
            
            </figure>
    <div>
      <h2>API traffic insights</h2>
      <a href="#api-traffic-insights">
        
      </a>
    </div>
    <p>Consumers and end users expect dynamic web and mobile experiences powered by APIs. For businesses, APIs fuel competitive advantages, greater business intelligence, faster cloud deployments, integration of new AI capabilities, and more.</p><p>However, APIs introduce new risks by providing outside parties additional attack surfaces with which to access applications and databases, all of which also need to be secured. As a consequence, many of the attacks we observe now target API endpoints first rather than traditional web interfaces.</p><p>These additional security concerns are, of course, not slowing down the adoption of API-first applications.</p>
    <div>
      <h3>60% of dynamic (non-cacheable) traffic is API-related</h3>
      <a href="#60-of-dynamic-non-cacheable-traffic-is-api-related">
        
      </a>
    </div>
    <p>This is a two percentage point increase compared to last year’s report. Of this 60%, about 4% on average is mitigated by our security systems.</p><p><b>Figure 9</b>: Share of mitigated API traffic</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/YmPWYVWpG250GqjO2noHu/63582e53d5f43cda2068385ea9713976/unnamed--3--3.png" />
            
            </figure><p>A substantial spike is visible around January 11-17, accounting for an almost 10% increase in mitigated traffic share for that period alone. This was due to a specific customer zone receiving attack traffic that was mitigated by a WAF Custom Rule.</p><p>Digging into mitigation sources for API traffic, we see that the WAF is the largest contributor, as standard malicious payloads commonly apply to both API endpoints and standard web applications.</p><p><b>Figure 10:</b> API mitigated traffic broken down by product group</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/76D0iSJOzvx4ArJYIXIfLI/a4bf06a320cc8e9cc03b7da2ab6f35b3/unnamed--4--3.png" />
            
            </figure>
    <div>
      <h3>A quarter of APIs are “shadow APIs”</h3>
      <a href="#a-quarter-of-apis-are-shadow-apis">
        
      </a>
    </div>
    <p>You cannot protect what you cannot see. And many organizations lack accurate API inventories, even when they believe they can correctly identify API traffic.</p><p>Using our proprietary machine learning model that scans not just known API calls, but all HTTP requests (identifying API traffic that may be going unaccounted for), we found that organizations had 33% more public-facing API endpoints than they knew about. This number was the median, and it was calculated by comparing the number of API endpoints detected through machine learning-based discovery with the number identified via customer-provided session identifiers.</p><p>This suggests that nearly a quarter of APIs are “shadow APIs” and may not be properly inventoried and secured.</p>
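The arithmetic behind that claim: if discovery surfaces 33% more endpoints than the known inventory, the unknown share of the discovered total is 0.33 / 1.33, just under 25%. A quick sketch (the baseline of 100 is arbitrary; any value gives the same share):

```python
# Back-of-the-envelope check of the "quarter of APIs are shadow APIs" claim.

known = 100                   # endpoints in the customer-provided inventory (arbitrary baseline)
discovered = known * 1.33     # ML-based discovery finds 33% more than were known
shadow_share = (discovered - known) / discovered
print(f"{shadow_share:.1%}")  # prints 24.8%, i.e. nearly a quarter
```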
    <div>
      <h2>Client-side risks</h2>
      <a href="#client-side-risks">
        
      </a>
    </div>
    <p>Most organizations’ web apps rely on separate programs or pieces of code from third-party providers (usually coded in JavaScript). The use of third-party scripts accelerates modern web app development and allows organizations to ship features to market faster, without having to build all new app features in-house.</p><p>Using Cloudflare's client-side security product, <a href="https://developers.cloudflare.com/page-shield/">Page Shield</a>, we can get a view of the popularity of third-party libraries used on the Internet and the risk they pose to organizations. This has become especially relevant recently due to the <a href="http://polyfill.io">Polyfill.io incident</a> that affected more than one hundred thousand sites.</p>
    <div>
      <h3>Enterprise applications use 47 third-party scripts on average</h3>
      <a href="#enterprise-applications-use-47-third-party-scripts-on-average">
        
      </a>
    </div>
    <p>Cloudflare’s typical enterprise customer uses an average of 47 third-party scripts, and a median of 20 third-party scripts. The average is much higher than the median due to SaaS providers, who often have thousands of subdomains which may all use third-party scripts. Here are some of the top third-party script providers Cloudflare customers commonly use:</p><ul><li><p>Google (Tag Manager, Analytics, Ads, Translate, reCAPTCHA, YouTube)</p></li><li><p>Meta (Facebook Pixel, Instagram)</p></li><li><p>Cloudflare (Web Analytics)</p></li><li><p>jsDelivr</p></li><li><p>New Relic</p></li><li><p>Appcues</p></li><li><p>Microsoft (Clarity, Bing, LinkedIn)</p></li><li><p>jQuery</p></li><li><p>WordPress (Web Analytics, hosted plugins)</p></li><li><p>Pinterest</p></li><li><p>UNPKG</p></li><li><p>TikTok</p></li><li><p>Hotjar</p></li></ul><p>While useful, third-party software dependencies are often loaded directly by the end user’s browser (i.e., they are loaded client-side), placing organizations and their customers at risk, given that organizations have no direct control over third-party security measures. For example, in the retail sector, 18% of all data breaches <a href="https://www.verizon.com/business/resources/reports/dbir/">originate from Magecart-style attacks</a>, according to Verizon’s 2024 Data Breach Investigations Report.</p>
    <div>
      <h3>Enterprise applications connect to nearly 50 third-parties on average</h3>
      <a href="#enterprise-applications-connect-to-nearly-50-third-parties-on-average">
        
      </a>
    </div>
    <p>Loading a third-party script into your website poses risks, even more so when that script “calls home” to submit data to perform its intended function. A typical example here is Google Analytics: whenever a user performs an action, the Google Analytics script submits data back to Google’s servers. We identify these as connections.</p><p>On average, each enterprise website connects to 50 separate third-party destinations, with a median of 15. Each of these connections also poses a potential client-side security risk, as attackers often use them to exfiltrate additional data unnoticed.</p><p>Here are some of the top third-party connections Cloudflare customers commonly use:</p><ul><li><p>Google (Analytics, Ads)</p></li><li><p>Microsoft (Clarity, Bing, LinkedIn)</p></li><li><p>Meta (Facebook Pixel)</p></li><li><p>Hotjar</p></li><li><p>Kaspersky</p></li><li><p>Sentry</p></li><li><p>Criteo</p></li><li><p>tawk.to</p></li><li><p>OneTrust</p></li><li><p>New Relic</p></li><li><p>PayPal</p></li></ul>
    <div>
      <h2>Looking forward</h2>
      <a href="#looking-forward">
        
      </a>
    </div>
    <p>This application security report is also <a href="https://www.cloudflare.com/2024-application-security-trends/">available in PDF format</a> with additional recommendations on how to address many of the concerns raised, along with additional insights.</p><p>We also publish many of our reports with dynamic charts on <a href="https://radar.cloudflare.com/reports">Cloudflare Radar</a>, making it an excellent resource to keep up to date with the state of the Internet.</p> ]]></content:encoded>
            <category><![CDATA[Application Security]]></category>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[API]]></category>
            <guid isPermaLink="false">78VdVl96em2bFvHmZ4jeHj</guid>
            <dc:creator>Michael Tremante</dc:creator>
            <dc:creator>Sabina Zejnilovic</dc:creator>
            <dc:creator>Catherine Newcomb</dc:creator>
        </item>
        <item>
            <title><![CDATA[Developer Week 2024 wrap-up]]></title>
            <link>https://blog.cloudflare.com/developer-week-2024-wrap-up/</link>
            <pubDate>Mon, 08 Apr 2024 13:00:02 GMT</pubDate>
            <description><![CDATA[ Developer Week 2024 has officially come to a close. Here’s a quick recap of the announcements and in-depth technical explorations that went out last week ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7fwPu75tSubJgSS8nJ5gOt/6e2fd9b7cc6f9dcd7b86d73988a6e5fb/Dev-week-wrap-up-1.jpg" />
            
            </figure><p>Developer Week 2024 has officially come to a close. Each day last week, we shipped new products and functionality geared towards giving developers the components they need to build full-stack applications on Cloudflare.</p><p>Even though Developer Week is now over, we are continuing to innovate with the over two million developers who build on our platform. Building a platform is only as exciting as seeing what developers build on it. Before we dive into a recap of the announcements, to send off the week, we wanted to share how a couple of companies are using Cloudflare to power their applications:</p><blockquote><p><i>We have been using Workers for image delivery using R2 and have been able to maintain stable operations for a year after implementation. The speed of deployment and the flexibility of detailed configurations have greatly reduced the time and effort required for traditional server management. In particular, we have seen a noticeable cost savings and are deeply appreciative of the support we have received from Cloudflare Workers.</i>- <a href="http://www.fancs.com/">FAN Communications</a></p></blockquote><blockquote><p><i>Milkshake helps creators, influencers, and business owners create engaging web pages directly from their phone, to simply and creatively promote their projects and passions. Cloudflare has helped us migrate data quickly and affordably with R2. We use Workers as a routing layer between our users' websites and their images and assets, and to build a personalized analytics offering affordably. 
Cloudflare’s innovations have consistently allowed us to run infrastructure at a fraction of the cost of other developer platforms and we have been eagerly awaiting updates to D1 and Queues to sustainably scale Milkshake as the product continues to grow.</i>- <a href="https://milkshake.app/">Milkshake</a></p></blockquote><p>In case you missed anything, here’s a quick recap of the announcements and in-depth technical explorations that went out last week:</p>
    <div>
      <h2>Summary of announcements</h2>
      <a href="#summary-of-announcements">
        
      </a>
    </div>
    
    <div>
      <h3>Monday</h3>
      <a href="#monday">
        
      </a>
    </div>
    
<table>
<thead>
  <tr>
    <th><span>Announcement</span></th>
    <th><span>Summary</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/making-full-stack-easier-d1-ga-hyperdrive-queues"><span>Making state easy with D1 GA, Hyperdrive, Queues and Workers Analytics Engine updates</span></a></td>
    <td><span>A core part of any full-stack application is storing and persisting data! We kicked off the week with announcements that help developers build stateful applications on top of Cloudflare, including making D1, Cloudflare’s SQL database and Hyperdrive, our database accelerating service, generally available.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/building-d1-a-global-database"><span>Building D1: a Global Database</span></a></td>
    <td><span>D1, Cloudflare’s SQL database, is now generally available. With new support for 10GB databases, data export, and enhanced query debugging, we empower developers to build production-ready applications with D1 to meet all their relational SQL needs. To support Workers in global applications, we’re sharing a sneak peek of our design and API for D1 global read replication to demonstrate how developers scale their workloads with D1.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/workers-environment-live-object-bindings"><span>Why Workers environment variables contain live objects</span></a></td>
    <td><span>Bindings don't just reduce boilerplate. They are a core design feature of the Workers platform which simultaneously improve developer experience and application security in several ways. Usually these two goals are in opposition to each other, but bindings elegantly solve for both at the same time.</span></td>
  </tr>
</tbody>
</table>
    <div>
      <h3>Tuesday</h3>
      <a href="#tuesday">
        
      </a>
    </div>
    
<table>
<thead>
  <tr>
    <th><span>Announcement</span></th>
    <th><span>Summary</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/workers-ai-ga-huggingface-loras-python-support"><span>Leveling up Workers AI: General Availability and more new capabilities</span></a></td>
    <td><span>We made a series of AI-related announcements, including Workers AI, Cloudflare’s inference platform becoming GA, support for fine-tuned models with LoRAs, one-click deploys from HuggingFace, Python support for Cloudflare Workers, and more.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/fine-tuned-inference-with-loras"><span>Running fine-tuned models on Workers AI with LoRAs</span></a></td>
    <td><span>Workers AI now supports fine-tuned models using LoRAs. But what is a LoRA and how does it work? In this post, we dive into fine-tuning, LoRAs and even some math to share the details of how it all works under the hood.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/python-workers"><span>Bringing Python to Workers using Pyodide and WebAssembly</span></a></td>
    <td><span>We introduced Python support for Cloudflare Workers, now in open beta. We've revamped our systems to support Python, from the Workers runtime itself to the way Workers are deployed to Cloudflare’s network. Learn about a Python Worker's lifecycle, Pyodide, dynamic linking, and memory snapshots in this post.</span></td>
  </tr>
</tbody>
</table>
    <div>
      <h3>Wednesday</h3>
      <a href="#wednesday">
        
      </a>
    </div>
    
<table>
<thead>
  <tr>
    <th><span>Announcement</span></th>
    <th><span>Summary</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/r2-events-gcs-migration-infrequent-access"><span>R2 adds event notifications, support for migrations from Google Cloud Storage, and an infrequent access storage tier</span></a></td>
    <td><span>We announced three new features for Cloudflare R2: event notifications, support for migrations from Google Cloud Storage, and an infrequent access storage tier.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/data-anywhere-events-pipelines-durable-execution-workflows"><span>Data Anywhere with Pipelines, Event Notifications, and Workflows</span></a></td>
    <td><span>We’re making it easier to build scalable, reliable, data-driven applications on top of our global network, and so we announced a new Event Notifications framework; our take on durable execution, Workflows; and an upcoming streaming ingestion service, Pipelines.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/prisma-orm-and-d1"><span>Improving Cloudflare Workers and D1 developer experience with Prisma ORM</span></a></td>
    <td><span>Together, Cloudflare and Prisma make it easier than ever to deploy globally available apps with a focus on developer experience. To further that goal, Prisma ORM now natively supports Cloudflare Workers and D1 in Preview. With version 5.12.0 of Prisma ORM you can now interact with your data stored in D1 from your Cloudflare Workers with the convenience of the Prisma Client API. Learn more and try it out now.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/picsart-move-to-workers-huge-performance-gains"><span>How Picsart leverages Cloudflare's Developer Platform to build globally performant services</span></a></td>
    <td><span>Picsart, one of the world’s largest digital creation platforms, encountered performance challenges in catering to its global audience. Adopting Cloudflare's global-by-default Developer Platform emerged as the optimal solution, empowering Picsart to enhance performance and scalability substantially.</span></td>
  </tr>
</tbody>
</table>
    <div>
      <h3>Thursday</h3>
      <a href="#thursday">
        
      </a>
    </div>
    
<table>
<thead>
  <tr>
    <th><span>Announcement</span></th>
    <th><span>Summary</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/pages-workers-integrations-monorepos-nextjs-wrangler"><span>Announcing Pages support for monorepos, wrangler.toml, database integrations and more!</span></a></td>
    <td><span>We launched four improvements to Pages that bring functionality previously restricted to Workers, with the goal of unifying the development experience between the two. Support for monorepos, wrangler.toml, new additions to Next.js support and database integrations!</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/workers-production-safety"><span>New tools for production safety — Gradual Deployments, Stack Traces, Rate Limiting, and API SDKs</span></a></td>
    <td><span>Production readiness isn’t just about scale and reliability of the services you build with. We announced five updates that put more power in your hands – Gradual Deployments, Source mapped stack traces in Tail Workers, a new Rate Limiting API, brand-new API SDKs, and updates to Durable Objects – each built with mission-critical production services in mind.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/whats-next-for-cloudflare-media"><span>What’s new with Cloudflare Media: updates for Calls, Stream, and Images</span></a></td>
    <td><span>With Cloudflare Calls in open beta, you can build real-time, serverless video and audio applications. Cloudflare Stream lets your viewers instantly clip from ongoing streams. Finally, Cloudflare Images now supports automatic face cropping and has an upload widget that lets you easily integrate into your application.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/cloudflare-calls-anycast-webrtc"><span>Cloudflare Calls: millions of cascading trees all the way down</span></a></td>
    <td><span>Cloudflare Calls is a serverless SFU and TURN service running at Cloudflare’s edge. It’s now in open beta and costs $0.05 per real-time GB. It’s 100% anycast WebRTC.</span></td>
  </tr>
</tbody>
</table>
    <div>
      <h3>Friday</h3>
      <a href="#friday">
        
      </a>
    </div>
    
<table>
<thead>
  <tr>
    <th><span>Announcement</span></th>
    <th><span>Summary</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/browser-rendering-api-ga-rolling-out-cloudflare-snippets-swr-and-bringing-workers-for-platforms-to-our-paygo-plans"><span>Browser Rendering API GA, rolling out Cloudflare Snippets, SWR, and bringing Workers for Platforms to all users</span></a></td>
    <td><span>Browser Rendering API is now available to all paid Workers customers with improved session management.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/cloudflare-acquires-baselime-expands-observability-capabilities"><span>Cloudflare acquires Baselime to expand serverless application observability capabilities</span></a></td>
    <td><span>We announced that Cloudflare has acquired Baselime, a serverless observability company.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/cloudflare-acquires-partykit"><span>Cloudflare acquires PartyKit to allow developers to build real-time multi-user applications</span></a></td>
    <td><span>We announced that PartyKit, a trailblazer in enabling developers to craft ambitious real-time, collaborative, multiplayer applications, is now a part of Cloudflare. This acquisition marks a significant milestone in our journey to redefine the boundaries of serverless computing, making it more dynamic, interactive, and, importantly, stateful.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/blazing-fast-development-with-full-stack-frameworks-and-cloudflare"><span>Blazing fast development with full-stack frameworks and Cloudflare</span></a></td>
    <td><span>Full-stack web development with Cloudflare is now faster and easier! You can now use your framework’s development server while accessing D1 databases, R2 object stores, AI models, and more. Iterate locally in milliseconds to build sophisticated web apps that run on Cloudflare. Let’s dev together!</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/javascript-native-rpc"><span>We've added JavaScript-native RPC to Cloudflare Workers</span></a></td>
    <td><span>Cloudflare Workers now features a built-in RPC (Remote Procedure Call) system for use in Worker-to-Worker and Worker-to-Durable Object communication, with absolutely minimal boilerplate. We've designed an RPC system so expressive that calling a remote service can feel like using a library.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/2024-community-update"><span>Community Update: empowering startups building on Cloudflare and creating an inclusive community</span></a></td>
    <td><span>We closed out Developer Week by sharing updates on our Workers Launchpad program, our latest Developer Challenge, and the work we’re doing to ensure our community spaces – like our Discord and Community forums – are safe and inclusive for all developers.</span></td>
  </tr>
</tbody>
</table><p>Here's a video summary, by Craig Dennis, Developer Educator, AI:</p><blockquote><p>🏃<a href="https://twitter.com/CloudflareDev?ref_src=twsrc%5Etfw">@CloudflareDev</a> Developer Week 2024 🧡 ICYMI 🧡 Speed run <a href="https://t.co/0uzPJshC93">pic.twitter.com/0uzPJshC93</a></p>— Craig Dennis (@craigsdennis) <a href="https://twitter.com/craigsdennis/status/1778875721575989734?ref_src=twsrc%5Etfw">April 12, 2024</a></blockquote> 
    <div>
      <h3>Continue the conversation</h3>
      <a href="#continue-the-conversation">
        
      </a>
    </div>
    <p>Thank you for being a part of Developer Week! Want to continue the conversation and share what you’re building? Join us on <a href="https://discord.com/invite/cloudflaredev">Discord</a>. To get started building on Workers, check out our <a href="https://developers.cloudflare.com/workers/">developer documentation</a>.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Cloudflare Pages]]></category>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[D1]]></category>
            <category><![CDATA[Connectivity Cloud]]></category>
            <guid isPermaLink="false">VNnYecAmN7CpST4nBbas0</guid>
            <dc:creator>Phillip Jones</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare’s URL Scanner, new features, and the story of how we built it]]></title>
            <link>https://blog.cloudflare.com/building-urlscanner/</link>
            <pubDate>Fri, 08 Mar 2024 14:00:09 GMT</pubDate>
            <description><![CDATA[ Discover the enhanced URL Scanner API: Now with direct access from the Security Center Investigate Portal, enjoy unlisted scans, multi-device screenshots, and seamless integration within the Cloudflare ecosystem ]]></description>
            <content:encoded><![CDATA[ <p>Today, we’re excited to talk about <a href="https://radar.cloudflare.com/scan">URL Scanner</a>, a tool that helps everyone from security teams to everyday users detect and safeguard against malicious websites by scanning and analyzing them. URL Scanner has executed almost a million scans since its <a href="/radar-url-scanner-early-access/">launch</a> last March on <a href="https://radar.cloudflare.com/scan">Cloudflare Radar</a>, driving us to continuously innovate and enhance its capabilities. Since that time, we have introduced unlisted scans, detailed malicious verdicts, enriched search functionality, and now, integration with Security Center and an official API, all built upon the robust foundation of <a href="https://developers.cloudflare.com/workers/">Cloudflare Workers</a>, <a href="https://developers.cloudflare.com/workers/runtime-apis/durable-objects/">Durable Objects</a>, and the <a href="/browser-rendering-open-beta/">Browser Rendering API</a>.</p>
    <div>
      <h2>Integration with the Security Center in the Cloudflare Dashboard</h2>
      <a href="#integration-with-the-security-center-in-the-cloudflare-dashboard">
        
      </a>
    </div>
    <p>Security Center is the single place in the Cloudflare Dashboard to map your <a href="https://www.cloudflare.com/learning/security/what-is-an-attack-surface/">attack surface</a>, identify potential security risks, and mitigate risks with a few clicks. Its users can now access the URL scanner directly from the <a href="https://developers.cloudflare.com/security-center/investigate/">Investigate Portal</a>, enhancing their cybersecurity workflow. These scans will be unlisted by default, ensuring privacy while facilitating a deep dive into <a href="https://www.cloudflare.com/learning/security/glossary/website-security-checklist/">website security.</a> Users will be able to see their historic scans and access the related reports when they need to, and they will benefit from automatic screenshots for multiple screen sizes, enriching the context of each scan.</p><p>Customers with Cloudflare dashboard access will enjoy higher API limits and faster response times, crucial for agile security operations. Integration with internal workflows becomes seamless, allowing for sophisticated network and user protection strategies.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5xsq5tQ1cjM8cQwnCRUCmh/f6f3a929b008c706765db6a22e5abde7/image2-24.png" />
            
            </figure><p><i>Security Center in the Cloudflare Dashboard</i></p>
    <div>
      <h2>Unlocking the potential of the URL Scanner API</h2>
      <a href="#unlocking-the-potential-of-the-url-scanner-api">
        
      </a>
    </div>
    <p>The <a href="https://developers.cloudflare.com/radar/investigate/url-scanner/">URL Scanner API</a> is a powerful asset for developers, enabling custom scans to detect <a href="https://www.cloudflare.com/learning/access-management/phishing-attack/">phishing</a> or <a href="https://www.cloudflare.com/learning/ddos/glossary/malware/">malware</a> risks, analyze website technologies, and much more. With new features like custom HTTP headers and multi-device screenshots, developers gain a comprehensive toolkit for thorough website assessment.</p>
    <div>
      <h3>Submitting a scan request</h3>
      <a href="#submitting-a-scan-request">
        
      </a>
    </div>
    <p>Using the API, here’s the simplest way to <a href="https://developers.cloudflare.com/api/operations/urlscanner-create-scan">submit</a> a scan request:</p>
            <pre><code>curl --request POST \
	--url https://api.cloudflare.com/client/v4/accounts/&lt;accountId&gt;/urlscanner/scan \
	--header 'Content-Type: application/json' \
	--header "Authorization: Bearer &lt;API_TOKEN&gt;" \
	--data '{
		"url": "https://www.cloudflare.com"
	}'</code></pre>
            <p>New features include the option to set custom HTTP headers, like <a href="https://developer.mozilla.org/en-US/docs/Glossary/User_agent">User-Agent</a> and <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization">Authorization</a>, to request screenshots for multiple target devices, like mobile and desktop, and to set the visibility level to “unlisted”. The latter essentially marks the scan as private, an option often requested by developers who wanted to keep their investigations confidential. Public scans, on the other hand, can be found by anyone through search and are useful for sharing results with the wider community. You can find more details in our <a href="https://developers.cloudflare.com/radar/investigate/url-scanner/">developer documentation</a>.</p>
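<p>To make the request shape concrete, here is a minimal sketch of a request body using those options. The exact field names (<code>visibility</code>, <code>customHeaders</code>, <code>screenshotsResolutions</code>) are illustrative assumptions; check the developer documentation for the authoritative schema.</p>

```typescript
// Illustrative scan request body using the options described above.
// Field names are assumptions for the sake of the example; refer to the
// developer documentation for the current schema.
interface ScanRequestBody {
  url: string
  visibility?: string // e.g. "unlisted" to keep the scan private
  customHeaders?: { [header: string]: string }
  screenshotsResolutions?: string[] // e.g. ["desktop", "mobile"]
}

function buildScanRequest(url: string, unlisted: boolean): ScanRequestBody {
  const body: ScanRequestBody = {
    url,
    customHeaders: { 'User-Agent': 'my-scanner/1.0' }, // hypothetical header
    screenshotsResolutions: ['desktop', 'mobile'],
  }
  if (unlisted) {
    body.visibility = 'unlisted'
  }
  return body
}

// The body would then be sent as the JSON payload of the POST request above.
console.log(JSON.stringify(buildScanRequest('https://www.cloudflare.com', true)))
```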
    <div>
      <h3>Exploring the scan results</h3>
      <a href="#exploring-the-scan-results">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2NRMloVOilGBXsYy12xeeT/a43d66e1b6cd00bd5fdf2bd18ede1256/image5-16.png" />
            
            </figure><p><i>Scan results for</i> <a href="http://www.cloudflare.com"><i>www.cloudflare.com</i></a> <i>on Cloudflare Radar</i></p><p>Once a scan concludes, fetch the final <a href="https://developers.cloudflare.com/api/operations/urlscanner-get-scan">report</a> and the full <a href="https://developers.cloudflare.com/api/operations/urlscanner-get-scan-har">network log</a>. Recently added features include the `verdict` property, indicating the site’s malicious status, and the `securityViolations` section detailing <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP">CSP</a> or <a href="https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity">SRI</a> policy breaches — as a developer, you can also scan your own website and see our recommendations. Expect improvements in verdict accuracy over time, as this is an area we’re focusing on.</p>
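<p>As a sketch of how a client might consume those fields, the snippet below summarizes a fetched report. The report shape here is deliberately simplified and hypothetical; the real report contains far more detail.</p>

```typescript
// Simplified, hypothetical shape of a scan report, used only to
// illustrate reading the verdict and the policy breaches.
interface SecurityViolation {
  policy: string // e.g. "CSP" or "SRI"
  url: string
}

interface ScanReportSummary {
  verdict?: { malicious: boolean }
  securityViolations?: SecurityViolation[]
}

function summarizeReport(report: ScanReportSummary): string {
  const malicious = report.verdict?.malicious ?? false
  const violations = report.securityViolations?.length ?? 0
  return (malicious ? 'malicious' : 'clean') + ', ' + violations + ' policy violation(s)'
}
```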
    <div>
      <h3>Enhanced search functionality</h3>
      <a href="#enhanced-search-functionality">
        
      </a>
    </div>
    <p>Developers can now <a href="https://developers.cloudflare.com/api/operations/urlscanner-search-scans">search</a> scans by hostname, a specific URL, or even <i>any</i> URL the page connected to during the scan. This makes it possible, for example, to search for websites that use a JavaScript library named jquery.min.js (‘?path=jquery.min.js’). Future plans include additional features like searching by IP address, <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">ASN</a>, and malicious website categorisation.</p><p>The URL Scanner can be used for a diverse range of applications. These include capturing a website's evolving state over time (such as tracking changes to the front page of an online newspaper), analyzing technologies employed by a website, preemptively assessing potential risks (as when scrutinizing shortened URLs), and supporting the investigation of persistent cybersecurity threats (such as identifying affected websites hosting a malicious JavaScript file).</p>
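<p>Under the hood, a search like the <code>jquery.min.js</code> example boils down to building a query string. Here's a sketch; the endpoint path is an illustrative assumption, so consult the API reference for the exact route.</p>

```typescript
// Building a search URL like the `?path=jquery.min.js` example above.
// The endpoint path is an assumption; consult the API docs for the route.
function buildSearchUrl(accountId: string, params: { [key: string]: string }): string {
  const qs = Object.keys(params)
    .map((key) => encodeURIComponent(key) + '=' + encodeURIComponent(params[key]))
    .join('&')
  return 'https://api.cloudflare.com/client/v4/accounts/' + accountId + '/urlscanner/scan?' + qs
}
```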
    <div>
      <h2>How we built the URL Scanner API</h2>
      <a href="#how-we-built-the-url-scanner-api">
        
      </a>
    </div>
    <p>In recounting the process of developing the URL Scanner, we aim to showcase the potential and versatility of Cloudflare Workers as a platform. This story is more than a technical journey; it is a testament to the capabilities inherent in our platform's suite of APIs. By dogfooding our own technology, we not only demonstrate confidence in its robustness but also encourage developers to harness the same capabilities for building sophisticated applications. The URL Scanner exemplifies how Cloudflare Workers, Durable Objects, and the Browser Rendering API seamlessly integrate.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/55HCrSeFuu3FUjcjjIXJyl/53c0ed5a74a3ca5052972fa191bd679b/image4-23.png" />
            
            </figure><p><i>High level overview of the Cloudflare URL Scanner technology stack</i></p><p>As seen above, Cloudflare’s runtime infrastructure is the foundation the system runs on. <a href="https://developers.cloudflare.com/workers/">Cloudflare Workers</a> serves the public API, <a href="https://developers.cloudflare.com/workers/runtime-apis/durable-objects/">Durable Objects</a> handles orchestration, <a href="https://developers.cloudflare.com/r2/">R2</a> acts as the primary storage solution, and <a href="https://developers.cloudflare.com/queues/">Queues</a> efficiently handles batch operations, all at the edge. However, what truly enables the URL Scanner’s capabilities is the <a href="https://developers.cloudflare.com/browser-rendering/">Browser Rendering API</a>. It’s what initially allowed us to release in such a short time frame, since we didn’t have to build and manage an entire fleet of Chrome browsers from scratch. We simply request a browser, and then using the well known <a href="https://pptr.dev/">Puppeteer</a> library, instruct it to fetch the webpage and process it in the way we want. This API is at the heart of the entire system.</p>
    <div>
      <h3>Scanning a website</h3>
      <a href="#scanning-a-website">
        
      </a>
    </div>
    <p>The entire process of scanning a website can be split into four phases:</p><ol><li><p>Queue a scan</p></li><li><p>Browse to the website and compile initial report</p></li><li><p>Post-process: compile additional information and build final report</p></li><li><p>Store final report, ready for serving and searching</p></li></ol>
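<p>The ordering of these phases can be sketched as a tiny state machine. This is purely illustrative of the flow; the real orchestration lives in a Durable Object, as described below.</p>

```typescript
// The four scan phases above as a minimal state machine; purely
// illustrative of the ordering, not the actual implementation.
type Phase = 'queued' | 'browsing' | 'post-processing' | 'stored'

const PHASES: Phase[] = ['queued', 'browsing', 'post-processing', 'stored']

function nextPhase(current: Phase): Phase | null {
  const i = PHASES.indexOf(current)
  if (i === -1) return null
  return PHASES[i + 1] ?? null
}
```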
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5AUAfO5VBEUvbRlYwoJ5zL/4da6566b0dff56dca49e14fc500cc427/image1-28.png" />
            
            </figure><p>In short, we create a Durable Object, the Scanner, unique to each scan, which is responsible for orchestrating the scan from start to finish. Since we want to respond immediately to the user, we save the scan to the Durable Object’s transactional Key-Value storage, and schedule an alarm so we can perform the scan asynchronously a second later. We then respond to the user, informing them that the scan request was accepted.</p><p>When the Scanner’s alarm triggers, we enter the second phase:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4qnzMmDQbPtWmyIClv6680/4ff2ee5c2a36b78a9d5e2d7f6d3a134a/image7-5.png" />
            
            </figure><p>There are three components at work in this phase: the Scanner, the Browser Pool, and the Browser Controller, all <a href="https://developers.cloudflare.com/workers/runtime-apis/durable-objects/">Durable Objects</a>.</p><p>In the initial release, we would launch a brand-new browser for each new scan. However, this operation took time and was inefficient, so after review we decided to reuse browsers across multiple scans. This is why we introduced both the Browser Pool and the Browser Controller components. The Browser Pool keeps track of which browsers we have open, when they last pinged the browser pool (so it knows they’re alive), and whether they’re free to accept a new scan. The Browser Controller is responsible for keeping the browser instance alive once it’s launched, and for orchestrating (ahem, <a href="http://pptr.dev">puppeteering</a>) the entire browsing session. Here’s a simplified version of our Browser Controller code:</p>
            <pre><code>export class BrowserController implements DurableObject {
	//[..]
	private async handleNewScan(url: string) {
		if (!this.browser) {
			// Launch browser: 1st request to durable object
			this.browser = await puppeteer.launch(this.env.BROWSER)
			await this.state.storage.setAlarm(Date.now() + 5 * 1000)
		}
		// Open new page and navigate to url
		const page = await this.browser.newPage()
		await page.goto(url, { waitUntil: 'networkidle2', timeout: 5000, })

		// Capture DOM
		const dom = await page.content()

		// Clean up
		await page.close()

		return {
			dom: dom,
		}
	}

	async alarm() {
		if (!this.browser) {
			return
		}
		await this.browser.version() // stop websocket connection to Chrome from going idle
		
		// ping browser pool, let it know we're alive
		
		// Keep durable object alive
		await this.state.storage.setAlarm(Date.now() + 5 * 1000)
	}
}</code></pre>
            <p>Launching a browser (Step 6) and maintaining a connection to it is abstracted away from us thanks to the <a href="/browser-rendering-open-beta/">Browser Rendering API</a>. This API is responsible for all the infrastructure required to maintain a fleet of Chrome browsers, and led to a much quicker development and release of the URL Scanner. It also allowed us to use a well-known library, <a href="https://pptr.dev/">Puppeteer</a>, to communicate with Google Chrome via the <a href="https://chromedevtools.github.io/devtools-protocol/">DevTools</a> protocol.</p><p>The initial report is made up of the network log of all requests, captured in <a href="https://en.wikipedia.org/wiki/HAR_(file_format)">HAR</a> (HTTP Archive) format. HAR files, essentially JSON files, provide a detailed record of all interactions between a web browser and a website. As an established standard in the industry, HAR files can be easily <a href="https://developers.cloudflare.com/api/operations/urlscanner-get-scan-har">shared</a> and analyzed using specialized <a href="https://toolbox.googleapps.com/apps/har_analyzer/">tools</a>. 
In addition to this network log, we augment our dataset with an array of other metadata, including base64-encoded screenshots which provide a snapshot of the website at the moment of the scan.</p><p>Having this data, we transition to phase 3, where the Scanner Durable Object initiates a series of interactions with a few other Cloudflare APIs in order to collect additional information, like running a phishing scanner over the web page's Document Object Model (DOM), fetching <a href="https://www.cloudflare.com/learning/dns/dns-records/">DNS records</a>, and extracting information about <a href="https://developers.cloudflare.com/api/operations/domain-intelligence-get-domain-details">categories</a> and <a href="https://developers.cloudflare.com/api/operations/radar-get-ranking-domain-details">Radar rank</a> associated with the main hostname.</p><p>This process ensures that the final report is enriched with insights coming from different sources, making the URL Scanner more efficient in assessing websites. Once all the necessary information is collected, we compile the final report and store it as a JSON file within <a href="https://developers.cloudflare.com/r2">R2</a>, Cloudflare’s <a href="https://www.cloudflare.com/developer-platform/products/r2/">object storage solution</a>. To empower users with efficient scan searches, we use Postgres.</p><p>While the initial approach involved sending each completed scan promptly to the core API for immediate storage in Postgres, we realized that, as the rate of scans grew, a more efficient strategy would be to batch those operations, and for that, we use Worker Queues:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1gyX0lHVmgVGhmYZeyipcM/93933f31d4dd0905fef9d7cc234a1528/image6-8.png" />
            
            </figure><p>This allows us to better manage the write load on Postgres. We wanted scans available as soon as possible to those who requested them, but it’s OK if they only appear in search results at a slightly later point in time (seconds to minutes, depending on load).</p><p>In short, <a href="https://developers.cloudflare.com/workers/runtime-apis/durable-objects/">Durable Objects</a> together with the <a href="/browser-rendering-open-beta/">Browser Rendering API</a> power the entire scanning process. Once that’s finished, the Cloudflare Worker serving the API simply fetches the finished report from <a href="https://developers.cloudflare.com/r2/">R2</a> by ID. Altogether, Workers, Durable Objects, and R2 scale seamlessly and will allow us to grow as demand evolves.</p>
    <div>
      <h3>Last but not least</h3>
      <a href="#last-but-not-least">
        
      </a>
    </div>
    <p>While we've extensively covered the URL scanning workflow, we've yet to delve into the construction of the API worker itself. Developed with <a href="https://www.typescriptlang.org/">TypeScript</a>, it uses <a href="https://github.com/cloudflare/itty-router-openapi">itty-router-openapi</a>, a JavaScript router with <a href="https://spec.openapis.org/oas/v3.1.0">OpenAPI 3</a> schema generation and validation. It was originally built for <a href="https://radar.cloudflare.com/">Radar</a>, and it has been improving ever since with contributions from the community. Here’s a quick example of how to set up an endpoint, with input validation built in:</p>
            <pre><code>import { z } from 'zod'
import { OpenAPIRoute, OpenAPIRouter, Uuid } from '@cloudflare/itty-router-openapi'

export class ScanMetadataCreate extends OpenAPIRoute {
  static schema = {
    tags: ['Scans'],
    summary: 'Create Scan metadata',
    requestBody: {
      scan_id: Uuid,
      url: z.string().url(),
      destination_ip: z.string().ip(),
      timestamp: z.string().datetime(),
      console_logs: [z.string()],
    },
  }

  async handle(
    request: Request,
    env: any,
    context: any,
    data: any,
  ) {
    // Retrieve validated scan
    const newScanMetadata = data.body

    // Insert the scan

    // Return scan as json
    return newScanMetadata
  }
}


const router = OpenAPIRouter()
router.post('/scan/metadata/', ScanMetadataCreate)

// 404 for everything else
router.all('*', () =&gt; new Response('Not Found.', { status: 404 }))

export default {
  fetch: router.handle,
}</code></pre>
            <p>In the example above, the ScanMetadataCreate endpoint will validate the incoming POST data against the defined schema before calling the <code>async handle(request, env, context, data)</code> function. This way you can be sure that if your code is called, the data argument will always be validated and formatted.</p><p>You can learn more about the project on its <a href="https://github.com/cloudflare/itty-router-openapi">GitHub page</a>.</p>
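<p>Stripped of the library, the validate-then-handle pattern looks roughly like this. The hand-rolled checks below stand in for the router's schema validation; they are not the actual itty-router-openapi internals.</p>

```typescript
// Hand-rolled stand-in for the router's validate-then-handle flow; in
// the real worker, itty-router-openapi derives this from the schema.
interface ScanMetadata {
  scan_id: string
  url: string
}

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

function validateScanMetadata(body: unknown): ScanMetadata {
  const b = (body ?? {}) as { [key: string]: unknown }
  if (typeof b.scan_id !== 'string' || !UUID_RE.test(b.scan_id)) {
    throw new Error('scan_id must be a UUID')
  }
  if (typeof b.url !== 'string' || !/^https?:\/\//.test(b.url)) {
    throw new Error('url must be an http(s) URL')
  }
  return { scan_id: b.scan_id, url: b.url }
}

// handle() only ever runs on data that passed validation
function handle(body: unknown): ScanMetadata {
  return validateScanMetadata(body)
}
```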
    <div>
      <h2>Future plans and new features</h2>
      <a href="#future-plans-and-new-features">
        
      </a>
    </div>
    <p>Looking ahead, we're committed to further elevating the URL Scanner's capabilities. Key upcoming features include geographic scans, where users can customize the location from which the scan is performed, providing critical insights into regional security threats and content compliance; expanded scan details, including more comprehensive headers and security details; and continuous performance improvements and optimisations, so we can deliver faster scan results.</p><p>The evolution of the URL Scanner is a reflection of our commitment to Internet safety and innovation. Whether you're a developer, a security professional, or simply invested in the safety of the digital landscape, the URL Scanner API offers a comprehensive suite of tools to enhance your efforts. Explore the new features today, and join us in shaping a safer Internet for everyone.</p><p>Remember, while Security Center's new capabilities offer advanced tools for URL Scanning for Cloudflare’s existing customers, the URL Scanner remains accessible for basic scans to the public on <a href="https://radar.cloudflare.com/scan">Cloudflare Radar</a>, ensuring our technology benefits a broad audience.</p><p>If you’re considering a new career direction, check out <a href="https://cloudflare.com/careers">our open positions</a>. We’re looking for individuals who want to help make the Internet better; learn more about our mission <a href="https://www.cloudflare.com/learning/what-is-cloudflare/">here</a>.</p>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[URL Scanner]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[API]]></category>
            <guid isPermaLink="false">1JMMDhLvjentdgwo5df7bC</guid>
            <dc:creator>Sofia Cardita</dc:creator>
            <dc:creator>Alexandra Moraru</dc:creator>
        </item>
        <item>
            <title><![CDATA[Application Security Report: Q2 2023]]></title>
            <link>https://blog.cloudflare.com/application-security-report-q2-2023/</link>
            <pubDate>Mon, 21 Aug 2023 14:15:46 GMT</pubDate>
            <description><![CDATA[ We are back with a quarterly update of our Application Security report. Read on to learn about new attack trends and insights visible from Cloudflare’s global network ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3QrNN5PqgSYLMfSnYsQRrt/5b85d393b0ff4520e5ea771b684aa058/Application-Security-Trends-report.png" />
            
            </figure><p>Cloudflare has a unique vantage point on the Internet. From this position, we are able to see, explore, and identify trends that would otherwise go unnoticed. In this report we are doing just that and sharing our insights into Internet-wide application security trends.</p><p>This report is the third edition of our Application Security Report. The first one was <a href="/application-security/">published in March 2022</a>, with the second <a href="/application-security-2023/">published earlier this year in March</a>, and this is the first to be published on a  quarterly basis.</p><p>Since the last report, our network is bigger and faster: we are now processing an average of 46 million HTTP requests/second and 63 million at peak. We consistently handle approximately 25 million DNS queries per second. That's around 2.1 trillion DNS queries per day, and 65 trillion queries a month. This is the sum of authoritative and resolver requests served by our infrastructure. Summing up both HTTP and DNS requests, we get to see a lot of malicious traffic. Focusing on HTTP requests only, in Q2 2023 Cloudflare blocked an average of 112 billion cyber threats each day, and this is the data that powers this report.</p><p>But as usual, before we dive in, we need to define our terms.</p>
    <div>
      <h2>Definitions</h2>
      <a href="#definitions">
        
      </a>
    </div>
    <p>Throughout this report, we will refer to the following terms:</p><ul><li><p><b>Mitigated traffic</b>: any eyeball HTTP* request that had a “terminating” action applied to it by the Cloudflare platform. These include the following actions: <code>BLOCK</code>, <a href="https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/#legacy-captcha-challenge"><code>CHALLENGE</code></a>, <a href="https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/#js-challenge"><code>JS_CHALLENGE</code></a> and <a href="https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/#managed-challenge-recommended"><code>MANAGED_CHALLENGE</code></a>. This does not include requests that had the following actions applied: <code>LOG</code>, <code>SKIP</code>, <code>ALLOW</code>. In contrast to last year, we now exclude requests that had <code>CONNECTION_CLOSE</code> and <code>FORCE_CONNECTION_CLOSE</code> actions applied by our DDoS mitigation system, as these technically only slow down connection initiation. They also accounted for a relatively small percentage of requests. Additionally, we improved our calculation regarding the <code>CHALLENGE</code> type actions to ensure that only unsolved challenges are counted as mitigated. A detailed <a href="https://developers.cloudflare.com/ruleset-engine/rules-language/actions/">description of actions</a> can be found in our developer documentation.</p></li><li><p><b>Bot traffic/automated traffic</b>: any HTTP* request identified by Cloudflare’s <a href="https://www.cloudflare.com/products/bot-management/">Bot Management</a> system as being generated by a bot. This includes requests with a <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">bot score</a> between <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">1 and 29</a> inclusive. 
This has not changed from last year’s report.</p></li><li><p><b>API traffic</b>: any HTTP* request with a response content type of <code>XML</code> or <code>JSON</code>. Where the response content type is not available, such as for mitigated requests, the equivalent <code>Accept</code> content type (specified by the user agent) is used instead. In this latter case, API traffic won’t be fully accounted for, but it still provides a good representation for the purposes of gaining insights.</p></li></ul><p>Unless otherwise stated, the time frame evaluated in this post is the 3 month period from April 2023 through June 2023 inclusive.</p><p>Finally, please note that the data is calculated based only on traffic observed across the Cloudflare network and does not necessarily represent overall HTTP traffic patterns across the Internet.</p><p>* <i>When referring to HTTP traffic we mean both HTTP and HTTPS.</i></p>
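<p>Expressed as code, these definitions amount to three simple predicates. The action names and the bot score range below are taken directly from the definitions above; this is a restatement for clarity, not the actual analytics pipeline.</p>

```typescript
// The report's definitions as predicates. Action names and the 1-29 bot
// score range come straight from the definitions; this is a restatement,
// not the actual analytics pipeline.
const TERMINATING_ACTIONS = new Set([
  'BLOCK',
  'CHALLENGE',
  'JS_CHALLENGE',
  'MANAGED_CHALLENGE',
])

function isMitigated(action: string): boolean {
  // LOG, SKIP and ALLOW are not terminating actions
  return TERMINATING_ACTIONS.has(action)
}

function isBotTraffic(botScore: number): boolean {
  return botScore >= 1 && botScore <= 29 // inclusive on both ends
}

function isApiTraffic(contentType: string): boolean {
  return /json|xml/i.test(contentType)
}
```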
    <div>
      <h2>Global traffic insights</h2>
      <a href="#global-traffic-insights">
        
      </a>
    </div>
    
    <div>
      <h3>Mitigated daily traffic stable at 6%, spikes reach 8%</h3>
      <a href="#mitigated-daily-traffic-stable-at-6-spikes-reach-8">
        
      </a>
    </div>
    <p>Although daily mitigated HTTP requests decreased by 2 percentage points to 6% on average from 2021 to 2022, days with larger than usual malicious activity can be clearly seen across the network. One clear example is shown in the graph below: towards the end of May 2023, a spike reaching nearly 8% can be seen. This is attributable to large DDoS events and other activity that does not follow standard daily or weekly cycles and is a constant reminder that large malicious events can still have a visible impact at a global level, even at Cloudflare scale.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6KbxMm5vOJD0fNB3FUqMFI/60fa8701cd90a9ee34d934f616955bed/image1-22.png" />
            
            </figure><p>75% of mitigated HTTP requests were outright BLOCKed. This is a 6 percentage point decrease compared to the previous report. The majority of other requests are mitigated with the various CHALLENGE type actions, with managed challenges leading with ~20% of this subset.</p>
    <div>
      <h3>Shields up: customer configured rules now biggest contributor to mitigated traffic</h3>
      <a href="#shields-up-customer-configured-rules-now-biggest-contributor-to-mitigated-traffic">
        
      </a>
    </div>
    <p>In our previous report, our automated DDoS mitigation system accounted for, on average, more than 50% of mitigated traffic. Over the past two quarters we’ve seen a new trend emerge, with WAF mitigated traffic surpassing DDoS mitigation, due both to increased WAF adoption and, most likely, to organizations better configuring and locking down their applications against unwanted traffic. Most of the increase has been driven by WAF Custom Rule BLOCKs rather than our WAF Managed Rules, indicating that these mitigations are generated by customer configured rules for business logic or related purposes. This can be clearly seen in the chart below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/tTe2Lvb0twlgtTDUvI39y/c1384e2a1c9d86991f7f0f6b42024f7c/image5-8.png" />
            
            </figure><p>Note that our WAF Managed Rules mitigations (yellow line) are negligible compared to overall WAF mitigated traffic, also indicating that customers are adopting positive security models by allowing known good traffic as opposed to blocking only known bad traffic. Having said that, WAF Managed Rules mitigations reached as much as 1.5 billion/day during the quarter.</p><p>Our DDoS mitigation is, of course, volumetric, and the amount of traffic matching our DDoS layer 7 rules should not be underestimated, especially given that we are observing a number of novel attacks and botnets being spun up across the web. You can read a deep dive on DDoS attack trends in our <a href="/ddos-threat-report-2023-q2/">Q2 DDoS threat report</a>.</p><p>Aggregating the source of mitigated traffic, the WAF now accounts for approximately 57% of all mitigations. The table below lists the other sources for reference.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6bzDwbhkhvnHM6c2chSYmK/15ef779cb157e65d74a826d5cbf173e5/image6-1.png" />
            
            </figure><table><colgroup><col></col><col></col></colgroup><tbody><tr><td><p><span>Source</span></p></td><td><p><span>Percentage %</span></p></td></tr><tr><td><p><span>WAF</span></p></td><td><p><span>57%</span></p></td></tr><tr><td><p><span>DDoS Mitigation</span></p></td><td><p><span>34%</span></p></td></tr><tr><td><p><span>IP Reputation</span></p></td><td><p><span>6%</span></p></td></tr><tr><td><p><span>Access Rules</span></p></td><td><p><span>2%</span></p></td></tr><tr><td><p><span>Other</span></p></td><td><p><span>1%</span></p></td></tr></tbody></table>
    <div>
      <h3>Application owners are increasingly relying on geo location blocks</h3>
      <a href="#application-owners-are-increasingly-relying-on-geo-location-blocks">
        
      </a>
    </div>
    <p>Given the increase in mitigated traffic from customer defined WAF rules, we thought it would be interesting to dive one level deeper and better understand what customers are blocking and how they are doing it. We can do this by reviewing rule field usage across our WAF Custom Rules to identify common themes. Of course, the data needs to be interpreted correctly, as not all customers have access to all fields (availability varies by contract and plan level), but we can still make some inferences based on field “categories”. By reviewing all ~7M WAF Custom Rules deployed across the network and focusing on main groupings only, we get the following field usage distribution (a single rule can use multiple fields, so percentages sum to more than 100%):</p><table><colgroup><col></col><col></col></colgroup><tbody><tr><td><p><span>Field</span></p></td><td><p><span>Used in % of rules</span></p></td></tr><tr><td><p><span>Geolocation fields</span></p></td><td><p><span>40%</span></p></td></tr><tr><td><p><span>HTTP URI</span></p></td><td><p><span>31%</span></p></td></tr><tr><td><p><span>IP address</span></p></td><td><p><span>21%</span></p></td></tr><tr><td><p><span>Other HTTP fields (excluding URI)</span></p></td><td><p><span>34%</span></p></td></tr><tr><td><p><span>Bot Management fields</span></p></td><td><p><span>11%</span></p></td></tr><tr><td><p><span>IP reputation score</span></p></td><td><p><span>4%</span></p></td></tr></tbody></table><p>Notably, 40% of all deployed WAF Custom Rules use geolocation-related fields to make decisions on how to treat traffic. This is a common technique used to implement business logic or to exclude geographies from which no traffic is expected, helping reduce <a href="https://www.cloudflare.com/learning/security/what-is-an-attack-surface/">attack surface areas</a>. While these coarse controls are unlikely to stop a sophisticated attacker, they remain effective at shrinking the exposed surface.</p><p>Another notable observation is the usage of Bot Management related fields in 11% of WAF Custom Rules. This number has been steadily increasing over time as more customers adopt machine learning-based classification strategies to protect their applications.</p>
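            <p>To make the geolocation technique above concrete, here is a purely illustrative WAF Custom Rule expression (the country list is hypothetical) that, paired with a Block action, would drop all traffic from geographies outside those a business expects:</p>
            <pre><code>(not ip.geoip.country in {"US" "GB" "DE"})</code></pre>
            <p>Coarse rules like this implement a positive security model: only traffic from expected countries ever reaches the application.</p>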
    <div>
      <h3>Old CVEs are still exploited en masse</h3>
      <a href="#old-cves-are-still-exploited-en-masse">
        
      </a>
    </div>
    <p>Contributing ~32% of WAF Managed Rules mitigated traffic overall, HTTP Anomaly is still the most common attack category blocked by the WAF Managed Rules. SQLi moved up to second position, surpassing Directory Traversal, at 12.7% and 9.9% respectively.</p><p>If we look at the start of April 2023, we notice the DoS category far exceeding the HTTP Anomaly category. Rules in the DoS category are WAF layer 7 HTTP signatures that are sufficiently specific to match (and block) single requests without looking at cross request behavior, and that can be attributed to either specific botnets or payloads that cause denial of service (DoS). Normally, as is the case here, these requests are not part of “distributed” attacks, hence the lack of the first “D” for “distributed” in the category name.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6lfQTGyIuNJv3KSBoOM2Kt/21126000755decaaeb6bb5acd7c7c531/image10.png" />
            
            </figure><p>Tabular format for reference (top 10 categories):</p><table><colgroup><col></col><col></col></colgroup><tbody><tr><td><p><span>Category</span></p></td><td><p><span>Percentage %</span></p></td></tr><tr><td><p><span>HTTP Anomaly</span></p></td><td><p><span>32%</span></p></td></tr><tr><td><p><span>SQLi</span></p></td><td><p><span>13%</span></p></td></tr><tr><td><p><span>Directory Traversal</span></p></td><td><p><span>10%</span></p></td></tr><tr><td><p><span>File Inclusion</span></p></td><td><p><span>9%</span></p></td></tr><tr><td><p><span>DoS</span></p></td><td><p><span>9%</span></p></td></tr><tr><td><p><span>XSS</span></p></td><td><p><span>9%</span></p></td></tr><tr><td><p><span>Software Specific</span></p></td><td><p><span>7%</span></p></td></tr><tr><td><p><span>Broken Authentication</span></p></td><td><p><span>6%</span></p></td></tr><tr><td><p><span>Common Injection</span></p></td><td><p><span>3%</span></p></td></tr><tr><td><p><span>CVE</span></p></td><td><p><span>1%</span></p></td></tr></tbody></table><p>Zooming in, and filtering on the DoS category only, we find that most of the mitigated traffic is attributable to one rule: 100031 / ce02fd… (old WAF and new WAF rule ID respectively). This rule, with a description of “<i>Microsoft IIS - DoS, Anomaly:Header:Range - CVE:CVE-2015-1635</i>”, pertains to a <a href="https://nvd.nist.gov/vuln/detail/CVE-2015-1635">CVE dating back to 2015</a> that affected a number of Microsoft Windows components, resulting in remote code execution*. This is a good reminder that old CVEs, even those dating back more than 8 years, are still actively exploited to compromise machines that may be unpatched and still running vulnerable software.</p><p>* <i>Due to rule categorisation, some CVE specific rules are still assigned to a broader category such as DoS in this example. Rules are assigned to a CVE category only when the attack payload does not clearly overlap with another more generic category.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2mSqUk3g9mRcgetHJ4oQOI/d5e6fc2d8ddc72fdb9fc8611ccb48a34/image2-13.png" />
            
            </figure><p>Another interesting observation is the increase in Broken Authentication rule matches starting in June. This increase is also attributable to a single rule deployed across all our customers, including our FREE users: “<i>Wordpress - Broken Access Control, File Inclusion</i>”. This rule is blocking attempts to access wp-config.php - the WordPress default configuration file which is normally found in the web server document root directory, but of course should never be accessed directly via HTTP.</p><p>On a similar note, CISA/CSA recently published a report highlighting the <a href="https://www.cisa.gov/news-events/cybersecurity-advisories/aa23-215a?cf_target_id=DC7FD2F218498816EEC88041CD1F9A74">2022 Top Routinely Exploited Vulnerabilities</a>. We took this opportunity to explore <a href="/unmasking-the-top-exploited-vulnerabilities-of-2022/">how each CVE mentioned in CISA’s report was reflected in Cloudflare’s own data</a>. The CISA/CSA discuss 12 vulnerabilities that malicious cyber actors routinely exploited in 2022. However, based on our analysis, two CVEs mentioned in the CISA report are responsible for the vast majority of attack traffic we have seen in the wild: Log4J and Atlassian Confluence Code Injection. Our data clearly suggests a major difference in exploit volume between the top two and the rest of the list. The following chart compares the attack volume (in logarithmic scale) of the top 6 vulnerabilities of the CISA list according to our logs.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2uwZCjKV69ScnhTPgvZ82e/1c763d1c13f7fe671c56da6f6dd22228/image8.png" />
            
            </figure>
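            <p>As a sketch of the WordPress protection described above (the managed rule’s exact logic is more involved), a WAF Custom Rule blocking direct requests to the configuration file could use an expression such as:</p>
            <pre><code>(http.request.uri.path contains "wp-config.php")</code></pre>
            <p>Deployed with a Block action, this prevents the file from ever being requested directly over HTTP.</p>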
    <div>
      <h2>Bot traffic insights</h2>
      <a href="#bot-traffic-insights">
        
      </a>
    </div>
    <p>Cloudflare’s <a href="https://www.cloudflare.com/products/bot-management/">Bot Management</a> continues to see significant investment, with the addition of JavaScript Verified URLs for greater protection against browser-based bots, Detection IDs now available in Custom Rules for additional configurability, and an improved UI for easier onboarding. For self-serve customers, we’ve added the ability to “Skip” Super Bot Fight Mode rules and support for Wordpress Loopback requests, to better integrate with our customers’ applications and give them the protection they need.</p><p>Our confidence in the Bot Management classification output remains very high. If we plot the bot scores across the analyzed time frame, we find a very clear distribution, with most requests either being classified as definitely bot (score below 30) or definitely human (score greater than 80); indeed, most requests actually score less than 2 or greater than 95. This equates, over the same time period, to 33% of traffic being classified as automated (generated by a bot). Over longer time periods we do see the overall bot traffic percentage stable at 29%, and this reflects the data shown on <a href="https://radar.cloudflare.com/traffic?dateStart=2023-04-01&amp;dateEnd=2023-07-01">Cloudflare Radar</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3IWinRPtNWXlGyYBWJ1d63/40fe64370434befa861d7b7a2730c329/image3-7.png" />
            
            </figure>
    <div>
      <h3>On average, more than 10% of non-verified bot traffic is mitigated</h3>
      <a href="#on-average-more-than-10-of-non-verified-bot-traffic-is-mitigated">
        
      </a>
    </div>
    <p>Compared to the last report, non-verified bot HTTP traffic mitigation is currently on a downward trend (down 6 percentage points). However, the Bot Management field usage within WAF Custom Rules is non-negligible, standing at 11%. This means that there are more than 700k WAF Custom Rules deployed on Cloudflare that rely on bot signals to perform some action. The most common field used is cf.client.bot, an alias of cf.bot_management.verified_bot, which is powered by our <a href="https://radar.cloudflare.com/traffic/verified-bots">list of verified bots</a> and allows customers to make a distinction between “good” bots and potentially “malicious” non-verified ones.</p><p>Enterprise customers have access to the more powerful cf.bot_management.score, which provides direct access to the score computed on each request, the same score used to generate the bot score distribution graph in the prior section.</p>
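            <p>For illustration only (the threshold is an example, not a recommendation), a WAF Custom Rule combining these fields to act on likely-automated, non-verified traffic might use an expression such as:</p>
            <pre><code>(cf.bot_management.score lt 30 and not cf.bot_management.verified_bot)</code></pre>
            <p>The score threshold of 30 mirrors the “definitely bot” band in the distribution shown earlier.</p>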
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2H2kx5ONOEAzChOHlKHSuK/f8d7437d8627985f7ea8123ecaf084ce/image7-1.png" />
            
            </figure><p>The above data is also validated by looking at which Cloudflare service is mitigating unverified bot traffic. Although our DDoS mitigation system automatically blocks HTTP traffic across all customers, this only accounts for 13% of non-verified bot mitigations. On the other hand, the WAF, mostly through customer defined rules, accounts for 77% of such mitigations, much higher than its share of mitigations across all traffic (57%) discussed at the start of the report. Note that Bot Management is specifically called out but refers to our “default” one-click rules, which are counted separately from the bot fields used in WAF Custom Rules.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2C1qfqHZVrqjJBa1upHkz2/11a16bb6e27542a6f2e3366e9903ac4a/image4-9.png" />
            
            </figure><p>Tabular format for reference:</p><table><colgroup><col></col><col></col></colgroup><tbody><tr><td><p><span>Source</span></p></td><td><p><span>Percentage %</span></p></td></tr><tr><td><p><span>WAF</span></p></td><td><p><span>77%</span></p></td></tr><tr><td><p><span>DDoS Mitigation</span></p></td><td><p><span>13%</span></p></td></tr><tr><td><p><span>IP reputation</span></p></td><td><p><span>5%</span></p></td></tr><tr><td><p><span>Access Rules</span></p></td><td><p><span>3%</span></p></td></tr><tr><td><p><span>Other</span></p></td><td><p><span>1%</span></p></td></tr></tbody></table>
    <div>
      <h2>API traffic insights</h2>
      <a href="#api-traffic-insights">
        
      </a>
    </div>
    
    <div>
      <h3>58% of dynamic (non cacheable) traffic is API related</h3>
      <a href="#58-of-dynamic-non-cacheable-traffic-is-api-related">
        
      </a>
    </div>
    <p>The growth of overall API traffic observed by Cloudflare is not slowing down. Compared to last quarter, we are now seeing 58% of total dynamic traffic being classified as API related. This is a 3 percentage point increase as compared to Q1.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3Gd7TYNHlXEzUr0YaGyXww/6e37e4fb95e4519450523ca1863edbf7/image11.png" />
            
            </figure><p>Our investment in API Gateway is also following a similar growth trend. Over the last quarter we have released several new API security features.</p><p>First, we’ve made API Discovery easier to use with a <a href="https://developers.cloudflare.com/api-shield/security/api-discovery/#inbox-view">new inbox view</a>. API Discovery inventories your APIs to prevent shadow IT and zombie APIs, and now customers can easily filter to show only new endpoints found by API Discovery. Saving endpoints from API Discovery places them into our <a href="https://developers.cloudflare.com/api-shield/management-and-monitoring/">Endpoint Management</a> system.</p><p>Next, we’ve added a brand-new API security feature offered only at Cloudflare: the ability to control API access by client behavior. We call it <a href="https://developers.cloudflare.com/api-shield/security/sequence-mitigation/">Sequence Mitigation</a>. Customers can now create positive or negative security models based on the order of API paths accessed by clients, ensuring that your API is used the way your application intends rather than by brute-force attempts that ignore normal application functionality. For example, in a banking application you can now enforce that the funds transfer endpoint can only be accessed after a user has accessed the account balance check endpoint.</p><p>We’re excited to continue releasing <a href="https://www.cloudflare.com/application-services/solutions/api-security/">API security</a> and <a href="https://www.cloudflare.com/application-services/products/api-gateway/">API management</a> features for the remainder of 2023 and beyond.</p>
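            <p>To illustrate the idea behind a positive sequence model (this is a conceptual sketch in TypeScript, not Cloudflare’s Sequence Mitigation implementation, and the endpoint paths are hypothetical), enforcement boils down to checking a request’s path against the prerequisite paths already seen in the client’s session:</p>
            <pre><code>// Conceptual sketch only: not Cloudflare's Sequence Mitigation implementation.
// A path is allowed only if the client's session history already contains
// all of its prerequisite paths. Endpoint names are hypothetical.
const prerequisites: { [path: string]: string[] } = {
  "/api/transfer": ["/api/balance"],
};

function isAllowed(history: string[], path: string): boolean {
  const required = prerequisites[path] || [];
  return required.every((p) => history.includes(p));
}</code></pre>
            <p>Under this model, a client calling "/api/transfer" cold is rejected, while one that checked "/api/balance" first is allowed; paths with no prerequisites are unconstrained.</p>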
    <div>
      <h3>65% of global API traffic is generated by browsers</h3>
      <a href="#65-of-global-api-traffic-is-generated-by-browsers">
        
      </a>
    </div>
    <p>The percentage of API traffic generated by browsers has remained very stable over the past quarter. With this statistic, we are referring to HTTP requests issued by browsers that do not serve HTML content to be rendered directly, more commonly known as AJAX calls, which would normally serve JSON based responses.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6EAog7mZWAttnTLNti5zPD/79924a318e5afa6a07c035f0644a0edf/image12.png" />
            
            </figure>
    <div>
      <h3>HTTP Anomalies are the most common attack vector on API endpoints</h3>
      <a href="#http-anomalies-are-the-most-common-attack-vector-on-api-endpoints">
        
      </a>
    </div>
    <p>Just like last quarter, HTTP Anomalies remain the most common mitigated attack vector on API traffic. SQL injection (SQLi) attacks, however, are non-negligible, contributing approximately 11% towards the total mitigated traffic, closely followed by XSS attacks, at around 9%.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/14c68gj0NmXbfog0jJuEhV/c3fe3b73684d1142983baa0430e0f4a5/image9.png" />
            
            </figure><p>Tabular format for reference (top 5):</p><table><colgroup><col></col><col></col></colgroup><tbody><tr><td><p><span>Source</span></p></td><td><p><span>Percentage %</span></p></td></tr><tr><td><p><span>HTTP Anomaly</span></p></td><td><p><span>64%</span></p></td></tr><tr><td><p><span>SQLi</span></p></td><td><p><span>11%</span></p></td></tr><tr><td><p><span>XSS</span></p></td><td><p><span>9%</span></p></td></tr><tr><td><p><span>Software Specific</span></p></td><td><p><span>5%</span></p></td></tr><tr><td><p><span>Command Injection</span></p></td><td><p><span>4%</span></p></td></tr></tbody></table>
    <div>
      <h3>Looking forward</h3>
      <a href="#looking-forward">
        
      </a>
    </div>
    <p>As we move our application security report to a quarterly cadence, we plan to deepen some of the insights and to provide additional data from some of our newer products such as Page Shield, allowing us to look beyond HTTP traffic, and explore the state of third party dependencies online.</p><p>Stay tuned and keep an eye on <a href="https://radar.cloudflare.com/">Cloudflare Radar</a> for more frequent application security reports and insights.</p> ]]></content:encoded>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[API]]></category>
            <guid isPermaLink="false">6PHfFuJQzKabT5BxIjgB9j</guid>
            <dc:creator>Michael Tremante</dc:creator>
            <dc:creator>David Belson</dc:creator>
            <dc:creator>Sabina Zejnilovic</dc:creator>
        </item>
        <item>
            <title><![CDATA[Workers Browser Rendering API enters open beta]]></title>
            <link>https://blog.cloudflare.com/browser-rendering-open-beta/</link>
            <pubDate>Fri, 19 May 2023 13:00:32 GMT</pubDate>
            <description><![CDATA[ The Workers Browser Rendering API allows developers to programmatically control and interact with a headless browser instance and create automation flows for their applications and products. Today we enter the open beta and start onboarding our customers. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/48ITDIuGFsDkkeLlfW5wR6/a1a70dec0c91931ba63a9c0a5851ed5e/image1-56.png" />
            
            </figure><p>The Workers Browser Rendering API allows developers to programmatically control and interact with a headless browser instance and create automation flows for their applications and products.</p><p>Since the <a href="/introducing-workers-browser-rendering-api/">private beta announcement</a>, based on the feedback we've been receiving and our own roadmap, the team has been working on the developer experience and improving the platform architecture for the best possible performance and reliability. <b>Today we enter the open beta and will start onboarding customers on the</b> <a href="https://www.cloudflare.com/lp/workers-browser-rendering-api?ref=blog.cloudflare.com"><b>wait list</b></a><b>.</b></p>
    <div>
      <h3>Developer experience</h3>
      <a href="#developer-experience">
        
      </a>
    </div>
    <p>Starting today, <a href="https://developers.cloudflare.com/workers/wrangler/">Wrangler</a>, our command-line tool for configuring, building, and deploying applications with Cloudflare developer products, has support for the Browser Rendering API bindings.</p><p>You can install Wrangler Beta using <a href="https://www.npmjs.com/package/npm">npm</a>:</p>
            <pre><code>npm install wrangler --save-dev</code></pre>
            <p>Bindings allow your Workers to interact with resources on the Cloudflare developer platform. In this case, they will provide your Worker script with an authenticated endpoint to interact with a dedicated Chromium browser instance.</p><p>This is all you need in your <code>wrangler.toml</code> once this service is enabled for your account:</p>
            <pre><code>browser = { binding = "MYBROWSER", type = "browser" }</code></pre>
            <p>Now you can deploy any Worker script that requires Browser Rendering capabilities. You can spawn Chromium instances and interact with them programmatically in any way you typically do manually behind your browser.</p><p>Under the hood, the Browser Rendering API gives you access to a WebSocket endpoint that speaks the <a href="https://chromedevtools.github.io/devtools-protocol/">DevTools Protocol</a>. DevTools is what allows us to instrument a Chromium instance running in our global network, and it's the same protocol that Chrome uses on your computer when you inspect a page.</p>
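            <p>To give a feel for the protocol (the helper below is a hypothetical sketch, not part of the Browser Rendering API), each DevTools Protocol message sent over that WebSocket is a small JSON object carrying an incrementing id, a method name, and a params object:</p>
            <pre><code>// Hypothetical sketch of the DevTools Protocol message shape.
// Each message has an incrementing id, a method, and a params object.
let nextId = 0;

function cdpMessage(method: string, params: object): string {
  nextId += 1;
  return JSON.stringify({ id: nextId, method: method, params: params });
}

// Navigating a page would send something like:
const navigate = cdpMessage("Page.navigate", { url: "https://example.com/" });</code></pre>
            <p>A real client also has to handle the responses and events that come back over the same socket, which is why most people reach for a library instead.</p>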
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3jPdmcv9Dtzd3yq8R3DFUR/9065566d8deb7780efe494da03cdb1bc/image2-35.png" />
            
            </figure><p>With enough dedication, you can, in fact, implement your own DevTools client and talk the protocol directly. But that'd be crazy; almost no one does that.</p><p>So…</p>
    <div>
      <h3>Puppeteer</h3>
      <a href="#puppeteer">
        
      </a>
    </div>
    <p><a href="https://pptr.dev/">Puppeteer</a> is one of the most popular libraries that abstract the lower-level DevTools protocol from developers and provides a high-level API that you can use to easily instrument Chrome/Chromium and automate browsing sessions. It's widely used for things like creating screenshots, crawling pages, and testing web applications.</p><p>Puppeteer typically <a href="https://pptr.dev/api/puppeteer.puppeteer.connect">connects</a> to a local Chrome or Chromium browser using the DevTools port.</p><p>We forked a version of Puppeteer and patched it to connect to the Workers Browser Rendering API instead. The <a href="https://github.com/cloudflare/puppeteer/blob/main/src/puppeteer-core.ts">changes</a> are minimal; after connecting the developers can then use the full Puppeteer API as they would on a standard setup.</p><p>Our version is <a href="https://github.com/cloudflare/puppeteer">open sourced here</a>, and the npm can be installed from <a href="https://www.npmjs.com/">npmjs</a> as <a href="https://www.npmjs.com/package/@cloudflare/puppeteer">@cloudflare/puppeteer</a>. Using it from a Worker is as easy as:</p>
            <pre><code>import puppeteer from "@cloudflare/puppeteer";</code></pre>
            <p>And then all it takes to launch a browser from your script is:</p>
            <pre><code>const browser = await puppeteer.launch(env.MYBROWSER);</code></pre>
            <p>In the long term, we will update Puppeteer to keep it matching the version of the Chromium instances running in our network.</p>
    <div>
      <h3>Developer documentation</h3>
      <a href="#developer-documentation">
        
      </a>
    </div>
    <p>Following the tradition with other Developer products, we created a dedicated section for the Browser Rendering APIs in our <a href="https://developers.cloudflare.com/browser-rendering">Developer's Documentation site</a>.</p><p>You can access this page to learn more about how the service works, Wrangler support, APIs, and limits, and find examples of starter templates for common applications.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4xCzHB3baMJCw7CRov6P1P/33414eea87ea7b9e226b4dcc62395dcb/download-16.png" />
            
            </figure>
    <div>
      <h3>An example application: taking screenshots</h3>
      <a href="#an-example-application-taking-screenshots">
        
      </a>
    </div>
    <p>Taking screenshots of web pages is one of the typical use cases for browser automation.</p><p>Let's create a Worker that uses the Browser Rendering API to do just that. This is a perfect example of how to set up everything and get an application running in minutes. It will give you a good overview of the steps involved and the basics of the Puppeteer API, and from here you can move on to other, more sophisticated use cases.</p><p>Step one, start a project and install Wrangler and Cloudflare’s fork of Puppeteer:</p>
            <pre><code>npm init -f
npm install wrangler --save-dev
npm install @cloudflare/puppeteer --save-dev</code></pre>
            <p>Step two, let’s create the simplest possible wrangler.toml configuration file with the Browser Rendering API binding:</p>
            <pre><code>name = "browser-worker"
main = "src/index.ts"
compatibility_date = "2023-03-14"
node_compat = true
workers_dev = true

browser = { binding = "MYBROWSER", type = "browser" }</code></pre>
            <p>Step three, create src/index.ts with your Worker code:</p>
            <pre><code>import puppeteer from "@cloudflare/puppeteer";

export default {
    async fetch(request: Request, env: Env): Promise&lt;Response&gt; {
        const { searchParams } = new URL(request.url);
        let url = searchParams.get("url");
        let img: Buffer;
        if (url) {
            const browser = await puppeteer.launch(env.MYBROWSER);
            const page = await browser.newPage();
            await page.goto(url);
            img = (await page.screenshot()) as Buffer;
            await browser.close();
            return new Response(img, {
                headers: {
                    "content-type": "image/jpeg",
                },
            });
        } else {
            return new Response(
                "Please add the ?url=https://example.com/ parameter"
            );
        }
    },
};</code></pre>
            <p>That's it, no more steps. This Worker instantiates a browser using Puppeteer, opens a new page, navigates to whatever you put in the "url" parameter, takes a screenshot of the page, closes the browser, and responds with the JPEG image of the screenshot. It can't get any easier to get started with the Browser Rendering API.</p><p>Run <code>npx wrangler dev --remote</code> to test it and <code>npx wrangler publish</code> when you’re done.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4SBhHt9GtxSXSEERkrnwSl/d159f82e6319be6d7a37c34709059067/image4-21.png" />
            
            </figure><p>You can explore the <a href="https://github.com/cloudflare/puppeteer/blob/main/docs/api/index.md">entire Puppeteer API</a> and implement other functionality and logic from here. And, because it's Workers, you can add other <a href="https://developers.cloudflare.com/">developer products</a> to your code. You might need a <a href="https://developers.cloudflare.com/d1/">relational database</a>, or a <a href="https://developers.cloudflare.com/workers/runtime-apis/kv/#kv">KV store</a> to cache your screenshots, or an <a href="https://developers.cloudflare.com/r2/">R2 bucket</a> to archive your crawled pages and assets, or maybe use a <a href="https://developers.cloudflare.com/workers/runtime-apis/durable-objects/#durable-objects">Durable Object</a> to keep your browser instance alive and share it with multiple requests, or <a href="https://developers.cloudflare.com/queues/">queues</a> to handle your jobs asynchronously; we have all of this and <a href="https://developers.cloudflare.com/">more</a>.</p><p>You can also find this and other examples of how to use Browser Rendering in the <a href="https://developers.cloudflare.com/browser-rendering">Developer Documentation</a>.</p>
    <div>
      <h3>How do we use Browser Rendering</h3>
      <a href="#how-do-we-use-browser-rendering">
        
      </a>
    </div>
    <p>Dogfooding our products is one of the best ways to test and improve them, and in some cases, our internal needs dictate or influence our roadmap. Workers Browser Rendering is a good example of that; it was born out of our necessities before we realized it could be a product. We've been using it extensively for things like taking screenshots of pages for social sharing or dashboards, testing web software in CI, or gathering page load performance metrics of our applications.</p><p>But there's one product we've been using to stress test and push the limits of the Browser Rendering API and drive the engineering sprints that brought us to open the beta to our customers today: The <a href="https://radar.cloudflare.com/scan">Cloudflare Radar URL Scanner</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/wv4BHcjI8iY4ZkhNJSGEK/d824748dab36a93bf8393820f0045a89/image3-22.png" />
            
            </figure><p>The URL Scanner scans any URL and compiles a full report containing technical, performance, privacy, and security details about that page. It's processing thousands of scans per day currently. It was built on top of Workers and uses a combination of the Browser Rendering APIs with Puppeteer to create enriched <a href="https://en.wikipedia.org/wiki/HAR_(file_format)">HAR archives</a> and page screenshots, Durable Objects to reuse browser instances, Queues to handle customers' load and execute jobs asynchronously, and <a href="https://www.cloudflare.com/developer-platform/r2/">R2</a> to store the final reports.</p><p>This tool will soon have its own "how we built it" blog. Still, we wanted to let you know about it now because it is a good example of how you can build sophisticated applications using Browser Rendering APIs at scale starting today.</p>
    <div>
      <h3>Future plans</h3>
      <a href="#future-plans">
        
      </a>
    </div>
    <p>The team will keep improving the Browser Rendering API, but a few things are worth mentioning today.</p><p>First, we are looking into upstreaming the changes in our Puppeteer fork to the main project so that using the official library with the Cloudflare Workers Browser Rendering API becomes as easy as a configuration option.</p><p>Second, one of the reasons why we decided to expose the raw <a href="https://chromedevtools.github.io/devtools-protocol/">DevTools</a> protocol in the Worker binding is so that it can support other browser instrumentation libraries in the future. <a href="https://playwright.dev/docs/api/class-playwright">Playwright</a> is a good example of another popular library that developers want to use.</p><p>And last, we are also keeping an eye on and testing <a href="https://w3c.github.io/webdriver-bidi/">WebDriver BiDi</a>, a "new standard browser automation protocol that bridges the gap between the WebDriver Classic and CDP (DevTools) protocols." You can read more about the <a href="https://developer.chrome.com/blog/webdriver-bidi-2023/">status of WebDriver BiDi</a>.</p>
    <div>
      <h3>Final words</h3>
      <a href="#final-words">
        
      </a>
    </div>
    <p>The Workers Browser Rendering API enters open beta today. We will gradually be enabling the customers in the <a href="https://www.cloudflare.com/en-gb/lp/workers-browser-rendering-api/?ref=blog.cloudflare.com">wait list</a> in batches and sending them emails. We look forward to seeing what you will be building with it and want to hear from you.</p><p>As usual, you can talk to us on our <a href="https://discord.cloudflare.com/">Developers Discord</a> or the <a href="https://community.cloudflare.com/">Community forum</a>; the team will be listening.</p>
     ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">49dZCDezJuyDIP4HFOOCS3</guid>
            <dc:creator>Celso Martinho</dc:creator>
            <dc:creator>Joshua Claeys</dc:creator>
        </item>
        <item>
            <title><![CDATA[The state of application security in 2023]]></title>
            <link>https://blog.cloudflare.com/application-security-2023/</link>
            <pubDate>Tue, 14 Mar 2023 13:00:00 GMT</pubDate>
            <description><![CDATA[ One year ago we published our first Application Security Report. For Security Week 2023, we are providing updated insights and trends around mitigated traffic, bot and API traffic, and account takeover attacks. ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3GRJDQ6QM2YpuKYXI37zWr/08e76c72a478028a390db8d262ca1d36/image13-1.png" />
            
            </figure><p>One year ago we published our <a href="/application-security/">first Application Security Report</a>. For Security Week 2023, we are providing updated insights and trends around mitigated traffic, bot and API traffic, and <a href="https://www.cloudflare.com/zero-trust/solutions/account-takeover-prevention/">account takeover attacks</a>.</p><p>Cloudflare has grown significantly over the last year. In <a href="https://news.netcraft.com/archives/2023/02/28/february-2023-web-server-survey.html">February 2023</a>, Netcraft noted that Cloudflare had become the most commonly used web server vendor within the top million sites at the start of 2023, and continues to grow, reaching a 21.71% market share, up from 19.4% in February 2022.</p><p>This continued growth now equates to Cloudflare handling over 45 million HTTP requests/second on average (up from 32 million last year), with more than 61 million HTTP requests/second at peak. DNS queries handled by the network are also growing and stand at approximately 24.6 million queries/second. All of this traffic flow gives us an unprecedented view into Internet trends.</p><p>Before we dive in, we need to define our terms.</p>
    <div>
      <h2>Definitions</h2>
      <a href="#definitions">
        
      </a>
    </div>
    <p>Throughout this report, we will refer to the following terms:</p><ul><li><p><b>Mitigated traffic</b>: any eyeball HTTP* request that had a “terminating” action applied to it by the Cloudflare platform. These include the following actions: <code>BLOCK</code>, <a href="https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/#legacy-captcha-challenge"><code>CHALLENGE</code></a>, <a href="https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/#js-challenge"><code>JS_CHALLENGE</code></a> and <a href="https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/#managed-challenge-recommended"><code>MANAGED_CHALLENGE</code></a>. This does not include requests that had the following actions applied: <code>LOG</code>, <code>SKIP</code>, <code>ALLOW</code>. In contrast to last year, we now exclude requests that had <code>CONNECTION_CLOSE</code> and <code>FORCE_CONNECTION_CLOSE</code> actions applied by our DDoS mitigation system, as these technically only slow down connection initiation. They also accounted for a relatively small percentage of requests. Additionally, we improved our calculation regarding the <code>CHALLENGE</code> type actions to ensure that only unsolved challenges are counted as mitigated. A detailed <a href="https://developers.cloudflare.com/ruleset-engine/rules-language/actions/">description of actions</a> can be found in our developer documentation.</p></li><li><p><b>Bot traffic/automated traffic</b>: any HTTP* request identified by Cloudflare’s <a href="https://www.cloudflare.com/products/bot-management/">Bot Management</a> system as being generated by a bot. This includes requests with a <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">bot score</a> between <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">1 and 29</a> inclusive. 
This has not changed from last year’s report.</p></li><li><p><b>API traffic</b>: any HTTP* request with a response content type of <code>XML</code> or <code>JSON</code>. Where the response content type is not available, such as for mitigated requests, the equivalent <code>Accept</code> content type (specified by the user agent) is used instead. In this latter case, API traffic won’t be fully accounted for, but it still provides a good representation for the purposes of gaining insights.</p></li></ul><p>Unless otherwise stated, the time frame evaluated in this post is the 12 month period from March 2022 through February 2023 inclusive.</p><p>Finally, please note that the data is calculated based only on traffic observed across the Cloudflare network and does not necessarily represent overall HTTP traffic patterns across the Internet.</p><p>*When referring to HTTP traffic we mean both HTTP and HTTPS.</p>
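<p>To make the three definitions concrete, here is a minimal sketch that applies them to a simplified request record. The action names and thresholds mirror the definitions above; the record layout itself is illustrative and is not Cloudflare's actual log schema.</p>

```python
# Illustrative only: applies the report's three definitions to a
# simplified, hypothetical request record (not Cloudflare's log schema).

MITIGATING_ACTIONS = {"BLOCK", "CHALLENGE", "JS_CHALLENGE", "MANAGED_CHALLENGE"}

def is_mitigated(record: dict) -> bool:
    # Terminating actions only; LOG/SKIP/ALLOW and the CONNECTION_CLOSE
    # variants are excluded, and only unsolved challenges count.
    return (record["action"] in MITIGATING_ACTIONS
            and not record.get("challenge_solved", False))

def is_bot(record: dict) -> bool:
    # Bot score between 1 and 29 inclusive.
    return 1 <= record["bot_score"] <= 29

def is_api(record: dict) -> bool:
    # Response content type of XML or JSON; for mitigated requests, fall
    # back to the Accept content type specified by the user agent.
    ctype = (record.get("response_content_type")
             or record.get("accept", "")).lower()
    return "json" in ctype or "xml" in ctype

request = {"action": "MANAGED_CHALLENGE", "challenge_solved": False,
           "bot_score": 2, "accept": "application/json"}
print(is_mitigated(request), is_bot(request), is_api(request))  # True True True
```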
    <div>
      <h2>Global traffic insights</h2>
      <a href="#global-traffic-insights">
        
      </a>
    </div>
    
    <div>
      <h3>6% of daily HTTP requests are mitigated on average</h3>
      <a href="#6-of-daily-http-requests-are-mitigated-on-average">
        
      </a>
    </div>
    <p>Looking at all HTTP requests proxied by the Cloudflare network, we find that the share of mitigated requests has dropped to 6%, down two percentage points compared to last year. Looking at 2023 to date, the mitigated share has fallen even further, to between 4% and 5%. Large spikes visible in the chart below, such as those in June and October, often correlate with large DDoS attacks mitigated by Cloudflare. It is interesting to note that although the percentage of mitigated traffic has decreased over time, the total mitigated request volume has been relatively stable, as shown in the second chart below, indicating an increase in overall clean traffic globally rather than an absolute decrease in malicious traffic.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5eiU8dDdgzilfy1QDkz444/b3e287f663e473138259bac7c2dea4f3/Screenshot-2023-03-06-at-23.00.01.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3ObkNfvMZcTPPxXnDjhgfe/e5a9c47dbdd1ff43a53119f6e0013041/pasted-image-0-7.png" />
            
            </figure><p>81% of mitigated HTTP requests were outright <code>BLOCK</code>ed, with mitigations for the remaining set split across the various <code>CHALLENGE</code> type actions.</p>
    <div>
      <h3>DDoS mitigation accounts for more than 50% of all mitigated traffic</h3>
      <a href="#ddos-mitigation-accounts-for-more-than-50-of-all-mitigated-traffic">
        
      </a>
    </div>
    <p>Cloudflare provides various security features that customers can configure to keep their applications safe. Unsurprisingly, DDoS mitigation is still the largest contributor to mitigated layer 7 (application layer) HTTP requests. Just last month (February 2023), we reported the <a href="/cloudflare-mitigates-record-breaking-71-million-request-per-second-ddos-attack/">largest known mitigated DDoS attack by HTTP requests/second volume</a> (this particular attack is not visible in the graphs above because they are aggregated at a daily level, and the attack only lasted for ~5 minutes).</p><p>Compared to last year, however, mitigation by the Cloudflare WAF has grown significantly, and now accounts for nearly 41% of mitigated requests. This can be partially attributed to <a href="/stop-attacks-before-they-are-known-making-the-cloudflare-waf-smarter/">advances in our WAF technology</a> that enable it to detect and block a larger range of attacks.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3pdhwKY0jThTj1OiHqL4vB/e94c2be1c4750b148718970118673938/out.png" />
            
            </figure><p>Tabular format for reference:</p>
<table>
<thead>
  <tr>
    <th><span>Source</span></th>
    <th><span>Percentage %</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>DDoS Mitigation</span></td>
    <td><span>52%</span></td>
  </tr>
  <tr>
    <td><span>WAF</span></td>
    <td><span>41%</span></td>
  </tr>
  <tr>
    <td><span>IP reputation</span></td>
    <td><span>4%</span></td>
  </tr>
  <tr>
    <td><span>Access Rules</span></td>
    <td><span>2%</span></td>
  </tr>
  <tr>
    <td><span>Other</span></td>
    <td><span>1%</span></td>
  </tr>
</tbody>
</table><p>Please note that in the table above, in contrast to last year, we are now grouping our products to match our marketing materials and the groupings used in the <a href="https://radar.cloudflare.com/year-in-review/2022">2022 Radar Year in Review</a>. This mostly affects our WAF product that comprises the combination of WAF Custom Rules, WAF Rate Limiting Rules, and WAF Managed Rules. In last year’s report, these three features accounted for an aggregate 31% of mitigations.</p><p>To understand the growth in WAF mitigated requests over time, we can look one level deeper where it becomes clear that Cloudflare customers are increasingly relying on WAF Custom Rules (historically referred to as Firewall Rules) to mitigate malicious traffic or implement business logic blocks. Observe how the orange line (<code>firewallrules</code>) in the chart below shows a gradual increase over time while the blue line (<code>l7ddos</code>) clearly trends lower.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ZcNrwcUi33RFZZd8zgg67/ecf460086091b5ba9e081940a6931426/pasted-image-0--1--3.png" />
            
            </figure>
    <div>
      <h3>HTTP Anomaly is the most frequent layer 7 attack vector mitigated by the WAF</h3>
      <a href="#http-anomaly-is-the-most-frequent-layer-7-attack-vector-mitigated-by-the-waf">
        
      </a>
    </div>
    <p>HTTP Anomaly contributed 30% of the traffic mitigated by WAF Managed Rules in March 2023, a decrease of nearly 25 percentage points compared to the same time last year. Examples of HTTP anomalies include malformed method names, null byte characters in headers, non-standard ports, or a content length of zero on a <code>POST</code> request. The decrease can be attributed to botnets matching HTTP anomaly signatures slowly changing their traffic patterns.</p>
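<p>As a rough illustration of these anomaly categories, the checks below flag the specific examples mentioned above. This is a deliberately simplified sketch; real WAF Managed Rules signatures are far more sophisticated than these conditions.</p>

```python
# Simplified sketch of the HTTP anomaly examples from the text above;
# not representative of actual WAF Managed Rules logic.

STANDARD_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE",
                    "OPTIONS", "PATCH", "TRACE", "CONNECT"}

def http_anomalies(method: str, headers: dict, port: int) -> list:
    findings = []
    if method not in STANDARD_METHODS:
        findings.append("malformed method name")
    if any("\x00" in value for value in headers.values()):
        findings.append("null byte in header")
    if port not in (80, 443):
        findings.append("non-standard port")
    if method == "POST" and headers.get("Content-Length") == "0":
        findings.append("POST with content length of zero")
    return findings

print(http_anomalies("POST", {"Content-Length": "0"}, 8081))
# → ['non-standard port', 'POST with content length of zero']
```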
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/35U8ZjXfZTDdpRQ59xV3w9/3e533f90eb2f42a067edd7b7ad086231/pasted-image-0--2--1.png" />
            
            </figure><p>Removing the HTTP anomaly line from the graph, we can see that in early 2023, the attack vector distribution looks a lot more balanced.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6rluFF9pSRJLaGItGsxfCT/7706f922a43159a25aa44154a3b0a0dc/pasted-image-0--3--1.png" />
            
            </figure><p>Tabular format for reference (top 10 categories):</p>
<table>
<thead>
  <tr>
    <th><span>Source</span></th>
    <th><span>Percentage % (last 12 months)</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>HTTP Anomaly</span></td>
    <td><span>30%</span></td>
  </tr>
  <tr>
    <td><span>Directory Traversal</span></td>
    <td><span>16%</span></td>
  </tr>
  <tr>
    <td><span>SQLi</span></td>
    <td><span>14%</span></td>
  </tr>
  <tr>
    <td><span>File Inclusion</span></td>
    <td><span>12%</span></td>
  </tr>
  <tr>
    <td><span>Software Specific</span></td>
    <td><span>10%</span></td>
  </tr>
  <tr>
    <td><span>XSS</span></td>
    <td><span>9%</span></td>
  </tr>
  <tr>
    <td><span>Broken Authentication</span></td>
    <td><span>3%</span></td>
  </tr>
  <tr>
    <td><span>Command Injection</span></td>
    <td><span>3%</span></td>
  </tr>
  <tr>
    <td><span>Common Attack</span></td>
    <td><span>1%</span></td>
  </tr>
  <tr>
    <td><span>CVE </span></td>
    <td><span>1%</span></td>
  </tr>
</tbody>
</table><p>Of particular note is the orange line spike seen towards the end of February 2023 (CVE category). The spike relates to a sudden increase of two of our WAF Managed Rules:</p><ul><li><p>Drupal - Anomaly:Header:X-Forwarded-For (id: <code>d6f6d394cb01400284cfb7971e7aed1e</code>)</p></li><li><p>Drupal - Anomaly:Header:X-Forwarded-Host (id: <code>d9aeff22f1024655937e5b033a61fbc5</code>)</p></li></ul><p>These two rules are also tagged against <a href="https://nvd.nist.gov/vuln/detail/CVE-2018-14774">CVE-2018-14774</a>, indicating that even relatively old and known vulnerabilities are still often targeted in an effort to exploit potentially unpatched software.</p>
    <div>
      <h2>Bot traffic insights</h2>
      <a href="#bot-traffic-insights">
        
      </a>
    </div>
    <p>Cloudflare’s <a href="https://www.cloudflare.com/products/bot-management/">Bot Management</a> solution has seen significant investment over the last twelve months. New features such as configurable heuristics, hardened JavaScript detections, automatic <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning model</a> updates, and <a href="https://www.cloudflare.com/products/turnstile/">Turnstile</a>, Cloudflare’s free CAPTCHA replacement, improve our classification of human vs. bot traffic daily.</p><p>Our confidence in the classification output is very high. If we plot the bot scores across traffic from the last week of February 2023, we find a very clear distribution: most requests are classified as either definitely bot (score below 30) or definitely human (score above 80), and most in fact score below 2 or above 95.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/KD5WOxYsgKxjXru5Y92q3/7e214da58227bcb5bea9de2a1b8a5b4e/pasted-image-0--4--1.png" />
            
            </figure>
    <div>
      <h3>30% of HTTP traffic is automated</h3>
      <a href="#30-of-http-traffic-is-automated">
        
      </a>
    </div>
    <p>Over the last week of February 2023, 30% of Cloudflare HTTP traffic was classified as automated, equivalent to about 13 million HTTP requests/second on the Cloudflare network. This is 8 percentage points less than at the same time last year.</p><p>Looking at bot traffic only, we find that only 8% is generated by verified bots, comprising 2% of total traffic. Cloudflare maintains a list of known good (verified) bots to allow customers to easily distinguish between well-behaved bot providers like Google and Facebook and potentially lesser known or unwanted bots. There are currently <a href="https://radar.cloudflare.com/traffic/verified-bots">171 bots in the list</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2NGH6G5AdgzABSAo6ItxhD/40429c8382f4ea3db4327f6bbcfca7ca/pasted-image-0--5-.png" />
            
            </figure>
    <div>
      <h3>16% of non-verified bot HTTP traffic is mitigated</h3>
      <a href="#16-of-non-verified-bot-http-traffic-is-mitigated">
        
      </a>
    </div>
    <p>Non-verified bot traffic often includes vulnerability scanners that are constantly looking for exploits on the web, and as a result, nearly one-sixth of this traffic is mitigated because some customers prefer to restrict the insights such tools can potentially gain.</p><p>Although verified bots like googlebot and bingbot are generally seen as beneficial and most customers want to allow them, we also see a small percentage (1.5%) of verified bot traffic being mitigated. This is because some site administrators don’t want portions of their site to be crawled, and customers often rely on WAF Custom Rules to enforce this business logic.</p><p>The most common action used by customers is to <code>BLOCK</code> these requests (13%), although we do have some customers configuring <code>CHALLENGE</code> actions (3%) to ensure any human false positives can still complete the request if necessary.</p><p>On a similar note, it is also interesting that nearly 80% of all mitigated traffic is classified as a bot, as illustrated in the figure below. Some may note that 20% of mitigated traffic being classified as human is still extremely high, but most mitigations of human traffic are generated by WAF Custom Rules, and are frequently due to customers implementing country-level or other related legal blocks on their applications. This is common, for example, in the context of US-based companies blocking access to European users for GDPR compliance reasons.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4LUr8AfDJrf4U5IfRLFnC6/41af48de35fac34e053f9ef256bc9e1b/Ms7yLEMZH9YTC2GfsGffgbk4rQwzHfpIwPlVe1dy7ZkNxLKe49WZHfope9X9Z-x9BZ0ygFBqay5NV5ipMU42-7uNt5QTYkv8VryoXr5QaJp4-ystQ7I7x6WIa2-c.png" />
            
            </figure>
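<p>As an example of the business logic described above, a WAF Custom Rule that keeps verified bots away from one section of a site might use a rules-language expression like the following (the <code>/private</code> path is purely illustrative; <code>cf.client.bot</code> is the field that matches verified bots):</p><pre><code>(cf.client.bot and starts_with(http.request.uri.path, "/private"))</code></pre><p>Deployed with a <code>BLOCK</code> action, a rule like this produces exactly the kind of verified-bot mitigations discussed above.</p>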
    <div>
      <h2>API traffic insights</h2>
      <a href="#api-traffic-insights">
        
      </a>
    </div>
    
    <div>
      <h3>55% of dynamic (non cacheable) traffic is API related</h3>
      <a href="#55-of-dynamic-non-cacheable-traffic-is-api-related">
        
      </a>
    </div>
    <p>Just like our Bot Management solution, we are also investing heavily in tools to protect API endpoints. This is because a <b>lot</b> of HTTP traffic is API related. In fact, if you count only HTTP requests that reach the origin and are <b>not</b> cacheable, up to 55% of traffic is API related, as per the definition stated earlier. This is the same methodology used in last year’s report, and the 55% figure remains unchanged year-over-year.</p><p>If we look at cached HTTP requests only (those with a cache status of <code>HIT</code>, <code>UPDATING</code>, <code>REVALIDATED</code> and <code>EXPIRED</code>) we find that, maybe surprisingly, nearly 7% is API related. Modern API endpoint implementations and proxy systems, including our own <a href="https://www.cloudflare.com/learning/security/api/what-is-an-api-gateway/">API Gateway</a>/caching feature set, allow for very flexible cache logic, supporting both <a href="https://developers.cloudflare.com/cache/how-to/create-cache-keys/">caching on custom keys</a> and quick cache revalidation (<a href="https://developers.cloudflare.com/cache/about/edge-browser-cache-ttl/#:~:text=Edge%20Cache%20TTL%20(Time%20to,TTL%20depends%20on%20plan%20type.&amp;text=For%20more%20information%20on%20creating%20page%20rules%2C%20see%20Create%20page%20rules.)">as often as every second</a>), which lets developers reduce load on back-end endpoints.</p><p>Including cacheable assets and other requests, such as redirects, in the total count, the number goes down, but is still 25% of traffic. In the graph below we provide both perspectives on API traffic:</p><ul><li><p>Yellow line: % of API traffic against all HTTP requests. This includes redirects, cached assets, and all other HTTP requests in the total count;</p></li><li><p>Blue line: % of API traffic against dynamic traffic returning an HTTP 200 OK response code only.</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3sBxmFpYbAGXEFqoJnr2SS/5518e387d93644a48d2a6cefc721d834/pasted-image-0--6-.png" />
            
            </figure>
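<p>The two perspectives plotted above can be expressed as a small computation over request records. The sketch below is illustrative only: the record fields are hypothetical, and <code>is_api()</code> stands in for the content-type test from the definitions section.</p>

```python
# Sketch of the two API-share perspectives: against all HTTP requests
# (yellow line) vs. against dynamic 200 OK traffic only (blue line).
# Record fields are hypothetical, not Cloudflare's log schema.

CACHED_STATUSES = {"HIT", "UPDATING", "REVALIDATED", "EXPIRED"}

def is_api(record: dict) -> bool:
    # Simplified: response content type of JSON or XML.
    ctype = record.get("content_type", "").lower()
    return "json" in ctype or "xml" in ctype

def api_share(records, dynamic_only=False) -> float:
    if dynamic_only:
        records = [r for r in records
                   if r["cache_status"] not in CACHED_STATUSES
                   and r["status"] == 200]
    return sum(is_api(r) for r in records) / len(records) if records else 0.0

logs = [
    {"cache_status": "HIT",  "status": 200, "content_type": "text/html"},
    {"cache_status": "MISS", "status": 200, "content_type": "application/json"},
    {"cache_status": "MISS", "status": 200, "content_type": "text/html"},
    {"cache_status": "MISS", "status": 301, "content_type": "text/html"},
]
print(api_share(logs), api_share(logs, dynamic_only=True))
```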
    <div>
      <h3>65% of global API traffic is generated by browsers</h3>
      <a href="#65-of-global-api-traffic-is-generated-by-browsers">
        
      </a>
    </div>
    <p>A growing number of web applications nowadays are built “API first”. This means that the initial HTML page load only provides the skeleton layout, and most dynamic components and data are loaded via separate API calls (for example, via <a href="https://en.wikipedia.org/wiki/Ajax_(programming)">AJAX</a>). This is the case for Cloudflare’s own dashboard. This growing implementation paradigm is visible when analyzing the bot scores for API traffic. We can see in the figure below that a large amount of API traffic is generated by user-driven browsers classified as “human” by our system, with nearly two-thirds of it clustered at the high end of the “human” range.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/49vmou2pqy4n6RMVyZp0rf/60feafffa33b03318e80a3f210bbed1a/pasted-image-0--7--2.png" />
            
            </figure><p>Calculating mitigated API traffic is challenging, as we don’t forward the request to origin servers, and therefore cannot rely on the response content type. Applying the same calculation that was used last year, a little more than 2% of API traffic is mitigated, down from 10.2% last year.</p>
    <div>
      <h3>HTTP Anomaly surpasses SQLi as most common attack vector on API endpoints</h3>
      <a href="#http-anomaly-surpasses-sqli-as-most-common-attack-vector-on-api-endpoints">
        
      </a>
    </div>
    <p>Compared to last year, HTTP anomalies now surpass SQLi as the most popular attack vector attempted against API endpoints (note the blue line being higher at the start of the graph just when last year's report was published). Attack vectors on API traffic are not consistent throughout the year and show more variation as compared to global HTTP traffic. For example, note the spike in file inclusion attack attempts in early 2023.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4jHOBvqPAy11orLuscByY0/4422fa5b78e4cd1ec2e3a455231bd3a5/pasted-image-0--8-.png" />
            
            </figure>
    <div>
      <h2>Exploring account takeover attacks</h2>
      <a href="#exploring-account-takeover-attacks">
        
      </a>
    </div>
    <p>Since <a href="/account-takeover-protection/">March 2021, Cloudflare has provided a leaked credential check feature as part of its WAF</a>. This allows customers to be notified (via an HTTP request header) whenever an authentication request is detected with a username/password pair that is known to be leaked. This tends to be an extremely effective signal at detecting botnets performing account takeover brute force attacks.</p><p>Customers also use this signal, on valid username/password pair login attempts, to issue two factor authentication, password reset, or in some cases, increased logging in the event the user is not the legitimate owner of the credentials.</p>
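<p>On the origin side, consuming this signal can be straightforward. The sketch below assumes the signal arrives in the <code>Exposed-Credential-Check</code> request header (the header added by this Cloudflare feature); the step-up policy itself is hypothetical and would vary per application.</p>

```python
# Hypothetical origin-side handling of the leaked-credential signal
# described above. The header name follows Cloudflare's feature; the
# policy decisions here are illustrative only.

def login_policy(headers: dict, credentials_valid: bool) -> str:
    leaked = "Exposed-Credential-Check" in headers
    if not credentials_valid:
        return "reject"
    if leaked:
        # Valid credentials that are known to be leaked: step up to 2FA
        # (or force a password reset) and log the event for review.
        return "require_2fa"
    return "allow"

print(login_policy({"Exposed-Credential-Check": "1"}, credentials_valid=True))
```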
    <div>
      <h3>Brute force account takeover attacks are increasing</h3>
      <a href="#brute-force-account-takeover-attacks-are-increasing">
        
      </a>
    </div>
    <p>If we look at the trend of matched requests over the past 12 months, an increase is noticeable starting in the latter half of 2022, indicating growing fraudulent activity against login endpoints. During large brute force attacks we have observed matches against HTTP requests with leaked credentials at a rate higher than 12k per minute.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3uTR3ZU85pPkHG6xtlAicC/707ddc37a95b5a7ec113e7d710bc3b52/pasted-image-0--9-.png" />
            
            </figure><p>Our leaked credential check feature has rules matching authentication requests for the following systems:</p><ul><li><p>Drupal</p></li><li><p>Ghost</p></li><li><p>Joomla</p></li><li><p>Magento</p></li><li><p>Plone</p></li><li><p>WordPress</p></li><li><p>Microsoft Exchange</p></li><li><p>Generic rules matching common authentication endpoint formats</p></li></ul><p>This allows us to compare activity from malicious actors, normally in the form of botnets, attempting to “break into” potentially compromised accounts.</p>
    <div>
      <h3>Microsoft Exchange is attacked more than WordPress</h3>
      <a href="#microsoft-exchange-is-attacked-more-than-wordpress">
        
      </a>
    </div>
    <p>Mostly due to its popularity, you might expect <a href="https://w3techs.com/technologies/details/cm-wordpress">WordPress</a> to be the application most <a href="https://www.cloudflare.com/learning/security/how-to-improve-wordpress-security/">at risk</a> and/or receiving the most brute force account takeover traffic. However, looking at rule matches across the supported systems listed above, we find that, after our generic signatures, the Microsoft Exchange signature is the most frequent match.</p><p>Applications experiencing brute force attacks tend to be high-value assets, and our data showing Exchange accounts as the most frequently targeted reflects this trend.</p><p>If we look at leaked credential match traffic by source country, the United States leads by a fair margin. Potentially notable is the absence of China among the top contenders, given the size of its network. The only exception is Ukraine, which led during the first half of 2022, towards the start of the war (the yellow line seen in the figure below).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6XiIU19alI12sLDxm3SDS8/d0bafef4317e397c62b5eafa093d7e54/pasted-image-0--10-.png" />
            
            </figure>
    <div>
      <h2>Looking forward</h2>
      <a href="#looking-forward">
        
      </a>
    </div>
    <p>Given the amount of web traffic carried by Cloudflare, we observe a broad spectrum of attacks. From HTTP anomalies, SQL injection attacks, and <a href="https://www.cloudflare.com/learning/security/threats/cross-site-scripting/">cross-site scripting (XSS)</a> to account takeover attempts and malicious bots, the threat landscape is constantly changing. As such, it is critical that any business operating online invests in visibility, detection, and mitigation technologies so that it can ensure its applications, and more importantly, its end users’ data, remain safe.</p><p>We hope you found this report interesting and that, at the very least, it gave you an appreciation of the state of <a href="https://www.cloudflare.com/application-services/solutions/">application security</a> on the Internet. There are a lot of bad actors online, and there is no indication that Internet security is getting easier.</p><p>We are already planning an update to this report, including additional data and insights across our product portfolio. Keep an eye on <a href="https://radar.cloudflare.com/">Cloudflare Radar</a> for more frequent application security reports and insights.</p>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[WAF]]></category>
            <guid isPermaLink="false">aUiJyB7qZCQ2kxHw5bM4B</guid>
            <dc:creator>Michael Tremante</dc:creator>
            <dc:creator>David Belson</dc:creator>
            <dc:creator>Sabina Zejnilovic</dc:creator>
        </item>
        <item>
            <title><![CDATA[Keeping the Cloudflare API 'all green' using Python-based testing]]></title>
            <link>https://blog.cloudflare.com/keeping-cloudflare-api-all-green-using-python-based-testing/</link>
            <pubDate>Tue, 07 Mar 2023 19:00:00 GMT</pubDate>
            <description><![CDATA[ Scout is an automated system providing constant end-to-end testing and monitoring of live APIs across different environments and resources. It does this by periodically running self-explanatory Python tests. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>At Cloudflare, we reuse existing core systems to power multiple products, and testing these core systems is essential. In particular, we need wide and thorough visibility into the behavior of our live APIs. We want to detect regressions, prevent incidents, and keep our APIs healthy. That is why we built Scout.</p><p>Scout is an automated system that periodically runs Python tests verifying the end-to-end behavior of our APIs. Scout allows us to evaluate APIs in production-like environments, giving us the confidence to green-light a production deployment, while also monitoring the behavior of APIs in production.</p>
    <div>
      <h2>Why Scout?</h2>
      <a href="#why-scout">
        
      </a>
    </div>
    <p>Before Scout, we used an automated test system built on the Robot Framework. This older system limited our testing capabilities: we could not easily match JSON responses against the keys we were looking for, and we often gave up on covering certain API behaviors because it was impossible to control which resources a given test suite would run against. Two different test suites running on the same account would produce false negatives.</p><p>Regarding schema validation, only API responses were validated against a JSON schema, and tests would not fail if the response did not match the schema. Moreover, it was impossible to validate API requests.</p><p>Test suites ran in a queue, so the time to assess a new feature depended on the number of test suites ahead of it, and a queued test suite might not run until the following day. Hence, we often ended up with a mismatch between test and API versions. Test steps could not be run in parallel either.</p><p>We could not split test suites between different environments: if a new API feature was being developed, it was impossible to write a test for it without the feature first being released to production.</p><p>We built Scout to overcome all these difficulties. We wanted the developer experience to be easy, and we wanted Scout to be fast and reliable while spotting any live API issue.</p>
    <div>
      <h3>A Scout test example</h3>
      <a href="#a-scout-test-example">
        
      </a>
    </div>
    <p>Scout is built in Python and leverages the functionalities of Pytest. Before diving into the exact capabilities of Scout and its architecture, let’s have a quick look at how to use it!</p><p>Following is an example of a Scout test on the Rulesets API (the docs are available <a href="https://developers.cloudflare.com/ruleset-engine/about/">here</a>):</p>
            <pre><code>from scout import requires, validate, Account, Zone

@validate(schema="rulesets", ignorePaths=["accounts/[^/]+/rules/lists"])
@requires(
    account=Account(
        entitlements={"rulesets.max_rules_per_ruleset": 2}),
    zone=Zone(plan="ENT",
        entitlements={"rulesets.firewall_custom_phase_allowed": True},
        account_entitlements={"rulesets.max_rules_per_ruleset": 2 }))
class TestZone:
    def test_create_custom_ruleset(self, cfapi):
        response = cfapi.zone.request(
            "POST",
            "rulesets",
            payload=f"""{{
            "name": "My zone ruleset",
            "description": "My ruleset description",
            "phase": "http_request_firewall_custom",
            "kind": "zone",
            "rules": [
                {{
                    "description": "My rule",
                    "action": "block",
                    "expression": "http.host eq \"fake.net\""
                }}
            ]
        }}""")
        response.expect_json_success(
            200,
            result=f"""{{
            "name": "My zone ruleset",
            "version": "1",
            "source": "firewall_custom",
            "phase": "http_request_firewall_custom",
            "kind": "zone",
            "rules": [
                {{
                    "description": "My rule",
                    "action": "block",
                    "expression": "http.host eq \"fake.net\"",
                    "enabled": true,
                    ...
                }}
            ],
            ...
        }}""")</code></pre>
            <p>A Scout test is a succession of request/response round trips against a given API. We use Pytest fixtures and marks to target specific resources while validating the requests and responses. Pytest marks in Scout allow us to attach an extra set of information to test suites, while Pytest fixtures are contexts with information and methods that can be used across tests to enhance their capabilities. Together, marks and fixtures let Scout build the whole harness required to run a test suite against APIs.</p><p>Being able to describe exactly which resources a given test will run against gives us confidence that the live API behaves as expected under various conditions.</p><p>The <b><i>cfapi</i></b> fixture provides the capability to target different resources, such as a Cloudflare account or a zone. In the test above, we use the Pytest mark <b><i>@requires</i></b> to describe the characteristics of the resources we need; here, for example, we need an account with a flag allowing two rules per ruleset. The test will then only run against accounts with such entitlements.</p><p>The <b><i>@validate</i></b> mark validates requests and responses against a given OpenAPI schema (here, the rulesets OpenAPI schema). Any validation failure is reported and flagged as a test failure.</p><p>As for the actual requests and responses, their payloads are described as f-strings; in particular, the response f-string can be written as a “semi-JSON”:</p>
            <pre><code> response.expect_json_success(
            200,
            result=f"""{{
            "name": "My zone ruleset",
            "version": "1",
            "source": "firewall_custom",
            "phase": "http_request_firewall_custom",
            "kind": "zone",
            "rules": [
                {{
                    "description": "My rule",
                    "action": "block",
                    "expression": "http.host eq \"fake.net\"",
                    "enabled": true,
                    ...
                }}
            ],
            ...
        }}""")</code></pre>
            <p>Among the many possible test assertions, Scout can assert the validity of a partial JSON response and log the result. We added handling of the ellipsis <b>…</b> as an indication that Scout should ignore any further fields at a given JSON nesting level. This lets us do partial matching on JSON API responses, focusing each test only on what matters most.</p><p>Once a test suite run is complete, the results are pushed by the service and stored in Cloudflare Workers KV, then displayed via a Cloudflare Worker.</p>
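<p>Conceptually, this ellipsis handling is a small recursive matcher. The sketch below is an illustration rather than Scout’s actual implementation: a <b>…</b> entry (as a dict key or list item) tells the matcher to ignore any extra fields or items at that nesting level.</p>

```python
# A minimal sketch of ellipsis-based partial matching; not Scout's actual
# implementation. A "..." entry (dict key or list item) tells the matcher
# to ignore any extra fields or items at that nesting level.
ELLIPSIS = "..."

def partial_match(expected, actual):
    """Return True when `actual` satisfies everything `expected` pins down."""
    if isinstance(expected, dict):
        if not isinstance(actual, dict):
            return False
        open_ended = ELLIPSIS in expected            # "..." used as a key
        keys = [k for k in expected if k != ELLIPSIS]
        if not open_ended and set(actual) != set(keys):
            return False                             # no extra keys allowed
        return all(k in actual and partial_match(expected[k], actual[k])
                   for k in keys)
    if isinstance(expected, list):
        open_ended = ELLIPSIS in expected            # "..." used as an item
        items = [e for e in expected if e != ELLIPSIS]
        if open_ended:
            if len(actual) < len(items):
                return False
        elif len(actual) != len(items):
            return False
        return all(partial_match(e, a) for e, a in zip(items, actual))
    return expected == actual
```

<p>In the response assertion shown above, the trailing <b>…</b> inside the rule object and at the top level plays exactly this role: only the listed fields are compared, and everything else in the live response is ignored.</p>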
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2iDExs9LHFvCjTMC12zsSp/f7fcbbf636cb5a959981fbdfad3fb94f/image6-2.png" />
            
            </figure><p>Scout runs in separate environments, both production-like and production. Our deployment process verifies that Scout is <i>green</i> in the production-like environment before deploying to production, where Scout is also used for monitoring purposes.</p>
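<p>That promotion step can be pictured as a simple gate. The helper below is a hypothetical sketch; the report format and function names are assumptions for illustration, not Scout’s actual schema: a deploy to production proceeds only if every test in the latest production-like report passed.</p>

```python
# Hypothetical deploy gate (report shape assumed for illustration):
# promotion to production is allowed only when every test in the latest
# production-like Scout report passed.
def is_green(report):
    return all(test["status"] == "passed" for test in report["tests"])

def may_deploy(latest_report):
    if not is_green(latest_report):
        raise RuntimeError("Scout is red in the production-like environment")
    return True
```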
    <div>
      <h2>How we built it</h2>
      <a href="#how-we-built-it">
        
      </a>
    </div>
    <p>The core of Scout is written in Python and consists of three components that interact with one another:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4WgzfD8Wk6Vyd0uTE7hfNk/ccea6ac729b0e35906088af25b601cb9/image4-3.png" />
            
            </figure><ul><li><p><b>The Scout plugin</b>: a Pytest plugin to write tests easily</p></li><li><p><b>The Scout service</b>: a scheduler service to run the test suites periodically</p></li><li><p><b>The Scout Worker</b>: a collector and presenter of test reports</p></li></ul>
    <div>
      <h3>The Scout plugin</h3>
      <a href="#the-scout-plugin">
        
      </a>
    </div>
    <p>This is the core component of the Scout system. It allows us to write self-explanatory tests while ensuring a high level of compliance with OpenAPI schemas and verifying the APIs’ behavior.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3IAEu33ULvtQorn3fZWBrn/0ce45e8506e6b37293cdd90ac68da651/image1-2.png" />
            
            </figure><p>The Scout plugin architecture can be split into three components: setup, resource allocator, and runners. The setup is a collection of subcomponents in charge of initializing the plugin.</p><p>The Registry contains all the information about the pool of accounts and zones we use for testing. For example, entitlements are flags gating customers’ use of product features; the Registry can describe entitlements per account and zone so that Scout can run a test against a specific setup.</p><p>As explained earlier, Scout can validate requests and responses against OpenAPI schemas. This is the responsibility of validators. A validator is built per OpenAPI schema and can be selected via the <b><i>@validate</i></b> mark we saw above.</p>
            <pre><code>@validate(schema="rulesets", ignorePaths=["accounts/[^/]+/rules/lists"])</code></pre>
            <p>As soon as a validator is selected, every interaction of a given test with an API is validated, and any validation failure is marked as a test failure.</p><p>The last element of the setup is the config reader, the subcomponent that provides all the URLs and authentication elements the Scout plugin needs to communicate with APIs.</p><p>Next in the chain is the resource allocator. This component consumes the configuration and objects of the setup to build multiple runners; it is a factory that makes the runners available through the <b><i>cfapi</i></b> fixture.</p>
            <pre><code>response = cfapi.zone.request(method, path, payload)</code></pre>
            <p>When such a line of code is processed, it is the <b><i>request</i></b> method of the zone runner allocated for the test that is executed. The resource allocator can provide specialized runners (account, zone, or default), which makes it possible to target specific API endpoints for a given account or zone.</p><p>Runners handle the execution of requests, manage the test expectations, and use the validators for request/response schema validation.</p><p>Any expectation or validation failure, and any exception, is recorded in the stash. The stash is shared across all runners; as a test setup, run, or cleanup is processed, the timeline of execution and any retries are logged in the stash. The stash contents are later used to build the test suite reports.</p><p>Scout can run multiple test steps in parallel: each resource couple (account runner, zone runner) is associated with a pytest-xdist worker, which runs test steps independently. There can be as many workers as there are resource couples. An extra “default” runner is provided for reaching our different APIs and/or URLs, with or without authentication.</p><p>Testing a test system was not the easiest part. We had to build a fake API and assert that the Scout plugin behaves as it should in different situations. We reached and maintained a test coverage we considered good (close to 90%) for using the Scout plugin permanently.</p>
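<p>The interplay between runners and the stash can be sketched as follows. The class names and fields here are assumptions for illustration, not Scout’s actual code: every runner appends the outcome of each step to a stash shared across runners, and the report is later built from those events.</p>

```python
# Illustrative sketch (assumed names, not Scout's code): a zone runner
# records each request it executes into a stash shared by all runners.
from dataclasses import dataclass, field

@dataclass
class Stash:
    # Shared, append-only record of everything that happened during a run;
    # its contents are later turned into the test suite report.
    events: list = field(default_factory=list)

    def record(self, kind, detail):
        self.events.append({"kind": kind, "detail": detail})

@dataclass
class ZoneRunner:
    zone_id: str
    stash: Stash

    def request(self, method, path, payload=None):
        url = f"zones/{self.zone_id}/{path}"
        # A real runner would send the HTTP request, check expectations,
        # and run OpenAPI validation; here we only record the attempt.
        self.stash.record("request", {"method": method, "url": url})
        return url
```

<p>Because the stash is shared, the report builder can replay the full timeline of a suite, across all parallel workers, from a single place.</p>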
    <div>
      <h3>The Scout service</h3>
      <a href="#the-scout-service">
        
      </a>
    </div>
    <p>The Scout service schedules test suites periodically. It is a configurable scheduler that provides a reporting harness for the test suites as well as multiple metrics; building a scheduler instead of using cron jobs was a deliberate design decision.</p><p>We wanted to be aware of scheduling issues as well as run issues, and for this we used Prometheus metrics. By default, however, Prometheus periodically scrapes metrics advertised by services, and we were concerned about missing metrics if a cron job finished before the next scrape. A small, long-running scheduler was therefore better suited to overall observability of the test runs. Among the metrics the Scout service provides are network failures, general test failures, reporting failures, test lag, and more.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/42kpAT26a338B07zUR59ke/ff248a59d66687ac0d31ef7c4d2544a9/image3-3.png" />
            
            </figure><p>The Scout service runs threads on configured periods. Each thread runs a test suite as a separate Pytest-with-Scout-plugin process, followed by a reporting step that consumes the results and publishes them to the relevant parties.</p><p>The reporting component provided to each thread publishes the report to Workers KV and notifies us on chat if there is a failure. Reporting also takes care of publishing the information needed to build API testing coverage: it is mandatory for us to cover all API endpoints and their possible methods so that we have wide and thorough visibility into our live APIs.</p><p>As a fallback, if there is any thread, test, or reporting failure, we are alerted through the Prometheus metrics updated throughout the service’s execution. The logs of the Scout service, as well as the logs of each Pytest/Scout plugin execution, provide last-resort information if no metrics are available and reporting is failing.</p><p>The service can be deployed with a minimal YAML configuration and set up for different environments. We can, for example, run different test suites depending on the environment, publish (or not) to Cloudflare Workers, set different periods and retry mechanisms, and so on.</p><p>We keep the tests in our code base alongside the configuration of the Scout service, and that’s about it: the Scout service is a separate entity.</p>
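<p>A minimal sketch of such a scheduler, with plain counters standing in for the Prometheus metrics Scout exposes: each job runs on its own thread at a fixed period, and failures increment a counter that an alerting rule can watch.</p>

```python
# Simplified periodic runner (an illustration, not the Scout service):
# one thread per job, run counts and failure counts kept as plain
# attributes where Scout would use Prometheus metrics.
import threading

class PeriodicRunner:
    def __init__(self, job, period):
        self.job, self.period = job, period
        self.runs = 0
        self.failures = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        while not self._stop.is_set():
            try:
                self.job()
                self.runs += 1
            except Exception:
                self.failures += 1   # an alert would fire on this counter
            self._stop.wait(self.period)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

<p>Because the runner is always alive, its counters are always available to be scraped, avoiding the missed-scrape problem a short-lived cron job would have.</p>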
    <div>
      <h3>The Scout Worker</h3>
      <a href="#the-scout-worker">
        
      </a>
    </div>
    <p>The Scout Worker is a Cloudflare Worker in charge of fetching the most recent entries from Workers KV and displaying them in an eye-pleasing manner. The Scout service publishes each test report as JSON, so the Worker parses the report and displays its content based on the status of the test suite run.</p><p>For example, an authentication failure during a test resulted in the following display in the Worker:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/42dMShTD7SrsgASIKbpCcZ/71e84ef0e898fcad0327d495aeb8cfe5/image5.png" />
            
            </figure>
    <div>
      <h2>What Scout lets us do</h2>
      <a href="#what-does-scout-let-us-do">
        
      </a>
    </div>
    <p>By leveraging the capabilities of Pytest and Cloudflare Workers, we have built a configurable, robust, and reliable system that lets us easily write self-explanatory tests for our APIs.</p><p>We can validate requests and responses against OpenAPI schemas and test behaviors on specific resources, while being alerted through multiple channels if something goes wrong.</p><p>For specific use cases, we can write a test verifying that the API behaves as it should, that the configuration pushed to the edge is valid, and that a given zone will react as it should to security threats, going beyond a simple end-to-end API test.</p><p>Scout quickly became our permanent live tester and monitor of APIs. We wrote tests for all endpoints to maintain wide coverage of our APIs, and Scout has since been used to verify an API version prior to its deployment to production. After a deployment to a production-like environment, we know within a couple of minutes whether a new feature is good to go to production and behaving correctly.</p><p>We hope you enjoyed this deep dive into one of our systems!</p>
            <category><![CDATA[Monitoring]]></category>
            <category><![CDATA[API]]></category>
            <guid isPermaLink="false">2rnwmGJBmwG1wAJdzvcSG1</guid>
            <dc:creator>Elie Mitrani</dc:creator>
        </item>
        <item>
            <title><![CDATA[Partnering with civil society to track Internet shutdowns with Radar Alerts and API]]></title>
            <link>https://blog.cloudflare.com/partnering-with-civil-society-to-track-shutdowns/</link>
            <pubDate>Thu, 15 Dec 2022 14:02:00 GMT</pubDate>
            <description><![CDATA[ Learn more on how Cloudflare works with civil society organizations to provide tools to track Internet shutdowns using Radar Alerts and API. ]]></description>
            <content:encoded><![CDATA[
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2HghK2JjzdoM7YOZEWc8gX/55fd94bc3fb19952c85b77752dee1395/image1-31.png" />
            
            </figure><p>Internet shutdowns have long been a tool in government toolboxes when it comes to silencing opposition and cutting off access from the outside world. The KeepItOn campaign by Access Now, a group that defends the digital rights of global Internet users, documented at least 182 Internet shutdowns in 34 countries in 2021. Many of these shutdowns occurred during public protests, elections, and wars as an extreme form of censorship in places like <a href="https://www.wired.co.uk/article/afghanistan-taliban-internet">Afghanistan</a>, <a href="https://paradigmhq.org/internet-shutdown-dr-congo/">Democratic Republic of the Congo</a>, <a href="https://www.accessnow.org/stop-internet-shutdowns-in-ukraine/">Ukraine</a>, <a href="https://www.accessnow.org/internet-shutdowns-india-keepiton-2021/">India,</a> and <a href="https://www.state.gov/joint-statement-on-internet-shutdowns-in-iran/">Iran</a>.</p><p>There are a range of ways governments block or slow communications, including throttling, IP blocking, <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">DNS</a> interference, mobile data shutoffs, and <a href="https://www.cloudflare.com/learning/security/what-is-a-firewall/">deep packet inspection</a>, all with similar goals: exerting control over information.</p><p>Although Internet shutdowns are largely public, it is difficult to document and track the ways in which governments implement them. The shutdowns not only impact people’s ability to participate in civil and political life and the economy but also have grave consequences for trust in democratic institutions.</p><p>We have <a href="/q3-2022-internet-disruption-summary/">reported</a> on these shutdowns in the past, and for Cloudflare Impact Week, we want to tell you more about how we work with civil society organizations to provide tools to track and document the scope of these disruptions. 
We want to support their critical work and provide the tools they need so they can demand accountability and condemn the use of shutdowns to silence dissent.</p>
    <div>
      <h3>Radar Internet shutdown alerts for civil society</h3>
      <a href="#radar-internet-shutdown-alerts-for-civil-society">
        
      </a>
    </div>
    <p>We <a href="/introducing-cloudflare-radar/">launched Radar in 2020</a> to shine light on the Internet’s patterns, insights, threats, and trends based on aggregated data from our network. Once we launched Radar, we found that many civil society organizations and those who work in democracy-building use Radar to track trends in countries to better understand the rise and fall of Internet usage.</p><p>Internally, we had an alert system for potential Internet disruptions that we use as an early warning regarding shifts in network patterns and incidents. When we engaged with these organizations that use Radar to track Internet trends, we learned more about how our internal tool to identify traffic distributions could be useful for organizations that work with human rights defenders on the ground that are impacted by these shutdowns.</p><p>To determine the best way to provide a tool to alert organizations when Cloudflare has seen these disruptions, we spoke with organizations such as Access Now, Internews, The Carter Center, National Democratic Institute, Internet Society, and the International Foundation for Electoral Systems. After our conversations, we launched <a href="/working-with-those-who-protect-human-rights-around-the-world/">Radar Internet shutdown alerts</a> in 2021 to provide alerts on when Cloudflare has detected significant drops in traffic with the hope that the information is used to document, track, and hold institutions accountable for these human rights violations.</p><p>Since 2021, we have been providing these alerts to civil society partners to track these shutdowns. As we have collected feedback to improve the alerts, we have seen many partners looking for more ways to integrate Radar and the alerts into their existing tracking mechanisms. 
With this, we announced <a href="/radar2/">Radar 2.0 with API access</a> for free so academics, data sleuths, civil society, human rights organizations, and other web enthusiasts can analyze, visualize, and investigate Internet usage across the globe, based on data from our global network. In addition, we launched <a href="https://radar.cloudflare.com/outage-center">Cloudflare Radar Outage Center</a> to archive Internet outages and make it easier for civil society organizations, journalists/news media, and impacted parties to track past shutdowns.</p>
    <div>
      <h3>Highlighting the work of our civil society partners to track Internet shutdowns</h3>
      <a href="#highlighting-the-work-of-our-civil-society-partners-to-track-internet-shutdowns">
        
      </a>
    </div>
    <p>We believe our job at Cloudflare is to build tools that improve privacy and security for a range of players on the Internet. With this, we want to highlight the work of our civil society partners. These organizations are pushing back against targeted shutdowns that inflict lasting damage to democracies around the world. Here are their stories.</p><p><a href="https://www.accessnow.org/"><b>Access Now</b></a></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/33KYq2fwmnj7dXpSgFIplr/4191b05161eee5edfef9d9559087a1d8/image3-19.png" />
            
            </figure><p>Access Now’s #KeepItOn coalition was <a href="https://www.accessnow.org/defending-the-web-for-all-how-global-champions-are-fighting-to-keepiton/">launched</a> in 2016 to help unite and organize the efforts of activists and organizations across the world to end Internet shutdowns. It now represents more than 280 organizations from 105 countries across the globe. The goal of STOP Project (Shutdown Tracker Optimization Project) is ultimately to document and report shutdowns accurately, which requires diligent verification. Access Now regularly uses multiple sources to identify and understand the shutdown, the choice and combination of which depends on where and how the shutdown occurred.</p><p>The tracker uses both quantitative and qualitative data to record the number of Internet shutdowns in the world in a given year and to characterize the nature of the shutdowns, including their magnitude, scope, and causes.</p><blockquote><p><b>Zach Rosson</b>, #KeepItOn Data Analyst, Access Now, details, “<i>Sometimes, we confirm an Internet shutdown through means such as technical measurement, while at other times we rely upon contextual information, such as news reports or personal accounts. We also work hard to document how a particular shutdown was ordered and how it impacted society, including why and how it happened.</i>”</p></blockquote><blockquote><p>On how Access Now’s #KeepItOn coalition uses Cloudflare Radar, <b>Rosson</b> says, <b>“</b><i>We use Radar Internet shutdown alerts in both email and tweet form, as a trusted source to help verify a shutdown occurrence. These alerts and their underlying measurements are used as primary sources in our dataset when compiling shutdowns for our annual report, so they are used in an archival sense as well. 
Cloudflare Radar is sometimes the first place that we hear about a shutdown, which is quite useful in a rapid response context, since we can quickly mobilize to verify the shutdown and have strong evidence when advocating against it.</i><b>”</b></p></blockquote><p>The recorded instances of shutdowns include events reported through local or international news sources that are included in the dataset, from local actors through Access Now’s Digital Security Helpline or the #KeepItOn Coalition email list, or directly from telecommunication and Internet companies.</p><blockquote><p><b>Rosson</b> notes, <b>“</b><i>When it comes to Radar 2.0 and API, we plan to use these resources to speed up our response, verification, and publication of shutdown data as compiled from different sources. Thus, the Cloudflare Radar Outage Center (CROC) and related API endpoint will be very useful for us to access timely information on shutdowns, either through visual inspection of the CROC in the short term or through using the API to pull data into a centralized database in the long term.</i><b>”</b></p></blockquote><p><a href="https://www.internetsociety.org/"><b>Internet Society: ISOC</b></a></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4lLhnoXtBjfsG7dLbEZxA0/d3f5e0207f4a3f8cf27df4ea172bbfe4/image2-29.png" />
            
            </figure><p>On the Internet Society Pulse platform, Susannah Gray, Director, Communications, Internet Society, explains that they strive to curate meaningful information around a government-mandated Internet shutdown by using data from multiple trusted sources, and making it available to everyone, everywhere in an easy-to-understand manner. ISOC does this by monitoring Internet traffic using various tools, including Radar. When they see something that might indicate that an Internet shutdown is in progress, they check if the shutdown meets <a href="https://pulse.internetsociety.org/blog/tracking-internet-shutdowns">their  criteria</a>. For a shutdown to appear on the Pulse Shutdowns Tracker it needs to meet all the following requirements. It must:</p><ul><li><p>Be artificially induced, as evident from reputable sources, including government statements and orders.</p></li><li><p>Remove Internet access.</p></li><li><p>Affect access to a group of people.</p></li></ul><p>Once ISOC is certain that a shutdown is the result of government action, and isn’t the result of <a href="https://www.bleepingcomputer.com/news/security/major-bgp-leak-disrupts-thousands-of-networks-globally/">technical errors</a>, <a href="https://www.ripe.net/publications/news/industry-developments/youtube-hijacking-a-ripe-ncc-ris-case-study">routing misconfigurations</a>, or <a href="https://www.businessinsider.com/ships-anchor-cuts-internet-cables-to-jersey-jt-2016-11">infrastructure failures</a>, they prepare an incident page, collate related measurements from their trusted data partners, and then publish the information on the <a href="https://pulse.internetsociety.org/shutdowns">Pulse shutdowns tracker</a>.</p><blockquote><p>ISOC uses many resources to track shutdowns. <b>Gray</b> explains, <b>“</b><i>Radar Internet shutdown alerts are incredibly useful for bringing incidents to our attention as they are happening. 
The easy access to the data provided helps us assess the nature of an outage. If an outage is established as a government-mandated shutdown, we often use </i><a href="https://pulse.internetsociety.org/shutdowns/short-internet-disruption-in-cuba"><i>screenshots of Radar charts</i></a><i> on the Pulse shutdowns tracker incident page to help illustrate how traffic stopped flowing in and out of a country during the shutdown. We provide a link back to the Radar platform so that people interested in getting more in-depth data can find out more.</i><b>”</b></p></blockquote><p>ISOC’s aim has never been to be the first to report a government-mandated shutdown: instead, their mission is to report accurate and meaningful information about the shutdown and explore its impact on the economy and society.</p><blockquote><p><b>Gray</b> adds, <b>“</b><i>For Radar 2.0 and the API, we plan to use it as part of the data aggregation tool we are developing. This internal tool will combine several outage alert and monitoring tools and sources into one single system so that we are able to track incidents more efficiently.</i><b>”</b></p></blockquote><p><a href="https://ooni.org/"><b>Open Observatory of Network Interference: OONI</b></a></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6fMuHyuEMjirW8hJXdOGMv/466fd8e59474d6e9bc4ff9292baae86b/image4-16.png" />
            
            </figure><p><a href="https://ooni.org/">OONI</a> is a nonprofit that measures Internet censorship, including the blocking of websites, instant messaging apps, and circumvention tools. <a href="https://radar.cloudflare.com/">Cloudflare Radar</a> is one of the main public data sources that they use when examining reported Internet connectivity shutdowns. For example, OONI relied on Radar data when <a href="https://ooni.org/post/2022-iran-blocks-social-media-mahsa-amini-protests/#network-outages">reporting on shutdowns in Iran</a> amid ongoing protests. In 2022, the team launched the <a href="https://explorer.ooni.org/chart/mat">Measurement Aggregation Toolkit (MAT)</a>, which enables the public to track censorship worldwide and create their own charts based on real-time OONI data. OONI also forms <a href="https://ooni.org/partners">partnerships</a> with multiple digital rights organizations that use OONI tools and data to monitor and respond to censorship events in their regions.</p><blockquote><p><b>Maria Xynou</b>, OONI Research and Partnerships Director, explains <b>“</b><i>Cloudflare Radar is one of the main public data sources that OONI has referred to when examining reported internet connectivity shutdowns. Specifically, OONI refers to Cloudflare Radar to check whether the platform provides signals of a reported internet connectivity shutdown; compare Cloudflare Radar signals with those visible in other, relevant public data sources (such as </i><a href="https://ioda.inetintel.cc.gatech.edu/"><i>IODA</i></a><i>, and </i><a href="https://transparencyreport.google.com/traffic/overview?hl=en"><i>Google traffic data</i></a><i>).</i><b>”</b></p></blockquote>
    <div>
      <h3>Tracking the shutdowns of tomorrow</h3>
      <a href="#tracking-the-shutdowns-of-tomorrow">
        
      </a>
    </div>
    <p>As we work with more organizations in the human rights space and learn how our global network can be used for good, we are eager to improve and create new tools to protect human rights in the digital age.</p><p>If you would like to be added to Radar Internet Shutdown alerts, please contact <a href="mailto:radar@cloudflare.com">radar@cloudflare.com</a> and follow the <a href="https://twitter.com/CloudflareRadar">Cloudflare Radar alert Twitter page</a> and <a href="https://radar.cloudflare.com/outage-center">Cloudflare Radar Outage Center (CROC)</a>. For access to the Radar API, please visit <a href="https://developers.cloudflare.com/radar/">Cloudflare Radar</a>.</p> ]]></content:encoded>
            <category><![CDATA[Impact Week]]></category>
            <category><![CDATA[Internet Shutdown]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[Radar Alerts]]></category>
            <category><![CDATA[Radar API]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Policy & Legal]]></category>
            <guid isPermaLink="false">5ALIcq0sVUascLp7yNq35K</guid>
            <dc:creator>Jocelyn Woolbright</dc:creator>
        </item>
        <item>
            <title><![CDATA[How Cloudflare uses Terraform to manage Cloudflare]]></title>
            <link>https://blog.cloudflare.com/terraforming-cloudflare-at-cloudflare/</link>
            <pubDate>Thu, 17 Nov 2022 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare uses the Cloudflare Terraform provider extensively to make changes to our internal accounts as easy as opening a pull request. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Configuration management is far from a solved problem. As organizations scale beyond a handful of administrators, having a secure, auditable, and self-service way of updating system settings becomes invaluable. Managing a Cloudflare account is no different. With <a href="https://www.cloudflare.com">dozens of products</a> and <a href="https://api.cloudflare.com/">hundreds of API endpoints</a>, keeping track of current configuration and making bulk updates across multiple zones can be a challenge. While the Cloudflare Dashboard is great for analytics and feature exploration, any changes that could potentially impact users really should get a code review before being applied!</p><p>This is where <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs">Cloudflare's Terraform provider</a> can come in handy. Built as a layer on top of the <a href="https://github.com/cloudflare/cloudflare-go">cloudflare-go</a> library, the provider allows users to interface with the Cloudflare API using stateful <a href="https://developer.hashicorp.com/terraform">Terraform</a> resource declarations. Not only do we actively support this provider for customers, we make extensive use of it internally! In this post, we hope to provide some best practices we've learned about managing complex Cloudflare configurations in Terraform.</p>
    <div>
      <h2>Why Terraform</h2>
      <a href="#why-terraform">
        
      </a>
    </div>
    <p>Unsurprisingly, we find Cloudflare's products to be pretty useful for securing and enhancing the performance of services we deploy internally. We use <a href="https://www.cloudflare.com/dns/">DNS</a>, <a href="https://www.cloudflare.com/waf/">WAF</a>, <a href="https://www.cloudflare.com/products/zero-trust/">Zero Trust</a>, <a href="https://www.cloudflare.com/products/zero-trust/email-security/">Email Security</a>, <a href="https://www.cloudflare.com/developer-platform-hub/">Workers</a>, and all manner of <a href="https://www.cloudflare.com/whats-new/">experimental new features</a> throughout the company. This <a href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food">dog-fooding</a> allows us to battle-harden the services we provide to users and feed our desired features back to the product teams all while running the backend of Cloudflare. But, as Cloudflare grew, so did the complexity and importance of our configuration.</p><p>When we were a much smaller company, we only had a handful of accounts with designated administrators making changes on behalf of their colleagues. However, over time this handful of accounts grew into hundreds with each managed by separate teams. Independent accounts are useful in that they allow service-owners to make modifications that can't impact others, but it comes with overhead.</p><p>We faced the challenge of ensuring consistent security policies, up-to-date account memberships, and change visibility. While our  accounts were still administered by kind human stewards, we had numerous instances of account members not being removed after they transferred to a different team. While this never became a security incident, it demonstrated the shortcomings of manually provisioning account memberships. In the case of a production service migration, the administrator executing the change would often hop on a video call and ask for others to triple-check an IP address, ruleset, or access policy update. 
It was an era of looking through the audit logs to see what broke a service.</p><p>We wanted to make it easier for developers and users to make the changes they wanted without having to reach out to an administrator. Defining our configuration in code using Terraform has allowed us to keep tabs on the complexity of configuration while improving visibility and change management practices. By dogfooding the Cloudflare Terraform provider, we've been able to ensure:</p><ul><li><p>Modifications to accounts are peer reviewed by the team that owns an account.</p></li><li><p>Each change is tied to a user, commit, and a ticket explaining the rationale for the change.</p></li><li><p>API Tokens are tied to service accounts rather than individual human users, meaning they survive team changes and offboarding.</p></li><li><p>Account configuration can be audited by anyone at the company for current state, accuracy, and security without needing to add everyone as a member of every account.</p></li><li><p>Large changes, such as <a href="/how-cloudflare-implemented-fido2-and-zero-trust/">enforcing hard keys</a>, can be done rapidly, even in a single pull request.</p></li><li><p>Configuration can be easily copied and reused across accounts to promote best practices and speed up development.</p></li><li><p>We can use and iterate on our awesome provider and provide a better experience to other users (shoutout in particular to <a href="https://github.com/jacobbednarz">Jacob</a>!).</p></li></ul>
    <div>
      <h2>Terraform in CI/CD</h2>
      <a href="#terraform-in-ci-cd">
        
      </a>
    </div>
    <p><a href="https://github.com/hashicorp/terraform">Terraform</a> has a fairly mature open source ecosystem, built from years of running-in-production experience. Thus, there are a number of ways to make interacting with the system feel as comfortable to developers as git. One of these tools is <a href="https://www.runatlantis.io/">Atlantis</a>.</p><p>Atlantis acts as <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/">continuous integration/continuous delivery (CI/CD)</a> for Terraform, fitting neatly into version control workflows and giving visibility into the changes deployed with each code change. We use Atlantis to display Terraform plans (effectively a diff in configuration) within pull requests and apply the changes after the pull request has been approved. Having all the output from the terraform provider in the comments of a pull request means there's no need to fiddle with the state locally or worry about where a state lock is coming from. Using Terraform CI/CD like this makes configuration management approachable to developers and non-technical folks alike.</p><p>In this example pull request, I'm adding a user to the cloudflare-cool-account (see the code in the next section). Once the PR is opened, Bitbucket posts a webhook to Atlantis, telling it to run a `terraform plan` using this branch. The resulting comment is placed in the pull request. Notice that this pull request can't be applied or merged yet as it doesn't have an approval! Once the pull request is approved, I would comment "atlantis apply", wait for Atlantis to post a comment containing the output of the command, and merge the pull request if that output looks correct.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6OqbJjxYbT2Dw4qpvFltng/d672644824a4808946b7f71d5d2fb3b9/image4-25.png" />
            
            </figure><p>Our Terraforming Cloudflare architecture consists of a monorepo with one directory (and tfstate) for each internally-owned Cloudflare account. This keeps all of our Cloudflare configuration centralized for easier oversight while remaining neatly organized.</p><p>A future release (<a href="https://github.com/cloudflare/terraform-provider-cloudflare/issues/1646">still an open issue as of this writing</a>) will make it possible to manage multiple Cloudflare accounts in the same tfstate, but we've found that, in our usage, accounts generally map fairly neatly onto teams. Teams can be configured as <a href="https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners">CODEOWNERS</a> for a given directory and be tagged on any pull requests to that account. With teams owning separate accounts and each account having a separate tfstate, it's rare for pull requests to get stuck waiting for a lock on the tfstate. Team-account-sized states remain relatively small, meaning that they also build quickly. Later on, we'll share some of the other optimizations we've made to keep the repo user-friendly.</p><p>Each of our terraform states, given that they <a href="https://developer.hashicorp.com/terraform/language/settings/backends/configuration#credentials-and-sensitive-data">include secrets (including the API key!)</a>, is stored encrypted in an internal datastore. When a pull request is opened, Atlantis reaches out to a piece of middleware (that we may open source once it's cleaned up a bit) that retrieves and decrypts the state for processing. Once the pull request is applied, the state is encrypted and put away again.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2WMBQIoo8yaKC6RLz75E1T/c29b2d834e0f580ca5912733a7a0e51e/image2-44.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/76bxQk0hlFPFOHRyscuObV/23e258f576ab1e9b0c171c3fb5ee0ada/image5-14.png" />
            
            </figure><p>We execute a daily Terraform apply across all tfstates to capture any unintended config drift and rotate certificates when they approach expiration. This prevents unrelated changes from popping up in pull request diffs and causing confusion. While we could run more frequent state applies to ensure Terraform remains firmly up to date, once-a-day rectification strikes a balance between code enforcement and avoiding state locks while users are running Terraform plans in pull requests.</p><p>One of the problems that we encountered during our transition to Terraform is that folks were in the habit of making updates to configuration in the Dashboard and were still able to edit settings there. Thus, we didn't always have a single source of truth for our configuration in code. It also meant the change would get mysteriously (to them) reverted the next day! So that's why I'm excited to share a new Zero Trust Dashboard toggle that we've been turning on for our accounts internally: API/Terraform read-only mode.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Ks2CZ0OxPwNbV5Kal95Tj/ce12fabf21a7fe81c67690f4fc4cf4ef/image1-59.png" />
            
            </figure><p>Easily one of my favorite new features</p><p>With this button, we're able to politely prevent manual changes to your Cloudflare account’s Zero Trust configuration without removing permissions from the set of users who can fix settings manually in a break-glass emergency scenario. <a href="https://api.cloudflare.com/#zero-trust-organization-update-your-zero-trust-organization">Check out how you can enable this setting in your Zero Trust organization</a>.</p>
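<p>As a footnote to the CI/CD setup described above: for a monorepo like this, a repo-level <a href="https://www.runatlantis.io/docs/repo-level-atlantis-yaml.html">atlantis.yaml</a> might look something like the following sketch (the directory and project names here are hypothetical, not our actual layout):</p>
<pre><code>version: 3
projects:
  - name: cloudflare-cool-account
    dir: accounts/cloudflare-cool-account
    autoplan:
      when_modified: ["*.tf"]
      enabled: true
    # require pull request approval before "atlantis apply" will run
    apply_requirements: [approved]</code></pre>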
    <div>
      <h2>Slick Snippets and Terraforming Recommendations</h2>
      <a href="#slick-snippets-and-terraforming-recommendations">
        
      </a>
    </div>
    <p>As our Terraform repository has matured, we've refined how we define Cloudflare resources in code. By finding a sweet spot between code reuse and readability, we've been able to minimize operational overhead and generally let users get their work done. Here are a couple of useful snippets that have been particularly valuable to us.</p>
    <div>
      <h3>Account Membership</h3>
      <a href="#account-membership">
        
      </a>
    </div>
    <p>This allows for defining a fairly straightforward mapping of user emails to account privileges without code duplication or complex modules. We pull the list of human-friendly names of <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/data-sources/account_roles">account roles</a> from the API to show <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/resources/account_member">user</a> permission assignments at a glance. Note: <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/resources/account_member#status">status</a> is a new argument that allows for accounts to be added without sending an email to the user; perfect for when an organization is using SSO. (Thanks <a href="https://github.com/patrobinson">patrobinson</a> for the <a href="https://github.com/cloudflare/terraform-provider-cloudflare/issues/1654">feature request</a> and <a href="https://github.com/markblackman">mblackman</a> for the <a href="https://github.com/cloudflare/terraform-provider-cloudflare/pull/1920">PR</a>!)</p>
            <pre><code>variables.tf
—
data "cloudflare_account_roles" "my_account" {
  account_id = var.account_id
}

locals {
  roles = {
    for role in data.cloudflare_account_roles.my_account.roles :
    role.name =&gt; role
  }
}

members.tf
—
locals {
  users = {
    emerson = {
      roles = [
        local.roles["Administrator"].id
      ]
    }
    lucian = {
      roles = [
        local.roles["Super Administrator - All Privileges"].id
      ]
    }
    walruto = {
      roles = [
        local.roles["Audit Logs Viewer"].id,
        local.roles["Cloudflare Access"].id,
        local.roles["DNS"].id
      ]
    }
  }
}

resource "cloudflare_account_member" "account_member" {
  for_each      = local.users
  account_id    = var.account_id
  email_address = "${each.key}@cloudflare.com"
  role_ids      = each.value.roles
  status        = "accepted"
}</code></pre>
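<p>If an account already has members who were added by hand, they can be adopted into this pattern with <code>terraform import</code> before the first apply. A hedged sketch, with the IDs left as placeholders:</p>
<pre><code>terraform import 'cloudflare_account_member.account_member["emerson"]' &lt;account_id&gt;/&lt;member_id&gt;</code></pre>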
            
    <div>
      <h3>Defining Auto-Refreshing Access Service Tokens</h3>
      <a href="#defining-auto-refreshing-access-service-tokens">
        
      </a>
    </div>
    <p>The <a href="https://github.com/cloudflare/terraform-provider-cloudflare/issues/1866">GitHub issue</a> and <a href="https://github.com/cloudflare/terraform-provider-cloudflare/pull/1872">provider change</a> that enabled automatic Access service token refreshes actually came from a need inside Cloudflare. Here's how we ended up implementing it. We begin by defining a set of services that need to connect to hostnames protected by Access. Each of these <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/resources/access_service_token">tokens</a> is created and stored in a secret key-value store. Next, we reference those access tokens by ID in the target <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/resources/access_policy">Access policies</a>. Once this has run, the service owner or the service itself can retrieve the credentials from the data store. (Note: we're using Vault here, but any storage provider could be used in its place.)</p>
            <pre><code>tokens.tf
—
locals {
  service_tokens = toset([
    "customer-service",   # TICKET-120
    "full-service",       # TICKET-128
    "quality-of-service", # TICKET-420
    "room-service"        # TICKET-927
  ])
}

resource "cloudflare_access_service_token" "token" {
  for_each             = local.service_tokens
  account_id           = var.account_id
  name                 = each.key
  min_days_for_renewal = 30
}

resource "vault_generic_secret" "access_service_token" {
  for_each     = local.service_tokens
  path         = "kv/secrets/${each.key}/access_service_token"
  disable_read = true

  data_json = jsonencode({
    client_id     = cloudflare_access_service_token.token[each.key].client_id
    client_secret = cloudflare_access_service_token.token[each.key].client_secret
  })
}

super_cool_hostname.tf
—
resource "cloudflare_access_application" "super_cool_hostname" {
  account_id = var.account_id
  name       = "Super Cool Hostname"
  domain     = "supercool.hostname.tld"
}

resource "cloudflare_access_policy" "super_cool_hostname_service_access" {
  application_id = cloudflare_access_application.super_cool_hostname.id
  account_id     = var.account_id
  name           = "TICKET-927 Allow Room Service"
  decision       = "non_identity"
  precedence     = 1
  include {
    service_token = [cloudflare_access_service_token.token["room-service"].id]
  }
}</code></pre>
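<p>Once applied, a service owner can fetch the freshly minted credentials straight from the secret store. With the Vault CLI, reading the path defined above looks roughly like:</p>
<pre><code>vault read kv/secrets/room-service/access_service_token</code></pre>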
            
    <div>
      <h3>mTLS (Authenticated Origin Pulls) certificate creation and rotation</h3>
      <a href="#mtls-authenticated-origin-pulls-certificate-creation-and-rotation">
        
      </a>
    </div>
    <p>To further defense-in-depth objectives, we've been rolling out mTLS throughout our internal systems. One of the places where we can take advantage of our Terraform provider is in defining <a href="/protecting-the-origin-with-tls-authenticated-origin-pulls/">AOP (Authenticated Origin Pulls)</a> certificates to lock down the Cloudflare-edge-to-origin connection. Anyone who has <a href="https://www.cloudflare.com/application-services/solutions/certificate-lifecycle-management/">managed certificates</a> of any kind can speak to the headaches they can cause. Having certificate configurations in Terraform takes the manual work out of rotation and renewal.</p><p>In this example we're defining <a href="https://api.cloudflare.com/#per-hostname-authenticated-origin-pull-properties">hostname-level AOP</a> as opposed to <a href="https://api.cloudflare.com/#zone-level-authenticated-origin-pulls-properties">zone-level AOP</a>. We start by cutting a certificate for each hostname. Once again we're using Vault for certificate creation, but other backends could be used just as well. This certificate is created with a (not shown) 30-day expiration, but set to renew automatically. This means once the time-to-expiration is equal to <a href="https://registry.terraform.io/providers/hashicorp/vault/latest/docs/resources/pki_secret_backend_cert#min_seconds_remaining">min_seconds_remaining</a>, the resource will be automatically tainted and replaced on the next Terraform run. We like to give this automation plenty of room before expiration to account for holiday seasons and to avoid paging humans when expiry alerts fire at seven days out. 
For the rest of this snippet, the <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/resources/authenticated_origin_pulls_certificate">certificate is uploaded to Cloudflare</a> and the ID from that upload is then <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/resources/authenticated_origin_pulls">placed in the AOP configuration</a> for the given hostname. The <a href="https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle#create_before_destroy">create_before_destroy</a> meta-argument ensures that the replacement certificate is uploaded successfully before we remove the certificate that's currently in place.</p>
            <pre><code>locals {
  hostnames = toset([
    "supercool.hostname.tld",
    "thatsafinelooking.hostname.tld"
  ])
}

resource "vault_pki_secret_backend_cert" "vault_cert" {
  for_each              = local.hostnames
  backend               = "pki-aop"
  name                  = "default"
  auto_renew            = true
  common_name           = "${each.key}.aop.pki.vault.cfdata.org"
  min_seconds_remaining = 864000 // renew when there are 10 days left before expiration
}

resource "cloudflare_authenticated_origin_pulls_certificate" "aop_cert" {
  for_each = local.hostnames
  zone_id  = data.cloudflare_zone.hostname_tld.id
  type     = "per-hostname"

  certificate = vault_pki_secret_backend_cert.vault_cert[each.key].certificate
  private_key = vault_pki_secret_backend_cert.vault_cert[each.key].private_key

  lifecycle {
    create_before_destroy = true
  }
}

resource "cloudflare_authenticated_origin_pulls" "aop_config" {
  for_each                               = local.hostnames
  zone_id                                = data.cloudflare_zone.hostname_tld.id
  authenticated_origin_pulls_certificate = cloudflare_authenticated_origin_pulls_certificate.aop_cert[each.key].id
  hostname                               = each.key
  enabled                                = true
}</code></pre>
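<p>The origin has to hold up its half of the handshake by demanding and verifying the client certificate Cloudflare presents. As an illustrative sketch (the file paths are hypothetical, and other web servers have equivalent directives), the NGINX side of AOP might look like:</p>
<pre><code>server {
    listen 443 ssl;
    server_name supercool.hostname.tld;

    # CA chain used to verify the AOP client certificate from Cloudflare
    ssl_client_certificate /etc/nginx/certs/aop-ca.pem;
    ssl_verify_client on;
}</code></pre>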
            
    <div>
      <h3>Terraform recommendations</h3>
      <a href="#terraform-recommendations">
        
      </a>
    </div>
    <p>The comfortable automation that we've achieved thus far did not come without some hair-pulling. Below are a few of the learnings that have allowed us to maintain the repository as a side project run by two engineers (shoutout <a href="https://github.com/dhaynespls">David</a>).</p>
    <div>
      <h4><b>Store your state somewhere safe</b></h4>
      <a href="#store-your-state-somewhere-safe">
        
      </a>
    </div>
    <p>It feels worth repeating that the tfstate <a href="https://developer.hashicorp.com/terraform/language/settings/backends/configuration#credentials-and-sensitive-data"><b>contains secrets</b></a> <b>including any API keys you're using with providers</b> and <b>the default location of the tfstate is in the current working directory.</b> It's very easy to accidentally commit this to source control. By defining a <a href="https://developer.hashicorp.com/terraform/language/settings/backends/configuration">backend</a>, the state can be stored with a cloud storage provider, <a href="https://developer.hashicorp.com/terraform/language/settings/backends/local">in a secure location on a filesystem</a>, <a href="https://developer.hashicorp.com/terraform/language/settings/backends/pg">in a database</a>, or even in <a href="https://mirio.dev/2022/09/18/implementing-a-terraform-state-backend/">Cloudflare Workers</a>! Wherever the state is stored, make sure it is encrypted.</p>
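<p>As a minimal sketch (the bucket and key names are placeholders, and any backend that supports encryption would do), an encrypted remote backend can be declared like so:</p>
<pre><code>terraform {
  backend "s3" {
    bucket  = "example-terraform-states"
    key     = "accounts/cloudflare-cool-account.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}</code></pre>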
    <div>
      <h5><b>Choose simplicity, avoid modules</b></h5>
      <a href="#choose-simplicity-avoid-modules">
        
      </a>
    </div>
    <p><a href="https://developer.hashicorp.com/terraform/language/modules">Modules</a> are intended to reduce code repetition for well-defined chunks of systems such as "I want three clusters of whizz-bangs in locations A, C, and F." If cloud computing were like <a href="https://wiki.factorio.com/Blueprint">Factorio</a>, this would be amazing. However, financial, technical, and physical constraints mean subtle differences in systems develop over time, such as "I want fewer whizz-bangs in C and the whizz-bangs in F should get a different network topology." In Terraform, the implementation logic for these requirements moves into the module code. <a href="https://github.com/hashicorp/hcl">HCL</a> is absolutely not the place to write decipherable conditionals. While module versioning prevents having to make every change backwards-compatible, keeping module usage up-to-date becomes another chore for repository maintainers.</p><p>An understandable code base is a user-friendly codebase. It's rare that a deeply cryptic error will return from a misconfigured resource definition. Conversely, modules, especially custom ones, can lead users on a head-scratching adventure. This kind of system can't scale with confused users.</p><p>A few well-designed <a href="https://developer.hashicorp.com/terraform/language/meta-arguments/for_each">for_each</a> loops (we're obviously fans) can achieve the same objectives as modules without the complexity. It's fine to use plain old resources too! Especially when there are more than a handful of varying arguments, it's more valuable for the configuration to be clear than to be elegant. For example: an <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/resources/account_member">account_member</a> resource makes sense in a for_each loop, but a <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/resources/page_rule">page_rule</a> probably doesn't.</p>
    <div>
      <h5><b>Keep tfstates small</b></h5>
      <a href="#keep-tfstates-small">
        
      </a>
    </div>
    <p>Maintaining quick pull-request-to-plan turnaround keeps Terraform from feeling like a burden on users' time. Furthermore, if a plan takes 30 minutes to run, a rollback in the case of an issue would also take 30 minutes! This post has described our one-account-per-tfstate model.</p><p>However, after noticing slow-downs coming from the large number of AOP certificate configurations in a big zone, we moved that code to a separate tfstate. We were able to make this change because AOP configuration is fairly self-contained. To ensure there would be no fighting between the states, we kept the API token permissions for each tfstate mutually exclusive. Our Atlantis Terraform plans typically finish in under five minutes. If it feels impossible to keep a tfstate small enough to plan in a reasonable amount of time, it may be worth considering a different tool for that bit of configuration management.</p>
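<p>Splitting self-contained resources into their own tfstate doesn't require recreating anything: <code>terraform state mv</code> can move resource addresses between state files. A hedged sketch, with illustrative file names (the source and destination address are the same because only the state file changes):</p>
<pre><code># move the AOP certificate resources out of the account state into a new one
terraform state mv \
  -state=account.tfstate -state-out=aop.tfstate \
  cloudflare_authenticated_origin_pulls_certificate.aop_cert \
  cloudflare_authenticated_origin_pulls_certificate.aop_cert</code></pre>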
    <div>
      <h5><b>Know when to use a different tool</b></h5>
      <a href="#know-when-to-use-a-different-tool">
        
      </a>
    </div>
    <p>Terraform isn't a panacea. We generally don't use Terraform to manage DNS records, for example. We use <a href="/improving-the-resiliency-of-our-infrastructure-dns-zone/">OctoDNS</a> which integrates more neatly into our infrastructure automation. DNS records can quickly add up to long state-rendering times and are often dynamically generated from systems that Terraform doesn't know about. To avoid conflicts, there should only ever be one system publishing changes to DNS records.</p><p>We also haven't figured out a maintainable way of managing Workers scripts in Terraform. When a .js script in the Terraform directory changes, Terraform isn't aware of it. This means a change needs to occur somewhere else in a .tf file before the plan diff is generated. It likely isn't an unsolvable issue, but doesn't seem particularly worth cramming into Terraform when there are better options for Worker management like <a href="https://developers.cloudflare.com/workers/wrangler/">Wrangler</a>.</p>
    <div>
      <h2>Looking forward</h2>
      <a href="#looking-forward">
        
      </a>
    </div>
    <p>We're continuing to invest in the Cloudflare Terraforming experience both for our own use and for the benefit of our users. With the provider, we hope to offer a comfortable and scalable method of interacting with Cloudflare products. Hopefully this post has presented some useful suggestions to anyone interested in adopting Cloudflare-configuration-as-code. Don't hesitate to reach out on the <a href="https://github.com/cloudflare/terraform-provider-cloudflare">GitHub project</a> for troubleshooting, bug reports, or feature requests. For more in-depth documentation on using Terraform to manage your Cloudflare account, <a href="https://developers.cloudflare.com/terraform/">read on here</a>. And if you don't have a Cloudflare account already, <a href="https://dash.cloudflare.com/sign-up/teams">click here</a> to get started.</p>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Terraform]]></category>
            <guid isPermaLink="false">13LOJFjSYZuMAEcqDWYnk0</guid>
            <dc:creator>Michael Wolf</dc:creator>
            <dc:creator>David Haynes</dc:creator>
        </item>
        <item>
            <title><![CDATA[The Cloudflare API now uses OpenAPI schemas]]></title>
            <link>https://blog.cloudflare.com/open-api-transition/</link>
            <pubDate>Wed, 16 Nov 2022 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare now has OpenAPI Schemas available for the API. Users can use these schemas in any open source OpenAPI Tooling. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2yxy5Dmp3sT7mMhoPFwyMs/43ad3be6028c91ab4cee6ce72eea508e/image1-44.png" />
            
            </figure><p>Today, we are announcing the general availability of <a href="https://github.com/cloudflare/api-schemas">OpenAPI Schemas for the Cloudflare API</a>. These are published via GitHub and will be updated regularly as Cloudflare adds and updates APIs. OpenAPI is the widely adopted standard for defining APIs in a machine-readable format. OpenAPI schemas let us plug our API into a broad range of tooling to accelerate development for ourselves and customers. Internally, they will make it easier for us to maintain and update our APIs. Before getting into those benefits, let’s start with the basics.</p>
    <div>
      <h2>What is OpenAPI?</h2>
      <a href="#what-is-openapi">
        
      </a>
    </div>
    <p>Much of the Internet is built upon <a href="https://www.cloudflare.com/learning/security/api/what-is-an-api/">APIs (Application Programming Interfaces)</a> or provides them as services to clients all around the world. This allows computers to talk to each other in a standardized fashion. OpenAPI is a widely adopted standard for how to define APIs. This allows other machines to reliably parse those definitions and use them in interesting ways. Cloudflare’s own <a href="https://developers.cloudflare.com/api-shield/security/schema-validation/">API Shield product</a> uses OpenAPI schemas to provide schema validation to ensure only well-formed API requests are sent to your origin.</p><p>Cloudflare itself has an API that customers can use to interface with our security and performance products from other places on the Internet. How do we define our own APIs? In the past we used a standard called <a href="https://github.com/json-schema-org/json-hyperschema-spec">JSON Hyper-Schema</a>. That had served us well, but as time went on we wanted to adopt more tooling that could both benefit ourselves internally and make our customers’ lives easier. The OpenAPI community has flourished over the past few years, providing many capabilities (as we will discuss) that were unavailable while we used JSON Hyper-Schema. As of today, we use OpenAPI.</p><p>You can learn more about OpenAPI itself <a href="https://oai.github.io/Documentation/start-here.html">here</a>. Having an open, well-understood standard for defining our APIs means we can use shared tooling and infrastructure that reads these standard definitions. Let’s take a look at a few examples.</p>
    <div>
      <h2>Uses of Cloudflare’s OpenAPI schemas</h2>
      <a href="#uses-of-cloudflares-openapi-schemas">
        
      </a>
    </div>
    <p>Most customers won’t need to use the schemas themselves to see value. The first system leveraging OpenAPI schemas is our <a href="https://developers.cloudflare.com/api/">new API Docs</a> that were <a href="/building-a-better-developer-experience-through-api-documentation/">announced today</a>. Because we now have OpenAPI schemas, we leverage the open source tool <a href="https://stoplight.io/open-source/elements">Stoplight Elements</a> to aid in generating this new doc site. This allowed us to retire our previously custom-built site that was hard to maintain. Additionally, many engineers at Cloudflare are familiar with OpenAPI, so teams can write new schemas more quickly and are less likely to make mistakes, since they are defining new APIs with a standard they already understand.</p><p>There are ways to leverage the schemas directly, however. The OpenAPI community has a huge number of tools that require only a set of schemas to use. Two such examples are mocking APIs and library generation.</p>
    <div>
      <h3>Mocking Cloudflare’s API</h3>
      <a href="#mocking-cloudflares-api">
        
      </a>
    </div>
    <p>Say you have code that calls Cloudflare’s API and you want to be able to easily run unit tests locally or integration tests in your <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/">CI/CD pipeline</a>. While you could just call Cloudflare’s API in each run, you may not want to for a few reasons. First, you may want to run tests frequently enough that managing the creation and tear down of resources becomes a pain. Also, in many of these tests you aren’t trying to validate logic in Cloudflare necessarily, but your own system’s behavior. In this case, mocking Cloudflare’s API would be ideal since you can gain confidence that you aren’t violating Cloudflare’s API contract, but without needing to worry about specifics of managing real resources. Additionally, mocking allows you to simulate different scenarios, like being rate limited or receiving 500 errors. This allows you to test your code for typically rare circumstances that can end up having a serious impact.</p><p>As an example, <a href="https://docs.stoplight.io/docs/prism/83dbbd75532cf-http-mocking">Stoplight Prism</a> could be used to mock Cloudflare’s API for testing purposes. With a local copy of Cloudflare’s API Schemas you can run the following command to spin up a local mock server:</p>
            <pre><code>$ docker run --init --rm \
  -v /home/user/git/api-schemas/openapi.yaml:/tmp/openapi.yaml \
  -p 4010:4010 stoplight/prism:4 \
  mock -h 0.0.0.0 /tmp/openapi.yaml</code></pre>
            <p>Then you can send requests to the mock server in order to validate that your use of Cloudflare’s API doesn’t violate the API contract locally:</p>
            <pre><code>$ curl -sX PUT localhost:4010/zones/f00/activation_check \
  -Hx-auth-email:foo@bar.com -Hx-auth-key:foobarbaz | jq
{
  "success": true,
  "errors": [],
  "messages": [],
  "result": {
    "id": "023e105f4ecef8ad9ca31a8372d0c353"
  }
}</code></pre>
            <p>This means faster development and shorter test runs while still catching API contract issues early before they get merged or deployed.</p>
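<p>As an illustrative aside (this helper is hypothetical, not part of any Cloudflare tooling), the kind of contract check you would point at the mock server is simple enough to unit test: validate the standard v4 response envelope before touching the result.</p>

```python
# Hypothetical helper: validate the standard Cloudflare v4 response
# envelope ({success, errors, messages, result}) before using the result.
# In tests, feed it responses captured from a local Prism mock rather
# than from the live API.

def parse_envelope(payload):
    """Return payload["result"] if the envelope is well-formed and successful."""
    for key in ("success", "errors", "messages", "result"):
        if key not in payload:
            raise ValueError(f"envelope missing {key!r}")
    if not payload["success"]:
        raise ValueError(f"API call failed: {payload['errors']}")
    return payload["result"]


# The mock's activation_check response from above parses cleanly:
mock_response = {
    "success": True,
    "errors": [],
    "messages": [],
    "result": {"id": "023e105f4ecef8ad9ca31a8372d0c353"},
}
assert parse_envelope(mock_response)["id"] == "023e105f4ecef8ad9ca31a8372d0c353"
```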
    <div>
      <h3>Library generation</h3>
      <a href="#library-generation">
        
      </a>
    </div>
    <p>Cloudflare has libraries in many programming languages like <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest">Terraform</a> and <a href="https://github.com/cloudflare/cloudflare-go">Go</a>, but we don’t support every possible programming language. Fortunately, using a tool like <a href="https://github.com/OpenAPITools/openapi-generator">openapi generator</a>, you can feed in Cloudflare’s API schemas and generate a library in a wide range of languages to then use in your code to talk to Cloudflare’s API. For example, you could generate a Java library using the following commands:</p>
            <pre><code>git clone https://github.com/openapitools/openapi-generator
cd openapi-generator
mvn clean package
java -jar modules/openapi-generator-cli/target/openapi-generator-cli.jar generate \
   -i https://raw.githubusercontent.com/cloudflare/api-schemas/main/openapi.yaml \
   -g java \
   -o /var/tmp/java_api_client</code></pre>
            <p>And then start using that client in your Java code to talk to Cloudflare’s API.</p>
    <div>
      <h2>How Cloudflare transitioned to OpenAPI</h2>
      <a href="#how-cloudflare-transitioned-to-openapi">
        
      </a>
    </div>
    <p>As mentioned earlier, we previously used JSON Hyper-Schema to define our APIs. We have roughly 600 endpoints that were already defined in the schemas. Here is a snippet of what one endpoint looks like in JSON Hyper-Schema:</p>
            <pre><code>{
      "title": "List Zones",
      "description": "List, search, sort, and filter your zones.",
      "rel": "collection",
      "href": "zones",
      "method": "GET",
      "schema": {
        "$ref": "definitions/zone.json#/definitions/collection_query"
      },
      "targetSchema": {
        "$ref": "#/definitions/response_collection"
      },
      "cfOwnership": "www",
      "cfPlanAvailability": {
        "free": true,
        "pro": true,
        "business": true,
        "enterprise": true
      },
      "cfPermissionsRequired": {
        "enum": [
          "#zone:read"
        ]
      }
    }</code></pre>
            <p>Let’s look at the same endpoint in OpenAPI:</p>
            <pre><code>/zones:
    get:
      description: List, search, sort, and filter your zones.
      operationId: zone-list-zones
      responses:
        4xx:
          content:
            application/json:
              schema:
                allOf:
                - $ref: '#/components/schemas/components-schemas-response_collection'
                - $ref: '#/components/schemas/api-response-common-failure'
          description: List Zones response failure
        "200":
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/components-schemas-response_collection'
          description: List Zones response
      security:
      - api_email: []
        api_key: []
      summary: List Zones
      tags:
      - Zone
      x-cfPermissionsRequired:
        enum:
        - '#zone:read'
      x-cfPlanAvailability:
        business: true
        enterprise: true
        free: true
        pro: true</code></pre>
            <p>You can see that the two look fairly similar: for the most part, the same information is contained in each, including the method, a description, and the request and response definitions (although those are linked via $refs). The value of migrating from one to the other isn’t the change in how we define the schemas themselves, but in what we can do with them. Numerous tools can parse the latter, OpenAPI, while far fewer can parse the former, JSON Hyper-Schema.</p><p>If this one endpoint were all that made up the Cloudflare API, it would be easy to convert the JSON Hyper-Schema into OpenAPI by hand and call it a day. Doing this 600 times, however, would have been a huge undertaking, and with teams constantly adding new endpoints, it would have been impossible to keep up. Our existing API docs also relied on the JSON Hyper-Schema, which meant we would need to keep both sets of schemas up to date during any transition period. There had to be a better way.</p>
    <div>
      <h3>Auto conversion</h3>
      <a href="#auto-conversion">
        
      </a>
    </div>
    <p>Given that both JSON Hyper-Schema and OpenAPI are standards, it stands to reason that it should be possible to take a file in one format and convert it to the other, right? Luckily, the answer is yes! We built a tool that took all existing JSON Hyper-Schema and output fully compliant OpenAPI schemas. This didn’t happen overnight, but thanks to existing OpenAPI tooling, we could iteratively improve the auto-converter, running OpenAPI validation tooling over the output schemas to see what issues the conversion tool still had.</p><p>After many iterations and improvements to the conversion tool, we finally had fully compliant OpenAPI schemas being auto-generated from our existing JSON Hyper-Schema. While we were building this tool, teams kept adding and updating the existing schemas, and our Product Content team was also updating text in the schemas to make our API docs easier to use. The benefit of this process is that we didn’t have to slow any of that work down, since anything that changed in the old schemas was automatically reflected in the new ones!</p><p>Once the tool was ready, the remaining step was to decide when and how we would stop making updates to the JSON Hyper-Schemas and move all teams to the OpenAPI schemas. The (now old) API docs were the biggest concern, given that they only understood JSON Hyper-Schema. Thanks to the help of our Developer Experience and Product Content teams, we were able to launch the new API docs and officially cut over to OpenAPI today!</p>
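<p>The core of the conversion idea can be sketched in a few lines of Python. This is a simplified illustration only, not the real converter, which also resolves $refs, builds full request and response definitions, and validates its output against the OpenAPI spec. It maps the JSON Hyper-Schema fields from the “List Zones” example above onto their OpenAPI 3 equivalents, moving Cloudflare-specific fields to x- extension properties:</p>

```python
# Simplified sketch of Hyper-Schema -> OpenAPI conversion for one endpoint.
def hyper_schema_to_openapi(link: dict) -> dict:
    method = link["method"].lower()          # "GET" -> "get" (operation key)
    operation = {
        "summary": link["title"],            # title -> summary
        "description": link["description"],
        # Cloudflare-specific fields become x- extension properties.
        "x-cfPlanAvailability": link["cfPlanAvailability"],
        "x-cfPermissionsRequired": link["cfPermissionsRequired"],
    }
    # href becomes the path key under "paths".
    return {"/" + link["href"]: {method: operation}}

list_zones = {
    "title": "List Zones",
    "description": "List, search, sort, and filter your zones.",
    "href": "zones",
    "method": "GET",
    "cfPlanAvailability": {"free": True, "pro": True, "business": True, "enterprise": True},
    "cfPermissionsRequired": {"enum": ["#zone:read"]},
}

paths = hyper_schema_to_openapi(list_zones)
print(paths["/zones"]["get"]["summary"])  # List Zones
```

Running a converter like this over every endpoint, then feeding the output through an OpenAPI validator, is what made it possible to iterate until the generated schemas were fully compliant.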
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Now that we have fully moved over to OpenAPI, more opportunities open up. Internally, we will be investigating what tooling we can adopt to reduce the effort required of individual teams and speed up API development. One idea we are exploring is automatically creating OpenAPI schemas from code annotations. Externally, we now have the foundational tools necessary to begin exploring how to auto-generate and support more programming language libraries for customers to use. We are also excited to see what you do with the schemas yourself, so if you build something cool or have ideas, don’t hesitate to share them with us!</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Deep Dive]]></category>
            <category><![CDATA[API]]></category>
            <guid isPermaLink="false">2W25ZXGCTx02W7CmYfX6FS</guid>
            <dc:creator>Garrett Galow</dc:creator>
        </item>
    </channel>
</rss>