
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sat, 11 Apr 2026 13:33:12 GMT</lastBuildDate>
        <item>
            <title><![CDATA[DIY BYOIP: a new way to Bring Your Own IP prefixes to Cloudflare]]></title>
            <link>https://blog.cloudflare.com/diy-byoip/</link>
            <pubDate>Fri, 07 Nov 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Announcing a new self-serve API for Bring Your Own IP (BYOIP), giving customers unprecedented control and flexibility to onboard, manage, and use their own IP prefixes with Cloudflare's services. ]]></description>
            <content:encoded><![CDATA[ <p>When a customer wants to <a href="https://blog.cloudflare.com/bringing-your-own-ips-to-cloudflare-byoip/"><u>bring IP address space to</u></a> Cloudflare, they’ve always had to reach out to their account team to put in a request. This request would then be sent to various Cloudflare engineering teams such as addressing and network engineering — and then the team responsible for the particular service they wanted to use the prefix with (e.g., CDN, Magic Transit, Spectrum, Egress). In addition, they had to work with their own legal teams and potentially another organization if they did not have primary ownership of an IP prefix in order to get a <a href="https://developers.cloudflare.com/byoip/concepts/loa/"><u>Letter of Agency (LOA)</u></a> issued through hoops of approvals. This process is complex, manual, and  time-consuming for all parties involved — sometimes taking up to 4–6 weeks depending on various approvals. </p><p>Well, no longer! Today, we are pleased to announce the launch of our self-serve BYOIP API, which enables our customers to onboard and set up their BYOIP prefixes themselves.</p><p>With self-serve, we handle the bureaucracy for you. We have automated this process using the gold standard for routing security — the Resource Public Key Infrastructure, RPKI. All the while, we continue to ensure the best quality of service by generating LOAs on our customers’ behalf, based on the security guarantees of our new ownership validation process. This ensures that customer routes continue to be accepted in every corner of the Internet.</p><p>Cloudflare takes the security and stability of the whole Internet very seriously. RPKI is a cryptographically-strong authorization mechanism and is, we believe, substantially more reliable than common practice which relies upon human review of scanned documents. However, deployment and availability of some RPKI-signed artifacts like the AS Path Authorisation (ASPA) object remains limited, and for that reason we are limiting the initial scope of self-serve onboarding to BYOIP prefixes originated from Cloudflare's autonomous system number (ASN) AS 13335. By doing this, we only need to rely on the publication of Route Origin Authorisation (ROA) objects, which are widely available. This approach has the advantage of being safe for the Internet and also meeting the needs of most of our BYOIP customers. </p><p>Today, we take a major step forward in offering customers a more comprehensive IP address management (IPAM) platform. With the recent update to <a href="https://blog.cloudflare.com/your-ips-your-rules-enabling-more-efficient-address-space-usage/"><u>enable multiple services on a single BYOIP prefix</u></a> and this latest advancement to enable self-serve onboarding via our API, we hope customers feel empowered to take control of their IPs on our network.</p>
    <div>
      <h2>An evolution of Cloudflare BYOIP</h2>
      <a href="#an-evolution-of-cloudflare-byoip">
        
      </a>
    </div>
    <p>We want Cloudflare to feel like an extension of your infrastructure, which is why we <a href="https://blog.cloudflare.com/bringing-your-own-ips-to-cloudflare-byoip/"><u>originally launched Bring-Your-Own-IP (BYOIP) back in 2020</u></a>. </p><p>A quick refresher: Bring-your-own-IP is named for exactly what it does - it allows customers to bring their own IP space to Cloudflare. Customers choose BYOIP for a number of reasons, but the main reasons are control and configurability. An IP prefix is a range or block of IP addresses. Routers create a table of reachable prefixes, known as a routing table, to ensure that packets are delivered correctly across the Internet. When a customer's Cloudflare services are configured to use the customer's own addresses, onboarded to Cloudflare as BYOIP, a packet with a corresponding destination address will be routed across the Internet to Cloudflare's global edge network, where it will be received and processed. BYOIP can be used with our Layer 7 services, Spectrum, or Magic Transit. </p>
    <div>
      <h2>A look under the hood: How it works</h2>
      <a href="#a-look-under-the-hood-how-it-works">
        
      </a>
    </div>
    
    <div>
      <h3>Today’s world of prefix validation</h3>
      <a href="#todays-world-of-prefix-validation">
        
      </a>
    </div>
    <p>Let’s take a step back and take a look at the state of the BYOIP world right now. Let’s say a customer has authority over a range of IP addresses, and they’d like to bring them to Cloudflare. We require customers to provide us with a Letter of Authorization (LOA) and have an Internet Routing Registry (IRR) record matching their prefix and ASN. Once we have this, we require manual review by a Cloudflare engineer. There are a few issues with this process:</p><ul><li><p>Insecure: The LOA is just a document—a piece of paper. The security of this method rests entirely on the diligence of the engineer reviewing the document. If the review is not able to detect that a document is fraudulent or inaccurate, it is possible for a prefix or ASN to be hijacked.</p></li><li><p>Time-consuming: Generating a single LOA is not always sufficient. If you are leasing IP space, we will ask you to provide documentation confirming that relationship as well, so that we can see a clear chain of authorisation from the original assignment or allocation of addresses to you. Getting all the paper documents to verify this chain of ownership, combined with having to wait for manual review can result in weeks of waiting to deploy a prefix!</p></li></ul>
    <div>
      <h3>Automating trust: How Cloudflare verifies your BYOIP prefix ownership in minutes</h3>
      <a href="#automating-trust-how-cloudflare-verifies-your-byoip-prefix-ownership-in-minutes">
        
      </a>
    </div>
    <p>Moving to a self-serve model allowed us to rethink the manner in which we conduct prefix ownership checks. We asked ourselves: How can we quickly, securely, and automatically prove you are authorized to use your IP prefix and intend to route it through Cloudflare?</p><p>We ended up killing two birds with one stone, thanks to our two-step process involving the creation of an RPKI ROA (verification of intent) and modification of IRR or rDNS records (verification of ownership). Self-serve unlocks the ability to not only onboard prefixes more quickly and without human intervention, but also exercises more rigorous ownership checks than a simple scanned document ever could. While not 100% foolproof, it is a significant improvement in the way we verify ownership.</p>
    <div>
      <h3>Tapping into the authorities	</h3>
      <a href="#tapping-into-the-authorities">
        
      </a>
    </div>
    <p>Regional Internet Registries (RIRs) are the organizations responsible for distributing and managing Internet number resources like IP addresses. They are composed of 5 different entities operating in different regions of the world (<a href="https://developers.cloudflare.com/byoip/get-started/#:~:text=Your%20prefix%20must%20be%20registered%20under%20one%20of%20the%20Regional%20Internet%20Registries%20(RIRs)%3A"><u>RIRs</u></a>). Originally allocated address space from the Internet Assigned Numbers Authority (IANA), they in turn assign and allocate that IP space to Local Internet Registries (LIRs) like ISPs.</p><p>This process is based on RIR policies which generally look at things like legal documentation, existing database/registry records, technical contacts, and BGP information. End-users can obtain addresses from an LIR, or in some cases through an RIR directly. As IPv4 addresses have become more scarce, brokerage services have been launched to allow addresses to be leased for fixed periods from their original assignees.</p><p>The Internet Routing Registry (IRR) is a separate system that focuses on routing rather than address assignment. Many organisations operate IRR instances and allow routing information to be published, including all five RIRs. While most IRR instances impose few barriers to the publication of routing data, those that are operated by RIRs are capable of linking the ability to publish routing information with the organisations to which the corresponding addresses have been assigned. We believe that being able to modify an IRR record protected in this way provides a good signal that a user has the rights to use a prefix.</p><p>Example of a route object containing validation token (using the documentation-only address 192.0.2.0/24):</p>
            <pre><code>% whois -h rr.arin.net 192.0.2.0/24

route:          192.0.2.0/24
origin:         AS13335
descr:          Example Company, Inc.
                cf-validation: 9477b6c3-4344-4ceb-85c4-6463e7d2453f
admin-c:        ADMIN2521-ARIN
tech-c:         ADMIN2521-ARIN
tech-c:         CLOUD146-ARIN
mnt-by:         MNT-CLOUD14
created:        2025-07-29T10:52:27Z
last-modified:  2025-07-29T10:52:27Z
source:         ARIN</code></pre>
            <p>For those that don’t want to go through the process of IRR-based validation, reverse DNS (rDNS) is provided as another secure method of verification. To manage rDNS for a prefix — whether it's creating a PTR record or a security TXT record — you must be granted permission by the entity that allocated the IP block in the first place (usually your ISP or the RIR).</p><p>This permission is demonstrated in one of two ways:</p><ul><li><p>Directly through the IP owner’s authenticated customer portal (ISP/RIR).</p></li><li><p>By the IP owner delegating authority to your third-party DNS provider via an NS record for your reverse zone.</p></li></ul><p>Example of a reverse domain lookup using dig command (using the documentation-only address 192.0.2.0/24):</p>
            <pre><code>% dig cf-validation.2.0.192.in-addr.arpa TXT

; &lt;&lt;&gt;&gt; DiG 9.10.6 &lt;&lt;&gt;&gt; cf-validation.2.0.192.in-addr.arpa TXT
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 16686
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cf-validation.2.0.192.in-addr.arpa. IN TXT

;; ANSWER SECTION:
cf-validation.2.0.192.in-addr.arpa. 300 IN TXT "b2f8af96-d32d-4c46-a886-f97d925d7977"

;; Query time: 35 msec
;; SERVER: 127.0.2.2#53(127.0.2.2)
;; WHEN: Fri Oct 24 10:43:52 EDT 2025
;; MSG SIZE  rcvd: 150</code></pre>
            <p>So how exactly is one supposed to modify these records? That’s where the validation token comes into play. Once you choose either the IRR or Reverse DNS method, we provide a unique, single-use validation token. You must add this token to the content of the relevant record, either in the IRR or in the DNS. Our system then looks for the presence of the token as evidence that the request is being made by someone with authorization to make the requested modification. If the token is found, verification is complete and your ownership is confirmed!</p>
    <div>
      <h3>The digital passport 🛂</h3>
      <a href="#the-digital-passport">
        
      </a>
    </div>
    <p>Ownership is only half the battle; we also need to confirm your intention that you authorize Cloudflare to advertise your prefix. For this, we rely on the gold standard for routing security: the Resource Private Key Infrastructure (RPKI), and in particular Route Origin Authorization (ROA) objects.</p><p>A ROA is a cryptographically-signed document that specifies which Autonomous System Number (ASN) is authorized to originate your IP prefix. You can think of a ROA as the digital equivalent of a certified, signed, and notarised contract from the owner of the prefix.</p><p>Relying parties can validate the signatures in a ROA using the RPKI.You simply create a ROA that specifies Cloudflare's ASN (AS13335) as an authorized originator and arrange for it to be signed. Many of our customers used hosted RPKI systems available through RIR portals for this. When our systems detect this signed authorization, your routing intention is instantly confirmed. </p><p>Many other companies that support BYOIP require a complex workflow involving creating self-signed certificates and manually modifying RDAP (Registration Data Access Protocol) records—a heavy administrative lift. By embracing a choice of IRR object modification and Reverse DNS TXT records, combined with RPKI, we offer a verification process that is much more familiar and straightforward for existing network operators.</p>
    <div>
      <h3>The global reach guarantee</h3>
      <a href="#the-global-reach-guarantee">
        
      </a>
    </div>
    <p>While the new self-serve flow ditches the need for the "dinosaur relic" that is the LOA, many network operators around the world still rely on it as part of the process of accepting prefixes from other networks.</p><p>To help ensure your prefix is accepted by adjacent networks globally, Cloudflare automatically generates a document on your behalf to be distributed in place of a LOA. This document provides information about the checks that we have carried out to confirm that we are authorised to originate the customer prefix, and confirms the presence of valid ROAs to authorise our origination of it. In this way we are able to support the workflows of network operators we connect to who rely upon LOAs, without our customers having the burden of generating them.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1GimIe80gJn5PrRUGkEMpF/130d2590e45088d58ac62ab2240f4d5c/image1.png" />
          </figure>
    <div>
      <h2>Staying away from black holes</h2>
      <a href="#staying-away-from-black-holes">
        
      </a>
    </div>
    <p>One concern in designing the Self-Serve API is the trade-off between giving customers flexibility while implementing the necessary safeguards so that an IP prefix is never advertised without a matching service binding. If this were to happen, Cloudflare would be advertising a prefix with no idea on what to do with the traffic when we receive it! We call this “blackholing” traffic. To handle this, we introduced the requirement of a default service binding — i.e. a service binding that spans the entire range of the IP prefix onboarded. </p><p>A customer can later layer different service bindings on top of their default service binding via <a href="https://blog.cloudflare.com/your-ips-your-rules-enabling-more-efficient-address-space-usage/"><u>multiple service bindings</u></a>, like putting CDN on top of a default Spectrum service binding. This way, a prefix can never be advertised without a service binding and blackhole our customers’ traffic.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/20QAM5GITJ5m5kYkNlh701/82812d202ffa7b9a4e46838aa6c04937/image2.png" />
          </figure>
    <div>
      <h2>Getting started</h2>
      <a href="#getting-started">
        
      </a>
    </div>
    <p>Check out our <a href="https://developers.cloudflare.com/byoip/get-started/"><u>developer docs</u></a> on the most up-to-date documentation on how to onboard, advertise, and add services to your IP prefixes via our API. Remember that onboardings can be complex, and don’t hesitate to ask questions or reach out to our <a href="https://www.cloudflare.com/professional-services/"><u>professional services</u></a> team if you’d like us to do it for you.</p>
    <div>
      <h2>The future of network control</h2>
      <a href="#the-future-of-network-control">
        
      </a>
    </div>
    <p>The ability to script and integrate BYOIP management into existing workflows is a game-changer for modern network operations, and we’re only just getting started. In the months ahead, look for self-serve BYOIP in the dashboard, as well as self-serve BYOIP offboarding to give customers even more control.</p><p>Cloudflare's self-serve BYOIP API onboarding empowers customers with unprecedented control and flexibility over their IP assets. This move to automate onboarding empowers a stronger security posture, moving away from manually-reviewed PDFs and driving <a href="https://rpki.cloudflare.com/"><u>RPKI adoption</u></a>. By using these API calls, organizations can automate complex network tasks, streamline migrations, and build more resilient and agile network infrastructures.</p> ]]></content:encoded>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Addressing]]></category>
            <category><![CDATA[BYOIP]]></category>
            <category><![CDATA[IPv4]]></category>
            <category><![CDATA[IPv6]]></category>
            <category><![CDATA[Spectrum]]></category>
            <category><![CDATA[CDN]]></category>
            <category><![CDATA[Magic Transit]]></category>
            <category><![CDATA[Egress]]></category>
            <category><![CDATA[Cloudflare Gateway]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Aegis]]></category>
            <category><![CDATA[Smart Shield]]></category>
            <guid isPermaLink="false">4usaEaUwShJ04VKzlMV0V9</guid>
            <dc:creator>Ash Pallarito</dc:creator>
            <dc:creator>Lynsey Haynes</dc:creator>
            <dc:creator>Gokul Unnikrishnan</dc:creator>
        </item>
        <item>
            <title><![CDATA[Your IPs, your rules: enabling more efficient address space usage]]></title>
            <link>https://blog.cloudflare.com/your-ips-your-rules-enabling-more-efficient-address-space-usage/</link>
            <pubDate>Mon, 19 May 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ IPv4 is expensive, and moving network resources around is hard. Previously, when customers wanted to use multiple Cloudflare services, they had to bring a new address range. ]]></description>
            <content:encoded><![CDATA[ <p>IPv4 addresses have become a costly commodity, driven by their growing scarcity. With the original pool of 4.3 billion addresses long exhausted, organizations must now rely on the secondary market to acquire them. Over the years, prices have surged, often exceeding $30–$50 USD per address, with <a href="https://auctions.ipv4.global/?cf_history_state=%7B%22guid%22%3A%22C255D9FF78CD46CDA4F76812EA68C350%22%2C%22historyId%22%3A6%2C%22targetId%22%3A%22B695D806845101070936062659E97ADD%22%7D"><u>costs</u></a> varying based on block size and demand. Given the scarcity, these prices are only going to rise, particularly for businesses that haven’t transitioned to <a href="https://blog.cloudflare.com/amazon-2bn-ipv4-tax-how-avoid-paying/"><u>IPv6</u></a>. This rising cost and limited availability have made efficient IP address management more critical than ever. In response, we’ve evolved how we handle BYOIP (<a href="https://blog.cloudflare.com/bringing-your-own-ips-to-cloudflare-byoip/"><u>Bring Your Own IP</u></a>) prefixes to give customers greater flexibility.</p><p>Historically, when customers onboarded a BYOIP prefix, they were required to assign it to a single service, binding all IP addresses within that prefix to one service before it was advertised. Once set, the prefix's destination was fixed — to direct traffic exclusively to that service. If a customer wanted to use a different service, they had to onboard a new prefix or go through the cumbersome process of offboarding and re-onboarding the existing one.</p><p>As a step towards addressing this limitation, we’ve introduced a new level of flexibility: customers can now use parts of any prefix — whether it’s bound to Cloudflare CDN, Spectrum, or Magic Transit — for additional use with CDN or Spectrum. This enhancement provides much-needed flexibility, enabling businesses to optimize their IP address usage while keeping costs under control. </p>
    <div>
      <h2>The challenges of moving onboarded BYOIP prefixes between services</h2>
      <a href="#the-challenges-of-moving-onboarded-byoip-prefixes-between-services">
        
      </a>
    </div>
    <p>Migrating BYOIP prefixes dynamically between Cloudflare services is no trivial task, especially with thousands of servers capable of accepting and processing connections. The problem required overcoming several technical challenges related to IP address management, kernel-level bindings, and orchestration. </p>
    <div>
      <h3>Dynamic reallocation of prefixes across services</h3>
      <a href="#dynamic-reallocation-of-prefixes-across-services">
        
      </a>
    </div>
    <p>When configuring an IP prefix for a service, we need to update IP address lists and firewall rules on each of our servers to allow only the traffic we expect for that service, such as opening ports 80 and 443 to allow HTTP and HTTPS traffic for the Cloudflare CDN. We use Linux <a href="https://en.wikipedia.org/wiki/Iptables#:~:text=iptables%20is%20a%20user%2Dspace,to%20treat%20network%20traffic%20packets."><u>iptables</u></a> and <a href="https://en.wikipedia.org/wiki/Iptables"><u>IP sets</u></a> for this.</p><p>Migrating IP prefixes to a different service involves dynamically reassigning them to different IP sets and iptable rules. This requires automated updates across a large-scale distributed environment.</p><p>As prefixes shift between services, it is critical that servers update their IP sets and iptable rules dynamically to ensure traffic is correctly routed. Failure to do so could lead to routing loops or dropped connections.  </p>
    <div>
      <h3>Updating Tubular – an eBPF-based IP and port binding service</h3>
      <a href="#updating-tubular-an-ebpf-based-ip-and-port-binding-service">
        
      </a>
    </div>
    <p>Most web applications bind to a list of IP addresses at startup, and listen on only those IPs until shutdown. To allow customers to change the IPs bound to each service dynamically, we needed a way to add and remove IPs from a running service, without restarting it. <a href="https://blog.cloudflare.com/tubular-fixing-the-socket-api-with-ebpf/"><u>Tubular</u></a> is a <a href="https://blog.cloudflare.com/cloudflare-architecture-and-how-bpf-eats-the-world/"><u>BPF</u></a> program we wrote that runs on Cloudflare servers that allows services to listen on a single socket, dynamically updating the list of addresses that are routed to that socket over the lifetime of the service, without requiring it to restart when those addresses change.</p><p>A significant engineering challenge was extending Tubular to support traffic destined for Cloudflare’s CDN.  Without this enhancement, customers would be unable to leverage dynamic reassignment to bind prefixes onboarded through Spectrum to the Cloudflare CDN, limiting flexibility across services.</p><p>Cloudflare’s CDN depends on each server running an NGINX ingress proxy to terminate incoming connections. Due to the <a href="https://blog.cloudflare.com/how-we-built-pingora-the-proxy-that-connects-cloudflare-to-the-internet/"><u>scale and performance limitations of NGINX</u></a>, we are actively working to replace it by 2026. In the interim, however, we still depend on the current ingress proxy to reliably handle incoming connections.</p><p>One limitation is that this ingress proxy does not support <a href="https://systemd.io/"><u>systemd</u></a> socket activation, a mechanism Tubular relies on to integrate with other Cloudflare services on each server. For services that do support systemd socket activation, systemd independently starts the sockets for the owning service and passes them to Tubular, allowing Tubular to easily detect and route traffic to the correct terminating service.</p><p>Since this integration model is not feasible, an alternative solution was required. This was addressed by introducing a shared Unix domain socket between Tubular and the ingress proxy service on each server. Through this channel,  the ingress proxy service explicitly transmits socket information to Tubular, enabling it to correctly register the sockets in its datapath.</p><p>The final challenge was deploying the Tubular-ingress proxy integration across the fleet of servers without disrupting active connections. As of April 2025, Cloudflare handles an average of 71 million HTTP requests per second, peaking at 100 million. To safely deploy at this scale, the necessary Tubular and ingress proxy configuration changes were staged across all Cloudflare servers without disrupting existing connections. The final step involved adding bindings — IP addresses and ports corresponding to Cloudflare CDN prefixes — to the Tubular configuration. These bindings direct connections through Tubular via the Unix sockets registered during the previous integration step. To minimize risk, bindings were gradually enabled in a controlled rollout across the global fleet.</p>
    <div>
      <h4>Tubular data plane in action</h4>
      <a href="#tubular-data-plane-in-action">
        
      </a>
    </div>
    <p>This high-level representation of the Tubular data plane binds together the Layer 4 protocol (TCP), prefix (192.0.2.0/24 - which is 254 usable IP addresses), and port number 0 (any port). When incoming packets match this combination, they are directed to the correct socket of the service — in this case, Spectrum.​</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5yQpYeTxPM7B8DZwLsQATs/3f488c5b37ef2358eacf779a42ac59d5/image4.png" />
          </figure><p>In the following example, TCP 192.0.2.200/32 has been upgraded to the Cloudflare CDN via the edge <a href="https://developers.cloudflare.com/api/resources/addressing/subresources/prefixes/subresources/service_bindings/"><u>Service Bindings API</u></a>. Tubular dynamically consumes this information, adding a new entry to its data plane bindings and socket table. Using Longest Prefix Match, all packets within the 192.0.2.0/24 range port 0 will be routed to Spectrum, except for 192.0.2.200/32 port 443, which will be directed to the Cloudflare CDN.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6wWlR9gWb6JEoyZm4iOpgQ/4a59bcab4a6731a53ea235500596c7f5/image1.png" />
          </figure>
    <div>
      <h4>Coordination and orchestration at scale </h4>
      <a href="#coordination-and-orchestration-at-scale">
        
      </a>
    </div>
    <p>Our goal is to achieve a quick transition of IP address prefixes between services when initiated by customers, which requires a high level of coordination. We need to ensure that changes propagate correctly across all servers to maintain stability. Currently, when a customer migrates a prefix between services, there is a 4-6 hour window of uncertainty where incoming packets may be dropped due to a lack of guaranteed routing. To address this, we are actively implementing systems that will reduce this transition time from hours to just a matter of minutes, significantly improving reliability and minimizing disruptions.</p>
    <div>
      <h2>Smarter IP address management</h2>
      <a href="#smarter-ip-address-management">
        
      </a>
    </div>
    <p>Service Bindings are mappings that control whether traffic destined for a given IP address is routed to Magic Transit, the CDN pipeline, or the Spectrum pipeline.</p><p>Consider the example in the diagram below. One of our customers, a global finance infrastructure platform, is using BYOIP and has a /24 range bound to <a href="https://developers.cloudflare.com/spectrum/"><u>Spectrum</u></a> for DDoS protection of their TCP and UDP traffic. However, they are only using a few addresses in that range for their Spectrum applications, while the rest go unused. In addition, the customer is using Cloudflare’s CDN for their Layer 7 traffic and wants to set up <a href="https://developers.cloudflare.com/byoip/concepts/static-ips/"><u>Static IPs</u></a>, so that their customers can allowlist a consistent set of IP addresses owned and controlled by their own network infrastructure team. Instead of using up another block of address space, they asked us whether they could carve out those unused sub-ranges of the /24 prefix.</p><p>From there, we set out to determine how to selectively map sub-ranges of the onboarded prefix to different services using service bindings:</p><ul><li><p>192.0.2.0/24 is already bound to <b>Spectrum</b></p><ul><li><p>192.0.2.0/25 is updated and bound to <b>CDN</b></p></li><li><p>192.0.2.200/32 is also updated bound to <b>CDN</b></p></li></ul></li></ul><p>Both the /25 and /32 are sub-ranges within the /24 prefix and will receive traffic directed to the CDN. All remaining IP addresses within the /24 prefix, unless explicitly bound, will continue to use the default Spectrum service binding.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/uwhMHBEuI1NHfp9qD9IFM/d2dcea59a8d9f962f03389831fd73851/image3.png" />
          </figure><p>As you can see in this example, this approach provides customers with greater control and agility over how their IP address space is allocated. Instead of rigidly assigning an entire prefix to a single service, users can now tailor their IP address usage to match specific workloads or deployment needs. Setting this up is straightforward — all it takes is a few HTTP requests to the <a href="https://developers.cloudflare.com/api/resources/addressing/subresources/prefixes/subresources/service_bindings/"><u>Cloudflare API</u></a>. You can define service bindings by specifying which IP addresses or subnets should be routed to CDN, Spectrum, or Magic Transit. This allows you to tailor traffic routing to match your architecture without needing to restructure your entire IP address allocation. The process remains consistent whether you're configuring a single IP address or splitting up larger subnets, making it easy to apply across different parts of your network. The foundational technical work addressing the underlying architectural challenges outlined above made it possible to streamline what could have been a complex setup into a straightforward series of API interactions.</p>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>We envision a future where customers have granular control over how their traffic moves through Cloudflare’s global network, not just by service, but down to the port level. A single prefix could simultaneously power web applications on CDN, protect infrastructure through Magic Transit, and much more. This isn't just flexible routing, but programmable traffic orchestration across different services. What was once rigid and static becomes dynamic and fully programmable to meet each customer’s unique needs. </p><p>If you are an existing BYOIP customer using Magic Transit, CDN, or Spectrum, check out our <a href="https://developers.cloudflare.com/byoip/service-bindings/magic-transit-with-cdn/"><u>configuration guide here</u></a>. If you are interested in bringing your own IP address space and using multiple Cloudflare services on it, please reach out to your account team to enable setting up this configuration via <a href="https://developers.cloudflare.com/api/resources/addressing/subresources/prefixes/subresources/service_bindings/"><u>API</u></a> or reach out to sales@cloudflare.com if you’re new to Cloudflare.</p> ]]></content:encoded>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Addressing]]></category>
            <category><![CDATA[IPv4]]></category>
            <category><![CDATA[BYOIP]]></category>
            <category><![CDATA[Spectrum]]></category>
            <category><![CDATA[CDN]]></category>
            <category><![CDATA[Magic Transit]]></category>
            <guid isPermaLink="false">7FAYMppkyZG4CEGdLEcLlR</guid>
            <dc:creator>Mark Rodgers</dc:creator>
            <dc:creator>Sphoorti Metri</dc:creator>
            <dc:creator>Ash Pallarito</dc:creator>
        </item>
        <item>
            <title><![CDATA[HTTPS-only for Cloudflare APIs: shutting the door on cleartext traffic]]></title>
            <link>https://blog.cloudflare.com/https-only-for-cloudflare-apis-shutting-the-door-on-cleartext-traffic/</link>
            <pubDate>Thu, 20 Mar 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ We are closing the cleartext HTTP ports entirely for Cloudflare API traffic. This prevents the risk of clients unintentionally leaking their secret API keys in cleartext during the initial request.  ]]></description>
            <content:encoded><![CDATA[ <p>Connections made over cleartext HTTP ports risk exposing sensitive information because the data is transmitted unencrypted and can be intercepted by network intermediaries, such as ISPs, Wi-Fi hotspot providers, or malicious actors on the same network. It’s common for servers to either <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Redirections"><u>redirect</u></a> or return a <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403"><u>403 (Forbidden)</u></a> response to close the HTTP connection and enforce the use of HTTPS by clients. However, by the time this occurs, it may be too late, because sensitive information, such as an API token, may have already been <a href="https://jviide.iki.fi/http-redirects"><u>transmitted in cleartext</u></a> in the initial client request. This data is exposed before the server has a chance to redirect the client or reject the connection.</p><p>A better approach is to refuse the underlying cleartext connection by closing the <a href="https://developers.cloudflare.com/fundamentals/reference/network-ports/"><u>network ports</u></a> used for plaintext HTTP, and that’s exactly what we’re going to do for our customers.</p><p><b>Today we’re announcing that we’re closing all of the </b><a href="https://developers.cloudflare.com/fundamentals/reference/network-ports/#network-ports-compatible-with-cloudflares-proxy:~:text=HTTP%20ports%20supported%20by%20Cloudflare"><b><u>HTTP ports</u></b></a><b> on api.cloudflare.com.</b> We’re also making changes so that api.cloudflare.com can change IP addresses dynamically, in line with on-going efforts to <a href="https://blog.cloudflare.com/addressing-agility/"><u>decouple names from IP addresses</u></a>, and reliably <a href="https://blog.cloudflare.com/topaz-policy-engine-design/"><u>managing</u></a> addresses in our authoritative DNS. This will enhance the agility and flexibility of our API endpoint management. Customers relying on static IP addresses for our API endpoints will be notified in advance to prevent any potential availability issues.</p><p>In addition to taking this first step to secure Cloudflare API traffic, we’ll release the ability for customers to opt-in to safely disabling all HTTP port traffic for their websites on Cloudflare. We expect to make this free security feature available in the last quarter of 2025.</p><p>We have <a href="https://blog.cloudflare.com/introducing-universal-ssl/"><u>consistently</u></a> <a href="https://blog.cloudflare.com/enforce-web-policy-with-hypertext-strict-transport-security-hsts/"><u>advocated</u></a> for <a href="https://blog.cloudflare.com/post-quantum-for-all/"><u>strong encryption standards</u></a> to safeguard users’ data and privacy online. As part of our ongoing commitment to enhancing Internet security, this blog post details our efforts to <i>enforce</i> HTTPS-only connections across our global network. </p>
    <div>
      <h3>Understanding the problem</h3>
      <a href="#understanding-the-problem">
        
      </a>
    </div>
    <p>We already provide an “<a href="https://developers.cloudflare.com/ssl/edge-certificates/additional-options/always-use-https/"><u>Always Use HTTPS</u></a>” setting that can be used to redirect all visitor traffic on our customers’ domains (and subdomains) from HTTP (plaintext) to HTTPS (encrypted). For instance, when a user clicks on an HTTP version of the URL on the site (http://www.example.com), we issue an HTTP 3XX redirection status code to immediately redirect the request to the corresponding HTTPS version (https://www.example.com) of the page. While this works well for most scenarios, there’s a subtle but important risk factor: What happens if the initial plaintext HTTP request (before the redirection) contains sensitive user information?</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2mXYZL0JRZOb8J6Tqm4mCj/1b24b76335ad9cf3f3b630ef31868f6c/1.png" />
          </figure><p><sup><i>Initial plaintext HTTP request is exposed to the network before the server can redirect to the secure HTTPS connection.</i></sup></p><p>Third parties or intermediaries on shared networks could intercept sensitive data from the first plaintext HTTP request, or even carry out a <a href="https://blog.cloudflare.com/monsters-in-the-middleboxes/"><u>Monster-in-the-Middle (MITM)</u></a> attack by impersonating the web server.</p><p>One may ask if <a href="https://developers.cloudflare.com/ssl/edge-certificates/additional-options/http-strict-transport-security/"><u>HTTP Strict Transport Security (HSTS)</u></a> would partially alleviate this concern by ensuring that, after the first request, visitors can only access the website over HTTPS without needing a redirect. While this does reduce the window of opportunity for an adversary, the first request still remains exposed. Additionally, HSTS is not applicable by default for most non-user-facing use cases, such as API traffic from stateless clients. Many API clients don’t retain browser-like state or remember HSTS headers they've encountered. It is quite <a href="https://jviide.iki.fi/http-redirects"><u>common practice</u></a> for API calls to be redirected from HTTP to HTTPS, and hence have their initial request exposed to the network.</p><p>Therefore, in line with our <a href="https://blog.cloudflare.com/dogfooding-from-home/"><u>culture of dogfooding</u></a>, we evaluated the accessibility of the Cloudflare API (<a href="https://api.cloudflare.com"><u>api.cloudflare.com</u></a>) over <a href="https://developers.cloudflare.com/fundamentals/reference/network-ports/#:~:text=ports%20listed%20below.-,HTTP,-ports%20supported%20by"><u>HTTP ports (80, and others)</u></a>. In that regard, imagine a client making an initial request to our API endpoint that includes their <i>secret API key</i>. While we outright reject all plaintext connections with a 403 Forbidden response instead of redirecting for API traffic — clearly indicating that “<i>Cloudflare API is only accessible over TLS”</i> — this rejection still happens at the <a href="https://www.cloudflare.com/learning/ddos/what-is-layer-7/">application layer</a>. By that point, the API key may have already been exposed over the network before we can even reject the request. We do have a notification mechanism in place to alert customers and rotate their API keys accordingly, but a stronger approach would be to eliminate the exposure entirely. We have an opportunity to improve!</p>
    <div>
      <h3>A better approach to API security</h3>
      <a href="#a-better-approach-to-api-security">
        
      </a>
    </div>
    <p>Any API key or token exposed in plaintext on the public Internet should be considered compromised. We can either address exposure after it occurs or prevent it entirely. The reactive approach involves continuously tracking and revoking compromised credentials, requiring active management to rotate each one. For example, when a plaintext HTTP request is made to our API endpoints, we detect exposed tokens by scanning for 'Authorization' header values.</p><p>In contrast, a preventive approach is stronger and more effective, stopping exposure before it happens. Instead of relying on the API service application to react after receiving potentially sensitive cleartext data, we can preemptively refuse the underlying connection at the <a href="https://www.cloudflare.com/learning/ddos/glossary/open-systems-interconnection-model-osi/"><u>transport layer</u></a>, before any HTTP or application-layer data is exchanged. The <i>preventative </i>approach can be achieved by closing all plaintext HTTP ports for API traffic on our global network. The added benefit is that this is operationally much simpler: by eliminating cleartext traffic, there's no need for key rotation.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1PYo3GMZjQ4LbfUHXNOVpj/2341da1d926e077624563358bd5034ef/2.png" />
          </figure><p><sup><i>The transport layer carries the application layer data on top.</i></sup></p><p>To explain why this works: an application-layer request requires an underlying transport connection, like TCP or QUIC, to be established first. The combination of a port number and an IP address serves as a transport layer identifier for creating the underlying transport channel. Ports direct network traffic to the correct application-layer process — for example, port 80 is designated for plaintext HTTP, while port 443 is used for encrypted HTTPS. By disabling the HTTP cleartext server-side port, we prevent that transport channel from being established during the initial "<a href="https://www.cloudflare.com/learning/ddos/glossary/tcp-ip/"><u>handshake</u></a>" phase of the connection — before any application data, such as a secret API key, leaves the client’s machine.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/13fKw8cHkHlsLOzlXJYenr/c9156f67ae99cfdc74dc5917ebc1e5bb/3.png" />
          </figure><p><sup><i>Both TCP and QUIC transport layer handshakes are a pre-requisite for HTTPS application data exchange on the web.</i></sup></p><p>Therefore, closing the HTTP interface entirely for API traffic gives a strong and visible <b>fast-failure</b> signal to developers that might be mistakenly accessing <code>http://… </code>instead of <code>https://…</code> with their secret API keys in the first request — a simple one-letter omission, but one with serious implications.</p><p>In theory, this is a simple change, but at Cloudflare’s global scale, implementing it required careful planning and execution. We’d like to share the steps we took to make this transition.</p>
    <div>
      <h3>Understanding the scope</h3>
      <a href="#understanding-the-scope">
        
      </a>
    </div>
    <p>In an ideal scenario, we could simply close all cleartext HTTP ports on our network. However, two key challenges prevent this. First, as shown in the <a href="https://radar.cloudflare.com/adoption-and-usage#http-vs-https"><u>Cloudflare Radar</u></a> figure below, about 2-3% of requests from “likely human” clients to our global network are over plaintext HTTP. While modern browsers prominently warn users about insecure HTTP connections and <a href="https://support.mozilla.org/en-US/kb/https-only-prefs"><u>offer features to silently upgrade to HTTPS</u></a>, this protection doesn't extend to the broader ecosystem of connected devices. IoT devices with limited processing power, automated API clients, or legacy software stacks often lack such safeguards entirely. In fact, when filtering on plaintext HTTP traffic that is “likely automated”, the share <a href="https://radar.cloudflare.com/explorer?dataSet=http&amp;groupBy=http_protocol&amp;filters=botClass%253DLikely_Automated"><u>rises to over 16%</u></a>! We continue to see a wide variety of legacy clients accessing resources over plaintext connections. This trend is not confined to specific networks, but is observable globally.</p><p>Closing HTTP ports, like port 80, across our entire IP address space would block such clients entirely, causing a major disruption in services. While we plan to cautiously start by implementing the change on Cloudflare's API IP addresses, it’s not enough. Therefore, our goal is to ensure all of our customers’ API traffic benefits from this change as well.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1OfUjwkP9iMdjymjtJX7tL/4cd278faf71f610c43239cc41d8f6fba/4.png" />
          </figure><p><sup><i>Breakdown of HTTP and HTTPS for ‘human’ connections</i></sup></p><p>The second challenge relates to limitations posed by the longstanding <a href="https://en.wikipedia.org/wiki/Berkeley_sockets"><u>BSD Sockets API</u></a> at the server-side, which we have addressed using <a href="https://blog.cloudflare.com/tubular-fixing-the-socket-api-with-ebpf/"><u>Tubular</u></a>, a tool that inspects every connection terminated by a server and decides which application should receive it. Operators historically have faced a challenging dilemma: either listen to the same ports across many IP addresses using a single socket (scalable but inflexible), or maintain individual sockets for each IP address (flexible but unscalable). Luckily, Tubular has allowed us to <a href="https://blog.cloudflare.com/its-crowded-in-here/"><u>resolve this using 'bindings'</u></a>, which decouples sockets from specific IP:port pairs. This creates efficient pathways for managing endpoints throughout our systems at scale, enabling us to handle both HTTP and HTTPS traffic intelligently without the traditional limitations of socket architecture.</p><p>Step 0, then, is about provisioning both IPv4 and IPv6 address space on our network that by default has all HTTP ports closed. Tubular enables us to configure and manage these IP addresses differently than others for our endpoints. Additionally, <a href="https://blog.cloudflare.com/addressing-agility/"><u>Addressing Agility</u></a> and <a href="https://blog.cloudflare.com/topaz-policy-engine-design/"><u>Topaz</u></a> enable us to assign these addresses dynamically, and safely, for opted-in domains.</p>
    <div>
      <h3>Moving from strategy to execution</h3>
      <a href="#moving-from-strategy-to-execution">
        
      </a>
    </div>
    <p>In the past, our legacy stack would have made this transition challenging, but today’s Cloudflare possesses the appropriate tools to deliver a scalable solution, rather than addressing it on a domain-by-domain basis.</p><p>Using Tubular, we were able to bind our new set of anycast IP prefixes to our TLS-terminating proxies across the globe. To ensure that no plaintext HTTP traffic is served on these IP addresses, we extended our global <a href="https://en.wikipedia.org/wiki/Iptables"><u>iptables</u></a> firewall configuration to reject any inbound packets on HTTP ports.</p>
            <pre><code>iptables -A INPUT -p tcp -d &lt;IP_ADDRESS_BLOCK&gt; --dport &lt;HTTP_PORT&gt; -j REJECT 
--reject-with tcp-reset

iptables -A INPUT -p udp -d &lt;IP_ADDRESS_BLOCK&gt; --dport &lt;HTTP_PORT&gt; -j REJECT 
--reject-with icmp-port-unreachable</code></pre>
            <p>As a result, any connections to these IP addresses on HTTP ports are filtered and rejected at the transport layer, eliminating the need for state management at the application layer by our web proxies.</p><p>The next logical step is to update the <a href="https://www.cloudflare.com/learning/dns/what-is-dns/"><u>DNS assignments</u></a> so that API traffic is routed over the <i>correct</i> IP addresses. In our case, we encoded a new DNS policy for API traffic for the HTTPS-only interface as a declarative <a href="https://blog.cloudflare.com/topaz-policy-engine-design/"><u>Topaz program</u></a> in our authoritative DNS server:</p>
            <pre><code>- name: https_only
 exclusive: true 
 config: |
    (config
      ([traffic_class "API"]
       [ipv4 (ipv4_address “192.0.2.1”)] # Example IPv4 address
       [ipv6 (ipv6_address “2001:DB8::1:1”)] # Example IPv6 address
       [t (ttl 300]))
  match: |
    (= query_domain_class traffic_class)
  response: |
    (response (list ipv4) (list ipv6) t)</code></pre>
            <p>The above policy encodes that for any DNS query targeting the ‘API traffic’ class, we return the respective HTTPS-only interface IP addresses. Topaz’s safety guarantees ensure <i>exclusivity</i>, preventing other DNS policies from inadvertently matching the same queries and misrouting plaintext HTTP expected domains to HTTPS-only IPs

api.cloudflare.com is the first domain to be added to our HTTPS-only API traffic class, with other applicable endpoints to follow.</p>
    <div>
      <h3>Opting-in your API endpoints</h3>
      <a href="#opting-in-your-api-endpoints">
        
      </a>
    </div>
    <p>As we said above, we've started with api.cloudflare.com and our internal API endpoints to thoroughly monitor any side effects on our own systems before extending this feature to customer domains. We have deployed these changes gradually across all data centers, leveraging Topaz’s flexibility to target subsets of traffic, minimizing disruptions, and ensuring a smooth transition.</p><p>To monitor unencrypted connections for your domains, before blocking access using the feature, you can review the relevant analytics on the Cloudflare dashboard. Log in, select your account and domain, and navigate to the "<a href="https://developers.cloudflare.com/analytics/types-of-analytics/#account-analytics-beta"><u>Analytics &amp; Logs</u></a>" section. There, under the "<i>Traffic Served Over SSL</i>" subsection, you will find a breakdown of encrypted and unencrypted traffic for your site. That data can help provide a baseline for assessing the volume of plaintext HTTP connections for your site that will be blocked when you opt in. After opting in, you would expect no traffic for your site will be served over plaintext HTTP, and therefore that number should go down to zero.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4YjvOU3XQqj1Y2Kfv2jIL3/97178a99d17f8938bc3ec53704bbc4b8/5.png" />
          </figure><p><i>Snapshot of ‘Traffic Served Over SSL’ section on Cloudflare dashboard</i></p><p>Towards the last quarter of 2025, we will provide customers the ability to opt in their domains using the dashboard or API (similar to enabling the Always Use HTTPS feature). Stay tuned!</p>
    <div>
      <h3>Wrapping up</h3>
      <a href="#wrapping-up">
        
      </a>
    </div>
    <p>Starting today, any unencrypted connection to api.cloudflare.com will be completely rejected. Developers should <b>not</b> expect a 403 Forbidden response any longer for HTTP connections, as we will prevent the underlying connection to be established by closing the HTTP interface entirely. Only secure HTTPS connections will be allowed to be established.</p><p>We are also making updates to transition api.cloudflare.com away from its static IP addresses in the future. As part of that change, we will be discontinuing support for <a href="https://developers.cloudflare.com/ssl/reference/browser-compatibility/#non-sni-support"><u>non-SNI</u></a> legacy clients for Cloudflare API specifically — currently, an average of just 0.55% of TLS connections to the Cloudflare API do not include an <a href="https://www.cloudflare.com/learning/ssl/what-is-sni/">SNI</a> value. These non-SNI connections are initiated by a small number of accounts. We are committed to coordinating this transition and will work closely with the affected customers before implementing the change. This initiative aligns with our goal of enhancing the agility and reliability of our API endpoints.</p><p>Beyond the Cloudflare API use case, we're also exploring other areas where it's safe to close plaintext traffic ports. While the long tail of unencrypted traffic may persist for a while, it shouldn’t be forced on every site.

In the meantime, a small step like this can allow us to have a big impact in helping make a better Internet, and we are working hard to reliably bring this feature to your domains. We believe security should be free for all!</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[HTTPS]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Addressing]]></category>
            <category><![CDATA[Research]]></category>
            <guid isPermaLink="false">RqjV9vQoPNX8txlmrju6d</guid>
            <dc:creator>Suleman Ahmad</dc:creator>
            <dc:creator>Ash Pallarito</dc:creator>
            <dc:creator>Algin Martin</dc:creator>
        </item>
        <item>
            <title><![CDATA[How we prevent conflicts in authoritative DNS configuration using formal verification]]></title>
            <link>https://blog.cloudflare.com/topaz-policy-engine-design/</link>
            <pubDate>Fri, 08 Nov 2024 14:00:00 GMT</pubDate>
            <description><![CDATA[ We describe how Cloudflare uses a custom Lisp-like programming language and formal verifier (written in Racket and Rosette) to prevent logical contradictions in our authoritative DNS nameserver’s behavior. ]]></description>
            <content:encoded><![CDATA[ <p>Over the last year, Cloudflare has begun formally verifying the correctness of our internal DNS addressing behavior — the logic that determines which IP address a DNS query receives when it hits our authoritative nameserver. This means that for every possible DNS query for a <a href="https://developers.cloudflare.com/dns/manage-dns-records/reference/proxied-dns-records/"><u>proxied</u></a> domain we could receive, we try to mathematically prove properties about our DNS addressing behavior, even when different systems (owned by different teams) at Cloudflare have contradictory views on which IP addresses should be returned.</p><p>To achieve this, we formally verify the programs — written in a custom <a href="https://en.wikipedia.org/wiki/Lisp_(programming_language)"><u>Lisp</u></a>-like programming language — that our nameserver executes when it receives a DNS query. These programs determine which IP addresses to return. Whenever an engineer changes one of these programs, we run all the programs through our custom model checker (written in <a href="https://racket-lang.org/"><u>Racket</u></a> + <a href="https://emina.github.io/rosette/"><u>Rosette</u></a>) to check for certain bugs (e.g., one program overshadowing another) before the programs are deployed.</p><p>Our formal verifier runs in production today, and is part of a larger addressing system called Topaz. In fact, it’s likely you’ve made a DNS query today that triggered a formally verified Topaz program.</p><p>This post is a technical description of how Topaz’s formal verification works. Besides being a valuable tool for Cloudflare engineers, Topaz is a real-world example of <a href="https://en.wikipedia.org/wiki/Formal_verification"><u>formal verification</u></a> applied to networked systems. We hope it inspires other network operators to incorporate formal methods, where appropriate, to help make the Internet more reliable for all.</p><p>Topaz’s full technical details have been peer-reviewed and published in <a href="https://conferences.sigcomm.org/sigcomm/2024/"><u>ACM SIGCOMM 2024</u></a>, with both a <a href="https://research.cloudflare.com/publications/Larisch2024/"><u>paper</u></a> and short <a href="https://www.youtube.com/watch?v=hW7RjXVx7_Q"><u>video</u></a> available online. </p>
    <div>
      <h2>Addressing: how IP addresses are chosen</h2>
      <a href="#addressing-how-ip-addresses-are-chosen">
        
      </a>
    </div>
    <p>When a DNS query for a customer’s proxied domain hits Cloudflare’s nameserver, the nameserver returns an IP address — but how does it decide which address to return?</p><p>Let’s make this more concrete. When a customer, say <code>example.com</code>, signs up for Cloudflare and <a href="https://developers.cloudflare.com/dns/manage-dns-records/reference/proxied-dns-records/"><u>proxies</u></a> their traffic through Cloudflare, it makes Cloudflare’s nameserver <i>authoritative</i> for their domain, which means our nameserver has the <i>authority </i>to respond to DNS queries for <code>example.com</code>. Later, when a client makes a DNS query for <code>example.com</code>, the client’s recursive DNS resolver (for example, <a href="https://www.cloudflare.com/learning/dns/what-is-1.1.1.1/"><u>1.1.1.1</u></a>) queries our nameserver for the authoritative response. Our nameserver returns <b><i>some</i></b><i> </i>Cloudflare IP address (of our choosing) to the resolver, which forwards that address to the client. The client then uses the IP address to connect to Cloudflare’s network, which is a global <a href="https://www.cloudflare.com/en-gb/learning/cdn/glossary/anycast-network/"><u>anycast</u></a> network — every data center advertises all of our addresses.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/72EmlVrMTMBMhxrhZ50YI9/54e08160ea98c55bc8e2703d7c85927b/image3.png" />
          </figure><p><sup>Clients query Cloudflare’s nameserver (via their resolver) for customer domains. The nameserver returns Cloudflare IP addresses, advertised by our entire global network, which the client uses to connect to the customer domain. Cloudflare may then connect to the origin server to fulfill the user’s HTTPS request.</sup></p><p>When the customer has <a href="https://developers.cloudflare.com/byoip/"><u>configured a static IP address</u></a> for their domain, our nameserver’s choice of IP address is simple: it simply returns that static address in response to queries made for that domain.</p><p>But for all other customer domains, our nameserver could respond with virtually any IP address that we own and operate. We may return the <i>same</i> address in response to queries for <i>different</i> domains, or <i>different</i> addresses in response to different queries for the <i>same</i> domain. We do this for resilience, but also because decoupling names and IP addresses <a href="https://blog.cloudflare.com/addressing-agility"><u>improves flexibility</u></a>.</p><p>With all that in mind, let’s return to our initial question: given a query for a proxied domain without a static IP, which IP address should be returned? The answer: <b>Cloudflare chooses IP addresses to meet various business objectives. </b>For instance, we may choose IPs to:</p><ul><li><p>Change the IP address of a domain that is under attack.</p></li><li><p>Direct fractions of traffic to specific IP addresses to test new features or services.</p></li><li><p><a href="https://blog.cloudflare.com/cloudflare-incident-on-september-17-2024/"><u>Remap or “renumber”</u></a> domain names to new IP address space.</p></li></ul>
    <div>
      <h2>Topaz executes DNS objectives</h2>
      <a href="#topaz-executes-dns-objectives">
        
      </a>
    </div>
    <p>To change authoritative nameserver behavior — how we choose IPs —  a Cloudflare engineer encodes their desired DNS business objective as a declarative Topaz program. Our nameserver stores the list of all such programs such that when it receives a DNS query for a proxied domain, it executes the list of programs in sequence until one returns an IP address. It then returns that IP to the resolver.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3gyUw7j0GlTXj0Vm637aCW/9aa353d512151d5878998e199a538973/image1.png" />
          </figure><p><sup>Topaz receives DNS queries (metadata included) for proxied domains from Cloudflare’s nameserver. It executes a list of policies in sequence until a match is found. It returns the resulting IP address to the nameserver, which forwards it to the resolver.</sup></p><p>What do these programs look like?</p><p>Each Topaz program has three primary components:</p><ol><li><p><b>Match function: </b>A program’s match function specifies under which circumstances the program should execute. It takes as input DNS query metadata (e.g., datacenter information, account information) and outputs a boolean. If, given a DNS query, the match function returns <i>true</i>, the program’s response function is executed.</p></li><li><p><b>Response function</b>: A program’s response function specifies <i>which</i> IP addresses should be chosen. It also takes as input all the DNS query metadata, but outputs a 3-tuple (IPv4 addresses, IPv6 addresses, and TTL). When a program’s match function returns true, its corresponding response function is executed. The resulting IP addresses and TTL are returned to the resolver that made the query. </p></li><li><p><b>Configuration</b>: A program’s configuration is a set of variables that parameterize that program’s match and response function. The match and response functions reference variables in the corresponding configuration, thereby separating the macro-level behavior of a program (match/response functions) from its nitty-gritty details (specific IP addresses, names, etc.). This separation makes it easier to understand how a Topaz program behaves at a glance, without getting bogged down by specific function parameters.</p></li></ol><p>Let’s walk through an example Topaz program. The goal of this program is to give all queried domains whose metadata field “tag1” is equal to “orange” a particular IP address. The program looks like this:</p>
            <pre><code>- name: orange
  config: |
    (config
      ([desired_tag1 "orange"]
       [ipv4 (ipv4_address “192.0.2.3”)]
       [ipv6 (ipv6_address “2001:DB8:1:3”)]
       [t (ttl 300]))
  match: |
    (= query_domain_tag1 desired_tag1) 
  response: |
    (response (list ipv4) (list ipv6) t)</code></pre>
            <p>Before we walk through the program, note that the program’s configuration, match, and response function are YAML strings, but more specifically they are topaz-lang expressions. Topaz-lang is the <a href="https://en.wikipedia.org/wiki/Domain-specific_language"><u>domain-specific language (DSL)</u></a> we created specifically for expressing Topaz programs. It is based on <a href="https://www.scheme.org/"><u>Scheme</u></a>, but is much simpler. It is dynamically typed, it is not <a href="https://en.wikipedia.org/wiki/Turing_completeness"><u>Turing complete</u></a>, and every expression evaluates to exactly one value (though functions can throw errors). Operators cannot define functions within topaz-lang, they can only add new DSL functions by writing functions in the host language (Go). The DSL provides basic types (numbers, lists, maps) but also Topaz-specific types, like IPv4/IPv6 addresses and TTLs.</p><p>Let’s now examine this program in detail. </p><ul><li><p>The <code>config</code> is a set of four <i>bindings</i> from name to value. The first binds the string <code>”orange”</code> to the name <code>desired_tag1</code>. The second binds the IPv4 address <code>192.0.2.3</code> to the name <code>ipv4</code>. The third binds the IPv6 address <code>2001:DB8:1:3</code> to the name <code>ipv6</code>. And the fourth binds the TTL (for which we added a topaz-lang type) <code>300</code> (seconds) to the name <code>t</code>.</p></li><li><p>The <code>match</code> function is an expression that <i>must</i> evaluate to a boolean. It can reference configuration values (e.g., <code>desired_tag1</code>), and can also reference DNS query fields. All DNS query fields use the prefix <code>query_</code> and are brought into scope at evaluation time. This program’s match function checks whether <code>desired_tag1</code> is equal to the tag attached to the queried domain, <code>query_domain_tag1</code>. </p></li><li><p>The <code>response</code> function is an expression that evaluates to the special <code>response</code> type, which is really just a 3-tuple consisting of: a list of IPv4 addresses, a list of IPv6 addresses, and a TTL. This program’s response function simply returns the configured IPv4 address, IPv6 address, and TTL (seconds).</p></li></ul><p>Critically, <i>all</i> Topaz programs are encoded as YAML and live in the same version-controlled file. Imagine this program file contained only the <code>orange</code> program above, but now, a new team wants to add a new program, which checks whether the queried domain’s “tag1” field is equal to “orange” AND that the domain’s “tag2” field is equal to true:</p>
            <pre><code>- name: orange_and_true
  config: |
    (config
      ([desired_tag1 "orange"]
       [ipv4 (ipv4_address “192.0.2.2”)]
       [ipv6 (ipv6_address “2001:DB8:1:2”)]
       [t (ttl 300)]))
  match: |
    (and (= query_domain_tag1 desired_tag1)
         query_domain_tag2)
  response: |
    (response (list ipv4) (list ipv6) t)</code></pre>
            <p>This new team must place their new <code>orange_and_true</code> program either below or above the <code>orange</code> program in the file containing the list of Topaz programs. For instance, they could place <code>orange_and_true</code> after <code>orange</code>, like so:</p>
            <pre><code>- name: orange
  config: …
  match: …
  response: …
- name: orange_and_true
  config: …
  match: …
  response: …</code></pre>
            <p>Now let’s add a third, more interesting Topaz program. Say a Cloudflare team wants to test a modified version of our CDN’s HTTP server on a small percentage of domains, and only in a subset of Cloudflare’s data centers. Furthermore, they want to distribute these queries across a specific IP prefix such that queries for the same domain get the same IP. They write the following:</p>
            <pre><code>- name: purple
  config: |
    (config
      ([purple_datacenters (fetch_datacenters “purple”)]
       [percentage 10]
       [ipv4_prefix (ipv4_prefix “203.0.113.0/24”)]
       [ipv6_prefix (ipv6_prefix “2001:DB8:3::/48”)]))
  match: |
    (let ([rand (rand_gen (hash query_domain))])
      (and (member? purple_datacenters query_datacenter)
           (&lt; (random_number (range 0 99) rand) percentage)))
  response: |
    (let ([hashed_domain (hash query_domain)]
          [ipv4_address (select_from ipv4_prefix hashed_domain)]
          [ipv6_address (select_from ipv6_prefix hashed_domain)])
      (response (list ipv4_address) (list ipv6_address) (ttl 1)))</code></pre>
            <p>This Topaz program is significantly more complicated, so let’s walk through it.</p><p>Starting with configuration: </p><ul><li><p>The first configuration value, <code>purple_datacenters</code>, is bound to the expression <code>(fetch_datacenters “purple”)</code>, which is a function that retrieves all Cloudflare data centers tagged “purple” via an internal HTTP API. The result of this function call is a list of data centers. </p></li><li><p>The second configuration value, <code>percentage</code>, is a number representing the fraction of traffic we would like our program to act upon.</p></li><li><p>The third and fourth names are bound to IP prefixes, v4 and v6 respectively (note the <code>built-in ipv4_prefix</code> and <code>ipv6_prefix</code> types).</p></li></ul><p>The match function is also more complicated. First, note the <code>let</code> form — this lets operators define local variables. We define one local variable, a random number generator called <code>rand</code> seeded with the hash of the queried domain name. The match expression itself is a conjunction that checks two things. </p><ul><li><p>First, it checks whether the query landed in a data center tagged “purple”. </p></li><li><p>Second, it checks whether a random number between 0 and 99 (produced by a generator seeded by the domain name) is less than the configured percentage. By seeding the random number generator with the domain, the program ensures that 10% of <i>domains</i> trigger a match. If we had seeded the RNG with, say, the query ID, then queries for the same domain would behave differently.</p></li></ul><p>Together, the conjuncts guarantee that the match expression evaluates to true for 10% of domains queried in “purple” data centers.</p><p>Now let’s look at the response function. We define three local variables. The first is a hash of the domain. The second is an IPv4 address selected from the configured IPv4 prefix. <code>select_from</code> always chooses the same IP address given the same prefix and hash — this ensures that queries for a given domain always receive the same IP address (which makes it easier to correlate queries for a single domain), but that queries for different domains can receive different IP addresses within the configured prefix. The third local variable is an IPv6 address selected similarly. The response function returns these IP addresses and a TTL of value 1 (second).</p>
    <div>
      <h2>Topaz programs are executed on the hot path</h2>
      <a href="#topaz-programs-are-executed-on-the-hot-path">
        
      </a>
    </div>
    <p>Topaz’s control plane validates the list of programs and distributes them to our global nameserver instances. As we’ve seen, the list of programs reside in a single, version-controlled YAML file. When an operator changes this file (i.e., adds a program, removes a program, or modifies an existing program), Topaz’s control plane does the following things in order:</p><ul><li><p>First, it validates the programs, making sure there are no syntax errors. </p></li><li><p>Second, it “finalizes” each program’s configuration by evaluating every configuration binding and storing the result. (For instance, to finalize the <code>purple</code> program, it evaluates <code>fetch_datacenters</code>, storing the resulting list. This way our authoritative nameservers never need to retrieve external data.) </p></li><li><p>Third, it <i>verifies</i> the finalized programs, which we will explain below. </p></li><li><p>Finally, it distributes the finalized programs across our network.</p></li></ul><p>Topaz’s control plane distributes the programs to all servers globally by writing the list of programs to <a href="https://blog.cloudflare.com/introducing-quicksilver-configuration-distribution-at-internet-scale/"><u>QuickSilver</u></a>, our edge key-value store. The Topaz service on each server detects changes in Quicksilver and updates its program list.</p><p>When our nameserver service receives a DNS query, it augments the query with additional metadata (e.g., tags) and then forwards the query to the Topaz service (both services run on every Cloudflare server) via Inter-Process Communication (IPC). Topaz, upon receiving a DNS query from the nameserver, walks through its program list, executing each program’s match function (using the topaz-lang interpreter) with the DNS query in scope (with values prefixed with <code>query_</code>). It walks the list until a match function returns <code>true</code>. It then executes that program’s response function, and returns the resulting IP addresses and TTL to our nameserver. The nameserver packages these addresses and TTL in valid DNS format, and then returns them to the resolver. </p>
    <div>
      <h2>Topaz programs are formally verified</h2>
      <a href="#topaz-programs-are-formally-verified">
        
      </a>
    </div>
    <p>Before programs are distributed to our global network, they are formally verified. Each program is passed through our formal verification tool which throws an error if a program has a bug, or if two programs (e.g., the <code>orange_and_true</code> and <code>orange</code> programs) conflict with one another.</p><p>The Topaz formal verifier (<a href="https://en.wikipedia.org/wiki/Model_checking"><u>model-checker</u></a>) checks three properties.</p><p>First, it checks that each program is <i>satisfiable </i>— that there exists <i>some</i> DNS query that causes each program’s match function to return <code>true</code>. This property is useful for detecting internally-inconsistent programs that will simply never match. For instance, if a program’s match expression was <code>(and true false)</code>, there exists no query that will cause this to evaluate to true, so the verifier throws an error.</p><p>Second, it checks that each program is <i>reachable </i>— that there exists some DNS query that causes each program’s match function to return <code>true</code> <i>given all preceding programs.</i> This property is useful for detecting “dead” programs that are completely overshadowed by higher-priority programs. For instance, recall the ordering of the <code>orange</code> and <code>orange_and_true</code> programs:</p>
            <pre><code>- name: orange
  config: …
  match: (= query_domain_tag1 "orange")  
  response: …
- name: orange_and_true
  config: …
  match: (and (= query_domain_tag1 "orange") query_domain_tag2)
  response: …</code></pre>
            <p>The verifier would throw an error because the <code>orange_and_true</code> program is unreachable. For all DNS queries for which <code>query_domain_tag1</code> is ”orange”, regardless of <code>metadata2</code>, the <code>orange</code> program will <i>always</i> match, which means the <code>orange_and_true</code> program will <i>never</i> match. To resolve this error, we’d need to swap these two programs like we did above.</p><p>Finally, and most importantly, the verifier checks for program <i>conflicts</i>: queries that cause any two programs to both match. If such a query exists, it throws an error (and prints the relevant query), and the operators are forced to resolve the conflict by changing their programs. However, it only checks whether specific programs conflict — those that are explicitly marked <i>exclusive. </i>Operators mark their program as exclusive if they want to be sure that no other exclusive program could match on the same queries.</p><p>To see what conflict detection looks like, consider the corrected ordering of the <code>orange_and_true</code> and <code>orange</code> programs, but note that the two programs have now been marked exclusive:</p>
            <pre><code>- name: orange_and_true
  exclusive: true
  config: ...
  match: (and (= query_domain_tag1 "orange") query_domain_tag2)
  response: ...
- name: orange
  exclusive: true
  config: ...
  match: (= query_domain_tag1 "orange") 
  response: ...</code></pre>
            <p>After marking these two programs exclusive, the verifier will throw an error. Not only will it say that these two programs can contradict one another, but it will provide a sample query as proof:</p>
            <pre><code>Checking: no exclusive programs match the same queries: check FAILED!
Intersecting programs found:
programs "orange_and_true" and "orange" both match any query...
  to any domain...
    with tag1: "orange"
    with tag2: true
</code></pre>
            <p>The teams behind the <code>orange</code> and <code>orange_and_true</code> programs respectively <i>must</i> resolve this conflict before these programs are deployed, and can use the above query to help them do so. To resolve the conflict, the teams have a few options. The simplest option is to remove the exclusive setting from one program, and acknowledge that it is simply not possible for these programs to be <code>exclusive</code>. In that case, the order of the two programs matters (one must have higher priority). This is fine! Topaz allows developers to write certain programs that <i>absolutely cannot </i>overlap with other programs (using <code>exclusive</code>), but sometimes that is just not possible. And when it’s not, at least program priority is <i>explicit.</i></p><p><i>Note: in practice, we place all exclusive programs at the top of the program file. This makes it easier to reason about interactions between exclusive and non-exclusive programs.</i></p><p>In short, verification is powerful not only because it catches bugs (e.g., satisfiability and reachability), but it also highlights the consequences of program changes. It helps operators understand the impact of their changes by providing immediate feedback. If two programs conflict, operators are forced to resolve it before deployment, rather than after an incident.</p><p><b>Bonus: verification-powered diffs. </b>One of the newest features we’ve added to the verifier is one we call <i>semantic diffs</i>. It’s in early stages, but the key insight is that operators often just want to <i>understand</i> the impact of changes, even if these changes are deemed safe. To help operators, the verifier compares the old and new versions of the program file. Specifically, it looks for any query that matched program <i>X</i> in the old version, but matches a different program <i>Y</i> in the new version (or vice versa). For instance, if we changed <code>orange_and_true</code> thus:</p>
            <pre><code>- name: orange_and_true
  config: …
  match: (and (= query_domain_tag1 "orange") (not query_domain_tag2))
  response: …</code></pre>
            <p>Our verifier would emit:</p>
            <pre><code>Generating a report to help you understand your changes...
NOTE: the queries below (if any) are just examples. Other such queries may exist.

* program "orange_and_true" now MATCHES any query...
  to any domain...
    with tag1: "orange"
    with tag2: false</code></pre>
            <p>While not exhaustive, this information helps operators understand whether their changes are doing what they intend or not, <i>before</i> deployment. We look forward to expanding our verifier’s diff capabilities going forward.</p>
    <div>
      <h2>How Topaz’s verifier works, and its tradeoffs</h2>
      <a href="#how-topazs-verifier-works-and-its-tradeoffs">
        
      </a>
    </div>
    <p>How does the verifier work? At a high-level, the verifier checks that, for all possible DNS queries, the three properties outlined above are satisfied. A Satisfiability Modulo Theories (SMT) solver — which we explain below — makes this seemingly impossible operation feasible. (It doesn't literally loop over all DNS queries, but it is equivalent to doing so — it provides exhaustive proof.)</p><p>We implemented our formal verifier in <a href="https://emina.github.io/rosette/"><u>Rosette</u></a>, a solver-enhanced domain-specific language written in the <a href="https://racket-lang.org/"><u>Racket</u></a> programming language. Rosette makes writing a verifier more of an engineering exercise, rather than a formal logic test: if you can express the interpreter for your language in Racket/Rosette, you get verification “for free”, in some sense. We wrote a topaz-lang interpreter in Racket, then crafted our three properties using the Rosette DSL.</p><p>How does Rosette work? Rosette translates our desired properties into formulae in <a href="https://en.wikipedia.org/wiki/First-order_logic"><u>first-order logic</u></a>. At a high level, these formulae are like equations from algebra class in school, with “unknowns” or variables. For instance, when checking whether the orange program is reachable (with the <code>orange_and_true</code> program ordered before it), Rosette produces the formula <code>((NOT orange_and_true.match) AND orange.match)</code>. The “unknowns” here are the DNS query parameters that these match functions operate over, e.g., <code>query_domain_tag1</code>. To solve this formula, Rosette interfaces with an <a href="https://en.wikipedia.org/wiki/Satisfiability_modulo_theories"><u>SMT solver</u></a> (like <a href="https://github.com/Z3Prover/z3"><u>Z3</u></a>), which is specifically designed to solve these types of formulae by efficiently finding values to assign to the DNS query parameters that make the formulae true. Once the SMT solver finds satisfying values, Rosette translates them into a Racket data structure: in our case, a sample DNS query. In this example, once it finds a satisfying DNS query, it would report that the <code>orange</code> program is indeed reachable.</p><p>However, verification is not free. The primary cost is maintenance. The model checker’s interpreter (Racket) must be kept in lockstep with the main interpreter (Go). If they fall out-of-sync, the verifier loses the ability to accurately detect bugs. Furthermore, functions added to topaz-lang must be compatible with formal verification.</p><p>Also, not all functions are easily verifiable, which means we must restrict the kinds of functions that program authors can write. Rosette can only verify functions that operate over integers and bit-vectors. This means we only permit functions whose operations can be converted into operations over integers and bit-vectors. While this seems restrictive, it actually gets us pretty far. The main challenge is strings: Topaz does not support programs that, for example, manipulate or work with substrings of the queried domain name. However, it does support simple operations on closed-set strings. For instance, it supports checking if two domain names are equal, because we can convert all strings to a small set of values representable using integers (which are easily verifiable).</p><p>Fortunately, thanks to our design of Topaz programs, the verifier need not be compatible with all Topaz program code. The verifier only ever examines Topaz <i>match</i> functions, so only the functions specified in match functions need to be verification-compatible. We encountered other challenges when working to make our model accurate, like modeling randomness — if you are interested in the details, we encourage you to read the <a href="https://research.cloudflare.com/publications/Larisch2024/"><u>paper</u></a>.</p><p>Another potential cost is verification speed. We find that the verifier can ensure our existing seven programs satisfy all three properties within about six seconds, which is acceptable because verification happens only at build time. We verify programs centrally, before programs are deployed, and only when programs change. </p><p>We also ran microbenchmarks to determine how fast the verifier can check more programs — we found that, for instance, it would take the verifier about 300 seconds to verify 50 programs. While 300 seconds is still acceptable, we are looking into verifier optimizations that will reduce the time further.</p>
    <div>
      <h2>Bringing formal verification from research to production</h2>
      <a href="#bringing-formal-verification-from-research-to-production">
        
      </a>
    </div>
    <p>Topaz’s verifier began as a <a href="https://research.cloudflare.com/"><u>research</u></a> project, and has since been deployed to production. It formally verifies all changes made to the authoritative DNS behavior specified in Topaz.</p><p>For more in-depth information on Topaz, see both our research <a href="https://research.cloudflare.com/publications/Larisch2024/"><u>paper</u></a> published at SIGCOMM 2024 and the <a href="https://www.youtube.com/watch?v=hW7RjXVx7_Q"><u>recording</u></a> of the talk.</p><p>We thank our former intern, Tim Alberdingk-Thijm, for his invaluable work on Topaz’s verifier.</p> ]]></content:encoded>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Addressing]]></category>
            <category><![CDATA[Formal Methods]]></category>
            <guid isPermaLink="false">5LVsblxj2Git54IRxadpyg</guid>
            <dc:creator>James Larisch</dc:creator>
            <dc:creator>Suleman Ahmad</dc:creator>
            <dc:creator>Marwan Fayed</dc:creator>
        </item>
    </channel>
</rss>