
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sat, 04 Apr 2026 06:53:20 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Monitoring AS-SETs and why they matter]]></title>
            <link>https://blog.cloudflare.com/monitoring-as-sets-and-why-they-matter/</link>
            <pubDate>Fri, 26 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ We will cover some of the reasons why operators need to monitor the AS-SET memberships for their ASN, and how Cloudflare Radar can help.  ]]></description>
            <content:encoded><![CDATA[ 
    <div>
      <h2>Introduction to AS-SETs</h2>
      <a href="#introduction-to-as-sets">
        
      </a>
    </div>
    <p>An <a href="https://www.apnic.net/manage-ip/using-whois/guide/as-set/"><u>AS-SET</u></a>, not to be confused with the <a href="https://datatracker.ietf.org/doc/rfc9774/"><u>recently deprecated BGP AS_SET</u></a>, is an <a href="https://irr.net/overview/"><u>Internet Routing Registry (IRR)</u></a> object that allows network operators to group related networks together. AS-SETs have been used historically for multiple purposes, such as grouping together a list of downstream customers of a particular network provider. For example, Cloudflare uses the <a href="https://irrexplorer.nlnog.net/as-set/AS13335:AS-CLOUDFLARE"><u>AS13335:AS-CLOUDFLARE</u></a> AS-SET to group together the list of our own <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>Autonomous System Numbers</u></a> (ASNs) and our downstream Bring-Your-Own-IP (BYOIP) customer networks, so we can ultimately <a href="https://www.peeringdb.com/net/4224"><u>communicate</u></a> to other networks which prefixes they should accept from us. </p><p>In other words, an AS-SET is <i>currently</i> the way on the Internet that allows someone to attest the networks for which they are the provider. This system of provider authorization is completely trust-based, meaning it's <a href="https://www.kentik.com/blog/the-scourge-of-excessive-as-sets/"><u>not reliable at all</u></a>, and is best-effort. The future of an RPKI-based provider authorization system is <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>coming in the form of ASPA (Autonomous System Provider Authorization),</u></a> but it will take time for standardization and adoption. Until then, we are left with AS-SETs.</p><p>Because AS-SETs are so critical for BGP routing on the Internet, network operators need to be able to monitor valid and invalid AS-SET <i>memberships </i>for their networks. 
Cloudflare Radar now provides a transparent, public listing of these memberships on each ASN's <a href="https://radar.cloudflare.com/routing/as13335"><u>routing page</u></a> to help network operators.</p>
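<p>As a rough sketch of what an AS-SET actually encodes, the snippet below expands a hypothetical hierarchical AS-SET (the names and memberships are made up, not real IRR data) into the full set of ASNs it includes, directly or via nested AS-SETs:</p>

```python
# Hypothetical IRR data: AS-SET name -> direct members (ASNs or nested AS-SETs).
AS_SETS = {
    "AS64500:AS-EXAMPLE": ["AS64500", "AS64501", "AS64500:AS-CUSTOMERS"],
    "AS64500:AS-CUSTOMERS": ["AS64502", "AS64503"],
}

def expand_as_set(name, as_sets, seen=None):
    """Recursively resolve an AS-SET into the set of member ASNs."""
    seen = set() if seen is None else seen
    if name in seen:          # guard against circular references
        return set()
    seen.add(name)
    asns = set()
    for member in as_sets.get(name, []):
        if member in as_sets:                 # nested AS-SET: recurse
            asns |= expand_as_set(member, as_sets, seen)
        else:                                 # plain ASN
            asns.add(member)
    return asns

print(sorted(expand_as_set("AS64500:AS-EXAMPLE", AS_SETS)))
# ['AS64500', 'AS64501', 'AS64502', 'AS64503']
```

<p>This recursive expansion is essentially what IRR tooling does when it turns an AS-SET into a list of networks to filter on.</p>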
    <div>
      <h2>AS-SETs and building BGP route filters</h2>
      <a href="#as-sets-and-building-bgp-route-filters">
        
      </a>
    </div>
    <p>AS-SETs are a critical component of BGP policies, often paired with the expressive <a href="https://irr.net/rpsl-guide/"><u>Routing Policy Specification Language (RPSL)</u></a> that describes how a particular BGP ASN accepts and propagates routes to other networks. Most often, networks use an AS-SET to express what other networks should accept from them: their downstream customer routes. </p><p>Returning to the AS13335:AS-CLOUDFLARE example, this AS-SET is published clearly on <a href="https://www.peeringdb.com/net/4224"><u>PeeringDB</u></a> for other peering networks to reference and build filters against. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2590TMppv2h4SAi7uy6xS9/617ec81e2364f470c0efe243a528f695/image6.png" />
          </figure><p>When turning up a new transit provider service, we also ask the provider networks to build their route filters using the same AS-SET. Because BGP prefixes are also created in IRR <a href="https://irr.net/registry/"><u>registries</u></a> using the <i>route</i> or <i>route6 </i><a href="https://developers.cloudflare.com/byoip/concepts/irr-entries/best-practices/"><u>objects</u></a>, peers and providers now know which BGP prefixes they should accept from us, denying the rest. A popular tool for building prefix-lists based on AS-SETs and IRR databases is <a href="https://github.com/bgp/bgpq4"><u>bgpq4</u></a>, and it’s one you can easily try out yourself. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7F2QdhcZTLEJjKNtZbBWxR/92efe32dcef67aa6d51c3b1a29218843/image3.png" />
          </figure><p>For example, to generate a Juniper router’s IPv4 prefix-list containing prefixes that AS13335 could propagate for Cloudflare and its customers, you may use: </p>
            <pre><code>% bgpq4 -4Jl CLOUDFLARE-PREFIXES -m24 AS13335:AS-CLOUDFLARE | head -n 10
policy-options {
replace:
 prefix-list CLOUDFLARE-PREFIXES {
    1.0.0.0/24;
    1.0.4.0/22;
    1.1.1.0/24;
    1.1.2.0/24;
    1.178.32.0/19;
    1.178.32.0/20;
    1.178.48.0/20;</code></pre>
            <p><sup><i>Output restricted to 10 lines; the actual prefix-list would be much longer</i></sup></p><p>This prefix list would be applied within an eBGP import policy by our providers and peers to make sure AS13335 is only able to propagate announcements for ourselves and our customers.</p>
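<p>To illustrate how such a prefix-list behaves once applied, here is a minimal sketch assuming exact-match semantics on the generated entries (the entries are a truncated copy of the bgpq4 output above; a real router policy has more machinery than this):</p>

```python
import ipaddress

# A few entries from the generated prefix-list (truncated for illustration).
PREFIX_LIST = {
    ipaddress.ip_network(p)
    for p in ["1.0.0.0/24", "1.0.4.0/22", "1.1.1.0/24"]
}

def accept(announcement: str) -> bool:
    """Accept an announcement only if it exactly matches a prefix-list entry."""
    return ipaddress.ip_network(announcement) in PREFIX_LIST

print(accept("1.1.1.0/24"))   # True  -- present in the customer's IRR data
print(accept("8.8.8.0/24"))   # False -- not in the prefix-list, so rejected
```

<p>Anything AS13335 announces that is not backed by IRR data simply fails the lookup and is dropped by the provider's import policy.</p>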
    <div>
      <h2>How accurate AS-SETs prevent route leaks</h2>
      <a href="#how-accurate-as-sets-prevent-route-leaks">
        
      </a>
    </div>
    <p>Let’s see how accurate AS-SETs can help prevent route leaks with a simple example. In this example, AS64502 has two providers – AS64501 and AS64503. AS64502 has accidentally messed up their BGP export policy configuration toward the AS64503 neighbor, and is exporting <b>all</b> routes, including those it receives from their AS64501 provider. This is a typical <a href="https://datatracker.ietf.org/doc/html/rfc7908#section-3.1"><u>Type 1 Hairpin route leak</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/D69Fq0jXg9MaGieS0KqZ2/42fa33a433c875591b85ce9a6db91610/image5.png" />
          </figure><p>Fortunately, AS64503 has implemented an import policy generated from IRR data, including AS-SETs and route objects. By doing so, they will only accept the prefixes that originate from the <a href="https://www.manrs.org/wp-content/uploads/2021/11/AS-Cones-MANRS.pdf"><u>AS Cone</u></a> of AS64502, their customer. Instead of the route leak propagating and causing a major reachability or latency impact for many prefixes on the Internet, it is stopped in its tracks thanks to the responsible filtering by the AS64503 provider network. Again, it is worth keeping in mind that the success of this strategy depends on the accuracy of the data in the fictional AS64502:AS-CUSTOMERS AS-SET.</p>
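<p>The filtering step can be sketched as an origin check against the customer's AS Cone (the cone contents below are hypothetical, standing in for what AS64503 would build from the fictional AS64502:AS-CUSTOMERS AS-SET):</p>

```python
# Hypothetical AS Cone for the customer AS64502, built from IRR data.
CUSTOMER_CONE = {64502, 64510, 64511}   # ASNs AS64502 may legitimately announce

def accept_from_customer(as_path):
    """Accept a route from the customer session only if its origin ASN
    (the rightmost entry in the AS_PATH) is inside the customer's cone."""
    origin = as_path[-1]
    return origin in CUSTOMER_CONE

print(accept_from_customer([64502, 64510]))         # legitimate customer route
print(accept_from_customer([64502, 64501, 64999]))  # leaked provider route: rejected
```

<p>The hairpin leak from AS64501 carries an origin outside the cone, so it never makes it past AS64503's import filter.</p>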
    <div>
      <h2>Monitoring AS-SET misuse</h2>
      <a href="#monitoring-as-set-misuse">
        
      </a>
    </div>
    <p>Besides using AS-SETs to group together one’s downstream customers, AS-SETs can also represent other types of relationships, such as peers, transits, or IXP participations.</p><p>For example, there are 76 AS-SETs that directly include one of the Tier-1 networks, Telecom Italia / Sparkle (AS6762). Judging from the names of the AS-SETs, most of them represent the peers and transits of certain ASNs, AS6762 among them. You can view this output yourself at <a href="https://radar.cloudflare.com/routing/as6762#irr-as-sets"><u>https://radar.cloudflare.com/routing/as6762#irr-as-sets</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/eeAA6iWaAVd6qd2rB93VM/ff37a27156f8229639a6ec377c7eb273/image7.png" />
          </figure><p>There is nothing wrong with defining AS-SETs that contain one’s peers or upstreams as long as those AS-SETs are not submitted upstream for customer-&gt;provider BGP session filtering. In fact, an AS-SET for upstreams or peer-to-peer relationships can be useful for defining a network’s policies in RPSL.</p><p>However, some AS-SETs in the AS6762 membership list such as AS-10099 look to attest customer relationships. </p>
            <pre><code>% whois -h rr.ntt.net AS-10099 | grep "descr"
descr:          CUHK Customer</code></pre>
            <p>We know AS6762 is transit-free, so this customer membership must be invalid: a prime example of AS-SET misuse that would ideally be cleaned up. Many Internet Service Providers and network operators are more than happy to correct an invalid AS-SET entry when asked. Each invalid membership like this is reasonably treated as a risk, since such entries increase how far a route leak can propagate to major networks and the rest of the Internet when one happens.</p>
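<p>A simple heuristic for spotting this kind of misuse is to flag customer-flavored AS-SETs that include a transit-free network. The memberships and descriptions below are illustrative, not real IRR data; a real check would query whois as shown above:</p>

```python
# ASNs known to buy transit from no one (illustrative subset).
TRANSIT_FREE = {6762}

# Hypothetical data: AS-SET name -> (included ASN, IRR "descr" field).
MEMBERSHIPS = {
    "AS-10099": (6762, "CUHK Customer"),
    "AS-EXAMPLE-PEERS": (6762, "Example peering partners"),
}

def suspicious(memberships, transit_free):
    """A 'customer' AS-SET that includes a transit-free ASN is likely misuse."""
    return [
        name for name, (asn, descr) in memberships.items()
        if asn in transit_free and "customer" in descr.lower()
    ]

print(suspicious(MEMBERSHIPS, TRANSIT_FREE))   # ['AS-10099']
```

<p>Peering- or transit-flavored AS-SETs pass the check; only the customer-flavored entry is flagged for follow-up with its maintainer.</p>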
    <div>
      <h2>AS-SET information on Cloudflare Radar</h2>
      <a href="#as-set-information-on-cloudflare-radar">
        
      </a>
    </div>
    <p><a href="https://radar.cloudflare.com/"><u>Cloudflare Radar</u></a> is a hub that showcases global Internet traffic, attack, and technology trends and insights. Today, we are adding IRR AS-SET information to Radar’s routing section, freely available to the public via both website and API access. To view all AS-SETs an AS is a member of, directly or indirectly via other AS-SETs, a user can visit the corresponding AS’s routing page. For example, the AS-SETs list for Cloudflare (AS13335) is available at <a href="https://radar.cloudflare.com/routing/as13335#irr-as-sets"><u>https://radar.cloudflare.com/routing/as13335#irr-as-sets</u></a>.</p><p>The AS-SET data on IRR contains only limited information, like the AS members and AS-SET members. Here at Radar, we also enhance the AS-SET table with additional useful information as follows.</p><ul><li><p><code>Inferred ASN</code> shows the AS number that is inferred to be the creator of the AS-SET. We match against PeeringDB AS-SET information if available. Otherwise, we parse the AS-SET name to infer the creator.</p></li><li><p><code>IRR Sources</code> shows the IRR databases in which we see the corresponding AS-SET. We are currently using the following databases: <code>AFRINIC</code>, <code>APNIC</code>, <code>ARIN</code>, <code>LACNIC</code>, <code>RIPE</code>, <code>RADB</code>, <code>ALTDB</code>, <code>NTTCOM</code>, and <code>TC</code>.</p></li><li><p><code>AS Members</code> and <code>AS-SET members</code> show the count of the corresponding types of members.</p></li><li><p><code>AS Cone</code> is the count of the unique ASNs that are included by the AS-SET directly or indirectly.</p></li><li><p><code>Upstreams</code> is the count of unique AS-SETs that include the corresponding AS-SET.</p></li></ul><p>Users can further filter the table by searching for a specific AS-SET name or ASN. A toggle to show only direct or indirect AS-SETs is also available.</p>
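<p>Two of the derived columns, <code>AS Cone</code> and <code>Upstreams</code>, can be sketched as simple graph computations over AS-SET membership data (the membership graph below is hypothetical, not how Radar is implemented internally):</p>

```python
# Hypothetical membership graph: AS-SET -> direct members (ASNs or AS-SETs).
MEMBERS = {
    "AS-PARENT": ["AS-CHILD", "AS64500"],
    "AS-CHILD": ["AS64501", "AS64502"],
}

def as_cone(name, members):
    """Unique ASNs included by an AS-SET, directly or indirectly."""
    cone, stack, visited = set(), [name], set()
    while stack:
        cur = stack.pop()
        if cur in visited:
            continue
        visited.add(cur)
        for m in members.get(cur, []):
            if m in members:          # nested AS-SET: keep walking
                stack.append(m)
            else:                     # leaf ASN: part of the cone
                cone.add(m)
    return cone

def upstreams(name, members):
    """AS-SETs that directly include the given AS-SET."""
    return {s for s, ms in members.items() if name in ms}

print(len(as_cone("AS-PARENT", MEMBERS)))   # 3
print(upstreams("AS-CHILD", MEMBERS))       # {'AS-PARENT'}
```

<p>The table values are then just the sizes of these sets per AS-SET.</p>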
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/0ssTf7bi6yjT2m0YKWPJE/e20b18a7d3151652fecbe606bbe13346/image1.png" />
          </figure><p>In addition to listing AS-SETs, we also provide a tree-view to display how an AS-SET includes a given ASN. For example, the following screenshot shows how as-delta indirectly includes AS6762 through 7 other AS-SETs. Users can copy or download this tree-view content in text format, making it easy to share with others.</p>
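<p>Reconstructing the kind of inclusion chain the tree-view shows is a small graph search. The membership data below is hypothetical and shortened (the real as-delta chain to AS6762 spans 7 AS-SETs):</p>

```python
# Hypothetical, shortened membership chain.
MEMBERS = {
    "AS-DELTA": ["AS-MID"],
    "AS-MID": ["AS-EDGE"],
    "AS-EDGE": ["AS6762"],
}

def inclusion_path(start, target, members, path=None):
    """Depth-first search for a membership chain from start to target.
    Assumes acyclic data for brevity; real IRR data needs a visited set."""
    path = (path or []) + [start]
    for m in members.get(start, []):
        if m == target:
            return path + [m]
        found = inclusion_path(m, target, members, path)
        if found:
            return found
    return None

print(" -> ".join(inclusion_path("AS-DELTA", "AS6762", MEMBERS)))
# AS-DELTA -> AS-MID -> AS-EDGE -> AS6762
```

<p>The rendered chain is exactly the path from the root AS-SET down to the target ASN.</p>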
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2hNbh2gdj2F0eLTYrzjrVN/eceb588456067a387e7cb6eb3e1e3c5e/image4.png" />
          </figure><p>We built this Radar feature using our<a href="https://developers.cloudflare.com/api/resources/radar/subresources/entities/subresources/asns/methods/as_set/"><u> publicly available API</u></a>, the same way other Radar pages are built. We have also experimented with using this API to build additional features, like a full AS-SET tree visualization. We encourage developers to give <a href="https://developers.cloudflare.com/api/resources/radar/subresources/entities/subresources/asns/methods/as_set/"><u>this API</u></a> (and <a href="https://developers.cloudflare.com/api/resources/radar/"><u>other Radar APIs</u></a>) a try, and tell us what you think!</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ElaU3M5oe8xRnblrrf67u/3fa35d3a25d797c0b0cbe96f0490fa93/image8.png" />
          </figure>
    <div>
      <h2>Looking ahead</h2>
      <a href="#looking-ahead">
        
      </a>
    </div>
    <p>We know AS-SETs are hard to keep clean of error or misuse, and even though Radar is making them easier to monitor, the mistakes and misuse will continue. Because of this, we as a community need to push for adoption of <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234</u></a> and for <a href="https://blog.apnic.net/2025/09/05/preventing-route-leaks-made-simple-bgp-roleplay-with-junos-rfc-9234/"><u>implementations</u></a> of it from the major vendors. RFC9234 embeds roles and an Only-To-Customer (OTC) attribute directly into the BGP protocol itself, helping to detect and prevent route leaks in-line. In addition to BGP misconfiguration protection with RFC9234, Autonomous System Provider Authorization (ASPA) is still making its way <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>through the IETF</u></a> and will eventually help offer an authoritative means of attesting who the actual providers are for each BGP Autonomous System (AS).</p><p>If you are a network operator and manage an AS-SET, you should seriously consider moving to <a href="https://manrs.org/2022/12/why-network-operators-should-use-hierarchical-as-sets/"><u>hierarchical AS-SETs</u></a> if you have not already. A hierarchical AS-SET looks like AS13335:AS-CLOUDFLARE instead of AS-CLOUDFLARE, but the difference is very important. Only a proper maintainer of the AS13335 ASN can create AS13335:AS-CLOUDFLARE, whereas anyone could create AS-CLOUDFLARE in an IRR database if they wanted to. In other words, using hierarchical AS-SETs helps guarantee ownership and prevent the malicious poisoning of routing information.</p><p>While keeping track of AS-SET memberships seems like a chore, it can have significant payoffs in preventing BGP-related <a href="https://blog.cloudflare.com/cloudflare-1111-incident-on-june-27-2024/"><u>incidents</u></a> such as route leaks. 
We encourage all network operators to do their part in making sure the AS-SETs they submit to their providers and peers to communicate their downstream customer cone are accurate. Every small adjustment or clean-up effort in AS-SETs could help lessen the impact of a BGP incident later.</p><p>Visit <a href="https://radar.cloudflare.com/"><u>Cloudflare Radar</u></a> for additional insights around Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, and more. Follow us on social media at <a href="https://twitter.com/CloudflareRadar"><u>@CloudflareRadar</u></a> (X), <a href="https://noc.social/@cloudflareradar"><u>https://noc.social/@cloudflareradar</u></a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com"><u>radar.cloudflare.com</u></a> (Bluesky), or contact us via <a><u>e-mail</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Radar]]></category>
            <guid isPermaLink="false">6QVNgwE5ZlVbZcWQHJKsDS</guid>
            <dc:creator>Mingwei Zhang</dc:creator>
            <dc:creator>Bryton Herdes</dc:creator>
        </item>
        <item>
            <title><![CDATA[Connect and secure any private or public app by hostname, not IP — free for everyone in Cloudflare One]]></title>
            <link>https://blog.cloudflare.com/tunnel-hostname-routing/</link>
            <pubDate>Thu, 18 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Tired of IP Lists? Securely connect private networks to any app by its hostname, not its IP address. This routing is now built into Cloudflare Tunnel and is free for all Cloudflare One customers. ]]></description>
            <content:encoded><![CDATA[ <p>Connecting to an application should be as simple as knowing its name. Yet, many security models still force us to rely on brittle, ever-changing IP addresses. And we heard from many of you that managing those ever-changing IP lists was a constant struggle. </p><p>Today, we’re taking a major step toward making that a relic of the past.</p><p>We're excited to announce that you can now route traffic to <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/"><u>Cloudflare Tunnel</u></a> based on a hostname or a domain. This allows you to use Cloudflare Tunnel to build simple zero-trust and egress policies for your private and public web applications without ever needing to know their underlying IP. This is one more step on our <a href="https://blog.cloudflare.com/egress-policies-by-hostname/"><u>mission</u></a> to strengthen platform-wide support for hostname- and domain-based policies in the <a href="https://developers.cloudflare.com/cloudflare-one/"><u>Cloudflare One</u></a> <a href="https://www.cloudflare.com/learning/access-management/what-is-sase/">SASE</a> platform, simplifying complexity and improving security for our customers and end users. </p>
    <div>
      <h2>Grant access to applications, not networks</h2>
      <a href="#grant-access-to-applications-not-networks">
        
      </a>
    </div>
    <p>In August 2020, the National Institute of Standards and Technology (NIST) published <a href="https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-207.pdf"><u>Special Publication 800-207</u></a>, encouraging organizations to abandon the "castle-and-moat" model of security (where trust is established on the basis of network location) and move to a <a href="https://www.cloudflare.com/learning/security/glossary/what-is-zero-trust/">Zero Trust model </a>(where we “<a href="https://www.whitehouse.gov/wp-content/uploads/2022/01/M-22-09.pdf"><u>verify anything and everything attempting to establish access</u></a>").</p><p>Now, instead of granting broad network permissions, you grant specific access to individual resources. This concept, known as per-resource authorization, is a cornerstone of the Zero Trust framework, and it presents a huge change to how organizations have traditionally run networks. Per-resource authorization requires that access policies be configured on a per-resource basis. By applying the principle of least privilege, you give users access only to the resources they absolutely need to do their job. This tightens security and shrinks the potential attack surface for any given resource.</p><p>Instead of allowing your users to access an entire network segment, like <code><b>10.131.0.0/24</b></code>, your security policies become much more precise. For example:</p><ul><li><p>Only employees in the "SRE" group running a managed device can access <code><b>admin.core-router3-sjc.acme.local</b></code>.</p></li><li><p>Only employees in the "finance" group located in Canada can access <code><b>canada-payroll-server.acme.local</b></code>.</p></li><li><p>All employees located in New York can access<b> </b><code><b>printer1.nyc.acme.local</b></code>.</p></li></ul><p>Notice what these powerful, granular rules have in common? They’re all based on the resource’s private <b>hostname</b>, not its IP address. That’s exactly what our new hostname routing enables. 
We’ve made it dramatically easier to write effective zero trust policies using stable hostnames, without ever needing to know the underlying IP address.</p>
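<p>The per-resource rules above can be sketched as hostname-keyed policy matching. The rule shapes, groups, and countries here are illustrative, not the actual policy API:</p>

```python
# Hypothetical per-resource rules keyed by private hostname.
POLICIES = [
    {"host": "canada-payroll-server.acme.local",
     "group": "finance", "country": "CA"},
    {"host": "printer1.nyc.acme.local", "group": None, "country": None},
]

def allowed(user, host):
    """Allow only if some rule for this hostname matches the user."""
    for rule in POLICIES:
        if rule["host"] != host:
            continue
        if rule["group"] and rule["group"] not in user["groups"]:
            continue
        if rule["country"] and rule["country"] != user["country"]:
            continue
        return True
    return False          # default deny: no matching rule, no access

alice = {"groups": ["finance"], "country": "CA"}
bob = {"groups": ["sales"], "country": "US"}
print(allowed(alice, "canada-payroll-server.acme.local"))  # True
print(allowed(bob, "canada-payroll-server.acme.local"))    # False
```

<p>Notice that the IP address of the server never appears anywhere in the policy data.</p>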
    <div>
      <h2>Why IP-based rules break</h2>
      <a href="#why-ip-based-rules-break">
        
      </a>
    </div>
    <p>Let's imagine you need to secure an internal server, <code><b>canada-payroll-server.acme.local</b></code>. It’s hosted on internal IP <code><b>10.4.4.4</b></code> and its hostname is available in internal private DNS, but not in public DNS. In a modern cloud environment, its IP address is often the least stable thing about it. If your security policy is tied to that IP, it's built on a shaky foundation.</p><p>This happens for a few common reasons:</p><ul><li><p><b>Cloud instances</b>: When you launch a compute instance in a cloud environment like AWS, you're responsible for its <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/hostname-types.html"><u>hostname</u></a>, but not always its IP address. As a result, you might only be tracking the hostname and may not even know the server's IP.</p></li><li><p><b>Load Balancers</b>: If the server is behind a load balancer in a cloud environment (like AWS ELB), its IP address could be changing dynamically in response to <a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/application-load-balancers.html"><u>changes in traffic</u></a>.</p></li><li><p><b>Ephemeral infrastructure</b>: This is the "<a href="https://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/"><u>cattle, not pets</u></a>" world of modern infrastructure. Resources like servers in an autoscaling group, containers in a Kubernetes cluster, or applications that spin down overnight are created and destroyed as needed. They keep a persistent hostname so users can find them, but their IP is ephemeral and changes every time they spin up.</p></li></ul><p>To cope with this, we've seen customers build complex scripts to maintain dynamic "IP Lists" — mappings from a hostname to its IPs that are updated every time the address changes. While this approach is clever, maintaining IP Lists is a chore. 
They are brittle, and a single error could cause employees to lose access to vital resources.</p><p>Fortunately, hostname-based routing makes this IP List workaround obsolete.</p>
    <div>
      <h2>How it works: secure a private server by hostname using Cloudflare One SASE platform</h2>
      <a href="#how-it-works-secure-a-private-server-by-hostname-using-cloudflare-one-sase-platform">
        
      </a>
    </div>
    <p>To see this in action, let's create a policy from our earlier example: we want to grant employees in the "finance" group located in Canada access to <code><b>canada-payroll-server.acme.local</b></code>. Here’s how you do it, without ever touching an IP address.</p><p><b>Step 1: Connect your private network</b></p><p>First, the server's network needs a secure connection to Cloudflare's global network. You do this by installing our lightweight agent, <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/"><u>cloudflared</u></a>, in the same local area network as the server, which creates a secure Cloudflare Tunnel. You can create a new tunnel directly from cloudflared by running <code><b>cloudflared tunnel create &lt;TUNNEL-NAME&gt;</b></code> or using your Zero Trust dashboard.</p><div>
  
</div><p>
<b>Step 2: Route the hostname to the tunnel</b></p><p>This is where the new capability comes into play. In your Zero Trust dashboard, you now establish a route that binds the <i>hostname</i> <code>canada-payroll-server.acme.local</code> directly to that tunnel. In the past, you could only route an IP address (<code>10.4.4.4)</code> or its subnet (<code>10.4.4.0/24</code>). That old method required you to create and manage those brittle IP Lists we talked about. Now, you can even route entire domains, like <code>*.acme.local</code>, directly to the tunnel, simply by creating a hostname route to <code>acme.local</code>.</p>
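<p>Conceptually, a hostname route table like this is matched by walking from the full hostname up through its parent domains, so an exact hostname route wins over a covering domain route. Whether Gateway matches in exactly this way is an implementation detail; the tunnel names below are made up:</p>

```python
# Hypothetical hostname route table: name -> tunnel.
ROUTES = {
    "canada-payroll-server.acme.local": "tunnel-payroll",
    "acme.local": "tunnel-default",        # domain route: covers *.acme.local
}

def route_for(hostname):
    """Walk from the full hostname up through parent domains, most
    specific match first."""
    labels = hostname.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in ROUTES:
            return ROUTES[candidate]
    return None

print(route_for("canada-payroll-server.acme.local"))  # tunnel-payroll
print(route_for("printer1.nyc.acme.local"))           # tunnel-default
```

<p>A single <code>acme.local</code> route thus replaces the entire IP list you would otherwise maintain for that network.</p>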
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3mcoBAILYENIP6kGW4tw96/bb7ec6571ae7b4f04b5dc0456f694d59/1.png" />
          </figure><p>For this to work, you must delete your private network’s subnet (in this case <code>10.0.0.0/8</code>) and <code>100.64.0.0/10</code> from the <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/warp/configure-warp/route-traffic/split-tunnels/"><u>Split Tunnels Exclude</u></a> list. You also need to remove <code>.local</code> from the <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/warp/configure-warp/route-traffic/local-domains/"><u>Local Domain Fallback</u></a> list.</p><p>(As an aside, we note that this feature also works with domains. For example, you could bind <code>*.acme.local</code> to a single tunnel, if desired.)</p><p><b>Step 3: Write your zero trust policy</b></p><p>Now that Cloudflare knows <i>how</i> to reach your server by its name, you can write a policy to control <i>who</i> can access it. You have a couple of options:</p><ul><li><p><b>In Cloudflare Access (for HTTPS applications):</b> Write an <a href="https://developers.cloudflare.com/cloudflare-one/applications/non-http/self-hosted-private-app/"><u>Access policy</u></a> that grants employees in the “finance” group access to the private hostname <code>canada-payroll-server.acme.local</code>. This is ideal for applications accessible over HTTPS on port 443.
</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7lIZI9ThsAWtxFZZis3HtZ/08451586dbe373ff137bd9e91d23dea6/2.png" />
          </figure><p></p></li><li><p><b>In Cloudflare Gateway (for HTTPS applications):</b> Alternatively, write a <a href="https://developers.cloudflare.com/cloudflare-one/policies/gateway/"><u>Gateway policy</u></a> that grants employees in the “finance” group access to the <a href="https://developers.cloudflare.com/cloudflare-one/policies/gateway/network-policies/#sni"><u>SNI</u></a> <code>canada-payroll-server.acme.local</code>. This <a href="https://developers.cloudflare.com/cloudflare-one/policies/gateway/network-policies/protocol-detection/"><u>works</u></a> for services accessible over HTTPS on any port.
</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5GpwDZNmdzapOyjOgFFlKD/50e2d0df64d2230479ad8d0a013de24b/3.png" />
          </figure><p></p></li><li><p><b>In Cloudflare Gateway (for non-HTTP applications):</b> You can also write a <a href="https://developers.cloudflare.com/cloudflare-one/policies/gateway/"><u>Gateway policy</u></a> that blocks DNS resolution of <code>canada-payroll-server.acme.local</code> for all employees except the “finance” group.</p></li></ul>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3na5Mf6UMpBcKYm6JWmnzd/5791054c944300e667c3829e9bd8c6ec/4.png" />
          </figure><p>The principle of "trust nothing" means your security posture should start by denying traffic by default. For this setup to work in a true Zero Trust model, it should be paired with a default Gateway policy that blocks all access to your internal IP ranges. Think of this as ensuring all doors to your private network are locked by default. The specific <code>allow</code> policies you create for hostnames then act as the keycard, unlocking one specific door only for authorized users.</p><p>Without that foundational "deny" policy, creating a route to a private resource would make it accessible to everyone in your organization, defeating the purpose of a least-privilege model and creating significant security risks. This step ensures that only the traffic you explicitly permit can ever reach your corporate resources.</p><p>And there you have it. We’ve walked through the entire process of writing a per-resource policy using only the server’s private hostname. No IP Lists to be seen anywhere, simplifying life for your administrators.</p>
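<p>The keycard-and-locked-door model boils down to first-match policy ordering: a high-priority allow for the hostname, and a low-priority catch-all deny. The policy shapes below are illustrative, not Gateway's actual API:</p>

```python
# Hypothetical policies, evaluated in priority order; first match wins.
POLICIES = [
    {"action": "allow", "sni": "canada-payroll-server.acme.local",
     "group": "finance"},                               # the keycard
    {"action": "block", "sni": None, "group": None},    # default deny
]

def decide(user_groups, sni):
    for p in POLICIES:
        if p["sni"] and p["sni"] != sni:
            continue
        if p["group"] and p["group"] not in user_groups:
            continue
        return p["action"]

print(decide(["finance"], "canada-payroll-server.acme.local"))  # allow
print(decide(["sales"], "canada-payroll-server.acme.local"))    # block
```

<p>Without the trailing catch-all, any routed resource would be reachable by default, which is precisely the failure mode the foundational deny policy prevents.</p>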
    <div>
      <h2>Secure egress traffic to third-party applications</h2>
      <a href="#secure-egress-traffic-to-third-party-applications">
        
      </a>
    </div>
    <p>Here's another powerful use case for hostname routing: controlling outbound connections from your users to the public Internet. Some third-party services, such as banking portals or partner APIs, use an IP allowlist for security. They will only accept connections that originate from a specific, dedicated public source IP address that belongs to your company.</p><p>This common practice creates a challenge. Let's say your banking portal at <code>bank.example.com</code> requires all traffic to come from a dedicated source IP <code>203.0.113.9</code> owned by your company. At the same time, you want to enforce a zero trust policy that <i>only</i> allows your finance team to access that portal. You can't build your policy based on the bank's destination IP — you don't control it, and it could change at any moment. You have to use its hostname.</p><p>There are two ways to solve this problem. First, if your dedicated source IP is purchased from Cloudflare, you can use the <a href="https://blog.cloudflare.com/egress-policies-by-hostname/"><u>“egress policy by hostname” feature</u></a> that we announced previously. By contrast, if your dedicated source IP belongs to your organization, or is leased from a cloud provider, then we can solve this problem with hostname-based routing, as shown in the figure below:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6wXu6FMiiVz4lXsESFrBTg/e1bb13e8eef0653ab311d0800d95f391/5.png" />
          </figure><p>Here’s how this works:</p><ol><li><p><b>Force traffic through your dedicated IP.</b> First, you deploy a <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/"><u>Cloudflare Tunnel</u></a> in the network that owns your dedicated IP (for example, your primary VPC in a cloud provider). All traffic you send through this tunnel will exit to the Internet with <code>203.0.113.9</code> as its source IP.</p></li><li><p><b>Route the banking app to that tunnel.</b> Next, you create a hostname route in your Zero Trust dashboard. This rule tells Cloudflare: "Any traffic destined for <code>bank.example.com</code> must be sent through this specific tunnel."</p></li><li><p><b>Apply your user policies.</b> Finally, in Cloudflare Gateway, you create your granular access rules. A low-priority <a href="https://developers.cloudflare.com/cloudflare-one/policies/gateway/network-policies/"><u>network policy</u></a> blocks access to the <a href="https://developers.cloudflare.com/cloudflare-one/policies/gateway/network-policies/#sni"><u>SNI</u></a> <code>bank.example.com</code> for everyone. Then, a second, higher-priority policy explicitly allows users in the "finance" group to access the <a href="https://developers.cloudflare.com/cloudflare-one/policies/gateway/network-policies/#sni"><u>SNI</u></a> <code>bank.example.com</code>.</p></li></ol><p>Now, when a finance team member accesses the portal, their traffic is correctly routed through the tunnel and arrives with the source IP the bank expects. An employee from any other department is blocked by Gateway before their traffic even enters the tunnel. You've enforced a precise, user-based zero trust policy for a third-party service, all by using its public hostname.</p>
    <div>
      <h2>Under the hood: how hostname routing works</h2>
      <a href="#under-the-hood-how-hostname-routing-works">
        
      </a>
    </div>
    <p>To build this feature, we needed to solve a classic networking challenge. The routing mechanism for Cloudflare Tunnel is a core part of Cloudflare Gateway, which operates at both Layer 4 (TCP/UDP) and Layer 7 (HTTP/S) of the network stack.</p><p>Cloudflare Gateway must decide which Cloudflare Tunnel to send traffic through upon receipt of the very first IP packet in the connection. This means the decision must necessarily be made at Layer 4, where Gateway only sees the IP and TCP/UDP headers of a packet. IP and TCP/UDP headers contain the destination IP address, but do not contain the destination <i>hostname</i>. The hostname is only found in Layer 7 data (like a TLS SNI field or an HTTP Host header), which isn't even available until after the Layer 4 connection is already established.</p><p>This creates a dilemma: how can we route traffic based on a hostname before we've even seen the hostname? </p>
    <div>
      <h3>Synthetic IPs to the rescue</h3>
      <a href="#synthetic-ips-to-the-rescue">
        
      </a>
    </div>
    <p>The solution lies in the fact that Cloudflare Gateway also acts as a DNS resolver. This means we see the user's <i>intent </i>— the DNS query for a hostname — <i>before</i> we see the actual application traffic. We use this foresight to "tag" the traffic using a <a href="https://blog.cloudflare.com/egress-policies-by-hostname/"><u>synthetic IP address</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Kd3x5SppGp8G4KZeO34n/67b338ca8e81db63e110dc89c7596bf6/6.png" />
          </figure><p>Let’s walk through the flow:</p><ol><li><p><b>DNS Query</b>. A user's device sends a DNS query for
 <code>canada-payroll-server.acme.local</code> to the Gateway resolver.</p></li><li><p><b>Private Resolution</b>. Gateway asks the <code>cloudflared</code> agent running in your private network to resolve the real IP for that hostname. Since <code>cloudflared</code> has access to your internal DNS, it finds the real private IP <code>10.4.4.4</code>, and sends it back to the Gateway resolver.</p></li><li><p><b>Synthetic Response</b>. Here's the key step. Gateway resolver <b>does not</b> send the real IP (<code>10.4.4.4</code>) back to the user. Instead, it temporarily assigns an <i>initial resolved IP</i> from a reserved Carrier-Grade NAT (CGNAT) address space (e.g., <code>100.80.10.10</code>) and sends the initial resolved IP back to the user's device. The initial resolved IP acts as a tag that allows Gateway to identify network traffic destined for <code>canada-payroll-server.acme.local</code>. The initial resolved IP is randomly selected and temporarily assigned from one of the two IP address ranges:</p><ul><li><p>IPv4: <code>100.80.0.0/16</code></p></li><li><p>IPv6: <code>2606:4700:0cf1:4000::/64</code> </p></li></ul></li><li><p><b>Traffic Arrives</b>. The user's device sends its application traffic (e.g., an HTTPS request) to the destination IP it received from the Gateway resolver: the initial resolved IP <code>100.80.10.10</code>.</p></li><li><p><b>Routing and Rewriting</b>. When Gateway sees an incoming packet destined for <code>100.80.10.10</code>, it knows this traffic is for <code>canada-payroll-server.acme.local</code> and must be sent through a specific Cloudflare Tunnel. It then rewrites the destination IP on the packet back to the <i>real</i> private destination IP (<code>10.4.4.4</code>) and sends it down the correct tunnel.</p></li></ol><p>The traffic goes down the tunnel and arrives at <code>canada-payroll-server.acme.local</code> (<code>10.4.4.4</code>), and the user is connected to the server without noticing any of these mechanisms. 
By intercepting the DNS query, we effectively tag the network traffic stream, allowing our Layer 4 router to make the right decision without needing to see Layer 7 data.</p>
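The tagging flow above can be sketched in a few lines. This is a minimal illustration, assuming an in-memory mapping and a made-up tunnel name (`my-tunnel`); Cloudflare's actual implementation is distributed and considerably more involved.

```python
# Minimal sketch of the synthetic-IP "tagging" flow described above.
# Class and method names are ours for illustration, not Cloudflare's code.
import ipaddress
import random

SYNTHETIC_V4 = ipaddress.ip_network("100.80.0.0/16")  # initial resolved IP range

class GatewaySketch:
    def __init__(self):
        # synthetic IP -> (hostname, real private IP, tunnel to use)
        self.by_synthetic = {}

    def resolve(self, hostname, real_ip, tunnel):
        """Steps 2-3: learn the real IP privately, return a synthetic IP tag."""
        while True:
            offset = random.randint(1, SYNTHETIC_V4.num_addresses - 2)
            synthetic = str(SYNTHETIC_V4.network_address + offset)
            if synthetic not in self.by_synthetic:
                break
        self.by_synthetic[synthetic] = (hostname, real_ip, tunnel)
        return synthetic

    def route(self, dst_ip):
        """Step 5: recognize the tag, pick the tunnel, rewrite the destination."""
        hostname, real_ip, tunnel = self.by_synthetic[dst_ip]
        return tunnel, real_ip

gw = GatewaySketch()
# "my-tunnel" stands in for whichever tunnel the hostname route points at.
seen_by_client = gw.resolve("canada-payroll-server.acme.local", "10.4.4.4", "my-tunnel")
tunnel, rewritten = gw.route(seen_by_client)
print(tunnel, rewritten)
```

The client only ever sees an address inside `100.80.0.0/16`; the real private IP never leaves the Gateway side of the flow.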
    <div>
      <h2>Using Gateway Resolver Policies for fine grained control</h2>
      <a href="#using-gateway-resolver-policies-for-fine-grained-control">
        
      </a>
    </div>
    <p>The routing capabilities we've discussed provide simple, powerful ways to connect to private resources. But what happens when your network architecture is more complex? For example, what if your private DNS servers are in one part of your network, but the application itself is in another?</p><p>With Cloudflare One, you can solve this by creating policies that separate the path for DNS resolution from the path for application traffic for the very same hostname using <a href="https://developers.cloudflare.com/cloudflare-one/policies/gateway/resolver-policies"><u>Gateway Resolver Policies</u></a>. This gives you fine-grained control to match complex network topologies.</p><p>Let's walk through a scenario:</p><ul><li><p>Your private DNS resolvers, which can resolve <code><b>acme.local</b></code>, are located in your core datacenter, accessible only via <code><b>tunnel-1</b></code>.</p></li><li><p>The webserver for <code><b>canada-payroll-server.acme.local</b></code><b> </b>is hosted in a specific cloud VPC, accessible only via <code><b>tunnel-2</b></code>.</p></li></ul>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2sVMsS4DhuN2yoTlGWTK5X/e5a66330c951e7b65428f5c76b5c7b0a/7.png" />
          </figure><p>Here’s how to configure this split-path routing.</p><p><b>Step 1: Route DNS Queries via </b><code><b>tunnel-1</b></code></p><p>First, we need to tell Cloudflare Gateway how to reach your private DNS server.</p><ol><li><p><b>Create an IP Route:</b> In the Networks &gt; Tunnels area of your Zero Trust dashboard, create a route for the IP address of your private DNS server (e.g., <code><b>10.131.0.5/32</b></code>) and point it to <code><b>tunnel-1</b></code>. This ensures any traffic destined for that specific IP goes through the correct tunnel to your datacenter.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/32JcjFZXGuhDEHHlWJoF1C/4223a6f2e5b7b49015abfbfd9b4fd20f/8.png" />
          </figure><p></p></li><li><p><b>Create a Resolver Policy:</b> Go to <b>Gateway -&gt; Resolver Policies</b> and create a new policy with the following logic:</p><ul><li><p><b>If</b> the query is for the domain <code><b>acme.local</b></code> …</p></li><li><p><b>Then</b>... resolve it using a designated DNS server with the IP <code><b>10.131.0.5</b></code>.
</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2j8kYsD692tCRYcDKoDXvb/7dbb20f426ba47350fb0b2906046d5f0/9.png" />
          </figure><p></p></li></ul></li></ol><p>With these two rules, any DNS lookup for <code><b>acme.local</b></code> from a user's device will be sent through <code>tunnel-1</code> to your private DNS server for resolution.</p><p><b>Step 2: Route Application Traffic via </b><code><b>tunnel-2</b></code></p><p>Next, we'll tell Gateway where to send the actual traffic (for example, HTTP/S) for the application.</p><p><b>Create a Hostname Route:</b> In your Zero Trust dashboard, create a <b>hostname route</b> that binds <code><b>canada-payroll-server.acme.local </b></code>to <code><b>tunnel-2</b></code>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3Ufzpsb1FUYrM39gMiyovs/c5d10828f58b0e7c854ff9fa721e1757/10.png" />
          </figure><p>This rule instructs Gateway that any application traffic (like HTTP, SSH, or any TCP/UDP traffic) for <code><b>canada-payroll-server.acme.local</b></code> must be sent through <code><b>tunnel-2</b></code><b> </b>leading to your cloud VPC.</p><p>Similarly to a setup without Gateway Resolver Policy, for this to work, you must delete your private network’s subnet (in this case <code>10.0.0.0/8</code>) and <code>100.64.0.0/10</code> from the <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/warp/configure-warp/route-traffic/split-tunnels/"><u>Split Tunnels Exclude</u></a> list. You also need to remove <code>.local</code> from the <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/warp/configure-warp/route-traffic/local-domains/"><u>Local Domain Fallback</u></a>.</p><p><b>Putting It All Together</b></p><p>With these two sets of policies, the "synthetic IP" mechanism handles the complex flow:</p><ol><li><p>A user tries to access <code>canada-payroll-server.acme.local</code>. Their device sends a DNS query to Cloudflare Gateway Resolver.</p></li><li><p>This DNS query matches a Gateway Resolver Policy, causing Gateway Resolver to forward the DNS query through <code>tunnel-1</code> to your private DNS server (<code>10.131.0.5</code>).</p></li><li><p>Your DNS server responds with the server’s actual private destination IP (<code>10.4.4.4</code>).</p></li><li><p>Gateway receives this IP and generates a “synthetic” initial resolved IP (<code>100.80.10.10</code>) which it sends back to the user's device.</p></li><li><p>The user's device now sends the HTTP/S request to the initial resolved IP (<code>100.80.10.10</code>).</p></li><li><p>Gateway sees the network traffic destined for the initial resolved IP (<code>100.80.10.10</code>) and, using the mapping, knows it's for <code>canada-payroll-server.acme.local</code>.</p></li><li><p>The Hostname Route now matches. 
Gateway sends the application traffic through <code>tunnel-2</code> and rewrites its destination IP to the webserver’s actual private IP (<code>10.4.4.4</code>).</p></li><li><p>The <code>cloudflared</code> agent at the end of <code>tunnel-2</code> forwards the traffic to the application's destination IP (<code>10.4.4.4</code>), which is on the same local network.</p></li></ol><p>The user is connected without noticing that DNS and application traffic have been routed over totally separate private network paths. This approach allows you to support sophisticated split-horizon DNS environments and other advanced network architectures with simple, declarative policies.</p>
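The two matching steps at the heart of this walkthrough can be sketched as follows. The policy structures are our own illustration (the dashboard's real data model differs); the IPs, hostname, and tunnel names come from the scenario above.

```python
# Sketch of the split-path decision logic from the walkthrough.
# Policy shapes here are illustrative, not the Zero Trust dashboard's schema.

RESOLVER_POLICIES = [
    # "If the query is for acme.local, resolve via the designated server."
    {"domain": "acme.local", "resolver": "10.131.0.5"},
]
IP_ROUTES = {"10.131.0.5/32": "tunnel-1"}  # Step 1: how to reach the DNS server
HOSTNAME_ROUTES = {"canada-payroll-server.acme.local": "tunnel-2"}  # Step 2

def dns_path(query):
    """Where does the DNS query go? (first matching resolver policy)"""
    for p in RESOLVER_POLICIES:
        if query == p["domain"] or query.endswith("." + p["domain"]):
            return p["resolver"], IP_ROUTES[p["resolver"] + "/32"]
    return None, None

def app_path(hostname):
    """Where does the application traffic go? (matching hostname route)"""
    return HOSTNAME_ROUTES.get(hostname)

print(dns_path("canada-payroll-server.acme.local"))  # resolution via tunnel-1
print(app_path("canada-payroll-server.acme.local"))  # application via tunnel-2
```

The same hostname deliberately takes two different tunnels depending on whether the traffic is a DNS query or the application connection itself.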
    <div>
      <h2>What onramps does this support?</h2>
      <a href="#what-onramps-does-this-support">
        
      </a>
    </div>
    <p>Our hostname routing capability is built on the "synthetic IP" (also known as the <i>initial resolved IP</i>) mechanism detailed earlier, which requires specific Cloudflare One products to correctly handle both the DNS resolution and the subsequent application traffic. Here’s a breakdown of what’s currently supported for connecting your users (on-ramps) and your private applications (off-ramps).</p>
    <div>
      <h4><b>Connecting Your Users (On-Ramps)</b></h4>
      <a href="#connecting-your-users-on-ramps">
        
      </a>
    </div>
    <p>For end-users to connect to private hostnames, the feature currently works with <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/warp/"><b><u>WARP Client</u></b></a>, agentless <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/agentless/pac-files/"><b><u>PAC files</u></b></a> and <a href="https://developers.cloudflare.com/cloudflare-one/policies/browser-isolation/"><b><u>Browser Isolation</u></b></a>.</p><p>Connectivity is also possible when users are behind <a href="https://developers.cloudflare.com/magic-wan/"><b><u>Magic WAN</u></b></a> (in active-passive mode) or <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/private-net/warp-connector/"><b><u>WARP Connector</u></b></a>, but it requires some additional configuration. To ensure traffic is routed correctly, you must update the routing table on your device or router to send traffic for the following destinations through Gateway:</p><ul><li><p>The initial resolved IP ranges: <code>100.80.0.0/16</code> (IPv4) and <code>2606:4700:0cf1:4000::/64</code> (IPv6).</p></li><li><p>The private network CIDR where your application is located (e.g., <code>10.0.0.0/8</code>).</p></li><li><p>The IP address of your internal DNS resolver.</p></li><li><p>The Gateway DNS resolver IPs: <code>172.64.36.1</code> and <code>172.64.36.2</code>.</p></li></ul><p>Magic WAN customers will also need to point their DNS resolver to these Gateway resolver IPs and ensure they are running Magic WAN tunnels in active-passive mode: for hostname routing to work, DNS queries and the resulting network traffic must reach Cloudflare over the same Magic WAN tunnel. Currently, hostname routing will not work if your end users are at a site that has more than one Magic WAN tunnel actively transiting traffic at the same time.</p>
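The "send these destinations through Gateway" check implied by the list above can be sketched with a simple prefix match. The prefixes come from the post; the helper itself, and the example internal-resolver address (`10.131.0.5`), are our own illustration.

```python
# Sketch of the routing-table decision implied by the list above.
# Prefixes are from the post; the helper and the example internal DNS
# resolver IP (10.131.0.5) are illustrative.
import ipaddress

VIA_GATEWAY = [
    ipaddress.ip_network("100.80.0.0/16"),            # initial resolved IPs (IPv4)
    ipaddress.ip_network("2606:4700:0cf1:4000::/64"), # initial resolved IPs (IPv6)
    ipaddress.ip_network("10.0.0.0/8"),               # example private app CIDR
    ipaddress.ip_network("10.131.0.5/32"),            # internal DNS resolver (example)
    ipaddress.ip_network("172.64.36.1/32"),           # Gateway DNS resolver
    ipaddress.ip_network("172.64.36.2/32"),           # Gateway DNS resolver
]

def routes_via_gateway(dst):
    """True if a destination address should be steered through Gateway."""
    addr = ipaddress.ip_address(dst)
    return any(addr.version == net.version and addr in net for net in VIA_GATEWAY)

print(routes_via_gateway("100.80.10.10"))  # synthetic IP: via Gateway
print(routes_via_gateway("8.8.8.8"))       # ordinary Internet destination
```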
    <div>
      <h4><b>Connecting Your Private Network (Off-Ramps)</b></h4>
      <a href="#connecting-your-private-network-off-ramps">
        
      </a>
    </div>
    <p>On the other side of the connection, hostname-based routing is designed specifically for applications connected via <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/"><b><u>Cloudflare Tunnel</u></b></a> (<code>cloudflared</code>). This is currently the only supported off-ramp for routing by hostname.</p><p>Other traffic off-ramps, while fully supported for IP-based routing, are not yet compatible with this specific hostname-based feature. This includes using Magic WAN, WARP Connector, or WARP-to-WARP connections as the off-ramp to your private network. We are actively working to expand support for more on-ramps and off-ramps in the future, so stay tuned for more updates.</p>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>By enabling routing by hostname directly within Cloudflare Tunnel, we’re making security policies simpler, more resilient, and more aligned with how modern applications are built. You no longer need to track ever-changing IP addresses. You can now build precise, per-resource authorization policies for HTTPS applications based on the one thing that should matter: the name of the service you want to connect to. This is a fundamental step in making a zero trust architecture intuitive and achievable for everyone.</p><p>This powerful capability is available today, built directly into Cloudflare Tunnel and free for all Cloudflare One customers.</p><p>Ready to leave IP Lists behind for good? Get started by exploring our <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/private-net/cloudflared/connect-private-hostname/"><u>developer documentation</u></a> to configure your first hostname route. If you're new to <a href="https://developers.cloudflare.com/cloudflare-one/"><u>Cloudflare One</u></a>, you can sign up today and begin securing your applications and networks in minutes.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Tunnel]]></category>
            <category><![CDATA[SASE]]></category>
            <category><![CDATA[Cloudflare One]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Cloudflare Gateway]]></category>
            <category><![CDATA[Egress]]></category>
            <category><![CDATA[Zero Trust]]></category>
            <category><![CDATA[Access Control Lists (ACLs)]]></category>
            <category><![CDATA[Hostnames]]></category>
            <guid isPermaLink="false">gnroEH7P2oE00Ba0wJLHT</guid>
            <dc:creator>Nikita Cano</dc:creator>
            <dc:creator>Sharon Goldberg</dc:creator>
        </item>
        <item>
            <title><![CDATA[Analysis of the EPYC 145% performance gain in Cloudflare Gen 12 servers]]></title>
            <link>https://blog.cloudflare.com/analysis-of-the-epyc-145-performance-gain-in-cloudflare-gen-12-servers/</link>
            <pubDate>Tue, 15 Oct 2024 15:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare’s Gen 12 server is the most powerful and power efficient server that we have deployed to date. Through sensitivity analysis, we found that Cloudflare workloads continue to scale with higher core count and higher CPU frequency, as well as achieving a significant boost in performance with larger L3 cache per core. ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare's <a href="https://www.cloudflare.com/network"><u>network</u></a> spans more than 330 cities in over 120 countries, serving over 60 million HTTP requests per second and 39 million DNS queries per second on average. These numbers will continue to grow, and at an accelerating pace, as will Cloudflare’s infrastructure to support them. While we can continue to scale out by deploying more servers, it is also paramount for us to develop and deploy more performant and more efficient servers.</p><p>At the heart of each server is the processor (central processing unit, or CPU). Even though many aspects of a server rack can be redesigned to improve the cost to serve a request, CPU remains the biggest lever, as it is typically the primary compute resource in a server, and the primary enabler of new technologies.</p><p><a href="https://blog.cloudflare.com/gen-12-servers/"><u>Cloudflare’s 12th Generation server with AMD EPYC 9684-X (codenamed Genoa-X) is 145% more performant and 63% more efficient</u></a>. These are big numbers, but where do the performance gains come from? Cloudflare’s hardware system engineering team did a sensitivity analysis on three variants of 4th generation AMD EPYC processor to understand the contributing factors.</p><p>For the 4th generation AMD EPYC Processors, AMD offers three architectural variants: </p><ol><li><p>mainstream classic Zen 4 cores, codenamed Genoa</p></li><li><p>efficiency optimized dense Zen 4c cores, codenamed Bergamo</p></li><li><p>cache optimized Zen 4 cores with 3D V-cache, codenamed Genoa-X</p></li></ol>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7hb5yDJksMIwbWVQcCzuoC/c085edca54d3820f7e93564791b0289a/image6.png" />
            
            </figure><p><sup>Figure 1 (from left to right): AMD EPYC 9654 (Genoa), AMD EPYC 9754 (Bergamo), AMD EPYC 9684X (Genoa-X)</sup></p><p>Key features common across the 4th Generation AMD EPYC processors:</p><ul><li><p>Up to 12x Core Complex Dies (CCDs)</p></li><li><p>Each core has a private 1MB L2 cache</p></li><li><p>The CCDs connect to memory, I/O, and each other through an I/O die</p></li><li><p>Configurable Thermal Design Power (cTDP) up to 400W</p></li><li><p>Support up to 12 channels of DDR5-4800 1DPC</p></li><li><p>Support up to 128 lanes PCIe Gen 5</p></li></ul><p>Classic Zen 4 Cores (Genoa):</p><ul><li><p>Each Core Complex (CCX) has 8x Zen 4 Cores (16x Threads)</p></li><li><p>Each CCX has a shared 32 MB L3 cache (4 MB/core)</p></li><li><p>Each CCD has 1x CCX</p></li></ul><p>Dense Zen 4c Cores (Bergamo):</p><ul><li><p>Each CCX has 8x Zen 4c Cores (16x Threads)</p></li><li><p>Each CCX has a shared 16 MB L3 cache (2 MB/core)</p></li><li><p>Each CCD has 2x CCX</p></li></ul><p>Classic Zen 4 Cores with 3D V-cache (Genoa-X):</p><ul><li><p>Each CCX has 8x Zen 4 Cores (16x Threads)</p></li><li><p>Each CCX has a shared 96MB L3 cache (12 MB/core)</p></li><li><p>Each CCD has 1x CCX</p></li></ul><p>For more information on 4th generation AMD EPYC Processors architecture, see: <a href="https://www.amd.com/system/files/documents/4th-gen-epyc-processor-architecture-white-paper.pdf"><u>https://www.amd.com/system/files/documents/4th-gen-epyc-processor-architecture-white-paper.pdf</u></a> </p><p>The following table is a summary of the specification of the AMD EPYC 7713 CPU in our <a href="https://blog.cloudflare.com/the-epyc-journey-continues-to-milan-in-cloudflares-11th-generation-edge-server/"><u>Gen 11 server</u></a> against the three CPU candidates, one from each variant of the 4th generation AMD EPYC Processors architecture:</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span><strong>CPU Model</strong></span></span></p>
                    </td>
                    <td>
                        <p><a href="https://www.amd.com/en/products/specifications/server-processor.html"><span><span><strong><u>AMD EPYC 7713</u></strong></span></span></a></p>
                    </td>
                    <td>
                        <p><a href="https://www.amd.com/en/products/specifications/server-processor.html"><span><span><strong><u>AMD EPYC 9654</u></strong></span></span></a></p>
                    </td>
                    <td>
                        <p><a href="https://www.amd.com/en/products/specifications/server-processor.html"><span><span><strong><u>AMD EPYC 9754</u></strong></span></span></a></p>
                    </td>
                    <td>
                        <p><a href="https://www.amd.com/en/products/specifications/server-processor.html"><span><span><strong><u>AMD EPYC 9684X</u></strong></span></span></a></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Series</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>Milan</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Genoa</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Bergamo</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Genoa-X</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong># of CPU Cores</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>64</span></span></p>
                    </td>
                    <td>
                        <p><span><span>96</span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>128</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>96</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong># of Threads</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>128</span></span></p>
                    </td>
                    <td>
                        <p><span><span>192</span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>256</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>192</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Base Clock</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.0 GHz</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.4 GHz</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.25 GHz</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.4 GHz</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>All Core Boost Clock</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>~2.7 GHz*</span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>3.55 GHz</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>3.1 GHz</span></span></p>
                    </td>
                    <td>
                        <p><span><span>3.42 GHz</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Total L3 Cache</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>256 MB</span></span></p>
                    </td>
                    <td>
                        <p><span><span>384 MB</span></span></p>
                    </td>
                    <td>
                        <p><span><span>256 MB</span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>1152 MB</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>L3 cache per core</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>4 MB / core</span></span></p>
                    </td>
                    <td>
                        <p><span><span>4 MB / core</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2 MB / core</span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>12 MB / core</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Maximum configurable TDP</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>240W</span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>400W</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>400W</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>400W</strong></span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div><p><sup>* AMD EPYC 7713 all core boost clock is based on Cloudflare production data, not the official specification from AMD</sup></p>
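The total-L3 rows in the table follow directly from the per-variant CCX layout described earlier (total L3 = CCDs × CCXs per CCD × L3 per CCX). The CCD counts below are inferred from the core counts (cores ÷ cores per CCD); this is our own back-of-the-envelope arithmetic, not a figure from AMD.

```python
# Back-of-the-envelope check of the "Total L3 Cache" table row, using the
# CCD/CCX layout described earlier. CCD counts are inferred from core counts.
variants = {
    #                CCDs, CCXs/CCD, L3 per CCX (MB)
    "Genoa 9654":    (12, 1, 32),  # 96 classic Zen 4 cores, 1 CCX of 8 per CCD
    "Bergamo 9754":  ( 8, 2, 16),  # 128 dense Zen 4c cores, 2 CCXs per CCD
    "Genoa-X 9684X": (12, 1, 96),  # 96 cores with 3D V-cache (96 MB per CCX)
}
for name, (ccds, ccx_per_ccd, l3_per_ccx) in variants.items():
    total = ccds * ccx_per_ccd * l3_per_ccx
    print(f"{name}: {total} MB total L3")
```

The computed totals (384 MB, 256 MB, and 1152 MB) match the table above.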
    <div>
      <h2>cf_benchmark</h2>
      <a href="#cf_benchmark">
        
      </a>
    </div>
    <p>Readers may remember that Cloudflare introduced <a href="https://github.com/cloudflare/cf_benchmark"><u>cf_benchmark</u></a> when we evaluated <a href="https://blog.cloudflare.com/arm-takes-wing/"><u>Qualcomm's ARM chips</u></a>, using it as our first-pass benchmark to shortlist <a href="https://blog.cloudflare.com/an-epyc-trip-to-rome-amd-is-cloudflares-10th-generation-edge-server-cpu/"><u>AMD’s Rome CPU for our Gen 10 servers</u></a> and to evaluate <a href="https://blog.cloudflare.com/arms-race-ampere-altra-takes-on-aws-graviton2/"><u>our chosen ARM CPU Ampere Altra Max against AWS Graviton 2</u></a>. Likewise, we ran cf_benchmark against the three candidate CPUs for our 12th Gen servers: AMD EPYC 9654 (Genoa), AMD EPYC 9754 (Bergamo), and AMD EPYC 9684X (Genoa-X). The majority of cf_benchmark workloads are compute-bound, and given more cores or higher CPU frequency, they score better. The graph and the table below show the benchmark performance comparison of the three CPU candidates with Genoa 9654 as the baseline, where &gt; 1.00x indicates better performance.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6YNQ8YB5XV3fZEVpVFT7lM/2dadd7fda9832716f198dee2d5fbfd22/image5.png" />
          </figure><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td> </td>
                    <td>
                        <p><span><span><strong>Genoa 9654 (baseline)</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Bergamo 9754</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Genoa-X 9684X</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>openssl_pki</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.00x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.16x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.01x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>openssl_aead</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.00x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.20x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.01x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>luajit</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.00x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>0.86x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.00x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>brotli</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.00x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.11x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>0.98x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>gzip</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.00x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>0.87x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.01x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>go</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.00x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.09x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.00x</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div><p>Bergamo 9754 with 128 cores scores better in the openssl_pki, openssl_aead, brotli, and go benchmark suites, and less favorably in the luajit and gzip suites. Genoa-X 9684X (with significantly more L3 cache) doesn’t offer a meaningful boost in performance for these compute-bound benchmarks.</p><p>These benchmarks are representative of some of the common workloads Cloudflare runs, and are useful in identifying software scaling issues, system configuration bottlenecks, and the impact of CPU design choices on workload-specific performance. However, the benchmark suite is not an exhaustive list of all workloads Cloudflare runs in production, and in reality, the workloads included in the benchmark suites are almost certainly not the exclusive workload running on the CPU. In short, though benchmark results can be informative, they are not a good indication of production performance, where a mix of these workloads runs on the same processor.</p>
    <div>
      <h2>Performance simulation</h2>
      <a href="#performance-simulation">
        
      </a>
    </div>
    <p>To get an early indication of production performance, Cloudflare has an internal performance simulation tool that exercises our software stack to fetch a fixed asset repeatedly. The tool can be configured to fetch a specified fixed-size asset, and to include or exclude services like WAF or Workers in the request path. Below, we show the simulated performance of the three candidate CPUs relative to the Milan 7713 baseline for an asset size of 10 KB, where &gt;1.00x indicates better performance.</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td> </td>
                    <td>
                        <p><span><span><strong>Milan 7713</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Genoa 9654</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Bergamo 9754</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Genoa-X 9684X</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Lab simulation performance multiplier</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.00x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.20x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.95x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.75x</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div><p>Based on these results, Bergamo 9754, which has the highest core count but the smallest L3 cache per core, is the least performant among the three candidates, followed by Genoa 9654. The Genoa-X 9684X, with the largest L3 cache per core, is the most performant. This data suggests that our software stack is very sensitive to L3 cache size, in addition to core count and CPU frequency. This is interesting and worth a deeper dive: a sensitivity analysis of our workload against a few high-level CPU design points, especially core scaling, frequency scaling, and L2/L3 cache size scaling.</p>
    <div>
      <h2>Sensitivity analysis</h2>
      <a href="#sensitivity-analysis">
        
      </a>
    </div>
    
    <div>
      <h3>Core sensitivity</h3>
      <a href="#core-sensitivity">
        
      </a>
    </div>
    <p>Number of cores is the headline specification that practically everyone talks about, and one of the easiest improvements CPU vendors can make to increase performance per socket. The AMD Genoa 9654 has 96 cores, 50% more than the 64 cores available on the AMD Milan 7713 CPUs that we used in our Gen 11 servers. Is more always better? Does Cloudflare’s primary workload scale with core count and effectively utilize all available cores?</p><p>The figure and table below show the results of a core scaling experiment performed on an AMD Genoa 9654 configured with 96 cores, 80 cores, 64 cores, and 48 cores, achieved by incrementally disabling two CCDs (8 cores per CCD) at each step. The result is great: Cloudflare’s simulated primary workload scales linearly with core count on AMD Genoa CPUs.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/f1w9JxhP8aLIoFONMq5tr/b3269fc121b9bfd8f4d9a3394a73c599/image4.png" />
          </figure><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span><strong>Core count</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Core increase</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Performance increase</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>48</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.00x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.00x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>64</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.33x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.39x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>80</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.67x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.71x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>96</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.00x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.05x</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div>
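<p>As a quick sanity check, the numbers from the table above can be replayed in a few lines of Python (a hypothetical sketch, not part of our tooling) to confirm that the measured performance tracks the core-count increase almost exactly:</p>

```python
# Core-scaling results from the Genoa 9654 experiment above:
# (core count, core increase vs 48-core baseline, performance increase)
core_scaling = [
    (48, 1.00, 1.00),
    (64, 1.33, 1.39),
    (80, 1.67, 1.71),
    (96, 2.00, 2.05),
]

for cores, core_ratio, perf_ratio in core_scaling:
    # Scaling efficiency: 1.00 means performance is perfectly linear in cores
    efficiency = perf_ratio / core_ratio
    print(f"{cores} cores: {efficiency:.2f} scaling efficiency")
```

<p>Every step lands at or slightly above 1.00 efficiency, which is what "scales linearly with core count" means in practice.</p>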
    <div>
      <h3>TDP sensitivity</h3>
      <a href="#tdp-sensitivity">
        
      </a>
    </div>
    <p><a href="https://blog.cloudflare.com/thermal-design-supporting-gen-12-hardware-cool-efficient-and-reliable/"><u>Thermal Design Power (TDP) is the maximum amount of heat generated by a CPU that the cooling system is designed to dissipate</u></a>, though it more commonly refers to the power consumption of the processor under maximum theoretical load. The AMD Genoa 9654’s default TDP is 360W, but it can be configured up to 400W. Is more always better? Does Cloudflare continue to see meaningful performance improvement up to 400W, or does performance stagnate at some point?</p><p>The chart below shows the result of sweeping the TDP of the AMD Genoa 9654 (in power determinism mode) from 240W to 400W. (Note: x-axis step size is not linear.)</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/216pSnjCDQOLwtRS822ugu/17dc1885d34ae10a8a1cecb6584b7dd7/image3.png" />
          </figure><p>Cloudflare’s simulated primary workload continues to see incremental performance improvements up to the maximum configurable 400W, albeit at a less favorable perf/watt ratio.</p><p>Looking at TDP sensitivity data is a quick and easy way to identify if performance stagnates at some power point, but what does power sensitivity actually measure? There are several factors contributing to CPU power consumption, but let's focus on one of the primary factors: dynamic power consumption. Dynamic power consumption is approximately <i>CV</i><i><sup>2</sup></i><i>f</i>, where C is the switched load capacitance, V is the regulated voltage, and f is the frequency. In modern processors like the AMD Genoa 9654, the CPU dynamically scales its voltage along with frequency, so theoretically, CPU dynamic power is loosely proportional to f<sup>3</sup>. In other words, measuring TDP sensitivity is measuring the frequency sensitivity of a workload. Does the data agree? Yes!</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span><strong>cTDP</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>All core boost frequency (GHz)</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Perf (rps) / baseline</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>240</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.47</span></span></p>
                    </td>
                    <td>
                        <p><span><span>0.78x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>280</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.75</span></span></p>
                    </td>
                    <td>
                        <p><span><span>0.87x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>320</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.93</span></span></p>
                    </td>
                    <td>
                        <p><span><span>0.93x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>340</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>3.13</span></span></p>
                    </td>
                    <td>
                        <p><span><span>0.97x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>360</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>3.3</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.00x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>380</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>3.4</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.03x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>390</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>3.465</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.04x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>400</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>3.55</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.05x</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div>
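<p>We can make that agreement concrete with a short sketch (values copied from the cTDP table above; the frequency-proportional model is the approximation described in the text, not a measured law): if dynamic power dominates and voltage tracks frequency, performance should scale roughly with all-core frequency.</p>

```python
# (cTDP in watts, all-core boost frequency in GHz, measured perf vs baseline)
ctdp_sweep = [
    (240, 2.47, 0.78), (280, 2.75, 0.87), (320, 2.93, 0.93),
    (340, 3.13, 0.97), (360, 3.30, 1.00), (380, 3.40, 1.03),
    (390, 3.465, 1.04), (400, 3.55, 1.05),
]
BASE_GHZ = 3.30  # the 360W default TDP point is the 1.00x baseline

for tdp, ghz, perf in ctdp_sweep:
    predicted = ghz / BASE_GHZ  # perf ~ frequency for a compute-bound workload
    print(f"{tdp}W: measured {perf:.2f}x vs frequency ratio {predicted:.2f}x")
```

<p>Measured performance stays within a few percent of the frequency ratio at every power point, supporting the claim that the TDP sweep is really measuring frequency sensitivity.</p>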
    <div>
      <h3>Frequency sensitivity</h3>
      <a href="#frequency-sensitivity">
        
      </a>
    </div>
    <p>Instead of relying on an indirect measure through the TDP, let’s measure frequency sensitivity directly by sweeping the maximum boost frequency.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2oobymubOiZrpJjSExQxxo/a02985506ecc803e8b09d7f999b86c55/image2.png" />
          </figure><p>Above 3GHz, the data shows that Cloudflare’s primary workload sees roughly a 2% incremental improvement for every 0.1GHz increase in all-core average frequency. We hit the 400W power cap at 3.545GHz. This is notably higher than the roughly 2.7GHz all-core boost frequency that Cloudflare Gen 11 servers with AMD Milan 7713 see in production (about 2.4GHz in our performance simulation), which is amazing!</p>
    <div>
      <h3>L3 cache size sensitivity</h3>
      <a href="#l3-cache-size-sensitivity">
        
      </a>
    </div>
    <p>What about L3 cache size sensitivity? L3 cache size is one of the primary design choices and major differences between the trio of Genoa, Bergamo, and Genoa-X. Genoa 9654 has 4 MB L3/core, Bergamo 9754 has 2 MB L3/core, and Genoa-X 9684X has 12 MB L3/core. L3 cache is the last and largest “memory” bank on-chip before the CPU has to access memory on DIMMs outside the chip, which takes significantly more CPU cycles.</p><p>We ran an experiment on the Genoa 9654 to check how performance scales with L3 cache size. L3 cache size per core is reduced through MSR writes (on Intel platforms, this could also be done using <a href="https://www.intel.com/content/www/us/en/developer/articles/technical/use-intel-resource-director-technology-to-allocate-last-level-cache-llc.htm"><u>Intel RDT</u></a>), and L3 cache per core is increased by disabling physical cores in a CCD, which reduces the number of cores sharing the fixed-size 32 MB L3 cache per CCD and effectively grows the L3 cache per core. Below is the result of the experiment, where &gt;1.00x indicates better performance:</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span><strong>L3 cache size increase vs baseline 4MB per core</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>0.25x</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>0.5x</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>0.75x</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>1x</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>1.14x</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>1.33x</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>1.60x</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>2.00x</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>rps/core / baseline</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>0.67x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>0.78x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>0.89x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.00x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.08x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.15x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.25x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.31x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>L3 cache miss rate per CCD</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>56.04%</span></span></p>
                    </td>
                    <td>
                        <p><span><span>39.15%</span></span></p>
                    </td>
                    <td>
                        <p><span><span>30.37%</span></span></p>
                    </td>
                    <td>
                        <p><span><span>23.55%</span></span></p>
                    </td>
                    <td>
                        <p><span><span>22.39%</span></span></p>
                    </td>
                    <td>
                        <p><span><span>19.73%</span></span></p>
                    </td>
                    <td>
                        <p><span><span>16.94%</span></span></p>
                    </td>
                    <td>
                        <p><span><span>14.28%</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/cjfPuhUHBfABvQ5hJGchb/e470ca9bb45b0c69b5069556ce866647/image7.png" />
          </figure><p>Even though we expected the faster DDR5 and larger memory bandwidth to diminish the impact of L3 cache size, Cloudflare’s simulated primary workload is quite sensitive to it. The L3 cache miss rate dropped from 56% with only 1 MB L3 per core to 14.28% with 8 MB L3 per core. Changing the L3 cache size by 25% affects performance by approximately 11%, and performance keeps improving all the way up to 2x the baseline L3 cache size, though the gains start to diminish at that point.</p><p>Do we see the same behavior when comparing Genoa 9654, Bergamo 9754, and Genoa-X 9684X? We ran an experiment comparing the impact of L3 cache size while controlling for core count and all-core boost frequency, and we also saw significant deltas. Halving the L3 cache from 4 MB/core to 2 MB/core reduces performance by 24%, roughly matching the experiment above. However, tripling the cache from 4 MB/core to 12 MB/core only increases performance by 25%, less than the previous experiment would suggest. This is likely because part of the gain in the previous experiment can be attributed to reduced cache contention, since we grew the per-core cache by disabling cores. Nevertheless, these are significant deltas!</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span><strong>L3/core</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>2MB/core</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>4MB/core</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>12MB/core</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Perf (rps) / baseline</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>0.76x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.25x</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div>
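<p>As a rough illustration (a sketch replaying the numbers from the Genoa 9654 L3 sweep table above, not part of our tooling), the per-step performance change shows both the overall sensitivity and where the returns flatten out:</p>

```python
# L3 sweep on the Genoa 9654 (from the table above):
# (L3 size relative to the 4 MB/core baseline, rps/core relative to baseline)
l3_sweep = [
    (0.25, 0.67), (0.50, 0.78), (0.75, 0.89), (1.00, 1.00),
    (1.14, 1.08), (1.33, 1.15), (1.60, 1.25), (2.00, 1.31),
]

# Performance change between consecutive sweep points
for (s0, p0), (s1, p1) in zip(l3_sweep, l3_sweep[1:]):
    print(f"L3 {s0:.2f}x -> {s1:.2f}x: perf {100 * (p1 / p0 - 1):+.1f}%")
```

<p>Performance rises monotonically across the whole sweep, but the final step (1.60x to 2.00x), despite being the largest cache increase in the sweep, yields the smallest performance gain.</p>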
    <div>
      <h3>Putting it all together</h3>
      <a href="#putting-it-all-together">
        
      </a>
    </div>
    <p>The table below summarizes how each factor from the sensitivity analysis above contributes to the overall performance gain. An additional 6% to 14% of performance improvement is contributed by other factors like a larger L2 cache, higher memory bandwidth, and miscellaneous CPU architecture changes that improve IPC.</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td> </td>
                    <td>
                        <p><span><span><strong>Milan</strong></span></span></p>
                        <p><span><span><strong>7713</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Genoa</strong></span></span></p>
                        <p><span><span><strong>9654</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Bergamo</strong></span></span></p>
                        <p><span><span><strong>9754</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Genoa-X</strong></span></span></p>
                        <p><span><span><strong>9684X</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Lab simulation performance multiplier</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.2x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.95x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.75x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Performance multiplier due to Core scaling</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.5x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.5x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Performance multiplier due to Frequency scaling</strong></span></span></p>
                        <p><span><span><strong>(*Note: Milan 7713 all core frequency is ~2.4GHz when running simulated workload at 100% CPU utilization)</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.32x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.21x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.29x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Performance multiplier due to L3 cache size scaling</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>0.76x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.25x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Performance multiplier due to other factors like larger L2 cache, higher memory bandwidth, miscellaneous CPU architecture changes that improve IPC</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.11x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.06x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.14x</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div>
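<p>The factors in the summary table compose multiplicatively. A quick check (a sketch with the per-factor multipliers copied from the table above) confirms that their product reproduces the overall lab simulation multiplier for each CPU to within rounding:</p>

```python
# (core scaling, frequency scaling, L3 scaling, other factors, overall)
decomposition = {
    "Genoa 9654":    (1.50, 1.32, 1.00, 1.11, 2.20),
    "Bergamo 9754":  (2.00, 1.21, 0.76, 1.06, 1.95),
    "Genoa-X 9684X": (1.50, 1.29, 1.25, 1.14, 2.75),
}

for cpu, (cores, freq, l3, other, overall) in decomposition.items():
    product = cores * freq * l3 * other
    print(f"{cpu}: factors multiply to {product:.2f}x (table: {overall:.2f}x)")
```

<p>For example, Bergamo’s 2x core scaling is mostly cancelled by its 0.76x L3 penalty, which is how it ends up behind Genoa despite having 33% more cores.</p>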
    <div>
      <h2>Performance evaluation in production</h2>
      <a href="#performance-evaluation-in-production">
        
      </a>
    </div>
    <p>How do these CPU candidates perform with real-world traffic and an actual production workload mix? The table below summarizes the performance of the three candidate CPUs, relative to the Milan 7713 baseline, in lab simulation and in production. Genoa-X 9684X continues to outperform in production.</p><p>In addition, the Gen 12 server equipped with Genoa-X offered outstanding performance while consuming only 1.5x the power per system of our Gen 11 server with Milan 7713. In other words, we see a 63% increase in performance per watt. Genoa-X 9684X provides the best TCO improvement among the three options, and was ultimately chosen as the CPU for our Gen 12 server.</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td> </td>
                    <td>
                        <p><span><span><strong>Milan 7713</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Genoa 9654</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Bergamo 9754</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Genoa-X 9684X</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Lab simulation performance multiplier</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.2x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.95x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.75x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Production performance multiplier</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.15x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>2.45x</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Production performance per watt multiplier</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>1x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.33x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.38x</span></span></p>
                    </td>
                    <td>
                        <p><span><span>1.63x</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div><p>The Gen 12 server with AMD Genoa-X 9684X is the most powerful and the most power efficient server Cloudflare has built to date. It serves as the underlying platform for all the incredible services that Cloudflare offers to our customers globally, and will help power the growth of Cloudflare infrastructure for the next several years with improved cost structure. </p><p>Hardware engineers at Cloudflare work closely with our infrastructure engineering partners and externally with our vendors to design and develop world-class servers to best serve our customers. </p><p><a href="https://www.cloudflare.com/careers/jobs/"><u>Come join us</u></a> at Cloudflare to help build a better Internet!</p> ]]></content:encoded>
            <category><![CDATA[AMD]]></category>
            <category><![CDATA[EPYC]]></category>
            <category><![CDATA[Hardware]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <guid isPermaLink="false">1Sj8x3oCQRSZqU8oNgMmkE</guid>
            <dc:creator>JQ Lau</dc:creator>
            <dc:creator>Syona Sarma</dc:creator>
        </item>
        <item>
            <title><![CDATA[Thermal design supporting Gen 12 hardware: cool, efficient and reliable]]></title>
            <link>https://blog.cloudflare.com/thermal-design-supporting-gen-12-hardware-cool-efficient-and-reliable/</link>
            <pubDate>Mon, 07 Oct 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ Great thermal solutions play a crucial role in hardware reliability and performance. Gen 12 servers have implemented an exhaustive thermal analysis to ensure optimal operations within a wide variety of temperature conditions and use cases. By implementing new design and control features for improved power efficiency on the compute nodes we also enabled the support of powerful accelerators to serve our customers. ]]></description>
            <content:encoded><![CDATA[ <p>In the dynamic evolution of AI and cloud computing, the deployment of efficient and reliable hardware is critical. As we roll out our <a href="https://blog.cloudflare.com/cloudflare-gen-12-server-bigger-better-cooler-in-a-2u1n-form-factor/"><u>Gen 12 hardware</u></a> across hundreds of cities worldwide, the challenge of maintaining optimal thermal performance becomes essential. This blog post provides a deep dive into the robust thermal design that supports our newest Gen 12 server hardware, ensuring it remains reliable, efficient, and cool (pun very much intended).</p>
    <div>
      <h2>The importance of thermal design for hardware electronics</h2>
      <a href="#the-importance-of-thermal-design-for-hardware-electronics">
        
      </a>
    </div>
    <p>Generally speaking, a server has five core resources: CPU (computing power), RAM (short term memory), SSD (long term storage), NIC (Network Interface Controller, connectivity beyond the server), and GPU (for AI/ML computations). Each of these components can withstand different temperature limits based on their design, materials, location within the server, and most importantly, the power they are designed to work at. This final criterion is known as thermal design power (TDP).</p><p>The reason why TDP is so important is closely related to the <a href="https://chem.libretexts.org/Bookshelves/Physical_and_Theoretical_Chemistry_Textbook_Maps/Supplemental_Modules_(Physical_and_Theoretical_Chemistry)/Thermodynamics/The_Four_Laws_of_Thermodynamics/First_Law_of_Thermodynamics"><u>first law of thermodynamics</u></a>, which states that energy cannot be created or destroyed, only transformed. In semiconductors, electrical energy is converted into heat, and TDP measures the maximum heat output that needs to be managed to ensure proper functioning.</p><p>Back in December 2023, we <a href="https://blog.cloudflare.com/cloudflare-gen-12-server-bigger-better-cooler-in-a-2u1n-form-factor"><u>talked</u></a> about our decision to move to a 2U form factor, doubling the height of the server chassis to optimize rack density and increase cooling capacity. In this post, we want to share more details on how this additional space is being used to improve performance and reliability, supporting up to three times more total system power.</p>
    <div>
      <h2>Standardization</h2>
      <a href="#standardization">
        
      </a>
    </div>
    <p>In order to support our multi-vendor strategy that mitigates supply chain risks ensuring continuity for our infrastructure, we introduced our own thermal specification to standardize thermal design and system performance. At Cloudflare, we find significant value in building customized hardware optimized for our unique workloads and applications, and we are very fortunate to partner with great hardware vendors who understand and support this vision. However, partnering with multiple vendors can introduce design variables that Cloudflare then controls for consistency within a hardware generation. Some of the most relevant requirements we include in our thermal specification are:</p><ul><li><p><b>Ambient conditions:</b> Given our globally distributed footprint with presence <a href="https://www.cloudflare.com/network/"><u>in over 330 cities</u></a>, environmental conditions can vary significantly.  Hence, servers in our fleet can experience a wide range of temperatures, typically ranging between 28 to 35°C. Therefore, our systems are designed and validated to operate with no issue over temperature ranges from 5 to 40°C (following the <a href="https://xp20.ashrae.org/datacom1_4th/ReferenceCard.pdf"><u>ASHRAE A3</u></a> definition).</p></li><li><p><b>Thermal margins:</b> Cloudflare designs with clear requirements for temperature limits on different operating conditions, simulating peak stress, average workloads, and idle conditions. This allows Cloudflare to validate that the system won’t experience thermal throttling, which is a power management control mechanism used to protect electronics from high temperatures.</p></li><li><p><b>Fan failure support to increase system reliability:</b> This new generation of servers is 100% air cooled. As such, the algorithm that controls fan speed based on critical component temperature needs to be optimized to support continuous operation over the server life cycle. 
Even though fans are designed with a high (up to seven years) mean time between failure (MTBF), we know fans can and do fail. Losing a server's worth of capacity due to thermal risks caused by a single fan failure is expensive. Cloudflare requires the server to continue to operate with no issue even in the event of one fan failure. Each Gen 12 server contains four axial fans providing the extra cooling capacity to prevent failures.</p></li><li><p><b>Maximum power used to cool the system:</b> Because our goal is to serve more Internet traffic using less power, we aim to ensure the hardware we deploy is using power efficiently. Great thermal management must consider the overall cost of cooling relative to the total system power input. It is inefficient to burn power on cooling instead of compute. Thermal solutions should look at the hardware architecture holistically and implement mechanical modifications to the system design in order to optimize airflow and cooling capacity before considering increasing fan speed, as fan power consumption scales with the cube of rotational speed. (For example, running the fans at twice (2x) the rotational speed would consume 8x more power.)</p></li></ul>
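The cube-law relationship between fan speed and power can be made concrete with a short calculation (an illustrative sketch of the fan affinity law, not Cloudflare's actual fan-control code):

```python
def fan_power_multiplier(speed_ratio: float) -> float:
    """Fan affinity law: power consumption scales with the cube of rotational speed."""
    return speed_ratio ** 3

# Doubling fan speed costs 8x the power:
assert fan_power_multiplier(2.0) == 8.0

# Conversely, a bigger fan that can deliver the required airflow at
# 60% of full speed consumes only ~21.6% of its full-speed power:
print(fan_power_multiplier(0.6))
```

This is why the post favors larger, slower-spinning fans and airflow optimization over simply raising fan speed.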
    <div>
      <h2>System layout</h2>
      <a href="#system-layout">
        
      </a>
    </div>
    <p>Placing each component strategically within the server will also influence the thermal performance of the system. For this generation of servers, we made several internal layout decisions, where the final component placement takes into consideration optimal airflow patterns, preventing pre-heated air from affecting equipment in the rear end of the chassis. </p><p>Bigger and more powerful fans were selected in order to take advantage of the additional volume available in a 2U form factor. Growing from 40 to 80 millimeters, a single fan can provide up to four times more airflow. Hence, bigger fans can run at slower speeds to provide the required airflow to cool down the same components, significantly improving power efficiency. </p><p>The Extended Volume Air Cooled (EVAC) heatsink was optimized for Gen 12 hardware, and is designed with increased surface area to maximize heat transfer. It uses heatpipes to move the heat effectively away from the CPU to the extended fin region that sits immediately in front of the fans as shown in the picture below.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5jC8iqP4QeyZ6EfJsQo9Fi/c5ff4981a3f5bb70dfa2576f864df5e1/BLOG-2444_2.png" />
          </figure><p><sub><i>EVAC heatsink installed in one of our Gen 12 servers. The extended fin region sits right in front of the axial fans. (Photo courtesy of vendor.)</i></sub></p><p>The combination of optimized heatsink design and selection of high-performing fans is expected to significantly reduce the power used for cooling the system. These savings will vary depending on ambient conditions and system stress, but under a typical stress scenario at 25°C ambient temperature, power savings could be as much as 50%.</p><p>Additionally, we ensured that the critical components in the rear section of the system, such as the NIC and <a href="https://blog.cloudflare.com/introducing-the-project-argus-datacenter-ready-secure-control-module-design-specification/"><u>DC-SCM</u></a>, were positioned away from the heatsink to promote the use of cooler available air within the system. Learning from past experience, we monitor the NIC temperature through the Baseboard Management Controller (BMC), which provides remote access to the server for administrative tasks and monitoring health metrics. Because the NIC has a built-in feature to protect itself from overheating by going into standby mode when the chip temperature reaches critical limits, it is important to provide air at the lowest possible temperature. As a reference, the temperature of the air right behind the CPU heatsink can reach 70°C or higher, whereas behind the memory banks, it would reach about 55°C under the same circumstances. The image below shows the internal placement of the most relevant components considered while building the thermal solution.</p><p>Using air as cold as possible to cool down any component will increase overall system reliability, preventing potential thermal issues and unplanned system shutdowns. That’s why our fan algorithm uses every thermal sensor available to ensure thermal health while using the minimum possible amount of energy.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4O9x7Xe3PxR1iU1gNCEHIg/86a8e4a89d069f3f8d41c5c0a83ea00b/BLOG-2444_3.png" />
          </figure><p><sub><i>Components inside the compute server from one of our vendors, viewed from the rear of the server. (Illustration courtesy of vendor.)</i></sub></p><table><tr><td><p>1️. Host Processor Module (HPM)</p></td><td><p>8. Power Distribution Board (PDB)</p></td></tr><tr><td><p>2️. DIMMs (x12) </p></td><td><p>9. GPUs (up to 2)</p></td></tr><tr><td><p>3️. CPU (under CPU heatsink)</p></td><td><p>10. GPU riser card</p></td></tr><tr><td><p>4. CPU heatsink</p></td><td><p>11. GPU riser cage</p></td></tr><tr><td><p>5. System fans (x4: 80mm, dual rotor)</p></td><td><p>12. Power Supply Units, PSUs (x2)</p></td></tr><tr><td><p>6. Bracket with power button and intrusion switch</p></td><td><p>13. DC-SCM 2.0 module</p></td></tr><tr><td><p>7. E1.S SSD </p></td><td><p>14. OCP 3.0 module</p></td></tr></table>
    <div>
      <h2>Making hardware flexible</h2>
      <a href="#making-hardware-flexible">
        
      </a>
    </div>
    <p>With the same thought process of optimizing system layout, we decided to use a PCIe riser above the Power Supply Units (PSUs), enabling support for up to two single-wide GPU add-in cards. Once again, the combination of high-performing fans with strategic system architecture gave us the capability to add up to 400W to the original power envelope and incorporate accelerators used in our new and recently announced AI and ML features. </p><p>Hardware lead times are typically long, certainly when compared to software development. Therefore, a reliable strategy for hardware flexibility is imperative in this rapidly changing environment for specialized computing. When we started evaluating Gen 12 hardware architecture and early concept design, we didn’t know for sure whether we would need GPUs for this generation, let alone how many or which type. However, efficient design and careful due diligence in analyzing hypothetical use cases helped ensure the flexibility and scalability of our thermal solution, supporting new requirements from our product teams, and ultimately providing the best solutions to our customers.</p>
    <div>
      <h2>Rack-integrated solutions</h2>
      <a href="#rack-integrated-solutions">
        
      </a>
    </div>
    <p>We are also increasing the volume of integrated racks shipped to our global colocation facilities. Due to this expected increase in rack shipments, it is now more important that we also extend our mechanical and thermal test coverage from system level (L10) to rack level (L11).</p><p>Since our servers don’t use the full depth of a standard rack in order to leave room for cable management and Power Distribution Units (PDUs), there is another fluid mechanics factor that we need to consider to improve our holistic solution. </p><p>We design our hardware based on one of the most typical data center architectures, which has alternating cold and hot aisles. Fans at the front of the server pull in cold air from the corresponding aisle; the air then flows through the server, cooling down the internal components, and the hot air is exhausted into the adjacent aisle, as illustrated in the diagram below.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3rxT3iQ1d2ObzYRzH5tMVc/78e372f64e79e0c1c773b9d108514d01/BLOG-2444_4.png" />
          </figure><p><sub><i>A conventional air-flow diagram of a standard server where the cold air enters from the front of the server and hot air leaves through the rear side of the system. </i></sub></p><p>In fluid dynamics, the minimum effort principle will drive fluids (air in this case) to move where there is less resistance — i.e. wherever it takes less energy to get from point A to point B. With the help of fans forcing air to flow inside the server and pushing it through the rear, the more crowded systems will naturally get less air than those with more space where the air can move around. Since we need more airflow to pass through the systems with higher power demands, we’ve also ensured that the rack configuration keeps these systems in the bottom of the rack where air tends to be at a lower temperature. Remember that heat rises, so even within the cold aisle, there can be a small but important temperature difference between the bottom and the top section of the rack. It is our duty as hardware engineers to use thermodynamics in our favor. </p>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Our new generation of hardware is live in our data centers and it represents a significant leap forward in our efficiency, reliability, and sustainability commitments. Combining optimal heat sink design, thoughtful fan selection, and meticulous system layout and hardware architecture, we are confident that these new servers will operate smoothly in our global network with diverse environmental conditions, maintaining optimal performance of our Connectivity Cloud. </p><p><a href="https://www.cloudflare.com/careers/jobs/"><u>Come join us</u></a> at Cloudflare to help deliver a better Internet!</p> ]]></content:encoded>
            <category><![CDATA[Hardware]]></category>
            <category><![CDATA[Edge]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <guid isPermaLink="false">5GRHH8385Hxg3UHjINuHz8</guid>
            <dc:creator>Leslye Paniagua</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare’s 12th Generation servers — 145% more performant and 63% more efficient]]></title>
            <link>https://blog.cloudflare.com/gen-12-servers/</link>
            <pubDate>Wed, 25 Sep 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare is thrilled to announce the general deployment of our next generation of server — Gen 12 powered by AMD Genoa-X processors. This new generation of server focuses on delivering exceptional performance across all Cloudflare services, enhanced support for AI/ML workloads, significant strides in power efficiency, and improved security features. ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare is thrilled to announce the general deployment of our next generation of servers — Gen 12 powered by AMD EPYC 9684X (code name “Genoa-X”) processors. This next generation focuses on delivering exceptional performance across all Cloudflare services, enhanced support for AI/ML workloads, significant strides in power efficiency, and improved security features.</p><p>Here are some key performance indicators and feature improvements that this generation delivers as compared to the <a href="https://blog.cloudflare.com/the-epyc-journey-continues-to-milan-in-cloudflares-11th-generation-edge-server/"><u>prior generation</u></a>: </p><p>Beginning with performance, with close engineering collaboration between Cloudflare and AMD on optimization, Gen 12 servers can serve more than twice as many requests per second (RPS) as Gen 11 servers, resulting in lower Cloudflare infrastructure build-out costs.</p><p>Next, our power efficiency has improved significantly, by more than 60% in RPS per watt as compared to the prior generation. As Cloudflare continues to expand our infrastructure footprint, the improved efficiency helps reduce Cloudflare’s operational expenditure and carbon footprint as a percentage of our fleet size.</p><p>Third, in response to the growing demand for AI capabilities, we've updated the thermal-mechanical design of our Gen 12 server to support more powerful GPUs. This aligns with the <a href="https://www.cloudflare.com/lp/pg-ai/?utm_medium=cpc&amp;utm_source=google&amp;utm_campaign=2023-q4-acq-gbl-developers-wo-ge-general-paygo_mlt_all_g_search_bg_exp__dev&amp;utm_content=workers-ai&amp;gad_source=1&amp;gclid=CjwKCAjwl6-3BhBWEiwApN6_kjigJdDvEYqHPYi8tdXuTe4APbqX923v-CBjpGiAVwITNhp8GrW3ARoCyJ4QAvD_BwE&amp;gclsrc=aw.ds"><u>Workers AI</u></a> objective to support larger large language models and increase throughput for smaller models. 
This enhancement underscores our ongoing commitment to advancing AI inference capabilities.</p><p>Fourth, in line with our security-first position as a company, we've integrated hardware <a href="https://trustedcomputinggroup.org/about/what-is-a-root-of-trust-rot/"><u>root of trust</u></a> (HRoT) capabilities to ensure the integrity of boot firmware and board management controller firmware. Continuing to embrace open standards, the baseboard management and security controller (Data Center Secure Control Module or <a href="https://drive.google.com/file/d/13BxuseSrKo647hjIXjp087ei8l5QQVb0/view"><u>OCP DC-SCM</u></a>) that we’ve designed into our systems is modular and vendor-agnostic, enabling a unified <a href="https://www.openbmc.org/"><u>openBMC</u></a> image, quicker prototyping, and component reuse.</p><p>Finally, given the increasing importance of supply assurance and reliability in infrastructure deployments, our approach includes a robust multi-vendor strategy to mitigate supply chain risks, ensuring continuity and resiliency of our infrastructure deployment.</p><p>Cloudflare is dedicated to constantly improving our server fleet, empowering businesses worldwide with enhanced performance, efficiency, and security.</p>
    <div>
      <h2>Gen 12 Servers </h2>
      <a href="#gen-12-servers">
        
      </a>
    </div>
    <p>Let's take a closer look at our Gen 12 server. The server is powered by a 4th generation AMD EPYC Processor, paired with 384 GB of DDR5 RAM, 16 TB of NVMe storage, a dual-port 25 GbE NIC, and two 800 watt power supply units.</p>
<div><table><thead>
  <tr>
    <th><span>Generation</span></th>
    <th><span>Gen 12 Compute</span></th>
    <th><span>Previous Gen 11 Compute</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Form Factor</span></td>
    <td><span>2U1N - Single socket</span></td>
    <td><span>1U1N - Single socket</span></td>
  </tr>
  <tr>
    <td><span>Processor</span></td>
    <td><span>AMD EPYC 9684X Genoa-X 96-Core Processor</span></td>
    <td><span>AMD EPYC 7713 Milan 64-Core Processor</span></td>
  </tr>
  <tr>
    <td><span>Memory</span></td>
    <td><span>384GB of DDR5-4800</span><br /><span>x12 memory channel</span></td>
    <td><span>384GB of DDR4-3200</span><br /><span>x8 memory channel</span></td>
  </tr>
  <tr>
    <td><span>Storage</span></td>
    <td><span>x2 E1.S NVMe</span><br /><span>Samsung PM9A3 7.68TB / Micron 7450 Pro 7.68TB</span></td>
    <td><span>x2 M.2 NVMe</span><br /><span>2x Samsung PM9A3 x 1.92TB</span></td>
  </tr>
  <tr>
    <td><span>Network</span></td>
    <td><span>Dual 25 GbE OCP 3.0</span><br /><span>Intel Ethernet Network Adapter E810-XXVDA2 / NVIDIA Mellanox ConnectX-6 Lx</span></td>
    <td><span>Dual 25 GbE OCP 2.0</span><br /><span>Mellanox ConnectX-4 dual-port 25G</span></td>
  </tr>
  <tr>
    <td><span>System Management</span></td>
    <td><span>DC-SCM 2.0</span><br /><span>ASPEED AST2600 (BMC) + AST1060 (HRoT)</span></td>
    <td><span>ASPEED AST2500 (BMC)</span></td>
  </tr>
  <tr>
    <td><span>Power Supply</span></td>
    <td><span>800W - Titanium Grade</span></td>
    <td><span>650W - Titanium Grade</span></td>
  </tr>
</tbody></table></div>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ywinOgSFpevEcQSZLhESv/b61d70a1504b4d873d0bbf2e83221bf6/BLOG-2116_2.png" />
          </figure><p><sup><i>Cloudflare Gen 12 server</i></sup></p>
    <div>
      <h3>CPU</h3>
      <a href="#cpu">
        
      </a>
    </div>
    <p>During the design phase, we conducted an extensive survey of the CPU landscape; the available options offered valuable choices as we considered how to shape the future of Cloudflare's server technology to match the needs of our customers. We evaluated many candidates in the lab, and short-listed three standout CPU candidates from the 4th generation AMD EPYC Processor lineup: Genoa 9654, Bergamo 9754, and Genoa-X 9684X for production evaluation. The table below summarizes the differences in <a href="https://www.amd.com/content/dam/amd/en/documents/products/epyc/epyc-9004-series-processors-data-sheet.pdf"><u>specifications</u></a> of the short-listed candidates for Gen 12 servers against the AMD EPYC 7713 used in our Gen 11 servers. Notably, all three candidates offer a significant increase in core count and a marked increase in all-core boost clock frequency.</p>
<div><table><thead>
  <tr>
    <th><span>CPU Model</span></th>
    <th><a href="https://www.amd.com/en/products/specifications/server-processor.html"><span>AMD EPYC 7713</span></a></th>
    <th><a href="https://www.amd.com/en/products/specifications/server-processor.html"><span>AMD EPYC 9654</span></a></th>
    <th><a href="https://www.amd.com/en/products/specifications/server-processor.html"><span>AMD EPYC 9754</span></a></th>
    <th><a href="https://www.amd.com/en/products/specifications/server-processor.html"><span>AMD EPYC 9684X</span></a></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>Series</span></td>
    <td><span>Milan</span></td>
    <td><span>Genoa</span></td>
    <td><span>Bergamo</span></td>
    <td><span>Genoa-X</span></td>
  </tr>
  <tr>
    <td><span># of CPU Cores</span></td>
    <td><span>64</span></td>
    <td><span>96</span></td>
    <td><span>128</span></td>
    <td><span>96</span></td>
  </tr>
  <tr>
    <td><span># of Threads</span></td>
    <td><span>128</span></td>
    <td><span>192</span></td>
    <td><span>256</span></td>
    <td><span>192</span></td>
  </tr>
  <tr>
    <td><span>Base Clock</span></td>
    <td><span>2.0 GHz</span></td>
    <td><span>2.4 GHz</span></td>
    <td><span>2.25 GHz</span></td>
    <td><span>2.4 GHz</span></td>
  </tr>
  <tr>
    <td><span>Max Boost Clock</span></td>
    <td><span>3.67 GHz</span></td>
    <td><span>3.7 GHz</span></td>
    <td><span>3.1 GHz</span></td>
    <td><span>3.7 GHz</span></td>
  </tr>
  <tr>
    <td><span>All Core Boost Clock</span></td>
    <td><span>2.7 GHz *</span></td>
    <td><span>3.55 GHz</span></td>
    <td><span>3.1 GHz</span></td>
    <td><span>3.42 GHz</span></td>
  </tr>
  <tr>
    <td><span>Total L3 Cache</span></td>
    <td><span>256 MB</span></td>
    <td><span>384 MB</span></td>
    <td><span>256 MB</span></td>
    <td><span>1152 MB</span></td>
  </tr>
  <tr>
    <td><span>L3 cache per core</span></td>
    <td><span>4MB / core</span></td>
    <td><span>4MB / core</span></td>
    <td><span>2MB / core</span></td>
    <td><span>12MB / core</span></td>
  </tr>
  <tr>
    <td><span>Maximum configurable TDP</span></td>
    <td><span>240W</span></td>
    <td><span>400W</span></td>
    <td><span>400W</span></td>
    <td><span>400W</span></td>
  </tr>
</tbody></table></div><p><sub>*Note: The AMD EPYC 7713 all-core boost clock frequency of 2.7 GHz is not an official specification of the CPU but is based on data collected from the Cloudflare production fleet.</sub></p><p>During production evaluation, the configurations of all three CPUs were optimized to the best of our knowledge, including thermal design power (TDP) configured to 400W for maximum performance. The servers are set up to run the same processes and services as any other server we have in production, which makes for a great side-by-side comparison.</p>
<div><table><thead>
  <tr>
    <th></th>
    <th><span>Milan 7713</span></th>
    <th><span>Genoa 9654</span></th>
    <th><span>Bergamo 9754</span></th>
    <th><span>Genoa-X 9684X</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Production performance (requests per second) multiplier</span></td>
    <td><span>1x</span></td>
    <td><span>2x</span></td>
    <td><span>2.15x</span></td>
    <td><span>2.45x</span></td>
  </tr>
  <tr>
    <td><span>Production efficiency (requests per second per watt) multiplier</span></td>
    <td><span>1x</span></td>
    <td><span>1.33x</span></td>
    <td><span>1.38x</span></td>
    <td><span>1.63x</span></td>
  </tr>
</tbody></table></div>
    <div>
      <h4>AMD EPYC Genoa-X in Cloudflare Gen 12 server</h4>
      <a href="#amd-epyc-genoa-x-in-cloudflare-gen-12-server">
        
      </a>
    </div>
    <p>Each of these CPUs outperforms the previous generation of processors by at least 2x. AMD EPYC 9684X Genoa-X with 3D V-cache technology gave us the greatest performance improvement, at 2.45x, when compared against our Gen 11 servers with AMD EPYC 7713 Milan.</p><p>Comparing the performance between Genoa-X 9684X and Genoa 9654, we see a ~22.5% performance delta. The primary difference between the two CPUs is the amount of L3 cache available on the CPU. Genoa-X 9684X has 1152 MB of L3 cache, three times that of the Genoa 9654 with its 384 MB. Cloudflare workloads benefit from having more low-level cache accessible, avoiding the much larger latency penalty associated with fetching data from memory.</p><p>Genoa-X 9684X delivered this ~22.5% performance improvement while consuming the same 400W of power as Genoa 9654. The 3x larger L3 cache does consume additional power, but only at the cost of 3% of the highest achievable all-core boost frequency on Genoa-X 9684X, a favorable trade-off for Cloudflare workloads.</p><p>More importantly, the Genoa-X 9684X CPU delivered a 145% performance improvement with only a 50% system power increase, offering a 63% power efficiency improvement that will help drive down operational expenditure tremendously. It is important to note that even though a big portion of the power efficiency is due to the CPU, it needs to be paired with optimal thermal-mechanical design to realize the full benefit. Late last year, <a href="https://blog.cloudflare.com/cloudflare-gen-12-server-bigger-better-cooler-in-a-2u1n-form-factor/"><u>we made the thermal-mechanical design choice to double the height of the server chassis to optimize rack density and cooling efficiency across our global data centers</u></a>. We estimated that moving from 1U to 2U would reduce fan power by 150W, which would decrease system power from 750 watts to 600 watts. Guess what? We were right: a Gen 12 server consumes 600 watts per system at a typical ambient temperature of 25°C.</p><p>While high performance often comes at a higher price, the AMD EPYC 9684X fortunately offers an excellent balance between cost and capability. A server designed with this CPU provides top-tier performance without necessitating a huge financial outlay, resulting in a good Total Cost of Ownership improvement for Cloudflare.</p>
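The efficiency numbers above follow directly from the performance and power multipliers. As a quick arithmetic check (a sketch using the rounded figures from the tables, and assuming the ~400 W Gen 11 baseline implied by the 50% power increase to 600 W):

```python
# Gen 12 (Genoa-X 9684X) vs. Gen 11 (Milan 7713), figures from this post
perf_multiplier = 2.45    # production requests per second multiplier
power_multiplier = 1.50   # ~600 W Gen 12 system vs. ~400 W Gen 11 baseline

efficiency_gain = perf_multiplier / power_multiplier
print(f"{efficiency_gain:.2f}x requests per second per watt")       # 1.63x
print(f"performance +{perf_multiplier - 1:.0%}, "
      f"efficiency +{efficiency_gain - 1:.0%}")                     # +145%, +63%
```

The 145% performance and 63% efficiency improvements quoted above are consistent with the 2.45x and 1.63x multipliers in the production comparison table.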
    <div>
      <h3>Memory</h3>
      <a href="#memory">
        
      </a>
    </div>
    <p>The AMD Genoa-X CPU supports twelve memory channels of DDR5 RAM at up to 4800 megatransfers per second (MT/s), for a per-socket memory bandwidth of 460.8 GB/s. The twelve channels are fully utilized with 32 GB ECC 2Rx8 DDR5 RDIMMs in a one-DIMM-per-channel configuration, for a combined total memory capacity of 384 GB. </p><p>Choosing the optimal memory capacity is a balancing act, as maintaining an optimal memory-to-core ratio is important to make sure CPU capacity or memory capacity is not wasted. Some may remember that our Gen 11 servers with 64-core AMD EPYC 7713 CPUs are also configured with 384 GB of memory, which is about 6 GB per core. So why did we choose to configure our Gen 12 servers with 384 GB of memory when the core count is growing to 96 cores? Great question! A lot of memory optimization work has happened since we introduced Gen 11, including some that we blogged about, like <a href="https://blog.cloudflare.com/scalable-machine-learning-at-cloudflare/"><u>Bot Management code optimization</u></a> and <a href="https://blog.cloudflare.com/how-we-built-pingora-the-proxy-that-connects-cloudflare-to-the-internet/"><u>our transition to highly efficient Pingora</u></a>. In addition, each service has a memory allocation that is sized for optimal performance. The per-service memory allocation is programmed and monitored using Linux control group resource management features. When sizing memory capacity for Gen 12, we consulted with the team that monitors resource allocation and surveyed memory utilization metrics collected from our fleet. The result of the analysis is that the optimal memory-to-core ratio is 4 GB per CPU core, or 384 GB total memory capacity. This configuration is validated in production. 
We chose dual rank memory modules over single rank memory modules because they have higher memory throughput, which improves server performance (read more about <a href="https://blog.cloudflare.com/ddr4-memory-organization-and-how-it-affects-memory-bandwidth/"><u>memory module organization and its effect on memory bandwidth</u></a>). </p><p>The table below shows the result of running the <a href="https://www.intel.com/content/www/us/en/developer/articles/tool/intelr-memory-latency-checker.html"><u>Intel Memory Latency Checker (MLC)</u></a> tool to measure peak memory bandwidth for the system and to compare memory throughput between 12 channels of dual-rank (2Rx8) 32 GB DIMM and 12 channels of single rank (1Rx4) 32 GB DIMM. Dual rank DIMMs have slightly higher (1.8%) read memory bandwidth, but noticeably higher write bandwidth. As write ratios increased from 25% to 50%, the memory throughput delta increased by 10%.</p>
<div><table><thead>
  <tr>
    <th><span>Benchmark</span></th>
    <th><span>Dual rank advantage over single rank</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Intel MLC ALL Reads</span></td>
    <td><span>101.8%</span></td>
  </tr>
  <tr>
    <td><span>Intel MLC 3:1 Reads-Writes</span></td>
    <td><span>107.7%</span></td>
  </tr>
  <tr>
    <td><span>Intel MLC 2:1 Reads-Writes</span></td>
    <td><span>112.9%</span></td>
  </tr>
  <tr>
    <td><span>Intel MLC 1:1 Reads-Writes</span></td>
    <td><span>117.8%</span></td>
  </tr>
  <tr>
    <td><span>Intel MLC Stream-triad like</span></td>
    <td><span>108.6%</span></td>
  </tr>
</tbody></table></div><p>The table below shows the result of running the <a href="https://www.amd.com/en/developer/zen-software-studio/applications/spack/stream-benchmark.html"><u>AMD STREAM benchmark</u></a> to measure sustainable main memory bandwidth in MB/s and the corresponding computation rate for simple vector kernels. In all 4 types of vector kernels, dual rank DIMMs provide a noticeable advantage over single rank DIMMs.</p>
<div><table><thead>
  <tr>
    <th><span>Benchmark</span></th>
    <th><span>Dual rank advantage over single rank</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Stream Copy</span></td>
    <td><span>115.44%</span></td>
  </tr>
  <tr>
    <td><span>Stream Scale</span></td>
    <td><span>111.22%</span></td>
  </tr>
  <tr>
    <td><span>Stream Add</span></td>
    <td><span>109.06%</span></td>
  </tr>
  <tr>
    <td><span>Stream Triad</span></td>
    <td><span>107.70%</span></td>
  </tr>
</tbody></table></div>
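The STREAM kernels in the table above are simple enough to sketch. Below is an illustrative Python/NumPy version of the triad kernel (a[i] = b[i] + q*c[i]); the array size and iteration count are arbitrary choices for illustration, not Cloudflare's benchmark configuration, and a NumPy sketch will not reach the bandwidth a tuned C benchmark measures.

```python
import time
import numpy as np

def stream_triad_mb_per_s(n: int = 10_000_000, iters: int = 5) -> float:
    """Estimate sustained memory bandwidth (MB/s) with a STREAM-triad-like
    kernel: a = b + q * c. Each pass reads b and c and writes a, so it
    moves roughly 3 * n * 8 bytes of float64 data."""
    b = np.random.rand(n)
    c = np.random.rand(n)
    a = np.empty_like(b)
    q = 3.0
    best = 0.0
    for _ in range(iters):
        start = time.perf_counter()
        np.multiply(c, q, out=a)  # a = q * c, written in place
        np.add(a, b, out=a)       # a = b + q * c
        elapsed = time.perf_counter() - start
        best = max(best, 3 * n * 8 / elapsed / 1e6)
    return best  # best-of-N, as STREAM reports
```

Comparing this number across two otherwise identical machines (for example, single-rank versus dual-rank DIMM configurations) gives a rough analogue of the ratios in the tables above.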
    <div>
      <h3>Storage</h3>
      <a href="#storage">
        
      </a>
    </div>
    <p>Cloudflare’s Gen X and Gen 11 servers support <a href="https://en.wikipedia.org/wiki/M.2"><u>M.2</u></a> form factor drives. We liked the M.2 form factor mainly because it was compact. The M.2 specification was introduced in 2012, but today the connector system is dated, and the industry has concerns about its ability to maintain signal integrity at the high-speed signaling rates specified by the <a href="https://www.xda-developers.com/pcie-5/"><u>PCIe 5.0</u></a> and <a href="https://pcisig.com/pci-express-6.0-specification"><u>PCIe 6.0</u></a> specifications. The 8.25W thermal limit of the M.2 form factor also limits the number of flash dies that can be fitted, which caps the maximum supported capacity per drive. To address these concerns, the industry has introduced the <a href="https://americas.kioxia.com/content/dam/kioxia/en-us/business/ssd/data-center-ssd/asset/KIOXIA_Meta_Microsoft_EDSFF_E1_S_Intro_White_Paper.pdf"><u>E1.S</u></a> specification and is transitioning from the M.2 form factor to the E1.S form factor. </p><p>In Gen 12, we are making the change to the <a href="https://www.snia.org/forums/cmsi/knowledge/formfactors#EDSFF"><u>EDSFF</u></a> E1 form factor, more specifically the E1.S 15mm. Though still compact, E1.S 15mm provides more space to fit additional flash dies, supporting larger capacities per drive. The form factor also has a better cooling design, supporting more than 25W of sustained power.</p><p>While the AMD Genoa-X CPU supports 128 PCIe 5.0 lanes, we continue to use NVMe devices with PCIe Gen 4.0 x4 lanes, as PCIe Gen 4.0 throughput is sufficient to meet drive bandwidth requirements and keep server design costs optimal. The server is equipped with two 8 TB NVMe drives for a total of 16 TB of available storage. We opted for two 8 TB drives instead of four 4 TB drives because the dual 8 TB configuration already provides sufficient I/O bandwidth for all Cloudflare workloads that run on each server.</p>
<div><table><thead>
  <tr>
    <th><span>Specification</span></th>
    <th><span>Value</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Sequential Read (MB/s)</span></td>
    <td><span>6,700</span></td>
  </tr>
  <tr>
    <td><span>Sequential Write (MB/s)</span></td>
    <td><span>4,000</span></td>
  </tr>
  <tr>
    <td><span>Random Read IOPS</span></td>
    <td><span>1,000,000</span></td>
  </tr>
  <tr>
    <td><span>Random Write IOPS</span></td>
    <td><span>200,000</span></td>
  </tr>
  <tr>
    <td><span>Endurance</span></td>
    <td><span>1 DWPD</span></td>
  </tr>
  <tr>
    <td><span>PCIe Gen 4 x4 lane throughput (MB/s)</span></td>
    <td><span>7,880</span></td>
  </tr>
</tbody></table></div><p><sup><i>Storage device performance specification</i></sup></p>
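The 7,880 MB/s PCIe figure in the table can be sanity-checked from first principles: PCIe 4.0 signals at 16 GT/s per lane and, like PCIe 3.0, uses 128b/130b line encoding. A quick sketch of the arithmetic:

```python
def pcie_link_mb_per_s(gt_per_s: float, lanes: int) -> float:
    """Raw PCIe link throughput in MB/s (1 MB = 1e6 bytes), before
    transaction-layer protocol overhead. PCIe 3.0 and later use
    128b/130b line encoding."""
    encoding_efficiency = 128 / 130
    bits_per_s = gt_per_s * 1e9 * encoding_efficiency * lanes
    return bits_per_s / 8 / 1e6

gen4_x4 = pcie_link_mb_per_s(16, 4)  # ~7,877 MB/s, matching the table
headroom = gen4_x4 - 6_700           # vs. the drive's 6,700 MB/s sequential read
```

This is why a Gen 4.0 x4 link comfortably covers the drive's rated 6,700 MB/s sequential read.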
    <div>
      <h3>Network</h3>
      <a href="#network">
        
      </a>
    </div>
    <p>Cloudflare servers and top-of-rack (ToR) network equipment operate at <a href="https://en.wikipedia.org/wiki/25_Gigabit_Ethernet"><u>25 GbE</u></a> speeds. In Gen 12, we utilized a <a href="https://www.opencompute.org/wiki/Server/DC-MHS"><u>DC-MHS</u></a> motherboard-inspired design, and upgraded from an <a href="https://drive.google.com/file/d/1VGAtABAKU9fq3KfClYhFOgGFN3oe63Uw/view?usp=sharing"><u>OCP 2.0 form factor</u></a> to an <a href="https://drive.google.com/file/d/1U3oEGiSWfupG4SnIdPuJ_8Nte2lJRqTN/view?usp=sharing"><u>OCP 3.0 form factor</u></a>, which provides tool-less serviceability of the NIC. The OCP 3.0 form factor also occupies less space in the 2U server compared to PCIe-attached NICs, which improves airflow and frees up space for other application-specific PCIe cards, such as GPUs.</p><p>Cloudflare has been using the Mellanox CX4-Lx EN dual port 25 GbE NIC since our <a href="https://blog.cloudflare.com/a-tour-inside-cloudflares-g9-servers/"><u>Gen 9 servers in 2018</u></a>. Even though the NIC has served us well over the years, we were single-sourced. During the pandemic, we were faced with supply constraints and extremely long lead times. The team scrambled to qualify the Broadcom M225P dual port 25 GbE NIC as our second-sourced NIC in 2022, ensuring we could continue to turn up servers to serve customer demand. With the lessons learned from single-sourcing the Gen 11 NIC, we are now dual-sourcing and have chosen the Intel Ethernet Network Adapter E810 and NVIDIA Mellanox ConnectX-6 Lx to support Gen 12. These two NICs are compliant with the <a href="https://www.opencompute.org/wiki/Server/NIC"><u>OCP 3.0 specification</u></a> and offer more MSI-X queues that can then be mapped to the increased core count on the AMD EPYC 9684X. The Intel Ethernet Network Adapter comes with an additional advantage, offering full Generic Segmentation Offload (GSO) support, including for VLAN-tagged encapsulated traffic, whereas many vendors today either support only <a href="https://netdevconf.info/1.2/papers/LCO-GSO-Partial-TSO-MangleID.pdf"><u>Partial GSO</u></a> or do not support it at all. With full GSO support, the kernel spends noticeably less time in softirq segmenting packets, and servers with Intel E810 NICs process approximately 2% more requests per second.</p>
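On Linux, offload support like GSO can be inspected with `ethtool -k <iface>`. As a small illustration, here is a parser for that style of output; the sample text below is hypothetical, not captured from a Gen 12 server.

```python
def parse_offloads(ethtool_output: str) -> dict:
    """Parse `ethtool -k` style output into {feature_name: enabled}.
    Lines look like 'generic-segmentation-offload: on' and may carry a
    '[fixed]' suffix when the driver cannot change the setting."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, state = line.partition(":")
        state = state.strip()
        if sep and (state.startswith("on") or state.startswith("off")):
            features[name.strip()] = state.startswith("on")
    return features

# Hypothetical sample output for illustration:
sample = """\
tcp-segmentation-offload: on
generic-segmentation-offload: on
tx-vlan-offload: on
rx-gro-hw: off [fixed]
"""
offloads = parse_offloads(sample)
```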
    <div>
      <h3>Improved security with DC-SCM: Project Argus</h3>
      <a href="#improved-security-with-dc-scm-project-argus">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6ur3YccqqXckIL6oWKd6Lq/5352252ff8e5c1fb15eb02d1572a0689/BLOG-2116_3.png" />
          </figure><p><sup><i>DC-SCM in Gen 12 server (Project Argus)</i></sup></p><p>Gen 12 servers are integrated with <a href="https://blog.cloudflare.com/introducing-the-project-argus-datacenter-ready-secure-control-module-design-specification/"><u>Project Argus</u></a>, one of the industry's first implementations of <a href="https://drive.google.com/file/d/13BxuseSrKo647hjIXjp087ei8l5QQVb0/view"><u>Data Center Secure Control Module 2.0 (DC-SCM 2.0)</u></a>. DC-SCM 2.0 decouples server management and security functions from the motherboard. The baseboard management controller (BMC), hardware root of trust (HRoT), trusted platform module (TPM), and dual BMC/BIOS flash chips are all installed on the DC-SCM. </p><p>On our Gen X and Gen 11 servers, Cloudflare moved our secure boot trust anchor from the system Basic Input/Output System (BIOS) or the Unified Extensible Firmware Interface (UEFI) firmware to hardware-rooted boot integrity — <a href="https://blog.cloudflare.com/anchoring-trust-a-hardware-secure-boot-story/"><u>AMD’s implementation of Platform Secure Boot (PSB)</u></a> or <a href="https://blog.cloudflare.com/armed-to-boot/"><u>Ampere’s implementation of Single Domain Secure Boot</u></a>. These solutions helped secure Cloudflare infrastructure from BIOS/UEFI firmware attacks. However, we remained vulnerable to out-of-band attacks that compromise the BMC firmware. The BMC is a microcontroller that provides out-of-band monitoring and management capabilities for the system. If it is compromised, attackers can, for example, read the processor console logs accessible to the BMC and control server power states. On Gen 12, the HRoT on the DC-SCM serves as the trust store for cryptographic keys and is responsible for authenticating both the BIOS/UEFI firmware (independent of CPU vendor) and the BMC firmware during the secure boot process.</p><p>In addition, the DC-SCM carries additional flash storage devices that hold backup BIOS/UEFI and BMC firmware images, allowing rapid recovery when corrupted or malicious firmware is programmed, and providing resilience against flash chip failure due to aging.</p><p>These updates make our Gen 12 server more secure and more resilient to firmware attacks.</p>
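Conceptually, the HRoT's role at boot is to measure each firmware image against a trusted manifest before releasing the component from reset, falling back to the backup flash copy on a mismatch. The sketch below illustrates that flow with plain SHA-256 digests; a real HRoT verifies asymmetric signatures in dedicated hardware, so treat this as an illustration of the recovery logic, not of DC-SCM internals.

```python
import hashlib

def verifies(image: bytes, expected_sha256: str) -> bool:
    """True if the firmware image hashes to the manifest's digest."""
    return hashlib.sha256(image).hexdigest() == expected_sha256

def release_from_reset(primary: bytes, backup: bytes, expected: str) -> str:
    """Boot the primary image if it verifies; otherwise recover from the
    backup flash chip; otherwise keep the component held in reset."""
    if verifies(primary, expected):
        return "boot-primary"
    if verifies(backup, expected):
        return "boot-backup"
    return "held-in-reset"

# Illustrative manifest entry for a made-up BMC image:
good_image = b"bmc-firmware-v1"
manifest_digest = hashlib.sha256(good_image).hexdigest()
```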
    <div>
      <h3>Power</h3>
      <a href="#power">
        
      </a>
    </div>
    <p>A Gen 12 server consumes 600 watts at a typical ambient temperature of 25°C. Even though this is a 50% increase from the 400 watts consumed by the Gen 11 server, as mentioned above in the CPU section, this is a relatively small price to pay for a 145% increase in performance. We’ve paired the server with dual 800W common redundant power supplies (CRPS) with 80 PLUS Titanium grade efficiency. Both power supply units (PSUs) operate actively with distributed power and current. The units are hot-pluggable, allowing the server to operate with redundancy and maximize uptime.</p><p><a href="https://www.clearesult.com/80plus/program-details"><u>80 PLUS</u></a> is a PSU efficiency certification program. A Titanium grade PSU is 2% more efficient than a Platinum grade PSU across the typical operating load range of 25% to 50%. 2% may not sound like a lot, but considering the size of Cloudflare’s fleet of servers deployed worldwide, a 2% saving over the lifetime of all Gen 12 deployments is a reduction of more than 7 GWh, <a href="https://www.epa.gov/energy/greenhouse-gas-equivalencies-calculator#results"><u>equivalent to carbon sequestered by more than 3400 acres of U.S. forests in one year</u></a>. This upgrade also means our Gen 12 server complies with <a href="https://www.unicomengineering.com/blog/eu-lot-9-update-the-coming-server-power-migration/"><u>EU Lot9 requirements</u></a> and can be deployed in the EU region.</p>
<div><table><thead>
  <tr>
    <th><span>80 PLUS certification</span></th>
    <th><span>10%</span></th>
    <th><span>20%</span></th>
    <th><span>50%</span></th>
    <th><span>100%</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>80 PLUS Platinum</span></td>
    <td><span>-</span></td>
    <td><span>92%</span></td>
    <td><span>94%</span></td>
    <td><span>90%</span></td>
  </tr>
  <tr>
    <td><span>80 PLUS Titanium</span></td>
    <td><span>90%</span></td>
    <td><span>94%</span></td>
    <td><span>96%</span></td>
    <td><span>91%</span></td>
  </tr>
</tbody></table></div>
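As a rough sanity check on the 7 GWh figure, the saving is fleet size × average draw × efficiency delta × deployment lifetime. The fleet size and lifetime below are hypothetical round numbers chosen purely for illustration (the post does not publish them):

```python
def psu_savings_gwh(servers: int, avg_draw_w: float,
                    efficiency_delta: float, years: float) -> float:
    """Energy saved (GWh) from a PSU efficiency improvement across a fleet."""
    hours = years * 365 * 24
    return servers * avg_draw_w * efficiency_delta * hours / 1e9

# Hypothetical: 15,000 servers at 600 W average draw, a 2% efficiency
# improvement, over a 5-year deployment lifetime -> roughly 7.9 GWh.
savings = psu_savings_gwh(15_000, 600, 0.02, 5)
```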
    <div>
      <h3>Drop-in GPU support</h3>
      <a href="#drop-in-gpu-support">
        
      </a>
    </div>
    <p>Demand for machine learning and AI workloads exploded in 2023, and Cloudflare <a href="https://blog.cloudflare.com/workers-ai/"><u>introduced Workers AI</u></a> to serve the needs of our customers. Cloudflare retrofitted or deployed GPUs worldwide in a portion of our Gen 11 server fleet to support the growth of Workers AI. Our Gen 12 server is also designed to accommodate the addition of more powerful GPUs. This gives Cloudflare the flexibility to support Workers AI in all regions of the world, and to strategically place GPUs in regions to reduce inference latency for our customers. With this design, the server can run Cloudflare’s full software stack. During times when GPUs see lower utilization, the server continues to serve general web requests and remains productive.</p><p>The motherboard’s electrical design supports up to two PCIe add-in cards, and the power distribution board is sized to supply an additional 400W of power. The mechanics are sized to support either a single FHFL (full height, full length) double width GPU PCIe card, or two FHFL single width GPU PCIe cards. The thermal solution, including component placement, fans, and air duct design, is sized to support adding GPUs with TDP up to 400W.</p>
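The GPU envelope described above reduces to a small feasibility check: at most two add-in cards, a combined width no greater than one double-width slot, and no more than 400 W of additional power and thermal headroom. A toy version of that check (not Cloudflare's actual qualification logic):

```python
def gpu_config_fits(card_tdps_w: list, card_widths: list,
                    power_budget_w: float = 400.0,
                    max_cards: int = 2, max_total_width: int = 2) -> bool:
    """Check a candidate GPU loadout against the Gen 12 envelope:
    card count, combined slot width, and added power draw."""
    return (len(card_tdps_w) <= max_cards
            and len(card_tdps_w) == len(card_widths)
            and sum(card_widths) <= max_total_width
            and sum(card_tdps_w) <= power_budget_w)
```

For example, one double-width 350 W card fits, as do two single-width 150 W cards, while two double-width cards exceed both the width and the power budget.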
    <div>
      <h3>Looking to the future</h3>
      <a href="#looking-to-the-future">
        
      </a>
    </div>
    <p>Gen 12 Servers are currently deployed and live in multiple Cloudflare data centers worldwide, and already process millions of requests per second. Cloudflare’s EPYC journey has not ended — the 5th-gen AMD EPYC CPUs (code name “Turin”) are already available for testing, and we are very excited to start the architecture planning and design discussion for the Gen 13 server. <a href="https://www.cloudflare.com/careers/jobs/"><u>Come join us</u></a> at Cloudflare to help build a better Internet!</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[EPYC]]></category>
            <category><![CDATA[AMD]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Hardware]]></category>
            <guid isPermaLink="false">sdvPBDBwhcEcrODVeOE7A</guid>
            <dc:creator>JQ Lau</dc:creator>
            <dc:creator>Ma Xiong</dc:creator>
            <dc:creator>Syona Sarma</dc:creator>
        </item>
        <item>
            <title><![CDATA[Removing uncertainty through "what-if" capacity planning]]></title>
            <link>https://blog.cloudflare.com/scenario-planner/</link>
            <pubDate>Fri, 20 Sep 2024 14:01:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare’s Capacity Planning team discusses planning for “what-if” type scenarios, and how they’ve introduced a new “Scenario Planner” system to quickly model hypothetical future changes ]]></description>
            <content:encoded><![CDATA[ <p>Infrastructure planning for a network that serves more than 81 million requests per second at peak, and which is globally distributed across <a href="https://www.cloudflare.com/network/"><u>more than 330 cities in 120+ countries</u></a>, is complex. The capacity planning team at Cloudflare ensures there is enough capacity in place all over the world so that our customers have one less thing to worry about: our infrastructure should just work. Through our processes, the team puts careful consideration into “what-ifs”. What if something unexpected happens and <a href="https://blog.cloudflare.com/post-mortem-on-cloudflare-control-plane-and-analytics-outage/"><u>one of our data centers fails</u></a>? What if one of our largest customers triples or quadruples its request count? Across a gamut of scenarios like these, the team works to understand where traffic will be served from and how the Cloudflare customer experience may change.</p><p>This blog post gives a look behind the curtain of how these scenarios are modeled at Cloudflare, and why it's so critical for our customers.</p>
    <div>
      <h2>Scenario planning and our customers</h2>
      <a href="#scenario-planning-and-our-customers">
        
      </a>
    </div>
    <p>Cloudflare customers rely on the data centers that Cloudflare has deployed all over the world, placing us within 50 ms of approximately 95% of the Internet-connected population globally. But round-trip time to our end users means little if those data centers don’t have the capacity to serve requests. Cloudflare has invested deeply into systems that are working around the clock to optimize the requests flowing through our network because we know that failures happen all the time: the Internet can be a volatile place. See <a href="https://blog.cloudflare.com/backbone2024"><u>our blog post from August 2024</u></a> on how we handle this volatility in real time on our backbone, and our <a href="https://blog.cloudflare.com/meet-traffic-manager"><u>blog post from late 2023</u></a> about how another system, Traffic Manager, actively works in and between data centers, moving traffic to optimize the customer experience around constraints in our data centers. Both of these systems do a fantastic job in real time, but there is still a gap — what about over the long term?  </p><p>Most of the volatility that the above systems are built to manage is resolved within shorter time scales than which we build plans for. (There are, of course, some failures that are <a href="https://itweb.africa/content/O2rQGMAEyPGMd1ea"><u>exceptions</u></a>.) Most scenarios we model still need to take into account the state of our data centers in the future, as well as what actions systems like Traffic Manager will take during those periods.  But before getting into those constraints, it’s important to note how capacity planning measures things: in units of CPU Time, defined as the time that each request takes in the CPU.  
This is done for the same reasons that <a href="https://blog.cloudflare.com/meet-traffic-manager"><u>Traffic Manager</u></a> uses CPU Time, in that it enables the team to 1) use a common unit across different types of customer workloads and 2) speak a common language with other teams and systems (like Traffic Manager). The same reasoning the Traffic Manager team cited <a href="https://blog.cloudflare.com/meet-traffic-manager"><u>in their own blog post</u></a> is equally applicable for capacity planning: </p><blockquote><p><i>…using requests per second as a metric isn’t accurate enough when actually moving traffic. The reason for this is that different customers have different resource costs to our service; a website served mainly from cache with the WAF deactivated is much cheaper CPU wise than a site with all WAF rules enabled and caching disabled. So we record the time that each request takes in the CPU. We can then aggregate the CPU time across each plan to find the CPU time usage per plan. We record the CPU time in ms, and take a per second value, resulting in a unit of milliseconds per second.</i></p></blockquote><p>This is important for customers for the same reason that the Traffic Manager team cited in their blog post as well: we can correlate CPU time to performance, specifically latency.</p><p>Now that we know our unit of measurement is CPU time, we need to set up our models with the new constraints associated with the change that we’re trying to model.  Specifically, there are a subset of constraints that we are particularly interested in because we know that they have the ability to impact our customers by impacting the availability of CPU in a data center.  These are split into two main inputs in our models: Supply and Demand.  We can think of these as “what-if” questions, such as the following examples:</p>
    <div>
      <h3>Demand what-ifs</h3>
      <a href="#demand-what-ifs">
        
      </a>
    </div>
    <ul><li><p>What if a new customer onboards to Cloudflare with a significant volume of requests and/or bytes?  </p></li><li><p>What if an existing customer increased its volume of requests and/or bytes by some multiplier (i.e. 2x, 3x, nx), at peak, for the next three months?</p></li><li><p>What if the growth rate, in number of requests and bytes, of all of our data centers worldwide increased from X to Y two months from now, indefinitely?</p></li><li><p>What if the growth rate, in number of requests and bytes, of data center facility A increased from X to Y one month from now?</p></li><li><p>What if traffic egressing from Cloudflare to a last-mile network shifted from one location (such as Boston) to another (such as New York City) next week?</p></li></ul>
    <div>
      <h3>Supply what-ifs</h3>
      <a href="#supply-what-ifs">
        
      </a>
    </div>
    <ul><li><p>What if data center facility A lost some or all of its available servers two months from now?</p></li><li><p>What if we added X servers to data center facility A today?</p></li><li><p>What if some or all of our connectivity to other ASNs (<a href="https://www.cloudflare.com/network/"><u>12,500 Networks/nearly 300 Tbps</u></a>) failed now?</p></li></ul>
    <div>
      <h3>Output</h3>
      <a href="#output">
        
      </a>
    </div>
    <p>For any one of these, or a combination of them, in our model’s output, we aim to provide answers to the following: </p><ul><li><p>What will the overall capacity picture look like over time? </p></li><li><p>Where will the traffic go? </p></li><li><p>How will this impact our costs?</p></li><li><p>Will we need to deploy additional servers to handle the increased load?</p></li></ul><p>Given these sets of questions and outputs, manually creating a model to answer each of these questions, or a combination of these questions, quickly becomes an operational burden for any team.  This is what led us to launch “Scenario Planner”.</p>
    <div>
      <h2>Scenario Planner</h2>
      <a href="#scenario-planner">
        
      </a>
    </div>
    <p>In August 2024, the infrastructure team finished building “Scenario Planner”, a system that enables anyone at Cloudflare to simulate “what-ifs”. This provides our team the opportunity to quickly model hypothetical changes to our demand and supply metrics across time and in any of Cloudflare’s data centers. The core functionality of the system has to do with the same questions we need to answer in the manual models discussed above.  After we enter the changes we want to model, Scenario Planner converts from units that are commonly associated with each question to our common unit of measurement: CPU Time. These inputs are then used to model the updated capacity across all of our data centers, including how demand may be distributed in cases where capacity constraints may start impacting performance in a particular location.  As we know, if that happens then it triggers Traffic Manager to serve some portion of those requests from a nearby location to minimize impact on customers and user experience.</p>
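The unit conversions described here can be illustrated with a toy model. CPU time is measured in milliseconds per second, so one fully busy core supplies 1,000 ms/s; the core counts and demand figures below are hypothetical, and the real Scenario Planner conversion is considerably richer.

```python
def servers_to_cpu_time_ms_per_s(servers: int, cores_per_server: int,
                                 usable_fraction: float = 1.0) -> float:
    """Convert a server count into CPU-time supply (ms of CPU time per
    second): each fully busy core provides 1000 ms/s."""
    return servers * cores_per_server * usable_fraction * 1000.0

def apply_growth_what_if(demand_ms_per_s: dict, multiplier: float) -> dict:
    """Apply an 'nx' what-if to a customer's per-data-center demand."""
    return {dc: v * multiplier for dc, v in demand_ms_per_s.items()}

# Hypothetical figures: 10 new servers with 96 cores each, and a
# customer tripling traffic in two (made-up) data centers.
supply = servers_to_cpu_time_ms_per_s(servers=10, cores_per_server=96)
demand = apply_growth_what_if({"IAD": 120_000.0, "ORD": 80_000.0}, 3.0)
```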
    <div>
      <h3>Updated demand questions with inputs</h3>
      <a href="#updated-demand-questions-with-inputs">
        
      </a>
    </div>
    <ul><li><p><b>Question:</b> What if a new customer onboards to Cloudflare with a significant volume of requests?  </p></li><li><p><b>Input:</b> The new customer’s expected volume, geographic distribution, and timeframe of requests, converted to a count of virtual CPUs</p></li><li><p><b>Calculation(s)</b>: Scenario Planner converts from server count to CPU Time, and distributes the new demand across the regions selected according to the aggregate distribution of all customer usage.  </p></li></ul><p>
<br />
</p><ul><li><p><b>Question:</b> What if an existing customer increased its volume of requests and/or bytes by some multiplier (i.e. 2x, 3x, nx), at peak, for the next three months?</p></li><li><p><b>Input</b>: Select the customer name, the multiplier, and the timeframe</p></li><li><p><b>Calculation(s)</b>: Scenario Planner already has how the selected customer’s traffic is distributed across all data centers globally, so this involves simply multiplying that value by the multiplier selected by the user</p></li></ul><p>
<br />
</p><ul><li><p><b>Question:</b> What if the growth rate, in number of requests and bytes, of all of our data centers worldwide increased from X to Y two months from now, indefinitely?</p></li><li><p><b>Input:</b> Enter a new global growth rate and timeframe</p></li><li><p><b>Calculation(s)</b>: Scenario Planner distributes this growth across all data centers globally according to their current growth rate.  In other words, the global growth is an aggregation of all individual data center’s growth rates, and to apply a new “Global” growth rate, the system scales up each of the individual data center’s growth rates commensurate with the current distribution of growth.</p></li></ul><p>
<br />
</p><ul><li><p><b>Question:</b> What if the growth rate, in number of requests and bytes, of data center facility A increased from X to Y one month from now?</p></li><li><p><b>Input:</b> Select a data center facility, enter a new growth rate for that data center and the timeframe to apply that change across.</p></li><li><p><b>Calculation(s)</b>: Scenario Planner passes the new growth rate for the data center to the backend simulator, across the timeline specified by the user</p></li></ul><p>
<br />
</p>
    <div>
      <h3>Updated supply questions with inputs</h3>
      <a href="#updated-supply-questions-with-inputs">
        
      </a>
    </div>
    <ul><li><p><b>Question:</b> What if data center facility A lost some or all of its available servers two months from now?</p></li><li><p><b>Input</b>: Select a data center, and enter the number of servers to remove, or select to remove all servers in that location, as well as the timeframe for when those servers will not be available</p></li><li><p><b>Calculation(s)</b>: Scenario Planner converts the server count entered (including all servers in a given location) to CPU Time before passing to the backend</p></li></ul><p>
<br />
</p><ul><li><p><b>Question:</b> What if we added X servers to data center facility A today?</p></li><li><p><b>Input</b>: Select a data center, and enter the number of servers to add, as well as the timeline for when those servers will first go live</p></li><li><p><b>Calculation(s)</b>: Scenario Planner converts the server count entered (including all servers in a given location) to CPU Time before passing to the backend</p></li></ul><p>
<br />
</p><p>We made it simple for internal users to understand the impact of these changes: Scenario Planner outputs the same heatmaps and capacity-status views that everyone at Cloudflare is already familiar with. The system provides two main outputs: a heatmap and an “Expected Failovers” view. Below, we explore what these are, with some examples.</p>
    <div>
      <h3>Heatmap</h3>
      <a href="#heatmap">
        
      </a>
    </div>
    <p>Capacity planning evaluates its success on its ability to predict demand: we generally produce weekly, monthly, and quarterly forecasts covering 12 months to three years of demand, and nearly all of our infrastructure decisions are based on the output of this forecast. Scenario Planner presents the results of those forecasts via a heatmap: it shows our current state, as well as future planned server additions scheduled based on the forecast.</p><p>Here is an example of our heatmap, showing some of our largest data centers in Eastern North America (ENAM). Ashburn is briefly showing as yellow because our capacity planning threshold for adding more server capacity to our data centers is 65% utilization (based on CPU time supply and demand): this gives the Cloudflare teams time to procure additional servers, ship them, install them, and bring them live <i>before</i> customers would be impacted and systems like Traffic Manager would begin triggering. The little cloud icons indicate planned upgrades of varying sizes, scheduled well in advance of forecasted demand to avoid customer performance degradation.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/PXIr1mpsrWPLmWpP9RCPg/5062269f1432a41c8177a7850243ab8c/BLOG-2554_2.png" />
          </figure><p><b></b></p><p>The question Scenario Planner answers then is how this view changes with a hypothetical scenario: What if our Ashburn, Miami, and Atlanta facilities shut down completely?  This is unlikely to happen, but we would expect to see enormous impact on the remaining largest facilities in ENAM. We’ll simulate all three of these failing at the same time, taking them offline indefinitely:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3apNpQAHPs9uY1QTAxOQ6d/6230e0950d3f70b50e17bfb10bca999c/BLOG-2554_3.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6vfWrww4uT8VWtLa0vcDAM/dacf2bf102c8c38672ff08c48bc42542/BLOG-2554_4.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7amWSND6PZ95oVJA5oO0Qc/588c7d7efcccfedca7cec240f830fa01/BLOG-2554_5.png" />
          </figure><p><b></b></p><p>This results in a view of our capacity through the rest of the year in the remaining large data centers in ENAM — capacity is clearly constrained: Traffic Manager will be working hard to mitigate any impact to customer performance if this were to happen. Our capacity view in the heatmap is capped at 75%: this is because Traffic Manager typically engages around this level of CPU utilization. Beyond 75%, Cloudflare customers may begin to experience increased latency, though this is dependent on the product and workload, and is in reality much more dynamic. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Tse9FexuemDHzExPYnY0i/0cdba516c0b33888b7079d435ec02fd6/BLOG-2554_6.png" />
          </figure><p><b></b></p><p>This outcome in the heatmap is not unexpected.  But now we typically get a follow-up question: clearly this traffic won’t fit in just Newark, Chicago, and Toronto, so where do all these requests get served from?  Enter the failover simulator: Capacity Planning has been simulating how Traffic Manager may work in the long term for quite a while, and for Scenario Planner, it was simple to extend this functionality to answer exactly this question.</p><p>There is currently no traffic being moved by Traffic Manager from these data centers, but our simulation shows a significant portion of the Atlanta CPU time being served from our DFW/Dallas data center as well as Newark (bottom pink), and Chicago (orange) through the rest of the year, during this hypothetical failure. With Scenario Planner, Capacity Planning can take this information and simulate multiple failures all over the world to understand the impact to customers, taking action to ensure that customers trusting Cloudflare with their web properties can expect high performance even in instances of major data center failures. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1iv8nxYdIXOJ626UHpUadI/8581d88d740b6369ba5fc8500d9c7d97/Screenshot_2024-09-18_at_10.27.50_PM.png" />
          </figure>
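The spill-over behavior in the chart can be approximated with a greedy toy model: a failed site's demand moves to nearby data centers in preference order until each reaches the 75% engagement threshold mentioned above. This is an illustration only; the sites and numbers are hypothetical, and Traffic Manager's real placement logic is far more dynamic.

```python
def simulate_failover(demand_ms: dict, capacity_ms: dict,
                      failed: set, neighbors: dict,
                      threshold: float = 0.75) -> dict:
    """Greedily redistribute CPU-time demand from failed sites to their
    neighbors, filling each neighbor up to `threshold` of its capacity.
    Returns the resulting load per site, plus any unserved remainder."""
    load = {dc: (0.0 if dc in failed else demand_ms[dc]) for dc in demand_ms}
    for dc in failed:
        remaining = demand_ms[dc]
        for n in neighbors.get(dc, []):
            if n in failed or remaining <= 0:
                continue
            room = threshold * capacity_ms[n] - load[n]
            moved = min(max(room, 0.0), remaining)
            load[n] += moved
            remaining -= moved
        load[dc + " (unserved)"] = remaining
    return load

# Hypothetical example: Atlanta fails, and Dallas then Newark absorb it.
load = simulate_failover(
    demand_ms={"ATL": 100.0, "DFW": 50.0, "EWR": 60.0},
    capacity_ms={"ATL": 200.0, "DFW": 200.0, "EWR": 100.0},
    failed={"ATL"},
    neighbors={"ATL": ["DFW", "EWR"]},
)
```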
    <div>
      <h2>Planning with uncertainty</h2>
      <a href="#planning-with-uncertainty">
        
      </a>
    </div>
    <p>Capacity planning a large global network comes with plenty of uncertainties. Scenario Planner is one example of the work the Capacity Planning team is doing to ensure that the millions of web properties our customers entrust to Cloudflare can expect consistent, top tier performance all over the world.</p><p>The Capacity Planning team is hiring — check out the <a href="https://www.cloudflare.com/careers/"><u>Cloudflare careers page</u></a> and <a href="https://www.cloudflare.com/careers/jobs/?title=capacity"><u>search for open roles on the Capacity Planning team</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Data Center]]></category>
            <category><![CDATA[Edge]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Network Services]]></category>
            <guid isPermaLink="false">13ueCTf2YFQfu3c1KeOlWj</guid>
            <dc:creator>Curt Robords</dc:creator>
        </item>
        <item>
            <title><![CDATA[The backbone behind Cloudflare’s Connectivity Cloud]]></title>
            <link>https://blog.cloudflare.com/backbone2024/</link>
            <pubDate>Tue, 06 Aug 2024 14:00:00 GMT</pubDate>
            <description><![CDATA[ Read through the latest milestones and expansions of Cloudflare's global backbone and how it supports our Connectivity Cloud and our services ]]></description>
            <content:encoded><![CDATA[ <p>The modern use of "cloud" arguably traces its origins to the cloud icon, omnipresent in network diagrams for decades. A cloud was used to represent the vast and intricate infrastructure components required to deliver network or Internet services without going into depth about the underlying complexities. At Cloudflare, we embody this principle by providing critical infrastructure solutions in a user-friendly and easy-to-use way. Our logo, featuring the cloud symbol, reflects our commitment to simplifying the complexities of Internet infrastructure for all our users.</p><p>This blog post provides an update about our infrastructure, focusing on our global backbone in 2024, and highlights its benefits for our customers, our competitive edge in the market, and the impact on our mission of helping build a better Internet. Since the time of our last backbone-related <a href="http://blog.cloudflare.com/cloudflare-backbone-internet-fast-lane">blog post</a> in 2021, we have increased our backbone capacity (Tbps) by more than 500%, unlocking new use cases, as well as reliability and performance benefits for all our customers.</p>
    <div>
      <h3>A snapshot of Cloudflare’s infrastructure</h3>
      <a href="#a-snapshot-of-cloudflares-infrastructure">
        
      </a>
    </div>
    <p>As of July 2024, Cloudflare has data centers in 330 cities across more than 120 countries, each running Cloudflare equipment and services. The goal of delivering Cloudflare products and services everywhere remains consistent, although these data centers vary in the number of servers and amount of computational power.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/38RRu7BaumWFemL23JcFLW/fd1e4aced5095b1e04384984c88e48be/BLOG-2432-2.png" />
          </figure><p></p><p>These data centers are strategically positioned around the world to ensure our presence in all major regions and to help our customers comply with local regulations. Together, they form a programmable smart network, where your traffic is processed at the best possible data center. This programmability allows us to keep sensitive data regional, with our <a href="https://www.cloudflare.com/data-localization/">Data Localization Suite solutions</a>, and within the constraints that our customers impose. Connecting these sites, exchanging data with customers, public clouds, partners, and the broader Internet, is the role of our network, which is managed by our infrastructure engineering and network strategy teams. This network forms the foundation that makes our products lightning fast, ensuring our global reliability, security for every customer request, and helping customers comply with <a href="https://www.cloudflare.com/the-net/building-cyber-resilience/challenges-data-sovereignty/">data sovereignty requirements</a>.</p>
    <div>
      <h3>Traffic exchange methods</h3>
      <a href="#traffic-exchange-methods">
        
      </a>
    </div>
    <p>The Internet is an interconnection of different networks and separate <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">autonomous systems</a> that operate by exchanging data with each other. There are multiple ways to exchange data, but for simplicity, we'll focus on two key methods by which these networks communicate: peering and IP transit. To better understand the benefits of our global backbone, it helps to understand these basic connectivity solutions we use in our network.</p><ol><li><p><b>Peering</b>: The voluntary interconnection of administratively separate Internet networks that allows for traffic exchange between users of each network is known as “<a href="https://www.netnod.se/ix/what-is-peering">peering</a>”. Cloudflare is one of the <a href="https://bgp.he.net/report/exchanges#_participants">most peered networks</a> globally. We have peering agreements with ISPs and other networks in 330 cities and across all major <a href="https://www.cloudflare.com/learning/cdn/glossary/internet-exchange-point-ixp/">Internet Exchanges (IXs)</a>. Interested parties can register to <a href="https://www.cloudflare.com/partners/peering-portal/">peer with us</a> anytime, or directly connect to our network with a link through a <a href="https://developers.cloudflare.com/network-interconnect/pni-and-peering/">private network interconnect (PNI)</a>.</p></li><li><p><b>IP transit</b>: A paid service that allows traffic to cross or "transit" somebody else's network, typically connecting a smaller Internet service provider (ISP) to the larger Internet. Think of it as paying a toll to access a private highway with your car.</p></li></ol><p>The backbone is a dedicated high-capacity optical fiber network that moves traffic between Cloudflare’s global data centers, where we interconnect with other networks using the traffic exchange methods mentioned above. 
It enables data transfers that are more reliable than over the public Internet. For connectivity within a city and for long-distance connections, we manage our own dark fiber or lease wavelengths using Dense Wavelength Division Multiplexing (DWDM). DWDM is a fiber optic technology that enhances network capacity by transmitting multiple data streams simultaneously on different wavelengths of light within the same fiber. It’s like adding lanes to a highway, so that more cars can travel on it at once. We buy and lease these services from our global carrier partners all around the world.</p>
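<p>As a rough, hypothetical sketch of why DWDM multiplies capacity (the channel counts and line rates below are illustrative assumptions, not Cloudflare's actual deployment): total fiber capacity is simply the number of wavelengths times the per-wavelength rate.</p>

```python
# Back-of-the-envelope DWDM capacity math. All figures are hypothetical
# examples; real channel counts and line rates vary by deployment.

def fiber_capacity_gbps(wavelengths: int, rate_per_wavelength_gbps: int) -> int:
    """Capacity of one fiber = number of DWDM channels x per-channel rate."""
    return wavelengths * rate_per_wavelength_gbps

# e.g. 80 wavelengths ("lanes") at 400 Gbps each on a single fiber:
print(fiber_capacity_gbps(80, 400), "Gbps")  # 32000 Gbps, i.e. 32 Tbps
```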
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1RgjDtW5LehGZEYXey4AQH/cfef08965313f67c84a052e0541fc42b/BLOG-2432-3.png" />
          </figure><p></p>
    <div>
      <h3>Backbone operations and benefits</h3>
      <a href="#backbone-operations-and-benefits">
        
      </a>
    </div>
    <p>Operating a global backbone is challenging, which is why many competitors don’t do it. We take on this challenge for two key reasons: traffic routing control and cost-effectiveness.</p><p>With IP transit, we rely on our transit partners to carry traffic from Cloudflare to the ultimate destination network, introducing unnecessary third-party reliance. In contrast, our backbone gives us full control over routing of both internal and external traffic, allowing us to manage it more effectively. This control is crucial because it lets us optimize traffic routes, usually resulting in the lowest latency paths, as previously mentioned. Furthermore, serving large traffic volumes over the backbone is, on average, more cost-effective than IP transit. This is why we are doubling down on backbone capacity in regions such as Frankfurt, London, Amsterdam, Paris, and Marseille, where we see continuous traffic growth and where connectivity solutions are widely available and competitively priced.</p><p>Our backbone serves both internal and external traffic. Internal traffic includes customer traffic using our security or performance products and traffic from Cloudflare's internal systems that shift data between our data centers. <a href="http://blog.cloudflare.com/introducing-regional-tiered-cache">Tiered caching</a>, for example, optimizes our caching delivery by dividing our data centers into a hierarchy of lower tiers and upper tiers. If lower-tier data centers don’t have the content, they request it from the upper tiers. If the upper tiers don’t have it either, they then request it from the origin server. This process reduces origin server requests and improves cache efficiency. Using our backbone to transport the cached content between lower and upper-tier data centers and the origin is often the most cost-effective method, considering the scale of our network. 
<a href="https://www.cloudflare.com/network-services/products/magic-transit/">Magic Transit</a> is another example where we attract traffic, by means of BGP anycast, to the Cloudflare data center closest to the end user and implement our DDoS solution. Our backbone transports the clean traffic to our customer’s data center, which they connect through a <a href="http://blog.cloudflare.com/cloudflare-network-interconnect">Cloudflare Network Interconnect (CNI)</a>.</p><p>External traffic that we carry on our backbone can be traffic from other origin providers like AWS, Oracle, Alibaba, Google Cloud Platform, or Azure, to name a few. The origin responses from these cloud providers are transported through peering points and our backbone to the Cloudflare data center closest to our customer. By leveraging our backbone we have more control over how we backhaul this traffic throughout our network, which results in more reliability and better performance and less dependency on the public Internet.</p><p>This interconnection between public clouds, offices, and the Internet with a controlled layer of performance, security, programmability, and visibility running on our global backbone is our <a href="http://blog.cloudflare.com/welcome-to-connectivity-cloud">Connectivity Cloud</a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1Fk6k5NOgfOM3qpK0z3wb0/2fe9631dbe6b2dfc6b3c3cd0156f293e/Screenshot_2024-08-28_at_3.21.50_PM.png" />
          </figure><p><sub><i>This map is a simplification of our current backbone network and does not show all paths</i></sub></p><p></p>
    <div>
      <h3>Expanding our network</h3>
      <a href="#expanding-our-network">
        
      </a>
    </div>
    <p>As mentioned in the introduction, we have increased our backbone capacity (Tbps) by more than 500% since 2021. With the addition of sub-sea cable capacity to Africa, we achieved a big milestone in 2023 by completing our global backbone ring. It now reaches six continents through terrestrial fiber and subsea cables.</p><p>Building out our backbone within regions where Internet infrastructure is less developed compared to markets like Central Europe or the US has been a key strategy for our latest network expansions. We have a shared goal with regional ISP partners to keep our data flow localized and as close as possible to the end user. Traffic often takes inefficient routes outside the region due to the lack of sufficient local peering and regional infrastructure. This phenomenon, known as traffic tromboning, occurs when data is routed through more cost-effective international routes and existing peering agreements.</p><p>Our regional backbone investments in countries like India or Turkey aim to reduce the need for such inefficient routing. With our own in-region backbone, traffic can be directly routed between in-country Cloudflare data centers, such as from Mumbai to New Delhi to Chennai, reducing latency, increasing reliability, and helping us to provide the same level of service quality as in more developed markets. We can ensure that data stays local, supporting our Data Localization Suite (<a href="https://www.cloudflare.com/data-localization/">DLS</a>), which helps businesses comply with regional data privacy laws by controlling where their data is stored and processed.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4WCNB78y1jHHsid46pBZOo/e950ced1e510cb8caeea0961c43ea8a0/BLOG-2432-5.png" />
          </figure><p></p>
    <div>
      <h3>Improved latency and performance</h3>
      <a href="#improved-latency-and-performance">
        
      </a>
    </div>
    <p>This strategic expansion has not only extended our global reach but has also significantly improved our overall latency. One illustration of this is that since the deployment of our backbone between Lisbon and Johannesburg, we have seen a major performance improvement for users in Johannesburg. Customers benefiting from this improved latency include, for example, a financial institution running its APIs through us for real-time trading, where milliseconds can impact trades, or our <a href="https://www.cloudflare.com/network-services/products/magic-wan/">Magic WAN</a> users, for whom we facilitate site-to-site connectivity between their branch offices.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1o0H8BNLf5ca8BBx38Q5Ee/5b22f7c0ad1c5c49a67bc5149763e81d/BLOG-2432-6.png" />
          </figure><p></p><p>The table above shows an example where we measured the round-trip time (RTT) for an uncached origin fetch, from an end-user in Johannesburg to various origin locations, comparing our backbone and the public Internet. By carrying the origin request over our backbone, as opposed to IP transit or peering, local users in Johannesburg get their content up to 22% faster. By using our own backbone to long-haul the traffic to its final destination, we are in complete control of the path and performance. This improvement in latency varies by location, but consistently demonstrates the superiority of our backbone infrastructure in delivering high-performance connectivity.</p>
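<p>For intuition on how a "22% faster" figure falls out of two RTT measurements, here is a small worked example. The millisecond values below are made up for illustration; they are not the measured values from the table.</p>

```python
# Percent latency improvement of one path over another for the same
# origin fetch. The RTT figures below are hypothetical examples.

def improvement_pct(rtt_internet_ms: float, rtt_backbone_ms: float) -> float:
    """How much faster the backbone path is, relative to the Internet path."""
    return round((rtt_internet_ms - rtt_backbone_ms) / rtt_internet_ms * 100, 1)

# e.g. 250 ms over the public Internet vs 195 ms over the backbone:
print(improvement_pct(250, 195))  # 22.0
```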
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ZEEZJERWQ2UB1sdTjWUtM/f90b11507ab24edbf84e9b4cfb9b1155/BLOG-2432-7.png" />
          </figure><p></p>
    <div>
      <h3>Traffic control</h3>
      <a href="#traffic-control">
        
      </a>
    </div>
    <p>Consider a navigation system using 1) GPS to identify the route and 2) a highway toll pass that is valid until your final destination and allows you to drive straight through toll stations without stopping. Our backbone works quite similarly.</p><p>Our global backbone is built upon two key pillars. The first is BGP (<a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">Border Gateway Protocol</a>), the routing protocol for the Internet, and the second is Segment Routing MPLS (<a href="https://www.cloudflare.com/learning/network-layer/what-is-mpls/">Multiprotocol label switching</a>), a technique for steering traffic across predefined forwarding paths in an IP network. By default, Segment Routing provides end-to-end encapsulation from ingress to egress routers where the intermediate nodes execute no route lookup. Instead, they forward traffic across an end-to-end virtual circuit, or tunnel, called a label-switched path. Once traffic is put on a label-switched path, it cannot detour onto the public Internet and must continue on the predetermined route across Cloudflare’s backbone. This is nothing new, as many networks will even run a “BGP Free Core” where all the route intelligence is carried at the edge of the network, and intermediate nodes only participate in forwarding from ingress to egress.</p><p>While leveraging Segment Routing Traffic Engineering (SR-TE) in our backbone, we can automatically select paths between our data centers that are optimized for latency and performance. Sometimes the “shortest path” in terms of routing protocol cost is not the lowest latency or highest performance path.</p>
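<p>To see why the "shortest path" by routing cost is not always the lowest-latency path, consider the toy path computation below. The topology, node names, and latencies are invented for illustration: the direct single-hop link loses to a two-hop detour with lower total latency, which is exactly the kind of decision traffic engineering lets us encode.</p>

```python
import heapq

# Toy topology: links are bidirectional, values are one-way latency in ms.
# All names and figures are hypothetical.
links = {
    ("AMS", "SIN"): 220,   # direct link: fewest hops, but slow
    ("AMS", "MRS"): 12,
    ("MRS", "SIN"): 150,   # via Marseille: more hops, lower total latency
}

def neighbors(node):
    for (a, b), ms in links.items():
        if a == node:
            yield b, ms
        elif b == node:
            yield a, ms

def lowest_latency_path(src, dst):
    """Dijkstra's algorithm over link latencies instead of hop count."""
    heap = [(0, src, [src])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, ms in neighbors(node):
            if nxt not in seen:
                heapq.heappush(heap, (cost + ms, nxt, path + [nxt]))
    return None

# The one-hop direct path costs 220 ms; the two-hop detour costs 162 ms.
print(lowest_latency_path("AMS", "SIN"))  # (162, ['AMS', 'MRS', 'SIN'])
```

A hop-count metric would pick the direct 220 ms link; weighting by measured latency picks the detour, and segment routing lets us pin traffic onto that explicit path.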
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6QettBytPdJxacwVLVHYFN/de95a8e5a67514e64931fbe4d26967b6/BLOG-2432-8.png" />
          </figure>
    <div>
      <h3>Supercharged: Argo and the global backbone</h3>
      <a href="#supercharged-argo-and-the-global-backbone">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/lp/pg-argo-smart-routing/?utm_source=google&amp;utm_medium=cpc&amp;utm_campaign=ao-fy-pay-gbl_en_native-applications-ge-ge-general-core_paid_apo_argo&amp;utm_content=argo&amp;utm_term=cloudflare+argo&amp;campaignid=71700000092259497&amp;adgroupid=58700007751943324&amp;creativeid=666481290143&amp;&amp;_bt=666481290143&amp;_bk=cloudflare%20argo&amp;_bm=e&amp;_bn=g&amp;_bg=138787490550&amp;_placement=&amp;_target=&amp;_loc=1017825&amp;_dv=c&amp;awsearchcpc=1&amp;gad_source=1&amp;gclid=Cj0KCQjwvb-zBhCmARIsAAfUI2uj2VOkHjvM2qspAfBodOROAH_bG040P6bjvQeEbVwFF1qwdEKLXLkaAllMEALw_wcB&amp;gclsrc=aw.ds">Argo Smart Routing</a> is a service that uses Cloudflare’s portfolio of backbone, transit, and peering connectivity to find the most optimal path between the data center where a user’s request lands and your back-end origin server. Argo may forward a request from one Cloudflare data center to another on the way to an origin if the performance would improve by doing so. <a href="http://blog.cloudflare.com/orpheus-saves-internet-requests-while-maintaining-speed">Orpheus</a> is the counterpart to Argo, and routes around degraded paths for all customer origin requests free of charge. Orpheus is able to analyze network conditions in real-time and actively avoid reachability failures. Customers with Argo enabled get optimal performance for requests from Cloudflare data centers to their origins, while Orpheus provides error self-healing for all customers universally. By mixing our global backbone using Segment Routing as an underlay with <a href="https://www.cloudflare.com/application-services/products/argo-smart-routing/">Argo Smart Routing</a> and Orpheus as our connectivity overlay, we are able to transport critical customer traffic along the most optimized paths that we have available.</p><p>So how exactly does our global backbone fit together with Argo Smart Routing? 
<a href="http://blog.cloudflare.com/argo-and-the-cloudflare-global-private-backbone">Argo Transit Selection</a> is an extension of Argo Smart Routing where the lowest latency path between Cloudflare data center hops is explicitly selected and used to forward customer origin requests. The lowest latency path will often be our global backbone, as it is a more dedicated and private means of connectivity, as opposed to third-party transit networks.</p><p>Consider a multinational Dutch pharmaceutical company that relies on Cloudflare's network and services with our <a href="https://www.cloudflare.com/learning/access-management/what-is-sase/">SASE solution</a> to connect their global offices, research centers, and remote employees. Their Asian branch offices depend on Cloudflare's security solutions and network to provide secure access to important data from their central data centers back to their offices in Asia. In case of a cable cut between regions, our network would automatically look for the best alternative route between them so that business impact is limited.</p><p>Argo measures every potential combination of the different provider paths, including our own backbone, as an option for reaching origins with smart routing. Because of our vast interconnection with so many networks, and our global private backbone, Argo is able to identify the most performant network path for requests. The backbone is consistently one of the lowest latency paths for Argo to choose from.</p><p>In addition to high performance, we care greatly about network reliability for our customers. This means we need to be as resilient as possible from fiber cuts and third-party transit provider issues. 
During a disruption of the <a href="https://en.wikipedia.org/wiki/AAE-1">AAE-1</a> (<a href="https://www.submarinecablemap.com/submarine-cable/asia-africa-europe-1-aae-1">Asia Africa Europe-1</a>) submarine cable, this is what Argo saw between Singapore and Amsterdam across some of our transit provider paths vs. the backbone.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/66CGBePnLzuLRuTErvf8Cr/813b4b60a95935491e967214851e5a04/BLOG-2432-9.png" />
          </figure><p>The large (purple line) spike shows a latency increase on one of our third-party IP transit provider paths due to congestion, which was eventually resolved, likely through traffic engineering within the provider’s network. We saw a smaller, but still noticeable, latency increase (yellow line) over other transit networks. The bottom (green) line on the graph is our backbone, where round-trip time remains more or less flat throughout the event, due to our diverse backbone connectivity between Asia and Europe. Throughout the fiber cut, we remained stable at around 200ms between Amsterdam and Singapore. There was no noticeable network hiccup as was seen on the transit provider paths, so Argo actively leveraged the backbone for optimal performance.</p>
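<p>Conceptually, the selection behavior during such an event looks like the sketch below: measure RTT across every candidate path, exclude degraded ones, and pick the fastest. The path names and RTT samples are hypothetical, not the figures from the incident or Argo's actual algorithm.</p>

```python
# Minimal sketch of latency-based path selection among candidate paths.
# Names and measurements are invented for illustration.

def pick_path(rtt_ms: dict[str, float], degraded: set[str]) -> str:
    """Choose the lowest-RTT path that is not marked degraded."""
    healthy = {path: rtt for path, rtt in rtt_ms.items() if path not in degraded}
    return min(healthy, key=healthy.get)

measurements = {"transit-A": 340.0, "transit-B": 260.0, "backbone": 200.0}
print(pick_path(measurements, degraded={"transit-A"}))  # backbone
```

In the incident above, the backbone would win even without excluding any path, since its RTT stayed flat while the transit paths spiked.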
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1A8CdaGq8P2hF3DtIs9dQI/a10fdf3af9de917fb0036d38eace9905/BLOG-2432-10.png" />
          </figure>
    <div>
      <h3>Call to action</h3>
      <a href="#call-to-action">
        
      </a>
    </div>
    <p>As Argo improves performance in our network, Cloudflare Network Interconnects (<a href="https://developers.cloudflare.com/network-interconnect/">CNIs</a>) optimize getting onto it. We encourage our Enterprise customers to use our free CNIs as on-ramps onto our network whenever practical. In this way, you can fully leverage our network, including our robust backbone, and increase overall performance for every product within your Cloudflare Connectivity Cloud. In the end, our global network is our main product, and our backbone plays a critical role in it. This way, we continue to help build a better Internet, by improving our services for everybody, everywhere.</p><p>If you want to be part of our mission, join us as a Cloudflare network on-ramp partner to offer secure and reliable connectivity to your customers by integrating directly with us. Learn more about our on-ramp partnerships and how they can benefit your business <a href="https://www.cloudflare.com/network-onramp-partners/">here</a>.</p> ]]></content:encoded>
            <category><![CDATA[Connectivity Cloud]]></category>
            <category><![CDATA[Anycast]]></category>
            <category><![CDATA[Argo Smart Routing]]></category>
            <category><![CDATA[Athenian Project]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Better Internet]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Magic Transit]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">WiHZr8Fb6WzdVjo0egsWW</guid>
            <dc:creator>Shozo Moritz Takaya</dc:creator>
            <dc:creator>Bryton Herdes</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare treats SASE anxiety for VeloCloud customers]]></title>
            <link>https://blog.cloudflare.com/treating-sase-anxiety/</link>
            <pubDate>Wed, 06 Mar 2024 14:00:40 GMT</pubDate>
            <description><![CDATA[ The turbulence in the SASE market is driving many customers to seek help. We’re doing our part to help VeloCloud customers who are caught in the crosshairs of shifting strategies ]]></description>
            <content:encoded><![CDATA[ <p>We understand that your VeloCloud deployment may be partially or even fully deployed. You may be experiencing discomfort from <a href="https://www.cloudflare.com/learning/access-management/what-is-sase/">SASE</a> anxiety. Symptoms include:</p><ul><li><p><b>Sudden vendor whiplash</b> - Over the past 5 years, the ownership and strategic direction of VeloCloud has undergone a series of dramatic changes. VeloCloud was <a href="https://blogs.vmware.com/networkvirtualization/2017/12/vmware-closes-velocloud.html/">acquired by VMware</a> in 2017, then VMware was spun off from <a href="https://www.dell.com/en-us/dt/corporate/newsroom/announcements/detailpage.press-releases~usa~2021~11~20211101-dell-technologies-announces-completion-of-vmware-spin-off.htm">Dell EMC in 2021</a>, and in 2023 <a href="https://investors.broadcom.com/news-releases/news-release-details/broadcom-completes-acquisition-vmware">Broadcom completed its acquisition of VMware and VeloCloud</a>.</p></li><li><p><b>Dizziness from product names</b> - VeloCloud helpfully published a list of some of its previous product names, which include <a href="https://sase.vmware.com/sd-wan/velocloud">VeloCloud, Velo, Velo SD-WAN, VeloCloud SD-WAN, and VMware SD-WAN by VeloCloud</a>. But the list misses other names, such as “VMware NSX SD-WAN by VeloCloud”. 
Recently, VMware announced yet another name change by <a href="https://blogs.vmware.com/sase/2024/02/20/back-to-the-future-with-velocloud-the-intelligent-overlay-for-the-software-defined-edge/">renaming VMware SD-WAN to VMware VeloCloud SD-WAN, and VMware SASE to VMware VeloCloud SASE, secured by Symantec</a>.</p></li><li><p><b>Irregular priorities and strategies</b> - With the number of times that VMware has reorganized its various networking and security products into different business units, it’s now about to embark on yet another reorganization as Broadcom pursues single vendor SASE.</p></li></ul><p>If you’re a VeloCloud customer, we are here to help you with your transition to Magic WAN, with planning, products, and services. You’ve experienced the turbulence, and that’s why we are taking steps to help. First, it’s necessary to illustrate what’s fundamentally wrong with the architecture-by-acquisition model in order to define the right path forward. Second, we document the steps involved in making a transition from VeloCloud to Cloudflare. Third, we are offering a helping hand to VeloCloud customers to get their SASE strategies back on track.</p>
    <div>
      <h2>Architecture is the key to SASE</h2>
      <a href="#architecture-is-the-key-to-sase">
        
      </a>
    </div>
    <p>Your IT organization must deliver stability across your information systems, because the future of your business depends on the decisions that you make today. You need to make sure that your SASE journey is backed by vendors that you can depend on. Indecisive vendors and unclear strategies rarely inspire confidence, and it’s driving organizations to reconsider their relationships.</p><p>It’s not just VeloCloud that’s pivoting. Many vendors are chasing the brass ring to meet the requirement for Single Vendor SASE, and they’re trying to reduce their time to market by acquiring features on their checklist, rather than taking the time to build the right architecture for consistent management and user experience. It’s led to rapid consolidation of both startups and larger product stacks, but now we’re seeing many instances of vendors having to rationalize their overlapping product lines. Strange days indeed.</p><p>But the thing is, Single Vendor SASE is not a feature checklist game. It’s not like shopping for PC antivirus software where the most attractive option was the one with the most checkboxes. It doesn’t matter if you acquire a large stack of product acronyms (ZTNA, SD-WAN, SWG, CASB, DLP, FWaaS, to name but a few) if the results are just as convoluted as the technology it aims to replace.</p><p>If organizations are new to SASE, then it can be difficult to know what to look for. However, one clear sign of trouble is taking an SSE designed by one vendor and combining it with SD-WAN from another, because you can’t get a converged platform out of two fundamentally incongruent technologies.</p>
    <div>
      <h2>Why SASE math doesn’t work</h2>
      <a href="#why-sase-math-doesnt-work">
        
      </a>
    </div>
    <p>The conceptual model for SASE typically illustrates two half circles, with one consisting of cloud-delivered networking and the other being cloud-delivered security. With this picture in mind, it’s easy to see how one might think that combining an implementation of cloud-delivered networking (VeloCloud SD-WAN) and an implementation of cloud-delivered security (Symantec Network Protection - SSE) might satisfy the requirements. Does Single Vendor SASE = SD-WAN + SSE?</p><p>In practice, networking and network security do not exist in separate universes, but SD-WAN and SSE implementations do, especially when they were designed by different vendors. That’s why the math doesn’t work: even with the requisite SASE functionality, the implementations don’t fit together. <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-sd-wan/">SD-WAN</a> is designed for network connectivity between sites over the SD-WAN fabric, whereas <a href="https://www.cloudflare.com/learning/access-management/security-service-edge-sse/">SSE</a> largely focuses on the enforcement of security policy for user-&gt;application traffic from remote users or traffic leaving (rather than traversing) the SD-WAN fabric. Therefore, bringing these two worlds together leaves you with inconsistent security, proxy chains that add latency, or security implemented at the edge rather than in the cloud.</p>
    <div>
      <h2>Why Cloudflare is different</h2>
      <a href="#why-cloudflare-is-different">
        
      </a>
    </div>
    <p>At Cloudflare, the basis for our approach to single vendor SASE starts from building a global network designed with private data centers, overprovisioned network and compute capacity, and a private backbone designed to deliver our customer’s traffic to any destination. It’s what we call any-to-any connectivity. It’s not using the public cloud for SASE services, because the public cloud was designed as a destination for traffic rather than being optimized for transit. We are in full control of the design of our data centers and network and we’re obsessed with making it even better every day.</p><p>It’s from this network that we deliver networking and <a href="https://www.cloudflare.com/network-security/">security services</a>. Conceptually, we implement a philosophy of composability, where the fundamental network connection between the customer’s site and the Cloudflare data center remains the same across different use cases. In practice, and unlike traditional approaches, it means no downtime for service insertion when you need more functionality — the connection to Cloudflare remains the same. It’s the services and the onboarding of additional destinations that changes as organizations expand their use of Cloudflare.</p><p>From the perspective of branch connectivity, use Magic WAN for the connectivity that ties your business together, no matter which way traffic passes. That’s because we don’t treat the directions of your network traffic as independent problems. We solve for consistency by on-ramping all traffic through one of Cloudflare’s 310+ anycasted data centers (whether inbound, outbound, or east-west) for enforcement of security policy. We solve for latency by eliminating the need to forward traffic to a compute location by providing full compute services in every data center. We implement SASE using a light edge / heavy cloud model, with services delivered within the Cloudflare connectivity cloud rather than on-prem.</p>
    <div>
      <h2>How to transition from VeloCloud to Cloudflare</h2>
      <a href="#how-to-transition-from-velocloud-to-cloudflare">
        
      </a>
    </div>
    <p>Start by contacting us to get a consultation session with our solutions architecture team. Our architects specialize in <a href="https://www.cloudflare.com/learning/network-layer/how-to-prepare-for-network-modernization-projects/">network modernization</a> and can map your SASE goals across a series of smaller projects. We’ve worked with hundreds of organizations to achieve their SASE goals with the Cloudflare connectivity cloud and can build a plan that your team can execute on.</p><p>For product education, join one of our product workshops on Magic WAN to get a deep dive into how it’s built and how it can be rolled out to your locations. Magic WAN uses a light edge, heavy cloud model that has multiple network insertion models (whether a tunnel from an existing device, using our turnkey Magic WAN Connector, or deploying a virtual appliance) which can work in parallel or as a replacement for your branch connectivity needs, thus allowing you to migrate at your pace. Our specialist teams can help you mitigate transitionary hardware and license costs as you phase out VeloCloud and accelerate your rollout of Magic WAN.</p><p>The Magic WAN technical engineers have a number of resources to help you build product knowledge as well. This includes reference architectures and quick start guides that address your organization’s connectivity goals, whether sizing down your on-prem network in favor of the emerging “coffee shop networking” philosophy, retiring legacy SD-WAN, or fully replacing conventional MPLS.</p><p>For services, our <a href="https://www.cloudflare.com/success-offerings/">customer success teams</a> are ready to support your transition, with services that are tailored specifically for Magic WAN migrations both large and small.</p>
    <div>
      <h2>Your next move</h2>
      <a href="#your-next-move">
        
      </a>
    </div>
    <p>Interested in learning more? <a href="https://www.cloudflare.com/lp/velocloud-replacement-sd-wan/">Contact us to get started</a> with your SASE journey, and we’ll show you how to replace VeloCloud with Cloudflare Magic WAN and use our network as an extension of yours.</p> ]]></content:encoded>
            <category><![CDATA[SASE]]></category>
            <category><![CDATA[Magic WAN]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">2psJqoZFr5Bh7pDmHQ0yUw</guid>
            <dc:creator>Brian Tokuyoshi</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Gen 12 Server: bigger, better, cooler in a 2U1N form factor]]></title>
            <link>https://blog.cloudflare.com/cloudflare-gen-12-server-bigger-better-cooler-in-a-2u1n-form-factor/</link>
            <pubDate>Fri, 01 Dec 2023 18:45:57 GMT</pubDate>
            <description><![CDATA[ Cloudflare Gen 12 Compute servers are moving to 2U1N form factor to optimize the thermal design to accommodate both high-power CPUs (>350W) and GPUs effectively while maintaining performance and reliability ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4k4pAqN8lhelYK7bHj3j2n/853ec5b25a2e62e08b85f611d0f76eb6/image5.png" />
            
            </figure><p>Two years ago, Cloudflare undertook a significant upgrade to our compute server hardware as we deployed our cutting-edge <a href="/the-epyc-journey-continues-to-milan-in-cloudflares-11th-generation-edge-server/">11th Generation server fleet</a>, based on AMD EPYC Milan x86 processors. It's nearly time for another refresh to our x86 infrastructure, with deployment planned for 2024. This involves upgrading not only the processor itself, but many of the server's components. It must be able to accommodate the GPUs that drive inference on <a href="/workers-ai/">Workers AI</a>, and leverage the latest advances in memory, storage, and security. Every aspect of the server is rigorously evaluated — including the server form factor itself.</p><p>One crucial variable always in consideration is temperature. The latest generations of x86 processors have yielded significant leaps forward in performance, with the tradeoff of higher power draw and heat output. In this post we will explore this trend, and how it informed our decision to adopt a new physical footprint for our next-generation fleet of servers.</p><p>In preparation for the upcoming refresh, we conducted an extensive survey of the x86 CPU landscape. AMD recently introduced its latest offerings: Genoa, Bergamo, and Genoa-X, featuring the power of their innovative Zen 4 architecture. At the same time, Intel unveiled Sapphire Rapids as part of its 4th Generation Intel Xeon Scalable Processor Platform, code-named “Eagle Stream”, showcasing their own advancements. These options offer valuable choices as we consider how to shape the future of Cloudflare's server technology to match the needs of our customers.</p><p>A continuing challenge we face across x86 CPU vendors, including the new Intel and AMD chipsets, is the rapidly increasing CPU Thermal Design Point (TDP) generation over generation. 
TDP is the maximum heat dissipated by the CPU under load that a cooling system must be designed to remove; because that heat is dissipated electrical power, TDP also describes the maximum power consumption of the CPU socket. This plot shows the CPU TDP trend of each hardware server generation since 2014:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Ae6NrHId3uKhuS6mIhW7F/055e89fef13a710b3cc45607841de8ec/image4.png" />
            
            </figure><p>At Cloudflare, our <a href="/a-tour-inside-cloudflares-g9-servers/">Gen 9 server</a> was based on Intel Skylake 6162 with a TDP of 150W, our <a href="/cloudflares-gen-x-servers-for-an-accelerated-future/">Gen 10 server</a> was based on AMD Rome 7642 at 240W, and our <a href="/the-epyc-journey-continues-to-milan-in-cloudflares-11th-generation-edge-server/">Gen 11 server</a> was based on AMD Milan 7713 at 240W. Today, the <a href="https://www.amd.com/system/files/documents/epyc-9004-series-processors-data-sheet.pdf">AMD EPYC 9004 Series SKU Stack</a> default TDP goes up to 360W and is configurable up to 400W. The <a href="https://ark.intel.com/content/www/us/en/ark/products/codename/126212/products-formerly-sapphire-rapids.html#@Server">Intel Sapphire Rapids SKU stack</a> default TDP goes up to 350W. This trend of rising TDP is expected to continue with the next generation of x86 CPU offerings.</p>
    <div>
      <h2>Designing multi-generational cooling solutions</h2>
      <a href="#designing-multi-generational-cooling-solutions">
        
      </a>
    </div>
    <p>Cloudflare Gen 10 and Gen 11 servers were designed in a 1U1N form factor, with air cooling to maximize rack density (1U means the server form factor is 1 Rack Unit, which is 1.75” in height or thickness; 1N means there is one server node per chassis). However, cooling a CPU with a TDP above 350W with air in a 1U1N form factor requires the fans to spin at a 100% duty cycle (running all the time, at max speed). A single fan running at full speed consumes about 40W, and a typical server configuration of 7–8 dual-rotor fans per server can hit 280–320W to power the fans alone. At peak loads, the total system power consumed, including the cooling fans, processor, and other components, can eclipse 750W per server.</p><p>The 1U form factor can fit a maximum of eight 40mm dual-rotor fans, which sets an upper bound on the heat it can remove. We first take into account ambient room temperature, which we assume to be 40°C (the maximum expected temperature under normal conditions). Under these conditions we determined that air-cooled servers, with all eight fans running at a 100% duty cycle, can support CPUs with a maximum TDP of 400W.</p><p>This poses a challenge, because the next generation of AMD CPUs, while socket compatible with the current generation, rises to 500W TDP, and we expect other vendors to follow a similar trend in subsequent generations. In order to future-proof, and reuse as much of the Gen 12 design as possible for future generations across all x86 CPU products, we will need a scalable thermal solution. Moreover, many co-location facilities where Cloudflare deploys servers have a rack power limit. With total system power consumption north of 750W per node, and after accounting for space utilized by networking gear, we would have been underutilizing rack space by as much as 50%.</p>
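The rack math above can be sketched as a quick back-of-the-envelope model; the per-rack power budget and usable rack units below are illustrative assumptions, not Cloudflare’s actual figures.

```python
def rack_utilization(rack_power_w, usable_units, server_power_w, server_height_u):
    """Return how many servers fit and the fraction of rack space they occupy."""
    servers_by_power = rack_power_w // server_power_w
    servers_by_space = usable_units // server_height_u
    servers = min(servers_by_power, servers_by_space)
    return servers, servers * server_height_u / usable_units

# 1U1N node at peak: eight dual-rotor fans at ~40W apiece, >750W total draw.
fan_power_w = 8 * 40
servers, space_fraction = rack_utilization(
    rack_power_w=12_000,   # assumed per-rack power budget
    usable_units=32,       # assumed rack units left after networking gear
    server_power_w=750,
    server_height_u=1,
)
# Power is the binding constraint here, leaving roughly half the rack empty.
```

With these assumed numbers, the power budget caps the rack at 16 of the 32 usable units, which is the roughly 50% underutilization described above.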
    <div>
      <h3>We have a problem!</h3>
      <a href="#we-have-a-problem">
        
      </a>
    </div>
    <p>We do have a variety of SKU options available on each CPU generation, and if power is the primary constraint, we could choose to limit the TDP and use a lower-core-count, low-power SKU. To evaluate this, the hardware team ran a synthetic workload benchmark in the lab across several CPU SKUs. We found that Cloudflare services continue to scale effectively with cores up to 128 cores or 256 hardware threads, resulting in significant performance gain, and Total Cost of Ownership (TCO) benefit, at and above 360W TDP.</p><p>However, while performance and TCO look good on a per-server basis, this is only part of the story: servers go into a server rack when they are deployed, and server racks come with constraints that have to be factored into the design. The two limiting factors are rack power budget and rack height. Taking these two rack-level constraints into account, how does the combined TCO benefit scale with TDP? We ran a performance sweep across the configurable TDP range of the highest-core-count CPUs and noticed that rack-level TCO benefit stagnates when CPU TDP rises above roughly 340W.</p><p>TCO advantage stagnates because we hit our rack power budget limit: the incremental performance gain per server, coinciding with an incremental increase of CPU TDP above 340W, is negated by the reduction in the number of servers that can be installed in a rack to remain within the rack’s power budget. Even with CPU TDP capped at 340W, we are still underutilizing the rack, with 30% of the space still available.</p><p>Thankfully, there is an alternative to power capping and compromising on possible performance gain, by increasing the chassis height to a 2U form factor (from 1.75” in height to 3.5” in height). 
The benefits from doing this include:</p><ul><li><p>Larger fans (up to 80mm) that can move more air</p></li><li><p>Room for a taller and larger heatsink that can dissipate heat more effectively</p></li><li><p>Less air impedance within the chassis, since the majority of components are 1U in height</p></li><li><p>Sufficient room to add PCIe-attached accelerators / GPUs, including dual-slot form factor options</p></li></ul><p>2U chassis designs are nothing new, and are actually very common in the industry, in part because the better airflow dissipates more heat. The tradeoff is taking up more space and limiting the number of servers that can be installed in a rack; since we are power constrained instead of space constrained, the tradeoff did not negatively impact our design.</p><p>Thermal simulations provided by Cloudflare vendors showed that 4x 60mm fans or 4x 80mm fans at less than 40W per fan are sufficient to cool the system. That is a theoretical savings of at least 150W compared to 8x 40mm fans in a 1U design, which would result in significant Operational Expenditure (OPEX) savings and a boost to TCO improvement. Switching to a 2U form factor also gives us the benefit of fully utilizing our rack power budget and our rack space, and provides ample room for the addition of PCIe-attached accelerators / GPUs, including dual-slot form factor options.</p>
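A minimal sketch of that fan-power arithmetic, using the approximate per-fan figures from the text (the simulations indicate the 2U fans draw less than the 40W upper bound used here):

```python
# 1U layout: eight 40mm dual-rotor fans at roughly 40W each.
fans_1u_w = 8 * 40

# 2U layout: four 60mm or 80mm fans at up to 40W each.
fans_2u_w = 4 * 40

# Theoretical cooling-power savings from moving to the 2U layout.
savings_w = fans_1u_w - fans_2u_w
```

Even at the 40W worst case per 2U fan, the savings clear the “at least 150W” figure quoted above.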
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>It might seem counter-intuitive, but our observations indicate that growing the server chassis and utilizing more space per node actually increases rack density and improves overall TCO benefit over previous-generation deployments, since it allows for a better thermal design. We are very happy with the result of this technical readiness investigation, and are actively working on validating our Gen 12 Compute servers and launching them into production soon. Stay tuned for more details on our Gen 12 designs.</p><p>If you are excited about helping build a better Internet, come join us: <a href="https://www.cloudflare.com/careers/jobs/">we are hiring</a>!</p> ]]></content:encoded>
            <category><![CDATA[AMD]]></category>
            <category><![CDATA[Hardware]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <guid isPermaLink="false">2aWmwl401SPH5eZNo2wHMg</guid>
            <dc:creator>JQ Lau</dc:creator>
            <dc:creator>Syona Sarma</dc:creator>
        </item>
        <item>
            <title><![CDATA[Armed to Boot: an enhancement to Arm's Secure Boot chain]]></title>
            <link>https://blog.cloudflare.com/armed-to-boot/</link>
            <pubDate>Wed, 25 Jan 2023 14:00:00 GMT</pubDate>
            <description><![CDATA[ Enhancing the Arm Secure Boot chain to improve platform security on modern systems. ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6DXRZwWpgxoNOs7LL87gEw/26d5483a74be3dcbe2a5c56c65c5986c/image1-45.png" />
            
            </figure><p>Over the last few years, there has been a rise in the number of attacks that affect how a computer boots. Most modern computers use a specification called Unified Extensible Firmware Interface (<a href="https://en.wikipedia.org/wiki/UEFI">UEFI</a>) that defines a software interface between an operating system (e.g. Windows) and platform firmware (e.g. disk drives, video cards). There are security mechanisms built into UEFI that ensure that platform firmware can be cryptographically validated and boot securely through an application called a bootloader. This firmware is stored in non-volatile <a href="https://en.wikipedia.org/wiki/Serial_Peripheral_Interface">SPI</a> flash memory on the motherboard, so it persists on the system even if the operating system is reinstalled and drives are replaced.</p><p>This creates a ‘trust anchor’ used to validate each stage of the boot process, but, unfortunately, this trust anchor is also a target for attack. In these UEFI attacks, malicious code is loaded onto a compromised device early in the boot process. This means that malware can change configuration data, establish persistence by ‘<a href="https://www.zdnet.com/article/chinese-apt-deploy-moonbounce-malware-in-uefi-firmware/">implanting</a>’ itself, and bypass security measures that are only loaded at the operating system stage. So, while UEFI-anchored secure boot protects the bootloader, it does not protect the UEFI firmware itself.</p><p>Because of this growing trend of attacks, we began the process of <a href="/anchoring-trust-a-hardware-secure-boot-story/">cryptographically signing our UEFI firmware</a> as a mitigation step. While our existing solution is platform-specific to our x86 AMD server fleet, we did not have a similar UEFI firmware signing solution for Arm. 
To determine what was missing, we had to take a deep dive into the Arm secure boot process.</p><p>Read on to learn about the world of Arm Trusted Firmware Secure Boot.</p>
    <div>
      <h2>Arm Trusted Firmware Secure Boot</h2>
      <a href="#arm-trusted-firmware-secure-boot">
        
      </a>
    </div>
    <p>Arm defines a trusted boot process through an architecture called <a href="https://developer.arm.com/documentation/den0006/d">Trusted Board Boot Requirements</a> (TBBR), or Arm Trusted Firmware (ATF) Secure Boot. TBBR works by authenticating a series of cryptographically signed binary images, each containing a different stage or element in the system boot process to be loaded and executed. Each bootloader (BL) stage accomplishes a different step in the initialization process:</p>
    <div>
      <h3>BL1</h3>
      <a href="#bl1">
        
      </a>
    </div>
    <p>BL1 determines the boot path (cold boot or warm boot), initializes the architecture (exception vectors, CPU initialization, and control register setup), and initializes the platform (watchdog, MMU, and DDR initialization).</p>
    <div>
      <h3>BL2</h3>
      <a href="#bl2">
        
      </a>
    </div>
    <p>BL2 prepares initialization of the Arm Trusted Firmware (ATF), the stack responsible for setting up the secure boot process. After ATF setup, the console is initialized, memory is mapped for the MMU, and message buffers are set for the next bootloader.</p>
    <div>
      <h3>BL3</h3>
      <a href="#bl3">
        
      </a>
    </div>
    <p>The BL3 stage has multiple parts, the first being initialization of runtime services that are used in detecting system topology. After initialization, there is a handoff from the ATF ‘secure world’ boot stage to the ‘normal world’ boot stage that includes setup of UEFI firmware. Context is set up to ensure that no secure state information finds its way into the normal world execution state.</p><p>Each image is authenticated by a public key, which is stored in a signed certificate and can be traced back to a root key stored on the SoC in one-time-programmable (OTP) memory or ROM.</p>
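The chain of trust described above can be sketched in a few lines of Python. This models only the hash-chaining idea (each already-trusted stage pins the hash of the next stage’s key, anchored in an immutable OTP/ROM hash); it is not real certificate parsing or signature verification, and all key values are hypothetical stand-ins.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 digest, standing in for the platform's hash of a public key."""
    return hashlib.sha256(data).digest()

# Hypothetical per-stage public keys (stand-ins for real key material).
keys = {"BL1": b"root-of-trust-key", "BL2": b"bl2-key", "BL3": b"bl3-key"}

# Root-of-trust key hash, burned into OTP memory or ROM at manufacturing time.
otp_rotpk_hash = h(keys["BL1"])

# Each signed certificate pins the hash of the next stage's key.
certs = {"BL1": h(keys["BL2"]), "BL2": h(keys["BL3"])}

def verify_chain() -> bool:
    # Anchor: the first key must match the immutable OTP hash.
    if h(keys["BL1"]) != otp_rotpk_hash:
        return False
    trusted = "BL1"
    for stage in ("BL2", "BL3"):
        # Each already-trusted stage vouches for the next stage's key.
        if h(keys[stage]) != certs[trusted]:
            return False
        trusted = stage
    return True
```

Tampering with any stage’s key breaks the chain from that point onward, which is exactly the property TBBR relies on.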
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1UdHJVQqnXX9gZ6uaglVa4/83e7065007376d4f774fc6d9d6ee867b/image9.png" />
            
            </figure><p>TBBR was originally designed for cell phones. This established a reference architecture on how to build a “Chain of Trust” from the first ROM executed (BL1) to the handoff to “normal world” firmware (BL3). While this creates a validated firmware signing chain, it has caveats:</p><ol><li><p>SoC manufacturers are heavily involved in the secure boot chain, while the customer has little involvement.</p></li><li><p>A unique SoC SKU is required per customer. With one customer this could be easy, but most manufacturers have thousands of SKUs.</p></li><li><p>The SoC manufacturer is primarily responsible for end-to-end signing and maintenance of the PKI chain. This adds complexity to the process, requiring USB key fobs for signing.</p></li><li><p>The process doesn’t scale outside the manufacturer.</p></li></ol><p>What this tells us is that what was built for cell phones doesn’t scale for servers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3ycPPB81rgzRIuCGSu0KE7/6ed984097e839782f83ecbc28c25212a/image3-27.png" />
            
            </figure><p>If we were involved 100% in the manufacturing process, then this wouldn’t be as much of an issue, but we are a customer and consumer. As a customer, we have a lot of control of our server and block design, so we looked at design partners that would take some of the concepts we were able to implement with AMD Platform Secure Boot and refine them to fit Arm CPUs.</p>
    <div>
      <h2>Amping it up</h2>
      <a href="#amping-it-up">
        
      </a>
    </div>
    <p>We partnered with Ampere and tested their Altra Max <a href="/arms-race-ampere-altra-takes-on-aws-graviton2/">single socket rack server CPU</a> (code-named Mystique), which provides high performance with incredible power efficiency per core, much of what we were looking for in reducing power consumption. Beyond these headline specs, Ampere backported various features into the Altra Max, notably mitigations for speculative-execution attacks such as Meltdown and Spectre (variants 1 and 2) from the Armv8.5 instruction set architecture, giving Altra the “+” designation in its ISA.</p><p>Ampere does implement a signed boot process similar to the ATF signing process mentioned above, but with some slight variations. We’ll explain it a bit to help set context for the modifications that we made.</p>
    <div>
      <h2>Ampere Secure Boot</h2>
      <a href="#ampere-secure-boot">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3z02FzpTwAjg8aK9VdC2Ro/6ab22618c187ff09fcc1b7d177d165db/image4-21.png" />
            
            </figure><p>The diagram above shows the Arm processor boot sequence as implemented by Ampere. The System Control Processor (SCP) comprises the System Management Processor (SMpro) and the Power Management Processor (PMpro). The SMpro is responsible for features such as secure boot and BMC communication, while the PMpro is responsible for power features such as Dynamic Frequency Scaling and on-die thermal monitoring.</p><p>At power-on-reset, the SCP runs the system management bootloader from ROM and loads the SMpro firmware. After initialization, the SMpro spawns the power management stack on the PMpro and ATF threads. The ATF BL2 and BL31 bring up processor resources such as DRAM and PCIe. After this, control is passed to the BL33 BIOS.</p>
    <div>
      <h3>Authentication flow</h3>
      <a href="#authentication-flow">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1uctsORF4iINvxIpkpAMCM/34e931a29cc21a6b0e9d3549e81abcf8/image7-5.png" />
            
            </figure><p>At power on, the SMpro firmware reads Ampere’s public key (ROTPK) from the SMpro key certificate in SCP EEPROM, computes a hash, and compares this to Ampere’s public key hash stored in eFuse. Once authenticated, Ampere’s public key is used to decrypt key and content certificates for SMpro, PMpro, and ATF firmware, which are launched in the order described above.</p><p>The SMpro public key will be used to authenticate the SMpro and PMpro images and ATF keys, which in turn will authenticate ATF images. This cascading set of authentications originates with the Ampere root key, whose hash is stored on-chip in an electronic fuse, or eFuse. An eFuse can be programmed only once; after that its contents are read-only and cannot be tampered with or modified.</p><p>This is the original hardware root of trust used to sign the system’s secure-world firmware. When we looked at this, after referencing the signing process we had with AMD PSB and knowing there was a large enough one-time-programmable (OTP) region within the SoC, we thought: why can’t we insert our key hash in here?</p>
    <div>
      <h2>Single Domain Secure Boot</h2>
      <a href="#single-domain-secure-boot">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6FQq6V32FKRBNJ8z0efGuP/e718f9c9cf5e8b754b4e2ce5018f904a/image11-2.png" />
            
            </figure><p>Single Domain Secure Boot takes the same authentication flow and adds a hash of the customer public key (the Cloudflare firmware signing key in this case) to the eFuse domain. This enables the verification of UEFI firmware by a hardware root of trust. This process is performed by BL2 in the already-validated ATF firmware. Our public key (dbb) is read from UEFI secure variable storage; a hash is computed and compared to the public key hash stored in eFuse. If they match, the validated public key is used to decrypt the BL33 content certificate, validating and launching the BIOS and the remaining boot items. This is the key feature added by SDSB: it validates the entire software boot chain with a single eFuse root of trust on the processor.</p>
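A sketch of that SDSB check, with an HMAC standing in for the real content-certificate signature scheme (the actual flow uses signed certificates, not HMACs); the key material and names here are illustrative.

```python
import hashlib
import hmac

# Customer (Cloudflare) public key hash, burned into the eFuse domain.
efuse_customer_key_hash = hashlib.sha256(b"customer-public-key").digest()

def authenticate_bl33(dbb: bytes, bl33_image: bytes, bl33_sig: bytes) -> bool:
    # Step 1: validate the dbb variable against the immutable eFuse hash.
    if hashlib.sha256(dbb).digest() != efuse_customer_key_hash:
        return False
    # Step 2: use the now-trusted key to check the BL33 (UEFI) image.
    expected = hmac.new(dbb, bl33_image, hashlib.sha256).digest()
    return hmac.compare_digest(expected, bl33_sig)
```

An attacker who swaps the dbb variable fails step 1, and one who tampers with the UEFI image fails step 2, so the single eFuse hash anchors the whole check.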
    <div>
      <h2>Building blocks</h2>
      <a href="#building-blocks">
        
      </a>
    </div>
    <p>With a basic understanding of how Single Domain Secure Boot works, the next logical question is “How does it get implemented?”. We ensure that all UEFI firmware is signed at build time, but this process can be better understood if broken down into steps.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7JbZYTpQmAzs85jK66urgj/dd5cf1128921938209c0ea090372eab4/Building-Blocks-1.png" />
            
            </figure><p>Ampere, our original design manufacturer (ODM), and we each play a role in the execution of SDSB. First, we generate certificates for a public-private key pair using our internal, secure PKI. The public key side is provided to the ODM as dbb.auth and dbu.auth in UEFI secure variable format. Ampere provides a reference Software Release Package (SRP) including the baseboard management controller, system control processor, UEFI, and complex programmable logic device (CPLD) firmware to the ODM, who customizes it for their platform. The ODM generates a board file describing the hardware configuration, and also customizes the UEFI to enroll dbb and dbu to secure variable storage on first boot.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3AtHZs6bsQ0VgTHkNDskFD/6ef31e083683fa462cf1f0f9e0860501/Building-Blocks-2.png" />
            
            </figure><p>Once this is done, we generate a UEFI.slim file using the ODM’s UEFI ROM image, Arm Trusted Firmware (ATF), and board file. (Note: This differs from AMD PSB insofar as the entire image and ATF files are signed; with AMD PSB, only the first block of boot code is signed.) The entire .slim file is signed with our private key, producing a signature hash in the file. This can only be authenticated by the correct public key. Finally, the ODM packages the UEFI into a .hpm format compatible with their platform BMC.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/74fFXKUQ6ao22MbxnLRTOZ/8ee373903359790a543c2907280069ec/Security-Provisioning-Firmware.png" />
            
            </figure><p>In parallel, we provide the debug fuse selection and hash of our DER-formatted public key. Ampere uses this information to create a special version of the SCP firmware, known as Security Provisioning (SECPROV), in .slim format. This firmware is run one time only, to program the debug fuse settings and public key hash into the SoC eFuses. Ampere delivers the SECPROV .slim file to the ODM, who packages it into a .hpm file compatible with the BMC firmware update tooling.</p>
    <div>
      <h2>Fusing the keys</h2>
      <a href="#fusing-the-keys">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5PcDXfMPCbyZHbW06S9TQK/a1fbf46b9176e2291172387cda350829/eFuse-Key-Provisioning.png" />
            
            </figure><p>During system manufacturing, firmware is pre-programmed into storage ICs before placement on the motherboard. Note that the SCP EEPROM contains the SECPROV image, not standard SCP firmware. After a system is first powered on, an IPMI command is sent to the BMC which releases the Ampere processor from reset. This allows SECPROV firmware to run, burning the SoC eFuse with our public key hash and debug fuse settings.</p>
    <div>
      <h2>Final manufacturing flow</h2>
      <a href="#final-manufacturing-flow">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5APB65geJdVYfsHYJDeU5W/b9ce5c4abbf565525110c79852d53e87/Final-manufacturing-flow.png" />
            
            </figure><p>Once our public key has been provisioned, manufacturing proceeds by re-programming the SCP EEPROM with its regular firmware. Once the system powers on, ATF detects there are no keys present in secure variable storage and allows UEFI firmware to boot, regardless of signature. Since this is the first UEFI boot, it programs our public key into secure variable storage and reboots. ATF is validated by Ampere’s public key hash as usual. Since our public key is present in dbb, it is validated against our public key hash in eFuse and allows UEFI to boot.</p>
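That first-boot behavior amounts to a small state machine: with no dbb enrolled, ATF permits one unsigned UEFI boot so the key can be written, and every later boot requires the enrolled key to hash to the eFuse value. A purely illustrative sketch:

```python
import hashlib

# Customer public key hash burned into eFuse during provisioning (illustrative).
EFUSE_HASH = hashlib.sha256(b"customer-public-key").digest()

class SecureVarStore:
    """UEFI secure variable storage; dbb is empty until first boot."""
    def __init__(self):
        self.dbb = None

def boot_once(store: SecureVarStore, uefi_key: bytes) -> str:
    if store.dbb is None:
        # No key enrolled yet: allow this boot, enroll the key, then reboot.
        store.dbb = uefi_key
        return "enrolled-reboot"
    if hashlib.sha256(store.dbb).digest() == EFUSE_HASH:
        return "boot-ok"
    return "boot-halt"
```

Enrollment itself is only safe because it happens once, in the controlled manufacturing flow, before the machine ever leaves the factory.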
    <div>
      <h2>Validation</h2>
      <a href="#validation">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/QqpZGCJe52609SFewuSK0/bc0514db4f993630e9ec8b735c38d91e/Validation.png" />
            
            </figure><p>The first part of validation requires observing that the eFuses have been successfully blown. This imprints our public key hash into a dedicated, immutable memory region, not allowing the hash to be overwritten. Upon automatic or manual issue of an IPMI OEM command to the BMC, the BMC observes a signal from the SECPROV firmware, denoting eFuse programming completion. This can be probed with BMC commands.</p><p>When the eFuses have been blown, validation continues by observing the boot chain of the other firmware. Corruption of the SCP, ATF, or UEFI firmware obstructs boot flow and boot authentication, and will cause the machine to fail to boot to the OS. Once firmware is in place, happy-path validation begins with booting the machine.</p><p>Upon first boot, firmware boots in the following order: BMC, SCP, ATF, and UEFI. The BMC, SCP, and ATF firmware can be observed via their respective serial consoles. The UEFI will automatically enroll the dbb and dbu files to the secure variable storage and trigger a reset of the system.</p><p>After observing the reset, the machine should successfully boot to the OS if the feature is executed correctly. For further validation, we can use the UEFI shell environment to extract the dbb file and compare the hash against the hash submitted to Ampere. After successfully validating the keys, we flash an unsigned UEFI image. An unsigned UEFI image causes authentication failure at bootloader stage BL3-2, and the ATF firmware undergoes a boot loop as a result. Similar results will occur for a UEFI image signed with incorrect keys.</p>
    <div>
      <h2>Updated authentication flow</h2>
      <a href="#updated-authentication-flow">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6AsINoXMVNcl9qeYT3mwgz/04850535a1613572e174fec5f40003a0/UEFI-Authentication.png" />
            
            </figure><p>On all subsequent boot cycles, the ATF will read secure variable dbb (our public key), compute a hash of the key, and compare it to the read-only Cloudflare public key hash in eFuse. If the computed and eFuse hashes match, our public key variable can be trusted and is used to authenticate the signed UEFI. After this, the system boots to the OS.</p>
    <div>
      <h2>Let’s boot!</h2>
      <a href="#lets-boot">
        
      </a>
    </div>
<p>We were unable to get a machine without the feature enabled to demonstrate the setup of the feature, since we have the eFuse set at build time, but we can demonstrate what it looks like to go between an unsigned BIOS and a signed BIOS. What we would have observed with the setup of the feature is a custom BMC command to instruct the SCP to burn the ROTPK into the SoC’s OTP fuses. From there, we would observe feedback to the BMC detailing whether burning the fuses was successful. Upon booting the UEFI image for the first time, the UEFI will write the dbb and dbu into secure storage.</p><p>As you can see, after flashing the unsigned BIOS, the machine fails to boot.</p><p>Although a failed boot offers little visible feedback, there are a few things going on under the hood. The SCP (System Control Processor) still boots.</p><ol><li><p>The SCP image holds a key certificate with Ampere’s generated ROTPK and the SCP key hash. SCP will calculate the ROTPK hash and compare it against the burned OTP fuses. In the failure case, where the hash does not match, you will observe a failure as you saw earlier. If successful, the SCP firmware will proceed to boot the PMpro and SMpro. Both the PMpro and SMpro firmware will be verified and proceed with the ATF authentication flow.</p></li><li><p>The conclusion of the SCP authentication is the passing of the BL1 key to the first-stage bootloader via the SCP HOB (hand-off block) to proceed with the standard three-stage ATF bootloader authentication mentioned previously.</p></li><li><p>At BL2, the dbb is read out of the secure variable storage and used to authenticate the BL33 certificate and complete the boot process by booting the BL33 UEFI image.</p></li></ol>
    <div>
      <h2>Still more to do</h2>
      <a href="#still-more-to-do">
        
      </a>
    </div>
    <p>In recent years, management interfaces on servers, like the BMC, have been the target of cyber attacks including ransomware, implants, and disruptive operations. Access to the BMC can be local or remote. With remote vectors open, there is potential for malware to be installed on the BMC via network interfaces. With compromised software on the BMC, malware or spyware could maintain persistence on the server. An attacker might be able to update the BMC directly using flashing tools such as flashrom or socflash, without the same level of firmware resilience established at the UEFI level.</p><p>The future state involves using host-CPU-agnostic infrastructure to enable a cryptographically secure host prior to boot time. We will look to incorporate a modular approach proposed by the Open Compute Project’s Data Center Secure Control Module (DC-SCM) 2.0 <a href="https://drive.google.com/file/d/13BxuseSrKo647hjIXjp087ei8l5QQVb0/view">specification</a>. This will allow us to standardize our Root of Trust, sign our BMC, and assign physically unclonable function (PUF)-based identity keys to components and peripherals to limit the use of OTP fusing. OTP fusing creates a problem when trying to “e-cycle” or reuse machines, as you cannot truly remove a machine’s identity.</p>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Hardware]]></category>
            <category><![CDATA[Encryption]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <guid isPermaLink="false">1N9Xj1EVgCAmk3Sv1HVHQR</guid>
            <dc:creator>Derek Chamorro</dc:creator>
            <dc:creator>Ryan Chow</dc:creator>
        </item>
        <item>
            <title><![CDATA[A more sustainable end-of-life for your legacy hardware appliances with Cloudflare and Iron Mountain]]></title>
            <link>https://blog.cloudflare.com/sustainable-end-of-life-hardware/</link>
            <pubDate>Wed, 14 Dec 2022 14:00:00 GMT</pubDate>
            <description><![CDATA[ Today, as part of Cloudflare’s Impact Week, we’re excited to announce an opportunity for Cloudflare customers to make it easier to decommission and dispose of their used hardware appliances sustainably. ]]></description>
            <content:encoded><![CDATA[ <p><i></i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1WxJA85fI6x55aF4RuwqUc/270e974f3ece5319c3e847bdbd7647be/image2-24.png" />
            
            </figure><p>Today, as part of Cloudflare’s Impact Week, we’re excited to announce an opportunity for Cloudflare customers to make it easier to decommission and dispose of their used hardware appliances sustainably. We’re partnering with Iron Mountain to offer preferred pricing and discounts for Cloudflare customers that recycle or remarket legacy hardware through its service.</p>
    <div>
      <h2>Replacing legacy hardware with Cloudflare’s network</h2>
      <a href="#replacing-legacy-hardware-with-cloudflares-network">
        
      </a>
    </div>
    <p>Cloudflare’s products enable customers to replace legacy hardware appliances with our <a href="/welcome-to-the-supercloud-and-developer-week-2022/">global network</a>. Connecting to our network enables access to firewall (including <a href="https://www.cloudflare.com/learning/ddos/glossary/web-application-firewall-waf/">WAF</a> and Network Firewalls, Intrusion Detection Systems, etc), DDoS mitigation, VPN replacement, WAN optimization, and other networking and security functions that were traditionally delivered in physical hardware. These are served from our network and delivered as a service. This creates a myriad of benefits for customers including stronger security, better performance, lower operational overhead, and none of the headaches of traditional hardware like capacity planning, maintenance, or upgrade cycles. It’s also better for the Earth: our multi-tenant SaaS approach means more efficiency and a <a href="/understand-and-reduce-your-carbon-impact-with-cloudflare/">lower carbon footprint</a> to deliver those functions.</p><p>But what happens with all that hardware you no longer need to maintain after switching to Cloudflare?</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7L7PZ2pt6xmIMZ1W4j6TIG/793a9e4359d63349e4e5473a247e8e9d/image1-23.png" />
            
            </figure>
    <div>
      <h2>The life of a hardware box</h2>
      <a href="#the-life-of-a-hardware-box">
        
      </a>
    </div>
<p>The life of a hardware box begins on the factory line at the manufacturer. Boxes are then packaged, shipped, and installed at the destination infrastructure, where they provide processing power to run front-end products and services and to route network traffic. Occasionally, if the hardware fails to operate or its performance declines over time, it will get fixed or be returned for replacement under warranty.</p><p>When none of these options work, the hardware box is considered end-of-life and it “dies”. This hardware must be decommissioned: disconnected from the network, then physically removed from the data center for disposal.</p><p>The useful lifespan of hardware depends on the availability of newer generations of processors, which deliver critical efficiency improvements in cost, performance, and power. In general, the industry-standard hardware decommissioning timeline is between three and six years after installation. Refreshing these physical assets at the lower end of that range has additional benefits, keeping your infrastructure at optimal performance.</p><p>When hardware still works but is replaced by newer technology, it would be a waste to simply discard it: there is often recoverable value in this outdated gear. And tossing unwanted hardware into the trash, where it eventually ends up in a landfill, has devastating consequences, as these electronic devices contain hazardous materials like lithium, palladium, lead, copper, cobalt, and mercury that can contaminate the environment. Below, we explain sustainable, cost-beneficial practices you can pursue to dispose of your infrastructure hardware.</p>
    <div>
      <h3>Option 1: Remarket / Reuse</h3>
      <a href="#option-1-remarket-reuse">
        
      </a>
    </div>
<p>For hardware that still works, the most sustainable route is to sanitize it of data, refurbish it, and resell it in the second-hand market at a depreciated cost. Some IT asset disposition firms also repurpose used hardware to maximize its market value, for example by harvesting components from a device to build part of another product and selling that at a higher price. For working parts that have very little resale value, companies can also consider reusing them to build a spare-parts inventory for replacing failed parts in the data centers later.</p><p>The benefits of remarketing and reuse are many. It maximizes the hardware’s return on investment by reclaiming value at the end-of-life stage, offering financial benefits to the business. It reduces discarded electronics, or e-waste, and their harmful effects on our environment, helping socially responsible organizations build a more sustainable business. Lastly, it provides alternatives for individuals and organizations that cannot afford to buy new IT equipment.</p>
    <div>
      <h3>Option 2: Recycle</h3>
      <a href="#option-2-recycle">
        
      </a>
    </div>
<p>For used hardware that cannot be remarketed, it is recommended to engage an asset disposition firm to professionally strip it of any valuable and recyclable materials, such as precious metals and plastic, before putting it up for physical destruction. Like remarketing, recycling reduces environmental impact and cuts down the amount of raw materials needed to manufacture new products.</p><p>A key factor in hardware recycling is a secure chain of custody: a supplier with the right certifications and, preferably, its own fleet and secure facilities to properly and securely process the equipment.</p>
    <div>
      <h3>Option 3: Destroy</h3>
      <a href="#option-3-destroy">
        
      </a>
    </div>
<p>From a sustainability point of view, this route should only be used as a last resort. When hardware does not operate as intended and has no remarketing or recycling value, an asset disposition supplier removes all the asset tags and information from it in preparation for physical destruction. Depending on disposal policies, some companies choose to sanitize and destroy all data-bearing hardware, such as SSDs or HDDs, for security reasons.</p><p>To further maximize recycling value and reduce e-waste, it is recommended to keep your security policy on discarded IT equipment up to date and, as much as possible, explore the option of reusing working devices after professional data wiping.</p><p>At Cloudflare, we follow an industry-standard capital depreciation timeline, which culminates in recycling actions through the engagement of IT asset disposition partners including Iron Mountain. Through these partnerships, aside from data-bearing hardware, which per our security policy is sanitized and destroyed, approximately 99% of Cloudflare’s remaining decommissioned IT equipment is sold or recycled.</p>
    <div>
      <h2>Partnering with Iron Mountain to make sustainable goals more accessible</h2>
      <a href="#partnering-with-iron-mountain-to-make-sustainable-goals-more-accessible">
        
      </a>
    </div>
<p>Hardware decommissioning can be a burden on a business, from operational strain and complex processes to a lack of streamlined execution and the risk of a data breach. Our experience shows that partnering with an established firm like Iron Mountain, which specializes in IT asset disposition, helps kick-start one's hardware recycling journey.</p><p>Iron Mountain has more than two decades of experience working with hyperscale technology and data centers. A market leader in decommissioning, data security, and remarketing, it has a wide footprint of facilities to support its customers’ sustainability goals globally.</p><p>Today, Iron Mountain has generated more than US$1.5 billion through value recovery and is continually developing new ways to sell mass volumes of technology for their best use. Beyond its end-to-end decommissioning offering, Iron Mountain provides two additional services that we find valuable: a quarterly survey report that presents insights into the used-hardware market, and a sustainability report that measures the environmental impact of the total hardware processed with each customer.</p>
    <div>
      <h2>Get started today</h2>
      <a href="#get-started-today">
        
      </a>
    </div>
<p>Get started today with Iron Mountain on your hardware recycling journey and sign up <a href="https://reach.ironmountain.com/data-centers-decomm-contact-us">here</a>. After receiving the completed contact form, Iron Mountain will consult with you on the best solution possible. It has multiple programs available, including revenue share, fair market value, and guaranteed destruction with proper recycling. For example, when reselling used IT equipment, Iron Mountain proposes an appropriate revenue split, namely what percentage of the sale value will be shared with the customer, based on business needs. Iron Mountain's secure chain of custody, with added solutions such as redeployment, equipment retrieval programs, and onsite destruction, means it can tailor the solution that works best for your company's security and environmental needs.</p><p>And in collaboration with Cloudflare, if you are new to Iron Mountain and choose to use these services via the link in this blog, Iron Mountain offers an additional two percent on your revenue share of the remarketed items and a five percent discount on the standard fees for other IT asset disposition services.</p> ]]></content:encoded>
            <category><![CDATA[Impact Week]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Hardware]]></category>
            <category><![CDATA[Sustainability]]></category>
            <guid isPermaLink="false">1I5RdBJCDUlcgzlHiHHztN</guid>
            <dc:creator>May Ma</dc:creator>
            <dc:creator>Annika Garbers</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare deployment in Guam]]></title>
            <link>https://blog.cloudflare.com/cloudflare-deployment-in-guam/</link>
            <pubDate>Mon, 25 Jul 2022 13:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare Deployment in Guam - Delivering a Better Internet for Faraway Pacific Ocean Archipelagos Residents ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6oXc7IRzLqEUfS9I7xGdrG/dbf3056ab6a7b9d38013629de9f593ff/image7-9.png" />
            
</figure><p>Having fast Internet properties means being as few milliseconds as possible away from our customers and their users, no matter where they are on Earth. And because of the design of Cloudflare's network, we don't just make Internet properties faster by being closer; we bring our <a href="https://www.cloudflare.com/products/zero-trust/">Zero Trust</a> services closer too. So whether you're connecting to a public API, a website, a SaaS application, or your company's internal applications, we're close by.</p><p>We make this possible by adding new cities, partners, capacity, and cables. And we have seen over and over again that making the Internet faster in a region has a clear impact on traffic: if the experience is quicker, people usually do more online.</p><p>Cloudflare’s network keeps growing, and its global footprint expands accordingly. In April 2022 we announced that <a href="/new-cities-april-2022-edition/">the Cloudflare network now spans 275 cities</a>, and the number keeps growing.</p><p>In this blog post we highlight the deployment of our data center in <a href="https://en.wikipedia.org/wiki/Hag%C3%A5t%C3%B1a,_Guam">Hagatna, Guam</a>.</p>
    <div>
      <h3>Why a blog about Guam?</h3>
      <a href="#why-a-blog-about-guam">
        
      </a>
    </div>
<p>Guam is about 2,400 km from both Tokyo to the north and Manila to the west, and about 6,100 km from Honolulu to the east. Honolulu itself is the most remote major city in the US and one of the most remote in the world; the closest major city to it is San Francisco, California, at 3,700 km. From this, one gets a sense of how far Guam is from both the US to its east and Asia to its west.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3WhLFmnmj0yt4ZUnvhU1tb/e66197c0fd0356046d065026a8c65a07/image3-9.png" />
            
            </figure><p>Figure 1: Guam Geographical Location.</p><p>Why is this relevant? As explained <a href="https://www.cloudflare.com/learning/performance/glossary/what-is-latency/">here</a>, latency is the time it takes for data to pass from one point on a network to another. And one of the main reasons behind network latency is the distance between client devices — like a browser on a mobile phone — making requests and the servers responding to those requests. So, if we consider where Guam is geographically, we get a good picture about how Guam’s residents can be affected by the long distances their Internet requests, and responses, have to travel.</p><p>This is why every time Cloudflare adds a new location, we help make the Internet a bit faster. The reason is that every new location brings Cloudflare’s services closer to the users. As part of Cloudflare’s mission, the Guam deployment is a perfect example of how we are going from being the most global network on Earth to the most local one as well.</p>
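<p>The distance argument can be made concrete with a back-of-the-envelope bound: light in optical fiber travels at roughly two-thirds the speed of light in vacuum, so the distances above set a floor on round-trip time even before any routing detours. This sketch assumes a straight-line fiber path, so real RTTs are higher:</p>

```python
# Lower bound on round-trip time over fiber, assuming light travels at
# ~2/3 of c and a direct path. Real cable routes are longer, so actual
# RTTs exceed these figures.
C_KM_PER_MS = 299_792.458 / 1000        # speed of light, km per ms
FIBER_KM_PER_MS = C_KM_PER_MS * 2 / 3   # ~200 km per ms in fiber

def min_rtt_ms(distance_km: float) -> float:
    """Minimum round trip: out and back at fiber propagation speed."""
    return 2 * distance_km / FIBER_KM_PER_MS

# Distances quoted in the post.
for city, km in [("Tokyo", 2400), ("Manila", 2400), ("Honolulu", 6100)]:
    print(f"Guam <-> {city}: at least {min_rtt_ms(km):.0f} ms RTT")
```

Even the physics-limited floor to the nearest large hubs is tens of milliseconds, which is why serving Guam locally matters so much.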
    <div>
      <h3>Submarine cables</h3>
      <a href="#submarine-cables">
        
      </a>
    </div>
<p><a href="https://submarine-cable-map-2022.telegeography.com/">There are 486 submarine cables and 1,306 landings that are currently active or under construction</a>, running an estimated 1.3 million km around the globe.</p><p><a href="https://www.submarinecablemap.com/country/guam">A closer look at the submarine cables landing in Guam</a> shows that the region is actually well served, with several connections to locations such as Japan, Taiwan, the Philippines, Australia, Indonesia, and Hawaii. This makes Guam more resilient to events such as the January 2022 volcanic eruption that damaged Tonga's submarine cables, which we wrote about <a href="/tonga-internet-outage/">here</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/42ARhRWKJc7Q01c28WM846/1239c2930cd6096bb37b18e26d1a7d01/image2-16.png" />
            
</figure><p>Figure 2: Submarine Cables Landing in Guam (source: <a href="https://www.submarinecablemap.com/country/guam">submarinecablemap.com</a>)</p><p>The picture above also shows the relevance of Guam for even more remote locations, such as the <a href="https://en.wikipedia.org/wiki/Federated_States_of_Micronesia">Federated States of Micronesia (FSM)</a> or the <a href="https://en.wikipedia.org/wiki/Marshall_Islands">Marshall Islands</a>, which have an ‘extra hop’ to cover when trying to reach the rest of the Internet. <a href="https://en.wikipedia.org/wiki/Palau">Palau</a> also relies on Guam but, from a resilience point of view, has alternatives via locations such as the Philippines or Australia.</p>
    <div>
      <h3>Presence at Mariana Islands Internet Exchange</h3>
      <a href="#presence-at-mariana-islands-internet-exchange">
        
      </a>
    </div>
<p>Cloudflare’s presence in Guam is through Mariana Islands <a href="https://www.cloudflare.com/learning/cdn/glossary/internet-exchange-point-ixp/">Internet Exchange</a>, or <a href="https://mariix.net/">MARIIX</a>, allowing Cloudflare to <a href="https://mariix.net/peering">peer with participants</a> such as:</p><ul><li><p>AS 395400 - University of Guam</p></li><li><p>AS 9246 - GTA Teleguam</p></li><li><p>AS 3605 - DoCoMo Pacific</p></li><li><p>AS 7131 - IT&amp;E</p></li><li><p>AS 17456 - PDS</p></li></ul><p>As there are multiple participants, these are being added gradually. The first was AS 7131, served since April 2022, and the latest addition is AS 9246, from July 2022.</p><p>As some of these ASNs or ASs (<a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">autonomous systems</a> — large networks or groups of networks) have their own downstream customers, further ASs can leverage Cloudflare’s deployment at Guam, examples being AS 17893 - Palau National Communications Corp - or AS 21996 - Guam Cell.</p><p>Therefore, the Cloudflare deployment brings a better (and generally faster) Internet not only to Guam’s residents, but also to residents of nearby archipelagos that are anchored on Guam. <a href="https://www.worldometers.info/world-population/">In May 2022, according to UN forecasts</a>, the covered resident population in the main areas in the region stood at around 171k in Guam, 105k in FSM and 60k in the Marshall Islands.</p><p>For this deployment, Cloudflare worked with the skilled MARIIX personnel for the physical installations, provisioning, and services turn-up. Despite the geographical distance and time zone differences (Hagatna is 10 hours ahead of GMT but only two hours ahead of the Cloudflare office in Singapore, so the time difference wasn’t a big challenge), all the logistics involved and communications went smoothly. 
A <a href="https://blog.apnic.net/2022/07/06/mariix-improves-local-internet-for-growing-pacific-hub/">recent blog posted by APNIC</a>, where we can see some personnel with whom Cloudflare worked, reiterates the valuable work being done locally and the increasing importance of Guam in the region.</p>
    <div>
      <h3>Performance impact for local/regional habitants</h3>
      <a href="#performance-impact-for-local-regional-habitants">
        
      </a>
    </div>
<p>Before Cloudflare’s deployment in Guam, customers of local ASs trying to reach Internet properties via Cloudflare’s network were directed mainly to Cloudflare’s deployments in <a href="https://www.cloudflare.com/network/">Tokyo and Seattle</a>. This is due to the anycast routing used by Cloudflare, as described <a href="https://www.cloudflare.com/learning/cdn/glossary/anycast-network/">here</a>: anycast typically routes incoming traffic to the nearest data center. In the case of Guam, as previously described, those nearest locations were thousands of kilometers away, which means high latency and a degraded user experience.</p><p>With Cloudflare’s deployment in Guam, Guam’s and nearby archipelagos’ residents are no longer directed to those faraway locations; instead, they are served locally by the new Cloudflare servers. Cutting the round trip to just a few milliseconds represents a significant boost in user experience, as latency is dramatically reduced. As the total distance between users and servers shrinks, load time shrinks as well. And since users very often quit waiting for a site when load times are high, the opposite holds too: users are more willing to stay on a site that loads quickly. This improvement represents both a better user experience and higher use of the Internet.</p><p>For Guam, we use AS 9246 as an example: it was previously served by Seattle but, since around 23h00 UTC on 14/July/2022, is served by Guam, as illustrated below:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5fWbSHuuVlhYnjuxpumovQ/e14ee81f5b64a52861a2f403048ab79e/1-1.png" />
            
            </figure><p>Figure 3: Requests per Colo for AS 9246 Before vs After Cloudflare Deployment at Guam.</p><p>The following chart displays the median and the 90th percentile of the eyeball TCP RTT for AS 9246 immediately before and after AS 9246 users started to use the Guam deployment:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/30b6AqRcEICiRdXdVqXZny/686d02c528ef5eeb37d4cab7079ecb6f/2-1.png" />
            
</figure><p>Figure 4: Eyeball TCP RTT for AS 9246 Before vs After Cloudflare Deployment at Guam.</p><p>From the chart above we can derive that the overall reduction for the eyeball TCP <a href="https://www.cloudflare.com/learning/cdn/glossary/round-trip-time-rtt/">RTT</a> immediately before and after Guam’s deployment was:</p><ul><li><p>Median decreased from 136.3ms to 9.3ms, a 93.2% reduction;</p></li><li><p>P90 decreased from 188.7ms to 97.0ms, a 48.5% reduction.</p></li></ul><p>When comparing the [12h00 ; 13h00] UTC period of 14/July/2022 (AS 9246 still served by Seattle) with the same hour on 15/July/2022 (AS 9246 already served by Guam), the differences are also clear. We pick this period because it is a busy hour locally, since local time equals UTC plus 10 hours:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3bGuMIPCrGjL7JXzbPpesO/b82ece1afba89ef9900e73dca85bc5ed/3-1.png" />
            
</figure><p>Figure 5 - Median Eyeball TCP RTT for AS 9246 from Seattle vs Guam.</p><p>The median eyeball TCP RTT decreased from 146ms to 12ms, a massive 91.8% reduction, making this, given the geographical factors already mentioned, perhaps the Cloudflare deployment with one of the largest latency reductions for local end users.</p>
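<p>The percentage reductions quoted above follow directly from the before/after RTT values; a quick check:</p>

```python
def reduction_pct(before_ms: float, after_ms: float) -> float:
    """Percentage latency reduction between two RTT measurements."""
    return (before_ms - after_ms) / before_ms * 100

# Values from the charts above (AS 9246 served from Seattle vs. Guam).
print(f"median:           {reduction_pct(136.3, 9.3):.1f}%")
print(f"p90:              {reduction_pct(188.7, 97.0):.1f}%")
print(f"busy-hour median: {reduction_pct(146, 12):.1f}%")
```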
    <div>
      <h3>Impact on Internet traffic</h3>
      <a href="#impact-on-internet-traffic">
        
      </a>
    </div>
<p>We can actually see an increase in HTTP requests in Guam since early April, right when we were setting up our Guam data center. The impact of the deployment became clearer after mid-April, with a further step up in mid-June. Comparing March 8 with July 17, there was an 11.5% increase in requests, as illustrated below:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JxzkZSSoKkogNbs4YwTMY/b4c748335e83bfcacb4c2d142e6cf0e5/4-1.png" />
            
            </figure><p>Figure 6: Trend in HTTP Requests per Second in Guam.</p>
    <div>
      <h3>Edge Partnership Program</h3>
      <a href="#edge-partnership-program">
        
      </a>
    </div>
    <p>If you’re an ISP that is interested in hosting a Cloudflare cache to improve performance and reduce backhaul, get in touch on our <a href="https://www.cloudflare.com/partners/peering-portal/?cf_target_id=B87954540A24583D38E89307A8ADC63D">Edge Partnership Program</a> page. And if you’re a software, data, or network engineer – or just the type of person who is curious and wants to help make the Internet better – consider <a href="https://www.cloudflare.com/careers/jobs/">joining our team</a>.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Data Center]]></category>
            <category><![CDATA[APJC]]></category>
            <guid isPermaLink="false">0I6nSi8jrvACD0QKGgI4F</guid>
            <dc:creator>David Antunes</dc:creator>
        </item>
        <item>
            <title><![CDATA[Debugging Hardware Performance on Gen X Servers]]></title>
            <link>https://blog.cloudflare.com/debugging-hardware-performance-on-gen-x-servers/</link>
            <pubDate>Tue, 17 May 2022 12:59:23 GMT</pubDate>
            <description><![CDATA[ In Cloudflare’s global network, every server runs the whole software stack. Therefore, it's critical that every server performs to its maximum potential capacity ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5T1hkMmQNOOw0lktUx3JFF/2e5e4aa40039a6ad46c2a251eb421650/gen-x-color-Friday--twitter_2x-1.png" />
            
</figure><p>In Cloudflare’s global network, every server runs the whole software stack. Therefore, it's critical that every server performs to its maximum potential capacity. To give us better flexibility from a supply-chain perspective, we buy server hardware from multiple vendors with the exact same configuration. However, after the deployment of our Gen X AMD EPYC Zen 2 (Rome) servers, we noticed that servers from one vendor (which we’ll call SKU-B) were consistently performing 5-10% worse than servers from a second vendor (which we'll call SKU-A).</p><p>The graph below shows the performance discrepancy between the two SKUs in terms of percentage difference. The performance is gauged on the metric of requests per second, and this data is an average of observations captured over 24 hours.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/wvNDGyAxlthdlOg3DW5Ur/163ac5682c189dae77797fc9eda582f7/1-2.png" />
            
            </figure><p>Machines before implementing performance improvements. The average RPS for SKU-B is approximately 10% below SKU-A.</p>
    <div>
      <h3>Compute performance via DGEMM</h3>
      <a href="#compute-performance-via-dgemm">
        
      </a>
    </div>
    <p>The initial debugging efforts centered around the compute performance. We ran AMD’s <a href="http://www.netlib.org/lapack/explore-html/d1/d54/group__double__blas__level3_gaeda3cbd99c8fb834a60a6412878226e1.html">DGEMM</a> high performance computing tool to determine if CPU performance was the cause. DGEMM is designed to measure the sustained floating-point computation rate of a single server. Specifically, the code measures the floating point rate of execution of a real matrix–matrix multiplication with double precision. We ran a modified version of DGEMM equipped with specific AMD libraries to support the EPYC processor instruction set.</p><p>The DGEMM test brought out a few points related to the processor’s Thermal Design Power (TDP). TDP refers to the maximum power that a processor can draw for a thermally significant period while running a software application. The underperforming servers were only drawing 215 to 220 watts of power when fully stressed, whereas the max supported TDP on the processors is 240 watts. Additionally, their floating-point computation rate was ~100 gigaflops (GFLOPS) behind the better performing servers.</p><p>Screenshot from the DGEMM run of a good system:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2vOh2PokcYbj1v6kev1fQt/96fdec6bfa79b9cb9042c2f7d7d60119/2.png" />
            
            </figure><p>Screenshot from an underperforming system:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1OQ7wN5uXDI1Ttiv28QpzE/5389a020a89ace802506133dcaf04ce0/3.png" />
            
            </figure><p>To debug the issue, we first tried disabling idle power saving mode (also known as C-states) in the CPU BIOS configuration. The servers reported expected GFLOPS numbers and achieved max TDP. Believing that this could have been the root cause, the servers were moved back into the production test environment for data collection.</p><p>However, the performance delta was still there.</p>
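<p>For intuition on how a DGEMM-style figure is derived: an n×n double-precision matrix multiply costs about 2·n³ floating-point operations, so dividing that by the elapsed time gives GFLOPS. The sketch below is a pure-Python stand-in, not AMD's tool; it is orders of magnitude slower than the tuned BLAS libraries the real benchmark uses, so the point is the arithmetic, not the absolute number:</p>

```python
import random
import time

def dgemm_gflops(n: int = 128) -> float:
    """Time a double-precision n x n matrix multiply (~2*n**3 FLOPs)
    and convert the rate to billions of FLOPs per second."""
    a = [[random.random() for _ in range(n)] for _ in range(n)]
    b = [[random.random() for _ in range(n)] for _ in range(n)]
    c = [[0.0] * n for _ in range(n)]
    start = time.perf_counter()
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    elapsed = time.perf_counter() - start
    return (2 * n**3 / elapsed) / 1e9

print(f"~{dgemm_gflops(96):.3f} GFLOPS (pure Python, illustrative only)")
```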
    <div>
      <h3>Further debugging</h3>
      <a href="#further-debugging">
        
      </a>
    </div>
    <p>We started the debugging process all over again by comparing the BIOS settings logs of both SKU-A and SKU-B. Once this debugging option was exhausted, the focus shifted to the network interface. We ran the open source network interface tool <a href="https://iperf.fr/iperf-download.php">iPerf</a> to check if there were any bottlenecks — no issues were observed either.</p><p>During this activity, we noticed that the BIOS of both SKUs were not using the AMD Preferred I/O functionality, which allows devices on a single PCIe bus to obtain improved DMA write performance. We enabled the Preferred I/O option on SKU-B and tested the change in production. Unfortunately, there were no performance gains. After the focus on network activity, the team started looking into memory configuration and operating speed.</p>
    <div>
      <h3>AMD HSMP tool &amp; Infinity Fabric diagnosis</h3>
      <a href="#amd-hsmp-tool-infinity-fabric-diagnosis">
        
      </a>
    </div>
<p>The Gen X systems are configured with DDR4 memory modules that can support a maximum of 2933 megatransfers per second (MT/s). With the BIOS configuration for memory clock frequency set to Auto, the 2933 MT/s automatically configures the memory clock frequency to 1467 MHz. Double Data Rate (DDR) technology allows for the memory signal to be sampled twice per clock cycle: once on the rising edge and once on the falling edge of the clock signal. Because of this, the reported memory transfer rate is twice the true memory clock frequency. Memory bandwidth was validated by running a <a href="https://github.com/jeffhammond/STREAM">Stream</a> benchmark test.</p><p>Running out of debugging options, we reached out to AMD for assistance and were provided a debug tool called <a href="https://github.com/amd/amd_hsmp">HSMP</a> that lets users access the Host System Management Port. This tool provides a wide variety of processor management options, such as reading and changing processor TDP limits, reading processor and memory temperatures, and reading memory and processor clock frequencies. When we ran the HSMP tool, we discovered that the memory was being clocked at a different rate from AMD’s Infinity Fabric system — the architecture which connects the memory to the processor core. As shown below, the memory clock frequency was set to 1467 MHz while the Infinity Fabric clock frequency was set to 1333 MHz.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3Cq2VxpFjJN4fesZur1auc/979823ac8ebd8961a859b23ccbaee087/4.png" />
            
            </figure><p>Ideally, the two clock frequencies need to be equal for optimized workloads sensitive to latency and throughput. Below is a snapshot for an SKU-A server where both memory and Infinity Fabric frequencies are equal.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3niJPphw3HJr2Z8CoSBCzU/28b6a48b6f881983b6a41f38563559b4/5.png" />
            
            </figure>
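<p>The double-data-rate relationship described above is just a factor of two, which is how 2933 MT/s modules imply a ~1467 MHz memory clock. A minimal sketch (the clock figures are the ones from the HSMP output above):</p>

```python
def ddr_clock_mhz(transfer_rate_mts: float) -> float:
    """DDR transfers data on both the rising and falling clock edges,
    so the advertised rate in MT/s is twice the true clock in MHz."""
    return transfer_rate_mts / 2

# 2933 MT/s DIMMs -> a ~1466.5 MHz memory clock, reported as 1467 MHz.
memory_clock_mhz = ddr_clock_mhz(2933)
infinity_fabric_mhz = 1333  # what HSMP reported on the slow SKU-B systems
print(memory_clock_mhz, infinity_fabric_mhz)  # the mismatch HSMP exposed
```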
    <div>
      <h3>Improved Performance</h3>
      <a href="#improved-performance">
        
      </a>
    </div>
    <p>Since the Infinity Fabric clock setting was not exposed as a tunable parameter in the BIOS, we asked the vendor to provide a new BIOS that set the frequency to 1467 MHz at compile time. Once we deployed the new BIOS on the underperforming systems in production, we saw that all servers started performing at their expected levels. Below is a snapshot of the same set of systems, with data captured and averaged over a 24-hour window on the same day of the week as the previous dataset. Now, the requests per second performance of SKU-B equals, and in some cases exceeds, the performance of SKU-A!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1sFDwVBvdiVO3gWaTtI2FG/3fa8433840f2d5bef0d8504a50a1962b/6.png" />
            
            </figure><p>Internet Requests Per Second (RPS) for four SKU-A machines and four SKU-B machines after implementing the BIOS fix on SKU-B. The RPS of SKU-B now equals the RPS of SKU-A.</p><p>Hello, I am Yasir Jamal. I recently joined Cloudflare as a Hardware Engineer with the goal of helping provide a better Internet to everyone. If you share the same interest, come <a href="https://www.cloudflare.com/careers/">join us</a>!</p> ]]></content:encoded>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[Gen X]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Hardware]]></category>
            <guid isPermaLink="false">5YSW4T1CSOOlsvgHAheCMW</guid>
            <dc:creator>Yasir Jamal</dc:creator>
        </item>
        <item>
            <title><![CDATA[The Cloudflare network now spans 275 cities]]></title>
            <link>https://blog.cloudflare.com/new-cities-april-2022-edition/</link>
            <pubDate>Fri, 29 Apr 2022 13:00:07 GMT</pubDate>
            <description><![CDATA[ Today, we are announcing the addition of 4 new cities, bringing our network to 275 cities globally. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7jYdbZl0Rzv9UyNXTxPeox/aeb8eeea720244c7004f475366b28493/Screen-Shot-2022-04-28-at-11.51.05-AM.png" />
            
            </figure><p>It was just last month that <a href="/mid-2022-new-cities/">we announced</a> our network had <a href="/250-cities-is-just-the-start/">grown</a> to over 270 <a href="/ten-new-cities-four-new-countries/">cities</a> <a href="/expanding-to-25-plus-cities-in-brazil/">globally</a>. Today, we’re announcing that with recent additions we’ve reached 275 cities. With each new city we add, we help make the Internet faster, more reliable, and more secure. In this post, we’ll cover the cities we added, the resulting performance improvements, and take a closer look at our network expansion in India.</p>
    <div>
      <h2>The Cities</h2>
      <a href="#the-cities">
        
      </a>
    </div>
    <p>Here are the four new cities we added in the last month: <b>Ahmedabad</b>, India; <b>Chandigarh</b>, India; <b>Jeddah</b>, Saudi Arabia; and <b>Yogyakarta</b>, Indonesia.</p>
    <div>
      <h3>A closer look at India</h3>
      <a href="#a-closer-look-at-india">
        
      </a>
    </div>
    <p>India is home to one of the largest and most rapidly growing bases of digital consumers. Recognizing this, Cloudflare has increased its footprint in India in order to optimize reachability to users within the country.</p><p>Cloudflare’s expansion in India is facilitated through interconnections with several of the largest Internet Service Providers (ISPs), mobile network providers and Internet Exchange points (IXPs). At present, we are directly connected to the major networks that account for more than 95% of the country’s broadband subscribers. We are continuously working not only to expand the interconnection capacity and locations with these networks, but also to establish new connections to the networks that we have yet to interconnect with.</p><p>In 2020, we served users through seven cities in the country. Since then, we have added a network presence in another five cities, for a total of 12 cities in India. In the case of one of our biggest partners, with whom we interconnect in these 12 cities, Cloudflare’s latency performance is better than that of other major platforms, as shown in the chart below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6cr4GKSTnfPFdy3fNuorYY/cb8b195dc560aa901080d5ecd9ba42fb/1.jpg" />
            
            </figure><p><i>Response time (in ms) for the top network in India to Cloudflare and other platforms. Source: Cedexis</i></p>
    <div>
      <h3>Helping make the Internet faster</h3>
      <a href="#helping-make-the-internet-faster">
        
      </a>
    </div>
    <p>Every time we add a new location, we help make the Internet a little bit faster. The reason is that every new location brings our content and services closer to the person (or machine) that requested them. Instead of driving 25 minutes to the grocery store, it’s like one opened in your neighborhood.</p><p>In the case of Jeddah, Saudi Arabia, we already have six other locations in two different cities in Saudi Arabia. Still, by adding this new location, we were able to improve median performance (TCP <a href="https://www.cloudflare.com/learning/cdn/glossary/round-trip-time-rtt/">RTT</a> latency) by 26%, from 81ms to 60ms. Saving roughly 20 milliseconds per request doesn’t sound like a lot, right? But this location is serving almost 10 million requests per day. That’s approximately 55 hours <i>per day</i> that someone (or something) isn’t waiting for data.</p>
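<p>As a back-of-the-envelope check, the aggregate saving follows directly from the approximate figures above:</p>

```python
# Rough estimate of the total waiting time eliminated per day in Jeddah.
requests_per_day = 10_000_000   # ~10 million requests per day
saving_per_request_s = 0.020    # ~20 ms median latency improvement

hours_saved = requests_per_day * saving_per_request_s / 3600
print(f"~{hours_saved:.1f} hours of waiting avoided per day")  # ~55.6 hours
```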
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3uiTpplYNjtFHhi43dT1P1/8cc08c1317aab207177c8dbe7905615e/2.png" />
            
            </figure><p>As we continue to put dots on the map, we’ll keep putting updates here on how Internet performance is improving. As we like to say, we’re just getting started.</p><p><i>If you’re an ISP that is interested in hosting a Cloudflare cache to improve performance and reduce backhaul, get in touch on our</i> <a href="https://www.cloudflare.com/partners/peering-portal/"><i>Edge Partnership Program</i></a> <i>page. And if you’re a software, data, or network engineer – or just the type of person who is curious and wants to help make the Internet better – consider joining our team.</i></p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[India]]></category>
            <category><![CDATA[Indonesian]]></category>
            <category><![CDATA[Middle East]]></category>
            <guid isPermaLink="false">2AXQ0RCB7tlKWaqhvEOSJP</guid>
            <dc:creator>Joanne Liew</dc:creator>
            <dc:creator>Mike Conlow</dc:creator>
        </item>
        <item>
            <title><![CDATA[New cities on the Cloudflare global network: March 2022 edition]]></title>
            <link>https://blog.cloudflare.com/mid-2022-new-cities/</link>
            <pubDate>Mon, 21 Mar 2022 12:59:02 GMT</pubDate>
            <description><![CDATA[ Today, we are announcing the addition of 18 new cities in Africa, South America, Asia, and the Middle East, bringing our network to over 270 cities globally ]]></description>
            <content:encoded><![CDATA[ <p>If you follow the Cloudflare blog, you know that we <a href="/250-cities-is-just-the-start/">love to</a> <a href="/ten-new-cities-four-new-countries/">add cities</a> to our <a href="/expanding-to-25-plus-cities-in-brazil/">global map</a>. With each new city we add, we help make the Internet faster, more reliable, and more secure. Today, we are announcing the addition of 18 new cities in Africa, South America, Asia, and the Middle East, bringing our network to over 270 cities globally. We’ll also look closely at how adding new cities improves Internet performance, such as our new locations in Israel, which reduced median response time (latency) from 86ms to 29ms (a 66% improvement) in a matter of weeks for subscribers of one Israeli Internet service provider (ISP).</p>
    <div>
      <h3>The Cities</h3>
      <a href="#the-cities">
        
      </a>
    </div>
    <p>Without further ado, here are the 18 new cities in 10 countries we welcomed to our global network: <b>Accra</b>, Ghana; <b>Almaty</b>, Kazakhstan; <b>Bhubaneshwar</b>, India; <b>Chiang Mai</b>, Thailand; <b>Joinville</b>, Brazil; <b>Erbil</b>, Iraq; <b>Fukuoka</b>, Japan; <b>Goiânia</b>, Brazil; <b>Haifa</b>, Israel; <b>Harare</b>, Zimbabwe; <b>Juazeiro do Norte</b>, Brazil; <b>Kanpur</b>, India; <b>Manaus</b>, Brazil; <b>Naha</b>, Japan; <b>Patna</b>, India; <b>São José do Rio Preto</b>, Brazil; <b>Tashkent</b>, Uzbekistan; <b>Uberlândia</b>, Brazil.</p>
    <div>
      <h3>Cloudflare’s ISP Edge Partnership Program</h3>
      <a href="#cloudflares-isp-edge-partnership-program">
        
      </a>
    </div>
    <p>But let’s take a step back and understand why and how adding new cities to our list helps <a href="https://blog.cloudflare.com/50-years-of-the-internet-work-in-progress-to-a-better-internet/">make the Internet better</a>. First, we should reintroduce the Cloudflare Edge Partnership Program. Cloudflare is used as a reverse proxy by nearly 20% of all Internet properties, which means the volume of ISP traffic trying to reach us can be significant. In some cases, as we’ll see in Israel, the distance data needs to travel can also be significant, adding to latency and reducing Internet performance for the user. Our solution is partnering with ISPs to embed our servers inside their network. Not only does the ISP avoid lots of backhaul traffic, but their subscribers also get much better performance because the website is served on-net, and close to them geographically. It is a win-win-win.</p><p>Consider a large Israeli ISP we did not peer with locally in Tel Aviv. Last year, if a subscriber wanted to reach a website on the Cloudflare network, their request had to travel on the Internet backbone – the large carriers that connect networks together on behalf of smaller ISPs – from Israel to Europe before reaching Cloudflare and going back. The map below shows where they were able to find Cloudflare content before our deployment went live: 48% in Frankfurt, 33% in London, and 18% in Amsterdam. That’s a long way!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2wA04KAAeSIrk7oel2PZsQ/5e04175095395b9a5525d580eeaefb9b/image1-89.png" />
            
            </figure><p>In January and March 2022, we turned up deployments with the ISP in Tel Aviv and Haifa. Now live, these two locations serve practically all requests from their subscribers locally within Israel. Instead of traveling 3,000 km to reach one of the millions of websites on our network, most requests from Israel now travel 65 km, or less. The improvement has been dramatic: now we’re serving 66% of requests in under 50ms; before the deployment we couldn’t serve any in under 50ms because the distance was too great. Now, 85% are served in under 100ms; before, we served 66% of requests in under 100ms.</p><p><i>Logarithmic graph depicting the improvement in performance. The 50th percentile of requests decreased from almost 90ms to around 30ms.</i></p><p>As we continue to put dots on the map, we’ll keep putting updates here on how Internet performance is improving. As we like to say, we’re just getting started.</p><p><i>If you’re an ISP that is interested in hosting a Cloudflare cache to improve performance and reduce backhaul, get in touch on our </i><a href="https://www.cloudflare.com/partners/peering-portal/"><i>Edge Partnership Program</i></a><i> page. And if you’re a software, data, or network engineer – or just the type of person who is curious and wants to help make the Internet better – consider joining our team.</i></p>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Internet Performance]]></category>
            <guid isPermaLink="false">y4jDt739XIxK6pdkGKFKl</guid>
            <dc:creator>Mike Conlow</dc:creator>
        </item>
        <item>
            <title><![CDATA[Protect all network traffic with Cloudflare]]></title>
            <link>https://blog.cloudflare.com/protect-all-network-traffic/</link>
            <pubDate>Thu, 17 Mar 2022 12:59:25 GMT</pubDate>
            <description><![CDATA[ Today, we’re extending the availability of Magic Transit to customers with smaller networks by offering Magic Transit-protected, Cloudflare-managed IP space ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Magic Transit protects customers' entire networks—any port/protocol—from DDoS attacks and provides built-in performance and reliability. Today, we’re excited to extend the capabilities of Magic Transit to customers with any size network, from home networks to offices to large cloud properties, by offering Cloudflare-maintained and Magic Transit-protected IP space as a service.</p>
    <div>
      <h3>What is Magic Transit?</h3>
      <a href="#what-is-magic-transit">
        
      </a>
    </div>
    <p>Magic Transit extends the power of <a href="https://www.cloudflare.com/network/">Cloudflare’s global network</a> to customers, absorbing all traffic destined for your network at the location closest to its source. Once traffic lands at the closest Cloudflare location, it flows through a stack of security protections including industry-leading DDoS mitigation and cloud firewall. Detailed <a href="https://support.cloudflare.com/hc/en-us/articles/360038696631-Understanding-Cloudflare-Network-Analytics">Network Analytics</a>, alerts, and reporting give you deep visibility into all your traffic and attack patterns. Clean traffic is forwarded to your network using Anycast <a href="https://www.cloudflare.com/learning/network-layer/what-is-gre-tunneling/">GRE</a> or <a href="/anycast-ipsec/">IPsec</a> tunnels or <a href="/cloudflare-network-interconnect/">Cloudflare Network Interconnect</a>. Magic Transit includes load balancing and automatic failover across tunnels to steer traffic across the healthiest path possible, from everywhere in the world.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6nWPieP77NIQkEBK5sCXpG/9189a1c4f4466461263233ae3d02ce48/image2-54.png" />
            
            </figure><p><i>Magic Transit architecture: Internet BGP advertisement attracts traffic to Cloudflare’s network, where attack mitigation and security policies are applied before clean traffic is forwarded back to customer networks with an Anycast GRE tunnel or Cloudflare Network Interconnect.</i></p><p>The “Magic” is in our Anycast architecture: every server across our network runs every Cloudflare service, so traffic can be processed wherever it lands. This means the entire capacity of our network—121+Tbps as of this post—is available to block even the largest attacks. It also drives <a href="/magic-makes-your-network-faster/">huge benefits for performance</a> versus traditional “scrubbing center” solutions that route traffic to specialized locations for processing, and makes onboarding much easier for network engineers: one tunnel to Cloudflare automatically connects customer infrastructure to our entire network in over 250 cities worldwide.</p>
    <div>
      <h3>What’s new?</h3>
      <a href="#whats-new">
        
      </a>
    </div>
    <p>Historically, Magic Transit has required customers to <a href="/bringing-your-own-ips-to-cloudflare-byoip/">bring their own IP addresses</a>—a minimum of a /24—in order to use this service. This is because a /24 is the minimum prefix length that can be advertised via BGP on the public Internet, which is how we attract traffic for customer networks.</p><p>But not all customers have this much IP space; we've talked to many of you who want IP layer protection for a smaller network than we're able to advertise to the Internet on your behalf. Today, we’re extending the availability of Magic Transit to customers with smaller networks by offering Magic Transit-protected, Cloudflare-managed IP space. Starting now, you can direct your network traffic to dedicated static IPs and receive all the benefits of Magic Transit including industry leading DDoS protection, visibility, performance, and resiliency.</p><p>Let’s talk through some new ways you can leverage Magic Transit to protect and accelerate any network.</p>
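<p>To sanity-check whether an IPv4 block meets that /24 minimum, you can compare prefix lengths; here is a minimal sketch using Python’s standard <code>ipaddress</code> module (the helper name is illustrative):</p>

```python
import ipaddress

def advertisable(prefix: str, min_len: int = 24) -> bool:
    """True if the prefix is at least a /24, i.e. large enough (a short
    enough prefix length) to be advertised via BGP on the public Internet."""
    return ipaddress.ip_network(prefix, strict=True).prefixlen <= min_len

print(advertisable("203.0.113.0/24"))   # True: exactly the minimum size
print(advertisable("203.0.113.0/25"))   # False: too small to advertise alone
print(advertisable("198.51.100.0/22"))  # True: larger than a /24
```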
    <div>
      <h3>Consistent cross-cloud security</h3>
      <a href="#consistent-cross-cloud-security">
        
      </a>
    </div>
    <p>Organizations adopting a hybrid or poly-cloud strategy have struggled to maintain consistent security controls across different environments. Where they used to manage a single firewall appliance in a datacenter, security teams now have a myriad of controls across different providers—physical, virtual, and cloud-based—all with different capabilities and control mechanisms.</p><p>Cloudflare is the single control plane across your hybrid cloud deployment, allowing you to manage security policies from one place, get uniform protection across your entire environment, and get deep visibility into your traffic and attack patterns.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3DSMWItkN5Z8lw8N8axZcl/40dc51f86aae1d6dd833c94654c09fc4/image5-12.png" />
            
            </figure>
    <div>
      <h3>Protecting branches of any size</h3>
      <a href="#protecting-branches-of-any-size">
        
      </a>
    </div>
    <p>As DDoS attack frequency and variety continue to grow, attackers are getting more creative with angles to target organizations. Over the past few years, we have seen a <a href="/tag/trends/">consistent rise</a> in attacks targeted at corporate infrastructure including internal applications. As the percentage of a corporate network dependent on the Internet continues to grow, organizations need consistent protection across their entire network.</p><p>Now, you can get any network location covered—branch offices, stores, remote sites, event venues, and more—with Magic Transit-protected IP space. Organizations can also <a href="/replace-your-hardware-firewalls-with-cloudflare-one/">replace legacy hardware firewalls</a> at those locations with our built-in cloud firewall, which filters bidirectional traffic and propagates changes globally within seconds.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2XRSzqVBvWObRHpKaco1RW/2e366ed5034f23676c0c3608c7bb9e19/image4-9.png" />
            
            </figure>
    <div>
      <h3>Keeping streams alive without worrying about leaked IPs</h3>
      <a href="#keeping-streams-alive-without-worrying-about-leaked-ips">
        
      </a>
    </div>
    <p>Generally, DDoS attacks target a specific application or network in order to impact the availability of an Internet-facing resource. But you don’t have to be <i>hosting</i> anything in order to get attacked, as many gamers and streamers have unfortunately discovered. The public IP associated with a home network can easily be leaked, giving attackers the ability to directly target and take down a live stream.</p><p>As a streamer, you can now route traffic from your home network through a Magic Transit-protected IP. This means no more worrying about leaking your IP: attackers targeting you will have traffic blocked at the closest Cloudflare location to them, far away from your home network. And no need to worry about impact to your game: thanks to Cloudflare’s globally distributed and interconnected network, you can get protected without sacrificing performance.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2SQtiM8AwhgF9qDcUrSafI/28614651822205e9ee79ffdaedf6c088/image3-25.png" />
            
            </figure>
    <div>
      <h3>Get started today</h3>
      <a href="#get-started-today">
        
      </a>
    </div>
    <p>This solution is available today; <a href="https://www.cloudflare.com/magic-transit/">learn more</a> or contact your account team to get started.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Magic Transit]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">2sBwFvGpLwJVFXDn9sC2YT</guid>
            <dc:creator>Annika Garbers</dc:creator>
        </item>
        <item>
            <title><![CDATA[Project Myriagon: Cloudflare Passes 10,000 Connected Networks]]></title>
            <link>https://blog.cloudflare.com/10000-networks-and-beyond/</link>
            <pubDate>Sun, 19 Sep 2021 12:59:09 GMT</pubDate>
            <description><![CDATA[ This is the culmination of a special project we’ve been working on for the last few months dubbed Project Myriagon, a reference to the 10,000-sided polygon of the same name. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>During Speed Week, we’ve talked a lot about the products we’ve improved and the places we’ve expanded to. Today, we have a final exciting announcement: Cloudflare now connects with <b>more than 10,000 other networks</b>. Put another way, over 10,000 networks have direct on-ramps to the Cloudflare network.</p><p>This is the culmination of a special project we’ve been working on for the last few months dubbed Project Myriagon, a reference to the 10,000-sided polygon of the same name. In going about this project, we have learned a lot about the performance impact of adding more direct connections to our network — in one recent case, we saw a <b>90% reduction</b> in median round-trip end-user latency.</p><p>But to really explain why this is such a big milestone, we first need to explain a bit about how the Internet works.</p>
    <div>
      <h3>More roads leading to Rome</h3>
      <a href="#more-roads-leading-to-rome">
        
      </a>
    </div>
    <p>The Internet that we all know and rely on is, on a basic level, an interconnected series of independently run local networks. Each network is defined as its own “autonomous system.” These networks are identified numerically with Autonomous System Numbers, or ASNs. An ASN is like the Internet’s version of a zip code: a short number directly mapping to a distinct region of IP space using a clearly defined methodology. Network interconnection is all about bringing together different ASNs to greatly multiply the number of possible paths between source and destination.</p><p>Most of us have home networks behind a modem and router, connecting your individual miniature network to your ISP. Your ISP then connects with other networks, to fetch the web pages or other Internet traffic you request. These networks in turn have connections to different networks, who in turn connect to interconnected networks, and so on, until your data reaches its destination. The fewer networks your request has to traverse, generally, the lower the end-to-end latency and the lower the odds that something will get lost along the way.</p><p>The average number of hops from any one network on the Internet to any other is around 5.7 for IPv4 and 4.7 for IPv6.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3sWjivJCnZ6TD1e4HnzclS/dbbfa3293bbddb176fbf527ad25c8955/image2-29.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7QEwR5zzWO981NGqJmPkK/a5f4ab8b1158e3786776823416fb0594/image6-18.png" />
            
            </figure><p>Source: <a href="https://blog.apnic.net/2020/01/14/bgp-in-2019-the-bgp-table/">https://blog.apnic.net/2020/01/14/bgp-in-2019-the-bgp-table/</a></p>
    <div>
      <h3>How do ASNs work?</h3>
      <a href="#how-do-asns-work">
        
      </a>
    </div>
    <p>ASNs are a key part of BGP, the routing protocol that directs traffic across the Internet. The Internet Assigned Numbers Authority (IANA), the global coordinator of the DNS Root, IP addressing, and other Internet protocol resources like AS Numbers, delegates ASN assignment authority to Regional Internet Registries (RIRs), who in turn assign individual ASNs to network operators in line with their regional policies. The five RIRs are AFRINIC, APNIC, ARIN, LACNIC and RIPE, each responsible for assigning ASNs in its respective region.</p><p>Cloudflare’s ASN is 13335, one of the approximately 70,000 ASNs advertised on the Internet. While we’d like to — and plan on — connecting to every one of these ASNs eventually, our team tries to prioritize those with the greatest impact on our overall breadth and on proximity to as many people on Earth as possible.</p><p>As enabling optimal routes is key to our core business and services, we continuously track how many ASNs we connect to (technically referred to as “adjacent networks”). With Project Myriagon, we aimed to speed up our rate of interconnection and pass 10,000 adjacent networks by the end of the year. By September 2021, we reached that milestone, bringing us from 8,300 at the start of 2020 to over 10,000 today.</p><p>As shown in the table below, that milestone is part of a continuous effort towards gradually hitting more of the total advertised ASNs on the Internet.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/a69oo79Vcp55IsVzerODn/0360aa8f366624ace41b10cf317a61d4/image1-23.png" />
            
            </figure><p><i>The Regional Internet Registries and their Regions</i></p><p>Table 1: Cloudflare's peer ASNs and their respective RIR</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3HrTB0gZ43DuOdAqcv3g17/7a83cd5611c9bcc36a5db1abcaef1780/Screen-Shot-2021-09-18-at-10.02.06-PM.png" />
            
            </figure><p>Given that there are 70,000+ ASNs out there, you might be wondering: why is 10,000 a big deal? To understand this, we need to look deeply at BGP, the protocol that glues the Internet together. There are three different classes of ASNs:</p><ul><li><p>Transit Only ASNs: these networks only provide connectivity to other networks. They don't have any IP addresses inside their networks. These networks are quite rare, as it's very unusual to not have any IP addresses inside your network. Instead, these networks are often used primarily for distinct management purposes within a single organization.</p></li><li><p>Origin Only ASNs: these are networks that do not provide connectivity to other networks. They are a stub network, and often, like your home network, only connected to a single ISP.</p></li><li><p>Mixed ASNs: these networks both have IP addresses inside their network, and provide connectivity to other networks.</p></li></ul><table><tr><td><p><b>Origin Only ASNs</b></p></td><td><p><b>Mixed ASNs</b></p></td><td><p><b>Transit Only ASNs</b></p></td></tr><tr><td><p>61,127</p></td><td><p>11,128</p></td><td><p>443</p></td></tr></table><p><i>Source: </i><a href="https://bgp.potaroo.net/as6447/"><i>https://bgp.potaroo.net/as6447/</i></a></p><p>One interesting fact: of the 61,127 origin only ASNs, nearly 43,000 of them are only connected to their ISP. As such, our direct connections to over 10,000 networks indicate that, of the networks that connect more than one network, a large percentage are already connected to Cloudflare.</p>
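<p>These classes can be derived mechanically from a BGP table: the last ASN in an AS path originates the prefix, while any earlier ASN is providing transit. A minimal sketch over a toy table (the AS paths below are illustrative, drawn from the documentation ASN range):</p>

```python
def classify_asns(as_paths):
    """Classify ASNs as 'origin-only', 'transit-only', or 'mixed'.
    The last ASN in each path originates the prefix; every earlier ASN
    in the path is providing transit to someone else."""
    origins, transits = set(), set()
    for path in as_paths:
        origins.add(path[-1])
        transits.update(path[:-1])
    labels = {}
    for asn in origins | transits:
        if asn in origins and asn in transits:
            labels[asn] = "mixed"
        elif asn in origins:
            labels[asn] = "origin-only"
        else:
            labels[asn] = "transit-only"
    return labels

# Toy BGP table: each list is one AS path ending at the originating ASN.
paths = [[64500, 64496], [64500, 64511, 64497], [64511, 64498], [64499, 64511]]
print(classify_asns(paths))
```

A real classifier works over full route-collector dumps and handles details (path prepending, the collector’s own first hop) that this sketch ignores.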
    <div>
      <h3>Cutting out the middle man</h3>
      <a href="#cutting-out-the-middle-man">
        
      </a>
    </div>
    <p>Directly connecting to a network — and eliminating the hops in between — can greatly improve performance in two ways. First, connecting with a network directly allows for Internet traffic to be exchanged locally rather than detouring through remote cities; and secondly, direct connections help avoid the congestion caused by bottlenecks that sometimes happen between networks.</p><p>To take a recent real-world example, turning up a direct peering session with a European network caused a <b>90% improvement</b> in median end-user latency, from an average of 76ms to an average of 7ms.</p>
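<p>That headline number is simple arithmetic on the two medians above:</p>

```python
# Percentage improvement in median end-user latency after peering directly.
before_ms, after_ms = 76, 7
improvement_pct = (before_ms - after_ms) / before_ms * 100
print(f"{improvement_pct:.0f}%")  # 91%, i.e. roughly the 90% cited above
```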
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3cZt8LhHGKU2yOfLT1shRl/a97786bd4cca86cdaf9a6f9275ffba83/image3-26.png" />
            
            </figure><p><i>Immediate </i><b><i>90% improvement</i></b><i> in median end-user latency after peering with a new network.</i> </p><p>By using our own on-ramps to other networks, we both ensure superior performance for our users and avoid adding load and causing congestion on the Internet at large.</p>
    <div>
      <h3>And AS13335 is just getting started</h3>
      <a href="#and-as13335-is-just-getting-started">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2bBWubnmNYHbSd6hPITLJz/490e8466d7fb1fb2bc86a0aa91ad69ca/image4-22.png" />
            
            </figure><p>Cloudflare is an anycast network, meaning that the better connected we are, the faster and better-protected we are — obviating legacy concepts like scrubbing centers and slow origins. Hitting five digits of connected networks is something we’re really proud of as a step toward our goal of helping to build a better Internet. As we’ve mentioned throughout the week, we’re all about high speed without having to pay a security or reliability cost.</p><p>There’s still work to do! While we believe Project Myriagon has made us one of the top 5 most connected networks in the world, we estimate Google is connected to 12,000-15,000 networks. And so, today, we are kicking off Project CatchG. We won’t rest until we’re #1.</p><p>Interested in peering with us to help build a better Internet? Reach out to <a>peering@cloudflare.com</a> with your request. More details on the locations we are present at can be found at <a href="http://as13335.peeringdb.com/">http://as13335.peeringdb.com/</a>.</p>
            <category><![CDATA[Speed Week]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Internet Performance]]></category>
            <category><![CDATA[Better Internet]]></category>
            <guid isPermaLink="false">3zRDO0tpQscS2ISRCjEDwK</guid>
            <dc:creator>Ticiane Takami</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Backbone: A Fast Lane on the Busy Internet Highway]]></title>
            <link>https://blog.cloudflare.com/cloudflare-backbone-internet-fast-lane/</link>
            <pubDate>Thu, 16 Sep 2021 12:59:48 GMT</pubDate>
            <description><![CDATA[ It’s important that our network continues to help bring improved performance and resiliency to the Internet. To accomplish this, we built our own backbone.  ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The Internet is an amazing place. It’s a communication superhighway, allowing people and machines to exchange exabytes of information every day. But it's not without its share of issues: whether it’s <a href="/cloudflare-thwarts-17-2m-rps-ddos-attack-the-largest-ever-reported/">DDoS attacks</a>, <a href="/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/">route leaks</a>, <a href="/not-one-not-two-but-three-undersea-cables-cut-in-jersey/">cable cuts</a>, or <a href="/a-post-mortem-on-this-mornings-incident/">packet loss</a>, the components of the Internet do not always work as intended.</p><p>The reason Cloudflare exists is to help solve these problems. As we continue to grow our <a href="https://www.cloudflare.com/network/">rapidly expanding global network</a> in more than 250 cities, while directly connecting with more than 9,800 networks, it’s important that our network continues to help bring improved performance and resiliency to the Internet. To accomplish this, we built our own backbone. Other than improving redundancy, the immediate advantage to you as a Cloudflare user? It can reduce your website loading times by up to 45% — and you don’t have to do a thing.</p>
    <div>
      <h3>The Cloudflare Backbone</h3>
      <a href="#the-cloudflare-backbone">
        
      </a>
    </div>
    <p>We began building out our global backbone in 2018. It comprises a network of long-distance fiber optic cables connecting various Cloudflare data centers across North America, South America, Europe, and Asia. This also includes Cloudflare’s metro fiber network, directly connecting data centers within a metropolitan area.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/jn1eE7h9w0TMITPDwbsHF/c9131d2d2fe23d34051312c7b9fd23a7/Untitled.png" />
            
            </figure><p>Our backbone is a dedicated network, providing guaranteed network capacity and consistent latency between various locations. It gives us the ability to securely, reliably, and quickly route packets between our data centers, without having to rely on other networks.</p><p>This dedicated network can be thought of as a fast lane on a busy highway. When traffic in the normal lanes of the highway encounter slowdowns from congestion and accidents, vehicles can make use of a fast lane to bypass the traffic and get to their destination on time.</p><p>Our <a href="https://www.cloudflare.com/learning/network-layer/what-is-sdn/">software-defined network</a> is like a smart GPS device, as we’re always calculating the performance of routes between various networks. If a route on the public Internet becomes congested or unavailable, our network automatically adjusts routing preferences in real-time to make use of all routes we have available, including our dedicated backbone, helping to deliver your network packets to the destination as fast as we can.</p>
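<p>The route selection described above can be sketched as a simple "pick the fastest healthy path" rule. The following is an illustrative sketch only, not Cloudflare's actual implementation; the route names and latency figures are invented:</p>

```python
# Illustrative sketch: among all healthy candidate paths (private backbone
# or public Internet), prefer the one with the lowest measured latency.
# Route names and latency numbers below are hypothetical.

def pick_route(routes):
    """routes: list of dicts with 'name', 'healthy', and 'latency_ms' keys."""
    healthy = [r for r in routes if r["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy route available")
    return min(healthy, key=lambda r: r["latency_ms"])

routes = [
    {"name": "backbone: GRU->ORD", "healthy": True, "latency_ms": 115},
    {"name": "transit-a: GRU->ORD", "healthy": True, "latency_ms": 168},
    {"name": "transit-b: GRU->ORD", "healthy": False, "latency_ms": 90},
]

best = pick_route(routes)
print(best["name"])  # the unhealthy (but nominally shorter) path is skipped
```

<p>Because the measurements run continuously, the chosen path can change from one moment to the next, much like a GPS rerouting around congestion.</p>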
    <div>
      <h3>Measuring backbone improvements</h3>
      <a href="#measuring-backbone-improvements">
        
      </a>
    </div>
    <p>As we grow our global infrastructure, it’s important that we analyze our network to quantify the impact we’re having on performance.</p><p>Here’s a simple, real-world test we’ve used to validate that our backbone helps speed up our global network. We deployed a simple API service hosted on a public cloud provider, located in Chicago, Illinois. Once placed behind Cloudflare, we performed benchmarks from various geographic locations with the backbone disabled and enabled to measure the change in performance.</p><p>Instead of comparing the difference in latency our backbone creates, it is important that our experiment captures a real-world performance gain that an API service or website would experience. To validate this, our primary metric is measuring the average request time when accessing an API service from Miami, Seattle, San Jose, São Paulo, and Tokyo. To capture the response of the network itself, we disabled caching on the Cloudflare dashboard and sent 100 requests from each testing location, both while forcing traffic through our backbone, and through the public Internet.</p><p>Now, before we claim our backbone solves all Internet problems, you can probably notice that for some tests (Seattle, WA and San Jose, CA), there was actually an increase in response time when we forced traffic through the backbone. 
Since latency is directly proportional to the length of the fiber optic path, and since we have over 9,800 direct connections with other Internet networks, it is possible that an uncongested path on the public Internet is geographically shorter, making it faster than our backbone.</p><p>Luckily for us, we have technologies like <a href="/argo-and-the-cloudflare-global-private-backbone/">Argo Smart Routing</a>, <a href="/introducing-smarter-tiered-cache-topology-generation/">Argo Tiered Caching</a>, <a href="https://1.1.1.1/">WARP+</a>, and the recently announced <a href="/orpheus/">Orpheus</a>, which dynamically calculate the performance of each route at our data centers, choosing the fastest healthy route at that time. What might be the fastest path during this test may not be the fastest at the time you are reading this.</p><p>With that disclaimer out of the way, now onto the test.</p>
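<p>The measurement methodology is straightforward to reproduce. Here is a hedged sketch of such a benchmark (the endpoint URL is a placeholder, and a real run would also need caching disabled and repetition from each test location):</p>

```python
# Sketch of the benchmark described above: time n requests to an endpoint
# and report the average wall-clock request time in milliseconds.
# The URL below is a placeholder, not the service used in this post.
import time
import urllib.request

def average_request_ms(url, n=100, fetch=None):
    """Average duration of n requests to url, in milliseconds.

    `fetch` can be overridden (e.g. for testing); by default it performs
    a real HTTP GET and reads the full response body.
    """
    fetch = fetch or (lambda u: urllib.request.urlopen(u).read())
    total = 0.0
    for _ in range(n):
        start = time.perf_counter()
        fetch(url)
        total += time.perf_counter() - start
    return total / n * 1000.0

# Example (requires network access):
# average_request_ms("https://api.example.com/ping", n=100)
```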
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1J6FgOCk20reULhsoVF8BE/0c2e430b5ca8ddf670aad16d4d825449/averages.png" />
            
            </figure><p>With the backbone disabled, if a visitor from São Paulo performed a request to our service, they would be routed to our São Paulo data center via <a href="/a-brief-anycast-primer/">BGP Anycast</a>. With caching disabled, our São Paulo data center forwarded the request over the public Internet to the origin server in Chicago. On average, the entire process to fetch data from the origin server and return the response to the requesting user took 335.8 milliseconds.</p><p>Once the backbone was enabled and requests were issued, our software performed tests to determine the fastest healthy route to the origin, whether it was a route on the public Internet or through our private backbone. For this test the backbone was faster, resulting in an average total request time of 230.2 milliseconds. Just by routing the request through our private backbone, we <b>improved the average response time by 31%</b>.</p><p>We saw even better improvement when testing from Tokyo. When routing the request over the public Internet, the request took an average of 424 milliseconds. By enabling our backbone, which provided a faster path, the request took an average of 234 milliseconds, creating an <b>average response time improvement of 44%</b>.</p><table><tr><td><p><b>Visitor Location</b></p></td><td><p><b>Distance to Chicago</b></p></td><td><p><b>Avg. response time using public Internet (ms)</b></p></td><td><p><b>Avg. 
response using backbone (ms)</b></p></td><td><p><b>Change in response time</b></p></td></tr><tr><td><p>Miami, FL, US</p></td><td><p>1917 km</p></td><td><p>84</p></td><td><p>75</p></td><td><p><b>10.7% decrease</b></p></td></tr><tr><td><p>Seattle, WA, US</p></td><td><p>2785 km</p></td><td><p>118</p></td><td><p>124</p></td><td><p>5.1% increase</p></td></tr><tr><td><p>San Jose, CA, US</p></td><td><p>2856 km</p></td><td><p>122</p></td><td><p>132</p></td><td><p>8.2% increase</p></td></tr><tr><td><p>São Paulo, BR</p></td><td><p>8403 km</p></td><td><p>336</p></td><td><p>230</p></td><td><p><b>31.5% decrease</b></p></td></tr><tr><td><p>Tokyo, JP</p></td><td><p>10129 km</p></td><td><p>424</p></td><td><p>234</p></td><td><p><b>44.8% decrease</b></p></td></tr></table><p>We also observed a smaller deviation in the response time of packets routed through our backbone over larger distances.</p>
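<p>The "change in response time" column follows directly from the two averages in each row; recomputing it from the rounded table values is a quick sanity check:</p>

```python
# Recompute the last column of the table above from the two averages.
# A negative result means the backbone was faster (a decrease).
def pct_change(public_ms, backbone_ms):
    return (backbone_ms - public_ms) / public_ms * 100.0

rows = {
    "Miami":     (84, 75),
    "Seattle":   (118, 124),
    "San Jose":  (122, 132),
    "São Paulo": (336, 230),
    "Tokyo":     (424, 234),
}
for city, (public, backbone) in rows.items():
    print(f"{city}: {pct_change(public, backbone):+.1f}%")
# Miami: -10.7%, Seattle: +5.1%, San Jose: +8.2%,
# São Paulo: -31.5%, Tokyo: -44.8%
```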
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4l7WB7dKR0zXUM7yjyoBwv/031ff75e5b3da67b7f1be69d509b3dd6/timeseries.png" />
            
            </figure>
    <div>
      <h3>Our next generation network</h3>
      <a href="#our-next-generation-network">
        
      </a>
    </div>
    <p>Cloudflare is built on top of lossy, unreliable networks that we do not have control over. It’s our software that turns these traditional tubes of the Internet into a smart, high performing, and reliable network Cloudflare customers get to use today. Coupled with our new, but rapidly expanding backbone, it is this software that produces significant performance gains over traditional Internet networks.</p><p>Whether you visit a website powered by Cloudflare’s Argo Smart Routing, Argo Tiered Caching, Orpheus, or use our 1.1.1.1 service with WARP+ to access the Internet, you get direct access to the Internet fast lane we call the Cloudflare backbone.</p><p>For Cloudflare, a better Internet means improving Internet security, reliability, and performance. The backbone gives us the ability to build out our network in areas that have typically lacked infrastructure investments by other networks. Even with issues on the public Internet, these initiatives allow us to be located within 50 milliseconds of 95% of the Internet connected population.</p><p>In addition to our growing global infrastructure providing 1.1.1.1, WARP, <a href="https://developers.cloudflare.com/time-services/roughtime/usage">Roughtime</a>, <a href="https://developers.cloudflare.com/time-services/ntp/usage">NTP</a>, <a href="https://developers.cloudflare.com/distributed-web/ipfs-gateway">IPFS Gateway</a>, and <a href="https://developers.cloudflare.com/randomness-beacon/about">Drand</a> to the greater Internet, it’s important that we extend our services to those who are most vulnerable. 
This is why we extend all our infrastructure benefits directly to the community, through projects like <a href="https://www.cloudflare.com/galileo/">Galileo</a>, <a href="https://www.cloudflare.com/athenian/">Athenian</a>, <a href="https://www.cloudflare.com/fair-shot/">Fair Shot</a>, and <a href="https://www.cloudflare.com/pangea/">Pangea</a>.</p><p>And while these thousands of fiber optic connections are already fixing today’s Internet issues, we truly are just getting started.</p><p>Want to help build the future Internet? Networks that are faster, safer, and more reliable than they are today? The Cloudflare Infrastructure team is <a href="https://www.cloudflare.com/careers/jobs/?department=Infrastructure&amp;location=default">currently hiring</a>!</p><p>If you operate an ISP or transit network and would like to bring your users faster and more reliable access to websites and services powered by Cloudflare’s rapidly expanding network, please reach out to our Edge Partnerships team at <a>epp@cloudflare.com</a>.</p>
    <div>
      <h3>Watch on Cloudflare TV</h3>
      <a href="#watch-on-cloudflare-tv">
        
      </a>
    </div>
    <div></div><p></p> ]]></content:encoded>
            <category><![CDATA[Speed Week]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[Better Internet]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">3vmfGkLMgyROXmDwChgfdf</guid>
            <dc:creator>Tanner Ryan</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Passes 250 Cities, Triples External Network Capacity, 8x-es Backbone]]></title>
            <link>https://blog.cloudflare.com/250-cities-is-just-the-start/</link>
            <pubDate>Mon, 13 Sep 2021 12:59:02 GMT</pubDate>
            <description><![CDATA[ Today, I have three speedy infrastructure updates: we’ve passed 250 on-network cities, more than tripled our external network capacity, and increased our long-haul internal backbone network by over 800% since the start of 2020. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/34Ar6YddsGUPXLwYMKVwcf/fa4a404d21a353b55853bc509a013497/image2-10.png" />
            
            </figure><p>It feels like just the other week that we <a href="/ten-new-cities-four-new-countries/">announced ten new cities</a> and our <a href="/expanding-to-25-plus-cities-in-brazil/">expansion to 25+ cities in Brazil</a> — probably because it was. Today, I have three speedy infrastructure updates: we’ve passed 250 on-network cities, more than tripled our external network capacity, and increased our long-haul internal backbone network by over 800% since the start of 2020.</p><p>Light only travels through fiber so fast and with so much bandwidth — and worse still over the copper or on mobile networks that make up most end-users’ connections to the Internet. At some point, there’s only so much software you can throw at the problem before you run into the fundamental problem that an edge network solves: if you want your users to see incredible performance, you have to have servers incredibly physically close. For example, over the past three months, we’ve added another 10 cities in Brazil.  Here’s how that lowered the connection time to Cloudflare. The red line shows the latency prior to the expansion, the blue shows after.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3dmDMmnBF8lSbzMmf3leJ0/f835025bffad0e203839815dae298aa4/image1-5.png" />
            
            </figure><p>We’re exceptionally proud of all the teams at Cloudflare that came together to raise the bar for the entire industry in terms of global performance despite border closures, semiconductor shortages, and a sudden shift to working from home. 95% of the entire Internet-connected world is now within 50 ms of a Cloudflare presence, and 80% of the entire Internet-connected world is within 20ms (for reference, it takes 300-400 ms for a human to blink):</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7E00uhdomd53lu4ZsbXbBD/52b542b8af841f0d844ad1df9ae9cb52/image5-5.png" />
            
            </figure><p>Today, when we ask ourselves what it means to have a fast website, it means having a server less than 0.05 seconds away from your user, no matter where on Earth they are. This is only possible by adding new cities, partners, capacity, and cables — so let’s talk about those.</p>
    <div>
      <h3>New Cities</h3>
      <a href="#new-cities">
        
      </a>
    </div>
    <p>Cutting straight to the point, let’s start with cities and countries: in the last two-ish months, we’ve added another 17 cities (outside of mainland China) split across eight countries: <b>Guayaquil</b>, Ecuador; <b>Dammam</b>, Saudi Arabia; <b>Algiers</b>, Algeria; <b>Surat Thani</b>, Thailand; <b>Hagåtña,</b> Guam, United States; <b>Krasnoyarsk</b>, Russia; <b>Cagayan</b>, Philippines; and ten cities in Brazil: <b>Caçador</b>, <b>Ribeirão Preto</b>, <b>Brasília</b>, <b>Florianópolis</b>, <b>Sorocaba</b>, <b>Itajaí</b>, <b>Belém</b>, <b>Americana</b>, <b>Blumenau</b>, and <b>Belo Horizonte</b>.</p><p>Meanwhile, with our partner, JD Cloud and AI, we’re up to <b>37</b> cities in mainland China: <b>Anqing</b> and <b>Huainan</b>, Anhui; <b>Beijing</b>, Beijing; <b>Fuzhou</b> and <b>Quanzhou</b>, Fujian; <b>Lanzhou</b>, Gansu; <b>Foshan</b>, <b>Guangzhou</b>, and <b>Maoming</b>, Guangdong; <b>Guiyang</b>, Guizhou; <b>Chengmai</b> and <b>Haikou</b>, Hainan; <b>Langfang</b> and <b>Qinhuangdao</b>, Hebei; <b>Zhengzhou</b>, Henan; <b>Shiyan</b> and <b>Yichang</b>, Hubei; <b>Changde</b> and <b>Yiyang</b>, Hunan; <b>Hohhot</b>, Inner Mongolia; <b>Changzhou</b>, <b>Suqian</b>, and <b>Wuxi</b>, Jiangsu; <b>Nanchang</b> and <b>Xinyu</b>, Jiangxi; <b>Dalian</b> and <b>Shenyang</b>, Liaoning; <b>Xining</b>, Qinghai; <b>Baoji</b> and <b>Xianyang</b>, Shaanxi; <b>Jinan</b> and <b>Qingdao</b>, Shandong; <b>Shanghai</b>, Shanghai; <b>Chengdu</b>, Sichuan; <b>Jinhua</b>, <b>Quzhou</b>, and <b>Taizhou</b>, Zhejiang. These are subject to change: as we ramp up, we have been working with JD Cloud to “trial” cities for a few weeks or months to observe performance and tweak the cities to match.</p>
    <div>
      <h3>More Capacity: What and Why?</h3>
      <a href="#more-capacity-what-and-why">
        
      </a>
    </div>
    <p>In addition to all these new cities, we’re also proud to announce that we have seen a <b>3.5x increase</b> in external network capacity from the start of 2020 to now. This is just as key to our network strategy as new cities: it wouldn’t matter if we were in every city on Earth if we weren’t interconnected with other networks. <a href="/understanding-where-the-internet-isnt-good-enough-yet/">Last-mile ISPs will sometimes still “trombone” their traffic</a>, but in general, end users will get faster Internet as we interconnect more.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2ffYaUBtPlPgEOXiy3yB0X/14a99c2a44a90ab38573d642a189d16d/image4-6.png" />
            
            </figure><p>This interconnection is spread far and wide, both to user networks and those of website hosts and other major cloud networks. This has involved a lot of middleman-removal: rather than run fiber optics from our routers through a third-party network to an origin or user’s network, we’re running more and more Private Network Interconnects (PNIs) and, better yet, Cloudflare Network Interconnects (CNIs) to our customers.</p><p>These PNIs and CNIs can not only reduce egress costs for our customers (particularly <a href="https://www.cloudflare.com/bandwidth-alliance/">with our Bandwidth Alliance partners</a>) but also increase the speed, reliability, and privacy of connections. The fewer networks and less distance your Internet traffic flows through, the better off everyone is. To put some numbers on that, only 30% of this newly doubled capacity was transit, leaving 70% flowing directly either physically over PNIs/CNIs or logically over peering sessions at Internet exchange points.</p>
    <div>
      <h3>The Backbone</h3>
      <a href="#the-backbone">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3xbsht8vdjngl4BjfP8HAl/e49dd944fccbf27ad462de3627a5c6bb/image6.png" />
            
            </figure><p>At the same time as this increase in external capacity, we’ve quietly been adding hundreds of new segments to our backbone. Our backbone consists of dedicated fiber optic lines and reserved portions of wavelength that connect Cloudflare data centers together. This is split approximately 55/45 between “metro” capacity, which redundantly connects data centers in which we have a presence, and “long-haul” capacity, which connects Cloudflare data centers in different cities.</p><p>The backbone is used to increase the speed of our customer traffic, e.g., for <a href="https://www.cloudflare.com/products/argo-smart-routing/">Argo Smart Routing</a>, <a href="/introducing-smarter-tiered-cache-topology-generation/">Argo Tiered Caching</a>, and <a href="https://1.1.1.1/">WARP+</a>. Our backbone is like a private highway connecting cities, while public Internet routing is like local roads: not only does the backbone directly connect two cities, but it’s <a href="/argo-and-the-cloudflare-global-private-backbone/">reliably faster and sees fewer issues</a>. We’ll dive into some benchmarks of the speed improvements of the backbone in a more comprehensive future blog post.</p><p>The backbone is also more secure. While Cloudflare <a href="/rpki-details/">signs all of its BGP routes with RPKI</a>, <a href="/rpki-2020-fall-update/">pushes adjacent networks to use RPKI to avoid route hijacks</a>, and <a href="https://www.cloudflare.com/trust-hub/">encrypts external and internal traffic</a>, the most secure and private way to safeguard our users’ traffic is to keep it on-network as much as possible.</p><p>Internal load balancing between cities has also been greatly improved, thanks to the use of the backbone for traffic management with a technology we call Plurimog (a reference to our <a href="/unimog-cloudflares-edge-load-balancer/">in-colo Layer 4 load balancer, Unimog</a>). 
A surge of traffic into Portland can be shifted instantaneously over diverse links to Seattle, Denver, or San Jose with a single hop, without waiting for changes to propagate over anycast or running the risk of an interim increase in errors.</p><p>From an expansion perspective, two key areas of focus have been our undersea North America to Europe (transatlantic) and Asia to North America (transpacific) backbone rings. These links use geographically diverse subsea cable systems and connect into diverse routers and data centers on both ends — four transatlantic cables from North America to Europe, three transamerican cables connecting South and North America, and three transpacific cables connecting Asia and North America. User traffic coming from Los Angeles could travel to an origin as far west as Singapore or as far east as Moscow without leaving our network.</p><p>This rate of growth has been enabled by improved traffic forecast modeling, rapid internal feedback loops on link utilization, and more broadly by growing our teams and partnerships. We are creating a global view of capacity, pricing, and desirability of backbone links in the same way that we have for transit and peering. The result is a backbone that doubled in long-haul capacity this year, increased more than 800% from the start of last year, and will continue to expand to intelligently crisscross the globe.</p><p>The backbone has taken on a huge amount of traffic that would otherwise go over external transit and peering connections, freeing up capacity for when it is explicitly needed (last-hop routes, failover, etc.) and avoiding any outages on other major global networks (e.g., <a href="/analysis-of-todays-centurylink-level-3-outage/">CenturyLink</a>, <a href="/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/">Verizon</a>).</p>
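<p>Plurimog's internals aren't public, but the idea of spilling a surge from one city to nearby cities over the backbone can be illustrated with a toy model. The capacities, city names, and greedy spill order below are all invented for illustration:</p>

```python
# Toy illustration of inter-city load spilling, not Cloudflare's actual
# Plurimog logic: traffic above a site's capacity is shifted to nearby
# sites with spare headroom, in order of preference.
def spill_overflow(load, capacity, neighbors):
    """Return {site: shifted_load} for traffic above `capacity`.

    `neighbors` is an ordered list of (site, headroom) pairs.
    """
    overflow = max(0, load - capacity)
    shifted = {}
    for site, headroom in neighbors:
        if overflow <= 0:
            break
        take = min(overflow, headroom)
        if take > 0:
            shifted[site] = take
            overflow -= take
    return shifted

# A 140-unit surge into a 100-unit Portland site spills to Seattle first,
# then Denver picks up the remainder.
plan = spill_overflow(
    load=140, capacity=100,
    neighbors=[("seattle", 30), ("denver", 50), ("san-jose", 80)],
)
print(plan)  # {'seattle': 30, 'denver': 10}
```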
    <div>
      <h3>In Conclusion</h3>
      <a href="#in-conclusion">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4dL0k5EarrzYMsUQIGOMhJ/1c9b514741d37d38bab848b5e1764fef/image3-5.png" />
            
            </figure><p><i>A map of the world highlighting all 250+ cities in which Cloudflare is deployed.</i></p><p>More cities, capacity, and backbone are more steps as part of going from being the most global network on Earth to the most <i>local</i> one as well. We believe in providing security, privacy, and reliability for all — not just those who have the money to pay for something we consider fundamental Internet rights. We have seen the investment into our network pay huge dividends this past year.</p><p>Happy Speed Week!</p><p><i>Do you want to work on the future of a globally local network? Are you passionate about edge networks? Do you thrive in an exciting, rapid-growth environment? If so, good news: Cloudflare Infrastructure is hiring; check our open roles</i> <a href="https://www.cloudflare.com/careers/jobs/?department=Infrastructure&amp;location=default"><i>here</i></a><i>!</i></p><p><i>Alternatively — if you work at an ISP we aren’t already deployed with and want to bring this level of speed and control to your users, we’re here to make that happen. Please reach out to our Edge Partnerships team at </i><a><i>epp@cloudflare.com</i></a><i>.</i></p>
    <div>
      <h3>Watch on Cloudflare TV</h3>
      <a href="#watch-on-cloudflare-tv">
        
      </a>
    </div>
    <div></div> ]]></content:encoded>
            <category><![CDATA[Speed Week]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Internet Performance]]></category>
            <guid isPermaLink="false">3H2LfF3dALtpfa7BdAAV5S</guid>
            <dc:creator>Jon Rolfe</dc:creator>
        </item>
        <item>
            <title><![CDATA[The EPYC journey continues to Milan in Cloudflare’s 11th generation Edge Server]]></title>
            <link>https://blog.cloudflare.com/the-epyc-journey-continues-to-milan-in-cloudflares-11th-generation-edge-server/</link>
            <pubDate>Tue, 31 Aug 2021 13:00:45 GMT</pubDate>
            <description><![CDATA[ At Cloudflare we aim to introduce a new server platform to our edge network every 12 to 18 months or so, to ensure that we keep up with the latest industry technologies and developments.  ]]></description>
            <content:encoded><![CDATA[ <p></p><p>When I was interviewing to join Cloudflare in 2014 as a member of the SRE team, we had just introduced our <a href="/a-tour-inside-cloudflares-latest-generation-servers/">generation 4 server</a>, and I was excited about the prospects. Since then, Cloudflare, the industry and I have all changed dramatically. The best thing about working for a rapidly growing company like Cloudflare is that as the company grows, new roles open up to enable career development. And so, having left the SRE team last year, I joined the recently formed hardware engineering team, a team that simply didn’t exist in 2014.</p><p>We aim to introduce a new server platform to our edge network every 12 to 18 months or so, to ensure that we keep up with the latest industry technologies and developments. We announced the <a href="/a-tour-inside-cloudflares-g9-servers/">generation 9 server</a> in October 2018 and we announced the <a href="/a-tour-inside-cloudflares-g9-servers/">generation 10 server</a> in February 2020. We consider this length of cycle optimal: short enough to stay nimble and take advantage of the latest technologies, but long enough to offset the time taken by our hardware engineers to test and validate the entire platform. When we are <a href="https://www.cloudflare.com/network/">shipping servers to over 200 cities</a> around the world with a variety of regulatory standards, it’s essential to get things right the first time.</p><p>We continually work with our silicon vendors to receive product roadmaps and stay on top of the latest technologies. Since mid-2020, the hardware engineering team at Cloudflare has been working on our generation 11 server.</p><p>Requests per Watt is one of our defining characteristics when testing new hardware and we use it to identify how much more efficient a new hardware generation is than the previous generation. 
We continually strive to reduce our operational costs and <a href="/the-climate-and-cloudflare/">power consumption reduction</a> is one of the most important parts of this. It’s good for the planet and we can fit more servers into a rack, reducing our physical footprint.</p><p>The design of these Generation 11 x86 servers has been in parallel with our efforts to design next-generation edge servers using the <a href="/arms-race-ampere-altra-takes-on-aws-graviton2/">Ampere Altra</a> Arm architecture. You can read more about our tests in a blog post by my colleague Sung and we will document our work on Arm at the edge in a subsequent blog post.</p><p>We evaluated Intel’s latest generation of “Ice Lake” Xeon processors. Although Intel’s chips were able to compete with AMD in terms of raw performance, the power consumption was several hundred watts higher per server - that’s enormous. This meant that Intel’s Performance per Watt was unattractive.</p><p>We previously described how we had deployed AMD EPYC 7642 processors in our generation 10 server. This processor has 48 cores and is based on AMD’s 2nd generation EPYC architecture, code named Rome. For our generation 11 server, we evaluated 48, 56 and 64 core samples based on AMD’s 3rd generation EPYC architecture, code named Milan. We were interested to find that, comparing the two 48 core processors directly, we saw a performance boost of several percent in the <a href="https://www.amd.com/en/events/epyc">3rd generation EPYC</a> architecture. We therefore had high hopes for the 56 core and 64 core chips.</p><p>So, based on the samples we received from our vendors and our subsequent testing, hardware from AMD and Ampere made the shortlist for our generation 11 server. On this occasion, we decided that Intel did not meet our requirements. However, it’s healthy that Intel and AMD compete and innovate in the x86 space and we look forward to seeing how Intel’s next generation shapes up.</p>
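<p>The Requests per Watt comparison mentioned above reduces to simple arithmetic. The figures below are invented for illustration; the post does not publish absolute request rates or power draws:</p>

```python
# Hypothetical numbers only: compare two server generations by requests
# served per Watt, the efficiency metric described above.
def requests_per_watt(requests_per_sec, watts):
    return requests_per_sec / watts

gen10 = requests_per_watt(10_000, 400)  # hypothetical baseline
gen11 = requests_per_watt(12_900, 400)  # hypothetical: more work, same power
gain_pct = (gen11 / gen10 - 1) * 100
print(f"gen11 serves {gain_pct:.0f}% more requests per Watt")
```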
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1ajXt7adRxT5V2FVBVw7IJ/56872f59f7e681093f6591fa6ef42840/IMG_4118.jpeg.jpeg" />
            
            </figure>
    <div>
      <h3>Testing and validation process</h3>
      <a href="#testing-and-validation-process">
        
      </a>
    </div>
    <p>Before we go on to talk about the hardware, I’d like to say a few words about the testing process we went through to test our generation 11 servers.</p><p>As we elected to proceed with AMD chips, we were able to use our generation 10 servers as our Engineering Validation Test platform, with the only changes being the new silicon and updated firmware. We were able to perform these upgrades ourselves in our hardware validation lab.</p><p>Cloudflare’s network is built with commodity hardware and we source the hardware from multiple vendors, known as ODMs (Original Design Manufacturers), who build the servers to our specifications.</p><p>When you are working with bleeding edge silicon and experimental firmware, not everything is plain sailing. We worked with one of our ODMs to eliminate an issue which was causing the Linux kernel to panic on boot. Once resolved, we used a variety of synthetic benchmarking tools to verify the performance, including <a href="https://github.com/cloudflare/cf_benchmark">cf_benchmark</a>, as well as an internal tool which applies a synthetic load to our entire software stack.</p><p>Once we were satisfied, we ordered Design Validation Test samples, which were manufactured by our ODMs with the new silicon. We continued to test these and iron out the inevitable issues that arise when you are developing custom hardware. To ensure that performance matched our expectations, we used synthetic benchmarking to test the new silicon. We also began testing these samples in our production environment by gradually introducing customer traffic to them as confidence grew.</p><p>Once the issues were resolved, we ordered the Product Validation Test samples, which were again manufactured by our ODMs, taking into account the feedback obtained in the DVT phase. As these are intended to be production grade, we work with the broader Cloudflare teams to deploy these units like a mass production order.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5qZWpcqMs5De6LfHMCudjn/b7ca612abce657d53e51f2199ebbc7a9/20210722_113627.jpeg.jpeg" />
            
            </figure>
    <div>
      <h3>CPU</h3>
      <a href="#cpu">
        
      </a>
    </div>
    <p>Previously: AMD EPYC 7642 48-Core Processor<br/>Now: AMD EPYC 7713 64-Core Processor</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6aIqXZIeXOANPykcbW8tJJ/bc57b2023460a731e92e8c2e9b924afd/IMG_4203.jpeg.jpeg" />
            
            </figure>
<div><table><thead>
  <tr>
    <th></th>
    <th><a href="https://www.amd.com/en/products/cpu/amd-epyc-7642"><span>AMD EPYC 7642</span></a></th>
    <th><a href="https://www.amd.com/en/products/cpu/amd-epyc-7643"><span>AMD EPYC 7643</span></a></th>
    <th><a href="https://www.amd.com/en/products/cpu/amd-epyc-7663"><span>AMD EPYC 7663 </span></a></th>
    <th><a href="https://www.amd.com/en/products/cpu/amd-epyc-7713"><span>AMD EPYC 7713</span></a></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Status</span></td>
    <td><span>Incumbent</span></td>
    <td><span>Candidate</span></td>
    <td><span>Candidate</span></td>
    <td><span>Candidate</span></td>
  </tr>
  <tr>
    <td><span>Core Count</span></td>
    <td><span>48</span></td>
    <td><span>48</span></td>
    <td><span>56</span></td>
    <td><span>64</span></td>
  </tr>
  <tr>
    <td><span>Thread Count</span></td>
    <td><span>96</span></td>
    <td><span>96</span></td>
    <td><span>112</span></td>
    <td><span>128</span></td>
  </tr>
  <tr>
    <td><span>Base Clock</span></td>
    <td><span>2.3GHz</span></td>
    <td><span>2.3GHz</span></td>
    <td><span>2.0GHz</span></td>
    <td><span>2.0GHz</span></td>
  </tr>
  <tr>
    <td><span>Max Boost Clock</span></td>
    <td><span>3.3GHz</span></td>
    <td><span>3.6GHz</span></td>
    <td><span>3.5GHz</span></td>
    <td><span>3.675GHz</span></td>
  </tr>
  <tr>
    <td><span>Total L3 Cache</span></td>
    <td><span>256MB</span></td>
    <td><span>256MB</span></td>
    <td><span>256MB</span></td>
    <td><span>256MB</span></td>
  </tr>
  <tr>
    <td><span>Default TDP</span></td>
    <td><span>225W</span></td>
    <td><span>225W</span></td>
    <td><span>240W</span></td>
    <td><span>225W </span></td>
  </tr>
  <tr>
    <td><span>Configurable TDP</span></td>
    <td><span>240W</span></td>
    <td><span>240W</span></td>
    <td><span>240W</span></td>
    <td><span>240W</span></td>
  </tr>
</tbody></table></div><p>In the table above, TDP refers to Thermal Design Power, a measure of the heat dissipated. All of the above processors have a configurable TDP - assuming the cooling solution is capable - trading increased power consumption for more performance. We tested all processors configured at their highest supported TDP.</p><p>The 64-core processors have 33% more cores than the 48-core processors, so you might hypothesize a corresponding 33% increase in performance. Our benchmarks saw slightly more modest gains, because the 64-core processors run at lower base clock frequencies to fit within the same 225W power envelope.</p><p>In production testing, we found that the 64-core EPYC 7713 gave us around a 29% performance boost over the incumbent, whilst having similar power consumption and thermal properties.</p>
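<p>The scaling trade-off can be put into back-of-envelope numbers. This is a naive model for illustration only (throughput proportional to cores times clock), using the figures from the table above; real gains also depend on boost behaviour and per-core efficiency:</p>

```python
# Naive scaling estimates for the CPU upgrade (figures from the table above).
old_cores, old_base_ghz = 48, 2.3   # EPYC 7642 (incumbent)
new_cores, new_base_ghz = 64, 2.0   # EPYC 7713 (chosen candidate)

# Core count alone suggests a 33% gain
core_ratio = new_cores / old_cores
print(f"core-count ratio: {core_ratio:.2f}")

# Adjusting for the lower base clock shrinks the naive estimate
throughput_ratio = (new_cores * new_base_ghz) / (old_cores * old_base_ghz)
print(f"base-clock-adjusted ratio: {throughput_ratio:.2f}")
```

<p>The measured ~29% production gain sits between these two estimates, which is consistent with the 7713 spending much of its time above base clock under real load.</p>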
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3rYE9p57gOFlQ5xwL6D29t/aaf3624f32e1bf15ecf91db8c25dd96a/IMG_4196.jpeg.jpeg" />
            
            </figure>
    <div>
      <h2>Memory</h2>
      <a href="#memory">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Z86awdPPL6Yt3G5GRZP41/c76caa9c5cbf76f718e5b3da99a989b2/_MG_4100.jpeg.jpeg" />
            
            </figure><p>Previously: 256GB DDR4-2933<br/>Now: 384GB DDR4-3200</p><p>Having made a decision about the processor, the next step was to determine the optimal amount of memory for our workload. We ran a series of experiments with our chosen EPYC 7713 processor and 256GB, 384GB and 512GB memory configurations. We started off by running synthetic benchmarks with tools such as <a href="https://www.cs.virginia.edu/stream/">STREAM</a> to ensure that none of the configurations performed unexpectedly poorly and to generate a baseline understanding of the performance.</p><p>After the synthetic benchmarks, we proceeded to test the various configurations with production workloads to empirically determine the optimal quantity. We use Prometheus and Grafana to gather and display a rich set of metrics from all of our servers so that we can monitor and spot trends, and we re-used the same infrastructure for our performance analysis.</p><p>As well as measuring available memory, previous experience has shown us that one of the best ways to ensure that we have enough memory is to observe request latency and disk IO performance. If there is insufficient memory, we expect request latency and disk IO volume and latency to increase. The reason for this is that our core HTTP server uses memory to cache web assets; if there is insufficient memory, assets are ejected from the cache prematurely and must be fetched from disk instead of memory, degrading performance.</p><p>Like most things in life, it’s a balancing act. We want enough memory to take advantage of the fact that serving web assets directly from memory is much faster than even the best NVMe disks. We also want to future-proof our platform to enable new features such as those that we recently announced in <a href="/tag/security-week/">security week</a> and <a href="/tag/developer-week/">developer week</a>. 
However, we don’t want to spend unnecessarily on excess memory that will never be used. We found that the 512GB configuration did not provide a performance boost to justify the extra cost and settled on the 384GB configuration.</p><p>We also tested the performance impact of switching from DDR4-2933 to DDR4-3200 memory. We found that it provided a performance boost of several percent and the pricing has improved to the point where it is cost beneficial to make the change.</p>
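<p>The effect described above, where an undersized cache evicts assets prematurely and pushes reads to disk, can be illustrated with a toy LRU cache. This is a simplified model for intuition only, not our actual HTTP cache, and the asset counts are made up:</p>

```python
from collections import OrderedDict

def disk_fetches(requests, cache_slots):
    """Count how many requests miss a simple LRU cache and fall through to disk."""
    cache = OrderedDict()
    misses = 0
    for asset in requests:
        if asset in cache:
            cache.move_to_end(asset)       # refresh recency on a hit
        else:
            misses += 1                    # asset must be fetched from disk
            cache[asset] = True
            if len(cache) > cache_slots:
                cache.popitem(last=False)  # evict the least-recently-used asset
    return misses

# A repeating working set of 300 assets served from caches of two sizes:
workload = list(range(300)) * 5
print(disk_fetches(workload, 256))  # cache smaller than working set: constant evictions
print(disk_fetches(workload, 384))  # working set fits: only cold misses
```

<p>With a cyclic access pattern, a cache just slightly smaller than the working set thrashes and misses on nearly every request, while one that fits the working set only pays the initial cold misses, which is why request latency and disk IO are such sensitive indicators of memory pressure.</p>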
    <div>
      <h2>Disk</h2>
      <a href="#disk">
        
      </a>
    </div>
    <p>Previously: 3x Samsung PM983 960GB<br/>Now: 2x Samsung PM9A3 1.92TB</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1bzxh2LTLqrPAHlIWynt9t/10c8fe856bf9e3a6feed7a8624451c66/20210722_113829.jpeg.jpeg" />
            
            </figure><p>We validated samples by studying the manufacturer’s data sheets and <a href="https://fio.readthedocs.io/en/latest/fio_doc.html">testing using fio</a> to ensure that the results obtained in our test environment were in line with the published specifications. We also developed an automation framework to help compare different drive models using fio. The framework helps us to restore the drives close to factory settings, precondition the drives, perform the sequential and random tests in our environment, and analyze the results to evaluate bandwidth and latency. Since our SSD samples arrived in our test center in different months, the automated framework sped up evaluations by reducing the time we spent testing and analyzing results.</p><p>For Gen 11 we decided to move to a 2x 2TB configuration from the original 3x 1TB configuration, giving us an extra 1TB of storage. This also meant we could use the higher performance of a 2TB drive and save around 6W of power since there is one less SSD.</p><p>After analyzing the performance, latency and endurance of various 2TB drives, we chose Samsung’s PM9A3 SSDs as our Gen11 drives. The results we obtained below were consistent with the manufacturer's claims.</p><p>Sequential performance:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5yOK70l1XM5JJij5aeUHTQ/5653a3e28925e06407b88193780acd77/pasted-image-0-5.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5nifB3rxB1gHWX6JBxBhfY/707192c1a778c46a15aff4b419d42c13/pasted-image-0--1--2.png" />
            
            </figure><p>Random Performance:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/A8p9s1IjM4UnmNmaMPx0T/358e01b8832c5e1ed7c1479a05c6cd5a/pasted-image-0--2-.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/61rhIo4w8sSIYbUiGfHRn9/d07ed49c5b957e3b1dde1db58aa1e484/pasted-image-0--3-.png" />
            
            </figure><p>Compared to our previous generation drives, we could see a 1.5x - 2x improvement in read and write bandwidths. The higher values for the PM9A3 can be attributed to the fact that these are PCIe 4.0 drives, have more intelligent SSD controllers and an upgraded NAND architecture.</p>
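<p>A framework like the one described above might drive fio along these lines. This is an illustrative sketch, not our actual tooling; the job names, block sizes and queue depths here are assumptions:</p>

```python
import subprocess

def fio_cmd(device, rw, block_size, iodepth=32, jobs=4, runtime=60):
    """Build an fio invocation for one test phase; returns the argv list."""
    return [
        "fio",
        f"--name={rw}-{block_size}",
        f"--filename={device}",
        f"--rw={rw}",                # read/write/randread/randwrite
        f"--bs={block_size}",
        f"--iodepth={iodepth}",
        f"--numjobs={jobs}",
        f"--runtime={runtime}",
        "--time_based",
        "--direct=1",                # bypass the page cache
        "--ioengine=libaio",
        "--group_reporting",
        "--output-format=json",      # machine-readable output for later analysis
    ]

def run_suite(device):
    """Sequential bandwidth phases, then random IOPS phases."""
    phases = [("read", "128k"), ("write", "128k"),
              ("randread", "4k"), ("randwrite", "4k")]
    for rw, bs in phases:
        subprocess.run(fio_cmd(device, rw, bs), check=True)
```

<p>Emitting JSON makes the analysis step a matter of parsing each phase's bandwidth and latency percentiles rather than scraping human-readable output.</p>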
    <div>
      <h3>Network</h3>
      <a href="#network">
        
      </a>
    </div>
    <p>Previously: Mellanox ConnectX-4 dual-port 25G<br/>Now: Mellanox ConnectX-4 dual-port 25G</p><p>There is no change on the network front; the Mellanox ConnectX-4 is a solid performer which continues to meet our needs. We investigated higher speed Ethernet, but we do not currently see this as beneficial. Cloudflare’s network is built on cheap commodity hardware and the highly distributed nature of Cloudflare’s network means we don’t have discrete DDoS scrubbing centres. All points of presence operate as scrubbing centres. This means that we distribute the load across our entire network and do not need to employ higher speed and more expensive Ethernet devices.</p>
    <div>
      <h3>Open source firmware</h3>
      <a href="#open-source-firmware">
        
      </a>
    </div>
    <p>Transparency, security and integrity are absolutely critical to us at Cloudflare. Last year, we described how we had <a href="/anchoring-trust-a-hardware-secure-boot-story/">deployed Platform Secure Boot</a> to create trust that we were running the software that we thought we were.</p><p>Now, we are pleased to announce that we are deploying open source firmware to our servers using OpenBMC. With access to the source code, we have been able to configure BMC features such as the fan PID controller, record and expose BIOS POST codes, and manage networking ports and devices. Prior to OpenBMC, requesting these features from our vendors led to varying results and misunderstandings of the scope and capabilities of the BMC. Working with the BMC source code directly gives us the flexibility to implement features ourselves, or to understand why the BMC is incapable of running our desired software.</p><p>Whilst our current BMC is an industry standard, we feel that OpenBMC better suits our needs and gives us advantages such as allowing us to deal with upstream security issues without a dependency on our vendors. Security opportunities include integration of desired authentication modules, usage of specific software packages, staying up to date with the latest Linux kernel, and controlling a variety of attack vectors. Because we have kernel lockdown implemented, flashing tooling is difficult to use in our environment. With access to the source code of the flashing tools, we understand what the tools need access to, and can assess whether this meets our security standards.</p>
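<p>The fan PID controller mentioned above is a standard control loop that turns a temperature error into a fan-speed adjustment. A minimal sketch of the idea follows; this is illustrative only, not OpenBMC's actual implementation, and the gains and setpoint are made up:</p>

```python
class PID:
    """Minimal PID loop: converts a temperature error into a fan-duty delta."""
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint      # target temperature in degrees C
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, temp_c, dt=1.0):
        error = temp_c - self.setpoint          # positive when too hot
        self.integral += error * dt             # accumulated error over time
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        # Output is a fan-speed adjustment, e.g. in PWM duty-cycle percent
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example with made-up gains: a reading above the setpoint yields a
# positive output, i.e. spin the fans up.
pid = PID(kp=2.0, ki=0.1, kd=0.5, setpoint=65.0)
print(pid.update(70.0))
```

<p>Having this loop in source, rather than behind a vendor firmware blob, is what lets us tune its response to our chassis and cooling solution ourselves.</p>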
    <div>
      <h3>Summary</h3>
      <a href="#summary">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/01wz0PNamn0NdOlkgxbgUs/104ce7600fe455376a0e39538c929882/IMG_4195.jpeg.jpeg" />
            
            </figure><p>The jump between our generation 9 and generation 10 servers was enormous. To summarise, we changed from a dual-socket Intel platform to a single socket AMD platform. We upgraded the SATA SSDs to NVMe storage devices, and physically the multi-node chassis changed to a 1U form factor.</p><p>At the start of the generation 11 project we weren’t sure if we would be making such radical changes again. However, after thorough testing of the latest chips and a review of how well the generation 10 server has performed in production for over a year, our generation 11 server built upon the solid foundations of generation 10 and ended up as a refinement rather than a total revamp. Despite this, and bearing in mind that performance varies by time of day and geography, we are pleased that generation 11 is capable of serving approximately 29% more requests than generation 10 without an increase in power consumption.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1hWA14j4FrNG6KYDaHVFI4/30172744bf9b0d94a08c8a8abdb34b3f/Screenshot-2021-06-22-at-15.27.27.png" />
            
            </figure><p>Thanks to Denny Mathew and Ryan Chow’s work on benchmarking and OpenBMC, respectively.</p><p>If you are interested in working with bleeding edge hardware, open source server firmware, solving interesting problems, helping to improve our performance, and are interested in helping us work on our generation 12 server platform (amongst many other things!), <a href="https://www.cloudflare.com/careers/jobs/?department=Infrastructure&amp;location=default">we’re hiring</a>.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[EPYC]]></category>
            <category><![CDATA[AMD]]></category>
            <category><![CDATA[Hardware]]></category>
            <guid isPermaLink="false">1c9Tnv6ewSXOVymWrMDUzF</guid>
            <dc:creator>Chris Howells</dc:creator>
        </item>
    </channel>
</rss>