
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sun, 05 Apr 2026 04:06:30 GMT</lastBuildDate>
        <item>
            <title><![CDATA[ASPA: making Internet routing more secure]]></title>
            <link>https://blog.cloudflare.com/aspa-secure-internet/</link>
            <pubDate>Fri, 27 Feb 2026 06:00:00 GMT</pubDate>
            <description><![CDATA[ ASPA is the cryptographic upgrade for BGP that helps prevent route leaks by verifying the path network traffic takes. New features in Cloudflare Radar make tracking its adoption easy. ]]></description>
            <content:encoded><![CDATA[ <p>Internet traffic relies on the <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/"><u>Border Gateway Protocol (BGP)</u></a> to find its way between networks. However, this traffic can sometimes be misdirected due to configuration errors or malicious actions. When traffic is routed through networks it was not intended to pass through, it is known as a <a href="https://datatracker.ietf.org/doc/html/rfc7908"><u>route leak</u></a>. We have <a href="https://blog.cloudflare.com/bgp-route-leak-venezuela/"><u>written on our blog</u></a> <a href="https://blog.cloudflare.com/cloudflare-1111-incident-on-june-27-2024/"><u>multiple times</u></a> about <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/"><u>BGP route leaks</u></a> and the impact they have on Internet routing, and a few times we have even alluded to a future of path verification in BGP. </p><p>While the network community has made significant progress in verifying the final destination of Internet traffic, securing the actual path it takes to get there remains a key challenge for maintaining a reliable Internet. To address this, the industry is adopting a new cryptographic standard called <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>ASPA (Autonomous System Provider Authorization)</u></a>, which is designed to validate the entire path of network traffic and prevent route leaks.</p><p>To help the community track the rollout of this standard, Cloudflare Radar has introduced a new ASPA deployment monitoring feature. 
This view allows users to observe ASPA adoption trends over time across the five <a href="https://en.wikipedia.org/wiki/Regional_Internet_registry"><u>Regional Internet Registries (RIRs)</u></a>, and view ASPA records and changes over time at the <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>Autonomous System (AS)</u></a> level.</p>
    <div>
      <h2>What is ASPA?</h2>
      <a href="#what-is-aspa">
        
      </a>
    </div>
    <p>To understand how ASPA works, it is helpful to look at how the Internet currently secures traffic destinations.</p><p>Today, networks use a secure infrastructure system called <a href="https://en.wikipedia.org/wiki/Resource_Public_Key_Infrastructure"><u>RPKI (Resource Public Key Infrastructure)</u></a>, which has seen <a href="https://blog.apnic.net/2026/02/20/rpkis-2025-year-in-review/"><u>significant deployment growth</u></a> over the past few years. Within RPKI, networks publish specific cryptographic records called ROAs (Route Origin Authorizations). A ROA acts as a verifiable digital ID card, confirming that an Autonomous System (AS) is officially authorized to announce specific IP addresses. This addresses the "origin hijacks" issue, where one network attempts to impersonate another.</p><p><a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>ASPA (Autonomous System Provider Authorization)</u></a> builds directly on this foundation. While a ROA verifies the <i>destination</i>, an ASPA record verifies the <i>journey</i>.</p><p>When data travels across the Internet, it keeps a running log of every network it passes through. In BGP, this log is known as the <a href="https://datatracker.ietf.org/doc/html/rfc4271#section-5.1.2"><code><u>AS_PATH</u></code></a> (Autonomous System Path). ASPA provides networks with a way to officially publish a list of their authorized upstream providers within the RPKI system. This allows any receiving network to look at the <code>AS_PATH</code>, check the associated ASPA records, and verify that the traffic only traveled through an approved chain of networks.</p><p>A ROA helps ensure that traffic arrives at the correct destination; ASPA ensures that it takes an intended, authorized route to get there. Let’s take a look at how path evaluation actually works in practice.</p>
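The relationship an ASPA record attests can be modeled as a simple customer-to-providers mapping, with a per-hop check that asks the question above: did this AS authorize the next AS as a provider? The sketch below is illustrative only, with made-up AS numbers and a plain dictionary standing in for cryptographically signed RPKI objects.

```python
# A minimal, illustrative model of ASPA records: each customer AS signs
# the set of ASes it authorizes as upstream providers. (Simplified; the
# real objects live in the RPKI and are cryptographically signed.)
ASPA = {
    64496: {64510},          # AS64496 authorizes AS64510 as its provider
    64510: {64520, 64530},   # AS64510 has two authorized providers
}

def hop_check(customer: int, provider: int) -> str:
    """Single hop-check, simplified: is `provider` authorized by `customer`?"""
    providers = ASPA.get(customer)
    if providers is None:
        return "no-attestation"   # the customer published no ASPA record
    return "provider+" if provider in providers else "not-provider+"
```

Note the three outcomes: a hop can be explicitly authorized, explicitly unauthorized, or simply unverifiable when no record exists, which is what lets validation distinguish proven leaks from missing data.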
    <div>
      <h2>Route leak detection with ASPA</h2>
      <a href="#route-leak-detection-with-aspa">
        
      </a>
    </div>
    <p>How does ASPA know if a route is a <i>detour</i>? It relies on the hierarchy of the Internet.</p><p>In a healthy Internet routing topology (e.g. <a href="https://ieeexplore.ieee.org/document/6363987"><u>“valley-free” routing</u></a>), traffic generally follows a specific path: it travels "up" from a customer to a large provider (like a major ISP), optionally crosses over to another big provider, and then flows "down" to the destination. You can visualize this as a “mountain” shape:</p><ol><li><p><b>The Up-Ramp:</b> Traffic starts at a Customer and travels "up" through larger and larger Providers (ISPs), where each customer network pays its provider to transit traffic for it.</p></li><li><p><b>The Apex:</b> It reaches the top tier of the Internet backbone and may cross a single peering link.</p></li><li><p><b>The Down-Ramp:</b> It travels "down" through providers to reach the destination Customer.</p></li></ol>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1VGuSHfq6GcQZUYLGmoDH3/a1486f40c16e568f32ca2fa81d58ac41/1.png" />
          </figure><p><sup><i>A visualization of "valley-free" routing. Routes propagate up to a provider, optionally across one peering link, and down to a customer.</i></sup></p><p>In this model, a route leak is like a valley, or dip. One type of such leak happens when traffic goes down to a customer and then unexpectedly tries to go back <i>up</i> to another provider. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6eoaEpdIJpCCLbMnNZD5ob/7ceca2a98f2252e8161915b942bf7dbd/2.png" />
          </figure><p>This "down-and-up" movement is undesirable because customers are neither intended nor equipped to transit traffic between two larger network providers.</p>
    <div>
      <h4>How ASPA validation works</h4>
      <a href="#how-aspa-validation-works">
        
      </a>
    </div>
    <p>ASPA gives network operators a cryptographic way to declare their <i>authorized providers</i>, enabling receiving networks to verify that an AS path follows this expected structure.</p><p>ASPA validates AS paths by checking the “chain of relationships” from both ends of the route’s propagation:</p><ul><li><p><b>Checking the Up-Ramp:</b> The check starts at the origin and moves forward. At every hop, it asks: <i>"Did this network authorize the next network as a Provider?"</i> It keeps going until the chain stops.</p></li><li><p><b>Checking the Down-Ramp:</b> It does the same thing from the destination of a BGP update, moving backward.</p></li></ul><p>If the "Up" path and the "Down" path overlap or meet at the top, the route is <b>Valid</b>. The mountain shape is intact.</p><p>However, if the two valid paths <b>do not meet</b>, i.e. there is a gap in the middle where authorization is missing or invalid, ASPA reports such paths as problematic. That gap represents the "valley" or the leak.</p>
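For the simplest case, a route received from a customer or lateral peer, the draft requires the entire path to be an up-ramp: every hop from the origin toward the receiver must be an authorized customer-to-provider step. The sketch below is a heavily simplified rendering of that idea, not the full draft algorithm, with a plain dictionary standing in for signed ASPA objects and made-up AS numbers.

```python
def verify_upflow(path_origin_first, aspa):
    """Simplified ASPA check for routes received from a customer or
    lateral peer: every hop must be an authorized customer->provider
    step. `aspa` maps a customer AS to its set of authorized providers.
    Sketch only; the draft algorithm has more states and edge cases."""
    state = "valid"
    for customer, provider in zip(path_origin_first, path_origin_first[1:]):
        providers = aspa.get(customer)
        if providers is None:
            state = "unknown"    # no attestation: a leak cannot be proven
        elif provider not in providers:
            return "invalid"     # provably unauthorized hop: the "valley"
    return state

# A clean up-ramp validates; an unauthorized hop in the middle does not.
aspa = {64496: {64500}, 64500: {64510}}
assert verify_upflow([64496, 64500, 64510], aspa) == "valid"
assert verify_upflow([64496, 64500, 64499], aspa) == "invalid"
```

The "unknown" state matters in practice: during partial deployment, most gaps come from networks that have not yet signed an ASPA, and those must not be treated as proven leaks.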
    <div>
      <h4>Validation process example</h4>
      <a href="#validation-process-example">
        
      </a>
    </div>
    <p>Let’s look at a scenario where a network (AS65539) receives a bad route from a customer (AS65538).</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7kn8W6c7CaPcMLMycjS5NO/59036dc52a942870e9bb0e377f235dd4/3.png" />
          </figure><p>The customer (AS65538) is trying to send traffic received from one provider (AS65537) "up" to another provider (AS65539), acting like a bridge between providers. This is a <a href="https://datatracker.ietf.org/doc/html/rfc7908#autoid-4"><u>classic route leak</u></a>. Now let’s walk through the ASPA validation process.</p><ol><li><p>We check the <b>Up-Ramp</b>: The original source (AS65536) authorizes its provider. (Check passes).</p></li><li><p>We check the <b>Down-Ramp</b>: We start from the destination and look back. We see the customer (AS65538).</p></li><li><p><b>The Mismatch:</b> The up-ramp ends at AS65537, while the down-ramp ends at AS65538. The two ramps do not connect.</p></li></ol><p>Because the "Up" path and "Down" path fail to connect, the system flags this as ASPA <b>Invalid</b>. This path validation depends on ASPA: without signed ASPA objects in the RPKI, there is no way to determine which networks are authorized to propagate which prefixes to whom. By signing a list of provider networks for each AS, we know which networks should be able to propagate prefixes laterally or upstream.</p>
    <div>
      <h3>ASPA against forged-origin hijacks</h3>
      <a href="#aspa-against-forged-origin-hijacks">
        
      </a>
    </div>
    <p>ASPA can serve as an effective defense against <a href="https://www.usenix.org/conference/nsdi24/presentation/holterbach"><u>forged-origin hijacks</u></a>, where an attacker bypasses Route Origin Validation (ROV) by advertising a fabricated BGP path that ends at the legitimate origin AS. Although the origin AS remains correct, the relationship between the hijacker and the victim is fabricated.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6MNVRoNDzxlHDVGTP2nRPw/87485ad246baa734eef3192fd48012a8/4.png" />
          </figure><p>ASPA exposes this deception by allowing the victim network to cryptographically declare its actual authorized providers; because the hijacker is not on that authorized list, the path is rejected as invalid, effectively preventing the malicious redirection.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2KGtsKWBdlNFySIswRb5rD/d26b266c2dc942be9b9f6c6ec383843b/5.png" />
          </figure><p>ASPA cannot fully protect against forged-origin hijacks, however; there is at least one case where even ASPA validation cannot prevent this type of attack. An example of a forged-origin hijack that ASPA cannot account for is when a provider forges a path advertisement <i>to their customer</i>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3DPYEXwSUPmWUWvsyxFJHX/b059dae85cf764fdcd5a5257f5ebc373/6.png" />
          </figure><p>Essentially, a provider could “fake” a peering link with another AS to attract traffic from a customer with a short AS_PATH length, even when no such peering link exists. ASPA does not prevent this path forgery by the provider, because ASPA relies solely on provider authorizations and knows nothing specific about peering relationships.</p><p>So while ASPA can be an effective means of rejecting forged-origin hijack routes, there are still some rare cases where it will be ineffective, and those are worth noting.</p>
    <div>
      <h2>Creating ASPA objects: just a few clicks away</h2>
      <a href="#creating-aspa-objects-just-a-few-clicks-away">
        
      </a>
    </div>
    <p>Creating an ASPA object for your network (or Autonomous System) is now a simple process in registries like <a href="https://labs.ripe.net/author/tim_bruijnzeels/aspa-in-the-rpki-dashboard-a-new-layer-of-routing-security/"><u>RIPE</u></a> and <a href="https://www.arin.net/announcements/20260120/"><u>ARIN</u></a>. All you need is your AS number and the AS numbers of the providers you purchase Internet transit service from. These are the authorized upstream networks you trust to announce your IP addresses to the wider Internet. In the opposite direction, these are also the networks you authorize to send you a full routing table, which acts as the complete map of how to reach the rest of the Internet.</p><p>We’d like to show you just how easy creating an ASPA object is with a quick example. </p><p>Say we need to create the ASPA object for AS203898, an AS we use for our Cloudflare London office Internet. At the time of writing we have three Internet providers for the office: AS8220, AS2860, and AS1273. This means we will create an ASPA object for AS203898 with those three provider members in a list.</p><p>First, we log into the RIPE <a href="https://dashboard.rpki.ripe.net/#overview"><u>RPKI dashboard</u></a> and navigate to the <a href="https://dashboard.rpki.ripe.net/#aspa"><u>ASPA</u></a> section:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3CCFItZpP8JbYCotDGfuM3/c7ad0041ceea4f48c37ff59f416f8242/7.png" />
          </figure><p>Then, we click on “Create ASPA” for the object we want to create an ASPA object for. From there, we just fill in the providers for that AS. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6ABmfhRQQRbLhMat6Ug01K/3a9d73ad8ade315416c4cc6eb9073ada/8.png" />
          </figure><p>It’s as simple as that. After just a short period of waiting, we can query the global RPKI ecosystem and find our ASPA object for AS203898 with the providers we defined. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/xAKD1b704fg7SMxeg867P/59ce564867565331e2d531de15dc7e87/Screenshot_2026-02-27_at_11.09.55.png" />
          </figure><p>It’s a similar story with <a href="https://www.arin.net/"><u>ARIN</u></a>, the only other <a href="https://en.wikipedia.org/wiki/Regional_Internet_registry"><u>Regional Internet Registry (RIR)</u></a> that currently supports the creation of ASPA objects. Log in to <a href="https://account.arin.net/public/login"><u>ARIN online</u></a>, then navigate to Routing Security, and click “Manage RPKI”.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/PTCdyldcTc1Uc0iazZlg4/51ed97a8ef0d095b947f7ab2bf4b1fd3/9.png" />
          </figure><p>From there, you’ll be able to click on “Create ASPA”. In this example, we will create an object for another one of our ASNs, AS400095.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4bLLHAeU4Eaz6RPgSPDzn9/a482b6c7ed4ef78f7346ca80c9a5ba46/10.png" />
          </figure><p>And that’s it – now we have created our ASPA object for AS400095 with provider AS0.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/9yOZPpMH4olQXu2DNpJPO/cca432fd82e61c6492987b7ccedbdc57/11.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6emDXxMbU0BSKk5BVXUdJ6/132b9a7279b93e3ae0329dec83a9bfac/Screenshot_2026-02-27_at_11.11.31.png" />
          </figure><p>The “AS0” provider entry is special: it means the AS owner attests there are <b>no</b> valid upstream providers for their network. By definition, every transit-free Tier-1 network should eventually sign an ASPA with only “AS0” in its object, if it truly has only peer and customer relationships.</p>
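In a simplified mapping like the one sketched earlier, an AS0-only record authorizes nobody: every claimed provider fails the hop check, which is exactly the attestation a transit-free network wants to make. (Made-up AS numbers; a plain dictionary stands in for signed objects.)

```python
ASPA = {
    64496: {64500, 64510},  # ordinary record: two authorized providers
    64999: {0},             # AS0-only record: "I have no providers at all"
}

def authorizes(customer, claimed_provider):
    """None = no attestation published; otherwise, is the claim listed?
    An AS0-only set contains no real ASN, so it rejects every claim."""
    providers = ASPA.get(customer)
    if providers is None:
        return None
    return claimed_provider in providers
```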
    <div>
      <h2>New ASPA features in Cloudflare Radar </h2>
      <a href="#new-aspa-features-in-cloudflare-radar">
        
      </a>
    </div>
    <p>We have added a new ASPA deployment monitoring feature to <a href="https://radar.cloudflare.com/"><u>Cloudflare Radar</u></a>. The new ASPA deployment view allows users to examine the growth of ASPA adoption over time, with the ability to visualize trends across the five <a href="https://en.wikipedia.org/wiki/Regional_Internet_registry"><u>Regional Internet Registries</u></a> (RIRs) based on AS registration. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7FoW86CVloqBqZO7wcOiq8/f5f2973db227b8184127f76fdad64dc4/12.png" />
          </figure><p>We have also integrated ASPA data directly into the country/region and ASN routing pages. Users can now track how different locations are progressing in securing their infrastructure, based on the associated ASPA records from the customer ASNs registered locally.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1tBhOpIxc6tNOJXWPPAk4U/7c334c9ddb089eb823ce23eb49ddbdc3/13.png" />
          </figure><p>There are also new features when you zoom into a particular Autonomous System (AS), for example <a href="https://radar.cloudflare.com/routing/AS203898#connectivity"><u>AS203898</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7nTBPD0PgzPOS0eOxpG3rW/65d9b312be2e81e4c59164c98b7b6276/14.png" />
          </figure><p>We can see whether a network’s observed BGP upstream providers are ASPA authorized, their full list of providers in their ASPA object, and the timeline of ASPA changes that involve their AS.</p>
    <div>
      <h2>The road to better routing security</h2>
      <a href="#the-road-to-better-routing-security">
        
      </a>
    </div>
    <p>With ASPA finally becoming a reality, we have our cryptographic upgrade for Internet path validation. However, those who have been around since the start of RPKI for route origin validation know <a href="https://manrs.org/2023/05/estimating-the-timeline-for-aspa-deployment/"><u>this will be a long road</u></a> to actually providing significant value on the Internet. Changes are needed to RPKI Relying Party (RP) packages, signer implementations, RTR (RPKI-to-Router protocol) software, and BGP implementations to actually use ASPA objects and validate paths with them.</p><p>In addition to ASPA adoption, operators should also configure BGP roles as described within <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234</u></a>. The BGP roles configured on BGP sessions will help future ASPA implementations on routers <a href="https://datatracker.ietf.org/doc/html/draft-ietf-sidrops-aspa-verification-24#section-6.3"><u>decide which algorithm to apply</u></a>: <i>upstream</i> or <i>downstream</i>. In other words, BGP roles give us the power as operators to directly tie our intended BGP relationships with another AS to sessions with those neighbors. Check with your routing vendors and make sure they support <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234 BGP roles and OTC</u></a> (Only-to-Customer) attribute implementation.</p><p>To get the most out of ASPA, we encourage every operator to create an ASPA object for their AS. Creating and maintaining these ASPA objects requires careful attention. In the future, as networks use these records to actively block invalid paths, omitting a legitimate provider could cause traffic to be dropped. However, managing this risk is no different from how networks already handle Route Origin Authorizations (ROAs) today. ASPA is the necessary cryptographic upgrade for Internet path validation, and we’re happy it’s here!</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Routing]]></category>
            <category><![CDATA[Routing Security]]></category>
            <category><![CDATA[Radar]]></category>
            <guid isPermaLink="false">5NwDf8fspgoSx9Pgcx1xLy</guid>
            <dc:creator>Mingwei Zhang</dc:creator>
            <dc:creator>Bryton Herdes</dc:creator>
        </item>
        <item>
            <title><![CDATA[Route leak incident on January 22, 2026]]></title>
            <link>https://blog.cloudflare.com/route-leak-incident-january-22-2026/</link>
            <pubDate>Fri, 23 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ An automated routing policy configuration error caused us to leak some Border Gateway Protocol prefixes unintentionally from a router at our Miami data center. We discuss the impact and the changes we are implementing as a result. ]]></description>
            <content:encoded><![CDATA[ <p>On January 22, 2026, an automated routing policy configuration error caused us to leak some <a href="http://cloudflare.com/learning/security/glossary/what-is-bgp/"><u>Border Gateway Protocol (BGP)</u></a> prefixes unintentionally from a router at our data center in Miami, Florida. While the route leak caused some impact to Cloudflare customers, multiple external parties were also affected because their traffic was accidentally funnelled through our Miami data center location.</p><p>The route leak lasted 25 minutes, causing congestion on some of our backbone infrastructure in Miami, elevated loss for some Cloudflare customer traffic, and higher latency for traffic across these links. Additionally, some traffic was discarded by firewall filters on our routers that are designed to only accept traffic for Cloudflare services and our customers.</p><p>While we’ve written about route leaks before, we rarely find ourselves causing them. This route leak was the result of an accidental misconfiguration on a router in Cloudflare’s network, and only affected IPv6 traffic. We sincerely apologize to the users, customers, and networks we impacted yesterday as a result of this BGP route leak.</p>
    <div>
      <h3>BGP route leaks </h3>
      <a href="#bgp-route-leaks">
        
      </a>
    </div>
    <p>We have <a href="https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/"><u>written multiple times</u></a> about <a href="https://blog.cloudflare.com/cloudflare-1111-incident-on-june-27-2024/"><u>BGP route leaks</u></a>, and we even record <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/"><u>route leak events</u></a> on Cloudflare Radar for anyone to view and learn from. To get a fuller understanding of what route leaks are, you can read this <a href="https://blog.cloudflare.com/bgp-route-leak-venezuela/#background-bgp-route-leaks"><u>detailed background section</u></a>, or refer to the formal definition within <a href="https://datatracker.ietf.org/doc/html/rfc7908"><u>RFC7908</u></a>. </p><p>Essentially, a route leak occurs when a network tells the broader Internet to send it traffic that it's not supposed to forward. Technically, a route leak occurs when a network, or Autonomous System (AS), appears unexpectedly in an AS path. An AS path is what BGP uses to determine the path across the Internet to a final destination. An example of an anomalous AS path indicative of a route leak would be a network sending routes received from a peer to a provider.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/30z4NDtf6DjVQZOfZatGUX/6ff06eb02c61d8e818d9da8ecd87c1c8/BLOG-3135_2.png" />
          </figure><p>During this type of route leak, the rules of <a href="https://people.eecs.berkeley.edu/~sylvia/cs268-2019/papers/gao-rexford.pdf"><u>valley-free routing</u></a> are violated, as BGP updates are sent from AS64501 to their peer (AS64502), and then unexpectedly up to a provider (AS64503). Oftentimes the leaker, in this case AS64502, is not prepared to handle the amount of traffic they are going to receive and may not even have firewall filters configured to accept all of the traffic coming in their direction. In simple terms, once a route update is sent to a peer or provider, it should only be sent further to customers and not to another peer or provider AS.</p><p>During the incident on January 22, we caused a similar kind of route leak, in which we took routes from some of our peers and redistributed them in Miami to some of our peers and providers. According to the route leak definitions in RFC7908, we caused a mixture of Type 3 and Type 4 route leaks on the Internet. </p>
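The propagation rule described above can be stated in a few lines of logic. The sketch below is an illustration of the classic valley-free ("Gao-Rexford") export rule, not Cloudflare's actual router policy.

```python
def may_export(learned_from: str, send_to: str) -> bool:
    """Valley-free export rule: routes may always flow down to customers,
    but routes learned from a peer or provider must never be re-exported
    to another peer or provider."""
    if send_to == "customer":
        return True
    return learned_from == "customer"

# The Miami incident re-advertised peer-learned routes to peers and
# providers, which this rule forbids:
assert may_export("peer", "provider") is False   # peer routes leaked upward
assert may_export("provider", "peer") is False   # provider routes leaked laterally
assert may_export("customer", "provider") is True
```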
    <div>
      <h3>Timeline</h3>
      <a href="#timeline">
        
      </a>
    </div>
    <table><tr><th><p><b>Time (UTC)</b></p></th><th><p><b>Event</b></p></th></tr><tr><td><p>2026-01-22 19:52 UTC</p></td><td><p>A change that ultimately triggers the routing policy bug is merged in our network automation code repository</p></td></tr><tr><td><p>2026-01-22 20:25 UTC</p></td><td><p>Automation is run on single Miami edge-router resulting in unexpected advertisements to BGP transit providers and peers</p><p><b>IMPACT START</b></p></td></tr><tr><td><p>2026-01-22 20:40 UTC</p></td><td><p>Network team begins investigating unintended route advertisements from Miami</p></td></tr><tr><td><p>2026-01-22 20:44 UTC</p></td><td><p>Incident is raised to coordinate response</p></td></tr><tr><td><p>2026-01-22 20:50 UTC</p></td><td><p>The bad configuration change is manually reverted by a network operator, and automation is paused for the router, so it cannot run again</p><p><b>IMPACT STOP</b></p></td></tr><tr><td><p>2026-01-22 21:47 UTC</p></td><td><p>The change that triggered the leak is reverted from our code repository</p></td></tr><tr><td><p>2026-01-22 22:07 UTC</p></td><td><p>Automation is confirmed by operators to be healthy to run again on the Miami router, without the routing policy bug</p></td></tr><tr><td><p>2026-01-22 22:40 UTC</p></td><td><p>Automation is unpaused on the single router in Miami</p></td></tr></table>
    <div>
      <h3>What happened: the configuration error</h3>
      <a href="#what-happened-the-configuration-error">
        
      </a>
    </div>
    <p>On January 22, 2026, at 20:25 UTC, we pushed a change via our policy automation platform to remove the BGP announcements from Miami for one of our data centers in Bogotá, Colombia. This was purposeful, as we previously forwarded some IPv6 traffic through Miami toward the Bogotá data center, but recent infrastructure upgrades removed the need for us to do so.</p><p>This change generated the following diff (the output of a program that compares configuration files in order to determine how or whether they differ):</p>
            <pre><code>[edit policy-options policy-statement 6-COGENT-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-COMCAST-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-GTT-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-LEVEL3-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-PRIVATE-PEER-ANYCAST-OUT term ADV-SITELOCAL from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-PUBLIC-PEER-ANYCAST-OUT term ADV-SITELOCAL from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-PUBLIC-PEER-OUT term ADV-SITELOCAL from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-TELEFONICA-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-TELIA-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;</code></pre>
            <p>While this policy change looks innocent at a glance, only removing the prefix lists containing BOG04 unicast prefixes resulted in a policy that was too permissive:</p>
            <pre><code>policy-options policy-statement 6-TELIA-ACCEPT-EXPORT {
    term ADV-SITELOCAL-GRE-RECEIVER {
        from route-type internal;
        then {
            community add STATIC-ROUTE;
            community add SITE-LOCAL-ROUTE;
            community add MIA01;
            community add NORTH-AMERICA;
            accept;
        }
    }
}
</code></pre>
            <p>The policy would now mark every prefix of type “internal” as acceptable, and proceed to add some informative communities to all matching prefixes. But more importantly, the policy also accepted the route through the policy filter, which resulted in the prefix, which was intended to stay internal, being advertised externally. This is an issue because the “route-type internal” match in JunOS or JunOS EVO (the operating systems used by <a href="https://www.hpe.com/us/en/home.html"><u>HPE Juniper Networks</u></a> devices) will match any non-external route type, such as Internal BGP (IBGP) routes, which is what happened here.</p><p>As a result, all IPv6 prefixes that Cloudflare redistributes internally across the backbone were accepted by this policy, and advertised to all our BGP neighbors in Miami. This is unfortunately very similar to the outage we experienced in 2020, on which you can read more <a href="https://blog.cloudflare.com/cloudflare-outage-on-july-17-2020/"><u>on our blog</u></a>.</p><p>When the policy misconfiguration was applied at 20:25 UTC, a series of unintended BGP updates were sent from AS13335 to peers and providers in Miami. These BGP updates are viewable historically by looking at MRT files with the <a href="https://github.com/bgpkit/monocle"><u>monocle</u></a> tool or using <a href="https://stat.ripe.net/bgplay/2a03%3A2880%3Af312%3A%3A%2F48#starttime=1769112000&amp;endtime=1769115659&amp;instant=56,1769113845"><u>RIPE BGPlay</u></a>. </p>
            <pre><code>➜  ~ monocle search --start-ts 2026-01-22T20:24:00Z --end-ts 2026-01-22T20:30:00Z --as-path ".*13335[ \d$]32934$*"
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f077::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f091::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f16f::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f17c::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f26f::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f27c::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f33f::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113583.095278|2001:504:d::4:9544:1|49544|2a03:2880:f17c::/48|49544 1299 3356 13335 32934|IGP|2001:504:d::4:9544:1|0|0|1299:25000 1299:25800 49544:16000 49544:16106|false|||route-views.isc
A|1769113583.095278|2001:504:d::4:9544:1|49544|2a03:2880:f27c::/48|49544 1299 3356 13335 32934|IGP|2001:504:d::4:9544:1|0|0|1299:25000 1299:25800 49544:16000 49544:16106|false|||route-views.isc
A|1769113583.095278|2001:504:d::4:9544:1|49544|2a03:2880:f091::/48|49544 1299 3356 13335 32934|IGP|2001:504:d::4:9544:1|0|0|1299:25000 1299:25800 49544:16000 49544:16106|false|||route-views.isc
A|1769113584.324483|2001:504:d::19:9524:1|199524|2a03:2880:f091::/48|199524 1299 3356 13335 32934|IGP|2001:2035:0:2bfd::1|0|0||false|||route-views.isc
A|1769113584.324483|2001:504:d::19:9524:1|199524|2a03:2880:f17c::/48|199524 1299 3356 13335 32934|IGP|2001:2035:0:2bfd::1|0|0||false|||route-views.isc
A|1769113584.324483|2001:504:d::19:9524:1|199524|2a03:2880:f27c::/48|199524 1299 3356 13335 32934|IGP|2001:2035:0:2bfd::1|0|0||false|||route-views.isc
{trimmed}
</code></pre>
            <p><sup><i>In the monocle output seen above, we have the timestamp of our BGP update, followed by the next-hop in the announcement, the ASN of the network feeding a given route-collector, the prefix involved, and the AS path and BGP communities if any are found. At the end of each line, we also find the route-collector instance.</i></sup></p><p>Looking at the first update for prefix 2a03:2880:f077::/48, the AS path is <i>64112 22850 174 3356 13335 32934</i>. This means we (AS13335) took the prefix received from Meta (AS32934), our peer, and then advertised it toward Lumen (AS3356), one of our upstream transit providers. We know this is a route leak because routes received from peers are only meant to be readvertised to downstream (customer) networks, not laterally to other peers or up to providers.</p><p>As a result of the leak and the forwarding of unintended traffic into our Miami router from providers and peers, we experienced congestion on our backbone between Miami and Atlanta, as you can see in the graph below. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/SIiBSb7qnfStZ0jQAZ8ne/14009779b7551e4f26c4cc3ae2c1141b/BLOG-3135_3.png" />
          </figure><p>This would have resulted in elevated loss for some Cloudflare customer traffic, and higher latency than usual for traffic traversing these links. In addition to this congestion, the networks whose prefixes we leaked would have had their traffic discarded by firewall filters on our routers that are designed to only accept traffic for Cloudflare services and our customers. At peak, we discarded around 12Gbps of traffic ingressing our router in Miami for these non-downstream prefixes. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/jMaMLbijdtS8GYZOVzwoX/40325907032cbb8d27f00bc561191d23/BLOG-3135_4.png" />
          </figure>
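<p>The export rule that this incident violated can be sketched in a few lines. The function below is an illustrative model only, not Cloudflare's actual policy code; the relationship labels describe how the local AS classifies each neighbor.</p>

```python
# Minimal sketch of the BGP export rule the leak violated: routes learned
# from a peer or provider may only be exported to customers, while routes
# learned from customers (or originated locally) may be exported anywhere.

def should_export(learned_from: str, exporting_to: str) -> bool:
    """Decide whether a route may be exported, given how the local AS
    classifies the neighbor it was learned from and the neighbor it
    would be sent to ("customer", "peer", or "provider")."""
    if learned_from == "customer":
        return True  # customer routes are exported in every direction
    return exporting_to == "customer"  # peer/provider routes: customers only

# The incident above: a route learned from a peer (Meta, AS32934) was
# advertised to a transit provider (Lumen, AS3356) -- a route leak.
assert should_export("peer", "provider") is False
assert should_export("peer", "customer") is True
assert should_export("customer", "provider") is True
```

<p>A routing policy that fails to encode this check, for example through an empty or overly broad match term, will happily leak every route it carries.</p>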
    <div>
      <h3>Follow-ups and preventing route leaks </h3>
      <a href="#follow-ups-and-preventing-route-leaks">
        
      </a>
    </div>
    <p>We are big supporters of and active contributors to efforts within the <a href="https://www.ietf.org/"><u>IETF</u></a> and <a href="https://manrs.org/"><u>infrastructure community</u></a> that strengthen routing security. We know firsthand how easy it is to cause a route leak accidentally, as evidenced by this incident. </p><p>Preventing route leaks will require a multi-faceted approach, but we have identified multiple areas in which we can improve, both short- and long-term.</p><p>In terms of our routing policy configurations and automation, we are:</p><ul><li><p>Patching the failure in our routing policy automation that caused the route leak, which will immediately mitigate this failure mode and others like it </p></li><li><p>Implementing additional BGP community-based safeguards in our routing policies that explicitly reject routes that were received from providers and peers on external export policies </p></li><li><p>Adding automatic routing policy evaluation into our <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/">CI/CD pipelines</a> that looks specifically for empty or erroneous policy terms </p></li><li><p>Improving early detection of issues with network configurations and the negative effects of an automated change</p></li></ul><p>To help prevent route leaks in general, we are: </p><ul><li><p>Validating routing equipment vendors' implementation of <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234</u></a> (BGP roles and the Only-to-Customer Attribute) in preparation for our rollout of the feature, which is the only way <i>independent of routing policy</i> to prevent route leaks caused at the <i>local</i> Autonomous System (AS)</p></li><li><p>Encouraging the long-term adoption of RPKI <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>Autonomous System Provider Authorization (ASPA)</u></a>, where networks could automatically reject routes that contain anomalous AS
paths</p></li></ul><p>Most importantly, we would again like to apologize for the impact we caused users and customers of Cloudflare, as well as any impact felt by external networks.</p><p></p> ]]></content:encoded>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Post Mortem]]></category>
            <guid isPermaLink="false">1lDFdmcpnlPwczsEbswsTs</guid>
            <dc:creator>Bryton Herdes</dc:creator>
            <dc:creator>Tom Strickx</dc:creator>
        </item>
        <item>
            <title><![CDATA[A closer look at a BGP anomaly in Venezuela]]></title>
            <link>https://blog.cloudflare.com/bgp-route-leak-venezuela/</link>
            <pubDate>Tue, 06 Jan 2026 08:00:00 GMT</pubDate>
            <description><![CDATA[ There has been speculation about the cause of a BGP anomaly observed in Venezuela on January 2. We take a look at BGP route leaks, and dive into what the data suggests caused the anomaly in question. ]]></description>
            <content:encoded><![CDATA[ <p>As news unfolds surrounding the U.S. capture and arrest of Venezuelan leader Nicolás Maduro, a <a href="https://loworbitsecurity.com/radar/radar16/?cf_target_id=8EBD08FC8E3F122A23413E8273CF4AF3"><u>cybersecurity newsletter</u></a> examined <a href="https://radar.cloudflare.com/"><u>Cloudflare Radar</u></a> data and took note of a routing leak in Venezuela on January 2.</p><p>We dug into the data. Since the beginning of December there have been eleven route leak events, impacting multiple prefixes, where AS8048 is the leaker. Although it is impossible to determine definitively what happened on the day of the event, this pattern of route leaks suggests that the CANTV (AS8048) network, a popular Internet Service Provider (ISP) in Venezuela, has insufficient routing export and import policies. In other words, the BGP anomalies observed by the researcher could be tied to poor technical practices by the ISP rather than malfeasance.</p><p>In this post, we’ll briefly discuss Border Gateway Protocol (BGP) and BGP route leaks, and then dig into the anomaly observed and what may have happened to cause it. </p>
    <div>
      <h3>Background: BGP route leaks</h3>
      <a href="#background-bgp-route-leaks">
        
      </a>
    </div>
    <p>First, let’s revisit what a <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/"><u>BGP route leak</u></a> is. BGP route leaks cause behavior similar to taking the wrong exit off a highway. While you may still make it to your destination, the path may be slower and come with delays you wouldn’t otherwise have traveling on a more direct route.</p><p>Route leaks were given a formal definition in <a href="https://datatracker.ietf.org/doc/html/rfc7908"><u>RFC7908</u></a> as “the propagation of routing announcement(s) beyond their intended scope.” Intended scope is defined using <a href="https://en.wikipedia.org/wiki/Pairwise"><u>pairwise</u></a> business relationships between networks. The relationship between any two networks, which in BGP we represent using <a href="https://en.wikipedia.org/wiki/Autonomous_system_(Internet)"><u>Autonomous Systems (ASes)</u></a>, can be one of the following: </p><ul><li><p>customer-provider: A customer pays a provider network to connect them and their own downstream customers to the rest of the Internet</p></li><li><p>peer-peer: Two networks decide to exchange traffic between one another, and with each other’s customers, settlement-free (without payment)</p></li></ul><p>In a customer-provider relationship, the provider will announce <i>all routes to the customer</i>. The customer, on the other hand, will advertise <i>only the routes</i> from their own customers and originating from their network directly.</p><p>In a peer-peer relationship, each peer will advertise to one another <i>only their own routes and the routes of their downstream customers</i>. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/16jXNbH5R5Q4evGm4oRY5p/08d0474000111923f37a7e53b809b5c2/BLOG-3107_2.png" />
          </figure><p>These advertisements help direct traffic in expected ways: from customers upstream to provider networks, potentially across a single peering link, and then potentially back down to customers on the far end of the path from their providers. </p><p>A valid path would look like the following that abides by the <a href="https://ieeexplore.ieee.org/document/6363987"><u>valley-free routing</u></a> rule: </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3qKtxTWTrGcpMm8u3nAjYT/dd19181418076c0e12b6035154639e75/BLOG-3107_3.png" />
          </figure><p>A <b>route leak</b> is a violation of valley-free routing where an AS takes routes from a provider or peer and redistributes them to another provider or peer. For example, a BGP path should never go through a “valley” where traffic goes up to a provider, and back down to a customer, and then up to a provider again. There are different types of route leaks defined in RFC7908, but a simple one is the Type 1: Hairpin route leak between two provider networks by a customer. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4XdHugzuQAVhucuninsUfB/43912258386debc0500e3ceb7c8abab2/BLOG-3107_4.png" />
          </figure><p>In the figure above, AS64505 takes routes from one of its providers and redistributes them to their other provider. This is unexpected, since we know providers should not use their customer as an intermediate IP transit network. AS64505 would become overwhelmed with traffic, as a smaller network with a smaller set of backbone and network links than its providers. This can become very impactful quickly. </p>
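<p>The valley-free rule described above can be checked mechanically. The sketch below is illustrative: the relationship table is hypothetical, and real classifiers infer relationships from observed paths (as BGPKIT's as2rel dataset does). The path is given origin-first, i.e. the reverse of how an AS path appears in a BGP update.</p>

```python
# Sketch: classify an origin-first AS path as valley-free or a leak.
# rel[(a, b)] == "p2c" means a is a provider of b (hypothetical table).
rel = {
    (64500, 64505): "p2c",  # one provider of AS64505
    (64501, 64505): "p2c",  # the other provider of AS64505
}

def step(a: int, b: int) -> str:
    """Direction of the hop a -> b: 'up' to a provider, 'down' to a
    customer, or 'flat' for a peer/unknown link."""
    if rel.get((b, a)) == "p2c":
        return "up"
    if rel.get((a, b)) == "p2c":
        return "down"
    return "flat"

def is_valley_free(path_origin_first):
    """Valid pattern: zero or more 'up' hops, at most one 'flat' hop,
    then zero or more 'down' hops. Anything else is a leak."""
    phase = 0  # 0 = climbing, 1 = crossed a peering link, 2 = descending
    for a, b in zip(path_origin_first, path_origin_first[1:]):
        s = step(a, b)
        if s == "up" and phase != 0:
            return False  # climbing again after peering/descending: a valley
        if s == "flat":
            if phase != 0:
                return False
            phase = 1
        if s == "down":
            phase = 2
    return True

# Type 1 hairpin from the figure: AS64505 re-exports one provider's
# routes to its other provider (path written origin-first).
assert is_valley_free([64500, 64505, 64501]) is False
assert is_valley_free([64505, 64500]) is True  # normal customer-to-provider
```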
    <div>
      <h3>Route leak by AS8048 (CANTV)</h3>
      <a href="#route-leak-by-as8048-cantv">
        
      </a>
    </div>
    <p>Now that we have reminded ourselves what a route leak is in BGP, let’s examine what was hypothesized  in <a href="https://loworbitsecurity.com/radar/radar16/?cf_history_state=%7B%22guid%22%3A%22C255D9FF78CD46CDA4F76812EA68C350%22%2C%22historyId%22%3A106%2C%22targetId%22%3A%2251107FD345D9B86C319316904C23F966%22%7D"><u>the newsletter post</u></a>. The post called attention to a <a href="https://radar.cloudflare.com/routing/as8048#bgp-route-leaks"><u>few route leak anomalies</u></a> on Cloudflare Radar involving AS8048. On the <a href="https://radar.cloudflare.com/routing/anomalies/leak-462460"><u>Radar page</u></a> for this leak, we see this information:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7d1gGdOtSEvPxciyfNswhw/619965acdc1a7a4d3eafbd99b0ccb9f3/BLOG-3107_5.png" />
          </figure><p>We see the leaker AS, which is AS8048 — CANTV, Venezuela’s state-run telephone and Internet Service Provider. We observe that routes were taken from one of their providers AS6762 (Sparkle, an Italian telecom company) and then redistributed to AS52320 (V.tal GlobeNet, a Colombian network service provider). This is definitely a route leak. </p><p>The newsletter suggests “BGP shenanigans” and posits that such a leak could be exploited to collect intelligence useful to government entities. </p><p>While we can’t say with certainty what caused this route leak, our data suggests that its likely cause was more mundane. That’s in part because BGP route leaks happen all of the time, and they have always been part of the Internet — most often for reasons that aren’t malicious.</p><p>To understand more, let’s look closer at the impacted prefixes and networks. The prefixes involved in the leak were all originated by AS21980 (Dayco Telecom, a Venezuelan company):</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/42cmmWdskKqGw7bByd3tGs/c3841ffa205b241798593b91194d25b1/BLOG-3107_6.png" />
          </figure><p>The prefixes are also all members of the same 200.74.224.0/20 <a href="https://www.cloudflare.com/learning/network-layer/what-is-a-subnet/"><u>subnet</u></a>, as noted by the newsletter author. Much more intriguing than this, though, is the relationship between the originating network AS21980 and the leaking network AS8048: AS8048 is a <i>provider</i> of AS21980. </p><p>The customer-provider relationship between AS8048 and AS21980 is visible in both <a href="https://radar.cloudflare.com/routing/as21980#connectivity"><u>Cloudflare Radar</u></a> and <a href="https://bgp.tools/as/21980#upstreams"><u>bgp.tools</u></a> AS relationship inference data. We can also get a confidence score of the AS relationship using the monocle tool from <a href="https://bgpkit.com/"><u>BGPKIT</u></a>, as you see here: </p><p><code>➜  ~ monocle as2rel 8048 21980
Explanation:
- connected: % of 1813 peers that see this AS relationship
- peer: % where the relationship is peer-to-peer
- as1_upstream: % where ASN1 is the upstream (provider)
- as2_upstream: % where ASN2 is the upstream (provider)</code></p><p><code>Data source: https://data.bgpkit.com/as2rel/as2rel-latest.json.bz2</code></p><p><code>╭──────┬───────┬───────────┬──────┬──────────────┬──────────────╮
│ asn1 │ asn2  │ connected │ peer │ as1_upstream │ as2_upstream │
├──────┼───────┼───────────┼──────┼──────────────┼──────────────┤
│ 8048 │ 21980 │    9.9%   │ 0.6% │     9.4%     │ 0.0%         │
╰──────┴───────┴───────────┴──────┴──────────────┴──────────────╯</code></p><p>While only 9.9% of route collectors see these two ASes as adjacent, almost all of the paths containing them reflect AS8048 as an upstream provider for AS21980, meaning confidence is high in the provider-customer relationship between the two.</p><p>Many of the leaked routes were also heavily prepended with AS8048, meaning they would have been <a href="https://blog.cloudflare.com/prepends-considered-harmful/"><u>potentially</u></a> <i>less</i> attractive for routing when received by other networks. <b>Prepending</b> is the practice of repeating an AS number multiple times in an outbound advertisement to a provider or peer, typically to make a path less attractive and steer traffic away from a particular circuit toward another. For example, many of the paths during the leak by AS8048 looked like this: “52320,8048,8048,8048,8048,8048,8048,8048,8048,8048,23520,1299,269832,21980”. </p><p>You can see that AS8048 inserted its own AS number multiple times in the advertisement to AS52320; because of BGP loop prevention, a path would never actually travel in and out of AS8048 multiple times in a row, so these repeats can only be prepends. A non-prepended path would look like this: “52320,8048,23520,1299,269832,21980”. </p><p>If AS8048 was intentionally trying to become a <a href="https://en.wikipedia.org/wiki/Man-in-the-middle_attack"><u>man-in-the-middle (MITM)</u></a> for traffic, why would they make the BGP advertisement less attractive instead of <i>more </i>attractive? Also, why leak prefixes to try and MITM traffic when you’re <i>already</i> a provider for the downstream AS anyway? That wouldn’t make much sense. </p><p>The leaks from AS8048 also surfaced in multiple separate announcements, each around an hour apart on January 2, 2026 between 15:30 and 17:45 UTC, suggesting they may have been experiencing network issues that manifested as a routing policy error or a convergence-related mishap. </p>
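<p>Collapsing the prepends to recover the underlying path is easy to do programmatically. A quick sketch, using the leaked path quoted above:</p>

```python
# Sketch: collapse consecutive duplicate ASNs to recover the unprepended
# path and count how many times each AS was prepended. The path is the
# leaked example quoted above.
from itertools import groupby

def collapse_prepends(path):
    """Return (deduplicated path, {asn: run length} for runs longer than 1)."""
    runs = [(asn, len(list(group))) for asn, group in groupby(path)]
    deduped = [asn for asn, _ in runs]
    prepends = {asn: count for asn, count in runs if count > 1}
    return deduped, prepends

leaked = [52320, 8048, 8048, 8048, 8048, 8048, 8048, 8048, 8048, 8048,
          23520, 1299, 269832, 21980]
path, prepends = collapse_prepends(leaked)
assert path == [52320, 8048, 23520, 1299, 269832, 21980]
assert prepends == {8048: 9}  # AS8048 appears nine times in a row
```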
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4LrT6YC3j7V9p6ab0OYEmA/cebeba76857f7976d8dd4912371c1c43/BLOG-3107_7.png" />
          </figure><p>It is also noteworthy that these leak events begin over twelve hours prior to the <a href="https://www.nytimes.com/2026/01/03/world/americas/venezuela-maduro-capture-trump.html"><u>U.S. military strikes in Venezuela</u></a>. Leaks that impact South American networks <a href="https://radar.cloudflare.com/routing/br#routing-anomalies"><u>are common</u></a>, and we have no reason to believe, based on timing or the other factors I have discussed, that the leak is related to the capture of Maduro several hours later.</p><p>In fact, looking back the past two months, we can see plenty of leaks by AS8048 that are just like this one, meaning this is not a new BGP anomaly:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Md8BdHtP1GIDkyNT8PTbD/3915068dc6f47046665410b665e29853/BLOG-3107_8.png" />
          </figure><p>You can see above in the history of Cloudflare Radar’s route leak alerting pipeline that AS8048 is no stranger to Type 1 hairpin route leaks. Since the beginning of December alone there have been <b>eleven route leak events</b> where AS8048 is the leaker.</p><p>From this we can draw a more innocent possible explanation for the route leak: AS8048 may have configured overly loose export policies facing at least one of its providers, AS52320, and because of that redistributed routes belonging to its customer even when the direct customer BGP routes were missing. If their export policy toward AS52320 only matched on an <a href="https://blog.cloudflare.com/monitoring-as-sets-and-why-they-matter/"><u>IRR-generated</u></a> prefix list and not a <i>customer</i> BGP <a href="https://datatracker.ietf.org/doc/html/rfc1997"><u>community</u></a> tag, for example, it would explain why an indirect path through AS6762 was leaked back upstream by AS8048. </p><p>These types of policy errors are something <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234</u></a> and the Only-to-Customer (OTC) attribute would help with considerably, by coupling BGP more tightly to customer-provider and peer-peer roles, when supported <a href="https://blog.apnic.net/2025/09/05/preventing-route-leaks-made-simple-bgp-roleplay-with-junos-rfc-9234/"><u>by all routing vendors</u></a>. I will save the more technical details on RFC9234 for a follow-up blog post.</p>
    <div>
      <h3>The difference between origin and path validation</h3>
      <a href="#the-difference-between-origin-and-path-validation">
        
      </a>
    </div>
    <p>The newsletter also calls out as “notable” that Sparkle (AS6762) does not implement <a href="https://rpki.cloudflare.com/"><u>RPKI (Resource Public Key Infrastructure)</u></a> Route Origin Validation (ROV). While it is true that AS6762 appears to have an <a href="https://stats.labs.apnic.net/rpki/AS6762?a=6762&amp;c=IT&amp;ll=1&amp;ss=0&amp;mm=1&amp;vv=1&amp;w=7&amp;v=0"><u>incomplete deployment</u></a> of ROV and is flagged as “unsafe” on <a href="http://isbgpsafeyet.com"><u>isbgpsafeyet.com</u></a> <a href="https://isbgpsafeyet.com/#faq"><u>because of it</u></a>, origin validation would not have prevented this BGP anomaly in Venezuela. </p><p>It is important to separate BGP anomalies into two categories: route misoriginations, and path-based anomalies. Knowing the difference between the two helps to understand the solution for each. Route misoriginations, often called BGP hijacks, are meant to be fixed by RPKI Route Origin Validation (ROV) by making sure the originator of a prefix is who rightfully owns it. In the case of the BGP anomaly described in this post, the origin AS was correct as AS21980 and <b>only</b> the path was anomalous. This means ROV wouldn’t help here.</p><p>Knowing that, we need path-based validation. This is what <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>Autonomous System Provider Authorization (ASPA)</u></a>, an upcoming draft standard in the IETF, is going to provide. The idea is similar to RPKI Route Origin Authorizations (ROAs) and ROV: create an ASPA object that defines a list of authorized providers (upstreams) for our AS, and everyone will use this to invalidate route leaks on the Internet at various vantage points. 
Using a concrete example, AS6762 is a <a href="https://en.wikipedia.org/wiki/Tier_1_network"><u>Tier-1</u></a> transit-free network, and they would use the special reserved “AS0” member in their ASPA signed object to communicate to the world that they have no upstream providers, only lateral peers and customers. Then, AS52320, the other provider of AS8048, would see routes from their customer with “6762” in the path and reject them by performing an ASPA verification process.</p><p>ASPA is based on RPKI and is exactly what would help prevent route leaks similar to the one we observed in Venezuela.</p>
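<p>In greatly simplified terms, an ASPA check walks the path and asks, for each hop in the upstream direction: does this AS authorize the next AS as a provider? The sketch below is illustrative only; the real verification algorithm in the draft distinguishes unknown from invalid outcomes and handles both directions of the path, and the ASPA records shown are stand-ins.</p>

```python
# Hypothetical ASPA records: each AS lists its authorized providers.
# 0 stands in for the reserved AS0, meaning "I have no providers at all".
aspa = {
    6762: {0},       # Tier-1: attests that nobody is its provider
    21980: {8048},   # origin authorizes AS8048 as an upstream
}

def hop_is_plausible(customer: int, candidate_provider: int) -> bool:
    """Can candidate_provider legitimately sit above customer in a path?"""
    providers = aspa.get(customer)
    if providers is None:
        return True  # no attestation published: cannot judge this hop
    return candidate_provider in providers

# AS6762 appearing below another AS in the upstream direction is invalid:
# its AS0-style record says it has no providers, so a leaked path in which
# AS8048 re-exported AS6762's routes could be rejected.
assert hop_is_plausible(6762, 8048) is False
assert hop_is_plausible(21980, 8048) is True
```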
    <div>
      <h3>A safer BGP, built together </h3>
      <a href="#a-safer-bgp-built-together">
        
      </a>
    </div>
    <p>We felt it was important to offer an alternative explanation for the BGP route leak by AS8048 in Venezuela that was observed on Cloudflare Radar. It is helpful to understand that route leaks are an expected side effect of BGP historically being based entirely on trust and carefully executed, business-relationship-driven intent. </p><p>While route leaks can be caused with malicious intent, the data suggests this event may have been an accident caused by a lack of routing export and import policies that would prevent it. This is why, to have a safer BGP and Internet, we need to work together to drive adoption of RPKI-based ASPA, for which <a href="https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/aspa/"><u>RIPE recently released object creation</u></a>, across the wider Internet. It will be a collaborative effort, just like RPKI has been for origin validation, but it will be worth it, preventing BGP incidents such as the one in Venezuela. </p><p>In addition to ASPA, we can all implement simpler mechanisms such as <a href="https://github.com/job/peerlock"><u>Peerlock</u></a> and <a href="https://archive.nanog.org/sites/default/files/Snijders_Everyday_Practical_Bgp.pdf"><u>Peerlock-lite</u></a> as operators, which sanity-check received paths for obvious leaks. One especially promising initiative is the adoption of <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234</u></a>, which should be used alongside ASPA to prevent route leaks by establishing BGP roles and a new Only-To-Customer (OTC) attribute. If you haven’t already asked your routing vendors for an implementation of RFC9234 to be on their roadmap: <i>please</i> <i>do</i>. You can help make a big difference.</p><p><i>Update: Sparkle (AS6762) finished RPKI ROV deployment and </i><a href="https://github.com/cloudflare/isbgpsafeyet.com/pull/829"><i><u>was marked safe</u></i></a><i> on February 3, 2026.</i></p>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Routing]]></category>
            <category><![CDATA[Routing Security]]></category>
            <category><![CDATA[Network Services]]></category>
            <guid isPermaLink="false">4WOdNrtTGvlrQDV7Apw8R1</guid>
            <dc:creator>Bryton Herdes</dc:creator>
        </item>
        <item>
            <title><![CDATA[BGP zombies and excessive path hunting]]></title>
            <link>https://blog.cloudflare.com/going-bgp-zombie-hunting/</link>
            <pubDate>Fri, 31 Oct 2025 15:30:00 GMT</pubDate>
            <description><![CDATA[ A BGP “zombie” is essentially a route that has become stuck in the Default-Free Zone (DFZ) of the Internet, potentially due to a missed or lost prefix withdrawal. We’ll walk through some situations where BGP zombies are more likely to rise from the dead and wreak havoc.
 ]]></description>
            <content:encoded><![CDATA[ <p>Here at Cloudflare, we’ve been celebrating Halloween with some zombie hunting of our own. The zombies we’d like to remove are those that disrupt the core framework responsible for how the Internet routes traffic: <a href="http://cloudflare.com/learning/security/glossary/what-is-bgp/"><u>BGP (Border Gateway Protocol)</u></a>.</p><p>A <a href="https://dl.acm.org/doi/10.1145/3472305.3472315"><u>BGP zombie</u></a> is a silly name for a route that has become stuck in the Internet’s <a href="https://en.wikipedia.org/wiki/Default-free_zone"><u>Default-Free Zone</u></a>, aka the DFZ (the collection of all Internet routers that do not require a default route), potentially due to a missed or lost prefix withdrawal.</p><p>The underlying root cause of a zombie could be one of several things, ranging from buggy router software to general route-processing slowness. It’s when a BGP prefix is meant to be gone from the Internet, but for one reason or another it becomes a member of the undead and hangs around for some period of time.</p><p>The longer these zombies linger, the more they create operational impact and become a real headache for network operators. Zombies can lead packets astray, either by trapping them inside of route loops or by causing them to take an excessively scenic route. Today, we’d like to celebrate Halloween by covering how BGP zombies form and how we can lessen the likelihood that they wreak havoc on Internet traffic.</p>
    <div>
      <h2>Path hunting</h2>
      <a href="#path-hunting">
        
      </a>
    </div>
    <p>To understand the slowness that can often lead to BGP zombies, we need to talk about path hunting. <a href="https://www.noction.com/blog/bgp-path-hunting"><u>Path hunting</u></a> occurs when routers running BGP exhaustively search for the best path to a prefix as determined by <a href="https://en.wikipedia.org/wiki/Longest_prefix_match"><u>Longest Prefix Matching</u></a> (LPM) and BGP routing attributes like path length and local preference. This becomes relevant in our observations of exactly how routes become stuck, for how long they become stuck, and how visible they are on the Internet.</p><p>For example, path hunting happens when a more-specific BGP prefix is withdrawn from the global routing table, and networks need to fall back to a less-specific BGP advertisement. In this example, we use 2001:db8::/48 for the more-specific BGP announcement and 2001:db8::/32 for the less-specific prefix. When the /48 is withdrawn by the originating <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>Autonomous System</u></a> (AS), BGP routers have to recognize that route as missing and begin routing traffic to IPs such as 2001:db8::1 via the 2001:db8::/32 route, which still remains while the prefix 2001:db8::/48 is gone. </p><p>Let’s see what this could look like in action with a few diagrams. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7xRNAHChJUyiMbtBZyLOlF/973d10be053b7b7f088721389c34c10e/BLOG-3059_2.png" />
          </figure><p><sub><i>Diagram 1: Active 2001:db8::/48 route</i></sub></p><p>In this initial state, 2001:db8::/48 is used actively for traffic forwarding, which all flows through AS13335 on the way to AS64511. In this case, AS64511 would be a BYOIP customer of Cloudflare. AS64511 also announces a <i>backup</i> route to another Internet Service Provider (ISP), AS64510, but this route is not active even in AS64510’s routing table for forwarding to 2001:db8::1 because 2001:db8::/48 is a longer prefix match when compared to 2001:db8::/32.</p><p>Things get more interesting when AS64511 signals for 2001:db8::/48 to be withdrawn by Cloudflare (AS13335), perhaps because a DDoS attack is over and the customer opts to use Cloudflare only when they are actively under attack.</p><p>When the customer signals to Cloudflare (via BGP Control or API call) to withdraw the 2001:db8::/48 announcement, all BGP routers have to <a href="https://en.wikipedia.org/wiki/Convergence_(routing)"><u>converge</u></a> upon this update, which involves path hunting. AS13335 sends a BGP withdrawal message for 2001:db8::/48 to its directly-connected BGP neighbors. While news of the withdrawal may travel quickly from AS13335 to the other networks, it may reach some neighbors sooner than others. This means that until everyone has received and processed the withdrawal, networks may try routing through one another to reach the 2001:db8::/48 prefix – even after AS13335 has withdrawn it. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7h3Vba4T7tm6XPB2pIyQex/f5f7c27148bed4dd72959b3820d045ac/BLOG-3059_3.png" />
          </figure><p><sub><i>Diagram 2: 2001:db8::/48 route withdrawn via AS13335</i></sub></p><p>Imagine AS64501 is a little slower than the rest – perhaps due to using older hardware, hardware being overloaded, a software bug, specific configuration settings, poor luck, or some other factor – and still has not processed the withdrawal of the /48. This in itself could be a BGP zombie, since the route is stuck for a small period. Our pings toward 2001:db8::1 are never able to actually reach AS64511, because AS13335 knows the /48 is meant to be withdrawn, but some routers carrying a full table have not yet converged upon that result.</p><p>The length of time spent path hunting is amplified by something called the Minimum Route Advertisement Interval (MRAI). The MRAI specifies the minimum amount of time between BGP advertisement messages from a BGP router, meaning it deliberately introduces several seconds of delay between successive BGP advertisement updates. <a href="https://datatracker.ietf.org/doc/html/rfc4271"><u>RFC4271</u></a> recommends an MRAI value of 30 seconds for eBGP updates, and while this can cut down on the chattiness of BGP, or even potential oscillation of updates, it also makes path hunting take longer. </p><p>At the next cycle of path hunting, even AS64501, which was previously still pointing toward a nonexistent /48 route from AS13335, should find the /32 advertisement is all that is left toward 2001:db8::1. Once it has done so, the traffic flow will become the following: </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5sCGMS95R8y32WTjnUigfN/1e5c9a7551c572a08596985edac5c17b/BLOG-3059_4.png" />
          </figure><p><sub><i>Diagram 3: Routing fallback to 2001:db8::/32 and 2001:db8::/48 is gone from DFZ</i></sub></p><p>This would mean BGP path hunting is over, and the Internet has realized that 2001:db8::/32 is the best route available toward 2001:db8::1, and that 2001:db8::/48 is really gone. While in this example we’ve purposely made path hunting only last two cycles, in reality it can last many more, especially with how highly connected AS13335 is to thousands of peer networks and <a href="https://en.wikipedia.org/wiki/Tier_1_network"><u>Tier-1</u></a> networks globally. </p><p>Now that we’ve discussed BGP path hunting and how it works, you can probably already see how a BGP zombie outbreak can begin and how routing tables can become stuck for a lengthy period of time. Excessive BGP path hunting for a previously-known more-specific prefix can be an early indicator that a zombie could follow.</p>
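<p>The longest-prefix-match fallback described above is easy to demonstrate. A small sketch using Python's standard <code>ipaddress</code> module as a stand-in for a routing table:</p>

```python
# Sketch of longest-prefix-match (LPM) fallback: when the /48 is
# withdrawn, lookups for the same address fall back to the covering /32.
import ipaddress

def best_route(table, destination):
    """Return the most-specific route in `table` covering `destination`,
    or None if nothing matches."""
    dst = ipaddress.ip_address(destination)
    matches = [n for n in map(ipaddress.ip_network, table) if dst in n]
    return str(max(matches, key=lambda n: n.prefixlen)) if matches else None

table = ["2001:db8::/32", "2001:db8::/48"]
assert best_route(table, "2001:db8::1") == "2001:db8::/48"  # LPM: /48 wins

table.remove("2001:db8::/48")  # the withdrawal finally converges
assert best_route(table, "2001:db8::1") == "2001:db8::/32"  # fallback to /32
assert best_route(table, "2001:db9::1") is None  # outside both prefixes
```

<p>A zombie is what happens when one router's copy of this table never has the /48 removed, long after everyone else has converged.</p>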
    <div>
      <h2>Spawning a zombie</h2>
      <a href="#spawning-a-zombie">
        
      </a>
    </div>
    <p>Zombies have captured our attention more recently as they were noticed by some of our customers leveraging <a href="https://developers.cloudflare.com/byoip/"><u>Bring-Your-Own-IP (BYOIP)</u></a> on-demand advertisement for <a href="https://www.cloudflare.com/en-gb/network-services/products/magic-transit/"><u>Magic Transit</u></a>. BYOIP may be configured in two modes: "always-on", in which a prefix is continuously announced, or "on-demand", where a prefix is announced only when a customer chooses to. For some on-demand customers, announcement and withdrawal cycles <i>may</i> be a more frequent occurrence, which can lead to an increase in BGP zombies.</p><p>With that in mind and also knowing how path hunting works, let’s spawn our own zombie onto the Internet. To do so, we’ll take a spare block of IPv4 and IPv6 and announce them like so:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/20shWBMhqLR3tBMh50v7Uy/bf40e90c2f6a506a5bcfc9bafd1e31d2/BLOG-3059_5.png" />
          </figure><p>Once the routes are announced and stable, we’ll then proceed to withdraw the more specific routes advertised via Cloudflare globally. With a few quick clicks, we’ve successfully re-animated the dead.</p><p><i>Variant A: Ghoulish Gateways</i></p><p>One place zombies commonly occur is between upstream ISPs. When one router in a given ISP’s network is a little slower to update, routes can become stuck. </p><p>Take, for example, the following loop we observed between two of our upstream partners:</p>
            <pre><code>7. be2431.ccr31.sjc04.atlas.cogentco.com
8. tisparkle.sjc04.atlas.cogentco.com
9. 213.144.177.184
10. 213.144.177.184
11. 89.221.32.227
12. (waiting for reply)
13. be2749.rcr71.goa01.atlas.cogentco.com
14. be3219.ccr31.mrs02.atlas.cogentco.com
15. be2066.agr21.mrs02.atlas.cogentco.com
16. telecomitalia.mrs02.atlas.cogentco.com
17. 213.144.177.186
18. 89.221.32.227</code></pre>
            <p></p><p>Or this loop - observed on the same withdrawal test - between two different providers:  </p>
            <pre><code>15. if-bundle-12-2.qcore2.pvu-paris.as6453.net
16. if-bundle-56-2.qcore1.fr0-frankfurt.as6453.net
17. if-bundle-15-2.qhar1.fr0-frankfurt.as6453.net
18. 195.219.223.11
19. 213.144.177.186
20. 195.22.196.137
21. 213.144.177.186
22. 195.22.196.137</code></pre>
            <p></p><p><i>Variant B: Undead LAN (Local Area Network)</i></p><p>Meanwhile, zombies can occur entirely within a given network. When a route is withdrawn from Cloudflare’s network, each device in our network must individually begin the process of withdrawing the route. While this is generally a smooth process, things can still become stuck.</p><p>Take, for instance, a situation where one router inside our network has not yet fully processed the withdrawal. Connectivity partners will continue routing traffic towards that router (as they have not yet received the withdrawal) while no host remains behind the router that is capable of actually processing the traffic. The result is an internal-only looping path:</p>
            <pre><code>10. 192.0.2.112
11. 192.0.2.113
12. 192.0.2.112
13. 192.0.2.113
14. 192.0.2.112
15. 192.0.2.113
16. 192.0.2.112
17. 192.0.2.113
18. 192.0.2.112
19. 192.0.2.113
</code></pre>
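<p>Loops like the ones above are straightforward to detect programmatically once you have the hop list. A small illustrative sketch (the helper is our own invention, reusing the documentation addresses from the trace above):</p>

```python
def find_loop(hops, min_repeats=2):
    """Return the repeating cycle at the end of a hop list, else None."""
    for cycle_len in range(1, len(hops) // 2 + 1):
        tail = hops[-cycle_len:]
        repeats, i = 1, len(hops) - 2 * cycle_len
        # Count how many times the trailing cycle repeats back-to-back.
        while i >= 0 and hops[i:i + cycle_len] == tail:
            repeats += 1
            i -= cycle_len
        if repeats >= min_repeats and len(set(tail)) == cycle_len:
            return tail
    return None

# Documentation addresses standing in for the looping hops shown above.
hops = ["192.0.2.112", "192.0.2.113"] * 5
print(find_loop(hops))   # ['192.0.2.112', '192.0.2.113']
```
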
            <p></p><p>Unlike most fictionally-depicted hordes of the walking dead, our highly-visible zombie has a limited lifetime in most major networks – in this instance, only around 6 minutes, after which most had re-converged around the less-specific as the best path. Sadly, this is on the shorter side – in some cases, we have seen long-lived zombies cause reachability issues for more than 10 minutes. It’s safe to say this is longer than most network operators would expect BGP convergence to take in a normal situation. </p><p>But, you may ask – is this the excessive path hunting we talked about earlier, or a BGP zombie? Really, it depends on the expectation and tolerance around <a href="https://dl.acm.org/doi/10.1145/3472305.3472315"><u>how long BGP convergence</u></a> should take to process the prefix withdrawal. In any case, even over 30 minutes after our withdrawal of our more-specific prefix, we are able to easily see zombie routes in the route-views public collectors:</p>
            <pre><code>~ % monocle search --start-ts 2025-10-28T12:40:13Z --end-ts 2025-10-28T13:00:13Z --prefix 198.18.0.0/24
A|1761656125.550447|206.82.105.116|54309|198.18.0.0/24|54309 13335 395747|IGP|206.82.104.31|0|0|54309:111|false|||route-views.ny

</code></pre>
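<p>Each monocle result line is pipe-delimited and easy to pull apart for automation. A hedged sketch of a parser, with the field layout inferred from this single example line rather than from a documented schema:</p>

```python
# Field positions inferred from the monocle record above; this is an
# assumption for illustration, not a documented schema.
def parse_elem(line):
    fields = line.strip().split("|")
    return {
        "type": fields[0],              # A = announcement, W = withdrawal
        "timestamp": float(fields[1]),
        "peer_ip": fields[2],
        "peer_asn": fields[3],
        "prefix": fields[4],
        "as_path": fields[5].split(),   # origin AS is the last hop
        "collector": fields[-1],
    }

line = ("A|1761656125.550447|206.82.105.116|54309|198.18.0.0/24|"
        "54309 13335 395747|IGP|206.82.104.31|0|0|54309:111|false|||"
        "route-views.ny")
elem = parse_elem(line)
print(elem["prefix"], "origin AS" + elem["as_path"][-1])
# 198.18.0.0/24 origin AS395747
```
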
            <p></p><p>You might argue that six to eleven minutes (or more) is a reasonable time for worst-case BGP convergence in the Tier-1 network layer, though that itself seems like a stretch. Even setting that aside, our data shows that very real BGP zombies exist in the global routing table, and they will negatively impact traffic. Curiously, we observed the path hunting delay is worse on IPv4, with the longest observed IPv6 impact in major (Tier-1) networks being just over 4 minutes. One could speculate this is in part due to the <a href="https://bgp.potaroo.net/index-bgp.html"><u>much higher number</u></a> of IPv4 prefixes in the Internet global routing table than the IPv6 global table, and how BGP speakers handle them separately.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4WOQBZb7MV2a5PA84xm6Be/28e252e5212781ae2d477150692605db/25x_10fps_a.gif" />
          </figure><p><sub><i>Source: RIPEstat’s BGPlay</i></sub></p><p>Part of the delay appears to originate from how interconnected AS13335 is; being heavily peered with a large portion of the Internet increases the likelihood of a route becoming stuck in a given location. Given that, perhaps a zombie would be shorter-lived if we operated in the opposite direction: announcing a less-specific persistently to 13335 and announcing more specifics via our local ISP during normal operation. Since the withdrawal will come from what is likely a less well-peered network, the time-to-convergence may be shorter:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4O7r3Nffpbus6ht3Eo6eiF/a7d577042e43c9b4da988cf9bd29f6fe/BLOG-3059_7.png" />
          </figure><p>Indeed, as predicted, we still get a stuck route, and it only lives for around 20 seconds in the Tier-1 network layer:</p>
            <pre><code>19. be12488.ccr42.ams03.atlas.cogentco.com
20. 38.88.214.142
21. be2020.ccr41.ams03.atlas.cogentco.com
22. 38.88.214.142
23. (waiting for reply)
24. 38.88.214.142
25. (waiting for reply)
26. 38.88.214.142
</code></pre>
            <p></p><p>Unfortunately, that 20 seconds is still an impactful 20 seconds - while better, it’s not where we want to be. The exact length of time will depend on the native ISP networks one is connected with, and it could certainly stretch into minutes of stuck routing. </p><p>In both cases, the initial time-to-announce yielded no loss, nor was a zombie created, as both paths remained valid for the entirety of their initial lifetime. Zombies were only created when a more-specific prefix was fully withdrawn. A newly-announced route is not subject to path hunting in the same way a withdrawn more-specific route is. As they say, good (new) news travels fast.</p>
    <div>
      <h2>Lessening the zombie outbreak</h2>
      <a href="#lessening-the-zombie-outbreak">
        
      </a>
    </div>
    <p>Our findings lead us to believe that the withdrawal of a more-specific prefix may lead to zombies running rampant for longer periods of time. Because of this, we are exploring some improvements that make the consequences of BGP zombie routing less impactful for our customers relying on our on-demand BGP functionality.</p><p>For the traffic that <b>does</b> reach Cloudflare with stuck routes, we will introduce some BGP traffic forwarding improvements internally that allow for a more graceful withdrawal of traffic, even if routes are erroneously pointing toward us. In many ways, this will closely resemble the BGP <a href="https://www.rfc-editor.org/rfc/rfc1997.html"><u>well-known no-export</u></a> community’s functionality from our servers running BGP. This means even if we receive traffic from external parties due to stuck routing, we will still have the opportunity to deliver traffic to our far-end customers over a tunneled connection or via a <a href="https://www.cloudflare.com/network-services/products/network-interconnect/"><u>Cloudflare Network Interconnect</u></a> (CNI). We look forward to reporting back the positive impact after making this improvement for a more graceful draining of traffic by default. </p><p>For the traffic that <b>does not</b> reach Cloudflare’s edge, and instead loops between network providers, we need to use a different approach. Since we know more-specific to less-specific prefix routing fallback is more prone to BGP zombie outbreak, we are encouraging customers to instead use a multi-step draining process when they want traffic drained from the Cloudflare edge for an on-demand prefix without introducing route loops or blackhole events. The draining process when removing traffic for a BYOIP prefix from Cloudflare should look like this: </p><ol><li><p>The customer is already announcing an example prefix from Cloudflare, ex. 198.18.0.0/24</p></li><li><p>The customer begins <i>natively </i>announcing the prefix 198.18.0.0/24 (i.e. 
the same length as the prefix they are advertising via Cloudflare) from their network to the Internet Service Providers that they wish to fail traffic over to.</p></li><li><p>After a few minutes, the customer signals BGP withdrawal from Cloudflare for the 198.18.0.0/24 prefix.</p></li></ol><p>The result is a clean cutover: impactful zombies are avoided because the same-length prefix (198.18.0.0/24) remains in the global routing table. Excessive path hunting is avoided because instead of routers needing to aggressively seek out a missing more-specific prefix match, they can fall back to the same-length announcement that persists in the routing table from the natively-originated path to the customer’s network.</p>
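<p>The steps above can be modeled as simple routing-table operations. In this toy sketch (the next-hop labels are invented), a covering route of equal specificity exists at every step, so there is never a missing more-specific to hunt for:</p>

```python
import ipaddress

def resolve(table, destination):
    """Longest-prefix match over (prefix, next_hop) entries."""
    dst = ipaddress.ip_address(destination)
    matches = [e for e in table if dst in e[0]]
    return max(matches, key=lambda e: e[0].prefixlen, default=(None, None))[1]

pfx = ipaddress.ip_network("198.18.0.0/24")
table = [(pfx, "via-cloudflare")]        # step 1: announced via Cloudflare
table.append((pfx, "via-native-isp"))    # step 2: same-length native path
table.remove((pfx, "via-cloudflare"))    # step 3: withdraw from Cloudflare

print(resolve(table, "198.18.0.1"))      # via-native-isp: no gap to hunt for
```
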
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/KmRvPsGUOp5PsNXcKCE1F/78f2c29c8c278d158972114df875ad0c/25x_10fps_b.gif" />
          </figure><p><sub><i>Source: RIPEstat’s BGPlay</i></sub></p>
    <div>
      <h2>What next?</h2>
      <a href="#what-next">
        
      </a>
    </div>
    <p>We are going to continue to refine our methods of measuring BGP zombies, so you can look forward to more insights in the future. There is also <a href="https://www.thousandeyes.com/bgp-stuck-route-observatory/"><u>work from others</u></a> in the <a href="https://blog.benjojo.co.uk/post/bgp-stuck-routes-tcp-zero-window"><u>community</u></a> around zombie measurement that is interesting and producing useful data. In terms of combatting the software bugs around BGP zombie creation, routing vendors should implement <a href="https://datatracker.ietf.org/doc/html/rfc9687"><u>RFC9687</u></a>, the BGP SendHoldTimer. The general idea is that a local router can detect via the SendHoldTimer if the far-end router stops processing BGP messages unexpectedly, which lowers the possibility of zombies becoming stuck for long periods of time. </p><p>In addition, it’s worth keeping in mind our observations made in this post about more-specific prefix announcements and excessive path hunting. If as a network operator you rely on more-specific BGP prefix announcements for failover, or for traffic engineering, you need to be aware that routes could become stuck for a longer period of time before full BGP convergence occurs.</p><p>If you’re interested in problems like BGP zombies, consider <a href="https://www.cloudflare.com/en-gb/careers/jobs/?location=default"><u>coming to work</u></a> at Cloudflare or applying for an <a href="https://www.cloudflare.com/en-gb/careers/early-talent/"><u>internship</u></a>. Together we can help build a better Internet!  </p> ]]></content:encoded>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Routing]]></category>
            <category><![CDATA[Network]]></category>
            <category><![CDATA[BYOIP]]></category>
            <guid isPermaLink="false">6Qk6krBb9GkFrf67N6NhyW</guid>
            <dc:creator>Bryton Herdes</dc:creator>
            <dc:creator>June Slater</dc:creator>
            <dc:creator>Mingwei Zhang</dc:creator>
        </item>
        <item>
            <title><![CDATA[Monitoring AS-SETs and why they matter]]></title>
            <link>https://blog.cloudflare.com/monitoring-as-sets-and-why-they-matter/</link>
            <pubDate>Fri, 26 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ We will cover some of the reasons why operators need to monitor the AS-SET memberships for their ASN, and now Cloudflare Radar can help.  ]]></description>
            <content:encoded><![CDATA[ 
    <div>
      <h2>Introduction to AS-SETs</h2>
      <a href="#introduction-to-as-sets">
        
      </a>
    </div>
    <p>An <a href="https://www.apnic.net/manage-ip/using-whois/guide/as-set/"><u>AS-SET</u></a>, not to be confused with the <a href="https://datatracker.ietf.org/doc/rfc9774/"><u>recently deprecated BGP AS_SET</u></a>, is an <a href="https://irr.net/overview/"><u>Internet Routing Registry (IRR)</u></a> object that allows network operators to group related networks together. AS-SETs have historically been used for multiple purposes, such as grouping together a list of downstream customers of a particular network provider. For example, Cloudflare uses the <a href="https://irrexplorer.nlnog.net/as-set/AS13335:AS-CLOUDFLARE"><u>AS13335:AS-CLOUDFLARE</u></a> AS-SET to group together our own <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>Autonomous System Numbers</u></a> (ASNs) and our downstream Bring-Your-Own-IP (BYOIP) customer networks, so we can ultimately <a href="https://www.peeringdb.com/net/4224"><u>communicate</u></a> to other networks whose prefixes they should accept from us. </p><p>In other words, an AS-SET is <i>currently</i> the way on the Internet for someone to attest the networks for which they are the provider. This system of provider authorization is completely trust-based, meaning it's <a href="https://www.kentik.com/blog/the-scourge-of-excessive-as-sets/"><u>not reliable at all</u></a>, and is best-effort. The future of an RPKI-based provider authorization system is <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>coming in the form of ASPA (Autonomous System Provider Authorization)</u></a>, but it will take time for standardization and adoption. Until then, we are left with AS-SETs.</p><p>Because AS-SETs are so critical for BGP routing on the Internet, network operators need to be able to monitor valid and invalid AS-SET <i>memberships</i> for their networks. To help, Cloudflare Radar now provides a transparent, public listing of AS-SET memberships on each ASN's <a href="https://radar.cloudflare.com/routing/as13335"><u>routing page</u></a>.</p>
    <div>
      <h2>AS-SETs and building BGP route filters</h2>
      <a href="#as-sets-and-building-bgp-route-filters">
        
      </a>
    </div>
    <p>AS-SETs are a critical component of BGP policies, often paired with the expressive <a href="https://irr.net/rpsl-guide/"><u>Routing Policy Specification Language (RPSL)</u></a> that describes how a particular BGP ASN accepts and propagates routes to other networks. Most often, networks use AS-SETs to express what other networks should accept from them, in terms of downstream customers. </p><p>Returning to the AS13335:AS-CLOUDFLARE example, this AS-SET is published clearly on <a href="https://www.peeringdb.com/net/4224"><u>PeeringDB</u></a> for other peering networks to reference and build filters against. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2590TMppv2h4SAi7uy6xS9/617ec81e2364f470c0efe243a528f695/image6.png" />
          </figure><p>When turning up a new transit provider service, we also ask the provider networks to build their route filters using the same AS-SET. Because BGP prefixes are also created in IRR <a href="https://irr.net/registry/"><u>registries</u></a> using the <i>route</i> or <i>route6 </i><a href="https://developers.cloudflare.com/byoip/concepts/irr-entries/best-practices/"><u>objects</u></a>, peers and providers now know what BGP prefixes they should accept from us and deny the rest. A popular tool for building prefix-lists based on AS-SETs and IRR databases is <a href="https://github.com/bgp/bgpq4"><u>bgpq4</u></a>, and it’s one you can easily try out yourself. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7F2QdhcZTLEJjKNtZbBWxR/92efe32dcef67aa6d51c3b1a29218843/image3.png" />
          </figure><p>For example, to generate a Juniper router’s IPv4 prefix-list containing prefixes that AS13335 could propagate for Cloudflare and its customers, you may use: </p>
            <pre><code>% bgpq4 -4Jl CLOUDFLARE-PREFIXES -m24 AS13335:AS-CLOUDFLARE | head -n 10
policy-options {
replace:
 prefix-list CLOUDFLARE-PREFIXES {
    1.0.0.0/24;
    1.0.4.0/22;
    1.1.1.0/24;
    1.1.2.0/24;
    1.178.32.0/19;
    1.178.32.0/20;
    1.178.48.0/20;</code></pre>
            <p><sup><i>Restricted to 10 lines, actual output of prefix-list would be much greater</i></sup></p><p>This prefix list would be applied within an eBGP import policy by our providers and peers to make sure AS13335 is only able to propagate announcements for ourselves and our customers.</p>
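<p>Conceptually, bgpq4 arrives at that prefix list by first expanding the AS-SET into its member ASNs (recursively, since AS-SETs can contain other AS-SETs) and then collecting their route objects. A rough sketch of just the expansion step, using a hypothetical in-memory stand-in for IRR data (real tools query the IRR databases over the network):</p>

```python
# Hypothetical stand-in for IRR data: AS-SET -> direct members. The set
# names and member ASNs here are invented documentation values.
IRR = {
    "AS-EXAMPLE": ["AS64496", "AS64500:AS-CUSTOMERS"],
    "AS64500:AS-CUSTOMERS": ["AS64500", "AS64511"],
}

def expand(as_set, seen=None):
    """Recursively resolve an AS-SET down to its member ASNs."""
    seen = set() if seen is None else seen
    if as_set in seen:                 # guard against circular memberships
        return set()
    seen.add(as_set)
    asns = set()
    for member in IRR.get(as_set, []):
        if member in IRR:              # nested AS-SET: recurse
            asns |= expand(member, seen)
        else:                          # plain ASN
            asns.add(member)
    return asns

print(sorted(expand("AS-EXAMPLE")))    # ['AS64496', 'AS64500', 'AS64511']
```

<p>A real resolver also has to tolerate circular memberships, which the <code>seen</code> set guards against here.</p>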
    <div>
      <h2>How accurate AS-SETs prevent route leaks</h2>
      <a href="#how-accurate-as-sets-prevent-route-leaks">
        
      </a>
    </div>
    <p>Let’s see how accurate AS-SETs can help prevent route leaks with a simple example. In this example, AS64502 has two providers – AS64501 and AS64503. AS64502 has accidentally messed up their BGP export policy configuration toward the AS64503 neighbor, and is exporting <b>all</b> routes, including those it receives from their AS64501 provider. This is a typical <a href="https://datatracker.ietf.org/doc/html/rfc7908#section-3.1"><u>Type 1 Hairpin route leak</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/D69Fq0jXg9MaGieS0KqZ2/42fa33a433c875591b85ce9a6db91610/image5.png" />
          </figure><p>Fortunately, AS64503 has implemented an import policy that they generated using IRR data, including AS-SETs and route objects. By doing so, they will only accept the prefixes that originate from the <a href="https://www.manrs.org/wp-content/uploads/2021/11/AS-Cones-MANRS.pdf"><u>AS Cone</u></a> of AS64502, since they are their customer. Instead of the route leak propagating and causing a major reachability or latency impact for many prefixes on the Internet, it is stopped in its tracks thanks to the responsible filtering by the AS64503 provider network. Again, it is worth keeping in mind that the success of this strategy depends on data accuracy for the fictional AS64502:AS-CUSTOMERS AS-SET.</p>
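<p>In code, the core of that import policy boils down to a set-membership check on the route's origin. A simplified sketch using the fictional ASNs from the diagram (AS64510 is an invented extra customer; real filters also match prefixes against route/route6 objects, as shown earlier):</p>

```python
# Customer cone of AS64502, as would be resolved from the fictional
# AS64502:AS-CUSTOMERS AS-SET. AS64510 is an invented downstream.
CUSTOMER_CONE = {64502, 64510}

def accept_from_customer(as_path):
    """Accept a route on the customer session only if its origin AS
    (the last hop in the AS path) sits inside the customer's cone."""
    return as_path[-1] in CUSTOMER_CONE

print(accept_from_customer([64502]))         # True: customer's own route
print(accept_from_customer([64502, 64501]))  # False: hairpin leak dropped
```
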
    <div>
      <h2>Monitoring AS-SET misuse</h2>
      <a href="#monitoring-as-set-misuse">
        
      </a>
    </div>
    <p>Besides grouping together one’s downstream customers, AS-SETs can also represent other types of relationships, such as peers, transits, or IXP participation.</p><p>For example, there are 76 AS-SETs that directly include one of the Tier-1 networks, Telecom Italia / Sparkle (AS6762). Judging from the names of the AS-SETs, most of them represent peers and transits of certain ASNs, which include AS6762. You can view this output yourself at <a href="https://radar.cloudflare.com/routing/as6762#irr-as-sets"><u>https://radar.cloudflare.com/routing/as6762#irr-as-sets</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/eeAA6iWaAVd6qd2rB93VM/ff37a27156f8229639a6ec377c7eb273/image7.png" />
          </figure><p>There is nothing wrong with defining AS-SETs that contain one’s peers or upstreams as long as those AS-SETs are not submitted upstream for customer-&gt;provider BGP session filtering. In fact, an AS-SET for upstreams or peer-to-peer relationships can be useful for defining a network’s policies in RPSL.</p><p>However, some AS-SETs in the AS6762 membership list such as AS-10099 look to attest customer relationships. </p>
            <pre><code>% whois -h rr.ntt.net AS-10099 | grep "descr"
descr:          CUHK Customer</code></pre>
            <p>We know AS6762 is transit-free, so this customer membership must be invalid – a prime example of AS-SET misuse that would ideally be cleaned up. Many Internet Service Providers and network operators are more than happy to correct an invalid AS-SET entry when asked to. Each AS-SET membership like this is reasonably viewed as a risk of wider route leak propagation to major networks and the Internet when leaks do happen.</p>
    <div>
      <h2>AS-SET information on Cloudflare Radar</h2>
      <a href="#as-set-information-on-cloudflare-radar">
        
      </a>
    </div>
    <p><a href="https://radar.cloudflare.com/"><u>Cloudflare Radar</u></a> is a hub that showcases global Internet traffic, attack, and technology trends and insights. Today, we are adding IRR AS-SET information to Radar’s routing section, freely available to the public via both website and API access. To view all AS-SETs an AS is a member of, directly or indirectly via other AS-SETs, a user can visit the corresponding AS’s routing page. For example, the AS-SETs list for Cloudflare (AS13335) is available at <a href="https://radar.cloudflare.com/routing/as13335#irr-as-sets"><u>https://radar.cloudflare.com/routing/as13335#irr-as-sets</u></a>.</p><p>The AS-SET data on IRR contains only limited information, like the AS members and AS-SET members. Here at Radar, we also enhance the AS-SET table with additional useful information as follows.</p><ul><li><p><code>Inferred ASN</code> shows the AS number that is inferred to be the creator of the AS-SET. We use the PeeringDB AS-SET information to match the creator if available. Otherwise, we parse the AS-SET name to infer the creator.</p></li><li><p><code>IRR Sources</code> shows the IRR databases in which we see the corresponding AS-SET. We are currently using the following databases: <code>AFRINIC</code>, <code>APNIC</code>, <code>ARIN</code>, <code>LACNIC</code>, <code>RIPE</code>, <code>RADB</code>, <code>ALTDB</code>, <code>NTTCOM</code>, and <code>TC</code>.</p></li><li><p><code>AS Members</code> and <code>AS-SET members</code> show the count of the corresponding types of members.</p></li><li><p><code>AS Cone</code> is the count of the unique ASNs that are included by the AS-SET directly or indirectly.</p></li><li><p><code>Upstreams</code> is the count of unique AS-SETs that include the corresponding AS-SET.</p></li></ul><p>Users can further filter the table by searching for a specific AS-SET name or ASN. A toggle to show only direct or indirect AS-SETs is also available.</p>
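<p>Two of those derived columns can be computed from the raw membership graph alone. A toy sketch with hypothetical AS-SET names, assuming an acyclic graph and counting only direct upstreams (Radar's real pipeline also resolves indirect relationships):</p>

```python
MEMBERS = {  # AS-SET -> direct members (hypothetical data)
    "AS-ALPHA": ["AS-BETA", "AS64496"],
    "AS-BETA": ["AS64497", "AS64498"],
}

def as_cone(s):
    """Unique ASNs included directly or indirectly (the 'AS Cone' column)."""
    cone = set()
    for m in MEMBERS.get(s, []):
        cone |= as_cone(m) if m in MEMBERS else {m}
    return cone

def direct_upstreams(s):
    """AS-SETs that directly include s (one level of the 'Upstreams' column)."""
    return {parent for parent, members in MEMBERS.items() if s in members}

print(len(as_cone("AS-ALPHA")))      # 3 unique ASNs in the cone
print(direct_upstreams("AS-BETA"))   # {'AS-ALPHA'}
```
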
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/0ssTf7bi6yjT2m0YKWPJE/e20b18a7d3151652fecbe606bbe13346/image1.png" />
          </figure><p>In addition to listing AS-SETs, we also provide a tree view to display how an AS-SET includes a given ASN. For example, the following screenshot shows how as-delta indirectly includes AS6762 through seven other AS-SETs. Users can copy or download this tree-view content in text format, making it easy to share with others.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2hNbh2gdj2F0eLTYrzjrVN/eceb588456067a387e7cb6eb3e1e3c5e/image4.png" />
          </figure><p>We built this Radar feature using our <a href="https://developers.cloudflare.com/api/resources/radar/subresources/entities/subresources/asns/methods/as_set/"><u>publicly available API</u></a>, the same way the rest of the Radar site is built. We have also experimented with using this API to build additional features, like a full AS-SET tree visualization. We encourage developers to give <a href="https://developers.cloudflare.com/api/resources/radar/subresources/entities/subresources/asns/methods/as_set/"><u>this API</u></a> (and <a href="https://developers.cloudflare.com/api/resources/radar/"><u>other Radar APIs</u></a>) a try, and tell us what you think!</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ElaU3M5oe8xRnblrrf67u/3fa35d3a25d797c0b0cbe96f0490fa93/image8.png" />
          </figure>
    <div>
      <h2>Looking ahead</h2>
      <a href="#looking-ahead">
        
      </a>
    </div>
    <p>We know AS-SETs are hard to keep clean of error or misuse, and even though Radar is making them easier to monitor, the mistakes and misuse will continue. Because of this, we as a community need to push forth adoption of <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234</u></a> and <a href="https://blog.apnic.net/2025/09/05/preventing-route-leaks-made-simple-bgp-roleplay-with-junos-rfc-9234/"><u>implementations</u></a> of it from the major vendors. RFC9234 embeds roles and an Only-To-Customer (OTC) attribute directly into the BGP protocol itself, helping to detect and prevent route leaks in-line. In addition to BGP misconfiguration protection with RFC9234, Autonomous System Provider Authorization (ASPA) is still making its way <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>through the IETF</u></a> and will eventually help offer an authoritative means of attesting who the actual providers are per BGP Autonomous System (AS).</p><p>If you are a network operator and manage an AS-SET, you should seriously consider moving to <a href="https://manrs.org/2022/12/why-network-operators-should-use-hierarchical-as-sets/"><u>hierarchical AS-SETs</u></a> if you have not already. A hierarchical AS-SET looks like AS13335:AS-CLOUDFLARE instead of AS-CLOUDFLARE, but the difference is very important. Only a proper maintainer of the AS13335 ASN can create AS13335:AS-CLOUDFLARE, whereas anyone could create AS-CLOUDFLARE in an IRR database if they wanted to. In other words, using hierarchical AS-SETs helps guarantee ownership and prevent the malicious poisoning of routing information.</p><p>While keeping track of AS-SET memberships seems like a chore, it can have significant payoffs in preventing BGP-related <a href="https://blog.cloudflare.com/cloudflare-1111-incident-on-june-27-2024/"><u>incidents</u></a> such as route leaks. 
We encourage all network operators to do their part in making sure that the AS-SETs they submit to their providers and peers to communicate their downstream customer cone are accurate. Every small adjustment or clean-up effort in AS-SETs could help lessen the impact of a BGP incident later.</p><p>Visit <a href="https://radar.cloudflare.com/"><u>Cloudflare Radar</u></a> for additional insights around Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, and more. Follow us on social media at <a href="https://twitter.com/CloudflareRadar"><u>@CloudflareRadar</u></a> (X), <a href="https://noc.social/@cloudflareradar"><u>https://noc.social/@cloudflareradar</u></a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com"><u>radar.cloudflare.com</u></a> (Bluesky), or contact us via <a><u>e-mail</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Radar]]></category>
            <guid isPermaLink="false">6QVNgwE5ZlVbZcWQHJKsDS</guid>
            <dc:creator>Mingwei Zhang</dc:creator>
            <dc:creator>Bryton Herdes</dc:creator>
        </item>
        <item>
            <title><![CDATA[Bringing connections into view: real-time BGP route visibility on Cloudflare Radar]]></title>
            <link>https://blog.cloudflare.com/bringing-connections-into-view-real-time-bgp-route-visibility-on-cloudflare/</link>
            <pubDate>Wed, 21 May 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ Real-time BGP route visualization is now available on Cloudflare Radar, providing immediate insights into global Internet routing. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The Internet relies on the <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/"><u>Border Gateway Protocol (BGP)</u></a> to exchange IP address reachability information. This information outlines the path a sender or router can use to reach a specific destination. These paths, conveyed in BGP messages, are sequences of <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>Autonomous System Numbers (ASNs)</u></a>, with each ASN representing an organization that operates its own segment of Internet infrastructure.</p><p>Throughout this blog post, we'll use the terms "BGP routes" or simply "routes" to refer to these paths. In essence, BGP functions by enabling autonomous systems to exchange routes to IP address blocks (“IP prefixes”), allowing different entities across the Internet to construct their routing tables.</p><p>When network operators debug reachability issues or assess a resource's global reach, BGP routes are often the first thing they examine. Therefore, it’s critical to have an up-to-date view of the routes toward the IP prefixes of interest. Some networks provide tools called "looking glasses" — public routing information services offering data directly from their own BGP routers. These allow external operators to examine routes from that specific network's perspective. Furthermore, services like <a href="https://bgp.tools/"><u>bgp.tools</u></a>, <a href="http://bgp.he.net"><u>bgp.he.net</u></a>, <a href="https://lg.routeviews.org/lg/"><u>RouteViews</u></a>, or the <a href="https://lg.ring.nlnog.net/"><u>NLNOG RING looking glass</u></a> offer aggregated, looking glass-like lookup capabilities, drawing on data sources from multiple organizations rather than just a single one.</p><p>However, individual looking glass instances offer a limited scope, typically restricted to the infrastructure of the service provider's network. 
While aggregated routing information services provide broader vantage points, they often lack the API access necessary for building automated tools on top of them. For example, systems designed for automated tasks, such as BGP <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/"><u>leak</u></a> or <a href="https://blog.cloudflare.com/bgp-hijack-detection/"><u>hijack</u></a> detection, depend on programmatic API access.</p><p>We're excited to introduce Cloudflare Radar's new real-time BGP route lookup service, described below. Built using <a href="#architecture-overview"><u>public data sources</u></a>, this service provides visualizations of real-time routes directly on the corresponding IP prefix pages within Radar (see the page for <a href="https://radar.cloudflare.com/routing/prefix/1.1.1.0/24"><u>1.1.1.0/24</u></a> as an example). We are also offering <a href="https://developers.cloudflare.com/api/resources/radar/subresources/bgp/subresources/routes/methods/pfx2as/"><u>API access</u></a> through our free-to-use Cloudflare Radar API, empowering developers to leverage this data to build their own innovative systems and tools.</p>
    <div>
      <h2>Cloudflare Radar provides real-time routes</h2>
      <a href="#cloudflare-radar-provides-real-time-routes">
        
      </a>
    </div>
    <p>We are excited to announce the launch of our new real-time BGP route lookup service, now accessible through both Cloudflare Radar web interface and the Cloudflare Radar API. This enhancement provides users with a near instantaneous view into global BGP routing data.</p>
    <div>
      <h3>Cloudflare Radar prefix pages</h3>
      <a href="#cloudflare-radar-prefix-pages">
        
      </a>
    </div>
    <p>Cloudflare Radar's real-time routes feature now offers a <a href="https://en.wikipedia.org/wiki/Sankey_diagram"><u>Sankey diagram</u></a> illustrating the BGP routes for a given prefix. To minimize visual complexity, the visualization displays routes directed towards the <a href="https://en.wikipedia.org/wiki/Tier_1_network"><u>Tier 1 networks</u></a>. For example, the diagram below shows that 1.1.1.0/24 is announced by AS13335 (Cloudflare) and that Cloudflare has direct connections to almost all U.S.-based and international Tier 1 network providers.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6GjZl5vG4YNF1ltVFy8F2E/a406afdd00cbaa4689cace22f08d4cc9/image9.png" />
          </figure><p>Expanding on this more concise view, users also have the option to 'Show full paths' and visualize every BGP route from the prefix of interest to the collectors. (The role of the collectors in gathering this data is <a href="#architecture-overview"><u>discussed below</u></a>.) The interactive view allows panning and zooming, and hovering over the links provides tooltip information on which collector saw the route and when it was last updated.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2XNEaV0TlWbM3xvHnAulY7/bbee5bbb48a38011558909df3eef598f/image3.png" />
          </figure><p>For both views, the prefix origin table is displayed above the routes visualization. The table shows the originating <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>Autonomous System (AS)</u></a>, the visibility percentage (representing the proportion of route collectors observing the origin ASN announcement), and <a href="https://en.wikipedia.org/wiki/Resource_Public_Key_Infrastructure"><u>RPKI validation</u></a> outcomes.</p><p>During a recently detected BGP misconfiguration, we saw two origin ASNs for a prefix: AS3 incorrectly appeared in place of the intended origin ASN, which was meant to be <a href="https://blog.cloudflare.com/prepends-considered-harmful/#bgp-best-path-selection"><u>prepended</u></a> three times. The visualization reveals AS3 as RPKI invalid with low visibility, indicating limited network acceptance. Operators can analyze these issues visually or in the table and monitor real-time corrections by refreshing the page.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1R29X594SgfZYwZlc5XldH/7ba3f4a312b0da95d5332c2fb8d9a8b4/image2.png" />
          </figure><p>Whether facing network outages, implementing new deployments, or investigating route leaks, users can leverage this feature for any scenario where a clear, global understanding of a prefix's routing paths is essential.</p><p>To allow easier access to this information, users can now search for any prefix using the Radar search bar and navigate to the corresponding prefix routing pages. Prefixes involved in BGP <a href="https://radar.cloudflare.com/routing#bgp-route-leaks"><u>route leak</u></a> and <a href="https://radar.cloudflare.com/routing#bgp-origin-hijacks"><u>origin hijack</u></a> events are also linked to this enhanced routing information page, helping operators debug BGP anomalies in real time.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/54k7J98bl8zS8K8HZViVT5/736499be7c2c8704a1f71d83a1ebcef5/image10.png" />
          </figure>
    <div>
      <h3>Cloudflare Routes API</h3>
      <a href="#cloudflare-routes-api">
        
      </a>
    </div>
    <p>Cloudflare Radar real-time route data is also accessible <a href="https://developers.cloudflare.com/api/resources/radar/subresources/bgp/subresources/routes/methods/pfx2as/"><u>via the Radar API</u></a>, and users can follow <a href="https://developers.cloudflare.com/radar/get-started/first-request/"><u>this guide</u></a> to get started.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5sfEKbK9HxSc343CPnMXt0/4a68548bf0fcf9617cba9ff5fa38a64b/image4.png" />
          </figure><p>The following example shows an HTTP <code>GET</code> request to query all the current routes for a prefix of interest:</p>
            <pre><code>curl -X GET \
  "https://api.cloudflare.com/client/v4/radar/bgp/routes/realtime?prefix=1.1.1.0/24" \
  -H "Authorization: Bearer &lt;API_TOKEN&gt;"</code></pre>
            <p>With the help of JSON data processing tools like <a href="https://jqlang.org/"><u>jq</u></a>, users can further filter the results, for example by routes containing a certain ASN. In the following example, we request all current routes for the prefix <code>1.1.1.0/24</code> and keep only those whose AS paths contain AS174:</p>
            <pre><code>curl -X GET \
  "https://api.cloudflare.com/client/v4/radar/bgp/routes/realtime?prefix=1.1.1.0/24" \
    -H "Authorization: Bearer &lt;API_TOKEN&gt;" | \
jq '.result.routes[]|select(.as_path | contains([174]))'</code></pre>
            <p>The command output is a JSON array of route objects. Each object details a route that includes AS174 in its AS path. Additional information provided for each route includes the BGP route collector, BGP community values, and the timestamp of the last update.</p>
            <pre><code>{
  "as_path": [
    3130,
    174,
    13335
  ],
  "collector": "route-views2",
  "communities": [
    "174:21001",
    "174:22013",
    "3130:394"
  ],
  "peer_asn": 3130,
  "prefix": "1.1.1.0/24",
  "timestamp": "2025-05-14T00:00:00Z"
}
{
  "as_path": [
    263237,
    174,
    13335
  ],
  "collector": "rrc15",
  "communities": [
    "174:21001",
    "174:22013",
    "65237:174"
  ],
  "peer_asn": 263237,
  "prefix": "1.1.1.0/24",
  "timestamp": "2025-05-14T01:39:52Z"
}</code></pre>
            <p>The API also offers supplementary metadata alongside BGP route information, including insights into BGP route collector status and aggregated prefix-to-origin data. Recalling the earlier example of an AS path prepending misconfiguration, the RPKI invalid AS3 origin is now directly visible to users and API clients in the JSON response, showing that just 9% of all collector peers observed its announcements.</p>
            <pre><code>"meta": {
  "collectors": [
    {
      "latest_realtime_ts": "2025-05-19T21:35:40Z",
      "latest_rib_ts": "2025-05-19T20:00:00Z",
      "latest_updates_ts": "2025-05-19T21:15:00Z",
      "peers_count": 24,
      "peers_v4_count": 0,
      "peers_v6_count": 24,
      "collector": "route-views6"
    }
  ],
  "prefix_origins": [
    {
      "origin": 3,
      "prefix": "2804:4e28::/32",
      "rpki_validation": "invalid",
      "total_peers": 121,
      "total_visible": 11,
      "visibility": 0.09090909090909091
    },
    {
      "origin": 268243,
      "prefix": "2804:4e28::/32",
      "rpki_validation": "valid",
      "total_peers": 121,
      "total_visible": 94,
      "visibility": 0.7768595041322314
    }
  ]
}</code></pre>
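<p>Under the response shape shown above, each origin’s visibility is simply <code>total_visible / total_peers</code>. As an illustration of client-side processing (not part of the Radar API itself), a tool could flag low-visibility, RPKI-invalid origins; the JSON below is inlined rather than fetched, and the <code>suspicious_origins</code> helper and its 50% threshold are our own invention:</p>

```python
# Sketch of client-side processing of prefix_origins metadata like the
# excerpt above. The JSON is inlined for illustration; suspicious_origins
# and its 50% threshold are hypothetical, not part of the Radar API.
import json

meta_json = """
{
  "prefix_origins": [
    {"origin": 3, "prefix": "2804:4e28::/32", "rpki_validation": "invalid",
     "total_peers": 121, "total_visible": 11},
    {"origin": 268243, "prefix": "2804:4e28::/32", "rpki_validation": "valid",
     "total_peers": 121, "total_visible": 94}
  ]
}
"""

def suspicious_origins(meta, max_visibility=0.5):
    """Return origin ASNs that are RPKI-invalid and seen by few peers."""
    flagged = []
    for entry in meta["prefix_origins"]:
        visibility = entry["total_visible"] / entry["total_peers"]
        if entry["rpki_validation"] == "invalid" and visibility < max_visibility:
            flagged.append(entry["origin"])
    return flagged

print(suspicious_origins(json.loads(meta_json)))  # [3], the misconfigured AS3 origin
```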
            
    <div>
      <h2>From archives to real-time</h2>
      <a href="#from-archives-to-real-time">
        
      </a>
    </div>
    
    <div>
      <h3>Architecture overview</h3>
      <a href="#architecture-overview">
        
      </a>
    </div>
    <p>Cloudflare Radar uses <a href="https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris/"><u>RIPE RIS</u></a> and the University of Oregon’s <a href="https://www.routeviews.org/routeviews/"><u>RouteViews</u></a> as our primary BGP data sources for services like the <a href="https://radar.cloudflare.com/routing#routing-statistics"><u>routing statistics widget,</u></a> <a href="https://radar.cloudflare.com/routing#routing-anomalies"><u>anomaly detection</u></a>, and <a href="https://radar.cloudflare.com/routing/us#announced-ip-address-space"><u>announced address space graphs</u></a>. We have previously discussed in detail how we use the data archives from these two providers to build Cloudflare Radar’s <a href="https://blog.cloudflare.com/radar-routing/"><u>routing pages</u></a>, and our <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/"><u>route leak</u></a> and <a href="https://blog.cloudflare.com/bgp-hijack-detection/"><u>hijack</u></a> detection systems.</p><p>In brief, RIPE RIS and RouteViews maintain several BGP route collectors, each connected to BGP routers across a diverse set of networks. These routers forward BGP messages to the collectors, which generate periodic data dumps for public access. These data dumps include both collections of BGP message updates and full routing table snapshots (RIB dump files).</p><p>For services monitoring stable routing information, like <a href="https://radar.cloudflare.com/routing"><u>global routing statistics</u></a>, we process RIB dump files from the archives as they become available. Conversely, for detecting dynamic events such as <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/"><u>route leaks</u></a> and <a href="https://blog.cloudflare.com/bgp-hijack-detection/"><u>hijacks</u></a>, we process periodic BGP update files in batches. Because these dumps are generated periodically at the route collectors, services depending on this archival BGP data can lag behind live routing by 10 to 30 minutes.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/25MdQGrekEzjwNYpgoxOf2/3c16e120dd06412d05dfe42f5cd53c04/image5.png" />
          </figure><p>For the new real-time BGP routes feature, we aim to reduce the data delay from minutes or tens of minutes down to seconds. With the real-time stream capability provided by the BGP archiver services — <a href="https://ris-live.ripe.net/"><u>RIS Live</u></a> WebSocket from RIPE RIS and <a href="https://github.com/SNAS/openbmp"><u>OpenBMP</u></a> Kafka stream from RouteViews — we designed an additional real-time streaming component that continuously updates the route snapshots initially built from MRT archive files.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7MAYm4qw5F723D2MNGzoKm/4e11a257d5b5d2751368c0c09e040dfa/image6.png" />
          </figure>
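<p>To make the mechanics concrete, the toy sketch below (ours, not Cloudflare’s production code) applies simplified announcement and withdrawal messages to an in-memory snapshot. The message fields used here are illustrative stand-ins for the RIS Live and OpenBMP schemas, not their exact wire formats:</p>

```python
# Toy sketch of keeping a per-collector route snapshot current from a
# stream of BGP update messages. The fields (peer_asn, announce,
# withdraw, path) are simplified stand-ins, not the exact RIS Live or
# OpenBMP message formats.

# A snapshot maps (prefix, peer ASN) -> AS path.
def apply_update(snapshot, msg):
    peer = msg["peer_asn"]
    for prefix in msg.get("withdraw", []):
        snapshot.pop((prefix, peer), None)      # route withdrawn by this peer
    for prefix in msg.get("announce", []):
        snapshot[(prefix, peer)] = msg["path"]  # newest announcement wins

snap = {}
apply_update(snap, {"peer_asn": 3130, "announce": ["1.1.1.0/24"],
                    "path": [3130, 174, 13335]})
print(snap[("1.1.1.0/24", 3130)])  # [3130, 174, 13335]
apply_update(snap, {"peer_asn": 3130, "withdraw": ["1.1.1.0/24"]})
print(snap)  # {}
```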
    <div>
      <h3>System design</h3>
      <a href="#system-design">
        
      </a>
    </div>
    <p>At its core, the system enables a user to look up a prefix's routes stored in BGP routes snapshots. The BGP routes snapshots serve as a queryable data repository, organized hierarchically. The snapshots use a <a href="https://en.wikipedia.org/wiki/Trie"><u>trie structure</u></a> to allow for the retrieval of route information (such as AS paths and community values) associated with specific address prefixes. Each node in the hierarchy stores routing information from different peering routers, providing a consolidated global view. To handle the large data volumes from multiple BGP route collectors, the system partitions routing data into separate BGP routes snapshots, where each snapshot receives data streamed from its corresponding collector. This architecture enables horizontal scalability, allowing for dynamic adjustment of data sources by selecting which independent collectors' data to include or exclude.</p>
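<p>As an illustration of the lookup structure, here is a minimal binary trie with longest-prefix match, an IPv4-only, simplified stand-in for the snapshot tries described above:</p>

```python
# Minimal binary trie for longest-prefix match over IPv4 prefixes, in the
# spirit of the snapshot structure described above (illustrative only).
import ipaddress

class PrefixTrie:
    def __init__(self):
        self.root = {}  # nested dicts keyed by bit "0"/"1"

    def insert(self, prefix, routes):
        net = ipaddress.ip_network(prefix)
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for b in bits:
            node = node.setdefault(b, {})
        node["routes"] = routes  # AS paths, communities, etc.

    def lookup(self, address):
        """Return routes for the longest matching prefix, or None."""
        node, best = self.root, None
        for b in format(int(ipaddress.ip_address(address)), "032b"):
            if "routes" in node:
                best = node["routes"]
            if b not in node:
                break
            node = node[b]
        else:
            if "routes" in node:
                best = node["routes"]
        return best

trie = PrefixTrie()
trie.insert("1.1.1.0/24", [[3130, 174, 13335]])
trie.insert("1.0.0.0/8", [[6939, 13335]])
print(trie.lookup("1.1.1.1"))  # [[3130, 174, 13335]]
```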
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3aLPEEWOeaHxR97VGeqpCE/0f2f4a3c88ba9e8a1e9ee9de8aa23762/image7.png" />
          </figure><p>Because each collector’s BGP route information is maintained independently, we apply the <a href="https://en.wikipedia.org/wiki/Actor_model"><u>actor model software architecture</u></a> to query for a global view. Each collector is treated as an actor that runs completely independently and communicates with a central controller via a dedicated channel. The central controller starts all actors by sending a signal to each of them, triggering the actors to begin collecting archival and real-time BGP data on their separate threads.</p><p>Upon a user query, the central controller relays the query to all running actors via a query message. Each actor retrieves the corresponding route information from its prefix trie and returns the results to the controller in a reply message. The controller aggregates the replies from all actors and compiles them into a single response to the user. Throughout this process, the real-time BGP streaming and snapshot-updating tasks continue to run in the background without interruption.</p>
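<p>The controller/actor message flow can be sketched with threads and queues. For brevity, these toy actors are spawned per query rather than long-running, and the collector names and routes are made up:</p>

```python
# Toy fan-out/fan-in sketch of the actor pattern described above: one
# thread per "collector" actor, a controller that broadcasts a query and
# aggregates the replies. Collector names and routes are invented; real
# actors would be long-running, not spawned per query.
import queue
import threading

def collector_actor(name, routes, inbox, outbox):
    """Each actor owns its routes and answers queries independently."""
    while True:
        prefix = inbox.get()
        if prefix is None:  # shutdown signal
            return
        outbox.put((name, routes.get(prefix, [])))

def query_all(actors, prefix):
    outbox = queue.Queue()
    threads = []
    for name, routes in actors.items():
        inbox = queue.Queue()
        t = threading.Thread(target=collector_actor,
                             args=(name, routes, inbox, outbox))
        t.start()
        inbox.put(prefix)  # controller relays the query to each actor
        inbox.put(None)
        threads.append(t)
    for t in threads:
        t.join()
    # Controller aggregates one reply per actor into a single response.
    return dict(outbox.get() for _ in actors)

actors = {
    "route-views2": {"1.1.1.0/24": [[3130, 174, 13335]]},
    "rrc15":        {"1.1.1.0/24": [[263237, 174, 13335]]},
}
result = query_all(actors, "1.1.1.0/24")
```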
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3Nq5RNk4FleQZxa3suzxU8/66eea37509df2563528fc85d38299f15/image1.png" />
          </figure><p>Our actor-model implementation enables a single node to efficiently store hundreds of full routing tables in its memory. Our current deployment uses eight route collectors, housing a total of 261 full routing tables. This in-memory system operating on a single node consumes approximately 45 GB of memory, which translates to about 170 MB per full routing table.</p>
    <div>
      <h2>Summary</h2>
      <a href="#summary">
        
      </a>
    </div>
    <p>Cloudflare Radar now offers a real-time BGP route lookup service, providing near-instantaneous insights into global Internet routing. This feature leverages real-time data streams from RouteViews and RIPE RIS, moving beyond historical archives to deliver up-to-the-minute information. Users can now visualize routes in real time on Cloudflare Radar's prefix pages with intuitive Sankey diagrams that detail complete route information. Furthermore, the Cloudflare Radar API provides programmatic access to this data, allowing for seamless integration into custom tools and workflows.</p><p>Visit <a href="https://radar.cloudflare.com/"><u>Cloudflare Radar</u></a> for additional insights around Internet disruptions, routing issues, Internet traffic trends, attacks, and Internet quality. Follow us on social media at <a href="https://twitter.com/CloudflareRadar"><u>@CloudflareRadar</u></a> (X), <a href="https://noc.social/@cloudflareradar"><u>noc.social/@cloudflareradar</u></a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com"><u>radar.cloudflare.com</u></a> (Bluesky), or contact us via <a><u>email</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Real-time]]></category>
            <guid isPermaLink="false">WF4vnYPMXN4pKu9Xwqj7g</guid>
            <dc:creator>Mingwei Zhang</dc:creator>
        </item>
        <item>
            <title><![CDATA[Making progress on routing security: the new White House roadmap]]></title>
            <link>https://blog.cloudflare.com/white-house-routing-security/</link>
            <pubDate>Mon, 02 Sep 2024 23:00:00 GMT</pubDate>
            <description><![CDATA[ On September 3, 2024, the White House published a report on Internet routing security. We’ll talk about what that means and how you can help. ]]></description>
            <content:encoded><![CDATA[ <p>The Internet can feel like magic. When you load a webpage in your browser, many simultaneous requests for data fly back and forth to remote servers. Then, often in less than one second, a website appears. Many people know that DNS is used to look up a hostname, and resolve it to an IP address, but fewer understand how data flows from your home network to the network that controls the IP address of the web server.</p><p>The Internet is an interconnected network of networks, operated by thousands of independent entities. To allow these networks to communicate with each other, in 1989, <a href="https://weare.cisco.com/c/r/weare/amazing-stories/amazing-things/two-napkin.html"><u>on the back of two napkins</u></a>, three network engineers devised the <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/"><u>Border Gateway Protocol (BGP)</u></a>. It allows these independent networks to signal directions for IP prefixes they own, or that are reachable through their network. At that time, Internet security wasn’t a big deal — <a href="https://www.cloudflare.com/learning/ssl/what-is-ssl/"><u>SSL</u></a>, initially developed to secure websites, wasn’t developed until 1995, six years later. So BGP wasn’t originally built with security in mind, but over time, security and availability concerns have emerged.</p><p>Today, the <a href="https://bidenwhitehouse.archives.gov/oncd/"><u>White House Office of the National Cyber Director</u></a> issued the <a href="https://bidenwhitehouse.archives.gov/oncd/briefing-room/2024/09/03/fact-sheet-biden-harris-administration-releases-roadmap-to-enhance-internet-routing-security/"><u>Roadmap to Enhancing Internet Routing Security</u></a>, and we’re excited to highlight their recommendations. But before we get into that, let’s provide a quick refresher on what BGP is and why routing security is so important.</p>
    <div>
      <h2>BGP: pathways through the Internet</h2>
      <a href="#bgp-pathways-through-the-internet">
        
      </a>
    </div>
    <p>BGP is the core signaling protocol used on the Internet. It’s fully distributed and managed independently by all the individual operators of the Internet. With BGP, operators will send messages to their neighbors (other networks they are directly connected with, either physically or through an <a href="https://www.cloudflare.com/learning/cdn/glossary/internet-exchange-point-ixp/"><u>Internet Exchange</u></a>) that indicate their network can be used to reach a specific IP prefix. These IP prefixes can be resources the network itself owns, such as <a href="https://radar.cloudflare.com/routing/prefix/104.16.128.0/20"><u>104.16.128.0/20</u></a> for Cloudflare, or resources reachable by transiting their network.</p><p>By exchanging all of this information between peers, each individual network on the Internet can form a full map of what the Internet looks like, and ideally, how to reach each IP address on the Internet. This map is in an almost constant state of flux: networks disappear from the Internet for a wide variety of reasons, ranging from scheduled maintenance to catastrophic failures, like the <a href="https://blog.cloudflare.com/october-2021-facebook-outage/"><u>Facebook incident in 2021</u></a>. On top of this, the ideal path to take from point A (your home ISP) to point B (Cloudflare) can change drastically, depending on routing decisions made by your home ISP, and any or all intermediate networks between your home ISP and Cloudflare (<a href="https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/"><u>here’s an example from 2019</u></a>). These <a href="https://blog.cloudflare.com/prepends-considered-harmful/"><u>routing decisions</u></a> are entirely arbitrary, and left to the owners of the networks. 
Performance and security can be considered, but neither has historically been made visible through BGP itself.</p><p>As all the networks can independently make their own routing decisions, there are a lot of individual points where things can go wrong. Going wrong can have multiple meanings here: this can range from routing loops, causing Internet traffic to go back and forth repeatedly between two networks, never reaching its destination, to more malicious problems, such as traffic interception or traffic manipulation.</p><p>As routing security wasn’t accounted for in that initial two-napkin draft, it is easy for a malicious actor on the Internet to <a href="https://www.cloudflare.com/en-gb/learning/security/glossary/bgp-hijacking/"><u>pretend to either be an originating network</u></a> (where they claim to own the IP prefix, positioning themselves as the destination network), or they can pretend to be a viable middle network, getting traffic to transit through their network.</p><p>In either of these examples, the actor can manipulate the Internet traffic of unsuspecting end users and potentially steal passwords, cryptocurrency, or any other data that can be of value. While transport security (<a href="https://www.cloudflare.com/learning/ssl/transport-layer-security-tls/"><u>TLS</u></a> for HTTP/1.x and HTTP/2, <a href="https://blog.cloudflare.com/the-road-to-quic/"><u>QUIC</u></a> for HTTP/3) has reduced this risk significantly, there are still ways this can be bypassed. Over time, the Internet community has acknowledged the security concerns with BGP, and has built infrastructure to mitigate some of these problems. </p>
    <div>
      <h3>BGP security: The RPKI is born</h3>
      <a href="#bgp-security-the-rpki-is-born">
        
      </a>
    </div>
    <p>This journey is now coming to a final destination with the development and adoption of the Resource Public Key Infrastructure (RPKI). The RPKI is a <a href="https://research.cloudflare.com/projects/internet-infrastructure/pki/"><u>PKI</u></a>, just like the Web PKI which provides security certificates for the websites we browse (the “s” in https). The RPKI is a PKI specifically with the Internet in mind: it provides core constructs for <a href="https://www.cloudflare.com/learning/dns/glossary/what-is-my-ip-address/"><u>IP addresses</u></a> and <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>Autonomous System Numbers (ASNs</u></a>), the numbers used to identify these individual operating networks mentioned earlier.</p><p>Through the RPKI, it’s possible for an operator to establish a cryptographically secure relationship between the IP prefixes they originate, and their ASN, through the issuance of <a href="https://www.arin.net/resources/manage/rpki/roa_request/"><u>Route Origin Authorization records (ROAs)</u></a>. These ROAs can be used by all other networks on the Internet to validate that the IP prefix update they just received for a given origin network actually belongs to that origin network, a process called <a href="https://blog.cloudflare.com/rpki-updates-data/"><u>Route Origin Validation (ROV)</u></a>. If a malicious party tries to hijack an IP prefix that has a ROA to their (different) origin network, validating networks would know this update is invalid and reject it, maintaining the origin security and ensuring reachability.</p>
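<p>The ROV decision itself is simple to state: a route is valid if some covering ROA authorizes its origin ASN within the ROA’s maximum length, invalid if ROAs cover the prefix but none match, and not-found if no ROA covers it. The sketch below follows those outcome states from RFC 6811 in a simplified form; the ROA data is illustrative:</p>

```python
# Simplified Route Origin Validation in the spirit of RFC 6811: "valid"
# if a covering ROA authorizes the origin ASN within its max length,
# "invalid" if ROAs cover the prefix but none match, "not-found" if no
# ROA covers it. The ROA list here is illustrative, not real RPKI data.
import ipaddress

def rov(prefix, origin_asn, roas):
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, roa_asn, max_len in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            if roa_asn == origin_asn and net.prefixlen <= max_len:
                return "valid"
    return "invalid" if covered else "not-found"

# Each ROA: (prefix, authorized origin ASN, max prefix length)
roas = [("1.1.1.0/24", 13335, 24)]

print(rov("1.1.1.0/24", 13335, roas))  # valid
print(rov("1.1.1.0/24", 64512, roas))  # invalid: covered, wrong origin
print(rov("8.8.8.0/24", 15169, roas))  # not-found: no covering ROA
```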
    <div>
      <h2>Why does BGP security matter? Examples of route hijacks and leaks</h2>
      <a href="#why-does-bgp-security-matter-examples-of-route-hijacks-and-leaks">
        
      </a>
    </div>
    <p>But why should you care about BGP? And more importantly, why does the White House care about BGP? Put simply: BGP (in)security can cost people and companies millions of dollars and cause widespread disruptions for critical services.</p><p>In February 2022, Korean crypto platform KLAYswap was the target of a <a href="https://manrs.org/2022/02/klayswap-another-bgp-hijack-targeting-crypto-wallets/"><u>malicious BGP hijack</u></a>, which was used to steal $1.9 million of cryptocurrency from their customers. The attackers were able to serve malicious code that mimicked the service KLAYswap was using for technical support. They were able to do this by announcing the IP prefix used to serve the JavaScript SDK KLAYswap was using. When other networks accepted this announcement, end user traffic loading the technical support page instead received malicious JavaScript, which was used to drain customer wallets. As the attackers hijacked the IP address, they were also able to register a <a href="https://www.cloudflare.com/application-services/products/ssl/">TLS certificate</a> for the domain name used to serve the SDK. As a result, nothing looked out of the ordinary for Klayswap’s customers until they noticed their wallets had been drained.</p><p>However, not all BGP problems are intentional hijacks. In March 2022, <a href="https://radar.cloudflare.com/as8342"><u>RTComm (AS8342)</u></a>, a Russian ISP, announced itself as the origin of <a href="https://radar.cloudflare.com/routing/prefix/104.244.42.0/24"><u>104.244.42.0/24</u></a>, which is an IP prefix actually owned by <a href="https://radar.cloudflare.com/as13414"><u>Twitter (now X) (AS13414)</u></a>. In this case, all researchers have drawn a similar conclusion: RTComm wanted to block its users from accessing Twitter, but inadvertently advertised the route to its peers and upstream providers. 
Thankfully, the impact was limited, in large part due to Twitter issuing ROA records for their IP prefixes, which meant the hijack was blocked at all networks that had implemented ROV and were validating announcements.</p><p>Inadvertent incorrect advertisements passing from one network to another, or route leaks, can happen to anyone, even Cloudflare. Our <a href="https://1.1.1.1/dns"><u>1.1.1.1 public DNS service</u></a> — used by millions of consumers and businesses — is often the unintended victim. Consider this situation (versions of which have happened numerous times): a network engineer running a local ISP is testing a configuration on their router and announces to the Internet that you can reach the IP address 1.1.1.1 through their network. They will often pick this address because it’s easy to input on the router and observe in network analytics. They accidentally push that change out to all their peer networks — the networks they’re connected to — and now, if proper routing security isn’t in place, users on multiple networks around the Internet trying to reach 1.1.1.1 might be directed to this local ISP where there is no DNS service to be found. This can lead to widespread outages.</p><p>The types of routing security measures in the White House roadmap can prevent these issues. In the case of 1.1.1.1, <a href="https://rpki.cloudflare.com/?view=explorer&amp;prefix=1.1.1.0%2F24"><u>Cloudflare has ROAs in place</u></a> that tell the Internet that we originate the IP prefix that contains 1.1.1.1. If someone else on the Internet is advertising 1.1.1.1, that’s an invalid route, and other networks should stop accepting it. In the case of KLAYswap, had there been ROAs in place, other networks could have used common filtering techniques to filter out the routes pointing to the attackers’ malicious JavaScript. 
So now let’s talk more about the plan the White House has to improve routing security on the Internet, and how the US government developed its recommendations.</p>
    <div>
      <h2>Work leading to the roadmap</h2>
      <a href="#work-leading-to-the-roadmap">
        
      </a>
    </div>
    <p>The new routing security roadmap from the <a href="https://www.whitehouse.gov/oncd/"><u>Office of the National Cyber Director (ONCD)</u></a> is the product of years of work, throughout both government and industry. The <a href="https://www.nist.gov/"><u>National Institute of Standards and Technology (NIST)</u></a> has been a longstanding proponent of improving routing security, developing <a href="https://www.nist.gov/news-events/news/2014/05/nist-develops-test-and-measurement-tools-internet-routing-security"><u>test and measurement</u></a> <a href="https://rpki-monitor.antd.nist.gov/"><u>tools</u></a> and publishing <a href="https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1800-14.pdf"><u>special publication 1800-14</u></a> on Protecting the Integrity of Internet Routing, among many other initiatives. They are active participants in the Internet community, and an important voice for routing security.</p><p>Cloudflare first started publicly <a href="https://blog.cloudflare.com/is-bgp-safe-yet-rpki-routing-security-initiative/"><u>advocating</u></a> for adoption of security measures like RPKI after a <a href="https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/"><u>massive BGP route leak</u></a> took down a portion of the Internet, including websites using Cloudflare’s services, in 2019. </p><p>Since that time, the federal government has increasingly recognized the need to elevate efforts to secure Internet routing, a process that Cloudflare has helped support along the way. 
The <a href="https://www.solarium.gov/"><u>Cyberspace Solarium Commission report</u></a>, published in 2020, encouraged the government to develop a strategy and recommendations to define “common, implementable guidance for securing the DNS and BGP.”</p><p>In February 2022, the Federal Communications Commission <a href="https://www.fcc.gov/document/fcc-launches-inquiry-internet-routing-vulnerabilities"><u>launched</u></a> a notice of inquiry to better understand Internet routing. Cloudflare <a href="https://www.fcc.gov/ecfs/document/10412234101460/1"><u>responded</u></a> with a detailed explanation of our history with RPKI and routing security. In July 2023, the FCC, jointly with the Director of the <a href="https://cisa.gov/"><u>Cybersecurity and Infrastructure Security Agency</u></a>, held a <a href="https://www.fcc.gov/news-events/events/2023/07/bgp-security-workshop"><u>workshop</u></a> for stakeholders, with <a href="https://youtu.be/VQhoNX2Q0aM?si=VHbB5uc-0DzHaWpL&amp;t=11462"><u>Cloudflare as one of the presenters</u></a>. In June 2024, the FCC issued a <a href="https://docs.fcc.gov/public/attachments/FCC-24-62A1.pdf"><u>Notice of Proposed Rulemaking</u></a> that would require large service providers to develop security risk management plans and report on routing security efforts, including RPKI adoption. </p><p>The White House has been involved as well. In March 2023, they cited the need to secure the technical foundation of the Internet, from issues such as BGP vulnerabilities, as one of the strategic objectives of the <a href="https://www.whitehouse.gov/wp-content/uploads/2023/03/National-Cybersecurity-Strategy-2023.pdf"><u>National Cybersecurity Strategy</u></a>. 
Citing those efforts, in May 2024, the Department of Commerce <a href="https://www.commerce.gov/news/press-releases/2024/05/us-department-commerce-implements-internet-routing-security"><u>issued</u></a> <a href="https://rpki.cloudflare.com/?view=explorer&amp;asn=3477"><u>ROAs signing some of its IP space</u></a>, and this roadmap strongly encourages other departments and agencies to do the same. All of those efforts and the focus on routing security have resulted in increased adoption of routing security measures. </p>
    <div>
      <h2>Report observations and recommendations</h2>
      <a href="#report-observations-and-recommendations">
        
      </a>
    </div>
    <p>The report released by the White House Office of the National Cyber Director details the current state of BGP security, and the challenges associated with Resource Public Key Infrastructure (RPKI) Route Origin Authorization (ROA) issuance and RPKI Route Origin Validation (ROV) adoption. It also provides network operators and government agencies with next steps and recommendations for BGP security initiatives. </p><p>One of the first recommendations is for all networks to create and publish ROAs. It’s important that every network issues ROAs for their IP prefixes, as it’s the only way for other networks to validate they are the authorized originator of those prefixes. If one network is advertising an IP address as their own, but a different network issued the ROA, that’s an important sign that something might be wrong!</p><p>As shown in the chart below from <a href="https://rpki-monitor.antd.nist.gov/"><u>NIST’s RPKI Monitor</u></a>, as of September 2024, at least 53% of all the IPv4 prefixes on the Internet have a valid ROA record available (IPv6 reached this milestone in late 2023), up from only 6% in 2017. (The metric is even better when measured as a percent of Internet traffic: data from <a href="https://kentik.com/"><u>Kentik</u></a>, a network observability company, <a href="https://www.kentik.com/blog/rpki-rov-deployment-reaches-major-milestone/"><u>shows</u></a> that 70.3% of Internet traffic is exchanged with IP prefixes that have a valid ROA.) This increase in the number of signed IP prefixes (ROAs) is foundational to secure Internet routing.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4f4Y1fXcdxYRxUhQYjxlWp/7f26d617648539980f2c8e65873139e4/image2.png" />
          </figure><p>Unfortunately, the US is lagging behind: <a href="https://radar.cloudflare.com/routing/us"><u>Only 39% of IP prefixes</u></a> originated by US networks have a valid ROA. This is not surprising, considering the US has significantly more Internet address resources than other parts of the world. However, the report highlights the need for the US to overcome the common barriers network operators face when implementing BGP security measures. Administrative challenges, the perception of risk, and prioritization and resourcing constraints are often cited as the problems networks face when attempting to move forward with ROV and RPKI.</p><p>A related area of the roadmap highlights the need for networks that allow their customers to control IP address space to still create ROAs for those addresses. The reality of how every ISP, government, and large business allocates its IP address space is undoubtedly messy, but that doesn’t reduce the importance of making sure that the correct entity is identified in the official records with a ROA. </p><p>A network signing routes for its IP addresses is an important step, but it isn’t enough. To prevent incorrect routes — malicious or not — from spreading around the Internet, networks need to implement Route Origin Validation (ROV) and implement other BGP best practices, outlined by <a href="https://manrs.org/"><u>MANRS</u></a> in their <a href="https://manrs.org/wp-content/uploads/2023/12/The_Zen_of_BGP_Sec_Policy_Nov2023.docx.pdf"><u>Zen Guide to Routing Security Policy</u></a>. If one network incorrectly announces itself as the origin for 1.1.1.1, that won’t have any effect beyond its own borders if no other networks pick up that invalid route. The Roadmap calls out filtering invalid routes as another action for network service providers. 
</p><p>As of <a href="https://blog.cloudflare.com/rpki-updates-data/"><u>2022</u></a>, our data<a href="https://blog.cloudflare.com/rpki-updates-data/"><u> showed</u></a> that around 15 percent of networks were validating routes. Ongoing measurements from APNIC show progress: this year about 20 percent <a href="https://stats.labs.apnic.net/rpki/XA"><u>of APNIC probes</u></a> globally correctly filter invalid routes with ROV. <a href="https://stats.labs.apnic.net/rpki/US"><u>In the US</u></a>, it’s 70 percent. Continued growth of ROV is a critical step towards achieving better BGP security.</p>
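<p>The origin validation logic that ROV applies is compact enough to sketch. The following is an illustrative Python model of the RFC 6811 rules, not router code: a route is <i>invalid</i> only when at least one ROA covers the prefix but none matches both the origin AS and the maxLength; a route with no covering ROA at all is <i>unknown</i> and is still accepted. The sample ROA entry uses Cloudflare’s AS13335 purely for illustration.</p>

```python
import ipaddress

def rov_state(prefix, origin_asn, roas):
    """Classify a BGP announcement per RFC 6811.

    roas: list of (roa_prefix, max_length, asn) tuples.
    Returns 'valid', 'invalid', or 'unknown'.
    """
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # A ROA "covers" the route if the announced prefix falls inside it.
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            # It "matches" if the origin AS agrees and the announced
            # prefix is no more specific than the ROA's maxLength.
            if asn == origin_asn and net.prefixlen <= max_length:
                return "valid"
    return "invalid" if covered else "unknown"

roas = [("1.1.1.0/24", 24, 13335)]           # illustrative ROA entry
print(rov_state("1.1.1.0/24", 13335, roas))  # valid
print(rov_state("1.1.1.0/24", 64512, roas))  # invalid: wrong origin AS
print(rov_state("8.8.8.0/24", 64512, roas))  # unknown: no covering ROA
```

<p>Note the asymmetry: an ROV-enabled router drops only <i>invalid</i> routes, which is why universal ROA issuance matters — unsigned prefixes remain <i>unknown</i> and unprotected.</p>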
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Ne3sPYqAEytLjO0Vm53yA/ad573ba885e61d249d0a4601b70c8df6/image1.png" />
          </figure><p>Filtering out invalid routes is prominently highlighted in the report’s recommendations. While recognizing that there’s been dramatic improvement in filtering by the large transit networks, the first report recommendation is for network service providers — large and small — to fully deploy ROV. </p><p>In addition, the Roadmap proposes using the federal government’s considerable weight as a purchaser, writing, “<i>[Office of Management and Budget] should require the Federal Government’s contracted service providers to adopt and deploy current commercially-viable Internet routing security technologies.</i>” It goes on to say that grant programs, particularly broadband grants, “<i>should require grant recipients to incorporate routing security measures into their projects.</i>”</p><p>The roadmap doesn’t only cover well-established best practices, but also highlights emerging security technologies, such as <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-profile/"><u>Autonomous System Provider Authorization (ASPA)</u></a> and <a href="https://datatracker.ietf.org/doc/html/rfc8205"><u>BGPsec</u></a>. ROAs only cover part of the BGP routing ecosystem, so additional work is needed to ensure we secure everything. It’s encouraging to see that the work being done by the wider community to address these concerns is acknowledged, and more importantly, actively followed.</p>
    <div>
      <h2>What’s next for the Internet community</h2>
      <a href="#whats-next-for-the-internet-community">
        
      </a>
    </div>
    <p>The new roadmap is an important step in outlining actions that can be taken today to improve routing security. But as the roadmap itself recognizes, there’s more work to be done both in making sure that the steps are implemented, and that we continue to push routing security forward.</p><p>From an implementation standpoint, our hope is that the government’s focus on routing security through all the levers outlined in the roadmap will speed up ROA adoption, and encourage wider implementation of ROV and other best practices. At Cloudflare, we’ll continue to report on routing practices on <a href="https://radar.cloudflare.com/routing/us"><u>Cloudflare Radar</u></a> to help assess progress against the goals in the roadmap.</p><p>At a technical level, the wider Internet community has made massive strides in adopting RPKI ROV, and has set its sights on the next problem: we are securing the IP-to-originating network relationship, but what about the relationships between the individual networks?</p><p>Through the adoption of BGPsec and ASPA, network operators are able to validate not only the origin of a prefix, but also the path to get there. These two new technical additions within the RPKI will combine with ROV to ultimately provide a fully secure signaling protocol for the modern Internet. The community has actively undertaken this work, and we’re excited to see it progress!</p><p>Outside the RPKI, the community has also ratified the formalization of customer roles through <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234: Route Leak Prevention and Detection Using Roles in UPDATE and OPEN Messages</u></a>. As this new BGP feature gains support, we’re hopeful that this will be another helpful tool in the operator toolbox in preventing route leaks of any kind.</p>
    <div>
      <h2>How you can help keep the Internet secure</h2>
      <a href="#how-you-can-help-keep-the-internet-secure">
        
      </a>
    </div>
    <p>If you’re a network operator, you’ll need to sign your routes and validate incoming prefixes. This consists of signing Route Origin Authorization (ROA) records and performing Route Origin Validation (ROV). Route signing involves creating ROA records with your local <a href="https://www.nro.net/about/rirs/"><u>Regional Internet Registry (RIR)</u></a> and publishing them in the RIR’s RPKI repository. Route validation involves rejecting routes whose origin conflicts with a published ROA (routes with no covering ROA are still accepted). This will help ensure that invalid routes don’t propagate. You can learn more about that <a href="https://blog.cloudflare.com/rpki-updates-data/"><u>here</u></a>.</p><p>If you’re not a network operator, head to <a href="http://isbgpsafeyet.com"><u>isbgpsafeyet.com</u></a> and test your ISP. If your ISP is not keeping BGP safe, be sure to let them know how important it is. The government has pointed out that prioritization is a consistent problem, so let’s help raise the priority of routing security.</p>
    <div>
      <h2>A secure Internet is an open Internet</h2>
      <a href="#a-secure-internet-is-an-open-internet">
        
      </a>
    </div>
    <p>As the report points out, one of the keys to keeping the Internet open is ensuring that users can feel safe accessing any site they need to without worrying about attacks that they can’t control. Cloudflare wholeheartedly supports the US government’s efforts to bolster routing security around the world and is eager to work to ensure that we can help create a safe, open Internet for every user.</p> ]]></content:encoded>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Routing Security]]></category>
            <category><![CDATA[Better Internet]]></category>
            <guid isPermaLink="false">10dR1e1P8WbOojN0JGTPOp</guid>
            <dc:creator>Mike Conlow</dc:creator>
            <dc:creator>Emily Music</dc:creator>
            <dc:creator>Tom Strickx</dc:creator>
        </item>
        <item>
            <title><![CDATA[The backbone behind Cloudflare’s Connectivity Cloud]]></title>
            <link>https://blog.cloudflare.com/backbone2024/</link>
            <pubDate>Tue, 06 Aug 2024 14:00:00 GMT</pubDate>
            <description><![CDATA[ Read through the latest milestones and expansions of Cloudflare's global backbone and how it supports our Connectivity Cloud and our services ]]></description>
            <content:encoded><![CDATA[ <p>The modern use of "cloud" arguably traces its origins to the cloud icon, omnipresent in network diagrams for decades. A cloud was used to represent the vast and intricate infrastructure components required to deliver network or Internet services without going into depth about the underlying complexities. At Cloudflare, we embody this principle by providing critical infrastructure solutions in a simple, user-friendly way. Our logo, featuring the cloud symbol, reflects our commitment to simplifying the complexities of Internet infrastructure for all our users.</p><p>This blog post provides an update about our infrastructure, focusing on our global backbone in 2024, and highlights its benefits for our customers, our competitive edge in the market, and the impact on our mission of helping build a better Internet. Since our last backbone-related <a href="http://blog.cloudflare.com/cloudflare-backbone-internet-fast-lane">blog post</a> in 2021, we have increased our backbone capacity (Tbps) by more than 500%, unlocking new use cases, as well as reliability and performance benefits for all our customers.</p>
    <div>
      <h3>A snapshot of Cloudflare’s infrastructure</h3>
      <a href="#a-snapshot-of-cloudflares-infrastructure">
        
      </a>
    </div>
    <p>As of July 2024, Cloudflare has data centers in 330 cities across more than 120 countries, each running Cloudflare equipment and services. The goal of delivering Cloudflare products and services everywhere remains consistent, although these data centers vary in the number of servers and amount of computational power.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/38RRu7BaumWFemL23JcFLW/fd1e4aced5095b1e04384984c88e48be/BLOG-2432-2.png" />
          </figure><p></p><p>These data centers are strategically positioned around the world to ensure our presence in all major regions and to help our customers comply with local regulations. Together they form a programmable, smart network that routes your traffic to the best possible data center for processing. This programmability allows us to keep sensitive data regional, with our <a href="https://www.cloudflare.com/data-localization/">Data Localization Suite solutions</a>, and within the constraints that our customers impose. Connecting these sites, exchanging data with customers, public clouds, partners, and the broader Internet, is the role of our network, which is managed by our infrastructure engineering and network strategy teams. This network forms the foundation that makes our products lightning fast, ensuring our global reliability, security for every customer request, and helping customers comply with <a href="https://www.cloudflare.com/the-net/building-cyber-resilience/challenges-data-sovereignty/">data sovereignty requirements</a>.</p>
    <div>
      <h3>Traffic exchange methods</h3>
      <a href="#traffic-exchange-methods">
        
      </a>
    </div>
    <p>The Internet is an interconnection of different networks and separate <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">autonomous systems</a> that operate by exchanging data with each other. There are multiple ways to exchange data, but for simplicity, we'll focus on two key methods these networks use to communicate: peering and IP transit. To better understand the benefits of our global backbone, it helps to understand these basic connectivity solutions we use in our network.</p><ol><li><p><b>Peering</b>: The voluntary interconnection of administratively separate Internet networks that allows for traffic exchange between users of each network is known as “<a href="https://www.netnod.se/ix/what-is-peering">peering</a>”. Cloudflare is one of the <a href="https://bgp.he.net/report/exchanges#_participants">most peered networks</a> globally. We have peering agreements with ISPs and other networks in 330 cities and across all major <a href="https://www.cloudflare.com/learning/cdn/glossary/internet-exchange-point-ixp/">Internet Exchanges (IXs)</a>. Interested parties can register to <a href="https://www.cloudflare.com/partners/peering-portal/">peer with us</a> anytime, or directly connect to our network with a link through a <a href="https://developers.cloudflare.com/network-interconnect/pni-and-peering/">private network interconnect (PNI)</a>.</p></li><li><p><b>IP transit</b>: A paid service that allows traffic to cross or "transit" somebody else's network, typically connecting a smaller Internet service provider (ISP) to the larger Internet. Think of it as paying a toll to access a private highway with your car.</p></li></ol><p>The backbone is a dedicated high-capacity optical fiber network that moves traffic between Cloudflare’s global data centers, where we interconnect with other networks using the traffic exchange methods mentioned above. It enables data transfers that are more reliable than over the public Internet. For connectivity within a city and for long-distance connections, we manage our own dark fiber or lease wavelengths using Dense Wavelength Division Multiplexing (DWDM). DWDM is a fiber optic technology that increases network capacity by transmitting multiple data streams simultaneously on different wavelengths of light within the same fiber. It’s like having a highway with multiple lanes, so that more cars can use the same highway at once. We buy and lease these services from our global carrier partners all around the world.</p>
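<p>To put the highway analogy in numbers: the capacity of a DWDM system is simply the number of wavelengths multiplied by the per-wavelength data rate. The channel count and rate below are typical for modern coherent systems and are purely illustrative, not figures from our deployment:</p>

```python
def fiber_capacity_tbps(channels: int, gbps_per_channel: int) -> float:
    """Total capacity of one DWDM fiber, in Tbps."""
    return channels * gbps_per_channel / 1000

# e.g. a system carrying 96 wavelengths at 400 Gbps each:
print(fiber_capacity_tbps(96, 400))  # 38.4 (Tbps on a single fiber)
```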
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1RgjDtW5LehGZEYXey4AQH/cfef08965313f67c84a052e0541fc42b/BLOG-2432-3.png" />
          </figure><p></p>
    <div>
      <h3>Backbone operations and benefits</h3>
      <a href="#backbone-operations-and-benefits">
        
      </a>
    </div>
    <p>Operating a global backbone is challenging, which is why many competitors don’t do it. We take on this challenge for two key reasons: traffic routing control and cost-effectiveness.</p><p>With IP transit, we rely on our transit partners to carry traffic from Cloudflare to the ultimate destination network, introducing unnecessary third-party reliance. In contrast, our backbone gives us full control over routing of both internal and external traffic, allowing us to manage it more effectively. This control is crucial because it lets us optimize traffic routes, usually resulting in the lowest latency paths. Furthermore, serving large traffic volumes through the backbone is, on average, more cost-effective than IP transit. This is why we are doubling down on backbone capacity in cities such as Frankfurt, London, Amsterdam, Paris, and Marseille, where we see continuous traffic growth and where connectivity solutions are widely available and competitively priced.</p><p>Our backbone serves both internal and external traffic. Internal traffic includes customer traffic using our security or performance products and traffic from Cloudflare's internal systems that shift data between our data centers. <a href="http://blog.cloudflare.com/introducing-regional-tiered-cache">Tiered caching</a>, for example, optimizes our caching delivery by dividing our data centers into a hierarchy of lower tiers and upper tiers. If lower-tier data centers don’t have the content, they request it from the upper tiers. If the upper tiers don’t have it either, they then request it from the origin server. This process reduces origin server requests and improves cache efficiency. Using our backbone to transport the cached content between lower and upper-tier data centers and the origin is often the most cost-effective method, considering the scale of our network. <a href="https://www.cloudflare.com/network-services/products/magic-transit/">Magic Transit</a> is another example, where we use BGP anycast to attract traffic to the Cloudflare data center closest to the end user and apply our DDoS mitigation there. Our backbone transports the clean traffic to our customer’s data center, which is connected to us through a <a href="http://blog.cloudflare.com/cloudflare-network-interconnect">Cloudflare Network Interconnect (CNI)</a>.</p><p>External traffic that we carry on our backbone can be traffic from other origin providers like AWS, Oracle, Alibaba, Google Cloud Platform, or Azure, to name a few. The origin responses from these cloud providers are transported through peering points and our backbone to the Cloudflare data center closest to our customer. By leveraging our backbone, we have more control over how we backhaul this traffic throughout our network, which results in greater reliability, better performance, and less dependency on the public Internet.</p><p>This interconnection between public clouds, offices, and the Internet with a controlled layer of performance, security, programmability, and visibility running on our global backbone is our <a href="http://blog.cloudflare.com/welcome-to-connectivity-cloud">Connectivity Cloud</a>.</p>
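<p>The tiered caching flow described above — lower tier, then upper tier, then origin — can be sketched as a chain of lookups. The dictionaries below are a toy stand-in for real cache stores, not how our caches are implemented:</p>

```python
def tiered_fetch(key, lower, upper, origin):
    """Resolve a request through the cache hierarchy.

    Returns (value, served_from) so we can see which layer answered.
    """
    if key in lower:
        return lower[key], "lower-tier"
    if key in upper:
        # Populate the lower tier on the way back (cache fill).
        lower[key] = upper[key]
        return upper[key], "upper-tier"
    # Miss at every tier: only now do we contact the origin server.
    value = origin(key)
    upper[key] = value
    lower[key] = value
    return value, "origin"

lower, upper = {}, {"/logo.png": b"png-bytes"}
origin = lambda key: b"fresh-from-origin"

print(tiered_fetch("/logo.png", lower, upper, origin))    # upper-tier hit
print(tiered_fetch("/logo.png", lower, upper, origin))    # now a lower-tier hit
print(tiered_fetch("/index.html", lower, upper, origin))  # full miss -> origin
```

<p>Every request answered by the lower or upper tier is one less trip to the origin — and when a fill does have to cross tiers, carrying it over the backbone keeps that traffic off third-party transit.</p>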
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1Fk6k5NOgfOM3qpK0z3wb0/2fe9631dbe6b2dfc6b3c3cd0156f293e/Screenshot_2024-08-28_at_3.21.50_PM.png" />
          </figure><p><sub><i>This map is a simplification of our current backbone network and does not show all paths</i></sub></p><p></p>
    <div>
      <h3>Expanding our network</h3>
      <a href="#expanding-our-network">
        
      </a>
    </div>
    <p>As mentioned in the introduction, we have increased our backbone capacity (Tbps) by more than 500% since 2021. With the addition of sub-sea cable capacity to Africa, we achieved a big milestone in 2023 by completing our global backbone ring. It now reaches six continents through terrestrial fiber and subsea cables.</p><p>Building out our backbone within regions where Internet infrastructure is less developed compared to markets like Central Europe or the US has been a key strategy for our latest network expansions. We have a shared goal with regional ISP partners to keep our data flow localized and as close as possible to the end user. Traffic often takes inefficient routes outside the region due to the lack of sufficient local peering and regional infrastructure. This phenomenon, known as traffic tromboning, occurs when data is routed through more cost-effective international routes and existing peering agreements.</p><p>Our regional backbone investments in countries like India and Turkey aim to reduce the need for such inefficient routing. With our own in-region backbone, traffic can be directly routed between in-country Cloudflare data centers, such as from Mumbai to New Delhi to Chennai, reducing latency, increasing reliability, and helping us to provide the same level of service quality as in more developed markets. We can ensure that data stays local, supporting our Data Localization Suite (<a href="https://www.cloudflare.com/data-localization/">DLS</a>), which helps businesses comply with regional data privacy laws by controlling where their data is stored and processed.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4WCNB78y1jHHsid46pBZOo/e950ced1e510cb8caeea0961c43ea8a0/BLOG-2432-5.png" />
          </figure><p></p>
    <div>
      <h3>Improved latency and performance</h3>
      <a href="#improved-latency-and-performance">
        
      </a>
    </div>
    <p>This strategic expansion has not only extended our global reach but has also significantly improved our overall latency. One illustration of this is that since the deployment of our backbone between Lisbon and Johannesburg, we have seen a major performance improvement for users in Johannesburg. Customers benefiting from this improved latency include, for example, a financial institution running its APIs through us for real-time trading, where milliseconds can impact trades, or our <a href="https://www.cloudflare.com/network-services/products/magic-wan/">Magic WAN</a> users, for whom we facilitate site-to-site connectivity between branch offices.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1o0H8BNLf5ca8BBx38Q5Ee/5b22f7c0ad1c5c49a67bc5149763e81d/BLOG-2432-6.png" />
          </figure><p></p><p>The table above shows an example where we measured the round-trip time (RTT) for an uncached origin fetch, from an end-user in Johannesburg to various origin locations, comparing our backbone and the public Internet. By carrying the origin request over our backbone, as opposed to IP transit or peering, local users in Johannesburg get their content up to 22% faster. By using our own backbone to long-haul the traffic to its final destination, we are in complete control of the path and performance. This improvement in latency varies by location, but consistently demonstrates the superiority of our backbone infrastructure in delivering high performance connectivity.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ZEEZJERWQ2UB1sdTjWUtM/f90b11507ab24edbf84e9b4cfb9b1155/BLOG-2432-7.png" />
          </figure><p></p>
    <div>
      <h3>Traffic control</h3>
      <a href="#traffic-control">
        
      </a>
    </div>
    <p>Consider a navigation system using 1) GPS to identify the route and 2) a highway toll pass that is valid until your final destination and allows you to drive straight through toll stations without stopping. Our backbone works quite similarly.</p><p>Our global backbone is built upon two key pillars. The first is BGP (<a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">Border Gateway Protocol</a>), the routing protocol for the Internet, and the second is Segment Routing MPLS (<a href="https://www.cloudflare.com/learning/network-layer/what-is-mpls/">Multiprotocol label switching</a>), a technique for steering traffic across predefined forwarding paths in an IP network. By default, Segment Routing provides end-to-end encapsulation from ingress to egress routers, in which the intermediate nodes perform no route lookups. Instead, they forward traffic across an end-to-end virtual circuit, or tunnel, called a label-switched path. Once traffic is put on a label-switched path, it cannot detour onto the public Internet and must continue on the predetermined route across Cloudflare’s backbone. This is nothing new, as many networks will even run a “BGP Free Core” where all the route intelligence is carried at the edge of the network, and intermediate nodes only participate in forwarding from ingress to egress.</p><p>By leveraging Segment Routing Traffic Engineering (SR-TE) in our backbone, we can automatically select paths between our data centers that are optimized for latency and performance. Sometimes the “shortest path” in terms of routing protocol cost is not the lowest latency or highest performance path.</p>
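<p>This difference is easy to demonstrate: run the same shortest-path computation over two different per-link metrics. The topology and numbers below are invented for illustration, not our actual network:</p>

```python
import heapq

def shortest_path(graph, src, dst, metric):
    """Dijkstra over whichever per-link metric we choose."""
    queue, seen = [(0, src, [src])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, metrics in graph[node].items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + metrics[metric], nxt, path + [nxt]))
    return None

# Each link carries two metrics: an IGP cost and a measured latency in ms.
graph = {
    "AMS": {"FRA": {"igp": 10, "ms": 9},   "LHR": {"igp": 10, "ms": 4}},
    "FRA": {"AMS": {"igp": 10, "ms": 9},   "SIN": {"igp": 10, "ms": 160}},
    "LHR": {"AMS": {"igp": 10, "ms": 4},   "SIN": {"igp": 30, "ms": 140}},
    "SIN": {"FRA": {"igp": 10, "ms": 160}, "LHR": {"igp": 30, "ms": 140}},
}
print(shortest_path(graph, "AMS", "SIN", "igp"))  # cheapest IGP path: via FRA
print(shortest_path(graph, "AMS", "SIN", "ms"))   # lowest latency: via LHR
```

<p>With SR-TE, the lower-latency path is what gets encoded as a segment list (label stack) at the ingress router, so intermediate nodes simply forward along it regardless of what the IGP shortest path would be.</p>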
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6QettBytPdJxacwVLVHYFN/de95a8e5a67514e64931fbe4d26967b6/BLOG-2432-8.png" />
          </figure>
    <div>
      <h3>Supercharged: Argo and the global backbone</h3>
      <a href="#supercharged-argo-and-the-global-backbone">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/application-services/products/argo-smart-routing/">Argo Smart Routing</a> is a service that uses Cloudflare’s portfolio of backbone, transit, and peering connectivity to find the most performant path between the data center where a user’s request lands and your back-end origin server. Argo may forward a request from one Cloudflare data center to another on the way to an origin if the performance would improve by doing so. <a href="http://blog.cloudflare.com/orpheus-saves-internet-requests-while-maintaining-speed">Orpheus</a> is the counterpart to Argo, and routes around degraded paths for all customer origin requests free of charge. Orpheus is able to analyze network conditions in real-time and actively avoid reachability failures. Customers with Argo enabled get optimal performance for requests from Cloudflare data centers to their origins, while Orpheus provides error self-healing for all customers universally. By mixing our global backbone using Segment Routing as an underlay with <a href="https://www.cloudflare.com/application-services/products/argo-smart-routing/">Argo Smart Routing</a> and Orpheus as our connectivity overlay, we are able to transport critical customer traffic along the most optimized paths that we have available.</p><p>So how exactly does our global backbone fit together with Argo Smart Routing? 
<a href="http://blog.cloudflare.com/argo-and-the-cloudflare-global-private-backbone">Argo Transit Selection</a> is an extension of Argo Smart Routing where the lowest latency path between Cloudflare data center hops is explicitly selected and used to forward customer origin requests. The lowest latency path will often be our global backbone, as it is a more dedicated and private means of connectivity, as opposed to third-party transit networks.</p><p>Consider a multinational Dutch pharmaceutical company that relies on Cloudflare's network and services with our <a href="https://www.cloudflare.com/learning/access-management/what-is-sase/">SASE solution</a> to connect their global offices, research centers, and remote employees. Their Asian branch offices depend on Cloudflare's security solutions and network to provide secure access to important data from their central data centers back to their offices in Asia. In case of a cable cut between regions, our network would automatically look for the best alternative route between them so that business impact is limited.</p><p>Argo measures every potential combination of the different provider paths, including our own backbone, as an option for reaching origins with smart routing. Because of our vast interconnection with so many networks, and our global private backbone, Argo is able to identify the most performant network path for requests. The backbone is consistently one of the lowest latency paths for Argo to choose from.</p><p>In addition to high performance, we care greatly about network reliability for our customers. This means we need to be as resilient as possible from fiber cuts and third-party transit provider issues. 
During a disruption of the <a href="https://en.wikipedia.org/wiki/AAE-1">AAE-1</a> (<a href="https://www.submarinecablemap.com/submarine-cable/asia-africa-europe-1-aae-1">Asia Africa Europe-1</a>) submarine cable, this is what Argo saw between Singapore and Amsterdam across some of our transit provider paths vs. the backbone.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/66CGBePnLzuLRuTErvf8Cr/813b4b60a95935491e967214851e5a04/BLOG-2432-9.png" />
          </figure><p>The large (purple line) spike shows a latency increase on one of our third-party IP transit provider paths due to congestion, which was eventually resolved, likely following traffic engineering within the provider’s network. We saw a smaller latency increase (yellow line) over other transit networks, but still one that is noticeable. The bottom (green) line on the graph is our backbone, where round-trip time remains more or less flat throughout the event, due to our diverse backbone connectivity between Asia and Europe. Throughout the fiber cut, we remained stable at around 200ms between Amsterdam and Singapore. There was no noticeable network hiccup as was seen on the transit provider paths, so Argo actively leveraged the backbone for optimal performance.</p>
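<p>Conceptually, the path choice illustrated here reduces to picking the candidate with the lowest recently measured RTT. The toy model below uses invented path names and numbers, not Argo’s actual algorithm or measurements:</p>

```python
def pick_path(measurements, window=3):
    """Choose the candidate path with the lowest average RTT
    over the last `window` probes."""
    def recent_avg(samples):
        tail = samples[-window:]
        return sum(tail) / len(tail)
    return min(measurements, key=lambda path: recent_avg(measurements[path]))

# Probe history (ms) for three hypothetical Singapore -> Amsterdam paths
# during a submarine cable disruption:
measurements = {
    "transit-A": [205, 210, 390, 410, 400],  # congested after the cable cut
    "transit-B": [215, 220, 250, 260, 255],  # degraded, but less so
    "backbone":  [200, 201, 199, 202, 200],  # flat throughout
}
print(pick_path(measurements))  # backbone
```

<p>Because the comparison runs continuously on fresh probes, a path that spikes is steered around automatically, and traffic returns to it once its recent average recovers.</p>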
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1A8CdaGq8P2hF3DtIs9dQI/a10fdf3af9de917fb0036d38eace9905/BLOG-2432-10.png" />
          </figure>
    <div>
      <h3>Call to action</h3>
      <a href="#call-to-action">
        
      </a>
    </div>
    <p>As Argo improves performance in our network, Cloudflare Network Interconnects (<a href="https://developers.cloudflare.com/network-interconnect/">CNIs</a>) optimize getting onto it. We encourage our Enterprise customers to use our free CNIs as on-ramps onto our network whenever practical. In this way, you can fully leverage our network, including our robust backbone, and increase overall performance for every product within your Cloudflare Connectivity Cloud. In the end, our global network is our main product, and our backbone plays a critical role in it. By improving our services for everybody, everywhere, we continue to help build a better Internet.</p><p>If you want to be part of our mission, join us as a Cloudflare network on-ramp partner to offer secure and reliable connectivity to your customers by integrating directly with us. Learn more about our on-ramp partnerships and how they can benefit your business <a href="https://www.cloudflare.com/network-onramp-partners/">here</a>.</p>
            <category><![CDATA[Connectivity Cloud]]></category>
            <category><![CDATA[Anycast]]></category>
            <category><![CDATA[Argo Smart Routing]]></category>
            <category><![CDATA[Athenian Project]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Better Internet]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Magic Transit]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">WiHZr8Fb6WzdVjo0egsWW</guid>
            <dc:creator>Shozo Moritz Takaya</dc:creator>
            <dc:creator>Bryton Herdes</dc:creator>
        </item>
        <item>
            <title><![CDATA[Exam-ining recent Internet shutdowns in Syria, Iraq, and Algeria]]></title>
            <link>https://blog.cloudflare.com/syria-iraq-algeria-exam-internet-shutdown/</link>
            <pubDate>Fri, 21 Jun 2024 13:00:02 GMT</pubDate>
            <description><![CDATA[ Similar to actions taken over the last several years, governments in Syria, Iraq, and Algeria have again disrupted Internet connectivity nationwide in an attempt to prevent cheating on exams. We investigate how these disruptions were implemented, and their impact ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The practice of cheating on exams (or at least attempting to) is presumably as old as the concept of exams itself, especially when the results of the exam can have significant consequences for one’s academic future or career. As access to the Internet became more ubiquitous with the growth of mobile connectivity, and communication easier with an assortment of social media and messaging apps, a new avenue for cheating on exams emerged, potentially facilitating the sharing of test materials or answers. <a href="https://www.theguardian.com/technology/2016/may/18/iraq-shuts-down-internet-to-stop-pupils-cheating-in-exams">Over the last decade</a>, some governments have reacted to this perceived risk by taking aggressive action to prevent cheating, ranging from targeted DNS-based blocking/filtering to multi-hour nationwide shutdowns across multi-week exam periods.</p><p>Syria and Iraq are well-known practitioners of the latter approach, and we have covered past exam-related Internet shutdowns in Syria (<a href="/syria-exam-related-internet-shutdowns">2021</a>, <a href="/syria-sudan-algeria-exam-internet-shutdown">2022</a>, <a href="/q2-2023-internet-disruption-summary">2023</a>) and Iraq (<a href="/syria-sudan-algeria-exam-internet-shutdown">2022</a>, <a href="/exam-internet-shutdowns-iraq-algeria">2023</a>) here on the Cloudflare blog. It is now mid-June 2024, and exams in both countries took place over the last several weeks, and with those exams, regular nationwide Internet shutdowns. In addition, Baccalaureate exams also took place in Algeria, and we have written about related Internet disruptions there in the past (<a href="/syria-sudan-algeria-exam-internet-shutdown">2022</a>, <a href="/exam-internet-shutdowns-iraq-algeria">2023</a>). 
However, in contrast to the single daily shutdowns in Syria and Iraq, the Algerian government opted instead for two multi-hour disruptions each day – one in the morning, one in the afternoon – and appears to be pursuing a content blocking strategy, rather than a full nationwide shutdown.</p><p>As we have done in past years’ posts, we will not only examine the impact that these shutdowns have on Internet traffic, but also analyze routing information and traffic from other Cloudflare services in an effort to better understand how these shutdowns are being implemented.</p>
    <div>
      <h3>Syria</h3>
      <a href="#syria">
        
      </a>
    </div>
    <p>The Syrian Telecom Company, to their credit, publishes an exam schedule on social media, with the image below <a href="https://www.facebook.com/photo/?fbid=827972736029921&amp;set=a.449047400589125">published to their Facebook page</a>. The English version was created by applying Google Translate to the image. The schedule shows the date &amp; time of each Internet shutdown (“disconnection”), in addition to the subject(s) of that day’s exam(s). In 2024, exams started on May 26, and went through June 13.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/9lgp9NIpBSo2RWII1nUmh/1331a833433eda16be39e4fda41d3413/Screenshot-2024-06-20-at-1.00.58-PM.png" />
            
            </figure><p>In Syria, <a href="https://radar.cloudflare.com/as29256">AS29256 (Syrian Telecom)</a> is effectively the Internet, as shown <a href="https://radar.cloudflare.com/routing/as29256">in the table below</a>. While there are a few other <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">autonomous systems</a> (ASNs/ASes) registered in Syria, there are only two that currently announce IP address space to the public Internet. As such, the trends seen at a country level for Syria reflect those seen for AS29256, and this is clearly evident in the traffic graphs below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2XEcIEFQPHIKevCJZkVbaM/3be900a49f17d26e90505a2c77704bc0/unnamed--1--2.png" />
            
            </figure><p>Nationwide Internet shutdowns in Syria began on May 26, taking place for varying multi-hour periods from Sunday to Thursday for three consecutive weeks. The graphs below show Internet traffic from the country, as well as AS29256, dropping to zero during the scheduled shutdowns.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6uW4E8cmeiDolFehpOFXaE/113ee0007138f1eccebd2cec87ae2891/image42.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/FjHhDRQjGou5kNsIZNzQp/3cb60e3a92c1142f1c483b942db5afa2/image5-3.png" />
            
            </figure><p>In addition, graphs from the Cloudflare Radar <a href="https://radar.cloudflare.com/routing/">Routing</a> pages for <a href="https://radar.cloudflare.com/routing/sy">Syria</a> and <a href="https://radar.cloudflare.com/routing/as29256">AS29256</a> show the number of IPv4 and IPv6 prefixes being announced country-wide and by AS29256 dropping to at or near zero during each shutdown. This ultimately means that there is no Internet path back to systems (IP addresses) connected to Syrian Telecom. Below, we explore why this is important and problematic.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5xFTZkPlwMrvEhmn5Tnz42/b7eaffa70e91993663d39a1b5fff9682/image4-4.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1NhFb2khiuAhEJ9Fo8QN19/95f146770a68bf63b87726099eff0143/image47.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3QesstGGtNnaKfBMpRO6GL/f1ea77bb39c751aae45de68096426e00/image15-1.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7yXJYfsxpfK1mwQs5a2ZLw/b7432284cdd28841c22041eb1d4e323a/image30.png" />
            
            </figure><p>As has been <a href="/syria-sudan-algeria-exam-internet-shutdown">observed in the past</a>, the shutdowns in Syria are <a href="https://x.com/DougMadory/status/1138064496008806400">asymmetrical</a>. That is, traffic can exit the country (via AS29256), but there are no paths for responses to return. The impact of this approach is clearly evident in traffic to <a href="https://one.one.one.one/dns/">Cloudflare’s 1.1.1.1 DNS Resolver</a>. We continue to see traffic to the resolver when the shutdowns take place, and in fact, we see the traffic spike during the shutdowns, as the graph below shows.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2LSn47S6TsjvoLuDx5fKPU/6cc2d335fcbdf706e26887b84d873824/image49.png" />
            
            </figure><p>If we dig into traffic to 1.1.1.1 by protocol, we can see that it is driven by requests over <a href="https://www.cloudflare.com/learning/ddos/glossary/user-datagram-protocol-udp/">UDP</a> port 53, the <a href="https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?&amp;page=2">standard port</a> used for DNS requests over UDP and TCP. (Given the request pattern, UDP also appears to be the primary transport for the resolver traffic that we see from Syria.)</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4CDPlKnjVUF6ViD75S2Zwa/e27da9a43911fd9808bbcadbd097477a/image12-1.png" />
            
            </figure><p>If we remove the UDP line from the graph, we see that request volume for DNS over TCP port 53, as well as <a href="https://developers.cloudflare.com/1.1.1.1/encryption/dns-over-https/">DNS over HTTPS (DoH)</a> and <a href="https://developers.cloudflare.com/1.1.1.1/encryption/dns-over-tls/">DNS over TLS (DoT)</a>, all drops to zero during the shutdowns.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/vWtoXnCHPogP4HfSjm2G5/ea6f8b6fb58f58cff01c70d9ee592f75/image1-18.png" />
            
            </figure><p>Similarly, we can clearly see the shutdowns in HTTP(S) request-based traffic graphs as well, since HTTP(S) is also a TCP-based protocol.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7phbRNoQ9Kg4L1n5F7chjg/572136f844df3879c4486374f5bc5092/image35.png" />
            
            </figure><p>Why do we see this impact? With DNS over UDP, the client simply makes a request to the resolver – no multi-step handshake is involved, as with TCP. So in this case, 1.1.1.1 is receiving these requests, but as shown above, there’s no path for the response to reach the client. Because it hasn’t received a response, the client retries the request, and this flood of retries is manifested as the spike seen in the graphs above.</p><p>However, as we see above, request volume for DNS over TCP, as well as DoH, DoT, and HTTP(S) (which all use TCP), falls to zero during the shutdowns. The lack of a path back to the client means that the <a href="https://www.geeksforgeeks.org/tcp-3-way-handshake-process/">TCP 3-way handshake</a> can’t complete, and thus we don’t see DNS requests over these protocols.</p><p>In looking at 1.1.1.1 Resolver request volume from Syria for popular social media and messaging applications, we can see traffic for facebook.com most closely matches the spikes shown above. Removing facebook.com from the graph, we can also see similar, though more limited, increases for domains used by popular messaging applications WhatsApp, Signal, and Telegram. Facebook and WhatsApp are <a href="https://medialandscapes.org/country/syria/media/social-networks">reportedly</a> the most popular social media and messaging applications in Syria.</p>
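<p>To make the retry dynamic concrete, here is a minimal sketch (in Python, purely illustrative, and not a description of any real client’s implementation) of how a stub resolver behaves when DNS responses never return: each UDP query that times out is simply re-sent, multiplying inbound query volume at the resolver even though no answers ever arrive.</p>

```python
import struct

def build_dns_query(hostname, qtype=1, txid=0x1234):
    """Build a minimal DNS query packet: 12-byte header plus one question.

    qtype=1 is an A-record query; flags 0x0100 sets 'recursion desired'."""
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode() for label in hostname.split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)  # qclass 1 = IN

def query_with_retries(send, retries=3):
    """Model a stub resolver over UDP: fire the query, and if no response
    arrives before the timeout, send it again.

    `send` stands in for a socket sendto/recvfrom cycle; it returns the
    response bytes, or None on timeout (as during an asymmetric shutdown,
    when the query reaches the resolver but no return path exists)."""
    attempts = 0
    response = None
    for _ in range(retries):
        attempts += 1
        response = send()
        if response is not None:
            break
    return response, attempts
```

<p>During a shutdown, the send cycle always times out, so a single lookup generates several inbound queries; aggregated across a country’s clients, those retransmissions appear as spikes like the ones in the graphs above. A TCP-based lookup, by contrast, fails during the handshake and never produces a DNS query at all.</p>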
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7nmo0Zbae88h2VZhbRpd8f/85eaf702c9f638ce544d2b485114aa65/image18.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3gKoz2uKoz2kVIchd0C2vV/7a2a6107c35d3e1433d7d956e0af9fb6/image33.png" />
            
            </figure><p>Although we have focused on the analysis of traffic to Cloudflare’s DNS resolver, and the patterns seen within that traffic, it is also worth highlighting an interesting pattern observed in traffic to Cloudflare’s <a href="https://www.cloudflare.com/application-services/products/dns/">Authoritative DNS</a> platform. (<a href="https://www.cloudflare.com/en-gb/learning/dns/dns-server-types/">DNS resolvers</a> act as a middleman between clients, such as a laptop or phone, and an authoritative DNS server. <a href="https://www.cloudflare.com/en-gb/learning/dns/dns-server-types/">Authoritative DNS servers</a> contain information specific to the domain names they serve, including IP addresses and other types of records.)</p><p>The graph below shows bits/second traffic from Syria for Cloudflare’s <a href="https://www.cloudflare.com/application-services/products/dns/">authoritative DNS service</a> on June 13. (Similar patterns were observed during the other days when shutdowns occurred, but data volume limits the ability to create a graph showing an extended period of time.) In this graph, we can see that at the start of the shutdown (03:00 UTC), traffic rises sharply, effectively plateaus for the duration of the shutdown, and then returns to normal levels. We believe that the traffic pattern illustrated here could be the result of some local resolvers in Syria having the IP addresses for our authoritative DNS servers cached and continuing to make requests to them. The increased traffic level could be because they are retrying their queries after not receiving responses, but in a less aggressive fashion than the client applications driving the resolver traffic spikes shown above.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2fOqZ4ORPIY4znUD1HRb70/06d1cc0e24f13bbb23886eedea3a89a0/unnamed-2.png" />
            
            </figure><p>In summary, Syria appears to be implementing its Internet shutdowns not through filtering, but rather by simply not announcing its IP address space for the duration of the shutdown, thereby preventing any responses from returning to the originating requestor, whether client application, web browser, or local DNS resolver.</p>
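<p>Prefix withdrawals of this kind can be observed from public BGP data. As a sketch, the snippet below counts an AS’s announced prefixes using the RIPEstat Data API’s <code>announced-prefixes</code> endpoint (the endpoint is publicly documented by RIPE NCC, though the response field names used here should be treated as an assumption to verify); during a Syrian shutdown, the count for AS29256 would fall to at or near zero.</p>

```python
import json
from urllib.request import urlopen

# RIPEstat Data API endpoint (public; no authentication required)
RIPESTAT_URL = "https://stat.ripe.net/data/announced-prefixes/data.json?resource=AS{asn}"

def count_prefixes(payload):
    """Tally IPv4 vs IPv6 prefixes in an announced-prefixes response."""
    prefixes = [entry["prefix"] for entry in payload["data"]["prefixes"]]
    ipv6 = sum(1 for p in prefixes if ":" in p)
    return {"ipv4": len(prefixes) - ipv6, "ipv6": ipv6}

def fetch_announced_prefixes(asn):
    """Fetch live announcement data for an AS (network access required)."""
    with urlopen(RIPESTAT_URL.format(asn=asn)) as resp:
        return count_prefixes(json.load(resp))

# Example (requires network access): fetch_announced_prefixes(29256)
```

<p>Polling a count like this at regular intervals, and alerting when it drops sharply, is essentially what the Radar routing graphs above visualize at country and ASN granularity.</p>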
    <div>
      <h3>Iraq</h3>
      <a href="#iraq">
        
      </a>
    </div>
    <p>On May 19, the Iraqi Ministry of Communication <a href="https://moc.gov.iq/?article=767">posted an update</a> that stated (translated) <i>“The Ministry of Communications would like to note that the Internet service will be cut off for two hours during the general exams for intermediate studies, from six in the morning until eight in the morning, based on higher directives and at the request of the Ministry of Education.”</i> The post came nearly a year after the Iraqi Ministry of Communication <a href="https://www.kurdistan24.net/en/story/31453-Iraq%E2%80%99s-communication-ministry-refuses-to-enforce-internet-blackout-for-final-exams">refused a request from the Ministry of Education to shut down the Internet</a> during the baccalaureate exams as part of efforts to prevent cheating. On May 20, the Iraqi Ministry of Education <a href="https://www.facebook.com/Iraq.Ministry.of.Education/posts/pfbid07ny6LazyvGJED37iCmRkk9h9rNPWeEPtANVu8vaL8gknoaBmwgmVZX9a7LkSbhy2l">posted the schedule</a> for the upcoming set of exams to its Facebook page.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Wb00a9bDq0VYe1zhqFHAR/22aecee54a71bfd430789e976315b040/Screenshot-2024-06-21-at-11.07.18.png" />
            
            </figure><p>Iraq has a much richer network service provider environment than Syria does, with <a href="https://radar.cloudflare.com/routing/iq#a-ses-registered-in-iraq">over 150</a> <a href="https://www.cloudflare.com/en-gb/learning/network-layer/what-is-an-autonomous-system/">autonomous systems (ASNs)</a> registered in the country and announcing IP address space, compared to just <a href="https://radar.cloudflare.com/routing/sy#a-ses-registered-in-syria">two</a> (both Syrian Telecom) in Syria. Although traffic in Iraq is generally concentrated among the larger providers, shutdowns are rarely “complete” at a country level because not every autonomous system (network provider) in the country implements a shutdown. (This is due in part to the autonomous Kurdistan region in the north, which often implements similar shutdowns on its own schedule. Network providers in this region are included in Iraq’s country-level graphs.)</p><p>We can see this in a Cloudflare Radar traffic graph that shows the shutdowns at a country level, where traffic dropped by around 87% during each multi-hour shutdown. In addition to the five networks shown here (<a href="https://radar.cloudflare.com/as203214">AS203214 (HulumTele)</a>, <a href="https://radar.cloudflare.com/as199739">AS199739 (Earthlink)</a>, <a href="https://radar.cloudflare.com/as58322">AS58322 (Halasat)</a>, <a href="https://radar.cloudflare.com/as51684">AS51684 (Asiacell)</a>, and <a href="https://radar.cloudflare.com/as59588">AS59588 (Zainas)</a>), further analysis found more than 30 others where we observed a complete loss of traffic during the shutdowns, a number of them downstream of these providers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6YkIihynKuDtI63g32YJ0u/3c6f135f0e890145f3b0be6ba6659553/image45.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6W7vWmG8ivneqIJavrTLCU/1b0248818877d395987fca076af52ce6/image28.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ezNpiALAh32m1LyOng14j/ecb5d055692266f0f9c758976204baae/image38.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3bRSpefwBNbcdvMQ2kOkO1/1b072bfd9ffc362923d9a2db06908068/image8-3.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2WqSWExs5iBdfDlGTqjXlK/63c853a1b688e0bd4a328881a2b3c280/image22.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5iUNjwfTlp0hz8lqgEnaAV/caf0caa06f14d277ce9a874b0fb9738d/image44.png" />
            
            </figure><p>In contrast to Syria, the changes to announced IP address space during the shutdowns are much less severe in Iraq. Several of the shutdowns are correlated with a drop of ~20-25% in announced IPv4 address space, while a few others saw a drop closer to just 2%.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/kAYLZYHK3wnfSh5KNfc33/309f01763d1815a38bf55e1aa42e9725/image51.png" />
            
            </figure><p>At an ASN level, the changes in announced address space were mixed – <a href="https://radar.cloudflare.com/routing/as59588">AS59588 (Zainas)</a>, <a href="https://radar.cloudflare.com/routing/as199739">AS199739 (Earthlink)</a>, and <a href="https://radar.cloudflare.com/routing/as51684">AS51684 (Asiacell)</a> experienced a significant loss, while <a href="https://radar.cloudflare.com/routing/as203214">AS203214 (HulumTele)</a> and <a href="https://radar.cloudflare.com/routing/as58322">AS58322 (Halasat)</a> experienced little to no change.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3HoWb2U5b8uBf9aZX5PJAb/e4ae72f055f2f9fd627251769b16d6bd/image13-3.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1jJpJYqXgDGeb8So4bMWee/34ba179a6e9fff7b311e057134c34d97/image50.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6sVOgW0xr4mwXUU3id092u/6c198aeb32b840f1142e84ec06822f79/image39.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5H64bnOria4mzCtW7VEs40/178e8f4e060b3be172c97b762c334207/image20.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6k1oSwvj9pwTyHD9OMesEV/a28168e19e0f21ce771166872153c9a4/image24.png" />
            
            </figure><p>Similar to Syria, we can also look at 1.1.1.1 resolver traffic data to better understand how the shutdowns are being implemented. The country-level graphs below show no visible change in UDP traffic patterns, which would imply that responses from the resolver are, in fact, getting back to the clients. However, this likely isn't the case: the apparent stability is at least in part an artifact of the graph's time frame and hourly granularity, as well as of the inclusion of resolver traffic from Kurdish network providers (ASNs). The shutdowns are more clearly evident in the DNS-over-TCP and DNS-over-HTTPS graphs below, as well as in the graph for HTTP(S) request traffic (both mobile &amp; desktop), which is also TCP-based. In these graphs, the troughs on days that shutdowns occurred generally dip lower than those on the days that the Internet remained available.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/o1NrRFgAMr5FLKa2j4Ktb/b85d5ccf7a9cd7e560229b58130fc91e/image41.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4pD02P65uRmaiL73oLzAm9/44ab4e57a7e90ce367e2fc3c3f9e467a/image3-8.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2AscDvWNoxmwWeTGcKyDds/a36e4278a86f651bb75f3a789bb4840b/image27.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/51pfAyJ519aXX0w6KYCm5I/d055dafda12380a3b3b46e748e198ded/image32.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3XlAjHrSgzngJulJRIP36J/b5e8b3058c5441e8dcdff18b347c64e8/image16-1.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4zO7k4idsNJRtohBPYgXhD/676ddfe345adfcab86ea380bdcfa7e54/image43.png" />
            
            </figure><p>In looking at authoritative DNS traffic from Iraq during a shutdown (for June 13 as an example day, as above), we see evidence of a decline in traffic during the time the shutdown occurs.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5LQtyB4CbJBlUjw2PrIBjI/76cdc488ae2d8a0cc46c4441f9ac08ee/image34.png" />
            
            </figure><p>The decline in authoritative DNS traffic is more evident at an ASN level, such as in the graph below for AS203214 (HulumTele), effectively confirming that UDP traffic is not getting through here either.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/14fZ29LuL3wsJpTnTVbeku/466dd9542998c77af044647e5e3d49ac/image48.png" />
            
            </figure><p>Taken together, the traffic, 1.1.1.1 Resolver, and authoritative DNS observations reviewed here suggest that the Internet shutdowns taking place in Iraq are more complex than Syria’s, as it appears that both UDP and TCP traffic are unable to egress from impacted network providers. As not all impacted network providers are showing a complete loss of announced IP address space during the shutdowns, Iraq appears to be taking a different approach to disrupting Internet connectivity. Although analysis of our data doesn’t provide a definitive conclusion, there are several likely options, and network providers in the country may be combining more than one. These options revolve around:</p><ol><li><p><b>IP:</b> Block packets from reaching IP addresses. This may be done by withdrawing prefix announcements from the routing table (a brute force approach) or by blocking access to specific IP addresses, such as those associated with a specific application or service (a more surgical approach).</p></li><li><p><b>Connection:</b> Block connections based on <a href="https://www.cloudflare.com/learning/ssl/what-is-sni/">SNI</a>/HTTP headers, or other application data. If a network or on-path device is able to observe the server name (or other relevant headers/data), then the connection can be terminated.</p></li><li><p><b>DNS:</b> Operators of private or ‘internal’ DNS resolvers, offered by ISPs and enterprise environments for use by their own users, can apply content restrictions, blocking the resolution of hostnames associated with websites and other applications.</p></li></ol><p>The consequences of these options are covered in more detail <a href="/consequences-of-ip-blocking">in a blog post</a>. 
In addition, applying them at common network chokepoints, such as <a href="https://iraqixp.com/">AS212330 (IRAQIXP)</a> or <a href="https://radar.cloudflare.com/routing/as208293">AS208293</a> (<a href="https://alsalam.gov.iq/">AlSalam State Company</a>, associated with the Iraqi Ministry of Communications), can disrupt connectivity at multiple downstream ISPs, without those providers necessarily having to take action themselves.</p>
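<p>As a concrete illustration of the third option, the sketch below shows the core of a filtering resolver: a suffix-matched hostname blocklist consulted before resolution. It is a hypothetical toy (the domain names and the <code>upstream</code> callable are invented for illustration), not a description of any specific provider’s system.</p>

```python
# Hypothetical blocklist; a real deployment would load this from configuration.
BLOCKED_SUFFIXES = {"blocked-messaging.example", "blocked-social.example"}

def is_blocked(hostname, blocked=BLOCKED_SUFFIXES):
    """True if the hostname or any parent domain appears on the blocklist."""
    labels = hostname.rstrip(".").lower().split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))

def resolve(hostname, upstream):
    """Toy filtering resolver: refuse blocked names, otherwise delegate.

    `upstream` stands in for a normal recursive lookup; returning None
    models an NXDOMAIN-style refusal (answering with a sinkhole address
    is another common choice)."""
    if is_blocked(hostname):
        return None
    return upstream(hostname)
```

<p>Because this style of filtering only happens on the operator’s own resolvers, users can often bypass it by switching to a public resolver, which is one reason it tends to be combined with IP- or SNI-based blocking.</p>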
    <div>
      <h3>Algeria</h3>
      <a href="#algeria">
        
      </a>
    </div>
    <p>As we noted in blog posts in <a href="/syria-sudan-algeria-exam-internet-shutdown">2022</a> and <a href="/exam-internet-shutdowns-iraq-algeria">2023</a>, Algeria has a history of disrupting Internet connectivity during Baccalaureate exams. This has been taking place since <a href="https://www.bbc.com/news/world-africa-44557028">2018</a>, following widespread cheating in 2016 that saw questions leaked online both before and during tests. On March 13, the Algerian Ministry of Education <a href="https://www.aps.dz/en/health-science-technology/51394-ministry-of-education-announces-dates-for-middle-school-and-high-school-final-exams">announced</a> that the Baccalaureate exams would be held June 9-13. As expected, Internet disruptions were observed both country-wide and at a network level. Similar to previous years, two disruptions were observed each day. The first one began at 08:00 local time (07:00 UTC), and except for June 9, lasted three hours, ending at 11:00 local time (10:00 UTC). (On June 9, it lasted until 13:00 local time (12:00 UTC).) The second one began between 14:00-14:30 local time (13:00-13:30 UTC), and lasted until 16:00-17:00 local time (15:00-16:00 UTC) – the end time varied by day.</p><p>As seen in the graphs below, the impact to traffic was fairly nominal, suggesting that wide scale Internet shutdowns similar to those seen in Syria were not being implemented. While this is in line with 2023’s <a href="https://x.com/TheAlgiersPost/status/1535917324485656576">pronouncement</a> by the Minister of Education that there would be no Internet shutdown on exam days, <a href="https://x.com/search?f=live&amp;q=algeria%20exam%20until%3A2024-06-13%20since%3A2024-06-09&amp;src=typed_query">a number of posts on X</a> complained of broader cuts to Internet connectivity.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5b4bFM6GwqaslpXJNGxCSu/063e3729ef3197b88dfe367cc91fc0f4/image14-2.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5I0HpVYB36DNHhycNbRUjp/ecc17427d720b5ca90ed554946924dfc/image17.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1pOO1TV09EyRfv5hRlV3sX/950600eac0cd068804f2ce990e93e112/image25.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1brvTsdkG8vlUDB7CNQ6vL/2c4ba312f499adc4893fb8afa5378ca6/image2-6.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4mOPVdahSJQ0n6gzs7cGSL/86fa9d9e06718da3fa7a291c055773b3/image37.png" />
            
            </figure><p>Similar to the analysis above of the shutdowns in Syria and Iraq, we can also review changes to announced IP address space to better understand how connectivity was being disrupted. In this case, as the graphs below show, no meaningful changes to announced IPv4 address space were observed during the days the Baccalaureate exams were given. As such, the observed drops in traffic were not caused by routing changes.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7l6qUEMRcrdvShbzdd8y5I/5696832adbe82ab92cef0265df11bdc9/image52.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3gEPeuP0l9czg2nipM8Jjb/8f49d22da1c031d1db5ed2577ec8462f/image21.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6m0XeCwN3lYLot6AutTNaS/7b0034f1106d4e99eaaec28b1cd8e9a5/image6-3.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3oiRlajqzZGBY5Io0dbXkj/234f0120c0002f7ffcf69b6a8bbe0038/image40.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Kv6LndNtCYpsJUhUPF8UT/ae8359550b84298abd5d0994b475cd76/AD_4nXdmsCY4R4OwP5lh6E6PgQdXYDxwUTWl8o5A-sRdNCSBmRNe0Zq7-OlWczYH8tr8q75P8WLqOsd3Po-03gykFfJDJNgqXcOkX4i3KuVp73q1GW7aLXeTNAzkK7yU" />
            
            </figure><p>In the HTTP(S) request traffic graph below, the twice-daily disruptions are highlighted, with the morning one appearing as a nominal drop in traffic, and the afternoon one causing a more severe decline. (The graph shows request traffic aggregated at a country level, but the graphs for the ASNs listed above also show similar patterns.)</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7afFEaI4r2fREfV0o54f2/68f9fe97d63624a99e0868d0de5df096/image19.png" />
            
            </figure><p>In addition, similar patterns are observed in 1.1.1.1 resolver traffic at a country and ASN level, but only for DNS over TCP, DNS over TLS, and DNS over HTTPS, all of which leverage TCP. In the graph below showing only resolver traffic over UDP, there’s no clear evidence of disruptions. However, in the graph that shows resolver traffic over HTTPS, TCP, and TLS, a slight perturbation is visible in the morning, as traffic begins to rise for the day, and a sharper decrease is visible in the afternoon, with both disruptions aligning with the twice daily drops in traffic discussed above.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ydmjTs8hCqfZ4W969jSSy/4120e9a30b2c0c2015c6f097c1f8ee8b/image31.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7qSu9orpg29sE5T2cawABq/1ec368088682d0f3d34f1287b3ed7d03/image7-2.png" />
            
            </figure><p>These observations support the conjecture that the Algerian government is likely taking a more nuanced approach to restricting access to content, interfering in some fashion with TCP-based traffic. The conjecture is also supported by an internal tool that helps to understand connection tampering that is based on research co-designed and developed by members of the <a href="https://research.cloudflare.com/">Cloudflare Research</a> team. We will be launching insights into TCP connection tampering on Cloudflare Radar later in 2024 and, in the meantime, technical details can be found in the peer-reviewed paper titled <a href="https://research.cloudflare.com/publications/SundaraRaman2023/">Global, Passive Detection of Connection Tampering</a>.</p><p>The graph below, taken from the internal tool, highlights observed TCP connection tampering in connections from Algeria during the week that the Baccalaureate exams took place. While some baseline level of post-ACK and post-PSH tampering is consistently visible, we see significant increases in post-ACK twice a day during the exam period, at the times that align with the shifts in traffic discussed above. Technical descriptions of post-ACK and post-PSH tampering can be found in the <a href="https://developers.cloudflare.com/radar/glossary/#tcp-resets-and-timeouts">Cloudflare Radar glossary</a>, but in short, tampering post-ACK means an established TCP connection to Cloudflare’s server has been abruptly ended by one or more RST packets <i>before</i> the server sees data packets. Although clients do use RSTs, clients are more likely to close connections with a FIN (as specified by the <a href="https://datatracker.ietf.org/doc/html/rfc9293">RFC</a>). 
The RST method can also be used by middleboxes that (i) see the data packet, then (ii) drop the data packet, then (iii) send an RST to the server to force the server to close the connection (and very likely another RST to the client, for the same reason). Tampering post-PSH means that something on the path, like a middlebox, (i) saw something it didn't like on an established connection, then (ii) permitted the data to pass, but then (iii) sent an RST to force the endpoints to close the connection.</p>
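<p>The distinction between the two signatures can be sketched as a simple classification over the packets a server observes on one connection. The function below is a simplified, server-side illustration of that idea (not the detection pipeline described in the paper), assuming a flow is summarized as an ordered list of packet flags:</p>

```python
def classify_reset(events):
    """Classify an RST-terminated TCP flow, as seen from the server side.

    `events` is the ordered list of packet types the server observed on one
    connection, e.g. ["SYN", "ACK", "RST"]. Returns a coarse label in the
    spirit of the post-ACK / post-PSH distinction described above."""
    if "RST" not in events:
        return "no-reset"
    before_rst = events[:events.index("RST")]
    if "PSH" in before_rst:
        return "post-PSH"       # data reached the server before the reset
    if "ACK" in before_rst:
        return "post-ACK"       # handshake completed, but no data arrived
    return "pre-established"    # reset before the handshake completed
```

<p>Under this toy model, an SNI-triggered middlebox that drops the client’s first data packet and resets the connection produces a “post-ACK” flow at the server, matching the twice-daily increases described above.</p>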
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/75i4amnWWtCkUaMhftCllH/a612735891aef71331339b9164171d48/image11-1.png" />
            
            </figure><p>Looking beyond Cloudflare-sourced data, aggregated test results from the <a href="https://ooni.org/">Open Observatory of Network Interference (OONI)</a> also show evidence of anomalous behavior. Users of <a href="https://ooni.org/install/">OONI Probe</a>, a mobile and desktop app, can probe for potential blocking of websites, instant messaging apps, and censorship circumvention tools. Examining test results from users in Algeria for popular messaging platforms <a href="https://explorer.ooni.org/chart/mat?probe_cc=DZ&amp;since=2024-06-01&amp;until=2024-06-15&amp;time_grain=day&amp;axis_x=measurement_start_day&amp;test_name=whatsapp">WhatsApp</a>, <a href="https://explorer.ooni.org/chart/mat?probe_cc=DZ&amp;since=2024-06-01&amp;until=2024-06-15&amp;time_grain=day&amp;axis_x=measurement_start_day&amp;test_name=telegram">Telegram</a>, <a href="https://explorer.ooni.org/chart/mat?probe_cc=DZ&amp;since=2024-06-01&amp;until=2024-06-15&amp;time_grain=day&amp;axis_x=measurement_start_day&amp;test_name=signal">Signal</a>, and <a href="https://explorer.ooni.org/chart/mat?probe_cc=DZ&amp;since=2024-06-01&amp;until=2024-06-15&amp;time_grain=day&amp;axis_x=measurement_start_day&amp;test_name=facebook_messenger">Facebook Messenger</a> for the first two weeks of June, we clearly see the appearance of test results marked as “Anomaly” starting on June 9. (OONI defines “Anomaly” results as “<i>Measurements that provided signs of potential blocking</i>”.) OONI <a href="https://ooni.org/nettest/tor/">Tor test</a> <a href="https://explorer.ooni.org/chart/mat?probe_cc=DZ&amp;since=2024-06-01&amp;until=2024-06-20&amp;time_grain=day&amp;axis_x=measurement_start_day&amp;test_name=tor">results</a> also show a similar “Anomaly” pattern. 
Anomalous traffic patterns are also visible for <a href="https://transparencyreport.google.com/traffic/overview?hl=en&amp;fraction_traffic=start:1717200000000;product:19;region:DZ;end:1718495999999&amp;lu=fraction_traffic">Google Web Search</a>, <a href="https://transparencyreport.google.com/traffic/overview?hl=en&amp;fraction_traffic=start:1717200000000;product:21;region:DZ;end:1718495999999&amp;lu=fraction_traffic">YouTube</a>, and <a href="https://transparencyreport.google.com/traffic/overview?hl=en&amp;fraction_traffic=start:1717200000000;product:6;region:DZ;end:1718495999999&amp;lu=fraction_traffic">Gmail</a>.</p><p>Although the analysis of these observations and data sets doesn’t provide specific details about exactly how the observed Internet disruptions are being implemented, it strongly supports the supposition that network providers in Algeria are, in some fashion, interfering with TCP connections, but not blocking them outright or shutting down their networks completely. Given that popular messaging platforms, Google properties, Cloudflare’s 1.1.1.1 DNS resolver, and some number of Cloudflare customer sites all appear to be impacted, it suggests that a list of hostnames is being targeted for disruption/interference, <a href="/consequences-of-ip-blocking">either by the SNI or the destination IP address</a>.</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Perhaps recognizing the broad negative impact that brute-force nationwide Internet shutdowns have as a response to cheating on exams, some governments appear to be turning to more nuanced techniques, such as content blocking or connection tampering. However, because these are widely applied as well, they are arguably just as disruptive as a full nationwide Internet shutdown. Full shutdowns, such as those seen in Syria, are arguably easier to diagnose than the disruptions to connectivity seen in Iraq and Algeria, which appear to use approaches that are hard to specifically identify from the outside.</p><p>Visit <a href="https://radar.cloudflare.com/">Cloudflare Radar</a> for additional insights around these, and other, Internet disruptions. Follow us on social media at <a href="https://x.com/CloudflareRadar">@CloudflareRadar</a> (X), <a href="https://noc.social/@cloudflareradar">noc.social/@cloudflareradar</a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com">radar.cloudflare.com</a> (Bluesky), or contact us via email.</p> ]]></content:encoded>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[Internet Traffic]]></category>
            <category><![CDATA[Outage]]></category>
            <category><![CDATA[Internet Shutdown]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Consumer Services]]></category>
            <guid isPermaLink="false">7I3aMukuPURTotjQ1Njiei</guid>
            <dc:creator>David Belson</dc:creator>
        </item>
        <item>
            <title><![CDATA[Traffic anomalies and notifications with Cloudflare Radar]]></title>
            <link>https://blog.cloudflare.com/traffic-anomalies-notifications-radar/</link>
            <pubDate>Tue, 26 Sep 2023 13:00:37 GMT</pubDate>
            <description><![CDATA[ Cloudflare Radar now displays country and ASN traffic anomalies in the Outage Center as they are detected, as well as publishing anomaly information via API. We are also launching Radar notifications, enabling users to subscribe to notifications about traffic anomalies ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/64RU7fXZM4tF0ZKhlNAqcY/36a614227aeb00168ea9e08681f2638b/image13-3.png" />
            
            </figure><p>We launched the <a href="https://radar.cloudflare.com/outage-center">Cloudflare Radar Outage Center (CROC)</a> during Birthday Week 2022 as a way of keeping the community up to date on Internet disruptions, including outages and shutdowns, visible in Cloudflare’s traffic data. While some of the entries have their genesis in information from social media posts made by local telecommunications providers or civil society organizations, others are based on an internal traffic anomaly detection and alerting tool. Today, we’re adding this alerting feed to Cloudflare Radar, showing country and network-level traffic anomalies on the CROC as they are detected, as well as making the feed available via <a href="https://developers.cloudflare.com/api/operations/radar-get-traffic-anomalies">API</a>.</p><p>Building on this new functionality, as well as the <a href="/route-leak-detection-with-cloudflare-radar/">route leaks</a> and <a href="/bgp-hijack-detection/">route hijacks insights</a> that we recently launched <a href="https://radar.cloudflare.com/routing">on Cloudflare Radar</a>, we are also launching new Radar notification functionality, enabling you to subscribe to notifications about traffic anomalies, confirmed Internet outages, route leaks, or route hijacks. Using the <a href="https://dash.cloudflare.com/">Cloudflare dashboard’s</a> existing notification functionality, users can set up notifications for one or more countries or autonomous systems, and receive notifications when a relevant event occurs. Notifications may be sent via e-mail or webhooks — the available delivery methods <a href="https://developers.cloudflare.com/notifications/">vary according to plan level</a>.</p>
    <div>
      <h3>Traffic anomalies</h3>
      <a href="#traffic-anomalies">
        
      </a>
    </div>
    <p>Internet traffic generally follows a fairly regular pattern, with daily peaks and troughs at roughly the same volumes of traffic. However, while weekend traffic patterns may look similar to weekday ones, their traffic volumes are generally different. Similarly, holidays or national events can also cause traffic patterns and volumes to differ significantly from “normal”, as people shift their activities and spend more time offline, or as people turn to online sources for information about, or coverage of, the event. These traffic shifts can be newsworthy, and we have covered some of them in past Cloudflare blog posts (<a href="/how-the-coronation-of-king-charles-iii-affected-internet-traffic/">King Charles III coronation</a>, <a href="/easter-passover-ramadan-internet-trends-2023/">Easter/Passover/Ramadan</a>, <a href="/how-the-brazilian-presidential-elections-affected-internet-traffic/">Brazilian presidential elections</a>).</p><p>However, as you also know from reading our <a href="/tag/outage/">blog posts</a> and following <a href="https://twitter.com/CloudflareRadar">Cloudflare Radar</a> on social media, it is the more drastic drops in traffic that are a cause for concern. Some are the result of infrastructure damage from severe weather or a natural disaster like an earthquake and are effectively unavoidable, but getting timely insights into the impact of these events on Internet connectivity is valuable from a communications perspective. Other traffic drops have occurred when an authoritarian government orders mobile Internet connectivity to be shut down, or shuts down all Internet connectivity nationwide. 
Timely insights into these types of anomalous traffic drops are often critical from a human rights perspective, as Internet shutdowns are often used as a means of controlling communication with the outside world.</p><p>Over the last several months, the Cloudflare Radar team has been using an internal tool to identify traffic anomalies and post alerts for follow-up to a dedicated chat space. The companion blog post <a href="/detecting-internet-outages"><i>Gone Offline: Detecting Internet Outages</i></a> goes into deeper technical detail about the traffic analysis and anomaly detection methodologies that power this internal tool.</p><p>Many of these internal traffic anomaly alerts ultimately result in Outage Center entries and Cloudflare Radar social media posts. Today, we’re extending the <a href="https://radar.cloudflare.com/outage-center">Cloudflare Radar Outage Center</a> and publishing information about these anomalies as we identify them. As shown in the figure below, the new <b>Traffic anomalies</b> table includes the type of anomaly (location or ASN), the entity where the anomaly was detected (country/region name or autonomous system), the start time, duration, verification status, and an “Actions” link, where the user can view the anomaly on the relevant entity traffic page or subscribe to a notification. (If manual review of a detected anomaly finds that it is present in multiple Cloudflare traffic datasets and/or is visible in third-party datasets, such as Georgia Tech’s <a href="https://ioda.live/">IODA</a> platform, we will mark it as verified. Unverified anomalies may be false positives, or related to NetFlow collection issues, though we endeavor to minimize both.)</p>
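<p>As a rough illustration of this kind of detection, a drop can be flagged by comparing current traffic against a baseline built from comparable past intervals (for example, the same hour in prior weeks). The threshold and baseline choice here are illustrative assumptions, not the methodology detailed in the companion post:</p>

```python
# Illustrative sketch of traffic-drop detection -- the threshold and
# baseline are assumptions for illustration, not the production method.
from statistics import median

def is_anomalous_drop(current, baseline_samples, threshold=0.5):
    """Flag a drop when current traffic falls below `threshold` times the
    median of the baseline samples (e.g. the same hour in prior weeks)."""
    if not baseline_samples:
        return False  # no history to compare against
    expected = median(baseline_samples)
    return expected > 0 and current < threshold * expected
```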
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4avPigfW9GjhqKlH4U9XI2/35b3761b52b9046ed4fa03c6b68db4c1/pasted-image-0-6.png" />
            
            </figure><p>In addition to this new table, we have updated the <a href="https://radar.cloudflare.com/outage-center">Cloudflare Radar Outage Center</a> map to highlight where we have detected anomalies, as well as placing them into a broader temporal context in a new timeline immediately below the map. Anomalies are represented as orange circles on the map, and can be hidden with the toggle in the upper right corner. Double-bordered circles represent an aggregation across multiple countries, and zooming in to that area will ultimately show the number of anomalies associated with each country that was included in the aggregation. Hovering over a specific dot in the timeline displays information about the outage or anomaly with which it is associated.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3fqllfaS5B100bVx1f58w0/d1c4f33c47d8d9271f4c3cf77d0688a3/pasted-image-0--1--6.png" />
            
            </figure><p>Internet outage information has been available via the <a href="https://developers.cloudflare.com/api/operations/radar-get-annotations-outages">Radar API</a> since we launched the Outage Center and API in September 2022, and traffic anomalies are now available through a <a href="https://developers.cloudflare.com/api/operations/radar-get-traffic-anomalies">Radar API endpoint</a> as well. An example traffic anomaly API request and response are shown below.</p><p><b>Example request:</b></p>
            <pre><code>curl --request GET \
  --url https://api.cloudflare.com/client/v4/radar/traffic_anomalies \
  --header 'Content-Type: application/json' \
  --header 'X-Auth-Email: '</code></pre>
            <p><b>Example response:</b></p>
            <pre><code>{
  "result": {
    "trafficAnomalies": [
      {
        "asnDetails": {
          "asn": "189",
          "locations": {
            "code": "US",
            "name": "United States"
          },
          "name": "LUMEN-LEGACY-L3-PARTITION"
        },
        "endDate": "2023-08-03T23:15:00Z",
        "locationDetails": {
          "code": "US",
          "name": "United States"
        },
        "startDate": "2023-08-02T23:15:00Z",
        "status": "UNVERIFIED",
        "type": "LOCATION",
        "uuid": "55a57f33-8bc0-4984-b4df-fdaff72df39d",
        "visibleInDataSources": [
          "string"
        ]
      }
    ]
  },
  "success": true
}</code></pre>
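<p>As a sketch of consuming this endpoint, the response shown above can be reduced to a compact summary using only the fields present in the example (the helper name is ours, not part of the API):</p>

```python
import json

# Parse a Radar traffic-anomalies API response (shaped like the example
# above) into (entity, status, startDate, endDate) tuples.
def summarize_anomalies(response_text):
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    out = []
    for a in payload["result"]["trafficAnomalies"]:
        if a["type"] == "LOCATION":
            entity = a["locationDetails"]["name"]
        else:  # ASN-level anomaly
            entity = a["asnDetails"]["name"]
        out.append((entity, a["status"], a["startDate"], a.get("endDate")))
    return out
```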
            
    <div>
      <h3>Notifications overview</h3>
      <a href="#notifications-overview">
        
      </a>
    </div>
    <p>Timely knowledge about Internet “events”, such as drops in traffic or routing issues, is potentially of interest to multiple audiences. Customer service or help desk agents can use the information to help diagnose customer/user complaints about application performance or availability. Similarly, network administrators can use the information to better understand the state of the Internet outside their network. And civil society organizations can use the information to inform action plans aimed at maintaining communications and protecting human rights in areas of conflict or instability. With the new notifications functionality also being launched today, you can subscribe to be notified about observed traffic anomalies, confirmed Internet outages, route leaks, or route hijacks, at a country or autonomous system level. In the following sections, we discuss how to subscribe to and configure notifications, as well as the information contained within the various types of notifications.</p>
    <div>
      <h4>Subscribing to notifications</h4>
      <a href="#subscribing-to-notifications">
        
      </a>
    </div>
    <p>Note that you need to log in to the <a href="https://dash.cloudflare.com/">Cloudflare dashboard</a> to subscribe to and configure notifications. No purchase of Cloudflare services is necessary — just a verified email address is required to set up an account. While we would have preferred to not require a login, it enables us to take advantage of Cloudflare’s existing notifications engine, allowing us to avoid having to dedicate time and resources to building a separate one just for Radar. If you don’t already have a Cloudflare account, visit <a href="https://dash.cloudflare.com/sign-up">https://dash.cloudflare.com/sign-up</a> to create one. Enter your username and a unique strong password, click “Sign Up”, and follow the instructions in the verification email to activate your account. (Once you’ve activated your account, we also suggest activating <a href="https://www.cloudflare.com/learning/access-management/what-is-two-factor-authentication/">two-factor authentication (2FA)</a> as an additional security measure.)</p><p>Once you have set up and activated your account, you are ready to start creating and configuring notifications. The first step is to look for the Notifications (bullhorn) icon – the presence of this icon means that notifications are available for that metric — in the Traffic, Routing, and Outage Center sections on Cloudflare Radar. If you are on a country or ASN-scoped traffic or routing page, the notification subscription will be scoped to that entity.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2pthIoKtBxP566BiMTmIo7/f59c76a18de0f8d3f4f3ce8f020edbfb/image3-23.png" />
            
            </figure><p><i>Look for this icon in the Traffic, Routing, and Outage Center sections of Cloudflare Radar to start setting up notifications.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7BzESdUYejnGYbyOELT3xb/2f71cc74d0f760efe24ead5ef5aef5d9/pasted-image-0--2--4.png" />
            
            </figure><p><i>In the Outage Center, click the icon in the “Actions” column of an Internet outages table entry to subscribe to notifications for the related location and/or ASN(s). Click the icon alongside the table description to subscribe to notifications for all confirmed Internet outages.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JzDVk3W9GXmFs1EEMpI1y/64d825564b01ced81010d08c8f8b7e30/pasted-image-0--3--6.png" />
            
            </figure><p><i>In the Outage Center, click the icon in the “Actions” column of a Traffic anomalies table entry to subscribe to notifications for the related entity. Click the icon alongside the table description to subscribe to notifications for all traffic anomalies.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/USbTwPiy87LABK9yzvdAD/1bd91ad4a70fc53a1a2c7f51a400800d/pasted-image-0--4--2.png" />
            
            </figure><p><i>On country or ASN traffic pages, click the icon alongside the description of the traffic trends graph to subscribe to notifications for traffic anomalies or Internet outages impacting the selected country or ASN.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7uXs607M5vob0cRMFuyPrG/638d601eecfdc6b89b90770e9482584f/pasted-image-0--5--2.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/266gINGnVLwOGMcogtGoql/4ad8994d134b1a71365d07a50fb725d1/pasted-image-0--6--1.png" />
            
            </figure><p><i>On country or ASN routing pages, click the icon alongside the description to subscribe to notifications for route leaks or origin hijacks related to the selected country or ASN.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7LXt4HacyxqtpJbgj6sQ6T/b8ec62dbc5a9447ed37ace2844ceabed/pasted-image-0--7--1.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2SWPlKRV9xUQxyXsfqbDBj/b4f0253d65e3aec230b3f5ce7e126b1d/pasted-image-0--8--1.png" />
            
            </figure><p><i>Within the Route Leaks or Origin Hijacks tables on the routing pages, click the icon in a table entry to subscribe to notifications for route leaks or origin hijacks for referenced countries and/or ASNs.</i> </p><p>After clicking a notification icon, you’ll be taken to the Cloudflare login screen. Enter your username and password (and 2FA code if required), and once logged in, you’ll see the Add Notification page, pre-filled with the key information passed through from the referring page on Radar, including relevant locations and/or ASNs. (If you are already logged in to Cloudflare, then you’ll be taken directly to the Add Notification page after clicking a notification icon on Radar.) On this page, you can name the notification, add an optional description, and adjust the location and ASN filters as necessary. Enter an email address for notifications to be sent to, or select an established webhook destination (if you have webhooks enabled on your account).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3WoiB73UcjTrzvt3KMeOgm/2146fb68b612f8d82897e8dfd46f66c3/pasted-image-0--9--1.png" />
            
            </figure><p>Click “Save”, and the notification is added to the Notifications Overview page for the account.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2g3sIazFDddiBRi12oMzYr/00100d832d84fc0d54379341ee575692/pasted-image-0--10--1.png" />
            
            </figure><p>You can also create and configure notifications directly within Cloudflare, without starting from a link on a Radar page. To do so, log in to Cloudflare, and choose “Notifications” from the left side navigation bar. That will take you to the Notifications page shown below. Click the “Add” button to add a new notification.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5mw4IUl2YxYOXQdM3pfcrU/0d4979bf37b64a2ffe29ad1e084b3afc/pasted-image-0--11--1.png" />
            
            </figure><p>On the next page, search for and select “Radar” from the list of Cloudflare products for which notifications are available.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6dq1Kt2F5XIJlV3X1r6Jyw/beb22e11b950f73130eec41c8b612350/pasted-image-0--12--1.png" />
            
            </figure><p>On the subsequent “Add Notification” page, you can create and configure a notification from scratch. Event types can be selected in the “Notify me for:” field, and both locations and ASNs can be searched for and selected within the respective “Filtered by (optional)” fields. Note that if no filters are selected, then notifications will be sent for <b>all</b> events of the selected type(s). Add one or more emails to send notifications to, or select a webhook target if available, and click “Save” to add it to the list of notifications configured for your account.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/bbIKwireM6CKWHJS92IvU/2770fd9d6c355c3e9e1c7010528e8053/pasted-image-0--13-.png" />
            
            </figure><p>It is worth mentioning that advanced users can also create and configure notifications through the <a href="https://developers.cloudflare.com/api/operations/notification-policies-create-a-notification-policy">Cloudflare API Notification policies endpoint</a>, but we will not review that process within this blog post.</p>
    <div>
      <h4>Notification messages</h4>
      <a href="#notification-messages">
        
      </a>
    </div>
    <p>Example notification email messages are shown below for the various types of events. Each contains key information like the type of event, affected entities, and start time — additional relevant information is included depending on the event type. Each email includes both plaintext and HTML versions to accommodate multiple types of email clients. (Final production emails may vary slightly from those shown below.)</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4m9llI9iUghxk9WNmeXs5c/9c03b0fa19da713b02d05e3828d8e338/pasted-image-0--14-.png" />
            
            </figure><p><i>Internet outage notification emails include information about the affected entities, a description of the cause of the outage, start time, scope (if available), and the type of outage (Nationwide, Network, Regional, or Platform), as well as a link to view the outage in a Radar traffic graph.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6ojcoXp3rsEhPY5QvrzmLu/ae551c2efb6a8dc2424c8b5f6b8a1e13/pasted-image-0--15-.png" />
            
            </figure><p><i>Traffic anomaly notification emails simply include information about the affected entity and a start time, as well as a link to view the anomaly in a Radar traffic graph.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4odkGwBAyUklNM3EpEYjlD/a808cd14489ebba0eb9ca463bfce3e2c/pasted-image-0--16-.png" />
            
            </figure><p><i>BGP hijack notification emails include information about the hijacking and victim ASNs, affected IP address prefixes, the number of BGP messages (announcements) containing the hijacked routes, the number of peers announcing the hijack, detection timing, a confidence level on the event being a true hijack, and relevant tags, as well as a link to view details of the hijack event on Radar.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5xjHBbXtCiOKWEhgS4bfpU/b3ab59dc514a70dfdeaae820e10fbb44/pasted-image-0--17-.png" />
            
            </figure><p><i>BGP route leak notification emails include information about the AS that the leaked routes were learned from, the AS that leaked the routes, the AS that received and propagated the leaked routes, the number of affected prefixes, the number of affected origin ASes, the number of BGP route collector peers that saw the route leak, and detection timing, as well as a link to view details of the route leak event on Radar.</i></p><p>If you are sending notifications to webhooks, you can integrate those notifications into tools like Slack. For example, by following the directions in <a href="https://api.slack.com/messaging/webhooks">Slack’s API documentation</a>, creating a simple integration took just a few minutes and results in messages like the one shown below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6uMXY1xUfZdoyX2u1J5xWO/87ccc9c307304e78a698c9dfb060cc2a/pasted-image-0--18-.png" />
            
            </figure>
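<p>As a sketch of such an integration, a webhook receiver might reshape a Radar-style notification into Slack's incoming-webhook payload. The input field names below are illustrative assumptions, not Cloudflare's documented notification schema:</p>

```python
# Illustrative sketch: turn a Radar-style webhook notification into a
# Slack incoming-webhook payload. The input field names here are
# assumptions for illustration, not the documented notification schema.
import json

def to_slack_payload(event):
    """Build the JSON body to POST to a Slack incoming-webhook URL."""
    text = "Radar alert: {kind} affecting {entity} (started {start})".format(
        kind=event.get("type", "event"),
        entity=event.get("entity", "unknown"),
        start=event.get("startDate", "unknown"),
    )
    return json.dumps({"text": text})
```

<p>The resulting string would then be POSTed to the Slack incoming-webhook URL with a <code>Content-Type: application/json</code> header, per Slack's documentation linked above.</p>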
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Cloudflare’s unique perspective on the Internet provides us with near-real-time insight into unexpected drops in traffic, as well as potentially problematic routing events. While we’ve been sharing these insights with you over the past year, you had to visit Cloudflare Radar to figure out if there were any new “events”. With the launch of notifications, we’ll now automatically send you information about the latest events that you are interested in.</p><p>We encourage you to visit Cloudflare Radar to familiarize yourself with the information we publish about <a href="https://radar.cloudflare.com/outage-center">traffic anomalies</a>, <a href="https://radar.cloudflare.com/outage-center">confirmed Internet outages</a>, <a href="https://radar.cloudflare.com/routing">BGP route leaks</a>, and <a href="https://radar.cloudflare.com/routing">BGP origin hijacks</a>. Look for the notification icon on the relevant graphs and tables on Radar, and go through the workflow to set up and subscribe to notifications. (And don’t forget to sign up for a <a href="https://dash.cloudflare.com/">Cloudflare</a> account if you don’t have one already.) Please <a>send us feedback</a> about the notifications, as we are constantly working to improve them, and let us know how and where you’ve integrated Radar notifications into your own tools/workflows/organization.</p><p>Follow Cloudflare Radar on social media at <a href="https://twitter.com/CloudflareRadar">@CloudflareRadar</a> (Twitter), <a href="https://noc.social/@cloudflareradar">https://noc.social/@cloudflareradar</a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com">radar.cloudflare.com</a> (Bluesky).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3bnvOiFBE4NEjULFappDQN/9f9df4b7ee7c5f74ac242e00d77344cc/Announcement.png" />
            
            </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[Outage]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Internet Traffic]]></category>
            <category><![CDATA[Notifications]]></category>
            <guid isPermaLink="false">4OjzZwFN2RgWm8cgWlrPty</guid>
            <dc:creator>David Belson</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Radar's new BGP origin hijack detection system]]></title>
            <link>https://blog.cloudflare.com/bgp-hijack-detection/</link>
            <pubDate>Fri, 28 Jul 2023 13:00:26 GMT</pubDate>
            <description><![CDATA[ BGP origin hijacks allow attackers to intercept, monitor, redirect, or drop traffic destined for the victim's networks. We explain how Cloudflare built its BGP hijack detection system, from its design and implementation to its integration on Cloudflare Radar ]]></description>
            <content:encoded><![CDATA[ <p></p><p><a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">Border Gateway Protocol</a> (BGP) is the de facto inter-domain routing protocol used on the Internet. It enables networks and organizations to exchange reachability information for blocks of IP addresses (IP prefixes) among each other, thus allowing routers across the Internet to forward traffic to its destination. BGP was designed with the assumption that networks do not intentionally propagate falsified information, but unfortunately that’s not a valid assumption on today’s Internet.</p><p>Malicious actors on the Internet who control BGP routers can perform BGP hijacks by falsely announcing ownership of groups of IP addresses that they do not own, control, or route to. By doing so, an attacker is able to redirect traffic destined for the victim network to itself, and monitor and intercept its traffic. A BGP hijack is much like if someone were to change out all the signs on a stretch of freeway and reroute automobile traffic onto incorrect exits.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3eRfapZmJQLB67OmDnppNJ/29c5285c15fc25ad65a2b615b8abe131/image11.png" />
            
            </figure><p>You can learn more about <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">BGP</a> and <a href="https://www.cloudflare.com/learning/security/glossary/bgp-hijacking/">BGP hijacking</a> and its consequences in our learning center.</p><p>At Cloudflare, we have long been monitoring suspicious BGP anomalies internally. With our recent efforts, we are bringing BGP origin hijack detection to the <a href="https://radar.cloudflare.com/security-and-attacks">Cloudflare Radar</a> platform, sharing our detection results with the public. In this blog post, we will explain how we built our detection system and how people can use Radar and its APIs to integrate our data into their own workflows.</p>
    <div>
      <h2>What is BGP origin hijacking?</h2>
      <a href="#what-is-bgp-origin-hijacking">
        
      </a>
    </div>
    <p>Services and devices on the Internet locate each other using IP addresses. A block of IP addresses is called an IP prefix (or just a prefix for short), and multiple prefixes from the same organization are aggregated into an <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">autonomous system</a> (AS).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1V7wyuIuBZwT9BV8uQjBRs/93167df81577e93c6e01dcae78883700/Screenshot-2023-07-26-at-18.26.17.png" />
            
            </figure><p>Using the BGP protocol, ASes announce which routes can be imported or exported to other ASes and routers from their routing tables. This is called the AS routing policy. Without this routing information, operating the Internet on a large scale would quickly become impractical: data packets would get lost or take too long to reach their destinations.</p><p>During a BGP origin hijack, an attacker creates fake announcements for a targeted prefix, falsely identifying an <a href="https://developers.cloudflare.com/radar/glossary/#autonomous-systems">autonomous system (AS)</a> under their control as the origin of the prefix.</p><p>In the following graph, we show an example where <code>AS 4</code> announces the prefix <code>P</code> that was previously originated by <code>AS 1</code>. The receiving parties, i.e. <code>AS 2</code> and <code>AS 3</code>, accept the hijacked routes and forward traffic toward prefix <code>P</code> to <code>AS 4</code> instead.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4868WfpNphX6eCHwMpZKGr/9edb6690b917913bdfcbc361eadbbae3/image2-15.png" />
            
            </figure><p>As you can see, both the normal and the hijacked traffic flow in the direction opposite to that of the BGP announcements we receive.</p><p>If successful, this type of attack will result in the dissemination of the falsified prefix origin announcement throughout the Internet, causing network traffic previously intended for the victim network to be redirected to the AS controlled by the attacker. As an example of a famous BGP hijack attack, in 2018 <a href="/bgp-leaks-and-crypto-currencies/">someone was able</a> to convince parts of the Internet to reroute traffic for AWS to malicious servers where they used DNS to redirect MyEtherWallet.com, a popular crypto wallet, to a hacked page.</p>
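<p>The scenario in the graph can be illustrated with a toy origin check that compares each announcement's origin AS against the origins previously recorded for the prefix. This is a deliberately simplified sketch, not Cloudflare's actual detection system, which weighs many more signals before raising an alert:</p>

```python
# Simplified origin-change check -- illustrative only, not the Radar
# detection pipeline. `expected_origins` maps a prefix to the set of
# origin ASNs previously observed (or authorized) for it.

def check_origin(prefix, origin_asn, expected_origins):
    """Return 'ok' if the announcing AS is a known origin for the prefix,
    'possible-hijack' if the prefix is known but announced by a new AS,
    and 'unknown-prefix' if we have no history for the prefix."""
    known = expected_origins.get(prefix)
    if known is None:
        return "unknown-prefix"
    return "ok" if origin_asn in known else "possible-hijack"
```

<p>In the figure's terms, an announcement of prefix <code>P</code> by <code>AS 4</code> when only <code>AS 1</code> has previously originated <code>P</code> would be flagged.</p>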
    <div>
      <h2>Prevention mechanisms and why they’re not perfect (yet)</h2>
      <a href="#prevention-mechanisms-and-why-theyre-not-perfect-yet">
        
      </a>
    </div>
    <p>The key difficulty in preventing BGP origin hijacks is that the BGP protocol itself does not provide a mechanism to validate the announcement content. In other words, the original BGP protocol does not provide any authentication or ownership safeguards; any route can be originated and announced by any random network, independent of its rights to announce that route.</p><p>To address this problem, operators and researchers have proposed the <a href="https://en.wikipedia.org/wiki/Resource_Public_Key_Infrastructure">Resource Public Key Infrastructure (RPKI)</a> to store and validate prefix-to-origin mapping information. With RPKI, operators can prove the ownership of their network resources and create ROAs, short for Route Origin Authorizations, cryptographically signed objects that define which Autonomous System (AS) is authorized to originate a specific prefix.</p><p>Cloudflare has <a href="/rpki/">supported RPKI</a> since the early days of the <a href="https://datatracker.ietf.org/doc/html/rfc6480">RFC</a>. With RPKI, IP prefix owners can store and share the ownership information securely, and other operators can validate BGP announcements by checking the prefix origin against the information stored in RPKI. Any attempt to announce an IP prefix with an incorrect origin AS will fail validation, and such invalid BGP messages will be discarded. This validation process is referred to as route origin validation (ROV).</p><p>To further advocate for RPKI deployment and filtering of RPKI-invalid announcements, Cloudflare has been providing an RPKI test service, <a href="https://isbgpsafeyet.com/">Is BGP Safe Yet?</a>, allowing users to test whether their ISP filters RPKI-invalid announcements. We also provide rich information about the RPKI status of individual prefixes and ASes at <a href="https://rpki.cloudflare.com/">https://rpki.cloudflare.com/</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4MNPvwC6PpJCPQCIQhC8sl/82309707277d649b0b810ed6e3028947/image8-1.png" />
            
            </figure><p><b>However</b>, the effectiveness of RPKI in preventing BGP origin hijacks depends on two factors:</p><ol><li><p>The share of prefix owners that register their prefixes in RPKI;</p></li><li><p>The share of networks performing route origin validation.</p></li></ol><p>Unfortunately, neither share is at a satisfactory level yet. As of today, July 27, 2023, only about 45% of the IP prefixes routable on the Internet are covered by a ROA in RPKI. The remaining prefixes are highly vulnerable to BGP origin hijacks. Even the 45% of prefixes that are covered by a ROA can still be affected by origin hijack attempts, due to the low share of networks that perform route origin validation (ROV). Based on our <a href="/rpki-updates-data/">recent study</a>, only 6.5% of Internet users are protected by ROV from BGP origin hijacks.</p><p>Despite the benefits of RPKI and RPKI ROAs, their effectiveness in preventing BGP origin hijacks is limited by the slow adoption and deployment of these technologies. Until we achieve a high rate of RPKI ROA registration and RPKI invalid filtering, BGP origin hijacks will continue to pose a significant threat to the daily operations of the Internet and the security of everyone connected to it. Therefore, it’s also essential to prioritize developing and deploying BGP monitoring and detection tools to enhance the security and stability of the Internet's routing infrastructure.</p>
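    <p>The route origin validation process described above can be sketched in a few lines of Python. This is a simplified illustration rather than any production implementation, using a single made-up ROA for the documentation prefix <code>192.0.2.0/24</code>: a route is valid if a covering ROA authorizes its origin AS within the max-length, invalid if it is covered but unauthorized, and unknown if no ROA covers it.</p>

```python
import ipaddress

# Hypothetical ROA set for illustration: (prefix, max-length, authorized ASN).
ROAS = [
    (ipaddress.ip_network("192.0.2.0/24"), 24, 64496),
]

def rov_state(prefix: str, origin_asn: int) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'unknown'."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_net, max_len, roa_asn in ROAS:
        # A ROA covers the route when the announced prefix falls inside it.
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            # The origin must match and the prefix must not be too specific.
            if net.prefixlen <= max_len and origin_asn == roa_asn:
                return "valid"
    return "invalid" if covered else "unknown"
```

    <p>With this toy ROA set, an announcement of <code>192.0.2.0/24</code> by AS 64496 validates, the same prefix originated by any other AS is invalid, and an uncovered prefix stays unknown.</p>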
    <div>
      <h2>Design of Cloudflare’s BGP hijack detection system</h2>
      <a href="#design-of-cloudflares-bgp-hijack-detection-system">
        
      </a>
    </div>
    <p>Our system comprises multiple data sources and three distinct modules that work together to detect and analyze potential BGP hijack events: the prefix origin change detection module, the hijack detection module, and the alerts storage and notification module.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/67bYKbfT4zBONmISARpcta/8790aaad3f3db54ba012bd751dd393c9/image6-7.png" />
            
            </figure><p>The Prefix Origin Change Detection module provides the data, the Hijack Detection module analyzes the data, and the Alerts Storage and Delivery module stores and provides access to the results. Together, these modules work in tandem to provide a comprehensive system for detecting and analyzing potential BGP hijack events.</p>
    <div>
      <h3>Prefix origin change detection module</h3>
      <a href="#prefix-origin-change-detection-module">
        
      </a>
    </div>
    <p>At its core, the BGP protocol involves:</p><ol><li><p>Exchanging prefix reachability (routing) information;</p></li><li><p>Deciding where to forward traffic based on the reachability information received.</p></li></ol><p>The reachability change information is encoded in BGP update messages, while the routing decision results are encoded as a routing information base (RIB) on the routers, also known as the <a href="https://en.wikipedia.org/wiki/Routing_table">routing table</a>.</p><p>In our origin hijack detection system, we focus on investigating BGP <a href="https://datatracker.ietf.org/doc/html/rfc4271">update messages</a> that contain changes to the origin ASes of any IP prefixes. There are two types of BGP update messages that could indicate prefix origin changes: <b>announcements</b> and <b>withdrawals</b>.</p><p>Announcements include an AS-level path toward one or more prefixes. The path tells the receiving parties through which sequence of networks (ASes) one can reach the corresponding prefixes. The last hop of an AS path is the origin AS. In the following diagram, AS 1 is the origin AS of the announced path.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4bF9IfSM2X5mtlWgqsKt4e/8a30a1f33717082d8e5d0f0ba35ae68a/image4-6.png" />
            
            </figure><p>Withdrawals, on the other hand, simply inform the receiving parties that the prefixes are no longer reachable.</p><p>Both types of messages are stateless. They inform us of the current route changes, but provide no information about the previous states. As a result, detecting origin changes is not as straightforward as one may think. Our system needs to keep track of historical BGP updates and build some state over time so that we can verify whether a BGP update contains origin changes.</p><p>We didn't want to deal with a complex system like a database to manage the state of all the prefixes we see across all the BGP updates we receive. Fortunately, computer science gives us the <a href="https://en.wikipedia.org/wiki/Trie">prefix trie</a>, a data structure for storing and looking up string-indexed data, which is ideal for our use case. We ended up developing a fast, custom Rust-based IP prefix trie that holds the relevant information, such as the origin ASN and the AS path, for each IP prefix, and that allows the information to be updated based on BGP announcements and withdrawals.</p><p>The figure below shows an example of the AS path information for prefix <code>192.0.2.0/24</code> stored in a prefix trie. When updating the information in the prefix trie, if we see a change of origin ASN for any given prefix, we record the BGP message as well as the change, and create an <code>Origin Change Signal</code>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2tZK0inZhfzbvgpQIBKkFE/4215c1630012b91785c9a79866117eae/Screenshot-2023-07-26-at-18.20.07.png" />
            
            </figure><p>The prefix origin change detection module collects and processes live-stream and historical BGP data from various sources. For <a href="https://www.cloudflare.com/developer-platform/solutions/live-streaming/">live streams</a>, our system applies a thin layer of data processing to translate BGP messages into our internal data structure. At the same time, for historical archives, we use a dedicated deployment of the <a href="https://bgpkit.com/broker">BGPKIT broker</a> and <a href="https://bgpkit.com/parser">parser</a> to convert MRT files from <a href="https://www.routeviews.org/">RouteViews</a> and <a href="https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris">RIPE RIS</a> into BGP message streams as they become available.</p><p>After the data is collected, consolidated, and normalized, the module creates, maintains, and destroys the prefix tries so that we know what changed relative to previous BGP announcements from the same peers. Based on these comparisons, we then send enriched messages downstream to be analyzed.</p>
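    <p>To make the state-tracking idea concrete, here is a minimal Python sketch of per-peer origin-change detection. It uses a plain dictionary as a stand-in for our Rust prefix trie, and the names and structure are illustrative rather than the actual implementation:</p>

```python
# (peer, prefix) -> last observed origin ASN; a stand-in for the prefix trie.
state = {}

def process_update(peer, prefix, origin_asn):
    """Apply one BGP update; origin_asn=None models a withdrawal.
    Returns an origin-change signal when the origin differs from the
    previously recorded one, otherwise None."""
    key = (peer, prefix)
    old = state.get(key)
    if origin_asn is None:
        # Withdrawal: the prefix is no longer reachable via this peer.
        state.pop(key, None)
        return None
    state[key] = origin_asn
    if old is not None and old != origin_asn:
        return {"peer": peer, "prefix": prefix, "old": old, "new": origin_asn}
    return None
```

    <p>Replaying an announcement of the same prefix with a new origin yields a signal, which is what gets handed to the hijack detection module downstream.</p>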
    <div>
      <h3>Hijack detection module</h3>
      <a href="#hijack-detection-module">
        
      </a>
    </div>
    <p>Determining whether BGP messages suggest a hijack is a complex task, and no common scoring mechanism can be used to provide a definitive answer. Fortunately, there are several types of data sources that can collectively provide a relatively good idea of whether a BGP announcement is legitimate or not. These data sources can be categorized into two types: inter-AS relationships and prefix-origin binding.</p><p>The inter-AS relationship datasets include AS2org and AS2rel datasets from <a href="https://www.caida.org/">CAIDA/UCSD</a>, AS2rel datasets from <a href="https://bgpkit.com/">BGPKIT</a>, AS organization datasets from <a href="https://www.peeringdb.com/">PeeringDB</a>, and <a href="/route-leak-detection-with-cloudflare-radar/#route-leak-detection">per-prefix AS relationship data</a> built at Cloudflare. These datasets provide information about the relationship between autonomous systems, such as whether they are upstream or downstream from one another, or if the origins of any change signal belong to the same organization.</p><p>Prefix-to-origin binding datasets include live RPKI validated ROA payload (VRP) from the <a href="https://rpki.cloudflare.com/">Cloudflare RPKI portal</a>, daily Internet Routing Registry (IRR) dumps curated and cleaned up by <a href="https://www.manrs.org/">MANRS</a>, and prefix and AS <a href="https://en.wikipedia.org/wiki/Bogon_filtering">bogon</a> lists (private and reserved addresses defined by <a href="https://datatracker.ietf.org/doc/html/rfc1918">RFC 1918</a>, <a href="https://datatracker.ietf.org/doc/html/rfc5735">RFC 5735</a>, and <a href="https://datatracker.ietf.org/doc/html/rfc6598">RFC 6598</a>). These datasets provide information about the ownership of prefixes and the ASes that are authorized to originate them.</p><p>By combining all these data sources, it is possible to collect information about each BGP announcement and answer questions programmatically. 
For this, we have a scoring function that takes all the evidence gathered for a specific BGP event as input and runs it through a sequence of checks. Each check contributes a neutral, positive, or negative weight to the final score. The higher the score, the more likely it is that the event is a hijack attempt.</p><p>The following diagram illustrates this sequence of checks:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4y6YiYezGew65wYeNFPOqs/f2883b0c385966acbc991be815bb4a1c/image1-12.png" />
            
            </figure><p>As you can see, for each event, several checks are involved that help calculate the final score: RPKI, Internet Routing Registry (IRR), bogon prefixes and ASNs lists, AS relationships, and AS path.</p><p>Our guiding principles are: if the newly announced origins are RPKI or IRR invalid, it’s more likely that it’s a hijack, but if the old origins are also invalid, then it’s less likely. We discard events about private and reserved ASes and prefixes. If the new and old origins have a direct business relationship, then it’s less likely that it’s a hijack. If the new AS path indicates that the traffic still goes through the old origin, then it’s probably not a hijack.</p><p>Signals that are deemed legitimate are discarded, while signals with a high enough confidence score are flagged as potential hijacks and sent downstream for further analysis.</p><p>It's important to reiterate that the decision is not binary but a score. There will be situations where we find false negatives or false positives. The advantage of this framework is that we can easily monitor the results, learn from additional datasets and conduct the occasional manual inspection, which allows us to adjust the weights, add new conditions and continue improving the score precision over time.</p>
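    <p>As a rough sketch of how such a scoring function can be structured (the actual checks and weights in our system differ and evolve over time; the values below are made up), each piece of evidence simply contributes a positive or negative weight to a running total:</p>

```python
# Illustrative weights only: positive evidence points toward a hijack,
# negative evidence points toward a legitimate origin change.
WEIGHTS = {
    "rpki_invalid": 3,          # new origin fails RPKI validation
    "irr_invalid": 2,           # new origin not found in IRR records
    "old_origin_invalid": -2,   # old origin was invalid too
    "direct_relationship": -3,  # origins have a direct business relationship
    "path_contains_old": -3,    # traffic still transits the old origin
    "sibling_origins": -3,      # both origins belong to the same organization
}

def hijack_score(evidence):
    """Sum the weights of the gathered evidence tags.
    The higher the score, the more likely the event is a hijack."""
    return sum(WEIGHTS.get(tag, 0) for tag in evidence)
```

    <p>An RPKI- and IRR-invalid origin change scores high, while the same change between sibling origins scores near zero and would be discarded as legitimate.</p>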
    <div>
      <h4>Aggregating BGP hijack events</h4>
      <a href="#aggregating-bgp-hijack-events">
        
      </a>
    </div>
    <p>Our BGP hijack detection system provides fast response time and requires minimal resources by operating on a per-message basis.</p><p>However, when a hijack is happening, the number of hijack signals can be overwhelming for operators to manage. To address this issue, we designed a method to aggregate individual hijack messages into <b>BGP hijack events</b>, thereby reducing the number of alerts triggered.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/20mBP4fcykkGkNIXwXqlAa/326f9aeed83dfd6c337bf374ae0f233b/image10.png" />
            
            </figure><p>An event aggregates BGP messages coming from the same hijacker and related to prefixes from the same victim. The start date is the date of the first suspicious signal. To determine the end of an event, we look for one of the following conditions:</p><ul><li><p>A BGP withdrawal message for the hijacked prefix: regardless of who sends the withdrawal, the route towards the prefix no longer goes via the hijacker, and thus the hijack of this prefix is considered finished.</p></li><li><p>A new BGP announcement message with the previous (legitimate) network as the origin: this indicates that the route towards the prefix is reverted to the state before the hijack, and the hijack is therefore considered finished.</p></li></ul><p>If all BGP messages for an event have been withdrawn or reverted, and there are no more new suspicious origin changes from the hijacker ASN for <b>six hours</b>, we mark the event as finished and set the end date.</p><p>Hijack events can capture both small-scale and large-scale attacks. Alerts are then based on these aggregated events, not individual messages, making it easier for operators to manage and respond appropriately.</p>
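    <p>The aggregation and termination rules above can be modeled in a few lines of Python. The sketch below is illustrative (the class and method names are made up), with the six-hour quiet period taken from the text:</p>

```python
from datetime import datetime, timedelta

QUIET_PERIOD = timedelta(hours=6)  # no new signals for six hours => finished

class HijackEvent:
    """Aggregates suspicious origin-change signals from one hijacker ASN
    against prefixes of one victim ASN."""

    def __init__(self, hijacker, victim, start):
        self.hijacker, self.victim, self.start = hijacker, victim, start
        self.open_prefixes = set()  # prefixes not yet withdrawn or reverted
        self.last_signal = start
        self.end = None

    def add_signal(self, prefix, when):
        self.open_prefixes.add(prefix)
        self.last_signal = when

    def resolve_prefix(self, prefix):
        """Call on a withdrawal, or a re-announcement by the legitimate origin."""
        self.open_prefixes.discard(prefix)

    def maybe_finish(self, now):
        """Mark the event finished once every prefix is resolved and the
        hijacker has been quiet for the full quiet period."""
        if not self.open_prefixes and now - self.last_signal >= QUIET_PERIOD:
            self.end = now
            return True
        return False
```

    <p>An event with any still-hijacked prefix stays open indefinitely; once the last prefix is withdrawn or reverted, the six-hour timer decides when to close it.</p>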
    <div>
      <h3>Alerts, Storage and Notifications module</h3>
      <a href="#alerts-storage-and-notifications-module">
        
      </a>
    </div>
    <p>This module provides access to detected BGP hijack events and sends out notifications to relevant parties. It handles storage of all detected events and provides a user interface for easy access and search of historical events. It also generates notifications and delivers them to the relevant parties, such as network administrators or security analysts, when a potential BGP hijack event is detected. Additionally, this module can build dashboards to display high-level information and visualizations of detected events to facilitate further analysis.</p>
    <div>
      <h3>Lightweight and portable implementation</h3>
      <a href="#lightweight-and-portable-implementation">
        
      </a>
    </div>
    <p>Our BGP hijack detection system is implemented as a Rust-based command line application that is lightweight and portable. The whole detection pipeline runs off a single binary that connects to a PostgreSQL database, essentially a complete, self-contained BGP data pipeline. And if you are wondering, yes, the full system, including the database, can run well on a laptop.</p><p>The runtime cost mainly comes from maintaining the in-memory prefix tries for each full-feed router, each costing roughly 200 MB of RAM. For the beta deployment, we use about 170 full-feed peers, and the whole system runs well on a single 32 GB node with 12 threads.</p>
    <div>
      <h2>Using the BGP Hijack Detection</h2>
      <a href="#using-the-bgp-hijack-detection">
        
      </a>
    </div>
    <p>The BGP Hijack Detection results are now available on both the <a href="https://radar.cloudflare.com/security-and-attacks">Cloudflare Radar</a> website and the <a href="https://developers.cloudflare.com/api/operations/radar-get-bgp-hijacks-events">Cloudflare Radar API</a>.</p>
    <div>
      <h3>Cloudflare Radar</h3>
      <a href="#cloudflare-radar">
        
      </a>
    </div>
    <p>Under the “Security &amp; Attacks” section of Cloudflare Radar, in both the global and ASN views, we now display the BGP origin hijacks table. In this table, we show a list of detected potential BGP hijack events with the following information:</p><ul><li><p>The detected and expected origin ASes;</p></li><li><p>The start time and event duration;</p></li><li><p>The number of BGP messages and route collector peers that saw the event;</p></li><li><p>The announced prefixes;</p></li><li><p>Evidence tags and confidence level (on the likelihood of the event being a hijack).</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1VwQO8aPGpngp78MrDyCmH/5b27df69735cf09eb21dea04385d8bcc/image3-6.png" />
            
            </figure><p>For each BGP event, our system generates relevant evidence tags to indicate why the event is considered suspicious or not. These tags are used to inform the confidence score assigned to each event. Red tags indicate evidence that increases the likelihood of a hijack event, while green tags indicate the opposite.</p><p>For example, the red tag "RPKI INVALID" indicates an event is likely a hijack, as it suggests that the RPKI validation failed for the announcement. Conversely, the tag "SIBLING ORIGINS" is a green tag that indicates the detected and expected origins belong to the same organization, making it less likely for the event to be a hijack.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/44EZVKQqIpl7O5tS7QrweM/1a763e761fda826d151fd43e951c6167/Screenshot-2023-07-26-at-18.22.35.png" />
            
            </figure><p>Users can now access the BGP hijacks table in the following ways:</p><ol><li><p>Global view under the <a href="https://radar.cloudflare.com/security-and-attacks">Security &amp; Attacks</a> page, without location filters. This view lists the most recent 150 detected BGP hijack events globally.</p></li><li><p>When filtered by a specific ASN, the table will appear on the Overview, Traffic, and Traffic &amp; Attacks tabs.</p></li></ol>
    <div>
      <h3>Cloudflare Radar API</h3>
      <a href="#cloudflare-radar-api">
        
      </a>
    </div>
    <p>We also provide programmable access to the BGP hijack detection results via the Cloudflare Radar API, which is freely available under <a href="https://radar.cloudflare.com/about">CC BY-NC 4.0 license</a>. The API documentation is available at the <a href="https://developers.cloudflare.com/api/operations/radar-get-bgp-hijacks-events">Cloudflare API portal</a>.</p><p>The following <code>curl</code> command fetches the most recent 10 BGP hijack events relevant to AS64512.</p>
            <pre><code>curl -X GET "https://api.cloudflare.com/client/v4/radar/bgp/hijacks/events?involvedAsn=64512&amp;format=json&amp;per_page=10" \
    -H "Authorization: Bearer &lt;API_TOKEN&gt;"</code></pre>
            <p>Users can further filter for high-confidence events by specifying the <code>minConfidence</code> parameter with a value from 0 to 10, where a higher value indicates higher confidence that an event is a hijack. The following example expands on the previous one by adding a minimum confidence score of 8 to the query:</p>
            <pre><code>curl -X GET "https://api.cloudflare.com/client/v4/radar/bgp/hijacks/events?involvedAsn=64512&amp;format=json&amp;per_page=10&amp;minConfidence=8" \
    -H "Authorization: Bearer &lt;API_TOKEN&gt;"</code></pre>
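            <p>The same queries can be assembled programmatically. The following Python sketch only builds the request using the standard library (the parameter names match the curl examples above; it does not contact the API here):</p>

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.cloudflare.com/client/v4/radar/bgp/hijacks/events"

def build_hijack_request(token, asn, min_confidence=None, per_page=10):
    """Build an authenticated GET request for BGP hijack events
    involving the given ASN, optionally filtered by confidence."""
    params = {"involvedAsn": asn, "format": "json", "per_page": per_page}
    if min_confidence is not None:
        params["minConfidence"] = min_confidence
    return Request(f"{API_BASE}?{urlencode(params)}",
                   headers={"Authorization": f"Bearer {token}"})
```

            <p>Passing the resulting request to <code>urllib.request.urlopen</code> returns the same JSON payload as the curl commands.</p>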
            <p>Additionally, users can also quickly build custom hijack alerters using a Cloudflare <a href="https://developers.cloudflare.com/workers/wrangler/workers-kv/#workers-kv">Workers + KV combination</a>. We have a full tutorial on building alerters that send out webhook-based messages or emails (with <a href="https://developers.cloudflare.com/email-routing/">Email Routing</a>) available on the <a href="https://developers.cloudflare.com/radar/investigate/bgp-anomalies/">Cloudflare Radar documentation site</a>.</p>
    <div>
      <h2>More routing security on Cloudflare Radar</h2>
      <a href="#more-routing-security-on-cloudflare-radar">
        
      </a>
    </div>
    <p>As we continue improving Cloudflare Radar, we are planning to introduce additional Internet routing and security data. For example, Radar will soon get a dedicated routing section to provide digestible BGP information for given networks or regions, such as distinct routable prefixes, RPKI valid/invalid/unknown routes, distribution of IPv4/IPv6 prefixes, etc. Our goal is to provide the best data and tools for routing security to the community, so that we can build a better and more secure Internet together.</p><p>Visit <a href="https://radar.cloudflare.com/">Cloudflare Radar</a> for additional insights around Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, and more. Follow us on social media at <a href="https://twitter.com/CloudflareRadar">@CloudflareRadar</a> (Twitter), <a href="https://noc.social/@cloudflareradar">https://noc.social/@cloudflareradar</a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com">radar.cloudflare.com</a> (Bluesky), or contact us via e-mail.</p> ]]></content:encoded>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Radar Alerts]]></category>
            <guid isPermaLink="false">33xptAfGQ0z94EAn4h1oKn</guid>
            <dc:creator>Mingwei Zhang</dc:creator>
            <dc:creator>Celso Martinho</dc:creator>
        </item>
        <item>
            <title><![CDATA[Helping build a safer Internet by measuring BGP RPKI Route Origin Validation]]></title>
            <link>https://blog.cloudflare.com/rpki-updates-data/</link>
            <pubDate>Fri, 16 Dec 2022 14:00:00 GMT</pubDate>
            <description><![CDATA[ Is BGP safe yet? If the question needs asking, then it isn't. But how far the Internet is from this goal is what we set out to answer. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1VWVhVnz5Xbv2u1jm48KeJ/dd52aaf9426c64b5d2b68a0b7651cb93/image7-7.png" />
            
            </figure><p>The <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">Border Gateway Protocol</a> (BGP) is the glue that keeps the entire Internet together. However, despite its vital function, BGP wasn't originally designed to protect against malicious actors or routing mishaps. It has since been updated to account for this shortcoming with the <a href="https://en.wikipedia.org/wiki/Resource_Public_Key_Infrastructure">Resource Public Key Infrastructure</a> (RPKI) framework, but can we declare it to be safe yet?</p><p>If the question needs asking, you might suspect we can't. There is a shortage of reliable data on how much of the Internet is protected from preventable routing problems. Today, we’re releasing a new method to measure exactly that: what percentage of Internet users are protected by their Internet Service Provider from these issues. We find that there is a long way to go before the Internet is protected from routing problems, though it varies dramatically by country.</p>
    <div>
      <h3>Why RPKI is necessary to secure Internet routing</h3>
      <a href="#why-rpki-is-necessary-to-secure-internet-routing">
        
      </a>
    </div>
    <p>The Internet is a network of independently-managed networks, called <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">Autonomous Systems (ASes)</a>. To achieve global reachability, ASes interconnect with each other and determine the feasible paths to a given destination IP address by exchanging routing information using BGP. BGP enables routers with only local network visibility to construct end-to-end paths based on the arbitrary preferences of each administrative entity that operates that equipment. Typically, Internet traffic between a user and a destination traverses multiple AS networks using paths constructed by BGP routers.</p><p>BGP, however, lacks built-in security mechanisms to protect the integrity of the exchanged routing information and to provide authentication and authorization of the advertised IP address space. Because of this, AS operators must implicitly trust that the routing information exchanged through BGP is accurate. As a result, the Internet is vulnerable to the injection of bogus routing information, which cannot be mitigated by security measures at the client or server level of the network.</p><p>An adversary with access to a BGP router can inject fraudulent routes into the routing system, which can be used to execute an array of attacks, including:</p><ul><li><p>Denial-of-Service (DoS) through traffic blackholing or redirection,</p></li><li><p>Impersonation attacks to eavesdrop on communications,</p></li><li><p>Machine-in-the-Middle exploits to modify the exchanged data, and subvert reputation-based filtering systems.</p></li></ul><p>Additionally, local misconfigurations and fat-finger errors can be propagated well beyond the source of the error and cause major disruption across the Internet.</p><p>Such an incident happened on <a href="/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/">June 24, 2019</a>. 
Millions of users were unable to access Cloudflare's address space when a regional ISP in Pennsylvania accidentally advertised routes to Cloudflare through their capacity-limited network. This was effectively the Internet equivalent of routing an entire freeway through a neighborhood street.</p><p>Traffic misdirections like these, either unintentional or intentional, are not uncommon. The Internet Society’s <a href="https://www.manrs.org/">MANRS</a> (Mutually Agreed Norms for Routing Security) initiative estimated that in 2020 alone there were <a href="https://www.manrs.org/2021/03/a-regional-look-into-bgp-incidents-in-2020/">over 3,000 route leaks and hijacks</a>, and new occurrences can be <a href="/route-leak-detection-with-cloudflare-radar/">observed every day through Cloudflare Radar</a>.</p><p>The most prominent proposals to secure BGP routing, standardized by the <a href="https://www.ietf.org/about/introduction/">IETF</a>, focus on validating the origin of the advertised routes using <a href="https://en.wikipedia.org/wiki/Resource_Public_Key_Infrastructure">Resource Public Key Infrastructure</a> (RPKI) and verifying the integrity of the paths with <a href="https://en.wikipedia.org/wiki/BGPsec">BGPsec</a>. Specifically, RPKI (defined in <a href="https://www.rfc-editor.org/rfc/rfc7115.html">RFC 7115</a>) relies on a <a href="https://en.wikipedia.org/wiki/Public_key_infrastructure">Public Key Infrastructure</a> to validate that an AS advertising a route to a destination (an IP address space) is the legitimate owner of those IP addresses.</p><p>RPKI has been defined for a long time but lacks adoption. It requires network operators to cryptographically sign their prefixes, and routing networks to perform an RPKI Route Origin Validation (ROV) on their routers. This is a two-step operation that requires coordination and participation from many actors to be effective.</p>
    <div>
      <h3>The two phases of RPKI adoption: signing origins and validating origins</h3>
      <a href="#the-two-phases-of-rpki-adoption-signing-origins-and-validating-origins">
        
      </a>
    </div>
    <p>RPKI has two phases of deployment: first, an AS that wants to protect its own IP prefixes can cryptographically sign Route Origin Authorization (ROA) records, thereby attesting to be the legitimate origin of that signed IP space. Second, an AS can avoid selecting invalid routes by performing Route Origin Validation (ROV, defined in <a href="https://www.rfc-editor.org/rfc/rfc6483">RFC 6483</a>).</p><p>With ROV, a BGP route received from a neighbor is validated against the available RPKI records. A route that is valid or missing from RPKI is selected, while a route with RPKI records found to be invalid is typically rejected, thus preventing the use and propagation of hijacked and misconfigured routes.</p><p>One issue with RPKI is the fact that implementing ROA is meaningful only if other ASes implement ROV, and vice versa. Therefore, securing BGP routing requires a united effort, and a lack of broader adoption disincentivizes ASes from committing the resources to validate their own routes. Conversely, increasing RPKI adoption can lead to network effects and accelerate RPKI deployment. Projects like MANRS and Cloudflare’s <a href="https://isbgpsafeyet.com/">isbgpsafeyet.com</a> are promoting good Internet citizenship among network operators, and make the benefits of RPKI deployment known to the Internet. You can check whether your own ISP is being a good Internet citizen by testing it on <a href="https://isbgpsafeyet.com/">isbgpsafeyet.com</a>.</p><p>Measuring the extent to which both ROA (signing of addresses by the network that controls them) and ROV (filtering of invalid routes by ISPs) have been implemented is important to evaluating the impact of these initiatives, developing situational awareness, and predicting the impact of future misconfigurations or attacks.</p><p>Measuring ROAs is straightforward since ROA data is <a href="https://ftp.ripe.net/rpki/">readily available</a> from RPKI repositories. 
Querying RPKI repositories for publicly routed IP prefixes (e.g. prefixes visible in the <a href="http://www.routeviews.org/">RouteViews</a> and <a href="https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris">RIPE RIS</a> routing tables) allows us to estimate the percentage of addresses covered by ROA objects. Currently, there are 393,344 IPv4 and 86,306 IPv6 ROAs in the global RPKI system, covering about 40% of the globally routed prefix-AS origin pairs<sup>1</sup>.</p><p>Measuring ROV, however, is significantly more challenging given it is configured inside the BGP routers of each AS, not accessible by anyone other than each router’s administrator.</p>
    <div>
      <h3>Measuring ROV deployment</h3>
      <a href="#measuring-rov-deployment">
        
      </a>
    </div>
    <p>Although we do not have direct access to the configuration of everyone’s BGP routers, it is possible to infer the use of ROV by comparing the reachability of RPKI-valid and RPKI-invalid prefixes from measurement points within an AS<sup>2</sup>.</p><p>Consider the following toy topology as an example, where an RPKI-invalid origin is advertised through AS0 to AS1 and AS2. If AS1 filters and rejects RPKI-invalid routes, a user behind AS1 would not be able to connect to that origin. By contrast, if AS2 does not reject RPKI invalids, a user behind AS2 would be able to connect to that origin.</p><p>While occasionally a user may be unable to access an origin due to transient network issues, if multiple users act as vantage points for a measurement system, we would be able to collect a large number of data points to infer which ASes deploy ROV.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3ix5pgVzjgMlL7BugvGJDD/aff6d6eaf101da010a24fa8e7908b106/1-1.png" />
            
</figure><p>If, in the figure above, AS0 filters invalid RPKI routes, then vantage points in both AS1 and AS2 would be unable to connect to the RPKI-invalid origin, making it hard to distinguish whether ROV is deployed at the ASes of our vantage points or in an AS along the path. One way to mitigate this limitation is to announce the RPKI-invalid origin from multiple locations of an anycast network, taking advantage of its direct interconnections to the measurement vantage points, as shown in the figure below. As a result, an AS that does not itself deploy ROV is less likely to observe the benefits of upstream ASes using ROV, and we would be able to accurately infer ROV deployment per AS<sup>3</sup>.</p><p><i>Note that it’s also important that the IP address of the RPKI-invalid origin not be covered by a less-specific prefix for which there is a valid or unknown RPKI route; otherwise, even if an AS filters invalid RPKI routes, its users would still be able to find a route to that IP.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7HDnUbqxvRQ3DbhArqJsMg/a21bfe1ef026a4aa615ac21f759d3f3f/2-1.png" />
            
            </figure><p>The measurement technique described here is the one implemented by Cloudflare’s <a href="https://isbgpsafeyet.com">isbgpsafeyet.com</a> website, allowing end users to assess whether or not their ISPs have deployed BGP ROV.</p><p>The <a href="https://isbgpsafeyet.com/">isbgpsafeyet.com</a> website itself doesn't submit any data back to Cloudflare, but recently we started measuring whether end users’ browsers can successfully connect to invalid RPKI origins when ROV is present. We use the same mechanism as is used for <a href="/network-performance-update-developer-week/">global performance data</a><sup>4</sup>. In particular, every measurement session (an individual end user at some point in time) attempts a request to both valid.rpki.cloudflare.com, which should always succeed as it’s RPKI-valid, and invalid.rpki.cloudflare.com, which is RPKI-invalid and should fail when the user’s ISP uses ROV.</p><p>This allows us to have continuous and up-to-date measurements from hundreds of thousands of browsers on a daily basis, and develop a greater understanding of the state of ROV deployment.</p>
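<p>The per-session logic reduces to two fetches and a comparison. A minimal sketch of the inference step (the two hostnames come from the measurement described above; the classification labels are our own):</p>

```python
def infer_rov(valid_reachable: bool, invalid_reachable: bool) -> str:
    """Infer ROV from one measurement session (sketch).

    valid_reachable:   did the request to valid.rpki.cloudflare.com succeed?
    invalid_reachable: did the request to invalid.rpki.cloudflare.com succeed?
    """
    if not valid_reachable:
        # The control request failed: likely a transient network issue,
        # so the sample tells us nothing and is discarded.
        return "inconclusive"
    if invalid_reachable:
        return "no-rov"  # an RPKI-invalid route was usable: no filtering seen
    return "rov"         # only the invalid origin failed: invalids filtered

print(infer_rov(True, False))   # rov
print(infer_rov(True, True))    # no-rov
print(infer_rov(False, False))  # inconclusive
```

<p>Aggregating many such per-session verdicts per AS is what smooths out the occasional transient failure mentioned earlier.</p>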
    <div>
      <h3>The state of global ROV deployment</h3>
      <a href="#the-state-of-global-rov-deployment">
        
      </a>
    </div>
    <p>The figure below shows the raw number of ROV probe requests per hour during October 2022 to <i>valid.rpki.cloudflare.com</i> and <i>invalid.rpki.cloudflare.com</i>. In total, we observed 69.7 million successful probes from 41,531 ASNs.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/16AWpr0oNevsWJynpNN1v4/51885d284ed9360416c915a935f19d6d/3-1.png" />
            
</figure><p>Based on <a href="https://labs.apnic.net/?p=526">APNIC’s estimates</a> of the number of end users per ASN, our weighted<sup>5</sup> analysis covers 96.5% of the world’s Internet population. As expected, the number of requests follows a diurnal pattern, which reflects established user behavior in daily and weekly Internet activity<sup>6</sup>.</p><p>We can also see that the number of successful requests to <i>valid.rpki.cloudflare.com</i> (<b><i>gray line</i></b>) closely follows the number of sessions that issued at least one request (<b><i>blue line</i></b>), which works as a smoke test for the correctness of our measurements.</p><p>As we don’t store the IP addresses that contribute measurements, we don’t have any way to count individual clients, and large spikes in the data may introduce unwanted bias. We account for that by capturing those instants and excluding them.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2rou45RM7Y0NdF2opTgGZE/0e708f6e801926147cd498a355751b21/4-1.png" />
            
            </figure><p>Overall, we estimate that out of the four billion Internet users, <b>only 261 million (6.5%) are protected by BGP Route Origin Validation</b>, but the true state of global ROV deployment is more subtle than this.</p><p>The following map shows the fraction of dropped RPKI-invalid requests from ASes with over 200 probes over the month of October. It depicts how far along each country is in adopting ROV but doesn’t necessarily represent the fraction of protected users in each country, as we will discover.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3UzNnhgQaIpcQktYHwNwpf/b6a389c9ed94e49456c75aa0f8689264/5-1.png" />
            
            </figure><p>Sweden and Bolivia appear to be the countries with the highest level of adoption (over 80%), while only a few other countries have crossed the 50% mark (e.g. Finland, Denmark, Chad, Greece, the United States).</p><p>ROV adoption may be driven by a few ASes hosting large user populations, or by many ASes hosting small user populations. To understand such disparities, the map below plots the contrast between overall adoption in a country (as in the previous map) and median adoption over the individual ASes within that country. Countries with stronger reds have relatively few ASes deploying ROV with high impact, while countries with stronger blues have more ASes deploying ROV but with lower impact per AS.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6m2lvwtrbDEzDObM6NfW5W/2371a8bdb5f7ed5ed4981103d4aad4c5/6-1.png" />
            
            </figure><p>In the Netherlands, Denmark, Switzerland, or the United States, adoption appears mostly driven by their larger ASes, while in Greece or Yemen it’s the smaller ones that are adopting ROV.</p><p>The following histogram summarizes the worldwide level of adoption for the 6,765 ASes covered by the previous two maps.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5zg7hScwuknrlGP0z76zvx/6fa811c1f9f92f50397ec64267b0af73/7.png" />
            
</figure><p>Most ASes either don’t validate at all, or have close to 100% adoption, which is what we’d intuitively expect. However, it’s interesting to observe that there is a small number of ASes all across the scale. ASes that exhibit a partial RPKI-invalid drop rate may either implement ROV partially (on some, but not all, of their BGP routers), or appear as dropping RPKI invalids due to ROV deployment by other ASes in their upstream path.</p><p>To estimate the number of users protected by ROV, we only considered ASes with an observed adoption above <b>95%</b>, as an AS with an incomplete deployment still leaves its users vulnerable to route leaks from its BGP peers.</p><p>If we take the previous histogram and summarize by the number of users behind each AS, the green bar on the right corresponds to the <b>261 million</b> users currently protected by ROV according to the above criteria (686 ASes).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/NAtYPcWbDWsiBiMfX56es/40a59d3253b0cba83c6e1bde2bbf83a6/8.png" />
            
</figure><p>Looking back at the country adoption map, one would perhaps expect the number of protected users to be larger. But worldwide ROV deployment is still mostly either partial, missing the larger ASes, or both. This becomes even clearer when compared with the next map, which plots just the fraction of fully protected users.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7kAM05eOwaPusgkvSIASoB/e86386a84c161919b3bdc9018145eb1c/9.png" />
            
            </figure><p>To wrap up our analysis, we look at two world economies chosen for their contrasting, almost symmetrical, stages of deployment: the United States and the European Union.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3DShkKyva4l7rm5qC7gQnP/2ba1802f7d305815450bc1b9df372abb/10.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Q0ipIYrllZ63MXSRYEh59/bc25c31c0de0b71c157f909e5eb8522f/11.png" />
            
            </figure><p>112 million Internet users are protected by 111 ASes from the United States with comprehensive ROV deployments. Conversely, more than twice as many ASes from countries making up the European Union have fully deployed ROV, but end up covering only half as many users. This can be reasonably explained by end user ASes being more likely to operate within a single country rather than span multiple countries.</p>
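<p>The comprehensive-deployment estimate used above (count only users behind ASes whose observed RPKI-invalid drop rate clears the 95% bar) can be sketched as follows; the per-AS numbers below are made up for illustration:</p>

```python
def protected_users(as_stats, threshold=0.95):
    """Sum users behind ASes with near-complete ROV deployment (sketch).

    as_stats: {asn: (invalid_probes_dropped, invalid_probes_total, users)}
    Returns (total protected users, list of qualifying ASNs).
    """
    total = 0
    deployed = []
    for asn, (dropped, probes, users) in as_stats.items():
        # An AS only counts if nearly all RPKI-invalid probes were dropped.
        if probes and dropped / probes > threshold:
            total += users
            deployed.append(asn)
    return total, deployed

# Hypothetical per-AS measurements: drops, probe totals, user estimates.
stats = {
    64496: (990, 1000, 2_000_000),  # ~99% drop rate: counted
    64497: (500, 1000, 5_000_000),  # partial deployment: not counted
    64498: (0, 1000, 1_000_000),    # no ROV observed: not counted
}
users, asns = protected_users(stats)
print(users)  # 2000000
```

<p>An AS with a 50% drop rate contributes nothing here, mirroring the reasoning above: an incomplete deployment still leaves that AS’s users exposed.</p>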
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Probe requests were performed from end user browsers and very few measurements were collected from transit providers (which have few end users, if any). Also, paths between end user ASes and Cloudflare are often very short (a nice outcome of our extensive peering) and don't traverse upper-tier networks that they would otherwise use to reach the rest of the Internet.</p><p>In other words, the methodology used focuses on ROV adoption by <b>end user networks</b> (e.g. ISPs) and isn’t meant to reflect the eventual effect of indirect validation from (perhaps validating) upper-tier transit networks. While indirect validation may limit the "blast radius" of (malicious or accidental) route leaks, it still leaves non-validating ASes vulnerable to leaks coming from their peers.</p><p>As with indirect validation, an AS remains vulnerable until its ROV deployment reaches a sufficient level of completion. We chose to only consider AS deployments above 95% as truly comprehensive, and <a href="https://radar.cloudflare.com">Cloudflare Radar</a> will soon begin using this threshold to track ROV adoption worldwide, as part of our mission to help build a better Internet.</p><p>When considering only comprehensive ROV deployments, some countries such as Denmark, Greece, Switzerland, Sweden, or Australia, already show an effective coverage above 50% of their respective Internet populations, with others like the Netherlands or the United States slightly above 40%, mostly driven by few large ASes rather than many smaller ones.</p><p>Worldwide we observe a very low effective coverage of just <b>6.5%</b> over the measured ASes, corresponding to <b>261 million</b> end users currently safe from (malicious and accidental) route leaks, which means there’s still a long way to go before we can declare BGP to be safe.</p><p>......</p><p><sup>1</sup><a href="https://rpki.cloudflare.com/">https://rpki.cloudflare.com/</a></p><p><sup>2</sup>Gilad, Yossi, Avichai Cohen, Amir Herzberg, Michael 
Schapira, and Haya Shulman. "Are we there yet? On RPKI's deployment and security." Cryptology ePrint Archive (2016).</p><p><sup>3</sup>Geoff Huston. “Measuring ROAs and ROV”. <a href="https://blog.apnic.net/2021/03/24/measuring-roas-and-rov/">https://blog.apnic.net/2021/03/24/measuring-roas-and-rov/</a></p><p><sup>4</sup>Measurements are issued stochastically when users encounter 1xxx error pages from default (non-customer) configurations.</p><p><sup>5</sup>Probe requests are weighted by AS size as calculated from Cloudflare's <a href="https://radar.cloudflare.com/">worldwide HTTP traffic</a>.</p><p><sup>6</sup>Quan, Lin, John Heidemann, and Yuri Pradkin. "When the Internet sleeps: Correlating diurnal networks with external factors." In Proceedings of the 2014 Conference on Internet Measurement Conference, pp. 87-100. 2014.</p> ]]></content:encoded>
            <category><![CDATA[Impact Week]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Routing Security]]></category>
            <category><![CDATA[Better Internet]]></category>
            <guid isPermaLink="false">dMGl1iwWVn3YZWRTxIzgV</guid>
            <dc:creator>Carlos Rodrigues</dc:creator>
            <dc:creator>Vasilis Giotsas</dc:creator>
        </item>
        <item>
            <title><![CDATA[Why BGP communities are better than AS-path prepends]]></title>
            <link>https://blog.cloudflare.com/prepends-considered-harmful/</link>
            <pubDate>Thu, 24 Nov 2022 17:31:47 GMT</pubDate>
            <description><![CDATA[ Routing on the Internet follows a few basic principles. Unfortunately not everything on the Internet is created equal, and prepending can do more harm than good. In this blog post we’ll talk about the problems that prepending aims to solve, and some alternative solutions ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2E41RLSZSaS34QDzUqX1Rd/7ebce0374cd5f67f333eaf954e6f1445/image7-9.png" />
            
</figure><p>The Internet, in its purest form, is a loosely connected graph of independent networks (also called <a href="https://www.cloudflare.com/en-gb/learning/network-layer/what-is-an-autonomous-system/">Autonomous Systems</a> (AS for short)). These networks use a signaling protocol called <a href="https://www.cloudflare.com/en-gb/learning/security/glossary/what-is-bgp/">BGP</a> (Border Gateway Protocol) to inform their neighbors (also known as peers) about the reachability of IP prefixes (a group of IP addresses) in and through their network. Part of this exchange carries useful metadata about the IP prefix that is used to inform routing decisions. One example of this metadata is the full AS-path, which consists of the different autonomous systems an IP packet needs to pass through to reach its destination.</p><p>As we all want our packets to get to their destination as fast as possible, selecting the shortest AS-path for a given prefix is a good idea. This is where something called prepending comes into play.</p>
    <div>
      <h2>Routing on the Internet, a primer</h2>
      <a href="#routing-on-the-internet-a-primer">
        
      </a>
    </div>
    <p>Let's briefly talk about how the Internet works at its most fundamental level, before we dive into some nitty-gritty details.</p><p>The Internet is, at its core, a massively interconnected network of thousands of networks. Each network owns two things that are critical:</p><p>1. An Autonomous System Number (ASN): a 32-bit integer that uniquely identifies a network. For example, one of the Cloudflare ASNs (we have multiple) is 13335.</p><p>2. IP prefixes: An IP prefix is a range of IP addresses, bundled together in powers of two: in the IPv4 space, two addresses form a /31 prefix, four form a /30, and so on, all the way up to /0, which is shorthand for “all IPv4 prefixes”. The same applies to IPv6, but instead of aggregating at most 32 bits, you can aggregate up to 128 bits. The figure below shows this relationship between IP prefixes, in reverse -- a /24 contains two /25s, each of which contains two /26s, and so on.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5XUwaT0EJLzfHUwkpm7aN2/47a248cc3292ae8a970423c8f3de9f5b/image9-6.png" />
            
            </figure><p>To communicate on the Internet, you must be able to reach your destination, and that’s where routing protocols come into play. They enable each node on the Internet to know where to send your message (and for the receiver to send a message back).</p>
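<p>The prefix containment shown in the earlier figure is easy to poke at with Python's ipaddress module; a quick sketch (the addresses are RFC 5737 documentation space):</p>

```python
import ipaddress

# Each extra prefix bit halves the range: a /24 splits into two /25s,
# four /26s, and so on.
net = ipaddress.ip_network("192.0.2.0/24")
print(net.num_addresses)                      # 256 addresses in a /24
print(list(net.subnets(prefixlen_diff=1)))    # the two /25s
print(len(list(net.subnets(new_prefix=26))))  # 4

# Containment check: a more-specific prefix is a subnet of the /24.
print(ipaddress.ip_network("192.0.2.128/26").subnet_of(net))  # True
```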
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6RzDynXtAzTFAAeSRNF7Qz/f3297d7747eb351a77dfb3da7461944a/image5-21.png" />
            
            </figure><p>As mentioned earlier, these destinations are identified by IP addresses, and contiguous ranges of IP addresses are expressed as IP prefixes. We use IP prefixes for routing as an efficiency optimization: Keeping track of where to go for four billion (2<sup>32</sup>) IP addresses in IPv4 would be incredibly complex, and require a lot of resources. Sticking to prefixes reduces that number down to about one million instead.</p><p>Now recall that Autonomous Systems are independently operated and controlled. In the Internet’s network of networks, how do I tell Source A in some other network that there is an available path to get to Destination B in (or through) my network? In comes BGP! BGP is the Border Gateway Protocol, and it is used to signal reachability information. Signal messages generated by the source ASN are referred to as ‘announcements’ because they declare to the Internet that IP addresses in the prefix are online and reachable.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/cudmPAIkxVZr5tOuSqQdm/f096de61cdebaa7c9427343be70cb0ff/image4-33.png" />
            
</figure><p>Have a look at the figure above. Source A should now know how to get to Destination B through two different networks!</p><p>This is what an actual BGP message would look like:</p>
            <pre><code>BGP Message
    Type: UPDATE Message
    Path Attributes:
        Path Attribute - Origin: IGP
        Path Attribute - AS_PATH: 64500 64496
        Path Attribute - NEXT_HOP: 198.51.100.1
        Path Attribute - COMMUNITIES: 64500:13335
        Path Attribute - Multi Exit Discriminator (MED): 100
    Network Layer Reachability Information (NLRI):
        192.0.2.0/24</code></pre>
            <p>As you can see, BGP messages contain more than just the IP prefix (the NLRI bit) and the path: they also carry a bunch of other metadata that provides additional information about the path. Other fields include communities (more on that later), as well as the MED and the origin code. MED is a suggestion to other directly connected networks on which path should be taken if multiple options are available, and the lowest value wins. The origin code can be one of three values: IGP, EGP or Incomplete. IGP will be set if you originate the prefix through BGP, EGP is no longer used (it’s an ancient routing protocol), and Incomplete is set when you distribute a prefix into BGP from another routing protocol (like IS-IS or OSPF).</p><p>Now that Source A knows how to get to Destination B through two different networks, let's talk about traffic engineering!</p>
    <div>
      <h2>Traffic engineering</h2>
      <a href="#traffic-engineering">
        
      </a>
    </div>
    <p>Traffic engineering is a critical part of the day-to-day management of any network. Just like in the physical world, detours can be put in place by operators to optimize the traffic flows into (inbound) and out of (outbound) their network. Outbound traffic engineering is significantly easier than inbound traffic engineering because operators can choose between neighboring networks, and even prioritize some traffic over others. In contrast, inbound traffic engineering requires influencing a network that is operated by someone else entirely. The autonomy and self-governance of a network are paramount, so operators use available tools to inform or shape inbound packet flows from other networks. Understanding and using those tools is complex, and can be a challenge.</p><p>The available set of traffic engineering tools, both in- and outbound, relies on manipulating attributes (metadata) of a given route. As we’re talking about traffic engineering between independent networks, we’ll be manipulating the attributes of an EBGP-learned route. BGP sessions can be split into two categories:</p><ol><li><p>EBGP: BGP communication between two different ASNs</p></li><li><p>IBGP: BGP communication within the same ASN.</p></li></ol><p>While the protocol is the same, certain attributes can be exchanged on an IBGP session that aren’t exchanged on an EBGP session. One of those is local-preference. More on that in a moment.</p>
    <div>
      <h3>BGP best path selection</h3>
      <a href="#bgp-best-path-selection">
        
      </a>
    </div>
    <p>When a network is connected to multiple other networks and service providers, it can receive path information to the same IP prefix from many of those networks, each with slightly different attributes. It is then up to the receiving network of that information to use a BGP best path selection algorithm to pick the “best” prefix (and route), and use this to forward IP traffic. I’ve put “best” in quotation marks, as best is a subjective requirement. “Best” is frequently the shortest, but what can be best for my network might not be the best outcome for another network.</p><p>BGP will consider multiple prefix attributes when filtering through the received options. However, rather than combine all those attributes into a single selection criteria, BGP best path selection uses the attributes in tiers -- at any tier, if the available attributes are sufficient to choose the best path, then the algorithm terminates with that choice.</p><p>The BGP best path selection algorithm is extensive, containing 15 discrete steps to select the best available path for a given prefix. Given the numerous steps, it’s in the interest of the network to decide the best path as early as possible. The first four steps are most used and influential, and are depicted in the figure below as sieves.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/28iOxj73xffT9imICxgTTI/504e3ea0c7767d8acae1ee0ca0b64c4d/image2-55.png" />
            
</figure><p>Picking the shortest path possible is usually a good idea, which is why “AS-path length” is a step executed early on in the algorithm. However, looking at the figure above, “AS-path length” appears second, despite being the attribute that identifies the shortest path. So let’s talk about the first step: local preference.</p><p><b>Local preference</b></p><p>Local preference is an operator favorite because it allows them to handpick a route+path combination of their choice. It’s the first attribute in the algorithm because it is unique for any given route+neighbor+AS-path combination.</p><p>A network sets the local preference on import of a route (having learned about the route from a neighbor network). It is a non-transitive property, meaning it is never sent in an EBGP message to other networks. This intrinsically means, for example, that the operator of AS 64496 can’t set the local preference of routes to their own (or transiting) IP prefixes inside neighboring AS 64511. The inability to do so is partially why inbound traffic engineering through EBGP is so difficult.</p><p><b>Prepending artificially increases AS-path length</b></p><p>Since no network is able to directly set the local preference for a prefix inside another network, the first opportunity to influence other networks’ choices is modifying the AS-path. If the next hops are valid, and the local preference for all the different paths for a given route is the same, modifying the AS-path is an obvious option to change the path traffic will take towards your network. In a BGP message, prepending looks like this:</p><p>BEFORE:</p>
            <pre><code>BGP Message
    Type: UPDATE Message
    Path Attributes:
        Path Attribute - Origin: IGP
        Path Attribute - AS_PATH: 64500 64496
        Path Attribute - NEXT_HOP: 198.51.100.1
        Path Attribute - COMMUNITIES: 64500:13335
        Path Attribute - Multi Exit Discriminator (MED): 100
    Network Layer Reachability Information (NLRI):
        192.0.2.0/24</code></pre>
            <p>AFTER:</p>
            <pre><code>BGP Message
    Type: UPDATE Message
    Path Attributes:
        Path Attribute - Origin: IGP
        Path Attribute - AS_PATH: 64500 64496 64496
        Path Attribute - NEXT_HOP: 198.51.100.1
        Path Attribute - COMMUNITIES: 64500:13335
        Path Attribute - Multi Exit Discriminator (MED): 100
    Network Layer Reachability Information (NLRI):
        192.0.2.0/24</code></pre>
            <p>Specifically, operators can do AS-path prepending. When doing AS-path prepending, an operator adds additional autonomous systems to the path (usually the operator uses their own AS, but that’s not enforced in the protocol). This way, an AS-path can grow from a length of 1 all the way up to 255. As the length has now increased dramatically, that specific path for the route is far less likely to be chosen. By changing the AS-path advertised to different peers, an operator can control the traffic flows coming into their network.</p><p>Unfortunately, prepending has a catch: to be the deciding factor, all the other attributes need to be equal. This is rarely true, especially in large networks that are able to choose from many possible routes to a destination.</p>
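<p>That interaction (local preference considered first, AS-path length only on a tie) can be sketched with just the first two tiers of the selection algorithm; the route attributes below are illustrative:</p>

```python
def best_path(routes):
    """Pick the best route using the first two tiers of BGP best path
    selection (sketch): highest local-pref wins; shortest AS-path breaks ties."""
    return max(routes, key=lambda r: (r["local_pref"], -len(r["as_path"])))

# A longer path with a higher local-pref still wins: prepending is moot here.
pni     = {"name": "PNI",     "local_pref": 250, "as_path": [64500, 64496]}
transit = {"name": "transit", "local_pref": 100, "as_path": [64496]}
print(best_path([pni, transit])["name"])  # PNI

# Prepending only matters when local-prefs tie:
a = {"name": "A", "local_pref": 100, "as_path": [64500, 64496]}
b = {"name": "B", "local_pref": 100, "as_path": [64501, 64496, 64496, 64496]}
print(best_path([a, b])["name"])  # A: shorter AS-path wins the tie
```

<p>Route B's prepends are invisible to the first comparison, which is exactly why prepending so often fails to move traffic.</p>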
    <div>
      <h2>Business Policy Engine</h2>
      <a href="#business-policy-engine">
        
      </a>
    </div>
    <p>BGP is colloquially also referred to as a Business Policy Engine: it does <b>not</b> select the best path from a performance point of view; instead, and more often than not, it will select the best path from a <i>business</i> point of view. The business criteria could be anything from investment (port) efficiency to increased revenue, and more. This may sound strange but, believe it or not, this is what BGP is designed to do! The power (and complexity) of BGP is that it enables a network operator to make choices according to the operator’s needs, contracts, and policies, many of which cannot be reflected by conventional notions of engineering performance.</p>
    <div>
      <h3>Different local preferences</h3>
      <a href="#different-local-preferences">
        
      </a>
    </div>
    <p>A lot of networks (including Cloudflare) assign a local preference depending on the type of connection used to send us the routes. A higher value is a higher preference. For example, routes learned from transit network connections will get a lower local preference of 100 because they are the most costly to use; backbone-learned routes will be 150, Internet exchange (IX) routes get 200, and lastly private interconnect (PNI) routes get 250. This means that for egress (outbound) traffic, the Cloudflare network, by default, will prefer a PNI-learned route, even if a shorter AS-path is available through an IX or transit neighbor.</p><p>Part of the reason a PNI is preferred over an IX is reliability: there is no third-party switching platform involved that is out of our control, which matters because we operate on the assumption that all hardware can and will eventually break. Another part of the reason is port efficiency. Here, efficiency is defined by the cost per megabit transferred on each port. Roughly speaking, the per-port cost is calculated by:</p><p><code>((cost_of_switch / port_count) + transceiver_cost)</code></p><p>which is then combined with the cross-connect cost (which might be monthly recurring (MRC) or a one-time fee). A PNI is preferable because it reduces the overall cost per megabit transferred: the unit price decreases with higher utilization of the port.</p><p>This reasoning is similar for a lot of other networks, and is very prevalent in transit networks. BGP is at least as much about cost and business policy as it is about performance.</p>
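<p>As a rough, purely illustrative sketch of that calculation (every number below is hypothetical):</p>

```python
def cost_per_megabit(cost_of_switch, port_count, transceiver_cost,
                     cross_connect_cost, utilization_mbps):
    """Cost per megabit transferred on a port (sketch; hypothetical inputs).

    Hardware cost is amortized per port, the transceiver and cross-connect
    are added, and the total is spread over the traffic the port carries.
    """
    port_cost = (cost_of_switch / port_count) + transceiver_cost
    return (port_cost + cross_connect_cost) / utilization_mbps

# Same port economics, different utilization: a busier port means a
# cheaper megabit, which is the incentive behind preferring the PNI.
print(cost_per_megabit(32000, 32, 400, 300, 10_000))  # lightly used port
print(cost_per_megabit(32000, 32, 400, 300, 80_000))  # well-utilized port
```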
    <div>
      <h3>Transit local preference</h3>
      <a href="#transit-local-preference">
        
      </a>
    </div>
    <p>For simplicity, when referring to transits, I mean the <a href="https://en.wikipedia.org/wiki/Tier_1_network">traditional tier-1 transit networks</a>. Due to the nature of these networks, they have two distinct sets of network peers:</p><p>1. Customers (like Cloudflare)</p><p>2. Settlement-free peers (like other tier-1 networks)</p><p>In normal circumstances, transit customers will get a higher local preference assigned than the local preference used for their settlement-free peers. This means that, no matter how much you prepend a prefix, if traffic enters that transit network, traffic will <b>always</b> land on your interconnection with that transit network; it will not be offloaded to another peer.</p><p>A prepend can still be used if you want to switch/offload traffic from a single link with one transit if you have multiple distinct links with them, or if the source of traffic is multihomed behind multiple transits (and they don’t have their own local preference playbook preferring one transit over another). But engineering inbound traffic away from one transit port to another through AS-path prepending has sharply diminishing returns: once you’re past three prepends, it’s unlikely to change much, if anything.</p><p><b>Example</b></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/8QW8idtVhFb2jtY64uTLf/696a61224544698eb35a8e7ae20f8a22/image8-6.png" />
            
            </figure><p>In the above scenario, no matter the adjustment Cloudflare makes in its AS-path towards AS 64496, the traffic will keep flowing through the Transit B &lt;&gt; Cloudflare interconnection, even though the path Origin A → Transit B → Transit A → Cloudflare is shorter from an AS-path point of view.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2uNwYLR8BdzCHVTzISwgtk/581ea233f8058f998bf00dfaa44ca3e3/image6-12.png" />
            
</figure><p>In this scenario, not a lot has changed, but Origin A is now multi-homed behind the two transit providers. In this case, the AS-path prepending was effective, as Origin A now sees both the prepended and the non-prepended path. As long as Origin A is not doing any egress traffic engineering and treats both transit networks equally, the path chosen will be Origin A → Transit A → Cloudflare.</p>
    <div>
      <h3>Community-based traffic engineering</h3>
      <a href="#community-based-traffic-engineering">
        
      </a>
    </div>
    <p>So we have now identified a pretty critical problem within the Internet ecosystem for operators: with the tools mentioned above, it’s not always possible (some might even say outright impossible) to accurately dictate the paths on which traffic enters your own network, reducing the control an autonomous system has over its own network. Fortunately, there is a solution for this problem: community-based local preference.</p><p>Some transit providers allow their customers to influence the local preference in the transit network through the use of BGP communities. BGP communities are an optional transitive attribute for a route advertisement. The communities can be informative (“I learned this prefix in Rome”), but they can also be used to trigger actions on the receiving side. For example, Cogent publishes the following action communities:</p><table>
<thead>
  <tr>
    <th>Community</th>
    <th>Local preference</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>174:10</td>
    <td>10</td>
  </tr>
  <tr>
    <td>174:70</td>
    <td>70</td>
  </tr>
  <tr>
    <td>174:120</td>
    <td>120</td>
  </tr>
  <tr>
    <td>174:125</td>
    <td>125</td>
  </tr>
  <tr>
    <td>174:135</td>
    <td>135</td>
  </tr>
  <tr>
    <td>174:140</td>
    <td>140</td>
  </tr>
</tbody>
</table><p>When you know that Cogent uses the following default local preferences in their network:</p><p>Peers → Local preference 100<br>Customers → Local preference 130</p><p>It’s easy to see how we could use the communities provided to change the route used. It’s important to note, though, that since we can’t set the local preference of a route to exactly 100 (or 130), AS-path prepending remains largely irrelevant: the local preferences of the competing paths will never be the same.</p><p>Take for example the following configuration:</p>
            <pre><code>term ADV-SITELOCAL {
    from {
        prefix-list SITE-LOCAL;
        route-type internal;
    }
    then {
        as-path-prepend "13335 13335";
        accept;
    }
}</code></pre>
            
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2QJPzju5z4aHCOTxSEAO9y/21cf93066b2a67a48f530af5934c58d3/image1-71.png" />
            
</figure><p>We’re prepending the Cloudflare ASN two times, resulting in a total AS-path length of three, yet we were still seeing too much traffic coming in on our Cogent link. At that point, an engineer could add another prepend, but for a well-connected network such as Cloudflare, if two or three prepends didn’t do much, four or five aren’t going to either. Instead, we can leverage the Cogent communities documented above to change the routing within Cogent:</p>
            <pre><code>term ADV-SITELOCAL {
    from {
        prefix-list SITE-LOCAL;
        route-type internal;
    }
    then {
        community add COGENT_LPREF70;
        accept;
    }
}</code></pre>
            <p>The above configuration changes the traffic flow to this:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1hJxj0OColxsHeP06JAfBb/8fa01b2f92b44de8b129b761dccc9acf/image3-47.png" />
            
            </figure><p>Which is exactly what we wanted!</p>
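            <p>To see why the community works where additional prepends would not, recall the order of BGP best-path selection: local preference is compared before AS-path length, so once local preferences differ, path length is never consulted. The following Python sketch (purely illustrative, with made-up routes; not router code) models just those first two steps:</p>
            <pre><code>def best_route(routes):
    # BGP best-path selection, first two tie-breakers only:
    # highest local preference wins; a shorter AS path only breaks ties.
    return max(routes, key=lambda r: (r["local_pref"], -len(r["as_path"])))

# Inside Cogent: our direct customer route, tagged 174:70 -> local pref 70.
via_customer = {"via": "customer link", "local_pref": 70, "as_path": [13335]}
# The same prefix learned from a peer at Cogent's default peer pref of 100.
via_peer = {"via": "peer", "local_pref": 100, "as_path": [64500, 13335]}

# The peer route wins despite its longer AS path: with unequal local
# preferences, prepending the customer route more would change nothing.
print(best_route([via_customer, via_peer])["via"])  # peer</code></pre>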
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>AS-path prepending is still useful and has its place in the operator’s traffic engineering toolchain, but it should be used sparingly. <a href="https://ripe79.ripe.net/presentations/64-prepending_madory2.pdf">Excessive prepending opens a network up to more widespread route hijacks</a>, which should be avoided at all costs. As such, community-based ingress traffic engineering is highly preferred (and recommended). In cases where communities aren’t available (or not available to steer customer traffic), prepends can be applied, but I encourage operators to actively monitor their effects, and roll them back if ineffective.</p><p>As a side note, P. Marcos et al. have published an interesting paper on AS-path prepending that goes into some of the trends seen in relation to prepending. I highly recommend giving it a read: <a href="https://www.caida.org/catalog/papers/2020_aspath_prepending/aspath_prepending.pdf">https://www.caida.org/catalog/papers/2020_aspath_prepending/aspath_prepending.pdf</a></p> ]]></content:encoded>
            <category><![CDATA[Routing]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Network]]></category>
            <guid isPermaLink="false">7xqrew8H7IM1awzPwN3MOw</guid>
            <dc:creator>Tom Strickx</dc:creator>
        </item>
        <item>
            <title><![CDATA[How we detect route leaks and our new Cloudflare Radar route leak service]]></title>
            <link>https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/</link>
            <pubDate>Wed, 23 Nov 2022 16:00:00 GMT</pubDate>
            <description><![CDATA[ In this blog post, we will introduce our new system designed to detect route leaks and its integration on Cloudflare Radar and its public API. ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5tjdb8oktiBnsjsr6Fj109/dca51bae6a054cc120d91c11a35c54fe/image5-19.png" />
            
            </figure><p>Today we’re introducing Cloudflare Radar’s route leak data and API so that anyone can get information about route leaks across the Internet. We’ve built a comprehensive system that takes in data from public sources and Cloudflare’s view of the Internet drawn from our massive global network. The system is now feeding route leak data on Cloudflare Radar’s ASN pages and via the API.</p><p>This blog post is in two parts. There’s a discussion of BGP and route leaks followed by details of our route leak detection system and how it feeds Cloudflare Radar.</p>
    <div>
      <h2>About BGP and route leaks</h2>
      <a href="#about-bgp-and-route-leaks">
        
      </a>
    </div>
    <p>Inter-domain routing, i.e., exchanging reachability information among networks, is critical to the health and performance of the Internet. The <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">Border Gateway Protocol</a> (BGP) is the de facto routing protocol that exchanges routing information among organizations and networks. At its core, BGP assumes the information being exchanged is genuine and trustworthy, which unfortunately is <a href="/rpki/">no longer a valid assumption</a> on the current Internet. In many cases, networks can make mistakes or intentionally lie about the reachability information and propagate that to the rest of the Internet. Such incidents can cause significant disruptions to the normal operations of the Internet. One such type of disruptive incident is the <b>route leak</b>.</p><p>We consider route leaks to be the propagation of routing announcements beyond their intended scope (<a href="https://www.rfc-editor.org/rfc/rfc7908.html">RFC7908</a>). Route leaks can cause significant disruption affecting millions of Internet users, as we have seen in many past notable incidents. For example, <a href="/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/">in June 2019 a misconfiguration</a> in a small network in Pennsylvania, US (<a href="https://radar.cloudflare.com/traffic/as396531">AS396531</a> - Allegheny Technologies Inc) accidentally leaked a Cloudflare prefix to Verizon, which proceeded to propagate the misconfigured route to the rest of its peers and customers. As a result, the traffic of a large portion of the Internet was squeezed through the limited-capacity links of a small network.
The resulting congestion caused most Cloudflare traffic to and from the affected IP range to be dropped.</p><p>A similar incident in November 2018 caused widespread unavailability of Google services when a Nigerian ISP (<a href="https://radar.cloudflare.com/traffic/as37282">AS37282</a> - Mainone) <a href="/how-a-nigerian-isp-knocked-google-offline/">accidentally leaked</a> a large number of Google IP prefixes to its peers and providers, violating the <a href="https://ieeexplore.ieee.org/document/974527">valley-free principle</a>.</p><p>These incidents illustrate not only that route leaks can be very impactful, but also the snowball effects that misconfigurations in small regional networks can have on the global Internet.</p><p>Despite the criticality of detecting and rectifying route leaks promptly, they are often detected only when users start reporting the noticeable effects of the leaks. The challenge with detecting and preventing route leaks stems from the fact that AS business relationships and BGP routing policies are generally <a href="https://ieeexplore.ieee.org/document/974523">undisclosed</a>, and the affected network is often remote to the root of the route leak.</p><p>In the past few years, solutions have been proposed to prevent the propagation of leaked routes. Such proposals include <a href="https://datatracker.ietf.org/doc/rfc9234/">RFC9234</a> and <a href="https://datatracker.ietf.org/doc/html/draft-ietf-sidrops-aspa-verification">ASPA</a>, which extend BGP to annotate sessions with the relationship type between the two connected AS networks to enable the detection and prevention of route leaks.</p><p>An alternative proposal to implement similar signaling of BGP roles is through the use of <a href="https://en.wikipedia.org/wiki/Border_Gateway_Protocol#Communities">BGP Communities</a>, a transitive attribute used to encode metadata in BGP announcements.
While these directions are promising in the long term, they are still at a very preliminary stage and are not expected to be adopted at scale soon.</p><p>At Cloudflare, we have developed a system to detect route leak events automatically and send notifications to multiple channels for visibility. As we continue our efforts to bring more relevant <a href="https://developers.cloudflare.com/radar/">data to the public</a>, we are happy to announce that today we are launching an <a href="https://developers.cloudflare.com/api/operations/radar_get_BGPRouteLeakEvents">open data API</a> for our route leak detection results and integrating those results into <a href="https://radar.cloudflare.com/">Cloudflare Radar</a> pages.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4bWYvHZtatR3ooYMKq2WCb/79ef433f8c46c40aefa2fefe35905aa7/image4-32.png" />
            
            </figure>
    <div>
      <h2>Route leak definition and types</h2>
      <a href="#route-leak-definition-and-types">
        
      </a>
    </div>
    <p>Before we jump into how we design our systems, we will first do a quick primer on what a route leak is, and why it is important to detect it.</p><p>We refer to the published IETF RFC7908 document <a href="https://www.rfc-editor.org/rfc/rfc7908.html"><i>"Problem Definition and Classification of BGP Route Leaks"</i></a> to define route leaks.</p><p>&gt; A route leak is the propagation of routing announcement(s) beyond their intended scope.</p><p>The <i>intended scope</i> is often concretely defined as inter-domain routing policies based on business relationships between Autonomous Systems (ASes). These business relationships <a href="https://ieeexplore.ieee.org/document/974527">are broadly classified into four categories</a>: customers, transit providers, peers and siblings, although more complex arrangements are possible.</p><p>In a customer-provider relationship the customer AS has an agreement with another network to transit its traffic to the global routing table. In a peer-to-peer relationship two ASes agree to free bilateral traffic exchange, but only between their own IPs and the IPs of their customers. Finally, ASes that belong under the same administrative entity are considered siblings, and their traffic exchange is often unrestricted.  The image below illustrates how the three main relationship types translate to export policies.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3XAOHLT8UtzkLQigSbcrUd/34f8de710fcdda2c4feb7fd1eaaec576/image7-7.png" />
            
</figure><p>By categorizing the types of AS-level relationships and their implications for the propagation of BGP routes, we can divide the propagation of a prefix origination announcement into multiple phases:</p><ul><li><p>upward: all path segments during this phase are <b>customer to provider</b></p></li><li><p>peering: one peer-peer path segment</p></li><li><p>downward: all path segments during this phase are <b>provider to customer</b></p></li></ul><p>An AS path that follows the <a href="https://ieeexplore.ieee.org/document/6363987"><b>valley-free routing principle</b></a> may contain <b>upward, peering, and downward</b> phases; each phase is <b>optional</b>, but they must appear <b>in that order</b>. Here is an example of an AS path that conforms with valley-free routing.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7mm4sD88Ai3cFOS7ugzhoH/d0ea889e5d22d60d4648b7f13a69b08b/image11-4.png" />
            
            </figure><p>In RFC7908, <a href="https://www.rfc-editor.org/rfc/rfc7908.html"><i>"Problem Definition and Classification of BGP Route Leaks"</i></a>, the authors define six types of route leaks, and we refer to these definitions in our system design. Here are illustrations of each of the route leak types.</p>
    <div>
      <h3>Type 1: Hairpin Turn with Full Prefix</h3>
      <a href="#type-1-hairpin-turn-with-full-prefix">
        
      </a>
    </div>
    <p>&gt; A multihomed AS learns a route from one upstream ISP and simply propagates it to another upstream ISP (the turn essentially resembling a hairpin).  Neither the prefix nor the AS path in the update is altered.</p><p>An AS path that contains a provider-customer and customer-provider segment is considered a type 1 leak. The following example: AS4 → AS5 → AS6 forms a type 1 leak.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Njxtz1neeF3ejVaLH6RPi/4836161f45d7f547608839f3a11467dd/image9-5.png" />
            
</figure><p>Type 1 is the most recognized type of route leak and is very impactful. In many cases, a customer route is preferred over a peer or a provider route. In this example, AS6 will likely prefer sending traffic via AS5 instead of its other peer or provider routes, causing AS5 to unintentionally become a transit provider. This can significantly affect the performance of the traffic related to the leaked prefix or cause outages if the leaking AS is not provisioned to handle a large influx of traffic.</p><p>In June 2015, Telekom Malaysia (<a href="https://radar.cloudflare.com/traffic/as4788">AS4788</a>), a regional ISP, <a href="https://www.bgpmon.net/massive-route-leak-cause-internet-slowdown/">leaked over 170,000 routes</a> learned from its providers and peers to its other provider Level3 (<a href="https://radar.cloudflare.com/traffic/as3549">AS3549</a>, now Lumen). Level3 accepted the routes and further propagated them to its downstream networks, which in turn caused significant network issues globally.</p>
    <div>
      <h3>Type 2: Lateral ISP-ISP-ISP Leak</h3>
      <a href="#type-2-lateral-isp-isp-isp-leak">
        
      </a>
    </div>
    <p>A type 2 leak is defined as propagating routes obtained from one peer to another peer, creating two or more consecutive peer-to-peer path segments.</p><p>Here is an example: AS3 → AS4 → AS5 forms a type 2 leak.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/18VJCjED1cnQmJd0cXPnBU/f38d68dacc38637c0a9d72e5cdafa5ae/image1-70.png" />
            
</figure><p>One example of such a leak is <a href="https://archive.nanog.org/meetings/nanog41/presentations/mauch-lightning.pdf">more than three very large networks appearing in sequence</a>. Very large networks (such as Verizon and Lumen) do not purchase transit from each other, and having <a href="https://puck.nether.net/bgp/leakinfo.cgi/">more than three such networks</a> on the path in sequence is often an indication of a route leak.</p><p>However, in the real world, it is not unusual to see multiple small peering networks exchanging routes and passing them on to each other. Legitimate business reasons exist for having this type of network path. We are less concerned about this type of route leak as compared to type 1.</p>
    <div>
      <h3>Type 3 and 4: Provider routes to peer; peer routes to provider</h3>
      <a href="#type-3-and-4-provider-routes-to-peer-peer-routes-to-provider">
        
      </a>
    </div>
    <p>These two types involve propagating routes from a provider or a peer not to a customer, but to another peer or provider. Here are the illustrations of the two types of leaks:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6T2roi9v4ATputaICUVuUv/50831a0a631774e101e5f04abfb25876/image10-3.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/18D2LnLHkNkbzLsORnD95y/1c033cd2c3cea76e556013bc777889a9/image13-1.png" />
            
</figure><p>As in the <a href="/how-a-nigerian-isp-knocked-google-offline/">previously mentioned example</a>, a Nigerian ISP that peers with Google accidentally leaked those routes to its provider <a href="https://radar.cloudflare.com/traffic/as4809">AS4809</a>, generating a type 4 route leak. Because routes via customers are usually preferred over others, the large provider (AS4809) rerouted its traffic to Google via its customer, i.e. the leaking ASN, overwhelming the small ISP and taking Google offline for over an hour.</p>
    <div>
      <h2>Route leak summary</h2>
      <a href="#route-leak-summary">
        
      </a>
    </div>
    <p>So far, we have looked at the four types of route leaks defined in <a href="https://www.rfc-editor.org/rfc/rfc7908.html">RFC7908</a>. The common thread of the four types of route leaks is that they're all defined using AS-relationships, i.e., peers, customers, and providers. We summarize the types of leaks by categorizing the AS path propagation based on where the routes are learned from and propagate to. The results are shown in the following table.</p><table>
<thead>
  <tr>
    <th>Routes from / propagates to</th>
    <th>To provider</th>
    <th>To peer</th>
    <th>To customer</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>From provider</td>
    <td>Type 1</td>
    <td>Type 3</td>
    <td>Normal</td>
  </tr>
  <tr>
    <td>From peer</td>
    <td>Type 4</td>
    <td>Type 2</td>
    <td>Normal</td>
  </tr>
  <tr>
    <td>From customer</td>
    <td>Normal</td>
    <td>Normal</td>
    <td>Normal</td>
  </tr>
</tbody>
</table><p>We can summarize the whole table into one single rule: <b>routes obtained from a non-customer AS can only be propagated to customers</b>.</p><p><i>Note: Type 5 and type 6 route leaks are defined as prefix re-origination and announcing of private prefixes. Type 5 is more closely related to</i> <a href="https://www.cloudflare.com/learning/security/glossary/bgp-hijacking/"><i>prefix hijackings</i></a><i>, which we plan to expand our system to cover next, while type 6 leaks are outside the scope of this work. Interested readers can refer to sections 3.5 and 3.6 of</i> <a href="https://www.rfc-editor.org/rfc/rfc7908.html"><i>RFC7908</i></a> <i>for more information.</i></p>
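            <p>The single rule above lends itself to a compact check. The sketch below (illustrative Python, not our production detection code) walks a relationship-annotated AS path and flags the leak type from the table for each propagation step:</p>
            <pre><code># Leak type for one propagation step, per the table above:
# (where the route was learned from, where it is being sent to).
LEAK_TYPES = {
    ("provider", "provider"): 1,
    ("peer", "peer"): 2,
    ("provider", "peer"): 3,
    ("peer", "provider"): 4,
}

def leaks_in_path(steps):
    """steps: one (learned_from, sent_to) pair per AS on the path, using
    "provider", "peer" or "customer". Returns the leak types found; an
    empty list means every step satisfies the single rule above."""
    return [LEAK_TYPES[s] for s in steps if s in LEAK_TYPES]

# AS5 in the type 1 example: learned from provider AS4, sent to provider AS6.
print(leaks_in_path([("customer", "provider"), ("provider", "provider")]))
# [1]</code></pre>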
    <div>
      <h2>The Cloudflare Radar route leak system</h2>
      <a href="#the-cloudflare-radar-route-leak-system">
        
      </a>
    </div>
    <p>Now that we know what a route leak is, let’s talk about how we designed our route leak detection system.</p><p>From a very high level, we compartmentalize our system into three different components:</p><ol><li><p><b>Raw data collection module</b>: responsible for gathering BGP data from multiple sources and providing a BGP message stream to downstream consumers.</p></li><li><p><b>Leak detection module</b>: responsible for determining whether a given AS-level path is a route leak, estimating the confidence level of that assessment, and aggregating and providing all external evidence needed for further analysis of the event.</p></li><li><p><b>Storage and notification module</b>: responsible for providing access to detected route leak events and sending out notifications to relevant parties. This could also include building a dashboard for easy access and search of historical events and providing the user interface for high-level analysis of an event.</p></li></ol>
    <div>
      <h3>Data collection module</h3>
      <a href="#data-collection-module">
        
      </a>
    </div>
    <p>There are three types of data input we take into consideration:</p><ol><li><p>Historical: BGP archive files for some time range in the past</p><ul><li><p><a href="https://www.routeviews.org/routeviews/">RouteViews</a> and <a href="https://ris.ripe.net/docs/20_raw_data_mrt.html#name-and-location">RIPE RIS</a> BGP archives</p></li></ul></li><li><p>Semi-real-time: BGP archive files as soon as they become available, with a 10-30 minute delay</p><ul><li><p>RouteViews and RIPE RIS archives with a data broker that checks for new files periodically (e.g. <a href="https://bgpkit.com/broker">BGPKIT Broker</a>)</p></li></ul></li><li><p>Real-time: true real-time data sources</p><ul><li><p><a href="https://ris-live.ripe.net/">RIPE RIS Live</a></p></li><li><p>Cloudflare internal BGP sources</p></li></ul></li></ol>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6p32es0tPhqsESHMazR8Ni/96910fef1be1cccf2bd69aa750b063c8/image6-11.png" />
            
</figure><p>For the current version, we use the semi-real-time data source for the detection system, i.e., the BGP updates files from RouteViews and RIPE RIS. For data completeness, we process data from all public collectors from these two projects (a total of 63 collectors and over 2,400 collector peers) and implement a pipeline that’s capable of handling the BGP data processing as the data files become available.</p><p>For data file indexing and processing, we deployed an on-premises <a href="https://github.com/bgpkit/bgpkit-broker-backend">BGPKIT Broker instance</a> with the Kafka feature enabled for message passing, and a custom concurrent <a href="https://www.rfc-editor.org/rfc/rfc6396.html">MRT</a> data processing pipeline based on the <a href="https://github.com/bgpkit/bgpkit-parser">BGPKIT Parser</a> Rust SDK. The data collection module processes MRT files and converts the results into a BGP message stream of over two billion BGP messages per day (roughly 30,000 messages per second).</p>
    <div>
      <h3>Route leak detection</h3>
      <a href="#route-leak-detection">
        
      </a>
    </div>
    <p>The route leak detection module works at the level of individual BGP announcements. The detection component investigates one BGP message at a time, and estimates how likely a given BGP message is a result of a route leak event.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3uLtzc0IV8IAuxze3lYUXY/8eb3553e71a84930fe6851e8732f849f/image8-5.png" />
            
</figure><p>We base our detection algorithm mainly on the <a href="https://ieeexplore.ieee.org/document/6363987">valley-free model</a>, which we believe can capture most of the notable route leak incidents. As mentioned previously, the key to having low false positives when detecting route leaks with the valley-free model is to have accurate AS-level relationships. While those relationship types are not publicized by every AS, there have been over two <a href="https://ieeexplore.ieee.org/document/6027863">decades of research</a> on the inference of relationship types using publicly observed BGP data.</p><p>While state-of-the-art relationship inference algorithms have been shown to be <a href="https://dl.acm.org/doi/10.1145/2504730.2504735">highly accurate</a>, even a small margin of error can still incur inaccuracies in the detection of route leaks. To alleviate such artifacts, we synthesize multiple data sources for inferring AS-level relationships, including <a href="https://www.caida.org/">CAIDA/UCSD</a>’s <a href="https://www.caida.org/catalog/datasets/as-relationships/">AS relationship</a> data and our in-house built AS relationship dataset. Building on top of the two AS-level relationship datasets, we create a much more granular dataset at the per-prefix and per-peer levels. The improved dataset allows us to answer questions like: what is the relationship between AS1 and AS2 with respect to prefix P, as observed by collector peer X? This eliminates much of the ambiguity for cases where networks have multiple different relationships based on prefixes and geo-locations, and thus helps us reduce the number of false positives in the system. Besides the AS-relationship datasets, we also apply the <a href="https://ihr.iijlab.net/ihr/en-us/documentation#AS_dependency">AS Hegemony dataset</a> from <a href="https://ihr.iijlab.net/ihr/en-us/">IHR IIJ</a> to further reduce false positives.</p>
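            <p>To make the per-prefix, per-peer idea concrete, a lookup like the following (a simplified illustration; the keys and labels are ours, not the production schema) first tries the most specific entry and falls back to the plain AS-pair relationship:</p>
            <pre><code>def relationship(rels, as1, as2, prefix=None, peer=None):
    """Return the most specific known relationship between as1 and as2:
    per-prefix/per-collector-peer if available, else the AS-pair default."""
    for key in ((as1, as2, prefix, peer), (as1, as2)):
        if key in rels:
            return rels[key]
    return "unknown"

# Hypothetical dataset: a default relationship for the AS pair, plus a
# more specific observation for one prefix as seen by one collector peer.
rels = {
    (64496, 64497): "customer-of",
    (64496, 64497, "192.0.2.0/24", 64500): "peer",
}
print(relationship(rels, 64496, 64497, "192.0.2.0/24", 64500))  # peer
print(relationship(rels, 64496, 64497))  # customer-of</code></pre>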
    <div>
      <h3>Route leak storage and presentation</h3>
      <a href="#route-leak-storage-and-presentation">
        
      </a>
    </div>
    <p>After processing each BGP message, we store the generated route leak entries in a database for long-term storage and exploration. We also aggregate individual route leak BGP announcements, grouping relevant leaks from the same leaking ASN within a short period into <b>route-leak events</b>. The route leak events are then available for consumption by different downstream applications like web UIs, an <a href="https://developers.cloudflare.com/api/operations/radar_get_BGPRouteLeakEvents">API</a>, or alerts.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7f4kKk1DFYPIltkyzArwDy/4347df7a5bee4ca6686455d6205324f7/image12-2.png" />
            
            </figure>
    <div>
      <h2>Route leaks on Cloudflare Radar</h2>
      <a href="#route-leaks-on-cloudflare-radar">
        
      </a>
    </div>
    <p>At Cloudflare, we aim to help build a better Internet, and that includes sharing our efforts on monitoring and securing Internet routing. Today, we are releasing our route leak detection system as a public beta.</p><p>Starting today, users going to the Cloudflare Radar ASN pages will find the list of route leaks that affect that AS. We consider an AS to be affected when the leaker AS is one hop away from it in either direction, before or after.</p><p>The Cloudflare Radar ASN page is directly accessible via <a href="https://radar.cloudflare.com/as{ASN}"><b>https://radar.cloudflare.com/as{ASN}</b></a>. For example, one can navigate to <a href="https://radar.cloudflare.com/as174">https://radar.cloudflare.com/as174</a> to view the overview page for Cogent AS174. ASN pages now show a dedicated card for route leaks detected relevant to the current ASN within the selected time range.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5CoRAfRuDBdZr5zQxCtoGn/1a89ba21ea72c44cbd45da3661705f65/image2-54.png" />
            
</figure><p>Users can also start using our <a href="https://developers.cloudflare.com/api/operations/radar_get_BGPRouteLeakEvents">public data API</a> to look up route leak events for any given ASN. Our API supports filtering route leak results by time range and the ASes involved. Here is a screenshot of the <a href="https://developers.cloudflare.com/api/operations/radar_get_BGPRouteLeakEvents">route leak events API documentation page</a> on the <a href="/building-a-better-developer-experience-through-api-documentation/">newly updated API docs site</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5IgJJ1GO4uHepxxQc5vwV7/9e809b7ef9264f9c03d70c70c27d4bb5/image3-44.png" />
            
            </figure>
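            <p>As a quick illustration of those filters, the snippet below builds a query URL for leak events involving a given ASN. The endpoint path and parameter names here are assumptions for illustration only; the API documentation linked above is authoritative:</p>
            <pre><code>from urllib.parse import urlencode

# Assumed endpoint; consult the Radar API docs for the exact path.
API_BASE = "https://api.cloudflare.com/client/v4/radar/bgp/leaks/events"

def leak_events_url(asn, date_range="7d"):
    """Build a query URL for route leak events involving the given ASN."""
    return API_BASE + "?" + urlencode(
        {"involvedAsn": asn, "dateRange": date_range})

print(leak_events_url(174))</code></pre>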
    <div>
      <h2>More to come on routing security</h2>
      <a href="#more-to-come-on-routing-security">
        
      </a>
    </div>
    <p>There is a lot more we are planning to do with route leak detection. More features like a global view page, route leak notifications, more advanced APIs, custom automation scripts, and historical archive datasets will begin to ship on Cloudflare Radar over time. Your feedback and suggestions are also very important for us as we continue to improve our detection results and serve better data to the public.</p><p>Furthermore, we will continue to expand our work on other important topics in Internet routing security, including global BGP hijack detection (not limited to our customer networks), RPKI validation monitoring, open-sourcing tools and architecture designs, and a centralized routing security web gateway. Our goal is to provide the best data and tools for routing security to the communities so that we can build a better and more secure Internet together.</p><p>In the meantime, we opened a <a href="https://discord.com/channels/595317990191398933/1035553707116478495">Radar room</a> on our Developers Discord Server. Feel free to <a href="https://discord.com/channels/595317990191398933/1035553707116478495">join</a> and talk to us; the team is eager to receive feedback and answer questions.</p><p>Visit <a href="https://radar.cloudflare.com/">Cloudflare Radar</a> for more Internet insights. You can also follow us <a href="https://twitter.com/cloudflareradar">on Twitter</a> for more Radar updates.</p> ]]></content:encoded>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Routing Security]]></category>
            <guid isPermaLink="false">72oaP8g7ZckKtIVQxA8EX4</guid>
            <dc:creator>Mingwei Zhang</dc:creator>
            <dc:creator>Vasilis Giotsas</dc:creator>
            <dc:creator>Celso Martinho</dc:creator>
        </item>
        <item>
            <title><![CDATA[BGP security and confirmation biases]]></title>
            <link>https://blog.cloudflare.com/route-leaks-and-confirmation-biases/</link>
            <pubDate>Wed, 23 Feb 2022 13:59:26 GMT</pubDate>
            <description><![CDATA[ On February 1, 2022, a configuration error on one of our routers caused a route leak of up to 2,000 Internet prefixes to one of our Internet transit providers. This leak lasted for 32 seconds and, at a later time, for another 7 seconds. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/20cLql3Jp3Pjr0yzjRJUNU/18aa7947fa7aaf819a2e17686f876d19/Route-Leaks-and-Confirmation-Biases.png" />
            
</figure><p>This is not what I imagined my first blog article would look like, but here we go.</p><p>On February 1, 2022, a configuration error on one of our routers caused a route leak of up to 2,000 Internet prefixes to one of our Internet transit providers. This leak lasted for 32 seconds and, at a later time, for another 7 seconds. We did not see any traffic spikes or drops in our network and did not see any customer impact because of this error, but it may have caused an impact to external parties, and we are sorry for the mistake.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7MWQzcEXTHSBFhDVvDszOo/2de836f980c0a7af00e3c378d0040edb/image1.jpg" />
            
            </figure>
    <div>
      <h3>Timeline</h3>
      <a href="#timeline">
        
      </a>
    </div>
    <p>All timestamps are UTC.</p><p>As part of our efforts to build the best network, we regularly update our Internet transit and peering links throughout our network. On February 1, 2022, we had a “hot-cut” scheduled with one of our Internet transit providers to simultaneously update router configurations on Cloudflare and ISP routers to migrate one of our existing Internet transit links in Newark to a link with more capacity. Doing a “hot-cut” means that both parties change cabling and configuration at the same time, usually while on a conference call, to reduce downtime and impact on the network. The migration started off-peak at 10:45 (05:45 local time) with our network engineer entering the bridge call with our data center engineers and remote hands on site, as well as operators from the ISP.</p><p>At 11:17, we connected the new fiber link and established the BGP sessions to the ISP successfully. We had BGP filters in place on our end to not accept or send any prefixes, so we could evaluate the connection and settings without any impact on our network and services.</p><p>As the connection between our router and the ISP — like most Internet connections — was realized over a fiber link, the first item to check is the “light levels” of that link. This shows the strength of the optical signal received by our router from the ISP router and can indicate a bad connection when it’s too low.
Low light levels are likely caused by unclean fiber ends or not fully seated connectors, but may also indicate a defective optical transceiver which connects the fiber link to the router - all of which can degrade service quality.</p><p>The next item on the checklist is interface errors, which occur when a network device receives incorrect or malformed network packets; these also indicate a bad connection and would likely lead to a degradation in service quality.</p><p>As light levels were good, and we observed no errors on the link, we deemed it ready for production and removed the BGP reject filters at 11:22.</p><p>This immediately triggered the maximum prefix-limit protection the ISP had configured on the BGP session and shut down the session, preventing further impact. The maximum prefix-limit is a safeguard in BGP to prevent the spread of route leaks and to protect the Internet. The limit is usually set just a little higher than the expected number of Internet prefixes from a peer to leave some headroom for growth, but also to catch configuration errors fast. The configured value was just 40 prefixes short of the number of prefixes we were advertising at that site, so this was considered the reason for the session to be shut down. After checking back internally, we asked the ISP to raise the prefix-limit, which they did.</p><p>The BGP session was reestablished at 12:08 and immediately shut down again. The problem was identified and fixed at 12:14.</p><p>10:45: Start of scheduled maintenance</p><p>11:17: New link was connected and BGP sessions went up (filters still in place)</p><p>11:22: Link was deemed ready for production and filters removed</p><p>11:23: BGP sessions were torn down by ISP router due to configured prefix-limit</p><p>12:08: ISP configures higher prefix-limits, BGP sessions briefly come up again and are shut down</p><p>12:14: Issue identified and configuration updated</p>
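            <p>The maximum prefix-limit safeguard described above behaves roughly like this (an illustrative sketch with made-up numbers, not actual router behavior):</p>
            <pre><code>def session_state(received_prefixes, prefix_limit):
    """A BGP session is torn down once the neighbor advertises more
    prefixes than the configured maximum prefix-limit allows."""
    return "down" if received_prefixes > prefix_limit else "established"

# Expecting a few dozen prefixes from the peer, limit set with headroom...
print(session_state(58, 100))      # established
# ...but leaking a large chunk of the table trips the safeguard at once.
print(session_state(900000, 100))  # down</code></pre>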
    <div>
      <h3>What happened and what we’re doing about it</h3>
      <a href="#what-happened-and-what-were-doing-about-it">
        
      </a>
    </div>
    <p>The outage occurred while migrating one of our Internet transits to a link with more capacity. Once the new link and a <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">BGP session</a> had been established, and the link deemed error-free, our network engineering team followed the peer-reviewed deployment plan. The team removed the filters from the BGP sessions, which had been preventing the Cloudflare router from accepting and sending prefixes via BGP.</p><p>Due to an oversight in the deployment plan, no BGP filters restricting exports to only the prefixes of Cloudflare and our customers had been added. Neither the earlier peer review of the plan nor a second review on the internal chat caught this omission, so the network engineer performing the change went ahead.</p>
            <pre><code>ewr02# show |compare                                     
[edit protocols bgp group 4-ORANGE-TRANSIT]
-  import REJECT-ALL;
-  export REJECT-ALL;
[edit protocols bgp group 6-ORANGE-TRANSIT]
-  import REJECT-ALL;
-  export REJECT-ALL;</code></pre>
            <p>The change resulted in our router sending all known prefixes to the ISP router, which shut down the session as the number of prefixes received exceeded the configured maximum prefix-limit.</p><p>As the configured values for the maximum prefix-limits turned out to be rather low for the number of prefixes on our network, this didn’t come as a surprise to our network engineering team, and no investigation into why the BGP session went down was started. The prefix-limit being too low seemed to be a perfectly valid reason.</p><p>We asked the ISP to increase the prefix-limit, which they did after receiving approval on their side. Once the prefix-limit had been increased and the previously shut down BGP sessions reset, the sessions were reestablished but shut down immediately as the maximum prefix-limit was triggered again. This is when our network engineer began questioning whether another issue was at fault, and found and corrected the previously overlooked configuration error.</p><p>We made the following change in response to this event: we introduced an implicit reject policy for BGP sessions, which takes effect if no import/export policy is configured for a specific BGP neighbor or neighbor group. This change has been deployed.</p>
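<p>Conceptually, the implicit reject we introduced makes a missing policy fail closed rather than open. A minimal sketch of that behavior (a hypothetical Python model with illustrative group and policy names, not our real configuration pipeline):</p>

```python
# Sketch of an implicit default-reject for BGP policy lookup.
# Group and policy names are illustrative, not real configuration.

configured = {
    "4-ORANGE-TRANSIT": None,                    # policy accidentally removed
    "PEERING-LAN":      "EXPORT-CUSTOMER-ONLY",  # explicit policy present
}

def effective_policy(group: str) -> str:
    """Fail closed: a missing import/export policy becomes REJECT-ALL
    instead of the accept-everything default that caused the leak."""
    return configured.get(group) or "REJECT-ALL"

assert effective_policy("4-ORANGE-TRANSIT") == "REJECT-ALL"
assert effective_policy("PEERING-LAN") == "EXPORT-CUSTOMER-ONLY"
```

<p>With this fallback in place, removing a policy by mistake stops all prefix exchange on that session instead of leaking the full table.</p>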
    <div>
      <h3>BGP security &amp; preventing route-leaks — what’s in the cards?</h3>
      <a href="#bgp-security-preventing-route-leaks-whats-in-the-cards">
        
      </a>
    </div>
    <p>Route leaks aren’t new, and they keep happening. The industry has come up with many approaches to limit the impact of route leaks or even prevent them. Policies and filters are used to control which prefixes should be exported to or imported from a given peer. RPKI can help to make sure only allowed prefixes are accepted from a peer, and a maximum prefix-limit can act as a last line of defense when everything else fails.</p><p>BGP policies and filters are commonly used to ensure only explicitly allowed prefixes are sent out to BGP peers, usually only prefixes owned by the entity operating the network and its customers. They can also be used to tweak some knobs (BGP local-pref, MED, AS path prepend, etc.) to influence routing decisions and balance traffic across links. This is what the policies we have in place for our peers and transits do. As explained above, the maximum prefix-limit is intended to tear down BGP sessions if more prefixes are being sent or received than expected. We have talked about RPKI before: it’s <a href="/rpki/">the required cryptographic upgrade to BGP routing</a>, and we are still on <a href="/rpki-details/">our path to securing Internet Routing</a>.</p><p>To improve the overall stability of the Internet even more, a new Internet standard was proposed in 2017, adding another layer of protection into the mix: <a href="https://datatracker.ietf.org/doc/html/rfc8212">RFC8212</a> defines <code>Default External BGP (EBGP) Route Propagation Behavior without Policies</code>, which tackles exactly the issues we were facing.</p><p>This RFC updates the BGP-4 standard (<a href="https://datatracker.ietf.org/doc/html/rfc4271">RFC4271</a>), which defines how BGP works and what vendors are expected to implement. On the Juniper operating system, JunOS, this behavior can be activated by setting <code>defaults ebgp no-policy reject-always</code> at the <code>protocols bgp</code> hierarchy level, starting with Junos OS Release 20.3R1. 
The <a href="https://github.com/bgp/RFC8212">RFC8212 repository on GitHub</a> provides a good overview of the current implementation status of RFC8212 in common network vendor OSes and routing daemons.</p><p>If you are running an older version of JunOS, a similar effect can be achieved by defining a REJECT-ALL policy and setting it as the import/export policy at the <code>protocols bgp</code> hierarchy level. Note that this will also affect iBGP sessions, which the solution above does not touch.</p>
            <pre><code>policy-statement REJECT-ALL {
  then reject;
}

protocol bgp {
  import REJECT-ALL;
  export REJECT-ALL;
}</code></pre>
            
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>We are sorry for leaking routes for prefixes that did not belong to Cloudflare or our customers, and we apologize to the network engineers who got paged as a result.</p><p>We have processes in place to make sure that changes to our infrastructure are reviewed before being executed, so potential issues can be spotted before they reach production. In this case, the review process failed to catch this configuration error. In response, we will increase our efforts to further our network automation and fully derive the device configuration from an intended state.</p><p>While this was a case of human error, it could have been detected and mitigated significantly faster had confirmation bias not kicked in, leading the operator to believe the observed behavior was expected. This underlines the importance of our ongoing efforts to train our people to be aware of the cognitive biases we all carry. It also serves as a great example of how confirmation bias can influence our work, and of why we should question our conclusions early.</p><p>It also shows how important protocols like RPKI are. Route leaks are something even experienced network operators can cause accidentally, and technical solutions are needed to reduce the impact of leaks, whether they are intentional or the result of an error.</p>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Internet Traffic]]></category>
            <category><![CDATA[Better Internet]]></category>
            <guid isPermaLink="false">5vmYeAWFyKeluMXwpx7wir</guid>
            <dc:creator>Maximilian Wilhelm</dc:creator>
        </item>
        <item>
            <title><![CDATA[What happened on the Internet during the Facebook outage]]></title>
            <link>https://blog.cloudflare.com/during-the-facebook-outage/</link>
            <pubDate>Fri, 08 Oct 2021 15:16:00 GMT</pubDate>
            <description><![CDATA[ Today, we're going to show you how the Facebook and affiliate sites downtime affected us, and what we can see in our data. ]]></description>
            <content:encoded><![CDATA[ <p>It's been a few days now since Facebook, Instagram, and WhatsApp went AWOL and experienced one of the most extended and rough downtime periods in their existence.</p><p>When that happened, we reported our bird's-eye view of the event and posted the blog <a href="/october-2021-facebook-outage/">Understanding How Facebook Disappeared from the Internet</a> where we tried to explain what we saw and how <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">DNS</a> and BGP, two of the technologies at the center of the outage, played a role in the event.</p><p>In the meantime, more information has surfaced, and Facebook has <a href="https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/">published a blog post</a> giving more details of what happened internally.</p><p>As we said before, these events are a gentle reminder that the Internet is a vast network of networks, and we, as industry players and end-users, are part of it and should work together.</p><p>In the aftermath of an event of this size, we don't waste much time debating how peers handled the situation. We do, however, ask ourselves the more important questions: "How did this affect us?" and "What if this had happened to us?" Asking and answering these questions whenever something like this happens is a great and healthy exercise that helps us improve our own resilience.</p><p>Today, we're going to show you how the Facebook and affiliate sites downtime affected us, and what we can see in our data.</p>
    <div>
      <h3>1.1.1.1</h3>
      <a href="#1-1-1-1">
        
      </a>
    </div>
    <p>1.1.1.1 is a fast and privacy-centric public DNS resolver operated by Cloudflare, used by millions of users, browsers, and devices worldwide. Let's look at our telemetry and see what we find.</p><p>First, the obvious. If we look at the response rate, there was a massive spike in the number of SERVFAIL codes. SERVFAILs can happen for several reasons; we have an excellent blog called <a href="/unwrap-the-servfail/">Unwrap the SERVFAIL</a> that you should read if you're curious.</p><p>In this case, we started serving SERVFAIL responses to all facebook.com and whatsapp.com DNS queries because our resolver couldn't access the upstream Facebook authoritative servers, at a rate about 60 times higher than on a typical day.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7BKM7m3fUz3fCRrVuCN3jh/4cb1ccccd0cbe22f2fed10cf10360084/image16.png" />
            
            </figure><p>If we look at all the queries, not specific to Facebook or WhatsApp domains, and we split them by IPv4 and IPv6 clients, we can see that our load increased too.</p><p>As explained before, this is due to a snowball effect associated with applications and users retrying after the errors and generating even more traffic. In this case, 1.1.1.1 had to handle more than the expected rate for A and AAAA queries.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7a67lujBcgIkqVusTr4R1t/6f5bd1971964e4b3721d6096b50bc8d7/image3-12.png" />
            
            </figure><p>Here's another fun one.</p><p>DNS vs. DoT and DoH. Typically, DNS queries and responses are <a href="https://datatracker.ietf.org/doc/html/rfc1035#section-4.2">sent in plaintext over UDP</a> (or sometimes TCP), and that's been the case for decades now. Naturally, this poses security and privacy risks to end-users as it allows in-transit attacks or traffic snooping.</p><p>With DNS over TLS (DoT) and DNS over HTTPS (DoH), clients can talk DNS using well-known, well-supported encryption and authentication protocols.</p><p>Our learning center has a good article on "<a href="https://www.cloudflare.com/learning/dns/dns-over-tls/">DNS over TLS vs. DNS over HTTPS</a>" that you can read. Browsers like Chrome, Firefox, and Edge have supported DoH for some time now, WARP uses DoH too, and you can even configure your operating system to use the new protocols.</p><p>When Facebook went offline, we saw the number of DoT+DoH SERVFAIL responses grow to more than 300x the average rate.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/TNHkfPvSSHUxljyh89S5a/4de08b52d1a9cd23862d56a90943677b/image14.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2pdogcCvQMRd10yEPUV0hu/c2ce7f4eeabd871727af41e2fc574524/image11-1.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6GJdW0PO7zIHBlnkKTyGa2/59fd778bee5c0a837877535399455210/image4-13.png" />
            
            </figure><p>So, we got hammered with lots of requests and errors, causing traffic spikes to our 1.1.1.1 resolver and unexpected load on our edge network and systems. How did we perform during this stressful period?</p><p>Quite well. 1.1.1.1 kept its cool and continued serving the vast majority of requests around the <a href="https://www.dnsperf.com/#!dns-resolvers">famous 10ms mark</a>. Response times at the p95 and p99 percentiles increased slightly, probably due to timeouts trying to reach Facebook’s nameservers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ZzJKGMnii2ghnbRxI8UoT/8c282529bb5efc0f834728c14047b463/image6-11.png" />
            
            </figure><p>Another interesting perspective is the distribution of the ratio between SERVFAIL and good DNS answers, by country. In theory, the higher this ratio is, the more the country uses Facebook. Here's the map with the countries that suffered the most:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3XJYvy2Ore7FARCv4oBbiD/d1700bca70c1c85c2fe01d3e49019fdb/image18.png" />
            
            </figure><p>Here’s the top twelve country list, ordered by those that apparently use Facebook, WhatsApp and Instagram the most:</p><table><tr><td><p><b>Country</b></p></td><td><p><b>SERVFAIL/Good Answers ratio</b></p></td></tr><tr><td><p>Turkey</p></td><td><p>7.34</p></td></tr><tr><td><p>Grenada</p></td><td><p>4.84</p></td></tr><tr><td><p>Congo</p></td><td><p>4.44</p></td></tr><tr><td><p>Lesotho</p></td><td><p>3.94</p></td></tr><tr><td><p>Nicaragua</p></td><td><p>3.57</p></td></tr><tr><td><p>South Sudan</p></td><td><p>3.47</p></td></tr><tr><td><p>Syrian Arab Republic</p></td><td><p>3.41</p></td></tr><tr><td><p>Serbia</p></td><td><p>3.25</p></td></tr><tr><td><p>Turkmenistan</p></td><td><p>3.23</p></td></tr><tr><td><p>United Arab Emirates</p></td><td><p>3.17</p></td></tr><tr><td><p>Togo</p></td><td><p>3.14</p></td></tr><tr><td><p>French Guiana</p></td><td><p>3.00</p></td></tr></table>
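<p>The ranking above boils down to dividing SERVFAIL answers by good answers per country and sorting. A toy version of that computation (Python, with invented counts chosen only to reproduce a few of the ratios shown):</p>

```python
# Toy computation of the SERVFAIL/good-answers ratio per country.
# The counts below are invented for illustration; only the ratios
# echo the table above.

counts = {
    # country: (servfail_answers, good_answers)
    "Turkey":  (7340, 1000),
    "Grenada": (4840, 1000),
    "Congo":   (4440, 1000),
}

ratios = {country: round(servfail / good, 2)
          for country, (servfail, good) in counts.items()}

# Sort countries by ratio, highest (most Facebook-dependent) first.
ranking = sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
assert ranking[0] == ("Turkey", 7.34)
```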
    <div>
      <h3>Impact on other sites</h3>
      <a href="#impact-on-other-sites">
        
      </a>
    </div>
    <p>When Facebook, Instagram, and WhatsApp aren't around, the world turns to other places to look for information on what's going on, other forms of entertainment or other applications to communicate with their friends and family. Our data shows us those shifts. While Facebook was going down, other services and platforms were going up.</p><p>To get an idea of the changing traffic patterns we look at DNS queries as an indicator of increased traffic to specific sites or types of site.</p><p>Here are a few examples.</p><p>Other social media platforms saw a slight increase in use, compared to normal.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7uFhqEr4MIl6HxpBI9GG1M/a995ce00e0ff2b2b96d9aa9fb3341fa6/image17.png" />
            
            </figure><p>Traffic to messaging platforms like Telegram, Signal, Discord and Slack got a little push too.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/16AyoRRD7YSUSd3MuHNNGO/c587c9a318702877248025d86f921c79/image9-6.png" />
            
            </figure><p>Nothing like a little gaming time when Instagram is down, we guess, when looking at traffic to sites like Steam, Xbox, Minecraft and others.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2v3JZdLK2rSbGPlhgicDJn/e8f2bb509432c9b5fa2837588bd1f927/image8-10.png" />
            
            </figure><p>And yes, people want to know what’s going on and fall back on news sites like CNN, New York Times, The Guardian, Wall Street Journal, Washington Post, Huffington Post, BBC, and others:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4JR6ipTzivZfWYOS5rmnlX/c9b363c7babfaa94b096ebd657780b5a/image5-12.png" />
            
            </figure>
    <div>
      <h3>Attacks</h3>
      <a href="#attacks">
        
      </a>
    </div>
    <p>One could speculate that the Internet was under attack from malicious hackers. Our Firewall doesn't agree; nothing out of the ordinary stands out.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6lBrIk5Tx5b64G2SGtfFXZ/b899bd50829087a48ef38b31b97cfd7a/image13.png" />
            
            </figure>
    <div>
      <h3>Network Error Logs</h3>
      <a href="#network-error-logs">
        
      </a>
    </div>
    <p><a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Network_Error_Logging">Network Error Logging</a>, NEL for short, is an experimental technology supported in Chrome. A website can issue a Report-To header and ask the browser to send reports about network problems, like bad requests or <a href="https://www.cloudflare.com/learning/dns/common-dns-issues/">DNS issues</a>, to a specific endpoint.</p><p>Cloudflare uses NEL data to help quickly triage end-user connectivity issues when end-users reach our network. You can learn more about this feature in our <a href="https://support.cloudflare.com/hc/en-us/articles/360050691831-Understanding-Network-Error-Logging">help center</a>.</p><p>If Facebook is down and their DNS isn't responding, Chrome will start reporting NEL events every time one of the pages in our zones fails to load Facebook comments, posts, ads, or authentication buttons. This chart shows it clearly.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/56Sp2huqEtyGiEOmzBrsiU/6ce9a0f9affe20691bf8588d9f4d6a8f/image7-8.png" />
            
            </figure>
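<p>For sites that want to enable NEL themselves, the mechanism is a pair of HTTP response headers, <code>Report-To</code> and <code>NEL</code>, carrying JSON values. A sketch of how they might be constructed (Python; the collector URL is a placeholder, not a real endpoint):</p>

```python
import json

# Sketch of the response headers a site sends to enable Network Error
# Logging. The endpoint URL is a placeholder for illustration only.
report_to = {
    "group": "nel",
    "max_age": 86400,  # seconds the browser should remember this config
    "endpoints": [{"url": "https://example.com/reports"}],
}
nel = {"report_to": "nel", "max_age": 86400}

headers = {
    "Report-To": json.dumps(report_to),
    "NEL": json.dumps(nel),
}
```

<p>With these headers in place, the browser batches reports about failed requests (including DNS failures, as seen in the chart above) and delivers them to the named endpoint group.</p>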
    <div>
      <h3>WARP</h3>
      <a href="#warp">
        
      </a>
    </div>
    <p>Cloudflare announced <a href="https://1.1.1.1/">WARP</a> in 2019, and called it "<a href="/1111-warp-better-vpn/">A VPN for People Who Don't Know What V.P.N. Stands For</a>" and offered it for free to its customers. Today WARP is used by millions of people worldwide to securely and privately access the Internet on their desktop and mobile devices. Here's what we saw during the outage by looking at traffic volume between WARP and Facebook’s network:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4UIMVg1PpKr27RRkKagcdp/979d1e3ef2dda3107f4db2799bb2f3f6/WARP-graph-Facebook-outage-Oct-2021.png" />
            
            </figure><p>You can see how the steep drop on Facebook ASN traffic coincides with the start of the incident and how it compares to the same period the day before.</p>
    <div>
      <h3>Our own traffic</h3>
      <a href="#our-own-traffic">
        
      </a>
    </div>
    <p>People tend to think of Facebook as a place to visit. We log in, we access Facebook, we post. It turns out that Facebook likes to visit us too, quite a lot. Like Google and other platforms, Facebook uses an army of crawlers to constantly check websites for data and updates. Those robots gather information about website content, such as titles, descriptions, thumbnail images, and metadata. You can learn more about this on the "<a href="https://developers.facebook.com/docs/sharing/webmasters/crawler/">Facebook Crawler</a>" page and the <a href="https://ogp.me/">Open Graph</a> website.</p><p>Here's what we see when traffic is coming from the Facebook ASN, supposedly from crawlers, to our CDN sites:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5bWgF42TihmULktktsg6TX/669f1a605301c7be4f8983d45c42ffd3/image10-3.png" />
            
            </figure><p>The robots went silent.</p><p>What about the traffic coming to our CDN sites from Facebook <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent">User-Agents</a>? The gap is indisputable.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/WCcPk5XJu2bmFxhfNjJq3/825ee59fe0ac3b20f14fe249e18f701d/image1-16.png" />
            
            </figure><p>We see about 30% of a typical request rate hitting us. But it's not zero; why is that?</p><p>We'll let you in on a little secret. Never trust User-Agent information; it's broken. User-Agent spoofing is everywhere. Browsers, apps, and other clients deliberately change the User-Agent string when they fetch pages from the Internet to hide, obtain access to certain features, or bypass paywalls (because pay-walled sites want sites like Facebook to index their content, so that they get more traffic from links).</p><p>Fortunately, newer, privacy-centric standards like <a href="https://developer.mozilla.org/en-US/docs/Web/API/User-Agent_Client_Hints_API">User-Agent Client Hints</a> are emerging.</p>
    <div>
      <h3>Core Web Vitals</h3>
      <a href="#core-web-vitals">
        
      </a>
    </div>
    <p>Core Web Vitals are a subset of <a href="https://web.dev/vitals/">Web Vitals</a>, an initiative by Google to provide a unified interface to measure real-world quality signals when a user visits a web page. Such signals include Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS).</p><p>We <a href="/web-analytics-vitals-explorer/">use Core Web Vitals</a> with our privacy-centric Web Analytics product and collect anonymized data on how end-users experience the websites that enable this feature.</p><p>One of the metrics we can calculate using these signals is the page load time. Our theory is that if a page includes scripts coming from external sites (for example, Facebook "like" buttons, comments, ads), and those sites are unreachable, its total load time gets affected.</p><p>We used a list of about 400 domains that we know embed Facebook scripts in their pages and looked at the data.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/HZbQOhcV4ki3H7NVMD6Yq/117d2298e9d0e06521083d5176a34f77/image12.png" />
            
            </figure><p>Now let's look at the Largest Contentful Paint. <a href="https://web.dev/lcp/">LCP</a> marks the point in the page load timeline when the page's main content has likely loaded. The faster the LCP is, the better the end-user experience.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/QJiUkTvEWln7dpvLkVKiV/eb6595b2c781e5ccc68c03b4bd233e3b/image15.png" />
            
            </figure><p>Again, the page load experience got visibly degraded.</p><p>The outcome seems clear. The sites that use Facebook scripts in their pages took about 1.5x as long to load during the outage, with some of them taking more than 2x the usual time. Facebook's outage dragged the performance of some other sites down.</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>When Facebook, Instagram, and WhatsApp went down, the Web felt it. Some websites got slower or lost traffic, other services and platforms got unexpected load, and people lost the ability to communicate or do business normally.</p> ]]></content:encoded>
            <category><![CDATA[Outage]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Trends]]></category>
            <category><![CDATA[Facebook]]></category>
            <guid isPermaLink="false">4sF0eFLy72giKT8ZsHadhg</guid>
            <dc:creator>Celso Martinho</dc:creator>
            <dc:creator>Sabina Zejnilovic</dc:creator>
        </item>
        <item>
            <title><![CDATA[Understanding how Facebook disappeared from the Internet]]></title>
            <link>https://blog.cloudflare.com/october-2021-facebook-outage/</link>
            <pubDate>Mon, 04 Oct 2021 21:08:52 GMT</pubDate>
            <description><![CDATA[ Today at 1651 UTC, we opened an internal incident entitled "Facebook DNS lookup returning SERVFAIL" because we were worried that something was wrong with our DNS resolver 1.1.1.1.  But as we were about to post on our public status page we realized something else more serious was going on. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The Internet - A Network of Networks</p><p>“<i>Facebook can't be down, can it?</i>”, we thought, for a second.</p><p>Today at 15:51 UTC, we opened an internal incident entitled "Facebook DNS lookup returning SERVFAIL" because we were worried that something was wrong with our DNS resolver <a href="https://developers.cloudflare.com/warp-client/">1.1.1.1</a>.  But as we were about to post on our <a href="https://www.cloudflarestatus.com/">public status</a> page we realized something else more serious was going on.</p><p>Social media quickly burst into flames, reporting what our engineers rapidly confirmed too. Facebook and its affiliated services WhatsApp and Instagram were, in fact, all down. Their DNS names stopped resolving, and their infrastructure IPs were unreachable. It was as if someone had "pulled the cables" from their data centers all at once and disconnected them from the Internet.</p><p>This wasn't a <a href="https://www.cloudflare.com/learning/dns/common-dns-issues/">DNS issue</a> itself, but failing DNS was the first symptom we'd seen of a larger Facebook outage.</p><p>How's that even possible?</p>
    <div>
      <h3>Update from Facebook</h3>
      <a href="#update-from-facebook">
        
      </a>
    </div>
    <p>Facebook has now <a href="https://engineering.fb.com/2021/10/04/networking-traffic/outage/">published a blog post</a> giving some details of what happened internally. Externally, we saw the BGP and DNS problems outlined in this post but the problem actually began with a configuration change that affected the entire internal backbone. That cascaded into Facebook and other properties disappearing and staff internal to Facebook having difficulty getting service going again.</p><p>Facebook posted <a href="https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/">a further blog post</a> with a lot more detail about what happened. You can read that post for the inside view and this post for the outside view.</p><p>Now on to what we saw from the outside.</p>
    <div>
      <h3>Meet BGP</h3>
      <a href="#meet-bgp">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">BGP</a> stands for Border Gateway Protocol. It's a mechanism to exchange routing information between autonomous systems (AS) on the Internet. The big routers that make the Internet work have huge, constantly updated lists of the possible routes that can be used to deliver every network packet to their final destinations. Without BGP, the Internet routers wouldn't know what to do, and the Internet wouldn't work.</p><p>The Internet is literally a network of networks, and it’s bound together by BGP. BGP allows one network (say Facebook) to advertise its presence to other networks that form the Internet. As we write, Facebook is not advertising its presence, so ISPs and other networks can’t find Facebook’s network, and it is unavailable.</p><p>The individual networks each have an ASN: an Autonomous System Number. An Autonomous System (AS) is an individual network with a unified internal routing policy. An AS can originate prefixes (state that it controls a group of IP addresses), as well as transit prefixes (state that it knows how to reach specific groups of IP addresses).</p><p>Cloudflare's ASN is <a href="https://www.peeringdb.com/asn/13335">AS13335</a>. Every ASN needs to announce its prefix routes to the Internet using BGP; otherwise, no one will know how to connect and where to find us.</p><p>Our <a href="https://www.cloudflare.com/learning/">learning center</a> has a good overview of what <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">BGP</a> and <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">ASNs</a> are and how they work.</p><p>In this simplified diagram, you can see six autonomous systems on the Internet and two possible routes that one packet can use to go from Start to End. AS1 → AS2 → AS3 is the fastest, and AS1 → AS6 → AS5 → AS4 → AS3 is the slowest, but the latter can be used if the first fails.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/32OFXCXQN61ABya2ei1YR2/b8dc0f86b97079926dd6de75533f0912/image5-10.png" />
            
            </figure><p>At 15:58 UTC we noticed that Facebook had stopped announcing the routes to their DNS prefixes. That meant that, at least, Facebook’s DNS servers were unavailable. Because of this Cloudflare’s 1.1.1.1 DNS resolver could no longer respond to queries asking for the IP address of facebook.com.</p>
            <pre><code>route-views&gt;show ip bgp 185.89.218.0/23
% Network not in table
route-views&gt;

route-views&gt;show ip bgp 129.134.30.0/23
% Network not in table
route-views&gt;</code></pre>
            <p>Meanwhile, other Facebook IP addresses remained routed but weren’t particularly useful since without DNS Facebook and related services were effectively unavailable:</p>
            <pre><code>route-views&gt;show ip bgp 129.134.30.0   
BGP routing table entry for 129.134.0.0/17, version 1025798334
Paths: (24 available, best #14, table default)
  Not advertised to any peer
  Refresh Epoch 2
  3303 6453 32934
    217.192.89.50 from 217.192.89.50 (138.187.128.158)
      Origin IGP, localpref 100, valid, external
      Community: 3303:1004 3303:1006 3303:3075 6453:3000 6453:3400 6453:3402
      path 7FE1408ED9C8 RPKI State not found
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
route-views&gt;</code></pre>
            <p>We keep track of all the BGP updates and announcements we see in our global network. At our scale, the data we collect gives us a view of how the Internet is connected and where the traffic is meant to flow from and to everywhere on the planet.</p><p>A BGP UPDATE message informs a router of any changes you’ve made to a prefix advertisement or entirely withdraws the prefix. We can clearly see this in the number of updates we received from Facebook when checking our time-series BGP database. Normally this chart is fairly quiet: Facebook doesn’t make a lot of changes to its network minute to minute.</p><p>But at around 15:40 UTC we saw a peak of routing changes from Facebook. That’s when the trouble began.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5y7pAAyjmIsSJXessKptMg/0f842af9af2a9c2550ff64f43ea7c365/image4-11.png" />
            
            </figure><p>If we split this view by routes announcements and withdrawals, we get an even better idea of what happened. Routes were withdrawn, Facebook’s DNS servers went offline, and one minute after the problem occurred, Cloudflare engineers were in a room wondering why 1.1.1.1 couldn’t resolve facebook.com and worrying that it was somehow a fault with our systems.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/71IU6JSuaRY843UulhFiKW/cb6d58af732594976beb1a78c0f8d308/image3-9.png" />
            
            </figure><p>With those withdrawals, Facebook and its sites had effectively disconnected themselves from the Internet.</p>
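<p>The announce/withdraw split above can be sketched with a toy update log. This is an illustration of the bucketing idea, not our actual pipeline; the records are invented:</p>

```python
from collections import Counter

# Toy BGP UPDATE records: (minute, type, prefix). A real pipeline would
# parse these from MRT dumps or a live BGP feed; here they are inlined.
updates = [
    (0, "announce", "129.134.30.0/24"),
    (0, "announce", "129.134.31.0/24"),
    (1, "withdraw", "129.134.30.0/24"),
    (1, "withdraw", "129.134.31.0/24"),
    (1, "withdraw", "185.89.218.0/23"),
]

def split_by_type(updates):
    """Count announcements and withdrawals per minute bucket."""
    series = {}
    for minute, kind, _prefix in updates:
        bucket = series.setdefault(minute, Counter())
        bucket[kind] += 1
    return series

series = split_by_type(updates)
# A burst of withdrawals with no matching re-announcements is the
# signature of a network taking itself off the Internet.
print(series[1]["withdraw"])  # 3
```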
    <div>
      <h3>DNS gets affected</h3>
      <a href="#dns-gets-affected">
        
      </a>
    </div>
    <p>As a direct consequence of this, DNS resolvers all over the world stopped resolving their domain names.</p>
            <pre><code>➜  ~ dig @1.1.1.1 facebook.com
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: SERVFAIL, id: 31322
;facebook.com.			IN	A
➜  ~ dig @1.1.1.1 whatsapp.com
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: SERVFAIL, id: 31322
;whatsapp.com.			IN	A
➜  ~ dig @8.8.8.8 facebook.com
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: SERVFAIL, id: 31322
;facebook.com.			IN	A
➜  ~ dig @8.8.8.8 whatsapp.com
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: SERVFAIL, id: 31322
;whatsapp.com.			IN	A</code></pre>
<p>This happens because DNS, like many other systems on the Internet, also has its own routing mechanism. When someone types the <a href="https://facebook.com">https://facebook.com</a> URL in the browser, the DNS resolver, responsible for translating <a href="https://www.cloudflare.com/learning/dns/glossary/what-is-a-domain-name/">domain names</a> into actual IP addresses to connect to, first checks its cache and uses the answer if one is present. If not, it tries to grab the answer from the domain’s nameservers, typically hosted by the entity that owns the domain.</p><p>If the nameservers are unreachable or fail to respond for some other reason, then a SERVFAIL is returned, and the browser issues an error to the user.</p><p>Again, our learning center provides a <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">good explanation</a> on how DNS works.</p>
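<p>A minimal sketch of that resolution order: cache first, then the authoritative nameservers, SERVFAIL when they are unreachable. The cache structure, fixed TTL, and addresses here are illustrative, not a real resolver implementation:</p>

```python
import time

cache = {}  # name -> (expiry_timestamp, answer)

def resolve(name, authoritative, now=None):
    """Toy recursive-resolver lookup: cache, then authoritative, else SERVFAIL."""
    now = now if now is not None else time.time()
    hit = cache.get(name)
    if hit and hit[0] > now:           # fresh cache entry: answer immediately
        return hit[1]
    answer = authoritative(name)       # ask the domain's nameservers
    if answer is None:                 # unreachable / no response
        return "SERVFAIL"
    cache[name] = (now + 300, answer)  # cache for the record's TTL (300s here)
    return answer

# Nameservers reachable: normal answer, which gets cached.
print(resolve("facebook.com", lambda n: "157.240.11.35"))
# Nameservers withdrawn from BGP: every uncached lookup fails.
print(resolve("whatsapp.com", lambda n: None))  # SERVFAIL
```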
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/aYyNe0TkIR47XpI3RjaN4/179f32b67057dc440e1ad3af2a0061e2/image8-8.png" />
            
</figure><p>Because Facebook stopped announcing the routes to their DNS prefixes through BGP, our DNS resolvers, and everyone else's, had no way to connect to their nameservers. Consequently, 1.1.1.1, 8.8.8.8, and other major public DNS resolvers started issuing (and caching) SERVFAIL responses.</p><p>But that's not all. Human behavior and application logic then kick in and cause another, compounding effect: a tsunami of additional DNS traffic follows.</p><p>This happened in part because apps won't accept an error for an answer and start retrying, sometimes aggressively, and in part because end users also won't take an error for an answer and start reloading pages, or killing and relaunching their apps, sometimes also aggressively.</p><p>This is the traffic increase (in number of requests) that we saw on 1.1.1.1:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/JgzupnGlQqx0cZjijoRfb/bce5744c0d0a655e77c9b72f9d297496/image6-9.png" />
            
</figure><p>So now, because Facebook and their sites are so big, we had DNS resolvers worldwide handling 30x more queries than usual, potentially causing latency and timeout issues for other platforms.</p><p>Fortunately, 1.1.1.1 was built to be Free, Private, Fast (as the independent DNS monitor <a href="https://www.dnsperf.com/#!dns-resolvers">DNSPerf</a> can attest), and scalable, and we were able to keep servicing our users with minimal impact.</p><p>The vast majority of our DNS requests kept resolving in under 10ms. At the same time, response times at the p95 and p99 percentiles increased slightly, probably because expired TTLs forced lookups to fall back to Facebook's unreachable nameservers and time out. The 10-second DNS timeout limit is well known amongst engineers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7HcKMYPFQWk5QlVyMY4dxH/c51ddb3df72b0baec1a0ff99b1a208fc/image2-11.png" />
            
            </figure>
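<p>A toy model of that retry amplification, where apps retry failed lookups and users retrigger the whole sequence by reloading. The retry and reload counts are illustrative assumptions, not measured values:</p>

```python
def query_load(clients, app_retries, user_reloads):
    """Total DNS queries when every lookup fails and gets retried.

    Each failed lookup is retried app_retries times by the application,
    and the user reloads the page user_reloads times, retriggering the
    whole sequence. One client normally sends a single query.
    """
    per_attempt = 1 + app_retries  # original query plus app-level retries
    attempts = 1 + user_reloads    # original attempt plus user reloads
    return clients * per_attempt * attempts

baseline = query_load(1_000_000, app_retries=0, user_reloads=0)
outage = query_load(1_000_000, app_retries=4, user_reloads=5)
print(outage // baseline)  # 30: modest per-client retries compound to 30x load
```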
    <div>
      <h3>Impacting other services</h3>
      <a href="#impacting-other-services">
        
      </a>
    </div>
    <p>People look for alternatives and want to know more or discuss what’s going on. When Facebook became unreachable, we started seeing increased DNS queries to Twitter, Signal and other messaging and social media platforms.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4LNsXe0n3q2kZgSwbiTgD8/ddc52a824634bf383fc2aaf8fbadc25e/image1-12.png" />
            
            </figure><p>We can also see another side effect of this unreachability in our WARP traffic to and from Facebook's affected ASN 32934. This chart shows how traffic changed from 15:45 UTC to 16:45 UTC compared with three hours before in each country. All over the world WARP traffic to and from Facebook’s network simply disappeared.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6pe57CevhAkyCi0MYDHurW/fc424989247ad4faa11f3b4c64781502/image7-6.png" />
            
            </figure>
    <div>
      <h3>The Internet</h3>
      <a href="#the-internet">
        
      </a>
    </div>
    <p>Today's events are a gentle reminder that the Internet is a very complex and interdependent system of millions of systems and protocols working together. That trust, standardization, and cooperation between entities are at the center of making it work for almost five billion active users worldwide.</p>
    <div>
      <h3>Update</h3>
      <a href="#update">
        
      </a>
    </div>
    <p>At around 21:00 UTC we saw renewed BGP activity from Facebook's network which peaked at 21:17 UTC.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/47HV9JWOs1yaUtXV1g269v/1effede60e4dc9e1eab042bb1fc1b4fe/unnamed-3-3.png" />
            
            </figure><p>This chart shows the availability of the DNS name 'facebook.com' on Cloudflare's DNS resolver 1.1.1.1. It stopped being available at around 15:50 UTC and returned at 21:20 UTC.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5y4sNviiduLRAipw6I6UHs/770e842b223754d4aa61fffe3ee8d7c6/unnamed-4.png" />
            
</figure><p>Undoubtedly Facebook, WhatsApp and Instagram services will take further time to come fully online, but as of 21:28 UTC Facebook appears to be reconnected to the global Internet, with DNS working again.</p> ]]></content:encoded>
            <category><![CDATA[Trends]]></category>
            <category><![CDATA[Outage]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Facebook]]></category>
            <guid isPermaLink="false">7jh9UDGts4LJU26IAOLsgK</guid>
            <dc:creator>Celso Martinho</dc:creator>
            <dc:creator>Tom Strickx</dc:creator>
        </item>
        <item>
            <title><![CDATA[Protecting Cloudflare Customers from BGP Insecurity with Route Leak Detection]]></title>
            <link>https://blog.cloudflare.com/route-leak-detection/</link>
            <pubDate>Thu, 25 Mar 2021 13:00:00 GMT</pubDate>
            <description><![CDATA[ Today, we're excited to announce Route Leak Detection, a new network alerting feature that tells customers when a prefix they own that is onboarded to Cloudflare is being leaked. ]]></description>
            <content:encoded><![CDATA[ <p><i>This post is also available in </i><a href="/ja-jp/route-leak-detection-ja-jp/"><i>日本語</i></a><i>, </i><a href="/id-id/route-leak-detection-id-id/"><i>Bahasa Indonesia</i></a><i>, </i><a href="/th-th/route-leak-detection-th-th/"><i>ไทย</i></a><i>.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2RqommHUoS6kg7yEKfg1b0/b2a9436cfbfad21affdd171f271e4b06/image6-17.png" />
            
            </figure><p>Border Gateway Protocol (BGP) route leaks and hijacks can ruin your day — BGP is <a href="/is-bgp-safe-yet-rpki-routing-security-initiative/">insecure by design</a>, and incorrect routing information spreading across the Internet can be incredibly disruptive and dangerous to the normal functioning of customer networks, and the Internet at large. Today, we're excited to announce Route Leak Detection, a new network alerting feature that tells customers when a prefix they own that is onboarded to Cloudflare is being leaked, i.e., advertised by an unauthorized party. Route Leak Detection helps protect your routes on the Internet: it tells you when your traffic is going places it’s not supposed to go, which is an indicator of a possible attack, and reduces time to mitigate leaks by arming you with timely information.</p><p>In this blog, we will explain what route leaks are, how Cloudflare Route Leak Detection works, and what we are doing to help protect the Internet from route leaks.</p>
    <div>
      <h2>What are route leaks, and why should I care?</h2>
      <a href="#what-are-route-leaks-and-why-should-i-care">
        
      </a>
    </div>
<p>A route leak occurs when a network on the Internet tells the rest of the world to route traffic through its network when that traffic isn’t supposed to go there. <a href="/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/">A great example</a> of this and the impact it can cause was an incident in June 2019, where a small ISP in Pennsylvania started advertising routes for part of the Internet, including Cloudflare, Amazon, and Linode. A significant portion of traffic destined for those networks was incorrectly routed through the small ISP’s network, leaking Cloudflare, Amazon, and Linode’s prefixes, and causing congestion and unreachable network errors for end users. Route leaks tend to happen because of a misconfigured peering session or customer router, a software bug in a customer or third-party router, a man-in-the-middle attack, or a malicious customer or third party.</p><p>Some route leaks are innocuous. But some route leaks can be malicious, and can have very real security impact. An attacker can advertise specific routes for the express purpose of directing users to their network to do things like <a href="/bgp-leaks-and-crypto-currencies/">steal cryptocurrencies</a> and other important data, or attempt to issue <a href="https://www.cloudflare.com/application-services/products/ssl/">SSL/TLS certificates</a> that can be used to impersonate domains. By advertising more specific routes, an attacker can trick you into accessing a site that you don’t intend to, and if it looks exactly like the site you expect, you may unwittingly enter personal data and be at risk of attack. Here’s a diagram representing traffic without a route leak:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ldg1OBf3Ql3K2OM5QzuFQ/dfc07340361bb25b84ea1c8c7caf2d57/image4-32.png" />
            
            </figure><p>And here’s traffic after a route leak:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JTlxIjXatTk3p7VWBjqst/57c838e231584e8dd22db0be74001e82/image5-30.png" />
            
            </figure><p>So in addition to making users unhappy because a lot of Internet traffic is going through paths that can’t handle it, route leaks can have very real data leak implications.</p><p>Cloudflare’s Route Leak Detection allows you to get notified quickly when your routes are leaking so that you know when a potential attack is happening.</p>
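<p>The reason a more-specific advertisement wins is BGP's longest-prefix-match forwarding rule: routers send traffic along the most specific route that covers a destination. A minimal sketch, using an illustrative routing table rather than real router code:</p>

```python
import ipaddress

# Routing table: a legitimate /24 plus a leaked, more-specific /25.
table = {
    ipaddress.ip_network("203.0.113.0/24"): "legitimate path (via Cloudflare)",
    ipaddress.ip_network("203.0.113.0/25"): "leaked path (attacker)",
}

def lookup(dst):
    """Longest-prefix match: the most specific covering route wins."""
    dst = ipaddress.ip_address(dst)
    candidates = [net for net in table if dst in net]
    best = max(candidates, key=lambda net: net.prefixlen)
    return table[best]

# Traffic to the half of the /24 covered by the /25 is diverted.
print(lookup("203.0.113.10"))   # leaked path (attacker)
# The uncovered half still follows the legitimate /24.
print(lookup("203.0.113.200"))  # legitimate path (via Cloudflare)
```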
    <div>
      <h2>How does Cloudflare Route Leak Detection protect my network?</h2>
      <a href="#how-does-cloudflare-route-leak-detection-protect-my-network">
        
      </a>
    </div>
    
    <div>
      <h3>How to configure Route Leak Detection</h3>
      <a href="#how-to-configure-route-leak-detection">
        
      </a>
    </div>
<p>In order to configure Route Leak Detection, you must be a Cloudflare customer who has <a href="https://developers.cloudflare.com/byoip/">“brought your own IP” (BYOIP)</a> addresses; this includes Magic Transit (L3), Spectrum (L4), and WAF (L7) customers. Only prefixes advertised by Cloudflare qualify for Route Leak Detection.</p><p>You can configure Route Leak Detection by setting up a notification in the Notifications tab in your account.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/QnjMkL4u4AwrgSh1Jpxrr/142248389c38a6204d2948785f36208c/image7.gif" />
            
</figure><p>Cloudflare will then begin monitoring all of your onboarded prefixes for leaks and hijacks and will send you alerts when they occur via <a href="https://support.cloudflare.com/hc/en-us/articles/360047358211-Connecting-PagerDuty-to-Cloudflare">email or specialized on-call tools like PagerDuty</a>.</p><p>Cloudflare’s alert notification system supports webhooks, email, and PagerDuty, so your teams are kept up to date on changes to network routes through their preferred medium, and can respond and take corrective action when necessary.</p>
    <div>
      <h3>An example attack scenario</h3>
      <a href="#an-example-attack-scenario">
        
      </a>
    </div>
<p>A malicious party attempting to use routes to gain access to customer data starts advertising a subnet of onboarded prefixes for one of our Magic Transit customers. This attack, if not found and remediated quickly, could have a serious impact on the customer. When the attacker begins to advertise the prefix without the customer’s knowledge, BGP updates and route changes start occurring rapidly in the global routing table, typically within 60 seconds.</p><p>Let’s walk through how a customer might deploy Route Leak Detection. Customer Acme Corp. owns the IP prefix 203.0.113.0/24. Acme has onboarded 203.0.113.0/24 to Cloudflare, and Cloudflare tells the rest of the Internet that this prefix is reachable through Cloudflare’s network.</p><p>Once Acme has enabled Route Leak Detection, Cloudflare continuously monitors routing information on the Internet for 203.0.113.0/24. Our goal is to detect leaks within five minutes of the erroneous routing information propagating on the Internet.</p><p>Let’s go back to the attack scenario. A malicious party attempting to attack Acme’s network hijacks the advertisement for 203.0.113.0/24, diverting legitimate users from the intended network path to Acme (through Cloudflare’s network) and instead to a facsimile of Acme’s network intended to capture information from unwitting users.</p><p>Because Acme has enabled Route Leak Detection, an alert is sent to Acme’s administrators.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4NcQzvEYk24mS66H5r6jMc/e3924fc243bc6c1dc79829063561fb4a/image1-41.png" />
            
</figure><p>The alert includes all of the ASNs that are seeing the prefix being advertised by the potentially malicious party. Acme is able to warn their users that they may be at risk of a data exfiltration attack, and they should be on the lookout for suspicious behavior.</p><p>Acme is also able to quickly contact the service providers listed in the alert and ask them to stop honoring the more-specific routes. Currently, mitigating a route leak is a highly manual process, requiring contacting service providers directly using contact information published in <a href="https://www.peeringdb.com/">public databases</a>. In the future, we plan to build features to automate this outreach and mitigation process to further drive down time to mitigation for route leak events that may impact our customers.</p>
    <div>
      <h2>How does Cloudflare detect route leaks?</h2>
      <a href="#how-does-cloudflare-detect-route-leaks">
        
      </a>
    </div>
    <p>Cloudflare uses several sources of routing data to create a synthesis of how the Internet sees routes to our BYOIP customers. Cloudflare then watches these views to track any sudden changes that occur on the Internet. If we can correlate those changes to actions we have taken, then we know the change is benign, and it’s business as usual. However, if we haven’t made any changes, we quickly take action to tell you that your routes and your users may be at risk.</p>
    <div>
      <h3>Cloudflare’s outside-in ingestion pipeline</h3>
      <a href="#cloudflares-outside-in-ingestion-pipeline">
        
      </a>
    </div>
<p>Cloudflare’s main source of data is from externally maintained repositories such as <a href="https://ris-live.ripe.net/">RIPE’s RIS feed</a>, <a href="http://www.routeviews.org/">RouteViews</a>, and <a href="https://bgpstream.caida.org/data#!caida-bmp">CAIDA’s public BMP feed</a>. It is important to use multiple external views of the Internet routing table to be as accurate as possible when making inferences about the state of the Internet. Cloudflare makes API calls to those sources to ingest data and analyze changes in BGP routes. These feeds allow us to ingest routing data for the whole Internet. Cloudflare filters all of that down to the prefixes you have previously onboarded to Cloudflare.</p><p>Once this data is ingested and filtered, Cloudflare begins cross-referencing updates to the global routing table with metrics that indicate possible hijacks, such as the number of ASNs that directly see your routes, the number of BGP updates that occur over short periods of time, and how many subnets are being advertised. If the number of ASNs that directly see your routes or the number of updates changes drastically, it could mean that your prefixes are being leaked. If a subnet of the prefixes you are advertising is seeing drastic change in the global routing table, it’s likely that your prefixes are being leaked somewhere.</p><p>Cloudflare already has this configured on our own prefixes today. Here’s an example of what we see when our system determines that something is wrong:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3UsydST8qItOP1lNo38LCP/51816b8fda71313bcc09dc867d2a8cc3/image2-35.png" />
            
            </figure><p>Cloudflare owns the prefix range 2606:4700:50::/44, as it is a subnet of one of the ranges <a href="https://www.cloudflare.com/ips/">listed on our site here</a>. For a period of an hour, we noticed that someone tried to advertise a subnet of that range to 38 other networks. Fortunately, because we have <a href="/rpki-details/">deployed RPKI</a>, we know that most networks will reject rather than honor these route advertisements from attackers.</p>
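<p>A much-simplified version of that visibility signal: flag a prefix when the number of ASNs seeing a more-specific advertisement jumps without a corresponding change on our side. The threshold and data below are illustrative, not our production logic:</p>

```python
def leak_suspected(history, current, expected_change=False, threshold=3.0):
    """Flag a possible leak when visibility spikes unexpectedly.

    history: recent counts of ASNs seeing a more-specific of the prefix.
    current: the latest count.
    expected_change: True if we made a routing change ourselves, in which
    case the spike correlates with our own action and is benign.
    """
    if expected_change:
        return False
    # Floor the baseline at 1 so a normally-invisible prefix still works.
    baseline = max(sum(history) / len(history), 1.0)
    return current / baseline >= threshold

# Normally almost nobody sees a more-specific of 2606:4700:50::/44 ...
quiet = [0, 0, 1, 0, 0]
# ... then suddenly 38 networks do.
print(leak_suspected(quiet, 38))  # True
print(leak_suspected(quiet, 1))   # False
```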
    <div>
      <h2>What can I do to prevent route leaks in the future?</h2>
      <a href="#what-can-i-do-to-prevent-route-leaks-in-the-future">
        
      </a>
    </div>
<p>The best way to prevent route leaks is to deploy <a href="/rpki-details/">RPKI</a> in your network, and urge your Internet providers to do so as well. RPKI allows you and your providers to sign routes that you advertise to the Internet, so that no one else can steal them. If someone else advertises your RPKI-signed routes, any providers that support RPKI will not forward those routes to other customers, ensuring that the attempted leak is contained as close to the attacker as possible.</p><p>Cloudflare’s <a href="https://isbgpsafeyet.com/">continued advocacy</a> for RPKI has borne fruit in the past three months alone. Providers such as Amazon, Google, Telstra, Cogent, and even Netflix have started supporting RPKI and are filtering and dropping invalid prefixes. In fact, over 50% of the top Internet providers now support RPKI in some fashion:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/33cOUIS0k8FDIJjIVHC2aN/6e14337d251c6f6d4c570c668ffa1b25/image3-33.png" />
            
</figure><p>Cloudflare’s Route Leak Detection, combined with more providers implementing RPKI, is helping to ensure that data loss and downtime from route leaks become a thing of the past. If you’re a Cloudflare Magic Transit or BYOIP customer, try configuring a route leak alert in your dash today. If you’re not a Magic Transit or BYOIP customer, reach out to our <a href="https://www.cloudflare.com/plans/enterprise/contact/">sales team</a> to get started and keep your network safe, routes included.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[RPKI]]></category>
            <guid isPermaLink="false">6U8gLsr4STQiKqza7HDwyP</guid>
            <dc:creator>David Tuber</dc:creator>
        </item>
        <item>
            <title><![CDATA[The Internet is Getting Safer: Fall 2020 RPKI Update]]></title>
            <link>https://blog.cloudflare.com/rpki-2020-fall-update/</link>
            <pubDate>Fri, 06 Nov 2020 12:36:07 GMT</pubDate>
            <description><![CDATA[ The cap of two hundred thousand routing cryptographic records was recently passed. We thought it was time for an update on a major year for RPKI. ]]></description>
<content:encoded><![CDATA[ <p>The Internet is a network of networks. In order to find the path between two points and exchange data, network devices rely on information from their peers. This information consists of IP addresses and Autonomous Systems (AS) which announce the addresses using Border Gateway Protocol (BGP).</p><p>One problem arises from this design: what protects against a malevolent peer who decides to announce incorrect information? The damage caused by <a href="/bgp-leaks-and-crypto-currencies/">route hijacks can be major</a>.</p><p>Resource Public Key Infrastructure (RPKI) is a framework created in 2008. Its goal is to provide a source of truth for Internet Resources (IP addresses) and ASes in cryptographically signed records called Route Origin Authorizations (ROAs).</p><p>Recently, we’ve seen the significant threshold of two hundred thousand ROAs passed. This represents a big step in making the Internet more secure against accidental and deliberate BGP tampering.</p><p>We have talked about RPKI <a href="/tag/rpki/">in the past</a>, but we thought it would be a good time for an update.</p><p>In a more technical context, the RPKI framework consists of two parts:</p><ul><li><p>IP addresses need to be cryptographically signed by their owners in a database managed by a Trust Anchor: Afrinic, APNIC, ARIN, LACNIC and RIPE NCC. Those five organizations are in charge of allocating Internet resources. The ROA indicates which network operator is allowed to announce the addresses using BGP.</p></li><li><p>Network operators download the list of ROAs, perform the cryptographic checks and then apply filters on the prefixes they receive: this is called BGP Origin Validation.</p></li></ul>
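<p>The Origin Validation step can be sketched as follows: a route is valid if some ROA covers its prefix with a matching origin AS and a permitted prefix length, invalid if ROAs cover it but none match, and not-found otherwise. This is a simplified reading of the RFC 6811 procedure with an illustrative ROA set:</p>

```python
import ipaddress

# Validated ROA payloads: (prefix, max_length, authorized origin ASN).
roas = [
    (ipaddress.ip_network("1.1.1.0/24"), 24, 13335),
]

def validate(prefix, origin_asn):
    """RFC 6811-style origin validation, simplified."""
    prefix = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        if prefix.subnet_of(roa_prefix):
            covered = True  # at least one ROA covers this prefix
            if asn == origin_asn and prefix.prefixlen <= max_length:
                return "valid"
    return "invalid" if covered else "not-found"

print(validate("1.1.1.0/24", 13335))  # valid: origin and length match a ROA
print(validate("1.1.1.0/24", 64512))  # invalid: covered, but wrong origin AS
print(validate("8.8.8.0/24", 15169))  # not-found: no ROA covers the prefix
```

A router doing Origin Validation would then drop or deprioritize the "invalid" routes while treating "not-found" as it did before RPKI existed.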
    <div>
      <h3>The “Is BGP Safe Yet” website</h3>
      <a href="#the-is-bgp-safe-yet-website">
        
      </a>
    </div>
<p>The launch of the website <a href="https://isbgpsafeyet.com/">isbgpsafeyet.com</a> to test if your ISP correctly performs BGP Origin Validation was a success. Since launch, it has been visited more than five million times from 223 countries and 13,000 unique networks (20% of the entire Internet), generating half a million BGP Origin Validation tests.</p><p>Many providers subsequently indicated on social media (for example, <a href="https://twitter.com/Aussie_BB/status/1252046032214450176">here</a> or <a href="https://twitter.com/swisscom_csirt/status/1300666695959244800">here</a>) that they had an RPKI deployment in the works. This increase in Origin Validation by networks is increasing the security of the Internet globally.</p><p>The site’s test for Origin Validation consists of queries toward two addresses, one of which is behind an RPKI invalid prefix and the other behind an RPKI valid prefix. If the query towards the invalid prefix succeeds, the test fails, as the ISP does not implement Origin Validation. We counted the number of queries that failed to reach invalid.cloudflare.com. This also included a few thousand <a href="https://atlas.ripe.net/measurements/?page=1&amp;search=target:invalid.rpki.cloudflare.com#tab-http">RIPE Atlas tests</a> that were started by Cloudflare and various contributors, providing coverage for smaller networks.</p><p>Every month since launch, we’ve seen around 10 to 20 networks deploy RPKI Origin Validation. 
Among the major providers we can build the following table:</p><table><tr><td><p><b>Month</b></p></td><td><p><b>Networks</b></p></td></tr><tr><td><p>August</p></td><td><p>Swisscom (Switzerland), Salt (Switzerland)</p></td></tr><tr><td><p>July</p></td><td><p>Telstra (Australia), Quadranet (USA), Videotron (Canada)</p></td></tr><tr><td><p>June</p></td><td><p>Colocrossing (USA), Get Norway (Norway), Vocus (Australia), Hurricane Electric (Worldwide), Cogent (Worldwide)</p></td></tr><tr><td><p>May</p></td><td><p>Sengked Fiber (Indonesia), Online.net (France), WebAfrica Networks (South Africa), CableNet (Cyprus), IDnet (Indonesia), Worldstream (Netherlands), GTT (Worldwide)</p></td></tr></table><p>With the help of many <a href="https://github.com/cloudflare/isbgpsafeyet.com/blob/master/CONTRIBUTING.md">contributors</a>, we have compiled a list of network operators and public statements at the top of the isbgpsafeyet.com page.</p><p>We excluded providers that manually blocked the traffic towards the prefix instead of using RPKI. Among the techniques we saw were firewall filtering and manual prefix rejection. The filtering is often propagated to other customer ISPs. In one unique case, an ISP generated a “more-specific” blackhole route that leaked to multiple peers over the Internet.</p><p>The deployment of RPKI by major transit providers (also known as Tier 1), such as Cogent, GTT, Hurricane Electric, NTT and Telia, made many downstream networks more secure without them having to deploy validation software themselves.</p><p>Overall, we looked at the evolution of successful tests per ASN, and we noticed a steady increase of 8% over recent months.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1F5LrsstbIKXWNt4drF5Vt/7515a8aad81c55ee921785a2f71ff3a8/success-ratio-3-1.png" />
            
</figure><p>Furthermore, when we probed the entire IPv4 space this month, using a similar technique to the isbgpsafeyet.com test, many more networks were unable to reach an RPKI invalid prefix than during the same period last year. This confirms an increase of RPKI Origin Validation deployment across all network operators. The picture below shows the IPv4 space behind a network with RPKI Origin Validation enabled in yellow and the active space in blue. It uses a <a href="https://en.wikipedia.org/wiki/Hilbert_curve">Hilbert Curve</a> to efficiently plot IP addresses: for example, one /20 prefix (4,096 IPs) is a single pixel, and a /16 prefix (65,536 IPs) forms a 4x4 pixel square.</p><p>The more the yellow spreads, the safer the Internet becomes.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/kX7jAx67NEyl7ujWucW2h/72625c4415f1f619b73e2fc3f0654373/hilbert-comp.png" />
            
</figure><p>What does it mean exactly? <i>If you were hijacking a prefix, the users behind the yellow space would likely not be affected.</i> This also applies if you mis-sign your prefixes: you would not be able to reach the services or users behind the yellow space. Once RPKI is enabled everywhere, there will only be yellow squares.</p>
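<p>The Hilbert curve is used here because it keeps numerically adjacent address blocks visually adjacent, so contiguous prefixes form compact blobs. Below is the classic iterative distance-to-coordinates conversion, a standard algorithm sketch rather than the exact code behind the image:</p>

```python
def d2xy(n, d):
    """Map distance d along a Hilbert curve onto an n-by-n grid.

    n must be a power of two. Consecutive d values land on adjacent
    cells, which is why contiguous address ranges stay contiguous.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                # rotate the sub-quadrant when needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx                # move into the correct quadrant
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Treating each /20 as one cell, all 2**20 of them fill a 1024x1024 grid,
# and the 16 /20s inside one /16 form the 4x4 block mentioned above.
print(d2xy(1024, 0), d2xy(1024, 1))  # two adjacent cells
```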
    <div>
      <h3>Progression of signed prefixes</h3>
      <a href="#progression-of-signed-prefixes">
        
      </a>
    </div>
<p>Owners of IP addresses indicate the networks allowed to announce them. They do this by signing prefixes: they create Route Origin Authorizations (ROAs). As of today, there are more than 200,000 ROAs. The distribution shows that the RIPE region still leads in ROA count, followed by the APNIC region.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1Dskm8cAFbT7A51uLr1pRF/d3238ae962da9a9ddf442e833af6864f/image2-2.png" />
            
</figure><p>2020 started with 172,000 records and the count is getting close to 200,000 at the beginning of November, approximately a quarter of all Internet routes. Since last year, the database of ROAs grew by more than 70 percent, from 100,000 records, an average pace of 5% every month.</p><p>On the following graph of unique ROA count per day, we can see two points that were followed by a change in ROA creation rate: 140/day, then 231/day, and since August, 351 new ROAs per day.</p><p>It is not yet clear what caused the increase in August.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/mzXuBiiXzCAl7qIUwsMA4/2b2a51ec5f4443b2951ad59cd38223b5/count-growth-3.png" />
            
            </figure>
    <div>
      <h3>Free services and software</h3>
      <a href="#free-services-and-software">
        
      </a>
    </div>
<p>In <a href="/bgp-leaks-and-crypto-currencies/">2018</a> and <a href="/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/">2019</a>, Cloudflare was impacted by BGP route hijacks. Both could have been avoided with RPKI. Not long after the first incident, we started signing prefixes and developing RPKI software. It was necessary to make BGP safer, and we wanted to do more than talk about it. But we also needed enough other networks to deploy RPKI as well. By making deployment easier for everyone, we hoped to increase adoption.</p><p>The following is a reminder of what we built over the years around RPKI and how it grew.</p><p><a href="https://github.com/cloudflare/cfrpki">OctoRPKI</a> is Cloudflare’s open source RPKI Validation software. It periodically generates a JSON document of validated prefixes that we pass on to our routers using <a href="https://github.com/cloudflare/gortr">GoRTR</a>. It generates most of the data behind the graphs here.</p><p>The latest version of OctoRPKI, <a href="https://github.com/cloudflare/cfrpki/releases/tag/v1.2.0">1.2.0</a>, was released at the end of October. It implements important security fixes, better memory management and extended <a href="https://github.com/cloudflare/cfrpki/blob/master/Monitoring.md">logging</a>. This is the first validator to provide detailed information about cryptographically invalid records to <a href="https://sentry.io/welcome/">Sentry</a> and performance data to <a href="https://opentracing.io/">distributed tracing tools</a>. GoRTR remains heavily used in production, including by <a href="https://github.com/cloudflare/gortr#in-the-field">transit providers</a>. 
It can natively connect to other validators like <a href="https://www.rpki-client.org/">rpki-client</a>.</p><p>When we released our public rpki.json endpoint in early 2019, the idea was to enable anyone to see what Cloudflare was filtering.</p><p>The file is also used as a bootstrap by GoRTR, so that users can test a deployment. The file is cached on more than 200 data centers, ensuring quick and secure delivery of a list of valid prefixes, making RPKI more accessible for smaller networks and developers.</p><p>Between March 2019 and November 2020, the number of queries more than doubled and there are five times more networks querying this file.</p><p>The growth of queries follows approximately the rate of ROA creation (~5% per month).</p>
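<p>A minimal consumer of that file might look like the following sketch. The field names (`roas`, `prefix`, `maxLength`, `asn`) follow the shape of the published document, but the sample here is a small inlined document rather than a live fetch, and the entries are illustrative:</p>

```python
import json

# Sample document in the shape of rpki.json (inlined for the example;
# a real consumer would fetch the endpoint instead).
doc = json.loads("""
{
  "roas": [
    {"prefix": "1.1.1.0/24", "maxLength": 24, "asn": "AS13335", "ta": "apnic"},
    {"prefix": "1.0.0.0/24", "maxLength": 24, "asn": "AS13335", "ta": "apnic"},
    {"prefix": "8.8.8.0/24", "maxLength": 24, "asn": "AS15169", "ta": "arin"}
  ]
}
""")

def roas_for_asn(doc, asn):
    """Return the prefixes a given origin AS is authorized to announce."""
    return sorted(r["prefix"] for r in doc["roas"] if r["asn"] == f"AS{asn}")

print(roas_for_asn(doc, 13335))  # ['1.0.0.0/24', '1.1.1.0/24']
```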
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5kDHjZZCWlsKLvUHVCrNiZ/d58ee3429910b250612355c8e74d0a50/rpki-json-evolution-4.png" />
            
            </figure><p>A public RTR server is also available at rtr.rpki.cloudflare.com. It exposes a plaintext endpoint on port 8282 and an SSH endpoint on port 8283. This allows us to test new versions of <a href="https://github.com/cloudflare/gortr">GoRTR</a> before release.</p><p>Later in 2019, we also built a <a href="https://rpki.cloudflare.com">public dashboard</a> where you can explore RPKI validation in depth. With its GraphQL API, you can explore the validation data, test a list of prefixes, or check the status of the current routing table.</p>
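<p>For a flavor of what travels over that port 8282 connection: the RPKI-to-Router protocol (RFC 8210) exchanges small binary PDUs. The sketch below, not production code, builds the 8-byte Reset Query PDU a client such as GoRTR sends to ask a cache for its complete set of validated prefixes.</p>

```python
import struct

# RTR v1 (RFC 8210) Reset Query PDU layout:
#   1 byte protocol version, 1 byte PDU type (2 = Reset Query),
#   2 reserved zero bytes, 4-byte total length (always 8 for this PDU).
def reset_query_pdu(version: int = 1) -> bytes:
    return struct.pack("!BBHI", version, 2, 0, 8)

print(reset_query_pdu().hex())  # 0102000000000008
```

The cache answers with a stream of IPv4/IPv6 Prefix PDUs, each carrying a prefix, maxLength, and origin ASN, followed by an End of Data PDU, which is exactly the data the router then uses for origin validation.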
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3cJHLgbi7dDeAPV7ZV2izH/9a895b790137c972c8623f7248539cdc/rpki.cloudflare.com__view-validator-validateRoute-13335_1.1.1.0-2F24.png" />
            
            </figure><p>The API is currently used by <a href="https://github.com/nttgin/BGPalerter">BGPalerter</a>, an open-source tool that detects routing issues (including hijacks!) from a stream of BGP updates.</p><p>Additionally, starting in November, you can access historical data going back to May 2019. The data is computed daily and contains the unique records for each day. The team behind the dashboard worked hard to provide a fast and accurate visualization of daily ROA changes and the volume of files changed over the day.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3uoRqILkpkM58M0iKbqZm6/fcbcaf5f1dbd7695504e7cd8b0df3d4f/rpki_dashboard_candlestick.png" />
            
            </figure>
    <div>
      <h3>The future</h3>
      <a href="#the-future">
        
      </a>
    </div>
    <p>We believe RPKI will continue to grow, and we would like to thank the hundreds of network engineers around the world who are making Internet routing more secure by deploying RPKI.</p><p>25% of routes are signed and 20% of the Internet performs origin validation, and those numbers grow every day. We believe BGP will be safer well <a href="https://youtu.be/3BAwBClazWc?t=1333">before reaching 100%</a> deployment; for instance, once the remaining transit providers enable origin validation, it is unlikely a BGP hijack will make the front page of world news outlets.</p><p>While difficult to quantify, we believe a critical mass of protected resources will be reached in late 2021.</p><p>We will keep improving the tooling; OctoRPKI and GoRTR are open source, and we welcome contributions. In the near future, we plan on releasing a packaged version of GoRTR that can be installed directly on certain routers. Stay tuned!</p> ]]></content:encoded>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">1EyhrRgU58WhG0wa9Bo8Qh</guid>
            <dc:creator>Louis Poinsignon</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing Regional Services]]></title>
            <link>https://blog.cloudflare.com/introducing-regional-services/</link>
            <pubDate>Fri, 26 Jun 2020 11:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare launches Regional Services, giving customers control over where their data is processed. ]]></description>
            <content:encoded><![CDATA[ <p>In a world where workloads increasingly shift to the cloud, it is often unclear how data travels the Internet and in which countries it is processed. Today, Cloudflare is pleased to announce Regional Services, giving our customers full control over exactly where their traffic is handled.</p><p>We operate a global network spanning more than 200 cities. Each data center runs servers with the exact same software stack. This has enabled Cloudflare to quickly and efficiently add capacity where needed. It also allows our engineers to ship features with ease: deploy once, and it's available globally.</p><p>The same benefit applies to our customers: configure once, and that change is applied everywhere in seconds, whether they’re changing security features, adding a DNS record, or deploying a Cloudflare Worker.</p><p>A homogeneous network is great from a routing point of view: whenever a user performs an HTTP request, Cloudflare's Anycast network finds the closest data center, with BGP looking at the hops that would need to be traversed to get there. This means that someone near the Canadian border (say, in North Dakota) could easily find themselves routed to Winnipeg (inside Canada) instead of a data center in the United States. This is generally what our customers want and expect: the fastest way to serve traffic, regardless of geographic location.</p><p>Some organizations, however, have expressed preferences for maintaining regional control over their data for a variety of reasons. For example, they may be bound by agreements with their own customers that include geographic restrictions on data flows or data processing. As a result, some customers have requested control over where their web traffic is serviced.</p><p>Regional Services gives our customers the ability to accommodate regional restrictions while still using Cloudflare’s global edge network. As of today, Enterprise customers can add Regional Services to their contracts. With Regional Services, customers can choose which subset of data centers is able to service traffic at the HTTP level. But we're not reducing network capacity to do this: that would not be the Cloudflare Way. Instead, we're allowing customers to use our entire network for <a href="https://www.cloudflare.com/ddos/">DDoS protection</a> while limiting the data centers that apply higher-level layer 7 security and performance features such as WAF, Workers, and Bot Management.</p><p>Traffic is ingested on our global Anycast network at the location closest to the client, as usual, and then passed to data centers inside the geographic region of the customer’s choice. TLS keys are only <a href="/geo-key-manager-how-it-works">stored</a> and used to actually handle traffic inside that region. This gives our customers the benefit of our huge, low-latency, high-throughput network, capable of withstanding even the <a href="/the-daily-ddos-ten-days-of-massive-attacks/">largest DDoS attacks</a>, while also giving them local control: only data centers inside a customer’s preferred geographic region have the access necessary to apply security policies.</p><p>The diagram below shows how this process works. When users connect to Cloudflare, they hit the data center closest to them, by nature of our Anycast network. That data center detects and mitigates DDoS attacks. 
Legitimate traffic is passed through to a data center within the geographic region of the customer’s choosing. Inside that data center, traffic is inspected at OSI layer 7 and HTTP products can work their magic:</p><ul><li><p>Content can be returned from and stored in cache</p></li><li><p>The WAF looks inside the HTTP payloads</p></li><li><p>Bot Management detects and blocks suspicious activity</p></li><li><p>Workers scripts run</p></li><li><p>Access policies are applied</p></li><li><p>Load Balancers look for the best origin to service traffic</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7aaFSqiVx77rXsS2N3RT1f/d574a8616e54dd8246b68ee94a09837e/image2-9.png" />
            
            </figure><p>Today's launch includes preconfigured geographic regions, and we'll look to add more depending on customer demand. US and EU regions are available immediately, meaning layer 7 (HTTP) products can be configured to run only within those regions and not outside of them.</p><p>The US and EU maps are depicted below. Purple dots represent data centers that apply DDoS protection and network acceleration. Orange dots represent data centers that also process traffic.</p>
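<p>The two-tier flow described above can be sketched in a few lines. This is purely illustrative, not Cloudflare's implementation; the PoP codes and the selection logic are hypothetical, and a real system would weigh latency and load rather than take the first in-region match.</p>

```python
# Hypothetical PoP -> region mapping (illustrative codes, not a real inventory).
POPS = {"ORD": "US", "EWR": "US", "AMS": "EU", "FRA": "EU", "YWG": "CA"}

def plan_request(ingest_pop: str, customer_region: str) -> tuple:
    """Return (ddos_pop, l7_pop) for one request.

    DDoS mitigation always happens at the ingest PoP, so the full Anycast
    network absorbs attacks; layer 7 processing (WAF, Workers, caching) is
    forwarded to a PoP inside the customer's chosen region.
    """
    if POPS[ingest_pop] == customer_region:
        return ingest_pop, ingest_pop  # already in-region: process locally
    # Take the first in-region PoP for the sketch.
    l7_pop = next(pop for pop, region in POPS.items() if region == customer_region)
    return ingest_pop, l7_pop

# A client near the border lands in Winnipeg, but a US-region customer's
# traffic is only processed at layer 7 inside the US:
print(plan_request("YWG", "US"))  # ('YWG', 'ORD')
```

The key property is that the first element (where scrubbing happens) ranges over the whole network, while the second (where HTTP features and TLS keys live) is always inside the configured region.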
    <div>
      <h3>US</h3>
      <a href="#us">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/27QO1l8SD4U7w27OSYYPOp/33c4577ab859445c0f3fab1f515fbf72/image1-10.png" />
            
            </figure>
    <div>
      <h3>EU</h3>
      <a href="#eu">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/10lHcRerwTtYDamjx1u0HA/7f714e18362e0ad7a09caa8ea4447406/BDES-655-_-Slides-with-Cloudflare-PoPs-for-product-launch--1-.jpg" />
            
            </figure><p>We're very excited to provide new tools to our customers, allowing them to dictate which of our data centers employ HTTP features and which do not. If you're interested in learning more, contact <a href="mailto:sales@cloudflare.com">sales@cloudflare.com</a>.</p> ]]></content:encoded>
            <category><![CDATA[Data Center]]></category>
            <category><![CDATA[Europe]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Regional Services]]></category>
            <guid isPermaLink="false">6odmOeCIIEK47sVIlmcGt6</guid>
            <dc:creator>Achiel van der Mandele</dc:creator>
        </item>
    </channel>
</rss>