
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sat, 04 Apr 2026 09:36:54 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Cloudflare’s perspective of the October 30, 2024, OVHcloud outage]]></title>
            <link>https://blog.cloudflare.com/cloudflare-perspective-of-the-october-30-2024-ovhcloud-outage/</link>
            <pubDate>Wed, 30 Oct 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[ On October 30, 2024, cloud hosting provider OVHcloud (AS16276) suffered a brief but significant outage. Within this post, we review Cloudflare’s perspective on this outage. ]]></description>
            <content:encoded><![CDATA[ <p>On October 30, 2024, cloud hosting provider <a href="https://radar.cloudflare.com/as16276"><u>OVHcloud (AS16276)</u></a> suffered a brief but significant outage. According to their <a href="https://network.status-ovhcloud.com/incidents/qgb1ynp8x0c4"><u>incident report</u></a>, the problem started at 13:23 UTC, and was described simply as “<i>An incident is in progress on our backbone infrastructure.</i>” OVHcloud noted that the incident ended 17 minutes later, at 13:40 UTC. As a major global cloud hosting provider, some customers use OVHcloud as an origin for sites delivered by Cloudflare — if a given content asset is not in our cache for a customer’s site, we retrieve the asset from OVHcloud.</p><p>We observed traffic starting to drop at 13:21 UTC, just ahead of the reported start time. By 13:28 UTC, it was approximately 95% lower than pre-incident levels. Recovery appeared to start at 13:31 UTC, and by 13:40 UTC, the reported end time of the incident, it had reached approximately 50% of pre-incident levels. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/62w8PcLJ3Q05F1BtA12zUb/6d8ce87f85eb585a7fe0ac02f8cd93d5/image4.jpg" />
          </figure><p><sup><i>Traffic from OVHcloud (AS16276) to Cloudflare</i></sup></p><p></p><p>Cloudflare generally exchanges most of our traffic with OVHcloud over peering links. However, as shown below, peered traffic volume during the incident fell significantly. It appears that a small amount of traffic briefly began to flow over transit links from Cloudflare to OVHcloud due to sudden changes in which Cloudflare data centers were receiving OVHcloud requests. (Peering is a direct connection between two network providers for the purpose of exchanging traffic. Transit is when one network pays an intermediary network to carry traffic to the destination network.) </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2L0IaXd7B5C6RX23iTG5Pf/3fd2489f159e2281d191f157f5695f94/image3.jpg" />
          </figure><p>Because we peer directly, we would normally expect to keep exchanging most traffic over our private peering sessions with OVHcloud. Instead, we found that OVHcloud routing to Cloudflare dropped entirely for a few minutes, then shifted to a single Internet Exchange port in Amsterdam, and finally normalized globally several minutes later.</p><p>As the graphs below illustrate, we normally see the largest amount of traffic from OVHcloud in our Frankfurt and Paris data centers, as <a href="https://www.ovhcloud.com/en/about-us/global-infrastructure/regions/"><u>OVHcloud has large data center presences in these regions</u></a>. However, during that brief shift to transit and to the Amsterdam Internet Exchange peering point, we saw a spike in traffic routed to our Amsterdam data center. We suspect these routing shifts are the earliest signs of either internal BGP reconvergence or general network recovery within AS16276, starting with their presence nearest our Amsterdam peering point.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/yCDGCplEsmqXU7uRifjTU/12176147c10ab6e9a766ee5d788b133a/image2.jpg" />
          </figure><p>The <a href="https://network.status-ovhcloud.com/incidents/qgb1ynp8x0c4"><u>postmortem</u></a> published by OVHcloud noted that the incident was caused by “<i>an issue in a network configuration mistakenly pushed by one of our peering partner[s]</i>” and that “<i>We immediately reconfigured our network routes to restore traffic.</i>” One possible explanation for the backbone incident may be a BGP route leak by the mentioned peering partner, where OVHcloud could have accepted a full Internet table from the peer and therefore overwhelmed their network or the peering partner’s network with traffic, or caused unexpected internal BGP route updates within AS16276.</p><p>Upon investigating what route leak may have caused this incident impacting OVHcloud, we found evidence of a maximum prefix-limit threshold being breached on our peering with <a href="https://radar.cloudflare.com/as49981"><u>Worldstream (AS49981)</u></a> in Amsterdam. </p>
            <pre><code>Oct 30 13:16:53  edge02.ams01 rpd[9669]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 141.101.65.53 (External AS 49981) changed state from Established to Idle (event PrefixLimitExceeded) (instance master)</code></pre>
            <p></p><p>As the number of received prefixes exceeded the limits configured for our peering session with Worldstream, the BGP session automatically entered an idle state. This prevented the route leak from impacting Cloudflare’s network. In analyzing <a href="https://datatracker.ietf.org/doc/html/rfc7854"><u>BGP Monitoring Protocol (BMP)</u></a> data from AS49981 prior to the automatic session shutdown, we were able to confirm Worldstream was sending advertisements with AS paths that contained their upstream Tier 1 transit provider.</p><p>During this time, we also detected over 500,000 BGP announcements from AS49981, as Worldstream was announcing routes to many of their peers, visible on <a href="https://radar.cloudflare.com/routing/as49981?dateStart=2024-10-30&amp;dateEnd=2024-10-30#bgp-announcements"><u>Cloudflare Radar</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2YmTSJfXomzeb3mh93JyRH/15c764790576468a47d3760bc7f48153/Screenshot_2024-10-30_at_12.49.25_PM.png" />
          </figure><p>Worldstream later <a href="https://noc.worldstream.nl"><u>posted a notice</u></a> on their status page, indicating that their network experienced a route leak, causing routes to be unintentionally advertised to all peers:</p><blockquote><p><i>“Due to a configuration error on one of the core routers, all routes were briefly announced to all our peers. As a result, we pulled in more traffic than expected, leading to congestion on some paths. To address this, we temporarily shut down these BGP sessions to locate the issue and stabilize the network. We are sorry for the inconvenience.”</i></p></blockquote><p>We believe Worldstream also leaked routes on an OVHcloud peering session in Amsterdam, which caused today’s impact.</p>
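            <p>As a rough illustration of the signals described above, the following sketch counts routes received from a peer against a session prefix limit and flags AS paths that contain a well-known Tier 1 transit ASN. The ASNs, route list, and limit are hypothetical, and this is a simplified model rather than the tooling Cloudflare actually runs.</p>
            <pre><code># Simplified sketch: flag a peering session whose advertisements look like a route leak.
# The ASNs, routes, and prefix limit below are illustrative, not real configuration.

TIER1_ASNS = {174, 1299, 2914, 3356, 6762}    # a few well-known Tier 1 transit ASNs
PREFIX_LIMIT = 1000                            # hypothetical per-session maximum

# Each route is (prefix, as_path) as it might be extracted from BMP or MRT data.
received_routes = [
    ("192.0.2.0/24", [49981, 64500]),          # expected: the peer plus its customer
    ("198.51.100.0/24", [49981, 3356, 64501]), # suspicious: a Tier 1 ASN inside the path
]

def analyze_session(routes, limit):
    leaked = [r for r in routes if any(asn in TIER1_ASNS for asn in r[1])]
    over_limit = len(routes) &gt; limit
    return over_limit, leaked

over_limit, leaked = analyze_session(received_routes, PREFIX_LIMIT)
if over_limit:
    print("prefix limit exceeded: the session would be torn down")
for prefix, path in leaked:
    print(f"possible leak: {prefix} via {path} contains a Tier 1 transit ASN")</code></pre>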
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Cloudflare has written about<a href="https://blog.cloudflare.com/cloudflare-1111-incident-on-june-27-2024"> <u>impactful route leaks</u></a> before, and there are multiple methods available to prevent BGP route leaks from impacting your network. One is setting <a href="https://www.rfc-editor.org/rfc/rfc7454.html#section-8"><u>max prefix-limits</u></a> for a peer, so the BGP session is automatically torn down when a peer sends more prefixes than expected. Other forward-looking measures include<a href="https://manrs.org/2023/02/unpacking-the-first-route-leak-prevented-by-aspa/"> <u>Autonomous System Provider Authorization (ASPA) for BGP</u></a>, where Resource Public Key Infrastructure (RPKI) helps protect a network from accepting BGP routes with an invalid AS path, and<a href="https://rfc.hashnode.dev/rfc9234-observed-in-the-wild"> <u>RFC9234,</u></a> which prevents leaks by tying strict customer, peer, and provider relationships to BGP updates. For improved Internet resilience, we recommend that network operators follow the recommendations defined within<a href="https://manrs.org/netops/"> <u>MANRS for Network Operators</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[Trends]]></category>
            <category><![CDATA[Consumer Services]]></category>
            <category><![CDATA[Outage]]></category>
            <guid isPermaLink="false">Vn5VV2dLkJbOn1YNqSSBv</guid>
            <dc:creator>Bryton Herdes</dc:creator>
            <dc:creator>David Belson</dc:creator>
            <dc:creator>Tanner Ryan</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare 1.1.1.1 incident on June 27, 2024]]></title>
            <link>https://blog.cloudflare.com/cloudflare-1111-incident-on-june-27-2024/</link>
            <pubDate>Thu, 04 Jul 2024 13:00:50 GMT</pubDate>
            <description><![CDATA[ On June 27, 2024, a small number of users globally may have noticed that 1.1.1.1 was unreachable or degraded. The root cause was a mix of BGP (Border Gateway Protocol) hijacking and a route leak ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6kBrAZxRvJnPmEMCYY9KuL/b998cbe27bf1b851f48ca7c75d12d565/image2-4.png" />
            
            </figure>
    <div>
      <h2>Introduction</h2>
      <a href="#introduction">
        
      </a>
    </div>
    <p>On June 27, 2024, a small number of users globally may have noticed that 1.1.1.1 was unreachable or degraded. The root cause was a mix of BGP (Border Gateway Protocol) <a href="https://www.cloudflare.com/learning/security/glossary/bgp-hijacking/">hijacking</a> and a route leak.</p><p>Cloudflare was an <a href="/rpki-and-the-rtr-protocol">early adopter</a> of Resource Public Key Infrastructure (RPKI) for route origin validation (ROV). With RPKI, IP prefix owners can store and share ownership information securely, and other operators can validate BGP announcements by comparing received BGP routes with what is stored in the form of Route Origin Authorizations (ROAs). When route origin validation is properly enforced by networks and prefixes are signed via ROAs, the impact of a BGP hijack is greatly limited. Despite increased adoption of RPKI over the past several years and 1.1.1.0/24 being a <a href="https://rpki.cloudflare.com/?view=explorer&amp;prefix=1.1.1.0%2F24">signed resource</a>, during the incident 1.1.1.1/32 was originated by ELETRONET S.A. (AS267613) and accepted by multiple networks, including at least one <a href="https://en.wikipedia.org/wiki/Tier_1_network">Tier 1 provider</a> that accepted 1.1.1.1/32 as a <a href="https://datatracker.ietf.org/doc/html/rfc3882">blackhole route</a>.</p><p>This caused immediate unreachability for the DNS resolver address from over 300 networks in 70 countries, although the impact on the overall percentage of users was quite low (less than 1% of users in the UK and Germany, for example), and in some countries no users noticed an impact.</p><p>Route leaks are something Cloudflare <a href="/route-leak-incident-on-october-2-2014">has written and talked about before</a>, and unfortunately there are only best-effort safeguards in wide deployment today, such as IRR (Internet Routing Registry) prefix-list filtering by providers. During the same period as the 1.1.1.1/32 hijack, 1.1.1.0/24 was erroneously leaked upstream by Nova Rede de Telecomunicações Ltda (AS262504). The leak was further and widely propagated by Peer-1 Global Internet Exchange (AS1031), which also contributed to the impact felt by customers during the incident.</p><p>We apologize for the impact felt by users of 1.1.1.1, and take the operation of the service very seriously. Although the root cause of the impact was external to Cloudflare, we will continue to improve the detection methods in place to yield quicker response times, and will use our standing within the Internet community to further encourage adoption of RPKI-based hijack and leak prevention mechanisms such as Route Origin Validation (ROV) and Autonomous Systems Provider Authorization (<a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/">ASPA</a>) objects for BGP.</p>
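            <p>As a rough illustration of that validation step, here is a minimal Route Origin Validation sketch in Python. The ROA values mirror the published authorization for 1.1.1.0/24 (origin AS13335, maximum length /24), and the logic is a simplification of the validation procedure, not a production validator.</p>
            <pre><code>import ipaddress

# A ROA modeled as (prefix, max_length, authorized origin ASN).
# 1.1.1.0/24 is signed for origin AS13335 with a maximum prefix length of /24.
ROAS = [("1.1.1.0/24", 24, 13335)]

def rov_state(prefix: str, origin_asn: int) -&gt; str:
    """Simplified route origin validation: Valid / Invalid / NotFound."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_asn in ROAS:
        if net.subnet_of(ipaddress.ip_network(roa_prefix)):
            covered = True
            if origin_asn == roa_asn and net.prefixlen &lt;= max_len:
                return "Valid"
    return "Invalid" if covered else "NotFound"

print(rov_state("1.1.1.0/24", 13335))   # Valid: correct origin and length
print(rov_state("1.1.1.1/32", 267613))  # Invalid: wrong origin and too specific</code></pre>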
    <div>
      <h2>Background</h2>
      <a href="#background">
        
      </a>
    </div>
    <p>Cloudflare <a href="/announcing-1111">introduced</a> the <a href="https://one.one.one.one/">1.1.1.1</a> public DNS resolver service in 2018. Since the announcement, 1.1.1.1 has become one of the most popular resolver IP addresses, free for anyone to use. Along with the popularity and easily recognized IP address come some operational difficulties. The difficulties stem from <a href="https://youtu.be/vR4GbRMAWj8?si=HTH8nvxVvyLYYjF2">historical use of 1.1.1.1 by networks in labs or as a testing IP address</a>, resulting in some residual unexpected traffic or blackholed routing behavior. Because of this, Cloudflare is no stranger to dealing with the effects of BGP misrouting traffic, two examples of which are covered below.</p>
    <div>
      <h3>BGP hijacks</h3>
      <a href="#bgp-hijacks">
        
      </a>
    </div>
    <p>Some of the difficulty comes from potential <a href="https://www.cloudflare.com/learning/security/glossary/bgp-hijacking/">routing hijacks</a> of 1.1.1.1. For example, if some fictitious FooBar Networks assigns 1.1.1.1/32 to one of their routers and shares this prefix within their internal network, their customers will have difficulty routing to the 1.1.1.1 DNS service. If they advertise the 1.1.1.1/32 prefix outside their immediate network, the impact can be even greater. The reason 1.1.1.1/32 would be selected instead of the 1.1.1.0/24 BGP-announced by Cloudflare is due to <a href="https://en.wikipedia.org/wiki/Longest_prefix_match">Longest Prefix Matching (LPM)</a>. While many prefixes in a route table could match the 1.1.1.1 address, such as 1.1.1.0/24, 1.1.1.0/29, and 1.1.1.1/32, 1.1.1.1/32 is considered the “longest match” by the LPM algorithm because it has the highest number of identical bits and longest <a href="https://en.wikipedia.org/wiki/Subnet">subnet</a> mask while matching the 1.1.1.1 address. In simple terms, we would call 1.1.1.1/32 the “most specific” route available to 1.1.1.1.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7n3Xe0tgkW3a2cZuVI0bAs/4d192f979294dc2f6b758994ac512b71/image4-1.png" />
            
            </figure><p>Instead of traffic toward 1.1.1.1 routing to Cloudflare via anycast and landing on one of our servers globally, it will instead land somewhere on a device within FooBar Networks where 1.1.1.1 is terminated, and a legitimate response will fail to be served back to clients. This would be considered a hijack of requests to 1.1.1.1, either done purposefully or accidentally by network operators within FooBar Networks.</p>
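            <p>The longest prefix match behavior described above can be sketched in a few lines of Python with the standard ipaddress module; the candidate routes are the same illustrative prefixes mentioned earlier.</p>
            <pre><code>import ipaddress

# Candidate routes in a forwarding table that all cover 1.1.1.1.
routes = ["1.1.1.0/24", "1.1.1.0/29", "1.1.1.1/32"]
destination = ipaddress.ip_address("1.1.1.1")

# Longest prefix match: among the covering routes, pick the one with the longest mask.
matching = [ipaddress.ip_network(r) for r in routes
            if destination in ipaddress.ip_network(r)]
best = max(matching, key=lambda net: net.prefixlen)

print(best)  # 1.1.1.1/32 wins, which is why a hijacked /32 beats Cloudflare's /24</code></pre>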
    <div>
      <h3>BGP route leaks</h3>
      <a href="#bgp-route-leaks">
        
      </a>
    </div>
    <p>Another source of impact we sometimes face for 1.1.1.1 is BGP route leaks. A route leak occurs when a network becomes an upstream, in terms of BGP announcement, for a network it shouldn’t be an upstream provider for.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Nw7KaaD49t0Drer2H1kbd/1e61b336c901e2f4b4c9cd3d70843adf/image3-2.png" />
            
            </figure><p>Here is an example of a route leak where a customer forwards routes learned from one provider to another, causing a type 1 leak (defined in <a href="https://www.rfc-editor.org/rfc/rfc7908.html">RFC7908</a>).</p><p>If enough networks within the <a href="https://en.wikipedia.org/wiki/Default-free_zone">Default-Free Zone (DFZ)</a> accept a route leak, it may be used widely for forwarding traffic along the <i>bad</i> path. Often this will cause the network leaking the prefixes to overload, as they aren’t prepared for the amount of global traffic they are now attracting. We <a href="/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/">wrote</a> about a wide-scale route leak that knocked a large portion of the Internet offline, when a provider in Pennsylvania attracted traffic toward global destinations it would typically never have transited traffic for. Even though Cloudflare interconnects with over 13,000 networks globally, the BGP local-preference assigned to a leaked route could be higher than that of the route a network received directly from Cloudflare. This sounds counterintuitive, but unfortunately it can happen.</p><p>To explain why this happens, it helps to think of BGP as a business policy engine along with the routing protocol for the Internet. A transit provider has customers who pay them to transport their data, so logically the provider assigns a higher BGP local-preference to routes received from those customers than to routes received over private or Internet Exchange (IX) peering, so that the connection being paid for is used the most. Think of local-preference as a way of prioritizing which route, and therefore which outgoing connection, traffic is sent over. Different networks may also choose to prefer Private Network Interconnects (PNIs) over Internet Exchange (IX) received routes. Part of the reason for this is reliability, as a private connection can be viewed as a point-to-point connection between two networks with no third-party managed fabric in between to worry about. Another reason could be cost efficiency: if you’ve gone to the trouble of allocating a router port and purchasing a cross connect to another peer, you’d like to make use of it to get the best return on your investment.</p><p>It is worth noting that both BGP hijacks and route leaks can happen to any IP and prefix on the Internet, not just 1.1.1.1. But as mentioned earlier, 1.1.1.1 is such a recognizable and historically misappropriated address that it tends to be more prone to accidental hijacks or leaks than other IP resources.</p><p>During the Cloudflare 1.1.1.1 incident on June 27, 2024, we ended up fighting impact caused by a combination of a BGP hijack and a route leak.</p>
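            <p>The sketch below, with hypothetical local-preference values, illustrates why a leaked path can win: in the usual BGP decision process local-preference is compared before AS path length, so a leaked route learned from a paying customer can beat a shorter path learned directly from Cloudflare over peering.</p>
            <pre><code># Simplified BGP best-path selection: higher local-preference wins, then shorter AS path.
# The local-preference values below are hypothetical.

routes_to_1_1_1_0_24 = [
    {"learned_from": "customer (leaked path)", "local_pref": 200,
     "as_path": [262504, 267613, 13335]},
    {"learned_from": "direct peering with Cloudflare", "local_pref": 100,
     "as_path": [13335]},
]

best = max(routes_to_1_1_1_0_24,
           key=lambda r: (r["local_pref"], -len(r["as_path"])))

print("selected:", best["learned_from"])  # the leaked path, despite its longer AS path</code></pre>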
    <div>
      <h2>Incident timeline and impact</h2>
      <a href="#incident-timeline-and-impact">
        
      </a>
    </div>
    <p>All timestamps are in UTC.</p><p><b>2024-06-27 18:51:00</b> AS267613 (Eletronet) begins announcing 1.1.1.1/32 to peers, providers, and customers. 1.1.1.1/32 is announced with the AS267613 origin AS</p><p><b>2024-06-27 18:52:00</b> AS262504 (Nova) leaks 1.1.1.0/24, also received from AS267613, upstream to AS1031 (PEER 1 Global Internet Exchange) with AS path “1031 262504 267613 13335”</p><p><b>2024-06-27 18:52:00</b> AS1031 (upstream of Nova) propagates 1.1.1.0/24 to various Internet Exchange peers and route-servers, widening impact of the leak</p><p><b>2024-06-27 18:52:00</b> One tier 1 provider receives the 1.1.1.1/32 announcement from AS267613 as an RTBH (Remote Triggered Blackhole) route, causing blackholed traffic for all the tier 1’s customers</p><p><b>2024-06-27 20:03:00</b> Cloudflare raises internal incident for 1.1.1.1 reachability issues from various countries</p><p><b>2024-06-27 20:08:00</b> Cloudflare disables a partner peering location with AS267613 that is receiving traffic toward 1.1.1.0/24</p><p><b>2024-06-27 20:08:00</b> Cloudflare team engages peering partner AS267613 about the incident</p><p><b>2024-06-27 20:10:00</b> AS262504 leaks 1.1.1.0/24 with a new AS path, “262504 53072 7738 13335”, which is also redistributed by AS1031. Traffic along this path is delivered successfully to Cloudflare, but with high latency for affected clients</p><p><b>2024-06-27 20:17:00</b> Cloudflare engages AS262504 regarding the route leak of 1.1.1.0/24 to their upstream providers</p><p><b>2024-06-27 21:56:00</b> Cloudflare engineers disable a second peering point with AS267613 that is receiving traffic meant for 1.1.1.0/24 from multiple sources not in Brazil</p><p><b>2024-06-27 22:16:00</b> AS262504 leaks 1.1.1.0/24 again, attracting some traffic to a Cloudflare peering with AS267613 in São Paulo. As a result, some 1.1.1.1 requests are returned with higher latency, but the hijack of 1.1.1.1/32 and traffic blackholing appears resolved</p><p><b>2024-06-28 02:28:00</b> AS262504 fully resolves the route leak of 1.1.1.0/24</p><p>The impact to customers surfaced in one of two ways: being unable to reach 1.1.1.1 at all, or being able to reach 1.1.1.1 but with high latency per request.</p><p>Since AS267613 was hijacking the 1.1.1.1/32 address somewhere within their network, many requests failed at some device in their autonomous system. There were intermittent periods, or flaps, during the incident where they successfully routed requests toward 1.1.1.1 to Cloudflare data centers, albeit with high latency.</p><p>Looking at two source countries during the incident, Germany and the United States, impacted traffic to 1.1.1.1 looked like this:</p><p><i>Source Country of Users:</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/492JYYYPZzxjjmGk2IF5Sb/5d6762775689439de1aca2f868bf67cd/image5-1.png" />
            
            </figure><p><i>Keep in mind that overall this may represent a relatively small amount of total requests per source country, but normally no requests would route from the US or Germany to Brazil at all for 1.1.1.1.</i></p><p><i>Cloudflare Data Center city:</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/61Bq2eHhu5HZzGNcIs8tZS/4dcaa35237709fb1b9af4abfde382303/image6-1.png" />
            
            </figure><p>Looking at the graphs, requests to 1.1.1.1 were landing in Brazilian data centers. The gaps between the spikes are when 1.1.1.1 requests were blackholed prior to or within AS267613, and the spikes themselves are when traffic was delivered to Cloudflare with high latency added to the request and response. The brief spikes of traffic successfully carried to the Cloudflare peering location with AS267613 could be explained by the 1.1.1.1/32 route flapping within their network, occasionally letting traffic through to Cloudflare instead of it being dropped somewhere along the intermediate path.</p>
    <div>
      <h2>Technical description of the error and how it happened</h2>
      <a href="#technical-description-of-the-error-and-how-it-happened">
        
      </a>
    </div>
    <p>Normally, a request to 1.1.1.1 from users routes to the nearest data center via BGP anycast. During the incident, AS267613 (Eletronet) advertised 1.1.1.1/32 to their peers and upstream providers, and AS262504 leaked 1.1.1.0/24 upstream, changing the normal path of BGP anycast for multiple eyeball networks drastically.</p><p>With public route collectors and the <a href="https://github.com/bgpkit/monocle">monocle tool</a>, we can search for the rogue BGP updates.</p>
            <pre><code>monocle search --start-ts 2024-06-27T18:51:00Z --end-ts 2024-06-27T18:55:00Z --prefix '1.1.1.1/32'

A|1719514377.130203|206.126.236.209|398465|1.1.1.1/32|398465 267613|IGP|206.126.236.209|0|0||false|||route-views.eqix
–
A|1719514377.681932|206.82.104.185|398465|1.1.1.1/32|398465 267613|IGP|206.82.104.185|0|0|13538:1|false|||route-views.ny
–
A|1719514388.996829|198.32.132.129|13760|1.1.1.1/32|13760 267613|IGP|198.32.132.129|0|0||false|||route-views.telxatl</code></pre>
            <p>We see above that AS398465 and AS13760 reported to the route-views collectors that they received 1.1.1.1/32 from AS267613 around the time the impact began. Normally, the longest IPv4 prefix accepted in the Default-Free Zone (DFZ) is a /24, but in this case we observed multiple networks using the 1.1.1.1/32 route from AS267613 for forwarding, made apparent by the blackholing of traffic that never arrived at a Cloudflare POP (Point of Presence). The origination of 1.1.1.1/32 by AS267613 is a BGP route hijack. They were announcing the prefix with origin AS267613 even though the Route Origin Authorization (ROA) is only signed for origin AS13335 (Cloudflare) with a maximum prefix length of /24.</p><p>We even saw BGP updates for 1.1.1.1/32 when looking at our own BMP (BGP Monitoring Protocol) data at Cloudflare. From at least a couple of different route servers, we received our own 1.1.1.1/32 announcement via BGP. Thankfully, Cloudflare rejects these routes on import as both RPKI Invalid and DFZ Invalid due to the invalid AS origin and prefix length. The BMP data display is pre-policy, meaning that even though we rejected the route, we can still see where we received the BGP update over a peering session.</p><p>So not only are multiple networks accepting prefixes that should not exist in the global routing table, but they are also accepting an <a href="https://rpki.cloudflare.com/?view=explorer&amp;prefix=1.1.1.0%2F24">RPKI (Resource Public Key Infrastructure) Invalid route</a>. To make matters worse, one Tier-1 transit provider accepted the 1.1.1.1/32 announcement as an RTBH (Remote-Triggered Blackhole) route from AS267613, discarding all traffic at their edge that would normally route to Cloudflare. This alone caused wide impact, as any networks leveraging this particular Tier-1 provider in routing to 1.1.1.1 would have been unable to reach the IP address during the incident.</p><p>For those unfamiliar with Remote-Triggered Blackholing, it is a method of signaling to a provider a set of destinations you would like traffic to be dropped for within their network. It exists as a blunt method of fighting off DDoS attacks. When you are being attacked on a specific IP or prefix, you can tell your upstream provider to absorb all traffic toward that destination IP address or prefix by discarding it before it reaches your network port. The problem during this incident was that AS267613 was not authorized to blackhole 1.1.1.1/32. Only Cloudflare should have the right to leverage RTBH for discarding traffic destined for AS13335, and in reality we would never do so for 1.1.1.1.</p><p>Looking now at BGP updates for 1.1.1.0/24, we see that multiple networks received the prefix from AS262504 and accepted it.</p>
            <pre><code>~&gt; monocle search --start-ts 2024-06-27T20:10:00Z --end-ts 2024-06-27T20:13:00Z --prefix '1.1.1.0/24' --as-path ".* 267613 13335" --include-sub

.. some advertisements removed for brevity ..

A|1719519011.378028|187.16.217.158|1031|1.1.1.0/24|1031 262504 267613 13335|IGP|187.16.217.158|0|0|1031:1031 1031:4209 1031:6045 1031:7019 1031:8010|false|13335|162.158.177.1|route-views2.saopaulo
–
A|1719519011.629398|45.184.147.17|1031|1.1.1.0/24|1031 262504 267613 13335|IGP|45.184.147.17|0|0|1031:1031 1031:4209 1031:4259 1031:6045 1031:7019 1031:8010|false|13335|162.158.177.1|route-views.fortaleza
–
A|1719519036.943174|80.249.210.99|50763|1.1.1.0/24|50763 1031 262504 267613 13335|IGP|80.249.210.99|0|0|1031:1031 50763:400|false|13335|162.158.177.1|route-views.amsix
–
A|1719519037|80.249.210.99|50763|1.1.1.0/24|50763 1031 262504 267613 13335|IGP|80.249.210.99|0|0|1031:1031 50763:400|false|13335|162.158.177.1|rrc03
–
A|1719519087.4546|45.184.146.59|199524|1.1.1.0/24|199524 1031 262504 267613 13335|IGP|45.184.147.17|0|0||false|13335|162.158.177.1|route-views.fortaleza
A|1719519087.464375|45.184.147.74|264409|1.1.1.0/24|264409 1031 262504 267613 13335|IGP|45.184.147.74|0|0|65100:7010|false|13335|162.158.177.1|route-views.fortaleza
–
A|1719519096.059558|190.15.124.18|61568|1.1.1.0/24|61568 262504 267613 13335|IGP|190.15.124.18|0|0|1031:1031 1031:4209 1031:6045 1031:7019 1031:8010|false|13335|162.158.177.1|route-views3
–
A|1719519128.843415|190.15.124.18|61568|1.1.1.0/24|61568 262504 267613 13335|IGP|190.15.124.18|0|0|1031:1031 1031:4209 1031:6045 1031:7019 1031:8010|false|13335|162.158.177.1|route-views3</code></pre>
            <p>Here we pay attention to the AS path again. This time, AS13335 is the origin AS at the very end of the announcements. This BGP announcement is RPKI <b>Valid</b>, because the origin is correctly AS13335, but this is a route leak of 1.1.1.0/24 because the path itself is invalid.</p>
    <div>
      <h3>How do we know it’s a route leak?</h3>
      <a href="#how-do-we-know-its-a-route-leak">
        
      </a>
    </div>
    <p>Looking at an example path, “199524 1031 262504 267613 13335”, AS267613 is functionally a peer of AS13335 and should not share the 1.1.1.0/24 announcement with their peers or upstreams, only their customers (<a href="https://www.manrs.org/wp-content/uploads/2021/11/AS-Cones-MANRS.pdf">AS Cone</a>). AS262504 is a customer of AS267613 and the next adjacent ASN in the path, so that particular announcement is fine up until this point. Where the 1.1.1.0/24 announcement goes wrong is at AS262504, when they announce the prefix to their upstream, AS1031. Furthermore, AS1031 redistributed the advertisement to their peers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7fWhrQoqfJzgS7EEaK05Gw/2de5227144de75d232012c0029540af4/image1-3.png" />
            
            </figure><p>This means AS262504 is the leaking network. AS1031 accepted the leak from their customer, AS262504, and caused wide impact by distributing the route in multiple peering locations globally. AS1031 (Peer-1 Global Internet Exchange) advertises themselves as a global peering exchange. Cloudflare is not a customer of AS1031, so 1.1.1.0/24 should never have been redistributed to peers, route-servers, or upstreams of AS1031. It appears that AS1031 does not perform any extensive filtering for customer BGP sessions, and instead just matches on adjacency (in this case, AS262504) and redistributes everything that meets this criterion. Unfortunately, this is irresponsible of AS1031 and causes direct impact to 1.1.1.1 and potentially other services that fall victim to the unguarded route propagation. While the original leaking network was AS262504, the impact was greatly amplified by AS1031 and others when they accepted the hijack or leak and further distributed the announcements.</p><p>For the majority of the incident, the leak by AS262504 landed requests within AS267613, which was discarding 1.1.1.1/32 traffic somewhere in their network. In effect, AS262504 amplified the impact in terms of 1.1.1.1 unreachability by leaking routes upstream.</p><p>To limit the impact of the route leak, Cloudflare disabled peering in multiple locations with AS267613. The problem did not completely go away, as AS262504 was still leaking a stale path pointing to São Paulo. Requests landing in São Paulo were able to be served, albeit with a high round-trip time back to users. Cloudflare has been engaging with all networks mentioned throughout this post in regard to the leak and future prevention mechanisms, as well as with at least one <a href="https://en.wikipedia.org/wiki/Tier_1_network">Tier 1 transit provider</a> that accepted 1.1.1.1/32 from AS267613 as a blackhole route that was unauthorized by Cloudflare and caused widespread impact.</p>
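            <p>A sketch of the reasoning above: given the business relationship of each adjacent pair of ASes (assumptions inferred from this post), a path is a leak if a route learned from a peer or a provider is propagated onward to anything other than a customer. This is a simplification of the RFC7908 idea, not how any particular network implements its filtering.</p>
            <pre><code># Relationship of the receiver relative to the sender for each adjacent AS pair,
# as inferred in this post (assumptions for illustration only).
REL = {
    (13335, 267613): "peer",       # Eletronet peers with Cloudflare
    (267613, 262504): "customer",  # Nova is a customer of Eletronet
    (262504, 1031): "provider",    # Peer-1 Global Internet Exchange is upstream of Nova
    (1031, 199524): "peer",        # Peer-1 redistributed the route to IX peers
}

def is_route_leak(path):
    """path is origin-first, e.g. [13335, 267613, 262504, 1031, 199524]."""
    learned_from = "customer"      # an origin may export its own routes to anyone
    for sender, receiver in zip(path, path[1:]):
        sent_to = REL[(sender, receiver)]
        # A route learned from a peer or a provider may only be exported to customers.
        if learned_from in ("peer", "provider") and sent_to != "customer":
            return True
        # From the receiver's point of view, invert the relationship for the next hop.
        learned_from = {"customer": "provider", "provider": "customer", "peer": "peer"}[sent_to]
    return False

print(is_route_leak([13335, 267613, 262504, 1031, 199524]))  # True: leaked at AS262504</code></pre>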
    <div>
      <h2>Remediation and follow-up steps</h2>
      <a href="#remediation-and-follow-up-steps">
        
      </a>
    </div>
    
    <div>
      <h3>BGP hijacks</h3>
      <a href="#bgp-hijacks">
        
      </a>
    </div>
    <p><b>RPKI origin validation</b></p><p>RPKI has recently reached a major milestone of 50% deployment in terms of prefixes signed with a Route Origin Authorization (ROA). While RPKI certainly helps limit the spread of a hijacked BGP prefix throughout the Internet, we need all networks to do their part, especially major networks with a large number of downstream Autonomous Systems (ASes). During the hijack of 1.1.1.1/32, multiple networks accepted and used the route announced by AS267613 for traffic forwarding.</p><p><b>RPKI and Remote-Triggered Blackholing (RTBH)</b></p><p>A significant amount of the impact caused during this incident was due to a Tier 1 provider accepting 1.1.1.1/32 as a blackhole route from a third party that is not Cloudflare. This in itself is a hijack of 1.1.1.1, and a very dangerous one. RTBH is a useful tool used by many networks when desperate for a mitigation against large DDoS attacks. The problem is that the BGP filtering used for RTBH is loose in nature, often relying only on <a href="https://www.apnic.net/manage-ip/using-whois/guide/as-set/">AS-SET</a> objects found in Internet Routing Registries. Relying on Route Origin Authorization (ROA) would be infeasible for RTBH filtering, as that would require thousands of potential ROAs to be created for a network the size of Cloudflare. Not only this, but creating specific /32 entries opens up the potential for an individual address such as 1.1.1.1/32 being announced by someone pretending to be AS13335, becoming the best route to 1.1.1.1 on the Internet and causing severe impact.</p><p>AS-SET filtering is not representative of authority to blackhole a route, such as 1.1.1.1/32. Only Cloudflare should be able to blackhole a destination it has the rights to operate. A potential way to fix the lenient filtering of providers on RTBH sessions would again be leveraging the RPKI. Using an example from the IETF, the expired <a href="https://datatracker.ietf.org/doc/draft-spaghetti-sidrops-rpki-doa/">draft-spaghetti-sidrops-rpki-doa-00</a> proposal specified a Discard Origin Authorization (DOA) object that would allow only specific origins to authorize a blackhole action for a prefix. If such an object were signed, and RTBH requests were validated against the object, the unauthorized blackhole attempt of 1.1.1.1/32 by AS267613 would have been treated as invalid instead of accepted by the Tier 1 provider.</p><p><b>BGP best practices</b></p><p>Simply following the BGP best practices laid out by <a href="https://manrs.org/netops/guide/">MANRS</a> and rejecting IPv4 prefixes longer than a /24 in the Default-Free Zone (DFZ) would have reduced the impact to 1.1.1.1. Rejecting invalid prefix lengths within the wider Internet should be part of a standard BGP policy for all networks.</p>
    <div>
      <h3>BGP route leaks</h3>
      <a href="#bgp-route-leaks">
        
      </a>
    </div>
    
    <div>
      <h3>Route leak detection</h3>
      <a href="#route-leak-detection">
        
      </a>
    </div>
    <p>While route leaks are not avoidable for Cloudflare today, because the Internet inherently relies on trust for interconnection, there are some steps we will take to limit their impact.</p><p>We have expanded the data sources used by our <a href="/route-leak-detection-with-cloudflare-radar/">route leak detection system</a> to cover more networks, and are in the process of incorporating real-time data into the detection system to allow a more timely response to similar events in the future.</p>
    <div>
      <h3>ASPA for BGP</h3>
      <a href="#aspa-for-bgp">
        
      </a>
    </div>
    <p>We will continue advocating for the adoption of RPKI for AS path based route leak prevention. Autonomous System Provider Authorization (ASPA) objects are similar to ROAs, except instead of signing prefixes with an authorized origin AS, the AS itself is signed with a list of provider networks that are allowed to propagate its routes. So, in the case of Cloudflare, only valid upstream transit providers would be signed as authorized to advertise AS13335 prefixes such as 1.1.1.0/24 upstream.</p><p>In the route leak example where AS262504 (a customer of AS267613) shared 1.1.1.0/24 upstream, BGP ASPA would catch this leak if AS267613 had signed their authorized providers and AS1031 had validated paths against that list. Similar to RPKI origin validation, however, this will be a long-term effort, dependent on networks, especially large providers, rejecting AS paths that are invalid based on ASPA objects.</p>
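            <p>As a rough sketch of the idea, the snippet below checks the upward, customer-to-provider portion of a path against hypothetical ASPA records: each AS that has signed an ASPA must list the next AS as an authorized provider. Real ASPA verification also distinguishes routes learned from customers, peers, and providers; this only shows the core check.</p>
            <pre><code># Hypothetical ASPA data: AS -&gt; set of provider ASNs authorized to propagate its routes.
# In this example only AS267613 has signed an ASPA, mirroring the scenario described above;
# the provider ASNs listed are placeholders.
ASPA = {
    267613: {64496, 64497},   # AS262504 is deliberately absent: it is a customer, not a provider
}

def upward_path_is_authorized(path):
    """path is origin-first; every hop should be a customer announcing to a signed provider."""
    for customer, provider in zip(path, path[1:]):
        authorized = ASPA.get(customer)
        if authorized is None:
            continue              # no ASPA signed for this AS: cannot judge the hop
        if provider not in authorized:
            return False          # hop not authorized: treat the path as a leak
    return True

# The leaked path seen during the incident, origin-first: 13335, 267613, 262504, 1031
print(upward_path_is_authorized([13335, 267613, 262504, 1031]))  # False at the 267613 hop</code></pre>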
    <div>
      <h3>Other potential approaches</h3>
      <a href="#other-potential-approaches">
        
      </a>
    </div>
    <p>There are alternative approaches to ASPA, in various stages of adoption, that may be worth noting. There is no guarantee, however, that the following will reach wide Internet deployment.</p><p><a href="https://rfc.hashnode.dev/rfc9234-observed-in-the-wild">RFC9234</a>, for example, uses a concept of peer roles within BGP capabilities and attributes: depending on the configuration of the routers along an update’s path, an “Only-To-Customer” (OTC) attribute can be added to prefixes, preventing the upstream spread of a prefix such as 1.1.1.0/24 along a leaked path. The downside is that BGP configuration needs to be completed to assign the various roles to each peering session, and vendor support still has to be fully ironed out before this is feasible for production use with positive results.</p><p>As with all approaches to solving route leaks, cooperation amongst network operators on the Internet is required for success.</p>
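            <p>A heavily simplified sketch of the Only-To-Customer mechanism: the OTC attribute is attached when a route is sent to a customer or a lateral peer, and a route that arrives from a customer already carrying OTC is treated as a leak. Real implementations also cover route servers and additional checks from RFC9234; this only captures the core intuition, and the roles and ASNs are illustrative.</p>
            <pre><code># Minimal model of the RFC9234 Only-To-Customer (OTC) attribute; roles and ASNs are illustrative.

MY_ASN = 64500

def on_export(route, neighbor_role):
    """Attach OTC when sending a route to a customer or lateral peer."""
    if neighbor_role in ("customer", "peer") and route.get("otc") is None:
        route = dict(route, otc=MY_ASN)
    return route

def on_import(route, neighbor_role, neighbor_asn):
    """Reject routes whose OTC marking shows they should not have arrived this way."""
    otc = route.get("otc")
    if otc is not None and neighbor_role == "customer":
        return None               # a customer should never send an OTC-marked route upward
    if otc is not None and neighbor_role == "peer" and otc != neighbor_asn:
        return None               # a peer may only relay routes it marked itself
    return route

# A leaked 1.1.1.0/24 arriving from customer AS262504 with OTC already set is rejected.
leaked = {"prefix": "1.1.1.0/24", "otc": 13335}
print(on_import(leaked, "customer", 262504))  # None: detected as a route leak</code></pre>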
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Cloudflare’s 1.1.1.1 DNS resolver service fell victim to a simultaneous BGP hijack and BGP route leak event. While the actions of external networks are outside of Cloudflare’s direct control, we intend to take every step, both within the Internet community and internally at Cloudflare, to detect such events more quickly and lessen the impact on our users.</p><p>Long term, Cloudflare continues to support adoption of RPKI-based origin validation, as well as AS path validation. The former is already deployed across a wide array of the world’s largest networks, and the latter is still in draft phase at the IETF (Internet Engineering Task Force). In the meantime, to check if your ISP is enforcing RPKI origin validation, you can always visit <a href="http://isbgpsafeyet.com">isbgpsafeyet.com</a> and <i>Test Your ISP</i>.</p> ]]></content:encoded>
            <category><![CDATA[1.1.1.1]]></category>
            <category><![CDATA[Outage]]></category>
            <guid isPermaLink="false">IyAM1csW8ynZvyJrQtmvS</guid>
            <dc:creator>Bryton Herdes</dc:creator>
            <dc:creator>Mingwei Zhang</dc:creator>
            <dc:creator>Tanner Ryan</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Backbone: A Fast Lane on the Busy Internet Highway]]></title>
            <link>https://blog.cloudflare.com/cloudflare-backbone-internet-fast-lane/</link>
            <pubDate>Thu, 16 Sep 2021 12:59:48 GMT</pubDate>
            <description><![CDATA[ It’s important that our network continues to help bring improved performance and resiliency to the Internet. To accomplish this, we built our own backbone.  ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The Internet is an amazing place. It’s a communication superhighway, allowing people and machines to exchange exabytes of information every day. But it's not without its share of issues: whether it’s <a href="/cloudflare-thwarts-17-2m-rps-ddos-attack-the-largest-ever-reported/">DDoS attacks</a>, <a href="/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/">route leaks</a>, <a href="/not-one-not-two-but-three-undersea-cables-cut-in-jersey/">cable cuts</a>, or <a href="/a-post-mortem-on-this-mornings-incident/">packet loss</a>, the components of the Internet do not always work as intended.</p><p>The reason Cloudflare exists is to help solve these problems. As we continue to grow our <a href="https://www.cloudflare.com/network/">rapidly expanding global network</a> in more than 250 cities, while directly connecting with more than 9,800 networks, it’s important that our network continues to help bring improved performance and resiliency to the Internet. To accomplish this, we built our own backbone. Other than improving redundancy, the immediate advantage to you as a Cloudflare user? It can reduce your website loading times by up to 45% — and you don’t have to do a thing.</p>
    <div>
      <h3>The Cloudflare Backbone</h3>
      <a href="#the-cloudflare-backbone">
        
      </a>
    </div>
    <p>We began building out our global backbone in 2018. It comprises a network of long-distance fiber optic cables connecting various Cloudflare data centers across North America, South America, Europe, and Asia. This also includes Cloudflare’s metro fiber network, directly connecting data centers within a metropolitan area.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/jn1eE7h9w0TMITPDwbsHF/c9131d2d2fe23d34051312c7b9fd23a7/Untitled.png" />
            
            </figure><p>Our backbone is a dedicated network, providing guaranteed network capacity and consistent latency between various locations. It gives us the ability to securely, reliably, and quickly route packets between our data centers, without having to rely on other networks.</p><p>This dedicated network can be thought of as a fast lane on a busy highway. When traffic in the normal lanes of the highway encounter slowdowns from congestion and accidents, vehicles can make use of a fast lane to bypass the traffic and get to their destination on time.</p><p>Our <a href="https://www.cloudflare.com/learning/network-layer/what-is-sdn/">software-defined network</a> is like a smart GPS device, as we’re always calculating the performance of routes between various networks. If a route on the public Internet becomes congested or unavailable, our network automatically adjusts routing preferences in real-time to make use of all routes we have available, including our dedicated backbone, helping to deliver your network packets to the destination as fast as we can.</p>
    <div>
      <h3>Measuring backbone improvements</h3>
      <a href="#measuring-backbone-improvements">
        
      </a>
    </div>
    <p>As we grow our global infrastructure, it’s important that we analyze our network to quantify the impact we’re having on performance.</p><p>Here’s a simple, real-world test we’ve used to validate that our backbone helps speed up our global network. We deployed a simple API service hosted on a public cloud provider, located in Chicago, Illinois. Once it was placed behind Cloudflare, we performed benchmarks from various geographic locations with the backbone disabled and enabled to measure the change in performance.</p><p>Rather than simply comparing the difference in latency our backbone creates, it is important that our experiment captures the real-world performance gain that an API service or website would experience. To validate this, our primary metric is the average request time when accessing the API service from Miami, Seattle, San Jose, São Paulo, and Tokyo. To capture the response of the network itself, we disabled caching on the Cloudflare dashboard and sent 100 requests from each testing location, both while forcing traffic through our backbone and while routing it over the public Internet.</p><p>Now, before we claim our backbone solves all Internet problems, you may notice that for some tests (Seattle, WA and San Jose, CA), there was actually an increase in response time when we forced traffic through the backbone. Since latency is directly proportional to the length of fiber optic cables, and since we have over 9,800 direct connections with other Internet networks, there is a possibility that an uncongested path on the public Internet might be geographically shorter, making it faster than our backbone.</p><p>Luckily for us, we have technologies like <a href="/argo-and-the-cloudflare-global-private-backbone/">Argo Smart Routing</a>, <a href="/introducing-smarter-tiered-cache-topology-generation/">Argo Tiered Caching</a>, <a href="https://1.1.1.1/">WARP+</a>, and the most recently announced <a href="/orpheus/">Orpheus</a>, which dynamically calculate the performance of each route at our data centers, choosing the fastest healthy route at that time. What might be the fastest path during this test may not be the fastest at the time you are reading this.</p><p>With that disclaimer out of the way, now onto the test.</p>
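            <p>A measurement along these lines can be reproduced with a short script; the endpoint URL below is a placeholder, and the script simply averages end-to-end request time from wherever it runs, which is what the table further down reports per test location.</p>
            <pre><code>import time
import urllib.request

# Placeholder endpoint standing in for the Chicago-hosted API service behind Cloudflare.
URL = "https://api.example.com/ping"
SAMPLES = 100

def average_request_ms(url, samples):
    total = 0.0
    for _ in range(samples):
        start = time.monotonic()
        with urllib.request.urlopen(url) as resp:
            resp.read()           # drain the body so the full response is timed
        total += time.monotonic() - start
    return total / samples * 1000

print(f"average request time: {average_request_ms(URL, SAMPLES):.1f} ms")</code></pre>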
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1J6FgOCk20reULhsoVF8BE/0c2e430b5ca8ddf670aad16d4d825449/averages.png" />
            
            </figure><p>With the backbone disabled, if a visitor from São Paulo performed a request to our service, they would be routed to our São Paulo data center via <a href="/a-brief-anycast-primer/">BGP Anycast</a>. With caching disabled, our São Paulo data center forwarded the request over the public Internet to the origin server in Chicago. On average, the entire process to fetch data from the origin server and return the response to the requesting user took 335.8 milliseconds.</p><p>Once the backbone was enabled and requests were made, our software performed tests to determine the fastest healthy route to the origin, whether it was a route on the public Internet or through our private backbone. For this test the backbone was faster, resulting in an average total request time of 230.2 milliseconds. Just by routing the request through our private backbone, we <b>improved the average response time by 31%</b>.</p><p>We saw even better improvement when testing from Tokyo. When routing the request over the public Internet, the request took an average of 424 milliseconds. By enabling our backbone, which created a faster path, the request took an average of 234 milliseconds, an <b>average response time improvement of 44%</b>.</p><table><tr><td><p><b>Visitor Location</b></p></td><td><p><b>Distance to Chicago</b></p></td><td><p><b>Avg. response time using public Internet (ms)</b></p></td><td><p><b>Avg. response time using backbone (ms)</b></p></td><td><p><b>Change in response time</b></p></td></tr><tr><td><p>Miami, FL, US</p></td><td><p>1917 km</p></td><td><p>84</p></td><td><p>75</p></td><td><p><b>10.7% decrease</b></p></td></tr><tr><td><p>Seattle, WA, US</p></td><td><p>2785 km</p></td><td><p>118</p></td><td><p>124</p></td><td><p>5.1% increase</p></td></tr><tr><td><p>San Jose, CA, US</p></td><td><p>2856 km</p></td><td><p>122</p></td><td><p>132</p></td><td><p>8.2% increase</p></td></tr><tr><td><p>São Paulo, BR</p></td><td><p>8403 km</p></td><td><p>336</p></td><td><p>230</p></td><td><p><b>31.5% decrease</b></p></td></tr><tr><td><p>Tokyo, JP</p></td><td><p>10129 km</p></td><td><p>424</p></td><td><p>234</p></td><td><p><b>44.8% decrease</b></p></td></tr></table><p>We also observed a smaller deviation in the response time of packets routed through our backbone over larger distances.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4l7WB7dKR0zXUM7yjyoBwv/031ff75e5b3da67b7f1be69d509b3dd6/timeseries.png" />
            
            </figure>
    <div>
      <h3>Our next generation network</h3>
      <a href="#our-next-generation-network">
        
      </a>
    </div>
    <p>Cloudflare is built on top of lossy, unreliable networks that we do not have control over. It’s our software that turns these traditional tubes of the Internet into a smart, high performing, and reliable network Cloudflare customers get to use today. Coupled with our new, but rapidly expanding backbone, it is this software that produces significant performance gains over traditional Internet networks.</p><p>Whether you visit a website powered by Cloudflare’s Argo Smart Routing, Argo Tiered Caching, Orpheus, or use our 1.1.1.1 service with WARP+ to access the Internet, you get direct access to the Internet fast lane we call the Cloudflare backbone.</p><p>For Cloudflare, a better Internet means improving Internet security, reliability, and performance. The backbone gives us the ability to build out our network in areas that have typically lacked infrastructure investments by other networks. Even with issues on the public Internet, these initiatives allow us to be located within 50 milliseconds of 95% of the Internet connected population.</p><p>In addition to our growing global infrastructure providing 1.1.1.1, WARP, <a href="https://developers.cloudflare.com/time-services/roughtime/usage">Roughtime</a>, <a href="https://developers.cloudflare.com/time-services/ntp/usage">NTP</a>, <a href="https://developers.cloudflare.com/distributed-web/ipfs-gateway">IPFS Gateway</a>, and <a href="https://developers.cloudflare.com/randomness-beacon/about">Drand</a> to the greater Internet, it’s important that we extend our services to those who are most vulnerable. This is why we extend all our infrastructure benefits directly to the community, through projects like <a href="https://www.cloudflare.com/galileo/">Galileo</a>, <a href="https://www.cloudflare.com/athenian/">Athenian</a>, <a href="https://www.cloudflare.com/fair-shot/">Fair Shot</a>, and <a href="https://www.cloudflare.com/pangea/">Pangea</a>.</p><p>And while these thousands of fiber optic connections are already fixing today’s Internet issues, we truly are just getting started.</p><p>Want to help build the future Internet? Networks that are faster, safer, and more reliable than they are today? The Cloudflare Infrastructure team is <a href="https://www.cloudflare.com/careers/jobs/?department=Infrastructure&amp;location=default">currently hiring</a>!</p><p>If you operate an ISP or transit network and would like to bring your users faster and more reliable access to websites and services powered by Cloudflare’s rapidly expanding network, please reach out to our Edge Partnerships team at <a>epp@cloudflare.com</a>.</p>
    <div>
      <h3>Watch on Cloudflare TV</h3>
      <a href="#watch-on-cloudflare-tv">
        
      </a>
    </div>
    <div></div><p></p> ]]></content:encoded>
            <category><![CDATA[Speed Week]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[Better Internet]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">3vmfGkLMgyROXmDwChgfdf</guid>
            <dc:creator>Tanner Ryan</dc:creator>
        </item>
    </channel>
</rss>