
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sun, 05 Apr 2026 04:06:30 GMT</lastBuildDate>
        <item>
            <title><![CDATA[ASPA: making Internet routing more secure]]></title>
            <link>https://blog.cloudflare.com/aspa-secure-internet/</link>
            <pubDate>Fri, 27 Feb 2026 06:00:00 GMT</pubDate>
            <description><![CDATA[ ASPA is the cryptographic upgrade for BGP that helps prevent route leaks by verifying the path network traffic takes. New features in Cloudflare Radar make tracking its adoption easy. ]]></description>
            <content:encoded><![CDATA[ <p>Internet traffic relies on the <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/"><u>Border Gateway Protocol (BGP)</u></a> to find its way between networks. However, this traffic can sometimes be misdirected due to configuration errors or malicious actions. When traffic is routed through networks it was not intended to pass through, it is known as a <a href="https://datatracker.ietf.org/doc/html/rfc7908"><u>route leak</u></a>. We have <a href="https://blog.cloudflare.com/bgp-route-leak-venezuela/"><u>written on our blog</u></a> <a href="https://blog.cloudflare.com/cloudflare-1111-incident-on-june-27-2024/"><u>multiple times</u></a> about <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/"><u>BGP route leaks</u></a> and the impact they have on Internet routing, and a few times we have even alluded to a future of path verification in BGP. </p><p>While the network community has made significant progress in verifying the final destination of Internet traffic, securing the actual path it takes to get there remains a key challenge for maintaining a reliable Internet. To address this, the industry is adopting a new cryptographic standard called <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>ASPA (Autonomous System Provider Authorization)</u></a>, which is designed to validate the entire path of network traffic and prevent route leaks.</p><p>To help the community track the rollout of this standard, Cloudflare Radar has introduced a new ASPA deployment monitoring feature. 
This view allows users to observe ASPA adoption trends over time across the five <a href="https://en.wikipedia.org/wiki/Regional_Internet_registry"><u>Regional Internet Registries (RIRs)</u></a>, and view ASPA records and changes over time at the <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>Autonomous System (AS)</u></a> level.</p>
    <div>
      <h2>What is ASPA?</h2>
      <a href="#what-is-aspa">
        
      </a>
    </div>
    <p>To understand how ASPA works, it is helpful to look at how the Internet currently secures traffic destinations.</p><p>Today, networks use a secure infrastructure system called <a href="https://en.wikipedia.org/wiki/Resource_Public_Key_Infrastructure"><u>RPKI (Resource Public Key Infrastructure)</u></a>, which has seen <a href="https://blog.apnic.net/2026/02/20/rpkis-2025-year-in-review/"><u>significant deployment growth</u></a> over the past few years. Within RPKI, networks publish specific cryptographic records called ROAs (Route Origin Authorizations). A ROA acts as a verifiable digital ID card, confirming that an Autonomous System (AS) is officially authorized to announce specific IP addresses. This addresses the "origin hijacks" issue, where one network attempts to impersonate another.</p><p><a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>ASPA (Autonomous System Provider Authorization)</u></a> builds directly on this foundation. While a ROA verifies the <i>destination</i>, an ASPA record verifies the <i>journey</i>.</p><p>When data travels across the Internet, it keeps a running log of every network it passes through. In BGP, this log is known as the <a href="https://datatracker.ietf.org/doc/html/rfc4271#section-5.1.2"><code><u>AS_PATH</u></code></a> (Autonomous System Path). ASPA provides networks with a way to officially publish a list of their authorized upstream providers within the RPKI system. This allows any receiving network to look at the <code>AS_PATH</code>, check the associated ASPA records, and verify that the traffic only traveled through an approved chain of networks.</p><p>A ROA helps ensure that traffic arrives at the correct destination; ASPA ensures that it takes an intended, authorized route to get there. Let’s take a look at how path evaluation actually works in practice.</p>
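The relationship an ASPA record attests can be modeled as a simple customer-to-providers mapping, with a per-hop check that asks the question above: did this AS authorize the next AS as a provider? The sketch below is illustrative only, with made-up AS numbers and a plain dictionary standing in for cryptographically signed RPKI objects.

```python
# A minimal, illustrative model of ASPA records: each customer AS signs
# the set of ASes it authorizes as upstream providers. (Simplified; the
# real objects live in the RPKI and are cryptographically signed.)
ASPA = {
    64496: {64510},          # AS64496 authorizes AS64510 as its provider
    64510: {64520, 64530},   # AS64510 has two authorized providers
}

def hop_check(customer: int, provider: int) -> str:
    """Single hop-check, simplified: is `provider` authorized by `customer`?"""
    providers = ASPA.get(customer)
    if providers is None:
        return "no-attestation"   # the customer published no ASPA record
    return "provider+" if provider in providers else "not-provider+"
```

Note the three outcomes: a hop can be explicitly authorized, explicitly unauthorized, or simply unverifiable when no record exists, which is what lets validation distinguish proven leaks from missing data.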
    <div>
      <h2>Route leak detection with ASPA</h2>
      <a href="#route-leak-detection-with-aspa">
        
      </a>
    </div>
    <p>How does ASPA know if a route is a <i>detour</i>? It relies on the hierarchy of the Internet.</p><p>In a healthy Internet routing topology (e.g. <a href="https://ieeexplore.ieee.org/document/6363987"><u>“valley-free” routing</u></a>), traffic generally follows a specific path: it travels "up" from a customer to a large provider (like a major ISP), optionally crosses over to another big provider, and then flows "down" to the destination. You can visualize this as a “mountain” shape:</p><ol><li><p><b>The Up-Ramp:</b> Traffic starts at a Customer and travels "up" through larger and larger Providers (ISPs), where each customer network pays its provider to transit traffic for it.</p></li><li><p><b>The Apex:</b> It reaches the top tier of the Internet backbone and may cross a single peering link.</p></li><li><p><b>The Down-Ramp:</b> It travels "down" through providers to reach the destination Customer.</p></li></ol>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1VGuSHfq6GcQZUYLGmoDH3/a1486f40c16e568f32ca2fa81d58ac41/1.png" />
          </figure><p><sup><i>A visualization of "valley-free" routing. Routes propagate up to a provider, optionally across one peering link, and down to a customer.</i></sup></p><p>In this model, a route leak is like a valley, or dip. One type of such leak happens when traffic goes down to a customer and then unexpectedly tries to go back <i>up</i> to another provider. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6eoaEpdIJpCCLbMnNZD5ob/7ceca2a98f2252e8161915b942bf7dbd/2.png" />
          </figure><p>This "down-and-up" movement is undesirable because customers are neither intended nor equipped to transit traffic between two larger network providers.</p>
    <div>
      <h4>How ASPA validation works</h4>
      <a href="#how-aspa-validation-works">
        
      </a>
    </div>
    <p>ASPA gives network operators a cryptographic way to declare their <i>authorized providers</i>, enabling receiving networks to verify that an AS path follows this expected structure.</p><p>ASPA validates AS paths by checking the “chain of relationships” from both ends of the route’s propagation:</p><ul><li><p><b>Checking the Up-Ramp:</b> The check starts at the origin and moves forward. At every hop, it asks: <i>"Did this network authorize the next network as a Provider?"</i> It keeps going until the chain stops.</p></li><li><p><b>Checking the Down-Ramp:</b> It does the same thing from the destination of a BGP update, moving backward.</p></li></ul><p>If the "Up" path and the "Down" path overlap or meet at the top, the route is <b>Valid</b>. The mountain shape is intact.</p><p>However, if the two valid paths <b>do not meet</b>, i.e. there is a gap in the middle where authorization is missing or invalid, ASPA reports such paths as problematic. That gap represents the "valley" or the leak.</p>
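For the simplest case, a route received from a customer or lateral peer, the draft requires the entire path to be an up-ramp: every hop from the origin toward the receiver must be an authorized customer-to-provider step. The sketch below is a heavily simplified rendering of that idea, not the full draft algorithm, with a plain dictionary standing in for signed ASPA objects and made-up AS numbers.

```python
def verify_upflow(path_origin_first, aspa):
    """Simplified ASPA check for routes received from a customer or
    lateral peer: every hop must be an authorized customer->provider
    step. `aspa` maps a customer AS to its set of authorized providers.
    Sketch only; the draft algorithm has more states and edge cases."""
    state = "valid"
    for customer, provider in zip(path_origin_first, path_origin_first[1:]):
        providers = aspa.get(customer)
        if providers is None:
            state = "unknown"    # no attestation: a leak cannot be proven
        elif provider not in providers:
            return "invalid"     # provably unauthorized hop: the "valley"
    return state

# A clean up-ramp validates; an unauthorized hop in the middle does not.
aspa = {64496: {64500}, 64500: {64510}}
assert verify_upflow([64496, 64500, 64510], aspa) == "valid"
assert verify_upflow([64496, 64500, 64499], aspa) == "invalid"
```

The "unknown" state matters in practice: during partial deployment, most gaps come from networks that have not yet signed an ASPA, and those must not be treated as proven leaks.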
    <div>
      <h4>Validation process example</h4>
      <a href="#validation-process-example">
        
      </a>
    </div>
    <p>Let’s look at a scenario where a network (AS65539) receives a bad route from a customer (AS65538).</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7kn8W6c7CaPcMLMycjS5NO/59036dc52a942870e9bb0e377f235dd4/3.png" />
          </figure><p>The customer (AS65538) is trying to send traffic received from one provider (AS65537) "up" to another provider (AS65539), acting like a bridge between providers. This is a <a href="https://datatracker.ietf.org/doc/html/rfc7908#autoid-4"><u>classic route leak</u></a>. Now let’s walk through the ASPA validation process.</p><ol><li><p>We check the <b>Up-Ramp</b>: The original source (AS65536) authorizes its provider. (Check passes).</p></li><li><p>We check the <b>Down-Ramp</b>: We start from the destination and look back. We see the customer (AS65538).</p></li><li><p><b>The Mismatch:</b> The up-ramp ends at AS65537, while the down-ramp ends at AS65538. The two ramps do not connect.</p></li></ol><p>Because the "Up" path and "Down" path fail to connect, the system flags this as ASPA <b>Invalid</b>. This path validation depends on ASPA: without signed ASPA objects in the RPKI, there is no way to determine which networks are authorized to propagate which prefixes to whom. By signing a list of provider networks for each AS, we know which networks should be able to propagate prefixes laterally or upstream.</p>
    <div>
      <h3>ASPA against forged-origin hijacks</h3>
      <a href="#aspa-against-forged-origin-hijacks">
        
      </a>
    </div>
    <p>ASPA can serve as an effective defense against <a href="https://www.usenix.org/conference/nsdi24/presentation/holterbach"><u>forged-origin hijacks</u></a>, where an attacker bypasses Route Origin Validation (ROV) by advertising a fabricated BGP path that ends at the legitimate origin AS. Although the origin AS remains correct, the relationship between the hijacker and the victim is fabricated.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6MNVRoNDzxlHDVGTP2nRPw/87485ad246baa734eef3192fd48012a8/4.png" />
          </figure><p>ASPA exposes this deception by allowing the victim network to cryptographically declare its actual authorized providers; because the hijacker is not on that authorized list, the path is rejected as invalid, effectively preventing the malicious redirection.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2KGtsKWBdlNFySIswRb5rD/d26b266c2dc942be9b9f6c6ec383843b/5.png" />
          </figure><p>ASPA cannot fully protect against forged-origin hijacks, however; there is at least one case where even ASPA validation cannot prevent this type of attack. An example of a forged-origin hijack that ASPA cannot account for is when a provider forges a path advertisement <i>to their customer</i>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3DPYEXwSUPmWUWvsyxFJHX/b059dae85cf764fdcd5a5257f5ebc373/6.png" />
          </figure><p>Essentially, a provider could “fake” a peering link with another AS to attract traffic from a customer with a short AS_PATH length, even when no such peering link exists. ASPA does not prevent this path forgery by the provider, because ASPA relies solely on provider authorizations and knows nothing specific about peering relationships.</p><p>So while ASPA can be an effective means of rejecting forged-origin hijack routes, there are still some rare cases where it will be ineffective, and those are worth noting.</p>
    <div>
      <h2>Creating ASPA objects: just a few clicks away</h2>
      <a href="#creating-aspa-objects-just-a-few-clicks-away">
        
      </a>
    </div>
    <p>Creating an ASPA object for your network (or Autonomous System) is now a simple process in registries like <a href="https://labs.ripe.net/author/tim_bruijnzeels/aspa-in-the-rpki-dashboard-a-new-layer-of-routing-security/"><u>RIPE</u></a> and <a href="https://www.arin.net/announcements/20260120/"><u>ARIN</u></a>. All you need is your AS number and the AS numbers of the providers you purchase Internet transit service from. These are the authorized upstream networks you trust to announce your IP addresses to the wider Internet. In the opposite direction, these are also the networks you authorize to send you a full routing table, which acts as the complete map of how to reach the rest of the Internet.</p><p>We’d like to show you just how easy creating an ASPA object is with a quick example. </p><p>Say we need to create the ASPA object for AS203898, an AS we use for our Cloudflare London office Internet. At the time of writing we have three Internet providers for the office: AS8220, AS2860, and AS1273. This means we will create an ASPA object for AS203898 with those three provider members in a list.</p><p>First, we log into the RIPE <a href="https://dashboard.rpki.ripe.net/#overview"><u>RPKI dashboard</u></a> and navigate to the <a href="https://dashboard.rpki.ripe.net/#aspa"><u>ASPA</u></a> section:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3CCFItZpP8JbYCotDGfuM3/c7ad0041ceea4f48c37ff59f416f8242/7.png" />
          </figure><p>Then, we click on “Create ASPA” for the object we want to create an ASPA object for. From there, we just fill in the providers for that AS. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6ABmfhRQQRbLhMat6Ug01K/3a9d73ad8ade315416c4cc6eb9073ada/8.png" />
          </figure><p>It’s as simple as that. After just a short period of waiting, we can query the global RPKI ecosystem and find our ASPA object for AS203898 with the providers we defined. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/xAKD1b704fg7SMxeg867P/59ce564867565331e2d531de15dc7e87/Screenshot_2026-02-27_at_11.09.55.png" />
          </figure><p>It’s a similar story with <a href="https://www.arin.net/"><u>ARIN</u></a>, the only other <a href="https://en.wikipedia.org/wiki/Regional_Internet_registry"><u>Regional Internet Registry (RIR)</u></a> that currently supports the creation of ASPA objects. Log in to <a href="https://account.arin.net/public/login"><u>ARIN online</u></a>, then navigate to Routing Security, and click “Manage RPKI”.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/PTCdyldcTc1Uc0iazZlg4/51ed97a8ef0d095b947f7ab2bf4b1fd3/9.png" />
          </figure><p>From there, you’ll be able to click on “Create ASPA”. In this example, we will create an object for another one of our ASNs, AS400095.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4bLLHAeU4Eaz6RPgSPDzn9/a482b6c7ed4ef78f7346ca80c9a5ba46/10.png" />
          </figure><p>And that’s it – now we have created our ASPA object for AS400095 with provider AS0.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/9yOZPpMH4olQXu2DNpJPO/cca432fd82e61c6492987b7ccedbdc57/11.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6emDXxMbU0BSKk5BVXUdJ6/132b9a7279b93e3ae0329dec83a9bfac/Screenshot_2026-02-27_at_11.11.31.png" />
          </figure><p>The “AS0” provider entry is special: it means the AS owner attests there are <b>no</b> valid upstream providers for their network. By definition, every transit-free Tier-1 network should eventually sign an ASPA with only “AS0” in its object, if it truly has only peer and customer relationships.</p>
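In a simplified mapping like the one sketched earlier, an AS0-only record authorizes nobody: every claimed provider fails the hop check, which is exactly the attestation a transit-free network wants to make. (Made-up AS numbers; a plain dictionary stands in for signed objects.)

```python
ASPA = {
    64496: {64500, 64510},  # ordinary record: two authorized providers
    64999: {0},             # AS0-only record: "I have no providers at all"
}

def authorizes(customer, claimed_provider):
    """None = no attestation published; otherwise, is the claim listed?
    An AS0-only set contains no real ASN, so it rejects every claim."""
    providers = ASPA.get(customer)
    if providers is None:
        return None
    return claimed_provider in providers
```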
    <div>
      <h2>New ASPA features in Cloudflare Radar </h2>
      <a href="#new-aspa-features-in-cloudflare-radar">
        
      </a>
    </div>
    <p>We have added a new ASPA deployment monitoring feature to <a href="https://radar.cloudflare.com/"><u>Cloudflare Radar</u></a>. The new ASPA deployment view allows users to examine the growth of ASPA adoption over time, with the ability to visualize trends across the five <a href="https://en.wikipedia.org/wiki/Regional_Internet_registry"><u>Regional Internet Registries</u></a> (RIRs) based on AS registration. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7FoW86CVloqBqZO7wcOiq8/f5f2973db227b8184127f76fdad64dc4/12.png" />
          </figure><p>We have also integrated ASPA data directly into the country/region and ASN routing pages. Users can now track how different locations are progressing in securing their infrastructure, based on the associated ASPA records from the customer ASNs registered locally.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1tBhOpIxc6tNOJXWPPAk4U/7c334c9ddb089eb823ce23eb49ddbdc3/13.png" />
          </figure><p>There are also new features when you zoom into a particular Autonomous System (AS), for example <a href="https://radar.cloudflare.com/routing/AS203898#connectivity"><u>AS203898</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7nTBPD0PgzPOS0eOxpG3rW/65d9b312be2e81e4c59164c98b7b6276/14.png" />
          </figure><p>We can see whether a network’s observed BGP upstream providers are ASPA authorized, their full list of providers in their ASPA object, and the timeline of ASPA changes that involve their AS.</p>
    <div>
      <h2>The road to better routing security</h2>
      <a href="#the-road-to-better-routing-security">
        
      </a>
    </div>
    <p>With ASPA finally becoming a reality, we have our cryptographic upgrade for Internet path validation. However, those who have been around since the start of RPKI for route origin validation know <a href="https://manrs.org/2023/05/estimating-the-timeline-for-aspa-deployment/"><u>this will be a long road</u></a> to actually providing significant value on the Internet. Changes are needed to RPKI Relying Party (RP) packages, signer implementations, RTR (RPKI-to-Router protocol) software, and BGP implementations to actually use ASPA objects and validate paths with them.</p><p>In addition to ASPA adoption, operators should also configure BGP roles as described within <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234</u></a>. The BGP roles configured on BGP sessions will help future ASPA implementations on routers <a href="https://datatracker.ietf.org/doc/html/draft-ietf-sidrops-aspa-verification-24#section-6.3"><u>decide which algorithm to apply</u></a>: <i>upstream</i> or <i>downstream</i>. In other words, BGP roles give us the power as operators to directly tie our intended BGP relationships with another AS to sessions with those neighbors. Check with your routing vendors and make sure they support <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234 BGP roles and OTC</u></a> (Only-to-Customer) attribute implementation.</p><p>To get the most out of ASPA, we encourage every operator to create an ASPA object for their AS. Creating and maintaining these ASPA objects requires careful attention. In the future, as networks use these records to actively block invalid paths, omitting a legitimate provider could cause traffic to be dropped. However, managing this risk is no different from how networks already handle Route Origin Authorizations (ROAs) today. ASPA is the necessary cryptographic upgrade for Internet path validation, and we’re happy it’s here!</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Routing]]></category>
            <category><![CDATA[Routing Security]]></category>
            <category><![CDATA[Radar]]></category>
            <guid isPermaLink="false">5NwDf8fspgoSx9Pgcx1xLy</guid>
            <dc:creator>Mingwei Zhang</dc:creator>
            <dc:creator>Bryton Herdes</dc:creator>
        </item>
        <item>
            <title><![CDATA[Route leak incident on January 22, 2026]]></title>
            <link>https://blog.cloudflare.com/route-leak-incident-january-22-2026/</link>
            <pubDate>Fri, 23 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ An automated routing policy configuration error caused us to leak some Border Gateway Protocol prefixes unintentionally from a router at our Miami data center. We discuss the impact and the changes we are implementing as a result. ]]></description>
            <content:encoded><![CDATA[ <p>On January 22, 2026, an automated routing policy configuration error caused us to leak some <a href="http://cloudflare.com/learning/security/glossary/what-is-bgp/"><u>Border Gateway Protocol (BGP)</u></a> prefixes unintentionally from a router at our data center in Miami, Florida. While the route leak caused some impact to Cloudflare customers, multiple external parties were also affected because their traffic was accidentally funnelled through our Miami data center location.</p><p>The route leak lasted 25 minutes, causing congestion on some of our backbone infrastructure in Miami, elevated loss for some Cloudflare customer traffic, and higher latency for traffic across these links. Additionally, some traffic was discarded by firewall filters on our routers that are designed to only accept traffic for Cloudflare services and our customers.</p><p>While we’ve written about route leaks before, we rarely find ourselves causing them. This route leak was the result of an accidental misconfiguration on a router in Cloudflare’s network, and only affected IPv6 traffic. We sincerely apologize to the users, customers, and networks we impacted yesterday as a result of this BGP route leak.</p>
    <div>
      <h3>BGP route leaks </h3>
      <a href="#bgp-route-leaks">
        
      </a>
    </div>
    <p>We have <a href="https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/"><u>written multiple times</u></a> about <a href="https://blog.cloudflare.com/cloudflare-1111-incident-on-june-27-2024/"><u>BGP route leaks</u></a>, and we even record <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/"><u>route leak events</u></a> on Cloudflare Radar for anyone to view and learn from. To get a fuller understanding of what route leaks are, you can read this <a href="https://blog.cloudflare.com/bgp-route-leak-venezuela/#background-bgp-route-leaks"><u>detailed background section</u></a>, or refer to the formal definition within <a href="https://datatracker.ietf.org/doc/html/rfc7908"><u>RFC7908</u></a>. </p><p>Essentially, a route leak occurs when a network tells the broader Internet to send it traffic that it's not supposed to forward. Technically, a route leak occurs when a network, or Autonomous System (AS), appears unexpectedly in an AS path. An AS path is what BGP uses to determine the path across the Internet to a final destination. An example of an anomalous AS path indicative of a route leak would be a network sending routes received from a peer to a provider.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/30z4NDtf6DjVQZOfZatGUX/6ff06eb02c61d8e818d9da8ecd87c1c8/BLOG-3135_2.png" />
          </figure><p>During this type of route leak, the rules of <a href="https://people.eecs.berkeley.edu/~sylvia/cs268-2019/papers/gao-rexford.pdf"><u>valley-free routing</u></a> are violated, as BGP updates are sent from AS64501 to their peer (AS64502), and then unexpectedly up to a provider (AS64503). Oftentimes the leaker, in this case AS64502, is not prepared to handle the amount of traffic they are going to receive and may not even have firewall filters configured to accept all of the traffic coming in their direction. In simple terms, once a route update is sent to a peer or provider, it should only be sent further to customers and not to another peer or provider AS.</p><p>During the incident on January 22, we caused a similar kind of route leak, in which we took routes from some of our peers and redistributed them in Miami to some of our peers and providers. According to the route leak definitions in RFC7908, we caused a mixture of Type 3 and Type 4 route leaks on the Internet. </p>
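The propagation rule described above can be stated in a few lines of logic. The sketch below is an illustration of the classic valley-free ("Gao-Rexford") export rule, not Cloudflare's actual router policy.

```python
def may_export(learned_from: str, send_to: str) -> bool:
    """Valley-free export rule: routes may always flow down to customers,
    but routes learned from a peer or provider must never be re-exported
    to another peer or provider."""
    if send_to == "customer":
        return True
    return learned_from == "customer"

# The Miami incident re-advertised peer-learned routes to peers and
# providers, which this rule forbids:
assert may_export("peer", "provider") is False   # peer routes leaked upward
assert may_export("provider", "peer") is False   # provider routes leaked laterally
assert may_export("customer", "provider") is True
```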
    <div>
      <h3>Timeline</h3>
      <a href="#timeline">
        
      </a>
    </div>
    <table><tr><th><p><b>Time (UTC)</b></p></th><th><p><b>Event</b></p></th></tr><tr><td><p>2026-01-22 19:52 UTC</p></td><td><p>A change that ultimately triggers the routing policy bug is merged in our network automation code repository</p></td></tr><tr><td><p>2026-01-22 20:25 UTC</p></td><td><p>Automation is run on single Miami edge-router resulting in unexpected advertisements to BGP transit providers and peers</p><p><b>IMPACT START</b></p></td></tr><tr><td><p>2026-01-22 20:40 UTC</p></td><td><p>Network team begins investigating unintended route advertisements from Miami</p></td></tr><tr><td><p>2026-01-22 20:44 UTC</p></td><td><p>Incident is raised to coordinate response</p></td></tr><tr><td><p>2026-01-22 20:50 UTC</p></td><td><p>The bad configuration change is manually reverted by a network operator, and automation is paused for the router, so it cannot run again</p><p><b>IMPACT STOP</b></p></td></tr><tr><td><p>2026-01-22 21:47 UTC</p></td><td><p>The change that triggered the leak is reverted from our code repository</p></td></tr><tr><td><p>2026-01-22 22:07 UTC</p></td><td><p>Automation is confirmed by operators to be healthy to run again on the Miami router, without the routing policy bug</p></td></tr><tr><td><p>2026-01-22 22:40 UTC</p></td><td><p>Automation is unpaused on the single router in Miami</p></td></tr></table>
    <div>
      <h3>What happened: the configuration error</h3>
      <a href="#what-happened-the-configuration-error">
        
      </a>
    </div>
    <p>On January 22, 2026, at 20:25 UTC, we pushed a change via our policy automation platform to remove the BGP announcements from Miami for one of our data centers in Bogotá, Colombia. This was purposeful, as we previously forwarded some IPv6 traffic through Miami toward the Bogotá data center, but recent infrastructure upgrades removed the need for us to do so.</p><p>This change generated the following diff (the output of a program that compares configuration files in order to determine how or whether they differ):</p>
            <pre><code>[edit policy-options policy-statement 6-COGENT-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-COMCAST-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-GTT-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-LEVEL3-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-PRIVATE-PEER-ANYCAST-OUT term ADV-SITELOCAL from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-PUBLIC-PEER-ANYCAST-OUT term ADV-SITELOCAL from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-PUBLIC-PEER-OUT term ADV-SITELOCAL from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-TELEFONICA-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;
[edit policy-options policy-statement 6-TELIA-ACCEPT-EXPORT term ADV-SITELOCAL-GRE-RECEIVER from]
-      prefix-list 6-BOG04-SITE-LOCAL;</code></pre>
            <p>While this policy change looks innocent at a glance, only removing the prefix lists containing BOG04 unicast prefixes resulted in a policy that was too permissive:</p>
            <pre><code>policy-options policy-statement 6-TELIA-ACCEPT-EXPORT {
    term ADV-SITELOCAL-GRE-RECEIVER {
        from route-type internal;
        then {
            community add STATIC-ROUTE;
            community add SITE-LOCAL-ROUTE;
            community add MIA01;
            community add NORTH-AMERICA;
            accept;
        }
    }
}
</code></pre>
            <p>The policy would now mark every prefix of type “internal” as acceptable, and proceed to add some informative communities to all matching prefixes. But more importantly, the policy also accepted the route through the policy filter, which resulted in the prefix, which was intended to stay internal, being advertised externally. This is an issue because the “route-type internal” match in JunOS or JunOS EVO (the operating systems used by <a href="https://www.hpe.com/us/en/home.html"><u>HPE Juniper Networks</u></a> devices) will match any non-external route type, such as Internal BGP (IBGP) routes, which is what happened here.</p><p>As a result, all IPv6 prefixes that Cloudflare redistributes internally across the backbone were accepted by this policy, and advertised to all our BGP neighbors in Miami. This is unfortunately very similar to the outage we experienced in 2020, on which you can read more <a href="https://blog.cloudflare.com/cloudflare-outage-on-july-17-2020/"><u>on our blog</u></a>.</p><p>When the policy misconfiguration was applied at 20:25 UTC, a series of unintended BGP updates were sent from AS13335 to peers and providers in Miami. These BGP updates are viewable historically by looking at MRT files with the <a href="https://github.com/bgpkit/monocle"><u>monocle</u></a> tool or using <a href="https://stat.ripe.net/bgplay/2a03%3A2880%3Af312%3A%3A%2F48#starttime=1769112000&amp;endtime=1769115659&amp;instant=56,1769113845"><u>RIPE BGPlay</u></a>. </p>
            <pre><code>➜  ~ monocle search --start-ts 2026-01-22T20:24:00Z --end-ts 2026-01-22T20:30:00Z --as-path ".*13335[ \d$]32934$*"
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f077::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f091::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f16f::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f17c::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f26f::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f27c::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113609.854028|2801:14:9000::6:4112:1|64112|2a03:2880:f33f::/48|64112 22850 174 3356 13335 32934|IGP|2801:14:9000::6:4112:1|0|0|22850:65151|false|||pit.scl
A|1769113583.095278|2001:504:d::4:9544:1|49544|2a03:2880:f17c::/48|49544 1299 3356 13335 32934|IGP|2001:504:d::4:9544:1|0|0|1299:25000 1299:25800 49544:16000 49544:16106|false|||route-views.isc
A|1769113583.095278|2001:504:d::4:9544:1|49544|2a03:2880:f27c::/48|49544 1299 3356 13335 32934|IGP|2001:504:d::4:9544:1|0|0|1299:25000 1299:25800 49544:16000 49544:16106|false|||route-views.isc
A|1769113583.095278|2001:504:d::4:9544:1|49544|2a03:2880:f091::/48|49544 1299 3356 13335 32934|IGP|2001:504:d::4:9544:1|0|0|1299:25000 1299:25800 49544:16000 49544:16106|false|||route-views.isc
A|1769113584.324483|2001:504:d::19:9524:1|199524|2a03:2880:f091::/48|199524 1299 3356 13335 32934|IGP|2001:2035:0:2bfd::1|0|0||false|||route-views.isc
A|1769113584.324483|2001:504:d::19:9524:1|199524|2a03:2880:f17c::/48|199524 1299 3356 13335 32934|IGP|2001:2035:0:2bfd::1|0|0||false|||route-views.isc
A|1769113584.324483|2001:504:d::19:9524:1|199524|2a03:2880:f27c::/48|199524 1299 3356 13335 32934|IGP|2001:2035:0:2bfd::1|0|0||false|||route-views.isc
{trimmed}
</code></pre>
            <p><sup><i>In the monocle output seen above, we have the timestamp of our BGP update, followed by the next-hop in the announcement, the ASN of the network feeding a given route-collector, the prefix involved, and the AS path and BGP communities if any are found. At the end of each line, we also find the route-collector instance.</i></sup></p><p>Looking at the first update for prefix 2a03:2880:f077::/48, the AS path is <i>64112 22850 174 3356 13335 32934</i>. This means we (AS13335) took the prefix received from Meta (AS32934), our peer, and then advertised it toward Lumen (AS3356), one of our upstream transit providers. We know this is a route leak because routes received from peers are only meant to be readvertised to downstream (customer) networks, not laterally to other peers or up to providers.</p><p>As a result of the leak and the forwarding of unintended traffic into our Miami router from providers and peers, we experienced congestion on our backbone between Miami and Atlanta, as you can see in the graph below. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/SIiBSb7qnfStZ0jQAZ8ne/14009779b7551e4f26c4cc3ae2c1141b/BLOG-3135_3.png" />
          </figure><p>This would have resulted in elevated loss for some Cloudflare customer traffic, and higher latency than usual for traffic traversing these links. In addition to this congestion, the networks whose prefixes we leaked would have had their traffic discarded by firewall filters on our routers that are designed to only accept traffic for Cloudflare services and our customers. At peak, we discarded around 12Gbps of traffic ingressing our router in Miami for these non-downstream prefixes. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/jMaMLbijdtS8GYZOVzwoX/40325907032cbb8d27f00bc561191d23/BLOG-3135_4.png" />
          </figure>
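<p>The export rule that this incident violated can be sketched in a few lines. The function below is an illustrative model only, not Cloudflare's actual policy code; the relationship labels describe how the local AS classifies each neighbor.</p>

```python
# Minimal sketch of the BGP export rule the leak violated: routes learned
# from a peer or provider may only be exported to customers, while routes
# learned from customers (or originated locally) may be exported anywhere.

def should_export(learned_from: str, exporting_to: str) -> bool:
    """Decide whether a route may be exported, given how the local AS
    classifies the neighbor it was learned from and the neighbor it
    would be sent to ("customer", "peer", or "provider")."""
    if learned_from == "customer":
        return True  # customer routes are exported in every direction
    return exporting_to == "customer"  # peer/provider routes: customers only

# The incident above: a route learned from a peer (Meta, AS32934) was
# advertised to a transit provider (Lumen, AS3356) -- a route leak.
assert should_export("peer", "provider") is False
assert should_export("peer", "customer") is True
assert should_export("customer", "provider") is True
```

<p>A routing policy that fails to encode this check, for example through an empty or overly broad match term, will happily leak every route it carries.</p>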
    <div>
      <h3>Follow-ups and preventing route leaks </h3>
      <a href="#follow-ups-and-preventing-route-leaks">
        
      </a>
    </div>
    <p>We are big supporters of and active contributors to efforts within the <a href="https://www.ietf.org/"><u>IETF</u></a> and <a href="https://manrs.org/"><u>infrastructure community</u></a> that strengthen routing security. We know firsthand how easy it is to cause a route leak accidentally, as evidenced by this incident. </p><p>Preventing route leaks will require a multi-faceted approach, but we have identified multiple areas in which we can improve, both short- and long-term.</p><p>In terms of our routing policy configurations and automation, we are:</p><ul><li><p>Patching the failure in our routing policy automation that caused the route leak, which will immediately mitigate this failure mode and others like it </p></li><li><p>Implementing additional BGP community-based safeguards in our routing policies that explicitly reject routes that were received from providers and peers on external export policies </p></li><li><p>Adding automatic routing policy evaluation into our <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/">CI/CD pipelines</a> that looks specifically for empty or erroneous policy terms </p></li><li><p>Improving early detection of issues with network configurations and the negative effects of an automated change</p></li></ul><p>To help prevent route leaks in general, we are: </p><ul><li><p>Validating routing equipment vendors' implementation of <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234</u></a> (BGP roles and the Only-to-Customer Attribute) in preparation for our rollout of the feature, which is the only way <i>independent of routing policy</i> to prevent route leaks caused at the <i>local</i> Autonomous System (AS)</p></li><li><p>Encouraging the long-term adoption of RPKI <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>Autonomous System Provider Authorization (ASPA)</u></a>, where networks could automatically reject routes that contain anomalous AS
paths</p></li></ul><p>Most importantly, we would again like to apologize for the impact we caused users and customers of Cloudflare, as well as any impact felt by external networks.</p><p></p> ]]></content:encoded>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Post Mortem]]></category>
            <guid isPermaLink="false">1lDFdmcpnlPwczsEbswsTs</guid>
            <dc:creator>Bryton Herdes</dc:creator>
            <dc:creator>Tom Strickx</dc:creator>
        </item>
        <item>
            <title><![CDATA[A closer look at a BGP anomaly in Venezuela]]></title>
            <link>https://blog.cloudflare.com/bgp-route-leak-venezuela/</link>
            <pubDate>Tue, 06 Jan 2026 08:00:00 GMT</pubDate>
            <description><![CDATA[ There has been speculation about the cause of a BGP anomaly observed in Venezuela on January 2. We take a look at BGP route leaks, and dive into what the data suggests caused the anomaly in question. ]]></description>
            <content:encoded><![CDATA[ <p>As news unfolds surrounding the U.S. capture and arrest of Venezuelan leader Nicolás Maduro, a <a href="https://loworbitsecurity.com/radar/radar16/?cf_target_id=8EBD08FC8E3F122A23413E8273CF4AF3"><u>cybersecurity newsletter</u></a> examined <a href="https://radar.cloudflare.com/"><u>Cloudflare Radar</u></a> data and took note of a routing leak in Venezuela on January 2.</p><p>We dug into the data. Since the beginning of December there have been eleven route leak events, impacting multiple prefixes, where AS8048 is the leaker. Although it is impossible to determine definitively what happened on the day of the event, this pattern of route leaks suggests that the CANTV (AS8048) network, a popular Internet Service Provider (ISP) in Venezuela, has insufficient routing export and import policies. In other words, the BGP anomalies observed by the researcher could be tied to poor technical practices by the ISP rather than malfeasance.</p><p>In this post, we’ll briefly discuss Border Gateway Protocol (BGP) and BGP route leaks, and then dig into the anomaly observed and what may have happened to cause it. </p>
    <div>
      <h3>Background: BGP route leaks</h3>
      <a href="#background-bgp-route-leaks">
        
      </a>
    </div>
    <p>First, let’s revisit what a <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/"><u>BGP route leak</u></a> is. BGP route leaks cause behavior similar to taking the wrong exit off a highway. While you may still make it to your destination, the path may be slower and come with delays you wouldn’t otherwise have traveling on a more direct route.</p><p>Route leaks were given a formal definition in <a href="https://datatracker.ietf.org/doc/html/rfc7908"><u>RFC7908</u></a> as “the propagation of routing announcement(s) beyond their intended scope.” Intended scope is defined using <a href="https://en.wikipedia.org/wiki/Pairwise"><u>pairwise</u></a> business relationships between networks. The relationship between any two networks, which in BGP we represent using <a href="https://en.wikipedia.org/wiki/Autonomous_system_(Internet)"><u>Autonomous Systems (ASes)</u></a>, can be one of the following: </p><ul><li><p>customer-provider: A customer pays a provider network to connect them and their own downstream customers to the rest of the Internet</p></li><li><p>peer-peer: Two networks decide to exchange traffic between one another, and with each other’s customers, settlement-free (without payment)</p></li></ul><p>In a customer-provider relationship, the provider will announce <i>all routes to the customer</i>. The customer, on the other hand, will advertise <i>only the routes</i> from their own customers and originating from their network directly.</p><p>In a peer-peer relationship, each peer will advertise to one another <i>only their own routes and the routes of their downstream customers</i>. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/16jXNbH5R5Q4evGm4oRY5p/08d0474000111923f37a7e53b809b5c2/BLOG-3107_2.png" />
          </figure><p>These advertisements help direct traffic in expected ways: from customers upstream to provider networks, potentially across a single peering link, and then potentially back down to customers on the far end of the path from their providers. </p><p>A valid path would look like the following that abides by the <a href="https://ieeexplore.ieee.org/document/6363987"><u>valley-free routing</u></a> rule: </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3qKtxTWTrGcpMm8u3nAjYT/dd19181418076c0e12b6035154639e75/BLOG-3107_3.png" />
          </figure><p>A <b>route leak</b> is a violation of valley-free routing where an AS takes routes from a provider or peer and redistributes them to another provider or peer. For example, a BGP path should never go through a “valley” where traffic goes up to a provider, and back down to a customer, and then up to a provider again. There are different types of route leaks defined in RFC7908, but a simple one is the Type 1: Hairpin route leak between two provider networks by a customer. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4XdHugzuQAVhucuninsUfB/43912258386debc0500e3ceb7c8abab2/BLOG-3107_4.png" />
          </figure><p>In the figure above, AS64505 takes routes from one of its providers and redistributes them to their other provider. This is unexpected, since we know providers should not use their customer as an intermediate IP transit network. AS64505 would become overwhelmed with traffic, as a smaller network with a smaller set of backbone and network links than its providers. This can become very impactful quickly. </p>
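<p>The valley-free rule described above can be checked mechanically. The sketch below is illustrative: the relationship table is hypothetical, and real classifiers infer relationships from observed paths (as BGPKIT's as2rel dataset does). The path is given origin-first, i.e. the reverse of how an AS path appears in a BGP update.</p>

```python
# Sketch: classify an origin-first AS path as valley-free or a leak.
# rel[(a, b)] == "p2c" means a is a provider of b (hypothetical table).
rel = {
    (64500, 64505): "p2c",  # one provider of AS64505
    (64501, 64505): "p2c",  # the other provider of AS64505
}

def step(a: int, b: int) -> str:
    """Direction of the hop a -> b: 'up' to a provider, 'down' to a
    customer, or 'flat' for a peer/unknown link."""
    if rel.get((b, a)) == "p2c":
        return "up"
    if rel.get((a, b)) == "p2c":
        return "down"
    return "flat"

def is_valley_free(path_origin_first):
    """Valid pattern: zero or more 'up' hops, at most one 'flat' hop,
    then zero or more 'down' hops. Anything else is a leak."""
    phase = 0  # 0 = climbing, 1 = crossed a peering link, 2 = descending
    for a, b in zip(path_origin_first, path_origin_first[1:]):
        s = step(a, b)
        if s == "up" and phase != 0:
            return False  # climbing again after peering/descending: a valley
        if s == "flat":
            if phase != 0:
                return False
            phase = 1
        if s == "down":
            phase = 2
    return True

# Type 1 hairpin from the figure: AS64505 re-exports one provider's
# routes to its other provider (path written origin-first).
assert is_valley_free([64500, 64505, 64501]) is False
assert is_valley_free([64505, 64500]) is True  # normal customer-to-provider
```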
    <div>
      <h3>Route leak by AS8048 (CANTV)</h3>
      <a href="#route-leak-by-as8048-cantv">
        
      </a>
    </div>
    <p>Now that we have reminded ourselves what a route leak is in BGP, let’s examine what was hypothesized  in <a href="https://loworbitsecurity.com/radar/radar16/?cf_history_state=%7B%22guid%22%3A%22C255D9FF78CD46CDA4F76812EA68C350%22%2C%22historyId%22%3A106%2C%22targetId%22%3A%2251107FD345D9B86C319316904C23F966%22%7D"><u>the newsletter post</u></a>. The post called attention to a <a href="https://radar.cloudflare.com/routing/as8048#bgp-route-leaks"><u>few route leak anomalies</u></a> on Cloudflare Radar involving AS8048. On the <a href="https://radar.cloudflare.com/routing/anomalies/leak-462460"><u>Radar page</u></a> for this leak, we see this information:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7d1gGdOtSEvPxciyfNswhw/619965acdc1a7a4d3eafbd99b0ccb9f3/BLOG-3107_5.png" />
          </figure><p>We see the leaker AS, which is AS8048 — CANTV, Venezuela’s state-run telephone and Internet Service Provider. We observe that routes were taken from one of their providers AS6762 (Sparkle, an Italian telecom company) and then redistributed to AS52320 (V.tal GlobeNet, a Colombian network service provider). This is definitely a route leak. </p><p>The newsletter suggests “BGP shenanigans” and posits that such a leak could be exploited to collect intelligence useful to government entities. </p><p>While we can’t say with certainty what caused this route leak, our data suggests that its likely cause was more mundane. That’s in part because BGP route leaks happen all of the time, and they have always been part of the Internet — most often for reasons that aren’t malicious.</p><p>To understand more, let’s look closer at the impacted prefixes and networks. The prefixes involved in the leak were all originated by AS21980 (Dayco Telecom, a Venezuelan company):</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/42cmmWdskKqGw7bByd3tGs/c3841ffa205b241798593b91194d25b1/BLOG-3107_6.png" />
          </figure><p>The prefixes are also all members of the same 200.74.224.0/20 <a href="https://www.cloudflare.com/learning/network-layer/what-is-a-subnet/"><u>subnet</u></a>, as noted by the newsletter author. Much more intriguing than this, though, is the relationship between the originating network AS21980 and the leaking network AS8048: AS8048 is a <i>provider</i> of AS21980. </p><p>The customer-provider relationship between AS8048 and AS21980 is visible in both <a href="https://radar.cloudflare.com/routing/as21980#connectivity"><u>Cloudflare Radar</u></a> and <a href="https://bgp.tools/as/21980#upstreams"><u>bgp.tools</u></a> AS relationship inference data. We can also get a confidence score of the AS relationship using the monocle tool from <a href="https://bgpkit.com/"><u>BGPKIT</u></a>, as you see here: </p><p><code>➜  ~ monocle as2rel 8048 21980
Explanation:
- connected: % of 1813 peers that see this AS relationship
- peer: % where the relationship is peer-to-peer
- as1_upstream: % where ASN1 is the upstream (provider)
- as2_upstream: % where ASN2 is the upstream (provider)</code></p><p><code>Data source: https://data.bgpkit.com/as2rel/as2rel-latest.json.bz2</code></p><p><code>╭──────┬───────┬───────────┬──────┬──────────────┬──────────────╮
│ asn1 │ asn2  │ connected │ peer │ as1_upstream │ as2_upstream │
├──────┼───────┼───────────┼──────┼──────────────┼──────────────┤
│ 8048 │ 21980 │    9.9%   │ 0.6% │     9.4%     │ 0.0%         │
╰──────┴───────┴───────────┴──────┴──────────────┴──────────────╯</code></p><p>While only 9.9% of route collectors see these two ASes as adjacent, almost all of the paths containing them reflect AS8048 as an upstream provider for AS21980, meaning confidence is high in the provider-customer relationship between the two.</p><p>Many of the leaked routes were also heavily prepended with AS8048, meaning they would have been <a href="https://blog.cloudflare.com/prepends-considered-harmful/"><u>potentially</u></a> <i>less</i> attractive for routing when received by other networks. <b>Prepending</b> is the practice of repeating an AS number multiple times in an outbound advertisement to a provider or peer, typically to make a path less attractive and steer traffic away from a particular circuit toward another. For example, many of the paths during the leak by AS8048 looked like this: “52320,8048,8048,8048,8048,8048,8048,8048,8048,8048,23520,1299,269832,21980”. </p><p>You can see that AS8048 inserted its own AS number multiple times in the advertisement to AS52320; because of BGP loop prevention, a path would never actually travel in and out of AS8048 multiple times in a row, so these repeats can only be prepends. A non-prepended path would look like this: “52320,8048,23520,1299,269832,21980”. </p><p>If AS8048 was intentionally trying to become a <a href="https://en.wikipedia.org/wiki/Man-in-the-middle_attack"><u>man-in-the-middle (MITM)</u></a> for traffic, why would they make the BGP advertisement less attractive instead of <i>more </i>attractive? Also, why leak prefixes to try and MITM traffic when you’re <i>already</i> a provider for the downstream AS anyway? That wouldn’t make much sense. </p><p>The leaks from AS8048 also surfaced in multiple separate announcements, each around an hour apart on January 2, 2026 between 15:30 and 17:45 UTC, suggesting they may have been experiencing network issues that manifested as a routing policy error or a convergence-related mishap. </p>
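<p>Collapsing the prepends to recover the underlying path is easy to do programmatically. A quick sketch, using the leaked path quoted above:</p>

```python
# Sketch: collapse consecutive duplicate ASNs to recover the unprepended
# path and count how many times each AS was prepended. The path is the
# leaked example quoted above.
from itertools import groupby

def collapse_prepends(path):
    """Return (deduplicated path, {asn: run length} for runs longer than 1)."""
    runs = [(asn, len(list(group))) for asn, group in groupby(path)]
    deduped = [asn for asn, _ in runs]
    prepends = {asn: count for asn, count in runs if count > 1}
    return deduped, prepends

leaked = [52320, 8048, 8048, 8048, 8048, 8048, 8048, 8048, 8048, 8048,
          23520, 1299, 269832, 21980]
path, prepends = collapse_prepends(leaked)
assert path == [52320, 8048, 23520, 1299, 269832, 21980]
assert prepends == {8048: 9}  # AS8048 appears nine times in a row
```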
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4LrT6YC3j7V9p6ab0OYEmA/cebeba76857f7976d8dd4912371c1c43/BLOG-3107_7.png" />
          </figure><p>It is also noteworthy that these leak events begin over twelve hours prior to the <a href="https://www.nytimes.com/2026/01/03/world/americas/venezuela-maduro-capture-trump.html"><u>U.S. military strikes in Venezuela</u></a>. Leaks that impact South American networks <a href="https://radar.cloudflare.com/routing/br#routing-anomalies"><u>are common</u></a>, and we have no reason to believe, based on timing or the other factors I have discussed, that the leak is related to the capture of Maduro several hours later.</p><p>In fact, looking back the past two months, we can see plenty of leaks by AS8048 that are just like this one, meaning this is not a new BGP anomaly:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Md8BdHtP1GIDkyNT8PTbD/3915068dc6f47046665410b665e29853/BLOG-3107_8.png" />
          </figure><p>You can see above in the history of Cloudflare Radar’s route leak alerting pipeline that AS8048 is no stranger to Type 1 hairpin route leaks. Since the beginning of December alone there have been <b>eleven route leak events</b> where AS8048 is the leaker.</p><p>From this we can draw a more innocent possible explanation for the route leak: AS8048 may have configured overly loose export policies facing at least one of its providers, AS52320, and because of that redistributed routes belonging to its customer even when the direct customer BGP routes were missing. If their export policy toward AS52320 only matched on an <a href="https://blog.cloudflare.com/monitoring-as-sets-and-why-they-matter/"><u>IRR-generated</u></a> prefix list and not a <i>customer</i> BGP <a href="https://datatracker.ietf.org/doc/html/rfc1997"><u>community</u></a> tag, for example, it would explain why an indirect path through AS6762 was leaked back upstream by AS8048. </p><p>These types of policy errors are something <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234</u></a> and the Only-to-Customer (OTC) attribute would help with considerably, by coupling BGP more tightly to customer-provider and peer-peer roles, when supported <a href="https://blog.apnic.net/2025/09/05/preventing-route-leaks-made-simple-bgp-roleplay-with-junos-rfc-9234/"><u>by all routing vendors</u></a>. I will save the more technical details on RFC9234 for a follow-up blog post.</p>
    <div>
      <h3>The difference between origin and path validation</h3>
      <a href="#the-difference-between-origin-and-path-validation">
        
      </a>
    </div>
    <p>The newsletter also calls out as “notable” that Sparkle (AS6762) does not implement <a href="https://rpki.cloudflare.com/"><u>RPKI (Resource Public Key Infrastructure)</u></a> Route Origin Validation (ROV). While it is true that AS6762 appears to have an <a href="https://stats.labs.apnic.net/rpki/AS6762?a=6762&amp;c=IT&amp;ll=1&amp;ss=0&amp;mm=1&amp;vv=1&amp;w=7&amp;v=0"><u>incomplete deployment</u></a> of ROV and is flagged as “unsafe” on <a href="http://isbgpsafeyet.com"><u>isbgpsafeyet.com</u></a> <a href="https://isbgpsafeyet.com/#faq"><u>because of it</u></a>, origin validation would not have prevented this BGP anomaly in Venezuela. </p><p>It is important to separate BGP anomalies into two categories: route misoriginations, and path-based anomalies. Knowing the difference between the two helps to understand the solution for each. Route misoriginations, often called BGP hijacks, are meant to be fixed by RPKI Route Origin Validation (ROV) by making sure the originator of a prefix is who rightfully owns it. In the case of the BGP anomaly described in this post, the origin AS was correct as AS21980 and <b>only</b> the path was anomalous. This means ROV wouldn’t help here.</p><p>Knowing that, we need path-based validation. This is what <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>Autonomous System Provider Authorization (ASPA)</u></a>, an upcoming draft standard in the IETF, is going to provide. The idea is similar to RPKI Route Origin Authorizations (ROAs) and ROV: create an ASPA object that defines a list of authorized providers (upstreams) for our AS, and everyone will use this to invalidate route leaks on the Internet at various vantage points. 
Using a concrete example, AS6762 is a <a href="https://en.wikipedia.org/wiki/Tier_1_network"><u>Tier-1</u></a> transit-free network, and they would use the special reserved “AS0” member in their ASPA signed object to communicate to the world that they have no upstream providers, only lateral peers and customers. Then, AS52320, the other provider of AS8048, would see routes from their customer with “6762” in the path and reject them by performing an ASPA verification process.</p><p>ASPA is based on RPKI and is exactly what would help prevent route leaks similar to the one we observed in Venezuela.</p>
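<p>In greatly simplified terms, an ASPA check walks the path and asks, for each hop in the upstream direction: does this AS authorize the next AS as a provider? The sketch below is illustrative only; the real verification algorithm in the draft distinguishes unknown from invalid outcomes and handles both directions of the path, and the ASPA records shown are stand-ins.</p>

```python
# Hypothetical ASPA records: each AS lists its authorized providers.
# 0 stands in for the reserved AS0, meaning "I have no providers at all".
aspa = {
    6762: {0},       # Tier-1: attests that nobody is its provider
    21980: {8048},   # origin authorizes AS8048 as an upstream
}

def hop_is_plausible(customer: int, candidate_provider: int) -> bool:
    """Can candidate_provider legitimately sit above customer in a path?"""
    providers = aspa.get(customer)
    if providers is None:
        return True  # no attestation published: cannot judge this hop
    return candidate_provider in providers

# AS6762 appearing below another AS in the upstream direction is invalid:
# its AS0-style record says it has no providers, so a leaked path in which
# AS8048 re-exported AS6762's routes could be rejected.
assert hop_is_plausible(6762, 8048) is False
assert hop_is_plausible(21980, 8048) is True
```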
    <div>
      <h3>A safer BGP, built together </h3>
      <a href="#a-safer-bgp-built-together">
        
      </a>
    </div>
    <p>We felt it was important to offer an alternative explanation for the BGP route leak by AS8048 in Venezuela that was observed on Cloudflare Radar. It is helpful to understand that route leaks are an expected side effect of BGP historically being based entirely on trust and carefully executed, business-relationship-driven intent. </p><p>While route leaks can be caused with malicious intent, the data suggests this event may have been an accident caused by a lack of routing export and import policies that would prevent it. This is why, to have a safer BGP and Internet, we need to work together to drive adoption of RPKI-based ASPA, for which <a href="https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/aspa/"><u>RIPE recently released object creation</u></a>, across the wider Internet. It will be a collaborative effort, just like RPKI has been for origin validation, but it will be worth it, preventing BGP incidents such as the one in Venezuela. </p><p>In addition to ASPA, we can all implement simpler mechanisms such as <a href="https://github.com/job/peerlock"><u>Peerlock</u></a> and <a href="https://archive.nanog.org/sites/default/files/Snijders_Everyday_Practical_Bgp.pdf"><u>Peerlock-lite</u></a> as operators, which sanity-check received paths for obvious leaks. One especially promising initiative is the adoption of <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234</u></a>, which should be used alongside ASPA to prevent route leaks by establishing BGP roles and a new Only-To-Customer (OTC) attribute. If you haven’t already asked your routing vendors for an implementation of RFC9234 to be on their roadmap: <i>please</i> <i>do</i>. You can help make a big difference.</p><p><i>Update: Sparkle (AS6762) finished RPKI ROV deployment and </i><a href="https://github.com/cloudflare/isbgpsafeyet.com/pull/829"><i><u>was marked safe</u></i></a><i> on February 3, 2026.</i></p>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Routing]]></category>
            <category><![CDATA[Routing Security]]></category>
            <category><![CDATA[Network Services]]></category>
            <guid isPermaLink="false">4WOdNrtTGvlrQDV7Apw8R1</guid>
            <dc:creator>Bryton Herdes</dc:creator>
        </item>
        <item>
            <title><![CDATA[BGP zombies and excessive path hunting]]></title>
            <link>https://blog.cloudflare.com/going-bgp-zombie-hunting/</link>
            <pubDate>Fri, 31 Oct 2025 15:30:00 GMT</pubDate>
            <description><![CDATA[ A BGP “zombie” is essentially a route that has become stuck in the Default-Free Zone (DFZ) of the Internet, potentially due to a missed or lost prefix withdrawal. We’ll walk through some situations where BGP zombies are more likely to rise from the dead and wreak havoc.
 ]]></description>
            <content:encoded><![CDATA[ <p>Here at Cloudflare, we’ve been celebrating Halloween with some zombie hunting of our own. The zombies we’d like to remove are those that disrupt the core framework responsible for how the Internet routes traffic: <a href="http://cloudflare.com/learning/security/glossary/what-is-bgp/"><u>BGP (Border Gateway Protocol)</u></a>.</p><p>A <a href="https://dl.acm.org/doi/10.1145/3472305.3472315"><u>BGP zombie</u></a> is a silly name for a route that has become stuck in the Internet’s <a href="https://en.wikipedia.org/wiki/Default-free_zone"><u>Default-Free Zone</u></a>, aka the DFZ (the collection of all Internet routers that do not require a default route), potentially due to a missed or lost prefix withdrawal.</p><p>The underlying root cause of a zombie could be one of several things, ranging from buggy router software to general route-processing slowness. It’s when a BGP prefix is meant to be gone from the Internet, but for one reason or another it becomes a member of the undead and hangs around for some period of time.</p><p>The longer these zombies linger, the more they create operational impact and become a real headache for network operators. Zombies can lead packets astray, either by trapping them inside of route loops or by causing them to take an excessively scenic route. Today, we’d like to celebrate Halloween by covering how BGP zombies form and how we can lessen the likelihood that they wreak havoc on Internet traffic.</p>
    <div>
      <h2>Path hunting</h2>
      <a href="#path-hunting">
        
      </a>
    </div>
    <p>To understand the slowness that can often lead to BGP zombies, we need to talk about path hunting. <a href="https://www.noction.com/blog/bgp-path-hunting"><u>Path hunting</u></a> occurs when routers running BGP exhaustively search for the best path to a prefix as determined by <a href="https://en.wikipedia.org/wiki/Longest_prefix_match"><u>Longest Prefix Matching</u></a> (LPM) and BGP routing attributes like path length and local preference. This becomes relevant in our observations of exactly how routes become stuck, for how long they become stuck, and how visible they are on the Internet.</p><p>For example, path hunting happens when a more-specific BGP prefix is withdrawn from the global routing table, and networks need to fall back to a less-specific BGP advertisement. In this example, we use 2001:db8::/48 for the more-specific BGP announcement and 2001:db8::/32 for the less-specific prefix. When the /48 is withdrawn by the originating <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>Autonomous System</u></a> (AS), BGP routers have to recognize that route as missing and begin routing traffic to IPs such as 2001:db8::1 via the 2001:db8::/32 route, which still remains while the prefix 2001:db8::/48 is gone. </p><p>Let’s see what this could look like in action with a few diagrams. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7xRNAHChJUyiMbtBZyLOlF/973d10be053b7b7f088721389c34c10e/BLOG-3059_2.png" />
          </figure><p><sub><i>Diagram 1: Active 2001:db8::/48 route</i></sub></p><p>In this initial state, 2001:db8::/48 is used actively for traffic forwarding, which all flows through AS13335 on the way to AS64511. In this case, AS64511 would be a BYOIP customer of Cloudflare. AS64511 also announces a <i>backup</i> route to another Internet Service Provider (ISP), AS64510, but this route is not active even in AS64510’s routing table for forwarding to 2001:db8::1 because 2001:db8::/48 is a longer prefix match when compared to 2001:db8::/32.</p><p>Things get more interesting when AS64511 signals for 2001:db8::/48 to be withdrawn by Cloudflare (AS13335), perhaps because a DDoS attack is over and the customer opts to use Cloudflare only when they are actively under attack.</p><p>When the customer signals to Cloudflare (via BGP Control or API call) to withdraw the 2001:db8::/48 announcement, all BGP routers have to <a href="https://en.wikipedia.org/wiki/Convergence_(routing)"><u>converge</u></a> upon this update, which involves path hunting. AS13335 sends a BGP withdrawal message for 2001:db8::/48 to its directly-connected BGP neighbors. While news of the withdrawal may travel quickly from AS13335 to the other networks, it may reach some neighbors sooner than others. This means that until everyone has received and processed the withdrawal, networks may try routing through one another to reach the 2001:db8::/48 prefix – even after AS13335 has withdrawn it. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7h3Vba4T7tm6XPB2pIyQex/f5f7c27148bed4dd72959b3820d045ac/BLOG-3059_3.png" />
          </figure><p><sub><i>Diagram 2: 2001:db8::/48 route withdrawn via AS13335</i></sub></p><p>Imagine AS64501 is a little slower than the rest – perhaps due to using older hardware, hardware being overloaded, a software bug, specific configuration settings, poor luck, or some other factor – and still has not processed the withdrawal of the /48. This in itself could be a BGP zombie, since the route is stuck for a small period. Our pings toward 2001:db8::1 are never able to actually reach AS64511, because AS13335 knows the /48 is meant to be withdrawn, but some routers carrying a full table have not yet converged upon that result.</p><p>The length of time spent path hunting is amplified by something called the Minimum Route Advertisement Interval (MRAI). The MRAI specifies the minimum amount of time between BGP advertisement messages from a BGP router, meaning it deliberately introduces several seconds of delay between successive BGP advertisement updates. <a href="https://datatracker.ietf.org/doc/html/rfc4271"><u>RFC4271</u></a> recommends an MRAI value of 30 seconds for eBGP updates, and while this can cut down on the chattiness of BGP, or even potential oscillation of updates, it also makes path hunting take longer. </p><p>At the next cycle of path hunting, even AS64501, which was previously still pointing toward a nonexistent /48 route from AS13335, should find the /32 advertisement is all that is left toward 2001:db8::1. Once it has done so, the traffic flow will become the following: </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5sCGMS95R8y32WTjnUigfN/1e5c9a7551c572a08596985edac5c17b/BLOG-3059_4.png" />
          </figure><p><sub><i>Diagram 3: Routing fallback to 2001:db8::/32 and 2001:db8::/48 is gone from DFZ</i></sub></p><p>This would mean BGP path hunting is over, and the Internet has realized that 2001:db8::/32 is the best route available toward 2001:db8::1, and that 2001:db8::/48 is really gone. While in this example we’ve purposely made path hunting only last two cycles, in reality it can last many more, especially with how highly connected AS13335 is to thousands of peer networks and <a href="https://en.wikipedia.org/wiki/Tier_1_network"><u>Tier-1</u></a> networks globally. </p><p>Now that we’ve discussed BGP path hunting and how it works, you can probably already see how a BGP zombie outbreak can begin and how routing tables can become stuck for a lengthy period of time. Excessive BGP path hunting for a previously-known more-specific prefix can be an early indicator that a zombie could follow.</p>
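<p>The longest-prefix-match fallback described above is easy to demonstrate. A small sketch using Python's standard <code>ipaddress</code> module as a stand-in for a routing table:</p>

```python
# Sketch of longest-prefix-match (LPM) fallback: when the /48 is
# withdrawn, lookups for the same address fall back to the covering /32.
import ipaddress

def best_route(table, destination):
    """Return the most-specific route in `table` covering `destination`,
    or None if nothing matches."""
    dst = ipaddress.ip_address(destination)
    matches = [n for n in map(ipaddress.ip_network, table) if dst in n]
    return str(max(matches, key=lambda n: n.prefixlen)) if matches else None

table = ["2001:db8::/32", "2001:db8::/48"]
assert best_route(table, "2001:db8::1") == "2001:db8::/48"  # LPM: /48 wins

table.remove("2001:db8::/48")  # the withdrawal finally converges
assert best_route(table, "2001:db8::1") == "2001:db8::/32"  # fallback to /32
assert best_route(table, "2001:db9::1") is None  # outside both prefixes
```

<p>A zombie is what happens when one router's copy of this table never has the /48 removed, long after everyone else has converged.</p>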
    <div>
      <h2>Spawning a zombie</h2>
      <a href="#spawning-a-zombie">
        
      </a>
    </div>
    <p>Zombies have captured our attention more recently as they were noticed by some of our customers leveraging <a href="https://developers.cloudflare.com/byoip/"><u>Bring-Your-Own-IP (BYOIP)</u></a> on-demand advertisement for <a href="https://www.cloudflare.com/en-gb/network-services/products/magic-transit/"><u>Magic Transit</u></a>. BYOIP may be configured in two modes: "always-on", in which a prefix is continuously announced, or "on-demand", where a prefix is announced only when a customer chooses to. For some on-demand customers, announcement and withdrawal cycles <i>may</i> be a more frequent occurrence, which can lead to an increase in BGP zombies.</p><p>With that in mind and also knowing how path hunting works, let’s spawn our own zombie onto the Internet. To do so, we’ll take a spare block of IPv4 and IPv6 and announce them like so:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/20shWBMhqLR3tBMh50v7Uy/bf40e90c2f6a506a5bcfc9bafd1e31d2/BLOG-3059_5.png" />
          </figure><p>Once the routes are announced and stable, we’ll then proceed to withdraw the more specific routes advertised via Cloudflare globally. With a few quick clicks, we’ve successfully re-animated the dead.</p><p><i>Variant A: Ghoulish Gateways</i></p><p>One place zombies commonly occur is between upstream ISPs. When one router in a given ISP’s network is a little slower to update, routes can become stuck. </p><p>Take, for example, the following loop we observed between two of our upstream partners:</p>
            <pre><code>7. be2431.ccr31.sjc04.atlas.cogentco.com
8. tisparkle.sjc04.atlas.cogentco.com
9. 213.144.177.184
10. 213.144.177.184
11. 89.221.32.227
12. (waiting for reply)
13. be2749.rcr71.goa01.atlas.cogentco.com
14. be3219.ccr31.mrs02.atlas.cogentco.com
15. be2066.agr21.mrs02.atlas.cogentco.com
16. telecomitalia.mrs02.atlas.cogentco.com
17. 213.144.177.186
18. 89.221.32.227</code></pre>
            <p></p><p>Or this loop - observed on the same withdrawal test - between two different providers:  </p>
            <pre><code>15. if-bundle-12-2.qcore2.pvu-paris.as6453.net
16. if-bundle-56-2.qcore1.fr0-frankfurt.as6453.net
17. if-bundle-15-2.qhar1.fr0-frankfurt.as6453.net
18. 195.219.223.11
19. 213.144.177.186
20. 195.22.196.137
21. 213.144.177.186
22. 195.22.196.137</code></pre>
            <p></p><p><i>Variant B: Undead LAN (Local Area Network)</i></p><p>Meanwhile, zombies can occur entirely within a given network. When a route is withdrawn from Cloudflare’s network, each device in our network must individually begin the process of withdrawing the route. While this is generally a smooth process, things can still become stuck.</p><p>Take, for instance, a situation where one router inside our network has not yet fully processed the withdrawal. Connectivity partners will continue routing traffic towards that router (as they have not yet received the withdrawal) while no host remains behind the router that is capable of actually processing the traffic. The result is an internal-only looping path:</p>
            <pre><code>10. 192.0.2.112
11. 192.0.2.113
12. 192.0.2.112
13. 192.0.2.113
14. 192.0.2.112
15. 192.0.2.113
16. 192.0.2.112
17. 192.0.2.113
18. 192.0.2.112
19. 192.0.2.113
</code></pre>
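<p>Loops like the ones above are straightforward to detect programmatically once you have the hop list. A small illustrative sketch (the helper is our own invention, reusing the documentation addresses from the trace above):</p>

```python
def find_loop(hops, min_repeats=2):
    """Return the repeating cycle at the end of a hop list, else None."""
    for cycle_len in range(1, len(hops) // 2 + 1):
        tail = hops[-cycle_len:]
        repeats, i = 1, len(hops) - 2 * cycle_len
        # Count how many times the trailing cycle repeats back-to-back.
        while i >= 0 and hops[i:i + cycle_len] == tail:
            repeats += 1
            i -= cycle_len
        if repeats >= min_repeats and len(set(tail)) == cycle_len:
            return tail
    return None

# Documentation addresses standing in for the looping hops shown above.
hops = ["192.0.2.112", "192.0.2.113"] * 5
print(find_loop(hops))   # ['192.0.2.112', '192.0.2.113']
```
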
            <p></p><p>Unlike most fictionally-depicted hordes of the walking dead, our highly-visible zombie has a limited lifetime in most major networks – in this instance, only around 6 minutes, after which most had re-converged around the less-specific as the best path. Sadly, this is on the shorter side – in some cases, we have seen long-lived zombies cause reachability issues for more than 10 minutes. It’s safe to say this is longer than most network operators would expect BGP convergence to take in a normal situation. </p><p>But, you may ask – is this the excessive path hunting we talked about earlier, or a BGP zombie? Really, it depends on the expectation and tolerance around <a href="https://dl.acm.org/doi/10.1145/3472305.3472315"><u>how long BGP convergence</u></a> should take to process the prefix withdrawal. In any case, even over 30 minutes after our withdrawal of our more-specific prefix, we are able to easily see zombie routes in the route-views public collectors:</p>
            <pre><code>~ % monocle search --start-ts 2025-10-28T12:40:13Z --end-ts 2025-10-28T13:00:13Z --prefix 198.18.0.0/24
A|1761656125.550447|206.82.105.116|54309|198.18.0.0/24|54309 13335 395747|IGP|206.82.104.31|0|0|54309:111|false|||route-views.ny

</code></pre>
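<p>Each monocle result line is pipe-delimited and easy to pull apart for automation. A hedged sketch of a parser, with the field layout inferred from this single example line rather than from a documented schema:</p>

```python
# Field positions inferred from the monocle record above; this is an
# assumption for illustration, not a documented schema.
def parse_elem(line):
    fields = line.strip().split("|")
    return {
        "type": fields[0],              # A = announcement, W = withdrawal
        "timestamp": float(fields[1]),
        "peer_ip": fields[2],
        "peer_asn": fields[3],
        "prefix": fields[4],
        "as_path": fields[5].split(),   # origin AS is the last hop
        "collector": fields[-1],
    }

line = ("A|1761656125.550447|206.82.105.116|54309|198.18.0.0/24|"
        "54309 13335 395747|IGP|206.82.104.31|0|0|54309:111|false|||"
        "route-views.ny")
elem = parse_elem(line)
print(elem["prefix"], "origin AS" + elem["as_path"][-1])
# 198.18.0.0/24 origin AS395747
```
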
            <p></p><p>You might argue that six to eleven minutes (or more) is a reasonable time for worst-case BGP convergence in the Tier-1 network layer, though that itself seems like a stretch. Even setting that aside, our data shows that very real BGP zombies exist in the global routing table, and they will negatively impact traffic. Curiously, we observed the path hunting delay is worse on IPv4, with the longest observed IPv6 impact in major (Tier-1) networks being just over 4 minutes. One could speculate this is in part due to the <a href="https://bgp.potaroo.net/index-bgp.html"><u>much higher number</u></a> of IPv4 prefixes in the Internet global routing table than the IPv6 global table, and how BGP speakers handle them separately.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4WOQBZb7MV2a5PA84xm6Be/28e252e5212781ae2d477150692605db/25x_10fps_a.gif" />
          </figure><p><sub><i>Source: RIPEstat’s BGPlay</i></sub></p><p>Part of the delay appears to originate from how interconnected AS13335 is; being heavily peered with a large portion of the Internet increases the likelihood of a route becoming stuck in a given location. Given that, perhaps a zombie would be shorter-lived if we operated in the opposite direction: announcing a less-specific persistently to 13335 and announcing more specifics via our local ISP during normal operation. Since the withdrawal will come from what is likely a less well-peered network, the time-to-convergence may be shorter:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4O7r3Nffpbus6ht3Eo6eiF/a7d577042e43c9b4da988cf9bd29f6fe/BLOG-3059_7.png" />
          </figure><p>Indeed, as predicted, we still get a stuck route, and it only lives for around 20 seconds in the Tier-1 network layer:</p>
            <pre><code>19. be12488.ccr42.ams03.atlas.cogentco.com
20. 38.88.214.142
21. be2020.ccr41.ams03.atlas.cogentco.com
22. 38.88.214.142
23. (waiting for reply)
24. 38.88.214.142
25. (waiting for reply)
26. 38.88.214.142
</code></pre>
            <p></p><p>Unfortunately, that 20 seconds is still an impactful 20 seconds - while better, it’s not where we want to be. The exact length of time will depend on the native ISP networks one is connected with, and it could certainly stretch into minutes of stuck routing. </p><p>In both cases, the initial time-to-announce yielded no loss, nor was a zombie created, as both paths remained valid for the entirety of their initial lifetime. Zombies were only created when a more-specific prefix was fully withdrawn. A newly-announced route is not subject to path hunting in the same way a withdrawn more-specific route is. As they say, good (new) news travels fast.</p>
    <div>
      <h2>Lessening the zombie outbreak</h2>
      <a href="#lessening-the-zombie-outbreak">
        
      </a>
    </div>
    <p>Our findings lead us to believe that the withdrawal of a more-specific prefix may lead to zombies running rampant for longer periods of time. Because of this, we are exploring some improvements that make the consequences of BGP zombie routing less impactful for our customers relying on our on-demand BGP functionality.</p><p>For the traffic that <b>does</b> reach Cloudflare with stuck routes, we will introduce some BGP traffic forwarding improvements internally that allow for a more graceful withdrawal of traffic, even if routes are erroneously pointing toward us. In many ways, this will closely resemble the BGP <a href="https://www.rfc-editor.org/rfc/rfc1997.html"><u>well-known no-export</u></a> community’s functionality from our servers running BGP. This means even if we receive traffic from external parties due to stuck routing, we will still have the opportunity to deliver traffic to our far-end customers over a tunneled connection or via a <a href="https://www.cloudflare.com/network-services/products/network-interconnect/"><u>Cloudflare Network Interconnect</u></a> (CNI). We look forward to reporting back the positive impact after making this improvement for a more graceful draining of traffic by default. </p><p>For the traffic that <b>does not</b> reach Cloudflare’s edge, and instead loops between network providers, we need to use a different approach. Since we know more-specific to less-specific prefix routing fallback is more prone to BGP zombie outbreak, we are encouraging customers to instead use a multi-step draining process when they want traffic drained from the Cloudflare edge for an on-demand prefix without introducing route loops or blackhole events. The draining process when removing traffic for a BYOIP prefix from Cloudflare should look like this: </p><ol><li><p>The customer is already announcing an example prefix from Cloudflare, ex. 198.18.0.0/24</p></li><li><p>The customer begins <i>natively </i>announcing the prefix 198.18.0.0/24 (i.e. 
the same length as the prefix they are advertising via Cloudflare) from their network to the Internet Service Providers that they wish to fail traffic over to.</p></li><li><p>After a few minutes, the customer signals BGP withdrawal from Cloudflare for the 198.18.0.0/24 prefix.</p></li></ol><p>The result is a clean cutover: impactful zombies are avoided because the same-length prefix (198.18.0.0/24) remains in the global routing table. Excessive path hunting is avoided because instead of routers needing to aggressively seek out a missing more-specific prefix match, they can fall back to the same-length announcement that persists in the routing table from the natively-originated path to the customer’s network.</p>
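<p>The steps above can be modeled as simple routing-table operations. In this toy sketch (the next-hop labels are invented), a covering route of equal specificity exists at every step, so there is never a missing more-specific to hunt for:</p>

```python
import ipaddress

def resolve(table, destination):
    """Longest-prefix match over (prefix, next_hop) entries."""
    dst = ipaddress.ip_address(destination)
    matches = [e for e in table if dst in e[0]]
    return max(matches, key=lambda e: e[0].prefixlen, default=(None, None))[1]

pfx = ipaddress.ip_network("198.18.0.0/24")
table = [(pfx, "via-cloudflare")]        # step 1: announced via Cloudflare
table.append((pfx, "via-native-isp"))    # step 2: same-length native path
table.remove((pfx, "via-cloudflare"))    # step 3: withdraw from Cloudflare

print(resolve(table, "198.18.0.1"))      # via-native-isp: no gap to hunt for
```
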
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/KmRvPsGUOp5PsNXcKCE1F/78f2c29c8c278d158972114df875ad0c/25x_10fps_b.gif" />
          </figure><p><sub><i>Source: RIPEstat’s BGPlay</i></sub></p>
    <div>
      <h2>What next?</h2>
      <a href="#what-next">
        
      </a>
    </div>
    <p>We are going to continue to refine our methods of measuring BGP zombies, so you can look forward to more insights in the future. There is also <a href="https://www.thousandeyes.com/bgp-stuck-route-observatory/"><u>work from others</u></a> in the <a href="https://blog.benjojo.co.uk/post/bgp-stuck-routes-tcp-zero-window"><u>community</u></a> around zombie measurement that is interesting and producing useful data. In terms of combatting the software bugs around BGP zombie creation, routing vendors should implement <a href="https://datatracker.ietf.org/doc/html/rfc9687"><u>RFC9687</u></a>, the BGP SendHoldTimer. The general idea is that a local router can detect via the SendHoldTimer if the far-end router stops processing BGP messages unexpectedly, which lowers the possibility of zombies becoming stuck for long periods of time. </p><p>In addition, it’s worth keeping in mind our observations made in this post about more-specific prefix announcements and excessive path hunting. If as a network operator you rely on more-specific BGP prefix announcements for failover, or for traffic engineering, you need to be aware that routes could become stuck for a longer period of time before full BGP convergence occurs.</p><p>If you’re interested in problems like BGP zombies, consider <a href="https://www.cloudflare.com/en-gb/careers/jobs/?location=default"><u>coming to work</u></a> at Cloudflare or applying for an <a href="https://www.cloudflare.com/en-gb/careers/early-talent/"><u>internship</u></a>. Together we can help build a better Internet!  </p> ]]></content:encoded>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Routing]]></category>
            <category><![CDATA[Network]]></category>
            <category><![CDATA[BYOIP]]></category>
            <guid isPermaLink="false">6Qk6krBb9GkFrf67N6NhyW</guid>
            <dc:creator>Bryton Herdes</dc:creator>
            <dc:creator>June Slater</dc:creator>
            <dc:creator>Mingwei Zhang</dc:creator>
        </item>
        <item>
            <title><![CDATA[Monitoring AS-SETs and why they matter]]></title>
            <link>https://blog.cloudflare.com/monitoring-as-sets-and-why-they-matter/</link>
            <pubDate>Fri, 26 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ We will cover some of the reasons why operators need to monitor the AS-SET memberships for their ASN, and now Cloudflare Radar can help.  ]]></description>
            <content:encoded><![CDATA[ 
    <div>
      <h2>Introduction to AS-SETs</h2>
      <a href="#introduction-to-as-sets">
        
      </a>
    </div>
    <p>An <a href="https://www.apnic.net/manage-ip/using-whois/guide/as-set/"><u>AS-SET</u></a>, not to be confused with the <a href="https://datatracker.ietf.org/doc/rfc9774/"><u>recently deprecated BGP AS_SET</u></a>, is an <a href="https://irr.net/overview/"><u>Internet Routing Registry (IRR)</u></a> object that allows network operators to group related networks together. AS-SETs have historically been used for multiple purposes, such as grouping together a list of downstream customers of a particular network provider. For example, Cloudflare uses the <a href="https://irrexplorer.nlnog.net/as-set/AS13335:AS-CLOUDFLARE"><u>AS13335:AS-CLOUDFLARE</u></a> AS-SET to group together our own <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>Autonomous System Numbers</u></a> (ASNs) and our downstream Bring-Your-Own-IP (BYOIP) customer networks, so we can ultimately <a href="https://www.peeringdb.com/net/4224"><u>communicate</u></a> to other networks whose prefixes they should accept from us. </p><p>In other words, an AS-SET is <i>currently</i> the way on the Internet for someone to attest the networks for which they are the provider. This system of provider authorization is completely trust-based, meaning it's <a href="https://www.kentik.com/blog/the-scourge-of-excessive-as-sets/"><u>not reliable at all</u></a>, and is best-effort. The future of an RPKI-based provider authorization system is <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>coming in the form of ASPA (Autonomous System Provider Authorization)</u></a>, but it will take time for standardization and adoption. Until then, we are left with AS-SETs.</p><p>Because AS-SETs are so critical for BGP routing on the Internet, network operators need to be able to monitor valid and invalid AS-SET <i>memberships</i> for their networks. To help, Cloudflare Radar now provides a transparent, public listing of AS-SET memberships on each ASN's <a href="https://radar.cloudflare.com/routing/as13335"><u>routing page</u></a>.</p>
    <div>
      <h2>AS-SETs and building BGP route filters</h2>
      <a href="#as-sets-and-building-bgp-route-filters">
        
      </a>
    </div>
    <p>AS-SETs are a critical component of BGP policies, often paired with the expressive <a href="https://irr.net/rpsl-guide/"><u>Routing Policy Specification Language (RPSL)</u></a> that describes how a particular BGP ASN accepts and propagates routes to other networks. Most often, networks use AS-SETs to express what other networks should accept from them, in terms of downstream customers. </p><p>Returning to the AS13335:AS-CLOUDFLARE example, this AS-SET is published clearly on <a href="https://www.peeringdb.com/net/4224"><u>PeeringDB</u></a> for other peering networks to reference and build filters against. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2590TMppv2h4SAi7uy6xS9/617ec81e2364f470c0efe243a528f695/image6.png" />
          </figure><p>When turning up a new transit provider service, we also ask the provider networks to build their route filters using the same AS-SET. Because BGP prefixes are also created in IRR <a href="https://irr.net/registry/"><u>registries</u></a> using the <i>route</i> or <i>route6 </i><a href="https://developers.cloudflare.com/byoip/concepts/irr-entries/best-practices/"><u>objects</u></a>, peers and providers now know what BGP prefixes they should accept from us and deny the rest. A popular tool for building prefix-lists based on AS-SETs and IRR databases is <a href="https://github.com/bgp/bgpq4"><u>bgpq4</u></a>, and it’s one you can easily try out yourself. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7F2QdhcZTLEJjKNtZbBWxR/92efe32dcef67aa6d51c3b1a29218843/image3.png" />
          </figure><p>For example, to generate a Juniper router’s IPv4 prefix-list containing prefixes that AS13335 could propagate for Cloudflare and its customers, you may use: </p>
            <pre><code>% bgpq4 -4Jl CLOUDFLARE-PREFIXES -m24 AS13335:AS-CLOUDFLARE | head -n 10
policy-options {
replace:
 prefix-list CLOUDFLARE-PREFIXES {
    1.0.0.0/24;
    1.0.4.0/22;
    1.1.1.0/24;
    1.1.2.0/24;
    1.178.32.0/19;
    1.178.32.0/20;
    1.178.48.0/20;</code></pre>
            <p><sup><i>Restricted to 10 lines, actual output of prefix-list would be much greater</i></sup></p><p>This prefix list would be applied within an eBGP import policy by our providers and peers to make sure AS13335 is only able to propagate announcements for ourselves and our customers.</p>
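<p>Conceptually, bgpq4 arrives at that prefix list by first expanding the AS-SET into its member ASNs (recursively, since AS-SETs can contain other AS-SETs) and then collecting their route objects. A rough sketch of just the expansion step, using a hypothetical in-memory stand-in for IRR data (real tools query the IRR databases over the network):</p>

```python
# Hypothetical stand-in for IRR data: AS-SET -> direct members. The set
# names and member ASNs here are invented documentation values.
IRR = {
    "AS-EXAMPLE": ["AS64496", "AS64500:AS-CUSTOMERS"],
    "AS64500:AS-CUSTOMERS": ["AS64500", "AS64511"],
}

def expand(as_set, seen=None):
    """Recursively resolve an AS-SET down to its member ASNs."""
    seen = set() if seen is None else seen
    if as_set in seen:                 # guard against circular memberships
        return set()
    seen.add(as_set)
    asns = set()
    for member in IRR.get(as_set, []):
        if member in IRR:              # nested AS-SET: recurse
            asns |= expand(member, seen)
        else:                          # plain ASN
            asns.add(member)
    return asns

print(sorted(expand("AS-EXAMPLE")))    # ['AS64496', 'AS64500', 'AS64511']
```

<p>A real resolver also has to tolerate circular memberships, which the <code>seen</code> set guards against here.</p>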
    <div>
      <h2>How accurate AS-SETs prevent route leaks</h2>
      <a href="#how-accurate-as-sets-prevent-route-leaks">
        
      </a>
    </div>
    <p>Let’s see how accurate AS-SETs can help prevent route leaks with a simple example. In this example, AS64502 has two providers – AS64501 and AS64503. AS64502 has accidentally messed up their BGP export policy configuration toward the AS64503 neighbor, and is exporting <b>all</b> routes, including those it receives from their AS64501 provider. This is a typical <a href="https://datatracker.ietf.org/doc/html/rfc7908#section-3.1"><u>Type 1 Hairpin route leak</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/D69Fq0jXg9MaGieS0KqZ2/42fa33a433c875591b85ce9a6db91610/image5.png" />
          </figure><p>Fortunately, AS64503 has implemented an import policy that they generated using IRR data, including AS-SETs and route objects. By doing so, they will only accept the prefixes that originate from the <a href="https://www.manrs.org/wp-content/uploads/2021/11/AS-Cones-MANRS.pdf"><u>AS Cone</u></a> of AS64502, since they are their customer. Instead of the route leak propagating and causing a major reachability or latency impact for many prefixes on the Internet, it is stopped in its tracks thanks to the responsible filtering by the AS64503 provider network. Again, it is worth keeping in mind that the success of this strategy depends on data accuracy for the fictional AS64502:AS-CUSTOMERS AS-SET.</p>
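<p>In code, the core of that import policy boils down to a set-membership check on the route's origin. A simplified sketch using the fictional ASNs from the diagram (AS64510 is an invented extra customer; real filters also match prefixes against route/route6 objects, as shown earlier):</p>

```python
# Customer cone of AS64502, as would be resolved from the fictional
# AS64502:AS-CUSTOMERS AS-SET. AS64510 is an invented downstream.
CUSTOMER_CONE = {64502, 64510}

def accept_from_customer(as_path):
    """Accept a route on the customer session only if its origin AS
    (the last hop in the AS path) sits inside the customer's cone."""
    return as_path[-1] in CUSTOMER_CONE

print(accept_from_customer([64502]))         # True: customer's own route
print(accept_from_customer([64502, 64501]))  # False: hairpin leak dropped
```
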
    <div>
      <h2>Monitoring AS-SET misuse</h2>
      <a href="#monitoring-as-set-misuse">
        
      </a>
    </div>
    <p>Besides grouping together one’s downstream customers, AS-SETs can also represent other types of relationships, such as peers, transits, or IXP participation.</p><p>For example, there are 76 AS-SETs that directly include one of the Tier-1 networks, Telecom Italia / Sparkle (AS6762). Judging from the names of the AS-SETs, most of them represent peers and transits of certain ASNs, which include AS6762. You can view this output yourself at <a href="https://radar.cloudflare.com/routing/as6762#irr-as-sets"><u>https://radar.cloudflare.com/routing/as6762#irr-as-sets</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/eeAA6iWaAVd6qd2rB93VM/ff37a27156f8229639a6ec377c7eb273/image7.png" />
          </figure><p>There is nothing wrong with defining AS-SETs that contain one’s peers or upstreams as long as those AS-SETs are not submitted upstream for customer-&gt;provider BGP session filtering. In fact, an AS-SET for upstreams or peer-to-peer relationships can be useful for defining a network’s policies in RPSL.</p><p>However, some AS-SETs in the AS6762 membership list such as AS-10099 look to attest customer relationships. </p>
            <pre><code>% whois -h rr.ntt.net AS-10099 | grep "descr"
descr:          CUHK Customer</code></pre>
            <p>We know AS6762 is transit-free, so this customer membership must be invalid – a prime example of AS-SET misuse that would ideally be cleaned up. Many Internet Service Providers and network operators are more than happy to correct an invalid AS-SET entry when asked to. Each AS-SET membership like this is reasonably viewed as a risk of wider route leak propagation to major networks and the Internet when leaks do happen.</p>
    <div>
      <h2>AS-SET information on Cloudflare Radar</h2>
      <a href="#as-set-information-on-cloudflare-radar">
        
      </a>
    </div>
    <p><a href="https://radar.cloudflare.com/"><u>Cloudflare Radar</u></a> is a hub that showcases global Internet traffic, attack, and technology trends and insights. Today, we are adding IRR AS-SET information to Radar’s routing section, freely available to the public via both website and API access. To view all AS-SETs an AS is a member of, directly or indirectly via other AS-SETs, a user can visit the corresponding AS’s routing page. For example, the AS-SETs list for Cloudflare (AS13335) is available at <a href="https://radar.cloudflare.com/routing/as13335#irr-as-sets"><u>https://radar.cloudflare.com/routing/as13335#irr-as-sets</u></a>.</p><p>The AS-SET data on IRR contains only limited information, like the AS members and AS-SET members. Here at Radar, we also enhance the AS-SET table with additional useful information as follows.</p><ul><li><p><code>Inferred ASN</code> shows the AS number that is inferred to be the creator of the AS-SET. We use the PeeringDB AS-SET information to match the creator if available. Otherwise, we parse the AS-SET name to infer the creator.</p></li><li><p><code>IRR Sources</code> shows the IRR databases in which we see the corresponding AS-SET. We are currently using the following databases: <code>AFRINIC</code>, <code>APNIC</code>, <code>ARIN</code>, <code>LACNIC</code>, <code>RIPE</code>, <code>RADB</code>, <code>ALTDB</code>, <code>NTTCOM</code>, and <code>TC</code>.</p></li><li><p><code>AS Members</code> and <code>AS-SET members</code> show the count of the corresponding types of members.</p></li><li><p><code>AS Cone</code> is the count of the unique ASNs that are included by the AS-SET directly or indirectly.</p></li><li><p><code>Upstreams</code> is the count of unique AS-SETs that include the corresponding AS-SET.</p></li></ul><p>Users can further filter the table by searching for a specific AS-SET name or ASN. A toggle to show only direct or indirect AS-SETs is also available.</p>
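<p>Two of those derived columns can be computed from the raw membership graph alone. A toy sketch with hypothetical AS-SET names, assuming an acyclic graph and counting only direct upstreams (Radar's real pipeline also resolves indirect relationships):</p>

```python
MEMBERS = {  # AS-SET -> direct members (hypothetical data)
    "AS-ALPHA": ["AS-BETA", "AS64496"],
    "AS-BETA": ["AS64497", "AS64498"],
}

def as_cone(s):
    """Unique ASNs included directly or indirectly (the 'AS Cone' column)."""
    cone = set()
    for m in MEMBERS.get(s, []):
        cone |= as_cone(m) if m in MEMBERS else {m}
    return cone

def direct_upstreams(s):
    """AS-SETs that directly include s (one level of the 'Upstreams' column)."""
    return {parent for parent, members in MEMBERS.items() if s in members}

print(len(as_cone("AS-ALPHA")))      # 3 unique ASNs in the cone
print(direct_upstreams("AS-BETA"))   # {'AS-ALPHA'}
```
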
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/0ssTf7bi6yjT2m0YKWPJE/e20b18a7d3151652fecbe606bbe13346/image1.png" />
          </figure><p>In addition to listing AS-SETs, we also provide a tree view to display how an AS-SET includes a given ASN. For example, the following screenshot shows how as-delta indirectly includes AS6762 through seven other AS-SETs. Users can copy or download this tree-view content in text format, making it easy to share with others.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2hNbh2gdj2F0eLTYrzjrVN/eceb588456067a387e7cb6eb3e1e3c5e/image4.png" />
          </figure><p>We built this Radar feature using our <a href="https://developers.cloudflare.com/api/resources/radar/subresources/entities/subresources/asns/methods/as_set/"><u>publicly available API</u></a>, the same way the rest of the Radar site is built. We have also experimented with using this API to build additional features, like a full AS-SET tree visualization. We encourage developers to give <a href="https://developers.cloudflare.com/api/resources/radar/subresources/entities/subresources/asns/methods/as_set/"><u>this API</u></a> (and <a href="https://developers.cloudflare.com/api/resources/radar/"><u>other Radar APIs</u></a>) a try, and tell us what you think!</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ElaU3M5oe8xRnblrrf67u/3fa35d3a25d797c0b0cbe96f0490fa93/image8.png" />
          </figure>
    <div>
      <h2>Looking ahead</h2>
      <a href="#looking-ahead">
        
      </a>
    </div>
    <p>We know AS-SETs are hard to keep clean of error or misuse, and even though Radar is making them easier to monitor, the mistakes and misuse will continue. Because of this, we as a community need to push forth adoption of <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234</u></a> and <a href="https://blog.apnic.net/2025/09/05/preventing-route-leaks-made-simple-bgp-roleplay-with-junos-rfc-9234/"><u>implementations</u></a> of it from the major vendors. RFC9234 embeds roles and an Only-To-Customer (OTC) attribute directly into the BGP protocol itself, helping to detect and prevent route leaks in-line. In addition to BGP misconfiguration protection with RFC9234, Autonomous System Provider Authorization (ASPA) is still making its way <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-verification/"><u>through the IETF</u></a> and will eventually help offer an authoritative means of attesting who the actual providers are per BGP Autonomous System (AS).</p><p>If you are a network operator and manage an AS-SET, you should seriously consider moving to <a href="https://manrs.org/2022/12/why-network-operators-should-use-hierarchical-as-sets/"><u>hierarchical AS-SETs</u></a> if you have not already. A hierarchical AS-SET looks like AS13335:AS-CLOUDFLARE instead of AS-CLOUDFLARE, but the difference is very important. Only a proper maintainer of the AS13335 ASN can create AS13335:AS-CLOUDFLARE, whereas anyone could create AS-CLOUDFLARE in an IRR database if they wanted to. In other words, using hierarchical AS-SETs helps guarantee ownership and prevent the malicious poisoning of routing information.</p><p>While keeping track of AS-SET memberships seems like a chore, it can have significant payoffs in preventing BGP-related <a href="https://blog.cloudflare.com/cloudflare-1111-incident-on-june-27-2024/"><u>incidents</u></a> such as route leaks. 
We encourage all network operators to do their part in making sure that the AS-SETs they submit to their providers and peers to communicate their downstream customer cone are accurate. Every small adjustment or clean-up effort in AS-SETs could help lessen the impact of a BGP incident later.</p><p>Visit <a href="https://radar.cloudflare.com/"><u>Cloudflare Radar</u></a> for additional insights around Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, and more. Follow us on social media at <a href="https://twitter.com/CloudflareRadar"><u>@CloudflareRadar</u></a> (X), <a href="https://noc.social/@cloudflareradar"><u>https://noc.social/@cloudflareradar</u></a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com"><u>radar.cloudflare.com</u></a> (Bluesky), or contact us via <a><u>e-mail</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Radar]]></category>
            <guid isPermaLink="false">6QVNgwE5ZlVbZcWQHJKsDS</guid>
            <dc:creator>Mingwei Zhang</dc:creator>
            <dc:creator>Bryton Herdes</dc:creator>
        </item>
        <item>
            <title><![CDATA[Bringing connections into view: real-time BGP route visibility on Cloudflare Radar]]></title>
            <link>https://blog.cloudflare.com/bringing-connections-into-view-real-time-bgp-route-visibility-on-cloudflare/</link>
            <pubDate>Wed, 21 May 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ Real-time BGP route visualization is now available on Cloudflare Radar, providing immediate insights into global Internet routing. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The Internet relies on the <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/"><u>Border Gateway Protocol (BGP)</u></a> to exchange IP address reachability information. This information outlines the path a sender or router can use to reach a specific destination. These paths, conveyed in BGP messages, are sequences of <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>Autonomous System Numbers (ASNs)</u></a>, with each ASN representing an organization that operates its own segment of Internet infrastructure.</p><p>Throughout this blog post, we'll use the terms "BGP routes" or simply "routes" to refer to these paths. In essence, BGP functions by enabling autonomous systems to exchange routes to IP address blocks (“IP prefixes”), allowing different entities across the Internet to construct their routing tables.</p><p>When network operators debug reachability issues or assess a resource's global reach, BGP routes are often the first thing they examine. Therefore, it’s critical to have an up-to-date view of the routes toward the IP prefixes of interest. Some networks provide tools called "looking glasses" — public routing information services offering data directly from their own BGP routers. These allow external operators to examine routes from that specific network's perspective. Furthermore, services like <a href="https://bgp.tools/"><u>bgp.tools</u></a>, <a href="http://bgp.he.net"><u>bgp.he.net</u></a>, <a href="https://lg.routeviews.org/lg/"><u>RouteViews</u></a>, or the <a href="https://lg.ring.nlnog.net/"><u>NLNOG RING looking glass</u></a> offer aggregated, looking glass-like lookup capabilities, drawing on data sources from multiple organizations rather than just a single one.</p><p>However, individual looking glass instances offer a limited scope, typically restricted to the infrastructure of the service provider's network. 
While aggregated routing information services provide broader vantage points, they often lack the API access necessary for building automated tools on top of them. For example, systems designed for automated tasks, such as BGP <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/"><u>leak</u></a> or <a href="https://blog.cloudflare.com/bgp-hijack-detection/"><u>hijack</u></a> detection, depend on programmatic API access.</p><p>We're excited to introduce Cloudflare Radar's new real-time BGP route lookup service, described below. Built using <a href="#architecture-overview"><u>public data sources</u></a>, this service provides visualizations of real-time routes directly on the corresponding IP prefix pages within Radar (see the page for <a href="https://radar.cloudflare.com/routing/prefix/1.1.1.0/24"><u>1.1.1.0/24</u></a> as an example). We are also offering <a href="https://developers.cloudflare.com/api/resources/radar/subresources/bgp/subresources/routes/methods/pfx2as/"><u>API access</u></a> through our free-to-use Cloudflare Radar API, empowering developers to leverage this data to build their own innovative systems and tools.</p>
    <div>
      <h2>Cloudflare Radar provides real-time routes</h2>
      <a href="#cloudflare-radar-provides-real-time-routes">
        
      </a>
    </div>
    <p>We are excited to announce the launch of our new real-time BGP route lookup service, now accessible through both Cloudflare Radar web interface and the Cloudflare Radar API. This enhancement provides users with a near instantaneous view into global BGP routing data.</p>
    <div>
      <h3>Cloudflare Radar prefix pages</h3>
      <a href="#cloudflare-radar-prefix-pages">
        
      </a>
    </div>
    <p>Cloudflare Radar's real-time routes feature now offers a <a href="https://en.wikipedia.org/wiki/Sankey_diagram"><u>Sankey diagram</u></a> illustrating the BGP routes for a given prefix. To minimize visual complexity, the visualization displays routes directed towards the <a href="https://en.wikipedia.org/wiki/Tier_1_network"><u>Tier 1 networks</u></a>. For example, the diagram below shows that 1.1.1.0/24 is announced by AS13335 (Cloudflare) and that Cloudflare has direct connections to almost all U.S.-based and international Tier 1 network providers.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6GjZl5vG4YNF1ltVFy8F2E/a406afdd00cbaa4689cace22f08d4cc9/image9.png" />
          </figure><p>Expanding on this more concise view, users also have the option to 'Show full paths' and visualize every BGP route from the prefix of interest to the collectors. (The role of the collectors in gathering this data is <a href="#architecture-overview"><u>discussed below</u></a>.) The interactive view allows panning and zooming, and hovering over the links provides tooltip information on which collector saw the route and when it was last updated.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2XNEaV0TlWbM3xvHnAulY7/bbee5bbb48a38011558909df3eef598f/image3.png" />
          </figure><p>For both views, the prefix origin table is displayed above the routes visualization. The table shows the originating <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>Autonomous System (AS)</u></a>, the visibility percentage (representing the proportion of route collectors observing the origin ASN announcement), and <a href="https://en.wikipedia.org/wiki/Resource_Public_Key_Infrastructure"><u>RPKI validation</u></a> outcomes.</p><p>During a recently detected BGP misconfiguration, we saw two origin ASNs for a prefix: AS3 incorrectly appeared in place of the intended origin ASN, which was meant to be <a href="https://blog.cloudflare.com/prepends-considered-harmful/#bgp-best-path-selection"><u>prepended</u></a> three times. The visualization reveals AS3 as RPKI invalid with low visibility, indicating limited network acceptance. Operators can analyze these issues visually or in the table and monitor real-time corrections by refreshing the page.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1R29X594SgfZYwZlc5XldH/7ba3f4a312b0da95d5332c2fb8d9a8b4/image2.png" />
          </figure><p>Whether facing network outages, implementing new deployments, or investigating route leaks, users can leverage this feature for any scenario where a clear, global understanding of a prefix's routing paths is essential.</p><p>To allow easier access to this information, users can now search for any prefix using the Radar search bar and navigate to the corresponding prefix routing pages. Prefixes involved in BGP <a href="https://radar.cloudflare.com/routing#bgp-route-leaks"><u>route leak</u></a> and <a href="https://radar.cloudflare.com/routing#bgp-origin-hijacks"><u>origin hijack</u></a> events are also linked to this enhanced routing information page, helping operators debug BGP anomalies in real time.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/54k7J98bl8zS8K8HZViVT5/736499be7c2c8704a1f71d83a1ebcef5/image10.png" />
          </figure>
    <div>
      <h3>Cloudflare Routes API</h3>
      <a href="#cloudflare-routes-api">
        
      </a>
    </div>
    <p>Cloudflare Radar real-time route data is also accessible <a href="https://developers.cloudflare.com/api/resources/radar/subresources/bgp/subresources/routes/methods/pfx2as/"><u>via the Radar API</u></a>, and users can follow <a href="https://developers.cloudflare.com/radar/get-started/first-request/"><u>this guide</u></a> to get started.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5sfEKbK9HxSc343CPnMXt0/4a68548bf0fcf9617cba9ff5fa38a64b/image4.png" />
          </figure><p>The following example shows an HTTP <code>GET</code> request to query all the current routes for a prefix of interest:</p>
            <pre><code>curl -X GET \
  "https://api.cloudflare.com/client/v4/radar/bgp/routes/realtime?prefix=1.1.1.0/24" \
  -H "Authorization: Bearer &lt;API_TOKEN&gt;"</code></pre>
            <p>With the help of JSON data processing tools like <a href="https://jqlang.org/"><u>jq</u></a>, users can further filter the results, for example by routes containing a certain ASN. In the following example, we request all current routes for the prefix <code>1.1.1.0/24</code> and keep only those whose AS paths contain AS174:</p>
            <pre><code>curl -X GET \
  "https://api.cloudflare.com/client/v4/radar/bgp/routes/realtime?prefix=1.1.1.0/24" \
    -H "Authorization: Bearer &lt;API_TOKEN&gt;" | \
jq '.result.routes[]|select(.as_path | contains([174]))'</code></pre>
            <p>The command output is a JSON array of route objects. Each object details a route that includes AS174 in its AS path. Additional information provided for each route includes the BGP route collector, BGP community values, and the timestamp of the last update.</p>
            <pre><code>{
  "as_path": [
    3130,
    174,
    13335
  ],
  "collector": "route-views2",
  "communities": [
    "174:21001",
    "174:22013",
    "3130:394"
  ],
  "peer_asn": 3130,
  "prefix": "1.1.1.0/24",
  "timestamp": "2025-05-14T00:00:00Z"
}
{
  "as_path": [
    263237,
    174,
    13335
  ],
  "collector": "rrc15",
  "communities": [
    "174:21001",
    "174:22013",
    "65237:174"
  ],
  "peer_asn": 263237,
  "prefix": "1.1.1.0/24",
  "timestamp": "2025-05-14T01:39:52Z"
}</code></pre>
            <p>The API also offers supplementary metadata alongside BGP route information, including insights into BGP route collector status and aggregated prefix-to-origin data. Recalling the earlier example of an AS path prepending misconfiguration, the RPKI invalid AS3 origin is now directly visible to users and API clients in the JSON response, showing that just 9% of all collector peers observed its announcements.</p>
            <pre><code>"meta": {
  "collectors": [
    {
      "latest_realtime_ts": "2025-05-19T21:35:40Z",
      "latest_rib_ts": "2025-05-19T20:00:00Z",
      "latest_updates_ts": "2025-05-19T21:15:00Z",
      "peers_count": 24,
      "peers_v4_count": 0,
      "peers_v6_count": 24,
      "collector": "route-views6"
    }
  ],
  "prefix_origins": [
    {
      "origin": 3,
      "prefix": "2804:4e28::/32",
      "rpki_validation": "invalid",
      "total_peers": 121,
      "total_visible": 11,
      "visibility": 0.09090909090909091
    },
    {
      "origin": 268243,
      "prefix": "2804:4e28::/32",
      "rpki_validation": "valid",
      "total_peers": 121,
      "total_visible": 94,
      "visibility": 0.7768595041322314
    }
  ]
}</code></pre>
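<p>Under the response shape shown above, each origin’s visibility is simply <code>total_visible / total_peers</code>. As an illustration of client-side processing (not part of the Radar API itself), a tool could flag low-visibility, RPKI-invalid origins; the JSON below is inlined rather than fetched, and the <code>suspicious_origins</code> helper and its 50% threshold are our own invention:</p>

```python
# Sketch of client-side processing of prefix_origins metadata like the
# excerpt above. The JSON is inlined for illustration; suspicious_origins
# and its 50% threshold are hypothetical, not part of the Radar API.
import json

meta_json = """
{
  "prefix_origins": [
    {"origin": 3, "prefix": "2804:4e28::/32", "rpki_validation": "invalid",
     "total_peers": 121, "total_visible": 11},
    {"origin": 268243, "prefix": "2804:4e28::/32", "rpki_validation": "valid",
     "total_peers": 121, "total_visible": 94}
  ]
}
"""

def suspicious_origins(meta, max_visibility=0.5):
    """Return origin ASNs that are RPKI-invalid and seen by few peers."""
    flagged = []
    for entry in meta["prefix_origins"]:
        visibility = entry["total_visible"] / entry["total_peers"]
        if entry["rpki_validation"] == "invalid" and visibility < max_visibility:
            flagged.append(entry["origin"])
    return flagged

print(suspicious_origins(json.loads(meta_json)))  # [3], the misconfigured AS3 origin
```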
            
    <div>
      <h2>From archives to real-time</h2>
      <a href="#from-archives-to-real-time">
        
      </a>
    </div>
    
    <div>
      <h3>Architecture overview</h3>
      <a href="#architecture-overview">
        
      </a>
    </div>
    <p>Cloudflare Radar uses <a href="https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris/"><u>RIPE RIS</u></a> and the University of Oregon’s <a href="https://www.routeviews.org/routeviews/"><u>RouteViews</u></a> as our primary BGP data sources for services like the <a href="https://radar.cloudflare.com/routing#routing-statistics"><u>routing statistics widget,</u></a> <a href="https://radar.cloudflare.com/routing#routing-anomalies"><u>anomaly detection</u></a>, and <a href="https://radar.cloudflare.com/routing/us#announced-ip-address-space"><u>announced address space graphs</u></a>. We have previously discussed in detail how we use the data archives from these two providers to build Cloudflare Radar’s <a href="https://blog.cloudflare.com/radar-routing/"><u>routing pages</u></a>, and our <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/"><u>route leak</u></a> and <a href="https://blog.cloudflare.com/bgp-hijack-detection/"><u>hijack</u></a> detection systems.</p><p>In brief, RIPE RIS and RouteViews maintain several BGP route collectors, each connected to BGP routers across a diverse set of networks. These routers forward BGP messages to the collectors, which generate periodic data dumps for public access. These data dumps include both collections of BGP message updates and full routing table snapshots (RIB dump files).</p><p>For services monitoring stable routing information, like <a href="https://radar.cloudflare.com/routing"><u>global routing statistics</u></a>, we process RIB dump files from the archives as they become available. Conversely, for detecting dynamic events such as <a href="https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/"><u>route leaks</u></a> and <a href="https://blog.cloudflare.com/bgp-hijack-detection/"><u>hijacks</u></a>, we process periodic BGP update files in batches. Because these dumps are generated periodically at the route collectors, services depending on this archival BGP data can lag behind live routing by 10 to 30 minutes.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/25MdQGrekEzjwNYpgoxOf2/3c16e120dd06412d05dfe42f5cd53c04/image5.png" />
          </figure><p>For the new real-time BGP routes feature, we aim to reduce the data delay from minutes or tens of minutes down to seconds. With the real-time stream capability provided by the BGP archiver services — <a href="https://ris-live.ripe.net/"><u>RIS Live</u></a> WebSocket from RIPE RIS and <a href="https://github.com/SNAS/openbmp"><u>OpenBMP</u></a> Kafka stream from RouteViews — we designed an additional real-time streaming component that continuously updates the route snapshots initially built from MRT archive files.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7MAYm4qw5F723D2MNGzoKm/4e11a257d5b5d2751368c0c09e040dfa/image6.png" />
          </figure>
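<p>To make the mechanics concrete, the toy sketch below (ours, not Cloudflare’s production code) applies simplified announcement and withdrawal messages to an in-memory snapshot. The message fields used here are illustrative stand-ins for the RIS Live and OpenBMP schemas, not their exact wire formats:</p>

```python
# Toy sketch of keeping a per-collector route snapshot current from a
# stream of BGP update messages. The fields (peer_asn, announce,
# withdraw, path) are simplified stand-ins, not the exact RIS Live or
# OpenBMP message formats.

# A snapshot maps (prefix, peer ASN) -> AS path.
def apply_update(snapshot, msg):
    peer = msg["peer_asn"]
    for prefix in msg.get("withdraw", []):
        snapshot.pop((prefix, peer), None)      # route withdrawn by this peer
    for prefix in msg.get("announce", []):
        snapshot[(prefix, peer)] = msg["path"]  # newest announcement wins

snap = {}
apply_update(snap, {"peer_asn": 3130, "announce": ["1.1.1.0/24"],
                    "path": [3130, 174, 13335]})
print(snap[("1.1.1.0/24", 3130)])  # [3130, 174, 13335]
apply_update(snap, {"peer_asn": 3130, "withdraw": ["1.1.1.0/24"]})
print(snap)  # {}
```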
    <div>
      <h3>System design</h3>
      <a href="#system-design">
        
      </a>
    </div>
    <p>At its core, the system enables a user to look up a prefix's routes stored in BGP routes snapshots. The BGP routes snapshots serve as a queryable data repository, organized hierarchically. The snapshots use a <a href="https://en.wikipedia.org/wiki/Trie"><u>trie structure</u></a> to allow for the retrieval of route information (such as AS paths and community values) associated with specific address prefixes. Each node in the hierarchy stores routing information from different peering routers, providing a consolidated global view. To handle the large data volumes from multiple BGP route collectors, the system partitions routing data into separate BGP routes snapshots, where each snapshot receives data streamed from its corresponding collector. This architecture enables horizontal scalability, allowing for dynamic adjustment of data sources by selecting which independent collectors' data to include or exclude.</p>
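<p>As an illustration of the lookup structure, here is a minimal binary trie with longest-prefix match, an IPv4-only, simplified stand-in for the snapshot tries described above:</p>

```python
# Minimal binary trie for longest-prefix match over IPv4 prefixes, in the
# spirit of the snapshot structure described above (illustrative only).
import ipaddress

class PrefixTrie:
    def __init__(self):
        self.root = {}  # nested dicts keyed by bit "0"/"1"

    def insert(self, prefix, routes):
        net = ipaddress.ip_network(prefix)
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for b in bits:
            node = node.setdefault(b, {})
        node["routes"] = routes  # AS paths, communities, etc.

    def lookup(self, address):
        """Return routes for the longest matching prefix, or None."""
        node, best = self.root, None
        for b in format(int(ipaddress.ip_address(address)), "032b"):
            if "routes" in node:
                best = node["routes"]
            if b not in node:
                break
            node = node[b]
        else:
            if "routes" in node:
                best = node["routes"]
        return best

trie = PrefixTrie()
trie.insert("1.1.1.0/24", [[3130, 174, 13335]])
trie.insert("1.0.0.0/8", [[6939, 13335]])
print(trie.lookup("1.1.1.1"))  # [[3130, 174, 13335]]
```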
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3aLPEEWOeaHxR97VGeqpCE/0f2f4a3c88ba9e8a1e9ee9de8aa23762/image7.png" />
          </figure><p>Because each collector’s BGP route information is maintained independently, we apply the <a href="https://en.wikipedia.org/wiki/Actor_model"><u>actor model software architecture</u></a> to query for a global view. Each collector is treated as an actor that runs completely independently and communicates with a central controller via a dedicated channel. The central controller starts all actors by sending a signal to each of them, triggering the actors to begin collecting archival and real-time BGP data on their separate threads.</p><p>Upon a user query, the central controller relays the query to all running actors via a query message. Each actor retrieves the corresponding route information from its prefix trie and returns the results to the controller in a reply message. The controller aggregates the replies from all actors and compiles them into a single response to the user. Throughout this process, the real-time BGP streaming and snapshot-updating tasks continue to run in the background without interruption.</p>
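<p>The controller/actor message flow can be sketched with threads and queues. For brevity, these toy actors are spawned per query rather than long-running, and the collector names and routes are made up:</p>

```python
# Toy fan-out/fan-in sketch of the actor pattern described above: one
# thread per "collector" actor, a controller that broadcasts a query and
# aggregates the replies. Collector names and routes are invented; real
# actors would be long-running, not spawned per query.
import queue
import threading

def collector_actor(name, routes, inbox, outbox):
    """Each actor owns its routes and answers queries independently."""
    while True:
        prefix = inbox.get()
        if prefix is None:  # shutdown signal
            return
        outbox.put((name, routes.get(prefix, [])))

def query_all(actors, prefix):
    outbox = queue.Queue()
    threads = []
    for name, routes in actors.items():
        inbox = queue.Queue()
        t = threading.Thread(target=collector_actor,
                             args=(name, routes, inbox, outbox))
        t.start()
        inbox.put(prefix)  # controller relays the query to each actor
        inbox.put(None)
        threads.append(t)
    for t in threads:
        t.join()
    # Controller aggregates one reply per actor into a single response.
    return dict(outbox.get() for _ in actors)

actors = {
    "route-views2": {"1.1.1.0/24": [[3130, 174, 13335]]},
    "rrc15":        {"1.1.1.0/24": [[263237, 174, 13335]]},
}
result = query_all(actors, "1.1.1.0/24")
```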
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3Nq5RNk4FleQZxa3suzxU8/66eea37509df2563528fc85d38299f15/image1.png" />
          </figure><p>Our actor-model implementation enables a single node to efficiently store hundreds of full routing tables in its memory. Our current deployment uses eight route collectors, housing a total of 261 full routing tables. This in-memory system operating on a single node consumes approximately 45 GB of memory, which translates to about 170 MB per full routing table.</p>
    <div>
      <h2>Summary</h2>
      <a href="#summary">
        
      </a>
    </div>
    <p>Cloudflare Radar now offers a real-time BGP route lookup service, providing near-instantaneous insights into global Internet routing. This feature leverages real-time data streams from RouteViews and RIPE RIS, moving beyond historical archives to deliver up-to-the-minute information. Users can now visualize routes in real time on Cloudflare Radar's prefix pages with intuitive Sankey diagrams that detail complete route information. Furthermore, the Cloudflare Radar API provides programmatic access to this data, allowing for seamless integration into custom tools and workflows.</p><p>Visit <a href="https://radar.cloudflare.com/"><u>Cloudflare Radar</u></a> for additional insights around Internet disruptions, routing issues, Internet traffic trends, attacks, and Internet quality. Follow us on social media at <a href="https://twitter.com/CloudflareRadar"><u>@CloudflareRadar</u></a> (X), <a href="https://noc.social/@cloudflareradar"><u>noc.social/@cloudflareradar</u></a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com"><u>radar.cloudflare.com</u></a> (Bluesky), or contact us via <a><u>email</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Real-time]]></category>
            <guid isPermaLink="false">WF4vnYPMXN4pKu9Xwqj7g</guid>
            <dc:creator>Mingwei Zhang</dc:creator>
        </item>
        <item>
            <title><![CDATA[Making progress on routing security: the new White House roadmap]]></title>
            <link>https://blog.cloudflare.com/white-house-routing-security/</link>
            <pubDate>Mon, 02 Sep 2024 23:00:00 GMT</pubDate>
            <description><![CDATA[ On September 3, 2024, the White House published a report on Internet routing security. We’ll talk about what that means and how you can help. ]]></description>
            <content:encoded><![CDATA[ <p>The Internet can feel like magic. When you load a webpage in your browser, many simultaneous requests for data fly back and forth to remote servers. Then, often in less than one second, a website appears. Many people know that DNS is used to look up a hostname, and resolve it to an IP address, but fewer understand how data flows from your home network to the network that controls the IP address of the web server.</p><p>The Internet is an interconnected network of networks, operated by thousands of independent entities. To allow these networks to communicate with each other, in 1989, <a href="https://weare.cisco.com/c/r/weare/amazing-stories/amazing-things/two-napkin.html"><u>on the back of two napkins</u></a>, three network engineers devised the <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/"><u>Border Gateway Protocol (BGP)</u></a>. It allows these independent networks to signal directions for IP prefixes they own, or that are reachable through their network. At that time, Internet security wasn’t a big deal — <a href="https://www.cloudflare.com/learning/ssl/what-is-ssl/"><u>SSL</u></a>, initially developed to secure websites, wasn’t developed until 1995, six years later. So BGP wasn’t originally built with security in mind, but over time, security and availability concerns have emerged.</p><p>Today, the <a href="https://bidenwhitehouse.archives.gov/oncd/"><u>White House Office of the National Cyber Director</u></a> issued the <a href="https://bidenwhitehouse.archives.gov/oncd/briefing-room/2024/09/03/fact-sheet-biden-harris-administration-releases-roadmap-to-enhance-internet-routing-security/"><u>Roadmap to Enhancing Internet Routing Security</u></a>, and we’re excited to highlight their recommendations. But before we get into that, let’s provide a quick refresher on what BGP is and why routing security is so important.</p>
    <div>
      <h2>BGP: pathways through the Internet</h2>
      <a href="#bgp-pathways-through-the-internet">
        
      </a>
    </div>
    <p>BGP is the core signaling protocol used on the Internet. It’s fully distributed and managed independently by all the individual operators of the Internet. With BGP, operators will send messages to their neighbors (other networks they are directly connected with, either physically or through an <a href="https://www.cloudflare.com/learning/cdn/glossary/internet-exchange-point-ixp/"><u>Internet Exchange</u></a>) that indicate their network can be used to reach a specific IP prefix. These IP prefixes can be resources the network itself owns, such as <a href="https://radar.cloudflare.com/routing/prefix/104.16.128.0/20"><u>104.16.128.0/20</u></a> for Cloudflare, or resources reachable by transiting their network.</p><p>By exchanging all of this information between peers, each individual network on the Internet can form a full map of what the Internet looks like, and ideally, how to reach each IP address on the Internet. This map is in an almost constant state of flux: networks disappear from the Internet for a wide variety of reasons, ranging from scheduled maintenance to catastrophic failures, like the <a href="https://blog.cloudflare.com/october-2021-facebook-outage/"><u>Facebook incident in 2021</u></a>. On top of this, the ideal path to take from point A (your home ISP) to point B (Cloudflare) can change drastically, depending on routing decisions made by your home ISP, and any or all intermediate networks between your home ISP and Cloudflare (<a href="https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/"><u>here’s an example from 2019</u></a>). These <a href="https://blog.cloudflare.com/prepends-considered-harmful/"><u>routing decisions</u></a> are entirely arbitrary, and left to the owners of the networks. 
Performance and security can be considered, but neither has historically been made visible through BGP itself.</p><p>As all the networks can independently make their own routing decisions, there are a lot of individual points where things can go wrong. Going wrong can have multiple meanings here: this can range from routing loops, causing Internet traffic to go back and forth repeatedly between two networks, never reaching its destination, to more malicious problems, such as traffic interception or traffic manipulation.</p><p>As routing security wasn’t accounted for in that initial two-napkin draft, it is easy for a malicious actor on the Internet to <a href="https://www.cloudflare.com/en-gb/learning/security/glossary/bgp-hijacking/"><u>pretend to either be an originating network</u></a> (where they claim to own the IP prefix, positioning themselves as the destination network), or they can pretend to be a viable middle network, getting traffic to transit through their network.</p><p>In either of these examples, the actor can manipulate the Internet traffic of unsuspecting end users and potentially steal passwords, cryptocurrency, or any other data that can be of value. While transport security (<a href="https://www.cloudflare.com/learning/ssl/transport-layer-security-tls/"><u>TLS</u></a> for HTTP/1.x and HTTP/2, <a href="https://blog.cloudflare.com/the-road-to-quic/"><u>QUIC</u></a> for HTTP/3) has reduced this risk significantly, there are still ways this can be bypassed. Over time, the Internet community has acknowledged the security concerns with BGP, and has built infrastructure to mitigate some of these problems. </p>
    <div>
      <h3>BGP security: The RPKI is born</h3>
      <a href="#bgp-security-the-rpki-is-born">
        
      </a>
    </div>
    <p>This journey is now coming to a final destination with the development and adoption of the Resource Public Key Infrastructure (RPKI). The RPKI is a <a href="https://research.cloudflare.com/projects/internet-infrastructure/pki/"><u>PKI</u></a>, just like the Web PKI which provides security certificates for the websites we browse (the “s” in https). The RPKI is a PKI specifically with the Internet in mind: it provides core constructs for <a href="https://www.cloudflare.com/learning/dns/glossary/what-is-my-ip-address/"><u>IP addresses</u></a> and <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>Autonomous System Numbers (ASNs</u></a>), the numbers used to identify these individual operating networks mentioned earlier.</p><p>Through the RPKI, it’s possible for an operator to establish a cryptographically secure relationship between the IP prefixes they originate, and their ASN, through the issuance of <a href="https://www.arin.net/resources/manage/rpki/roa_request/"><u>Route Origin Authorization records (ROAs)</u></a>. These ROAs can be used by all other networks on the Internet to validate that the IP prefix update they just received for a given origin network actually belongs to that origin network, a process called <a href="https://blog.cloudflare.com/rpki-updates-data/"><u>Route Origin Validation (ROV)</u></a>. If a malicious party tries to hijack an IP prefix that has a ROA to their (different) origin network, validating networks would know this update is invalid and reject it, maintaining the origin security and ensuring reachability.</p>
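<p>The ROV decision itself is simple to state: a route is valid if some covering ROA authorizes its origin ASN within the ROA’s maximum length, invalid if ROAs cover the prefix but none match, and not-found if no ROA covers it. The sketch below follows those outcome states from RFC 6811 in a simplified form; the ROA data is illustrative:</p>

```python
# Simplified Route Origin Validation in the spirit of RFC 6811: "valid"
# if a covering ROA authorizes the origin ASN within its max length,
# "invalid" if ROAs cover the prefix but none match, "not-found" if no
# ROA covers it. The ROA list here is illustrative, not real RPKI data.
import ipaddress

def rov(prefix, origin_asn, roas):
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, roa_asn, max_len in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            if roa_asn == origin_asn and net.prefixlen <= max_len:
                return "valid"
    return "invalid" if covered else "not-found"

# Each ROA: (prefix, authorized origin ASN, max prefix length)
roas = [("1.1.1.0/24", 13335, 24)]

print(rov("1.1.1.0/24", 13335, roas))  # valid
print(rov("1.1.1.0/24", 64512, roas))  # invalid: covered, wrong origin
print(rov("8.8.8.0/24", 15169, roas))  # not-found: no covering ROA
```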
    <div>
      <h2>Why does BGP security matter? Examples of route hijacks and leaks</h2>
      <a href="#why-does-bgp-security-matter-examples-of-route-hijacks-and-leaks">
        
      </a>
    </div>
    <p>But why should you care about BGP? And more importantly, why does the White House care about BGP? Put simply: BGP (in)security can cost people and companies millions of dollars and cause widespread disruptions for critical services.</p><p>In February 2022, Korean crypto platform KLAYswap was the target of a <a href="https://manrs.org/2022/02/klayswap-another-bgp-hijack-targeting-crypto-wallets/"><u>malicious BGP hijack</u></a>, which was used to steal $1.9 million of cryptocurrency from their customers. The attackers were able to serve malicious code that mimicked the service KLAYswap was using for technical support. They were able to do this by announcing the IP prefix used to serve the JavaScript SDK KLAYswap was using. When other networks accepted this announcement, end user traffic loading the technical support page instead received malicious JavaScript, which was used to drain customer wallets. As the attackers hijacked the IP address, they were also able to register a <a href="https://www.cloudflare.com/application-services/products/ssl/">TLS certificate</a> for the domain name used to serve the SDK. As a result, nothing looked out of the ordinary for Klayswap’s customers until they noticed their wallets had been drained.</p><p>However, not all BGP problems are intentional hijacks. In March 2022, <a href="https://radar.cloudflare.com/as8342"><u>RTComm (AS8342)</u></a>, a Russian ISP, announced itself as the origin of <a href="https://radar.cloudflare.com/routing/prefix/104.244.42.0/24"><u>104.244.42.0/24</u></a>, which is an IP prefix actually owned by <a href="https://radar.cloudflare.com/as13414"><u>Twitter (now X) (AS13414)</u></a>. In this case, all researchers have drawn a similar conclusion: RTComm wanted to block its users from accessing Twitter, but inadvertently advertised the route to its peers and upstream providers. 
Thankfully, the impact was limited, in large part due to Twitter issuing ROA records for their IP prefixes, which meant the hijack was blocked at all networks that had implemented ROV and were validating announcements.</p><p>Inadvertent incorrect advertisements passing from one network to another, or route leaks, can happen to anyone, even Cloudflare. Our <a href="https://1.1.1.1/dns"><u>1.1.1.1 public DNS service</u></a> — used by millions of consumers and businesses — is often the unintended victim. Consider this situation (versions of which have happened numerous times): a network engineer running a local ISP is testing a configuration on their router and announces to the Internet that you can reach the IP address 1.1.1.1 through their network. They will often pick this address because it’s easy to input on the router and observe in network analytics. They accidentally push that change out to all their peer networks — the networks they’re connected to — and now, if proper routing security isn’t in place, users on multiple networks around the Internet trying to reach 1.1.1.1 might be directed to this local ISP where there is no DNS service to be found. This can lead to widespread outages.</p><p>The types of routing security measures in the White House roadmap can prevent these issues. In the case of 1.1.1.1, <a href="https://rpki.cloudflare.com/?view=explorer&amp;prefix=1.1.1.0%2F24"><u>Cloudflare has ROAs in place</u></a> that tell the Internet that we originate the IP prefix that contains 1.1.1.1. If someone else on the Internet is advertising 1.1.1.1, that’s an invalid route, and other networks should stop accepting it. In the case of KLAYswap, had there been ROAs in place, other networks could have used common filtering techniques to filter out the routes pointing to the attackers’ malicious JavaScript. 
So now let’s talk more about the plan the White House has to improve routing security on the Internet, and how the US government developed its recommendations.</p>
    <div>
      <h2>Work leading to the roadmap</h2>
      <a href="#work-leading-to-the-roadmap">
        
      </a>
    </div>
    <p>The new routing security roadmap from the <a href="https://www.whitehouse.gov/oncd/"><u>Office of the National Cyber Director (ONCD)</u></a> is the product of years of work, throughout both government and industry. The <a href="https://www.nist.gov/"><u>National Institute of Standards and Technology (NIST)</u></a> has been a longstanding proponent of improving routing security, developing <a href="https://www.nist.gov/news-events/news/2014/05/nist-develops-test-and-measurement-tools-internet-routing-security"><u>test and measurement</u></a> <a href="https://rpki-monitor.antd.nist.gov/"><u>tools</u></a> and publishing <a href="https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1800-14.pdf"><u>special publication 1800-14</u></a> on Protecting the Integrity of Internet Routing, among many other initiatives. They are active participants in the Internet community, and an important voice for routing security.</p><p>Cloudflare first started publicly <a href="https://blog.cloudflare.com/is-bgp-safe-yet-rpki-routing-security-initiative/"><u>advocating</u></a> for adoption of security measures like RPKI after a <a href="https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/"><u>massive BGP route leak</u></a> took down a portion of the Internet, including websites using Cloudflare’s services, in 2019. </p><p>Since that time, the federal government has increasingly recognized the need to elevate efforts to secure Internet routing, a process that Cloudflare has helped support along the way. 
The <a href="https://www.solarium.gov/"><u>Cyberspace Solarium Commission report</u></a>, published in 2020, encouraged the government to develop a strategy and recommendations to define “common, implementable guidance for securing the DNS and BGP.”</p><p>In February 2022, the Federal Communications Commission <a href="https://www.fcc.gov/document/fcc-launches-inquiry-internet-routing-vulnerabilities"><u>launched</u></a> a notice of inquiry to better understand Internet routing. Cloudflare <a href="https://www.fcc.gov/ecfs/document/10412234101460/1"><u>responded</u></a> with a detailed explanation of our history with RPKI and routing security. In July 2023, the FCC, jointly with the Director of the <a href="https://cisa.gov/"><u>Cybersecurity and Infrastructure Security Agency</u></a>, held a <a href="https://www.fcc.gov/news-events/events/2023/07/bgp-security-workshop"><u>workshop</u></a> for stakeholders, with <a href="https://youtu.be/VQhoNX2Q0aM?si=VHbB5uc-0DzHaWpL&amp;t=11462"><u>Cloudflare as one of the presenters</u></a>. In June 2024, the FCC issued a <a href="https://docs.fcc.gov/public/attachments/FCC-24-62A1.pdf"><u>Notice of Proposed Rulemaking</u></a> that would require large service providers to develop security risk management plans and report on routing security efforts, including RPKI adoption. </p><p>The White House has been involved as well. In March 2023, they cited the need to secure the technical foundation of the Internet, from issues such as BGP vulnerabilities, as one of the strategic objectives of the <a href="https://www.whitehouse.gov/wp-content/uploads/2023/03/National-Cybersecurity-Strategy-2023.pdf"><u>National Cybersecurity Strategy</u></a>. 
Citing those efforts, in May 2024, the Department of Commerce <a href="https://www.commerce.gov/news/press-releases/2024/05/us-department-commerce-implements-internet-routing-security"><u>issued</u></a> <a href="https://rpki.cloudflare.com/?view=explorer&amp;asn=3477"><u>ROAs signing some of its IP space</u></a>, and this roadmap strongly encourages other departments and agencies to do the same. All of those efforts and the focus on routing security have resulted in increased adoption of routing security measures. </p>
    <div>
      <h2>Report observations and recommendations</h2>
      <a href="#report-observations-and-recommendations">
        
      </a>
    </div>
    <p>The report released by the White House Office of the National Cyber Director details the current state of BGP security, and the challenges associated with Resource Public Key Infrastructure (RPKI) Route Origin Authorization (ROA) issuance and RPKI Route Origin Validation (ROV) adoption. It also provides network operators and government agencies with next steps and recommendations for BGP security initiatives. </p><p>One of the first recommendations is for all networks to create and publish ROAs. It’s important that every network issues ROAs for their IP prefixes, as it’s the only way for other networks to validate they are the authorized originator of those prefixes. If one network is advertising an IP address as their own, but a different network issued the ROA, that’s an important sign that something might be wrong!</p><p>As shown in the chart below from <a href="https://rpki-monitor.antd.nist.gov/"><u>NIST’s RPKI Monitor</u></a>, as of September 2024, at least 53% of all the IPv4 prefixes on the Internet have a valid ROA record available (IPv6 reached this milestone in late 2023), up from only 6% in 2017. (The metric is even better when measured as a percent of Internet traffic: data from <a href="https://kentik.com/"><u>Kentik</u></a>, a network observability company, <a href="https://www.kentik.com/blog/rpki-rov-deployment-reaches-major-milestone/"><u>shows</u></a> that 70.3% of Internet traffic is exchanged with IP prefixes that have a valid ROA.) This increase in the number of signed IP prefixes (ROAs) is foundational to secure Internet routing.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4f4Y1fXcdxYRxUhQYjxlWp/7f26d617648539980f2c8e65873139e4/image2.png" />
          </figure><p>Unfortunately, the US is lagging behind: <a href="https://radar.cloudflare.com/routing/us"><u>Only 39% of IP prefixes</u></a> originated by US networks have a valid ROA. This is not surprising, considering the US has significantly more Internet address resources than other parts of the world. However, the report highlights the need for the US to overcome the common barriers network operators face when implementing BGP security measures. Administrative challenges, the perception of risk, and prioritization and resourcing constraints are often cited as the problems networks face when attempting to move forward with ROV and RPKI.</p><p>A related area of the roadmap highlights the need for networks that allow their customers to control IP address space to still create ROAs for those addresses. The reality of how every ISP, government, and large business allocates its IP address space is undoubtedly messy, but that doesn’t reduce the importance of making sure that the correct entity is identified in the official records with a ROA. </p><p>A network signing routes for its IP addresses is an important step, but it isn’t enough. To prevent incorrect routes — malicious or not — from spreading around the Internet, networks need to implement Route Origin Validation (ROV) and implement other BGP best practices, outlined by <a href="https://manrs.org/"><u>MANRS</u></a> in their <a href="https://manrs.org/wp-content/uploads/2023/12/The_Zen_of_BGP_Sec_Policy_Nov2023.docx.pdf"><u>Zen Guide to Routing Security Policy</u></a>. If one network incorrectly announces itself as the origin for 1.1.1.1, that won’t have any effect beyond its own borders if no other networks pick up that invalid route. The Roadmap calls out filtering invalid routes as another action for network service providers. 
</p><p>As of <a href="https://blog.cloudflare.com/rpki-updates-data/"><u>2022</u></a>, our data<a href="https://blog.cloudflare.com/rpki-updates-data/"><u> showed</u></a> that around 15 percent of networks were validating routes. Ongoing measurements from APNIC show progress: this year about 20 percent <a href="https://stats.labs.apnic.net/rpki/XA"><u>of APNIC probes</u></a> globally correctly filter invalid routes with ROV. <a href="https://stats.labs.apnic.net/rpki/US"><u>In the US</u></a>, it’s 70 percent. Continued growth of ROV is a critical step towards achieving better BGP security.</p>
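<p>The origin validation logic that ROV applies is compact enough to sketch. The following is an illustrative Python model of the RFC 6811 rules, not router code: a route is <i>invalid</i> only when at least one ROA covers the prefix but none matches both the origin AS and the maxLength; a route with no covering ROA at all is <i>unknown</i> and is still accepted. The sample ROA entry uses Cloudflare’s AS13335 purely for illustration.</p>

```python
import ipaddress

def rov_state(prefix, origin_asn, roas):
    """Classify a BGP announcement per RFC 6811.

    roas: list of (roa_prefix, max_length, asn) tuples.
    Returns 'valid', 'invalid', or 'unknown'.
    """
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # A ROA "covers" the route if the announced prefix falls inside it.
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            # It "matches" if the origin AS agrees and the announced
            # prefix is no more specific than the ROA's maxLength.
            if asn == origin_asn and net.prefixlen <= max_length:
                return "valid"
    return "invalid" if covered else "unknown"

roas = [("1.1.1.0/24", 24, 13335)]           # illustrative ROA entry
print(rov_state("1.1.1.0/24", 13335, roas))  # valid
print(rov_state("1.1.1.0/24", 64512, roas))  # invalid: wrong origin AS
print(rov_state("8.8.8.0/24", 64512, roas))  # unknown: no covering ROA
```

<p>Note the asymmetry: an ROV-enabled router drops only <i>invalid</i> routes, which is why universal ROA issuance matters — unsigned prefixes remain <i>unknown</i> and unprotected.</p>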
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Ne3sPYqAEytLjO0Vm53yA/ad573ba885e61d249d0a4601b70c8df6/image1.png" />
          </figure><p>Filtering out invalid routes is prominently highlighted in the report’s recommendations. While recognizing that there’s been dramatic improvement in filtering by the large transit networks, the first report recommendation is for network service providers — large and small — to fully deploy ROV. </p><p>In addition, the Roadmap proposes using the federal government’s considerable weight as a purchaser, writing, “<i>[Office of Management and Budget] should require the Federal Government’s contracted service providers to adopt and deploy current commercially-viable Internet routing security technologies.</i>” It goes on to say that grant programs, particularly broadband grants, “<i>should require grant recipients to incorporate routing security measures into their projects.</i>”</p><p>The roadmap doesn’t only cover well-established best practices, but also highlights emerging security technologies, such as <a href="https://datatracker.ietf.org/doc/draft-ietf-sidrops-aspa-profile/"><u>Autonomous System Provider Authorization (ASPA)</u></a> and <a href="https://datatracker.ietf.org/doc/html/rfc8205"><u>BGPsec</u></a>. ROAs only cover part of the BGP routing ecosystem, so additional work is needed to ensure we secure everything. It’s encouraging to see that the work being done by the wider community to address these concerns is acknowledged, and more importantly, actively followed.</p>
    <div>
      <h2>What’s next for the Internet community</h2>
      <a href="#whats-next-for-the-internet-community">
        
      </a>
    </div>
    <p>The new roadmap is an important step in outlining actions that can be taken today to improve routing security. But as the roadmap itself recognizes, there’s more work to be done both in making sure that the steps are implemented, and that we continue to push routing security forward.</p><p>From an implementation standpoint, our hope is that the government’s focus on routing security through all the levers outlined in the roadmap will speed up ROA adoption, and encourage wider implementation of ROV and other best practices. At Cloudflare, we’ll continue to report on routing practices on <a href="https://radar.cloudflare.com/routing/us"><u>Cloudflare Radar</u></a> to help assess progress against the goals in the roadmap.</p><p>At a technical level, the wider Internet community has made massive strides in adopting RPKI ROV, and has set its sights on the next problem: we are securing the IP-to-originating network relationship, but what about the relationships between the individual networks?</p><p>Through the adoption of BGPsec and ASPA, network operators are able to validate not only the origin of a prefix, but also the path to get there. These two new technical additions within the RPKI will combine with ROV to ultimately provide a fully secure signaling protocol for the modern Internet. The community has actively undertaken this work, and we’re excited to see it progress!</p><p>Outside the RPKI, the community has also ratified the formalization of customer roles through <a href="https://datatracker.ietf.org/doc/rfc9234/"><u>RFC9234: Route Leak Prevention and Detection Using Roles in UPDATE and OPEN Messages</u></a>. As this new BGP feature gains support, we’re hopeful that this will be another helpful tool in the operator toolbox in preventing route leaks of any kind.</p>
    <div>
      <h2>How you can help keep the Internet secure</h2>
      <a href="#how-you-can-help-keep-the-internet-secure">
        
      </a>
    </div>
    <p>If you’re a network operator, you’ll need to sign your routes and validate incoming prefixes. This consists of signing Route Origin Authorization (ROA) records and performing Route Origin Validation (ROV). Route signing involves creating ROA records with your local <a href="https://www.nro.net/about/rirs/"><u>Regional Internet Registry (RIR)</u></a> and publishing them in the RIR’s RPKI repository. Route validation involves rejecting routes whose origin conflicts with a published ROA (routes with no covering ROA are still accepted). This will help ensure that invalid routes don’t propagate. You can learn more about that <a href="https://blog.cloudflare.com/rpki-updates-data/"><u>here</u></a>.</p><p>If you’re not a network operator, head to <a href="http://isbgpsafeyet.com"><u>isbgpsafeyet.com</u></a> and test your ISP. If your ISP is not keeping BGP safe, be sure to let them know how important it is. The government has pointed out that prioritization is a consistent problem, so let’s help raise the priority of routing security.</p>
    <div>
      <h2>A secure Internet is an open Internet</h2>
      <a href="#a-secure-internet-is-an-open-internet">
        
      </a>
    </div>
    <p>As the report points out, one of the keys to keeping the Internet open is ensuring that users can feel safe accessing any site they need to without worrying about attacks that they can’t control. Cloudflare wholeheartedly supports the US government’s efforts to bolster routing security around the world and is eager to work to ensure that we can help create a safe, open Internet for every user.</p> ]]></content:encoded>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Routing Security]]></category>
            <category><![CDATA[Better Internet]]></category>
            <guid isPermaLink="false">10dR1e1P8WbOojN0JGTPOp</guid>
            <dc:creator>Mike Conlow</dc:creator>
            <dc:creator>Emily Music</dc:creator>
            <dc:creator>Tom Strickx</dc:creator>
        </item>
        <item>
            <title><![CDATA[The backbone behind Cloudflare’s Connectivity Cloud]]></title>
            <link>https://blog.cloudflare.com/backbone2024/</link>
            <pubDate>Tue, 06 Aug 2024 14:00:00 GMT</pubDate>
            <description><![CDATA[ Read through the latest milestones and expansions of Cloudflare's global backbone and how it supports our Connectivity Cloud and our services ]]></description>
            <content:encoded><![CDATA[ <p>The modern use of "cloud" arguably traces its origins to the cloud icon, omnipresent in network diagrams for decades. A cloud was used to represent the vast and intricate infrastructure components required to deliver network or Internet services without going into depth about the underlying complexities. At Cloudflare, we embody this principle by providing critical infrastructure solutions in a simple, user-friendly way. Our logo, featuring the cloud symbol, reflects our commitment to simplifying the complexities of Internet infrastructure for all our users.</p><p>This blog post provides an update about our infrastructure, focusing on our global backbone in 2024, and highlights its benefits for our customers, our competitive edge in the market, and the impact on our mission of helping build a better Internet. Since our last backbone-related <a href="http://blog.cloudflare.com/cloudflare-backbone-internet-fast-lane">blog post</a> in 2021, we have increased our backbone capacity (Tbps) by more than 500%, unlocking new use cases, as well as reliability and performance benefits for all our customers.</p>
    <div>
      <h3>A snapshot of Cloudflare’s infrastructure</h3>
      <a href="#a-snapshot-of-cloudflares-infrastructure">
        
      </a>
    </div>
    <p>As of July 2024, Cloudflare has data centers in 330 cities across more than 120 countries, each running Cloudflare equipment and services. The goal of delivering Cloudflare products and services everywhere remains consistent, although these data centers vary in the number of servers and amount of computational power.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/38RRu7BaumWFemL23JcFLW/fd1e4aced5095b1e04384984c88e48be/BLOG-2432-2.png" />
          </figure><p></p><p>These data centers are strategically positioned around the world to ensure our presence in all major regions and to help our customers comply with local regulations. Together they form a programmable, smart network that routes your traffic to the best possible data center for processing. This programmability allows us to keep sensitive data regional, with our <a href="https://www.cloudflare.com/data-localization/">Data Localization Suite solutions</a>, and within the constraints that our customers impose. Connecting these sites, exchanging data with customers, public clouds, partners, and the broader Internet, is the role of our network, which is managed by our infrastructure engineering and network strategy teams. This network forms the foundation that makes our products lightning fast, ensuring our global reliability, security for every customer request, and helping customers comply with <a href="https://www.cloudflare.com/the-net/building-cyber-resilience/challenges-data-sovereignty/">data sovereignty requirements</a>.</p>
    <div>
      <h3>Traffic exchange methods</h3>
      <a href="#traffic-exchange-methods">
        
      </a>
    </div>
    <p>The Internet is an interconnection of different networks and separate <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">autonomous systems</a> that operate by exchanging data with each other. There are multiple ways to exchange data, but for simplicity, we'll focus on two key methods these networks use to communicate: peering and IP transit. To better understand the benefits of our global backbone, it helps to understand these basic connectivity solutions we use in our network.</p><ol><li><p><b>Peering</b>: The voluntary interconnection of administratively separate Internet networks that allows for traffic exchange between users of each network is known as “<a href="https://www.netnod.se/ix/what-is-peering">peering</a>”. Cloudflare is one of the <a href="https://bgp.he.net/report/exchanges#_participants">most peered networks</a> globally. We have peering agreements with ISPs and other networks in 330 cities and across all major <a href="https://www.cloudflare.com/learning/cdn/glossary/internet-exchange-point-ixp/">Internet Exchanges (IXs)</a>. Interested parties can register to <a href="https://www.cloudflare.com/partners/peering-portal/">peer with us</a> anytime, or directly connect to our network with a link through a <a href="https://developers.cloudflare.com/network-interconnect/pni-and-peering/">private network interconnect (PNI)</a>.</p></li><li><p><b>IP transit</b>: A paid service that allows traffic to cross or "transit" somebody else's network, typically connecting a smaller Internet service provider (ISP) to the larger Internet. Think of it as paying a toll to access a private highway with your car.</p></li></ol><p>The backbone is a dedicated high-capacity optical fiber network that moves traffic between Cloudflare’s global data centers, where we interconnect with other networks using the traffic exchange methods mentioned above. It enables data transfers that are more reliable than over the public Internet. For connectivity within a city and for long-distance connections, we manage our own dark fiber or lease wavelengths using Dense Wavelength Division Multiplexing (DWDM). DWDM is a fiber optic technology that increases network capacity by transmitting multiple data streams simultaneously on different wavelengths of light within the same fiber. It’s like having a highway with multiple lanes, so that more cars can use the same highway at once. We buy and lease these services from our global carrier partners all around the world.</p>
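<p>To put the highway analogy in numbers: the capacity of a DWDM system is simply the number of wavelengths multiplied by the per-wavelength data rate. The channel count and rate below are typical for modern coherent systems and are purely illustrative, not figures from our deployment:</p>

```python
def fiber_capacity_tbps(channels: int, gbps_per_channel: int) -> float:
    """Total capacity of one DWDM fiber, in Tbps."""
    return channels * gbps_per_channel / 1000

# e.g. a system carrying 96 wavelengths at 400 Gbps each:
print(fiber_capacity_tbps(96, 400))  # 38.4 (Tbps on a single fiber)
```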
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1RgjDtW5LehGZEYXey4AQH/cfef08965313f67c84a052e0541fc42b/BLOG-2432-3.png" />
          </figure><p></p>
    <div>
      <h3>Backbone operations and benefits</h3>
      <a href="#backbone-operations-and-benefits">
        
      </a>
    </div>
    <p>Operating a global backbone is challenging, which is why many competitors don’t do it. We take on this challenge for two key reasons: traffic routing control and cost-effectiveness.</p><p>With IP transit, we rely on our transit partners to carry traffic from Cloudflare to the ultimate destination network, introducing unnecessary third-party reliance. In contrast, our backbone gives us full control over routing of both internal and external traffic, allowing us to manage it more effectively. This control is crucial because it lets us optimize traffic routes, usually resulting in the lowest latency paths. Furthermore, serving large traffic volumes through the backbone is, on average, more cost-effective than IP transit. This is why we are doubling down on backbone capacity in cities such as Frankfurt, London, Amsterdam, Paris, and Marseille, where we see continuous traffic growth and where connectivity solutions are widely available and competitively priced.</p><p>Our backbone serves both internal and external traffic. Internal traffic includes customer traffic using our security or performance products and traffic from Cloudflare's internal systems that shift data between our data centers. <a href="http://blog.cloudflare.com/introducing-regional-tiered-cache">Tiered caching</a>, for example, optimizes our caching delivery by dividing our data centers into a hierarchy of lower tiers and upper tiers. If lower-tier data centers don’t have the content, they request it from the upper tiers. If the upper tiers don’t have it either, they then request it from the origin server. This process reduces origin server requests and improves cache efficiency. Using our backbone to transport the cached content between lower and upper-tier data centers and the origin is often the most cost-effective method, considering the scale of our network. <a href="https://www.cloudflare.com/network-services/products/magic-transit/">Magic Transit</a> is another example, where we use BGP anycast to attract traffic to the Cloudflare data center closest to the end user and apply our DDoS mitigation there. Our backbone transports the clean traffic to our customer’s data center, which is connected to us through a <a href="http://blog.cloudflare.com/cloudflare-network-interconnect">Cloudflare Network Interconnect (CNI)</a>.</p><p>External traffic that we carry on our backbone can be traffic from other origin providers like AWS, Oracle, Alibaba, Google Cloud Platform, or Azure, to name a few. The origin responses from these cloud providers are transported through peering points and our backbone to the Cloudflare data center closest to our customer. By leveraging our backbone, we have more control over how we backhaul this traffic throughout our network, which results in greater reliability, better performance, and less dependency on the public Internet.</p><p>This interconnection between public clouds, offices, and the Internet with a controlled layer of performance, security, programmability, and visibility running on our global backbone is our <a href="http://blog.cloudflare.com/welcome-to-connectivity-cloud">Connectivity Cloud</a>.</p>
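<p>The tiered caching flow described above — lower tier, then upper tier, then origin — can be sketched as a chain of lookups. The dictionaries below are a toy stand-in for real cache stores, not how our caches are implemented:</p>

```python
def tiered_fetch(key, lower, upper, origin):
    """Resolve a request through the cache hierarchy.

    Returns (value, served_from) so we can see which layer answered.
    """
    if key in lower:
        return lower[key], "lower-tier"
    if key in upper:
        # Populate the lower tier on the way back (cache fill).
        lower[key] = upper[key]
        return upper[key], "upper-tier"
    # Miss at every tier: only now do we contact the origin server.
    value = origin(key)
    upper[key] = value
    lower[key] = value
    return value, "origin"

lower, upper = {}, {"/logo.png": b"png-bytes"}
origin = lambda key: b"fresh-from-origin"

print(tiered_fetch("/logo.png", lower, upper, origin))    # upper-tier hit
print(tiered_fetch("/logo.png", lower, upper, origin))    # now a lower-tier hit
print(tiered_fetch("/index.html", lower, upper, origin))  # full miss -> origin
```

<p>Every request answered by the lower or upper tier is one less trip to the origin — and when a fill does have to cross tiers, carrying it over the backbone keeps that traffic off third-party transit.</p>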
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1Fk6k5NOgfOM3qpK0z3wb0/2fe9631dbe6b2dfc6b3c3cd0156f293e/Screenshot_2024-08-28_at_3.21.50_PM.png" />
          </figure><p><sub><i>This map is a simplification of our current backbone network and does not show all paths</i></sub></p><p></p>
    <div>
      <h3>Expanding our network</h3>
      <a href="#expanding-our-network">
        
      </a>
    </div>
    <p>As mentioned in the introduction, we have increased our backbone capacity (Tbps) by more than 500% since 2021. With the addition of sub-sea cable capacity to Africa, we achieved a big milestone in 2023 by completing our global backbone ring. It now reaches six continents through terrestrial fiber and subsea cables.</p><p>Building out our backbone within regions where Internet infrastructure is less developed compared to markets like Central Europe or the US has been a key strategy for our latest network expansions. We have a shared goal with regional ISP partners to keep our data flow localized and as close as possible to the end user. Traffic often takes inefficient routes outside the region due to the lack of sufficient local peering and regional infrastructure. This phenomenon, known as traffic tromboning, occurs when data is routed through more cost-effective international routes and existing peering agreements.</p><p>Our regional backbone investments in countries like India and Turkey aim to reduce the need for such inefficient routing. With our own in-region backbone, traffic can be directly routed between in-country Cloudflare data centers, such as from Mumbai to New Delhi to Chennai, reducing latency, increasing reliability, and helping us to provide the same level of service quality as in more developed markets. We can ensure that data stays local, supporting our Data Localization Suite (<a href="https://www.cloudflare.com/data-localization/">DLS</a>), which helps businesses comply with regional data privacy laws by controlling where their data is stored and processed.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4WCNB78y1jHHsid46pBZOo/e950ced1e510cb8caeea0961c43ea8a0/BLOG-2432-5.png" />
          </figure><p></p>
    <div>
      <h3>Improved latency and performance</h3>
      <a href="#improved-latency-and-performance">
        
      </a>
    </div>
    <p>This strategic expansion has not only extended our global reach but has also significantly improved our overall latency. One illustration of this is that since the deployment of our backbone between Lisbon and Johannesburg, we have seen a major performance improvement for users in Johannesburg. Customers benefiting from this improved latency include, for example, a financial institution running its APIs through us for real-time trading, where milliseconds can impact trades, or our <a href="https://www.cloudflare.com/network-services/products/magic-wan/">Magic WAN</a> users, for whom we facilitate site-to-site connectivity between branch offices.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1o0H8BNLf5ca8BBx38Q5Ee/5b22f7c0ad1c5c49a67bc5149763e81d/BLOG-2432-6.png" />
          </figure><p></p><p>The table above shows an example where we measured the round-trip time (RTT) for an uncached origin fetch, from an end-user in Johannesburg to various origin locations, comparing our backbone and the public Internet. By carrying the origin request over our backbone, as opposed to IP transit or peering, local users in Johannesburg get their content up to 22% faster. By using our own backbone to long-haul the traffic to its final destination, we are in complete control of the path and performance. This improvement in latency varies by location, but consistently demonstrates the superiority of our backbone infrastructure in delivering high performance connectivity.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ZEEZJERWQ2UB1sdTjWUtM/f90b11507ab24edbf84e9b4cfb9b1155/BLOG-2432-7.png" />
          </figure><p></p>
    <div>
      <h3>Traffic control</h3>
      <a href="#traffic-control">
        
      </a>
    </div>
    <p>Consider a navigation system using 1) GPS to identify the route and 2) a highway toll pass that is valid until your final destination and allows you to drive straight through toll stations without stopping. Our backbone works quite similarly.</p><p>Our global backbone is built upon two key pillars. The first is BGP (<a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">Border Gateway Protocol</a>), the routing protocol for the Internet, and the second is Segment Routing MPLS (<a href="https://www.cloudflare.com/learning/network-layer/what-is-mpls/">Multiprotocol label switching</a>), a technique for steering traffic across predefined forwarding paths in an IP network. By default, Segment Routing provides end-to-end encapsulation from ingress to egress routers, in which the intermediate nodes perform no route lookups. Instead, they forward traffic across an end-to-end virtual circuit, or tunnel, called a label-switched path. Once traffic is put on a label-switched path, it cannot detour onto the public Internet and must continue on the predetermined route across Cloudflare’s backbone. This is nothing new, as many networks will even run a “BGP Free Core” where all the route intelligence is carried at the edge of the network, and intermediate nodes only participate in forwarding from ingress to egress.</p><p>By leveraging Segment Routing Traffic Engineering (SR-TE) in our backbone, we can automatically select paths between our data centers that are optimized for latency and performance. Sometimes the “shortest path” in terms of routing protocol cost is not the lowest latency or highest performance path.</p>
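<p>This difference is easy to demonstrate: run the same shortest-path computation over two different per-link metrics. The topology and numbers below are invented for illustration, not our actual network:</p>

```python
import heapq

def shortest_path(graph, src, dst, metric):
    """Dijkstra over whichever per-link metric we choose."""
    queue, seen = [(0, src, [src])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, metrics in graph[node].items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + metrics[metric], nxt, path + [nxt]))
    return None

# Each link carries two metrics: an IGP cost and a measured latency in ms.
graph = {
    "AMS": {"FRA": {"igp": 10, "ms": 9},   "LHR": {"igp": 10, "ms": 4}},
    "FRA": {"AMS": {"igp": 10, "ms": 9},   "SIN": {"igp": 10, "ms": 160}},
    "LHR": {"AMS": {"igp": 10, "ms": 4},   "SIN": {"igp": 30, "ms": 140}},
    "SIN": {"FRA": {"igp": 10, "ms": 160}, "LHR": {"igp": 30, "ms": 140}},
}
print(shortest_path(graph, "AMS", "SIN", "igp"))  # cheapest IGP path: via FRA
print(shortest_path(graph, "AMS", "SIN", "ms"))   # lowest latency: via LHR
```

<p>With SR-TE, the lower-latency path is what gets encoded as a segment list (label stack) at the ingress router, so intermediate nodes simply forward along it regardless of what the IGP shortest path would be.</p>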
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6QettBytPdJxacwVLVHYFN/de95a8e5a67514e64931fbe4d26967b6/BLOG-2432-8.png" />
          </figure>
    <div>
      <h3>Supercharged: Argo and the global backbone</h3>
      <a href="#supercharged-argo-and-the-global-backbone">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/application-services/products/argo-smart-routing/">Argo Smart Routing</a> is a service that uses Cloudflare’s portfolio of backbone, transit, and peering connectivity to find the most performant path between the data center where a user’s request lands and your back-end origin server. Argo may forward a request from one Cloudflare data center to another on the way to an origin if the performance would improve by doing so. <a href="http://blog.cloudflare.com/orpheus-saves-internet-requests-while-maintaining-speed">Orpheus</a> is the counterpart to Argo, and routes around degraded paths for all customer origin requests free of charge. Orpheus is able to analyze network conditions in real-time and actively avoid reachability failures. Customers with Argo enabled get optimal performance for requests from Cloudflare data centers to their origins, while Orpheus provides error self-healing for all customers universally. By mixing our global backbone using Segment Routing as an underlay with <a href="https://www.cloudflare.com/application-services/products/argo-smart-routing/">Argo Smart Routing</a> and Orpheus as our connectivity overlay, we are able to transport critical customer traffic along the most optimized paths that we have available.</p><p>So how exactly does our global backbone fit together with Argo Smart Routing? 
<a href="http://blog.cloudflare.com/argo-and-the-cloudflare-global-private-backbone">Argo Transit Selection</a> is an extension of Argo Smart Routing where the lowest latency path between Cloudflare data center hops is explicitly selected and used to forward customer origin requests. The lowest latency path will often be our global backbone, as it is a more dedicated and private means of connectivity, as opposed to third-party transit networks.</p><p>Consider a multinational Dutch pharmaceutical company that relies on Cloudflare's network and services with our <a href="https://www.cloudflare.com/learning/access-management/what-is-sase/">SASE solution</a> to connect their global offices, research centers, and remote employees. Their Asian branch offices depend on Cloudflare's security solutions and network to provide secure access to important data from their central data centers back to their offices in Asia. In case of a cable cut between regions, our network would automatically look for the best alternative route between them so that business impact is limited.</p><p>Argo measures every potential combination of the different provider paths, including our own backbone, as an option for reaching origins with smart routing. Because of our vast interconnection with so many networks, and our global private backbone, Argo is able to identify the most performant network path for requests. The backbone is consistently one of the lowest latency paths for Argo to choose from.</p><p>In addition to high performance, we care greatly about network reliability for our customers. This means we need to be as resilient as possible from fiber cuts and third-party transit provider issues. 
During a disruption of the <a href="https://en.wikipedia.org/wiki/AAE-1">AAE-1</a> (<a href="https://www.submarinecablemap.com/submarine-cable/asia-africa-europe-1-aae-1">Asia Africa Europe-1</a>) submarine cable, this is what Argo saw between Singapore and Amsterdam across some of our transit provider paths vs. the backbone.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/66CGBePnLzuLRuTErvf8Cr/813b4b60a95935491e967214851e5a04/BLOG-2432-9.png" />
          </figure><p>The large (purple line) spike shows a latency increase on one of our third-party IP transit provider paths due to congestion, which was eventually resolved, likely following traffic engineering within the provider’s network. We saw a smaller latency increase (yellow line) over other transit networks, but still one that is noticeable. The bottom (green) line on the graph is our backbone, where round-trip time remains more or less flat throughout the event, due to our diverse backbone connectivity between Asia and Europe. Throughout the fiber cut, we remained stable at around 200ms between Amsterdam and Singapore. There was no noticeable network hiccup as was seen on the transit provider paths, so Argo actively leveraged the backbone for optimal performance.</p>
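<p>Conceptually, the path choice illustrated here reduces to picking the candidate with the lowest recently measured RTT. The toy model below uses invented path names and numbers, not Argo’s actual algorithm or measurements:</p>

```python
def pick_path(measurements, window=3):
    """Choose the candidate path with the lowest average RTT
    over the last `window` probes."""
    def recent_avg(samples):
        tail = samples[-window:]
        return sum(tail) / len(tail)
    return min(measurements, key=lambda path: recent_avg(measurements[path]))

# Probe history (ms) for three hypothetical Singapore -> Amsterdam paths
# during a submarine cable disruption:
measurements = {
    "transit-A": [205, 210, 390, 410, 400],  # congested after the cable cut
    "transit-B": [215, 220, 250, 260, 255],  # degraded, but less so
    "backbone":  [200, 201, 199, 202, 200],  # flat throughout
}
print(pick_path(measurements))  # backbone
```

<p>Because the comparison runs continuously on fresh probes, a path that spikes is steered around automatically, and traffic returns to it once its recent average recovers.</p>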
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1A8CdaGq8P2hF3DtIs9dQI/a10fdf3af9de917fb0036d38eace9905/BLOG-2432-10.png" />
          </figure>
    <div>
      <h3>Call to action</h3>
      <a href="#call-to-action">
        
      </a>
    </div>
    <p>As Argo improves performance in our network, Cloudflare Network Interconnects (<a href="https://developers.cloudflare.com/network-interconnect/">CNIs</a>) optimize getting onto it. We encourage our Enterprise customers to use our free CNIs as on-ramps onto our network whenever practical. In this way, you can fully leverage our network, including our robust backbone, and increase overall performance for every product within your Cloudflare Connectivity Cloud. In the end, our global network is our main product, and our backbone plays a critical role in it. By improving our services for everybody, everywhere, we continue to help build a better Internet.</p><p>If you want to be part of our mission, join us as a Cloudflare network on-ramp partner to offer secure and reliable connectivity to your customers by integrating directly with us. Learn more about our on-ramp partnerships and how they can benefit your business <a href="https://www.cloudflare.com/network-onramp-partners/">here</a>.</p>
            <category><![CDATA[Connectivity Cloud]]></category>
            <category><![CDATA[Anycast]]></category>
            <category><![CDATA[Argo Smart Routing]]></category>
            <category><![CDATA[Athenian Project]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Better Internet]]></category>
            <category><![CDATA[Cloudflare Network]]></category>
            <category><![CDATA[Magic Transit]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">WiHZr8Fb6WzdVjo0egsWW</guid>
            <dc:creator>Shozo Moritz Takaya</dc:creator>
            <dc:creator>Bryton Herdes</dc:creator>
        </item>
        <item>
            <title><![CDATA[Exam-ining recent Internet shutdowns in Syria, Iraq, and Algeria]]></title>
            <link>https://blog.cloudflare.com/syria-iraq-algeria-exam-internet-shutdown/</link>
            <pubDate>Fri, 21 Jun 2024 13:00:02 GMT</pubDate>
            <description><![CDATA[ Similar to actions taken over the last several years, governments in Syria, Iraq, and Algeria have again disrupted Internet connectivity nationwide in an attempt to prevent cheating on exams. We investigate how these disruptions were implemented, and their impact ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The practice of cheating on exams (or at least attempting to) is presumably as old as the concept of exams itself, especially when the results of the exam can have significant consequences for one’s academic future or career. As access to the Internet became more ubiquitous with the growth of mobile connectivity, and communication easier with an assortment of social media and messaging apps, a new avenue for cheating on exams emerged, potentially facilitating the sharing of test materials or answers. <a href="https://www.theguardian.com/technology/2016/may/18/iraq-shuts-down-internet-to-stop-pupils-cheating-in-exams">Over the last decade</a>, some governments have reacted to this perceived risk by taking aggressive action to prevent cheating, ranging from targeted DNS-based blocking/filtering to multi-hour nationwide shutdowns across multi-week exam periods.</p><p>Syria and Iraq are well-known practitioners of the latter approach, and we have covered past exam-related Internet shutdowns in Syria (<a href="/syria-exam-related-internet-shutdowns">2021</a>, <a href="/syria-sudan-algeria-exam-internet-shutdown">2022</a>, <a href="/q2-2023-internet-disruption-summary">2023</a>) and Iraq (<a href="/syria-sudan-algeria-exam-internet-shutdown">2022</a>, <a href="/exam-internet-shutdowns-iraq-algeria">2023</a>) here on the Cloudflare blog. It is now mid-June 2024, and exams in both countries took place over the last several weeks, and with those exams, regular nationwide Internet shutdowns. In addition, Baccalaureate exams also took place in Algeria, and we have written about related Internet disruptions there in the past (<a href="/syria-sudan-algeria-exam-internet-shutdown">2022</a>, <a href="/exam-internet-shutdowns-iraq-algeria">2023</a>). 
However, in contrast to the single daily shutdowns in Syria and Iraq, the Algerian government opted instead for two multi-hour disruptions each day – one in the morning, one in the afternoon – and appears to be pursuing a content blocking strategy, rather than a full nationwide shutdown.</p><p>As we have done in past years’ posts, we will not only examine the impact that these shutdowns have on Internet traffic, but also analyze routing information and traffic from other Cloudflare services in an effort to better understand how these shutdowns are being implemented.</p>
    <div>
      <h3>Syria</h3>
      <a href="#syria">
        
      </a>
    </div>
    <p>The Syrian Telecom Company, to their credit, publishes an exam schedule on social media, with the image below <a href="https://www.facebook.com/photo/?fbid=827972736029921&amp;set=a.449047400589125">published to their Facebook page</a>. The English version was created by applying Google Translate to the image. The schedule shows the date &amp; time of each Internet shutdown (“disconnection”), in addition to the subject(s) of that day’s exam(s). In 2024, exams started on May 26, and went through June 13.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/9lgp9NIpBSo2RWII1nUmh/1331a833433eda16be39e4fda41d3413/Screenshot-2024-06-20-at-1.00.58-PM.png" />
            
            </figure><p>In Syria, <a href="https://radar.cloudflare.com/as29256">AS29256 (Syrian Telecom)</a> is effectively the Internet, as shown <a href="https://radar.cloudflare.com/routing/as29256">in the table below</a>. While there are a few other <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">autonomous systems</a> (ASNs/ASes) registered in Syria, there are only two that currently announce IP address space to the public Internet. As such, the trends seen at a country level for Syria reflect those seen for AS29256, and this is clearly evident in the traffic graphs below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2XEcIEFQPHIKevCJZkVbaM/3be900a49f17d26e90505a2c77704bc0/unnamed--1--2.png" />
            
            </figure><p>Nationwide Internet shutdowns in Syria began on May 26, taking place for varying multi-hour periods from Sunday to Thursday for three consecutive weeks. The graphs below show Internet traffic from the country, as well as AS29256, dropping to zero during the scheduled shutdowns.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6uW4E8cmeiDolFehpOFXaE/113ee0007138f1eccebd2cec87ae2891/image42.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/FjHhDRQjGou5kNsIZNzQp/3cb60e3a92c1142f1c483b942db5afa2/image5-3.png" />
            
            </figure><p>In addition, graphs from the Cloudflare Radar <a href="https://radar.cloudflare.com/routing/">Routing</a> pages for <a href="https://radar.cloudflare.com/routing/sy">Syria</a> and <a href="https://radar.cloudflare.com/routing/as29256">AS29256</a> show the number of IPv4 and IPv6 prefixes being announced country-wide and by AS29256 dropping to at or near zero during each shutdown. This ultimately means that there is no Internet path back to systems (IP addresses) connected to Syrian Telecom. Below, we explore why this is important and problematic.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5xFTZkPlwMrvEhmn5Tnz42/b7eaffa70e91993663d39a1b5fff9682/image4-4.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1NhFb2khiuAhEJ9Fo8QN19/95f146770a68bf63b87726099eff0143/image47.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3QesstGGtNnaKfBMpRO6GL/f1ea77bb39c751aae45de68096426e00/image15-1.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7yXJYfsxpfK1mwQs5a2ZLw/b7432284cdd28841c22041eb1d4e323a/image30.png" />
            
            </figure><p>As has been <a href="/syria-sudan-algeria-exam-internet-shutdown">observed in the past</a>, the shutdowns in Syria are <a href="https://x.com/DougMadory/status/1138064496008806400">asymmetrical</a>. That is, traffic can exit the country (via AS29256), but there are no paths for responses to return. The impact of this approach is clearly evident in traffic to <a href="https://one.one.one.one/dns/">Cloudflare’s 1.1.1.1 DNS Resolver</a>. We continue to see traffic to the resolver when the shutdowns take place, and in fact, we see the traffic spike during the shutdowns, as the graph below shows.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2LSn47S6TsjvoLuDx5fKPU/6cc2d335fcbdf706e26887b84d873824/image49.png" />
            
            </figure><p>If we dig into traffic to 1.1.1.1 by protocol, we can see that it is driven by requests over <a href="https://www.cloudflare.com/learning/ddos/glossary/user-datagram-protocol-udp/">UDP</a> port 53, the <a href="https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?&amp;page=2">standard port</a> used for DNS requests over UDP and TCP. (Given the request pattern, UDP also appears to be the primary transport for the resolver traffic that we see from Syria.)</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4CDPlKnjVUF6ViD75S2Zwa/e27da9a43911fd9808bbcadbd097477a/image12-1.png" />
            
            </figure><p>If we remove the UDP line from the graph, we see that request volume for DNS over TCP port 53, as well as <a href="https://developers.cloudflare.com/1.1.1.1/encryption/dns-over-https/">DNS over HTTPS (DoH)</a> and <a href="https://developers.cloudflare.com/1.1.1.1/encryption/dns-over-tls/">DNS over TLS (DoT)</a>, all drops to zero during the shutdowns.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/vWtoXnCHPogP4HfSjm2G5/ea6f8b6fb58f58cff01c70d9ee592f75/image1-18.png" />
            
            </figure><p>Similarly, we can clearly see the shutdowns in HTTP(S) request-based traffic graphs as well, since HTTP(S) is also a TCP-based protocol.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7phbRNoQ9Kg4L1n5F7chjg/572136f844df3879c4486374f5bc5092/image35.png" />
            
            </figure><p>Why do we see this impact? With DNS over UDP, the client simply makes a request to the resolver – no multi-step handshake is involved, as with TCP. So in this case, 1.1.1.1 is receiving these requests, but as shown above, there’s no path for the response to reach the client. Because it hasn’t received a response, the client retries the request, and this flood of retries is manifested as the spike seen in the graphs above.</p><p>However, as we see above, request volume for DNS over TCP, as well as DoH, DoT, and HTTP(S) (which all use TCP), falls to zero during the shutdowns. The lack of a path back to the client means that the <a href="https://www.geeksforgeeks.org/tcp-3-way-handshake-process/">TCP 3-way handshake</a> can’t complete, and thus we don’t see DNS requests over these protocols.</p><p>In looking at 1.1.1.1 Resolver request volume from Syria for popular social media and messaging applications, we can see traffic for facebook.com most closely matches the spikes shown above. Removing facebook.com from the graph, we can also see similar, though more limited, increases for domains used by popular messaging applications WhatsApp, Signal, and Telegram. Facebook and WhatsApp are <a href="https://medialandscapes.org/country/syria/media/social-networks">reportedly</a> the most popular social media and messaging applications in Syria.</p>
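<p>To make the retry dynamic concrete, here is a minimal sketch (in Python, purely illustrative, and not a description of any real client’s implementation) of how a stub resolver behaves when DNS responses never return: each UDP query that times out is simply re-sent, multiplying inbound query volume at the resolver even though no answers ever arrive.</p>

```python
import struct

def build_dns_query(hostname, qtype=1, txid=0x1234):
    """Build a minimal DNS query packet: 12-byte header plus one question.

    qtype=1 is an A-record query; flags 0x0100 sets 'recursion desired'."""
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode() for label in hostname.split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)  # qclass 1 = IN

def query_with_retries(send, retries=3):
    """Model a stub resolver over UDP: fire the query, and if no response
    arrives before the timeout, send it again.

    `send` stands in for a socket sendto/recvfrom cycle; it returns the
    response bytes, or None on timeout (as during an asymmetric shutdown,
    when the query reaches the resolver but no return path exists)."""
    attempts = 0
    response = None
    for _ in range(retries):
        attempts += 1
        response = send()
        if response is not None:
            break
    return response, attempts
```

<p>During a shutdown, the send cycle always times out, so a single lookup generates several inbound queries; aggregated across a country’s clients, those retransmissions appear as spikes like the ones in the graphs above. A TCP-based lookup, by contrast, fails during the handshake and never produces a DNS query at all.</p>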
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7nmo0Zbae88h2VZhbRpd8f/85eaf702c9f638ce544d2b485114aa65/image18.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3gKoz2uKoz2kVIchd0C2vV/7a2a6107c35d3e1433d7d956e0af9fb6/image33.png" />
            
            </figure><p>Although we have focused on the analysis of traffic to Cloudflare’s DNS resolver, and the patterns seen within that traffic, it is also worth highlighting an interesting pattern observed in traffic to Cloudflare’s <a href="https://www.cloudflare.com/application-services/products/dns/">Authoritative DNS</a> platform. (<a href="https://www.cloudflare.com/en-gb/learning/dns/dns-server-types/">DNS resolvers</a> act as a middleman between clients, such as a laptop or phone, and an authoritative DNS server. <a href="https://www.cloudflare.com/en-gb/learning/dns/dns-server-types/">Authoritative DNS servers</a> contain information specific to the domain names they serve, including IP addresses and other types of records.)</p><p>The graph below shows bits/second traffic from Syria for Cloudflare’s <a href="https://www.cloudflare.com/application-services/products/dns/">authoritative DNS service</a> on June 13. (Similar patterns were observed during the other days when shutdowns occurred, but data volume limits the ability to create a graph showing an extended period of time.) In this graph, we can see that at the start of the shutdown (03:00 UTC), traffic rises sharply, effectively plateaus for the duration of the shutdown, and then returns to normal levels. We believe that the traffic pattern illustrated here could be the result of some local resolvers in Syria having the IP addresses for our authoritative DNS servers cached and continuing to make requests to them. The increased traffic level could be because they are retrying their queries after not receiving responses, but in a less aggressive fashion than the client applications driving the resolver traffic spikes shown above.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2fOqZ4ORPIY4znUD1HRb70/06d1cc0e24f13bbb23886eedea3a89a0/unnamed-2.png" />
            
            </figure><p>In summary, Syria appears to be implementing its Internet shutdowns not through filtering, but rather by simply not announcing its IP address space for the duration of the shutdown, thereby preventing any responses from returning to the originating requestor, whether client application, web browser, or local DNS resolver.</p>
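<p>Prefix withdrawals of this kind can be observed from public BGP data. As a sketch, the snippet below counts an AS’s announced prefixes using the RIPEstat Data API’s <code>announced-prefixes</code> endpoint (the endpoint is publicly documented by RIPE NCC, though the response field names used here should be treated as an assumption to verify); during a Syrian shutdown, the count for AS29256 would fall to at or near zero.</p>

```python
import json
from urllib.request import urlopen

# RIPEstat Data API endpoint (public; no authentication required)
RIPESTAT_URL = "https://stat.ripe.net/data/announced-prefixes/data.json?resource=AS{asn}"

def count_prefixes(payload):
    """Tally IPv4 vs IPv6 prefixes in an announced-prefixes response."""
    prefixes = [entry["prefix"] for entry in payload["data"]["prefixes"]]
    ipv6 = sum(1 for p in prefixes if ":" in p)
    return {"ipv4": len(prefixes) - ipv6, "ipv6": ipv6}

def fetch_announced_prefixes(asn):
    """Fetch live announcement data for an AS (network access required)."""
    with urlopen(RIPESTAT_URL.format(asn=asn)) as resp:
        return count_prefixes(json.load(resp))

# Example (requires network access): fetch_announced_prefixes(29256)
```

<p>Polling a count like this at regular intervals, and alerting when it drops sharply, is essentially what the Radar routing graphs above visualize at country and ASN granularity.</p>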
    <div>
      <h3>Iraq</h3>
      <a href="#iraq">
        
      </a>
    </div>
    <p>On May 19, the Iraqi Ministry of Communication <a href="https://moc.gov.iq/?article=767">posted an update</a> that stated (translated) <i>“The Ministry of Communications would like to note that the Internet service will be cut off for two hours during the general exams for intermediate studies, from six in the morning until eight in the morning, based on higher directives and at the request of the Ministry of Education.”</i> The post came nearly a year after the Iraqi Ministry of Communication <a href="https://www.kurdistan24.net/en/story/31453-Iraq%E2%80%99s-communication-ministry-refuses-to-enforce-internet-blackout-for-final-exams">refused a request from the Ministry of Education to shut down the Internet</a> during the baccalaureate exams as part of efforts to prevent cheating. On May 20, the Iraqi Ministry of Education <a href="https://www.facebook.com/Iraq.Ministry.of.Education/posts/pfbid07ny6LazyvGJED37iCmRkk9h9rNPWeEPtANVu8vaL8gknoaBmwgmVZX9a7LkSbhy2l">posted the schedule</a> for the upcoming set of exams to its Facebook page.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Wb00a9bDq0VYe1zhqFHAR/22aecee54a71bfd430789e976315b040/Screenshot-2024-06-21-at-11.07.18.png" />
            
            </figure><p>Iraq has a much richer network service provider environment than Syria does, with <a href="https://radar.cloudflare.com/routing/iq#a-ses-registered-in-iraq">over 150</a> <a href="https://www.cloudflare.com/en-gb/learning/network-layer/what-is-an-autonomous-system/">autonomous systems (ASNs)</a> registered in the country and announcing IP address space, compared to just <a href="https://radar.cloudflare.com/routing/sy#a-ses-registered-in-syria">two</a> (both Syrian Telecom) in Syria. Although traffic in Iraq is generally concentrated among the larger providers, shutdowns are rarely “complete” at a country level because not every autonomous system (network provider) in the country implements a shutdown. (This is due in part to the autonomous Kurdistan region in the north, which often implements similar shutdowns on its own schedule. Network providers in this region are included in Iraq’s country-level graphs.)</p><p>We can see this in a Cloudflare Radar traffic graph that shows the shutdowns at a country level, where traffic dropped by around 87% during each multi-hour shutdown. In addition to the five networks shown here (<a href="https://radar.cloudflare.com/as203214">AS203214 (HulumTele)</a>, <a href="https://radar.cloudflare.com/as199739">AS199739 (Earthlink)</a>, <a href="https://radar.cloudflare.com/as58322">AS58322 (Halasat)</a>, <a href="https://radar.cloudflare.com/as51684">AS51684 (Asiacell)</a>, and <a href="https://radar.cloudflare.com/as59588">AS59588 (Zainas)</a>), further analysis found more than 30 others where we observed a complete loss of traffic during the shutdowns, a number of them downstream of these providers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6YkIihynKuDtI63g32YJ0u/3c6f135f0e890145f3b0be6ba6659553/image45.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6W7vWmG8ivneqIJavrTLCU/1b0248818877d395987fca076af52ce6/image28.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ezNpiALAh32m1LyOng14j/ecb5d055692266f0f9c758976204baae/image38.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3bRSpefwBNbcdvMQ2kOkO1/1b072bfd9ffc362923d9a2db06908068/image8-3.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2WqSWExs5iBdfDlGTqjXlK/63c853a1b688e0bd4a328881a2b3c280/image22.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5iUNjwfTlp0hz8lqgEnaAV/caf0caa06f14d277ce9a874b0fb9738d/image44.png" />
            
            </figure><p>In contrast to Syria, the changes to announced IP address space during the shutdowns are much less severe in Iraq. Several of the shutdowns are correlated with a drop of ~20-25% in announced IPv4 address space, while a few others saw a drop closer to just 2%.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/kAYLZYHK3wnfSh5KNfc33/309f01763d1815a38bf55e1aa42e9725/image51.png" />
            
            </figure><p>At an ASN level, the changes in announced address space were mixed – <a href="https://radar.cloudflare.com/routing/as59588">AS59588 (Zainas)</a>, <a href="https://radar.cloudflare.com/routing/as199739">AS199739 (Earthlink)</a>, and <a href="https://radar.cloudflare.com/routing/as51684">AS51684 (Asiacell)</a> experienced a significant loss, while <a href="https://radar.cloudflare.com/routing/as203214">AS203214 (HulumTele)</a> and <a href="https://radar.cloudflare.com/routing/as58322">AS58322 (Halasat)</a> experienced little to no change.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3HoWb2U5b8uBf9aZX5PJAb/e4ae72f055f2f9fd627251769b16d6bd/image13-3.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1jJpJYqXgDGeb8So4bMWee/34ba179a6e9fff7b311e057134c34d97/image50.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6sVOgW0xr4mwXUU3id092u/6c198aeb32b840f1142e84ec06822f79/image39.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5H64bnOria4mzCtW7VEs40/178e8f4e060b3be172c97b762c334207/image20.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6k1oSwvj9pwTyHD9OMesEV/a28168e19e0f21ce771166872153c9a4/image24.png" />
            
            </figure><p>Similar to Syria, we can also look at 1.1.1.1 resolver traffic data to better understand how the shutdowns are being implemented. The country-level graphs below show no visible change in UDP traffic patterns, which would imply that responses from the resolver are, in fact, getting back to the clients. However, this likely isn't the case: the apparent stability is at least in part an artifact of the graph's time frame and hourly granularity, as well as of the inclusion of resolver traffic from Kurdish network providers (ASNs). The shutdowns are more clearly evident in the DNS-over-TCP and DNS-over-HTTPS graphs below, as well as in the graph for HTTP(S) request traffic (both mobile &amp; desktop), which is also TCP-based. In these graphs, the troughs on days that shutdowns occurred generally dip lower than those on the days that the Internet remained available.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/o1NrRFgAMr5FLKa2j4Ktb/b85d5ccf7a9cd7e560229b58130fc91e/image41.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4pD02P65uRmaiL73oLzAm9/44ab4e57a7e90ce367e2fc3c3f9e467a/image3-8.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2AscDvWNoxmwWeTGcKyDds/a36e4278a86f651bb75f3a789bb4840b/image27.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/51pfAyJ519aXX0w6KYCm5I/d055dafda12380a3b3b46e748e198ded/image32.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3XlAjHrSgzngJulJRIP36J/b5e8b3058c5441e8dcdff18b347c64e8/image16-1.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4zO7k4idsNJRtohBPYgXhD/676ddfe345adfcab86ea380bdcfa7e54/image43.png" />
            
            </figure><p>In looking at authoritative DNS traffic from Iraq during a shutdown (for June 13 as an example day, as above), we see evidence of a decline in traffic during the time the shutdown occurs.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5LQtyB4CbJBlUjw2PrIBjI/76cdc488ae2d8a0cc46c4441f9ac08ee/image34.png" />
            
            </figure><p>The decline in authoritative DNS traffic is more evident at an ASN level, such as in the graph below for AS203214 (HulumTele), effectively confirming that UDP traffic is not getting through here either.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/14fZ29LuL3wsJpTnTVbeku/466dd9542998c77af044647e5e3d49ac/image48.png" />
            
            </figure><p>Taken together, the traffic, 1.1.1.1 Resolver, and authoritative DNS observations reviewed here suggest that the Internet shutdowns taking place in Iraq are more complex than Syria’s, as it appears that both UDP and TCP traffic are unable to egress from impacted network providers. As not all impacted network providers are showing a complete loss of announced IP address space during the shutdowns, Iraq appears to be taking a different approach to disrupting Internet connectivity. Although analysis of our data doesn’t provide a definitive conclusion, there are several likely options, and network providers in the country may be combining more than one. These options revolve around:</p><ol><li><p><b>IP:</b> Block packets from reaching IP addresses. This may be done by withdrawing prefix announcements from the routing table (a brute force approach) or by blocking access to specific IP addresses, such as those associated with a specific application or service (a more surgical approach).</p></li><li><p><b>Connection:</b> Block connections based on <a href="https://www.cloudflare.com/learning/ssl/what-is-sni/">SNI</a>/HTTP headers, or other application data. If a network or on-path device is able to observe the server name (or other relevant headers/data), then the connection can be terminated.</p></li><li><p><b>DNS:</b> Operators of private or ‘internal’ DNS resolvers, offered by ISPs and enterprise environments for use by their own users, can apply content restrictions, blocking the resolution of hostnames associated with websites and other applications.</p></li></ol><p>The consequences of these options are covered in more detail <a href="/consequences-of-ip-blocking">in a blog post</a>. 
In addition, applying them at common network chokepoints, such as <a href="https://iraqixp.com/">AS212330 (IRAQIXP)</a> or <a href="https://radar.cloudflare.com/routing/as208293">AS208293</a> (<a href="https://alsalam.gov.iq/">AlSalam State Company</a>, associated with the Iraqi Ministry of Communications), can disrupt connectivity at multiple downstream ISPs, without those providers necessarily having to take action themselves.</p>
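<p>As a concrete illustration of the third option, the sketch below shows the core of a filtering resolver: a suffix-matched hostname blocklist consulted before resolution. It is a hypothetical toy (the domain names and the <code>upstream</code> callable are invented for illustration), not a description of any specific provider’s system.</p>

```python
# Hypothetical blocklist; a real deployment would load this from configuration.
BLOCKED_SUFFIXES = {"blocked-messaging.example", "blocked-social.example"}

def is_blocked(hostname, blocked=BLOCKED_SUFFIXES):
    """True if the hostname or any parent domain appears on the blocklist."""
    labels = hostname.rstrip(".").lower().split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))

def resolve(hostname, upstream):
    """Toy filtering resolver: refuse blocked names, otherwise delegate.

    `upstream` stands in for a normal recursive lookup; returning None
    models an NXDOMAIN-style refusal (answering with a sinkhole address
    is another common choice)."""
    if is_blocked(hostname):
        return None
    return upstream(hostname)
```

<p>Because this style of filtering only happens on the operator’s own resolvers, users can often bypass it by switching to a public resolver, which is one reason it tends to be combined with IP- or SNI-based blocking.</p>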
    <div>
      <h3>Algeria</h3>
      <a href="#algeria">
        
      </a>
    </div>
    <p>As we noted in blog posts in <a href="/syria-sudan-algeria-exam-internet-shutdown">2022</a> and <a href="/exam-internet-shutdowns-iraq-algeria">2023</a>, Algeria has a history of disrupting Internet connectivity during Baccalaureate exams. This has been taking place since <a href="https://www.bbc.com/news/world-africa-44557028">2018</a>, following widespread cheating in 2016 that saw questions leaked online both before and during tests. On March 13, the Algerian Ministry of Education <a href="https://www.aps.dz/en/health-science-technology/51394-ministry-of-education-announces-dates-for-middle-school-and-high-school-final-exams">announced</a> that the Baccalaureate exams would be held June 9-13. As expected, Internet disruptions were observed both country-wide and at a network level. Similar to previous years, two disruptions were observed each day. The first one began at 08:00 local time (07:00 UTC), and except for June 9, lasted three hours, ending at 11:00 local time (10:00 UTC). (On June 9, it lasted until 13:00 local time (12:00 UTC).) The second one began between 14:00-14:30 local time (13:00-13:30 UTC), and lasted until 16:00-17:00 local time (15:00-16:00 UTC) – the end time varied by day.</p><p>As seen in the graphs below, the impact to traffic was fairly nominal, suggesting that wide scale Internet shutdowns similar to those seen in Syria were not being implemented. While this is in line with 2023’s <a href="https://x.com/TheAlgiersPost/status/1535917324485656576">pronouncement</a> by the Minister of Education that there would be no Internet shutdown on exam days, <a href="https://x.com/search?f=live&amp;q=algeria%20exam%20until%3A2024-06-13%20since%3A2024-06-09&amp;src=typed_query">a number of posts on X</a> complained of broader cuts to Internet connectivity.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5b4bFM6GwqaslpXJNGxCSu/063e3729ef3197b88dfe367cc91fc0f4/image14-2.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5I0HpVYB36DNHhycNbRUjp/ecc17427d720b5ca90ed554946924dfc/image17.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1pOO1TV09EyRfv5hRlV3sX/950600eac0cd068804f2ce990e93e112/image25.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1brvTsdkG8vlUDB7CNQ6vL/2c4ba312f499adc4893fb8afa5378ca6/image2-6.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4mOPVdahSJQ0n6gzs7cGSL/86fa9d9e06718da3fa7a291c055773b3/image37.png" />
            
            </figure><p>Similar to the analysis above of the shutdowns in Syria and Iraq, we can also review changes to announced IP address space to better understand how connectivity was being disrupted. In this case, as the graphs below show, no meaningful changes to announced IPv4 address space were observed during the days the Baccalaureate exams were given. As such, the observed drops in traffic were not caused by routing changes.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7l6qUEMRcrdvShbzdd8y5I/5696832adbe82ab92cef0265df11bdc9/image52.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3gEPeuP0l9czg2nipM8Jjb/8f49d22da1c031d1db5ed2577ec8462f/image21.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6m0XeCwN3lYLot6AutTNaS/7b0034f1106d4e99eaaec28b1cd8e9a5/image6-3.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3oiRlajqzZGBY5Io0dbXkj/234f0120c0002f7ffcf69b6a8bbe0038/image40.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Kv6LndNtCYpsJUhUPF8UT/ae8359550b84298abd5d0994b475cd76/AD_4nXdmsCY4R4OwP5lh6E6PgQdXYDxwUTWl8o5A-sRdNCSBmRNe0Zq7-OlWczYH8tr8q75P8WLqOsd3Po-03gykFfJDJNgqXcOkX4i3KuVp73q1GW7aLXeTNAzkK7yU" />
            
            </figure><p>In the HTTP(S) request traffic graph below, the twice-daily disruptions are highlighted, with the morning one appearing as a nominal drop in traffic, and the afternoon one causing a more severe decline. (The graph shows request traffic aggregated at a country level, but the graphs for the ASNs listed above also show similar patterns.)</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7afFEaI4r2fREfV0o54f2/68f9fe97d63624a99e0868d0de5df096/image19.png" />
            
            </figure><p>In addition, similar patterns are observed in 1.1.1.1 resolver traffic at a country and ASN level, but only for DNS over TCP, DNS over TLS, and DNS over HTTPS, all of which leverage TCP. In the graph below showing only resolver traffic over UDP, there’s no clear evidence of disruptions. However, in the graph that shows resolver traffic over HTTPS, TCP, and TLS, a slight perturbation is visible in the morning, as traffic begins to rise for the day, and a sharper decrease is visible in the afternoon, with both disruptions aligning with the twice daily drops in traffic discussed above.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ydmjTs8hCqfZ4W969jSSy/4120e9a30b2c0c2015c6f097c1f8ee8b/image31.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7qSu9orpg29sE5T2cawABq/1ec368088682d0f3d34f1287b3ed7d03/image7-2.png" />
            
            </figure><p>These observations support the conjecture that the Algerian government is likely taking a more nuanced approach to restricting access to content, interfering in some fashion with TCP-based traffic. The conjecture is also supported by an internal tool that helps to understand connection tampering that is based on research co-designed and developed by members of the <a href="https://research.cloudflare.com/">Cloudflare Research</a> team. We will be launching insights into TCP connection tampering on Cloudflare Radar later in 2024 and, in the meantime, technical details can be found in the peer-reviewed paper titled <a href="https://research.cloudflare.com/publications/SundaraRaman2023/">Global, Passive Detection of Connection Tampering</a>.</p><p>The graph below, taken from the internal tool, highlights observed TCP connection tampering in connections from Algeria during the week that the Baccalaureate exams took place. While some baseline level of post-ACK and post-PSH tampering is consistently visible, we see significant increases in post-ACK twice a day during the exam period, at the times that align with the shifts in traffic discussed above. Technical descriptions of post-ACK and post-PSH tampering can be found in the <a href="https://developers.cloudflare.com/radar/glossary/#tcp-resets-and-timeouts">Cloudflare Radar glossary</a>, but in short, tampering post-ACK means an established TCP connection to Cloudflare’s server has been abruptly ended by one or more RST packets <i>before</i> the server sees data packets. Although clients do use RSTs, clients are more likely to close connections with a FIN (as specified by the <a href="https://datatracker.ietf.org/doc/html/rfc9293">RFC</a>). 
The RST method can also be used by middleboxes that (i) see the data packet, then (ii) drop the data packet, then (iii) send an RST to the server to force the server to close the connection (and very likely another RST to the client, for the same reason). Tampering post-PSH means that something on the path, like a middlebox, (i) saw something it didn't like on an established connection, then (ii) permitted the data to pass, but then (iii) sent an RST to force the endpoints to close the connection.</p>
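<p>The distinction between the two signatures can be sketched as a simple classification over the packets a server observes on one connection. The function below is a simplified, server-side illustration of that idea (not the detection pipeline described in the paper), assuming a flow is summarized as an ordered list of packet flags:</p>

```python
def classify_reset(events):
    """Classify an RST-terminated TCP flow, as seen from the server side.

    `events` is the ordered list of packet types the server observed on one
    connection, e.g. ["SYN", "ACK", "RST"]. Returns a coarse label in the
    spirit of the post-ACK / post-PSH distinction described above."""
    if "RST" not in events:
        return "no-reset"
    before_rst = events[:events.index("RST")]
    if "PSH" in before_rst:
        return "post-PSH"       # data reached the server before the reset
    if "ACK" in before_rst:
        return "post-ACK"       # handshake completed, but no data arrived
    return "pre-established"    # reset before the handshake completed
```

<p>Under this toy model, an SNI-triggered middlebox that drops the client’s first data packet and resets the connection produces a “post-ACK” flow at the server, matching the twice-daily increases described above.</p>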
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/75i4amnWWtCkUaMhftCllH/a612735891aef71331339b9164171d48/image11-1.png" />
            
            </figure><p>Looking beyond Cloudflare-sourced data, aggregated test results from the <a href="https://ooni.org/">Open Observatory of Network Interference (OONI)</a> also show evidence of anomalous behavior. Users of <a href="https://ooni.org/install/">OONI Probe</a>, a mobile and desktop app, can probe for potential blocking of websites, instant messaging apps, and censorship circumvention tools. Examining test results from users in Algeria for popular messaging platforms <a href="https://explorer.ooni.org/chart/mat?probe_cc=DZ&amp;since=2024-06-01&amp;until=2024-06-15&amp;time_grain=day&amp;axis_x=measurement_start_day&amp;test_name=whatsapp">WhatsApp</a>, <a href="https://explorer.ooni.org/chart/mat?probe_cc=DZ&amp;since=2024-06-01&amp;until=2024-06-15&amp;time_grain=day&amp;axis_x=measurement_start_day&amp;test_name=telegram">Telegram</a>, <a href="https://explorer.ooni.org/chart/mat?probe_cc=DZ&amp;since=2024-06-01&amp;until=2024-06-15&amp;time_grain=day&amp;axis_x=measurement_start_day&amp;test_name=signal">Signal</a>, and <a href="https://explorer.ooni.org/chart/mat?probe_cc=DZ&amp;since=2024-06-01&amp;until=2024-06-15&amp;time_grain=day&amp;axis_x=measurement_start_day&amp;test_name=facebook_messenger">Facebook Messenger</a> for the first two weeks of June, we clearly see the appearance of test results marked as “Anomaly” starting on June 9. (OONI defines “Anomaly” results as “<i>Measurements that provided signs of potential blocking</i>”.) OONI <a href="https://ooni.org/nettest/tor/">Tor test</a> <a href="https://explorer.ooni.org/chart/mat?probe_cc=DZ&amp;since=2024-06-01&amp;until=2024-06-20&amp;time_grain=day&amp;axis_x=measurement_start_day&amp;test_name=tor">results</a> also show a similar “Anomaly” pattern. 
Anomalous traffic patterns are also visible for <a href="https://transparencyreport.google.com/traffic/overview?hl=en&amp;fraction_traffic=start:1717200000000;product:19;region:DZ;end:1718495999999&amp;lu=fraction_traffic">Google Web Search</a>, <a href="https://transparencyreport.google.com/traffic/overview?hl=en&amp;fraction_traffic=start:1717200000000;product:21;region:DZ;end:1718495999999&amp;lu=fraction_traffic">YouTube</a>, and <a href="https://transparencyreport.google.com/traffic/overview?hl=en&amp;fraction_traffic=start:1717200000000;product:6;region:DZ;end:1718495999999&amp;lu=fraction_traffic">Gmail</a>.</p><p>Although the analysis of these observations and data sets doesn’t provide specific details about exactly how the observed Internet disruptions are being implemented, it strongly supports the supposition that network providers in Algeria are, in some fashion, interfering with TCP connections, but not blocking them outright or shutting down their networks completely. Given that popular messaging platforms, Google properties, Cloudflare’s 1.1.1.1 DNS resolver, and some number of Cloudflare customer sites all appear to be impacted, it suggests that a list of hostnames is being targeted for disruption/interference, <a href="/consequences-of-ip-blocking">either by the SNI or the destination IP address</a>.</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Perhaps recognizing the broad negative impact that brute-force nationwide Internet shutdowns have as a response to cheating on exams, some governments appear to be turning to more nuanced techniques, such as content blocking or connection tampering. However, because these are widely applied as well, they are arguably just as disruptive as a full nationwide Internet shutdown. Full shutdowns, such as those seen in Syria, are arguably easier to diagnose than the disruptions to connectivity seen in Iraq and Algeria, which appear to use approaches that are hard to specifically identify from the outside.</p><p>Visit <a href="https://radar.cloudflare.com/">Cloudflare Radar</a> for additional insights around these, and other, Internet disruptions. Follow us on social media at <a href="https://x.com/CloudflareRadar">@CloudflareRadar</a> (X), <a href="https://noc.social/@cloudflareradar">noc.social/@cloudflareradar</a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com">radar.cloudflare.com</a> (Bluesky), or contact us via email.</p> ]]></content:encoded>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[Internet Traffic]]></category>
            <category><![CDATA[Outage]]></category>
            <category><![CDATA[Internet Shutdown]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Consumer Services]]></category>
            <guid isPermaLink="false">7I3aMukuPURTotjQ1Njiei</guid>
            <dc:creator>David Belson</dc:creator>
        </item>
        <item>
            <title><![CDATA[Traffic anomalies and notifications with Cloudflare Radar]]></title>
            <link>https://blog.cloudflare.com/traffic-anomalies-notifications-radar/</link>
            <pubDate>Tue, 26 Sep 2023 13:00:37 GMT</pubDate>
            <description><![CDATA[ Cloudflare Radar now displays country and ASN traffic anomalies in the Outage Center as they are detected, as well as publishing anomaly information via API. We are also launching Radar notifications, enabling users to subscribe to notifications about traffic anomalies ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/64RU7fXZM4tF0ZKhlNAqcY/36a614227aeb00168ea9e08681f2638b/image13-3.png" />
            
            </figure><p>We launched the <a href="https://radar.cloudflare.com/outage-center">Cloudflare Radar Outage Center (CROC)</a> during Birthday Week 2022 as a way of keeping the community up to date on Internet disruptions, including outages and shutdowns, visible in Cloudflare’s traffic data. While some of the entries have their genesis in information from social media posts made by local telecommunications providers or civil society organizations, others are based on an internal traffic anomaly detection and alerting tool. Today, we’re adding this alerting feed to Cloudflare Radar, showing country and network-level traffic anomalies on the CROC as they are detected, as well as making the feed available via <a href="https://developers.cloudflare.com/api/operations/radar-get-traffic-anomalies">API</a>.</p><p>Building on this new functionality, as well as the <a href="/route-leak-detection-with-cloudflare-radar/">route leaks</a> and <a href="/bgp-hijack-detection/">route hijacks insights</a> that we recently launched <a href="https://radar.cloudflare.com/routing">on Cloudflare Radar</a>, we are also launching new Radar notification functionality, enabling you to subscribe to notifications about traffic anomalies, confirmed Internet outages, route leaks, or route hijacks. Using the <a href="https://dash.cloudflare.com/">Cloudflare dashboard’s</a> existing notification functionality, users can set up notifications for one or more countries or autonomous systems, and receive notifications when a relevant event occurs. Notifications may be sent via e-mail or webhooks — the available delivery methods <a href="https://developers.cloudflare.com/notifications/">vary according to plan level</a>.</p>
    <div>
      <h3>Traffic anomalies</h3>
      <a href="#traffic-anomalies">
        
      </a>
    </div>
    <p>Internet traffic generally follows a fairly regular pattern, with daily peaks and troughs at roughly the same volumes of traffic. However, while weekend traffic patterns may look similar to weekday ones, their traffic volumes are generally different. Similarly, holidays or national events can also cause traffic patterns and volumes to differ significantly from “normal”, as people shift their activities and spend more time offline, or as people turn to online sources for information about, or coverage of, the event. These traffic shifts can be newsworthy, and we have covered some of them in past Cloudflare blog posts (<a href="/how-the-coronation-of-king-charles-iii-affected-internet-traffic/">King Charles III coronation</a>, <a href="/easter-passover-ramadan-internet-trends-2023/">Easter/Passover/Ramadan</a>, <a href="/how-the-brazilian-presidential-elections-affected-internet-traffic/">Brazilian presidential elections</a>).</p><p>However, as you also know from reading our <a href="/tag/outage/">blog posts</a> and following <a href="https://twitter.com/CloudflareRadar">Cloudflare Radar</a> on social media, it is the more drastic drops in traffic that are a cause for concern. Some are the result of infrastructure damage from severe weather or a natural disaster like an earthquake and are effectively unavoidable, but getting timely insights into the impact of these events on Internet connectivity is valuable from a communications perspective. Other traffic drops have occurred when an authoritarian government orders mobile Internet connectivity to be shut down, or shuts down all Internet connectivity nationwide. 
Timely insights into these types of anomalous traffic drops are often critical from a human rights perspective, as Internet shutdowns are often used as a means of controlling communication with the outside world.</p><p>Over the last several months, the Cloudflare Radar team has been using an internal tool to identify traffic anomalies and post alerts for follow-up to a dedicated chat space. The companion blog post <a href="/detecting-internet-outages"><i>Gone Offline: Detecting Internet Outages</i></a> goes into deeper technical detail about the traffic analysis and anomaly detection methodologies that power this internal tool.</p><p>Many of these internal traffic anomaly alerts ultimately result in Outage Center entries and Cloudflare Radar social media posts. Today, we’re extending the <a href="https://radar.cloudflare.com/outage-center">Cloudflare Radar Outage Center</a> and publishing information about these anomalies as we identify them. As shown in the figure below, the new <b>Traffic anomalies</b> table includes the type of anomaly (location or ASN), the entity where the anomaly was detected (country/region name or autonomous system), the start time, duration, verification status, and an “Actions” link, where the user can view the anomaly on the relevant entity traffic page or subscribe to a notification. (If manual review of a detected anomaly finds that it is present in multiple Cloudflare traffic datasets and/or is visible in third-party datasets, such as Georgia Tech’s <a href="https://ioda.live/">IODA</a> platform, we will mark it as verified. Unverified anomalies may be false positives, or related to NetFlow collection issues, though we endeavor to minimize both.)</p>
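<p>As a rough illustration of this kind of detection, a drop can be flagged by comparing current traffic against a baseline built from comparable past intervals (for example, the same hour in prior weeks). The threshold and baseline choice here are illustrative assumptions, not the methodology detailed in the companion post:</p>

```python
# Illustrative sketch of traffic-drop detection -- the threshold and
# baseline are assumptions for illustration, not the production method.
from statistics import median

def is_anomalous_drop(current, baseline_samples, threshold=0.5):
    """Flag a drop when current traffic falls below `threshold` times the
    median of the baseline samples (e.g. the same hour in prior weeks)."""
    if not baseline_samples:
        return False  # no history to compare against
    expected = median(baseline_samples)
    return expected > 0 and current < threshold * expected
```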
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4avPigfW9GjhqKlH4U9XI2/35b3761b52b9046ed4fa03c6b68db4c1/pasted-image-0-6.png" />
            
            </figure><p>In addition to this new table, we have updated the <a href="https://radar.cloudflare.com/outage-center">Cloudflare Radar Outage Center</a> map to highlight where we have detected anomalies, as well as placing them into a broader temporal context in a new timeline immediately below the map. Anomalies are represented as orange circles on the map, and can be hidden with the toggle in the upper right corner. Double-bordered circles represent an aggregation across multiple countries, and zooming in to that area will ultimately show the number of anomalies associated with each country that was included in the aggregation. Hovering over a specific dot in the timeline displays information about the outage or anomaly with which it is associated.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3fqllfaS5B100bVx1f58w0/d1c4f33c47d8d9271f4c3cf77d0688a3/pasted-image-0--1--6.png" />
            
            </figure><p>Internet outage information has been available via the <a href="https://developers.cloudflare.com/api/operations/radar-get-annotations-outages">Radar API</a> since we launched the Outage Center and API in September 2022, and traffic anomalies are now available through a <a href="https://developers.cloudflare.com/api/operations/radar-get-traffic-anomalies">Radar API endpoint</a> as well. An example traffic anomaly API request and response are shown below.</p><p><b>Example request:</b></p>
            <pre><code>curl --request GET \
  --url https://api.cloudflare.com/client/v4/radar/traffic_anomalies \
  --header 'Content-Type: application/json' \
  --header 'X-Auth-Email: '</code></pre>
            <p><b>Example response:</b></p>
            <pre><code>{
  "result": {
    "trafficAnomalies": [
      {
        "asnDetails": {
          "asn": "189",
          "locations": {
            "code": "US",
            "name": "United States"
          },
          "name": "LUMEN-LEGACY-L3-PARTITION"
        },
        "endDate": "2023-08-03T23:15:00Z",
        "locationDetails": {
          "code": "US",
          "name": "United States"
        },
        "startDate": "2023-08-02T23:15:00Z",
        "status": "UNVERIFIED",
        "type": "LOCATION",
        "uuid": "55a57f33-8bc0-4984-b4df-fdaff72df39d",
        "visibleInDataSources": [
          "string"
        ]
      }
    ]
  },
  "success": true
}</code></pre>
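<p>As a sketch of consuming this endpoint, the response shown above can be reduced to a compact summary using only the fields present in the example (the helper name is ours, not part of the API):</p>

```python
import json

# Parse a Radar traffic-anomalies API response (shaped like the example
# above) into (entity, status, startDate, endDate) tuples.
def summarize_anomalies(response_text):
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    out = []
    for a in payload["result"]["trafficAnomalies"]:
        if a["type"] == "LOCATION":
            entity = a["locationDetails"]["name"]
        else:  # ASN-level anomaly
            entity = a["asnDetails"]["name"]
        out.append((entity, a["status"], a["startDate"], a.get("endDate")))
    return out
```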
            
    <div>
      <h3>Notifications overview</h3>
      <a href="#notifications-overview">
        
      </a>
    </div>
    <p>Timely knowledge about Internet “events”, such as drops in traffic or routing issues, is potentially of interest to multiple audiences. Customer service or help desk agents can use the information to help diagnose customer/user complaints about application performance or availability. Similarly, network administrators can use the information to better understand the state of the Internet outside their network. And civil society organizations can use the information to inform action plans aimed at maintaining communications and protecting human rights in areas of conflict or instability. With the new notifications functionality also being launched today, you can subscribe to be notified about observed traffic anomalies, confirmed Internet outages, route leaks, or route hijacks, at a country or autonomous system level. In the following sections, we discuss how to subscribe to and configure notifications, as well as the information contained within the various types of notifications.</p>
    <div>
      <h4>Subscribing to notifications</h4>
      <a href="#subscribing-to-notifications">
        
      </a>
    </div>
    <p>Note that you need to log in to the <a href="https://dash.cloudflare.com/">Cloudflare dashboard</a> to subscribe to and configure notifications. No purchase of Cloudflare services is necessary — just a verified email address is required to set up an account. While we would have preferred to not require a login, it enables us to take advantage of Cloudflare’s existing notifications engine, allowing us to avoid having to dedicate time and resources to building a separate one just for Radar. If you don’t already have a Cloudflare account, visit <a href="https://dash.cloudflare.com/sign-up">https://dash.cloudflare.com/sign-up</a> to create one. Enter your username and a unique strong password, click “Sign Up”, and follow the instructions in the verification email to activate your account. (Once you’ve activated your account, we also suggest activating <a href="https://www.cloudflare.com/learning/access-management/what-is-two-factor-authentication/">two-factor authentication (2FA)</a> as an additional security measure.)</p><p>Once you have set up and activated your account, you are ready to start creating and configuring notifications. The first step is to look for the Notifications (bullhorn) icon – the presence of this icon means that notifications are available for that metric — in the Traffic, Routing, and Outage Center sections on Cloudflare Radar. If you are on a country or ASN-scoped traffic or routing page, the notification subscription will be scoped to that entity.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2pthIoKtBxP566BiMTmIo7/f59c76a18de0f8d3f4f3ce8f020edbfb/image3-23.png" />
            
            </figure><p><i>Look for this icon in the Traffic, Routing, and Outage Center sections of Cloudflare Radar to start setting up notifications.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7BzESdUYejnGYbyOELT3xb/2f71cc74d0f760efe24ead5ef5aef5d9/pasted-image-0--2--4.png" />
            
            </figure><p><i>In the Outage Center, click the icon in the “Actions” column of an Internet outages table entry to subscribe to notifications for the related location and/or ASN(s). Click the icon alongside the table description to subscribe to notifications for all confirmed Internet outages.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JzDVk3W9GXmFs1EEMpI1y/64d825564b01ced81010d08c8f8b7e30/pasted-image-0--3--6.png" />
            
            </figure><p><i>In the Outage Center, click the icon in the “Actions” column of a Traffic anomalies table entry to subscribe to notifications for the related entity. Click the icon alongside the table description to subscribe to notifications for all traffic anomalies.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/USbTwPiy87LABK9yzvdAD/1bd91ad4a70fc53a1a2c7f51a400800d/pasted-image-0--4--2.png" />
            
            </figure><p><i>On country or ASN traffic pages, click the icon alongside the description of the traffic trends graph to subscribe to notifications for traffic anomalies or Internet outages impacting the selected country or ASN.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7uXs607M5vob0cRMFuyPrG/638d601eecfdc6b89b90770e9482584f/pasted-image-0--5--2.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/266gINGnVLwOGMcogtGoql/4ad8994d134b1a71365d07a50fb725d1/pasted-image-0--6--1.png" />
            
            </figure><p><i>On country or ASN routing pages, click the icon alongside the description to subscribe to notifications for route leaks or origin hijacks related to the selected country or ASN.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7LXt4HacyxqtpJbgj6sQ6T/b8ec62dbc5a9447ed37ace2844ceabed/pasted-image-0--7--1.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2SWPlKRV9xUQxyXsfqbDBj/b4f0253d65e3aec230b3f5ce7e126b1d/pasted-image-0--8--1.png" />
            
            </figure><p><i>Within the Route Leaks or Origin Hijacks tables on the routing pages, click the icon in a table entry to subscribe to notifications for route leaks or origin hijacks for referenced countries and/or ASNs.</i> </p><p>After clicking a notification icon, you’ll be taken to the Cloudflare login screen. Enter your username and password (and 2FA code if required), and once logged in, you’ll see the Add Notification page, pre-filled with the key information passed through from the referring page on Radar, including relevant locations and/or ASNs. (If you are already logged in to Cloudflare, then you’ll be taken directly to the Add Notification page after clicking a notification icon on Radar.) On this page, you can name the notification, add an optional description, and adjust the location and ASN filters as necessary. Enter an email address for notifications to be sent to, or select an established webhook destination (if you have webhooks enabled on your account).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3WoiB73UcjTrzvt3KMeOgm/2146fb68b612f8d82897e8dfd46f66c3/pasted-image-0--9--1.png" />
            
            </figure><p>Click “Save”, and the notification is added to the Notifications Overview page for the account.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2g3sIazFDddiBRi12oMzYr/00100d832d84fc0d54379341ee575692/pasted-image-0--10--1.png" />
            
            </figure><p>You can also create and configure notifications directly within Cloudflare, without starting from a link on a Radar page. To do so, log in to Cloudflare, and choose “Notifications” from the left side navigation bar. That will take you to the Notifications page shown below. Click the “Add” button to add a new notification.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5mw4IUl2YxYOXQdM3pfcrU/0d4979bf37b64a2ffe29ad1e084b3afc/pasted-image-0--11--1.png" />
            
            </figure><p>On the next page, search for and select “Radar” from the list of Cloudflare products for which notifications are available.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6dq1Kt2F5XIJlV3X1r6Jyw/beb22e11b950f73130eec41c8b612350/pasted-image-0--12--1.png" />
            
            </figure><p>On the subsequent “Add Notification” page, you can create and configure a notification from scratch. Event types can be selected in the “Notify me for:” field, and both locations and ASNs can be searched for and selected within the respective “Filtered by (optional)” fields. Note that if no filters are selected, then notifications will be sent for <b>all</b> events of the selected type(s). Add one or more emails to send notifications to, or select a webhook target if available, and click “Save” to add it to the list of notifications configured for your account.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/bbIKwireM6CKWHJS92IvU/2770fd9d6c355c3e9e1c7010528e8053/pasted-image-0--13-.png" />
            
            </figure><p>It is worth mentioning that advanced users can also create and configure notifications through the <a href="https://developers.cloudflare.com/api/operations/notification-policies-create-a-notification-policy">Cloudflare API Notification policies endpoint</a>, but we will not review that process within this blog post.</p>
    <div>
      <h4>Notification messages</h4>
      <a href="#notification-messages">
        
      </a>
    </div>
    <p>Example notification email messages are shown below for the various types of events. Each contains key information like the type of event, affected entities, and start time — additional relevant information is included depending on the event type. Each email includes both plaintext and HTML versions to accommodate multiple types of email clients. (Final production emails may vary slightly from those shown below.)</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4m9llI9iUghxk9WNmeXs5c/9c03b0fa19da713b02d05e3828d8e338/pasted-image-0--14-.png" />
            
            </figure><p><i>Internet outage notification emails include information about the affected entities, a description of the cause of the outage, start time, scope (if available), and the type of outage (Nationwide, Network, Regional, or Platform), as well as a link to view the outage in a Radar traffic graph.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6ojcoXp3rsEhPY5QvrzmLu/ae551c2efb6a8dc2424c8b5f6b8a1e13/pasted-image-0--15-.png" />
            
            </figure><p><i>Traffic anomaly notification emails simply include information about the affected entity and a start time, as well as a link to view the anomaly in a Radar traffic graph.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4odkGwBAyUklNM3EpEYjlD/a808cd14489ebba0eb9ca463bfce3e2c/pasted-image-0--16-.png" />
            
            </figure><p><i>BGP hijack notification emails include information about the hijacking and victim ASNs, affected IP address prefixes, the number of BGP messages (announcements) containing the hijacked routes, the number of peers announcing the hijack, detection timing, a confidence level on the event being a true hijack, and relevant tags, as well as a link to view details of the hijack event on Radar.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5xjHBbXtCiOKWEhgS4bfpU/b3ab59dc514a70dfdeaae820e10fbb44/pasted-image-0--17-.png" />
            
            </figure><p><i>BGP route leak notification emails include information about the AS that the leaked routes were learned from, the AS that leaked the routes, the AS that received and propagated the leaked routes, the number of affected prefixes, the number of affected origin ASes, the number of BGP route collector peers that saw the route leak, and detection timing, as well as a link to view details of the route leak event on Radar.</i></p><p>If you are sending notifications to webhooks, you can integrate those notifications into tools like Slack. For example, by following the directions in <a href="https://api.slack.com/messaging/webhooks">Slack’s API documentation</a>, creating a simple integration took just a few minutes and results in messages like the one shown below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6uMXY1xUfZdoyX2u1J5xWO/87ccc9c307304e78a698c9dfb060cc2a/pasted-image-0--18-.png" />
            
            </figure>
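<p>As a sketch of such an integration, a webhook receiver might reshape a Radar-style notification into Slack's incoming-webhook payload. The input field names below are illustrative assumptions, not Cloudflare's documented notification schema:</p>

```python
# Illustrative sketch: turn a Radar-style webhook notification into a
# Slack incoming-webhook payload. The input field names here are
# assumptions for illustration, not the documented notification schema.
import json

def to_slack_payload(event):
    """Build the JSON body to POST to a Slack incoming-webhook URL."""
    text = "Radar alert: {kind} affecting {entity} (started {start})".format(
        kind=event.get("type", "event"),
        entity=event.get("entity", "unknown"),
        start=event.get("startDate", "unknown"),
    )
    return json.dumps({"text": text})
```

<p>The resulting string would then be POSTed to the Slack incoming-webhook URL with a <code>Content-Type: application/json</code> header, per Slack's documentation linked above.</p>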
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Cloudflare’s unique perspective on the Internet provides us with near-real-time insight into unexpected drops in traffic, as well as potentially problematic routing events. While we’ve been sharing these insights with you over the past year, you had to visit Cloudflare Radar to figure out if there were any new “events”. With the launch of notifications, we’ll now automatically send you information about the latest events that you are interested in.</p><p>We encourage you to visit Cloudflare Radar to familiarize yourself with the information we publish about <a href="https://radar.cloudflare.com/outage-center">traffic anomalies</a>, <a href="https://radar.cloudflare.com/outage-center">confirmed Internet outages</a>, <a href="https://radar.cloudflare.com/routing">BGP route leaks</a>, and <a href="https://radar.cloudflare.com/routing">BGP origin hijacks</a>. Look for the notification icon on the relevant graphs and tables on Radar, and go through the workflow to set up and subscribe to notifications. (And don’t forget to sign up for a <a href="https://dash.cloudflare.com/">Cloudflare</a> account if you don’t have one already.) Please <a>send us feedback</a> about the notifications, as we are constantly working to improve them, and let us know how and where you’ve integrated Radar notifications into your own tools/workflows/organization.</p><p>Follow Cloudflare Radar on social media at <a href="https://twitter.com/CloudflareRadar">@CloudflareRadar</a> (Twitter), <a href="https://noc.social/@cloudflareradar">https://noc.social/@cloudflareradar</a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com">radar.cloudflare.com</a> (Bluesky).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3bnvOiFBE4NEjULFappDQN/9f9df4b7ee7c5f74ac242e00d77344cc/Announcement.png" />
            
            </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[Outage]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Internet Traffic]]></category>
            <category><![CDATA[Notifications]]></category>
            <guid isPermaLink="false">4OjzZwFN2RgWm8cgWlrPty</guid>
            <dc:creator>David Belson</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Radar's new BGP origin hijack detection system]]></title>
            <link>https://blog.cloudflare.com/bgp-hijack-detection/</link>
            <pubDate>Fri, 28 Jul 2023 13:00:26 GMT</pubDate>
            <description><![CDATA[ BGP origin hijacks allow attackers to intercept, monitor, redirect, or drop traffic destined for the victim's networks. We explain how Cloudflare built its BGP hijack detection system, from its design and implementation to its integration on Cloudflare Radar ]]></description>
            <content:encoded><![CDATA[ <p></p><p><a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">Border Gateway Protocol</a> (BGP) is the de facto inter-domain routing protocol used on the Internet. It enables networks and organizations to exchange reachability information for blocks of IP addresses (IP prefixes) among each other, thus allowing routers across the Internet to forward traffic to its destination. BGP was designed with the assumption that networks do not intentionally propagate falsified information, but unfortunately that’s not a valid assumption on today’s Internet.</p><p>Malicious actors on the Internet who control BGP routers can perform BGP hijacks by falsely announcing ownership of groups of IP addresses that they do not own, control, or route to. By doing so, an attacker is able to redirect traffic destined for the victim network to itself, and monitor and intercept its traffic. A BGP hijack is much like if someone were to change out all the signs on a stretch of freeway and reroute automobile traffic onto incorrect exits.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3eRfapZmJQLB67OmDnppNJ/29c5285c15fc25ad65a2b615b8abe131/image11.png" />
            
            </figure><p>You can learn more about <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">BGP</a> and <a href="https://www.cloudflare.com/learning/security/glossary/bgp-hijacking/">BGP hijacking</a> and its consequences in our learning center.</p><p>At Cloudflare, we have long been monitoring suspicious BGP anomalies internally. With our recent efforts, we are bringing BGP origin hijack detection to the <a href="https://radar.cloudflare.com/security-and-attacks">Cloudflare Radar</a> platform, sharing our detection results with the public. In this blog post, we will explain how we built our detection system and how people can use Radar and its APIs to integrate our data into their own workflows.</p>
    <div>
      <h2>What is BGP origin hijacking?</h2>
      <a href="#what-is-bgp-origin-hijacking">
        
      </a>
    </div>
    <p>Services and devices on the Internet locate each other using IP addresses. A block of IP addresses is called an IP prefix (or just a prefix for short), and multiple prefixes from the same organization are aggregated into an <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">autonomous system</a> (AS).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1V7wyuIuBZwT9BV8uQjBRs/93167df81577e93c6e01dcae78883700/Screenshot-2023-07-26-at-18.26.17.png" />
            
            </figure><p>Using the BGP protocol, ASes announce which routes can be imported or exported to other ASes and routers from their routing tables. This is called the AS routing policy. Without this routing information, operating the Internet on a large scale would quickly become impractical: data packets would get lost or take too long to reach their destinations.</p><p>During a BGP origin hijack, an attacker creates fake announcements for a targeted prefix, falsely identifying an <a href="https://developers.cloudflare.com/radar/glossary/#autonomous-systems">autonomous system (AS)</a> under their control as the origin of the prefix.</p><p>In the following graph, we show an example where <code>AS 4</code> announces the prefix <code>P</code> that was previously originated by <code>AS 1</code>. The receiving parties, i.e. <code>AS 2</code> and <code>AS 3</code>, accept the hijacked routes and forward traffic toward prefix <code>P</code> to <code>AS 4</code> instead.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4868WfpNphX6eCHwMpZKGr/9edb6690b917913bdfcbc361eadbbae3/image2-15.png" />
            
            </figure><p>As you can see, both the normal and the hijacked traffic flow in the direction opposite to that of the BGP announcements we receive.</p><p>If successful, this type of attack will result in the dissemination of the falsified prefix origin announcement throughout the Internet, causing network traffic previously intended for the victim network to be redirected to the AS controlled by the attacker. As an example of a famous BGP hijack attack, in 2018 <a href="/bgp-leaks-and-crypto-currencies/">someone was able</a> to convince parts of the Internet to reroute traffic for AWS to malicious servers where they used DNS to redirect MyEtherWallet.com, a popular crypto wallet, to a hacked page.</p>
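<p>The scenario in the graph can be illustrated with a toy origin check that compares each announcement's origin AS against the origins previously recorded for the prefix. This is a deliberately simplified sketch, not Cloudflare's actual detection system, which weighs many more signals before raising an alert:</p>

```python
# Simplified origin-change check -- illustrative only, not the Radar
# detection pipeline. `expected_origins` maps a prefix to the set of
# origin ASNs previously observed (or authorized) for it.

def check_origin(prefix, origin_asn, expected_origins):
    """Return 'ok' if the announcing AS is a known origin for the prefix,
    'possible-hijack' if the prefix is known but announced by a new AS,
    and 'unknown-prefix' if we have no history for the prefix."""
    known = expected_origins.get(prefix)
    if known is None:
        return "unknown-prefix"
    return "ok" if origin_asn in known else "possible-hijack"
```

<p>In the figure's terms, an announcement of prefix <code>P</code> by <code>AS 4</code> when only <code>AS 1</code> has previously originated <code>P</code> would be flagged.</p>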
    <div>
      <h2>Prevention mechanisms and why they’re not perfect (yet)</h2>
      <a href="#prevention-mechanisms-and-why-theyre-not-perfect-yet">
        
      </a>
    </div>
    <p>The key difficulty in preventing BGP origin hijacks is that the BGP protocol itself does not provide a mechanism to validate the announcement content. In other words, the original BGP protocol does not provide any authentication or ownership safeguards; any route can be originated and announced by any random network, independent of its rights to announce that route.</p><p>To address this problem, operators and researchers have proposed the <a href="https://en.wikipedia.org/wiki/Resource_Public_Key_Infrastructure">Resource Public Key Infrastructure (RPKI)</a> to store and validate prefix-to-origin mapping information. With RPKI, operators can prove the ownership of their network resources and create ROAs, short for Route Origin Authorizations, cryptographically signed objects that define which Autonomous System (AS) is authorized to originate a specific prefix.</p><p>Cloudflare has <a href="/rpki/">supported RPKI</a> since the early days of the <a href="https://datatracker.ietf.org/doc/html/rfc6480">RFC</a>. With RPKI, IP prefix owners can store and share the ownership information securely, and other operators can validate BGP announcements by checking the prefix origin against the information stored in RPKI. Any attempt to announce an IP prefix with an incorrect origin AS will fail validation, and such invalid BGP messages will be discarded. This validation process is referred to as route origin validation (ROV).</p><p>To further advocate for RPKI deployment and filtering of RPKI-invalid announcements, Cloudflare has been providing an RPKI test service, <a href="https://isbgpsafeyet.com/">Is BGP Safe Yet?</a>, allowing users to test whether their ISP filters RPKI-invalid announcements. We also provide rich information about the RPKI status of individual prefixes and ASes at <a href="https://rpki.cloudflare.com/">https://rpki.cloudflare.com/</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4MNPvwC6PpJCPQCIQhC8sl/82309707277d649b0b810ed6e3028947/image8-1.png" />
            
            </figure><p><b>However</b>, the effectiveness of RPKI in preventing BGP origin hijacks depends on two factors:</p><ol><li><p>The share of prefix owners that register their prefixes in RPKI;</p></li><li><p>The share of networks performing route origin validation.</p></li></ol><p>Unfortunately, neither share is at a satisfactory level yet. As of today, July 27, 2023, only about 45% of the IP prefixes routable on the Internet are covered by a ROA in RPKI. The remaining prefixes are highly vulnerable to BGP origin hijacks. Even the 45% of prefixes that are covered by a ROA can still be affected by origin hijack attempts, due to the low share of networks that perform route origin validation (ROV). Based on our <a href="/rpki-updates-data/">recent study</a>, only 6.5% of Internet users are protected by ROV from BGP origin hijacks.</p><p>Despite the benefits of RPKI and RPKI ROAs, their effectiveness in preventing BGP origin hijacks is limited by the slow adoption and deployment of these technologies. Until we achieve a high rate of RPKI ROA registration and RPKI invalid filtering, BGP origin hijacks will continue to pose a significant threat to the daily operations of the Internet and the security of everyone connected to it. Therefore, it’s also essential to prioritize developing and deploying BGP monitoring and detection tools to enhance the security and stability of the Internet's routing infrastructure.</p>
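    <p>The route origin validation process described above can be sketched in a few lines of Python. This is a simplified illustration rather than any production implementation, using a single made-up ROA for the documentation prefix <code>192.0.2.0/24</code>: a route is valid if a covering ROA authorizes its origin AS within the max-length, invalid if it is covered but unauthorized, and unknown if no ROA covers it.</p>

```python
import ipaddress

# Hypothetical ROA set for illustration: (prefix, max-length, authorized ASN).
ROAS = [
    (ipaddress.ip_network("192.0.2.0/24"), 24, 64496),
]

def rov_state(prefix: str, origin_asn: int) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'unknown'."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_net, max_len, roa_asn in ROAS:
        # A ROA covers the route when the announced prefix falls inside it.
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            # The origin must match and the prefix must not be too specific.
            if net.prefixlen <= max_len and origin_asn == roa_asn:
                return "valid"
    return "invalid" if covered else "unknown"
```

    <p>With this toy ROA set, an announcement of <code>192.0.2.0/24</code> by AS 64496 validates, the same prefix originated by any other AS is invalid, and an uncovered prefix stays unknown.</p>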
    <div>
      <h2>Design of Cloudflare’s BGP hijack detection system</h2>
      <a href="#design-of-cloudflares-bgp-hijack-detection-system">
        
      </a>
    </div>
    <p>Our system comprises multiple data sources and three distinct modules that work together to detect and analyze potential BGP hijack events: the prefix origin change detection module, the hijack detection module, and the alerts storage and notification module.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/67bYKbfT4zBONmISARpcta/8790aaad3f3db54ba012bd751dd393c9/image6-7.png" />
            
            </figure><p>The Prefix Origin Change Detection module provides the data, the Hijack Detection module analyzes the data, and the Alerts Storage and Delivery module stores and provides access to the results. Together, these modules work in tandem to provide a comprehensive system for detecting and analyzing potential BGP hijack events.</p>
    <div>
      <h3>Prefix origin change detection module</h3>
      <a href="#prefix-origin-change-detection-module">
        
      </a>
    </div>
    <p>At its core, the BGP protocol involves:</p><ol><li><p>Exchanging prefix reachability (routing) information;</p></li><li><p>Deciding where to forward traffic based on the reachability information received.</p></li></ol><p>The reachability change information is encoded in BGP update messages, while the routing decision results are encoded as a routing information base (RIB) on the routers, also known as the <a href="https://en.wikipedia.org/wiki/Routing_table">routing table</a>.</p><p>In our origin hijack detection system, we focus on investigating BGP <a href="https://datatracker.ietf.org/doc/html/rfc4271">update messages</a> that contain changes to the origin ASes of any IP prefixes. There are two types of BGP update messages that could indicate prefix origin changes: <b>announcements</b> and <b>withdrawals</b>.</p><p>Announcements include an AS-level path toward one or more prefixes. The path tells the receiving parties through which sequence of networks (ASes) one can reach the corresponding prefixes. The last hop of an AS path is the origin AS. In the following diagram, AS 1 is the origin AS of the announced path.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4bF9IfSM2X5mtlWgqsKt4e/8a30a1f33717082d8e5d0f0ba35ae68a/image4-6.png" />
            
            </figure><p>Withdrawals, on the other hand, simply inform the receiving parties that the prefixes are no longer reachable.</p><p>Both types of messages are stateless. They inform us of the current route changes, but provide no information about the previous states. As a result, detecting origin changes is not as straightforward as one may think. Our system needs to keep track of historical BGP updates and build some state over time so that we can verify whether a BGP update contains origin changes.</p><p>We didn't want to deal with a complex system like a database to manage the state of all the prefixes we see across all the BGP updates we receive. Fortunately, computer science gives us the <a href="https://en.wikipedia.org/wiki/Trie">prefix trie</a>, a data structure for storing and looking up string-indexed data, which is ideal for our use case. We ended up developing a fast, custom Rust-based IP prefix trie that holds the relevant information, such as the origin ASN and the AS path, for each IP prefix, and that allows the information to be updated based on BGP announcements and withdrawals.</p><p>The figure below shows an example of the AS path information for prefix <code>192.0.2.0/24</code> stored in a prefix trie. When updating the information in the prefix trie, if we see a change of origin ASN for any given prefix, we record the BGP message as well as the change, and create an <code>Origin Change Signal</code>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2tZK0inZhfzbvgpQIBKkFE/4215c1630012b91785c9a79866117eae/Screenshot-2023-07-26-at-18.20.07.png" />
            
            </figure><p>The prefix origin change detection module collects and processes live-stream and historical BGP data from various sources. For <a href="https://www.cloudflare.com/developer-platform/solutions/live-streaming/">live streams</a>, our system applies a thin layer of data processing to translate BGP messages into our internal data structure. At the same time, for historical archives, we use a dedicated deployment of the <a href="https://bgpkit.com/broker">BGPKIT broker</a> and <a href="https://bgpkit.com/parser">parser</a> to convert MRT files from <a href="https://www.routeviews.org/">RouteViews</a> and <a href="https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris">RIPE RIS</a> into BGP message streams as they become available.</p><p>After the data is collected, consolidated, and normalized, the module creates, maintains, and destroys the prefix tries so that we know what changed relative to previous BGP announcements from the same peers. Based on these comparisons, we then send enriched messages downstream to be analyzed.</p>
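    <p>To make the state-tracking idea concrete, here is a minimal Python sketch of per-peer origin-change detection. It uses a plain dictionary as a stand-in for our Rust prefix trie, and the names and structure are illustrative rather than the actual implementation:</p>

```python
# (peer, prefix) -> last observed origin ASN; a stand-in for the prefix trie.
state = {}

def process_update(peer, prefix, origin_asn):
    """Apply one BGP update; origin_asn=None models a withdrawal.
    Returns an origin-change signal when the origin differs from the
    previously recorded one, otherwise None."""
    key = (peer, prefix)
    old = state.get(key)
    if origin_asn is None:
        # Withdrawal: the prefix is no longer reachable via this peer.
        state.pop(key, None)
        return None
    state[key] = origin_asn
    if old is not None and old != origin_asn:
        return {"peer": peer, "prefix": prefix, "old": old, "new": origin_asn}
    return None
```

    <p>Replaying an announcement of the same prefix with a new origin yields a signal, which is what gets handed to the hijack detection module downstream.</p>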
    <div>
      <h3>Hijack detection module</h3>
      <a href="#hijack-detection-module">
        
      </a>
    </div>
    <p>Determining whether BGP messages suggest a hijack is a complex task, and no common scoring mechanism can be used to provide a definitive answer. Fortunately, there are several types of data sources that can collectively provide a relatively good idea of whether a BGP announcement is legitimate or not. These data sources can be categorized into two types: inter-AS relationships and prefix-origin binding.</p><p>The inter-AS relationship datasets include AS2org and AS2rel datasets from <a href="https://www.caida.org/">CAIDA/UCSD</a>, AS2rel datasets from <a href="https://bgpkit.com/">BGPKIT</a>, AS organization datasets from <a href="https://www.peeringdb.com/">PeeringDB</a>, and <a href="/route-leak-detection-with-cloudflare-radar/#route-leak-detection">per-prefix AS relationship data</a> built at Cloudflare. These datasets provide information about the relationship between autonomous systems, such as whether they are upstream or downstream from one another, or if the origins of any change signal belong to the same organization.</p><p>Prefix-to-origin binding datasets include live RPKI validated ROA payload (VRP) from the <a href="https://rpki.cloudflare.com/">Cloudflare RPKI portal</a>, daily Internet Routing Registry (IRR) dumps curated and cleaned up by <a href="https://www.manrs.org/">MANRS</a>, and prefix and AS <a href="https://en.wikipedia.org/wiki/Bogon_filtering">bogon</a> lists (private and reserved addresses defined by <a href="https://datatracker.ietf.org/doc/html/rfc1918">RFC 1918</a>, <a href="https://datatracker.ietf.org/doc/html/rfc5735">RFC 5735</a>, and <a href="https://datatracker.ietf.org/doc/html/rfc6598">RFC 6598</a>). These datasets provide information about the ownership of prefixes and the ASes that are authorized to originate them.</p><p>By combining all these data sources, it is possible to collect information about each BGP announcement and answer questions programmatically. 
For this, we have a scoring function that takes all the evidence gathered for a specific BGP event as input and runs it through a sequence of checks. Each check contributes a neutral, positive, or negative weight to the final score. The higher the score, the more likely it is that the event is a hijack attempt.</p><p>The following diagram illustrates this sequence of checks:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4y6YiYezGew65wYeNFPOqs/f2883b0c385966acbc991be815bb4a1c/image1-12.png" />
            
            </figure><p>As you can see, for each event, several checks are involved that help calculate the final score: RPKI, Internet Routing Registry (IRR), bogon prefixes and ASNs lists, AS relationships, and AS path.</p><p>Our guiding principles are: if the newly announced origins are RPKI or IRR invalid, it’s more likely that it’s a hijack, but if the old origins are also invalid, then it’s less likely. We discard events about private and reserved ASes and prefixes. If the new and old origins have a direct business relationship, then it’s less likely that it’s a hijack. If the new AS path indicates that the traffic still goes through the old origin, then it’s probably not a hijack.</p><p>Signals that are deemed legitimate are discarded, while signals with a high enough confidence score are flagged as potential hijacks and sent downstream for further analysis.</p><p>It's important to reiterate that the decision is not binary but a score. There will be situations where we find false negatives or false positives. The advantage of this framework is that we can easily monitor the results, learn from additional datasets and conduct the occasional manual inspection, which allows us to adjust the weights, add new conditions and continue improving the score precision over time.</p>
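    <p>As a rough sketch of how such a scoring function can be structured (the actual checks and weights in our system differ and evolve over time; the values below are made up), each piece of evidence simply contributes a positive or negative weight to a running total:</p>

```python
# Illustrative weights only: positive evidence points toward a hijack,
# negative evidence points toward a legitimate origin change.
WEIGHTS = {
    "rpki_invalid": 3,          # new origin fails RPKI validation
    "irr_invalid": 2,           # new origin not found in IRR records
    "old_origin_invalid": -2,   # old origin was invalid too
    "direct_relationship": -3,  # origins have a direct business relationship
    "path_contains_old": -3,    # traffic still transits the old origin
    "sibling_origins": -3,      # both origins belong to the same organization
}

def hijack_score(evidence):
    """Sum the weights of the gathered evidence tags.
    The higher the score, the more likely the event is a hijack."""
    return sum(WEIGHTS.get(tag, 0) for tag in evidence)
```

    <p>An RPKI- and IRR-invalid origin change scores high, while the same change between sibling origins scores near zero and would be discarded as legitimate.</p>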
    <div>
      <h4>Aggregating BGP hijack events</h4>
      <a href="#aggregating-bgp-hijack-events">
        
      </a>
    </div>
    <p>Our BGP hijack detection system provides fast response time and requires minimal resources by operating on a per-message basis.</p><p>However, when a hijack is happening, the number of hijack signals can be overwhelming for operators to manage. To address this issue, we designed a method to aggregate individual hijack messages into <b>BGP hijack events</b>, thereby reducing the number of alerts triggered.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/20mBP4fcykkGkNIXwXqlAa/326f9aeed83dfd6c337bf374ae0f233b/image10.png" />
            
            </figure><p>An event aggregates BGP messages coming from the same hijacker and related to prefixes from the same victim. The start date is the date of the first suspicious signal. To determine the end of an event, we look for one of the following conditions:</p><ul><li><p>A BGP withdrawal message for the hijacked prefix: regardless of who sends the withdrawal, the route towards the prefix no longer goes via the hijacker, and thus the hijack of this prefix is considered finished.</p></li><li><p>A new BGP announcement message with the previous (legitimate) network as the origin: this indicates that the route towards the prefix is reverted to the state before the hijack, and the hijack is therefore considered finished.</p></li></ul><p>If all BGP messages for an event have been withdrawn or reverted, and there are no more new suspicious origin changes from the hijacker ASN for <b>six hours</b>, we mark the event as finished and set the end date.</p><p>Hijack events can capture both small-scale and large-scale attacks. Alerts are then based on these aggregated events, not individual messages, making it easier for operators to manage and respond appropriately.</p>
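    <p>The aggregation and termination rules above can be modeled in a few lines of Python. The sketch below is illustrative (the class and method names are made up), with the six-hour quiet period taken from the text:</p>

```python
from datetime import datetime, timedelta

QUIET_PERIOD = timedelta(hours=6)  # no new signals for six hours => finished

class HijackEvent:
    """Aggregates suspicious origin-change signals from one hijacker ASN
    against prefixes of one victim ASN."""

    def __init__(self, hijacker, victim, start):
        self.hijacker, self.victim, self.start = hijacker, victim, start
        self.open_prefixes = set()  # prefixes not yet withdrawn or reverted
        self.last_signal = start
        self.end = None

    def add_signal(self, prefix, when):
        self.open_prefixes.add(prefix)
        self.last_signal = when

    def resolve_prefix(self, prefix):
        """Call on a withdrawal, or a re-announcement by the legitimate origin."""
        self.open_prefixes.discard(prefix)

    def maybe_finish(self, now):
        """Mark the event finished once every prefix is resolved and the
        hijacker has been quiet for the full quiet period."""
        if not self.open_prefixes and now - self.last_signal >= QUIET_PERIOD:
            self.end = now
            return True
        return False
```

    <p>An event with any still-hijacked prefix stays open indefinitely; once the last prefix is withdrawn or reverted, the six-hour timer decides when to close it.</p>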
    <div>
      <h3>Alerts, Storage and Notifications module</h3>
      <a href="#alerts-storage-and-notifications-module">
        
      </a>
    </div>
    <p>This module provides access to detected BGP hijack events and sends out notifications to relevant parties. It handles storage of all detected events and provides a user interface for easy access and search of historical events. It also generates notifications and delivers them to the relevant parties, such as network administrators or security analysts, when a potential BGP hijack event is detected. Additionally, this module can build dashboards to display high-level information and visualizations of detected events to facilitate further analysis.</p>
    <div>
      <h3>Lightweight and portable implementation</h3>
      <a href="#lightweight-and-portable-implementation">
        
      </a>
    </div>
    <p>Our BGP hijack detection system is implemented as a Rust-based command line application that is lightweight and portable. The whole detection pipeline runs off a single binary that connects to a PostgreSQL database, essentially a complete, self-contained BGP data pipeline. And if you are wondering, yes, the full system, including the database, can run well on a laptop.</p><p>The runtime cost mainly comes from maintaining the in-memory prefix tries for each full-feed router, each costing roughly 200 MB of RAM. For the beta deployment, we use about 170 full-feed peers, and the whole system runs well on a single 32 GB node with 12 threads.</p>
    <div>
      <h2>Using the BGP Hijack Detection</h2>
      <a href="#using-the-bgp-hijack-detection">
        
      </a>
    </div>
    <p>The BGP Hijack Detection results are now available on both the <a href="https://radar.cloudflare.com/security-and-attacks">Cloudflare Radar</a> website and the <a href="https://developers.cloudflare.com/api/operations/radar-get-bgp-hijacks-events">Cloudflare Radar API</a>.</p>
    <div>
      <h3>Cloudflare Radar</h3>
      <a href="#cloudflare-radar">
        
      </a>
    </div>
    <p>Under the “Security &amp; Attacks” section of Cloudflare Radar, in both the global and ASN views, we now display the BGP origin hijacks table. In this table, we show a list of detected potential BGP hijack events with the following information:</p><ul><li><p>The detected and expected origin ASes;</p></li><li><p>The start time and event duration;</p></li><li><p>The number of BGP messages and route collector peers that saw the event;</p></li><li><p>The announced prefixes;</p></li><li><p>Evidence tags and confidence level (on the likelihood of the event being a hijack).</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1VwQO8aPGpngp78MrDyCmH/5b27df69735cf09eb21dea04385d8bcc/image3-6.png" />
            
            </figure><p>For each BGP event, our system generates relevant evidence tags to indicate why the event is considered suspicious or not. These tags are used to inform the confidence score assigned to each event. Red tags indicate evidence that increases the likelihood of a hijack event, while green tags indicate the opposite.</p><p>For example, the red tag "RPKI INVALID" indicates an event is likely a hijack, as it suggests that the RPKI validation failed for the announcement. Conversely, the tag "SIBLING ORIGINS" is a green tag that indicates the detected and expected origins belong to the same organization, making it less likely for the event to be a hijack.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/44EZVKQqIpl7O5tS7QrweM/1a763e761fda826d151fd43e951c6167/Screenshot-2023-07-26-at-18.22.35.png" />
            
            </figure><p>Users can now access the BGP hijacks table in the following ways:</p><ol><li><p>Global view under the <a href="https://radar.cloudflare.com/security-and-attacks">Security &amp; Attacks</a> page, without location filters. This view lists the most recent 150 detected BGP hijack events globally.</p></li><li><p>When filtered by a specific ASN, the table will appear on the Overview, Traffic, and Traffic &amp; Attacks tabs.</p></li></ol>
    <div>
      <h3>Cloudflare Radar API</h3>
      <a href="#cloudflare-radar-api">
        
      </a>
    </div>
    <p>We also provide programmable access to the BGP hijack detection results via the Cloudflare Radar API, which is freely available under <a href="https://radar.cloudflare.com/about">CC BY-NC 4.0 license</a>. The API documentation is available at the <a href="https://developers.cloudflare.com/api/operations/radar-get-bgp-hijacks-events">Cloudflare API portal</a>.</p><p>The following <code>curl</code> command fetches the most recent 10 BGP hijack events relevant to AS64512.</p>
            <pre><code>curl -X GET "https://api.cloudflare.com/client/v4/radar/bgp/hijacks/events?involvedAsn=64512&amp;format=json&amp;per_page=10" \
    -H "Authorization: Bearer &lt;API_TOKEN&gt;"</code></pre>
            <p>Users can further filter for high-confidence events by specifying the <code>minConfidence</code> parameter with a value from 0 to 10, where a higher value indicates higher confidence that an event is a hijack. The following example expands on the previous one by adding a minimum confidence score of 8 to the query:</p>
            <pre><code>curl -X GET "https://api.cloudflare.com/client/v4/radar/bgp/hijacks/events?involvedAsn=64512&amp;format=json&amp;per_page=10&amp;minConfidence=8" \
    -H "Authorization: Bearer &lt;API_TOKEN&gt;"</code></pre>
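            <p>The same queries can be assembled programmatically. The following Python sketch only builds the request using the standard library (the parameter names match the curl examples above; it does not contact the API here):</p>

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.cloudflare.com/client/v4/radar/bgp/hijacks/events"

def build_hijack_request(token, asn, min_confidence=None, per_page=10):
    """Build an authenticated GET request for BGP hijack events
    involving the given ASN, optionally filtered by confidence."""
    params = {"involvedAsn": asn, "format": "json", "per_page": per_page}
    if min_confidence is not None:
        params["minConfidence"] = min_confidence
    return Request(f"{API_BASE}?{urlencode(params)}",
                   headers={"Authorization": f"Bearer {token}"})
```

            <p>Passing the resulting request to <code>urllib.request.urlopen</code> returns the same JSON payload as the curl commands.</p>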
            <p>Additionally, users can also quickly build custom hijack alerters using a Cloudflare <a href="https://developers.cloudflare.com/workers/wrangler/workers-kv/#workers-kv">Workers + KV combination</a>. We have a full tutorial on building alerters that send out webhook-based messages or emails (with <a href="https://developers.cloudflare.com/email-routing/">Email Routing</a>) available on the <a href="https://developers.cloudflare.com/radar/investigate/bgp-anomalies/">Cloudflare Radar documentation site</a>.</p>
    <div>
      <h2>More routing security on Cloudflare Radar</h2>
      <a href="#more-routing-security-on-cloudflare-radar">
        
      </a>
    </div>
    <p>As we continue improving Cloudflare Radar, we are planning to introduce additional Internet routing and security data. For example, Radar will soon get a dedicated routing section to provide digestible BGP information for given networks or regions, such as distinct routable prefixes, RPKI valid/invalid/unknown routes, distribution of IPv4/IPv6 prefixes, etc. Our goal is to provide the best data and tools for routing security to the community, so that we can build a better and more secure Internet together.</p><p>Visit <a href="https://radar.cloudflare.com/">Cloudflare Radar</a> for additional insights around Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, and more. Follow us on social media at <a href="https://twitter.com/CloudflareRadar">@CloudflareRadar</a> (Twitter), <a href="https://noc.social/@cloudflareradar">https://noc.social/@cloudflareradar</a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com">radar.cloudflare.com</a> (Bluesky), or contact us via e-mail.</p> ]]></content:encoded>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Radar Alerts]]></category>
            <guid isPermaLink="false">33xptAfGQ0z94EAn4h1oKn</guid>
            <dc:creator>Mingwei Zhang</dc:creator>
            <dc:creator>Celso Martinho</dc:creator>
        </item>
        <item>
            <title><![CDATA[Helping build a safer Internet by measuring BGP RPKI Route Origin Validation]]></title>
            <link>https://blog.cloudflare.com/rpki-updates-data/</link>
            <pubDate>Fri, 16 Dec 2022 14:00:00 GMT</pubDate>
            <description><![CDATA[ Is BGP safe yet? If the question needs asking, then it isn't. But how far the Internet is from this goal is what we set out to answer. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1VWVhVnz5Xbv2u1jm48KeJ/dd52aaf9426c64b5d2b68a0b7651cb93/image7-7.png" />
            
            </figure><p>The <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">Border Gateway Protocol</a> (BGP) is the glue that keeps the entire Internet together. However, despite its vital function, BGP wasn't originally designed to protect against malicious actors or routing mishaps. It has since been updated to account for this shortcoming with the <a href="https://en.wikipedia.org/wiki/Resource_Public_Key_Infrastructure">Resource Public Key Infrastructure</a> (RPKI) framework, but can we declare it to be safe yet?</p><p>If the question needs asking, you might suspect we can't. There is a shortage of reliable data on how much of the Internet is protected from preventable routing problems. Today, we’re releasing a new method to measure exactly that: what percentage of Internet users are protected by their Internet Service Provider from these issues. We find that there is a long way to go before the Internet is protected from routing problems, though it varies dramatically by country.</p>
    <div>
      <h3>Why RPKI is necessary to secure Internet routing</h3>
      <a href="#why-rpki-is-necessary-to-secure-internet-routing">
        
      </a>
    </div>
    <p>The Internet is a network of independently-managed networks, called <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">Autonomous Systems (ASes)</a>. To achieve global reachability, ASes interconnect with each other and determine the feasible paths to a given destination IP address by exchanging routing information using BGP. BGP enables routers with only local network visibility to construct end-to-end paths based on the arbitrary preferences of each administrative entity that operates that equipment. Typically, Internet traffic between a user and a destination traverses multiple AS networks using paths constructed by BGP routers.</p><p>BGP, however, lacks built-in security mechanisms to protect the integrity of the exchanged routing information and to provide authentication and authorization of the advertised IP address space. Because of this, AS operators must implicitly trust that the routing information exchanged through BGP is accurate. As a result, the Internet is vulnerable to the injection of bogus routing information, which cannot be mitigated by security measures at the client or server level of the network.</p><p>An adversary with access to a BGP router can inject fraudulent routes into the routing system, which can be used to execute an array of attacks, including:</p><ul><li><p>Denial-of-Service (DoS) through traffic blackholing or redirection,</p></li><li><p>Impersonation attacks to eavesdrop on communications,</p></li><li><p>Machine-in-the-Middle exploits to modify the exchanged data, and subvert reputation-based filtering systems.</p></li></ul><p>Additionally, local misconfigurations and fat-finger errors can be propagated well beyond the source of the error and cause major disruption across the Internet.</p><p>Such an incident happened on <a href="/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/">June 24, 2019</a>. 
Millions of users were unable to access Cloudflare's address space when a regional ISP in Pennsylvania accidentally advertised routes to Cloudflare through their capacity-limited network. This was effectively the Internet equivalent of routing an entire freeway through a neighborhood street.</p><p>Traffic misdirections like these, either unintentional or intentional, are not uncommon. The Internet Society’s <a href="https://www.manrs.org/">MANRS</a> (Mutually Agreed Norms for Routing Security) initiative estimated that in 2020 alone there were <a href="https://www.manrs.org/2021/03/a-regional-look-into-bgp-incidents-in-2020/">over 3,000 route leaks and hijacks</a>, and new occurrences can be <a href="/route-leak-detection-with-cloudflare-radar/">observed every day through Cloudflare Radar</a>.</p><p>The most prominent proposals to secure BGP routing, standardized by the <a href="https://www.ietf.org/about/introduction/">IETF</a>, focus on validating the origin of the advertised routes using <a href="https://en.wikipedia.org/wiki/Resource_Public_Key_Infrastructure">Resource Public Key Infrastructure</a> (RPKI) and verifying the integrity of the paths with <a href="https://en.wikipedia.org/wiki/BGPsec">BGPsec</a>. Specifically, RPKI (defined in <a href="https://www.rfc-editor.org/rfc/rfc7115.html">RFC 7115</a>) relies on a <a href="https://en.wikipedia.org/wiki/Public_key_infrastructure">Public Key Infrastructure</a> to validate that an AS advertising a route to a destination (an IP address space) is the legitimate owner of those IP addresses.</p><p>RPKI has been defined for a long time but lacks adoption. It requires network operators to cryptographically sign their prefixes, and routing networks to perform an RPKI Route Origin Validation (ROV) on their routers. This is a two-step operation that requires coordination and participation from many actors to be effective.</p>
    <div>
      <h3>The two phases of RPKI adoption: signing origins and validating origins</h3>
      <a href="#the-two-phases-of-rpki-adoption-signing-origins-and-validating-origins">
        
      </a>
    </div>
    <p>RPKI has two phases of deployment: first, an AS that wants to protect its own IP prefixes can cryptographically sign Route Origin Authorization (ROA) records, thereby attesting to be the legitimate origin of that signed IP space. Second, an AS can avoid selecting invalid routes by performing Route Origin Validation (ROV, defined in <a href="https://www.rfc-editor.org/rfc/rfc6483">RFC 6483</a>).</p><p>With ROV, a BGP route received from a neighbor is validated against the available RPKI records. A route that is valid or missing from RPKI is selected, while a route with RPKI records found to be invalid is typically rejected, thus preventing the use and propagation of hijacked and misconfigured routes.</p><p>One issue with RPKI is the fact that implementing ROA is meaningful only if other ASes implement ROV, and vice versa. Therefore, securing BGP routing requires a united effort, and a lack of broader adoption disincentivizes ASes from committing the resources to validate their own routes. Conversely, increasing RPKI adoption can lead to network effects and accelerate RPKI deployment. Projects like MANRS and Cloudflare’s <a href="https://isbgpsafeyet.com/">isbgpsafeyet.com</a> are promoting good Internet citizenship among network operators, and make the benefits of RPKI deployment known to the Internet. You can check whether your own ISP is being a good Internet citizen by testing it on <a href="https://isbgpsafeyet.com/">isbgpsafeyet.com</a>.</p><p>Measuring the extent to which both ROA (signing of addresses by the network that controls them) and ROV (filtering of invalid routes by ISPs) have been implemented is important to evaluating the impact of these initiatives, developing situational awareness, and predicting the impact of future misconfigurations or attacks.</p><p>Measuring ROAs is straightforward since ROA data is <a href="https://ftp.ripe.net/rpki/">readily available</a> from RPKI repositories. 
Querying RPKI repositories for publicly routed IP prefixes (e.g. prefixes visible in the <a href="http://www.routeviews.org/">RouteViews</a> and <a href="https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris">RIPE RIS</a> routing tables) allows us to estimate the percentage of addresses covered by ROA objects. Currently, there are 393,344 IPv4 and 86,306 IPv6 ROAs in the global RPKI system, covering about 40% of the globally routed prefix-AS origin pairs<sup>1</sup>.</p><p>Measuring ROV, however, is significantly more challenging given it is configured inside the BGP routers of each AS, not accessible by anyone other than each router’s administrator.</p>
    <div>
      <h3>Measuring ROV deployment</h3>
      <a href="#measuring-rov-deployment">
        
      </a>
    </div>
    <p>Although we do not have direct access to the configuration of everyone’s BGP routers, it is possible to infer the use of ROV by comparing the reachability of RPKI-valid and RPKI-invalid prefixes from measurement points within an AS<sup>2</sup>.</p><p>Consider the following toy topology as an example, where an RPKI-invalid origin is advertised through AS0 to AS1 and AS2. If AS1 filters and rejects RPKI-invalid routes, a user behind AS1 would not be able to connect to that origin. By contrast, if AS2 does not reject RPKI invalids, a user behind AS2 would be able to connect to that origin.</p><p>While occasionally a user may be unable to access an origin due to transient network issues, if multiple users act as vantage points for a measurement system, we would be able to collect a large number of data points to infer which ASes deploy ROV.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3ix5pgVzjgMlL7BugvGJDD/aff6d6eaf101da010a24fa8e7908b106/1-1.png" />
            
</figure><p>If, in the figure above, AS0 filters invalid RPKI routes, then vantage points in both AS1 and AS2 would be unable to connect to the RPKI-invalid origin, making it hard to distinguish whether ROV is deployed at the ASes of our vantage points or in an AS along the path. One way to mitigate this limitation is to announce the RPKI-invalid origin from multiple locations of an anycast network, taking advantage of its direct interconnections to the measurement vantage points, as shown in the figure below. As a result, an AS that does not itself deploy ROV is less likely to observe the benefits of upstream ASes using ROV, and we would be able to accurately infer ROV deployment per AS<sup>3</sup>.</p><p><i>Note that it’s also important that the IP address of the RPKI-invalid origin not be covered by a less-specific prefix for which there is a valid or unknown RPKI route; otherwise, even if an AS filters invalid RPKI routes, its users would still be able to find a route to that IP.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7HDnUbqxvRQ3DbhArqJsMg/a21bfe1ef026a4aa615ac21f759d3f3f/2-1.png" />
            
            </figure><p>The measurement technique described here is the one implemented by Cloudflare’s <a href="https://isbgpsafeyet.com">isbgpsafeyet.com</a> website, allowing end users to assess whether or not their ISPs have deployed BGP ROV.</p><p>The <a href="https://isbgpsafeyet.com/">isbgpsafeyet.com</a> website itself doesn't submit any data back to Cloudflare, but recently we started measuring whether end users’ browsers can successfully connect to invalid RPKI origins when ROV is present. We use the same mechanism as is used for <a href="/network-performance-update-developer-week/">global performance data</a><sup>4</sup>. In particular, every measurement session (an individual end user at some point in time) attempts a request to both valid.rpki.cloudflare.com, which should always succeed as it’s RPKI-valid, and invalid.rpki.cloudflare.com, which is RPKI-invalid and should fail when the user’s ISP uses ROV.</p><p>This allows us to have continuous and up-to-date measurements from hundreds of thousands of browsers on a daily basis, and develop a greater understanding of the state of ROV deployment.</p>
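<p>The per-session logic reduces to two fetches and a comparison. A minimal sketch of the inference step (the two hostnames come from the measurement described above; the classification labels are our own):</p>

```python
def infer_rov(valid_reachable: bool, invalid_reachable: bool) -> str:
    """Infer ROV from one measurement session (sketch).

    valid_reachable:   did the request to valid.rpki.cloudflare.com succeed?
    invalid_reachable: did the request to invalid.rpki.cloudflare.com succeed?
    """
    if not valid_reachable:
        # The control request failed: likely a transient network issue,
        # so the sample tells us nothing and is discarded.
        return "inconclusive"
    if invalid_reachable:
        return "no-rov"  # an RPKI-invalid route was usable: no filtering seen
    return "rov"         # only the invalid origin failed: invalids filtered

print(infer_rov(True, False))   # rov
print(infer_rov(True, True))    # no-rov
print(infer_rov(False, False))  # inconclusive
```

<p>Aggregating many such per-session verdicts per AS is what smooths out the occasional transient failure mentioned earlier.</p>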
    <div>
      <h3>The state of global ROV deployment</h3>
      <a href="#the-state-of-global-rov-deployment">
        
      </a>
    </div>
    <p>The figure below shows the raw number of ROV probe requests per hour during October 2022 to <i>valid.rpki.cloudflare.com</i> and <i>invalid.rpki.cloudflare.com</i>. In total, we observed 69.7 million successful probes from 41,531 ASNs.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/16AWpr0oNevsWJynpNN1v4/51885d284ed9360416c915a935f19d6d/3-1.png" />
            
</figure><p>Based on <a href="https://labs.apnic.net/?p=526">APNIC’s estimates</a> of the number of end users per ASN, our weighted<sup>5</sup> analysis covers 96.5% of the world’s Internet population. As expected, the number of requests follows a diurnal pattern, which reflects established user behavior in daily and weekly Internet activity<sup>6</sup>.</p><p>We can also see that the number of successful requests to <i>valid.rpki.cloudflare.com</i> (<b><i>gray line</i></b>) closely follows the number of sessions that issued at least one request (<b><i>blue line</i></b>), which works as a smoke test for the correctness of our measurements.</p><p>As we don’t store the IP addresses that contribute measurements, we don’t have any way to count individual clients, and large spikes in the data may introduce unwanted bias. We account for that by capturing those instants and excluding them.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2rou45RM7Y0NdF2opTgGZE/0e708f6e801926147cd498a355751b21/4-1.png" />
            
            </figure><p>Overall, we estimate that out of the four billion Internet users, <b>only 261 million (6.5%) are protected by BGP Route Origin Validation</b>, but the true state of global ROV deployment is more subtle than this.</p><p>The following map shows the fraction of dropped RPKI-invalid requests from ASes with over 200 probes over the month of October. It depicts how far along each country is in adopting ROV but doesn’t necessarily represent the fraction of protected users in each country, as we will discover.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3UzNnhgQaIpcQktYHwNwpf/b6a389c9ed94e49456c75aa0f8689264/5-1.png" />
            
            </figure><p>Sweden and Bolivia appear to be the countries with the highest level of adoption (over 80%), while only a few other countries have crossed the 50% mark (e.g. Finland, Denmark, Chad, Greece, the United States).</p><p>ROV adoption may be driven by a few ASes hosting large user populations, or by many ASes hosting small user populations. To understand such disparities, the map below plots the contrast between overall adoption in a country (as in the previous map) and median adoption over the individual ASes within that country. Countries with stronger reds have relatively few ASes deploying ROV with high impact, while countries with stronger blues have more ASes deploying ROV but with lower impact per AS.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6m2lvwtrbDEzDObM6NfW5W/2371a8bdb5f7ed5ed4981103d4aad4c5/6-1.png" />
            
            </figure><p>In the Netherlands, Denmark, Switzerland, or the United States, adoption appears mostly driven by their larger ASes, while in Greece or Yemen it’s the smaller ones that are adopting ROV.</p><p>The following histogram summarizes the worldwide level of adoption for the 6,765 ASes covered by the previous two maps.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5zg7hScwuknrlGP0z76zvx/6fa811c1f9f92f50397ec64267b0af73/7.png" />
            
</figure><p>Most ASes either don’t validate at all, or have close to 100% adoption, which is what we’d intuitively expect. However, it’s interesting to observe that there is a small number of ASes all across the scale. ASes that exhibit a partial RPKI-invalid drop rate may either implement ROV partially (on some, but not all, of their BGP routers), or appear as dropping RPKI invalids due to ROV deployment by other ASes in their upstream path.</p><p>To estimate the number of users protected by ROV, we only considered ASes with an observed adoption above <b>95%</b>, as an AS with an incomplete deployment still leaves its users vulnerable to route leaks from its BGP peers.</p><p>If we take the previous histogram and summarize by the number of users behind each AS, the green bar on the right corresponds to the <b>261 million</b> users currently protected by ROV according to the above criteria (686 ASes).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/NAtYPcWbDWsiBiMfX56es/40a59d3253b0cba83c6e1bde2bbf83a6/8.png" />
            
</figure><p>Looking back at the country adoption map, one would perhaps expect the number of protected users to be larger. But worldwide ROV deployment is still mostly either partial, missing the larger ASes, or both. This becomes even clearer when compared with the next map, which plots just the fraction of fully protected users.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7kAM05eOwaPusgkvSIASoB/e86386a84c161919b3bdc9018145eb1c/9.png" />
            
            </figure><p>To wrap up our analysis, we look at two world economies chosen for their contrasting, almost symmetrical, stages of deployment: the United States and the European Union.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3DShkKyva4l7rm5qC7gQnP/2ba1802f7d305815450bc1b9df372abb/10.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Q0ipIYrllZ63MXSRYEh59/bc25c31c0de0b71c157f909e5eb8522f/11.png" />
            
            </figure><p>112 million Internet users are protected by 111 ASes from the United States with comprehensive ROV deployments. Conversely, more than twice as many ASes from countries making up the European Union have fully deployed ROV, but end up covering only half as many users. This can be reasonably explained by end user ASes being more likely to operate within a single country rather than span multiple countries.</p>
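<p>The comprehensive-deployment estimate used above (count only users behind ASes whose observed RPKI-invalid drop rate clears the 95% bar) can be sketched as follows; the per-AS numbers below are made up for illustration:</p>

```python
def protected_users(as_stats, threshold=0.95):
    """Sum users behind ASes with near-complete ROV deployment (sketch).

    as_stats: {asn: (invalid_probes_dropped, invalid_probes_total, users)}
    Returns (total protected users, list of qualifying ASNs).
    """
    total = 0
    deployed = []
    for asn, (dropped, probes, users) in as_stats.items():
        # An AS only counts if nearly all RPKI-invalid probes were dropped.
        if probes and dropped / probes > threshold:
            total += users
            deployed.append(asn)
    return total, deployed

# Hypothetical per-AS measurements: drops, probe totals, user estimates.
stats = {
    64496: (990, 1000, 2_000_000),  # ~99% drop rate: counted
    64497: (500, 1000, 5_000_000),  # partial deployment: not counted
    64498: (0, 1000, 1_000_000),    # no ROV observed: not counted
}
users, asns = protected_users(stats)
print(users)  # 2000000
```

<p>An AS with a 50% drop rate contributes nothing here, mirroring the reasoning above: an incomplete deployment still leaves that AS’s users exposed.</p>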
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Probe requests were performed from end user browsers and very few measurements were collected from transit providers (which have few end users, if any). Also, paths between end user ASes and Cloudflare are often very short (a nice outcome of our extensive peering) and don't traverse upper-tier networks that they would otherwise use to reach the rest of the Internet.</p><p>In other words, the methodology used focuses on ROV adoption by <b>end user networks</b> (e.g. ISPs) and isn’t meant to reflect the eventual effect of indirect validation from (perhaps validating) upper-tier transit networks. While indirect validation may limit the "blast radius" of (malicious or accidental) route leaks, it still leaves non-validating ASes vulnerable to leaks coming from their peers.</p><p>As with indirect validation, an AS remains vulnerable until its ROV deployment reaches a sufficient level of completion. We chose to only consider AS deployments above 95% as truly comprehensive, and <a href="https://radar.cloudflare.com">Cloudflare Radar</a> will soon begin using this threshold to track ROV adoption worldwide, as part of our mission to help build a better Internet.</p><p>When considering only comprehensive ROV deployments, some countries such as Denmark, Greece, Switzerland, Sweden, or Australia, already show an effective coverage above 50% of their respective Internet populations, with others like the Netherlands or the United States slightly above 40%, mostly driven by few large ASes rather than many smaller ones.</p><p>Worldwide we observe a very low effective coverage of just <b>6.5%</b> over the measured ASes, corresponding to <b>261 million</b> end users currently safe from (malicious and accidental) route leaks, which means there’s still a long way to go before we can declare BGP to be safe.</p><p>......</p><p><sup>1</sup><a href="https://rpki.cloudflare.com/">https://rpki.cloudflare.com/</a></p><p><sup>2</sup>Gilad, Yossi, Avichai Cohen, Amir Herzberg, Michael 
Schapira, and Haya Shulman. "Are we there yet? On RPKI's deployment and security." Cryptology ePrint Archive (2016).</p><p><sup>3</sup>Geoff Huston. “Measuring ROAs and ROV”. <a href="https://blog.apnic.net/2021/03/24/measuring-roas-and-rov/">https://blog.apnic.net/2021/03/24/measuring-roas-and-rov/</a></p><p><sup>4</sup>Measurements are issued stochastically when users encounter 1xxx error pages from default (non-customer) configurations.</p><p><sup>5</sup>Probe requests are weighted by AS size as calculated from Cloudflare's <a href="https://radar.cloudflare.com/">worldwide HTTP traffic</a>.</p><p><sup>6</sup>Quan, Lin, John Heidemann, and Yuri Pradkin. "When the Internet sleeps: Correlating diurnal networks with external factors." In Proceedings of the 2014 Conference on Internet Measurement Conference, pp. 87-100. 2014.</p> ]]></content:encoded>
            <category><![CDATA[Impact Week]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Routing Security]]></category>
            <category><![CDATA[Better Internet]]></category>
            <guid isPermaLink="false">dMGl1iwWVn3YZWRTxIzgV</guid>
            <dc:creator>Carlos Rodrigues</dc:creator>
            <dc:creator>Vasilis Giotsas</dc:creator>
        </item>
        <item>
            <title><![CDATA[Why BGP communities are better than AS-path prepends]]></title>
            <link>https://blog.cloudflare.com/prepends-considered-harmful/</link>
            <pubDate>Thu, 24 Nov 2022 17:31:47 GMT</pubDate>
            <description><![CDATA[ Routing on the Internet follows a few basic principles. Unfortunately not everything on the Internet is created equal, and prepending can do more harm than good. In this blog post we’ll talk about the problems that prepending aims to solve, and some alternative solutions ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2E41RLSZSaS34QDzUqX1Rd/7ebce0374cd5f67f333eaf954e6f1445/image7-9.png" />
            
</figure><p>The Internet, in its purest form, is a loosely connected graph of independent networks (also called <a href="https://www.cloudflare.com/en-gb/learning/network-layer/what-is-an-autonomous-system/">Autonomous Systems</a> (AS for short)). These networks use a signaling protocol called <a href="https://www.cloudflare.com/en-gb/learning/security/glossary/what-is-bgp/">BGP</a> (Border Gateway Protocol) to inform their neighbors (also known as peers) about the reachability of IP prefixes (a group of IP addresses) in and through their network. Part of this exchange carries useful metadata about the IP prefix that is used to inform routing decisions. One example of this metadata is the full AS-path, which consists of the different autonomous systems an IP packet needs to pass through to reach its destination.</p><p>As we all want our packets to get to their destination as fast as possible, selecting the shortest AS-path for a given prefix is a good idea. This is where something called prepending comes into play.</p>
    <div>
      <h2>Routing on the Internet, a primer</h2>
      <a href="#routing-on-the-internet-a-primer">
        
      </a>
    </div>
    <p>Let's briefly talk about how the Internet works at its most fundamental level, before we dive into some nitty-gritty details.</p><p>The Internet is, at its core, a massively interconnected network of thousands of networks. Each network owns two things that are critical:</p><p>1. An Autonomous System Number (ASN): a 32-bit integer that uniquely identifies a network. For example, one of the Cloudflare ASNs (we have multiple) is 13335.</p><p>2. IP prefixes: An IP prefix is a range of IP addresses, bundled together in powers of two: in the IPv4 space, two addresses form a /31 prefix, four form a /30, and so on, all the way up to /0, which is shorthand for “all IPv4 prefixes”. The same applies to IPv6, but instead of aggregating at most 32 bits, you can aggregate up to 128 bits. The figure below shows this relationship between IP prefixes, in reverse -- a /24 contains two /25s, each of which contains two /26s, and so on.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5XUwaT0EJLzfHUwkpm7aN2/47a248cc3292ae8a970423c8f3de9f5b/image9-6.png" />
            
            </figure><p>To communicate on the Internet, you must be able to reach your destination, and that’s where routing protocols come into play. They enable each node on the Internet to know where to send your message (and for the receiver to send a message back).</p>
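<p>The prefix containment shown in the earlier figure is easy to poke at with Python's ipaddress module; a quick sketch (the addresses are RFC 5737 documentation space):</p>

```python
import ipaddress

# Each extra prefix bit halves the range: a /24 splits into two /25s,
# four /26s, and so on.
net = ipaddress.ip_network("192.0.2.0/24")
print(net.num_addresses)                      # 256 addresses in a /24
print(list(net.subnets(prefixlen_diff=1)))    # the two /25s
print(len(list(net.subnets(new_prefix=26))))  # 4

# Containment check: a more-specific prefix is a subnet of the /24.
print(ipaddress.ip_network("192.0.2.128/26").subnet_of(net))  # True
```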
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6RzDynXtAzTFAAeSRNF7Qz/f3297d7747eb351a77dfb3da7461944a/image5-21.png" />
            
            </figure><p>As mentioned earlier, these destinations are identified by IP addresses, and contiguous ranges of IP addresses are expressed as IP prefixes. We use IP prefixes for routing as an efficiency optimization: Keeping track of where to go for four billion (2<sup>32</sup>) IP addresses in IPv4 would be incredibly complex, and require a lot of resources. Sticking to prefixes reduces that number down to about one million instead.</p><p>Now recall that Autonomous Systems are independently operated and controlled. In the Internet’s network of networks, how do I tell Source A in some other network that there is an available path to get to Destination B in (or through) my network? In comes BGP! BGP is the Border Gateway Protocol, and it is used to signal reachability information. Signal messages generated by the source ASN are referred to as ‘announcements’ because they declare to the Internet that IP addresses in the prefix are online and reachable.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/cudmPAIkxVZr5tOuSqQdm/f096de61cdebaa7c9427343be70cb0ff/image4-33.png" />
            
</figure><p>Have a look at the figure above. Source A should now know how to get to Destination B through two different networks!</p><p>This is what an actual BGP message would look like:</p>
            <pre><code>BGP Message
    Type: UPDATE Message
    Path Attributes:
        Path Attribute - Origin: IGP
        Path Attribute - AS_PATH: 64500 64496
        Path Attribute - NEXT_HOP: 198.51.100.1
        Path Attribute - COMMUNITIES: 64500:13335
        Path Attribute - Multi Exit Discriminator (MED): 100
    Network Layer Reachability Information (NLRI):
        192.0.2.0/24</code></pre>
            <p>As you can see, BGP messages contain more than just the IP prefix (the NLRI bit) and the path: they also carry a bunch of other metadata that provides additional information about the path. Other fields include communities (more on that later), as well as the MED and the origin code. MED is a suggestion to other directly connected networks on which path should be taken if multiple options are available, and the lowest value wins. The origin code can be one of three values: IGP, EGP or Incomplete. IGP will be set if you originate the prefix through BGP, EGP is no longer used (it’s an ancient routing protocol), and Incomplete is set when you distribute a prefix into BGP from another routing protocol (like IS-IS or OSPF).</p><p>Now that Source A knows how to get to Destination B through two different networks, let's talk about traffic engineering!</p>
    <div>
      <h2>Traffic engineering</h2>
      <a href="#traffic-engineering">
        
      </a>
    </div>
    <p>Traffic engineering is a critical part of the day-to-day management of any network. Just like in the physical world, detours can be put in place by operators to optimize the traffic flows into (inbound) and out of (outbound) their network. Outbound traffic engineering is significantly easier than inbound traffic engineering because operators can choose between neighboring networks, and even prioritize some traffic over others. In contrast, inbound traffic engineering requires influencing a network that is operated by someone else entirely. The autonomy and self-governance of a network are paramount, so operators use available tools to inform or shape inbound packet flows from other networks. Understanding and using those tools is complex, and can be a challenge.</p><p>The available set of traffic engineering tools, both in- and outbound, relies on manipulating attributes (metadata) of a given route. As we’re talking about traffic engineering between independent networks, we’ll be manipulating the attributes of an EBGP-learned route. BGP sessions can be split into two categories:</p><ol><li><p>EBGP: BGP communication between two different ASNs</p></li><li><p>IBGP: BGP communication within the same ASN.</p></li></ol><p>While the protocol is the same, certain attributes can be exchanged on an IBGP session that aren’t exchanged on an EBGP session. One of those is local-preference. More on that in a moment.</p>
    <div>
      <h3>BGP best path selection</h3>
      <a href="#bgp-best-path-selection">
        
      </a>
    </div>
    <p>When a network is connected to multiple other networks and service providers, it can receive path information to the same IP prefix from many of those networks, each with slightly different attributes. It is then up to the receiving network of that information to use a BGP best path selection algorithm to pick the “best” prefix (and route), and use this to forward IP traffic. I’ve put “best” in quotation marks, as best is a subjective requirement. “Best” is frequently the shortest, but what can be best for my network might not be the best outcome for another network.</p><p>BGP will consider multiple prefix attributes when filtering through the received options. However, rather than combine all those attributes into a single selection criteria, BGP best path selection uses the attributes in tiers -- at any tier, if the available attributes are sufficient to choose the best path, then the algorithm terminates with that choice.</p><p>The BGP best path selection algorithm is extensive, containing 15 discrete steps to select the best available path for a given prefix. Given the numerous steps, it’s in the interest of the network to decide the best path as early as possible. The first four steps are most used and influential, and are depicted in the figure below as sieves.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/28iOxj73xffT9imICxgTTI/504e3ea0c7767d8acae1ee0ca0b64c4d/image2-55.png" />
            
</figure><p>Picking the shortest path possible is usually a good idea, which is why “AS-path length” is a step executed early on in the algorithm. However, looking at the figure above, “AS-path length” appears second, despite being the attribute that identifies the shortest path. So let’s talk about the first step: local preference.</p><p><b>Local preference</b></p><p>Local preference is an operator favorite because it allows them to handpick a route+path combination of their choice. It’s the first attribute in the algorithm because it is unique for any given route+neighbor+AS-path combination.</p><p>A network sets the local preference on import of a route (having learned about the route from a neighbor network). It is a non-transitive property, meaning it is never sent in an EBGP message to other networks. This intrinsically means, for example, that the operator of AS 64496 can’t set the local preference of routes to their own (or transiting) IP prefixes inside neighboring AS 64511. The inability to do so is partially why inbound traffic engineering through EBGP is so difficult.</p><p><b>Prepending artificially increases AS-path length</b></p><p>Since no network is able to directly set the local preference for a prefix inside another network, the first opportunity to influence other networks’ choices is modifying the AS-path. If the next hops are valid, and the local preference for all the different paths for a given route is the same, modifying the AS-path is an obvious option to change the path traffic will take towards your network. In a BGP message, prepending looks like this:</p><p>BEFORE:</p>
            <pre><code>BGP Message
    Type: UPDATE Message
    Path Attributes:
        Path Attribute - Origin: IGP
        Path Attribute - AS_PATH: 64500 64496
        Path Attribute - NEXT_HOP: 198.51.100.1
        Path Attribute - COMMUNITIES: 64500:13335
        Path Attribute - Multi Exit Discriminator (MED): 100
    Network Layer Reachability Information (NLRI):
        192.0.2.0/24</code></pre>
            <p>AFTER:</p>
            <pre><code>BGP Message
    Type: UPDATE Message
    Path Attributes:
        Path Attribute - Origin: IGP
        Path Attribute - AS_PATH: 64500 64496 64496
        Path Attribute - NEXT_HOP: 198.51.100.1
        Path Attribute - COMMUNITIES: 64500:13335
        Path Attribute - Multi Exit Discriminator (MED): 100
    Network Layer Reachability Information (NLRI):
        192.0.2.0/24</code></pre>
            <p>Specifically, operators can do AS-path prepending. When doing AS-path prepending, an operator adds additional autonomous systems to the path (usually the operator uses their own AS, but that’s not enforced in the protocol). This way, an AS-path can grow from a length of 1 all the way up to 255. As the length has now increased dramatically, that specific path for the route is far less likely to be chosen. By changing the AS-path advertised to different peers, an operator can control the traffic flows coming into their network.</p><p>Unfortunately, prepending has a catch: to be the deciding factor, all the other attributes need to be equal. This is rarely true, especially in large networks that are able to choose from many possible routes to a destination.</p>
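<p>That interaction (local preference considered first, AS-path length only on a tie) can be sketched with just the first two tiers of the selection algorithm; the route attributes below are illustrative:</p>

```python
def best_path(routes):
    """Pick the best route using the first two tiers of BGP best path
    selection (sketch): highest local-pref wins; shortest AS-path breaks ties."""
    return max(routes, key=lambda r: (r["local_pref"], -len(r["as_path"])))

# A longer path with a higher local-pref still wins: prepending is moot here.
pni     = {"name": "PNI",     "local_pref": 250, "as_path": [64500, 64496]}
transit = {"name": "transit", "local_pref": 100, "as_path": [64496]}
print(best_path([pni, transit])["name"])  # PNI

# Prepending only matters when local-prefs tie:
a = {"name": "A", "local_pref": 100, "as_path": [64500, 64496]}
b = {"name": "B", "local_pref": 100, "as_path": [64501, 64496, 64496, 64496]}
print(best_path([a, b])["name"])  # A: shorter AS-path wins the tie
```

<p>Route B's prepends are invisible to the first comparison, which is exactly why prepending so often fails to move traffic.</p>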
    <div>
      <h2>Business Policy Engine</h2>
      <a href="#business-policy-engine">
        
      </a>
    </div>
    <p>BGP is colloquially also referred to as a Business Policy Engine: it does <b>not</b> select the best path from a performance point of view; instead, and more often than not, it will select the best path from a <i>business</i> point of view. The business criteria could be anything from investment (port) efficiency to increased revenue, and more. This may sound strange but, believe it or not, this is what BGP is designed to do! The power (and complexity) of BGP is that it enables a network operator to make choices according to the operator’s needs, contracts, and policies, many of which cannot be reflected by conventional notions of engineering performance.</p>
    <div>
      <h3>Different local preferences</h3>
      <a href="#different-local-preferences">
        
      </a>
    </div>
    <p>A lot of networks (including Cloudflare) assign a local preference depending on the type of connection used to send us the routes. A higher value is a higher preference. For example, routes learned from transit network connections will get a lower local preference of 100 because they are the most costly to use; backbone-learned routes will be 150, Internet exchange (IX) routes get 200, and lastly private interconnect (PNI) routes get 250. This means that for egress (outbound) traffic, the Cloudflare network, by default, will prefer a PNI-learned route, even if a shorter AS-path is available through an IX or transit neighbor.</p><p>Part of the reason a PNI is preferred over an IX is reliability: there is no third-party switching platform involved that is out of our control, which matters because we operate on the assumption that all hardware can and will eventually break. Another part of the reason is port efficiency. Here, efficiency is defined by the cost per megabit transferred on each port. Roughly speaking, the per-port cost is calculated by:</p><p><code>((cost_of_switch / port_count) + transceiver_cost)</code></p><p>which is then combined with the cross-connect cost (which might be monthly recurring (MRC) or a one-time fee). A PNI is preferable because it reduces the overall cost per megabit transferred: the unit price decreases with higher utilization of the port.</p><p>This reasoning is similar for a lot of other networks, and is very prevalent in transit networks. BGP is at least as much about cost and business policy as it is about performance.</p>
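<p>As a rough, purely illustrative sketch of that calculation (every number below is hypothetical):</p>

```python
def cost_per_megabit(cost_of_switch, port_count, transceiver_cost,
                     cross_connect_cost, utilization_mbps):
    """Cost per megabit transferred on a port (sketch; hypothetical inputs).

    Hardware cost is amortized per port, the transceiver and cross-connect
    are added, and the total is spread over the traffic the port carries.
    """
    port_cost = (cost_of_switch / port_count) + transceiver_cost
    return (port_cost + cross_connect_cost) / utilization_mbps

# Same port economics, different utilization: a busier port means a
# cheaper megabit, which is the incentive behind preferring the PNI.
print(cost_per_megabit(32000, 32, 400, 300, 10_000))  # lightly used port
print(cost_per_megabit(32000, 32, 400, 300, 80_000))  # well-utilized port
```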
    <div>
      <h3>Transit local preference</h3>
      <a href="#transit-local-preference">
        
      </a>
    </div>
    <p>For simplicity, when referring to transits, I mean the <a href="https://en.wikipedia.org/wiki/Tier_1_network">traditional tier-1 transit networks</a>. Due to the nature of these networks, they have two distinct sets of network peers:</p><p>1. Customers (like Cloudflare)</p><p>2. Settlement-free peers (like other tier-1 networks)</p><p>In normal circumstances, transit customers will get a higher local preference assigned than the local preference used for their settlement-free peers. This means that, no matter how much you prepend a prefix, if traffic enters that transit network, traffic will <b>always</b> land on your interconnection with that transit network; it will not be offloaded to another peer.</p><p>A prepend can still be used if you want to switch/offload traffic from a single link with one transit if you have multiple distinct links with them, or if the source of traffic is multihomed behind multiple transits (and they don’t have their own local preference playbook preferring one transit over another). But engineering inbound traffic away from one transit port to another through AS-path prepending has sharply diminishing returns: once you’re past three prepends, it’s unlikely to change much, if anything.</p><p><b>Example</b></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/8QW8idtVhFb2jtY64uTLf/696a61224544698eb35a8e7ae20f8a22/image8-6.png" />
            
            </figure><p>In the above scenario, no matter the adjustment Cloudflare makes in its AS-path towards AS 64496, the traffic will keep flowing through the Transit B &lt;&gt; Cloudflare interconnection, even though the path Origin A → Transit B → Transit A → Cloudflare is shorter from an AS-path point of view.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2uNwYLR8BdzCHVTzISwgtk/581ea233f8058f998bf00dfaa44ca3e3/image6-12.png" />
            
</figure><p>In this scenario, not a lot has changed, but Origin A is now multi-homed behind the two transit providers. In this case, the AS-path prepending was effective, as Origin A now sees both the prepended and the non-prepended path. As long as Origin A is not doing any egress traffic engineering and treats both transit networks equally, the path chosen will be Origin A → Transit A → Cloudflare.</p>
    <div>
      <h3>Community-based traffic engineering</h3>
      <a href="#community-based-traffic-engineering">
        
      </a>
    </div>
    <p>So we have now identified a pretty critical problem within the Internet ecosystem for operators: with the tools mentioned above, it’s not always possible (some might even say outright impossible) to accurately dictate the paths on which traffic enters your own network, reducing the control an autonomous system has over its own network. Fortunately, there is a solution for this problem: community-based local preference.</p><p>Some transit providers allow their customers to influence the local preference in the transit network through the use of BGP communities. BGP communities are an optional transitive attribute for a route advertisement. The communities can be informative (“I learned this prefix in Rome”), but they can also be used to trigger actions on the receiving side. For example, Cogent publishes the following action communities:</p><table>
<thead>
  <tr>
    <th>Community</th>
    <th>Local preference</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>174:10</td>
    <td>10</td>
  </tr>
  <tr>
    <td>174:70</td>
    <td>70</td>
  </tr>
  <tr>
    <td>174:120</td>
    <td>120</td>
  </tr>
  <tr>
    <td>174:125</td>
    <td>125</td>
  </tr>
  <tr>
    <td>174:135</td>
    <td>135</td>
  </tr>
  <tr>
    <td>174:140</td>
    <td>140</td>
  </tr>
</tbody>
</table><p>When you know that Cogent uses the following default local preferences in their network:</p><p>Peers → Local preference 100<br>Customers → Local preference 130</p><p>It’s easy to see how we could use the communities provided to change the route used. It’s important to note, though, that since we can’t set the local preference of a route to exactly 100 (or 130), AS-path prepending remains largely irrelevant: the local preferences of the competing paths will never be the same.</p><p>Take for example the following configuration:</p>
            <pre><code>term ADV-SITELOCAL {
    from {
        prefix-list SITE-LOCAL;
        route-type internal;
    }
    then {
        as-path-prepend "13335 13335";
        accept;
    }
}</code></pre>
            
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2QJPzju5z4aHCOTxSEAO9y/21cf93066b2a67a48f530af5934c58d3/image1-71.png" />
            
</figure><p>We’re prepending the Cloudflare ASN two times, resulting in a total AS-path length of three, yet we were still seeing too much traffic coming in on our Cogent link. At that point, an engineer could add another prepend, but for a well-connected network such as Cloudflare, if two or three prepends didn’t do much, four or five aren’t going to either. Instead, we can leverage the Cogent communities documented above to change the routing within Cogent:</p>
            <pre><code>term ADV-SITELOCAL {
    from {
        prefix-list SITE-LOCAL;
        route-type internal;
    }
    then {
        community add COGENT_LPREF70;
        accept;
    }
}</code></pre>
            <p>The above configuration changes the traffic flow to this:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1hJxj0OColxsHeP06JAfBb/8fa01b2f92b44de8b129b761dccc9acf/image3-47.png" />
            
            </figure><p>Which is exactly what we wanted!</p>
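            <p>To see why the community works where additional prepends would not, recall the order of BGP best-path selection: local preference is compared before AS-path length, so once local preferences differ, path length is never consulted. The following Python sketch (purely illustrative, with made-up routes; not router code) models just those first two steps:</p>
            <pre><code>def best_route(routes):
    # BGP best-path selection, first two tie-breakers only:
    # highest local preference wins; a shorter AS path only breaks ties.
    return max(routes, key=lambda r: (r["local_pref"], -len(r["as_path"])))

# Inside Cogent: our direct customer route, tagged 174:70 -> local pref 70.
via_customer = {"via": "customer link", "local_pref": 70, "as_path": [13335]}
# The same prefix learned from a peer at Cogent's default peer pref of 100.
via_peer = {"via": "peer", "local_pref": 100, "as_path": [64500, 13335]}

# The peer route wins despite its longer AS path: with unequal local
# preferences, prepending the customer route more would change nothing.
print(best_route([via_customer, via_peer])["via"])  # peer</code></pre>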
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>AS-path prepending is still useful and has its place in the operator’s traffic engineering toolchain, but it should be used sparingly. <a href="https://ripe79.ripe.net/presentations/64-prepending_madory2.pdf">Excessive prepending opens a network up to more widespread route hijacks</a>, which should be avoided at all costs. As such, community-based ingress traffic engineering is highly preferred (and recommended). In cases where communities aren’t available (or not available to steer customer traffic), prepends can be applied, but I encourage operators to actively monitor their effects, and roll them back if ineffective.</p><p>As a side note, P. Marcos et al. have published an interesting paper on AS-path prepending that goes into some of the trends seen in relation to prepending. I highly recommend giving it a read: <a href="https://www.caida.org/catalog/papers/2020_aspath_prepending/aspath_prepending.pdf">https://www.caida.org/catalog/papers/2020_aspath_prepending/aspath_prepending.pdf</a></p> ]]></content:encoded>
            <category><![CDATA[Routing]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Network]]></category>
            <guid isPermaLink="false">7xqrew8H7IM1awzPwN3MOw</guid>
            <dc:creator>Tom Strickx</dc:creator>
        </item>
        <item>
            <title><![CDATA[How we detect route leaks and our new Cloudflare Radar route leak service]]></title>
            <link>https://blog.cloudflare.com/route-leak-detection-with-cloudflare-radar/</link>
            <pubDate>Wed, 23 Nov 2022 16:00:00 GMT</pubDate>
            <description><![CDATA[ In this blog post, we will introduce our new system designed to detect route leaks and its integration on Cloudflare Radar and its public API. ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5tjdb8oktiBnsjsr6Fj109/dca51bae6a054cc120d91c11a35c54fe/image5-19.png" />
            
            </figure><p>Today we’re introducing Cloudflare Radar’s route leak data and API so that anyone can get information about route leaks across the Internet. We’ve built a comprehensive system that takes in data from public sources and Cloudflare’s view of the Internet drawn from our massive global network. The system is now feeding route leak data on Cloudflare Radar’s ASN pages and via the API.</p><p>This blog post is in two parts. There’s a discussion of BGP and route leaks followed by details of our route leak detection system and how it feeds Cloudflare Radar.</p>
    <div>
      <h2>About BGP and route leaks</h2>
      <a href="#about-bgp-and-route-leaks">
        
      </a>
    </div>
    <p>Inter-domain routing, i.e., exchanging reachability information among networks, is critical to the health and performance of the Internet. The <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">Border Gateway Protocol</a> (BGP) is the de facto routing protocol that exchanges routing information among organizations and networks. At its core, BGP assumes the information being exchanged is genuine and trustworthy, which unfortunately is <a href="/rpki/">no longer a valid assumption</a> on the current Internet. In many cases, networks can make mistakes or intentionally lie about the reachability information and propagate that to the rest of the Internet. Such incidents can cause significant disruptions to the normal operations of the Internet. One such type of disruptive incident is the <b>route leak</b>.</p><p>We consider route leaks to be the propagation of routing announcements beyond their intended scope (<a href="https://www.rfc-editor.org/rfc/rfc7908.html">RFC7908</a>). Route leaks can cause significant disruption affecting millions of Internet users, as we have seen in many past notable incidents. For example, <a href="/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/">in June 2019 a misconfiguration</a> in a small network in Pennsylvania, US (<a href="https://radar.cloudflare.com/traffic/as396531">AS396531</a> - Allegheny Technologies Inc) accidentally leaked a Cloudflare prefix to Verizon, which proceeded to propagate the misconfigured route to the rest of its peers and customers. As a result, the traffic of a large portion of the Internet was squeezed through the limited-capacity links of a small network.
The resulting congestion caused most Cloudflare traffic to and from the affected IP range to be dropped.</p><p>A similar incident in November 2018 caused widespread unavailability of Google services when a Nigerian ISP (<a href="https://radar.cloudflare.com/traffic/as37282">AS37282</a> - Mainone) <a href="/how-a-nigerian-isp-knocked-google-offline/">accidentally leaked</a> a large number of Google IP prefixes to its peers and providers, violating the <a href="https://ieeexplore.ieee.org/document/974527">valley-free principle</a>.</p><p>These incidents illustrate not only that route leaks can be very impactful, but also the snowball effects that misconfigurations in small regional networks can have on the global Internet.</p><p>Despite the criticality of detecting and rectifying route leaks promptly, they are often detected only when users start reporting the noticeable effects of the leaks. The challenge with detecting and preventing route leaks stems from the fact that AS business relationships and BGP routing policies are generally <a href="https://ieeexplore.ieee.org/document/974523">undisclosed</a>, and the affected network is often remote to the root of the route leak.</p><p>In the past few years, solutions have been proposed to prevent the propagation of leaked routes. Such proposals include <a href="https://datatracker.ietf.org/doc/rfc9234/">RFC9234</a> and <a href="https://datatracker.ietf.org/doc/html/draft-ietf-sidrops-aspa-verification">ASPA</a>, which extend BGP to annotate sessions with the relationship type between the two connected AS networks to enable the detection and prevention of route leaks.</p><p>An alternative proposal to implement similar signaling of BGP roles is through the use of <a href="https://en.wikipedia.org/wiki/Border_Gateway_Protocol#Communities">BGP Communities</a>, a transitive attribute used to encode metadata in BGP announcements.
While these directions are promising in the long term, they are still at a very preliminary stage and are not expected to be adopted at scale soon.</p><p>At Cloudflare, we have developed a system to detect route leak events automatically and send notifications to multiple channels for visibility. As we continue our efforts to bring more relevant <a href="https://developers.cloudflare.com/radar/">data to the public</a>, we are happy to announce that today we are launching an <a href="https://developers.cloudflare.com/api/operations/radar_get_BGPRouteLeakEvents">open data API</a> for our route leak detection results and integrating those results into <a href="https://radar.cloudflare.com/">Cloudflare Radar</a> pages.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4bWYvHZtatR3ooYMKq2WCb/79ef433f8c46c40aefa2fefe35905aa7/image4-32.png" />
            
            </figure>
    <div>
      <h2>Route leak definition and types</h2>
      <a href="#route-leak-definition-and-types">
        
      </a>
    </div>
    <p>Before we jump into how we design our systems, we will first do a quick primer on what a route leak is, and why it is important to detect it.</p><p>We refer to the published IETF RFC7908 document <a href="https://www.rfc-editor.org/rfc/rfc7908.html"><i>"Problem Definition and Classification of BGP Route Leaks"</i></a> to define route leaks.</p><p>&gt; A route leak is the propagation of routing announcement(s) beyond their intended scope.</p><p>The <i>intended scope</i> is often concretely defined as inter-domain routing policies based on business relationships between Autonomous Systems (ASes). These business relationships <a href="https://ieeexplore.ieee.org/document/974527">are broadly classified into four categories</a>: customers, transit providers, peers and siblings, although more complex arrangements are possible.</p><p>In a customer-provider relationship the customer AS has an agreement with another network to transit its traffic to the global routing table. In a peer-to-peer relationship two ASes agree to free bilateral traffic exchange, but only between their own IPs and the IPs of their customers. Finally, ASes that belong under the same administrative entity are considered siblings, and their traffic exchange is often unrestricted.  The image below illustrates how the three main relationship types translate to export policies.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3XAOHLT8UtzkLQigSbcrUd/34f8de710fcdda2c4feb7fd1eaaec576/image7-7.png" />
            
</figure><p>By categorizing the types of AS-level relationships and their implications for the propagation of BGP routes, we can divide the propagation of a prefix origination announcement into multiple phases:</p><ul><li><p>upward: all path segments during this phase are <b>customer to provider</b></p></li><li><p>peering: one peer-peer path segment</p></li><li><p>downward: all path segments during this phase are <b>provider to customer</b></p></li></ul><p>An AS path that follows the <a href="https://ieeexplore.ieee.org/document/6363987"><b>valley-free routing principle</b></a> may contain <b>upward, peering, and downward</b> phases; each phase is <b>optional</b>, but they must appear <b>in that order</b>. Here is an example of an AS path that conforms with valley-free routing.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7mm4sD88Ai3cFOS7ugzhoH/d0ea889e5d22d60d4648b7f13a69b08b/image11-4.png" />
            
            </figure><p>In RFC7908, <a href="https://www.rfc-editor.org/rfc/rfc7908.html"><i>"Problem Definition and Classification of BGP Route Leaks"</i></a>, the authors define six types of route leaks, and we refer to these definitions in our system design. Here are illustrations of each of the route leak types.</p>
    <div>
      <h3>Type 1: Hairpin Turn with Full Prefix</h3>
      <a href="#type-1-hairpin-turn-with-full-prefix">
        
      </a>
    </div>
    <p>&gt; A multihomed AS learns a route from one upstream ISP and simply propagates it to another upstream ISP (the turn essentially resembling a hairpin).  Neither the prefix nor the AS path in the update is altered.</p><p>An AS path that contains a provider-customer and customer-provider segment is considered a type 1 leak. The following example: AS4 → AS5 → AS6 forms a type 1 leak.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Njxtz1neeF3ejVaLH6RPi/4836161f45d7f547608839f3a11467dd/image9-5.png" />
            
</figure><p>Type 1 is the most recognized type of route leak and is very impactful. In many cases, a customer route is preferred over a peer or a provider route. In this example, AS6 will likely prefer sending traffic via AS5 instead of its other peer or provider routes, causing AS5 to unintentionally become a transit provider. This can significantly affect the performance of the traffic related to the leaked prefix or cause outages if the leaking AS is not provisioned to handle a large influx of traffic.</p><p>In June 2015, Telekom Malaysia (<a href="https://radar.cloudflare.com/traffic/as4788">AS4788</a>), a regional ISP, <a href="https://www.bgpmon.net/massive-route-leak-cause-internet-slowdown/">leaked over 170,000 routes</a> learned from its providers and peers to its other provider Level3 (<a href="https://radar.cloudflare.com/traffic/as3549">AS3549</a>, now Lumen). Level3 accepted the routes and further propagated them to its downstream networks, which in turn caused significant network issues globally.</p>
    <div>
      <h3>Type 2: Lateral ISP-ISP-ISP Leak</h3>
      <a href="#type-2-lateral-isp-isp-isp-leak">
        
      </a>
    </div>
    <p>A type 2 leak is defined as propagating routes obtained from one peer to another peer, creating two or more consecutive peer-to-peer path segments.</p><p>Here is an example: AS3 → AS4 → AS5 forms a type 2 leak.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/18VJCjED1cnQmJd0cXPnBU/f38d68dacc38637c0a9d72e5cdafa5ae/image1-70.png" />
            
</figure><p>One example of such a leak is <a href="https://archive.nanog.org/meetings/nanog41/presentations/mauch-lightning.pdf">more than three very large networks appearing in sequence</a>. Very large networks (such as Verizon and Lumen) do not purchase transit from each other, and having <a href="https://puck.nether.net/bgp/leakinfo.cgi/">more than three such networks</a> on the path in sequence is often an indication of a route leak.</p><p>However, in the real world, it is not unusual to see multiple small peering networks exchanging routes and passing them on to each other. Legitimate business reasons exist for having this type of network path. We are less concerned about this type of route leak as compared to type 1.</p>
    <div>
      <h3>Type 3 and 4: Provider routes to peer; peer routes to provider</h3>
      <a href="#type-3-and-4-provider-routes-to-peer-peer-routes-to-provider">
        
      </a>
    </div>
    <p>These two types involve propagating routes from a provider or a peer not to a customer, but to another peer or provider. Here are the illustrations of the two types of leaks:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6T2roi9v4ATputaICUVuUv/50831a0a631774e101e5f04abfb25876/image10-3.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/18D2LnLHkNkbzLsORnD95y/1c033cd2c3cea76e556013bc777889a9/image13-1.png" />
            
</figure><p>As in the <a href="/how-a-nigerian-isp-knocked-google-offline/">previously mentioned example</a>, a Nigerian ISP that peers with Google accidentally leaked those routes to its provider <a href="https://radar.cloudflare.com/traffic/as4809">AS4809</a>, generating a type 4 route leak. Because routes via customers are usually preferred over others, the large provider (AS4809) rerouted its traffic to Google via its customer, i.e. the leaking ASN, overwhelming the small ISP and taking Google offline for over an hour.</p>
    <div>
      <h2>Route leak summary</h2>
      <a href="#route-leak-summary">
        
      </a>
    </div>
    <p>So far, we have looked at the four types of route leaks defined in <a href="https://www.rfc-editor.org/rfc/rfc7908.html">RFC7908</a>. The common thread of the four types of route leaks is that they're all defined using AS-relationships, i.e., peers, customers, and providers. We summarize the types of leaks by categorizing the AS path propagation based on where the routes are learned from and propagate to. The results are shown in the following table.</p><table>
<thead>
  <tr>
    <th>Routes from / propagates to</th>
    <th>To provider</th>
    <th>To peer</th>
    <th>To customer</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>From provider</td>
    <td>Type 1</td>
    <td>Type 3</td>
    <td>Normal</td>
  </tr>
  <tr>
    <td>From peer</td>
    <td>Type 4</td>
    <td>Type 2</td>
    <td>Normal</td>
  </tr>
  <tr>
    <td>From customer</td>
    <td>Normal</td>
    <td>Normal</td>
    <td>Normal</td>
  </tr>
</tbody>
</table><p>We can summarize the whole table into one single rule: <b>routes obtained from a non-customer AS can only be propagated to customers</b>.</p><p><i>Note: Type 5 and type 6 route leaks are defined as prefix re-origination and announcing of private prefixes. Type 5 is more closely related to</i> <a href="https://www.cloudflare.com/learning/security/glossary/bgp-hijacking/"><i>prefix hijackings</i></a><i>, which we plan to expand our system to cover next, while type 6 leaks are outside the scope of this work. Interested readers can refer to sections 3.5 and 3.6 of</i> <a href="https://www.rfc-editor.org/rfc/rfc7908.html"><i>RFC7908</i></a> <i>for more information.</i></p>
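            <p>The single rule above lends itself to a compact check. The sketch below (illustrative Python, not our production detection code) walks a relationship-annotated AS path and flags the leak type from the table for each propagation step:</p>
            <pre><code># Leak type for one propagation step, per the table above:
# (where the route was learned from, where it is being sent to).
LEAK_TYPES = {
    ("provider", "provider"): 1,
    ("peer", "peer"): 2,
    ("provider", "peer"): 3,
    ("peer", "provider"): 4,
}

def leaks_in_path(steps):
    """steps: one (learned_from, sent_to) pair per AS on the path, using
    "provider", "peer" or "customer". Returns the leak types found; an
    empty list means every step satisfies the single rule above."""
    return [LEAK_TYPES[s] for s in steps if s in LEAK_TYPES]

# AS5 in the type 1 example: learned from provider AS4, sent to provider AS6.
print(leaks_in_path([("customer", "provider"), ("provider", "provider")]))
# [1]</code></pre>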
    <div>
      <h2>The Cloudflare Radar route leak system</h2>
      <a href="#the-cloudflare-radar-route-leak-system">
        
      </a>
    </div>
    <p>Now that we know what a route leak is, let’s talk about how we designed our route leak detection system.</p><p>From a very high level, we compartmentalize our system into three different components:</p><ol><li><p><b>Raw data collection module</b>: responsible for gathering BGP data from multiple sources and providing a BGP message stream to downstream consumers.</p></li><li><p><b>Leak detection module</b>: responsible for determining whether a given AS-level path is a route leak, estimating the confidence level of that assessment, and aggregating and providing all external evidence needed for further analysis of the event.</p></li><li><p><b>Storage and notification module</b>: responsible for providing access to detected route leak events and sending out notifications to relevant parties. This could also include building a dashboard for easy access and search of historical events and providing the user interface for high-level analysis of an event.</p></li></ol>
    <div>
      <h3>Data collection module</h3>
      <a href="#data-collection-module">
        
      </a>
    </div>
    <p>There are three types of data input we take into consideration:</p><ol><li><p>Historical: BGP archive files for some time range in the past</p><ul><li><p><a href="https://www.routeviews.org/routeviews/">RouteViews</a> and <a href="https://ris.ripe.net/docs/20_raw_data_mrt.html#name-and-location">RIPE RIS</a> BGP archives</p></li></ul></li><li><p>Semi-real-time: BGP archive files as soon as they become available, with a 10-30 minute delay</p><ul><li><p>RouteViews and RIPE RIS archives with a data broker that checks for new files periodically (e.g. <a href="https://bgpkit.com/broker">BGPKIT Broker</a>)</p></li></ul></li><li><p>Real-time: true real-time data sources</p><ul><li><p><a href="https://ris-live.ripe.net/">RIPE RIS Live</a></p></li><li><p>Cloudflare internal BGP sources</p></li></ul></li></ol>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6p32es0tPhqsESHMazR8Ni/96910fef1be1cccf2bd69aa750b063c8/image6-11.png" />
            
</figure><p>For the current version, we use the semi-real-time data source for the detection system, i.e., the BGP updates files from RouteViews and RIPE RIS. For data completeness, we process data from all public collectors from these two projects (a total of 63 collectors and over 2,400 collector peers) and implement a pipeline that’s capable of handling the BGP data processing as the data files become available.</p><p>For data file indexing and processing, we deployed an on-premises <a href="https://github.com/bgpkit/bgpkit-broker-backend">BGPKIT Broker instance</a> with the Kafka feature enabled for message passing, and a custom concurrent <a href="https://www.rfc-editor.org/rfc/rfc6396.html">MRT</a> data processing pipeline based on the <a href="https://github.com/bgpkit/bgpkit-parser">BGPKIT Parser</a> Rust SDK. The data collection module processes MRT files and converts the results into a BGP message stream of over two billion BGP messages per day (roughly 30,000 messages per second).</p>
    <div>
      <h3>Route leak detection</h3>
      <a href="#route-leak-detection">
        
      </a>
    </div>
    <p>The route leak detection module works at the level of individual BGP announcements. The detection component investigates one BGP message at a time, and estimates how likely a given BGP message is a result of a route leak event.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3uLtzc0IV8IAuxze3lYUXY/8eb3553e71a84930fe6851e8732f849f/image8-5.png" />
            
</figure><p>We base our detection algorithm mainly on the <a href="https://ieeexplore.ieee.org/document/6363987">valley-free model</a>, which we believe can capture most of the notable route leak incidents. As mentioned previously, the key to having low false positives when detecting route leaks with the valley-free model is to have accurate AS-level relationships. While those relationship types are not publicized by every AS, there have been over two <a href="https://ieeexplore.ieee.org/document/6027863">decades of research</a> on the inference of relationship types using publicly observed BGP data.</p><p>While state-of-the-art relationship inference algorithms have been shown to be <a href="https://dl.acm.org/doi/10.1145/2504730.2504735">highly accurate</a>, even a small margin of error can still incur inaccuracies in the detection of route leaks. To alleviate such artifacts, we synthesize multiple data sources for inferring AS-level relationships, including <a href="https://www.caida.org/">CAIDA/UCSD</a>’s <a href="https://www.caida.org/catalog/datasets/as-relationships/">AS relationship</a> data and our in-house built AS relationship dataset. Building on top of the two AS-level relationship datasets, we create a much more granular dataset at the per-prefix and per-peer levels. The improved dataset allows us to answer questions like: what is the relationship between AS1 and AS2 with respect to prefix P, as observed by collector peer X? This eliminates much of the ambiguity for cases where networks have multiple different relationships based on prefixes and geo-locations, and thus helps us reduce the number of false positives in the system. Besides the AS-relationship datasets, we also apply the <a href="https://ihr.iijlab.net/ihr/en-us/documentation#AS_dependency">AS Hegemony dataset</a> from <a href="https://ihr.iijlab.net/ihr/en-us/">IHR IIJ</a> to further reduce false positives.</p>
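            <p>To make the per-prefix, per-peer idea concrete, a lookup like the following (a simplified illustration; the keys and labels are ours, not the production schema) first tries the most specific entry and falls back to the plain AS-pair relationship:</p>
            <pre><code>def relationship(rels, as1, as2, prefix=None, peer=None):
    """Return the most specific known relationship between as1 and as2:
    per-prefix/per-collector-peer if available, else the AS-pair default."""
    for key in ((as1, as2, prefix, peer), (as1, as2)):
        if key in rels:
            return rels[key]
    return "unknown"

# Hypothetical dataset: a default relationship for the AS pair, plus a
# more specific observation for one prefix as seen by one collector peer.
rels = {
    (64496, 64497): "customer-of",
    (64496, 64497, "192.0.2.0/24", 64500): "peer",
}
print(relationship(rels, 64496, 64497, "192.0.2.0/24", 64500))  # peer
print(relationship(rels, 64496, 64497))  # customer-of</code></pre>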
    <div>
      <h3>Route leak storage and presentation</h3>
      <a href="#route-leak-storage-and-presentation">
        
      </a>
    </div>
    <p>After processing each BGP message, we store the generated route leak entries in a database for long-term storage and exploration. We also aggregate individual route leak BGP announcements, grouping relevant leaks from the same leaking ASN within a short period into <b>route-leak events</b>. The route leak events are then available for consumption by different downstream applications like web UIs, an <a href="https://developers.cloudflare.com/api/operations/radar_get_BGPRouteLeakEvents">API</a>, or alerts.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7f4kKk1DFYPIltkyzArwDy/4347df7a5bee4ca6686455d6205324f7/image12-2.png" />
            
            </figure>
    <div>
      <h2>Route leaks on Cloudflare Radar</h2>
      <a href="#route-leaks-on-cloudflare-radar">
        
      </a>
    </div>
    <p>At Cloudflare, we aim to help build a better Internet, and that includes sharing our efforts on monitoring and securing Internet routing. Today, we are releasing our route leak detection system as a public beta.</p><p>Starting today, users going to the Cloudflare Radar ASN pages will find the list of route leaks that affect that AS. We consider an AS to be affected when the leaker AS is one hop away from it in either direction, before or after.</p><p>The Cloudflare Radar ASN page is directly accessible via <a href="https://radar.cloudflare.com/as{ASN}"><b>https://radar.cloudflare.com/as{ASN}</b></a>. For example, one can navigate to <a href="https://radar.cloudflare.com/as174">https://radar.cloudflare.com/as174</a> to view the overview page for Cogent AS174. ASN pages now show a dedicated card for route leaks detected relevant to the current ASN within the selected time range.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5CoRAfRuDBdZr5zQxCtoGn/1a89ba21ea72c44cbd45da3661705f65/image2-54.png" />
            
</figure><p>Users can also start using our <a href="https://developers.cloudflare.com/api/operations/radar_get_BGPRouteLeakEvents">public data API</a> to look up route leak events for any given ASN. Our API supports filtering route leak results by time range and the ASes involved. Here is a screenshot of the <a href="https://developers.cloudflare.com/api/operations/radar_get_BGPRouteLeakEvents">route leak events API documentation page</a> on the <a href="/building-a-better-developer-experience-through-api-documentation/">newly updated API docs site</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5IgJJ1GO4uHepxxQc5vwV7/9e809b7ef9264f9c03d70c70c27d4bb5/image3-44.png" />
            
            </figure>
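            <p>As a quick illustration of those filters, the snippet below builds a query URL for leak events involving a given ASN. The endpoint path and parameter names here are assumptions for illustration only; the API documentation linked above is authoritative:</p>
            <pre><code>from urllib.parse import urlencode

# Assumed endpoint; consult the Radar API docs for the exact path.
API_BASE = "https://api.cloudflare.com/client/v4/radar/bgp/leaks/events"

def leak_events_url(asn, date_range="7d"):
    """Build a query URL for route leak events involving the given ASN."""
    return API_BASE + "?" + urlencode(
        {"involvedAsn": asn, "dateRange": date_range})

print(leak_events_url(174))</code></pre>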
    <div>
      <h2>More to come on routing security</h2>
      <a href="#more-to-come-on-routing-security">
        
      </a>
    </div>
    <p>There is a lot more we are planning to do with route leak detection. More features like a global view page, route leak notifications, more advanced APIs, custom automation scripts, and historical archive datasets will begin to ship on Cloudflare Radar over time. Your feedback and suggestions are also very important for us as we continue to improve our detection results and serve better data to the public.</p><p>Furthermore, we will continue to expand our work on other important topics in Internet routing security, including global BGP hijack detection (not limited to our customer networks), RPKI validation monitoring, open-sourcing tools and architecture designs, and a centralized routing security web gateway. Our goal is to provide the best data and tools for routing security to the communities so that we can build a better and more secure Internet together.</p><p>In the meantime, we opened a <a href="https://discord.com/channels/595317990191398933/1035553707116478495">Radar room</a> on our Developers Discord Server. Feel free to <a href="https://discord.com/channels/595317990191398933/1035553707116478495">join</a> and talk to us; the team is eager to receive feedback and answer questions.</p><p>Visit <a href="https://radar.cloudflare.com/">Cloudflare Radar</a> for more Internet insights. You can also follow us <a href="https://twitter.com/cloudflareradar">on Twitter</a> for more Radar updates.</p> ]]></content:encoded>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Routing Security]]></category>
            <guid isPermaLink="false">72oaP8g7ZckKtIVQxA8EX4</guid>
            <dc:creator>Mingwei Zhang</dc:creator>
            <dc:creator>Vasilis Giotsas</dc:creator>
            <dc:creator>Celso Martinho</dc:creator>
        </item>
        <item>
            <title><![CDATA[BGP security and confirmation biases]]></title>
            <link>https://blog.cloudflare.com/route-leaks-and-confirmation-biases/</link>
            <pubDate>Wed, 23 Feb 2022 13:59:26 GMT</pubDate>
            <description><![CDATA[ On February 1, 2022, a configuration error on one of our routers caused a route leak of up to 2,000 Internet prefixes to one of our Internet transit providers. This leak lasted for 32 seconds and, at a later time, for another 7 seconds. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/20cLql3Jp3Pjr0yzjRJUNU/18aa7947fa7aaf819a2e17686f876d19/Route-Leaks-and-Confirmation-Biases.png" />
            
</figure><p>This is not what I imagined my first blog article would look like, but here we go.</p><p>On February 1, 2022, a configuration error on one of our routers caused a route leak of up to 2,000 Internet prefixes to one of our Internet transit providers. This leak lasted for 32 seconds and, at a later time, for another 7 seconds. We did not see any traffic spikes or drops in our network and did not see any customer impact because of this error, but it may have caused an impact to external parties, and we are sorry for the mistake.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7MWQzcEXTHSBFhDVvDszOo/2de836f980c0a7af00e3c378d0040edb/image1.jpg" />
            
            </figure>
    <div>
      <h3>Timeline</h3>
      <a href="#timeline">
        
      </a>
    </div>
    <p>All timestamps are UTC.</p><p>As part of our efforts to build the best network, we regularly update our Internet transit and peering links throughout our network. On February 1, 2022, we had a “hot-cut” scheduled with one of our Internet transit providers to simultaneously update router configurations on Cloudflare and ISP routers to migrate one of our existing Internet transit links in Newark to a link with more capacity. Doing a “hot-cut” means that both parties change cabling and configuration at the same time, usually while on a conference call, to reduce downtime and impact on the network. The migration started off-peak at 10:45 (05:45 local time) with our network engineer entering the bridge call with our data center engineers and remote hands on site, as well as operators from the ISP.</p><p>At 11:17, we connected the new fiber link and established the BGP sessions to the ISP successfully. We had BGP filters in place on our end to not accept or send any prefixes, so we could evaluate the connection and settings without any impact on our network and services.</p><p>As the connection between our router and the ISP — like most Internet connections — was realized over a fiber link, the first item to check is the “light levels” of that link. This shows the strength of the optical signal received by our router from the ISP router and can indicate a bad connection when it’s too low.
Low light levels are likely caused by unclean fiber ends or not fully seated connectors, but may also indicate a defective optical transceiver which connects the fiber link to the router - all of which can degrade service quality.</p><p>The next item on the checklist is interface errors, which occur when a network device receives incorrect or malformed network packets; these also indicate a bad connection and would likely lead to a degradation in service quality.</p><p>As light levels were good, and we observed no errors on the link, we deemed it ready for production and removed the BGP reject filters at 11:22.</p><p>This immediately triggered the maximum prefix-limit protection the ISP had configured on the BGP session and shut down the session, preventing further impact. The maximum prefix-limit is a safeguard in BGP to prevent the spread of route leaks and to protect the Internet. The limit is usually set just a little higher than the expected number of Internet prefixes from a peer to leave some headroom for growth, but also to catch configuration errors fast. The configured value was just 40 prefixes short of the number of prefixes we were advertising at that site, so this was considered the reason for the session to be shut down. After checking back internally, we asked the ISP to raise the prefix-limit, which they did.</p><p>The BGP session was reestablished at 12:08 and immediately shut down again. The problem was identified and fixed at 12:14.</p><p>10:45: Start of scheduled maintenance</p><p>11:17: New link was connected and BGP sessions went up (filters still in place)</p><p>11:22: Link was deemed ready for production and filters removed</p><p>11:23: BGP sessions were torn down by ISP router due to configured prefix-limit</p><p>12:08: ISP configures higher prefix-limits, BGP sessions briefly come up again and are shut down</p><p>12:14: Issue identified and configuration updated</p>
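            <p>The maximum prefix-limit safeguard described above behaves roughly like this (an illustrative sketch with made-up numbers, not actual router behavior):</p>
            <pre><code>def session_state(received_prefixes, prefix_limit):
    """A BGP session is torn down once the neighbor advertises more
    prefixes than the configured maximum prefix-limit allows."""
    return "down" if received_prefixes > prefix_limit else "established"

# Expecting a few dozen prefixes from the peer, limit set with headroom...
print(session_state(58, 100))      # established
# ...but leaking a large chunk of the table trips the safeguard at once.
print(session_state(900000, 100))  # down</code></pre>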
    <div>
      <h3>What happened and what we’re doing about it</h3>
      <a href="#what-happened-and-what-were-doing-about-it">
        
      </a>
    </div>
    <p>The outage occurred while migrating one of our Internet transits to a link with more capacity. Once the new link and a <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">BGP session</a> had been established, and the link deemed error-free, our network engineering team followed the peer-reviewed deployment plan. The team removed the filters from the BGP sessions, which had been preventing the Cloudflare router from accepting and sending prefixes via BGP.</p><p>Due to an oversight in the deployment plan, no BGP filters restricting exports to only the prefixes of Cloudflare and our customers had been added. Neither the earlier peer review of the plan nor a second review on the internal chat caught this omission, so the network engineer performing the change went ahead.</p>
            <pre><code>ewr02# show |compare                                     
[edit protocols bgp group 4-ORANGE-TRANSIT]
-  import REJECT-ALL;
-  export REJECT-ALL;
[edit protocols bgp group 6-ORANGE-TRANSIT]
-  import REJECT-ALL;
-  export REJECT-ALL;</code></pre>
            <p>The change resulted in our router sending all known prefixes to the ISP router, which shut down the session as the number of prefixes received exceeded the configured maximum prefix-limit.</p><p>As the configured values for the maximum prefix-limits turned out to be rather low for the number of prefixes on our network, this didn’t come as a surprise to our network engineering team, and no investigation into why the BGP session went down was started. The prefix-limit being too low seemed to be a perfectly valid reason.</p><p>We asked the ISP to increase the prefix-limit, which they did after receiving approval on their side. Once the prefix-limit had been increased and the previously shut down BGP sessions reset, the sessions were reestablished but shut down immediately as the maximum prefix-limit was triggered again. This is when our network engineer began questioning whether another issue was at fault, and found and corrected the previously overlooked configuration error.</p><p>We made the following change in response to this event: we introduced an implicit reject policy for BGP sessions, which takes effect if no import/export policy is configured for a specific BGP neighbor or neighbor group. This change has been deployed.</p>
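<p>Conceptually, the implicit reject we introduced makes a missing policy fail closed rather than open. A minimal sketch of that behavior (a hypothetical Python model with illustrative group and policy names, not our real configuration pipeline):</p>

```python
# Sketch of an implicit default-reject for BGP policy lookup.
# Group and policy names are illustrative, not real configuration.

configured = {
    "4-ORANGE-TRANSIT": None,                    # policy accidentally removed
    "PEERING-LAN":      "EXPORT-CUSTOMER-ONLY",  # explicit policy present
}

def effective_policy(group: str) -> str:
    """Fail closed: a missing import/export policy becomes REJECT-ALL
    instead of the accept-everything default that caused the leak."""
    return configured.get(group) or "REJECT-ALL"

assert effective_policy("4-ORANGE-TRANSIT") == "REJECT-ALL"
assert effective_policy("PEERING-LAN") == "EXPORT-CUSTOMER-ONLY"
```

<p>With this fallback in place, removing a policy by mistake stops all prefix exchange on that session instead of leaking the full table.</p>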
    <div>
      <h3>BGP security &amp; preventing route-leaks — what’s in the cards?</h3>
      <a href="#bgp-security-preventing-route-leaks-whats-in-the-cards">
        
      </a>
    </div>
    <p>Route leaks aren’t new, and they keep happening. The industry has come up with many approaches to limit the impact of route leaks or even prevent them. Policies and filters are used to control which prefixes should be exported to or imported from a given peer. RPKI can help to make sure only allowed prefixes are accepted from a peer, and a maximum prefix-limit can act as a last line of defense when everything else fails.</p><p>BGP policies and filters are commonly used to ensure only explicitly allowed prefixes are sent out to BGP peers, usually only prefixes owned by the entity operating the network and its customers. They can also be used to tweak some knobs (BGP local-pref, MED, AS path prepend, etc.) to influence routing decisions and balance traffic across links. This is what the policies we have in place for our peers and transits do. As explained above, the maximum prefix-limit is intended to tear down BGP sessions if more prefixes are being sent or received than expected. We have talked about RPKI before: it’s <a href="/rpki/">the required cryptographic upgrade to BGP routing</a>, and we are still on <a href="/rpki-details/">our path to securing Internet Routing</a>.</p><p>To improve the overall stability of the Internet even more, a new Internet standard was proposed in 2017, adding another layer of protection into the mix: <a href="https://datatracker.ietf.org/doc/html/rfc8212">RFC8212</a> defines <code>Default External BGP (EBGP) Route Propagation Behavior without Policies</code>, which tackles exactly the issues we were facing.</p><p>This RFC updates the BGP-4 standard (<a href="https://datatracker.ietf.org/doc/html/rfc4271">RFC4271</a>), which defines how BGP works and what vendors are expected to implement. On the Juniper operating system, JunOS, this behavior can be activated by setting <code>defaults ebgp no-policy reject-always</code> at the <code>protocols bgp</code> hierarchy level, starting with Junos OS Release 20.3R1. 
The <a href="https://github.com/bgp/RFC8212">RFC8212 repository on GitHub</a> provides a good overview of the current implementation status of RFC8212 in common network vendor OSes and routing daemons.</p><p>If you are running an older version of JunOS, a similar effect can be achieved by defining a REJECT-ALL policy and setting it as the import/export policy at the <code>protocols bgp</code> hierarchy level. Note that this will also affect iBGP sessions, which the solution above does not touch.</p>
            <pre><code>policy-statement REJECT-ALL {
  then reject;
}

protocol bgp {
  import REJECT-ALL;
  export REJECT-ALL;
}</code></pre>
            
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>We are sorry for leaking routes for prefixes that did not belong to Cloudflare or our customers, and we apologize to the network engineers who got paged as a result.</p><p>We have processes in place to make sure that changes to our infrastructure are reviewed before being executed, so potential issues can be spotted before they reach production. In this case, the review process failed to catch this configuration error. In response, we will increase our efforts to further our network automation and fully derive the device configuration from an intended state.</p><p>While this was a case of human error, it could have been detected and mitigated significantly faster had confirmation bias not kicked in, leading the operator to believe the observed behavior was expected. This underlines the importance of our ongoing efforts to train our people to be aware of the cognitive biases we all carry. It also serves as a great example of how confirmation bias can influence our work, and of why we should question our conclusions early.</p><p>It also shows how important protocols like RPKI are. Route leaks are something even experienced network operators can cause accidentally, and technical solutions are needed to reduce the impact of leaks, whether they are intentional or the result of an error.</p>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Internet Traffic]]></category>
            <category><![CDATA[Better Internet]]></category>
            <guid isPermaLink="false">5vmYeAWFyKeluMXwpx7wir</guid>
            <dc:creator>Maximilian Wilhelm</dc:creator>
        </item>
        <item>
            <title><![CDATA[What happened on the Internet during the Facebook outage]]></title>
            <link>https://blog.cloudflare.com/during-the-facebook-outage/</link>
            <pubDate>Fri, 08 Oct 2021 15:16:00 GMT</pubDate>
            <description><![CDATA[ Today, we're going to show you how the Facebook and affiliate sites downtime affected us, and what we can see in our data. ]]></description>
            <content:encoded><![CDATA[ <p>It's been a few days now since Facebook, Instagram, and WhatsApp went AWOL and experienced one of the most extended and rough downtime periods in their existence.</p><p>When that happened, we reported our bird's-eye view of the event and posted the blog <a href="/october-2021-facebook-outage/">Understanding How Facebook Disappeared from the Internet</a> where we tried to explain what we saw and how <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">DNS</a> and BGP, two of the technologies at the center of the outage, played a role in the event.</p><p>In the meantime, more information has surfaced, and Facebook has <a href="https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/">published a blog post</a> giving more details of what happened internally.</p><p>As we said before, these events are a gentle reminder that the Internet is a vast network of networks, and we, as industry players and end-users, are part of it and should work together.</p><p>In the aftermath of an event of this size, we don't waste much time debating how peers handled the situation. We do, however, ask ourselves the more important questions: "How did this affect us?" and "What if this had happened to us?" Asking and answering these questions whenever something like this happens is a great and healthy exercise that helps us improve our own resilience.</p><p>Today, we're going to show you how the Facebook and affiliate sites downtime affected us, and what we can see in our data.</p>
    <div>
      <h3>1.1.1.1</h3>
      <a href="#1-1-1-1">
        
      </a>
    </div>
    <p>1.1.1.1 is a fast and privacy-centric public DNS resolver operated by Cloudflare, used by millions of users, browsers, and devices worldwide. Let's look at our telemetry and see what we find.</p><p>First, the obvious. If we look at the response rate, there was a massive spike in the number of SERVFAIL codes. SERVFAILs can happen for several reasons; we have an excellent blog called <a href="/unwrap-the-servfail/">Unwrap the SERVFAIL</a> that you should read if you're curious.</p><p>In this case, we started serving SERVFAIL responses to all facebook.com and whatsapp.com DNS queries because our resolver couldn't access the upstream Facebook authoritative servers, at a rate about 60 times higher than on a typical day.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7BKM7m3fUz3fCRrVuCN3jh/4cb1ccccd0cbe22f2fed10cf10360084/image16.png" />
            
            </figure><p>If we look at all the queries, not specific to Facebook or WhatsApp domains, and we split them by IPv4 and IPv6 clients, we can see that our load increased too.</p><p>As explained before, this is due to a snowball effect associated with applications and users retrying after the errors and generating even more traffic. In this case, 1.1.1.1 had to handle more than the expected rate for A and AAAA queries.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7a67lujBcgIkqVusTr4R1t/6f5bd1971964e4b3721d6096b50bc8d7/image3-12.png" />
            
            </figure><p>Here's another fun one.</p><p>DNS vs. DoT and DoH. Typically, DNS queries and responses are <a href="https://datatracker.ietf.org/doc/html/rfc1035#section-4.2">sent in plaintext over UDP</a> (or sometimes TCP), and that's been the case for decades now. Naturally, this poses security and privacy risks to end-users as it allows in-transit attacks or traffic snooping.</p><p>With DNS over TLS (DoT) and DNS over HTTPS (DoH), clients can talk DNS using well-known, well-supported encryption and authentication protocols.</p><p>Our learning center has a good article on "<a href="https://www.cloudflare.com/learning/dns/dns-over-tls/">DNS over TLS vs. DNS over HTTPS</a>" that you can read. Browsers like Chrome, Firefox, and Edge have supported DoH for some time now, WARP uses DoH too, and you can even configure your operating system to use the new protocols.</p><p>When Facebook went offline, we saw the number of DoT+DoH SERVFAIL responses grow to more than 300x the average rate.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/TNHkfPvSSHUxljyh89S5a/4de08b52d1a9cd23862d56a90943677b/image14.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2pdogcCvQMRd10yEPUV0hu/c2ce7f4eeabd871727af41e2fc574524/image11-1.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6GJdW0PO7zIHBlnkKTyGa2/59fd778bee5c0a837877535399455210/image4-13.png" />
            
            </figure><p>So, we got hammered with lots of requests and errors, causing traffic spikes to our 1.1.1.1 resolver and unexpected load on our edge network and systems. How did we perform during this stressful period?</p><p>Quite well. 1.1.1.1 kept its cool and continued serving the vast majority of requests around the <a href="https://www.dnsperf.com/#!dns-resolvers">famous 10ms mark</a>. Response times at the p95 and p99 percentiles increased slightly, probably due to timeouts trying to reach Facebook’s nameservers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ZzJKGMnii2ghnbRxI8UoT/8c282529bb5efc0f834728c14047b463/image6-11.png" />
            
            </figure><p>Another interesting perspective is the distribution of the ratio between SERVFAIL and good DNS answers, by country. In theory, the higher this ratio is, the more the country uses Facebook. Here's the map with the countries that suffered the most:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3XJYvy2Ore7FARCv4oBbiD/d1700bca70c1c85c2fe01d3e49019fdb/image18.png" />
            
            </figure><p>Here’s the top twelve country list, ordered by those that apparently use Facebook, WhatsApp and Instagram the most:</p><table><tr><td><p><b>Country</b></p></td><td><p><b>SERVFAIL/Good Answers ratio</b></p></td></tr><tr><td><p>Turkey</p></td><td><p>7.34</p></td></tr><tr><td><p>Grenada</p></td><td><p>4.84</p></td></tr><tr><td><p>Congo</p></td><td><p>4.44</p></td></tr><tr><td><p>Lesotho</p></td><td><p>3.94</p></td></tr><tr><td><p>Nicaragua</p></td><td><p>3.57</p></td></tr><tr><td><p>South Sudan</p></td><td><p>3.47</p></td></tr><tr><td><p>Syrian Arab Republic</p></td><td><p>3.41</p></td></tr><tr><td><p>Serbia</p></td><td><p>3.25</p></td></tr><tr><td><p>Turkmenistan</p></td><td><p>3.23</p></td></tr><tr><td><p>United Arab Emirates</p></td><td><p>3.17</p></td></tr><tr><td><p>Togo</p></td><td><p>3.14</p></td></tr><tr><td><p>French Guiana</p></td><td><p>3.00</p></td></tr></table>
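<p>The ranking above boils down to dividing SERVFAIL answers by good answers per country and sorting. A toy version of that computation (Python, with invented counts chosen only to reproduce a few of the ratios shown):</p>

```python
# Toy computation of the SERVFAIL/good-answers ratio per country.
# The counts below are invented for illustration; only the ratios
# echo the table above.

counts = {
    # country: (servfail_answers, good_answers)
    "Turkey":  (7340, 1000),
    "Grenada": (4840, 1000),
    "Congo":   (4440, 1000),
}

ratios = {country: round(servfail / good, 2)
          for country, (servfail, good) in counts.items()}

# Sort countries by ratio, highest (most Facebook-dependent) first.
ranking = sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
assert ranking[0] == ("Turkey", 7.34)
```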
    <div>
      <h3>Impact on other sites</h3>
      <a href="#impact-on-other-sites">
        
      </a>
    </div>
    <p>When Facebook, Instagram, and WhatsApp aren't around, the world turns to other places to look for information on what's going on, other forms of entertainment or other applications to communicate with their friends and family. Our data shows us those shifts. While Facebook was going down, other services and platforms were going up.</p><p>To get an idea of the changing traffic patterns we look at DNS queries as an indicator of increased traffic to specific sites or types of site.</p><p>Here are a few examples.</p><p>Other social media platforms saw a slight increase in use, compared to normal.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7uFhqEr4MIl6HxpBI9GG1M/a995ce00e0ff2b2b96d9aa9fb3341fa6/image17.png" />
            
            </figure><p>Traffic to messaging platforms like Telegram, Signal, Discord and Slack got a little push too.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/16AyoRRD7YSUSd3MuHNNGO/c587c9a318702877248025d86f921c79/image9-6.png" />
            
            </figure><p>Nothing like a little gaming time when Instagram is down, we guess, when looking at traffic to sites like Steam, Xbox, Minecraft and others.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2v3JZdLK2rSbGPlhgicDJn/e8f2bb509432c9b5fa2837588bd1f927/image8-10.png" />
            
            </figure><p>And yes, people want to know what’s going on and fall back on news sites like CNN, New York Times, The Guardian, Wall Street Journal, Washington Post, Huffington Post, BBC, and others:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4JR6ipTzivZfWYOS5rmnlX/c9b363c7babfaa94b096ebd657780b5a/image5-12.png" />
            
            </figure>
    <div>
      <h3>Attacks</h3>
      <a href="#attacks">
        
      </a>
    </div>
    <p>One could speculate that the Internet was under attack from malicious hackers. Our Firewall doesn't agree; nothing out of the ordinary stands out.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6lBrIk5Tx5b64G2SGtfFXZ/b899bd50829087a48ef38b31b97cfd7a/image13.png" />
            
            </figure>
    <div>
      <h3>Network Error Logs</h3>
      <a href="#network-error-logs">
        
      </a>
    </div>
    <p><a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Network_Error_Logging">Network Error Logging</a>, NEL for short, is an experimental technology supported in Chrome. A website can issue a Report-To header and ask the browser to send reports about network problems, like bad requests or <a href="https://www.cloudflare.com/learning/dns/common-dns-issues/">DNS issues</a>, to a specific endpoint.</p><p>Cloudflare uses NEL data to help quickly triage end-user connectivity issues when end-users reach our network. You can learn more about this feature in our <a href="https://support.cloudflare.com/hc/en-us/articles/360050691831-Understanding-Network-Error-Logging">help center</a>.</p><p>If Facebook is down and their DNS isn't responding, Chrome will start reporting NEL events every time one of the pages in our zones fails to load Facebook comments, posts, ads, or authentication buttons. This chart shows it clearly.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/56Sp2huqEtyGiEOmzBrsiU/6ce9a0f9affe20691bf8588d9f4d6a8f/image7-8.png" />
            
            </figure>
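<p>For sites that want to enable NEL themselves, the mechanism is a pair of HTTP response headers, <code>Report-To</code> and <code>NEL</code>, carrying JSON values. A sketch of how they might be constructed (Python; the collector URL is a placeholder, not a real endpoint):</p>

```python
import json

# Sketch of the response headers a site sends to enable Network Error
# Logging. The endpoint URL is a placeholder for illustration only.
report_to = {
    "group": "nel",
    "max_age": 86400,  # seconds the browser should remember this config
    "endpoints": [{"url": "https://example.com/reports"}],
}
nel = {"report_to": "nel", "max_age": 86400}

headers = {
    "Report-To": json.dumps(report_to),
    "NEL": json.dumps(nel),
}
```

<p>With these headers in place, the browser batches reports about failed requests (including DNS failures, as seen in the chart above) and delivers them to the named endpoint group.</p>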
    <div>
      <h3>WARP</h3>
      <a href="#warp">
        
      </a>
    </div>
    <p>Cloudflare announced <a href="https://1.1.1.1/">WARP</a> in 2019, and called it "<a href="/1111-warp-better-vpn/">A VPN for People Who Don't Know What V.P.N. Stands For</a>" and offered it for free to its customers. Today WARP is used by millions of people worldwide to securely and privately access the Internet on their desktop and mobile devices. Here's what we saw during the outage by looking at traffic volume between WARP and Facebook’s network:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4UIMVg1PpKr27RRkKagcdp/979d1e3ef2dda3107f4db2799bb2f3f6/WARP-graph-Facebook-outage-Oct-2021.png" />
            
            </figure><p>You can see how the steep drop on Facebook ASN traffic coincides with the start of the incident and how it compares to the same period the day before.</p>
    <div>
      <h3>Our own traffic</h3>
      <a href="#our-own-traffic">
        
      </a>
    </div>
    <p>People tend to think of Facebook as a place to visit. We log in, we access Facebook, we post. It turns out that Facebook likes to visit us too, quite a lot. Like Google and other platforms, Facebook uses an army of crawlers to constantly check websites for data and updates. Those robots gather information about website content, such as titles, descriptions, thumbnail images, and metadata. You can learn more about this on the "<a href="https://developers.facebook.com/docs/sharing/webmasters/crawler/">Facebook Crawler</a>" page and the <a href="https://ogp.me/">Open Graph</a> website.</p><p>Here's what we see when traffic is coming from the Facebook ASN, supposedly from crawlers, to our CDN sites:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5bWgF42TihmULktktsg6TX/669f1a605301c7be4f8983d45c42ffd3/image10-3.png" />
            
            </figure><p>The robots went silent.</p><p>What about the traffic coming to our CDN sites from Facebook <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent">User-Agents</a>? The gap is indisputable.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/WCcPk5XJu2bmFxhfNjJq3/825ee59fe0ac3b20f14fe249e18f701d/image1-16.png" />
            
            </figure><p>We see about 30% of a typical request rate hitting us. But it's not zero; why is that?</p><p>We'll let you in on a little secret. Never trust User-Agent information; it's broken. User-Agent spoofing is everywhere. Browsers, apps, and other clients deliberately change the User-Agent string when they fetch pages from the Internet to hide, obtain access to certain features, or bypass paywalls (because pay-walled sites want sites like Facebook to index their content, so that they get more traffic from links).</p><p>Fortunately, newer, privacy-centric standards like <a href="https://developer.mozilla.org/en-US/docs/Web/API/User-Agent_Client_Hints_API">User-Agent Client Hints</a> are emerging.</p>
    <div>
      <h3>Core Web Vitals</h3>
      <a href="#core-web-vitals">
        
      </a>
    </div>
    <p>Core Web Vitals are a subset of <a href="https://web.dev/vitals/">Web Vitals</a>, an initiative by Google to provide a unified interface to measure real-world quality signals when a user visits a web page. Such signals include Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS).</p><p>We <a href="/web-analytics-vitals-explorer/">use Core Web Vitals</a> with our privacy-centric Web Analytics product and collect anonymized data on how end-users experience the websites that enable this feature.</p><p>One of the metrics we can calculate using these signals is the page load time. Our theory is that if a page includes scripts coming from external sites (for example, Facebook "like" buttons, comments, ads), and those sites are unreachable, its total load time gets affected.</p><p>We used a list of about 400 domains that we know embed Facebook scripts in their pages and looked at the data.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/HZbQOhcV4ki3H7NVMD6Yq/117d2298e9d0e06521083d5176a34f77/image12.png" />
            
            </figure><p>Now let's look at the Largest Contentful Paint. <a href="https://web.dev/lcp/">LCP</a> marks the point in the page load timeline when the page's main content has likely loaded. The faster the LCP is, the better the end-user experience.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/QJiUkTvEWln7dpvLkVKiV/eb6595b2c781e5ccc68c03b4bd233e3b/image15.png" />
            
            </figure><p>Again, the page load experience got visibly degraded.</p><p>The outcome seems clear. The sites that use Facebook scripts in their pages took about 1.5x as long to load during the outage, with some of them taking more than 2x the usual time. Facebook's outage dragged the performance of some other sites down.</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>When Facebook, Instagram, and WhatsApp went down, the Web felt it. Some websites got slower or lost traffic, other services and platforms got unexpected load, and people lost the ability to communicate or do business normally.</p> ]]></content:encoded>
            <category><![CDATA[Outage]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Trends]]></category>
            <category><![CDATA[Facebook]]></category>
            <guid isPermaLink="false">4sF0eFLy72giKT8ZsHadhg</guid>
            <dc:creator>Celso Martinho</dc:creator>
            <dc:creator>Sabina Zejnilovic</dc:creator>
        </item>
        <item>
            <title><![CDATA[Understanding how Facebook disappeared from the Internet]]></title>
            <link>https://blog.cloudflare.com/october-2021-facebook-outage/</link>
            <pubDate>Mon, 04 Oct 2021 21:08:52 GMT</pubDate>
            <description><![CDATA[ Today at 1651 UTC, we opened an internal incident entitled "Facebook DNS lookup returning SERVFAIL" because we were worried that something was wrong with our DNS resolver 1.1.1.1.  But as we were about to post on our public status page we realized something else more serious was going on. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The Internet - A Network of Networks</p><p>“<i>Facebook can't be down, can it?</i>”, we thought, for a second.</p><p>Today at 15:51 UTC, we opened an internal incident entitled "Facebook DNS lookup returning SERVFAIL" because we were worried that something was wrong with our DNS resolver <a href="https://developers.cloudflare.com/warp-client/">1.1.1.1</a>.  But as we were about to post on our <a href="https://www.cloudflarestatus.com/">public status</a> page we realized something else more serious was going on.</p><p>Social media quickly burst into flames, reporting what our engineers rapidly confirmed too. Facebook and its affiliated services WhatsApp and Instagram were, in fact, all down. Their DNS names stopped resolving, and their infrastructure IPs were unreachable. It was as if someone had "pulled the cables" from their data centers all at once and disconnected them from the Internet.</p><p>This wasn't a <a href="https://www.cloudflare.com/learning/dns/common-dns-issues/">DNS issue</a> itself, but failing DNS was the first symptom we'd seen of a larger Facebook outage.</p><p>How's that even possible?</p>
    <div>
      <h3>Update from Facebook</h3>
      <a href="#update-from-facebook">
        
      </a>
    </div>
    <p>Facebook has now <a href="https://engineering.fb.com/2021/10/04/networking-traffic/outage/">published a blog post</a> giving some details of what happened internally. Externally, we saw the BGP and DNS problems outlined in this post but the problem actually began with a configuration change that affected the entire internal backbone. That cascaded into Facebook and other properties disappearing and staff internal to Facebook having difficulty getting service going again.</p><p>Facebook posted <a href="https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/">a further blog post</a> with a lot more detail about what happened. You can read that post for the inside view and this post for the outside view.</p><p>Now on to what we saw from the outside.</p>
    <div>
      <h3>Meet BGP</h3>
      <a href="#meet-bgp">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">BGP</a> stands for Border Gateway Protocol. It's a mechanism to exchange routing information between autonomous systems (AS) on the Internet. The big routers that make the Internet work have huge, constantly updated lists of the possible routes that can be used to deliver every network packet to their final destinations. Without BGP, the Internet routers wouldn't know what to do, and the Internet wouldn't work.</p><p>The Internet is literally a network of networks, and it’s bound together by BGP. BGP allows one network (say Facebook) to advertise its presence to other networks that form the Internet. As we write, Facebook is not advertising its presence, so ISPs and other networks can’t find Facebook’s network, and it is unavailable.</p><p>The individual networks each have an ASN: an Autonomous System Number. An Autonomous System (AS) is an individual network with a unified internal routing policy. An AS can originate prefixes (state that it controls a group of IP addresses), as well as transit prefixes (state that it knows how to reach specific groups of IP addresses).</p><p>Cloudflare's ASN is <a href="https://www.peeringdb.com/asn/13335">AS13335</a>. Every ASN needs to announce its prefix routes to the Internet using BGP; otherwise, no one will know how to connect and where to find us.</p><p>Our <a href="https://www.cloudflare.com/learning/">learning center</a> has a good overview of what <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">BGP</a> and <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">ASNs</a> are and how they work.</p><p>In this simplified diagram, you can see six autonomous systems on the Internet and two possible routes that one packet can use to go from Start to End. AS1 → AS2 → AS3 is the fastest, and AS1 → AS6 → AS5 → AS4 → AS3 is the slowest, but the latter can be used if the first fails.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/32OFXCXQN61ABya2ei1YR2/b8dc0f86b97079926dd6de75533f0912/image5-10.png" />
            
            </figure><p>At 15:58 UTC we noticed that Facebook had stopped announcing the routes to their DNS prefixes. That meant that, at least, Facebook’s DNS servers were unavailable. Because of this Cloudflare’s 1.1.1.1 DNS resolver could no longer respond to queries asking for the IP address of facebook.com.</p>
            <pre><code>route-views&gt;show ip bgp 185.89.218.0/23
% Network not in table
route-views&gt;

route-views&gt;show ip bgp 129.134.30.0/23
% Network not in table
route-views&gt;</code></pre>
            <p>Meanwhile, other Facebook IP addresses remained routed but weren’t particularly useful since without DNS Facebook and related services were effectively unavailable:</p>
            <pre><code>route-views&gt;show ip bgp 129.134.30.0   
BGP routing table entry for 129.134.0.0/17, version 1025798334
Paths: (24 available, best #14, table default)
  Not advertised to any peer
  Refresh Epoch 2
  3303 6453 32934
    217.192.89.50 from 217.192.89.50 (138.187.128.158)
      Origin IGP, localpref 100, valid, external
      Community: 3303:1004 3303:1006 3303:3075 6453:3000 6453:3400 6453:3402
      path 7FE1408ED9C8 RPKI State not found
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
route-views&gt;</code></pre>
            <p>We keep track of all the BGP updates and announcements we see in our global network. At our scale, the data we collect gives us a view of how the Internet is connected and where the traffic is meant to flow from and to everywhere on the planet.</p><p>A BGP UPDATE message informs a router of any changes you’ve made to a prefix advertisement or entirely withdraws the prefix. We can clearly see this in the number of updates we received from Facebook when checking our time-series BGP database. Normally this chart is fairly quiet: Facebook doesn’t make a lot of changes to its network minute to minute.</p><p>But at around 15:40 UTC we saw a peak of routing changes from Facebook. That’s when the trouble began.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5y7pAAyjmIsSJXessKptMg/0f842af9af2a9c2550ff64f43ea7c365/image4-11.png" />
            
            </figure><p>If we split this view by routes announcements and withdrawals, we get an even better idea of what happened. Routes were withdrawn, Facebook’s DNS servers went offline, and one minute after the problem occurred, Cloudflare engineers were in a room wondering why 1.1.1.1 couldn’t resolve facebook.com and worrying that it was somehow a fault with our systems.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/71IU6JSuaRY843UulhFiKW/cb6d58af732594976beb1a78c0f8d308/image3-9.png" />
            
            </figure><p>With those withdrawals, Facebook and its sites had effectively disconnected themselves from the Internet.</p>
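<p>The announce/withdraw split above can be sketched with a toy update log. This is an illustration of the bucketing idea, not our actual pipeline; the records are invented:</p>

```python
from collections import Counter

# Toy BGP UPDATE records: (minute, type, prefix). A real pipeline would
# parse these from MRT dumps or a live BGP feed; here they are inlined.
updates = [
    (0, "announce", "129.134.30.0/24"),
    (0, "announce", "129.134.31.0/24"),
    (1, "withdraw", "129.134.30.0/24"),
    (1, "withdraw", "129.134.31.0/24"),
    (1, "withdraw", "185.89.218.0/23"),
]

def split_by_type(updates):
    """Count announcements and withdrawals per minute bucket."""
    series = {}
    for minute, kind, _prefix in updates:
        bucket = series.setdefault(minute, Counter())
        bucket[kind] += 1
    return series

series = split_by_type(updates)
# A burst of withdrawals with no matching re-announcements is the
# signature of a network taking itself off the Internet.
print(series[1]["withdraw"])  # 3
```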
    <div>
      <h3>DNS gets affected</h3>
      <a href="#dns-gets-affected">
        
      </a>
    </div>
    <p>As a direct consequence of this, DNS resolvers all over the world stopped resolving their domain names.</p>
            <pre><code>➜  ~ dig @1.1.1.1 facebook.com
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: SERVFAIL, id: 31322
;facebook.com.			IN	A
➜  ~ dig @1.1.1.1 whatsapp.com
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: SERVFAIL, id: 31322
;whatsapp.com.			IN	A
➜  ~ dig @8.8.8.8 facebook.com
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: SERVFAIL, id: 31322
;facebook.com.			IN	A
➜  ~ dig @8.8.8.8 whatsapp.com
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: SERVFAIL, id: 31322
;whatsapp.com.			IN	A</code></pre>
<p>This happens because DNS, like many other systems on the Internet, also has its own routing mechanism. When someone types the <a href="https://facebook.com">https://facebook.com</a> URL in the browser, the DNS resolver, responsible for translating <a href="https://www.cloudflare.com/learning/dns/glossary/what-is-a-domain-name/">domain names</a> into actual IP addresses to connect to, first checks its cache and uses the answer if one is present. If not, it tries to grab the answer from the domain’s nameservers, typically hosted by the entity that owns the domain.</p><p>If the nameservers are unreachable or fail to respond for some other reason, then a SERVFAIL is returned, and the browser issues an error to the user.</p><p>Again, our learning center provides a <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">good explanation</a> on how DNS works.</p>
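<p>A minimal sketch of that resolution order: cache first, then the authoritative nameservers, SERVFAIL when they are unreachable. The cache structure, fixed TTL, and addresses here are illustrative, not a real resolver implementation:</p>

```python
import time

cache = {}  # name -> (expiry_timestamp, answer)

def resolve(name, authoritative, now=None):
    """Toy recursive-resolver lookup: cache, then authoritative, else SERVFAIL."""
    now = now if now is not None else time.time()
    hit = cache.get(name)
    if hit and hit[0] > now:           # fresh cache entry: answer immediately
        return hit[1]
    answer = authoritative(name)       # ask the domain's nameservers
    if answer is None:                 # unreachable / no response
        return "SERVFAIL"
    cache[name] = (now + 300, answer)  # cache for the record's TTL (300s here)
    return answer

# Nameservers reachable: normal answer, which gets cached.
print(resolve("facebook.com", lambda n: "157.240.11.35"))
# Nameservers withdrawn from BGP: every uncached lookup fails.
print(resolve("whatsapp.com", lambda n: None))  # SERVFAIL
```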
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/aYyNe0TkIR47XpI3RjaN4/179f32b67057dc440e1ad3af2a0061e2/image8-8.png" />
            
</figure><p>Because Facebook stopped announcing the routes to their DNS prefixes through BGP, our DNS resolvers, and everyone else's, had no way to connect to their nameservers. Consequently, 1.1.1.1, 8.8.8.8, and other major public DNS resolvers started issuing (and caching) SERVFAIL responses.</p><p>But that's not all. Human behavior and application logic then kick in and cause another, compounding effect: a tsunami of additional DNS traffic follows.</p><p>This happened in part because apps won't accept an error for an answer and start retrying, sometimes aggressively, and in part because end users also won't take an error for an answer and start reloading pages, or killing and relaunching their apps, sometimes also aggressively.</p><p>This is the traffic increase (in number of requests) that we saw on 1.1.1.1:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/JgzupnGlQqx0cZjijoRfb/bce5744c0d0a655e77c9b72f9d297496/image6-9.png" />
            
</figure><p>So now, because Facebook and their sites are so big, we had DNS resolvers worldwide handling 30x more queries than usual, potentially causing latency and timeout issues for other platforms.</p><p>Fortunately, 1.1.1.1 was built to be Free, Private, Fast (as the independent DNS monitor <a href="https://www.dnsperf.com/#!dns-resolvers">DNSPerf</a> can attest), and scalable, and we were able to keep servicing our users with minimal impact.</p><p>The vast majority of our DNS requests kept resolving in under 10ms. At the same time, response times at the p95 and p99 percentiles increased slightly, probably because expired TTLs forced lookups to fall back to Facebook's unreachable nameservers and time out. The 10-second DNS timeout limit is well known amongst engineers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7HcKMYPFQWk5QlVyMY4dxH/c51ddb3df72b0baec1a0ff99b1a208fc/image2-11.png" />
            
            </figure>
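<p>A toy model of that retry amplification, where apps retry failed lookups and users retrigger the whole sequence by reloading. The retry and reload counts are illustrative assumptions, not measured values:</p>

```python
def query_load(clients, app_retries, user_reloads):
    """Total DNS queries when every lookup fails and gets retried.

    Each failed lookup is retried app_retries times by the application,
    and the user reloads the page user_reloads times, retriggering the
    whole sequence. One client normally sends a single query.
    """
    per_attempt = 1 + app_retries  # original query plus app-level retries
    attempts = 1 + user_reloads    # original attempt plus user reloads
    return clients * per_attempt * attempts

baseline = query_load(1_000_000, app_retries=0, user_reloads=0)
outage = query_load(1_000_000, app_retries=4, user_reloads=5)
print(outage // baseline)  # 30: modest per-client retries compound to 30x load
```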
    <div>
      <h3>Impacting other services</h3>
      <a href="#impacting-other-services">
        
      </a>
    </div>
    <p>People look for alternatives and want to know more or discuss what’s going on. When Facebook became unreachable, we started seeing increased DNS queries to Twitter, Signal and other messaging and social media platforms.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4LNsXe0n3q2kZgSwbiTgD8/ddc52a824634bf383fc2aaf8fbadc25e/image1-12.png" />
            
            </figure><p>We can also see another side effect of this unreachability in our WARP traffic to and from Facebook's affected ASN 32934. This chart shows how traffic changed from 15:45 UTC to 16:45 UTC compared with three hours before in each country. All over the world WARP traffic to and from Facebook’s network simply disappeared.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6pe57CevhAkyCi0MYDHurW/fc424989247ad4faa11f3b4c64781502/image7-6.png" />
            
            </figure>
    <div>
      <h3>The Internet</h3>
      <a href="#the-internet">
        
      </a>
    </div>
    <p>Today's events are a gentle reminder that the Internet is a very complex and interdependent system of millions of systems and protocols working together. That trust, standardization, and cooperation between entities are at the center of making it work for almost five billion active users worldwide.</p>
    <div>
      <h3>Update</h3>
      <a href="#update">
        
      </a>
    </div>
    <p>At around 21:00 UTC we saw renewed BGP activity from Facebook's network which peaked at 21:17 UTC.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/47HV9JWOs1yaUtXV1g269v/1effede60e4dc9e1eab042bb1fc1b4fe/unnamed-3-3.png" />
            
            </figure><p>This chart shows the availability of the DNS name 'facebook.com' on Cloudflare's DNS resolver 1.1.1.1. It stopped being available at around 15:50 UTC and returned at 21:20 UTC.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5y4sNviiduLRAipw6I6UHs/770e842b223754d4aa61fffe3ee8d7c6/unnamed-4.png" />
            
</figure><p>Undoubtedly Facebook, WhatsApp and Instagram services will take further time to come fully online, but as of 21:28 UTC Facebook appears to be reconnected to the global Internet, with DNS working again.</p> ]]></content:encoded>
            <category><![CDATA[Trends]]></category>
            <category><![CDATA[Outage]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Facebook]]></category>
            <guid isPermaLink="false">7jh9UDGts4LJU26IAOLsgK</guid>
            <dc:creator>Celso Martinho</dc:creator>
            <dc:creator>Tom Strickx</dc:creator>
        </item>
        <item>
            <title><![CDATA[Protecting Cloudflare Customers from BGP Insecurity with Route Leak Detection]]></title>
            <link>https://blog.cloudflare.com/route-leak-detection/</link>
            <pubDate>Thu, 25 Mar 2021 13:00:00 GMT</pubDate>
            <description><![CDATA[ Today, we're excited to announce Route Leak Detection, a new network alerting feature that tells customers when a prefix they own that is onboarded to Cloudflare is being leaked. ]]></description>
            <content:encoded><![CDATA[ <p><i>This post is also available in </i><a href="/ja-jp/route-leak-detection-ja-jp/"><i>日本語</i></a><i>, </i><a href="/id-id/route-leak-detection-id-id/"><i>Bahasa Indonesia</i></a><i>, </i><a href="/th-th/route-leak-detection-th-th/"><i>ไทย</i></a><i>.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2RqommHUoS6kg7yEKfg1b0/b2a9436cfbfad21affdd171f271e4b06/image6-17.png" />
            
            </figure><p>Border Gateway Protocol (BGP) route leaks and hijacks can ruin your day — BGP is <a href="/is-bgp-safe-yet-rpki-routing-security-initiative/">insecure by design</a>, and incorrect routing information spreading across the Internet can be incredibly disruptive and dangerous to the normal functioning of customer networks, and the Internet at large. Today, we're excited to announce Route Leak Detection, a new network alerting feature that tells customers when a prefix they own that is onboarded to Cloudflare is being leaked, i.e., advertised by an unauthorized party. Route Leak Detection helps protect your routes on the Internet: it tells you when your traffic is going places it’s not supposed to go, which is an indicator of a possible attack, and reduces time to mitigate leaks by arming you with timely information.</p><p>In this blog, we will explain what route leaks are, how Cloudflare Route Leak Detection works, and what we are doing to help protect the Internet from route leaks.</p>
    <div>
      <h2>What are route leaks, and why should I care?</h2>
      <a href="#what-are-route-leaks-and-why-should-i-care">
        
      </a>
    </div>
<p>A route leak occurs when a network on the Internet tells the rest of the world to route traffic through its network when that traffic isn’t supposed to go there. <a href="/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/">A great example</a> of this and the impact it can cause was an incident in June 2019, where a small ISP in Pennsylvania started advertising routes for part of the Internet, including Cloudflare, Amazon, and Linode. A significant portion of traffic destined for those networks was incorrectly routed through the small ISP’s network, leaking Cloudflare, Amazon, and Linode’s prefixes, and causing congestion and unreachable network errors for end users. Route leaks tend to happen because of a misconfigured peering session or customer router, a software bug in a customer or third-party router, a man-in-the-middle attack, or a malicious customer or third party.</p><p>Some route leaks are innocuous. But some route leaks can be malicious, and can have very real security impact. An attacker can advertise specific routes for the express purpose of directing users to their network to do things like <a href="/bgp-leaks-and-crypto-currencies/">steal cryptocurrencies</a> and other important data, or attempt to issue <a href="https://www.cloudflare.com/application-services/products/ssl/">SSL/TLS certificates</a> that can be used to impersonate domains. By advertising more specific routes, an attacker can trick you into accessing a site that you don’t intend to, and if it looks exactly like the site you expect, you may unwittingly enter personal data and be at risk of attack. Here’s a diagram representing traffic without a route leak:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ldg1OBf3Ql3K2OM5QzuFQ/dfc07340361bb25b84ea1c8c7caf2d57/image4-32.png" />
            
            </figure><p>And here’s traffic after a route leak:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JTlxIjXatTk3p7VWBjqst/57c838e231584e8dd22db0be74001e82/image5-30.png" />
            
            </figure><p>So in addition to making users unhappy because a lot of Internet traffic is going through paths that can’t handle it, route leaks can have very real data leak implications.</p><p>Cloudflare’s Route Leak Detection allows you to get notified quickly when your routes are leaking so that you know when a potential attack is happening.</p>
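<p>The reason a more-specific advertisement wins is BGP's longest-prefix-match forwarding rule: routers send traffic along the most specific route that covers a destination. A minimal sketch, using an illustrative routing table rather than real router code:</p>

```python
import ipaddress

# Routing table: a legitimate /24 plus a leaked, more-specific /25.
table = {
    ipaddress.ip_network("203.0.113.0/24"): "legitimate path (via Cloudflare)",
    ipaddress.ip_network("203.0.113.0/25"): "leaked path (attacker)",
}

def lookup(dst):
    """Longest-prefix match: the most specific covering route wins."""
    dst = ipaddress.ip_address(dst)
    candidates = [net for net in table if dst in net]
    best = max(candidates, key=lambda net: net.prefixlen)
    return table[best]

# Traffic to the half of the /24 covered by the /25 is diverted.
print(lookup("203.0.113.10"))   # leaked path (attacker)
# The uncovered half still follows the legitimate /24.
print(lookup("203.0.113.200"))  # legitimate path (via Cloudflare)
```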
    <div>
      <h2>How does Cloudflare Route Leak Detection protect my network?</h2>
      <a href="#how-does-cloudflare-route-leak-detection-protect-my-network">
        
      </a>
    </div>
    
    <div>
      <h3>How to configure Route Leak Detection</h3>
      <a href="#how-to-configure-route-leak-detection">
        
      </a>
    </div>
<p>In order to configure Route Leak Detection, you must be a Cloudflare customer who has <a href="https://developers.cloudflare.com/byoip/">“brought your own IP” (BYOIP)</a> addresses; this includes Magic Transit (L3), Spectrum (L4), and WAF (L7) customers. Only prefixes advertised by Cloudflare qualify for Route Leak Detection.</p><p>You can configure Route Leak Detection by setting up a notification in the Notifications tab in your account.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/QnjMkL4u4AwrgSh1Jpxrr/142248389c38a6204d2948785f36208c/image7.gif" />
            
</figure><p>Cloudflare will then begin monitoring all of your onboarded prefixes for leaks and hijacks and will send you alerts when they occur via <a href="https://support.cloudflare.com/hc/en-us/articles/360047358211-Connecting-PagerDuty-to-Cloudflare">email or specialized on-call tools like PagerDuty</a>.</p><p>Cloudflare’s alert notification system supports webhooks, email, and PagerDuty, so your teams are kept up to date on changes to network routes through their preferred medium, and can respond and take corrective action when necessary.</p>
    <div>
      <h3>An example attack scenario</h3>
      <a href="#an-example-attack-scenario">
        
      </a>
    </div>
<p>A malicious party attempting to use routes to gain access to customer data starts advertising a subnet of onboarded prefixes for one of our Magic Transit customers. This attack, if not found and remediated quickly, could have a serious impact on the customer. When the attacker begins to advertise the prefix without the customer’s knowledge, BGP updates and route changes start occurring rapidly in the global routing table, typically within 60 seconds.</p><p>Let’s walk through how a customer might deploy Route Leak Detection. Customer Acme Corp. owns the IP prefix 203.0.113.0/24. Acme has onboarded 203.0.113.0/24 to Cloudflare, and Cloudflare tells the rest of the Internet that this prefix is reachable through Cloudflare’s network.</p><p>Once Acme has enabled Route Leak Detection, Cloudflare continuously monitors routing information on the Internet for 203.0.113.0/24. Our goal is to detect leaks within five minutes of the erroneous routing information propagating on the Internet.</p><p>Let’s go back to the attack scenario. A malicious party attempting to attack Acme’s network hijacks the advertisement for 203.0.113.0/24, diverting legitimate users from the intended network path to Acme (through Cloudflare’s network) and instead to a facsimile of Acme’s network intended to capture information from unwitting users.</p><p>Because Acme has enabled Route Leak Detection, an alert is sent to Acme’s administrators.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4NcQzvEYk24mS66H5r6jMc/e3924fc243bc6c1dc79829063561fb4a/image1-41.png" />
            
</figure><p>The alert includes all of the ASNs that are seeing the prefix being advertised by the potentially malicious party. Acme is able to warn their users that they may be at risk of a data exfiltration attack, and they should be on the lookout for suspicious behavior.</p><p>Acme is also able to quickly contact the service providers listed in the alert and ask them to stop honoring the more-specific routes. Currently, mitigating a route leak is a highly manual process, requiring contacting service providers directly using contact information published in <a href="https://www.peeringdb.com/">public databases</a>. In the future, we plan to build features to automate this outreach and mitigation process to further drive down time to mitigation for route leak events that may impact our customers.</p>
    <div>
      <h2>How does Cloudflare detect route leaks?</h2>
      <a href="#how-does-cloudflare-detect-route-leaks">
        
      </a>
    </div>
    <p>Cloudflare uses several sources of routing data to create a synthesis of how the Internet sees routes to our BYOIP customers. Cloudflare then watches these views to track any sudden changes that occur on the Internet. If we can correlate those changes to actions we have taken, then we know the change is benign, and it’s business as usual. However, if we haven’t made any changes, we quickly take action to tell you that your routes and your users may be at risk.</p>
    <div>
      <h3>Cloudflare’s outside-in ingestion pipeline</h3>
      <a href="#cloudflares-outside-in-ingestion-pipeline">
        
      </a>
    </div>
<p>Cloudflare’s main source of data is from externally maintained repositories such as <a href="https://ris-live.ripe.net/">RIPE’s RIS feed</a>, <a href="http://www.routeviews.org/">RouteViews</a>, and <a href="https://bgpstream.caida.org/data#!caida-bmp">CAIDA’s public BMP feed</a>. It is important to use multiple external views of the Internet routing table to be as accurate as possible when making inferences about the state of the Internet. Cloudflare makes API calls to those sources to ingest data and analyze changes in BGP routes. These feeds allow us to ingest routing data for the whole Internet. Cloudflare filters all of that down to the prefixes you have previously onboarded to Cloudflare.</p><p>Once this data is ingested and filtered, Cloudflare begins cross-referencing updates to the global routing table with metrics that indicate possible hijacks, such as the number of ASNs that directly see your routes, the number of BGP updates that occur over short periods of time, and how many subnets are being advertised. If the number of ASNs that directly see your routes or the number of updates changes drastically, it could mean that your prefixes are being leaked. If a subnet of the prefixes you are advertising is seeing drastic change in the global routing table, it’s likely that your prefixes are being leaked somewhere.</p><p>Cloudflare already has this configured on our own prefixes today. Here’s an example of what we see when our system determines that something is wrong:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3UsydST8qItOP1lNo38LCP/51816b8fda71313bcc09dc867d2a8cc3/image2-35.png" />
            
            </figure><p>Cloudflare owns the prefix range 2606:4700:50::/44, as it is a subnet of one of the ranges <a href="https://www.cloudflare.com/ips/">listed on our site here</a>. For a period of an hour, we noticed that someone tried to advertise a subnet of that range to 38 other networks. Fortunately, because we have <a href="/rpki-details/">deployed RPKI</a>, we know that most networks will reject rather than honor these route advertisements from attackers.</p>
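<p>A much-simplified version of that visibility signal: flag a prefix when the number of ASNs seeing a more-specific advertisement jumps without a corresponding change on our side. The threshold and data below are illustrative, not our production logic:</p>

```python
def leak_suspected(history, current, expected_change=False, threshold=3.0):
    """Flag a possible leak when visibility spikes unexpectedly.

    history: recent counts of ASNs seeing a more-specific of the prefix.
    current: the latest count.
    expected_change: True if we made a routing change ourselves, in which
    case the spike correlates with our own action and is benign.
    """
    if expected_change:
        return False
    # Floor the baseline at 1 so a normally-invisible prefix still works.
    baseline = max(sum(history) / len(history), 1.0)
    return current / baseline >= threshold

# Normally almost nobody sees a more-specific of 2606:4700:50::/44 ...
quiet = [0, 0, 1, 0, 0]
# ... then suddenly 38 networks do.
print(leak_suspected(quiet, 38))  # True
print(leak_suspected(quiet, 1))   # False
```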
    <div>
      <h2>What can I do to prevent route leaks in the future?</h2>
      <a href="#what-can-i-do-to-prevent-route-leaks-in-the-future">
        
      </a>
    </div>
<p>The best way to prevent route leaks is to deploy <a href="/rpki-details/">RPKI</a> in your network, and urge your Internet providers to do so as well. RPKI allows you and your providers to sign routes that you advertise to the Internet, so that no one else can steal them. If someone else advertises your RPKI-signed routes, any providers that support RPKI will not forward those routes to other customers, ensuring that the attempted leak is contained as close to the attacker as possible.</p><p>Cloudflare’s <a href="https://isbgpsafeyet.com/">continued advocacy</a> for RPKI has borne fruit in the past three months alone. Providers such as Amazon, Google, Telstra, Cogent, and even Netflix have started supporting RPKI and are filtering and dropping invalid prefixes. In fact, over 50% of the top Internet providers now support RPKI in some fashion:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/33cOUIS0k8FDIJjIVHC2aN/6e14337d251c6f6d4c570c668ffa1b25/image3-33.png" />
            
</figure><p>Cloudflare’s Route Leak Detection, combined with more providers implementing RPKI, is helping to ensure that data loss and downtime from route leaks become a thing of the past. If you’re a Cloudflare Magic Transit or BYOIP customer, try configuring a route leak alert in your dash today. If you’re not a Magic Transit or BYOIP customer, reach out to our <a href="https://www.cloudflare.com/plans/enterprise/contact/">sales team</a> to get started and keep your network safe, routes included.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[RPKI]]></category>
            <guid isPermaLink="false">6U8gLsr4STQiKqza7HDwyP</guid>
            <dc:creator>David Tuber</dc:creator>
        </item>
        <item>
            <title><![CDATA[The Internet is Getting Safer: Fall 2020 RPKI Update]]></title>
            <link>https://blog.cloudflare.com/rpki-2020-fall-update/</link>
            <pubDate>Fri, 06 Nov 2020 12:36:07 GMT</pubDate>
            <description><![CDATA[ The cap of two hundred thousand routing cryptographic records was recently passed. We thought it was time for an update on a major year for RPKI. ]]></description>
<content:encoded><![CDATA[ <p>The Internet is a network of networks. In order to find the path between two points and exchange data, network devices rely on information from their peers. This information consists of IP addresses and Autonomous Systems (AS) which announce the addresses using Border Gateway Protocol (BGP).</p><p>One problem arises from this design: what protects against a malevolent peer who decides to announce incorrect information? The damage caused by <a href="/bgp-leaks-and-crypto-currencies/">route hijacks can be major</a>.</p><p>Resource Public Key Infrastructure (RPKI) is a framework created in 2008. Its goal is to provide a source of truth for Internet Resources (IP addresses) and ASes in cryptographically signed records called Route Origin Authorizations (ROAs).</p><p>Recently, we’ve seen the significant threshold of two hundred thousand ROAs passed. This represents a big step in making the Internet more secure against accidental and deliberate BGP tampering.</p><p>We have talked about RPKI <a href="/tag/rpki/">in the past</a>, but we thought it would be a good time for an update.</p><p>In a more technical context, the RPKI framework consists of two parts:</p><ul><li><p>IP addresses need to be cryptographically signed by their owners in a database managed by a Trust Anchor: Afrinic, APNIC, ARIN, LACNIC and RIPE NCC. Those five organizations are in charge of allocating Internet resources. The ROA indicates which network operator is allowed to announce the addresses using BGP.</p></li><li><p>Network operators download the list of ROAs, perform the cryptographic checks and then apply filters on the prefixes they receive: this is called BGP Origin Validation.</p></li></ul>
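<p>The Origin Validation step can be sketched as follows: a route is valid if some ROA covers its prefix with a matching origin AS and a permitted prefix length, invalid if ROAs cover it but none match, and not-found otherwise. This is a simplified reading of the RFC 6811 procedure with an illustrative ROA set:</p>

```python
import ipaddress

# Validated ROA payloads: (prefix, max_length, authorized origin ASN).
roas = [
    (ipaddress.ip_network("1.1.1.0/24"), 24, 13335),
]

def validate(prefix, origin_asn):
    """RFC 6811-style origin validation, simplified."""
    prefix = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        if prefix.subnet_of(roa_prefix):
            covered = True  # at least one ROA covers this prefix
            if asn == origin_asn and prefix.prefixlen <= max_length:
                return "valid"
    return "invalid" if covered else "not-found"

print(validate("1.1.1.0/24", 13335))  # valid: origin and length match a ROA
print(validate("1.1.1.0/24", 64512))  # invalid: covered, but wrong origin AS
print(validate("8.8.8.0/24", 15169))  # not-found: no ROA covers the prefix
```

A router doing Origin Validation would then drop or deprioritize the "invalid" routes while treating "not-found" as it did before RPKI existed.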
    <div>
      <h3>The “Is BGP Safe Yet” website</h3>
      <a href="#the-is-bgp-safe-yet-website">
        
      </a>
    </div>
<p>The launch of the website <a href="https://isbgpsafeyet.com/">isbgpsafeyet.com</a> to test if your ISP correctly performs BGP Origin Validation was a success. Since launch, it has been visited more than five million times from 223 countries and 13,000 unique networks (20% of the entire Internet), generating half a million BGP Origin Validation tests.</p><p>Many providers subsequently indicated on social media (for example, <a href="https://twitter.com/Aussie_BB/status/1252046032214450176">here</a> or <a href="https://twitter.com/swisscom_csirt/status/1300666695959244800">here</a>) that they had an RPKI deployment in the works. This increase in Origin Validation by networks is increasing the security of the Internet globally.</p><p>The site’s test for Origin Validation consists of queries toward two addresses, one of which is behind an RPKI invalid prefix and the other behind an RPKI valid prefix. If the query towards the invalid prefix succeeds, the test fails, as the ISP does not implement Origin Validation. We counted the number of queries that failed to reach invalid.cloudflare.com. This also included a few thousand <a href="https://atlas.ripe.net/measurements/?page=1&amp;search=target:invalid.rpki.cloudflare.com#tab-http">RIPE Atlas tests</a> that were started by Cloudflare and various contributors, providing coverage for smaller networks.</p><p>Every month since launch, we’ve seen around 10 to 20 networks deploy RPKI Origin Validation. 
Among the major providers we can build the following table:</p><table><tr><td><p><b>Month</b></p></td><td><p><b>Networks</b></p></td></tr><tr><td><p>August</p></td><td><p>Swisscom (Switzerland), Salt (Switzerland)</p></td></tr><tr><td><p>July</p></td><td><p>Telstra (Australia), Quadranet (USA), Videotron (Canada)</p></td></tr><tr><td><p>June</p></td><td><p>Colocrossing (USA), Get Norway (Norway), Vocus (Australia), Hurricane Electric (Worldwide), Cogent (Worldwide)</p></td></tr><tr><td><p>May</p></td><td><p>Sengked Fiber (Indonesia), Online.net (France), WebAfrica Networks (South Africa), CableNet (Cyprus), IDnet (Indonesia), Worldstream (Netherlands), GTT (Worldwide)</p></td></tr></table><p>With the help of many <a href="https://github.com/cloudflare/isbgpsafeyet.com/blob/master/CONTRIBUTING.md">contributors</a>, we have compiled a list of network operators and public statements at the top of the isbgpsafeyet.com page.</p><p>We excluded providers that manually blocked the traffic towards the prefix instead of using RPKI. Among the techniques we saw were firewall filtering and manual prefix rejection. The filtering is often propagated to other customer ISPs. In one unique case, an ISP generated a “more-specific” blackhole route that leaked to multiple peers over the Internet.</p><p>The deployment of RPKI by major transit providers (also known as Tier 1), such as Cogent, GTT, Hurricane Electric, NTT and Telia, made many downstream networks more secure without them having to deploy validation software themselves.</p><p>Overall, we looked at the evolution of successful tests per ASN, and we noticed a steady increase of 8% over recent months.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1F5LrsstbIKXWNt4drF5Vt/7515a8aad81c55ee921785a2f71ff3a8/success-ratio-3-1.png" />
            
</figure><p>Furthermore, when we probed the entire IPv4 space this month, using a similar technique to the isbgpsafeyet.com test, many more networks were unable to reach an RPKI invalid prefix than during the same period last year. This confirms an increase of RPKI Origin Validation deployment across all network operators. The picture below shows the IPv4 space behind a network with RPKI Origin Validation enabled in yellow and the active space in blue. It uses a <a href="https://en.wikipedia.org/wiki/Hilbert_curve">Hilbert Curve</a> to efficiently plot IP addresses: for example, one /20 prefix (4,096 IPs) is a single pixel, and a /16 prefix (65,536 IPs) forms a 4x4 pixel square.</p><p>The more the yellow spreads, the safer the Internet becomes.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/kX7jAx67NEyl7ujWucW2h/72625c4415f1f619b73e2fc3f0654373/hilbert-comp.png" />
            
</figure><p>What does it mean exactly? <i>If you were hijacking a prefix, the users behind the yellow space would likely not be affected.</i> This also applies if you mis-sign your prefixes: you would not be able to reach the services or users behind the yellow space. Once RPKI is enabled everywhere, there will only be yellow squares.</p>
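<p>The Hilbert curve is used here because it keeps numerically adjacent address blocks visually adjacent, so contiguous prefixes form compact blobs. Below is the classic iterative distance-to-coordinates conversion, a standard algorithm sketch rather than the exact code behind the image:</p>

```python
def d2xy(n, d):
    """Map distance d along a Hilbert curve onto an n-by-n grid.

    n must be a power of two. Consecutive d values land on adjacent
    cells, which is why contiguous address ranges stay contiguous.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                # rotate the sub-quadrant when needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx                # move into the correct quadrant
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Treating each /20 as one cell, all 2**20 of them fill a 1024x1024 grid,
# and the 16 /20s inside one /16 form the 4x4 block mentioned above.
print(d2xy(1024, 0), d2xy(1024, 1))  # two adjacent cells
```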
    <div>
      <h3>Progression of signed prefixes</h3>
      <a href="#progression-of-signed-prefixes">
        
      </a>
    </div>
<p>Owners of IP addresses indicate the networks allowed to announce them. They do this by signing prefixes: they create Route Origin Authorizations (ROAs). As of today, there are more than 200,000 ROAs. The distribution shows that the RIPE region still leads in ROA count, followed by the APNIC region.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1Dskm8cAFbT7A51uLr1pRF/d3238ae962da9a9ddf442e833af6864f/image2-2.png" />
            
</figure><p>2020 started with 172,000 records and the count is getting close to 200,000 at the beginning of November, approximately a quarter of all Internet routes. Since last year, the database of ROAs grew by more than 70 percent, from 100,000 records, an average pace of 5% every month.</p><p>On the following graph of unique ROA count per day, we can see two points that were followed by a change in ROA creation rate: 140/day, then 231/day, and since August, 351 new ROAs per day.</p><p>It is not yet clear what caused the increase in August.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/mzXuBiiXzCAl7qIUwsMA4/2b2a51ec5f4443b2951ad59cd38223b5/count-growth-3.png" />
            
            </figure>
    <div>
      <h3>Free services and software</h3>
      <a href="#free-services-and-software">
        
      </a>
    </div>
<p>In <a href="/bgp-leaks-and-crypto-currencies/">2018</a> and <a href="/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/">2019</a>, Cloudflare was impacted by BGP route hijacks. Both could have been avoided with RPKI. Not long after the first incident, we started signing prefixes and developing RPKI software. It was necessary to make BGP safer, and we wanted to do more than talk about it. But we also needed enough other networks to deploy RPKI as well. By making deployment easier for everyone, we hoped to increase adoption.</p><p>The following is a reminder of what we built over the years around RPKI and how it grew.</p><p><a href="https://github.com/cloudflare/cfrpki">OctoRPKI</a> is Cloudflare’s open source RPKI Validation software. It periodically generates a JSON document of validated prefixes that we pass on to our routers using <a href="https://github.com/cloudflare/gortr">GoRTR</a>. It generates most of the data behind the graphs here.</p><p>The latest version of OctoRPKI, <a href="https://github.com/cloudflare/cfrpki/releases/tag/v1.2.0">1.2.0</a>, was released at the end of October. It implements important security fixes, better memory management and extended <a href="https://github.com/cloudflare/cfrpki/blob/master/Monitoring.md">logging</a>. This is the first validator to provide detailed information about cryptographically invalid records to <a href="https://sentry.io/welcome/">Sentry</a> and performance data to <a href="https://opentracing.io/">distributed tracing tools</a>. GoRTR remains heavily used in production, including by <a href="https://github.com/cloudflare/gortr#in-the-field">transit providers</a>. 
It can natively connect to other validators like <a href="https://www.rpki-client.org/">rpki-client</a>.</p><p>When we released our public rpki.json endpoint in early 2019, the idea was to enable anyone to see what Cloudflare was filtering.</p><p>The file is also used as a bootstrap by GoRTR, so that users can test a deployment. The file is cached on more than 200 data centers, ensuring quick and secure delivery of a list of valid prefixes, making RPKI more accessible for smaller networks and developers.</p><p>Between March 2019 and November 2020, the number of queries more than doubled and there are five times more networks querying this file.</p><p>The growth of queries follows approximately the rate of ROA creation (~5% per month).</p>
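<p>A minimal consumer of that file might look like the following sketch. The field names (`roas`, `prefix`, `maxLength`, `asn`) follow the shape of the published document, but the sample here is a small inlined document rather than a live fetch, and the entries are illustrative:</p>

```python
import json

# Sample document in the shape of rpki.json (inlined for the example;
# a real consumer would fetch the endpoint instead).
doc = json.loads("""
{
  "roas": [
    {"prefix": "1.1.1.0/24", "maxLength": 24, "asn": "AS13335", "ta": "apnic"},
    {"prefix": "1.0.0.0/24", "maxLength": 24, "asn": "AS13335", "ta": "apnic"},
    {"prefix": "8.8.8.0/24", "maxLength": 24, "asn": "AS15169", "ta": "arin"}
  ]
}
""")

def roas_for_asn(doc, asn):
    """Return the prefixes a given origin AS is authorized to announce."""
    return sorted(r["prefix"] for r in doc["roas"] if r["asn"] == f"AS{asn}")

print(roas_for_asn(doc, 13335))  # ['1.0.0.0/24', '1.1.1.0/24']
```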
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5kDHjZZCWlsKLvUHVCrNiZ/d58ee3429910b250612355c8e74d0a50/rpki-json-evolution-4.png" />
            
            </figure><p>A public RTR server is also available at rtr.rpki.cloudflare.com. It exposes a plaintext endpoint on port 8282 and an SSH endpoint on port 8283. This allows us to test new versions of <a href="https://github.com/cloudflare/gortr">GoRTR</a> before release.</p><p>Later in 2019, we also built a <a href="https://rpki.cloudflare.com">public dashboard</a> where you can explore RPKI validation in depth. With its GraphQL API, you can explore the validation data, test a list of prefixes, or check the status of the current routing table.</p>
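<p>For a flavor of what travels over that port 8282 connection: the RPKI-to-Router protocol (RFC 8210) exchanges small binary PDUs. The sketch below, not production code, builds the 8-byte Reset Query PDU a client such as GoRTR sends to ask a cache for its complete set of validated prefixes.</p>

```python
import struct

# RTR v1 (RFC 8210) Reset Query PDU layout:
#   1 byte protocol version, 1 byte PDU type (2 = Reset Query),
#   2 reserved zero bytes, 4-byte total length (always 8 for this PDU).
def reset_query_pdu(version: int = 1) -> bytes:
    return struct.pack("!BBHI", version, 2, 0, 8)

print(reset_query_pdu().hex())  # 0102000000000008
```

The cache answers with a stream of IPv4/IPv6 Prefix PDUs, each carrying a prefix, maxLength, and origin ASN, followed by an End of Data PDU, which is exactly the data the router then uses for origin validation.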
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3cJHLgbi7dDeAPV7ZV2izH/9a895b790137c972c8623f7248539cdc/rpki.cloudflare.com__view-validator-validateRoute-13335_1.1.1.0-2F24.png" />
            
            </figure><p>The API is currently used by <a href="https://github.com/nttgin/BGPalerter">BGPalerter</a>, an open-source tool that detects routing issues (including hijacks!) from a stream of BGP updates.</p><p>Additionally, starting in November, you can access historical data going back to May 2019. The data is computed daily and contains the unique records for each day. The team behind the dashboard worked hard to provide a fast and accurate visualization of daily ROA changes and the volume of files changed over the day.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3uoRqILkpkM58M0iKbqZm6/fcbcaf5f1dbd7695504e7cd8b0df3d4f/rpki_dashboard_candlestick.png" />
            
            </figure>
    <div>
      <h3>The future</h3>
      <a href="#the-future">
        
      </a>
    </div>
    <p>We believe RPKI will continue to grow, and we would like to thank the hundreds of network engineers around the world who are making Internet routing more secure by deploying RPKI.</p><p>25% of routes are signed and 20% of the Internet performs origin validation, and those numbers grow every day. We believe BGP will be safer well <a href="https://youtu.be/3BAwBClazWc?t=1333">before reaching 100%</a> deployment; for instance, once the remaining transit providers enable origin validation, it is unlikely a BGP hijack will make the front page of world news outlets.</p><p>While difficult to quantify, we believe a critical mass of protected resources will be reached in late 2021.</p><p>We will keep improving the tooling; OctoRPKI and GoRTR are open source, and we welcome contributions. In the near future, we plan on releasing a packaged version of GoRTR that can be installed directly on certain routers. Stay tuned!</p> ]]></content:encoded>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[RPKI]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">1EyhrRgU58WhG0wa9Bo8Qh</guid>
            <dc:creator>Louis Poinsignon</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing Regional Services]]></title>
            <link>https://blog.cloudflare.com/introducing-regional-services/</link>
            <pubDate>Fri, 26 Jun 2020 11:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare launches Regional Services, giving customers control over where their data is processed. ]]></description>
            <content:encoded><![CDATA[ <p>In a world where workloads increasingly shift to the cloud, it is often unclear how data travels the Internet and in which countries it is processed. Today, Cloudflare is pleased to announce Regional Services, giving our customers full control over exactly where their traffic is handled.</p><p>We operate a global network spanning more than 200 cities. Each data center runs servers with the exact same software stack. This has enabled Cloudflare to quickly and efficiently add capacity where needed. It also allows our engineers to ship features with ease: deploy once, and it's available globally.</p><p>The same benefit applies to our customers: configure once, and that change is applied everywhere in seconds, whether they’re changing security features, adding a DNS record, or deploying a Cloudflare Worker.</p><p>A homogeneous network is great from a routing point of view: whenever a user performs an HTTP request, Cloudflare's Anycast network finds the closest data center, with BGP looking at the hops that would need to be traversed to get there. This means that someone near the Canadian border (say, in North Dakota) could easily find themselves routed to Winnipeg (inside Canada) instead of a data center in the United States. This is generally what our customers want and expect: the fastest way to serve traffic, regardless of geographic location.</p><p>Some organizations, however, have expressed preferences for maintaining regional control over their data for a variety of reasons. For example, they may be bound by agreements with their own customers that include geographic restrictions on data flows or data processing. As a result, some customers have requested control over where their web traffic is serviced.</p><p>Regional Services gives our customers the ability to accommodate regional restrictions while still using Cloudflare’s global edge network. As of today, Enterprise customers can add Regional Services to their contracts. With Regional Services, customers can choose which subset of data centers is able to service traffic at the HTTP level. But we're not reducing network capacity to do this: that would not be the Cloudflare Way. Instead, we're allowing customers to use our entire network for <a href="https://www.cloudflare.com/ddos/">DDoS protection</a> while limiting the data centers that apply higher-level layer 7 security and performance features such as WAF, Workers, and Bot Management.</p><p>Traffic is ingested on our global Anycast network at the location closest to the client, as usual, and then passed to data centers inside the geographic region of the customer’s choice. TLS keys are only <a href="/geo-key-manager-how-it-works">stored</a> and used to actually handle traffic inside that region. This gives our customers the benefit of our huge, low-latency, high-throughput network, capable of withstanding even the <a href="/the-daily-ddos-ten-days-of-massive-attacks/">largest DDoS attacks</a>, while also giving them local control: only data centers inside a customer’s preferred geographic region have the access necessary to apply security policies.</p><p>The diagram below shows how this process works. When users connect to Cloudflare, they hit the data center closest to them, by nature of our Anycast network. That data center detects and mitigates DDoS attacks. 
Legitimate traffic is passed through to a data center within the geographic region of the customer’s choosing. Inside that data center, traffic is inspected at OSI layer 7 and HTTP products can work their magic:</p><ul><li><p>Content can be returned from and stored in cache</p></li><li><p>The WAF looks inside the HTTP payloads</p></li><li><p>Bot Management detects and blocks suspicious activity</p></li><li><p>Workers scripts run</p></li><li><p>Access policies are applied</p></li><li><p>Load Balancers look for the best origin to service traffic</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7aaFSqiVx77rXsS2N3RT1f/d574a8616e54dd8246b68ee94a09837e/image2-9.png" />
            
            </figure><p>Today's launch includes preconfigured geographic regions, and we'll look to add more depending on customer demand. US and EU regions are available immediately, meaning layer 7 (HTTP) products can be configured to run only within those regions and not outside of them.</p><p>The US and EU maps are depicted below. Purple dots represent data centers that apply DDoS protection and network acceleration. Orange dots represent data centers that also process traffic.</p>
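<p>The two-tier flow described above can be sketched in a few lines. This is purely illustrative, not Cloudflare's implementation; the PoP codes and the selection logic are hypothetical, and a real system would weigh latency and load rather than take the first in-region match.</p>

```python
# Hypothetical PoP -> region mapping (illustrative codes, not a real inventory).
POPS = {"ORD": "US", "EWR": "US", "AMS": "EU", "FRA": "EU", "YWG": "CA"}

def plan_request(ingest_pop: str, customer_region: str) -> tuple:
    """Return (ddos_pop, l7_pop) for one request.

    DDoS mitigation always happens at the ingest PoP, so the full Anycast
    network absorbs attacks; layer 7 processing (WAF, Workers, caching) is
    forwarded to a PoP inside the customer's chosen region.
    """
    if POPS[ingest_pop] == customer_region:
        return ingest_pop, ingest_pop  # already in-region: process locally
    # Take the first in-region PoP for the sketch.
    l7_pop = next(pop for pop, region in POPS.items() if region == customer_region)
    return ingest_pop, l7_pop

# A client near the border lands in Winnipeg, but a US-region customer's
# traffic is only processed at layer 7 inside the US:
print(plan_request("YWG", "US"))  # ('YWG', 'ORD')
```

The key property is that the first element (where scrubbing happens) ranges over the whole network, while the second (where HTTP features and TLS keys live) is always inside the configured region.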
    <div>
      <h3>US</h3>
      <a href="#us">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/27QO1l8SD4U7w27OSYYPOp/33c4577ab859445c0f3fab1f515fbf72/image1-10.png" />
            
            </figure>
    <div>
      <h3>EU</h3>
      <a href="#eu">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/10lHcRerwTtYDamjx1u0HA/7f714e18362e0ad7a09caa8ea4447406/BDES-655-_-Slides-with-Cloudflare-PoPs-for-product-launch--1-.jpg" />
            
            </figure><p>We're very excited to provide new tools to our customers, allowing them to dictate which of our data centers employ HTTP features and which do not. If you're interested in learning more, contact <a href="mailto:sales@cloudflare.com">sales@cloudflare.com</a>.</p> ]]></content:encoded>
            <category><![CDATA[Data Center]]></category>
            <category><![CDATA[Europe]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[BGP]]></category>
            <category><![CDATA[Regional Services]]></category>
            <guid isPermaLink="false">6odmOeCIIEK47sVIlmcGt6</guid>
            <dc:creator>Achiel van der Mandele</dc:creator>
        </item>
    </channel>
</rss>