Mingwei Zhang - The Cloudflare Blog

Enforcing the First AS in BGP AS_PATHs

Bryton Herdes — Wed, 03 Jun 2026 17:00:00 GMT

Some recent route hijacks reported by Spamhaus captured our attention. In many of these hijack attempts, an apparent bad actor took advantage of unused autonomous system numbers, or ASNs. Notably in these hijacks, the actor appears to be creating fake AS_PATHs toward destinations, misdirecting traffic down an unexpected path.

By creating forged AS_PATHs, the hijacker is attempting to lead traffic somewhere it isn’t normally meant to go while also trying to conceal their identity. A hijacker could strip enough information away from a network path that they could pretend to be the origin of a Border Gateway Protocol (BGP) prefix themselves. Attackers can use this hijacked route to intercept traffic and for other nefarious purposes.

There is a simple solution for these cases: basic verification that a BGP peer autonomous system (AS) always includes their network as the “First AS” in an advertised route. To get a sense of how well these safeguards are implemented, we stress-tested several major networks and researched their BGP implementations. Read on to see what we learned.

Examining route hijacks involving forged paths

The idea that an actor is creating fake AS_PATHs is supported when we take a closer look at implausible AS relationships in the path. For example, let’s examine one of the hijacks reported by Spamhaus, involving a prefix belonging to Orange S.A., the French telecom company. Using the monocle tool, we can easily find a BGP UPDATE message related to the hijack:

We know AS1299 (Arelion) is a Tier 1 network, meaning every AS on the right-hand side in the path is describing an upstream (customer-to-provider) relationship. This implies that AS17072 is a transit provider for AS41128, AS270118 for AS17072, and AS199524 for AS270118. If we take a closer look at these networks:

AS41128 is an unused ASN belonging to Orange France
AS17072 is an ISP primarily based in Mexico
AS270118 is a hosting provider based in Mexico
AS199524 is Gcore, a provider with a global peering presence

The order of the ASes in the message above would suggest that an unused Orange France AS is buying transit from Mexican ISPs, which is then upstreamed to Gcore and Tier 1 providers – which would be quite odd.

In another instance, a reported hijack for prefixes 47.1.0.0/16 and 47.2.0.0/16 from origin AS36429 even included Cloudflare’s main ASN, 13335, in the AS_PATH, “199524 270118 17072 13335 36429”. We can view examples of these BGP UPDATEs in the MRT Explorer from Cloudflare Radar:

We can authoritatively confirm that we (Cloudflare, AS13335) have no adjacency with the now-unused AS36429 owned by Charter. This means this was a forged path by the hijacker that included Cloudflare’s ASN as one of the fake upstream networks in advertisements propagated toward Gcore (AS199524). Further, Spamhaus correctly pointed out that all the hijack routes led to a network behind Gcore peering in Chicago, never actually traversing the Mexican ISPs or Cloudflare’s network in the forwarding path.

Because of this, we can reasonably conclude these paths are forged up until the leftmost common AS, which in this case is AS199524, as the rest of the path seems implausible. We believe what is happening here is the result of a specific strategy by the hijacker, involving the following steps:

Originate BGP announcements for “parked” prefixes
Forge the AS_PATH completely, without including the hijacker’s own local ASN
Advertise these routes to Gcore, AS199524

In these hijacks it appears Gcore (AS199524) skips the verification and enforcement of the First AS matching the expected customer’s ASN. (We’ll look at why it might skip those steps later in this post.) As a result, the forged path is accepted and the hijacked prefixes are propagated to upstream providers and peers.

While Autonomous System Provider Authorization (ASPA) will help invalidate these forged paths, attackers may bypass it by only including an RPKI-ROV-valid origin AS, or a legitimate ASPA upstream AS. To stop these specific hijacks, we must rely on a different protection mechanism already built into BGP: First AS checking and enforcement.

The importance of First AS checking

Routing traffic across the Internet is a bit like shipping a package. When the package is shipped, a log is kept of every courier that handles it. In BGP, this is called the AS_PATH (Autonomous System Path) and it tracks each network in the path of that route.

The AS_PATH attribute in BGP is used for path selection. This selection algorithm determines which route to a destination traverses the best list of hops, where “best” is defined by multiple variables. It is also used for loop prevention, where networks can decide not to accept paths that have already traversed their own network. Aside from keeping a record of the networks a BGP UPDATE, and therefore route, will traverse, the AS_PATH can also be examined by operator-configured routing policies to route around or purposely through a given AS – for example to avoid BGP anomalies having unexpected impact.

BGP was built on trust, and the AS_PATH can be easily manipulated – whether for seemingly legitimate reasons such as AS prepending to move traffic around, or nefarious reasons such as shortening it to artificially attract traffic or perform origin attacks.

Let’s look at how these two types of malicious BGP manipulations are carried out.

Example 1: Forged origin attacks

AS64506 cryptographically signs their routes with an RPKI ROA (Route Origin Authorization) record, to prevent route origin hijacks.
AS64506 also creates an ASPA object, specifying only AS64503 as a valid provider
AS64505 manipulates their AS_PATH to strip AS64505 and originate with AS64506
AS64502 does not enforce the First AS

The route appears RPKI-ROV valid and is the shortest path, effectively hijacking traffic with the route. AS64506 has done everything correctly by specifying a valid ROA for a prefix advertisement, and has even configured an ASPA object consisting of their sole provider AS64503.

Unfortunately, the hijacker running AS64505 is still able to attract traffic meant for AS64506. Even if AS64501, the customer, and AS64502, their provider, run ASPA validation, they will not find an invalid path, because there is no valley in the path “64502 64506”. In other words, AS64505 by way of not even including their own ASN in the AS_PATH is able to pretend they are AS64506 with no intermediate AS hop.

The correct way of preventing this hijack with existing tools is to enforce the First AS in the AS_PATH. Once enforcing this rule, AS64502 would properly drop the route from AS64505.

Example 2: Shortening the AS_PATH to attract traffic

AS64506 has two transit providers: AS64503 and AS64505.
AS64505 bills their customer AS64506 based on traffic usage ratios.
AS64505 strips itself from the path, and their peer AS64504 does not enforce the First AS.

The BGP path selection algorithm now chooses the route via AS64504 as the best path from AS64501. AS64506 pays both of their providers, AS64503 and AS64505, to deliver traffic from the Internet. However, now AS64505 provides a shorter BGP path from far-end sources, meaning AS64505 will process all the traffic toward AS64506 and be paid for doing so, and AS64503 will not be paid at all.

These BGP vulnerabilities can be solved very simply by enforcing the First AS to match the peer AS in a received AS_PATH.

When an operator configures a BGP neighbor, they must set the remote AS of the network they are interconnecting with. If the First AS in the AS_PATH does not match this value, then the path has been manipulated. The First AS enforcement procedure is outlined in Section 6.3 of RFC 4271 very clearly as:

“If the UPDATE message is received from an external peer, the local

system MAY check whether the leftmost (with respect to the position

of octets in the protocol message) AS in the AS_PATH attribute is

equal to the autonomous system number of the peer that sent the

message. If the check determines this is not the case, the Error

Subcode MUST be set to Malformed AS_PATH.”

RFC 7606 later revises how error-handling should be implemented by vendors, suggesting that routes containing malformed AS_PATHs should be dropped via treat-as-withdraw method. This allows routers to drop specific prefixes with malformed attributes without disrupting the entire BGP session.

The current ASPA draft clearly calls out the importance of First AS enforcement, stating that ASPA cannot handle paths where sufficient AS_PATH information is lacking due to malformed announcements. Enforcing First AS in AS_PATHs is a must for Internet routing security.

Measurement by breaking the First AS rule on purpose

Instead of sticking to theoretical failure cases and past public incidents about violations of the First AS rule, we wanted to measure for ourselves how widely these AS_PATH violations could be accepted on the Internet. To do so, we set up BGP announcements to neighbors where we purposely violated the rule ourselves. Here is what we did:

Allocated two IP prefixes, one for IPv4 and one for IPv6, to advertise to Tier 1 External BGP (EBGP) neighbors
Purposely prepended the test prefix advertisements to Tier 1 neighbors with a Cloudflare-owned, non-13335 ASN (AS402542) in front of 13335

For example, we advertised the prefixes to AS1299 from our normal BGP session in Geneva. Our local AS is AS13335, but we include AS402542 clearly as the First AS in the AS_PATH.

With this configuration, our expectation is that:

Networks that do enforce-first-as will quietly drop the route via RFC 7606 withdrawal method
Networks that do not enforce-first-as will accept the route and install it for forwarding toward our test prefixes

Either result will be visible in BGP public route views. It was initially our goal to implement a continuous announcement of prefixes toward all peers that would purposely violate the First AS rule in announcements, and give everyone a tool to check which ISPs validate First AS and those which do not. However, we found there are still networks that have not implemented the guidance published in RFC 7606 when receiving malformed BGP AS_PATHs, and would reset BGP sessions instead of a treat-as-withdraw behavior. This meant we could not safely implement a continuous set of announcements that violate the First AS rule without impacting real traffic to Cloudflare, which we obviously can’t do.

But we can take a closer look at the networks whose policies make the biggest impact: Tier 1 networks. These networks make up the backbone of the Internet and have the largest AS customer cones of anyone, meaning hijacks or malformed paths by these peers have the broadest significance. Let’s start by examining the normal propagation of an anycast prefix, 1.1.1.0/24, across the Tier 1 networks.

The propagation of 1.1.1.0/24 looks how you would expect – it is directly reachable by every Tier 1 network that Cloudflare has a direct adjacency with currently.

Now, let’s compare that with our purposely malformed announcement of the prefix 162.159.82.0/24:

^{Note: AS5511 (Orange S.A.) is not pictured above due to its limited presence in public route views, but it was a part of our testing and measurements.}

The prefix is propagated very differently from 1.1.1.0/24 – far fewer Tier 1 networks are accepting the announcement directly from Cloudflare (in this case from AS13335 with AS402542 prepended). Based on the criteria of our test mentioned earlier, these are the results we found.

Tier 1 networks that are enforcing First AS rule (by dropping the invalid announcements):

AS174 (Cogent)
AS1299 (Arelion)
AS3257 (GTT)
AS3491 (PCCW)
AS5511 (Orange S.A.)
AS6453 (Tata)
AS7018 (AT&T)

Tier 1 networks that are not enforcing the First AS rule (by accepting and installing the prefixes):

AS701 (Verizon)
AS2914 (NTT)
AS3356 (Lumen/Colt/Cirion)
AS6461 (Zayo)
AS6762 (Sparkle)
AS6830 (Liberty Global)
AS12956 (Telefonica)

With our testing, we uncovered a troubling reality: Half of the Tier 1 networks are vulnerable to hijacks that violate the First AS rule.

While we only tested Tier 1 networks in this measurement study, there’s no doubt there are many non-Tier 1 networks that also break the First AS rule.

We noted that the majority of the Tier 1 networks failing the First AS violation test are running Juniper Networks routers, identified by the peers’ MAC addresses.

This highlights that the default behavior of vendors defines how secure a network is “out of the box” against First AS violation-based attacks. Let’s go over some of the BGP implementations and their defaults to have a better understanding of who is protected by default, and who isn’t.

BGP implementations and default behaviors

The chart below lists major routing/networking vendors and their BGP policies. Here, “Yes” means the BGP implementation by default enforces First AS, which is good. “No” means the BGP implementation is vulnerable by default.

The lack of default enforcement from some vendors may stem from the only valid use case where the First AS should not be enforced on External BGP (EBGP) sessions: Internet Exchange (IX) route servers.

A route server is responsible for transparently (without appending its AS to the AS_PATH) distributing routes between peers on the fabric. This ensures peers do not have to configure new BGP sessions every time a network joins the fabric – instead they can peer with just the route server.

In reality, most production networks have far more sessions with neighbors who are not transparent IX route servers than neighbors who are. It makes much more sense to configure “no enforce-first-as” on a handful of route-server sessions than to manually enable “enforce-first-as” on every single peer in your network.

While a “safe by default” approach is best for protecting against First AS violations, it is generally a steep hill to climb trying to convince vendors to change longstanding defaults. Vendors would also need to introduce a method of doing this gracefully, so as to not impact the IX route server BGP sessions that require “no enforce-first-as” settings to successfully receive routes.

Safer Internet routing with your help: enforce the First AS

Attackers will purposely malform AS_PATHs to slide around BGP security mechanisms. Even RPKI-based ASPA path validation will not be able to protect us from forged-origin hijacks where the path has been totally stripped of everything but the origin AS, leaving nothing for ASPA to invalidate.

The good news is we already have a mitigation for these cases: we can verify the First AS matches BGP peer AS and always enforce it. Refer to the corresponding “Documentation” column in the above table we have provided. It should be safe to enforce First AS on any External BGP (EBGP) session besides those facing an IX route server neighbor.

If you are a network operator, please enforce First AS on your routers today to protect your network and the wider Internet.

If your router vendor or choice of BGP implementation has a default of enforcing First AS, you’re already safe and should be rejecting any First AS violations.

By working together, we can make the Internet safer from these kinds of hijacks.

ASPA: making Internet routing more secure

Mingwei Zhang — Fri, 27 Feb 2026 06:00:00 GMT

Internet traffic relies on the Border Gateway Protocol (BGP) to find its way between networks. However, this traffic can sometimes be misdirected due to configuration errors or malicious actions. When traffic is routed through networks it was not intended to pass through, it is known as a route leak. We have written on our blog multiple times about BGP route leaks and the impact they have on Internet routing, and a few times we have even alluded to a future of path verification in BGP.

While the network community has made significant progress in verifying the final destination of Internet traffic, securing the actual path it takes to get there remains a key challenge for maintaining a reliable Internet. To address this, the industry is adopting a new cryptographic standard called ASPA (Autonomous System Provider Authorization), which is designed to validate the entire path of network traffic and prevent route leaks.

To help the community track the rollout of this standard, Cloudflare Radar has introduced a new ASPA deployment monitoring feature. This view allows users to observe ASPA adoption trends over time across the five Regional Internet Registries (RIRs), and view ASPA records and changes over time at the Autonomous System (AS) level.

What is ASPA?

To understand how ASPA works, it is helpful to look at how the Internet currently secures traffic destinations.

Today, networks use a secure infrastructure system called RPKI (Resource Public Key Infrastructure), which has seen significant deployment growth over the past few years. Within RPKI, networks publish specific cryptographic records called ROAs (Route Origin Authorizations). A ROA acts as a verifiable digital ID card, confirming that an Autonomous System (AS) is officially authorized to announce specific IP addresses. This addresses the "origin hijacks" issue, where one network attempts to impersonate another.

ASPA (Autonomous System Provider Authorization) builds directly on this foundation. While a ROA verifies the destination, an ASPA record verifies the journey.

When data travels across the Internet, it keeps a running log of every network it passes through. In BGP, this log is known as the AS_PATH (Autonomous System Path). ASPA provides networks with a way to officially publish a list of their authorized upstream providers within the RPKI system. This allows any receiving network to look at the AS_PATH, check the associated ASPA records, and verify that the traffic only traveled through an approved chain of networks.

A ROA helps ensure the traffic arrives at the correct destination, ASPA ensures the traffic takes an intended, authorized route to get there. Let’s take a look at how path evaluation actually works in practice.

Route leak detection with ASPA

How does ASPA know if a route is a detour? It relies on the hierarchy of the Internet.

In a healthy Internet routing topology (e.g. “valley-free” routing), traffic generally follows a specific path: it travels "up" from a customer to a large provider (like a major ISP), optionally crosses over to another big provider, and then flows "down" to the destination. You can visualize this as a “mountain” shape:

The Up-Ramp: Traffic starts at a Customer and travels "up" through larger and larger Providers (ISPs), where ISPs pay other ISPs to transit traffic for them.
The Apex: It reaches the top tier of the Internet backbone and may cross a single peering link.

The Down-Ramp: It travels "down" through providers to reach the destination Customer.

^{A visualization of "valley-free" routing. Routes propagate up to a provider, optionally across one peering link, and down to a customer.}

In this model, a route leak is like a valley, or dip. One type of such leak happens when traffic goes down to a customer and then unexpectedly tries to go back up to another provider.

This "down-and-up" movement is undesirable as customers aren't intended nor equipped to transit traffic between two larger network providers.

How ASPA validation works

ASPA gives network operators a cryptographic way to declare their authorized providers, enabling receiving networks to verify that an AS path follows this expected structure.

ASPA validates AS paths by checking the “chain of relationships” from both ends of the routes propagation:

Checking the Up-Ramp: The check starts at the origin and moves forward. At every hop, it asks: "Did this network authorize the next network as a Provider?" It keeps going until the chain stops.
Checking the Down-Ramp: It does the same thing from the destination of a BGP update, moving backward.

If the "Up" path and the "Down" path overlap or meet at the top, the route is Valid. The mountain shape is intact.

However, if the two valid paths do not meet, i.e. there is a gap in the middle where authorization is missing or invalid, ASPA reports such paths as problematic. That gap represents the "valley" or the leak.

Validation process example

Let’s look at a scenario where a network (AS65539) receives a bad route from a customer (AS65538).

The customer (AS65538) is trying to send traffic received from one provider (AS65537) "up" to another provider (AS65539), acting like a bridge between providers. This is a classic route leak. Now let’s walk the ASPA validation process.

We check the Up-Ramp: The original source (AS65536) authorizes its provider. (Check passes).
We check the Down-Ramp: We start from the destination and look back. We see the customer (AS65538).
The Mismatch: The up-ramp ends at AS65537, while the down-ramp ends at 65538. The two ramps do not connect.

Because the "Up" path and "Down" path fail to connect, the system flags this as ASPA Invalid. ASPA is required to do this path validation, as without signed ASPA objects in RPKI, we cannot find which networks are authorized to advertise which prefixes to whom. By signing a list of provider networks for each AS, we know which networks should be able to propagate prefixes laterally or upstream.

ASPA against forged-origin hijacks

ASPA can serve as an effective defense against forged-origin hijacks, where an attacker bypasses Route Origin Validation (ROV) by pretending and advertising a BGP path to a real origin prefix. Although the origin AS remains correct, the relationship between the hijacker and the victim is fabricated.

ASPA exposes this deception by allowing the victim network to cryptographically declare its actual authorized providers; because the hijacker is not on that authorized list, the path is rejected as invalid, effectively preventing the malicious redirection.

ASPA cannot fully protect against forged-origin hijacks, however. There is still at least one case where not even ASPA validation can fully prevent this type of attack on a network. An example of a forged-origin hijack that ASPA cannot account for is when a provider forges a path advertisement to their customer.

Essentially, a provider could “fake” a peering link with another AS to attract traffic from a customer with a short AS_PATH length, even when no such peering link exists. ASPA does not prevent this path forgery by the provider, because ASPA only works off of provider information and knows nothing specific about peering relationships.

So while ASPA can be an effective means of rejecting forged-origin hijack routes, there are still some rare cases where it will be ineffective, and those are worth noting.

Creating ASPA objects: just a few clicks away

Creating an ASPA object for your network (or Autonomous System) is now a simple process in registries like RIPE and ARIN. All you need is your AS number and the AS numbers of the providers you purchase Internet transit service from. These are the authorized upstream networks you trust to announce your IP addresses to the wider Internet. In the opposite direction, these are also the networks you authorize to send you a full routing table, which acts as the complete map of how to reach the rest of the Internet.

We’d like to show you just how easy creating an ASPA object is with a quick example.

Say we need to create the ASPA object for AS203898, an AS we use for our Cloudflare London office Internet. At the time of writing we have three Internet providers for the office: AS8220, AS2860, and AS1273. This means we will create an ASPA object for AS203898 with those three provider members in a list.

First, we log into the RIPE RPKI dashboard and navigate to the ASPA section:

Then, we click on “Create ASPA” for the object we want to create an ASPA object for. From there, we just fill in the providers for that AS.

It’s as simple as that. After just a short period of waiting, we can query the global RPKI ecosystem and find our ASPA object for AS203898 with the providers we defined.

It’s a similar story with ARIN, the only other Regional Internet Registries (RIRs) that currently supports the creation of ASPA objects. Log in to ARIN online, then navigate to Routing Security, and click “Manage RPKI”.

From there, you’ll be able to click on “Create ASPA”. In this example, we will create an object for another one of our ASNs, AS400095.

And that’s it – now we have created our ASPA object for AS40095 with provider AS0.

The “AS0” provider entry is special when used, and means the AS owner attests there are no valid upstream providers for their network. By definition this means every transit-free Tier-1 network should eventually sign an ASPA with only “AS0” in their object, if they truly only have peer and customer relationships.

New ASPA features in Cloudflare Radar

We have added a new ASPA deployment monitoring feature to Cloudflare Radar. The new ASPA deployment view allows users to examine the growth of ASPA adoption over time, with the ability to visualize trends across the five Regional Internet Registries (RIRs) based on AS registration.

We have also integrated ASPA data directly into the country/region and ASN routing pages. Users can now track how different locations are progressing in securing their infrastructure, based on the associated ASPA records from the customer ASNs registered locally.

There are also new features when you zoom into a particular Autonomous System (AS), for example AS203898.

We can see whether a network’s observed BGP upstream providers are ASPA authorized, their full list of providers in their ASPA object, and the timeline of ASPA changes that involve their AS.

The road to better routing security

With ASPA finally becoming a reality, we have our cryptographic upgrade for Internet path validation. However, those who have been around since the start of RPKI for route origin validation know this will be a long road to actually providing significant value on the Internet. Changes are needed to RPKI Relaying Party (RP) packages, signer implementations, RTR (RPKI-to-Router protocol) software, and BGP implementations to actually use ASPA objects and validate paths with them.

In addition to ASPA adoption, operators should also configure BGP roles as described within RFC9234. The BGP roles configured on BGP sessions will help future ASPA implementations on routers decide which algorithm to apply: upstream or downstream. In other words, BGP roles give us the power as operators to directly tie our intended BGP relationships with another AS to sessions with those neighbors. Check with your routing vendors and make sure they support RFC9234 BGP roles and OTC (Only-to-Customer) attribute implementation.

To get the most out of ASPA, we encourage everyone to create their ASPA objects for their AS. Creating and maintaining these ASPA objects requires careful attention. In the future, as networks use these records to actively block invalid paths, omitting a legitimate provider could cause traffic to be dropped. However, managing this risk is no different from how networks already handle Route Origin Authorizations (ROAs) today. ASPA is the necessary cryptographic upgrade for Internet path validation, and we’re happy it’s here!

Bringing more transparency to post-quantum usage, encrypted messaging, and routing security

David Belson — Fri, 27 Feb 2026 06:00:00 GMT

Cloudflare Radar already offers a wide array of security insights — from application and network layer attacks, to malicious email messages, to digital certificates and Internet routing.

And today we’re introducing even more. We are launching several new security-related data sets and tools on Radar:

We are extending our post-quantum (PQ) monitoring beyond the client side to now include origin-facing connections. We have also released a new tool to help you check any website's post-quantum encryption compatibility.
A new Key Transparency section on Radar provides a public dashboard showing the real-time verification status of Key Transparency Logs for end-to-end encrypted messaging services like WhatsApp, showing when each log was last signed and verified by Cloudflare's Auditor. The page serves as a transparent interface where anyone can monitor the integrity of public key distribution and access the API to independently validate our Auditor’s proofs.
Routing Security insights continue to expand with the addition of global, country, and network-level information about the deployment of ASPA, an emerging standard that can help detect and prevent BGP route leaks.

Measuring origin post-quantum support

Since April 2024, we have tracked the aggregate growth of client support for post-quantum encryption on Cloudflare Radar, chronicling its global growth from under 3% at the start of 2024, to over 60% in February 2026. And in October 2025, we added the ability for users to check whether their browser supports X25519MLKEM768 — a hybrid key exchange algorithm combining classical X25519 with ML-KEM, a lattice-based post-quantum scheme standardized by NIST. This provides security against both classical and quantum attacks.

However, post-quantum encryption support on user-to-Cloudflare connections is only part of the story.

For content not in our CDN cache, or for uncacheable content, Cloudflare’s edge servers establish a separate connection with a customer’s origin servers to retrieve it. To accelerate the transition to quantum-resistant security for these origin-facing fetches, we previously introduced an API allowing customers to opt in to preferring post-quantum connections. Today, we’re making post-quantum compatibility of origin servers visible on Radar.

The new origin post-quantum support graph on Radar illustrates the share of customer origins supporting X25519MLKEM768. This data is derived from our automated TLS scanner, which probes TLS 1.3-compatible origins and aggregates the results daily. It is important to note that our scanner tests for support rather than the origin server's specific preference. While an origin may support a post-quantum key exchange algorithm, its local TLS key exchange preference can ultimately dictate the encryption outcome.

While the headline graph focuses on post-quantum readiness, the scanner also evaluates support for classical key exchange algorithms. Within the Radar Data Explorer view, you can also see the full distribution of these supported TLS key exchange methods.

As shown in the graphs above, approximately 10% of origins could benefit from a post-quantum-preferred key agreement today. This represents a significant jump from less than 1% at the start of 2025 — a 10x increase in just over a year. We expect this number to grow steadily as the industry continues its migration. This upward trend likely accelerated in 2025 as many server-side TLS libraries, such as OpenSSL 3.5.0+, GnuTLS 3.8.9+, and Go 1.24+, enabled hybrid post-quantum key exchange by default, allowing platforms and services to support post-quantum connections simply by upgrading their cryptographic library dependencies.

In addition to the Radar and Data Explorer graphs, the origin readiness data is available through the Radar API as well.

As an additional part of our efforts to help the Internet transition to post-quantum cryptography, we are also launching a tool to test whether a specific hostname supports post-quantum encryption. These tests can be run against any publicly accessible website, as long as they allow connections from Cloudflare’s egress IP address ranges.

_{A screenshot of the tool in Radar to test whether a hostname supports post-quantum encryption.}

The tool presents a simple form where users can enter a hostname (such as cloudflare.com or www.wikipedia.org) and optionally specify a custom port (the default is 443, the standard HTTPS port). After clicking "Test", the result displays a tag indicating PQ support status alongside the negotiated TLS key exchange algorithm. If the server prefers PQ secure connections, a green "PQ" tag appears with a message confirming the connection is "post-quantum secure." Otherwise, a red tag indicates the connection is "not post-quantum secure", showing the classical algorithm that was negotiated.

Under the hood, this tool uses Cloudflare Containers — a new capability that allows running container workloads alongside Workers. Since the Workers runtime is not exposed to details of the underlying TLS handshake, Workers cannot initiate TLS scans. Therefore, we created a Go container that leverages the crypto/tls package's support for post-quantum compatibility checks. The container runs on-demand and performs the actual handshake to determine the negotiated TLS key exchange algorithm, returning results through the Radar API.

With the addition of these origin-facing insights, complementing the existing client-facing insights, we have moved all the post-quantum content to its own section on Radar.

Securing E2EE messaging systems with Key Transparency

End-to-end encrypted (E2EE) messaging apps like WhatsApp and Signal have become essential tools for private communication, relied upon by billions of people worldwide. These apps use public-key cryptography to ensure that only the sender and recipient can read the contents of their messages — not even the messaging service itself. However, there's an often-overlooked vulnerability in this model: users must trust that the messaging app is distributing the correct public keys for each contact.

If an attacker were able to substitute an incorrect public key in the messaging app's database, they could intercept messages intended for someone else — all without the sender knowing.

Key Transparency addresses this challenge by creating an auditable, append-only log of public keys — similar in concept to Certificate Transparency for TLS certificates. Messaging apps publish their users' public keys to a transparency log, and independent third parties can verify and vouch that the log has been constructed correctly and consistently over time. In September 2024, Cloudflare announced such a Key Transparency auditor for WhatsApp, providing an independent verification layer that helps ensure the integrity of public key distribution for the messaging app's billions of users.

Today, we're publishing Key Transparency audit data in a new Key Transparency section on Cloudflare Radar. This section showcases the Key Transparency logs that Cloudflare audits, giving researchers, security professionals, and curious users a window into the health and activity of these critical systems.

The new page launches with two monitored logs: WhatsApp and Facebook Messenger Transport. Each monitored log is displayed as a card containing the following information:

Status: Indicates whether the log is online, in initialization, or disabled. An "online" status means the log is actively publishing key updates into epochs that Cloudflare audits. (An epoch represents a set of updates applied to the key directory at a specific time.)
Last signed epoch: The most recent epoch that has been published by the messaging service's log and acknowledged by Cloudflare. By clicking on the eye icon, users can view the full epoch data in JSON format, including the epoch number, timestamp, cryptographic digest, and signature.
Last verified epoch: The most recent epoch that Cloudflare has verified. Verification involves checking that the transition of the transparency log data structure from the previous epoch to the current one represents a valid tree transformation — ensuring the log has been constructed correctly. The verification timestamp indicates when Cloudflare completed its audit.
Root: The current root hash of the Auditable Key Directory (AKD) tree. This hash cryptographically represents the entire state of the key directory at the current epoch. Like the epoch fields, users can click to view the complete JSON response from the auditor.

The data shown on the page is also available via the Key Transparency Auditor API, with endpoints for auditor information and namespaces.

If you would like to perform audit proof verification yourself, you can follow the instructions in our Auditing Key Transparency blog post. We hope that these use cases are the first of many that we publish in this Key Transparency section in Radar — if your company or organization is interested in auditing for your public key or related infrastructure, you can reach out to us here.

Tracking RPKI ASPA adoption

While the Border Gateway Protocol (BGP) is the backbone of Internet routing, it was designed without built-in mechanisms to verify the validity of the paths it propagates. This inherent trust has long left the global network vulnerable to route leaks and hijacks, where traffic is accidentally or maliciously detoured through unauthorized networks.

Although RPKI and Route Origin Authorizations (ROAs) have successfully hardened the origin of routes, they cannot verify the path traffic takes between networks. This is where ASPA (Autonomous System Provider Authorization) comes in. ASPA extends RPKI protection by allowing an Autonomous System (AS) to cryptographically sign a record listing the networks authorized to propagate its routes upstream. By validating these Customer-to-Provider relationships, ASPA allows systems to detect invalid path announcements with confidence and react accordingly.

While the specific IETF standard remains in draft, the operational community is moving fast. Support for creating ASPA objects has already landed in the portals of Regional Internet Registries (RIRs) like ARIN and RIPE NCC, and validation logic is available in major software routing stacks like OpenBGPD and BIRD.

To provide better visibility into the adoption of this emerging standard, we have added comprehensive RPKI ASPA support to the Routing section of Cloudflare Radar. Tracking these records globally allows us to understand how quickly the industry is moving toward better path validation.

Our new ASPA deployment view allows users to examine the growth of ASPA adoption over time, with the ability to visualize trends across the five Regional Internet Registries (RIRs) based on AS registration. You can view the entire history of ASPA entries, dating back to October 1, 2023, or zoom into specific date ranges to correlate spikes in adoption with industry events, such as the introduction of ASPA features on ARIN and RIPE NCC online dashboards.

Beyond aggregate trends, we have also introduced a granular, searchable explorer for real-time ASPA content. This table view allows you to inspect the current state of ASPA records, searchable by AS number, AS name, or by filtering for only providers or customer ASNs. This allows network operators to verify that their records are published correctly and to view other networks’ configurations.

We have also integrated ASPA data directly into the country/region routing pages. Users can now track how different locations are progressing in securing their infrastructure, based on the associated ASPA records from the customer ASNs registered locally.

On individual AS pages, we have updated the Connectivity section. Now, when viewing the connections of a network, you may see a visual indicator for "ASPA Verified Provider." This annotation confirms that an ASPA record exists authorizing that specific upstream connection, providing an immediate signal of routing hygiene and trust.

For ASes that have deployed ASPA, we now display a complete list of authorized provider ASNs along with their details. Beyond the current state, Radar also provides a detailed timeline of ASPA activity involving the AS. This history distinguishes between changes initiated by the AS itself ("As customer") and records created by others designating it as a provider ("As provider"), allowing users to immediately identify when specific routing authorizations were established or modified.

Visibility is an essential first step toward broader adoption of emerging routing security protocols like ASPA. By surfacing this data, we aim to help operators deploy protections and assist researchers in tracking the Internet's progress toward a more secure routing path. For those who need to integrate this data into their own workflows or perform deeper analysis, we are also exposing these metrics programmatically. Users can now access ASPA content snapshots, historical timeseries, and detailed changes data using the newly introduced endpoints in the Cloudflare Radar API.

As security evolves, so does our data

Internet security continues to evolve, with new approaches, protocols, and standards being developed to ensure that information, applications, and networks remain secure. The security data and insights available on Cloudflare Radar will continue to evolve as well. The new sections highlighted above serve to expand existing routing security, transparency, and post-quantum insights already available on Cloudflare Radar.

If you share any of these new charts and graphs on social media, be sure to tag us: @CloudflareRadar (X), noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky). If you have questions or comments, or suggestions for data that you’d like to see us add to Radar, you can reach out to us on social media, or contact us via email.

BGP zombies and excessive path hunting

Bryton Herdes — Fri, 31 Oct 2025 15:30:00 GMT

Here at Cloudflare, we’ve been celebrating Halloween with some zombie hunting of our own. The zombies we’d like to remove are those that disrupt the core framework responsible for how the Internet routes traffic: BGP (Border Gateway Protocol).

A BGP zombie is a silly name for a route that has become stuck in the Internet’s Default-Free Zone, aka the DFZ: the collection of all internet routers that do not require a default route, potentially due to a missed or lost prefix withdrawal.

The underlying root cause of a zombie could be multiple things, spanning from buggy software in routers or just general route processing slowness. It’s when a BGP prefix is meant to be gone from the Internet, but for one reason or another it becomes a member of the undead and hangs around for some period of time.

The longer these zombies linger, the more they create operational impact and become a real headache for network operators. Zombies can lead packets astray, either by trapping them inside of route loops or by causing them to take an excessively scenic route. Today, we’d like to celebrate Halloween by covering how BGP zombies form and how we can lessen the likelihood that they wreak havoc on Internet traffic.

Path hunting

To understand the slowness that can often lead to BGP zombies, we need to talk about path hunting. Path hunting occurs when routers running BGP exhaustively search for the best path to a prefix as determined by Longest Prefix Matching (LPM) and BGP routing attributes like path length and local preference. This becomes relevant in our observations of exactly how routes become stuck, for how long they become stuck, and how visible they are on the Internet.

For example, path hunting happens when a more-specific BGP prefix is withdrawn from the global routing table, and networks need to fallback to a less-specific BGP advertisement. In this example, we use 2001:db8::/48 for the more-specific BGP announcement and 2001:db8::/32 for the less-specific prefix. When the /48 is withdrawn by the originating Autonomous System (AS), BGP routers have to recognize that route as missing and begin routing traffic to IPs such as 2001:db8::1 via the 2001:db8::/32 route, which still remains while the prefix 2001:db8::/48 is gone.

Let’s see what this could look like in action with a few diagrams.

_{Diagram 1: Active 2001:db8::/48 route}

In this initial state, 2001:db8::/48 is used actively for traffic forwarding, which all flows through AS13335 on the way to AS64511. In this case, AS64511 would be a BYOIP customer of Cloudflare. AS64511 also announces a backup route to another Internet Service Provider (ISP), AS64510, but this route is not active even in AS64510’s routing table for forwarding to 2001:db8::1 because 2001:db8::/48 is a longer prefix match when compared to 2001:db8::/32.

Things get more interesting when AS64510 signals for 2001:db8::/48 to be withdrawn by Cloudflare (AS13335), perhaps because a DDoS attack is over and the customer opts to use Cloudflare only when they are actively under attack.

When the customer signals to Cloudflare (via BGP Control or API call) to withdraw the 2001:db8::/48 announcement, all BGP routers have to converge upon this update, which involves path hunting. AS13335 sends a BGP withdrawal message for 2001:db8::/48 to its directly-connected BGP neighbors. While the news of withdrawal may travel quickly from AS13335 to the other networks, news may travel more quickly to some of the neighbors than others. This means that until everyone has received and processed the withdrawal, networks may try routing through one another to reach the 2001:db8::/48 prefix – even after AS13335 has withdrawn it.

_{Diagram 2: 2001:db8::/48 route withdrawn via AS13335}

Imagine AS64501 is a little slower than the rest – perhaps due to using older hardware, hardware being overloaded, a software bug, specific configuration settings, poor luck, or some other factor – and still has not processed the withdrawal of the /48. This in itself could be a BGP zombie, since the route is stuck for a small period. Our pings toward 2001:db8::1 are never able to actually reach AS64511, because AS13335 knows the /48 is meant to be withdrawn, but some routers carrying a full table have not yet converged upon that result.

The length of time spent path hunting is amplified by something called the Minimum Route Advertisement Interval (MRAI). The MRAI specifies the minimum amount of time between BGP advertisement messages from a BGP router, meaning it introduces a purposeful number of seconds of delay between each BGP advertisement update. RFC4271 recommends an MRAI value of 30-seconds for eBGP updates, and while this can cut down on the chattiness of BGP, or even potential oscillation of updates, it also makes path hunting take longer.

At the next cycle of path hunting, even AS64501, which was previously still pointing toward a nonexistent /48 route from AS13335, should find the /32 advertisement is all that is left toward 2001:db8::1. Once it has done so, the traffic flow will become the following:

_{Diagram 3: Routing fallback to 2001:db8::/32 and 2001:db8::/48 is gone from DFZ}

This would mean BGP path hunting is over, and the Internet has realized that the 2001:db8::/32 is the best route available toward 2001:db8::1, and that 2001:db8::/48 is really gone. While in this example we’ve purposely made path hunting only last two cycles, in reality it can be far more, especially with how highly connected AS13335 is to thousands of peer networks and Tier-1’s globally.

Now that we’ve discussed BGP path hunting and how it works, you can probably already see how a BGP zombie outbreak can begin and how routing tables can become stuck for a lengthy period of time. Excessive BGP path hunting for a previously-known more-specific prefix can be an early indicator that a zombie could follow.

Spawning a zombie

Zombies have captured our attention more recently as they were noticed by some of our customers leveraging Bring-Your-Own-IP (BYOIP) on-demand advertisement for Magic Transit. BYOIP may be configured in two modes: "always-on", in which a prefix is continuously announced, or "on-demand", where a prefix is announced only when a customer chooses to. For some on-demand customers, announcement and withdrawal cycles may be a more frequent occurrence, which can lead to an increase in BGP zombies.

With that in mind and also knowing how path hunting works, let’s spawn our own zombie onto the Internet. To do so, we’ll take a spare block of IPv4 and IPv6 and announce them like so:

Once the routes are announced and stable, we’ll then proceed to withdraw the more specific routes advertised via Cloudflare globally. With a few quick clicks, we’ve successfully re-animated the dead.

Variant A: Ghoulish Gateways

One place zombies commonly occur is between upstream ISPs. When one router in a given ISP’s network is a little slower to update, routes can become stuck.

Take, for example, the following loop we observed between two of our upstream partners:

Or this loop - observed on the same withdrawal test - between two different providers:

Variant B: Undead LAN (Local Area Network)

Simultaneously, zombies can occur entirely within a given network. When a route is withdrawn from Cloudflare’s network, each device in our network must individually begin the process of withdrawing the route. While this is generally a smooth process, things can still become stuck.

Take, for instance, a situation where one router inside of our network has not yet fully processed the withdrawal. Connectivity partners will continue routing traffic towards that router (as they have not yet received the withdrawal) while no host remains behind the router which is capable of actually processing the traffic. The result is an internal-only looping path:

Unlike most fictionally-depicted hoards of the walking dead, our highly-visible zombie has a limited lifetime in most major networks – in this instance, only around around 6 minutes, after which most had re-converged around the less-specific as the best path. Sadly, this is on the shorter side – in some cases, we have seen long-lived zombies cause reachability issues for more than 10 minutes. It’s safe to say this is longer than most network operators would expect BGP convergence to take in a normal situation.

But, you may ask – is this the excessive path hunting we talked about earlier, or a BGP zombie? Really, it depends on the expectation and tolerance around how long BGP convergence should take to process the prefix withdrawal. In any case, even over 30 minutes after our withdrawal of our more-specific prefix, we are able to see zombie routes in the route-views public collectors easily:

You might argue that six to eleven minutes (or more) is a reasonable time for worst-case BGP convergence in the Tier-1 network layer, though that itself seems like a stretch. Even setting that aside, our data shows that very real BGP zombies exist in the global routing table, and they will negatively impact traffic. Curiously, we observed the path hunting delay is worse on IPv4, with the longest observed IPv6 impact in major (Tier-1) networks being just over 4 minutes. One could speculate this is in part due to the much higher number of IPv4 prefixes in the Internet global routing table than the IPv6 global table, and how BGP speakers handle them separately.

_{Source: RIPEstat’s BGPlay}

Part of the delay appears to originate from how interconnected AS13335 is; being heavily peered with a large portion of the Internet increases the likelihood of a route becoming stuck in a given location. Given that, perhaps a zombie would be shorter-lived if we operated in the opposite direction: announcing a less-specific persistently to 13335 and announcing more specifics via our local ISP during normal operation. Since the withdrawal will come from what is likely a less well-peered network, the time-to-convergence may be shorter:

Indeed, as predicted, we still get a stuck route, and it only lives for around 20 seconds in the Tier-1 network layer:

Unfortunately, that 20 seconds is still an impactful 20 seconds - while better, it’s not where we want to be. The exact length of time will depend on the native ISP networks one is connected with, and it could certainly ease into the minutes worth of stuck routing.

In both cases, the initial time-to-announce yielded no loss, nor was a zombie created, as both paths remained valid for the entirety of their initial lifetime. Zombies were only created when a more specific prefix was fully withdrawn. A newly-announced route is not subject to path hunting in the same way a withdrawn more-specific route is. As they say, good (new) news travels fast.

Lessening the zombie outbreak

Our findings lead us to believe that the withdrawal of a more-specific prefix may lead to zombies running rampant for longer periods of time. Because of this, we are exploring some improvements that make the consequences of BGP zombie routing less impactful for our customers relying on our on-demand BGP functionality.

For the traffic that does reach Cloudflare with stuck routes, we will introduce some BGP traffic forwarding improvements internally that allow for a more graceful withdrawal of traffic, even if routes are erroneously pointing toward us. In many ways, this will closely resemble the BGP well-known no-export community’s functionality from our servers running BGP. This means even if we receive traffic from external parties due to stuck routing, we will still have the opportunity to deliver traffic to our far-end customers over a tunneled connection or via a Cloudflare Network Interconnect (CNI). We look forward to reporting back the positive impact after making this improvement for a more graceful draining of traffic by default.

For the traffic that does not reach Cloudflare’s edge, and instead loops between network providers, we need to use a different approach. Since we know more-specific to less-specific prefix routing fallback is more prone to BGP zombie outbreak, we are encouraging customers to instead use a multi-step draining process when they want traffic drained from the Cloudflare edge for an on-demand prefix without introducing route loops or blackhole events. The draining process when removing traffic for a BYOIP prefix from Cloudflare should look like this:

The customer is already announcing an example prefix from Cloudflare, ex. 198.18.0.0/24
The customer begins natively announcing the prefix 198.18.0.0/24 (i.e. the same-length as the prefix they are advertising via Cloudflare) from their network to the Internet Service Providers that they wish to fail over traffic to.
After a few minutes, the customer signals BGP withdrawal from Cloudflare for the 198.18.0.0/24 prefix.

The result is a clean cut over: impactful zombies are avoided because the same-length prefix (198.18.0.0/24) remains in the global routing table. Excessive path hunting is avoided because instead of routers needing to aggressively seek out a missing more-specific prefix match, they can fallback to the same-length announcement that persists in the routing table from the natively-originated path to the customer’s network.

_{Source: RIPEstat’s BGPlay}

What next?

We are going to continue to refine our methods of measuring BGP zombies, so you can look forward to more insights in the future. There is also work from others in the community around zombie measurement that is interesting and producing useful data. In terms of combatting the software bugs around BGP zombie creation, routing vendors should implement RFC9687, the BGP SendHoldTimer. The general idea is that a local router can detect via the SendHoldTimer if the far-end router stops processing BGP messages unexpectedly, which lowers the possibility of zombies becoming stuck for long periods of time.

In addition, it’s worth keeping in mind our observations made in this post about more-specific prefix announcements and excessive path hunting. If as a network operator you rely on more-specific BGP prefix announcements for failover, or for traffic engineering, you need to be aware that routes could become stuck for a longer period of time before full BGP convergence occurs.

If you’re interested in problems like BGP zombies, consider coming to work at Cloudflare or applying for an internship. Together we can help build a better Internet!

Monitoring AS-SETs and why they matter

Mingwei Zhang — Fri, 26 Sep 2025 14:00:00 GMT

Introduction to AS-SETs

An AS-SET, not to be confused with the recently deprecated BGP AS_SET, is an Internet Routing Registry (IRR) object that allows network operators to group related networks together. AS-SETs have been used historically for multiple purposes such as grouping together a list of downstream customers of a particular network provider. For example, Cloudflare uses the AS13335:AS-CLOUDFLARE AS-SET to group together our list of our own Autonomous System Numbers (ASNs) and our downstream Bring-Your-Own-IP (BYOIP) customer networks, so we can ultimately communicate to other networks whose prefixes they should accept from us.

In other words, an AS-SET is currently the way on the Internet that allows someone to attest the networks for which they are the provider. This system of provider authorization is completely trust-based, meaning it's not reliable at all, and is best-effort. The future of an RPKI-based provider authorization system is coming in the form of ASPA (Autonomous System Provider Authorization), but it will take time for standardization and adoption. Until then, we are left with AS-SETs.

Because AS-SETs are so critical for BGP routing on the Internet, network operators need to be able to monitor valid and invalid AS-SET memberships for their networks. Cloudflare Radar now introduces a transparent, public listing to help network operators in our routing page per ASN.

AS-SETs and building BGP route filters

AS-SETs are a critical component of BGP policies, and often paired with the expressive Routing Policy Specification Language (RPSL) that describes how a particular BGP ASN accepts and propagates routes to other networks. Most often, networks use AS-SET to express what other networks should accept from them, in terms of downstream customers.

Back to the AS13335:AS-CLOUDFLARE example AS-SET, this is published clearly on PeeringDB for other peering networks to reference and build filters against.

When turning up a new transit provider service, we also ask the provider networks to build their route filters using the same AS-SET. Because BGP prefixes are also created in IRR registries using the route or route6 objects, peers and providers now know what BGP prefixes they should accept from us and deny the rest. A popular tool for building prefix-lists based on AS-SETs and IRR databases is bgpq4, and it’s one you can easily try out yourself.

For example, to generate a Juniper router’s IPv4 prefix-list containing prefixes that AS13335 could propagate for Cloudflare and its customers, you may use:

^{Restricted to 10 lines, actual output of prefix-list would be much greater}

This prefix list would be applied within an eBGP import policy by our providers and peers to make sure AS13335 is only able to propagate announcements for ourselves and our customers.

How accurate AS-SETs prevent route leaks

Let’s see how accurate AS-SETs can help prevent route leaks with a simple example. In this example, AS64502 has two providers – AS64501 and AS64503. AS64502 has accidentally messed up their BGP export policy configuration toward the AS64503 neighbor, and is exporting all routes, including those it receives from their AS64501 provider. This is a typical Type 1 Hairpin route leak.

Fortunately, AS64503 has implemented an import policy that they generated using IRR data including AS-SETs and route objects. By doing so, they will only accept the prefixes that originate from the AS Cone of AS64502, since they are their customer. Instead of having a major reachability or latency impact for many prefixes on the Internet because of this route leak propagating, it is stopped in its tracks thanks to the responsible filtering by the AS64503 provider network. Again it is worth keeping in mind the success of this strategy is dependent upon data accuracy for the fictional AS64502:AS-CUSTOMERS AS-SET.

Monitoring AS-SET misuse

Besides using AS-SETs to group together one’s downstream customers, AS-SETs can also represent other types of relationships, such as peers, transits, or IXP participations.

For example, there are 76 AS-SETs that directly include one of the Tier-1 networks, Telecom Italia / Sparkle (AS6762). Judging from the names of the AS-SETs, most of them are representing peers and transits of certain ASNs, which includes AS6762. You can view this output yourself at https://radar.cloudflare.com/routing/as6762#irr-as-sets

There is nothing wrong with defining AS-SETs that contain one’s peers or upstreams as long as those AS-SETs are not submitted upstream for customer->provider BGP session filtering. In fact, an AS-SET for upstreams or peer-to-peer relationships can be useful for defining a network’s policies in RPSL.

However, some AS-SETs in the AS6762 membership list such as AS-10099 look to attest customer relationships.

We know AS6762 is transit free and this customer membership must be invalid, so it is a prime example of AS-SET misuse that would ideally be cleaned up. Many Internet Service Providers and network operators are more than happy to correct an invalid AS-SET entry when asked to. It is reasonable to look at each AS-SET membership like this as a potential risk of having higher route leak propagation to major networks and the Internet when they happen.

AS-SET information on Cloudflare Radar

Cloudflare Radar is a hub that showcases global Internet traffic, attack, and technology trends and insights. Today, we are adding IRR AS-SET information to Radar’s routing section, freely available to the public via both website and API access. To view all AS-SETs an AS is a member of, directly or indirectly via other AS-SETs, a user can visit the corresponding AS’s routing page. For example, the AS-SETs list for Cloudflare (AS13335) is available at https://radar.cloudflare.com/routing/as13335#irr-as-sets

The AS-SET data on IRR contains only limited information like the AS members and AS-SET members. Here at Radar, we also enhance the AS-SET table with additional useful information as follows.

Inferred ASN shows the AS number that is inferred to be the creator of the AS-SET. We use PeeringDB AS-SET information match if available. Otherwise, we parse the AS-SET name to infer the creator.
IRR Sources shows which IRR databases we see the corresponding AS-SET. We are currently using the following databases: AFRINIC, APNIC, ARIN, LACNIC, RIPE, RADB, ALTDB, NTTCOM, and TC.
AS Members and AS-SET members show the count of the corresponding types of members.
AS Cone is the count of the unique ASNs that are included by the AS-SET directly or indirectly.
Upstreams is the count of unique AS-SETs that includes the corresponding AS-SET.

Users can further filter the table by searching for a specific AS-SET name or ASN. A toggle to show only direct or indirect AS-SETs is also available.

In addition to listing AS-SETs, we also provide a tree-view to display how an AS-SET includes a given ASN. For example, the following screenshot shows how as-delta indirectly includes AS6762 through 7 additional other AS-SETs. Users can copy or download this tree-view content in the text format, making it easy to share with others.

We built this Radar feature using our publicly available API, the same way other Radar websites are built. We have also experimented using this API to build additional features like a full AS-SET tree visualization. We encourage developers to give this API (and other Radar APIs) a try, and tell us what you think!

Looking ahead

We know AS-SETs are hard to keep clean of error or misuse, and even though Radar is making them easier to monitor, the mistakes and misuse will continue. Because of this, we as a community need to push forth adoption of RFC9234 and implementations of it from the major vendors. RFC9234 embeds roles and an Only-To-Customer (OTC) attribute directly into the BGP protocol itself, helping to detect and prevent route leaks in-line. In addition to BGP misconfiguration protection with RFC9234, Autonomous System Provider Authorization (ASPA) is still making its way through the IETF and will eventually help offer an authoritative means of attesting who the actual providers are per BGP Autonomous System (AS).

If you are a network operator and manage an AS-SET, you should seriously consider moving to hierarchical AS-SETs if you have not already. A hierarchical AS-SET looks like AS13335:AS-CLOUDFLARE instead of AS-CLOUDFLARE, but the difference is very important. Only a proper maintainer of the AS13335 ASN can create AS13335:AS-CLOUDFLARE, whereas anyone could create AS-CLOUDFLARE in an IRR database if they wanted to. In other words, using hierarchical AS-SETs helps guarantee ownership and prevent the malicious poisoning of routing information.

While keeping track of AS-SET memberships seems like a chore, it can have significant payoffs in preventing BGP-related incidents such as route leaks. We encourage all network operators to do their part in making sure the AS-SETs you submit to your providers and peers to communicate your downstream customer cone are accurate. Every small adjustment or clean-up effort in AS-SETs could help lessen the impact of a BGP incident later.

Visit Cloudflare Radar for additional insights around (Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, etc.). Follow us on social media at @CloudflareRadar (X), https://noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via e-mail.

Bringing connections into view: real-time BGP route visibility on Cloudflare Radar

Mingwei Zhang — Wed, 21 May 2025 13:00:00 GMT

The Internet relies on the Border Gateway Protocol (BGP) to exchange IP address reachability information. This information outlines the path a sender or router can use to reach a specific destination. These paths, conveyed in BGP messages, are sequences of Autonomous System Numbers (ASNs), with each ASN representing an organization that operates its own segment of Internet infrastructure.

Throughout this blog post, we'll use the terms "BGP routes" or simply "routes" to refer to these paths. In essence, BGP functions by enabling autonomous systems to exchange routes to IP address blocks (“IP prefixes”), allowing different entities across the Internet to construct their routing tables.

When network operators debug reachability issues or assess a resource's global reach, BGP routes are often the first thing they examine. Therefore, it’s critical to have an up-to-date view of the routes toward the IP prefixes of interest. Some networks provide tools called "looking glasses" — public routing information services offering data directly from their own BGP routers. These allow external operators to examine routes from that specific network's perspective. Furthermore, services like bgp.tools, bgp.he.net, RouteViews, or the NLNOG RING looking glass offer aggregated, looking glass-like lookup capabilities, drawing on data sources from multiple organizations rather than just a single one.

However, individual looking glass instances offer a limited scope, typically restricted to the infrastructure of the service provider's network. While aggregated routing information services provide broader vantage points, they often lack the API access necessary for building automated tools on top of them. For example, systems designed for automated tasks, such as BGP leak or hijack detection, depend on programmatic API access.

We're excited to introduce Cloudflare Radar's new real-time BGP route lookup service, described below. Built using public data sources, this service provides visualizations of real-time routes directly on the corresponding IP prefix pages within Radar (see the page for 1.1.1.0/24 as an example). We are also offering API access through our free-to-use Cloudflare Radar API, empowering developers to leverage this data to build their own innovative systems and tools.

Cloudflare Radar provides real-time routes

We are excited to announce the launch of our new real-time BGP route lookup service, now accessible through both Cloudflare Radar web interface and the Cloudflare Radar API. This enhancement provides users with a near instantaneous view into global BGP routing data.

Cloudflare Radar prefix pages

Cloudflare Radar's real-time routes feature now offers a Sankey diagram illustrating the BGP routes for a given prefix. To minimize visual complexity, the visualization displays routes directed towards the Tier 1 networks. For example, the diagram below shows that 1.1.1.0/24 is announced by AS13335 (Cloudflare) and that Cloudflare has direct connections to almost all U.S.-based and international Tier 1 network providers.

Expanding on this more concise view, users also have the option to 'Show full paths' and visualize every BGP route from the prefix of interest to the collectors. (The role of the collectors in gathering this data is discussed below.) The interactive view allows panning and zooming, and hovering over the links provides tooltip information on which collector saw the route and when it was last updated.

For both views, the prefix origin table is displayed above the route’s visualization. The table shows the originating Autonomous System (AS), the visibility percentage (representing the proportion of route collectors observing the origin ASN announcement), and RPKI validation outcomes.

During a recently detected BGP misconfiguration, we saw two origin ASNs for a prefix, with AS3 incorrectly used instead of the intended origin being prepended three times. The visualization reveals AS3 as RPKI invalid with low visibility, indicating limited network acceptance. Operators can analyze these issues visually or in the table and monitor real-time corrections by refreshing the page.

Whether facing network outages, implementing new deployments, or investigating route leaks, users can leverage this feature for any scenario where a clear, global understanding of a prefix's routing paths is essential.

To allow easier access to this information, users can now search for any prefix using the Radar search bar and navigate to the corresponding prefix routing pages. Prefixes involved in BGP route leak and origin hijack events are also linked to this enhanced routing information page, helping operators debug BGP anomalies in real-time.

Cloudflare Routes API

Cloudflare Radar real-time route data is also accessible via the Radar API, and users can follow this guide to get started.

The following example shows an HTTP GET request to query all the current routes for a prefix of interest:

With the help of JSON data processing tools like jq, users can further filter data results by routes containing a certain ASN. In the following example, we make a request to ask for all current routes toward the prefix 1.1.1.0/24 and filter all routes with AS paths containing AS174:

The command output is a JSON array of route objects. Each object details a route that includes AS174 in its AS path. Additional information provided for each route includes the BGP route collector, BGP community values, and the timestamp of the last update.

The API also offers supplementary metadata alongside BGP route information, including insights into BGP route collector status and aggregated prefix-to-origin data. Recalling the earlier example of an AS path prepending misconfiguration, the RPKI invalid AS3 origin is now directly visible to users and API clients in the JSON response, showing that just 9% of all collectors observed its announcements.

From archives to real-time

Architecture overview

Cloudflare Radar uses RIPE RIS and the University of Oregon’s RouteViews as our primary BGP data sources for services like the routing statistics widget, anomaly detection, and announced address space graphs. We have previously discussed in detail on how we use the data archives from these two providers to build Cloudflare Radar’s routing pages, and our route leak and hijack detection systems.

In brief, RIPE RIS and RouteViews maintain several BGP route collectors, each connected to BGP routers across a diverse set of networks. These routers forward BGP messages to the collectors, which generate periodic data dumps for public access. These data dumps include both collections of BGP message updates and full routing table snapshots (RIB dump files).

For services monitoring stable routing information, like global routing statistics, we process RIB dump files from the archives as they become available. Conversely, for detecting dynamic events such as route leaks and hijacks, we process periodic BGP update files in batches. Services depending on this historical BGP data may experience processing delays of 10 to 30 minutes at the route collectors.

For the new real-time BGP routes feature, we aim to reduce the data delay from minutes or tens of minutes down to seconds. With the real-time stream capability provided by the BGP archiver services — RIS Live WebSocket from RIPE RIS and OpenBMP Kafka stream from RouteViews — we designed an additional real-time data stream component that enhances the routes snapshots built with MRT archive files by constantly updating the snapshots.

System design

At its core, the system enables a user to look up a prefix's routes stored in BGP routes snapshots. The BGP routes snapshots serve as a queryable data repository, organized hierarchically. The snapshots use a trie structure to allow for the retrieval of route information (such as AS paths and community values) associated with specific address prefixes. Each node in the hierarchy stores routing information from different peering routers, providing a consolidated global view. To handle the large data volumes from multiple BGP route collectors, the system partitions routing data into separate BGP routes snapshots, where each snapshot receives data streamed from its corresponding collector. This architecture enables horizontal scalability, allowing for dynamic adjustment of data sources by selecting which independent collectors' data to include or exclude.

Because the collectors’ BGP route information is maintained independently, to query for a global status, we apply the actor model software architecture for the implementation. Each collector is considered an actor that runs completely independently and communicates with a central controller via a dedicated communication channel. The central controller starts all actors by sending a signal to each of them, triggering actors to start collecting archival and real-time BGP data, on their separate threads.

Upon queries from users, the central controller will relay the query to all running actors via a query message. The actors will retrieve the corresponding route information on its prefix-trie and return the results to the controller with another message. The controller aggregates all messages from the actors and compiles them into a reply response to the user. During the whole process, the real-time BGP streaming and snapshots’ updating processes continue to run in the background without interruptions.

Our actor-model implementation enables a single node to efficiently store hundreds of full routing tables in its memory. Our current deployment uses eight route collectors, housing a total of 261 full routing tables. This in-memory system operating on a single node consumes approximately 45 GB of memory, which translates to about 170 MB per full routing table.

Summary

Cloudflare Radar now offers a real-time BGP route lookup service, providing near-instantaneous insights into global Internet routing. This feature leverages real-time data streams from RouteViews and RIPE RIS, moving beyond historical archives to deliver up-to-the-minute information. Users can now visualize routes in real time on Cloudflare Radar's prefix pages with intuitive Sankey diagrams that detail complete route information. Furthermore, the Cloudflare Radar API provides programmatic access to this data, allowing for seamless integration into custom tools and workflows.

Visit Cloudflare Radar for additional insights around Internet disruptions, routing issues, Internet traffic trends, attacks, and Internet quality. Follow us on social media at @CloudflareRadar (X), noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via email.

Cloudflare 1.1.1.1 incident on June 27, 2024

Bryton Herdes — Thu, 04 Jul 2024 13:00:50 GMT

Introduction

On June 27, 2024, a small number of users globally may have noticed that 1.1.1.1 was unreachable or degraded. The root cause was a mix of BGP (Border Gateway Protocol) hijacking and a route leak.

Cloudflare was an early adopter of Resource Public Key Infrastructure (RPKI) for route origin validation (ROV). With RPKI, IP prefix owners can store and share ownership information securely, and other operators can validate BGP announcements by comparing received BGP routes with what is stored in the form of Route Origin Authorizations (ROAs). When Route Origin Validation is enforced by networks properly and prefixes are signed via ROA, the impact of a BGP hijack is greatly limited. Despite increased adoption of RPKI over the past several years and 1.1.1.0/24 being a signed resource, during the incident 1.1.1.1/32 was originated by ELETRONET S.A. (AS267613) and accepted by multiple networks, including at least one Tier 1 provider who accepted 1.1.1.1/32 as a blackhole route.

This caused immediate unreachability for the DNS resolver address from over 300 networks in 70 countries, although the impact on the overall percentage of users was quite low (less than 1% of users in the UK and Germany, for example), and in some countries no users noticed an impact.

Route leaks are something Cloudflare has written and talked about before, and unfortunately there are only best-effort safeguards in wide deployment today, such as IRR (Internet Routing Registry) prefix-list filtering by providers. During the same period of time as the 1.1.1.1/32 hijack, 1.1.1.0/24 was erroneously leaked upstream by Nova Rede de Telecomunicações Ltda (AS262504). The leak was further and widely propagated by Peer-1 Global Internet Exchange (AS1031), which also contributed to the impact felt by customers during the incident.

We apologize for the impact felt by users of 1.1.1.1, and take the operation of the service very seriously. Although the root cause of the impact was external to Cloudflare, we will continue to improve the detection methods in place to yield quicker response times, and will use our stance within the Internet community to further encourage adoption of RPKI-based hijack and leak prevention mechanisms such as Route Origin Validation (ROV) and Autonomous Systems Provider Authorization (ASPA) objects for BGP.

Background

Cloudflare introduced the 1.1.1.1 public DNS resolver service in 2018. Since the announcement, 1.1.1.1 has become one of the most popular resolver IP addresses that is free-to-use by anyone. Along with the popularity and easily recognized IP address comes some operational difficulties. The difficulties stem from historical use of 1.1.1.1 by networks in labs or as a testing IP address, resulting in some residual unexpected traffic or blackholed routing behavior. Because of this, Cloudflare is no stranger to dealing with the effects of BGP misrouting traffic, two of which are covered below.

BGP hijacks

Some of the difficulty comes from potential routing hijacks of 1.1.1.1. For example, if some fictitious FooBar Networks assigns 1.1.1.1/32 to one of their routers and shares this prefix within their internal network, their customers will have difficulty routing to the 1.1.1.1 DNS service. If they advertise the 1.1.1.1/32 prefix outside their immediate network, the impact can be even greater. The reason 1.1.1.1/32 would be selected instead of the 1.1.1.0/24 BGP-announced by Cloudflare is due to Longest Prefix Matching (LPM). While many prefixes in a route table could match the 1.1.1.1 address, such as 1.1.1.0/24, 1.1.1.0/29, and 1.1.1.1/32, 1.1.1.1/32 is considered the “longest match” by the LPM algorithm because it has the highest number of identical bits and longest subnet mask while matching the 1.1.1.1 address. In simple terms, we would call 1.1.1.1/32 the “most specific” route available to 1.1.1.1.

Instead of traffic toward 1.1.1.1 routing to Cloudflare via anycast and landing on one of our servers globally, it will instead land somewhere on a device within FooBar Networks where 1.1.1.1 is terminated, and a legitimate response will fail to be served back to clients. This would be considered a hijack of requests to 1.1.1.1, either done purposefully or accidentally by network operators within FooBar Networks.

BGP route leaks

Another source of impact we sometimes face for 1.1.1.1 is BGP route leaks. A route leak occurs when a network becomes an upstream, in terms of BGP announcement, for a network it shouldn’t be an upstream provider for.

Here is an example of a route leak where a customer forward routes learned from one provider to another, causing a type 1 leak (defined in RFC7908).

If enough networks within the Default-Free Zone (DFZ) accept a route leak, it may be used widely for forwarding traffic along the bad path. Often this will cause the network leaking the prefixes to overload, as they aren’t prepared for the amount of global traffic they are now attracting. We wrote about a wide-scale route leak that knocked off a large portion of the Internet, when a provider in Pennsylvania attracted traffic toward global destinations it would have typically never transited traffic for. Even though Cloudflare interconnects with over 13,000 networks globally, the BGP local-preference assigned to a leaked route could be higher than the route received by a network directly from Cloudflare. This sounds counterproductive, but unfortunately it can happen.

To explain why this happens, it helps to think of BGP as a business policy engine along with the routing protocol for the Internet. A transit provider has customers who pay them to transport their data, so logically they assign a higher BGP local-preference than connections with either private or Internet Exchange (IX) peers, so the connection being paid for is most utilized. Think of local-preference as a way of influencing priority of which outgoing connection to send traffic to. Different networks also may choose to prefer Private Network Interconnects (PNIs) over Internet Exchange (IX) received routes. Part of the reason for this is reliability, as a private connection can be viewed as a point-to-point connection between two networks with no third-party managed fabric in between to worry about. Another reason could be cost efficiency, as if you’ve gone to the trouble to allocate a router port and purchase a cross connect between yourself and another peer, you’d like to make use of it to get the best return on your investment.

It is worth noting that both BGP hijacks and route leaks can happen to any IP and prefix on the Internet, not just 1.1.1.1. But as mentioned earlier, 1.1.1.1 is such a recognizable and historically misappropriated address that it tends to be more prone to accidental hijacks or leaks than other IP resources.

During the Cloudflare 1.1.1.1 incident that happened on June 27, 2024, we ended up fighting the impact caused by a combination of both BGP hijacking and a route leak.

Incident timeline and impact

All timestamps are in UTC.

2024-06-27 18:51:00 AS267613 (Eletronet) begins announcing 1.1.1.1/32 to peers, providers, and customers. 1.1.1.1/32 is announced with the AS267613 origin AS

2024-06-27 18:52:00 AS262504 (Nova) leaks 1.1.1.0/24, also received from AS267613, upstream to AS1031 (PEER 1 Global Internet Exchange) with AS path “1031 262504 267613 13335”

2024-06-27 18:52:00 AS1031 (upstream of Nova) propagates 1.1.1.0/24 to various Internet Exchange peers and route-servers, widening impact of the leak

2024-06-27 18:52:00 One tier 1 provider receives the 1.1.1.1/32 announcement from AS267613 as a RTBH (Remote Triggered Blackhole) route, causing blackholed traffic for all the tier 1’s customers

2024-06-27 20:03:00 Cloudflare raises internal incident for 1.1.1.1 reachability issues from various countries

2024-06-27 20:08:00 Cloudflare disables a partner peering location with AS267613 that is receiving traffic toward 1.1.1.0/24

2024-06-27 20:08:00 Cloudflare team engages peering partner AS267613 about the incident

2024-06-27 20:10:00 AS262504 leaks 1.1.1.0/24 with a new AS path, “262504 53072 7738 13335” which is also redistributed by AS1031. Traffic is being delivered successfully to Cloudflare when along this path, but with high latency for affected clients

2024-06-27 20:17:00 Cloudflare engages AS262504 regarding the route leak of 1.1.1.0/24 to their upstream providers

2024-06-27 21:56:00 Cloudflare engineers disable a second peering point with AS267613 that is receiving traffic meant for 1.1.1.0/24 from multiple sources not in Brazil

2024-06-27 22:16:00 AS262504 leaks 1.1.1.0/24 again, attracting some traffic to a Cloudflare peering with AS267613 in São Paulo. Some 1.1.1.1 requests as a result are returned with higher latency, but the hijack of 1.1.1.1/32 and traffic blackholing appears resolved

2024-06-28 02:28:00 AS262504 fully resolves the route leak of 1.1.1.0/24

The impact to customers surfaced in one of two ways: unable to reach 1.1.1.1 at all; Able to reach 1.1.1.1, but with high latency per request.

Since AS267613 was hijacking the 1.1.1.1/32 address somewhere within their network, many requests failed at some device in their autonomous system. There were intermittent periods, or flaps, during the incident where they successfully routed requests toward 1.1.1.1 to Cloudflare data centers, albeit with high latency.

Looking at two source countries during the incident, Germany and the United States, impacted traffic to 1.1.1.1 looked like this:

Source Country of Users:

Keep in mind that overall this may represent a relatively small amount of total requests per source country, but normally no requests would route from the US or Germany to Brazil at all for 1.1.1.1.

Cloudflare Data Center city:

Looking at the graphs, requests to 1.1.1.1 were landing in Brazilian data centers. The gaps between the spikes are when 1.1.1.1 requests were blackholed prior to or within AS267613, and the spikes themselves are when traffic was delivered to Cloudflare with high latency invoked on the request and response. The brief spikes of traffic successfully carried to the Cloudflare peering location with AS267613 could be explained by the 1.1.1.1/32 route flapping within their network, occasionally letting traffic through to Cloudflare instead of it dropping somewhere in the intermediate path.

Technical description of the error and how it happened

Normally, a request to 1.1.1.1 from users routes to the nearest data center via BGP anycast. During the incident, AS267613 (Eletronet) advertised 1.1.1.1/32 to their peers and upstream providers, and AS262504 leaked 1.1.1.0/24 upstream, changing the normal path of BGP anycast for multiple eyeball networks drastically.

With public route collectors and the monocle tool, we can search for the rogue BGP updates.

We see above that AS398465 and AS13760 reported to the route-views collectors that they received 1.1.1.1/32 from AS267613 around the time impact begins. Normally, the longest IPv4 prefix accepted in the Default-Free-Zone (DFZ) is a /24, but in this case we observed multiple networks using the 1.1.1.1/32 route from AS267613 for forwarding, made apparent by the blackholing of traffic that never arrived at a Cloudflare POP (Point of Presence). The origination of 1.1.1.1/32 by AS267613 is a BGP route hijack. They were announcing the prefix with origin AS267613 even though the Route Origin Authorization (ROA) is only signed for origin AS13335 (Cloudflare) with a maximum prefix length of /24.

We even saw BGP updates for 1.1.1.1/32 when looking at our own BMP (BGP Monitoring Protocol) data at Cloudflare. From at least a couple different route servers, we received our own 1.1.1.1/32 announcement via BGP. Thankfully, Cloudflare rejects these routes on import as both RPKI Invalid and DFZ Invalid due to invalid AS origin and prefix length. The BMP data display is pre-policy, meaning even though we rejected the route we can see where we receive the BGP update over a peering session.

So not only are multiple networks accepting prefixes that should not exist in the global routing table, but they are also accepting an RPKI (Resource Public Key Infrastructure) Invalid route. To make matters worse, one Tier-1 transit provider accepted the 1.1.1.1/32 announcement as a RTBH (Remote-Triggered Blackhole) route from AS267613, discarding all traffic at their edge that would normally route to Cloudflare. This alone caused wide impact, as any networks leveraging this particular Tier-1 provider in routing to 1.1.1.1 would have been unable to reach the IP address during the incident.

For those unfamiliar with Remote-Triggered Blackholing, it is a method of signaling to a provider a set of destinations you would like traffic to be dropped for within their network. It exists as a blunt method of fighting off DDoS attacks. When you are being attacked on a specific IP or prefix, you can tell your upstream provider to absorb all traffic toward that destination IP address or prefix by discarding it before it comes to your network port. The problem during this incident was AS267613 was unauthorized to blackhole 1.1.1.1/32. Cloudflare only should have the sole right to leverage RTBH for discarding of traffic destined for AS13335, which is something we would in reality never do.

Looking now at BGP updates for 1.1.1.0/24 multiple networks received the prefix from AS262504 and accepted it.

Here we pay attention to the AS path again. This time, AS13335 is the origin AS at the very end of the announcements. This BGP announcement is RPKI Valid, because the origin is correctly AS13335, but this is a route leak of 1.1.1.0/24 because the path itself is invalid.

How do we know it’s a route leak?

Looking at an example path, “199524 1031 262504 267613 13335”, AS267613 is functionally a peer of AS13335 and should not share the 1.1.1.0/24 announcement with their peers or upstreams, only their customers (AS Cone). AS262504 is a customer of AS267613 and the next adjacent ASN in the path, so that particular announcement is fine up until this point. Where the 1.1.1.0/24 goes wrong is AS262504, when they announce the prefix to their upstream AS1031. Furthermore, AS1031 redistributed the advertisement to their peers.

This means AS262504 is the leaking network. AS1031 accepted the leak from their customer, AS262504, and caused wide impact by distributing the route in multiple peering locations globally. AS1031 (Peer-1 Global Internet Exchange) advertises themselves as a global peering exchange. Cloudflare is not a customer of AS1031, so 1.1.1.0/24 should have never been redistributed to peers, route-servers, or upstreams of AS1031. It appears that AS1031 does not perform any extensive filtering for customer BGP sessions, and instead just matches on adjacency (in this case, AS262504) and redistributes everything that meets this criteria. Unfortunately, this is irresponsible of AS1031 and causes direct impact to 1.1.1.1 and potentially other services that fall victim to the unguarded route propagation. While the original leaking network was AS262504, impact was greatly amplified by AS1031 and others when they accepted the hijack or leak and further distributed the announcements.

During the majority of the incident, the leak by AS262504 eventually landed requests within AS267613, which was discarding 1.1.1.1/32 traffic somewhere in their network. To that end, AS262504 really just amplified the impact in terms of 1.1.1.1 unreachability by leaking routes upstream.

To limit impact of the route leak, Cloudflare disabled peering in multiple locations with AS267613. The problem did not completely go away, as AS262504 was still leaking a stale path pointing to São Paulo. Requests landing in São Paulo were able to be served, albeit with a high round-trip time back to users. Cloudflare has been engaging with all networks mentioned throughout this post in regard to the leak and future prevention mechanisms, as well as at least one Tier 1 transit provider who accepted 1.1.1.1/32 from AS267613 as a blackhole route that was unauthorized by Cloudflare and caused widespread impact.

Remediation and follow-up steps

BGP hijacks

RPKI origin validationRPKI has recently reached a major milestone at 50% deployment in terms of prefixes signed by Route Origin Authorization (ROA). While RPKI certainly helps limit the spread of a hijacked BGP prefix throughout the Internet, we need all networks to do their part, especially major networks with a large sum of downstream Autonomous Systems (AS’s). During the hijack of 1.1.1.1/32, multiple networks accepted and used the route announced by AS267613 for traffic forwarding.

RPKI and Remote-Triggered Blackholing (RTBH)A significant amount of the impact caused during this incident was due to a Tier 1 provider accepting 1.1.1.1/32 as a blackhole route from a third party that is not Cloudflare. This in itself is a hijack of 1.1.1.1, and a very dangerous one. RTBH is a useful tool used by many networks when desperate for a mitigation against large DDoS attacks. The problem is the BGP filtering used for RTBH is loose in nature, relying often only on AS-SET objects found in Internet Routing Registries. Relying on Route Origin Authorization (ROA) would be infeasible for RTBH filtering, as that would require thousands of potential ROAs be created for the network the size of Cloudflare. Not only this, but creating specific /32 entries opens up the potential for an individual address such as 1.1.1.1/32 being announced by someone pretending to be AS13335, becoming the best route to 1.1.1.1 on the Internet and causing severe impact.

AS-SET filtering is not representative of authority to blackhole a route, such as 1.1.1.1/32. Only Cloudflare should be able to blackhole a destination it has the rights to operate. A potential way to fix the lenient filtering of providers on RTBH sessions would again be leveraging an RPKI. Using an example from the IETF, the expired draft-spaghetti-sidrops-rpki-doa-00 proposal specified a Discard Origin Authorization (DOA) object that would be used to authorize only specific origins to authorize a blackhole action for a prefix. If such an object was signed, and RTBH requests validated against the object, the unauthorized blackhole attempt of 1.1.1.1/32 by AS267613 would have been invalid instead of accepted by the Tier 1 provider.

BGP best practicesSimply following BGP best practices laid out by MANRS, and rejecting IPv4 prefixes that are longer than a /24 in the Default-Free Zone (DFZ) would have reduced impact to 1.1.1.1. Rejecting invalid prefix lengths within the wider Internet should be part of a standard BGP policy for all networks.

BGP route leaks

Route leak detection

While route leaks are not unavoidable for Cloudflare today, because the Internet inherently relies on trust for interconnection, there are some steps we will take to limit impact.

We have expanded data sources to use for our route leak detection system to cover more networks and are in the process of incorporating real-time data into the detection system to allow more timely response toward similar events in the future.

ASPA for BGP

We will continue advocating for the adoption of RPKI into AS Path based route leak prevention. Autonomous System Provider Authorization (ASPA) objects are similar to ROAs, except instead of signing prefixes with an authorized origin AS, the AS itself is signed with a list of provider networks that are allowed to propagate their routes. So, in the case of Cloudflare, only valid upstream transit providers would be signed as authorized to advertise AS13335 prefixes such as 1.1.1.0/24 upstream.

In the route leak example where AS262504 (customer of AS267613) shared 1.1.1.0/24 upstream, BGP ASPA would see this leak if AS267613 had signed their authorized providers and AS1031 had validated paths against that list. Similar to RPKI origin validation, however, this will be a long-term effort and dependent on networks, especially large providers, rejecting invalid AS paths as based on ASPA objects.

Other potential approaches

There are alternative approaches to ASPA that do exist, in various stages of adoption that may be worth noting. There is no guarantee that the following make it to a stage of wide Internet deployment, however.

RFC9234, for example, uses a concept of peer roles within BGP capabilities and attributes, and depending on the configuration of routers along a path for updates, an “Only-To-Customer” (OTC) attribute can be added to prefixes that will prevent the upstream spread of a prefix such as 1.1.1.0/24 along a leaked path. The downside is BGP configuration needs to be completed to assign the various roles to each peering session, and vendor adoption still has to be fully ironed out to make this feasible for actual use in production with positive results.

Like all approaches to solving route leaks, cooperation amongst network operators on the Internet is required for success.

Conclusion

Cloudflare’s 1.1.1.1 DNS resolver service fell victim to a simultaneous BGP hijack and BGP route leak event. While the actions of external networks are outside of Cloudflare’s direct control, we intend to take every step within both the Internet community and internally at Cloudflare to detect impact more quickly and lessen impact to our users.

Long term, Cloudflare continues to support adoption of RPKI-based origin validation, as well as AS path validation. The former exists with deployment across a wide array of the world’s largest networks, and the latter is still in draft phase at the IETF (Internet Engineering Task Force). In the meantime, to check if your ISP is enforcing RPKI origin validation, you can always visit isbgpsafeyet.com and Test Your ISP.

Cloudflare Radar's new BGP origin hijack detection system

Mingwei Zhang — Fri, 28 Jul 2023 13:00:26 GMT

Border Gateway Protocol (BGP) is the de facto inter-domain routing protocol used on the Internet. It enables networks and organizations to exchange reachability information for blocks of IP addresses (IP prefixes) among each other, thus allowing routers across the Internet to forward traffic to its destination. BGP was designed with the assumption that networks do not intentionally propagate falsified information, but unfortunately that’s not a valid assumption on today’s Internet.

Malicious actors on the Internet who control BGP routers can perform BGP hijacks by falsely announcing ownership of groups of IP addresses that they do not own, control, or route to. By doing so, an attacker is able to redirect traffic destined for the victim network to itself, and monitor and intercept its traffic. A BGP hijack is much like if someone were to change out all the signs on a stretch of freeway and reroute automobile traffic onto incorrect exits.

You can learn more about BGP and BGP hijacking and its consequences in our learning center.

At Cloudflare, we have long been monitoring suspicious BGP anomalies internally. With our recent efforts, we are bringing BGP origin hijack detection to the Cloudflare Radar platform, sharing our detection results with the public. In this blog post, we will explain how we built our detection system and how people can use Radar and its APIs to integrate our data into their own workflows**.**

What is BGP origin hijacking?

Services and devices on the Internet locate each other using IP addresses. Blocks of IP addresses are called an IP prefix (or just prefix for short), and multiple prefixes from the same organization are aggregated into an autonomous system (AS).

Using the BGP protocol, ASes announce which routes can be imported or exported to other ASes and routers from their routing tables. This is called the AS routing policy. Without this routing information, operating the Internet on a large scale would quickly become impractical: data packets would get lost or take too long to reach their destinations.

During a BGP origin hijack, an attacker creates fake announcements for a targeted prefix, falsely identifying an autonomous systems (AS) under their control as the origin of the prefix.

In the following graph, we show an example where AS 4 announces the prefix P that was previously originated by AS 1. The receiving parties, i.e. AS 2 and AS 3, accept the hijacked routes and forward traffic toward prefix P to AS 4 instead.

As you can see, the normal and hijacked traffic flows back in the opposite direction of the BGP announcements we receive.

If successful, this type of attack will result in the dissemination of the falsified prefix origin announcement throughout the Internet, causing network traffic previously intended for the victim network to be redirected to the AS controlled by the attacker. As an example of a famous BGP hijack attack, in 2018 someone was able to convince parts of the Internet to reroute traffic for AWS to malicious servers where they used DNS to redirect MyEtherWallet.com, a popular crypto wallet, to a hacked page.

Prevention mechanisms and why they’re not perfect (yet)

The key difficulty in preventing BGP origin hijacks is that the BGP protocol itself does not provide a mechanism to validate the announcement content. In other words, the original BGP protocol does not provide any authentication or ownership safeguards; any route can be originated and announced by any random network, independent of its rights to announce that route.

To address this problem, operators and researchers have proposed the Resource Public Key Infrastructure (RPKI) to store and validate prefix-to-origin mapping information. With RPKI, operators can prove the ownership of their network resources and create ROAs, short for Route Origin Authorisations, cryptographically signed objects that define which Autonomous System (AS) is authorized to originate a specific prefix.

Cloudflare committed to support RPKI since the early days of the RFC. With RPKI, IP prefix owners can store and share the ownership information securely, and other operators can validate BGP announcements by checking the prefix origin to the information stored on RPKI. Any hijacking attempt to announce an IP prefix with an incorrect origin AS will result in invalid validation results, and such invalid BGP messages will be discarded. This validation process is referred to as route origin validation (ROV).

In order to further advocate for RPKI deployment and filtering of RPKI invalid announcements, Cloudflare has been providing a RPKI test service, Is BGP Safe Yet?, allowing users to test whether their ISP filters RPKI invalid announcements. We also provide rich information with regard to the RPKI status of individual prefixes and ASes at https://rpki.cloudflare.com/.

However, the effectiveness of RPKI on preventing BGP origin hijacks depends on two factors:

The ratio of prefix owners register their prefixes on RPKI;
The ratio of networks performing route origin validation.

Unfortunately, neither ratio is at a satisfactory level yet. As of today, July 27, 2023, only about 45% of the IP prefixes routable on the Internet are covered by some ROA on RPKI. The remaining prefixes are highly vulnerable to BGP origin hijacks. Even for the 45% prefix that are covered by some ROA, origin hijack attempts can still affect them due to the low ratio of networks that perform route origin validation (ROV). Based on our recent study, only 6.5% of the Internet users are protected by ROV from BGP origin hijacks.

Despite the benefits of RPKI and RPKI ROAs, their effectiveness in preventing BGP origin hijacks is limited by the slow adoption and deployment of these technologies. Until we achieve a high rate of RPKI ROA registration and RPKI invalid filtering, BGP origin hijacks will continue to pose a significant threat to the daily operations of the Internet and the security of everyone connected to it. Therefore, it’s also essential to prioritize developing and deploying BGP monitoring and detection tools to enhance the security and stability of the Internet's routing infrastructure.

Design of Cloudflare’s BGP hijack detection system

Our system comprises multiple data sources and three distinct modules that work together to detect and analyze potential BGP hijack events: prefix origin change detection, hijack detection and the storage and notification module.

The Prefix Origin Change Detection module provides the data, the Hijack Detection module analyzes the data, and the Alerts Storage and Delivery module stores and provides access to the results. Together, these modules work in tandem to provide a comprehensive system for detecting and analyzing potential BGP hijack events.

Prefix origin change detection module

At its core, the BGP protocol involves:

Exchanging prefix reachability (routing) information;
Deciding where to forward traffic based on the reachability information received.

The reachability change information is encoded in BGP update messages while the routing decision results are encoded as a route information base (RIB) on the routers, also known as the routing table.

In our origin hijack detection system, we focus on investigating BGP update messages that contain changes to the origin ASes of any IP prefixes. There are two types of BGP update messages that could indicate prefix origin changes: announcements and withdrawals.

Announcements include an AS-level path toward one or more prefixes. The path tells the receiving parties through which sequence of networks (ASes) one can reach the corresponding prefixes. The last hop of an AS path is the origin AS. In the following diagram, AS 1 is the origin AS of the announced path.

Withdrawals, on the other hand, simply inform the receiving parties that the prefixes are no longer reachable.

Both types of messages are stateless. They inform us of the current route changes, but provide no information about the previous states. As a result, detecting origin changes is not as straightforward as one may think. Our system needs to keep track of historical BGP updates and build some sort of state over time so that we can verify if a BGP update contains origin changes.

We didn't want to deal with a complex system like a database to manage the state of all the prefixes we see resulting from all the BGP updates we get from them. Fortunately, there's this thing called prefix trie in computer science that you can use to store and look up string-indexed data structures, which is ideal for our use case. We ended up developing a fast Rust-based custom IP prefix trie that we use to hold the relevant information such as the origin ASN and the AS path for each IP prefix and allows information to be updated based on BGP announcements and withdrawals.

The example figure below shows an example of the AS path information for prefix 192.0.2.0/24 stored on a prefix trie. When updating the information on the prefix trie, if we see a change of origin ASN for any given prefix, we record the BGP message as well as the change and create an Origin Change Signal.

The prefix origin changes detection module collects and processes live-stream and historical BGP data from various sources. For live streams, our system applies a thin layer of data processing to translate BGP messages into our internal data structure. At the same time, for historical archives, we use a dedicated deployment of the BGPKIT broker and parser to convert MRT files from RouteViews and RIPE RIS into BGP message streams as they become available.

After the data is collected, consolidated and normalized it then creates, maintains and destroys the prefix tries so that we can know what changed from previous BGP announcements from the same peers. Based on these calculations we then send enriched messages downstream to be analyzed.

Hijack detection module

Determining whether BGP messages suggest a hijack is a complex task, and no common scoring mechanism can be used to provide a definitive answer. Fortunately, there are several types of data sources that can collectively provide a relatively good idea of whether a BGP announcement is legitimate or not. These data sources can be categorized into two types: inter-AS relationships and prefix-origin binding.

The inter-AS relationship datasets include AS2org and AS2rel datasets from CAIDA/UCSD, AS2rel datasets from BGPKIT, AS organization datasets from PeeringDB, and per-prefix AS relationship data built at Cloudflare. These datasets provide information about the relationship between autonomous systems, such as whether they are upstream or downstream from one another, or if the origins of any change signal belong to the same organization.

Prefix-to-origin binding datasets include live RPKI validated ROA payload (VRP) from the Cloudflare RPKI portal, daily Internet Routing Registry (IRR) dumps curated and cleaned up by MANRS, and prefix and AS bogon lists (private and reserved addresses defined by RFC 1918, RFC 5735, and RFC 6598). These datasets provide information about the ownership of prefixes and the ASes that are authorized to originate them.

By combining all these data sources, it is possible to collect information about each BGP announcement and answer questions programmatically. For this, we have a scoring function that takes all the evidence gathered for a specific BGP event as the input and runs that data through a sequence of checks. Each condition returns a neutral, positive, or negative weight that keeps adding to the final score. The higher the score, the more likely it is that the event is a hijack attempt.

The following diagram illustrates this sequence of checks:

As you can see, for each event, several checks are involved that help calculate the final score: RPKI, Internet Routing Registry (IRR), bogon prefixes and ASNs lists, AS relationships, and AS path.

Our guiding principles are: if the newly announced origins are RPKI or IRR invalid, it’s more likely that it’s a hijack, but if the old origins are also invalid, then it’s less likely. We discard events about private and reserved ASes and prefixes. If the new and old origins have a direct business relationship, then it’s less likely that it’s a hijack. If the new AS path indicates that the traffic still goes through the old origin, then it’s probably not a hijack.

Signals that are deemed legitimate are discarded, while signals with a high enough confidence score are flagged as potential hijacks and sent downstream for further analysis.

It's important to reiterate that the decision is not binary but a score. There will be situations where we find false negatives or false positives. The advantage of this framework is that we can easily monitor the results, learn from additional datasets and conduct the occasional manual inspection, which allows us to adjust the weights, add new conditions and continue improving the score precision over time.

Aggregating BGP hijack events

Our BGP hijack detection system provides fast response time and requires minimal resources by operating on a per-message basis.

However, when a hijack is happening, the number of hijack signals can be overwhelming for operators to manage. To address this issue, we designed a method to aggregate individual hijack messages into BGP hijack events, thereby reducing the number of alerts triggered.

An event aggregates BGP messages that are coming from the same hijacker related to prefixes from the same victim. The start date is the same as the date of the first suspicious signal. To calculate the end of an event we look for one of the following conditions:

A BGP withdrawn message for the hijacked prefix: regardless of who sends the withdrawal, the route towards the prefix is no longer via the hijacker, and thus this hijack message is considered finished.
A new BGP announcement message with the previous (legitimate) network as the origin: this indicates that the route towards the prefix is reverted to the state before the hijack, and the hijack is therefore considered finished.

If all BGP messages for an event have been withdrawn or reverted, and there are no more new suspicious origin changes from the hijacker ASN for six hours, we mark the event as finished and set the end date.

Hijack events can capture both small-scale and large-scale attacks. Alerts are then based on these aggregated events, not individual messages, making it easier for operators to manage and respond appropriately.

Alerts, Storage and Notifications module

This module provides access to detected BGP hijack events and sends out notifications to relevant parties. It handles storage of all detected events and provides a user interface for easy access and search of historical events. It also generates notifications and delivers them to the relevant parties, such as network administrators or security analysts, when a potential BGP hijack event is detected. Additionally, this module can build dashboards to display high-level information and visualizations of detected events to facilitate further analysis.

Lightweight and portable implementation

Our BGP hijack detection system is implemented as a Rust-based command line application that is lightweight and portable. The whole detection pipeline runs off a single binary application that connects to a PostgreSQL database and essentially runs a complete self-contained BGP data pipeline. And if you are wondering, yes, the full system, including the database, can run well on a laptop.

The runtime cost mainly comes from maintaining the in-memory prefix tries for each full-feed router, each costing roughly 200 MB RAM. For the beta deployment, we use about 170 full-feed peers and the whole system runs well on a single 32 GB node with 12 threads.

Using the BGP Hijack Detection

The BGP Hijack Detection results are now available on both the Cloudflare Radar website and the Cloudflare Radar API.

Cloudflare Radar

Under the “Security & Attacks” section of the Cloudflare Radar for both global and ASN view, we now display the BGP origin hijacks table. In this table, we show a list of detected potential BGP hijack events with the following information:

The detected and expected origin ASes;
The start time and event duration;
The number of BGP messages and route collectors peers that saw the event;
The announced prefixes;
Evidence tags and confidence level (on the likelihood of the event being a hijack).

For each BGP event, our system generates relevant evidence tags to indicate why the event is considered suspicious or not. These tags are used to inform the confidence score assigned to each event. Red tags indicate evidence that increases the likelihood of a hijack event, while green tags indicate the opposite.

For example, the red tag "RPKI INVALID" indicates an event is likely a hijack, as it suggests that the RPKI validation failed for the announcement. Conversely, the tag "SIBLING ORIGINS" is a green tag that indicates the detected and expected origins belong to the same organization, making it less likely for the event to be a hijack.

Users can now access the BGP hijacks table in the following ways:

Global view under Security & Attacks page without location filters. This view lists the most recent 150 detected BGP hijack events globally.
When filtered by a specific ASN, the table will appear on Overview, Traffic, and Traffic & Attacks tabs.

Cloudflare Radar API

We also provide programmable access to the BGP hijack detection results via the Cloudflare Radar API, which is freely available under CC BY-NC 4.0 license. The API documentation is available at the Cloudflare API portal.

The following curl command fetches the most recent 10 BGP hijack events relevant to AS64512.

Users can further filter events with high confidence by specifying the minConfidence parameter with a 0-10 value, where a higher value indicates higher confidence of the events being a hijack. The following example expands on the previous example by adding the minimum confidence score of 8 to the query:

Additionally, users can also quickly build custom hijack alerters using a Cloudflare Workers + KV combination. We have a full tutorial on building alerters that send out webhook-based messages or emails (with Email Routing) available on the Cloudflare Radar documentation site.

More routing security on Cloudflare Radar

As we continue improving Cloudflare Radar, we are planning to introduce additional Internet routing and security data. For example, Radar will soon get a dedicated routing section to provide digestible BGP information for given networks or regions, such as distinct routable prefixes, RPKI valid/invalid/unknown routes, distribution of IPv4/IPv6 prefixes, etc. Our goal is to provide the best data and tools for routing security to the community, so that we can build a better and more secure Internet together.

Visit Cloudflare Radar for additional insights around (Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, etc.). Follow us on social media at @CloudflareRadar (Twitter), https://noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via e-mail.

Routing information now on Cloudflare Radar

Mingwei Zhang — Thu, 27 Jul 2023 13:00:30 GMT

Routing is one of the most critical operations of the Internet. Routing decides how and where the Internet traffic should flow from the source to the destination, and can be categorized into two major types: intra-domain routing and inter-domain routing. Intra-domain routing handles making decisions on how individual packets should be routed among the servers and routers within an organization/network. When traffic reaches the edge of a network, the inter-domain routing kicks in to decide what the next hop is and forward the traffic along to the corresponding networks. Border Gateway Protocol (BGP) is the de facto inter-domain routing protocol used on the Internet.

Today, we are introducing another section on Cloudflare Radar: the Routing page, which focuses on monitoring the BGP messages exchanged to extract and present insights on the IP prefixes, individual networks, countries, and the Internet overall. The new routing data allows users to quickly examine routing status of the Internet, examine secure routing protocol deployment for a country, identify routing anomalies, validate IP block reachability and much more from globally distributed vantage points.

It’s a detailed view of how the Internet itself holds together.

Collecting routing statistics

The Internet consists of tens of thousands of interconnected organizations. Each organization manages its own internal networking infrastructure autonomously, and is referred to as an autonomous system (AS). ASes establish connectivity among each other and exchange routing information via BGP messages to form the current Internet.

When we open the Radar Routing page the “Routing Statistics” block provides a quick glance on the sizes and status of an autonomous system (AS), a country, or the Internet overall. The routing statistics component contains the following count information:

The number of ASes on the Internet or registered from a given country;
The number of distinct prefixes and the routes toward them observed on the global routing table, worldwide, by country, or by AS;
The number of routes categorized by Resource Public Key Infrastructure (RPKI) validation results (valid, invalid, or unknown).

We also show the breakdown of these numbers for IPv4 and IPv6 separately, so users may have a better understanding of such information with respect to different IP protocols.

For a given network, we also show the BGP announcements volume chart for the past week as well as other basic information like network name, registration country, estimated user count, and sibling networks.

Identifying routing anomalies

BGP as a routing protocol suffers from a number of security weaknesses. In the new Routing page we consolidate the BGP route leaks and BGP hijacks detection results in one single place, showing the relevant detected events for any given network or globally.

The BGP Route Leaks table shows the detected BGP route leak events. Each entry in the table contains the information about the related ASes of the leak event, start and end time, as well as other numeric statistics that reflect the scale and impact of the event. The BGP Origin Hijacks table shows the detected potential BGP origin hijacks. Apart from the relevant ASes, time, and impact information, we also show the key evidence that we collected for each event to provide additional context on why and how likely one event being a BGP hijack.

With this release, we introduce another anomaly detection: RPKI Invalid Multiple Origin AS (MOAS) is one type of routing conflict where multiple networks (ASes) originate the same IP prefixes at the same time, which goes against the best practice recommendation. Our system examines the most recent global routing tables and identifies MOASes on the routing tables. With the help of Resource Public Key Infrastructure (RPKI), we can further identify MOAS events that have origins that were proven RPKI invalid, which are less likely to be legitimate cases. Users and operators can quickly identify such anomalies relevant to the networks of interest and take actions accordingly.

The Routing page will be the permanent home for all things BGP and routing data in the future; we will gradually introduce more anomaly detections and improve our pipeline to provide more security insights.

Examining routing assets and connectivity

Apart from examining the overall routing statistics and anomalies, we also gather information on the routing assets (IP prefixes for a network and networks for a country) and networks’ connectivity.

Tens of thousands autonomous systems (ASes) connect to each other to form the current Internet. The ASes differ in size and operate in different geolocations. Generally, larger networks are more well-connected and considered “upstream” and smaller networks are less connected and considered “downstream” on the Internet. Below is an example connectivity diagram showing how two smaller networks may connect to each other. AS1 announces its IP prefixes to its upstream providers and propagates upwards until it reaches the large networks AS3 and AS4, and then the route propagates downstream to smaller networks until it reaches AS6.

In the routing page, we examine what IP prefixes any given AS originates, as well as the interconnections among ASes. We show the full list of IP prefixes originated for any given AS, including the breakdown lists by RPKI validation status. We also show the detected connectivity among other ASes categorized into upstream, downstream and peering connections. Users can easily search for any ASes upstream, downstream, or peers.

For a given country, we show the full list of networks registered in the country, sorted by the number of IP prefixes originated from the corresponding networks. This allows users to quickly glance and find networks from any given country. The table is also searchable by network name or AS number.

Routing data API access

Like all the other data, the Cloudflare Radar Routing data is powered by our developer API. The data API is freely available under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license. In the following table, we list all of our data APIs available at launch. As we improve the routing section, we will introduce more APIs in the future.

Example 1: lookup origin AS for a given prefix with cURL

The Cloudflare Radar prefix-to-origin mapping API returns the matching prefix-origin pairs observed on the global routing tables, allowing users to quickly examine the networks that originate a given prefix or listing all the prefixes a network originates.

In this example, we ask of “which network(s) originated the prefix 1.1.1.0/24?” using the following cURL command:

The returned JSON result shows that Cloudflare (AS13335) originates the prefix 1.1.1.0/24 and it is a RPKI valid origin. It also returns the meta information such as the UTC timestamp of the query as well as when the dataset is last updated (data_time field).

Example 2: integrate Radar API into command-line tool

BGPKIT monocle is an open-source command-line application that provides multiple utility functions like searching BGP messages on public archives, network lookup by name, RPKI validation status for a given IP prefix, etc.

By integrating Cloudflare Radar APIs into monocle, users can now quickly lookup routing statistics or prefix-to-origin mapping by running monocle radar stats [QUERY] and monocle radar pfx2as commands.

Visit Cloudflare Radar for additional insights around (Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, etc.). Follow us on social media at @CloudflareRadar (Twitter), https://noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via e-mail.

How we detect route leaks and our new Cloudflare Radar route leak service

Mingwei Zhang — Wed, 23 Nov 2022 16:00:00 GMT

Today we’re introducing Cloudflare Radar’s route leak data and API so that anyone can get information about route leaks across the Internet. We’ve built a comprehensive system that takes in data from public sources and Cloudflare’s view of the Internet drawn from our massive global network. The system is now feeding route leak data on Cloudflare Radar’s ASN pages and via the API.

This blog post is in two parts. There’s a discussion of BGP and route leaks followed by details of our route leak detection system and how it feeds Cloudflare Radar.

About BGP and route leaks

Inter-domain routing, i.e., exchanging reachability information among networks, is critical to the wellness and performance of the Internet. The Border Gateway Protocol (BGP) is the de facto routing protocol that exchanges routing information among organizations and networks. At its core, BGP assumes the information being exchanged is genuine and trust-worthy, which unfortunately is no longer a valid assumption on the current Internet. In many cases, networks can make mistakes or intentionally lie about the reachability information and propagate that to the rest of the Internet. Such incidents can cause significant disruptions of the normal operations of the Internet. One type of such disruptive incident is route leaks.

We consider route leaks as the propagation of routing announcements beyond their intended scope (RFC7908). Route leaks can cause significant disruption affecting millions of Internet users, as we have seen in many past notable incidents. For example, in June 2019 a misconfiguration in a small network in Pennsylvania, US (AS396531 - Allegheny Technologies Inc) accidentally leaked a Cloudflare prefix to Verizon, which proceeded to propagate the misconfigured route to the rest of its peers and customers. As a result, the traffic of a large portion of the Internet was squeezed through the limited-capacity links of a small network. The resulting congestion caused most of Cloudflare traffic to and from the affected IP range to be dropped.

A similar incident in November 2018 caused widespread unavailability of Google services when a Nigerian ISP (AS37282 - Mainone) accidentally leaked a large number of Google IP prefixes to its peers and providers violating the valley-free principle.

These incidents illustrate not only that route leaks can be very impactful, but also the snowball effects that misconfigurations in small regional networks can have on the global Internet.

Despite the criticality of detecting and rectifying route leaks promptly, they are often detected only when users start reporting the noticeable effects of the leaks. The challenge with detecting and preventing route leaks stems from the fact that AS business relationships and BGP routing policies are generally undisclosed, and the affected network is often remote to the root of the route leak.

In the past few years, solutions have been proposed to prevent the propagation of leaked routes. Such proposals include RFC9234 and ASPA, which extends the BGP to annotate sessions with the relationship type between the two connected AS networks to enable the detention and prevention of route leaks.

An alternative proposal to implement similar signaling of BGP roles is through the use of BGP Communities; a transitive attribute used to encode metadata in BGP announcements. While these directions are promising in the long term, they are still in very preliminary stages and are not expected to be adopted at scale soon.

At Cloudflare, we have developed a system to detect route leak events automatically and send notifications to multiple channels for visibility. As we continue our efforts to bring more relevant data to the public, we are happy to announce that we are starting an open data API for our route leak detection results today and integrate results to Cloudflare Radar pages.

Route leak definition and types

Before we jump into how we design our systems, we will first do a quick primer on what a route leak is, and why it is important to detect it.

We refer to the published IETF RFC7908 document "Problem Definition and Classification of BGP Route Leaks" to define route leaks.

> A route leak is the propagation of routing announcement(s) beyond their intended scope.

The intended scope is often concretely defined as inter-domain routing policies based on business relationships between Autonomous Systems (ASes). These business relationships are broadly classified into four categories: customers, transit providers, peers and siblings, although more complex arrangements are possible.

In a customer-provider relationship the customer AS has an agreement with another network to transit its traffic to the global routing table. In a peer-to-peer relationship two ASes agree to free bilateral traffic exchange, but only between their own IPs and the IPs of their customers. Finally, ASes that belong under the same administrative entity are considered siblings, and their traffic exchange is often unrestricted. The image below illustrates how the three main relationship types translate to export policies.

By categorizing the types of AS-level relationships and their implications on the propagation of BGP routes, we can define multiple phases of a prefix origination announcements during propagation:

upward: all path segments during this phase are customer to provider
peering: one peer-peer path segment
downward: all path segments during this phase are provider to customer

An AS path that follows valley-free routing principle will have upward, peering, downward phases, all optional but have to be in that order. Here is an example of an AS path that conforms with valley-free routing.

In RFC7908, "Problem Definition and Classification of BGP Route Leaks", the authors define six types of route leaks, and we refer to these definitions in our system design. Here are illustrations of each of the route leak types.

Type 1: Hairpin Turn with Full Prefix

> A multihomed AS learns a route from one upstream ISP and simply propagates it to another upstream ISP (the turn essentially resembling a hairpin). Neither the prefix nor the AS path in the update is altered.

An AS path that contains a provider-customer and customer-provider segment is considered a type 1 leak. The following example: AS4 → AS5 → AS6 forms a type 1 leak.

Type 1 is the most recognized type of route leaks and is very impactful. In many cases, a customer route is preferable to a peer or a provider route. In this example, AS6 will likely prefer sending traffic via AS5 instead of its other peer or provider routes, causing AS5 to unintentionally become a transit provider. This can significantly affect the performance of the traffic related to the leaked prefix or cause outages if the leaking AS is not provisioned to handle a large influx of traffic.

In June 2015, Telekom Malaysia (AS4788), a regional ISP, leaked over 170,000 routes learned from its providers and peers to its other provider Level3 (AS3549, now Lumen). Level3 accepted the routes and further propagated them to its downstream networks, which in turn caused significant network issues globally.

Type 2: Lateral ISP-ISP-ISP Leak

Type 2 leak is defined as propagating routes obtained from one peer to another peer, creating two or more consecutive peer-to-peer path segments.

Here is an example: AS3 → AS4 → AS5 forms a type 2 leak.

One example of such leaks is more than three very large networks appearing in sequence. Very large networks (such as Verizon and Lumen) do not purchase transit from each other, and having more than three such networks on the path in sequence is often an indication of a route leak.

However, in the real world, it is not unusual to see multiple small peering networks exchanging routes and passing on to each other. Legit business reasons exist for having this type of network path. We are less concerned about this type of route leak as compared to type 1.

Type 3 and 4: Provider routes to peer; peer routes to provider

These two types involve propagating routes from a provider or a peer not to a customer, but to another peer or provider. Here are the illustrations of the two types of leaks:

As in the previously mentioned example, a Nigerian ISP who peers with Google accidentally leaked its route to its provider AS4809, and thus generated a type 4 route leak. Because routes via customers are usually preferred to others, the large provider (AS4809) rerouted its traffic to Google via its customer, i.e. the leaking ASN, overwhelmed the small ISP and took down Google for over one hour.

Route leak summary

So far, we have looked at the four types of route leaks defined in RFC7908. The common thread of the four types of route leaks is that they're all defined using AS-relationships, i.e., peers, customers, and providers. We summarize the types of leaks by categorizing the AS path propagation based on where the routes are learned from and propagate to. The results are shown in the following table.

We can summarize the whole table into one single rule: routes obtained from a non-customer AS can only be propagated to customers.

Note: Type 5 and type 6 route leaks are defined as prefix re-origination and announcing of private prefixes. Type 5 is more closely related to prefix hijackings, which we plan to expand our system to as the next steps, while type 6 leaks are outside the scope of this work. Interested readers can refer to sections 3.5 and 3.6 of RFC7908 for more information.

The Cloudflare Radar route leak system

Now that we know what a route leak is, let’s talk about how we designed our route leak detection system.

From a very high level, we compartmentalize our system into three different components:

Raw data collection module: responsible for gathering BGP data from multiple sources and providing BGP message stream to downstream consumers.
Leak detection module: responsible for determining whether a given AS-level path is a route leak, estimate the confidence level of the assessment, aggregating and providing all external evidence needed for further analysis of the event.
Storage and notification module: responsible for providing access to detected route leak events and sending out notifications to relevant parties. This could also include building a dashboard for easy access and search of the historical events and providing the user interface for high-level analysis of the event.

Data collection module

There are three types of data input we take into consideration:

Historical: BGP archive files for some time range in the pasta. RouteViews and RIPE RIS BGP archives
Semi-real-time: BGP archive files as soon as they become available, with a 10-30 minute delay.a. RouteViews and RIPE RIS archives with data broker that checks new files periodically (e.g. BGPKIT Broker)
Real-time: true real-time data sourcesa. RIPE RIS Liveb. Cloudflare internal BGP sources

For the current version, we use the semi-real-time data source for the detection system, i.e., the BGP updates files from RouteViews and RIPE RIS. For data completeness, we process data from all public collectors from these two projects (a total of 63 collectors and over 2,400 collector peers) and implement a pipeline that’s capable of handling the BGP data processing as the data files become available.

For data files indexing and processing, we deployed an on-premises BGPKIT Broker instance with Kafka feature enabled for message passing, and a custom concurrent MRT data processing pipeline based on BGPKIT Parser Rust SDK. The data collection module processes MRT files and converts results into a BGP messages stream at over two billion BGP messages per day (roughly 30,000 messages per second).

Route leak detection

The route leak detection module works at the level of individual BGP announcements. The detection component investigates one BGP message at a time, and estimates how likely a given BGP message is a result of a route leak event.

We base our detection algorithm mainly on the valley-free model, which we believe can capture most of the notable route leak incidents. As mentioned previously, the key to having low false positives for detecting route leaks with the valley-free model is to have accurate AS-level relationships. While those relationship types are not publicized by every AS, there have been over two decades of research on the inference of the relationship types using publicly observed BGP data.

While state-of-the-art relationship inference algorithms have been shown to be highly accurate, even a small margin of errors can still incur inaccuracies in the detection of route leaks. To alleviate such artifacts, we synthesize multiple data sources for inferring AS-level relationships, including CAIDA/UCSD’s AS relationship data and our in-house built AS relationship dataset. Building on top of the two AS-level relationships, we create a much more granular dataset at the per-prefix and per-peer levels. The improved dataset allows us to answer the question like what is the relationship between AS1 and AS2 with respect to prefix P observed by collector peer X. This eliminates much of the ambiguity for cases where networks have multiple different relationships based on prefixes and geo-locations, and thus helps us reduce the number of false positives in the system. Besides the AS-relationships datasets, we also apply the AS Hegemony dataset from IHR IIJ to further reduce false positives.

Route leak storage and presentation

After processing each BGP message, we store the generated route leak entries in a database for long-term storage and exploration. We also aggregate individual route leak BGP announcements and group relevant leaks from the same leak ASN within a short period together into route-leak events. The route leak events will then be available for consumption by different downstream applications like web UIs, an API, or alerts.

Route leaks on Cloudflare Radar

At Cloudflare, we aim to help build a better Internet, and that includes sharing our efforts on monitoring and securing Internet routing. Today, we are releasing our route leak detection system as public beta.

Starting today, users going to the Cloudflare Radar ASN pages will now find the list of route leaks that affect that AS. We consider that an AS is being affected when the leaker AS is one hop away from it in any direction, before or after.

The Cloudflare Radar ASN page is directly accessible via https://radar.cloudflare.com/as{ASN}. For example, one can navigate to https://radar.cloudflare.com/as174 to view the overview page for Cogent AS174. ASN pages now show a dedicated card for route leaks detected relevant to the current ASN within the selected time range.

Users can also start using our public data API to lookup route leak events with regards to any given ASN. Our API supports filtering route leak results by time ranges, and ASes involved. Here is a screenshot of the route leak events API documentation page on the newly updated API docs site.

More to come on routing security

There is a lot more we are planning to do with route-leak detection. More features like a global view page, route leak notifications, more advanced APIs, custom automations scripts, and historical archive datasets will begin to ship on Cloudflare Radar over time. Your feedback and suggestions are also very important for us to continue improving on our detection results and serve better data to the public.

Furthermore, we will continue to expand our work on other important topics of Internet routing security, including global BGP hijack detection (not limited to our customer networks), RPKI validation monitoring, open-sourcing tools and architecture designs, and centralized routing security web gateway. Our goal is to provide the best data and tools for routing security to the communities so that we can build a better and more secure Internet together.

In the meantime, we opened a Radar room on our Developers Discord Server. Feel free to join and talk to us; the team is eager to receive feedback and answer questions.

Visit Cloudflare Radar for more Internet insights. You can also follow us on Twitter for more Radar updates.

Cloudflare’s view of the Rogers Communications outage in Canada

João Tomé — Fri, 08 Jul 2022 17:28:12 GMT

(Check for the latest updates at the end of this blog: Internet traffic started to come back at around July 9, 01:00 UTC, after 17 hours)

An outage at one of the largest ISPs in Canada, Rogers Communications, started earlier today, July 8, 2022, and is ongoing (eight hours and counting), and is impacting businesses and consumers. At the time of writing, we are seeing a very small amount of traffic from Rogers, but we are only seeing residual traffic, and nothing close to a full recovery to normal traffic levels.

Based on what we’re seeing and similar incidents in the past, we believe this is likely to be an internal error, not a cyber attack.

Cloudflare Radar shows a near complete loss of traffic from Rogers ASN, AS812, that started around 08:45 UTC (all times in this blog are UTC).

What happened?

Cloudflare data shows that there was a clear spike in BGP (Border Gateway Protocol) updates after 08:15, reaching its peak at 08:45.

BGP is a mechanism to exchange routing information between networks on the Internet. The big routers that make the Internet work have huge, constantly updated lists of the possible routes that can be used to deliver each network packet to its final destination. Without BGP, the Internet routers wouldn't know what to do, and the Internet wouldn't exist.

The Internet is literally a network of networks, or for the maths fans, a graph, with each individual network a node in it, and the edges representing the interconnections. All of this is bound together by BGP. BGP allows one network (say Rogers) to advertise its presence to other networks that form the Internet. Rogers is not advertising its presence, so other networks can’t find Rogers network and so it is unavailable.

A BGP update message informs a router of changes made to a prefix (a group of IP addresses) advertisement or entirely withdraws the prefix. In this next chart, we can see that at 08:45 there was a withdrawal of prefixes from Rogers ASN.

Since then, at 14:30, attempts seem to be made to advertise their prefixes again. This maps to us seeing a slow increase in traffic again from Rogers’ end users.

The graph below, which shows the prefixes we were receiving from Rogers in Toronto, clearly shows the withdrawal of prefixes around 08:45, and the slow start in recovery at 14:30, with another round of withdraws at around 15:45.

Outages happen more regularly than people think. This week we did an Internet disruptions overview for Q2 2022 where you can get a better sense of that, and on how collaborative and interconnected the Internet (the network of networks) is. And not so long ago Facebook had an hours long outage where BGP updates showed Facebook itself disappearing from the Internet.

Follow @CloudflareRadar on Twitter for updates on Internet disruptions as they occur, and find up-to-date information on Internet trends using Cloudflare Radar.

UPDATE: July 8, 2022, 23:00 UTC (19:00 EST)

The Rogers outage is still ongoing after 15 hours without clear signs of Internet traffic fully coming back.

At around 18:15 there was a small bump in traffic (only around 3% of the usual traffic at that time) that lasted for about 30 minutes, quickly returning to the ongoing outage. Here is the representation of that increase in AS812.

Here we can see an update on the BGP announcements. Rogers is still trying to get their services back online with new spikes in announcements to advertise their prefixes, but instants later it all seems to crumble again with the withdrawal of prefixes. The latest attempt was at 21:45 UTC:

It seems that the internal sessions in the Rogers core network flap, causing withdrawals when going down, and advertisements when coming up.Rogers Senior Vice President, Kye Prigg, said a few hours ago in an interview that they haven’t identified the root cause for the outage and still don’t have an estimate on when the service will be restored.

UPDATE: July 9, 2022, 01:50 UTC (21:50 EST)

We are seeing partial recovery of traffic from the Rogers network, mostly after 00:15 UTC. The current traffic rate (01:50 UTC) is at about 17.5% of the rate from 24 hours before, an hour ago it was 13%. More than 17 hours have passed since the outage started.

We continue to see frequent BGP announcement and withdrawals originated from Rogers network, which indicates the core network flapping issue has not been fully resolved at this moment.

UPDATE: July 9, 2022, 09:00 UTC (05:00 EST)

Rogers traffic seems to be climbing back up to usual levels, for the past eight hours. Cloudflare's data shows that Saturday, July 9, at 08:40 UTC, there was around 76% of the previous day traffic at the same time. You can track it here.