The Cloudflare Blog

Enforcing the First AS in BGP AS_PATHs

Bryton Herdes — Wed, 03 Jun 2026 17:00:00 GMT

Some recent route hijacks reported by Spamhaus captured our attention. In many of these hijack attempts, an apparent bad actor took advantage of unused autonomous system numbers, or ASNs. Notably in these hijacks, the actor appears to be creating fake AS_PATHs toward destinations, misdirecting traffic down an unexpected path.

By creating forged AS_PATHs, the hijacker is attempting to lead traffic somewhere it isn’t normally meant to go while also trying to conceal their identity. A hijacker could strip enough information away from a network path that they could pretend to be the origin of a Border Gateway Protocol (BGP) prefix themselves. Attackers can use this hijacked route to intercept traffic and for other nefarious purposes.

There is a simple solution for these cases: basic verification that a BGP peer autonomous system (AS) always includes their network as the “First AS” in an advertised route. To get a sense of how well these safeguards are implemented, we stress-tested several major networks and researched their BGP implementations. Read on to see what we learned.

Examining route hijacks involving forged paths

The idea that an actor is creating fake AS_PATHs is supported when we take a closer look at implausible AS relationships in the path. For example, let’s examine one of the hijacks reported by Spamhaus, involving a prefix belonging to Orange S.A., the French telecom company. Using the monocle tool, we can easily find a BGP UPDATE message related to the hijack:

➜  ~ monocle search --start-ts 2026-04-13T00:20:00Z --end-ts 2026-04-13T00:23:59Z --prefix 90.98.0.0/15 --collector rrc26 --json
{
  "aggr_asn": null,
  "aggr_ip": null,
  "as_path": "48237 1299 199524 270118 17072 41128",
  "atomic": false,
  "collector": "rrc26",
  "communities": null,
  "local_pref": 0,
  "med": 0,
  "next_hop": "185.1.8.3",
  "origin": "IGP",
  "peer_asn": 48237,
  "peer_ip": "185.1.8.3",
  "prefix": "90.98.0.0/15",
  "timestamp": 1776039612.0,
  "type": "ANNOUNCE"
}

We know AS1299 (Arelion) is a Tier 1 network, meaning every AS on the right-hand side in the path is describing an upstream (customer-to-provider) relationship. This implies that AS17072 is a transit provider for AS41128, AS270118 for AS17072, and AS199524 for AS270118. If we take a closer look at these networks:

AS41128 is an unused ASN belonging to Orange France
AS17072 is an ISP primarily based in Mexico
AS270118 is a hosting provider based in Mexico
AS199524 is Gcore, a provider with a global peering presence

The order of the ASes in the message above would suggest that an unused Orange France AS is buying transit from Mexican ISPs, which is then upstreamed to Gcore and Tier 1 providers – which would be quite odd.

In another instance, a reported hijack for prefixes 47.1.0.0/16 and 47.2.0.0/16 from origin AS36429 even included Cloudflare’s main ASN, 13335, in the AS_PATH, “199524 270118 17072 13335 36429”. We can view examples of these BGP UPDATEs in the MRT Explorer from Cloudflare Radar:

We can authoritatively confirm that we (Cloudflare, AS13335) have no adjacency with the now-unused AS36429 owned by Charter. This means this was a forged path by the hijacker that included Cloudflare’s ASN as one of the fake upstream networks in advertisements propagated toward Gcore (AS199524). Further, Spamhaus correctly pointed out that all the hijack routes led to a network behind Gcore peering in Chicago, never actually traversing the Mexican ISPs or Cloudflare’s network in the forwarding path.

Because of this, we can reasonably conclude these paths are forged up until the leftmost common AS, which in this case is AS199524, as the rest of the path seems implausible. We believe what is happening here is the result of a specific strategy by the hijacker, involving the following steps:

Originate BGP announcements for “parked” prefixes
Forge the AS_PATH completely, without including the hijacker’s own local ASN
Advertise these routes to Gcore, AS199524

In these hijacks it appears Gcore (AS199524) skips the verification and enforcement of the First AS matching the expected customer’s ASN. (We’ll look at why it might skip those steps later in this post.) As a result, the forged path is accepted and the hijacked prefixes are propagated to upstream providers and peers.

While Autonomous System Provider Authorization (ASPA) will help invalidate these forged paths, attackers may bypass it by only including an RPKI-ROV-valid origin AS, or a legitimate ASPA upstream AS. To stop these specific hijacks, we must rely on a different protection mechanism already built into BGP: First AS checking and enforcement.

The importance of First AS checking

Routing traffic across the Internet is a bit like shipping a package. When the package is shipped, a log is kept of every courier that handles it. In BGP, this is called the AS_PATH (Autonomous System Path) and it tracks each network in the path of that route.

The AS_PATH attribute in BGP is used for path selection. This selection algorithm determines which route to a destination traverses the best list of hops, where “best” is defined by multiple variables. It is also used for loop prevention, where networks can decide not to accept paths that have already traversed their own network. Aside from keeping a record of the networks a BGP UPDATE, and therefore route, will traverse, the AS_PATH can also be examined by operator-configured routing policies to route around or purposely through a given AS – for example to avoid BGP anomalies having unexpected impact.

BGP was built on trust, and the AS_PATH can be easily manipulated – whether for seemingly legitimate reasons such as AS prepending to move traffic around, or nefarious reasons such as shortening it to artificially attract traffic or perform origin attacks.

Let’s look at how these two types of malicious BGP manipulations are carried out.

Example 1: Forged origin attacks

AS64506 cryptographically signs their routes with an RPKI ROA (Route Origin Authorization) record, to prevent route origin hijacks.
AS64506 also creates an ASPA object, specifying only AS64503 as a valid provider
AS64505 manipulates their AS_PATH to strip AS64505 and originate with AS64506
AS64502 does not enforce the First AS

The route appears RPKI-ROV valid and is the shortest path, effectively hijacking traffic with the route. AS64506 has done everything correctly by specifying a valid ROA for a prefix advertisement, and has even configured an ASPA object consisting of their sole provider AS64503.

Unfortunately, the hijacker running AS64505 is still able to attract traffic meant for AS64506. Even if AS64501, the customer, and AS64502, their provider, run ASPA validation, they will not find an invalid path, because there is no valley in the path “64502 64506”. In other words, AS64505 by way of not even including their own ASN in the AS_PATH is able to pretend they are AS64506 with no intermediate AS hop.

The correct way of preventing this hijack with existing tools is to enforce the First AS in the AS_PATH. Once enforcing this rule, AS64502 would properly drop the route from AS64505.

Example 2: Shortening the AS_PATH to attract traffic

AS64506 has two transit providers: AS64503 and AS64505.
AS64505 bills their customer AS64506 based on traffic usage ratios.
AS64505 strips itself from the path, and their peer AS64504 does not enforce the First AS.

The BGP path selection algorithm now chooses the route via AS64504 as the best path from AS64501. AS64506 pays both of their providers, AS64503 and AS64505, to deliver traffic from the Internet. However, now AS64505 provides a shorter BGP path from far-end sources, meaning AS64505 will process all the traffic toward AS64506 and be paid for doing so, and AS64503 will not be paid at all.

These BGP vulnerabilities can be solved very simply by enforcing the First AS to match the peer AS in a received AS_PATH.

When an operator configures a BGP neighbor, they must set the remote AS of the network they are interconnecting with. If the First AS in the AS_PATH does not match this value, then the path has been manipulated. The First AS enforcement procedure is outlined in Section 6.3 of RFC 4271 very clearly as:

“If the UPDATE message is received from an external peer, the local
system MAY check whether the leftmost (with respect to the position
of octets in the protocol message) AS in the AS_PATH attribute is
equal to the autonomous system number of the peer that sent the
message. If the check determines this is not the case, the Error
Subcode MUST be set to Malformed AS_PATH.”

RFC 7606 later revises how error-handling should be implemented by vendors, suggesting that routes containing malformed AS_PATHs should be dropped via treat-as-withdraw method. This allows routers to drop specific prefixes with malformed attributes without disrupting the entire BGP session.

The current ASPA draft clearly calls out the importance of First AS enforcement, stating that ASPA cannot handle paths where sufficient AS_PATH information is lacking due to malformed announcements. Enforcing First AS in AS_PATHs is a must for Internet routing security.

Measurement by breaking the First AS rule on purpose

Instead of sticking to theoretical failure cases and past public incidents about violations of the First AS rule, we wanted to measure for ourselves how widely these AS_PATH violations could be accepted on the Internet. To do so, we set up BGP announcements to neighbors where we purposely violated the rule ourselves. Here is what we did:

Allocated two IP prefixes, one for IPv4 and one for IPv6, to advertise to Tier 1 External BGP (EBGP) neighbors
Purposely prepended the test prefix advertisements to Tier 1 neighbors with a Cloudflare-owned, non-13335 ASN (AS402542) in front of 13335

For example, we advertised the prefixes to AS1299 from our normal BGP session in Geneva. Our local AS is AS13335, but we include AS402542 clearly as the First AS in the AS_PATH.

bryton@edge01.gva01> show configuration policy-options policy-statement 4-TELIA-ACCEPT-EXPORT term ADV-FIRST-AS-PROBE-CR-1695522
from {
    community ANYCAST-ROUTE;
    prefix-list fl_first_as_prober;
    route-type internal;
}
then {
    origin igp;
    as-path-prepend 402542;
    next-hop self;
    accept;
}

bryton@edge01.gva01> show route advertising-protocol bgp  162.159.82.0/24 detail | grep "AS path: "
     AS path: 402542 [13335] I

With this configuration, our expectation is that:

Networks that do enforce-first-as will quietly drop the route via RFC 7606 withdrawal method
Networks that do not enforce-first-as will accept the route and install it for forwarding toward our test prefixes

Either result will be visible in BGP public route views. It was initially our goal to implement a continuous announcement of prefixes toward all peers that would purposely violate the First AS rule in announcements, and give everyone a tool to check which ISPs validate First AS and those which do not. However, we found there are still networks that have not implemented the guidance published in RFC 7606 when receiving malformed BGP AS_PATHs, and would reset BGP sessions instead of a treat-as-withdraw behavior. This meant we could not safely implement a continuous set of announcements that violate the First AS rule without impacting real traffic to Cloudflare, which we obviously can’t do.

But we can take a closer look at the networks whose policies make the biggest impact: Tier 1 networks. These networks make up the backbone of the Internet and have the largest AS customer cones of anyone, meaning hijacks or malformed paths by these peers have the broadest significance. Let’s start by examining the normal propagation of an anycast prefix, 1.1.1.0/24, across the Tier 1 networks.

The propagation of 1.1.1.0/24 looks how you would expect – it is directly reachable by every Tier 1 network that Cloudflare has a direct adjacency with currently.

Now, let’s compare that with our purposely malformed announcement of the prefix 162.159.82.0/24:

^{Note: AS5511 (Orange S.A.) is not pictured above due to its limited presence in public route views, but it was a part of our testing and measurements.}

The prefix is propagated very differently from 1.1.1.0/24 – far fewer Tier 1 networks are accepting the announcement directly from Cloudflare (in this case from AS13335 with AS402542 prepended). Based on the criteria of our test mentioned earlier, these are the results we found.

Tier 1 networks that are enforcing First AS rule (by dropping the invalid announcements):

AS174 (Cogent)
AS1299 (Arelion)
AS3257 (GTT)
AS3491 (PCCW)
AS5511 (Orange S.A.)
AS6453 (Tata)
AS7018 (AT&T)

Tier 1 networks that are not enforcing the First AS rule (by accepting and installing the prefixes):

AS701 (Verizon)
AS2914 (NTT)
AS3356 (Lumen/Colt/Cirion)
AS6461 (Zayo)
AS6762 (Sparkle)
AS6830 (Liberty Global)
AS12956 (Telefonica)

With our testing, we uncovered a troubling reality: Half of the Tier 1 networks are vulnerable to hijacks that violate the First AS rule.

While we only tested Tier 1 networks in this measurement study, there’s no doubt there are many non-Tier 1 networks that also break the First AS rule.

We noted that the majority of the Tier 1 networks failing the First AS violation test are running Juniper Networks routers, identified by the peers’ MAC addresses.

This highlights that the default behavior of vendors defines how secure a network is “out of the box” against First AS violation-based attacks. Let’s go over some of the BGP implementations and their defaults to have a better understanding of who is protected by default, and who isn’t.

BGP implementations and default behaviors

The chart below lists major routing/networking vendors and their BGP policies. Here, “Yes” means the BGP implementation by default enforces First AS, which is good. “No” means the BGP implementation is vulnerable by default.

BGP implementation	First AS enforced by default	Documentation
Cisco IOS/XE/XR	Yes	bgp enforce-first-as
Junos OS / Junos OS Evolved	No	enforce-first-as
Arista EOS	Yes	bgp enforce-first-as
Nokia SR OS	No	enforce-first-as
Huawei	Yes	check-first-as
Extreme SLX-OS	No	enforce-first-as
RouterOS	No	Configuration not available
BIRD	No	enforce first as
OpenBGPD	Yes	enforce neighbor-as
FRR	Yes (since October 2023 patch)	bgp enforce-first-as

The lack of default enforcement from some vendors may stem from the only valid use case where the First AS should not be enforced on External BGP (EBGP) sessions: Internet Exchange (IX) route servers.

A route server is responsible for transparently (without appending its AS to the AS_PATH) distributing routes between peers on the fabric. This ensures peers do not have to configure new BGP sessions every time a network joins the fabric – instead they can peer with just the route server.

In reality, most production networks have far more sessions with neighbors who are not transparent IX route servers than neighbors who are. It makes much more sense to configure “no enforce-first-as” on a handful of route-server sessions than to manually enable “enforce-first-as” on every single peer in your network.

While a “safe by default” approach is best for protecting against First AS violations, it is generally a steep hill to climb trying to convince vendors to change longstanding defaults. Vendors would also need to introduce a method of doing this gracefully, so as to not impact the IX route server BGP sessions that require “no enforce-first-as” settings to successfully receive routes.

Safer Internet routing with your help: enforce the First AS

Attackers will purposely malform AS_PATHs to slide around BGP security mechanisms. Even RPKI-based ASPA path validation will not be able to protect us from forged-origin hijacks where the path has been totally stripped of everything but the origin AS, leaving nothing for ASPA to invalidate.

The good news is we already have a mitigation for these cases: we can verify the First AS matches BGP peer AS and always enforce it. Refer to the corresponding “Documentation” column in the above table we have provided. It should be safe to enforce First AS on any External BGP (EBGP) session besides those facing an IX route server neighbor.

If you are a network operator, please enforce First AS on your routers today to protect your network and the wider Internet.

If your router vendor or choice of BGP implementation has a default of enforcing First AS, you’re already safe and should be rejecting any First AS violations.

By working together, we can make the Internet safer from these kinds of hijacks.

Bringing more transparency to post-quantum usage, encrypted messaging, and routing security

David Belson — Fri, 27 Feb 2026 06:00:00 GMT

Cloudflare Radar already offers a wide array of security insights — from application and network layer attacks, to malicious email messages, to digital certificates and Internet routing.

And today we’re introducing even more. We are launching several new security-related data sets and tools on Radar:

We are extending our post-quantum (PQ) monitoring beyond the client side to now include origin-facing connections. We have also released a new tool to help you check any website's post-quantum encryption compatibility.
A new Key Transparency section on Radar provides a public dashboard showing the real-time verification status of Key Transparency Logs for end-to-end encrypted messaging services like WhatsApp, showing when each log was last signed and verified by Cloudflare's Auditor. The page serves as a transparent interface where anyone can monitor the integrity of public key distribution and access the API to independently validate our Auditor’s proofs.
Routing Security insights continue to expand with the addition of global, country, and network-level information about the deployment of ASPA, an emerging standard that can help detect and prevent BGP route leaks.

Measuring origin post-quantum support

Since April 2024, we have tracked the aggregate growth of client support for post-quantum encryption on Cloudflare Radar, chronicling its global growth from under 3% at the start of 2024, to over 60% in February 2026. And in October 2025, we added the ability for users to check whether their browser supports X25519MLKEM768 — a hybrid key exchange algorithm combining classical X25519 with ML-KEM, a lattice-based post-quantum scheme standardized by NIST. This provides security against both classical and quantum attacks.

However, post-quantum encryption support on user-to-Cloudflare connections is only part of the story.

For content not in our CDN cache, or for uncacheable content, Cloudflare’s edge servers establish a separate connection with a customer’s origin servers to retrieve it. To accelerate the transition to quantum-resistant security for these origin-facing fetches, we previously introduced an API allowing customers to opt in to preferring post-quantum connections. Today, we’re making post-quantum compatibility of origin servers visible on Radar.

The new origin post-quantum support graph on Radar illustrates the share of customer origins supporting X25519MLKEM768. This data is derived from our automated TLS scanner, which probes TLS 1.3-compatible origins and aggregates the results daily. It is important to note that our scanner tests for support rather than the origin server's specific preference. While an origin may support a post-quantum key exchange algorithm, its local TLS key exchange preference can ultimately dictate the encryption outcome.

While the headline graph focuses on post-quantum readiness, the scanner also evaluates support for classical key exchange algorithms. Within the Radar Data Explorer view, you can also see the full distribution of these supported TLS key exchange methods.

As shown in the graphs above, approximately 10% of origins could benefit from a post-quantum-preferred key agreement today. This represents a significant jump from less than 1% at the start of 2025 — a 10x increase in just over a year. We expect this number to grow steadily as the industry continues its migration. This upward trend likely accelerated in 2025 as many server-side TLS libraries, such as OpenSSL 3.5.0+, GnuTLS 3.8.9+, and Go 1.24+, enabled hybrid post-quantum key exchange by default, allowing platforms and services to support post-quantum connections simply by upgrading their cryptographic library dependencies.

In addition to the Radar and Data Explorer graphs, the origin readiness data is available through the Radar API as well.

As an additional part of our efforts to help the Internet transition to post-quantum cryptography, we are also launching a tool to test whether a specific hostname supports post-quantum encryption. These tests can be run against any publicly accessible website, as long as they allow connections from Cloudflare’s egress IP address ranges.

_{A screenshot of the tool in Radar to test whether a hostname supports post-quantum encryption.}

The tool presents a simple form where users can enter a hostname (such as cloudflare.com or www.wikipedia.org) and optionally specify a custom port (the default is 443, the standard HTTPS port). After clicking "Test", the result displays a tag indicating PQ support status alongside the negotiated TLS key exchange algorithm. If the server prefers PQ secure connections, a green "PQ" tag appears with a message confirming the connection is "post-quantum secure." Otherwise, a red tag indicates the connection is "not post-quantum secure", showing the classical algorithm that was negotiated.

Under the hood, this tool uses Cloudflare Containers — a new capability that allows running container workloads alongside Workers. Since the Workers runtime is not exposed to details of the underlying TLS handshake, Workers cannot initiate TLS scans. Therefore, we created a Go container that leverages the crypto/tls package's support for post-quantum compatibility checks. The container runs on-demand and performs the actual handshake to determine the negotiated TLS key exchange algorithm, returning results through the Radar API.

With the addition of these origin-facing insights, complementing the existing client-facing insights, we have moved all the post-quantum content to its own section on Radar.

Securing E2EE messaging systems with Key Transparency

End-to-end encrypted (E2EE) messaging apps like WhatsApp and Signal have become essential tools for private communication, relied upon by billions of people worldwide. These apps use public-key cryptography to ensure that only the sender and recipient can read the contents of their messages — not even the messaging service itself. However, there's an often-overlooked vulnerability in this model: users must trust that the messaging app is distributing the correct public keys for each contact.

If an attacker were able to substitute an incorrect public key in the messaging app's database, they could intercept messages intended for someone else — all without the sender knowing.

Key Transparency addresses this challenge by creating an auditable, append-only log of public keys — similar in concept to Certificate Transparency for TLS certificates. Messaging apps publish their users' public keys to a transparency log, and independent third parties can verify and vouch that the log has been constructed correctly and consistently over time. In September 2024, Cloudflare announced such a Key Transparency auditor for WhatsApp, providing an independent verification layer that helps ensure the integrity of public key distribution for the messaging app's billions of users.

Today, we're publishing Key Transparency audit data in a new Key Transparency section on Cloudflare Radar. This section showcases the Key Transparency logs that Cloudflare audits, giving researchers, security professionals, and curious users a window into the health and activity of these critical systems.

The new page launches with two monitored logs: WhatsApp and Facebook Messenger Transport. Each monitored log is displayed as a card containing the following information:

Status: Indicates whether the log is online, in initialization, or disabled. An "online" status means the log is actively publishing key updates into epochs that Cloudflare audits. (An epoch represents a set of updates applied to the key directory at a specific time.)
Last signed epoch: The most recent epoch that has been published by the messaging service's log and acknowledged by Cloudflare. By clicking on the eye icon, users can view the full epoch data in JSON format, including the epoch number, timestamp, cryptographic digest, and signature.
Last verified epoch: The most recent epoch that Cloudflare has verified. Verification involves checking that the transition of the transparency log data structure from the previous epoch to the current one represents a valid tree transformation — ensuring the log has been constructed correctly. The verification timestamp indicates when Cloudflare completed its audit.
Root: The current root hash of the Auditable Key Directory (AKD) tree. This hash cryptographically represents the entire state of the key directory at the current epoch. Like the epoch fields, users can click to view the complete JSON response from the auditor.

The data shown on the page is also available via the Key Transparency Auditor API, with endpoints for auditor information and namespaces.

If you would like to perform audit proof verification yourself, you can follow the instructions in our Auditing Key Transparency blog post. We hope that these use cases are the first of many that we publish in this Key Transparency section in Radar — if your company or organization is interested in auditing for your public key or related infrastructure, you can reach out to us here.

Tracking RPKI ASPA adoption

While the Border Gateway Protocol (BGP) is the backbone of Internet routing, it was designed without built-in mechanisms to verify the validity of the paths it propagates. This inherent trust has long left the global network vulnerable to route leaks and hijacks, where traffic is accidentally or maliciously detoured through unauthorized networks.

Although RPKI and Route Origin Authorizations (ROAs) have successfully hardened the origin of routes, they cannot verify the path traffic takes between networks. This is where ASPA (Autonomous System Provider Authorization) comes in. ASPA extends RPKI protection by allowing an Autonomous System (AS) to cryptographically sign a record listing the networks authorized to propagate its routes upstream. By validating these Customer-to-Provider relationships, ASPA allows systems to detect invalid path announcements with confidence and react accordingly.

While the specific IETF standard remains in draft, the operational community is moving fast. Support for creating ASPA objects has already landed in the portals of Regional Internet Registries (RIRs) like ARIN and RIPE NCC, and validation logic is available in major software routing stacks like OpenBGPD and BIRD.

To provide better visibility into the adoption of this emerging standard, we have added comprehensive RPKI ASPA support to the Routing section of Cloudflare Radar. Tracking these records globally allows us to understand how quickly the industry is moving toward better path validation.

Our new ASPA deployment view allows users to examine the growth of ASPA adoption over time, with the ability to visualize trends across the five Regional Internet Registries (RIRs) based on AS registration. You can view the entire history of ASPA entries, dating back to October 1, 2023, or zoom into specific date ranges to correlate spikes in adoption with industry events, such as the introduction of ASPA features on ARIN and RIPE NCC online dashboards.

Beyond aggregate trends, we have also introduced a granular, searchable explorer for real-time ASPA content. This table view allows you to inspect the current state of ASPA records, searchable by AS number, AS name, or by filtering for only providers or customer ASNs. This allows network operators to verify that their records are published correctly and to view other networks’ configurations.

We have also integrated ASPA data directly into the country/region routing pages. Users can now track how different locations are progressing in securing their infrastructure, based on the associated ASPA records from the customer ASNs registered locally.

On individual AS pages, we have updated the Connectivity section. Now, when viewing the connections of a network, you may see a visual indicator for "ASPA Verified Provider." This annotation confirms that an ASPA record exists authorizing that specific upstream connection, providing an immediate signal of routing hygiene and trust.

For ASes that have deployed ASPA, we now display a complete list of authorized provider ASNs along with their details. Beyond the current state, Radar also provides a detailed timeline of ASPA activity involving the AS. This history distinguishes between changes initiated by the AS itself ("As customer") and records created by others designating it as a provider ("As provider"), allowing users to immediately identify when specific routing authorizations were established or modified.

Visibility is an essential first step toward broader adoption of emerging routing security protocols like ASPA. By surfacing this data, we aim to help operators deploy protections and assist researchers in tracking the Internet's progress toward a more secure routing path. For those who need to integrate this data into their own workflows or perform deeper analysis, we are also exposing these metrics programmatically. Users can now access ASPA content snapshots, historical timeseries, and detailed changes data using the newly introduced endpoints in the Cloudflare Radar API.

As security evolves, so does our data

Internet security continues to evolve, with new approaches, protocols, and standards being developed to ensure that information, applications, and networks remain secure. The security data and insights available on Cloudflare Radar will continue to evolve as well. The new sections highlighted above serve to expand existing routing security, transparency, and post-quantum insights already available on Cloudflare Radar.

If you share any of these new charts and graphs on social media, be sure to tag us: @CloudflareRadar (X), noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky). If you have questions or comments, or suggestions for data that you’d like to see us add to Radar, you can reach out to us on social media, or contact us via email.

ASPA: making Internet routing more secure

Mingwei Zhang — Fri, 27 Feb 2026 06:00:00 GMT

Internet traffic relies on the Border Gateway Protocol (BGP) to find its way between networks. However, this traffic can sometimes be misdirected due to configuration errors or malicious actions. When traffic is routed through networks it was not intended to pass through, it is known as a route leak. We have written on our blog multiple times about BGP route leaks and the impact they have on Internet routing, and a few times we have even alluded to a future of path verification in BGP.

While the network community has made significant progress in verifying the final destination of Internet traffic, securing the actual path it takes to get there remains a key challenge for maintaining a reliable Internet. To address this, the industry is adopting a new cryptographic standard called ASPA (Autonomous System Provider Authorization), which is designed to validate the entire path of network traffic and prevent route leaks.

To help the community track the rollout of this standard, Cloudflare Radar has introduced a new ASPA deployment monitoring feature. This view allows users to observe ASPA adoption trends over time across the five Regional Internet Registries (RIRs), and view ASPA records and changes over time at the Autonomous System (AS) level.

What is ASPA?

To understand how ASPA works, it is helpful to look at how the Internet currently secures traffic destinations.

Today, networks use a secure infrastructure system called RPKI (Resource Public Key Infrastructure), which has seen significant deployment growth over the past few years. Within RPKI, networks publish specific cryptographic records called ROAs (Route Origin Authorizations). A ROA acts as a verifiable digital ID card, confirming that an Autonomous System (AS) is officially authorized to announce specific IP addresses. This addresses the "origin hijacks" issue, where one network attempts to impersonate another.

ASPA (Autonomous System Provider Authorization) builds directly on this foundation. While a ROA verifies the destination, an ASPA record verifies the journey.

When data travels across the Internet, it keeps a running log of every network it passes through. In BGP, this log is known as the AS_PATH (Autonomous System Path). ASPA provides networks with a way to officially publish a list of their authorized upstream providers within the RPKI system. This allows any receiving network to look at the AS_PATH, check the associated ASPA records, and verify that the traffic only traveled through an approved chain of networks.

A ROA helps ensure the traffic arrives at the correct destination, ASPA ensures the traffic takes an intended, authorized route to get there. Let’s take a look at how path evaluation actually works in practice.

Route leak detection with ASPA

How does ASPA know if a route is a detour? It relies on the hierarchy of the Internet.

In a healthy Internet routing topology (e.g. “valley-free” routing), traffic generally follows a specific path: it travels "up" from a customer to a large provider (like a major ISP), optionally crosses over to another big provider, and then flows "down" to the destination. You can visualize this as a “mountain” shape:

The Up-Ramp: Traffic starts at a Customer and travels "up" through larger and larger Providers (ISPs), where ISPs pay other ISPs to transit traffic for them.
The Apex: It reaches the top tier of the Internet backbone and may cross a single peering link.

The Down-Ramp: It travels "down" through providers to reach the destination Customer.

^{A visualization of "valley-free" routing. Routes propagate up to a provider, optionally across one peering link, and down to a customer.}

In this model, a route leak is like a valley, or dip. One type of such leak happens when traffic goes down to a customer and then unexpectedly tries to go back up to another provider.

This "down-and-up" movement is undesirable as customers aren't intended nor equipped to transit traffic between two larger network providers.

How ASPA validation works

ASPA gives network operators a cryptographic way to declare their authorized providers, enabling receiving networks to verify that an AS path follows this expected structure.

ASPA validates AS paths by checking the “chain of relationships” from both ends of the routes propagation:

Checking the Up-Ramp: The check starts at the origin and moves forward. At every hop, it asks: "Did this network authorize the next network as a Provider?" It keeps going until the chain stops.
Checking the Down-Ramp: It does the same thing from the destination of a BGP update, moving backward.

If the "Up" path and the "Down" path overlap or meet at the top, the route is Valid. The mountain shape is intact.

However, if the two valid paths do not meet, i.e. there is a gap in the middle where authorization is missing or invalid, ASPA reports such paths as problematic. That gap represents the "valley" or the leak.

Validation process example

Let’s look at a scenario where a network (AS65539) receives a bad route from a customer (AS65538).

The customer (AS65538) is trying to send traffic received from one provider (AS65537) "up" to another provider (AS65539), acting like a bridge between providers. This is a classic route leak. Now let’s walk the ASPA validation process.

We check the Up-Ramp: The original source (AS65536) authorizes its provider. (Check passes).
We check the Down-Ramp: We start from the destination and look back. We see the customer (AS65538).
The Mismatch: The up-ramp ends at AS65537, while the down-ramp ends at 65538. The two ramps do not connect.

Because the "Up" path and "Down" path fail to connect, the system flags this as ASPA Invalid. ASPA is required to do this path validation, as without signed ASPA objects in RPKI, we cannot find which networks are authorized to advertise which prefixes to whom. By signing a list of provider networks for each AS, we know which networks should be able to propagate prefixes laterally or upstream.

ASPA against forged-origin hijacks

ASPA can serve as an effective defense against forged-origin hijacks, where an attacker bypasses Route Origin Validation (ROV) by pretending and advertising a BGP path to a real origin prefix. Although the origin AS remains correct, the relationship between the hijacker and the victim is fabricated.

ASPA exposes this deception by allowing the victim network to cryptographically declare its actual authorized providers; because the hijacker is not on that authorized list, the path is rejected as invalid, effectively preventing the malicious redirection.

ASPA cannot fully protect against forged-origin hijacks, however. There is still at least one case where not even ASPA validation can fully prevent this type of attack on a network. An example of a forged-origin hijack that ASPA cannot account for is when a provider forges a path advertisement to their customer.

Essentially, a provider could “fake” a peering link with another AS to attract traffic from a customer with a short AS_PATH length, even when no such peering link exists. ASPA does not prevent this path forgery by the provider, because ASPA only works off of provider information and knows nothing specific about peering relationships.

So while ASPA can be an effective means of rejecting forged-origin hijack routes, there are still some rare cases where it will be ineffective, and those are worth noting.

Creating ASPA objects: just a few clicks away

Creating an ASPA object for your network (or Autonomous System) is now a simple process in registries like RIPE and ARIN. All you need is your AS number and the AS numbers of the providers you purchase Internet transit service from. These are the authorized upstream networks you trust to announce your IP addresses to the wider Internet. In the opposite direction, these are also the networks you authorize to send you a full routing table, which acts as the complete map of how to reach the rest of the Internet.

We’d like to show you just how easy creating an ASPA object is with a quick example.

Say we need to create the ASPA object for AS203898, an AS we use for our Cloudflare London office Internet. At the time of writing we have three Internet providers for the office: AS8220, AS2860, and AS1273. This means we will create an ASPA object for AS203898 with those three provider members in a list.

First, we log into the RIPE RPKI dashboard and navigate to the ASPA section:

Then, we click on “Create ASPA” for the object we want to create an ASPA object for. From there, we just fill in the providers for that AS.

It’s as simple as that. After just a short period of waiting, we can query the global RPKI ecosystem and find our ASPA object for AS203898 with the providers we defined.

It’s a similar story with ARIN, the only other Regional Internet Registries (RIRs) that currently supports the creation of ASPA objects. Log in to ARIN online, then navigate to Routing Security, and click “Manage RPKI”.

From there, you’ll be able to click on “Create ASPA”. In this example, we will create an object for another one of our ASNs, AS400095.

And that’s it – now we have created our ASPA object for AS40095 with provider AS0.

The “AS0” provider entry is special when used, and means the AS owner attests there are no valid upstream providers for their network. By definition this means every transit-free Tier-1 network should eventually sign an ASPA with only “AS0” in their object, if they truly only have peer and customer relationships.

New ASPA features in Cloudflare Radar

We have added a new ASPA deployment monitoring feature to Cloudflare Radar. The new ASPA deployment view allows users to examine the growth of ASPA adoption over time, with the ability to visualize trends across the five Regional Internet Registries (RIRs) based on AS registration.

We have also integrated ASPA data directly into the country/region and ASN routing pages. Users can now track how different locations are progressing in securing their infrastructure, based on the associated ASPA records from the customer ASNs registered locally.

There are also new features when you zoom into a particular Autonomous System (AS), for example AS203898.

We can see whether a network’s observed BGP upstream providers are ASPA authorized, their full list of providers in their ASPA object, and the timeline of ASPA changes that involve their AS.

The road to better routing security

With ASPA finally becoming a reality, we have our cryptographic upgrade for Internet path validation. However, those who have been around since the start of RPKI for route origin validation know this will be a long road to actually providing significant value on the Internet. Changes are needed to RPKI Relaying Party (RP) packages, signer implementations, RTR (RPKI-to-Router protocol) software, and BGP implementations to actually use ASPA objects and validate paths with them.

In addition to ASPA adoption, operators should also configure BGP roles as described within RFC9234. The BGP roles configured on BGP sessions will help future ASPA implementations on routers decide which algorithm to apply: upstream or downstream. In other words, BGP roles give us the power as operators to directly tie our intended BGP relationships with another AS to sessions with those neighbors. Check with your routing vendors and make sure they support RFC9234 BGP roles and OTC (Only-to-Customer) attribute implementation.

To get the most out of ASPA, we encourage everyone to create their ASPA objects for their AS. Creating and maintaining these ASPA objects requires careful attention. In the future, as networks use these records to actively block invalid paths, omitting a legitimate provider could cause traffic to be dropped. However, managing this risk is no different from how networks already handle Route Origin Authorizations (ROAs) today. ASPA is the necessary cryptographic upgrade for Internet path validation, and we’re happy it’s here!

A closer look at a BGP anomaly in Venezuela

Bryton Herdes — Tue, 06 Jan 2026 08:00:00 GMT

As news unfolds surrounding the U.S. capture and arrest of Venezuelan leader Nicolás Maduro, a cybersecurity newsletter examined Cloudflare Radar data and took note of a routing leak in Venezuela on January 2.

We dug into the data. Since the beginning of December there have been eleven route leak events, impacting multiple prefixes, where AS8048 is the leaker. Although it is impossible to determine definitively what happened on the day of the event, this pattern of route leaks suggests that the CANTV (AS8048) network, a popular Internet Service Provider (ISP) in Venezuela, has insufficient routing export and import policies. In other words, the BGP anomalies observed by the researcher could be tied to poor technical practices by the ISP rather than malfeasance.

In this post, we’ll briefly discuss Border Gateway Protocol (BGP) and BGP route leaks, and then dig into the anomaly observed and what may have happened to cause it.

Background: BGP route leaks

First, let’s revisit what a BGP route leak is. BGP route leaks cause behavior similar to taking the wrong exit off of a highway. While you may still make it to your destination, the path may be slower and come with delays you wouldn’t otherwise have traveling on a more direct route.

Route leaks were given a formal definition in RFC7908 as “the propagation of routing announcement(s) beyond their intended scope.” Intended scope is defined using pairwise business relationships between networks. The relationships between networks, which in BGP we represent using Autonomous Systems (ASes), can be one of the following:

customer-provider: A customer pays a provider network to connect them and their own downstream customers to the rest of the Internet
peer-peer: Two networks decide to exchange traffic between one another, to each others’ customers, settlement-free (without payment)

In a customer-provider relationship, the provider will announce all routes to the customer. The customer, on the other hand, will advertise only the routes from their own customers and originating from their network directly.

In a peer-peer relationship, each peer will advertise to one another only their own routes and the routes of their downstream customers.

These advertisements help direct traffic in expected ways: from customers upstream to provider networks, potentially across a single peering link, and then potentially back down to customers on the far end of the path from their providers.

A valid path would look like the following that abides by the valley-free routing rule:

A route leak is a violation of valley-free routing where an AS takes routes from a provider or peer and redistributes them to another provider or peer. For example, a BGP path should never go through a “valley” where traffic goes up to a provider, and back down to a customer, and then up to a provider again. There are different types of route leaks defined in RFC7908, but a simple one is the Type 1: Hairpin route leak between two provider networks by a customer.

In the figure above, AS64505 takes routes from one of its providers and redistributes them to their other provider. This is unexpected, since we know providers should not use their customer as an intermediate IP transit network. AS64505 would become overwhelmed with traffic, as a smaller network with a smaller set of backbone and network links than its providers. This can become very impactful quickly.

Route leak by AS8048 (CANTV)

Now that we have reminded ourselves what a route leak is in BGP, let’s examine what was hypothesized in the newsletter post. The post called attention to a few route leak anomalies on Cloudflare Radar involving AS8048. On the Radar page for this leak, we see this information:

We see the leaker AS, which is AS8048 — CANTV, Venezuela’s state-run telephone and Internet Service Provider. We observe that routes were taken from one of their providers AS6762 (Sparkle, an Italian telecom company) and then redistributed to AS52320 (V.tal GlobeNet, a Colombian network service provider). This is definitely a route leak.

The newsletter suggests “BGP shenanigans” and posits that such a leak could be exploited to collect intelligence useful to government entities.

While we can’t say with certainty what caused this route leak, our data suggests that its likely cause was more mundane. That’s in part because BGP route leaks happen all of the time, and they have always been part of the Internet — most often for reasons that aren’t malicious.

To understand more, let’s look closer at the impacted prefixes and networks. The prefixes involved in the leak were all originated by AS21980 (Dayco Telecom, a Venezuelan company):

The prefixes are also all members of the same 200.74.224.0/20 subnet, as noted by the newsletter author. Much more intriguing than this, though, is the relationship between the originating network AS21980 and the leaking network AS8048: AS8048 is a provider of AS21980.

The customer-provider relationship between AS8048 and AS21980 is visible in both Cloudflare Radar and bgp.tools AS relationship interference data. We can also get a confidence score of the AS relationship using the monocle tool from BGPKIT, as you see here:

➜ ~ monocle as2rel 8048 21980 Explanation: - connected: % of 1813 peers that see this AS relationship - peer: % where the relationship is peer-to-peer - as1_upstream: % where ASN1 is the upstream (provider) - as2_upstream: % where ASN2 is the upstream (provider)

Data source: https://data.bgpkit.com/as2rel/as2rel-latest.json.bz2

╭──────┬───────┬───────────┬──────┬──────────────┬──────────────╮ │ asn1 │ asn2 │ connected │ peer │ as1_upstream │ as2_upstream │ ├──────┼───────┼───────────┼──────┼──────────────┼──────────────┤ │ 8048 │ 21980 │ 9.9% │ 0.6% │ 9.4% │ 0.0% │ ╰──────┴───────┴───────────┴──────┴──────────────┴──────────────╯

While only 9.9% of route collectors see these two ASes as adjacent, almost all of the paths containing them reflect AS8048 as an upstream provider for AS21980, meaning confidence is high in the provider-customer relationship between the two.

Many of the leaked routes were also heavily prepended with AS8048, meaning it would have been potentially less attractive for routing when received by other networks. Prepending is the padding of an AS more than one time in an outbound advertisement by a customer or peer, to attempt to switch traffic away from a particular circuit to another. For example, many of the paths during the leak by AS8048 looked like this: “52320,8048,8048,8048,8048,8048,8048,8048,8048,8048,23520,1299,269832,21980”.

You can see that AS8048 has sent their AS multiple times in an advertisement to AS52320, because by means of BGP loop prevention the path would never actually travel in and out of AS8048 multiple times in a row. A non-prepended path would look like this: “52320,8048,23520,1299,269832,21980”.

If AS8048 was intentionally trying to become a man-in-the-middle (MITM) for traffic, why would they make the BGP advertisement less attractive instead of more attractive? Also, why leak prefixes to try and MITM traffic when you’re already a provider for the downstream AS anyway? That wouldn’t make much sense.

The leaks from AS8048 also surfaced in multiple separate announcements, each around an hour apart on January 2, 2026 between 15:30 and 17:45 UTC, suggesting they may have been having network issues that surfaced in a routing policy issue or a convergence-based mishap.

It is also noteworthy that these leak events begin over twelve hours prior to the U.S. military strikes in Venezuela. Leaks that impact South American networks are common, and we have no reason to believe, based on timing or the other factors I have discussed, that the leak is related to the capture of Maduro several hours later.

In fact, looking back the past two months, we can see plenty of leaks by AS8048 that are just like this one, meaning this is not a new BGP anomaly:

You can see above in the history of Cloudflare Radar’s route leak alerting pipeline that AS8048 is no stranger to Type 1 hairpin route leaks. Since the beginning of December alone there have been eleven route leak events where AS8048 is the leaker.

From this we can draw a more innocent possible explanation about the route leak: AS8048 may have configured too loose of export policies facing at least one of their providers, AS52320. And because of that, redistributed routes belong to their customer even when the direct customer BGP routes were missing. If their export policy toward AS52320 only matched on IRR-generated prefix list and not a customer BGP community tag, for example, it would make sense why an indirect path toward AS6762 was leaked back upstream by AS8048.

These types of policy errors are something RFC9234 and the Only-to-Customer (OTC) attribute would help with considerably, by coupling BGP more tightly to customer-provider and peer-peer roles, when supported by all routing vendors. I will save the more technical details on RFC9234 for a follow-up blog post.

The difference between origin and path validation

The newsletter also calls out as “notable” that Sparkle (AS6762) does not implement RPKI (Resource Public Key Infrastructure) Route Origin Validation (ROV). While it is true that AS6762 appears to have an incomplete deployment of ROV and is flagged as “unsafe” on isbgpsafeyet.com because of it, origin validation would not have prevented this BGP anomaly in Venezuela.

It is important to separate BGP anomalies into two categories: route misoriginations, and path-based anomalies. Knowing the difference between the two helps to understand the solution for each. Route misoriginations, often called BGP hijacks, are meant to be fixed by RPKI Route Origin Validation (ROV) by making sure the originator of a prefix is who rightfully owns it. In the case of the BGP anomaly described in this post, the origin AS was correct as AS21980 and only the path was anomalous. This means ROV wouldn’t help here.

Knowing that, we need path-based validation. This is what Autonomous System Provider Authorization (ASPA), an upcoming draft standard in the IETF, is going to provide. The idea is similar to RPKI Route Origin Authorizations (ROAs) and ROV: create an ASPA object that defines a list of authorized providers (upstreams) for our AS, and everyone will use this to invalidate route leaks on the Internet at various vantage points. Using a concrete example, AS6762 is a Tier-1 transit-free network, and they would use the special reserved “AS0” member in their ASPA signed object to communicate to the world that they have no upstream providers, only lateral peers and customers. Then, AS52320, the other provider of AS8048, would see routes from their customer with “6762” in the path and reject them by performing an ASPA verification process.

ASPA is based on RPKI and is exactly what would help prevent route leaks similar to the one we observed in Venezuela.

A safer BGP, built together

We felt it was important to offer an alternative explanation for the BGP route leak by AS8048 in Venezuela that was observed on Cloudflare Radar. It is helpful to understand that route leaks are an expected side effect of BGP historically being based entirely on trust and carefully-executed business relationship-driven intent.

While route leaks could be done with malicious intent, the data suggests this event may have been an accident caused by a lack of routing export and import policies that would prevent it. This is why to have a safer BGP and Internet we need to work together and drive adoption of RPKI-based ASPA, for which RIPE recently released object creation, on the wide Internet. It will be a collaborative effort, just like RPKI has been for origin validation, but it will be worth it and prevent BGP incidents such as the one in Venezuela.

In addition to ASPA, we can all implement simpler mechanisms such as Peerlock and Peerlock-lite as operators, which sanity-checks received paths for obvious leaks. One especially promising initiative is the adoption of RFC9234, which should be used in addition to ASPA for preventing route leaks with the establishing of BGP roles and a new Only-To-Customer (OTC) attribute. If you haven’t already asked your routing vendors for an implementation of RFC9234 to be on their roadmap: please do. You can help make a big difference.

Update: Sparkle (AS6762) finished RPKI ROV deployment and was marked safe on February 3, 2026.

BGP zombies and excessive path hunting

Bryton Herdes — Fri, 31 Oct 2025 15:30:00 GMT

Here at Cloudflare, we’ve been celebrating Halloween with some zombie hunting of our own. The zombies we’d like to remove are those that disrupt the core framework responsible for how the Internet routes traffic: BGP (Border Gateway Protocol).

A BGP zombie is a silly name for a route that has become stuck in the Internet’s Default-Free Zone, aka the DFZ: the collection of all internet routers that do not require a default route, potentially due to a missed or lost prefix withdrawal.

The underlying root cause of a zombie could be multiple things, spanning from buggy software in routers or just general route processing slowness. It’s when a BGP prefix is meant to be gone from the Internet, but for one reason or another it becomes a member of the undead and hangs around for some period of time.

The longer these zombies linger, the more they create operational impact and become a real headache for network operators. Zombies can lead packets astray, either by trapping them inside of route loops or by causing them to take an excessively scenic route. Today, we’d like to celebrate Halloween by covering how BGP zombies form and how we can lessen the likelihood that they wreak havoc on Internet traffic.

Path hunting

To understand the slowness that can often lead to BGP zombies, we need to talk about path hunting. Path hunting occurs when routers running BGP exhaustively search for the best path to a prefix as determined by Longest Prefix Matching (LPM) and BGP routing attributes like path length and local preference. This becomes relevant in our observations of exactly how routes become stuck, for how long they become stuck, and how visible they are on the Internet.

For example, path hunting happens when a more-specific BGP prefix is withdrawn from the global routing table, and networks need to fallback to a less-specific BGP advertisement. In this example, we use 2001:db8::/48 for the more-specific BGP announcement and 2001:db8::/32 for the less-specific prefix. When the /48 is withdrawn by the originating Autonomous System (AS), BGP routers have to recognize that route as missing and begin routing traffic to IPs such as 2001:db8::1 via the 2001:db8::/32 route, which still remains while the prefix 2001:db8::/48 is gone.

Let’s see what this could look like in action with a few diagrams.

_{Diagram 1: Active 2001:db8::/48 route}

In this initial state, 2001:db8::/48 is used actively for traffic forwarding, which all flows through AS13335 on the way to AS64511. In this case, AS64511 would be a BYOIP customer of Cloudflare. AS64511 also announces a backup route to another Internet Service Provider (ISP), AS64510, but this route is not active even in AS64510’s routing table for forwarding to 2001:db8::1 because 2001:db8::/48 is a longer prefix match when compared to 2001:db8::/32.

Things get more interesting when AS64510 signals for 2001:db8::/48 to be withdrawn by Cloudflare (AS13335), perhaps because a DDoS attack is over and the customer opts to use Cloudflare only when they are actively under attack.

When the customer signals to Cloudflare (via BGP Control or API call) to withdraw the 2001:db8::/48 announcement, all BGP routers have to converge upon this update, which involves path hunting. AS13335 sends a BGP withdrawal message for 2001:db8::/48 to its directly-connected BGP neighbors. While the news of withdrawal may travel quickly from AS13335 to the other networks, news may travel more quickly to some of the neighbors than others. This means that until everyone has received and processed the withdrawal, networks may try routing through one another to reach the 2001:db8::/48 prefix – even after AS13335 has withdrawn it.

_{Diagram 2: 2001:db8::/48 route withdrawn via AS13335}

Imagine AS64501 is a little slower than the rest – perhaps due to using older hardware, hardware being overloaded, a software bug, specific configuration settings, poor luck, or some other factor – and still has not processed the withdrawal of the /48. This in itself could be a BGP zombie, since the route is stuck for a small period. Our pings toward 2001:db8::1 are never able to actually reach AS64511, because AS13335 knows the /48 is meant to be withdrawn, but some routers carrying a full table have not yet converged upon that result.

The length of time spent path hunting is amplified by something called the Minimum Route Advertisement Interval (MRAI). The MRAI specifies the minimum amount of time between BGP advertisement messages from a BGP router, meaning it introduces a purposeful number of seconds of delay between each BGP advertisement update. RFC4271 recommends an MRAI value of 30-seconds for eBGP updates, and while this can cut down on the chattiness of BGP, or even potential oscillation of updates, it also makes path hunting take longer.

At the next cycle of path hunting, even AS64501, which was previously still pointing toward a nonexistent /48 route from AS13335, should find the /32 advertisement is all that is left toward 2001:db8::1. Once it has done so, the traffic flow will become the following:

_{Diagram 3: Routing fallback to 2001:db8::/32 and 2001:db8::/48 is gone from DFZ}

This would mean BGP path hunting is over, and the Internet has realized that the 2001:db8::/32 is the best route available toward 2001:db8::1, and that 2001:db8::/48 is really gone. While in this example we’ve purposely made path hunting only last two cycles, in reality it can be far more, especially with how highly connected AS13335 is to thousands of peer networks and Tier-1’s globally.

Now that we’ve discussed BGP path hunting and how it works, you can probably already see how a BGP zombie outbreak can begin and how routing tables can become stuck for a lengthy period of time. Excessive BGP path hunting for a previously-known more-specific prefix can be an early indicator that a zombie could follow.

Spawning a zombie

Zombies have captured our attention more recently as they were noticed by some of our customers leveraging Bring-Your-Own-IP (BYOIP) on-demand advertisement for Magic Transit. BYOIP may be configured in two modes: "always-on", in which a prefix is continuously announced, or "on-demand", where a prefix is announced only when a customer chooses to. For some on-demand customers, announcement and withdrawal cycles may be a more frequent occurrence, which can lead to an increase in BGP zombies.

With that in mind and also knowing how path hunting works, let’s spawn our own zombie onto the Internet. To do so, we’ll take a spare block of IPv4 and IPv6 and announce them like so:

Once the routes are announced and stable, we’ll then proceed to withdraw the more specific routes advertised via Cloudflare globally. With a few quick clicks, we’ve successfully re-animated the dead.

Variant A: Ghoulish Gateways

One place zombies commonly occur is between upstream ISPs. When one router in a given ISP’s network is a little slower to update, routes can become stuck.

Take, for example, the following loop we observed between two of our upstream partners:

7. be2431.ccr31.sjc04.atlas.cogentco.com
8. tisparkle.sjc04.atlas.cogentco.com
9. 213.144.177.184
10. 213.144.177.184
11. 89.221.32.227
12. (waiting for reply)
13. be2749.rcr71.goa01.atlas.cogentco.com
14. be3219.ccr31.mrs02.atlas.cogentco.com
15. be2066.agr21.mrs02.atlas.cogentco.com
16. telecomitalia.mrs02.atlas.cogentco.com
17. 213.144.177.186
18. 89.221.32.227

Or this loop - observed on the same withdrawal test - between two different providers:

15. if-bundle-12-2.qcore2.pvu-paris.as6453.net
16. if-bundle-56-2.qcore1.fr0-frankfurt.as6453.net
17. if-bundle-15-2.qhar1.fr0-frankfurt.as6453.net
18. 195.219.223.11
19. 213.144.177.186
20. 195.22.196.137
21. 213.144.177.186
22. 195.22.196.137

Variant B: Undead LAN (Local Area Network)

Simultaneously, zombies can occur entirely within a given network. When a route is withdrawn from Cloudflare’s network, each device in our network must individually begin the process of withdrawing the route. While this is generally a smooth process, things can still become stuck.

Take, for instance, a situation where one router inside of our network has not yet fully processed the withdrawal. Connectivity partners will continue routing traffic towards that router (as they have not yet received the withdrawal) while no host remains behind the router which is capable of actually processing the traffic. The result is an internal-only looping path:

10. 192.0.2.112
11. 192.0.2.113
12. 192.0.2.112
13. 192.0.2.113
14. 192.0.2.112
15. 192.0.2.113
16. 192.0.2.112
17. 192.0.2.113
18. 192.0.2.112
19. 192.0.2.113

Unlike most fictionally-depicted hoards of the walking dead, our highly-visible zombie has a limited lifetime in most major networks – in this instance, only around around 6 minutes, after which most had re-converged around the less-specific as the best path. Sadly, this is on the shorter side – in some cases, we have seen long-lived zombies cause reachability issues for more than 10 minutes. It’s safe to say this is longer than most network operators would expect BGP convergence to take in a normal situation.

But, you may ask – is this the excessive path hunting we talked about earlier, or a BGP zombie? Really, it depends on the expectation and tolerance around how long BGP convergence should take to process the prefix withdrawal. In any case, even over 30 minutes after our withdrawal of our more-specific prefix, we are able to see zombie routes in the route-views public collectors easily:

~ % monocle search --start-ts 2025-10-28T12:40:13Z --end-ts 2025-10-28T13:00:13Z --prefix 198.18.0.0/24
A|1761656125.550447|206.82.105.116|54309|198.18.0.0/24|54309 13335 395747|IGP|206.82.104.31|0|0|54309:111|false|||route-views.ny

You might argue that six to eleven minutes (or more) is a reasonable time for worst-case BGP convergence in the Tier-1 network layer, though that itself seems like a stretch. Even setting that aside, our data shows that very real BGP zombies exist in the global routing table, and they will negatively impact traffic. Curiously, we observed the path hunting delay is worse on IPv4, with the longest observed IPv6 impact in major (Tier-1) networks being just over 4 minutes. One could speculate this is in part due to the much higher number of IPv4 prefixes in the Internet global routing table than the IPv6 global table, and how BGP speakers handle them separately.

_{Source: RIPEstat’s BGPlay}

Part of the delay appears to originate from how interconnected AS13335 is; being heavily peered with a large portion of the Internet increases the likelihood of a route becoming stuck in a given location. Given that, perhaps a zombie would be shorter-lived if we operated in the opposite direction: announcing a less-specific persistently to 13335 and announcing more specifics via our local ISP during normal operation. Since the withdrawal will come from what is likely a less well-peered network, the time-to-convergence may be shorter:

Indeed, as predicted, we still get a stuck route, and it only lives for around 20 seconds in the Tier-1 network layer:

19. be12488.ccr42.ams03.atlas.cogentco.com
20. 38.88.214.142
21. be2020.ccr41.ams03.atlas.cogentco.com
22. 38.88.214.142
23. (waiting for reply)
24. 38.88.214.142
25. (waiting for reply)
26. 38.88.214.142

Unfortunately, that 20 seconds is still an impactful 20 seconds - while better, it’s not where we want to be. The exact length of time will depend on the native ISP networks one is connected with, and it could certainly ease into the minutes worth of stuck routing.

In both cases, the initial time-to-announce yielded no loss, nor was a zombie created, as both paths remained valid for the entirety of their initial lifetime. Zombies were only created when a more specific prefix was fully withdrawn. A newly-announced route is not subject to path hunting in the same way a withdrawn more-specific route is. As they say, good (new) news travels fast.

Lessening the zombie outbreak

Our findings lead us to believe that the withdrawal of a more-specific prefix may lead to zombies running rampant for longer periods of time. Because of this, we are exploring some improvements that make the consequences of BGP zombie routing less impactful for our customers relying on our on-demand BGP functionality.

For the traffic that does reach Cloudflare with stuck routes, we will introduce some BGP traffic forwarding improvements internally that allow for a more graceful withdrawal of traffic, even if routes are erroneously pointing toward us. In many ways, this will closely resemble the BGP well-known no-export community’s functionality from our servers running BGP. This means even if we receive traffic from external parties due to stuck routing, we will still have the opportunity to deliver traffic to our far-end customers over a tunneled connection or via a Cloudflare Network Interconnect (CNI). We look forward to reporting back the positive impact after making this improvement for a more graceful draining of traffic by default.

For the traffic that does not reach Cloudflare’s edge, and instead loops between network providers, we need to use a different approach. Since we know more-specific to less-specific prefix routing fallback is more prone to BGP zombie outbreak, we are encouraging customers to instead use a multi-step draining process when they want traffic drained from the Cloudflare edge for an on-demand prefix without introducing route loops or blackhole events. The draining process when removing traffic for a BYOIP prefix from Cloudflare should look like this:

The customer is already announcing an example prefix from Cloudflare, ex. 198.18.0.0/24
The customer begins natively announcing the prefix 198.18.0.0/24 (i.e. the same-length as the prefix they are advertising via Cloudflare) from their network to the Internet Service Providers that they wish to fail over traffic to.
After a few minutes, the customer signals BGP withdrawal from Cloudflare for the 198.18.0.0/24 prefix.

The result is a clean cut over: impactful zombies are avoided because the same-length prefix (198.18.0.0/24) remains in the global routing table. Excessive path hunting is avoided because instead of routers needing to aggressively seek out a missing more-specific prefix match, they can fallback to the same-length announcement that persists in the routing table from the natively-originated path to the customer’s network.

_{Source: RIPEstat’s BGPlay}

What next?

We are going to continue to refine our methods of measuring BGP zombies, so you can look forward to more insights in the future. There is also work from others in the community around zombie measurement that is interesting and producing useful data. In terms of combatting the software bugs around BGP zombie creation, routing vendors should implement RFC9687, the BGP SendHoldTimer. The general idea is that a local router can detect via the SendHoldTimer if the far-end router stops processing BGP messages unexpectedly, which lowers the possibility of zombies becoming stuck for long periods of time.

In addition, it’s worth keeping in mind our observations made in this post about more-specific prefix announcements and excessive path hunting. If as a network operator you rely on more-specific BGP prefix announcements for failover, or for traffic engineering, you need to be aware that routes could become stuck for a longer period of time before full BGP convergence occurs.

If you’re interested in problems like BGP zombies, consider coming to work at Cloudflare or applying for an internship. Together we can help build a better Internet!

Making the Internet observable: the evolution of Cloudflare Radar

David Belson — Mon, 27 Oct 2025 12:00:00 GMT

The Internet is constantly changing in ways that are difficult to see. How do we measure its health, spot new threats, and track the adoption of new technologies? When we launched Cloudflare Radar in 2020, our goal was to illuminate the Internet's patterns, helping anyone understand what was happening from a security, performance, and usage perspective, based on aggregated data from Cloudflare services. From the start, Internet measurement, transparency, and resilience has been at the core of our mission.

The launch blog post noted, “There are three key components that we’re launching today: Radar Internet Insights, Radar Domain Insights and Radar IP Insights.” These components have remained at the core of Radar, and they have been continuously expanded and complemented by other data sets and capabilities to support that mission. By shining a brighter light on Internet security, routing, traffic disruptions, protocol adoption, DNS, and now AI, Cloudflare Radar has become an increasingly comprehensive source of information and insights. And despite our expanding scope, we’ve focused on maintaining Radar’s “easy access” by evolving our information architecture, making our search capabilities more powerful, and building everything on top of a powerful, publicly-accessible API.

Now more than ever, Internet observability matters. New protocols and use cases compete with new security threats. Connectivity is threatened not only by errant construction equipment, but also by governments practicing targeted content blocking. Cloudflare Radar is uniquely positioned to provide actionable visibility into these trends, threats, and events with local, network, and global level insights, spanning multiple data sets. Below, we explore some highlights of Radar’s evolution over the five years since its launch, looking at how Cloudflare Radar is building one of the industry’s most comprehensive views of what is happening on the Internet.

Making Internet security more transparent

The Cloudflare Research team takes a practical approach to research, tackling projects that have the potential to make a big impact. A number of these projects have been in the security space, and for three of them, we’ve collaborated to bring associated data sets to Radar, highlighting the impact of these projects.

The 2025 launch of the Certificate Transparency (CT) section on Radar was the culmination of several months of collaborative work to expand visibility into key metrics for the Certificate Transparency ecosystem, enabling us to deprecate the original Merkle Town CT dashboard, which was launched in 2018. Digital certificates are the foundation of trust on the modern Internet, and Certificate Authorities (CAs) serve as trusted gatekeepers, issuing those certificates, with CT logs providing a public, auditable record of every certificate issued, making it possible to detect fraudulent or mis-issued certificates. The information available in the new CT section allows users to explore information about these certificates and CAs, as well as about the CT logs that capture information about every issued certificate.

In 2024, members of Cloudflare’s Research team collaborated with outside researchers to publish a paper titled “Global, Passive Detection of Connection Tampering”. Among the findings presented in the paper, it noted that globally, about 20% of all connections to Cloudflare close unexpectedly before any useful data exchange occurs. This unexpected closure is consistent with connection tampering by a third party, which may occur, for instance, when repressive governments seek to block access to websites or applications. Working with the Research team, we added visibility into TCP resets and timeouts to the Network Layer Security page on Radar. This graph, such as the example below for Turkmenistan, provides a perspective on potential connection tampering activity globally, and at a country level. Changes and trends visible in this graph can be used to corroborate reports of content blocking and other local restrictions on Internet connectivity.

The research team has been working on post-quantum encryption since 2017, racing improvements in quantum computing to help ensure that today’s encrypted data and communications are resistant to being decrypted in the future. They have led the drive to incorporate post-quantum encryption across Cloudflare’s infrastructure and services, and in 2023 we announced that it would be included in our delivery services, available to everyone and free of charge, forever. However, to take full advantage, support is needed on the client side as well, so to track that, we worked together to add a graph to Radar’s Adoption & Usage page that tracks the post-quantum encrypted share of HTTPS request traffic. Starting 2024 at under 3%, it has grown to just over 47%, thanks to major browsers and code libraries activating post-quantum support by default.

Measuring AI bot & crawler activity

The rapid proliferation and growth of AI platforms since the launch of OpenAI’s ChatGPT in November 2022 has upended multiple industries. This is especially true for content creators. Over the last several decades, they generally allowed their sites to be crawled in exchange for the traffic that the search engines would send back to them — traffic that could be monetized in various ways. However, two developments have changed this dynamic. First, AI platforms began aggressively crawling these sites to vacuum up content to use for training their models (with no compensation to content creators). Second, search engines have evolved into answer engines, drastically reducing the amount of traffic they send back to sites. This has led content owners to demand solutions.

Among these solutions is providing customers with increased visibility into how frequently AI crawlers are scraping their content, and Radar has built on that to provide aggregated perspectives on this activity. Radar’s AI Insights page provides graphs based on crawling traffic, including traffic trends by bot and traffic trends by crawl purpose, both of which can be broken out by industry set as well. Customers can compare the traffic trends we show on the dashboard with trends across their industry.

One key insight is the crawl-to-refer ratio: a measure of how many HTML pages a crawler consumes in comparison to the number of page visits that they refer back to the crawled site. A view into these ratios by platform, and how they change over time, gives content creators insight into just how significant the reciprocal traffic imbalances are, and the impact of the ongoing transition of search engines into answer engines.

Over the three decades, the humble robots.txt file has served as something of a gatekeeper for websites, letting crawlers know if they are allowed to access content on the site, and if so, which content. Well-behaved crawlers read and parse the file, and adjust their crawling activity accordingly. Based on the robots.txt files found across Radar’s top 10,000 domains, Radar’s AI Insights page shows how many of these sites explicitly allow or disallow these AI crawlers to access content, and how complete that access/restriction is. With the ability to filter the data by domain category, this graph can provide site owners with visibility into how their peers may be dealing with these AI crawlers.

Improving Internet resilience with routing visibility

Routing is the process of selecting a path across one or more networks, and in the context of the Internet, routing selects the paths for Internet Protocol (IP) packets to travel from their origin to their destination. It is absolutely critical to the functioning of the Internet, but lots of things can go wrong, and when they do, they can take a whole network offline. (And depending on the network, a larger blast radius of sites, applications, and other service providers may be impacted.

Routing visibility provides insights into the health of a network, and its relationship to other networks. These insights can help identify or troubleshoot problems when they occur. Among the more significant things that can go wrong are route leaks and origin hijacks. Route leaks occur when a routing announcement propagates beyond its intended scope — that is, when the announcement reaches networks that it shouldn’t. An origin hijack occurs when an attacker creates fake announcements for a targeted prefix, falsely identifying an autonomous systems (AS) under their control as the origin of the prefix — in other words, the attacker claims that their network is responsible for a given set of IP addresses, which would cause traffic to those addresses to be routed to them.

In 2022 and 2023 respectively, we added route leak and origin hijack detection to Radar, providing network operators and other interested groups (such as researchers) with information to help identify which networks may be party to such events, whether as a leaker/hijacker, or a victim. And perhaps more importantly, in 2023 we also launched notifications for route leaks and origin hijacks, automatically notifying subscribers via email or webhook when such an event is detected, enabling them to take immediate action.

In 2025, we further improved this visibility by adding two additional capabilities. The first was real-time BGP route visibility, which illustrates how a given network prefix is connected to other networks — what is the route that packets take to get from that set of IP addresses to the large “tier 1” network providers? Network administrators can use this information when facing network outages, implementing new deployments, or investigating route leaks.

An AS-SET is a grouping of related networks, historically used for multiple purposes such as grouping together a list of downstream customers of a particular network provider. Our recently announced AS-SET monitoring enables network operators to monitor valid and invalid AS-SET memberships for their networks, which can help prevent misuse and issues like route leaks.

Not just pretty pictures

While Radar has been historically focused on providing clear, informative visualizations, we have also launched capabilities that enable users to get at the underlying data more directly, enabling them to use it in a more programmatic fashion. The most important one is the Radar API, launched in 2022. Requiring just an access token, users can get access to all the data shown on Radar, as well as some more advanced filters that provide more specific data, enabling them to incorporate Radar data into their own tools, websites, and applications. The example below shows a simple API call that returns the global distribution of human and bot traffic observed over the last seven days.

curl -X 'GET' \
'https://api.cloudflare.com/client/v4/radar/http/summary/bot_class?name=main&dateRange=1d' \
-H 'accept: application/json' \
-H 'Authorization: Bearer $TOKEN'

{
  "success": true,
  "errors": [],
  "result": {
    "main": {
      "human": "72.520636",
      "bot": "27.479364"
    },
    "meta": {
      "dateRange": [
        {
          "startTime": "2025-10-19T19:00:00Z",
          "endTime": "2025-10-20T19:00:00Z"
        }
      ],
      "confidenceInfo": {
        "level": null,
        "annotations": []
      },
      "normalization": "PERCENTAGE",
      "lastUpdated": "2025-10-20T19:45:00Z",
      "units": [
        {
          "name": "*",
          "value": "requests"
        }
      ]
    }
  }
}

The Model Context Protocol is a standard way to make information available to large language models (LLMs). Somewhat similar to the way an application programming interface (API) works, MCP offers a documented, standardized way for a computer program to integrate services from an external source. It essentially allows AI programs to exceed their training, enabling them to incorporate new sources of information into their decision-making and content generation, and helps them connect to external tools. The Radar MCP server allows MCP clients to gain access to Radar data and tools, enabling exploration using natural language queries.

Radar’s URL Scanner has proven to be one of its most popular tools, scanning millions of sites since launching in 2023. It allows users to safely determine whether a site may contain malicious content, as well as providing information on technologies used and insights into the site’s headers, cookies, and links. In addition to being available on Radar, it is also accessible through the API and MCP server.

Finally, Radar’s user interface has seen a number of improvements over the last several years, in service of improved usability and a better user experience. As new data sets and capabilities are launched, they are added to the search bar, allowing users to search not only for countries and ASNs, but also IP address prefixes, certificate authorities, bot names, IP addresses, and more. Initially launching with just a few default date ranges (such as last 24 hours, last 7 days, etc.), we’ve expanded the number of default options, as well as enabling the user to select custom date ranges of up to one year in length. And because the Internet is global, Radar should be too. In 2024, we launched internationalized versions of Radar, marking availability of the site in 14 languages/dialects, including downloaded and embedded content.

This is a sampling of the updates and enhancements that we have made to Radar over the last five years in support of Internet measurement, transparency, and resilience. These individual data sets and tools combine to provide one of the most comprehensive views of the Internet available. And we’re not close to being done. We’ll continue to bring additional visibility to the unseen ways that the Internet is changing by adding more tools, data sets, and visualizations, to help users answer more questions in areas including AI, performance, adoption and usage, and security.

Visit radar.cloudflare.com to explore all the great data sets, capabilities, and tools for yourself, and to use the Radar API or MCP server to incorporate Radar data into your own tools, sites, and applications. Keep an eye on the Radar changelog feed, Radar release notes, and the Cloudflare blog for news about the latest changes and launches, and don’t hesitate to reach out to us with feedback, suggestions, and feature requests.

How Cloudflare’s systems dynamically route traffic across the globe

David Tuber — Mon, 25 Sep 2023 13:00:16 GMT

Picture this: you’re at an airport, and you’re going through an airport security checkpoint. There are a bunch of agents who are scanning your boarding pass and your passport and sending you through to your gate. All of a sudden, some of the agents go on break. Maybe there’s a leak in the ceiling above the checkpoint. Or perhaps a bunch of flights are leaving at 6pm, and a number of passengers turn up at once. Either way, this imbalance between localized supply and demand can cause huge lines and unhappy travelers — who just want to get through the line to get on their flight. How do airports handle this?

Some airports may not do anything and just let you suffer in a longer line. Some airports may offer fast-lanes through the checkpoints for a fee. But most airports will tell you to go to another security checkpoint a little farther away to ensure that you can get through to your gate as fast as possible. They may even have signs up telling you how long each line is, so you can make an easier decision when trying to get through.

At Cloudflare, we have the same problem. We are located in 300 cities around the world that are built to receive end-user traffic for all of our product suites. And in an ideal world, we always have enough computers and bandwidth to handle everyone at their closest possible location. But the world is not always ideal; sometimes we take a data center offline for maintenance, or a connection to a data center goes down, or some equipment fails, and so on. When that happens, we may not have enough attendants to serve every person going through security in every location. It’s not because we haven’t built enough kiosks, but something has happened in our data center that prevents us from serving everyone.

So, we built Traffic Manager: a tool that balances supply and demand across our entire global network. This blog is about Traffic Manager: how it came to be, how we built it, and what it does now.

The world before Traffic Manager

The job now done by Traffic Manager used to be a manual process carried out by network engineers: our network would operate as normal until something happened that caused user traffic to be impacted at a particular data center.

When such events happened, user requests would start to fail with 499 or 500 errors because there weren’t enough machines to handle the request load of our users. This would trigger a page to our network engineers, who would then remove some Anycast routes for that data center. The end result: by no longer advertising those prefixes in the impacted data center, user traffic would divert to a different data center. This is how Anycast fundamentally works: user traffic is drawn to the closest data center advertising the prefix the user is trying to connect to, as determined by Border Gateway Protocol. For a primer on what Anycast is, check out this reference article.

Depending on how bad the problem was, engineers would remove some or even all the routes in a data center. When the data center was again able to absorb all the traffic, the engineers would put the routes back and the traffic would return naturally to the data center.

As you might guess, this was a challenging task for our network engineers to do every single time any piece of hardware on our network had an issue. It didn’t scale.

Never send a human to do a machine’s job

But doing it manually wasn’t just a burden on our Network Operations team. It also resulted in a sub-par experience for our customers; our engineers would need to take time to diagnose and re-route traffic. To solve both these problems, we wanted to build a service that would immediately and automatically detect if users were unable to reach a Cloudflare data center, and withdraw routes from the data center until users were no longer seeing issues. Once the service received notifications that the impacted data center could absorb the traffic, it could put the routes back and reconnect that data center. This service is called Traffic Manager, because its job (as you might guess) is to manage traffic coming into the Cloudflare network.

Accounting for second order consequences

When a network engineer removes a route from a router, they can make the best guess at where the user requests will move to, and try to ensure that the failover data center has enough resources to handle the requests — if it doesn’t, they can adjust the routes there accordingly prior to removing the route in the initial data center. To be able to automate this process, we needed to move from a world of intuition to a world of data — accurately predicting where traffic would go when a route was removed, and feeding this information to Traffic Manager, so it could ensure it doesn’t make the situation worse.

Meet Traffic Predictor

Although we can adjust which data centers advertise a route, we are unable to influence what proportion of traffic each data center receives. Each time we add a new data center, or a new peering session, the distribution of traffic changes, and as we are in over 300 cities and 12,500 peering sessions, it has become quite difficult for a human to keep track of, or predict the way traffic will move around our network. Traffic manager needed a buddy: Traffic Predictor.

In order to do its job, Traffic Predictor carries out an ongoing series of real world tests to see where traffic actually moves. Traffic Predictor relies on a testing system that simulates removing a data center from service and measuring where traffic would go if that data center wasn’t serving traffic. To help understand how this system works, let’s simulate the removal of a subset of a data center in Christchurch, New Zealand:

First, Traffic Predictor gets a list of all the IP addresses that normally connect to Christchurch. Traffic Predictor will send a ping request to hundreds of thousands of IPs that have recently made a request there.
Traffic Predictor records if the IP responds, and whether the response returns to Christchurch using a special Anycast IP range specifically configured for Traffic Predictor.
Once Traffic Predictor has a list of IPs that respond to Christchurch, it removes that route containing that special range from Christchurch, waits a few minutes for the Internet routing table to be updated, and runs the test again.
Instead of being routed to Christchurch, the responses are instead going to data centers around Christchurch. Traffic Predictor then uses the knowledge of responses for each data center, and records the results as the failover for Christchurch.

This allows us to simulate Christchurch going offline without actually taking Christchurch offline!

But Traffic Predictor doesn’t just do this for any one data center. To add additional layers of resiliency, Traffic Predictor even calculates a second layer of indirection: for each data center failure scenario, Traffic Predictor also calculates failure scenarios and creates policies for when surrounding data centers fail.

Using our example from before, when Traffic Predictor tests Christchurch, it will run a series of tests that remove several surrounding data centers from service including Christchurch to calculate different failure scenarios. This ensures that even if something catastrophic happens which impacts multiple data centers in a region, we still have the ability to serve user traffic. If you think this data model is complicated, you’re right: it takes several days to calculate all of these failure paths and policies.

Here’s what those failure paths and failover scenarios look like for all of our data centers around the world when they’re visualized:

This can be a bit complicated for humans to parse, so let’s dig into that above scenario for Christchurch, New Zealand to make this a bit more clear. When we take a look at failover paths specifically for Christchurch, we see they look like this:

In this scenario we predict that 99.8% of Christchurch’s traffic would shift to Auckland, which is able to absorb all Christchurch traffic in the event of a catastrophic outage.

Traffic Predictor allows us to not only see where traffic will move to if something should happen, but it allows us to preconfigure Traffic Manager policies to move requests out of failover data centers to prevent a thundering herd scenario: where sudden influx of requests can cause failures in a second data center if the first one has issues. With Traffic Predictor, Traffic Manager doesn’t just move traffic out of one data center when that one fails, but it also proactively moves traffic out of other data centers to ensure a seamless continuation of service.

From a sledgehammer to a scalpel

With Traffic Predictor, Traffic Manager can dynamically advertise and withdraw prefixes while ensuring that every datacenter can handle all the traffic. But withdrawing prefixes as a means of traffic management can be a bit heavy-handed at times. One of the reasons for this is that the only way we had to add or remove traffic to a data center was through advertising routes from our Internet-facing routers. Each one of our routes has thousands of IP addresses, so removing only one still represents a large portion of traffic.

Specifically, Internet applications will advertise prefixes to the Internet from a /24 subnet at an absolute minimum, but many will advertise prefixes larger than that. This is generally done to prevent things like route leaks or route hijacks: many providers will actually filter out routes that are more specific than a /24 (for more information on that, check out this blog on how we detect route leaks). If we assume that Cloudflare maps protected properties to IP addresses at a 1:1 ratio, then each /24 subnet would be able to service 256 customers, which is the number of IP addresses in a /24 subnet. If every IP address sent one request per second, we’d have to move 4 /24 subnets out of a data center if we needed to move 1,000 requests per second (RPS).

But in reality, Cloudflare maps a single IP address to hundreds of thousands of protected properties. So for Cloudflare, a /24 might take 3,000 requests per second, but if we needed to move 1,000 RPS out, we would have no choice but to move a single /24 out. And that’s just assuming we advertise at a /24 level. If we used /20s to advertise, the amount we can withdraw gets less granular: at a 1:1 website to IP address mapping, that’s 4,096 requests per second for each prefix, and even more if the website to IP address mapping is many to one.

While withdrawing prefix advertisements improved the customer experience for those users who would have seen a 499 or 500 error — there may have been a significant portion of users who wouldn’t have been impacted by an issue who still were moved away from the data center they should have gone to, probably slowing them down, even if only a little bit. This concept of moving more traffic out than is necessary is called “stranding capacity”: the data center is theoretically able to service more users in a region but cannot because of how Traffic Manager was built.

We wanted to improve Traffic Manager so that it only moved the absolute minimum of users out of a data center that was seeing a problem and not strand any more capacity. To do so, we needed to shift percentages of prefixes, so we could be extra fine-grained and only move the things that absolutely need to be moved. To solve this, we built an extension of our Layer 4 load balancer Unimog, which we call Plurimog.

A quick refresher on Unimog and layer 4 load balancing: every single one of our machines contains a service that determines whether that machine can take a user request. If the machine can take a user request then it sends the request to our HTTP stack which processes the request before returning it to the user. If the machine can’t take the request, the machine sends the request to another machine in the data center that can. The machines can do this because they are constantly talking to each other to understand whether they can serve requests for users.

Plurimog does the same thing, but instead of talking between machines, Plurimog talks in between data centers and points of presence. If a request goes into Philadelphia and Philadelphia is unable to take the request, Plurimog will forward to another data center that can take the request, like Ashburn, where the request is decrypted and processed. Because Plurimog operates at layer 4, it can send individual TCP or UDP requests to other places which allows it to be very fine-grained: it can send percentages of traffic to other data centers very easily, meaning that we only need to send away enough traffic to ensure that everyone can be served as fast as possible. Check out how that works in our Frankfurt data center, as we are able to shift progressively more and more traffic away to handle issues in our data centers. This chart shows the number of actions taken on free traffic that cause it to be sent out of Frankfurt over time:

But even within a data center, we can route traffic around to prevent traffic from leaving the datacenter at all. Our large data centers, called Multi-Colo Points of Presence (MCPs) contain logical sections of compute within a data center that are distinct from one another. These MCP data centers are enabled with another version of Unimog called Duomog, which allows for traffic to be shifted between logical sections of compute automatically. This makes MCP data centers fault-tolerant without sacrificing performance for our customers, and allows Traffic Manager to work within a data center as well as between data centers.

When evaluating portions of requests to move, Traffic Manager does the following:

Traffic Manager identifies the proportion of requests that need to be removed from a data center or subsection of a data center so that all requests can be served.
Traffic Manager then calculates the aggregated space metrics for each target to see how many requests each failover data center can take.
Traffic Manager then identifies how much traffic in each plan we need to move, and moves either a proportion of the plan, or all of the plan through Plurimog/Duomog, until we've moved enough traffic. We move Free customers first, and if there are no more Free customers in a data center, we'll move Pro, and then Business customers if needed.

For example, let’s look at Ashburn, Virginia: one of our MCPs. Ashburn has nine different subsections of capacity that can each take traffic. On 8/28, one of those subsections, IAD02, had an issue that reduced the amount of traffic it could handle.

During this time period, Duomog sent more traffic from IAD02 to other subsections within Ashburn, ensuring that Ashburn was always online, and that performance was not impacted during this issue. Then, once IAD02 was able to take traffic again, Duomog shifted traffic back automatically. You can see these actions visualized in the time series graph below, which tracks the percentage of traffic moved over time between subsections of capacity within IAD02 (shown in green):

How does Traffic Manager know how much to move?

Although we used requests per second in the example above, using requests per second as a metric isn’t accurate enough when actually moving traffic. The reason for this is that different customers have different resource costs to our service; a website served mainly from cache with the WAF deactivated is much cheaper CPU wise than a site with all WAF rules enabled and caching disabled. So we record the time that each request takes in the CPU. We can then aggregate the CPU time across each plan to find the CPU time usage per plan. We record the CPU time in ms, and take a per second value, resulting in a unit of milliseconds per second.

CPU time is an important metric because of the impact it can have on latency and customer performance. As an example, consider the time it takes for an eyeball request to make it entirely through the Cloudflare front line servers: we call this the cfcheck latency. If this number goes too high, then our customers will start to notice, and they will have a bad experience. When cfcheck latency gets high, it’s usually because CPU utilization is high. The graph below shows 95th percentile cfcheck latency plotted against CPU utilization across all the machines in the same data center, and you can see the strong correlation:

So having Traffic Manager look at CPU time in a data center is a very good way to ensure that we’re giving customers the best experience and not causing problems.

After getting the CPU time per plan, we need to figure out how much of that CPU time to move to other data centers. To do this, we aggregate the CPU utilization across all servers to give a single CPU utilization across the data center. If a proportion of servers in the data center fail, due to network device failure, power failure, etc., then the requests that were hitting those servers are automatically routed elsewhere within the data center by Duomog. As the number of servers decrease, the overall CPU utilization of the data center increases. Traffic Manager has three thresholds for each data center; the maximum threshold, the target threshold, and the acceptable threshold:

Maximum: the CPU level at which performance starts to degrade, where Traffic Manager will take action
Target: the level to which Traffic Manager will try to reduce the CPU utilization to restore optimal service to users
Acceptable: the level below which a data center can receive requests forwarded from another data center, or revert active moves

When a data center goes above the maximum threshold, Traffic Manager takes the ratio of total CPU time across all plans to current CPU utilization, then applies that to the target CPU utilization to find the target CPU time. Doing it this way means we can compare a data center with 100 servers to a data center with 10 servers, without having to worry about the number of servers in each data center. This assumes that load increases linearly, which is close enough to true for the assumption to be valid for our purposes.

Target ratio equals current ratio:

Therefore:

Subtracting the target CPU time from the current CPU time gives us the CPU time to move:

For example, if the current CPU utilization was at 90% across the data center, the target was 85%, and the CPU time across all plans was 18,000, we would have:

This would mean Traffic Manager would need to move 1,000 CPU time:

Now we know the total CPU time needed to move, we can go through the plans, until the required time to move has been met.

What is the maximum threshold?

A frequent problem that we faced was determining at which point Traffic Manager should start taking action in a data center - what metric should it watch, and what is an acceptable level?

As said before, different services have different requirements in terms of CPU utilization, and there are many cases of data centers that have very different utilization patterns.

To solve this problem, we turned to machine learning. We created a service that will automatically adjust the maximum thresholds for each data center according to customer-facing indicators. For our main service-level indicator (SLI), we decided to use the cfcheck latency metric we described earlier.

But we also need to define a service-level objective (SLO) in order for our machine learning application to be able to adjust the threshold. We set the SLO for 20ms. Comparing our SLO to our SLI, our 95th percentile cfcheck latency should never go above 20ms and if it does, we need to do something. The below graph shows 95th percentile cfcheck latency over time, and customers start to get unhappy when cfcheck latency goes into the red zone:

If customers have a bad experience when CPU gets too high, then the goal of Traffic Manager’s maximum thresholds are to ensure that customer performance isn’t impacted and to start redirecting traffic away before performance starts to degrade. At a scheduled interval the Traffic Manager service will fetch a number of metrics for each data center and apply a series of machine learning algorithms. After cleaning the data for outliers we apply a simple quadratic curve fit, and we are currently testing a linear regression algorithm.

After fitting the models we can use them to predict the CPU usage when the SLI is equal to our SLO, and then use it as our maximum threshold. If we plot the cpu values against the SLI we can see clearly why these methods work so well for our data centers, as you can see for Barcelona in the graphs below, which are plotted against curve fit and linear regression fit respectively.

In these charts the vertical line is the SLO, and the intersection of this line with the fitted model represents the value that will be used as the maximum threshold. This model has proved to be very accurate, and we are able to significantly reduce the SLO breaches. Let’s take a look at when we started deploying this service in Lisbon:

Before the change, cfcheck latency was constantly spiking, but Traffic Manager wasn’t taking actions because the maximum threshold was static. But after July 29, we see that cfcheck latency has never hit the SLO because we are constantly measuring to make sure that customers are never impacted by CPU increases.

Where to send the traffic?

So now that we have a maximum threshold, we need to find the third CPU utilization threshold which isn’t used when calculating how much traffic to move - the acceptable threshold. When a data center is below this threshold, it has unused capacity which, as long as it isn’t forwarding traffic itself, is made available for other data centers to use when required. To work out how much each data center is able to receive, we use the same methodology as above, substituting target for acceptable:

Therefore:

Subtracting the current CPU time from the acceptable CPU time gives us the amount of CPU time that a data center could accept:

To find where to send traffic, Traffic Manager will find the available CPU time in all data centers, then it will order them by latency from the data center needing to move traffic. It moves through each of the data centers, using all available capacity based on the maximum thresholds before moving onto the next. When finding which plans to move, we move from the lowest priority plan to highest, but when finding where to send them, we move in the opposite direction.

To make this clearer let's use an example:

We need to move 1,000 CPU time from data center A, and we have the following usage per plan: Free: 500ms/s, Pro: 400ms/s, Business: 200ms/s, Enterprise: 1000ms/s.

We would move 100% of Free (500ms/s), 100% of Pro (400ms/s), 50% of Business (100ms/s), and 0% of Enterprise.

Nearby data centers have the following available CPU time: B: 300ms/s, C: 300ms/s, D: 1,000ms/s.

With latencies: A-B: 100ms, A-C: 110ms, A-D: 120ms.

Starting with the lowest latency and highest priority plan that requires action, we would be able to move all the Business CPU time to data center B and half of Pro. Next we would move onto data center C, and be able to move the rest of Pro, and 20% of Free. The rest of Free could then be forwarded to data center D. Resulting in the following action: Business: 50% → B, Pro: 50% → B, 50% → C, Free: 20% → C, 80% → D.

Reverting actions

In the same way that Traffic Manager is constantly looking to keep data centers from going above the threshold, it is also looking to bring any forwarded traffic back into a data center that is actively forwarding traffic.

Above we saw how Traffic Manager works out how much traffic a data center is able to receive from another data center — it calls this the available CPU time. When there is an active move we use this available CPU time to bring back traffic to the data center — we always prioritize reverting an active move over accepting traffic from another data center.

When you put this all together, you get a system that is constantly measuring system and customer health metrics for every data center and spreading traffic around to make sure that each request can be served given the current state of our network. When we put all of these moves between data centers on a map, it looks something like this, a map of all Traffic Manager moves for a period of one hour. This map doesn’t show our full data center deployment, but it does show the data centers that are sending or receiving moved traffic during this period:

Data centers in red or yellow are under load and shifting traffic to other data centers until they become green, which means that all metrics are showing as healthy. The size of the circles represent how many requests are shifted from or to those data centers. Where the traffic is going is denoted by where the lines are moving. This is difficult to see at a world scale, so let’s zoom into the United States to see this in action for the same time period:

Here you can see Toronto, Detroit, New York, and Kansas City are unable to serve some requests due to hardware issues, so they will send those requests to Dallas, Chicago, and Ashburn until equilibrium is restored for users and data centers. Once data centers like Detroit are able to service all the requests they are receiving without needing to send traffic away, Detroit will gradually stop forwarding requests to Chicago until any issues in the data center are completely resolved, at which point it will no longer be forwarding anything. Throughout all of this, end users are online and are not impacted by any physical issues that may be happening in Detroit or any of the other locations sending traffic.

Happy network, happy products

Because Traffic Manager is plugged into the user experience, it is a fundamental component of the Cloudflare network: it keeps our products online and ensures that they’re as fast and reliable as they can be. It’s our real time load balancer, helping to keep our products fast by only shifting necessary traffic away from data centers that are having issues. Because less traffic gets moved, our products and services stay fast.

But Traffic Manager can also help keep our products online and reliable because they allow our products to predict where reliability issues may occur and preemptively move the products elsewhere. For example, Browser Isolation directly works with Traffic Manager to help ensure the uptime of the product. When you connect to a Cloudflare data center to create a hosted browser instance, Browser Isolation first asks Traffic Manager if the data center has enough capacity to run the instance locally, and if so, the instance is created right then and there. If there isn’t sufficient capacity available, Traffic Manager tells Browser Isolation which the closest data center with sufficient available capacity is, thereby helping Browser Isolation to provide the best possible experience for the user.

Happy network, happy users

At Cloudflare, we operate this huge network to service all of our different products and customer scenarios. We’ve built this network for resiliency: in addition to our MCP locations designed to reduce impact from a single failure, we are constantly shifting traffic around on our network in response to internal and external issues.

But that is our problem — not yours.

Similarly, when human beings had to fix those issues, it was customers and end users who would be impacted. To ensure that you’re always online, we’ve built a smart system that detects our hardware failures and preemptively balances traffic across our network to ensure it’s online and as fast as possible. This system works faster than any person — not only allowing our network engineers to sleep at night — but also providing a better, faster experience for all of our customers.

And finally: if these kinds of engineering challenges sound exciting to you, then please consider checking out the Traffic Engineering team's open position on Cloudflare’s Careers page!

Routing information now on Cloudflare Radar

Mingwei Zhang — Thu, 27 Jul 2023 13:00:30 GMT

Routing is one of the most critical operations of the Internet. Routing decides how and where the Internet traffic should flow from the source to the destination, and can be categorized into two major types: intra-domain routing and inter-domain routing. Intra-domain routing handles making decisions on how individual packets should be routed among the servers and routers within an organization/network. When traffic reaches the edge of a network, the inter-domain routing kicks in to decide what the next hop is and forward the traffic along to the corresponding networks. Border Gateway Protocol (BGP) is the de facto inter-domain routing protocol used on the Internet.

Today, we are introducing another section on Cloudflare Radar: the Routing page, which focuses on monitoring the BGP messages exchanged to extract and present insights on the IP prefixes, individual networks, countries, and the Internet overall. The new routing data allows users to quickly examine routing status of the Internet, examine secure routing protocol deployment for a country, identify routing anomalies, validate IP block reachability and much more from globally distributed vantage points.

It’s a detailed view of how the Internet itself holds together.

Collecting routing statistics

The Internet consists of tens of thousands of interconnected organizations. Each organization manages its own internal networking infrastructure autonomously, and is referred to as an autonomous system (AS). ASes establish connectivity among each other and exchange routing information via BGP messages to form the current Internet.

When we open the Radar Routing page the “Routing Statistics” block provides a quick glance on the sizes and status of an autonomous system (AS), a country, or the Internet overall. The routing statistics component contains the following count information:

The number of ASes on the Internet or registered from a given country;
The number of distinct prefixes and the routes toward them observed on the global routing table, worldwide, by country, or by AS;
The number of routes categorized by Resource Public Key Infrastructure (RPKI) validation results (valid, invalid, or unknown).

We also show the breakdown of these numbers for IPv4 and IPv6 separately, so users may have a better understanding of such information with respect to different IP protocols.

For a given network, we also show the BGP announcements volume chart for the past week as well as other basic information like network name, registration country, estimated user count, and sibling networks.

Identifying routing anomalies

BGP as a routing protocol suffers from a number of security weaknesses. In the new Routing page we consolidate the BGP route leaks and BGP hijacks detection results in one single place, showing the relevant detected events for any given network or globally.

The BGP Route Leaks table shows the detected BGP route leak events. Each entry in the table contains the information about the related ASes of the leak event, start and end time, as well as other numeric statistics that reflect the scale and impact of the event. The BGP Origin Hijacks table shows the detected potential BGP origin hijacks. Apart from the relevant ASes, time, and impact information, we also show the key evidence that we collected for each event to provide additional context on why and how likely one event being a BGP hijack.

With this release, we introduce another anomaly detection: RPKI Invalid Multiple Origin AS (MOAS) is one type of routing conflict where multiple networks (ASes) originate the same IP prefixes at the same time, which goes against the best practice recommendation. Our system examines the most recent global routing tables and identifies MOASes on the routing tables. With the help of Resource Public Key Infrastructure (RPKI), we can further identify MOAS events that have origins that were proven RPKI invalid, which are less likely to be legitimate cases. Users and operators can quickly identify such anomalies relevant to the networks of interest and take actions accordingly.

The Routing page will be the permanent home for all things BGP and routing data in the future; we will gradually introduce more anomaly detections and improve our pipeline to provide more security insights.

Examining routing assets and connectivity

Apart from examining the overall routing statistics and anomalies, we also gather information on the routing assets (IP prefixes for a network and networks for a country) and networks’ connectivity.

Tens of thousands autonomous systems (ASes) connect to each other to form the current Internet. The ASes differ in size and operate in different geolocations. Generally, larger networks are more well-connected and considered “upstream” and smaller networks are less connected and considered “downstream” on the Internet. Below is an example connectivity diagram showing how two smaller networks may connect to each other. AS1 announces its IP prefixes to its upstream providers and propagates upwards until it reaches the large networks AS3 and AS4, and then the route propagates downstream to smaller networks until it reaches AS6.

In the routing page, we examine what IP prefixes any given AS originates, as well as the interconnections among ASes. We show the full list of IP prefixes originated for any given AS, including the breakdown lists by RPKI validation status. We also show the detected connectivity among other ASes categorized into upstream, downstream and peering connections. Users can easily search for any ASes upstream, downstream, or peers.

For a given country, we show the full list of networks registered in the country, sorted by the number of IP prefixes originated from the corresponding networks. This allows users to quickly glance and find networks from any given country. The table is also searchable by network name or AS number.

Routing data API access

Like all the other data, the Cloudflare Radar Routing data is powered by our developer API. The data API is freely available under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license. In the following table, we list all of our data APIs available at launch. As we improve the routing section, we will introduce more APIs in the future.

API	Type
Get BGP origin hijack events	Anomaly detection
Get BGP route leak events	Anomaly detection
Get MOASes	Anomaly detection
Get BGP routing table stats	Routing information
Get prefix-to-origin mapping	Routing information
Get all ASes registered in a country	Routing information
Get AS-level relationship	Routing information

Example 1: lookup origin AS for a given prefix with cURL

The Cloudflare Radar prefix-to-origin mapping API returns the matching prefix-origin pairs observed on the global routing tables, allowing users to quickly examine the networks that originate a given prefix or listing all the prefixes a network originates.

In this example, we ask of “which network(s) originated the prefix 1.1.1.0/24?” using the following cURL command:

curl --request GET \
  --url "https://api.cloudflare.com/client/v4/radar/bgp/routes/pfx2as?prefix=1.1.1.0/24" \
  --header 'Content-Type: application/json' \
--header "Authorization: Bearer YOUR_TOKEN"

The returned JSON result shows that Cloudflare (AS13335) originates the prefix 1.1.1.0/24 and it is a RPKI valid origin. It also returns the meta information such as the UTC timestamp of the query as well as when the dataset is last updated (data_time field).

{
  "success": true,
  "errors": [],
  "result": {
    "prefix_origins": [
      {
        "origin": 13335,
        "peer_count": 82,
        "prefix": "1.1.1.0/24",
        "rpki_validation": "Valid"
      }
    ],
    "meta": {
      "data_time": "2023-07-24T16:00:00",
      "query_time": "2023-07-24T18:04:55",
      "total_peers": 82
    }
  }
}

Example 2: integrate Radar API into command-line tool

BGPKIT monocle is an open-source command-line application that provides multiple utility functions like searching BGP messages on public archives, network lookup by name, RPKI validation status for a given IP prefix, etc.

By integrating Cloudflare Radar APIs into monocle, users can now quickly lookup routing statistics or prefix-to-origin mapping by running monocle radar stats [QUERY] and monocle radar pfx2as commands.

➜  monocle radar stats   
┌─────────────┬─────────┬──────────┬─────────────────┬───────────────┬─────────────────┐
│ scope       │ origins │ prefixes │ rpki_valid      │ rpki_invalid  │ rpki_unknown    │
├─────────────┼─────────┼──────────┼─────────────────┼───────────────┼─────────────────┤
│ global      │ 81769   │ 1204488  │ 551831 (45.38%) │ 15652 (1.29%) │ 648462 (53.33%) │
├─────────────┼─────────┼──────────┼─────────────────┼───────────────┼─────────────────┤
│ global ipv4 │ 74990   │ 1001973  │ 448170 (44.35%) │ 11879 (1.18%) │ 550540 (54.48%) │
├─────────────┼─────────┼──────────┼─────────────────┼───────────────┼─────────────────┤
│ global ipv6 │ 31971   │ 202515   │ 103661 (50.48%) │ 3773 (1.84%)  │ 97922 (47.68%)  │
└─────────────┴─────────┴──────────┴─────────────────┴───────────────┴─────────────────┘

➜  monocle radar pfx2as 1.1.1.0/24
┌────────────┬─────────┬───────┬───────────────┐
│ prefix     │ origin  │ rpki  │ visibility    │
├────────────┼─────────┼───────┼───────────────┤
│ 1.1.1.0/24 │ as13335 │ valid │ high (98.78%) │
└────────────┴─────────┴───────┴───────────────┘

Visit Cloudflare Radar for additional insights around (Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, etc.). Follow us on social media at @CloudflareRadar (Twitter), https://noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via e-mail.

Why BGP communities are better than AS-path prepends

Tom Strickx — Thu, 24 Nov 2022 17:31:47 GMT

The Internet, in its purest form, is a loosely connected graph of independent networks (also called Autonomous Systems (AS for short)). These networks use a signaling protocol called BGP (Border Gateway Protocol) to inform their neighbors (also known as peers) about the reachability of IP prefixes (a group of IP addresses) in and through their network. Part of this exchange contains useful metadata about the IP prefix that are used to inform network routing decisions. One example of the metadata is the full AS-path, which consists of the different autonomous systems an IP packet needs to pass through to reach its destination.

As we all want our packets to get to their destination as fast as possible, selecting the shortest AS-path for a given prefix is a good idea. This is where something called prepending comes into play.

Routing on the Internet, a primer

Let's briefly talk about how the Internet works at its most fundamental level, before we dive into some nitty-gritty details.

The Internet is, at its core, a massively interconnected network of thousands of networks. Each network owns two things that are critical:

1. An Autonomous System Number (ASN): a 32-bit integer that uniquely identifies a network. For example, one of the Cloudflare ASNs (we have multiple) is 13335.

2. IP prefixes: An IP prefix is a range of IP addresses, bundled together in powers of two: In the IPv4 space, two addresses form a /31 prefix, four form a /30, and so on, all the way up to /0, which is shorthand for “all IPv4 prefixes''. The same applies for IPv6 but instead of aggregating 32 bits at most, you can aggregate up to 128 bits. The figure below shows this relationship between IP prefixes, in reverse -- a /24 contains two /25s that contains two /26s, and so on.

To communicate on the Internet, you must be able to reach your destination, and that’s where routing protocols come into play. They enable each node on the Internet to know where to send your message (and for the receiver to send a message back).

As mentioned earlier, these destinations are identified by IP addresses, and contiguous ranges of IP addresses are expressed as IP prefixes. We use IP prefixes for routing as an efficiency optimization: Keeping track of where to go for four billion (2³²) IP addresses in IPv4 would be incredibly complex, and require a lot of resources. Sticking to prefixes reduces that number down to about one million instead.

Now recall that Autonomous Systems are independently operated and controlled. In the Internet’s network of networks, how do I tell Source A in some other network that there is an available path to get to Destination B in (or through) my network? In comes BGP! BGP is the Border Gateway Protocol, and it is used to signal reachability information. Signal messages generated by the source ASN are referred to as ‘announcements’ because they declare to the Internet that IP addresses in the prefix are online and reachable.

Have a look at the figure above. Source A should now know how to get to Destination B through 2 different networks!

This is what an actual BGP message would look like:

BGP Message
    Type: UPDATE Message
    Path Attributes:
        Path Attribute - Origin: IGP
        Path Attribute - AS_PATH: 64500 64496
        Path Attribute - NEXT_HOP: 198.51.100.1
        Path Attribute - COMMUNITIES: 64500:13335
        Path Attribute - Multi Exit Discriminator (MED): 100
    Network Layer Reachability Information (NLRI):
        192.0.2.0/24

As you can see, BGP messages contain more than just the IP prefix (the NLRI bit) and the path, but also a bunch of other metadata that provides additional information about the path. Other fields include communities (more on that later), as well as MED, or origin code. MED is a suggestion to other directly connected networks on which path should be taken if multiple options are available, and the lowest value wins. The origin code can be one of three values: IGP, EGP or Incomplete. IGP will be set if you originate the prefix through BGP, EGP is no longer used (it’s an ancient routing protocol), and Incomplete is set when you distribute a prefix into BGP from another routing protocol (like IS-IS or OSPF).

Now that source A knows how to get to Destination B through two different networks, let's talk about traffic engineering!

Traffic engineering

Traffic engineering is a critical part of the day to day management of any network. Just like in the physical world, detours can be put in place by operators to optimize the traffic flows into (inbound) and out of (outbound) their network. Outbound traffic engineering is significantly easier than inbound traffic engineering because operators can choose from neighboring networks, even prioritize some traffic over others. In contrast, inbound traffic engineering requires influencing a network that is operated by someone else entirely. The autonomy and self-governance of a network is paramount, so operators use available tools to inform or shape inbound packet flows from other networks. The understanding and use of those tools is complex, and can be a challenge.

The available set of traffic engineering tools, both in- and outbound, rely on manipulating attributes (metadata) of a given route. As we’re talking about traffic engineering between independent networks, we’ll be manipulating the attributes of an EBGP-learned route. BGP can be split into two categories:

EBGP: BGP communication between two different ASNs
IBGP: BGP communication within the same ASN.

While the protocol is the same, certain attributes can be exchanged on an IBGP session that aren’t exchanged on an EBGP session. One of those is local-preference. More on that in a moment.

BGP best path selection

When a network is connected to multiple other networks and service providers, it can receive path information to the same IP prefix from many of those networks, each with slightly different attributes. It is then up to the receiving network of that information to use a BGP best path selection algorithm to pick the “best” prefix (and route), and use this to forward IP traffic. I’ve put “best” in quotation marks, as best is a subjective requirement. “Best” is frequently the shortest, but what can be best for my network might not be the best outcome for another network.

BGP will consider multiple prefix attributes when filtering through the received options. However, rather than combine all those attributes into a single selection criteria, BGP best path selection uses the attributes in tiers -- at any tier, if the available attributes are sufficient to choose the best path, then the algorithm terminates with that choice.

The BGP best path selection algorithm is extensive, containing 15 discrete steps to select the best available path for a given prefix. Given the numerous steps, it’s in the interest of the network to decide the best path as early as possible. The first four steps are most used and influential, and are depicted in the figure below as sieves.

Picking the shortest path possible is usually a good idea, which is why “AS-path length” is a step executed early on in the algorithm. However, looking at the figure above, “AS-path length” appears second, despite being the attribute to find the shortest path. So let’s talk about the first step: local preference.

Local preferenceLocal preference is an operator favorite because it allows them to handpick a route+path combination of their choice. It’s the first attribute in the algorithm because it is unique for any given route+neighbor+AS-path combination.

A network sets the local preference on import of a route (having learned about the route from a neighbor network). Being a non-transitive property, meaning that it’s an attribute that is never sent in an EBGP message to other networks. This intrinsically means, for example, that the operator of AS 64496 can’t set the local preference of routes to their own (or transiting) IP prefixes inside neighboring AS 64511. The inability to do so is partially why inbound traffic engineering through EBGP is so difficult.

Prepending artificially increases AS-path lengthSince no network is able to directly set the local preference for a prefix inside another network, the first opportunity to influence other networks’ choices is modifying the AS-path. If the next hops are valid, and the local preference for all the different paths for a given route are the same, modifying the AS-path is an obvious option to change the path traffic will take towards your network. In a BGP message, prepending looks like this:

BEFORE:

BGP Message
    Type: UPDATE Message
    Path Attributes:
        Path Attribute - Origin: IGP
        Path Attribute - AS_PATH: 64500 64496
        Path Attribute - NEXT_HOP: 198.51.100.1
        Path Attribute - COMMUNITIES: 64500:13335
        Path Attribute - Multi Exit Discriminator (MED): 100
    Network Layer Reachability Information (NLRI):
        192.0.2.0/24

AFTER:

BGP Message
    Type: UPDATE Message
    Path Attributes:
        Path Attribute - Origin: IGP
        Path Attribute - AS_PATH: 64500 64496 64496
        Path Attribute - NEXT_HOP: 198.51.100.1
        Path Attribute - COMMUNITIES: 64500:13335
        Path Attribute - Multi Exit Discriminator (MED): 100
    Network Layer Reachability Information (NLRI):
        192.0.2.0/24

Specifically, operators can do AS-path prepending. When doing AS-path prepending, an operator adds additional autonomous systems to the path (usually the operator uses their own AS, but that’s not enforced in the protocol). This way, an AS-path can go from a length of 1 to a length of 255. As the length has now increased dramatically, that specific path for the route will not be chosen. By changing the AS-path advertised to different peers, an operator can control the traffic flows coming into their network.

Unfortunately, prepending has a catch: To be the deciding factor, all the other attributes need to be equal. This is rarely true, especially in large networks that are able to choose from many possible routes to a destination.

Business Policy Engine

BGP is colloquially also referred to as a Business Policy Engine: it does not select the best path from a performance point of view; instead, and more often than not, it will select the best path from a business point of view. The business criteria could be anything from investment (port) efficiency to increased revenue, and more. This may sound strange but, believe it or not, this is what BGP is designed to do! The power (and complexity) of BGP is that it enables a network operator to make choices according to the operator’s needs, contracts, and policies, many of which cannot be reflected by conventional notions of engineering performance.

Different local preferences

A lot of networks (including Cloudflare) assign a local preference depending on the type of connection used to send us the routes. A higher value is a higher preference. For example, routes learned from transit network connections will get a lower local preference of 100 because they are the most costly to use; backbone-learned routes will be 150, Internet exchange (IX) routes get 200, and lastly private interconnect (PNI) routes get 250. This means that for egress (outbound) traffic, the Cloudflare network, by default, will prefer a PNI-learned route, even if a shorter AS-path is available through an IX or transit neighbor.

Part of the reason a PNI is preferred over an IX is reliability, because there is no third-party switching platform involved that is out of our control, which is important because we operate on the assumption that all hardware can and will eventually break. Another part of the reason is for port efficiency reasons. Here, efficiency is defined by cost per megabit transferred on each port. Roughly speaking, the cost is calculated by:

((cost_of_switch / port_count) + transceiver_cost)

which is combined with the cross-connect cost (might be monthly recurring (MRC), or a one-time fee). PNI is preferable because it helps to optimize value by reducing the overall cost per megabit transferred, because the unit price decreases with higher utilization of the port.

This reasoning is similar for a lot of other networks, and is very prevalent in transit networks. BGP is at least as much about cost and business policy, as it is about performance.

Transit local preference

For simplicity, when referring to transits, I mean the traditional tier-1 transit networks. Due to the nature of these networks, they have two distinct sets of network peers:

1. Customers (like Cloudflare)2. Settlement-free peers (like other tier-1 networks)

In normal circumstances, transit customers will get a higher local preference assigned than the local preference used for their settlement-free peers. This means that, no matter how much you prepend a prefix, if traffic enters that transit network, traffic will always land on your interconnection with that transit network, it will not be offloaded to another peer.

A prepend can still be used if you want to switch/offload traffic from a single link with one transit if you have multiple distinguished links with them, or if the source of traffic is multihomed behind multiple transits (and they don’t have their own local preference playbook preferring one transit over another). But inbound traffic engineering traffic away from one transit port to another through AS-path prepending has significant diminishing returns: once you’re past three prepends, it’s unlikely to change much, if anything, at that point.

Example

In the above scenario, no matter the adjustment Cloudflare makes in its AS-path towards AS 64496, the traffic will keep flowing through the Transit B <> Cloudflare interconnection, even though the path Origin A → Transit B → Transit A → Cloudflare is shorter from an AS-path point of view.

In this scenario, not a lot has changed, but Origin A is now multi-homed behind the two transit providers. In this case, the AS-path prepending was effective, as the paths seen on the Origin A side are both the prepended and non-prepended path. As long as Origin A is not doing any egress traffic engineering, and is treating both transit networks equally, then the path chosen will be Origin A → Transit A → Cloudflare.

Community-based traffic engineering

So we have now identified a pretty critical problem within the Internet ecosystem for operators: with the tools mentioned above, it’s not always (some might even say outright impossible) possible to accurately dictate paths traffic can ingress your own network, reducing the control an autonomous system has over its own network. Fortunately, there is a solution for this problem: community-based local preference.

Some transit providers allow their customers to influence the local preference in the transit network through the use of BGP communities. BGP communities are an optional transitive attribute for a route advertisement. The communities can be informative (“I learned this prefix in Rome”), but they can also be used to trigger actions on the receiving side. For example, Cogent publishes the following action communities:

Community	Local preference
174:10	10
174:70	70
174:120	120
174:125	125
174:135	135
174:140	140

When you know that Cogent uses the following default local preferences in their network:

Peers → Local preference 100Customers → Local preference 130

It’s easy to see how we could use the communities provided to change the route used. It’s important to note though that, as we can’t set the local preference of a route to 100 (or 130), AS-path prepending remains largely irrelevant, as the local preference won’t ever be the same.

Take for example the following configuration:

term ADV-SITELOCAL {
    from {
        prefix-list SITE-LOCAL;
        route-type internal;
    }
    then {
        as-path-prepend "13335 13335";
        accept;
    }
}

We’re prepending the Cloudflare ASN two times, resulting in a total AS-path of three, yet we were still seeing a lot (too much) traffic coming in on our Cogent link. At that point, an engineer could add another prepend, but for a well-connected network as Cloudflare, if two prepends didn’t do much, or three, then four or five isn’t going to do much either. Instead, we can leverage the Cogent communities documented above to change the routing within Cogent:

term ADV-SITELOCAL {
    from {
        prefix-list SITE-LOCAL;
        route-type internal;
    }
    then {
        community add COGENT_LPREF70;
        accept;
    }
}

The above configuration changes the traffic flow to this:

Which is exactly what we wanted!

Conclusion

AS-path prepending is still useful, and has its use as part of the toolchain for operators to do traffic engineering, but should be used sparingly. Excessive prepending opens a network up to wider spread route hijacks, which should be avoided at all costs. As such, using community-based ingress traffic engineering is highly preferred (and recommended). In cases where communities aren’t available (or not available to steer customer traffic), prepends can be applied, but I encourage operators to actively monitor their effects, and roll them back if ineffective.

As a side-note, P Marcos et al. have published an interesting paper on AS-path prepending, and go into some trends seen in relation to prepending, I highly recommend giving it a read: https://www.caida.org/catalog/papers/2020_aspath_prepending/aspath_prepending.pdf