
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sun, 12 Apr 2026 05:52:44 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Anonymous credentials: rate-limiting bots and agents without compromising privacy]]></title>
            <link>https://blog.cloudflare.com/private-rate-limiting/</link>
            <pubDate>Thu, 30 Oct 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ As AI agents change how the Internet is used, they create a challenge for security. We explore how Anonymous Credentials can rate limit agent traffic and block abuse without tracking users or compromising their privacy. ]]></description>
            <content:encoded><![CDATA[ <p>The way we interact with the Internet is changing. Not long ago, ordering a pizza meant visiting a website, clicking through menus, and entering your payment details. Soon, you might just <a href="https://www.cnet.com/tech/services-and-software/i-had-chatgpt-order-me-a-pizza-this-could-change-everything/"><u>ask your phone</u></a> to order a pizza that matches your preferences. A program on your device or on a remote server, which we call an <a href="https://developers.cloudflare.com/agents/concepts/what-are-agents/"><u>AI agent</u></a>, would visit the website and orchestrate the necessary steps on your behalf.</p><p>Of course, agents can do much more than order pizza. Soon we might use them to buy concert tickets, plan vacations, or even write, review, and merge pull requests. While some of these tasks will eventually run locally, for now, most are powered by massive AI models running in the biggest data centers in the world. As agentic AI increases in popularity, we expect to see a large increase in traffic from these AI platforms and a corresponding drop in traffic from more conventional sources (like your phone).</p><p>This shift in traffic patterns has prompted us to assess how to keep our customers online and secure in the AI era. On one hand, the nature of requests is changing: websites optimized for human visitors will have to cope with faster, and potentially greedier, agents. On the other hand, AI platforms may soon become a significant source of attacks, originating from malicious users of the platforms themselves.</p><p>Unfortunately, existing tools for managing such (mis)behavior are likely too coarse-grained to manage this transition. For example, <a href="https://blog.cloudflare.com/per-customer-bot-defenses/"><u>when Cloudflare detects that a request is part of a known attack pattern</u></a>, the best course of action often is to block all subsequent requests from the same source. 
When the source is an AI agent platform, this could mean inadvertently blocking all users of that platform, even honest ones who just want to order pizza. We started addressing this problem <a href="https://blog.cloudflare.com/web-bot-auth/"><u>earlier this year</u></a>. But as agentic AI grows in popularity, we think the Internet will need more fine-grained mechanisms for managing agents without impacting honest users.</p><p>At the same time, we firmly believe that any such security mechanism must be designed with user privacy at its core. In this post, we'll describe how to use <b>anonymous credentials (AC)</b> to build these tools. Anonymous credentials help website operators enforce a wide range of security policies, like rate-limiting users or blocking a specific malicious user, without ever having to identify any user or track them across requests.</p><p>Anonymous credentials are <a href="https://mailarchive.ietf.org/arch/msg/privacy-pass/--JXbGvkHnLq1iHQKJAnfn5eH9A/"><u>under development at the IETF</u></a> to provide a standard that can work across websites, browsers, and platforms. This work is still in its early stages, but we believe it will play a critical role in keeping the Internet secure and private in the AI era. We will be contributing to this process as we work towards real-world deployment. If you work in this space, we hope you will follow along and contribute as well.</p>
    <div>
      <h2>Let’s build a small agent</h2>
      <a href="#lets-build-a-small-agent">
        
      </a>
    </div>
    <p>To help us discuss how AI agents are affecting web servers, let’s build an agent ourselves. Our goal is to have an agent that can order a pizza from a nearby pizzeria. Without an agent, you would open your browser, figure out which pizzeria is nearby, view the menu and make selections, add any extras (double pepperoni), and proceed to checkout with your credit card. With an agent, it’s the same flow — except the agent is opening and orchestrating the browser on your behalf.</p><p>In the traditional flow, there’s a human all along the way, and each step has a clear intent: list all pizzerias within 3 km of my current location; pick a pizza from the menu; enter my credit card; and so on. An agent, on the other hand, has to infer each of these actions from the prompt "order me a pizza."</p><p>In this section, we’ll build a simple program that takes a prompt and can make outgoing requests. Here’s an example of a simple <a href="https://workers.cloudflare.com/"><u>Worker</u></a> that takes a specific prompt and generates an answer accordingly. You can find the code on <a href="https://github.com/cloudflareresearch/mini-ai-agent-demo"><u>GitHub</u></a>:</p>
            <pre><code>export default {
   async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise&lt;Response&gt; {
       const out = await env.AI.run("@cf/meta/llama-3.1-8b-instruct-fp8", {
           prompt: `I'd like to order a pepperoni pizza with extra cheese.
                    Please deliver it to Cloudflare Austin office.
                    Price should not be more than $20.`,
       });


       return new Response(out.response);
   },
} satisfies ExportedHandler&lt;Env&gt;;</code></pre>
            <p>In this context, the LLM provides its best answer. It gives us a plan and instructions, but does not perform the action on our behalf. You and I are able to take a list of instructions and act upon it because we have agency and can affect the world. To allow our agent to interact with more of the world, we’re going to give it control over a web browser.</p><p>Cloudflare offers a <a href="https://developers.cloudflare.com/browser-rendering"><u>Browser Rendering</u></a> service that binds directly into our Worker. Let’s use that. The following code uses <a href="https://www.stagehand.dev/"><u>Stagehand</u></a>, an automation framework that makes it simple to control the browser. We pass it an instance of Cloudflare’s remote browser, as well as a client for <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a>.</p>
            <pre><code>import { Stagehand } from "@browserbasehq/stagehand";
import { endpointURLString } from "@cloudflare/playwright";
import { WorkersAIClient } from "./workersAIClient"; // wrapper adapting Workers AI to Stagehand's LLM client interface


export default {
   async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise&lt;Response&gt; {
       const stagehand = new Stagehand({
           env: "LOCAL",
           localBrowserLaunchOptions: { cdpUrl: endpointURLString(env.BROWSER) },
           llmClient: new WorkersAIClient(env.AI),
           verbose: 1,
       });
       await stagehand.init();


       const page = stagehand.page;
       await page.goto("https://mini-ai-agent.cloudflareresearch.com/llm");


       const { extraction } = await page.extract("what are the pizza available on the menu?");
       return new Response(extraction);
   },
} satisfies ExportedHandler&lt;Env&gt;;</code></pre>
            <p>You can access that code for yourself on <a href="https://mini-ai-agent.cloudflareresearch.com/llm"><i><u>https://mini-ai-agent.cloudflareresearch.com/llm</u></i></a>. Here’s the response we got on October 10, 2025:</p>
            <pre><code>Margherita Classic: $12.99
Pepperoni Supreme: $14.99
Veggie Garden: $13.99
Meat Lovers: $16.99
Hawaiian Paradise: $15.49</code></pre>
            <p>Using the screenshot API of Browser Rendering, we can also inspect what the agent is doing. Here's how the browser renders the page in the example above:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6lXTePCTUORCyyOWNNcwZ8/5978abd1878f78107a2c9606c3a1ef51/image4.png" />
          </figure><p>Stagehand allows us to identify components on the page, such as <code>page.act("Click on pepperoni pizza")</code> and <code>page.act("Click on Pay now")</code>. This eases interaction between the developer and the browser.</p><p>To go further, and instruct the agent to perform the whole flow autonomously, we have to use the appropriately named <a href="https://docs.stagehand.dev/basics/agent"><u>agent</u></a> mode of Stagehand. This feature is not yet supported by Cloudflare Workers, but is provided below for completeness.</p>
            <pre><code>import { Stagehand } from "@browserbasehq/stagehand";
import { endpointURLString } from "@cloudflare/playwright";
import { WorkersAIClient } from "./workersAIClient";


export default {
   async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise&lt;Response&gt; {
       const stagehand = new Stagehand({
           env: "LOCAL",
           localBrowserLaunchOptions: { cdpUrl: endpointURLString(env.BROWSER) },
           llmClient: new WorkersAIClient(env.AI),
           verbose: 1,
       });
       await stagehand.init();
       
       const agent = stagehand.agent();
       const result = await agent.execute(`I'd like to order a pepperoni pizza with extra cheese.
                                           Please deliver it to Cloudflare Austin office.
                                           Price should not be more than $20.`);


       return new Response(result.message);
   },
} satisfies ExportedHandler&lt;Env&gt;;</code></pre>
            <p>We can see that instead of giving step-by-step instructions, we hand the agent control. To actually pay, it would need access to a payment method such as a <a href="https://en.wikipedia.org/wiki/Controlled_payment_number"><u>virtual credit card</u></a>.</p><p>The prompt had some subtlety in that we’ve scoped the location to Cloudflare’s Austin office. While the agent responds to us, it needs to understand our context: the agent operates out of Cloudflare’s edge, a location remote to us, which means we are unlikely to pick up a pizza from that <a href="https://www.cloudflare.com/learning/cdn/glossary/data-center/"><u>data center</u></a> if it were ever delivered.</p><p>The more capabilities we provide to the agent, the more ability it has to cause disruption. Instead of someone making 5 clicks at a slow rate of 1 request per 10 seconds, a program running in a data center could make all 5 requests in a second.</p><p>This agent is simple, but now imagine many thousands of these — some benign, some not — running at datacenter speeds. This is the challenge origins will face.</p>
    <div>
      <h2>Protecting origins</h2>
      <a href="#protecting-origins">
        
      </a>
    </div>
    <p>For humans to interact with the online world, they need a web browser and some peripherals with which to direct the behavior of that browser. Agents are another way of directing a browser, so it may be tempting to think that not much is actually changing from the origin's point of view. Indeed, the most obvious change is merely where traffic comes from:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/304j2MNDUNwAaipqmH2Jbt/35beb792bda327a6cf0db3b642bbc4d6/unnamed-1.png" />
          </figure><p>The reason this change is significant has to do with the tools the server has to manage traffic. Websites generally try to be as permissive as possible, but they also need to manage finite resources (bandwidth, CPU, memory, storage, and so on). There are a few basic ways to do this:</p><ol><li><p><b>Global security policy</b>: A server may opt to slow down, CAPTCHA, or even temporarily block requests from all users. This policy may be applied to an entire site, a specific resource, or to requests classified as being part of a known or likely attack pattern. Such mechanisms may be deployed in reaction to an observed spike in traffic, as in a DDoS attack, or in anticipation of a spike in legitimate traffic, as in <a href="https://developers.cloudflare.com/waiting-room/"><u>Waiting Room</u></a>.</p></li><li><p><b>Incentives</b>: Servers sometimes try to incentivize users to use the site when more resources are available. For instance, prices may be lower depending on the location or time of the request. This could be implemented with a <a href="https://developers.cloudflare.com/rules/snippets/when-to-use/"><u>Cloudflare Snippet</u></a>.</p></li></ol><p>While both tools can be effective, they can also cause significant collateral damage. For example, while rate limiting a website's login endpoint <a href="https://developers.cloudflare.com/waf/rate-limiting-rules/best-practices/#protecting-against-credential-stuffing"><u>can help prevent credential stuffing attacks</u></a>, it also degrades the user experience for non-attackers. Before resorting to such measures, servers will first try to apply the security policy (whether a rate limit, a CAPTCHA, or an outright block) to individual users or groups of users.</p><p>However, in order to apply a security policy to individuals, the server needs some way of identifying them. 
Historically, this has been done via some combination of IP addresses, <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent"><u>User-Agent</u></a>, an account tied to the user's identity (if available), and other fingerprints. Like most cloud service providers, Cloudflare has a <a href="https://developers.cloudflare.com/waf/rate-limiting-rules/best-practices/"><u>dedicated offering</u></a> for per-user rate limits based on such heuristics.</p><p>Fingerprinting works for the most part. However, its burden is inequitably distributed. On mobile, users <a href="https://blog.cloudflare.com/eliminating-captchas-on-iphones-and-macs-using-new-standard/#captchas-dont-work-in-mobile-environments-pats-remove-the-need-for-them"><u>have an especially difficult time solving CAPTCHA</u></a>s; when using a VPN, they’re <a href="https://arstechnica.com/tech-policy/2023/07/meta-blocking-vpn-access-to-threads-in-eu/"><u>more</u></a> <a href="https://help.netflix.com/en/node/277"><u>likely</u></a> <a href="https://www.theregister.com/2015/10/19/bbc_cuts_off_vpn_to_iplayer/"><u>to</u></a> <a href="https://torrentfreak.com/hulu-blocks-vpn-users-over-piracy-concerns-140425/"><u>be</u></a> <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/warp/troubleshooting/common-issues/"><u>blocked</u></a>; and when using <a href="https://www.peteresnyder.com/static/papers/speedreader-www2019.pdf"><u>reading mode</u></a>, an altered fingerprint can prevent the page from rendering.</p><p>Likewise, agentic AI only exacerbates the limitations of fingerprinting. 
Not only will more traffic be concentrated on a smaller range of source IPs, but the agents themselves will run on the same software and hardware platforms, making it harder to distinguish honest from malicious users.</p><p>Something that could help is <a href="https://blog.cloudflare.com/web-bot-auth/"><u>Web Bot Auth</u></a>, which would allow agents to identify to the origin which platform operates them. However, we wouldn't want to extend this mechanism — intended for identifying the platform itself — to identifying individual users of the platforms, as this would create an unacceptable privacy risk for these users.</p><p>We need some way of implementing security controls for individual users without identifying them. But how? The Privacy Pass protocol provides <a href="https://blog.cloudflare.com/eliminating-captchas-on-iphones-and-macs-using-new-standard/#captchas-dont-work-in-mobile-environments-pats-remove-the-need-for-them"><u>a partial solution</u></a>.</p>
    <div>
      <h2>Privacy Pass and its limitations</h2>
      <a href="#privacy-pass-and-its-limitations">
        
      </a>
    </div>
    <p>Today, one of the most prominent use cases for Privacy Pass is to <a href="https://www.cloudflare.com/learning/bots/what-is-rate-limiting/"><u>rate limit</u></a> requests from a user to an origin, as we have <a href="https://blog.cloudflare.com/privacy-pass-standard/#privacy-pass-protocol"><u>discussed before</u></a>. The protocol works roughly as follows. The client is <b>issued</b> a number of <b>tokens</b>. Each time it wants to make a request, it <b>redeems</b> one of its tokens at the origin; the origin allows the request through only if the token is <b>fresh</b>, i.e., has never been observed before by the origin.</p><p>In order to use Privacy Pass for per-user rate-limiting, it's necessary to limit the number of tokens issued to each user (e.g., 100 tokens per user per hour). To rate limit an AI agent, the AI platform would fulfill this role. To obtain tokens, the user would log in to the platform, and the platform would allow the user to get tokens from the issuer. The AI platform fulfills the <b>attester</b> role in Privacy Pass <a href="https://datatracker.ietf.org/doc/html/rfc9576"><u>parlance</u></a>: the attester is the party guaranteeing the per-user property of the rate limit. The AI platform, as an attester, is incentivized to enforce this token distribution because it stakes its reputation: should it allow too many tokens to be issued, the issuer could distrust it.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/62Rz5eS1UMm2pKorpowEGg/4949220bdf2fa3c39ccfa17d4df70fff/token__1_.png" />
          </figure><p>The issuance and redemption protocols are designed to have two properties:</p><ul><li><p>Tokens are <b>unforgeable</b>: only the issuer can issue valid tokens.</p></li><li><p>Tokens are <b>unlinkable</b>: no party, including the issuer, attester, or origin, can tell which user a token was issued to.</p></li></ul><p>These properties can be achieved using a <a href="https://csrc.nist.gov/glossary/term/cryptographic_primitive"><u>cryptographic primitive</u></a> called a <a href="https://blog.cloudflare.com/privacy-pass-the-math/"><u>blind signature</u></a> scheme. In a conventional signature scheme, the signer uses its <b>private key</b> to produce a signature for a message. Later on, a verifier can use the signer’s <b>public key</b> to verify the signature. Blind signature schemes work in the same way, except that the message to be signed is blinded such that the signer doesn't know the message it's signing. The client “blinds” the message and sends it to the server, which then computes a blinded signature over the blinded message. The client obtains the final signature by unblinding the signature.</p><p>This is exactly how the standardized Privacy Pass issuance protocols are defined by <a href="https://www.rfc-editor.org/rfc/rfc9578"><u>RFC 9578</u></a>:</p><ul>
  <li>
    <strong>Issuance:</strong> The user generates a random message 
    <strong>$k$</strong> 
    which we call the 
    <strong>nullifier</strong>. Concretely, this is just a random, 32-byte string. It then blinds the nullifier and sends it to the issuer. The issuer replies with a blind signature. Finally, the user unblinds the signature to get 
    <strong>$\sigma$</strong>, 
    a signature for the nullifier 
    <strong>$k$</strong>. The token is the pair 
    <strong>$(k, \sigma)$</strong>.
  </li>
  <li>
    <strong>Redemption:</strong> When the user presents 
    <strong>$(k, \sigma)$</strong>, 
    the origin checks that 
    <strong>$\sigma$</strong> 
    is a valid signature for the nullifier 
    <strong>$k$</strong> 
    and that 
    <strong>$k$</strong> 
    is fresh. If both conditions hold, then it accepts and lets the request through.
  </li>
</ul><p>Blind signatures are simple, cheap, and perfectly suited for many applications. However, they have some limitations that make them unsuitable for our use case.</p><p>First, the communication cost of the issuance protocol is too high. For each token issued, the user sends a 256-byte blinded nullifier and the issuer replies with a 256-byte blind signature (assuming RSA-2048 is used). That's 0.5 KB of additional communication per request, or 500 KB for every 1,000 requests. This is manageable, as we’ve seen in a <a href="https://eprint.iacr.org/2023/414.pdf"><u>previous experiment</u></a> for Privacy Pass, but not ideal. Ideally, the bandwidth would be sublinear in the rate limit we want to enforce. An alternative to blind signatures with lower compute time is the Verifiable Oblivious Pseudorandom Function (<a href="https://datatracker.ietf.org/doc/rfc9497/"><u>VOPRF</u></a>), but the bandwidth is still asymptotically linear. We’ve <a href="https://blog.cloudflare.com/privacy-pass-the-math/"><u>discussed VOPRFs in the past</u></a>, as they served as the basis for early deployments of Privacy Pass.</p><p>Second, blind signatures can't be used to rate-limit on a per-origin basis. Ideally, when issuing $N$ tokens to the client, the client would be able to redeem at most $N$ tokens at any origin server that can verify the token's validity. However, the client can't safely redeem the same token at more than one server, because it would be possible for the servers to link those redemptions to the same client. What's needed is some mechanism for what we'll call <b>late origin-binding</b>: transforming a token for redemption at a particular origin in a way that's unlinkable to other redemptions of the same token.</p><p>Third, once a token is issued, it can't be revoked: it remains valid as long as the issuer's public key is valid. This makes it impossible for an origin to block a specific user if it detects an attack, or if its tokens are compromised. 
The origin can block the offending request, but the user can continue to make requests using its remaining token budget.</p>
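<p>To make the issuance and redemption mechanics concrete, here is a toy TypeScript walkthrough of a blind signature over a nullifier. This is an illustration only: it uses tiny primes and raw (unpadded, unhashed) RSA, whereas real deployments use the blind RSA protocol of RFC 9474 with 2048-bit keys; the variable names are ours.</p>

```typescript
// Toy walkthrough of blind-signature issuance and redemption.
// ILLUSTRATION ONLY: tiny primes and raw RSA, no hashing or padding.

function modPow(base: bigint, exp: bigint, mod: bigint): bigint {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

function egcd(a: bigint, b: bigint): [bigint, bigint, bigint] {
  if (b === 0n) return [a, 1n, 0n];
  const [g, x, y] = egcd(b, a % b);
  return [g, y, x - (a / b) * y];
}

function modInv(a: bigint, m: bigint): bigint {
  const [g, x] = egcd(((a % m) + m) % m, m);
  if (g !== 1n) throw new Error("not invertible");
  return ((x % m) + m) % m;
}

// Issuer key pair (toy parameters).
const p = 1009n, q = 1013n;
const n = p * q;                          // public modulus
const e = 17n;                            // public exponent
const d = modInv(e, (p - 1n) * (q - 1n)); // private exponent

// Client: random nullifier k, blinded with a random factor r.
const k = 123456n;                        // stands in for a random 32-byte string
const r = 777n;
const blinded = (k * modPow(r, e, n)) % n;

// Issuer: sign the blinded message -- it never sees k.
const blindSig = modPow(blinded, d, n);

// Client: unblind to obtain sigma = k^d mod n.
const sigma = (blindSig * modInv(r, n)) % n;

// Origin: accept (k, sigma) only if the signature verifies and k is fresh.
const seen = new Set<bigint>();
function redeem(nullifier: bigint, sig: bigint): boolean {
  if (modPow(sig, e, n) !== nullifier % n) return false; // forged token
  if (seen.has(nullifier)) return false;                 // replayed token
  seen.add(nullifier);
  return true;
}

console.log(redeem(k, sigma)); // true: first redemption is accepted
console.log(redeem(k, sigma)); // false: replay is rejected
```

<p>The origin-side <code>redeem</code> function mirrors the freshness rule described earlier: a valid signature is accepted exactly once, and any replay of the same nullifier is rejected.</p>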
    <div>
      <h2>Anonymous credentials and the future of Privacy Pass</h2>
      <a href="#anonymous-credentials-and-the-future-of-privacy-pass">
        
      </a>
    </div>
    <p>As noted by <a href="https://dl.acm.org/doi/pdf/10.1145/4372.4373"><u>Chaum</u></a> in 1985, an <b>anonymous credential</b> system allows users to obtain a credential from an issuer, and later prove possession of this credential, in an unlinkable way, without revealing any additional information. It is also possible to demonstrate that certain attributes are attached to the credential.</p><p>One way to think of an anonymous credential is as a kind of blind signature with some additional capabilities: late-binding (link a token to an origin after issuance), multi-show (generate multiple tokens from a single issuer response), and expiration distinct from key rotation (token validity decoupled from the validity of the issuer's cryptographic key). In the redemption flow for Privacy Pass, the client presents the unblinded message and signature to the server. To accept the redemption, the server needs to verify the signature. In an AC system, the client only presents <b>part of the message</b>. In order for the server to accept the request, the client needs to prove to the server that it knows a valid signature for the entire message without revealing the whole thing.</p><p>The flow we described above would therefore include this additional <b>presentation</b> step.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7pb3ZDoAHbDLEt0mxtf67T/b6be11710a7df7a4df7d1c89788285a7/credentials__2_.png" />
          </figure><p>Note that the tokens generated through blind signatures or VOPRFs can only be used once, so they can be regarded as <i>single-use tokens</i>. However, there exists a type of anonymous credential that allows tokens to be used multiple times. For this to work, the issuer grants a <i>credential</i> to the user, who can later derive at most <i>N</i> single-use tokens for redemption. Therefore, the user can send multiple requests at the cost of a single issuance session.</p><p>The table below describes how blind signatures and anonymous credentials provide features of interest to rate limiting.</p><table><tr><td><p><b>Feature</b></p></td><td><p><b>Blind Signature</b></p></td><td><p><b>Anonymous Credential</b></p></td></tr><tr><td><p><b>Issuing Cost</b></p></td><td><p>Linear complexity: issuing 10 signatures is 10x as expensive as issuing one signature</p></td><td><p>Sublinear complexity: signing 10 attributes is cheaper than 10 individual signatures</p></td></tr><tr><td><p><b>Proof Capability</b></p></td><td><p>Proves only that a message has been signed</p></td><td><p>Allows efficient proving of partial statements (i.e., attributes)</p></td></tr><tr><td><p><b>State Management</b></p></td><td><p>Stateless</p></td><td><p>Stateful</p></td></tr><tr><td><p><b>Attributes</b></p></td><td><p>No attributes</p></td><td><p>Public (e.g. expiry time) and private state</p></td></tr></table><p>
  Let's see how a simple anonymous credential scheme works. The client's message consists of the pair 
  <strong>$(k, C)$</strong>, 
  where 
  <strong>$k$</strong> 
  is a 
  <strong>nullifier</strong> and 
  <strong>$C$</strong> 
  is a 
    <strong>counter</strong> representing the remaining number of times the client can access a resource. The value of the counter is controlled by the server: when the client redeems its credential, it presents both the nullifier and the counter. In response, the server checks that the signature of the message is valid and that the nullifier is fresh, as before. Additionally, the server
</p><ol><li><p>checks that the counter is greater than zero; and</p></li><li><p>decrements the counter, issuing a new credential for the updated counter and a fresh nullifier.</p></li></ol><p>A blind signature could be used to implement this functionality. However, whereas the nullifier can be blinded as before, it would be necessary to handle the counter in plaintext so that the server can check that the counter is valid (Step 1) and update it (Step 2). This creates an obvious privacy risk, since the server, which is in control of the counter, can use it to link multiple presentations by the same client. For example, when you reach out to buy a pepperoni pizza, the origin could assign you a special counter value, which eases fingerprinting when you present it a second time. Fortunately, there exist anonymous credentials designed to close this kind of privacy gap.</p><p>The scheme above is a simplified version of Anonymous Credit Tokens (<a href="https://datatracker.ietf.org/doc/draft-schlesinger-cfrg-act/"><u>ACT</u></a>), one of the anonymous credential schemes being considered for adoption by the <a href="https://datatracker.ietf.org/wg/privacypass/about/"><u>Privacy Pass working group</u></a> at the IETF. The key feature of ACT is its <b>statefulness</b>: upon successful redemption, the server re-issues a new credential with updated nullifier and counter values. This creates a feedback loop between the client and server that can be used to express a variety of security policies.</p><p>By design, it's not possible to present ACT credentials multiple times simultaneously: the first presentation must be completed so that the re-issued credential can be presented in the next request. <b>Parallelism</b> is the key feature of Anonymous Rate-Limited Credentials (<a href="https://datatracker.ietf.org/doc/html/draft-yun-cfrg-arc-00"><u>ARC</u></a>), another scheme under discussion at the Privacy Pass working group. 
ARCs can be presented across multiple requests in parallel up to the presentation limit determined during issuance.</p><p>Another important feature of ARC is its support for late origin-binding: when a client is issued an ARC with presentation limit $N$, it can safely use its credential to present up to $N$ times to any origin that can verify the credential.</p><p>These are just examples of relevant features of some anonymous credentials. Some applications may benefit from a subset of them; others may need additional features. Fortunately, both ACT and ARC can be constructed from a small set of cryptographic primitives that can be easily adapted for other purposes.</p>
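<p>The check-decrement-reissue loop at the heart of ACT can be sketched as a plain state machine. This is a non-cryptographic simulation with names of our own choosing (<code>Credential</code>, <code>redeem</code>): in the real scheme, the server never sees the counter or links presentations, because it verifies an algebraic MAC and a zero-knowledge proof instead; here the state is in the clear, purely to show the protocol shape.</p>

```typescript
// Non-cryptographic simulation of the ACT-style check -> decrement -> reissue loop.

type Credential = { nullifier: string; counter: number };

const spent = new Set<string>();
let serial = 0;
// Stand-in for a random 32-byte string.
const freshNullifier = (): string => `nullifier-${serial++}`;

// Server side: accept a presentation and, on success, issue the follow-up credential.
function redeem(cred: Credential): Credential | null {
  if (spent.has(cred.nullifier)) return null; // replayed presentation
  if (cred.counter <= 0) return null;         // budget exhausted
  spent.add(cred.nullifier);
  return { nullifier: freshNullifier(), counter: cred.counter - 1 };
}

// Client side: spend a 3-request budget, chaining each reissued credential.
let cred: Credential | null = { nullifier: freshNullifier(), counter: 3 };
let allowed = 0;
while (cred !== null) {
  cred = redeem(cred);
  if (cred !== null) allowed++;
}
console.log(allowed); // 3: the fourth presentation is rejected
```

<p>Because each presentation consumes a nullifier and must complete before the reissued credential exists, the client cannot present in parallel — exactly the limitation that ARC's presentation limit addresses.</p>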
    <div>
      <h2>Building blocks for anonymous credentials</h2>
      <a href="#building-blocks-for-anonymous-credentials">
        
      </a>
    </div>
    <p>ARC and ACT share two primitives in common: <a href="https://eprint.iacr.org/2013/516.pdf"><b><u>algebraic MACs</u></b></a>, which provide for limited computations on the blinded message; and <a href="https://en.wikipedia.org/wiki/Zero-knowledge_proof"><b><u>zero-knowledge proofs (ZKP)</u></b></a> for proving validity of the part of the message not revealed to the server. Let's take a closer look at each.</p>
    <div>
      <h3>Algebraic MACs</h3>
      <a href="#algebraic-macs">
        
      </a>
    </div>
    <p>A Message Authentication Code (MAC) is a cryptographic tag used to verify a message's authenticity (that it comes from the claimed sender) and integrity (that it has not been altered). Algebraic MACs are built from mathematical structures like <a href="https://en.wikipedia.org/wiki/Group_action"><u>group actions</u></a>. The algebraic structure gives them some additional functionality, one piece of which is a <i>homomorphism</i> that lets us blind the MAC easily to conceal its actual value: adding a random value to an algebraic MAC blinds it.</p><p>Unlike blind signatures, both ACT and ARC are only <i>privately</i> verifiable, meaning the issuer and the origin must both have the issuer's private key. Taking Cloudflare as an example, this means that a credential issued by Cloudflare can only be redeemed by an origin behind Cloudflare. Publicly verifiable variants of both are possible, but at an additional cost.</p>
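<p>The homomorphic property that makes blinding work can be shown with a deliberately insecure toy: a linear "MAC" <code>t(m) = x * m mod p</code> over a prime field. This is NOT a secure MAC, and real schemes like those in ARC and ACT work over elliptic-curve groups; the example only demonstrates the identity <code>t(m + r) = t(m) + t(r)</code>, which is why adding the MAC of a random value conceals the original.</p>

```typescript
// Toy linear "MAC" over a prime field, showing the homomorphism
// that makes algebraic MACs easy to blind. ILLUSTRATION ONLY.

const p = 2n ** 61n - 1n; // a Mersenne prime, standing in for the group order
const x = 0x1234567n;     // issuer's secret MAC key

const mod = (a: bigint): bigint => ((a % p) + p) % p;
const mac = (m: bigint): bigint => mod(x * m);

const m = 42n;            // the value to conceal
const r = 987654321n;     // client's random blinding value

// The MAC of the blinded message equals the blinded MAC:
console.log(mac(mod(m + r)) === mod(mac(m) + mac(r))); // true
```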
    <div>
      <h3>Zero-Knowledge Proofs for linear relations</h3>
      <a href="#zero-knowledge-proofs-for-linear-relations">
        
      </a>
    </div>
    <p>Zero-knowledge proofs (ZKPs) allow us to prove a statement is true without revealing the exact value that makes the statement true. The ZKP is constructed by a prover in such a way that it can only be generated by someone who actually possesses the secret. The verifier can then run a quick mathematical check on this proof. If the check passes, the verifier is convinced that the prover's statement is valid. The crucial property is that the proof itself is just data that confirms the statement; it contains no other information that could be used to reconstruct the original secret.</p><p>For ARC and ACT, we want to prove <i>linear relations</i> over secrets. In ARC, a user needs to prove that different tokens are linked to the same original secret credential. For example, a user can generate a proof showing that a <i>request token</i> was derived from a valid <i>issued credential</i>. The system can verify this proof to confirm the tokens are legitimately connected, all without ever learning the underlying secret credential that ties them together. This allows the system to validate user actions while guaranteeing their privacy.</p><p>Proving simple linear relations can be extended to prove a number of powerful statements, for example that a number lies in a given range. This is useful, say, to prove that you have a positive balance on your account. To do so, you prove that you can encode your balance in binary. Let’s say you can have at most 1024 credits in your account. To prove your balance is in range when it is, for example, 12, you prove two things simultaneously: first, that you have a set of binary bits, in this case 12=(1100)<sub>2</sub>, and second, that a linear equation using these bits (8*1 + 4*1 + 2*0 + 1*0) correctly adds up to your total committed balance. This convinces the verifier that the number is validly constructed without them learning the exact value. 
This is how it works for powers of two, but it can <a href="https://github.com/chris-wood/draft-arc/pull/38"><u>easily be extended to arbitrary ranges</u></a>.</p><p>The mathematical structure of algebraic MACs allows for easy blinding and evaluation. It also makes it easy to prove that a MAC was evaluated with the private key, without revealing the MAC itself. In addition, ARC could use ZKPs to prove that a nonce has not been spent before, while ACT uses ZKPs to prove that enough balance is left on the token. The balance is subtracted homomorphically, using more of the group structure.</p>
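<p>The relation being proven here can be sketched outside of any zero-knowledge machinery: the bits are the prover’s secret witness, and the verifier’s checks are exactly the bit constraints and the linear equation. A minimal sketch in Go, with plain arithmetic standing in for commitments (function names are ours):</p>

```go
package main

import "fmt"

// decompose returns the n low-order bits of v, least significant first.
// In the real protocol, these bits are the prover's secret witness.
func decompose(v uint, n int) []uint {
	bits := make([]uint, n)
	for i := range bits {
		bits[i] = (v >> i) & 1
	}
	return bits
}

// checkRange mirrors the two statements proven in zero knowledge:
// each witness value is a bit (b*(b-1) == 0), and the weighted sum
// of the bits equals the committed value.
func checkRange(v uint, bits []uint) bool {
	var sum uint
	for i, b := range bits {
		if b*(b-1) != 0 { // b must be 0 or 1
			return false
		}
		sum += b << i // b * 2^i
	}
	return sum == v
}

func main() {
	bits := decompose(12, 10) // balance 12, at most 2^10 = 1024 credits
	fmt.Println(checkRange(12, bits)) // true
}
```

<p>A ZKP proves these same equations over committed values, so the verifier is convinced the balance is in range without ever seeing the bits.</p>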
    <div>
      <h2>How much does this all cost?</h2>
      <a href="#how-much-does-this-all-cost">
        
      </a>
    </div>
    <p>Anonymous credentials allow for more flexibility and, in certain applications, have the potential to reduce communication cost compared to blind signatures. To identify such applications, we need to measure the concrete communication cost of these new protocols, and to understand how their CPU usage compares to blind signatures and oblivious pseudorandom functions.</p><p>We measure the time each participant spends at each stage of several anonymous credential schemes, and report the size of the messages transmitted across the network. For ARC, ACT, and VOPRF, we use <a href="https://doi.org/10.17487/RFC9496"><u>ristretto255</u></a> as the prime-order group and SHAKE128 for hashing. For Blind RSA, we use a 2048-bit modulus and SHA-384 for hashing.</p><p>Each algorithm was implemented in Go, on top of the <a href="https://github.com/cloudflare/circl"><u>CIRCL</u></a> library. We plan to open source the code once the specifications of ARC and ACT begin to stabilize.</p><p>Let’s start with the most widely used deployment in Privacy Pass: Blind RSA. Redemption time is low, and most of the cost lies with the server at issuance time. Communication cost is mostly constant, on the order of 256 bytes.</p>
<div><table><thead>
  <tr>
    <th rowspan="2" colspan="2"><span>Blind RSA</span><br /><a href="https://doi.org/10.17487/RFC9474"><span>RFC9474</span></a><span>(RSA-2048+SHA384)</span></th>
    <th colspan="2"><span>1 Token</span></th>
  </tr>
  <tr>
    <th><span>Time</span></th>
    <th><span>Message Size</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td rowspan="3"><span>Issuance</span></td>
    <td><span>Client (Blind)</span></td>
    <td><span>63 µs</span></td>
    <td><span>256 B</span></td>
  </tr>
  <tr>
    <td><span>Server (Evaluate)</span></td>
    <td><span>2.69 ms</span></td>
    <td><span>256 B</span></td>
  </tr>
  <tr>
    <td><span>Client (Finalize)</span></td>
    <td><span>37 µs</span></td>
    <td><span>256 B</span></td>
  </tr>
  <tr>
    <td rowspan="2"><span>Redemption</span></td>
    <td><span>Client</span></td>
    <td><span>–</span></td>
    <td><span>300 B</span></td>
  </tr>
  <tr>
    <td><span>Server</span></td>
    <td><span>37 µs</span></td>
    <td><span>–</span></td>
  </tr>
</tbody></table></div><p>Looking at VOPRF, verification time on the server is slightly higher than for Blind RSA, but communication cost is lower and issuance is much faster. Evaluation time on the server is 10x faster for one token, and more than 25x faster when using <a href="https://datatracker.ietf.org/doc/draft-ietf-privacypass-batched-tokens/"><u>amortized token issuance</u></a>. Communication cost per token is also more appealing, with message sizes at least 3x smaller.</p>
<div><table><thead>
  <tr>
    <th rowspan="2" colspan="2"><span>VOPRF</span><br /><a href="https://doi.org/10.17487/RFC9497"><span>RFC9497</span></a><span>(Ristretto255+SHA512)</span></th>
    <th colspan="2"><span>1 Token</span></th>
    <th colspan="2"><span>1000 Amortized issuances</span></th>
  </tr>
  <tr>
    <th><span>Time</span></th>
    <th><span>Message Size</span></th>
    <th><span>Time </span><br /><span>(per token)</span></th>
    <th><span>Message Size </span><br /><span>(per token)</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td rowspan="3"><span>Issuance</span></td>
    <td><span>Client (Blind)</span></td>
    <td><span>54 µs</span></td>
    <td><span>32 B</span></td>
    <td><span>54 µs</span></td>
    <td><span>32 B</span></td>
  </tr>
  <tr>
    <td><span>Server (Evaluate)</span></td>
    <td><span>260 µs</span></td>
    <td><span>96 B</span></td>
    <td><span>99 µs</span></td>
    <td><span>32.064 B</span></td>
  </tr>
  <tr>
    <td><span>Client (Finalize)</span></td>
    <td><span>376 µs</span></td>
    <td><span>64 B</span></td>
    <td><span>173 µs</span></td>
    <td><span>64 B</span></td>
  </tr>
  <tr>
    <td rowspan="2"><span>Redemption</span></td>
    <td><span>Client</span></td>
    <td><span>–</span></td>
    <td><span>96 B</span></td>
    <td><span>–</span></td>
  </tr>
  <tr>
    <td><span>Server</span></td>
    <td><span>57 µs</span></td>
    <td><span>–</span></td>
  </tr>
</tbody></table></div><p>This makes VOPRF tokens appealing for applications that require many tokens, can tolerate a slightly higher redemption cost, and don’t need public verifiability.</p><p>Now, let’s look at the figures for the ARC and ACT anonymous credential schemes. For both schemes, we measure the time to issue a credential that can be presented at most $N=1000$ times.</p>
<div><table><thead>
  <tr>
    <th rowspan="2"><span>Issuance</span><br /><span>Credential Generation</span></th>
    <th colspan="2"><span>ARC</span></th>
    <th colspan="2"><span>ACT</span></th>
  </tr>
  <tr>
    <th><span>Time</span></th>
    <th><span>Message Size</span></th>
    <th><span>Time</span></th>
    <th><span>Message Size</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Client (Request)</span></td>
    <td><span>323 µs</span></td>
    <td><span>224 B</span></td>
    <td><span>64 µs</span></td>
    <td><span>141 B</span></td>
  </tr>
  <tr>
    <td><span>Server (Response)</span></td>
    <td><span>1349 µs</span></td>
    <td><span>448 B</span></td>
    <td><span>251 µs</span></td>
    <td><span>176 B</span></td>
  </tr>
  <tr>
    <td><span>Client (Finalize)</span></td>
    <td><span>1293 µs</span></td>
    <td><span>128 B</span></td>
    <td><span>204 µs</span></td>
    <td><span>176 B</span></td>
  </tr>
  <tr>
    <td><span>Redemption</span><br /><span>Credential Presentation</span></td>
    <td colspan="2"><span>ARC</span></td>
    <td colspan="2"><span>ACT</span></td>
  </tr>
  <tr>
    <td></td>
    <td><span>Time</span></td>
    <td><span>Message Size</span></td>
    <td><span>Time</span></td>
    <td><span>Message Size</span></td>
  </tr>
  <tr>
    <td><span>Client (Present)</span></td>
    <td><span>735 µs</span></td>
    <td><span>288 B</span></td>
    <td><span>1740 µs</span></td>
    <td><span>1867 B</span></td>
  </tr>
  <tr>
    <td><span>Server (Verify/Refund)</span></td>
    <td><span>740 µs</span></td>
    <td><span>–</span></td>
    <td><span>1785 µs</span></td>
    <td><span>141 B</span></td>
  </tr>
  <tr>
    <td><span>Client (Update)</span></td>
    <td><span>–</span></td>
    <td><span>–</span></td>
    <td><span>508 µs</span></td>
    <td><span>176 B</span></td>
  </tr>
</tbody></table></div><p>As we would hope, the communication cost and the server’s runtime are much lower than a batched issuance with either Blind RSA or VOPRF. For example, a VOPRF issuance of 1000 tokens takes 99 ms (99 µs per token) <i>vs</i> 1.35 ms for issuing one ARC credential that allows for 1000 presentations: about 70x faster. The trade-off is that presentation is more expensive, for both the client and the server.</p><p>How about ACT? As with ARC, we would expect the communication cost of issuance to grow much more slowly with the number of credits issued. Our implementation bears this out. However, there are some interesting performance differences between ARC and ACT: issuance is much cheaper for ACT than for ARC, but for redemption it is the opposite.</p><p>What's going on? The answer largely has to do with what each party needs to prove with ZKPs at each step. For example, during ACT redemption, the client proves to the server (in zero knowledge) that its counter $C$ is in the desired range, i.e., $0 \leq C \leq N$. The proof size is on the order of $\log_{2} N$, which accounts for the larger message size. In the current version, ARC redemption does not involve range proofs, but one may be added in a <a href="http://mailarchive.ietf.org/arch/msg/privacy-pass/A3VHUdHqhslwBzYEQjcaXzYQAxQ/"><u>future version</u></a>. Meanwhile, the statements the client and server need to prove during ARC issuance are a bit more complicated than those for ARC presentation, which accounts for the difference in runtime there.</p><p>The advantage of anonymous credentials, as discussed in the previous sections, is that issuance only has to be performed once. When a server evaluates its cost, it has to account for all issuances and all verifications. Accounting only for credential costs, it is currently cheaper for a server to issue and verify tokens than to verify an anonymous credential presentation.</p><p>The advantage of multiple-use anonymous credentials is that, instead of the issuer generating $N$ tokens, the bulk of the computation is offloaded to the clients. They are also more flexible: late origin binding lets a credential work across multiple origins or namespaces, range proofs decorrelate expiration from key rotation, and refunds provide a dynamic rate limit. For now, their applications are dictated more by the limitations of single-use token schemes than by the added efficiency they provide. This seems to be an exciting area to explore, to see whether the gap can be closed.</p>
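<p>The accounting above can be made concrete with the measured figures. A quick back-of-the-envelope model in Go, using the server-side numbers from the tables (VOPRF amortized issuance plus redemption vs one ARC issuance plus per-presentation verification):</p>

```go
package main

import "fmt"

// Server-side costs in microseconds, taken from the measurements above.
const (
	voprfIssuePerToken = 99.0   // amortized issuance, per token
	voprfVerify        = 57.0   // redemption, per token
	arcIssueOnce       = 1349.0 // one credential good for N presentations
	arcVerify          = 740.0  // per presentation
)

// tokensCost and arcCost return the total server time, in µs,
// to support n requests under each scheme.
func tokensCost(n int) float64 { return float64(n) * (voprfIssuePerToken + voprfVerify) }
func arcCost(n int) float64    { return arcIssueOnce + float64(n)*arcVerify }

func main() {
	for _, n := range []int{1, 100, 1000} {
		fmt.Printf("n=%4d  VOPRF tokens: %8.0f µs  ARC credential: %8.0f µs\n",
			n, tokensCost(n), arcCost(n))
	}
}
```

<p>At $n=1000$, issuance alone favors ARC by roughly 70x, but per-presentation verification dominates, so the token-based scheme remains cheaper for the server overall.</p>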
    <div>
      <h2>Managing agents with anonymous credentials</h2>
      <a href="#managing-agents-with-anonymous-credentials">
        
      </a>
    </div>
    <p>Managing agents will likely require features from both ARC and ACT.</p><p>ARC already has much of the functionality we need: it supports rate limiting and late origin-binding, and it is communication-efficient. Its main downside is that, once an ARC credential is issued, it can't be revoked. A malicious user can always make up to <i>N</i> requests to any origin it wants.</p><p>We can allow for a limited form of revocation by pairing ARC with blind signatures (or a VOPRF). Each presentation of the ARC credential is accompanied by a Privacy Pass token: upon successful presentation, the client is issued another Privacy Pass token to use during the next presentation. To revoke a credential, the server simply stops re-issuing the token:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6EiHmkbLef6kXsQU473fcX/d1d4018eaf2abd42b9690ae5d01494dc/image1.png" />
          </figure><p>This scheme is already quite useful. However, it has some important limitations:</p><ul><li><p>Parallel presentation across origins is not possible: the client must wait for the request to one origin to succeed before it can initiate a request to a second origin.</p></li><li><p>Revocation is <i>global</i> rather than per-origin, meaning the credential is not only revoked for the origin to whom it was presented, but for every origin it can be presented to. We suspect this will be undesirable in some cases. For example, an origin may want to revoke if a request violates its <code>robots.txt</code> policy; but the same request may have been accepted by other origins.  </p></li></ul><p>A more fundamental limitation of this design is that the decision to revoke can only be made on the basis of a single request — the one in which the credential was presented. It may be risky to decide to block a user on the basis of a single request; in practice, attack patterns may only emerge across many requests. ACT's statefulness enables at least a rudimentary form of this kind of defense. Consider the following scheme:</p><ul><li><p><b>Issuance: </b>The client is issued an ARC with presentation limit $N=1$.</p></li><li><p><b>Presentation:</b></p><ul><li><p>When the client presents its ARC credential to an origin for the first time, the server issues an ACT credential with a valid initial state.</p></li><li><p>When the client presents an ACT with valid state (e.g., credit counter greater than 0), the origin either:</p><ul><li><p>refuses to issue a new ACT, thereby revoking the credential. 
It would only do so if it had high confidence that the request was part of an attack; or</p></li><li><p>issues a new ACT with state updated to reduce the ACT credit by the amount of resources consumed while processing the request.</p></li></ul></li></ul></li></ul><p>Benign requests wouldn't change the state by much (if at all), but suspicious requests might impact the state in a way that gets the user closer to their rate limit much faster.</p>
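<p>The presentation flow above amounts to a small state machine on the origin side. A Go sketch of the decision logic with hypothetical types (in the real protocol the credit counter is hidden inside the ACT credential and updated homomorphically, and the attack verdict would come from bot-detection signals):</p>

```go
package main

import (
	"errors"
	"fmt"
)

// Credential stands in for an ACT credential; the real counter is
// blinded from the server rather than held in the clear.
type Credential struct{ Credits int }

// Verdict is the origin's judgment of a single request.
type Verdict struct {
	Cost           int  // resources consumed while processing the request
	HighConfAttack bool // high confidence the request is part of an attack
}

var errRevoked = errors.New("credential revoked")

// present applies the scheme above: refuse to issue a new ACT only on
// high-confidence attacks; otherwise deduct the request's cost.
func present(c Credential, v Verdict) (Credential, error) {
	if c.Credits <= 0 {
		return Credential{}, errors.New("no credits left")
	}
	if v.HighConfAttack {
		return Credential{}, errRevoked
	}
	c.Credits -= v.Cost // suspicious requests can be charged a higher cost
	return c, nil
}

func main() {
	c := Credential{Credits: 3}
	c, _ = present(c, Verdict{Cost: 1}) // benign request
	fmt.Println(c.Credits)              // 2
	_, err := present(c, Verdict{HighConfAttack: true})
	fmt.Println(err) // credential revoked
}
```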
    <div>
      <h2>Demo</h2>
      <a href="#demo">
        
      </a>
    </div>
    <p>To see how this idea works in practice, let's look at a working example that uses the <a href="https://developers.cloudflare.com/agents/model-context-protocol/"><u>Model Context Protocol</u></a>. The demo below is built using <a href="https://developers.cloudflare.com/agents/model-context-protocol/tools/"><u>MCP Tools</u></a>. <a href="https://modelcontextprotocol.info/tools/"><u>Tools</u></a> are extensions the AI agent can call to extend its capabilities. They don't need to be integrated into the MCP client at release time, which makes them a nice, easy avenue for prototyping anonymous credentials.</p><p>Tools are offered by the server via an MCP-compatible interface. You can see how to build such MCP servers in a <a href="https://blog.cloudflare.com/remote-model-context-protocol-servers-mcp/"><u>previous blog post</u></a>.</p><p>In our pizza context, this could be a pizzeria that offers you a voucher, where each voucher gets you 3 pizza slices. As a mock design, an integration within a chat application could look as follows:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5WD5MYoSMYGyRW2biwe6j4/bde101967276a72d48d9e494a23db5fa/image5.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5SEqaVpwFxS1D21oyjjbN8/80dde2484f43c15e206ecfda991c286a/image9.png" />
          </figure><p>The first panel presents all the tools exposed by the MCP server. The second showcases an interaction in which the agent calls these tools.</p><p>To see how such a flow would be implemented, let’s write the MCP tools, offer them from an MCP server, and manually orchestrate the calls with the <a href="https://modelcontextprotocol.io/docs/tools/inspector"><u>MCP Inspector</u></a>.</p><p>The MCP server should provide two tools:</p><ul><li><p><code>act-issue</code>, which issues an ACT credential valid for 3 requests. The code used here implements an earlier version of the IETF draft and has some limitations.</p></li><li><p><code>act-redeem</code>, which presents the local credential and fetches our pizza menu.</p></li></ul><p>First, we run <code>act-issue</code>. At this stage, we could ask the agent to run an <a href="https://modelcontextprotocol.info/specification/draft/basic/authorization/"><u>OAuth flow</u></a>, fetch an internal authentication endpoint, or compute a proof of work.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6sLS7jMfTHPjVW5vMvsTWX/2d2b10fdb12c64f0e33fee89e09eab85/image10.png" />
          </figure><p>This gives us 3 credits to spend against an origin. Then, we run <code>act-redeem</code>:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1YTc0Wohrsqw3hizAOmJjU/4534cccbc490ad0aa09522a3875693af/image8.png" />
          </figure><p>Et voilà. If we run <code>act-redeem</code> once more, we see we have one fewer credit.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/a0zmBfl46hX33hWoXyGyX/86649d9f435562c95a85ec72fbf33022/image3.png" />
          </figure><p>You can test it yourself: the <a href="https://github.com/cloudflareresearch/anonymous-credentials-agent-demo"><u>source code</u></a> is available. The MCP server is written in <a href="https://github.com/modelcontextprotocol/rust-sdk/"><u>Rust</u></a> to integrate with the <a href="https://github.com/SamuelSchlesinger/anonymous-credit-tokens/stargazers"><u>ACT Rust</u></a> library. The <a href="https://act-client-demo.cloudflareresearch.com/"><u>browser-based client</u></a> works similarly; check it out.</p>
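<p>Stripped of the cryptography, the two tools reduce to a small amount of client-held state. A dependency-free Go sketch of their semantics (the demo itself implements them in Rust on the MCP SDK; names and the menu string are illustrative):</p>

```go
package main

import (
	"errors"
	"fmt"
)

// wallet mimics the client-held ACT credential; in the demo, the
// balance lives inside the credential, hidden from the server.
type wallet struct{ credits int }

// actIssue mirrors the act-issue tool: grant a credential worth 3 requests.
func actIssue() *wallet { return &wallet{credits: 3} }

// actRedeem mirrors the act-redeem tool: spend one credit and return
// the origin's resource (here, a pizza menu).
func (w *wallet) actRedeem() (string, error) {
	if w.credits == 0 {
		return "", errors.New("credential exhausted")
	}
	w.credits--
	return "menu: margherita, quattro formaggi, diavola", nil
}

func main() {
	w := actIssue()
	for i := 1; i <= 4; i++ {
		menu, err := w.actRedeem()
		if err != nil {
			fmt.Println("request", i, "refused:", err)
			continue
		}
		fmt.Println("request", i, "ok:", menu, "|", w.credits, "credits left")
	}
}
```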
    <div>
      <h2>Moving further</h2>
      <a href="#moving-further">
        
      </a>
    </div>
    <p>In this post, we’ve presented a concrete approach to rate limiting agent traffic. It leaves the client in full control, and it is built to protect the user's privacy. It uses emerging standards for anonymous credentials, integrates with MCP, and can be readily deployed on Cloudflare Workers.</p><p>We're on the right track, but some questions remain. As we touched on before, a notable limitation of both ARC and ACT is that they are only <i>privately verifiable</i>: the issuer and origin need to share a private key for issuing and verifying the credential, respectively. There are likely to be deployment scenarios for which this isn't possible. Fortunately, there may be a path forward for these cases using <i>pairing</i>-based cryptography, as in the <a href="https://datatracker.ietf.org/doc/draft-irtf-cfrg-bbs-signatures/"><u>BBS signature specification</u></a> making its way through the IETF. We’re also exploring post-quantum implications in a <a href="https://blog.cloudflare.com/pq-anonymous-credentials/"><u>concurrent post</u></a>.</p><p>If you are an agent platform, an agent developer, or a browser vendor, all our code is available on <a href="https://github.com/cloudflareresearch/anonymous-credentials-agent-demo"><u>GitHub</u></a> for you to experiment with. Cloudflare is actively working on vetting this approach for real-world use cases.</p><p>The specification and discussion are happening within the IETF and W3C. This ensures the protocols are built in the open and receive input from experts. Work remains to clarify the right performance-to-privacy tradeoff, and the story for deploying on the open web.</p><p>If you’d like to help us, <a href="https://blog.cloudflare.com/cloudflare-1111-intern-program/"><u>we’re hiring 1,111 interns</u></a> over the course of next year, and we have <a href="https://www.cloudflare.com/careers/early-talent/"><u>open positions</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Rate Limiting]]></category>
            <guid isPermaLink="false">1znqOjDHsm8kxWujPMhsgA</guid>
            <dc:creator>Thibault Meunier</dc:creator>
            <dc:creator>Christopher Patton</dc:creator>
            <dc:creator>Lena Heimberger</dc:creator>
            <dc:creator>Armando Faz-Hernández</dc:creator>
        </item>
        <item>
            <title><![CDATA[Developer Week 2024 wrap-up]]></title>
            <link>https://blog.cloudflare.com/developer-week-2024-wrap-up/</link>
            <pubDate>Mon, 08 Apr 2024 13:00:02 GMT</pubDate>
            <description><![CDATA[ Developer Week 2024 has officially come to a close. Here’s a quick recap of the announcements and in-depth technical explorations that went out last week ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7fwPu75tSubJgSS8nJ5gOt/6e2fd9b7cc6f9dcd7b86d73988a6e5fb/Dev-week-wrap-up-1.jpg" />
            
            </figure><p>Developer Week 2024 has officially come to a close. Each day last week, we shipped new products and functionality geared towards giving developers the components they need to build full-stack applications on Cloudflare.</p><p>Even though Developer Week is now over, we are continuing to innovate with the over two million developers who build on our platform. Building a platform is only as exciting as seeing what developers build on it. Before we dive into a recap of the announcements, to send off the week, we wanted to share how a couple of companies are using Cloudflare to power their applications:</p><blockquote><p><i>We have been using Workers for image delivery using R2 and have been able to maintain stable operations for a year after implementation. The speed of deployment and the flexibility of detailed configurations have greatly reduced the time and effort required for traditional server management. In particular, we have seen a noticeable cost savings and are deeply appreciative of the support we have received from Cloudflare Workers.</i>- <a href="http://www.fancs.com/">FAN Communications</a></p></blockquote><blockquote><p><i>Milkshake helps creators, influencers, and business owners create engaging web pages directly from their phone, to simply and creatively promote their projects and passions. Cloudflare has helped us migrate data quickly and affordably with R2. We use Workers as a routing layer between our users' websites and their images and assets, and to build a personalized analytics offering affordably. 
Cloudflare’s innovations have consistently allowed us to run infrastructure at a fraction of the cost of other developer platforms and we have been eagerly awaiting updates to D1 and Queues to sustainably scale Milkshake as the product continues to grow.</i>- <a href="https://milkshake.app/">Milkshake</a></p></blockquote><p>In case you missed anything, here’s a quick recap of the announcements and in-depth technical explorations that went out last week:</p>
    <div>
      <h2>Summary of announcements</h2>
      <a href="#summary-of-announcements">
        
      </a>
    </div>
    
    <div>
      <h3>Monday</h3>
      <a href="#monday">
        
      </a>
    </div>
    
<table>
<thead>
  <tr>
    <th><span>Announcement</span></th>
    <th><span>Summary</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/making-full-stack-easier-d1-ga-hyperdrive-queues"><span>Making state easy with D1 GA, Hyperdrive, Queues and Workers Analytics Engine updates</span></a></td>
    <td><span>A core part of any full-stack application is storing and persisting data! We kicked off the week with announcements that help developers build stateful applications on top of Cloudflare, including making D1, Cloudflare’s SQL database and Hyperdrive, our database accelerating service, generally available.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/building-d1-a-global-database"><span>Building D1: a Global Database</span></a></td>
    <td><span>D1, Cloudflare’s SQL database, is now generally available. With new support for 10GB databases, data export, and enhanced query debugging, we empower developers to build production-ready applications with D1 to meet all their relational SQL needs. To support Workers in global applications, we’re sharing a sneak peek of our design and API for D1 global read replication to demonstrate how developers scale their workloads with D1.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/workers-environment-live-object-bindings"><span>Why Workers environment variables contain live objects</span></a></td>
    <td><span>Bindings don't just reduce boilerplate. They are a core design feature of the Workers platform which simultaneously improve developer experience and application security in several ways. Usually these two goals are in opposition to each other, but bindings elegantly solve for both at the same time.</span></td>
  </tr>
</tbody>
</table>
    <div>
      <h3>Tuesday</h3>
      <a href="#tuesday">
        
      </a>
    </div>
    
<table>
<thead>
  <tr>
    <th><span>Announcement</span></th>
    <th><span>Summary</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/workers-ai-ga-huggingface-loras-python-support"><span>Leveling up Workers AI: General Availability and more new capabilities</span></a></td>
    <td><span>We made a series of AI-related announcements, including Workers AI, Cloudflare’s inference platform becoming GA, support for fine-tuned models with LoRAs, one-click deploys from HuggingFace, Python support for Cloudflare Workers, and more.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/fine-tuned-inference-with-loras"><span>Running fine-tuned models on Workers AI with LoRAs</span></a></td>
    <td><span>Workers AI now supports fine-tuned models using LoRAs. But what is a LoRA and how does it work? In this post, we dive into fine-tuning, LoRAs and even some math to share the details of how it all works under the hood.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/python-workers"><span>Bringing Python to Workers using Pyodide and WebAssembly</span></a></td>
    <td><span>We introduced Python support for Cloudflare Workers, now in open beta. We've revamped our systems to support Python, from the Workers runtime itself to the way Workers are deployed to Cloudflare’s network. Learn about a Python Worker's lifecycle, Pyodide, dynamic linking, and memory snapshots in this post.</span></td>
  </tr>
</tbody>
</table>
    <div>
      <h3>Wednesday</h3>
      <a href="#wednesday">
        
      </a>
    </div>
    
<table>
<thead>
  <tr>
    <th><span>Announcement</span></th>
    <th><span>Summary</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/r2-events-gcs-migration-infrequent-access"><span>R2 adds event notifications, support for migrations from Google Cloud Storage, and an infrequent access storage tier</span></a></td>
    <td><span>We announced three new features for Cloudflare R2: event notifications, support for migrations from Google Cloud Storage, and an infrequent access storage tier.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/data-anywhere-events-pipelines-durable-execution-workflows"><span>Data Anywhere with Pipelines, Event Notifications, and Workflows</span></a></td>
    <td><span>We’re making it easier to build scalable, reliable, data-driven applications on top of our global network, and so we announced a new Event Notifications framework; our take on durable execution, Workflows; and an upcoming streaming ingestion service, Pipelines.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/prisma-orm-and-d1"><span>Improving Cloudflare Workers and D1 developer experience with Prisma ORM</span></a></td>
    <td><span>Together, Cloudflare and Prisma make it easier than ever to deploy globally available apps with a focus on developer experience. To further that goal, Prisma ORM now natively supports Cloudflare Workers and D1 in Preview. With version 5.12.0 of Prisma ORM you can now interact with your data stored in D1 from your Cloudflare Workers with the convenience of the Prisma Client API. Learn more and try it out now.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/picsart-move-to-workers-huge-performance-gains"><span>How Picsart leverages Cloudflare's Developer Platform to build globally performant services</span></a></td>
    <td><span>Picsart, one of the world’s largest digital creation platforms, encountered performance challenges in catering to its global audience. Adopting Cloudflare's global-by-default Developer Platform emerged as the optimal solution, empowering Picsart to enhance performance and scalability substantially.</span></td>
  </tr>
</tbody>
</table>
    <div>
      <h3>Thursday</h3>
      <a href="#thursday">
        
      </a>
    </div>
    
<table>
<thead>
  <tr>
    <th><span>Announcement</span></th>
    <th><span>Summary</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/pages-workers-integrations-monorepos-nextjs-wrangler"><span>Announcing Pages support for monorepos, wrangler.toml, database integrations and more!</span></a></td>
    <td><span>We launched four improvements to Pages that bring functionality previously restricted to Workers, with the goal of unifying the development experience between the two. Support for monorepos, wrangler.toml, new additions to Next.js support and database integrations!</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/workers-production-safety"><span>New tools for production safety — Gradual Deployments, Stack Traces, Rate Limiting, and API SDKs</span></a></td>
    <td><span>Production readiness isn’t just about scale and reliability of the services you build with. We announced five updates that put more power in your hands – Gradual Deployments, Source mapped stack traces in Tail Workers, a new Rate Limiting API, brand-new API SDKs, and updates to Durable Objects – each built with mission-critical production services in mind.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/whats-next-for-cloudflare-media"><span>What’s new with Cloudflare Media: updates for Calls, Stream, and Images</span></a></td>
    <td><span>With Cloudflare Calls in open beta, you can build real-time, serverless video and audio applications. Cloudflare Stream lets your viewers instantly clip from ongoing streams. Finally, Cloudflare Images now supports automatic face cropping and has an upload widget that lets you easily integrate into your application.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/cloudflare-calls-anycast-webrtc"><span>Cloudflare Calls: millions of cascading trees all the way down</span></a></td>
    <td><span>Cloudflare Calls is a serverless SFU and TURN service running at Cloudflare’s edge. It’s now in open beta and costs $0.05/ real-time GB. It’s 100% anycast WebRTC.</span></td>
  </tr>
</tbody>
</table>
    <div>
      <h3>Friday</h3>
      <a href="#friday">
        
      </a>
    </div>
    
<table>
<thead>
  <tr>
    <th><span>Announcement</span></th>
    <th><span>Summary</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/browser-rendering-api-ga-rolling-out-cloudflare-snippets-swr-and-bringing-workers-for-platforms-to-our-paygo-plans"><span>Browser Rendering API GA, rolling out Cloudflare Snippets, SWR, and bringing Workers for Platforms to all users</span></a></td>
    <td><span>Browser Rendering API is now available to all paid Workers customers with improved session management.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/cloudflare-acquires-baselime-expands-observability-capabilities"><span>Cloudflare acquires Baselime to expand serverless application observability capabilities</span></a></td>
    <td><span>We announced that Cloudflare has acquired Baselime, a serverless observability company.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/cloudflare-acquires-partykit"><span>Cloudflare acquires PartyKit to allow developers to build real-time multi-user applications</span></a></td>
    <td><span>We announced that PartyKit, a trailblazer in enabling developers to craft ambitious real-time, collaborative, multiplayer applications, is now a part of Cloudflare. This acquisition marks a significant milestone in our journey to redefine the boundaries of serverless computing, making it more dynamic, interactive, and, importantly, stateful.</span></td>
  </tr>
  <tr>
    <td><a href="http://staging.blog.mrk.cfdata.org/blazing-fast-development-with-full-stack-frameworks-and-cloudflare"><span>Blazing fast development with full-stack frameworks and Cloudflare</span></a></td>
    <td><span>Full-stack web development with Cloudflare is now faster and easier! You can now use your framework’s development server while accessing D1 databases, R2 object stores, AI models, and more. Iterate locally in milliseconds to build sophisticated web apps that run on Cloudflare. Let’s dev together!</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/javascript-native-rpc"><span>We've added JavaScript-native RPC to Cloudflare Workers</span></a></td>
    <td><span>Cloudflare Workers now features a built-in RPC (Remote Procedure Call) system for use in Worker-to-Worker and Worker-to-Durable Object communication, with absolutely minimal boilerplate. We've designed an RPC system so expressive that calling a remote service can feel like using a library.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/2024-community-update"><span>Community Update: empowering startups building on Cloudflare and creating an inclusive community</span></a></td>
    <td><span>We closed out Developer Week by sharing updates on our Workers Launchpad program, our latest Developer Challenge, and the work we’re doing to ensure our community spaces – like our Discord and Community forums – are safe and inclusive for all developers.</span></td>
  </tr>
</tbody>
</table><p>Here's a video summary, by Craig Dennis, Developer Educator, AI:</p><blockquote><p>🏃<a href="https://twitter.com/CloudflareDev?ref_src=twsrc%5Etfw">@CloudflareDev</a> Developer Week 2024 🧡 ICYMI 🧡 Speed run <a href="https://t.co/0uzPJshC93">pic.twitter.com/0uzPJshC93</a></p>— Craig Dennis (@craigsdennis) <a href="https://twitter.com/craigsdennis/status/1778875721575989734?ref_src=twsrc%5Etfw">April 12, 2024</a></blockquote> 
    <div>
      <h3>Continue the conversation</h3>
      <a href="#continue-the-conversation">
        
      </a>
    </div>
    <p>Thank you for being a part of Developer Week! Want to continue the conversation and share what you’re building? Join us on <a href="https://discord.com/invite/cloudflaredev">Discord</a>. To get started building on Workers, check out our <a href="https://developers.cloudflare.com/workers/">developer documentation</a>.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Cloudflare Pages]]></category>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[D1]]></category>
            <category><![CDATA[Connectivity Cloud]]></category>
            <guid isPermaLink="false">VNnYecAmN7CpST4nBbas0</guid>
            <dc:creator>Phillip Jones</dc:creator>
        </item>
        <item>
            <title><![CDATA[New tools for production safety — Gradual deployments, Source maps, Rate Limiting, and new SDKs]]></title>
            <link>https://blog.cloudflare.com/workers-production-safety/</link>
            <pubDate>Thu, 04 Apr 2024 13:05:00 GMT</pubDate>
            <description><![CDATA[ Today we are announcing five updates that put more power in your hands – Gradual Deployments, Source mapped stack traces in Tail Workers, a new Rate Limiting API, brand-new API SDKs, and updates to Durable Objects – each built with mission-critical production services in mind ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6HeyqozVOGygo2RCNnunIq/b429dc5b9f81c9fed6dfc0d200d296e5/image4-7.png" />
            
            </figure><p>2024’s Developer Week is all about production readiness. On Monday, April 1, we <a href="/making-full-stack-easier-d1-ga-hyperdrive-queues/">announced</a> that <a href="https://developers.cloudflare.com/d1/">D1</a>, <a href="https://developers.cloudflare.com/queues/">Queues</a>, <a href="https://developers.cloudflare.com/hyperdrive/">Hyperdrive</a>, and <a href="https://developers.cloudflare.com/analytics/analytics-engine/">Workers Analytics Engine</a> are ready for production scale and generally available. On Tuesday, April 2, we <a href="/workers-ai-ga-huggingface-loras-python-support">announced</a> the same about our inference platform, <a href="https://developers.cloudflare.com/workers-ai/">Workers AI</a>. And we’re not nearly done yet.</p><p>However, production readiness isn’t just about the scale and reliability of the services you build with. You also need tools to make changes safely and reliably. You depend not just on what Cloudflare provides, but on being able to precisely control and tailor how Cloudflare behaves to the needs of your application.</p><p>Today we are announcing five updates that put more power in your hands – Gradual Deployments, source mapped stack traces in Tail Workers, a new Rate Limiting API, brand-new API SDKs, and updates to Durable Objects – each built with mission-critical production services in mind. We build our own products using Workers, including <a href="https://developers.cloudflare.com/cloudflare-one/policies/access/">Access</a>, <a href="https://developers.cloudflare.com/r2/">R2</a>, <a href="https://developers.cloudflare.com/kv/">KV</a>, <a href="https://developers.cloudflare.com/waiting-room/">Waiting Room</a>, <a href="https://developers.cloudflare.com/vectorize/">Vectorize</a>, <a href="https://developers.cloudflare.com/queues/">Queues</a>, <a href="https://developers.cloudflare.com/stream/">Stream</a>, and more. 
We rely on each of these new features ourselves to ensure that we are production ready – and now we’re excited to bring them to everyone.</p>
    <div>
      <h3>Gradually deploy changes to Workers and Durable Objects</h3>
      <a href="#gradually-deploy-changes-to-workers-and-durable-objects">
        
      </a>
    </div>
    <p>Deploying a Worker is nearly instantaneous – a few seconds and your change is live <a href="https://www.cloudflare.com/network/">everywhere</a>.</p><p>When you reach production scale, each change you make carries greater risk, both in terms of volume and expectations. You need to meet your 99.99% availability SLA, or have an ambitious P90 latency SLO. A bad deployment that’s live for 100% of traffic for 45 seconds could mean millions of failed requests. A subtle code change could cause a thundering herd of retries to an overwhelmed backend, if rolled out all at once. These are the kinds of risks we consider and mitigate ourselves for our own services built on Workers.</p><p>The way to mitigate these risks is to deploy changes gradually – commonly called rolling deployments:</p><ol><li><p>The current version of your application runs in production.</p></li><li><p>You deploy the new version of your application to production, but only route a small percentage of traffic to this new version, and wait for it to “soak” in production, monitoring for regressions and bugs. If something bad happens, you’ve caught it early at a small percentage (e.g. 1%) of traffic and can revert quickly.</p></li><li><p>You gradually increment the percentage of traffic until the new version receives 100%, at which point it is fully rolled out.</p></li></ol><p>Today we’re opening up a first-class way to deploy code changes gradually to Workers and Durable Objects via the <a href="https://developers.cloudflare.com/api/operations/worker-deployments-list-deployments">Cloudflare API</a>, the <a href="https://developers.cloudflare.com/workers/configuration/versions-and-deployments/gradual-deployments/#via-wrangler">Wrangler CLI</a>, or the <a href="https://developers.cloudflare.com/workers/configuration/versions-and-deployments/gradual-deployments/#via-the-cloudflare-dashboard">Workers dashboard</a>. 
Gradual Deployments is entering open beta – you can use Gradual Deployments with any Cloudflare account that is on the <a href="https://developers.cloudflare.com/workers/platform/pricing/#workers">Workers Free plan</a>, and very soon you’ll be able to start using Gradual Deployments with Cloudflare accounts on the <a href="https://developers.cloudflare.com/workers/platform/pricing/#workers">Workers Paid</a> and Enterprise plans. You’ll see a banner on the Workers dashboard once your account has access.</p>
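<p>To make the traffic-split mechanics concrete, here is a toy sketch (illustrative only — the platform handles routing for you) of how a percentage-based split between two versions can be modeled: each request lands on the new version with probability equal to the configured percentage.</p>

```javascript
// Toy model of a percentage-based traffic split between two Worker versions.
// Illustrative only — not Cloudflare's routing code.
function chooseVersion(percentNew, random = Math.random) {
  // `percentNew` is the share of traffic (0–100) routed to the new version.
  return random() * 100 < percentNew ? "new" : "current";
}

// With a stubbed random source, a 1% rollout sends 1 in 100 requests to "new":
console.log(chooseVersion(1, () => 0.005)); // "new" — falls inside the 1% slice
console.log(chooseVersion(1, () => 0.5));   // "current"
```

<p>Incrementing <code>percentNew</code> from 1 toward 100 is exactly the "soak, then ramp" loop described in the steps above.</p>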
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5C2hc1EtfppDeWWDxJBh3K/3235bfa198e136bfac793f877415011d/pasted-image-0.png" />
            
            </figure><p>When you have two versions of your Worker or Durable Object running concurrently in production, you almost certainly want to be able to filter your metrics, exceptions, and logs by version. This can help you spot production issues early, when the new version is only rolled out to a small percentage of traffic, or compare performance metrics when splitting traffic 50/50. We’ve also added <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">observability</a> at a version level across our platform:</p><ul><li><p>You can filter analytics in the Workers dashboard and via the <a href="https://developers.cloudflare.com/analytics/graphql-api/">GraphQL Analytics API</a> by version.</p></li><li><p><a href="https://developers.cloudflare.com/workers/observability/logging/logpush/">Workers Trace Events</a> and <a href="https://developers.cloudflare.com/workers/observability/logging/tail-workers/">Tail Worker</a> events include the version ID of your Worker, along with optional version message and version tag fields.</p></li><li><p>When using <a href="https://developers.cloudflare.com/workers/wrangler/commands/#tail">wrangler tail</a> to view live logs, you can view logs for a specific version.</p></li><li><p>You can access version ID, message, and tag from within your Worker’s code, by configuring the <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/version-metadata/">Version Metadata binding</a>.</p></li></ul><p>You may also want to make sure that each client or user only sees a consistent version of your Worker. We’ve added <a href="https://developers.cloudflare.com/workers/configuration/versions-and-deployments/gradual-deployments/#version-keys-and-session-affinity">Version Affinity</a> so that requests associated with a particular identifier (such as user, session, or any unique ID) are always handled by a consistent version of your Worker. 
<a href="https://developers.cloudflare.com/workers/configuration/versions-and-deployments/gradual-deployments/#version-keys-and-session-affinity">Session Affinity</a>, when used with <a href="https://developers.cloudflare.com/workers/configuration/versions-and-deployments/gradual-deployments/#setting-cloudflare-workers-version-key-using-ruleset-engine">Ruleset Engine</a>, gives you full control over both the mechanism and identifier used to ensure “stickiness”.</p><p>Gradual Deployments is entering open beta. As we move towards GA, we’re working to support:</p><ul><li><p><b>Version Overrides.</b> Invoke a specific version of your Worker in order to test before it serves any production traffic. This will allow you to create Blue-Green Deployments.</p></li><li><p><b>Cloudflare Pages.</b> Let the <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/">CI/CD system</a> in Pages automatically progress the deployments on your behalf.</p></li><li><p><b>Automatic rollbacks.</b> Roll back deployments automatically when the error rate spikes for a new version of your Worker.</p></li></ul><p>We’re looking forward to hearing your feedback! Let us know what you think through <a href="https://www.cloudflare.com/lp/developer-week-deployments/">this</a> feedback form or reach out in our <a href="https://discord.gg/HJvPcPcN">Developer Discord</a> in the #workers-gradual-deployments-beta channel.</p>
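<p>The Version Affinity idea above can be sketched as deterministic hashing (a conceptual sketch, not the platform implementation): hashing a stable identifier means the same user always lands on the same side of the split, so their version assignment never flips between requests during a rollout step.</p>

```javascript
// Toy sketch of version affinity: bucket a stable identifier (user ID,
// session ID, etc.) with a hash so its assignment is "sticky".
// Not the platform implementation — for illustration only.
function fnv1a(str) {
  // FNV-1a 32-bit hash
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function versionFor(id, percentNew) {
  // Same id -> same bucket -> same version at a given rollout percentage.
  return fnv1a(id) % 100 < percentNew ? "new" : "current";
}

// The same identifier always maps to the same version:
console.log(versionFor("user-42", 10) === versionFor("user-42", 10)); // true
```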
    <div>
      <h3>Source mapped stack traces in Tail Workers</h3>
      <a href="#source-mapped-stack-traces-in-tail-workers">
        
      </a>
    </div>
    <p>Production readiness means tracking errors and exceptions, and trying to drive them down to zero. When an error occurs, the first thing you typically want to look at is the error’s <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Error/stack">stack trace</a> – the specific functions that were called, in what order, from which line and file, and with what arguments.</p><p>Most JavaScript code – not just on Workers, but across platforms – is first bundled, often transpiled, and then minified before being deployed to production. This is done behind the scenes to create smaller bundles to optimize performance and convert from TypeScript to JavaScript if needed.</p><p>If you’ve ever seen an exception return a stack trace like /src/index.js:1:342, it means the error occurred on the 342nd character of your function’s minified code. This is clearly not very helpful for debugging.</p><p><a href="https://web.dev/articles/source-maps">Source maps</a> solve this – they map compiled and minified code back to the original code that you wrote. Source maps are combined with the stack trace returned by the JavaScript runtime in order to present you with a human-readable stack trace. For example, the following stack trace shows that the Worker received an unexpected null value on line 30 of the down.ts file. This is a useful starting point for debugging, and you can move down the stack trace to understand the chain of function calls that resulted in the null value.</p>
            <pre><code>Unexpected input value: null
  at parseBytes (src/down.ts:30:8)
  at down_default (src/down.ts:10:19)
  at Object.fetch (src/index.ts:11:12)</code></pre>
            <p>Here’s how it works:</p><ol><li><p>When you set upload_source_maps = true in your <a href="https://developers.cloudflare.com/workers/wrangler/configuration/">wrangler.toml</a>, Wrangler will automatically generate and upload any source map files when you run <a href="https://developers.cloudflare.com/workers/wrangler/commands/#deploy">wrangler deploy</a> or <a href="https://developers.cloudflare.com/workers/wrangler/commands/#versions">wrangler versions upload</a>.</p></li><li><p>When your Worker throws an uncaught exception, we fetch the source map and use it to map the stack trace of the exception back to lines of your Worker’s original source code.</p></li><li><p>You can then view this deobfuscated stack trace in <a href="https://developers.cloudflare.com/workers/observability/logging/real-time-logs/">real-time logs</a> or in <a href="https://developers.cloudflare.com/workers/observability/logging/tail-workers/">Tail Workers</a>.</p></li></ol><p>Starting today, in open beta, you can upload source maps to Cloudflare when you deploy your Worker – <a href="https://developers.cloudflare.com/workers/observability/source-maps">get started by reading the docs</a>. And starting on April 15, the Workers runtime will start using source maps to deobfuscate stack traces. We’ll post a notification in the Cloudflare dashboard and post on our <a href="https://twitter.com/CloudflareDev?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor">Cloudflare Developers X account</a> when source mapped stack traces are available.</p>
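<p>As a small illustration of what a deobfuscated frame carries, the following hypothetical helper (not part of Wrangler or the Workers runtime) pulls the function, file, line, and column out of a V8-style stack frame line like the ones shown above — the same (file, line, column) triple that a source map resolves a minified position back to:</p>

```javascript
// Hypothetical helper: parse a stack frame such as
// "  at parseBytes (src/down.ts:30:8)" into its components.
function parseFrame(frame) {
  const m = frame.match(/at (\S+) \((.+):(\d+):(\d+)\)/);
  if (!m) return null; // not a recognizable frame
  return { fn: m[1], file: m[2], line: Number(m[3]), column: Number(m[4]) };
}

console.log(parseFrame("  at parseBytes (src/down.ts:30:8)"));
// { fn: "parseBytes", file: "src/down.ts", line: 30, column: 8 }
```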
    <div>
      <h3>New Rate Limiting API in Workers</h3>
      <a href="#new-rate-limiting-api-in-workers">
        
      </a>
    </div>
    <p>An API is only production ready if it has a sensible <a href="https://www.cloudflare.com/learning/bots/what-is-rate-limiting/">rate limit</a>. And as you grow, so does the complexity and diversity of limits that you need to enforce in order to balance the needs of specific customers, protect the health of your service, or enforce and adjust limits in specific scenarios. Cloudflare’s own API has this challenge – each of our dozens of products, each with many API endpoints, may need to enforce different rate limits.</p><p>You’ve been able to configure <a href="https://developers.cloudflare.com/waf/rate-limiting-rules/">Rate Limiting rules</a> on Cloudflare since 2017. But until today, the only way to control this was in the Cloudflare dashboard or via the Cloudflare API. It hasn’t been possible to define behavior at <i>runtime</i>, or write code in a Worker that interacts directly with rate limits – you could only control whether a request is rate limited or not before it hits your Worker.</p><p>Today we’re introducing a new API, in open beta, that gives you direct access to rate limits from your Worker. It’s lightning fast, backed by memcached, and dead simple to add to your Worker. For example, the following configuration defines a rate limit of 100 requests within a 60-second period:</p>
            <pre><code>[[unsafe.bindings]]
name = "RATE_LIMITER"
type = "ratelimit"
namespace_id = "1001" # An identifier unique to your Cloudflare account

# Limit: the number of tokens allowed within a given period, in a single Cloudflare location
# Period: the duration of the period, in seconds. Must be either 60 or 10
simple = { limit = 100, period = 60 } </code></pre>
            <p>Then, in your Worker, you can call the limit method on the RATE_LIMITER binding, providing a key of your choosing. Given the configuration above, this code will return an <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429">HTTP 429</a> response status code once more than 100 requests to a specific path are made within a 60-second period:</p>
            <pre><code>export default {
  async fetch(request, env) {
    const { pathname } = new URL(request.url)

    const { success } = await env.RATE_LIMITER.limit({ key: pathname })
    if (!success) {
      return new Response(`429 Failure – rate limit exceeded for ${pathname}`, { status: 429 })
    }

    return new Response(`Success!`)
  }
}</code></pre>
            <p>Now that Workers can connect directly to a data store like memcached, what else could we provide? Counters? Locks? An <a href="https://github.com/cloudflare/workerd/pull/1666">in-memory cache</a>? Rate limiting is the first of many primitives that we’re exploring providing in Workers that address questions we’ve gotten for years about where a temporary shared state that spans many Worker <a href="https://developers.cloudflare.com/workers/reference/how-workers-works/#isolates">isolates</a> should live. If you rely on putting state in the global scope of your Worker today, we’re working on better primitives that are purpose-built for specific use cases.</p><p>The Rate Limiting API in Workers is in open beta, and you can get started by <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/rate-limit">reading the docs</a>.</p>
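<p>To illustrate the semantics of the simple limit above — without the distributed, memcached-backed machinery — here is an in-memory sketch of a fixed-window limiter with the same shape of result. This is a toy stand-in for intuition only, not Cloudflare's implementation:</p>

```javascript
// Toy fixed-window rate limiter mirroring the binding's `simple` semantics:
// at most `limit` successful checks per `period` seconds, per key.
class SimpleRateLimiter {
  constructor(limit, periodSeconds) {
    this.limit = limit;
    this.periodMs = periodSeconds * 1000;
    this.windows = new Map(); // key -> { start, count }
  }
  check(key, now = Date.now()) {
    let w = this.windows.get(key);
    if (!w || now - w.start >= this.periodMs) {
      // Start a fresh window for this key.
      w = { start: now, count: 0 };
      this.windows.set(key, w);
    }
    w.count++;
    return { success: w.count <= this.limit };
  }
}

const limiter = new SimpleRateLimiter(100, 60);
for (let i = 0; i < 100; i++) limiter.check("/login", 0); // first 100 succeed
console.log(limiter.check("/login", 0).success);      // false — 101st in the window
console.log(limiter.check("/login", 60_000).success); // true — a new window began
```

<p>The real binding enforces its count per Cloudflare location rather than globally, which is one reason the period is kept short.</p>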
    <div>
      <h3>New auto-generated SDKs for Cloudflare’s API</h3>
      <a href="#new-auto-generated-sdks-for-cloudflares-api">
        
      </a>
    </div>
    <p>Production readiness means going from making changes by clicking buttons in a dashboard to making changes programmatically, using an infrastructure-as-code approach like <a href="https://github.com/cloudflare/terraform-provider-cloudflare">Terraform</a> or <a href="https://github.com/pulumi/pulumi-cloudflare">Pulumi</a>, or by making API requests directly, either on your own or via an SDK.</p><p>The <a href="https://developers.cloudflare.com/api/">Cloudflare API</a> is massive, and constantly adding new capabilities – on average we <a href="https://github.com/cloudflare/api-schemas/activity">update our API schemas between 20 and 30 times per day</a>. But to date, our API SDKs have been built and maintained manually, so we had a burning need to automate this.</p><p>We’ve done that, and today we’re announcing new client SDKs for the Cloudflare API in three languages – <a href="https://github.com/cloudflare/cloudflare-typescript">TypeScript</a>, <a href="https://github.com/cloudflare/cloudflare-python">Python</a>, and <a href="https://github.com/cloudflare/cloudflare-go">Go</a> – with more languages on the way.</p><p>Each SDK is generated automatically using <a href="https://www.stainlessapi.com/">Stainless API</a>, based on the <a href="https://github.com/cloudflare/api-schemas">OpenAPI schemas</a> that define the structure and capabilities of each of our API endpoints. This means that when we add any new functionality to the Cloudflare API, across any Cloudflare product, these API SDKs are automatically regenerated, and new versions are published, ensuring that they are correct and up-to-date.</p><p>You can install the SDKs by running one of the following commands:</p>
            <pre><code># TypeScript
npm install cloudflare

# Python
pip install cloudflare

# Go
go get -u github.com/cloudflare/cloudflare-go/v2</code></pre>
            <p>If you use Terraform or Pulumi, under the hood, Cloudflare’s Terraform Provider currently uses the existing, non-automated <a href="https://github.com/cloudflare/cloudflare-go">Go SDK</a>. When you run terraform apply, the Cloudflare Terraform Provider determines which API requests to make in what order, and executes these using the Go SDK.</p><p>The new, auto-generated Go SDK clears a path towards more comprehensive Terraform support for all Cloudflare products, providing a base set of tools that can be relied upon to be both correct and up-to-date with the latest API changes. We’re building towards a future where any time a product team at Cloudflare builds a new feature that is exposed via the Cloudflare API, it is automatically supported by the SDKs. Expect more updates on this throughout 2024.</p>
    <div>
      <h3>Durable Object namespace analytics and WebSocket Hibernation GA</h3>
      <a href="#durable-object-namespace-analytics-and-websocket-hibernation-ga">
        
      </a>
    </div>
    <p>Many of our own products, including <a href="https://developers.cloudflare.com/waiting-room/">Waiting Room</a>, <a href="https://developers.cloudflare.com/r2/">R2</a>, and <a href="https://developers.cloudflare.com/queues/">Queues</a>, as well as platforms like <a href="https://www.partykit.io/">PartyKit</a>, are built using <a href="https://developers.cloudflare.com/durable-objects/">Durable Objects</a>. Deployed globally, including newly added support for Oceania, you can think of Durable Objects like singleton Workers that can provide a single point of coordination and <a href="https://developers.cloudflare.com/durable-objects/api/transactional-storage-api/">persist state</a>. They’re perfect for applications that need real-time user coordination, like interactive chat or collaborative editing. Take Atlassian’s word for it:</p><blockquote><p><i>One of our new capabilities is</i> <a href="https://www.atlassian.com/software/confluence/whiteboards"><i>Confluence whiteboards</i></a><i>, which provides a freeform way to capture unstructured work like brainstorming and early planning before teams document it more formally. The team considered many options for real-time collaboration and ultimately decided to use Cloudflare’s Durable Objects. Durable Objects have proven to be a fantastic fit for this problem space, with a unique combination of functionalities that has allowed us to greatly simplify our infrastructure and easily scale to a large number of users. 
-</i> <a href="https://www.atlassian.com/software/confluence/whiteboards"><i>Atlassian</i></a></p></blockquote><p>We haven’t previously exposed associated analytical trends in the dashboard, making it hard to understand the usage patterns and error rates within a <a href="https://developers.cloudflare.com/durable-objects/configuration/access-durable-object-from-a-worker/#generate-ids-randomly">Durable Objects namespace</a> unless you used the <a href="https://developers.cloudflare.com/analytics/graphql-api/">GraphQL Analytics API</a> directly. The <a href="https://dash.cloudflare.com/?to=/:account/workers/durable-objects">Durable Objects dashboard</a> has now been revamped, letting you drill down into metrics, and go as deep as you need.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2wFlllqLKz9G1J7ZU4cAfp/67b84ff52331c449bfbbb291fec01ffa/pasted-image-0--1-.png" />
            
            </figure><p>From <a href="/introducing-workers-durable-objects">day one</a>, Durable Objects have supported <a href="https://developer.mozilla.org/en-US/docs/Web/API/WebSocket">WebSockets</a>, allowing many clients to directly connect to a Durable Object to send and receive messages.</p><p>However, sometimes client applications open a WebSocket connection and then eventually stop doing...anything. Think about that tab you’ve had sitting open in your browser for the last 5 hours, but haven’t touched. If it uses WebSockets to send and receive messages, it effectively has a long-lived TCP connection that isn’t being used for anything. If this connection is to a Durable Object, the Durable Object must stay running, waiting for something to happen, consuming memory, and costing you money.</p><p>We first <a href="/workers-pricing-scale-to-zero">introduced WebSocket Hibernation</a> to solve this problem, and today we’re announcing that this feature is out of beta and is Generally Available. With WebSocket Hibernation, you set an automatic response to be used while hibernating and serialize state such that it survives hibernation. This gives Cloudflare the inputs we need in order to maintain open WebSocket connections from clients while “hibernating” the Durable Object such that it is not actively running, and you are not billed for idle time. The result is that your state is always available in-memory when you actually need it, but isn’t unnecessarily kept around when it’s not. As long as your Durable Object is hibernating, even if there are active clients still connected over a WebSocket, you won’t be billed for duration.</p><p>In addition, we’ve heard developer feedback on the costs of incoming WebSocket messages to Durable Objects, which favor smaller, more frequent messages for real-time communication. 
Starting today, incoming WebSocket messages will be billed at the equivalent of 1/20th of a request (as opposed to 1 message being billed as 1 request, as it has been until now). Consider the following <a href="https://developers.cloudflare.com/durable-objects/platform/pricing/#example-4">pricing example</a>:</p>
<table>
<thead>
  <tr>
    <th></th>
    <th><span>WebSocket Connection Requests</span></th>
    <th><span>Incoming WebSocket Messages</span></th>
    <th><span>Billed Requests</span></th>
    <th><span>Request Billing</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>Before</span></td>
    <td><span>10K</span></td>
    <td><span>432M</span></td>
    <td><span>432,010,000</span></td>
    <td><span>$64.65</span></td>
  </tr>
  <tr>
    <td><span>After</span></td>
    <td><span>10K</span></td>
    <td><span>432M</span></td>
    <td><span>21,610,000</span></td>
    <td><span>$3.09</span></td>
  </tr>
</tbody>
</table>
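<p>The table's figures can be reproduced with a little arithmetic. The sketch below assumes Durable Objects request pricing of $0.15 per million requests beyond the first included million (per the pricing docs linked above); the only change between the two rows is dividing incoming messages by 20:</p>

```javascript
// Reproduce the billing table above. Assumes $0.15 per million Durable
// Objects requests after the first included 1M (see the linked pricing docs).
const connectionRequests = 10_000;
const incomingMessages = 432_000_000;

function monthly(messagesPerBilledRequest) {
  const billed = connectionRequests + incomingMessages / messagesPerBilledRequest;
  const dollars = Math.max(0, billed - 1_000_000) / 1_000_000 * 0.15;
  return { billed, cost: Math.round(dollars * 100) / 100 };
}

console.log(monthly(1));  // before: { billed: 432010000, cost: 64.65 }
console.log(monthly(20)); // after:  { billed: 21610000, cost: 3.09 }
```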
    <div>
      <h3>Production ready, without production complexity</h3>
      <a href="#production-ready-without-production-complexity">
        
      </a>
    </div>
    <p>Becoming production ready on the last generation of cloud platforms meant slowing down how fast you shipped. It meant stitching together many disconnected tools or standing up whole teams to work on internal platforms. You had to retrofit your own productivity layers onto platforms that put up roadblocks.</p><p>The Cloudflare Developer Platform is grown up and production ready, and committed to being an integrated platform where products intuitively work together and where there aren’t 10 ways to do the same thing, with no need for a compatibility matrix to help understand what works together. Each of these updates shows this in action, integrating new functionality across products and parts of Cloudflare’s platform.</p><p>To that end, we want to hear from you about not only what you want to see next, but where you think we could be even simpler, or where you think our products could work better together. Tell us where you think we could do more – the <a href="https://discord.cloudflare.com/">Cloudflare Developers Discord</a> is always open.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[SDK]]></category>
            <category><![CDATA[Observability]]></category>
            <guid isPermaLink="false">2IHoIDHRhxpNxOfd1ihQ0Y</guid>
            <dc:creator>Tanushree Sharma</dc:creator>
            <dc:creator>Jacob Bednarz</dc:creator>
        </item>
        <item>
            <title><![CDATA[New! Rate Limiting analytics and throttling]]></title>
            <link>https://blog.cloudflare.com/new-rate-limiting-analytics-and-throttling/</link>
            <pubDate>Tue, 19 Sep 2023 13:00:41 GMT</pubDate>
            <description><![CDATA[ Cloudflare Analytics can now suggest rate limiting threshold based on historic traffic patterns. Rate Limiting also supports a throttle behavior ]]></description>
            <content:encoded><![CDATA[ <p></p><p><a href="https://www.cloudflare.com/application-services/products/rate-limiting/">Rate Limiting</a> rules are essential in the toolbox of security professionals as they are very effective in managing targeted volumetric attacks, <a href="https://www.cloudflare.com/learning/access-management/account-takeover/">takeover attempts</a>, <a href="https://www.cloudflare.com/learning/bots/what-is-data-scraping/">scraping bots</a>, or API abuse. Over the years we have received a lot of feature requests from users, but two stand out: suggesting rate limiting thresholds and implementing a throttle behavior. Today we released both to Enterprise customers!</p><p>When creating a rate limit rule, one of the common questions is “what rate should I put in to block malicious traffic without affecting legitimate users?”. If your traffic is authenticated, <a href="https://www.cloudflare.com/application-services/products/api-gateway/">API Gateway</a> will suggest thresholds based on auth IDs (such as a session ID, cookie, or API key). However, when you don’t have authentication headers, you will need to create IP-based rules (like for a ‘/login’ endpoint) and you are left guessing the threshold. From today, we provide analytics tools to determine what rate of requests can be used for your rule.</p><p>So far, a rate limit rule could be created with a log, challenge, or block action. When ‘block’ is selected, all requests from the same source (for example, IP) are blocked for the timeout period. Sometimes this is not ideal, as you would rather selectively block/allow requests to enforce a maximum rate of requests without an outright temporary ban. When using throttle, a rule lets through enough requests to keep the request rate from individual clients below a customer-defined threshold.</p><p>Continue reading to learn more about each feature.</p>
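<p>The difference between the two behaviors can be made concrete with a small simulation (a conceptual sketch, not the product's algorithm): with ‘block’, a client that exceeds the rate is banned outright for the timeout period; with ‘throttle’, only the excess requests are rejected, and the client keeps flowing at the allowed rate.</p>

```javascript
// Contrast "block" vs "throttle" on one client's request timestamps (seconds),
// with a limit of 2 requests per 10 s window and a 30 s block timeout.
// Conceptual sketch only — not Cloudflare's rate limiting algorithm.
function simulate(mode, timestamps, limit = 2, windowSec = 10, timeoutSec = 30) {
  let windowStart = -Infinity, count = 0, bannedUntil = -Infinity;
  return timestamps.map((t) => {
    if (mode === "block" && t < bannedUntil) return "blocked"; // still banned
    if (t - windowStart >= windowSec) { windowStart = t; count = 0; }
    count++;
    if (count > limit) {
      if (mode === "block") { bannedUntil = t + timeoutSec; return "blocked"; }
      return "throttled"; // reject just this request, keep serving later ones
    }
    return "allowed";
  });
}

const ts = [0, 1, 2, 12, 13];
console.log(simulate("block", ts));    // ["allowed","allowed","blocked","blocked","blocked"]
console.log(simulate("throttle", ts)); // ["allowed","allowed","throttled","allowed","allowed"]
```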
    <div>
      <h2>Introducing Rate Limit Analysis in Security Analytics</h2>
      <a href="#introducing-rate-limit-analysis-in-security-analytics">
        
      </a>
    </div>
    <p>The <a href="https://developers.cloudflare.com/waf/security-analytics/">Security Analytics</a> view was designed with the intention of offering complete visibility on HTTP traffic while adding an extra layer of security on top. It has proven to be of great value when crafting custom rules. Nevertheless, when it comes to creating rate limiting rules, relying solely on Security Analytics can be somewhat challenging.</p><p>To create a rate limiting rule you can leverage Security Analytics to determine the filter — what requests are evaluated by the rule (for example, by filtering on mitigated traffic, or selecting other security signals like Bot scores). However, you’ll also need to determine the maximum rate you want to enforce, and that depends on the specific application, traffic pattern, time of day, endpoint, etc. What’s the typical rate of legitimate users trying to access the login page at peak time? What’s the rate of requests generated by a botnet with the same JA3 fingerprint scraping prices from an ecommerce site? Until today, you couldn’t answer these questions from the analytics view.</p><p>That’s why we made the decision to integrate a rate limit helper into Security Analytics as a new tab called "Rate Limit Analysis," which concentrates on providing a tool to answer rate-related questions.</p>
    <div>
      <h3>High level top statistics vs. granular Rate Limit Analysis</h3>
      <a href="#high-level-top-statistics-vs-granular-rate-limit-analysis">
        
      </a>
    </div>
    <p>In Security Analytics, users can analyze traffic data by creating filters combining what we call <i>top statistics.</i> These statistics reveal the total volume of requests associated with a specific attribute of the <a href="https://www.cloudflare.com/learning/ddos/glossary/hypertext-transfer-protocol-http/">HTTP requests</a>. For example, you can filter the traffic from the <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">ASNs</a> that generated the most requests in the last 24 hours, or you can slice the data to look only at traffic reaching the most popular paths of your application. This tool is handy when creating rules based on traffic analysis.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3QmpAIl5PaDpDvc6eLBNeK/15d3af5e819eb89bb749b58eb43f24c7/image3-5.png" />
            
            </figure><p>However, for rate limits, a more detailed approach is required.</p><p>The new <i>Rate limit analysis</i> tab now displays data on request rate for traffic matching the selected filter and time period. You can select a rate defined over different time intervals, like one or five minutes, and the attribute of the request used to identify the rate, such as IP address, JA3 fingerprint, or a combination of both, as this often improves accuracy. Once the attributes are selected, the chart displays the distribution of request rates for the top 50 unique clients (identified as unique IPs or JA3s) observed during the chosen time interval in descending order.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7iXu3aai6ibY25UWhtYUT1/9bef2b03e884668efbba3d8251cd4882/image2-1.png" />
            
            </figure><p>You can use the slider to determine the impact of a rule with different thresholds. How many clients would have been caught by the rule and rate limited? Can I visually identify abusers with an above-average rate vs. the long tail of average users? This information will guide you in assessing the most appropriate rate for the selected filter.</p>
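<p>For readers who want to reproduce this kind of analysis offline, the distribution behind the chart can be approximated from raw request logs. Below is a minimal Python sketch; the log record fields are illustrative and do not reflect Cloudflare’s actual log schema:</p>

```python
from collections import Counter

def rate_distribution(logs, keys=("ip", "ja3"), interval=60, top=50):
    """Group requests by client fingerprint and report each client's
    peak per-interval request count, highest first."""
    buckets = Counter()
    for record in logs:
        client = tuple(record[k] for k in keys)   # e.g. (IP, JA3) pair
        window = record["ts"] // interval         # fixed time bucket
        buckets[(client, window)] += 1
    peak = {}
    for (client, _), count in buckets.items():
        peak[client] = max(peak.get(client, 0), count)
    # Top-N clients by peak rate, mirroring the dashboard's top-50 view
    return sorted(peak.items(), key=lambda kv: -kv[1])[:top]

# A noisy client sending a request every 2 seconds vs. an average one
logs = [{"ip": "203.0.113.5", "ja3": "abc", "ts": t} for t in range(0, 120, 2)]
logs += [{"ip": "198.51.100.7", "ja3": "def", "ts": t} for t in range(0, 120, 30)]
dist = rate_distribution(logs)
```

<p>Sorting clients by peak per-interval rate makes the outliers easy to separate from the long tail of average users, which is exactly the judgment the slider supports.</p>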
    <div>
      <h3>Using Rate Limit Analysis to define rate thresholds</h3>
      <a href="#using-rate-limit-analysis-to-define-rate-thresholds">
        
      </a>
    </div>
    <p>Building a rate limit rule now takes only a few minutes. Let’s apply this to a common use case: creating an IP-based rate limit rule for the /login endpoint with a logging action.</p><p><b>Define a scope and rate.</b></p><ul><li><p>In the <i>HTTP requests</i> tab (the default view), start by selecting a specific time period. If you’re looking for the normal rate distribution, you can specify a period with non-peak traffic. Alternatively, you can analyze the rate of offending users by selecting a period when an attack was carried out.</p></li></ul><div>
  
</div>
<p></p><ul><li><p>Using the filters in the top statistics, select a specific endpoint (e.g., <i>/login</i>). We can also focus on non-automated/human traffic using the bot score quick filter on the right sidebar or the filter button on top of the chart. In the <i>Rate limiting Analysis</i> tab, you can choose the characteristic (JA3, IP, or both) and duration (1 min, 5 mins, or 1 hour) for your rate limit rule. At this point, moving the dotted line up and down can help you choose an appropriate rate for the rule. JA3 is only available to customers using Bot Management.</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6ZC9jNROj5C5X6mHImu8zf/88b11e610fafe2504665dad8c22b3da6/image5-2.png" />
            
            </figure><ul><li><p>Looking at the distribution, we can exclude any IPs or ASNs already known to us, to get a better view of end-user traffic. One way to do this is to filter out the outliers right before the long tail begins. A rule with this setting will block the IPs/JA3 fingerprints with a higher request rate.</p></li></ul><p><b>Validate your rate.</b> You can validate the rate by repeating this process but selecting a portion of traffic where you know there was an attack or traffic peak. The rate you've chosen should block the outliers during the attack and allow traffic during normal times. In addition, looking at the sampled logs can be helpful in verifying the fingerprints and filters chosen.</p><p><b>Create a rule.</b> Selecting “Create rate limit rule” will take you to the rate limiting tab in the WAF with your filters pre-populated.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7bxY562ESZk6pzJYrPBvuF/35d5d626a4ee5b286dd0c087c74d756e/image7-2.png" />
            
            </figure><p><b>Choose your action and behavior in the rule.</b> Depending on your needs you can choose to log, challenge, or block requests exceeding the selected threshold. It’s often a good idea to first deploy the rule with a log action to validate the threshold and then change the action to block or challenge when you are confident with the result. With every action, you can also choose between two behaviors: fixed action or throttle. Learn more about the difference in the next section.</p>
    <div>
      <h2>Introducing the new throttle behavior</h2>
      <a href="#introducing-the-new-throttle-behavior">
        
      </a>
    </div>
    <p>Until today, the only available behavior for Rate Limiting has been <i>fixed action,</i> where an action is triggered for a selected time period (also known as timeout). For example, did the IP 192.0.2.23 exceed the rate of 20 requests per minute? Then block (or log) all requests from this IP for, let’s say, 10 minutes.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2VKfUoWJSGACGOz50fEsof/00da159c12b659159a2f883b81988a4d/Screenshot-2023-09-19-at-11.16.42.png" />
            
            </figure><p>In some situations, this type of penalty is too severe and risks affecting legitimate traffic. For example, if a device in a corporate network (think of NAT) exceeds the threshold, all devices sharing the same IP will be blocked outright.</p><p>With <i>throttling</i>, rate limiting selectively drops requests to maintain the rate within the specified threshold. It behaves like a leaky bucket (with the difference that we do not implement a queuing system). For example, throttling a client to 20 requests per minute means that when a request comes from this client, we look at the last 60 seconds and see if (on average) we have received fewer than 20 requests. If this is true, the rule won’t perform any action. If the average is already at 20 requests, then we will take action on that request. When another request comes in, we will check again. Since some time has passed, the average rate might have dropped, making room for more requests.</p>
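<p>The selective-drop behavior described above can be sketched as a sliding-window counter. This is only an illustration of the concept, not Cloudflare’s implementation:</p>

```python
from collections import deque

class Throttle:
    """Allow at most `limit` requests per `window` seconds per client;
    excess requests are dropped individually instead of triggering a ban."""
    def __init__(self, limit=20, window=60):
        self.limit, self.window = limit, window
        self.seen = {}  # client -> deque of recent request timestamps

    def allow(self, client, now):
        q = self.seen.setdefault(client, deque())
        while q and q[0] <= now - self.window:
            q.popleft()                # forget requests outside the window
        if len(q) < self.limit:
            q.append(now)
            return True                # under the threshold: let it through
        return False                   # at the threshold: drop only this request

t = Throttle(limit=3, window=60)
results = [t.allow("192.0.2.23", ts) for ts in (0, 1, 2, 3, 61)]
```

<p>The fourth request is dropped, but the client is never banned outright: once older requests age out of the window, traffic flows again.</p>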
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5nguPmj1apYY1lehsrbybF/3b954929b65a06a4e509ddc67689a874/Screenshot-2023-09-19-at-11.17.18.png" />
            
            </figure><p>Throttling can be used with all actions: block, log, or challenge. When creating a rule, you can select the behavior after choosing the action.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/19vPKFMr4S8VXOGtUkUXTK/822e2c7b704fa0f03fa86c6044bfd288/Screenshot-2023-09-19-at-11.17.42.png" />
            
            </figure><p>When using any challenge action, we recommend using the <i>fixed action</i> behavior. This way, when a client exceeds the threshold, we will challenge all requests until a challenge is passed. The client will then be able to reach the origin until the threshold is breached again.</p><p>Throttle behavior is available to Enterprise rate limiting <a href="https://developers.cloudflare.com/waf/rate-limiting-rules/#availability">plans</a>.</p>
    <div>
      <h2>Try it out!</h2>
      <a href="#try-it-out">
        
      </a>
    </div>
    <p>Today we are introducing a new Rate Limiting analytics experience along with the throttle behavior for all Rate Limiting users on Enterprise plans. We will continue to work actively on providing a better experience to save our customers' time. Log in to the dashboard, try out the new experience, and let us know your feedback using the feedback button located on the top right side of the Analytics page or by reaching out to your account team directly.</p> ]]></content:encoded>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[Analytics]]></category>
            <guid isPermaLink="false">CZ5Zxo3gP0GccJdEtAmcY</guid>
            <dc:creator>Radwa Radwan</dc:creator>
            <dc:creator>Daniele Molteni</dc:creator>
        </item>
        <item>
            <title><![CDATA[Back in 2017 we gave you Unmetered DDoS Mitigation, here's a birthday gift: Unmetered Rate Limiting for Self Serve customers]]></title>
            <link>https://blog.cloudflare.com/unmetered-ratelimiting/</link>
            <pubDate>Thu, 29 Sep 2022 13:00:00 GMT</pubDate>
            <description><![CDATA[ Starting today, Free, Pro and Business plans include Rate Limiting rules without additional charges. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>In 2017, we made <a href="/unmetered-mitigation/">unmetered DDoS protection</a> available to all our customers, regardless of their size or whether they were on a Free or paid plan. Today we are doing the same for Rate Limiting, one of the most successful products of the WAF family.</p><p>Rate Limiting is a very effective tool to manage targeted volumetric attacks, takeover attempts, bots scraping sensitive data, attempts to overload computationally expensive API endpoints and more. To manage these threats, customers deploy rules that limit the maximum rate of requests from individual visitors on specific paths or portions of their applications.</p><p>Until today, customers on a Free, Pro or Business plan were able to purchase Rate Limiting as an add-on with usage-based cost of $5 per million requests. However, we believe that an essential security tool like Rate Limiting should be available to all customers without restrictions.</p><p>Since we launched unmetered DDoS, we have mitigated huge attacks, like a <a href="/cloudflare-blocks-an-almost-2-tbps-multi-vector-ddos-attack/">2 Tbps multi-vector attack</a> or the most recent <a href="/26m-rps-ddos/">26 million requests per second attack</a>. We believe that releasing an unmetered version of Rate Limiting will increase the overall security posture of millions of applications protected by Cloudflare.</p><p>Today, we are announcing that Free, Pro and Business plans include Rate Limiting rules without extra charges.</p><p>…and we are not just dropping any Rate Limiting extra charges, we are also releasing an updated version of the product which is built on the powerful ruleset engine and allows building rules like in Custom Rules. This is the same engine which powers the enterprise-grade <a href="/advanced-rate-limiting/">Advanced Rate Limiting</a>. 
The new ‘Rate limiting rules’ will appear in your dashboard starting this week.</p><p>No more usage-based charges, just rate limiting when you need it, as much as you need it.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5H36mS0NDzLbE3sJfa8xnF/95f9f42210e0a725ab8d489be64eb892/image2-63.png" />
            
            </figure><p>New Rate Limiting is in everyone's dashboard under the WAF tab.</p><p>Note: starting today, September 29th, Pro and Business customers have the new product available in their dashboard. Free customers will get their rules enabled during the week starting on October 3rd 2022.</p>
    <div>
      <h3>End of usage-based charges</h3>
      <a href="#end-of-usage-based-charges">
        
      </a>
    </div>
    <p>New customers get the new Rate Limiting by default, while existing customers will be able to run both products in parallel: the new and the previous version.</p><p>For new customers, new Rate Limiting rules will be included in each plan according to the following table:</p><table>
<thead>
  <tr>
    <th></th>
    <th>FREE</th>
    <th>PRO</th>
    <th>BUSINESS</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>Number of rules</td>
    <td>1</td>
    <td>2</td>
    <td>5</td>
  </tr>
</tbody>
</table><p>When using these rules, no additional charges will be added to your account, no matter how much traffic these rules handle.</p><p>Existing customers will be granted the same number of rules in the new, unmetered system as they’re currently using in the previous version (as of September 20, 2022). For example, if you are a Business customer with nine active rules in the previous version, you will get nine rules in the new system as well.</p><p><i>The previous version of Rate Limiting will still be subject to charges when in use</i>. If you want to take advantage of the unmetered option, we recommend rewriting your rules in the new engine. As outlined below, new Rate Limiting offers all the capabilities of the previous version of Rate Limiting and more. In the future, the previous version of Rate Limiting will be deprecated; however, we will give you plenty of time to migrate your rules.</p>
    <div>
      <h3>New rate limiting engine for all</h3>
      <a href="#new-rate-limiting-engine-for-all">
        
      </a>
    </div>
    <p>A couple of weeks ago, <a href="/cloudflare-waap-named-leader-gartner-magic-quadrant-2022/">we announced</a> that Cloudflare was named a Leader in the Gartner® Magic Quadrant™ for Web Application and API Protection (WAAP). One of the key services offered in our WAAP portfolio is Advanced Rate Limiting.</p><p>The recent <a href="/advanced-rate-limiting/">Advanced Rate Limiting</a> has shown great success among our Enterprise customers. Advanced Rate Limiting allows an unprecedented level of control on how to manage incoming traffic rate. We decided to give the same rule-building experience to all of our customers as well as some of its new features.</p><p>A summary of the feature set is outlined in the following table:</p>
<table>
<thead>
  <tr>
    <th></th>
    <th><span>FREE</span></th>
    <th><span>PRO</span></th>
    <th><span>BUSINESS</span></th>
    <th><span>ENT</span><br /><span><br />with WAF Essential</span></th>
    <th><span>ENT</span><span><br />with Advanced Rate Limiting</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>Fields available (request)</span></td>
    <td><span>Path</span></td>
    <td><span>Host</span><br /><span>URI</span><br /><span>Path</span><br /><span>Full URI</span><br /><span>Query</span></td>
    <td><span>Host</span><br /><span>URI</span><br /><span>Path</span><br /><span>Full URI</span><br /><span>Query</span><br /><span>Method</span><br /><span>Source IP</span><br /><span>User Agent</span></td>
    <td><a href="https://developers.cloudflare.com/ruleset-engine/rules-language/fields/"><span>All fields available in Custom Rules</span></a><span>: Including  request metadata</span>(1)<span>.</span></td>
    <td><span>Same as WAF Essential, plus request Bot score</span>(1)<span> and body fields</span>(2)</td>
  </tr>
  <tr>
    <td><span>Counting expression</span></td>
    <td><span>Not available</span></td>
    <td><span>Not available</span></td>
    <td><span>Available with access to response headers and response status code</span></td>
    <td><span>Available with access to response headers and response status code</span></td>
    <td><span>Available with access to response headers and response status code</span></td>
  </tr>
  <tr>
    <td><span>Counting characteristics</span></td>
    <td><span>IP</span></td>
    <td><span>IP</span></td>
    <td><span>IP</span></td>
    <td><span>IP</span><br /><span>IP with NAT awareness</span></td>
    <td><span>IP</span><br /><span>IP with NAT awareness</span><br /><span>Query</span><br /><span>Host</span><br /><span>Headers</span><br /><span>Cookie</span><br /><span>ASN</span><br /><span>Country</span><br /><span>Path</span><br /><span>JA3</span>(2)<span> </span><br /><span>JSON field (New!)</span></td>
  </tr>
  <tr>
    <td><span>Max Counting period</span></td>
    <td><span>10 seconds</span></td>
    <td><span>60 seconds</span></td>
    <td><span>10 minutes</span></td>
    <td><span>10 minutes</span></td>
    <td><span>1 hour</span></td>
  </tr>
  <tr>
    <td><span>Price</span></td>
    <td><span>Free</span></td>
    <td><span>Included in monthly subscription</span></td>
    <td><span>Included in monthly subscription</span></td>
    <td><span>Included in contracted plan</span></td>
    <td><span>Included in contracted plan</span></td>
  </tr>
</tbody>
</table><p>(1): Requires Bot Management add-on. (2): Requires specific plan.</p><p><b>Leveraging the ruleset engine.</b> The previous version of Rate Limiting allowed customers to scope a rule based on a single path and method of the request. Thanks to the ruleset engine, customers can now write rules like they do in Custom Rules and combine multiple parameters of the HTTP request.</p><p>For example, Pro domains can combine multiple paths in the same rule using the OR or AND operators. Business domains can also write rules using Source IP or User Agent. This allows enforcing different rates for specific User Agents. Furthermore, Business customers can now scope Rate Limiting to specific IPs (using an IP List, for example) or exclude IPs where no attack is expected.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4vdGtCHgu4JHuWZsCiNU8u/135e670450e10416f8fa3cecce28f3ed/image1-77.png" />
            
            </figure><p>Both Rate Limiting products can be found under WAF → Rate Limiting rules. The previous version of Rate Limiting (left) allows filtering traffic for one URL. New Rate Limiting (right) allows you to combine fields like in Custom Rules.</p><p><b>Counting and mitigation expressions are now separate.</b> A feature request we often heard was the ability to track the rate of requests on a specific path (such as ‘/login’) and, when an IP exceeds the threshold, block every request from the same IP hitting anywhere on your domain. Business and Enterprise customers can now achieve this by using the <i>counting</i> expression, which is separate from the <i>mitigation</i> expression. The former defines what requests are used to compute the rate, while the latter defines what requests are mitigated once the threshold has been reached.</p><p>Another use case for the counting expression is when you need to use Origin Status Code or HTTP Response Headers. If you need these fields, we recommend creating a counting expression that includes response parameters and explicitly writing a filter that defines the request parameters that will trigger a block action.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/60pUzmYow1zkwkvkYJrTww/bf9fced9a08884164a4fbf56232b1ac7/image3-46.png" />
            
            </figure><p>You can now separate the expression used to compute the rate from the expression used for blocking traffic once the rate is exceeded. In this example, all traffic to example.com will be blocked (see the mitigation expression at the top) if more than 3 POST requests to ‘/login’ in 1 minute have returned 429 (defined by the counting expression at the bottom).</p><p><b>Counting dimensions.</b> As in the previous version, Free, Pro and Business customers get IP-based Rate Limiting. When we say IP-based, we refer to the way we group (or count) requests. You can set a rule that enforces a maximum rate of requests from the same IP. If you set a rule to limit 10 requests over one minute, we will count requests from individual IPs until they reach the limit and then block for a period of time.</p><p>Advanced Rate Limiting users are able to group requests based on additional characteristics, such as API keys, cookies, session headers, ASN, query parameters, a JSON body field (e.g. the username value of a login request) and more.</p><p><b>What do Enterprise customers get?</b> Enterprise customers do not get Rate Limiting as part of their contract by default. Rate Limiting is part of the application security offering, which needs to be contracted based on traffic volume. When WAF with Rate Limiting is included in their contract, they get access to 100 rules, a more comprehensive list of fields available in the rule builder, and the option to upgrade to <a href="/advanced-rate-limiting/">Advanced Rate Limiting</a>. Please reach out to your account team to learn more.</p><p>More information on how to use new Rate Limiting can be found in the <a href="https://developers.cloudflare.com/waf/rate-limiting-rules/">documentation</a>.</p>
    <div>
      <h3>Additional information for existing customers</h3>
      <a href="#additional-information-for-existing-customers">
        
      </a>
    </div>
    <p>If you are a Free, Pro or Business customer, you will automatically get the new product in the dashboard. We will entitle you to as many unmetered Rate Limiting rules as you are using in the previous version.</p><p>If you are an Enterprise customer using the previous version of Rate Limiting, please reach out to your account team to discuss the options for moving to new Rate Limiting.</p><p>To take advantage of the unmetered functionality, you will need to migrate your rules to the new system. The previous version will keep working as usual, and you might be charged based on the traffic that its rules evaluate.</p><p>Long term, the previous version of Rate Limiting will be deprecated, and when this happens, all rules still running on the old system will cease to run.</p>
    <div>
      <h3>What’s next?</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>The WAF team has plans to further expand our Rate Limiting capabilities. Features we are considering include better analytics to support rule creation. Furthermore, new Rate Limiting can now benefit from new fields made available in the WAF as soon as they are released. For example, Enterprise customers can combine Bot Score or the new <a href="/waf-ml/">WAF Attack Score</a> to create a more fine-grained security posture.</p>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">3A3SkA9H2jGIesmoPPmqQE</guid>
            <dc:creator>Daniele Molteni</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing Advanced Rate Limiting]]></title>
            <link>https://blog.cloudflare.com/advanced-rate-limiting/</link>
            <pubDate>Wed, 16 Mar 2022 12:58:53 GMT</pubDate>
            <description><![CDATA[ Advanced Rate Limiting allows counting requests based on virtually any characteristic of the HTTP request, regardless of its source IP ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Still relying solely on IP firewalling? It’s time to change that.</p><p>While the IP address might still be one of the core technologies allowing networks to function, its value for security is long gone. IPs are rarely static; nowadays, mobile operators use carrier-grade network address translation (CGNAT) to share the same IP amongst thousands of individual devices or users. Bots then carry out distributed attacks with low request volume from different IPs to elude throttling. Furthermore, many countries consider IP addresses to be personal data, and it would be a great advancement for privacy if a replacement could be found for elements of security that currently rely on IP addresses to function. A product that is affected by this trend is rate limiting.</p><p>Rate limiting is designed to stop requests from overloading a server. It relies on rules. A rate limiting rule is defined by a filter (which typically is a path, like <code>/login</code>) and the maximum number of requests allowed from each user over a period of time. When this threshold is exceeded, an action is triggered (usually a block) for subsequent requests from the same user for a period of time (known as a timeout). Traditional throttling solutions bucket together requests with the same IP since they follow the logic “requests from the same IP equals requests from the same user”. However, we hear from customers how ineffective it is to use IP-based rate limiting to protect their traffic, especially for authenticated APIs.</p><p>We are excited to launch Advanced Rate Limiting, a leap forward for throttling technologies. It allows counting requests based on virtually any characteristic of the HTTP request, regardless of its source IP. Rate Limiting is a great defense against brute force, <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">scraping</a>, or targeted DDoS attacks. 
Consequences of these attacks include the leaking of sensitive data, <a href="https://www.cloudflare.com/zero-trust/solutions/account-takeover-prevention/">account takeover</a>, or the exhaustion of back-end resources. Keeping the rate of requests under control is especially crucial for <a href="https://www.cloudflare.com/learning/security/api/what-is-an-api/">APIs</a>, where each call can trigger costly computation on the origin server.</p>
    <div>
      <h2>A step-change innovation for throttling</h2>
      <a href="#a-step-change-innovation-for-throttling">
        
      </a>
    </div>
    <p>Advanced Rate Limiting is now part of the <a href="/new-waf-experience/">Web Application Firewall</a> (WAF). It’s integrated with Firewall Rules and allows counting requests based on characteristics other than IP.</p><p>With Advanced Rate Limiting, you can:</p><ol><li><p>Define the rule filter using all HTTP request characteristics, such as URI, method, headers, cookies and body fields. Customers on a Bot Management plan get access to the bot score dynamic field too. You can also use two characteristics of the HTTP response to trigger rate limiting: status code and response headers.</p></li><li><p>Choose to count requests based on: IP, country, header, cookie, AS Number (ASN), value of a query parameter, or bot fingerprint (JA3). You can use any of these fields individually or combine them, so that requests are bucketed when these values are the same. You can also set the threshold as the maximum complexity your origin can handle, rather than the maximum number of requests you want to allow.</p></li><li><p>Use it on all your traffic. As an Enterprise customer, Rate Limiting could previously be bought for only a portion of your total traffic. With Advanced Rate Limiting, you can use the product on all of your traffic without having to worry about caps. Finally, Advanced Rate Limiting is available on the entire Cloudflare network, including in China.</p></li></ol>
    <div>
      <h2>Designed to integrate with your application</h2>
      <a href="#designed-to-integrate-with-your-application">
        
      </a>
    </div>
    <p>In this section, we discuss a few common use cases for using Advanced Rate Limiting to <a href="https://www.cloudflare.com/application-services/solutions/api-security/">protect your web or API traffic</a>. You can mix and match all these configurations to better suit your security needs and your application. All these use cases can be achieved via dashboard, API and Terraform.</p>
    <div>
      <h3>Use case - Protect web traffic with more granular rules</h3>
      <a href="#use-case-protect-web-traffic-with-more-granular-rules">
        
      </a>
    </div>
    <p><b>Flexible filters.</b> You can now write rate limiting rules using all the fields of the HTTP request. For example, you can trigger a rule for requests with specific headers (such as User Agent) or throttle traffic from bots sharing the same ASN.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6jlTSXDikimPc7kIb8Yj5z/1e8498567d467a2310891306ffef1d4f/image3-19.png" />
            
            </figure><p><b>Separate mitigation expression.</b> You can now separate the mitigation expression from the counting expression. This allows you to define on what part of your website you want to block users once the threshold is reached, and what conditions the request (and response) needs to meet in order to increase the counter. For example, you can count requests to your <code>/login</code> endpoint and then block the same user on the whole site. This is especially useful when you want to include response fields in your counting expression, for example, by counting only requests that return a specific response code but then block a larger portion of traffic.</p>
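<p>Conceptually, the two expressions drive two different checks: the counting expression decides which requests (and responses) increment a client’s counter, while the mitigation expression decides which requests are blocked once the threshold is reached. A rough Python sketch of this separation (the paths, threshold, and field names are illustrative, not Cloudflare’s rules language):</p>

```python
from collections import defaultdict

COUNT_WINDOW, THRESHOLD = 60, 3   # illustrative values

counters = defaultdict(list)      # ip -> timestamps of *counted* requests
blocked_until = {}                # ip -> time when the block expires

def counting_expr(req, status):
    # Counting expression: only failed POSTs to /login increment the counter
    return req["path"] == "/login" and req["method"] == "POST" and status == 401

def handle(req, status, now, timeout=600):
    ip = req["ip"]
    # Mitigation expression: *all* traffic from a blocked IP is denied
    if blocked_until.get(ip, 0) > now:
        return "block"
    if counting_expr(req, status):
        recent = [t for t in counters[ip] if t > now - COUNT_WINDOW]
        recent.append(now)
        counters[ip] = recent
        if len(recent) > THRESHOLD:
            blocked_until[ip] = now + timeout
    return "allow"

for second in range(4):  # four failed logins in quick succession
    handle({"ip": "10.0.0.1", "path": "/login", "method": "POST"}, 401, second)
verdict = handle({"ip": "10.0.0.1", "path": "/", "method": "GET"}, 200, 5)
```

<p>Here a GET to the homepage is blocked even though only /login failures were counted: the mitigation scope is wider than the counting scope.</p>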
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/p43VNOCn2MV2I1xJmOAM2/69dc9fbf784c1bc33a76fc3ff79e686c/image2-40.png" />
            
            </figure><p><b>Use dynamic fields.</b> Customers can now combine Rate Limiting with rules detecting known vulnerabilities, such as the <a href="/waf-ml/">WAF machine learning score</a>. For example, you can block eyeballs after a number of consecutive requests flagged as SQLi have hit your site. Another use case is to trigger a throttling rule only for requests likely originating from bots (by using the bot score in the rule filter) or after a number of login attempts with stolen credentials have been performed (<a href="https://developers.cloudflare.com/waf/managed-rulesets/exposed-credentials-check">link</a>). You can also use the JA3 fingerprint as a counting dimension, so that you leverage our Bot Machine Learning algorithm to bucket traffic from bots with the same fingerprint.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/28PgBzrw0RSnTEBdPRazR8/ed289896eccaf20b68385208081ae528/image4-4.png" />
            
            </figure>
    <div>
      <h3>Use case - Protect APIs by integrating Rate Limiting with your application</h3>
      <a href="#use-case-protect-apis-by-integrating-rate-limiting-with-your-application">
        
      </a>
    </div>
    <p><b>Count requests based on session ID.</b> API traffic is often authenticated, and the session can be tracked with a cookie, header (such as <code>x-api-key</code>) or query value. Advanced Rate Limiting allows you to define where the ID is in the request and track the number of requests relative to the same session, regardless of the IP. This can be an effective way to fend off distributed bot attacks that scrape sensitive data, such as product prices or airline passenger data.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2TWf8VzKMQTdl3DIQvCbpw/3505e5b768af0506ffc0a30028dcfc32/image5-5.png" />
            
            </figure><p><b>Trigger rule based on a request body content.</b> The rule filter gives access to the raw body and the JSON-parsed body. You can count requests where a body JSON field has a specific value using the function <code>lookup_json_string</code> available in the rule filter. This can be useful for GraphQL APIs, where different calls (or mutations) can be performed through the same endpoint but specifying different operations in the request body.</p>
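<p>Conceptually, this amounts to keying the rate counter on a value extracted from the JSON body rather than on the IP alone. A rough Python analogue (the <code>operationName</code> field follows the common GraphQL convention; this is an illustration, not Cloudflare’s implementation):</p>

```python
import json
from collections import Counter

request_counts = Counter()

def count_by_operation(body_raw, ip, window_id):
    """Key the rate counter on (client, GraphQL operation, time window)
    so that each operation gets its own budget."""
    try:
        op = json.loads(body_raw).get("operationName", "<anonymous>")
    except json.JSONDecodeError:
        op = "<invalid>"
    key = (ip, op, window_id)
    request_counts[key] += 1
    return request_counts[key]

n1 = count_by_operation('{"operationName": "getPrices"}', "10.0.0.1", 0)
n2 = count_by_operation('{"operationName": "getPrices"}', "10.0.0.1", 0)
n3 = count_by_operation('{"operationName": "login"}', "10.0.0.1", 0)
```

<p>Two calls to the same operation share a counter, while a different operation from the same IP starts a fresh one.</p>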
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1R8vFFvmCerK66QUvqq5ti/f7c7c139682643a175157078ea0dafdc/image6-1.png" />
            
            </figure><p><b>Rate Limiting based on complexity (coming soon in beta; available now via API)</b>. Some API calls are more complex to serve than others, so counting the number of requests doesn’t really reflect the actual cost of serving them. GraphQL APIs are an example: each call’s complexity can vary widely based on how much processing the server needs to carry out to serve the request. Your origin can estimate the complexity of each request and return it along with the response, and rate limiting will increment the counter by the complexity estimate provided by the origin. You can then set a complexity threshold in the rule and, when it’s exceeded, subsequent requests will trigger an action, such as block.</p>
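<p>In sketch form (a hypothetical model with invented names; window expiry is omitted for brevity), the counter grows by the origin-reported cost rather than by one:</p>
<pre><code># Hypothetical sketch of complexity-based counting: the origin reports a
# per-request cost, and the rule triggers once the accumulated cost in
# the window exceeds the threshold.
class ComplexityLimiter:
    def __init__(self, threshold):
        self.threshold = threshold
        self.spent = {}  # client id -> complexity consumed in the window

    def record(self, client, complexity):
        # Called when the origin's response carries a complexity estimate.
        self.spent[client] = self.spent.get(client, 0) + complexity

    def blocked(self, client):
        return self.spent.get(client, 0) > self.threshold</code></pre>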
    <div>
      <h2>Packaging</h2>
      <a href="#packaging">
        
      </a>
    </div>
    <p>Advanced Rate Limiting is generally available for Enterprise customers on the new Advanced plan. See below for more details on what’s included in each plan. Reach out to your Cloudflare account team or Customer Success Manager (CSM) to learn more. If you are a Pro or Biz customer, you won’t be able to use Advanced Rate Limiting, but we are planning to bring some of its advantages to Pro and Biz plans as well.</p><table><tr><td><p><b></b></p></td><td><p><b>Enterprise Core</b></p></td><td><p><b>Enterprise Advanced</b></p></td></tr><tr><td><p>Available request fields in filter</p></td><td><p>Selected standard fields:
URL
Method
Headers
Source IP</p></td><td><p>All standard fields
Body fields
Account takeover fields
Dynamic fields (including Bot Score*)</p></td></tr><tr><td><p>Available response fields in counting filter</p></td><td><p>Response code
Response Headers</p></td><td><p>Response code
Response Headers</p></td></tr><tr><td><p>Counting characteristics</p></td><td><p>IP</p></td><td><p>IP
IP with NAT awareness
ASN
Country
Headers
Cookie
Query
JA3*</p></td></tr><tr><td><p>Complexity</p></td><td><p>No</p></td><td><p>Yes</p></td></tr><tr><td><p>Maximum sampling period</p></td><td><p>10 minutes</p></td><td><p>1 hour</p></td></tr></table><p>*requires Bot Management plan</p>
    <div>
      <h2>What’s next for Rate Limiting</h2>
      <a href="#whats-next-for-rate-limiting">
        
      </a>
    </div>
    <p>In the coming months, we are going to collect feedback from our customers to decide what additional features we should include in Advanced Rate Limiting. We already have a few ideas we are exploring, including automatically profiling your traffic and recommending thresholds for your rules.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Rate Limiting]]></category>
            <guid isPermaLink="false">4nxAgcAukSta8FUP81RKVB</guid>
            <dc:creator>Daniele Molteni</dc:creator>
        </item>
        <item>
            <title><![CDATA[Rate Limiting: Delivering more rules, and greater control]]></title>
            <link>https://blog.cloudflare.com/rate-limiting-delivering-more-rules-and-greater-control/</link>
            <pubDate>Mon, 21 May 2018 20:41:37 GMT</pubDate>
            <description><![CDATA[ With more platforms adopting DDoS safeguards like integrating mitigation services and enhancing bandwidth at vulnerable points, Layer 3 and 4 attacks are becoming far less effective than before. ]]></description>
            <content:encoded><![CDATA[ <p>With more and more platforms taking the <a href="https://www.cloudflare.com/learning/ddos/how-to-prevent-ddos-attacks/">necessary precautions against DDoS attacks</a>, like integrating DDoS mitigation services and increasing bandwidth at weak points, Layer 3 and 4 attacks are just not as effective anymore. At Cloudflare, we have fully automated Layer 3/4 protections with our internal platform, <a href="/meet-gatebot-a-bot-that-allows-us-to-sleep/">Gatebot</a>. In the last 6 months we have seen a large upward trend in Layer 7 DDoS attacks. The key difference with these attacks is that they no longer rely on huge payloads (volumetric attacks), but on requests per second to exhaust server resources (CPU, disk, and memory). On a regular basis we see attacks exceeding 1 million requests per second. The graph below shows the number of Layer 7 attacks Cloudflare has monitored, which is trending up: on average we see around 160 attacks a day, with some days spiking to over 1,000.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2KCzsb3VI9QyzaxYXWfPlW/bff6e0c18354270863ab8b5626f7dff1/Screen-Shot-2018-05-21-at-10.36.27-AM.png" />
            
            </figure><p>A year ago, Cloudflare released <a href="/rate-limiting/">Rate Limiting</a>, and it is proving to be a hugely effective tool for customers to protect their web applications and <a href="https://www.cloudflare.com/learning/security/api/what-is-an-api/">APIs</a> from all sorts of attacks, from “low and slow” DDoS attacks, through to bot-based attacks, such as credential stuffing and content scraping. We’re pleased about the success our customers are seeing with Rate Limiting and are excited to announce additional capabilities to give our customers further control.</p>
    <div>
      <h3>So what’s changing?</h3>
      <a href="#so-whats-changing">
        
      </a>
    </div>
    <p>There are times when you clearly know that traffic is malicious. In cases like this, our existing Block action is proving effective for our customers. But there are times when it is not the best option and causes a negative user experience. Rather than risk a false positive, customers often want to challenge a client to ensure it is what it represents itself to be, which in most situations means a human, not a bot.</p><p><b>Firstly</b>, to help customers more accurately identify the traffic, we are adding Cloudflare JavaScript Challenge and Google reCAPTCHA (Challenge) mitigation actions to the UI and API for Pro and Business plans. The existing Block and Simulate actions remain available. As a reminder, deploying any rule in Simulate means that you will not be charged for any requests. This is a great way to test your new rules to make sure they have been configured correctly.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6wD9AHZXCrufghkJPOcrDR/912893206456995a98bf003226cfac6e/Screen-Shot-2018-05-21-at-10.36.39-AM.png" />
            
            </figure><p><b>Secondly</b>, we’re making Rate Limiting more dynamically scalable. A new feature has been added which allows Rate Limiting to count on Origin Response Headers for Business and Enterprise customers. The way this feature works is by matching attributes which are returned by the Origin to Cloudflare.</p>
    <div>
      <h3>The new capabilities - in action!</h3>
      <a href="#the-new-capabilities-in-action">
        
      </a>
    </div>
    <p>One of the things that really drives our innovation is solving the real problems we hear from customers every day. With that, we wanted to provide some real world examples of these new capabilities in action.</p><p>Each of the use cases has Basic and Advanced implementation options. After some testing, we found that tiering rate limits is an extremely effective solution against repeat offenders.</p><p><b>Credential Stuffing Protection</b> for Login Pages and APIs. The best way to build applications is to utilise the standardized status codes. For example, if I fail to authenticate against an endpoint or a website, I should receive a “401” or “403”. Generally speaking, a user of a website will often get their password wrong three times before selecting the “I forgot my password” option. Most credential stuffing bots will try thousands of times, cycling through many username and password combinations to see what works.</p><p>Here are some example rate limits which you can configure to protect your application from credential stuffing.</p><p><b>Basic</b>: Cloudflare offers a “Protect My Login” feature out of the box. Enter the URL for your login page and Cloudflare will create a rule such that clients that attempt to log in more than 5 times in 5 minutes will be blocked for 15 minutes.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4FhmlH6WeVwOhm3pyHjo9w/b5647d44c77165d7da31d69422c322a6/Screen-Shot-2018-05-21-at-10.36.47-AM.png" />
            
            </figure><p>With the new Challenge capabilities of Rate Limiting, you can customize the response parameters for log in to more closely match the behavior pattern for bots you see on your site through a custom-built rule.</p><p>Logging in four times in one minute is hard - I type fast, but couldn’t even do this. If I’m seeing this pattern in my logs, it is likely a bot. I can now create a Rate Limiting rule based on the following criteria:</p><table><tr><td><p><b>RuleID</b></p></td><td><p><b>URL</b></p></td><td><p><b>Count</b></p></td><td><p><b>Timeframe</b></p></td><td><p><b>Matching Criteria</b></p></td><td><p><b>Action</b></p></td></tr><tr><td><p>1</p></td><td><p>/login</p></td><td><p>4</p></td><td><p>1 minute</p></td><td><p>Method: POST
Status Code: 401,403</p></td><td><p>Challenge</p></td></tr></table><p>With this new rule, if someone tries to log in four times within a minute, they will be served a challenge. My regular human users will likely never hit it, but if they do - the challenge ensures they can still access the site.</p><p><b>Advanced</b>:
And sometimes bots are just super persistent in their attacks. We can tier rules together to tackle repeat offenders. For example, instead of creating just a single rule, we can create a series of rules which can be tiered to protect against persistent threats:</p><table><tr><td><p><b>RuleID</b></p></td><td><p><b>URL</b></p></td><td><p><b>Count</b></p></td><td><p><b>Timeframe</b></p></td><td><p><b>Matching Criteria</b></p></td><td><p><b>Action</b></p></td></tr><tr><td><p>1</p></td><td><p>/login</p></td><td><p>4</p></td><td><p>1 minute</p></td><td><p>Method: POST
Status Code: 401,403</p></td><td><p>JavaScript Challenge</p></td></tr><tr><td><p>2</p></td><td><p>/login</p></td><td><p>10</p></td><td><p>5 minutes</p></td><td><p>Method: POST
Status Code: 401,403</p></td><td><p>Challenge</p></td></tr><tr><td><p>3</p></td><td><p>/login</p></td><td><p>20</p></td><td><p>1 hour</p></td><td><p>Method: POST
Status Code: 401,403</p></td><td><p>Block for 1 day</p></td></tr></table><p>With this type of tiering, any genuine users that are just having a hard time remembering their login details whilst also being extremely fast typists will not be fully blocked. Instead, they will first be given an automated JavaScript challenge, followed by a traditional CAPTCHA if they hit the next limit. This is a much more user-friendly approach while still securing your login endpoints.</p>
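<p>The tiered evaluation can be sketched in a few lines of Python. This is a simplified, hypothetical model of the table above (the names and the callback are invented), where each tier counts failed logins over its own window and stricter tiers override looser ones:</p>
<pre><code>import time

# (count, window in seconds, action) for each tier from the table above.
TIERS = [
    (4, 60, "js_challenge"),   # 4 failures in 1 minute
    (10, 300, "captcha"),      # 10 failures in 5 minutes
    (20, 3600, "block"),       # 20 failures in 1 hour
]

failures = {}  # client ip -> timestamps of failed logins (401/403 on POST /login)

def on_failed_login(ip, now=None):
    now = time.time() if now is None else now
    failures.setdefault(ip, []).append(now)
    action = "allow"
    for count, window, tier_action in TIERS:
        recent = [t for t in failures[ip] if window >= now - t]
        if len(recent) >= count:
            action = tier_action  # the strictest matching tier wins
    return action</code></pre>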
    <div>
      <h4>Time-based Firewall</h4>
      <a href="#time-based-firewall">
        
      </a>
    </div>
    <p>Our IP Firewall is a powerful feature to block problematic IP addresses from accessing your app. This is particularly useful for repeated abuse, or for blocks based on IP reputation or threat intelligence feeds integrated at the origin level.</p><p>While the IP firewall is powerful, maintaining and managing a list of IP addresses which are currently being blocked can be cumbersome. It becomes more complicated if you want to allow blocked IP addresses to “age out” if bad behavior stops after a period of time. This often requires authoring and managing a script and multiple API calls to Cloudflare.</p><p>The new Rate Limiting Origin Headers feature makes this all much easier. You can now configure your origin to respond with a Header to trigger a Rate-Limit. To make this happen, we need to generate a Header at the Origin, which is then added to the response to Cloudflare. As we are matching on a static header, we can set a severity level based on the content of the Header. For example, for a repeat offender, you could respond with High as the Header value, which could Block for a longer period.</p><p>Create a Rate Limiting rule based on the following criteria:</p><table><tr><td><p><b>RuleID</b></p></td><td><p><b>URL</b></p></td><td><p><b>Count</b></p></td><td><p><b>Timeframe</b></p></td><td><p><b>Matching Criteria</b></p></td><td><p><b>Action</b></p></td></tr><tr><td><p>1</p></td><td><p>*</p></td><td><p>1</p></td><td><p>1 second</p></td><td><p>Method: _ALL_
Header: X-CF-Block = low</p></td><td><p>Block for 5 minutes</p></td></tr><tr><td><p>2</p></td><td><p>*</p></td><td><p>1</p></td><td><p>1 second</p></td><td><p>Method: _ALL_
Header: X-CF-Block = medium</p></td><td><p>Block for 15 minutes</p></td></tr><tr><td><p>3</p></td><td><p>*</p></td><td><p>1</p></td><td><p>1 second</p></td><td><p>Method: _ALL_
Header: X-CF-Block = high</p></td><td><p>Block for 60 minutes</p></td></tr></table><p>Once that Rate-Limit has been created, Cloudflare’s Rate-Limiting will then kick-in immediately when that Header is received.</p>
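<p>The edge-side behaviour can be sketched as follows. This is a hypothetical model for illustration only: the <code>X-CF-Block</code> header name comes from the rules above, while the functions and durations map the severity levels in the table to expiring block entries so offenders “age out” on their own:</p>
<pre><code># Hypothetical sketch of the origin-header trigger.
BLOCK_SECONDS = {"low": 5 * 60, "medium": 15 * 60, "high": 60 * 60}

blocklist = {}  # client ip -> timestamp at which the block expires

def on_origin_response(ip, headers, now):
    severity = headers.get("X-CF-Block")
    if severity in BLOCK_SECONDS:
        blocklist[ip] = now + BLOCK_SECONDS[severity]

def is_blocked(ip, now):
    return blocklist.get(ip, 0) > now</code></pre>
<p>No script or extra API calls are needed: the block expires by itself once the stored timestamp passes.</p>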
    <div>
      <h4>Enumeration Attacks</h4>
      <a href="#enumeration-attacks">
        
      </a>
    </div>
    <p>Enumeration attacks are proving to be increasingly popular and pesky to mitigate. With enumeration attacks, attackers identify an expensive operation in your app and hammer at it to tie up resources and slow or crash your app. For example, an app that offers the ability to look up a user profile requires a database lookup to validate whether the user exists. In an enumeration attack, attackers will send a random set of characters to that endpoint in quick succession, causing the database to grind to a halt.</p><p>Rate Limiting to the rescue!</p><p>One of our customers was hit with a huge enumeration attack on their platform earlier this year, where the aggressors were trying to do exactly what we described above, in an attempt to overload their database platform. Their Rate Limiting configuration blocked over 100,000,000 bad requests during the 6-hour attack.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1pwVBMpUcd368swYqwG5BY/497e8e574ff8d71d03ce7f44ed46c65c/Screen-Shot-2018-05-21-at-10.36.57-AM.png" />
            
            </figure><p>When a query is sent to the app, and the user is not found, the app serves a 404 (page not found). A very basic approach is to set a rate limit on 404s: if a user crosses a threshold of 404s in a period of time, set the app to challenge the user to prove themselves to be a real person.</p><table><tr><td><p><b>RuleID</b></p></td><td><p><b>URL</b></p></td><td><p><b>Count</b></p></td><td><p><b>Timeframe</b></p></td><td><p><b>Matching Criteria</b></p></td><td><p><b>Action</b></p></td></tr><tr><td><p>1</p></td><td><p>*</p></td><td><p>10</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 404</p></td><td><p>Challenge</p></td></tr></table><p>To catch repeat offenders, you can tier the Rate Limits:</p><table><tr><td><p><b>RuleID</b></p></td><td><p><b>URL</b></p></td><td><p><b>Count</b></p></td><td><p><b>Timeframe</b></p></td><td><p><b>Matching Criteria</b></p></td><td><p><b>Action</b></p></td></tr><tr><td><p>1</p></td><td><p>/public/profile*</p></td><td><p>10</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 404</p></td><td><p>JavaScript Challenge</p></td></tr><tr><td><p>2</p></td><td><p>/public/profile*</p></td><td><p>25</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 200</p></td><td><p>Challenge</p></td></tr><tr><td><p>3</p></td><td><p>/public/profile*</p></td><td><p>50</p></td><td><p>10 minutes</p></td><td><p>Method: GET
Status Code: 200, 404</p></td><td><p>Block for 4 hours</p></td></tr></table><p>With this type of tiered defense in place, it means that you can “caution” an offender with a JavaScript challenge or Challenge (Google Captcha), and then “block” them if they continue.</p>
    <div>
      <h4>Content Scraping</h4>
      <a href="#content-scraping">
        
      </a>
    </div>
    <p>Increasingly, content owners are wrestling with content scraping - malicious bots copying copyrighted images or assets and redistributing or reusing them. For example, we work with an <a href="https://www.cloudflare.com/ecommerce/">eCommerce store</a> that uses copyrighted images, and their images are appearing elsewhere on the web without their consent. Rate Limiting can help!</p><p>In their app, each page displays 4 copyrighted images: 1 at actual size and 3 as thumbnails. By looking at logs and user patterns, they determined that most users, at a stretch, would never view more than 10–15 products in a minute, which would equate to 40–60 loads from the image store.</p><p>They chose to tier their Rate Limiting rules to prevent end users from getting unnecessarily blocked when they were browsing heavily. <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">Blocking malicious attempts at content scraping</a> can be quite simple; however, it does require some forward planning. Placing the rate limit on the right URL is key to ensure the rule covers exactly what you are trying to protect and not broader content. Here’s an example set of rate limits this customer set to protect their images:</p><table><tr><td><p><b>RuleID</b></p></td><td><p><b>URL</b></p></td><td><p><b>Count</b></p></td><td><p><b>Timeframe</b></p></td><td><p><b>Matching Criteria</b></p></td><td><p><b>Action</b></p></td></tr><tr><td><p>1</p></td><td><p>/img/thumbs/*</p></td><td><p>10</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 404</p></td><td><p>Challenge</p></td></tr><tr><td><p>2</p></td><td><p>/img/thumbs/*</p></td><td><p>25</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 200</p></td><td><p>Challenge</p></td></tr><tr><td><p>3</p></td><td><p>/img/*</p></td><td><p>75</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 200</p></td><td><p>Block for 4 hours</p></td></tr><tr><td><p>4</p></td><td><p>/img/*</p></td><td><p>5</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 403, 404</p></td><td><p>Challenge</p></td></tr></table><p>As we can see here, rules 1 and 2 are counting based on the number of requests to each endpoint. Rule 3 is counting based on all hits to the image store, and if it gets above 75 requests, the user will be blocked for 4 hours. Finally, to avoid any enumeration or bots guessing image names and numbers, we count 403s and 404s and challenge if we see unusual spikes.</p>
    <div>
      <h3>One more thing ... more rules, <i>totally rules!</i></h3>
      <a href="#one-more-thing-more-rules-totally-rules">
        
      </a>
    </div>
    <p>We want to ensure you have the rules you need to secure your app. To do that, we are increasing the number of available rules for Pro and Business, for no additional charge.</p><ul><li><p>Pro plans increase from 3 to 10 rules</p></li><li><p>Business plans increase from 3 to 15 rules</p></li></ul><p>As always, Cloudflare only charges for good traffic - requests that are allowed through Rate Limiting, not blocked. For more information click <a href="https://support.cloudflare.com/hc/en-us/articles/115000272247-Billing-for-Cloudflare-Rate-Limiting">here</a>.</p><p>The Rate-Limiting feature can be enabled within the Firewall tab on the Dashboard, or by visiting: <a href="https://www.cloudflare.com/a/firewall/">cloudflare.com/a/firewall</a></p> ]]></content:encoded>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Mitigation]]></category>
            <guid isPermaLink="false">l8ac1fSE0Q5tV7W36HzV1</guid>
            <dc:creator>Alex Cruz Farmer</dc:creator>
        </item>
        <item>
            <title><![CDATA[How we built rate limiting capable of scaling to millions of domains]]></title>
            <link>https://blog.cloudflare.com/counting-things-a-lot-of-different-things/</link>
            <pubDate>Wed, 07 Jun 2017 12:47:51 GMT</pubDate>
            <description><![CDATA[ Back in April we announced Rate Limiting of requests for every Cloudflare customer. Being able to rate limit at the edge of the network has many advantages: it’s easier for customers to set up and operate, their origin servers are not bothered by excessive traffic or layer 7 attacks. ]]></description>
            <content:encoded><![CDATA[ <p>Back in April we announced <a href="/rate-limiting/">Rate Limiting</a> of requests for every Cloudflare customer. Being able to rate limit at the edge of the network has many advantages: it’s easier for customers to set up and operate, their origin servers are not bothered by excessive traffic or layer 7 attacks, the performance and memory cost of rate limiting is offloaded to the edge, and more.</p><p>In a nutshell, <a href="https://www.cloudflare.com/learning/bots/what-is-rate-limiting/">rate limiting</a> works like this:</p><ul><li><p>Customers can define one or more rate limit rules that match particular HTTP requests (failed login attempts, expensive API calls, etc.)</p></li><li><p>Every request that matches the rule is counted per client IP address</p></li><li><p>Once that counter exceeds a threshold, further requests are not allowed to reach the origin server and an error page is returned to the client instead</p></li></ul><p>This is a simple yet effective protection against brute force attacks on login pages and other sorts of abusive traffic like <a href="https://www.cloudflare.com/learning/ddos/application-layer-ddos-attack/">L7 DoS attacks</a>.</p><p>Doing this with possibly millions of domains and even more millions of rules immediately becomes a bit more complicated. This article is a look at how we implemented a rate limiter that runs quickly and accurately at the edge of the network and copes with the colossal volume of traffic we see at Cloudflare.</p>
    <div>
      <h3>Let’s just do this locally!</h3>
      <a href="#lets-just-do-this-locally">
        
      </a>
    </div>
    <p>As the Cloudflare edge servers are running NGINX, let’s first see how the stock <a href="https://nginx.org/en/docs/http/ngx_http_limit_req_module.html">rate limiting</a> module works:</p>
            <pre><code>http {
    limit_req_zone $binary_remote_addr zone=ratelimitzone:10m rate=15r/m;
    ...
    server {
        ...
        location /api/expensive_endpoint {
            limit_req zone=ratelimitzone;
        }
    }
}</code></pre>
            <p>This module works great: it is reasonably simple to use (but requires a config reload for each change), and very efficient. The only problem is that if the incoming requests are spread across a large number of servers, this doesn’t work any more. The obvious alternative is to use some kind of centralized data store. Thanks to NGINX’s Lua scripting module, which we already use extensively, we could easily implement similar logic using any kind of central data backend.</p><p>But then another problem arises: how do we make this fast and efficient?</p>
    <div>
      <h3>All roads lead to Rome? Not with anycast!</h3>
      <a href="#all-roads-lead-to-rome-not-with-anycast">
        
      </a>
    </div>
    <p>Since Cloudflare has a vast and diverse network, reporting all counters to a single central point is not a realistic solution: the latency is far too high, and guaranteeing the availability of the central service poses further challenges.</p><p>First let’s take a look at how the traffic is routed in the Cloudflare network. All the traffic going to our edge servers is <a href="https://en.wikipedia.org/wiki/Anycast">anycast</a> traffic. This means that we announce the same IP address for a given web application, site or API worldwide, and traffic will be automatically and consistently routed to the closest live data center.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3MXm1hRKiJKtqZuSG8Ie0j/9b1869712dd6b0538cf7e3615fe4136d/world.png" />
            
            </figure><p>This property is extremely valuable: we are sure that, under normal conditions<a href="#fn1">[1]</a>, the traffic from a single IP address will always reach the same PoP. Unfortunately each new TCP connection might hit a different server inside that PoP. But we can still narrow down our problem: we can actually create an isolated counting system inside each PoP. This mostly solves the latency problem and greatly improves the availability as well.</p>
    <div>
      <h3>Storing counters</h3>
      <a href="#storing-counters">
        
      </a>
    </div>
    <p>At Cloudflare, each server in our edge network is as independent as possible to make administration simple. Unfortunately for rate limiting, we saw that we do need to share data across many different servers.</p><p>We actually had a similar problem in the past with <a href="https://en.wikipedia.org/wiki/Transport_Layer_Security#Session_IDs">SSL session IDs</a>: each server needed to fetch TLS connection data about past connections. To solve that problem we created a <a href="https://github.com/twitter/twemproxy">Twemproxy</a> cluster inside each of our PoPs: this allows us to split a memcache<a href="#fn2">[2]</a> database across many servers. A <a href="https://en.wikipedia.org/wiki/Consistent_hashing">consistent hashing</a> algorithm ensures that when the cluster is resized, only a small number of keys are hashed differently.</p><p>In our architecture, each server hosts a shard of the database. As we already had experience with this system, we wanted to leverage it for rate limiting as well.</p>
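<p>To make the sharding idea concrete, here is a minimal consistent-hashing sketch (a toy model, not Twemproxy’s actual ketama implementation): servers are placed at many points on a hash ring, each key belongs to the next server clockwise, and resizing the cluster only remaps the keys whose nearest point changed.</p>
<pre><code>import bisect
import hashlib

class HashRing:
    def __init__(self, servers, points_per_server=64):
        # Place each server at many pseudo-random points on the ring.
        self.ring = sorted(
            (self._hash("%s#%d" % (s, i)), s)
            for s in servers
            for i in range(points_per_server)
        )
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, key):
        # The key belongs to the first ring point at or after its hash.
        i = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[i][1]</code></pre>
<p>For example, <code>HashRing(["cache-1", "cache-2", "cache-3"]).server_for("counter:203.0.113.7")</code> always returns the same shard for the same counter key.</p>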
    <div>
      <h3>Algorithms</h3>
      <a href="#algorithms">
        
      </a>
    </div>
    <p>Now let’s take a deeper look at how the different rate limit algorithms work. What we call the <i>sampling period</i> in the next paragraphs is the reference unit of time for the counter (1 second for a 10 req/sec rule, 1 minute for a 600 req/min rule, ...).</p><p>The most naive implementation is to simply increment a counter that we reset at the start of each sampling period. This works, but is not terribly accurate, as the counter is arbitrarily reset at regular intervals, allowing regular traffic spikes to slip through the rate limiter. This can be a problem for resource-intensive endpoints.</p><p>Another solution is to store the timestamp of every request and count how many were received during the last sampling period. This is more accurate, but has huge processing and memory requirements, as checking the state of the counter requires reading and processing a lot of data, especially if you want to rate limit over a long period of time (for instance, 5,000 requests per hour).</p><p>The <a href="https://en.wikipedia.org/wiki/Leaky_bucket">leaky bucket</a> algorithm allows a great level of accuracy while being nicer on resources (this is what the stock NGINX module uses). Conceptually, it works by incrementing a counter when each request comes in. That same counter is also decremented over time based on the allowed rate of requests until it reaches zero. The capacity of the bucket is what you are ready to accept as “burst” traffic (important given that legitimate traffic is not always perfectly regular). If the bucket is full despite its decay, further requests are mitigated.</p>
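<p>The leaky bucket behaviour just described can be sketched like this (a simplified model for illustration, not NGINX’s code):</p>
<pre><code># The counter fills by one per request and drains at the allowed rate;
# once the burst capacity is exhausted, requests are mitigated.
class LeakyBucket:
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec    # allowed steady rate (drain speed)
        self.burst = burst          # bucket capacity
        self.level = 0.0
        self.last = 0.0

    def allow(self, now):
        # Drain for the time elapsed since the previous request.
        elapsed = now - self.last
        self.level = max(0.0, self.level - elapsed * self.rate)
        self.last = now
        if self.level + 1 > self.burst:
            return False  # bucket is full despite its decay: mitigate
        self.level += 1
        return True</code></pre>
<p>Note the two parameters, <code>rate_per_sec</code> and <code>burst</code>, that have to be tuned together.</p>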
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/nur2wjEAyjE0S6VYblkr3/080ad1cbb09aa1bf6a294ffb4955e9e0/leakybucket.svg.png" />
            
            </figure><p>However, in our case, this approach has two drawbacks:</p><ul><li><p>It has two parameters (average rate and burst) that are not always easy to tune properly</p></li><li><p>We were constrained to use the memcached protocol and this algorithm requires multiple distinct operations that we cannot do atomically<a href="#fn3">[3]</a></p></li></ul><p>So the situation was that the only operations available were <code>GET</code>, <code>SET</code> and <code>INCR</code> (atomic increment).</p>
    <div>
      <h3>Sliding windows to the rescue</h3>
      <a href="#sliding-windows-to-the-rescue">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2gvU2RabELQEWsauNMm8cu/ac296e8bf4e122f18935500ef0bd1c2e/14247536929_1a6315311d_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by-sa/2.0/">CC BY-SA 2.0</a> <a href="https://www.flickr.com/photos/halfrain/14247536929/">image</a> by <a href="https://www.flickr.com/photos/halfrain/">halfrain</a></p><p>The naive fixed window algorithm is actually not that bad: we just have to solve the problem of completely resetting the counter for each sampling period. Can’t we simply use the information from the previous counter to extrapolate an accurate approximation of the request rate?</p><p>Let’s say I set a limit of 50 requests per minute on an API endpoint. The counter can be thought of like this:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4SfLGYqNjjjxfJo0C9rU4H/e000ba22680e27910c8963a7111a97c8/sliding.svg.png" />
            
            </figure><p>In this situation, I did 18 requests during the current minute, which started 15 seconds ago, and 42 requests during the entire previous minute. Based on this information, the rate approximation is calculated like this:</p>
            <pre><code>rate = 42 * ((60-15)/60) + 18
     = 42 * 0.75 + 18
     = 49.5 requests</code></pre>
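            <p>The extrapolation above can be written as a tiny function (a sketch with invented names, mirroring the calculation, not our production Lua code): the previous counter is weighted by how much of its window still overlaps the sliding window, then the current counter is added.</p>
<pre><code>def approx_rate(prev_count, curr_count, elapsed, period=60):
    # Weight of the previous window that still overlaps the sliding window.
    weight = (period - elapsed) / period
    return prev_count * weight + curr_count

def should_limit(prev_count, curr_count, elapsed, threshold, period=60):
    return approx_rate(prev_count, curr_count, elapsed, period) > threshold</code></pre>
<p>Plugging in the numbers from the example (42 requests last minute, 18 so far, 15 seconds into the current minute) gives the 49.5 above, just under a 50 req/min threshold.</p>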
            <p>One more request during the next second and the rate limiter will start being very angry!</p><p>This algorithm assumes a constant rate of requests during the previous sampling period (which can be any time span), which is why the result is only an approximation of the actual rate. This algorithm can be improved, but in practice it proved to be good enough:</p><ul><li><p>It smooths the traffic spike issue of the fixed window method</p></li><li><p>It is very easy to understand and configure: no average vs. burst parameters; longer sampling periods can be used to achieve the same effect</p></li><li><p>It is still very accurate, as an analysis of 400 million requests from 270,000 distinct sources showed:</p><ul><li><p>0.003% of requests have been wrongly allowed or rate limited</p></li><li><p>An average difference of 6% between the real rate and the approximate rate</p></li><li><p>3 sources have been allowed despite generating traffic slightly above the threshold (false negatives); their actual rate was less than 15% above the threshold rate</p></li><li><p>None of the mitigated sources was below the threshold (no false positives)</p></li></ul></li></ul><p>Moreover, it offers interesting properties in our case:</p><ul><li><p>Tiny memory usage: only two numbers per counter</p></li><li><p>Incrementing a counter can be done by sending a single <code>INCR</code> command</p></li><li><p>Calculating the rate is reasonably easy: one GET command<a href="#fn4">[4]</a> and some very simple, fast math</p></li></ul><p>So here we are: we can finally implement a good counting system using only a few memcache primitives and without much contention. Still, we were not happy with that: it requires a memcached query to get the rate. At Cloudflare we’ve seen a few of the largest L7 attacks ever. We knew that large-scale attacks would have crushed a memcached cluster like this. 
More importantly, such operations would slow down legitimate requests a little, even under normal conditions. This is not acceptable.</p><p>This is why the increment jobs are run asynchronously without slowing down the requests. If the request rate is above the threshold, another piece of data is stored asking all servers in the PoP to start applying the mitigation for that client. Only this bit of information is checked during request processing.</p>
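<p>The sliding-window estimate described above can be modeled in a few lines of Python. This is an illustrative sketch only (the counter keys and helper names are made up, not Cloudflare’s actual implementation): it keeps two counters per source, as in a memcached <code>INCR</code>, and weighs the previous window by the fraction still inside the last sampling period:</p>

```python
import time

WINDOW = 60  # sampling period in seconds (one minute here)

# Stand-in for the memcached counters: in production each (source, window)
# pair would be an INCR-able key such as "<ip>:<minute>" (illustrative).
counters = {}

def record_request(source, now=None):
    """Increment the counter for the current window (one INCR in memcached)."""
    now = time.time() if now is None else now
    window = int(now // WINDOW)
    counters[(source, window)] = counters.get((source, window), 0) + 1

def estimated_rate(source, now=None):
    """Approximate the request rate over the last WINDOW seconds, assuming
    the previous window's requests arrived at a constant rate."""
    now = time.time() if now is None else now
    window = int(now // WINDOW)
    elapsed = now - window * WINDOW            # seconds into the current window
    prev = counters.get((source, window - 1), 0)
    curr = counters.get((source, window), 0)
    # Fraction of the previous window still inside the sliding window,
    # plus everything counted so far in the current window.
    return prev * ((WINDOW - elapsed) / WINDOW) + curr
```

<p>Plugging in the numbers above (42 requests in the previous minute, 18 so far, 15 seconds into the current minute) reproduces the 49.5 requests estimate.</p>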
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3pCPtEGvM5WPv1vUPzc0MH/ba54eae2f0aeaa2ae960c84bfe32be8c/el-sequence.svg.png" />
            
            </figure><p>Even more interesting: once a mitigation has started, we know exactly when it will end. This means that we can cache that information in the server memory itself. Once a server starts to mitigate a client, it will not even run another query for the subsequent requests it might see from that source!</p><p>This last tweak allowed us to efficiently mitigate large L7 attacks without noticeably penalizing legitimate requests.</p>
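<p>The local caching trick can be sketched as follows. This is a simplified Python model under stated assumptions: <code>fetch_shared_mitigation</code> is a hypothetical stand-in for the single lookup against the shared store, and the data structures are illustrative:</p>

```python
import time

local_mitigations = {}  # per-server in-memory cache: client -> expiry timestamp

def fetch_shared_mitigation(client):
    """Placeholder for the single GET against the shared store that holds
    the 'start mitigating this client' flag (hypothetical helper). Returns
    the mitigation's expiry timestamp, or None if no mitigation is active."""
    return None

def should_block(client, now=None):
    """Check the in-memory cache first; only fall back to the shared store
    on a miss. Once a mitigation is cached, no further remote lookups are
    needed for that client until the mitigation expires."""
    now = time.time() if now is None else now
    expiry = local_mitigations.get(client)
    if expiry is not None:
        if now < expiry:
            return True                    # cached: block without any remote query
        del local_mitigations[client]      # mitigation ended, drop the stale entry
    remote_expiry = fetch_shared_mitigation(client)
    if remote_expiry is not None and now < remote_expiry:
        local_mitigations[client] = remote_expiry  # cache until the known end time
        return True
    return False
```

<p>Because a mitigation’s end time is known when it starts, the cached entry never needs invalidation messages: it simply expires on its own.</p>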
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Despite being a young product, the rate limiter is already being used by many customers to control the rate of requests that their origin servers receive. It already handles several billion requests per day, and we recently mitigated attacks with as many as 400,000 requests per second to a single domain without degrading service for legitimate users.</p><p>We have only just started to explore how we can efficiently protect our customers with this new tool. We are looking into more advanced optimizations and into building new features on top of the existing work.</p><p>Interested in working on high-performance code running on thousands of servers at the edge of the network? Consider <a href="https://www.cloudflare.com/careers/">applying</a> to one of our open positions!</p><hr /><ol><li><p>The inner workings of anycast route changes are outside of the scope of this article, but we can assume that they are rare enough in this case. <a href="#fnref1">↩︎</a></p></li><li><p>Twemproxy also supports Redis, but our existing infrastructure was backed by <a href="https://github.com/twitter/twemcache">Twemcache</a> (a Memcached fork). <a href="#fnref2">↩︎</a></p></li><li><p>Memcache does support <a href="https://github.com/memcached/memcached/wiki/Commands#cas">CAS</a> (Compare-And-Set) operations, so optimistic transactions are possible, but they are hard to use in our case: during attacks we will see a lot of requests, creating a lot of contention, in turn resulting in a lot of failed CAS transactions. <a href="#fnref3">↩︎</a></p></li><li><p>The counters for the previous and current minute can be retrieved with a single GET command. <a href="#fnref4">↩︎</a></p></li></ol> ]]></content:encoded>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[Optimization]]></category>
            <category><![CDATA[Network]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Reliability]]></category>
            <guid isPermaLink="false">3zVAZ4bZB7IdYAVjbqN8RP</guid>
            <dc:creator>Julien Desgats</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Rate Limiting - Insight, Control, and Mitigation against Layer 7 DDoS Attacks]]></title>
            <link>https://blog.cloudflare.com/rate-limiting/</link>
            <pubDate>Thu, 13 Apr 2017 20:34:00 GMT</pubDate>
            <description><![CDATA[ Today, Cloudflare is extending its Rate Limiting service by allowing any of our customers to sign up. Our Enterprise customers have enjoyed the benefits of Cloudflare’s Rate Limiting offering for the past several months.  ]]></description>
            <content:encoded><![CDATA[ <p>Today, Cloudflare is extending its <a href="https://www.cloudflare.com/rate-limiting/">Rate Limiting</a> service by allowing any of our customers to sign up. Our Enterprise customers have enjoyed the benefits of Cloudflare’s Rate Limiting offering for the past several months. As part of our mission to build a better internet, we believe that everyone should have the ability to sign up for the service to protect their websites and <a href="https://www.cloudflare.com/learning/security/api/what-is-an-api/">APIs</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/XnkcibsLLCJF6rpgPqksN/083f6b9aca04a2f0129a862cea15c263/benjamin-child-16017.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC-BY 2.0</a> <a href="https://unsplash.com/photos/IGqMKnl6LNE">image</a> by <a href="https://unsplash.com/@bchild311">Benjamin Child</a></p><p>Rate Limiting is one more feature in our arsenal of tools that help to protect our customers against denial-of-service attacks, brute-force password attempts, and other types of abusive behavior targeting the application layer. Application layer attacks are usually a barrage of HTTP/S requests which may look like they originate from real users, but are typically generated by machines (or bots). As a result, application layer attacks are often harder to detect and can more easily bring down a site, application, or API. Rate Limiting complements our existing DDoS protection services by providing control and insight into Layer 7 DDoS attacks.</p><p>Rate Limiting is now available to all customers across <a href="https://www.cloudflare.com/plans/">all plans</a> as an optional paid feature. The first 10,000 qualifying requests are free, which allows customers to start using the feature without any cost.</p>
    <div>
      <h4>Real world examples of how Rate Limiting helped Cloudflare customers</h4>
      <a href="#real-world-examples-of-how-rate-limiting-helped-cloudflare-customers">
        
      </a>
    </div>
    <p>Over the past few months, Cloudflare customers ranging from <a href="https://www.cloudflare.com/ecommerce/">e-commerce companies</a> to high-profile, ad-driven platforms have been using this service to mitigate malicious attacks. It has made a big difference to their businesses: they’ve stopped revenue loss, reduced infrastructure costs, and protected valuable information, such as intellectual property and customer data.</p><p>Several common themes have emerged for customers who have been successfully using Rate Limiting during the past couple of months. The following are examples of some of the issues those customers have faced and how Rate Limiting addressed them.</p>
    <div>
      <h4>High-volume attacks designed to bring down e-commerce sites</h4>
      <a href="#high-volume-attacks-designed-to-bring-down-e-commerce-sites">
        
      </a>
    </div>
    <p>Buycraft, <a href="https://www.cloudflare.com/case-studies/buycraft/">a Minecraft e-commerce platform</a>, was subjected to denial-of-service attacks which could have brought down the e-commerce stores of its 500,000+ customers. Rate Limiting addresses this common attack type by blocking offending IP addresses at its network edge, so the malicious traffic doesn’t reach the origin servers and impact customers.</p>
    <div>
      <h4>Attacks against API endpoints</h4>
      <a href="#attacks-against-api-endpoints">
        
      </a>
    </div>
    <p>Haveibeenpwned.com <a href="https://www.cloudflare.com/case-studies/troy-hunt/">provides an API</a> that surfaces accounts that have been hacked to help potential victims identify whether their credentials have been compromised. Troy Hunt, the service’s creator, decided to use Cloudflare’s Rate Limiting to protect his API from malicious traffic, leading to <a href="https://www.cloudflare.com/solutions/ecommerce/optimization/">improved performance</a> and reduced infrastructure costs.</p>
    <div>
      <h4>Brute-force login attacks</h4>
      <a href="#brute-force-login-attacks">
        
      </a>
    </div>
    <p>After IT consulting firm 2600 Solutions, which manages WordPress sites for clients, was brute-forced over 200 times in a month, owner Jeff Williams decided to use Cloudflare Rate Limiting. By blocking excessive failed login attempts, they were able not only to protect their clients’ sites from being compromised, but also to ensure that legitimate users were not impacted by slower application performance.</p>
    <div>
      <h4>Bots scraping the site for content</h4>
      <a href="#bots-scraping-the-site-for-content">
        
      </a>
    </div>
    <p>Another Cloudflare customer saw valuable content being scraped from their site by competitors using bots. Competitors then used this scraped content to boost their own search engine ranking at the expense of the targeted site. Our customer lost tens of thousands of dollars before using Cloudflare’s Rate Limiting to <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">prevent the bots from scraping content</a>.</p>
    <div>
      <h4>How do I get started with Rate Limiting?</h4>
      <a href="#how-do-i-get-started-with-rate-limiting">
        
      </a>
    </div>
    <p>Anyone can start taking advantage of Cloudflare’s Rate Limiting. In the Cloudflare Dashboard, go to the Firewall tab and, within the Rate Limiting card, click “Enable Rate Limiting.”</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6QLMfHzGF2J1ju4dLTmP6C/a6826edb544bb1a66b6a0cde5bd0a2d4/Screen-Shot-2017-04-13-at-1.17.47-PM.png" />
            
            </figure><p>Even though you will be prompted to enter a payment method to start using the service, you will not be charged for <a href="https://support.cloudflare.com/hc/en-us/articles/115000272247-Billing-for-Cloudflare-Rate-Limiting">the first 10,000 qualifying requests</a>. Once done, <a href="https://support.cloudflare.com/hc/en-us/articles/115001635128-Configuring-Rate-Limiting-from-UI">you’ll be able to create rules</a>.</p><p>If you are on an Enterprise plan, contact your Cloudflare Customer Success Manager to enable Rate Limiting.</p>
    <div>
      <h4>Tighter control over the type of traffic to rate limit</h4>
      <a href="#tighter-control-over-the-type-of-traffic-to-rate-limit">
        
      </a>
    </div>
    <p>As customers begin to understand attack patterns and their own application’s potential vulnerabilities, they can tighten their rule criteria. All customers can create path-specific rules using wildcards (for example: <code>www.example.com/login/*</code> or <code>www.example.com/*/checkout.php</code>). Customers on a Business or higher plan can also choose to rate limit only certain HTTP request methods.</p>
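    <p>To illustrate how such wildcard patterns behave, here is a toy Python check using shell-style globbing. The patterns and helper names are examples only, not how Cloudflare evaluates rules internally:</p>

```python
from fnmatch import fnmatch

# Illustrative rate-limit patterns in the style described above
patterns = [
    "www.example.com/login/*",
    "www.example.com/*/checkout.php",
]

def matches_rule(host_and_path):
    """Return True if the request's host+path matches any pattern.
    Note that fnmatch's * also crosses '/' boundaries."""
    return any(fnmatch(host_and_path, p) for p in patterns)
```

    <p>With these two patterns, any path under <code>/login/</code> and any <code>checkout.php</code> one directory deep would fall under the rule, while other pages would not.</p>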
    <div>
      <h4>Simulate traffic to tune your rules</h4>
      <a href="#simulate-traffic-to-tune-your-rules">
        
      </a>
    </div>
    <p>Customers on the Pro and higher plans will be able to ‘simulate’ rules. A rule in simulate mode will not actually block malicious traffic, but will let you understand what traffic would be blocked if you were to set up a ‘live’ rule. All customers will have analytics (coming soon) that give insight into the traffic patterns on their site and the efficacy of their rules.</p>
    <div>
      <h4>Next Steps</h4>
      <a href="#next-steps">
        
      </a>
    </div>
    <ul><li><p>If you haven’t enabled Rate Limiting yet, go to the <a href="https://www.cloudflare.com/a/firewall/">Firewall App</a> and enable Rate Limiting</p></li><li><p><a href="https://support.cloudflare.com/hc/en-us/articles/115001635128-Configuring-Rate-Limiting-from-UI">Create your first rule</a></p></li><li><p>For more information, including a demo of Rate Limiting in action, visit <a href="http://www.cloudflare.com/rate-limiting/">www.cloudflare.com/rate-limiting/</a>.</p></li></ul><p></p> ]]></content:encoded>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">3jJX6AKmlz33ZH8YTcMMpa</guid>
            <dc:creator>Timothy Fong</dc:creator>
        </item>
        <item>
            <title><![CDATA[Rate Limiting: Live Demo]]></title>
            <link>https://blog.cloudflare.com/traffic-control-live-demo/</link>
            <pubDate>Fri, 30 Sep 2016 19:56:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare helps customers control their own traffic at the edge. One of two products that we introduced to empower customers to do so is Cloudflare Rate Limiting. ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare helps customers control their own traffic at the edge. One of two <a href="/cloudflare-traffic/">products that we introduced</a> to empower customers to do so is <a href="https://www.cloudflare.com/traffic-control/">Cloudflare Rate Limiting</a>*.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4NtFQrwczwJFS6c5ko0bUk/08f0544ed11c6d802aa08ad234d79a71/speed-limit-10.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/brhefele/6553028503/">image</a> by <a href="https://www.flickr.com/photos/brhefele/">Brian Hefele</a></p><p>Rate Limiting allows a customer to rate limit, shape or block traffic based on the rate of requests per client IP address, cookie, authentication token, or other attributes of the request. Traffic can be controlled on a per-URI (with wildcards for greater flexibility) basis giving pinpoint control over a website, application, or API.</p><p>Cloudflare has been <a href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food">dogfooding</a> Rate Limiting to add more granular controls against Layer 7 DOS and brute-force attacks. For example, we've experienced attacks on cloudflare.com from more than 4,000 IP addresses sending 600,000+ requests in 5 minutes to the same URL but with random parameters. These types of attacks send large volumes of HTTP requests intended to bring down our site or to crack login passwords.</p><p>Rate Limiting protects websites and APIs from similar types of bad traffic. By leveraging our massive network, we are able to process and enforce rate limiting near the client, shielding the customer's application from unnecessary load.</p><p>To make this more concrete, let's look at a live demonstration rule for cloudflare.com. Multiple rules may be used and combined to great effect -- this is just a limited example.</p><p>Read on, and then test it yourself.</p>
    <div>
      <h3>Creating the rule</h3>
      <a href="#creating-the-rule">
        
      </a>
    </div>
    <p>Imagine an endpoint that is resource intensive. To maintain availability, we want to protect it from high-volume request rates, like those from an aggressive bot or attacker.</p><p><b>URL</b> <code>*.cloudflare.com/rate-limit-test</code></p><p>Rate Limiting allows for <code>*</code> wildcards to give more flexibility. An API with multiple endpoints might use a pattern of <code>api.example.com/v2/*</code>.</p><p>With that pattern, all resources under <code>/v2</code> would be protected by the same rule.</p><p><b>Threshold</b></p><p>We set this demonstration rule to 10 requests per minute, which is too sensitive for a real web application, but allows a curious user refreshing their browser ten times to see Rate Limiting in action.</p><p><b>Action</b></p><p>We set this value to <code>block</code>, which means that once an IP address triggers a rule, all traffic from that IP address will be blocked at the edge and served with a default 429 HTTP error code.</p><p>Other possible choices include <code>simulate</code>, which means no action is taken, but analytics would indicate which requests would have been mitigated, to help customers evaluate the potential impact of a given rule.</p><p><b>Timeout</b></p><p>This is the duration of the mitigation once the rule has been triggered. In this example, an offending IP address will be blocked for 1 minute.</p><p><b>Response body type</b></p><p>This type was set to <code>HTML</code> in the demo so that Rate Limiting returns a web page to mitigated requests. For an API endpoint, the response body could be JSON instead.</p><p><b>Response body</b></p><p>The response body can be anything you want. Refresh the link below 10 times very quickly to see our choice for this demonstration rule.</p><p><a href="https://www.cloudflare.com/rate-limit-test"><b>https://www.cloudflare.com/rate-limit-test</b></a></p>
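    <p>To make the threshold/action/timeout interplay concrete, here is a toy Python model of this demonstration rule. The names and data structures are illustrative; the real rule runs at Cloudflare’s edge, not in your application:</p>

```python
import time

THRESHOLD = 10      # requests per minute before the rule triggers
PERIOD = 60         # sampling period in seconds
TIMEOUT = 60        # how long an offending IP stays blocked, in seconds

hits = {}           # ip -> list of recent request timestamps
blocked_until = {}  # ip -> timestamp when the block expires

def handle(ip, now=None):
    """Return an HTTP status: 200 while allowed, 429 once the rule triggers."""
    now = time.time() if now is None else now
    if blocked_until.get(ip, 0) > now:
        return 429                          # mitigation still active
    recent = [t for t in hits.get(ip, []) if now - t < PERIOD]
    recent.append(now)
    hits[ip] = recent
    if len(recent) > THRESHOLD:
        blocked_until[ip] = now + TIMEOUT   # block action with the 1-minute timeout
        return 429
    return 200
```

    <p>In this model the first ten requests in a minute return 200, the eleventh triggers the rule, and every further request from that IP gets a 429 until the timeout expires, while other clients are unaffected.</p>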
    <div>
      <h3>Other possible configurations</h3>
      <a href="#other-possible-configurations">
        
      </a>
    </div>
    <p>We could have specified a <b>Method</b>. If we only cared to rate limit POST requests, we could adjust the rule to do so. This rule could be used for a login page, where a high frequency of POSTs from the same IP is potentially suspicious.</p><p>We also could have specified a <b>Response Code</b>. If we only wanted to rate limit IPs which were consistently failing to authenticate, we could create the rule to trigger only after a certain number of 403s has been served. Once an IP is flagged, perhaps because it was pounding a login endpoint with incorrect credentials, that client IP could be blocked from hitting either that endpoint or the whole site.</p><p>We plan to expand the matching criteria, for example by adding headers or cookies, and to extend the mitigation options to include CAPTCHA or other challenges. This will give our users even more flexibility and power to protect their websites and API endpoints.</p>
    <div>
      <h3>Early Access</h3>
      <a href="#early-access">
        
      </a>
    </div>
    <p>We'd love to have you try Rate Limiting. Read more and <a href="https://www.cloudflare.com/traffic-control">sign up for Early Access</a>.</p><p>*<i>Note: This post was updated 4/13/17 to reflect the current product name. All references to Traffic Control have been changed to Rate Limiting.</i></p> ]]></content:encoded>
            <category><![CDATA[Traffic]]></category>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">5jRNF30iNz47tFcZ7hvNoY</guid>
            <dc:creator>Timothy Fong</dc:creator>
        </item>
        <item>
            <title><![CDATA[Control your traffic at the edge with Cloudflare]]></title>
            <link>https://blog.cloudflare.com/cloudflare-traffic/</link>
            <pubDate>Thu, 29 Sep 2016 14:04:00 GMT</pubDate>
            <description><![CDATA[ Today, we're introducing two new Cloudflare Traffic products to give customers control over how Cloudflare’s edge network handles their traffic, allowing them to shape and direct it for their specific needs. ]]></description>
            <content:encoded><![CDATA[ <p>Today, we're introducing two new Cloudflare Traffic products to give customers control over how Cloudflare’s edge network handles their traffic, allowing them to shape and direct it for their specific needs.</p><p>More than 10 trillion requests flow through Cloudflare every month. More than 4 million customers and 10% of internet requests benefit from our global network. Cloudflare's virtual backbone gives every packet <a href="https://www.cloudflare.com/solutions/ecommerce/optimization/">improved performance</a>, security, and reliability.</p><p>That's the macro picture.</p><p>What's more interesting is keeping each individual customer globally available. While every customer benefits from the network effect of Cloudflare, each customer is (appropriately) focused on <i>their</i> application uptime, security and performance.</p>
    <div>
      <h3>Rate Limiting*</h3>
      <a href="#rate-limiting">
        
      </a>
    </div>
    <p>Cloudflare’s new <a href="https://www.cloudflare.com/traffic-control/">Rate Limiting</a> allows a customer to rate limit, shape or block traffic based on the number of requests per second per IP, cookie, or authentication token. Traffic can be controlled on a per-URI (with wildcards for greater flexibility) basis giving pinpoint control over a website, application, or API.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/z0czXftPrSgtKMQSA6caM/e5886353b3ae53ea88fa6ee63b041acf/traffic-control-graph.png" />
            
            </figure><p>Customers seek reliability and availability in the face of popularity or unexpected traffic such as slow brute-force attacks on a <a href="https://www.cloudflare.com/learning/security/how-to-improve-wordpress-security/">WordPress site</a>, denial of service against dynamic pages, or the stampede of requests that comes with success. We are the leader at stopping significant <a href="https://www.cloudflare.com/ddos/">DDoS attacks</a> and offer a comprehensive <a href="https://www.cloudflare.com/waf/">WAF</a> to target specific application-level attacks.</p><p>Now we are adding the capability to give each customer fine-grained control over the traffic that reaches their origin servers.</p><p>Even well-engineered applications have a resource limit. Any dynamic endpoint either has a hard system limit or an economic limit on the number of servers you can afford. Those expensive endpoints need additional protection against floods of traffic, including legitimate visitors. You can provision for the worst case...but when you find a new Pokémon Go on your hands, the best case hurts, too.</p><p>To shield origins from attack and preserve uptime, Cloudflare Rate Limiting lets you throttle, block, and otherwise control the flow of traffic to maintain availability, limit economic impact, and preserve performance. All of this can be done with thoughtful configuration, testing rules to measure their impact, and applying changes globally within seconds.</p><p>That solves several problems. Rate Limiting protects APIs as well as web pages. Different versions of your API can not only have different rate limit triggers, but can also return custom JSON responses or response codes if, for instance, you want to obfuscate the standard 429 HTTP error code.</p><p>Static pages are easy to cache at Cloudflare's edge, so high traffic on a home page is welcome. 
But a competitor scraping your search results is different, and may cause economic pain in server resources in addition to disrupting business as usual. So Rate Limiting lets you define specific URLs with lower limits and different policies.</p><p>Similarly, rules designed to protect a login endpoint enable real users to access your application while defending against brute-force attacks designed to break into your system or simply exhaust your resources. Rate Limiting gives customers this power, in part, by distinguishing between POSTs and GETs and identifying authentication failures through the server response code.</p><p>From rate limiting within your applications to replacing hardware capabilities at the top of your rack, Rate Limiting solves a problem that otherwise requires several tools and custom development to address. In the future, Rate Limiting will become even more intelligent, automatically enabling caching on your marketing site during a product launch, queueing customers on Black Friday to ensure your <a href="https://www.cloudflare.com/ecommerce/">e-commerce system</a> handles the demand, and helping to maximize the return on your IT investments by protecting them from damaging spikes and traffic patterns.</p>
    <div>
      <h3>Traffic Manager</h3>
      <a href="#traffic-manager">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1BMtHV6Vr8mhIuoeGSvGFC/5323ce9a39fbd9586a5b8772ee75717e/traffic-manager-failover.png" />
            
            </figure><p>For many customers, gone are the days of running a single server for their web application. Two scenarios are common: a single datacenter or cloud provider running multiple load-balanced servers, and replication of that infrastructure across multiple geographies.</p><p>Customers have moved to load-balanced infrastructure to provide reliability, handle traffic spikes, and serve traffic locally in different regions of the world.</p><p>The beauty of the single server approach was that it was simple to manage: all traffic from everywhere on the Internet hit the same server. Unfortunately, that doesn’t scale to today’s Internet, and so controls are needed to handle load balancing across servers and locations.</p><p>Cloudflare’s new <a href="https://www.cloudflare.com/traffic-manager/">Traffic Manager</a> enables a customer to keep their application running during a failure, or to better handle unexpected spikes in traffic, by load balancing across multiple servers, datacenters, and geographies.</p><p>Traffic Manager has four major features: health checks, load balancing, failover, and geo-steering.</p><p>Health checks automatically test the availability of individual origin servers so that Traffic Manager has a real-time view of the health of your origin servers. This information is used for failover and load balancing.</p><p>Load balancing automatically shares traffic across a collection of origin servers; if an origin server fails a health check, it is automatically removed and load is shared across the remaining servers. 
For more complex environments, an entire collection of origin servers can be removed from receiving traffic if the number of healthy servers in the collection falls below some safe threshold.</p><p>Geo-steering allows a customer to configure traffic delivery to specific origin server groups based on the physical location of the visitor.</p><p>Health checks, load balancing, failover, and geo-steering work together to give customers fine-grained control over their traffic.</p><p>Cloudflare Traffic Manager checks the health of applications from <a href="/amsterdam-to-zhuzhou-cloudflare-global-network/">100+ locations</a> around the world, taking automatic action to route around failure, based on policies crafted by each customer.</p><p>Fiber cut in Malaysia? Host not responding for customers in Minneapolis? Checking from every location on a network with <a href="http://bgp.he.net/report/exchanges#_participants">more public internet exchanges than any other company</a> means you get localized, specific decision making. When a problem crops up on the network path to one of our customers' servers, we'll route that traffic instantly to another healthy, available server -- without changing the policy for other, healthy routes.</p><p>Sometimes, it's not the network. With Traffic Manager, you may load balance across individual hosts and across entire pools of hosts. Application down on AWS? Send those visitors to your Rackspace-hosted application in seconds. Failover from a cloud provider to your own datacenter, and back again as soon as your primary location is healthy again.</p><p>Other times, customers run different instances to account for the speed of light, and be as close as possible to their customers -- much as Cloudflare does. With Traffic Manager, you can route visitors to your site to the nearest host, based on their region. Choose to send visitors in Munich to your datacenter in Amsterdam, and visitors from Kansas City to your St. Louis datacenter. 
These policies can be combined, so if your European datacenter is down, then and only then send that traffic to your United States datacenter.</p><p>Many of our customers put significant investment into the availability of their infrastructure. Traffic Manager extends that investment across Cloudflare's ever-growing global network.</p>
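    <p>The failover behavior described above can be sketched in a few lines of Python. The pool names, the health representation, and the threshold value are all made up for illustration, not Traffic Manager internals:</p>

```python
MIN_HEALTHY = 2  # safe threshold of healthy origins per pool (assumed value)

def pick_pool(pools):
    """pools is an ordered list of (name, [origin_is_healthy, ...]) tuples,
    primary first. Return the first pool with enough healthy origins, so
    traffic fails over in priority order as health checks mark origins down."""
    for name, origins in pools:
        if sum(origins) >= MIN_HEALTHY:
            return name
    return None  # everything is down; serve an error or a static fallback

pools = [
    ("amsterdam", [True, False, False]),  # only 1 healthy origin: below threshold
    ("st-louis", [True, True, True]),     # healthy pool: receives the traffic
]
```

    <p>Here the Amsterdam pool has fallen below the safe threshold, so <code>pick_pool(pools)</code> steers traffic to the St. Louis pool, and back again once Amsterdam recovers.</p>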
    <div>
      <h3>Early Access</h3>
      <a href="#early-access">
        
      </a>
    </div>
    <p>Cloudflare Traffic is now available in Early Access for all customers, and will be available publicly before the end of the year. Read more about <a href="https://www.cloudflare.com/traffic-manager">Traffic Manager</a> and <a href="https://www.cloudflare.com/traffic-control">Rate Limiting</a> and request access in your Cloudflare dashboard, in the Traffic app.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/pNVbDcRjNrafnHVaZNfbm/53fb6fba32137af1fd2aa1e6d65132b5/traffic-app-in-cloudflare-dashboard-1.png" />
            
            </figure><p>*<i>Note: This post was updated 4/13/17 to reflect the current product name. All references to Traffic Control have been changed to Rate Limiting.</i></p> ]]></content:encoded>
            <category><![CDATA[Traffic]]></category>
            <category><![CDATA[Load Balancing]]></category>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Reliability]]></category>
            <guid isPermaLink="false">7tyAnRN4dGdrJNn2xiHPKs</guid>
            <dc:creator>John Roberts</dc:creator>
        </item>
    </channel>
</rss>