The Cloudflare Blog

Active defense: introducing a stateful vulnerability scanner for APIs

John Cosgrove — Mon, 09 Mar 2026 14:00:00 GMT

Security is traditionally a game of defense. You build walls, set up gates, and write rules to block traffic that looks suspicious. For years, Cloudflare has been a leader in this space: our Application Security platform is designed to catch attacks in flight, dropping malicious requests at the edge before they ever reach your origin. But for API security, defensive posturing isn’t enough.

That’s why today, we are launching the beta of Cloudflare’s Web and API Vulnerability Scanner.

We are starting with the most pervasive and difficult-to-catch threat on the OWASP API Top 10: Broken Object Level Authorization, or BOLA. We will add more vulnerability scan types over time, including both API and web application threats.

The most dangerous API vulnerabilities today aren’t generic injection attacks or malformed requests that a WAF can easily spot. They are logic flaws—perfectly valid HTTP requests that meet the protocol and application spec but defy the business logic.

To find these, you can’t just wait for an attack. You have to actively hunt for them.

The Web and API Vulnerability Scanner will be available first for API Shield customers. Read on to learn why we are focused on API security scans for this first release.

Why purely defensive security misses the mark

In the web application world, vulnerabilities often look like syntax errors. A SQL injection attempt looks like code where data should be. A cross-site scripting (XSS) attack looks like a script tag in a form field. These have signatures.

API vulnerabilities are different. To illustrate, let’s imagine a food delivery mobile app that communicates solely with an API on the backend. Let’s take the orders endpoint:

Endpoint Definition: /api/v1/orders

Method	Resource Path	Description
GET	/api/v1/orders/{order_id}	Check Status. Returns the tracking status of a specific order (e.g., "Kitchen is preparing").
PATCH	/api/v1/orders/{order_id}	Update Order. Allows the user to modify the drop-off location or add delivery instructions.

In a broken authorization attack like BOLA, User A (the attacker) requests to update the delivery address of a paid-for order belonging to User B (the victim). The attacker simply inserts User B’s {order_id} in the PATCH request.

Here is what that request looks like, with ‘8821’ as User B’s order ID. Notice that User A is fully authenticated with their own valid token:

PATCH /api/v1/orders/8821 HTTP/1.1
Host: api.example.com
Authorization: Bearer 
Content-Type: application/json

{
  "delivery_address": "123 Attacker Way, Apt 4",
  "instructions": "Leave at front door, ring bell"
}

The request headers are valid. The authentication token is valid. The schema is correct. To a standard WAF, this request looks perfect. A bot management offering may even be fooled if a human is manually sending the attack requests.

User A will now get B’s food delivered to them! The vulnerability exists because the API endpoint fails to validate if User A actually has permission to view or update user B’s data. This is a failure of logic, not syntax. To fix this, the API developer could implement a simple check: if (order.userID != user.ID) throw Unauthorized;

You can detect these types of vulnerabilities by actively sending API test traffic or passively listening to existing API traffic. Finding these vulnerabilities through passive scanning requires context. Last year we launched BOLA vulnerability detection for API Shield. This detection automatically finds these vulnerabilities by passively scanning customer traffic for usage anomalies. To be successful with this type of scanning, you need to know what a "valid" API call looks like, what the variable parameters are, how a typical user behaves, and how the API behaves when those parameters are manipulated.

Yet there are reasons security teams may not have any of that context, even with access to API Shield’s BOLA vulnerability detection. Development environments may need to be tested but lack user traffic. Production environments may (thankfully) have a lack of attack traffic yet still need analysis, and so on. In these circumstances, and to be proactive in general, teams can turn to Dynamic Application Security Testing (DAST). By creating net-new traffic profiles intended specifically for security testing, DAST tools can look for vulnerabilities in any environment at any time.

Unfortunately, traditional DAST tools have a high barrier to entry. They are often difficult to configure, require you to manually upload and maintain Swagger/OpenAPI files, struggle to authenticate correctly against modern complex login flows, and can simply lack any API-specific security tests (e.g. BOLA).

Cloudflare’s API scanning advantage

In the food delivery order example above, we assumed the attacker could find a valid order to modify. While there are often avenues for attackers to gather this type of intelligence in a live production environment, in a security testing exercise you must create your own objects before testing the API’s authorization controls. For typical DAST scans, this can be a problem, because many scanners treat each individual request on its own. This method fails to chain requests together in the logical pattern necessary to find broken authorization vulnerabilities. Legacy DAST scanners can also exist as an island within your security tooling and orchestration environment, preventing their findings from being shared or viewed in context.

Vulnerability scanning from Cloudflare is different for a few key reasons.

First, Security Insights will list results from our new scans alongside any existing Cloudflare security findings for added context. You’ll see all your posture management information in one place.

Second, we already know your API’s inputs and outputs. If you are an API Shield customer, Cloudflare already understands your API. Our API Discovery and Schema Learning features passively catalog your endpoints and learn your traffic patterns. While you’ll need to manually upload an OpenAPI spec to get started for our initial release, you will be able to get started quickly without one in a future release.

Third, because we sit at the edge, we can turn passive traffic inspection knowledge into active intelligence. It will be easy to verify BOLA vulnerability detection risks (found via traffic inspection) by sending net-new HTTP requests with the vulnerability scanner.

And finally, we have built a new, stateful DAST platform, as we detail below. Most scanners require hours of setup to "teach" the tool how to talk to your API. With Cloudflare, you can effectively skip that step and get started quickly. You provide the API credentials, and we’ll use your API schemas to automatically construct a scan plan.

Building automatic scan plans

APIs are commonly documented using OpenAPI schemas. These schemas denote the host, method, and path (commonly, an “endpoint”) along with the expected parameters of incoming requests and resulting responses. In order to automatically build a scan plan, we must first make sense of these API specifications for any given API to be scanned.

Our scanner works by building up an API call graph from an OpenAPI document and subsequently walking it, using attacker and owner contexts. Owners create resources, attackers subsequently try to access them. Attackers are fully authenticated with their own set of valid credentials. If an attacker successfully reads, modifies or deletes an unowned resource, an authorization vulnerability is found.

Consider for example the above delivery order with ID 8821. For the server-side resource to exist, it needed to be originally created by an owner, most likely in a “genesis” POST request with no or minimal dependencies (previous necessary calls and resulting data). Modelling the API as a call graph, such an endpoint constitutes a node with no or few incoming edges (dependencies). Any subsequent request, such as the attacker’s PATCH above, then has a data dependency (the data is order_id) on the genesis request (the POST). Without all data provided, the PATCH cannot proceed.

Here we see in purple arrows the nodes in this API graph that are necessary to visit to add a note to an order via the POST /api/v1/orders/{order_id}/note/{note_id} endpoint. Importantly, none of the steps or logic shown in the diagram is available in the OpenAPI specification! It must be inferred logically through some other means, and that is exactly what our vulnerability scanner will do automatically.

In order to reliably and automatically plan scans across a variety of APIs, we must accurately model these endpoint relationships from scratch. However, two problems arise: data quality of API specifications is not guaranteed, and even functionally complete schemas can have ambiguous naming schemes. Consider a simplified OpenAPI specification for the above API, which might look like

openapi: 3.0.0
info:
  title: Order API
  version: 1.0.0
paths:
  /api/v1/orders:
    post:
      summary: Create an order
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                product:
                  type: string
                count:
                  type: integer
              required:
                - product
                - count
      responses:
        '201':
          description: Item created successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  result:
                    type: object
                    properties:
                      id:
                        type: integer
                      created_at:
                        type: integer
                  errors:
                    type: array
                    items:
                      type: string
  /api/v1/orders/{order_id}:
    patch:
      summary: Modify an order by ID
      parameters:
        - name: order_id
          in: path

We can see that the POST endpoint returns responses such as

{
    "result": {
        "id": 8821,
        "created_at": 1741476777
    },
   "errors": []
}

To a human observer, it is quickly evident that $.result.id is the value to be injected in order_id for the PATCH endpoint. The id property might also be called orderId, value or something else, and be nested arbitrarily. These subtle inconsistencies in OpenAPI documents of arbitrary shape are intractable for heuristics-based approaches.

Our scanner uses Cloudflare’s own Workers AI platform to tackle this fuzzy problem space. Models such as OpenAI’s open-weight gpt-oss-120b are powerful enough to match data dependencies reliably, and to generate realistic fake data where necessary, essentially filling in the blanks of OpenAPI specifications. Leveraging structured outputs, the model produces a representation of the API call graph for our scanner to walk, injecting attacker and owner credentials appropriately.

This approach tackles the problem of needing human intelligence to infer authorization and data relationships in OpenAPI schemas with artificial intelligence to do the same. Structured outputs bridge the gap from the natural language world of gpt-oss back to machine-executable instructions. In addition to Workers AI solving the planning problem, self-hosting on Workers AI means our system automatically benefits from Cloudflare’s highly available, globally distributed architecture.

Built on proven foundations

Building a vulnerability scanner that customers will trust with their API credentials demands proven infrastructure. We did not reinvent the wheel here. Instead, we integrated services that have been validated and deployed across Cloudflare for two crucial components of our scanner platform: the scanner’s control plane and the scanner’s secrets store.

The scanner's control plane integrates with Temporal for Scan Orchestration, on which other internal services at Cloudflare already rely. The complexity of the numerous test plans executed in each Scan is effectively managed by Temporal's durable execution framework.

The entire backend is written in Rust, which is widely adopted at Cloudflare for infrastructure services. This lets us reuse internal libraries and share architectural patterns across teams. It also positions our scanner for potential future integration with other Cloudflare systems like FL2 or our test framework Flamingo – enabling scenarios where scanning could coordinate more tightly with edge request handling or testing infrastructure.

Credential security through HashiCorp’s Vault Transit Secret Engine

Scanning for broken authentication and broken authorization vulnerabilities requires handling API user credentials. Cloudflare takes this responsibility very seriously.

We ensure that our public API layer has minimal access to unencrypted customer credentials by using HashiCorp's Vault Transit Secret Engine (TSE) for encryption-as-a-service. Immediately upon submission, credentials are encrypted by TSE—which handles the encryption but does not store the ciphertext—and are subsequently stored on Cloudflare infrastructure.

Our API is not authorized to decrypt this data. Instead, decryption occurs only at the last stage when a TestPlan makes a request to the customer's infrastructure. Only the Worker executing the test is authorized to request decryption, a restriction we strengthen using strict typing with additional safety rails inside Rust to enforce minimal access to decryption methods.

We further secure our customers’ credentials through regular rotation and periodic rewraps using TSE to mitigate risk. This process means we only interact with the new ciphertext, and the original secret is kept unviewable.

What’s next?

We are releasing BOLA vulnerability scanning starting today as an Open Beta for all API Shield customers, and are working on future API threat scans for future release. Via the Cloudflare API, you can trigger scans, manage configuration, and retrieve results programmatically to integrate directly into your CI/CD pipelines or security dashboards. For API Shield Customers: check the developer docs to start scanning your endpoints for BOLA vulnerabilities today.

We are starting with BOLA vulnerabilities because they are the hardest API vulnerability to solve and the highest risk for our customers. However, this scanning engine is built to be extensible.

In the near future, we plan to expand the scanner’s capabilities to cover the most popular of the OWASP Web Top 10 as well: classic web vulnerabilities like SQL injection (SQLi) and cross-site scripting (XSS). To be notified upon release, sign up for the waitlist here, and you’ll be first to learn when we expand the engine to general web application vulnerabilities.

One platform to manage your company’s predictive security posture with Cloudflare

Zhiyuan Zheng — Tue, 18 Mar 2025 13:00:00 GMT

In today’s fast-paced digital landscape, companies are managing an increasingly complex mix of environments — from SaaS applications and public cloud platforms to on-prem data centers and hybrid setups. This diverse infrastructure offers flexibility and scalability, but also opens up new attack surfaces.

To support both business continuity and security needs, “security must evolve from being reactive to predictive”. Maintaining a healthy security posture entails monitoring and strengthening your security defenses to identify risks, ensure compliance, and protect against evolving threats. With our newest capabilities, you can now use Cloudflare to achieve a healthy posture across your SaaS and web applications. This addresses any security team’s ultimate (daily) question: How well are our assets and documents protected?

A predictive security posture relies on the following key components:

Real-time discovery and inventory of all your assets and documents
Continuous asset-aware threat detection and risk assessment
Prioritised remediation suggestions to increase your protection

Today, we are sharing how we have built these key components across SaaS and web applications, and how you can use them to manage your business’s security posture.

Your security posture at a glance

Regardless of the applications you have connected to Cloudflare’s global network, Cloudflare actively scans for risks and misconfigurations associated with each one of them on a regular cadence. Identified risks and misconfigurations are surfaced in the dashboard under Security Center as insights.

Insights are grouped by their severity, type of risks, and corresponding Cloudflare solution, providing various angles for you to zoom in to what you want to focus on. When applicable, a one-click resolution is provided for selected insight types, such as setting minimum TLS version to 1.2 which is recommended by PCI DSS. This simplicity is highly appreciated by customers that are managing a growing set of assets being deployed across the organization.

To help shorten the time to resolution even further, we have recently added role-based access control (RBAC) to Security Insights in the Cloudflare dashboard. Now for individual security practitioners, they have access to a distilled view of the insights that are relevant for their role. A user with an administrator role (a CSO, for example) has access to, and visibility into, all insights.

In addition to account-wide Security Insights, we also provide posture overviews that are closer to the corresponding security configurations of your SaaS and web applications. Let’s dive into each of them.

Securing your SaaS applications

Without centralized posture management, SaaS applications can feel like the security wild west. They contain a wealth of sensitive information – files, databases, workspaces, designs, invoices, or anything your company needs to operate, but control is limited to the vendor’s settings, leaving you with less visibility and fewer customization options. Moreover, team members are constantly creating, updating, and deleting content that can cause configuration drift and data exposure, such as sharing files publicly, adding PII to non-compliant databases, or giving access to third party integrations. With Cloudflare, you have visibility across your SaaS application fleet in one dashboard.

Posture findings across your SaaS fleet

From the account-wide Security Insights, you can review insights for potential SaaS security issues:

You can choose to dig further with Cloud Access Security Broker (CASB) for a thorough review of the misconfigurations, risks, and failures to meet best practices across your SaaS fleet. You can identify a wealth of security information including, but not limited to:

Publicly available or externally shared files
Third-party applications with read or edit access
Unknown or anonymous user access
Databases with exposed credentials
Users without two-factor authentication
Inactive user accounts

You can also explore the Posture Findings page, which provides easy searching and navigation across documents that are stored within the SaaS applications.

Additionally, you can create policies to prevent configuration drift in your environment. Prevention-based policies help maintain a secure configuration and compliance standards, while reducing alert fatigue for Security Operations teams, and these policies can prevent the inappropriate movement or exfiltration of sensitive data. Unifying controls and visibility across environments makes it easier to lock down regulated data classes, maintain detailed audit trails via logs, and improve your security posture to reduce the risk of breaches.

How it works: new, real-time SaaS documents discovery

Delivering SaaS security posture information to our customers requires collecting vast amounts of data from a wide range of platforms. In order to ensure that all the documents living in your SaaS apps (files, designs, etc.) are secure, we need to collect information about their configuration — are they publicly shared, do third-party apps have access, is multi-factor authentication (MFA) enabled?

We previously did this with crawlers, which would pull data from the SaaS APIs. However, we were plagued with rate limits from the SaaS vendors when working with larger datasets. This forced us to work in batches and ramp scanning up and down as the vendors permitted. This led to stale findings and would make remediation cumbersome and unclear – for example, Cloudflare would be reporting that a file is still shared publicly for a short period after the permissions were removed, leading to customer confusion.

To fix this, we upgraded our data collection pipeline to be dynamic and real-time, reacting to changes in your environment as they occur, whether it’s a new security finding, an updated asset, or a critical alert from a vendor. We started with our Microsoft asset discovery and posture findings, providing you real-time insight into your Microsoft Admin Center, OneDrive, Outlook, and SharePoint configurations. We will be rapidly expanding support to additional SaaS vendors going forward.

Listening for update events from Cloudflare Workers

Cloudflare Workers serve as the entry point for vendor webhooks, handling asset change notifications from external services. The workflow unfolds as follows:

Webhook listener: An initial Worker acts as the webhook listener, receiving asset change messages from vendors.
Data storage & queuing: Upon receiving a message, the Worker uploads the raw payload of the change notification to Cloudflare R2 for persistence, and publishes it to a Cloudflare Queue dedicated to raw asset changes.
Transformation Worker: A second Worker, bound as a consumer to the raw asset change queue, processes the incoming messages. This Worker transforms the raw vendor-specific data into a generic format suitable for CASB. The transformed data is then:
- Stored in Cloudflare R2 for future reference.
- Published on another Cloudflare Queue, designated for transformed messages.

CASB Processing: Consumers & Crawlers

Once the transformed messages reach the CASB layer, they undergo further processing:

Polling consumer: CASB has a consumer that polls the transformed message queue. Upon receiving a message, it determines the relevant handler required for processing.
Crawler execution: The handler then maps the message to an appropriate crawler, which interacts with the vendor API to fetch the most up-to-date asset details.
Data storage: The retrieved asset data is stored in the CASB database, ensuring it is accessible for security and compliance checks.

With this improvement, we are now processing 10 to 20 Microsoft updates per second, or 864,000 to 1.72 million updates daily, giving customers incredibly fast visibility into their environment. Look out for expansion to other SaaS vendors in the coming months.

Securing your web applications

A unique challenge of securing web applications is that no one size fits all. An asset-aware posture management bridges the gap between a universal security solution and unique business needs, offering tailored recommendations for security teams to protect what matters.

Posture overview from attacks to threats and risks

Starting today, all Cloudflare customers have access to Security Overview, a new landing page customized for each of your onboarded domains. This page aggregates and prioritizes security suggestions across all your web applications:

Any (ongoing) attacks detected that require immediate attention
Disposition (mitigated, served by Cloudflare, served by origin) of all proxied traffic over the last 7 days
Summary of currently active security modules that are detecting threats
Suggestions of how to improve your security posture with a step-by-step guide
And a glimpse of your most active and lately updated security rules

These tailored security suggestions are surfaced based on your traffic profile and business needs, which is made possible by discovering your proxied web assets.

Discovery of web assets

Many web applications, regardless of their industry or use case, require similar functionality: user identification, accepting payment information, etc. By discovering the assets serving this functionality, we can build and run targeted threat detection to protect them in depth.

As an example, bot traffic towards marketing pages versus login pages have different business impacts. Content scraping may be happening targeting your marketing materials, which you may or may not want to allow, while credential stuffing on your login page deserves immediate attention.

Web assets are described by a list of endpoints; and labelling each of them defines their business goals. A simple example can be POST requests to path /portal/login, which likely describes an API for user authentication. While the GET requests to path /portal/login denote the actual login webpage.

To describe business goals of endpoints, labels come into play. POST requests to the /portal/login endpoint serving end users and to the /api/admin/login endpoint used by employees can both can be labelled using the same cf-log-in managed label, letting Cloudflare know that usernames and passwords would be expected to be sent to these endpoints.

API Shield customers can already make use of endpoint labelling. In early Q2 2025, we are adding label discovery and suggestion capabilities, starting with three labels, cf-log-in, cf-sign-up, and cf-rss-feed. All other customers can manually add these labels to the saved endpoints. One example, explained below, is preventing disposable emails from being used during sign-ups.

Always-on threat detection and risk assessment

Use-case driven threat detection

Customers told us that, with the growing excitement around generative AI, they need support to secure this new technology while not hindering innovation. Being able to discover LLM-powered services allows fine-tuning security controls that are relevant for this particular technology, such as inspecting prompts, limit prompting rates based on token usage, etc. In a separate Security Week blog post, we will share how we build Cloudflare Firewall for AI, and how you can easily protect your generative AI workloads.

Account fraud detection, which encompasses multiple attack vectors, is another key area that we are focusing on in 2025.

On many login and signup pages, a CAPTCHA solution is commonly used to only allow human beings through, assuming only bots perform undesirable actions. Put aside that most visual CAPTCHA puzzles can be easily solved by AI nowadays, such an approach cannot effectively solve the root cause of most account fraud vectors. For example, human beings using disposable emails to sign up single-use accounts to take advantage of signup promotions.

To solve this fraudulent sign up issue, a security rule currently under development could be deployed as below to block all attempts that use disposable emails as a user identifier, regardless of whether the requester was automated or not. All existing or future cf-log-in and cf-sign-up labelled endpoints are protected by this single rule, as they both require user identification.

Our fast expanding use-case driven threat detections are all running by default, from the first moment you onboarded your traffic to Cloudflare. The instant available detection results can be reviewed through security analytics, helping you make swift informed decisions.

API endpoint risk assessment

APIs have their own set of risks and vulnerabilities, and today Cloudflare is delivering seven new risk scans through API Posture Management. This new capability of API Shield helps reduce risk by identifying security issues and fixing them early, before APIs are attacked. Because APIs are typically made up of many different backend services, security teams need to pinpoint which backend service is vulnerable so that development teams may remediate the identified issues.

Our new API posture management risk scans do exactly that: users can quickly identify which API endpoints are at risk to a number of vulnerabilities, including sensitive data exposure, authentication status, Broken Object Level Authorization (BOLA) attacks, and more.

Authentication Posture is one risk scan you’ll see in the new system. We focused on it to start with because sensitive data is at risk when API authentication is assumed to be enforced but is actually broken. Authentication Posture helps customers identify authentication misconfigurations for APIs and alerts of their presence. This is achieved by scanning for successful requests against the API and noting their authentication status. API Shield scans traffic daily and labels API endpoints that have missing and mixed authentication for further review.

For customers that have configured session IDs in API Shield, you can find the new risk scan labels and authentication details per endpoint in API Shield. Security teams can take this detail to their development teams to fix the broken authentication.

We’re launching today with scans for authentication posture, sensitive data, underprotected APIs, BOLA attacks, and anomaly scanning for API performance across errors, latency, and response size.

Simplify maintaining a good security posture with Cloudflare

Achieving a good security posture in a fast-moving environment requires innovative solutions that can transform complexity into simplicity. Bringing together the ability to continuously assess threats and risks across both public and private IT environments through a single platform is our first step in supporting our customers’ efforts to maintain a healthy security posture.

To further enhance the relevance of security insights and suggestions provided and help you better prioritize your actions, we are looking into integrating Cloudflare’s global view of threat landscapes. With this, you gain additional perspectives, such as what the biggest threats to your industry are, and what attackers are targeting at the current moment. Stay tuned for more updates later this year.

If you haven’t done so yet, onboard your SaaS and web applications to Cloudflare today to gain instant insights into how to improve your business’s security posture.

Protecting APIs with JWT Validation

John Cosgrove — Tue, 05 Mar 2024 14:00:34 GMT

Today, we are happy to announce that Cloudflare customers can protect their APIs from broken authentication attacks by validating incoming JSON Web Tokens (JWTs) with API Gateway. Developers and their security teams need to control who can communicate with their APIs. Using API Gateway’s JWT Validation, Cloudflare customers can ensure that their Identity Provider previously validated the user sending the request, and that the user’s authentication tokens have not expired or been tampered with.

What’s new in this release?

After our beta release in early 2023, we continued to gather feedback from customers on what they needed from JWT validation in API Gateway. We uncovered four main feature requests and shipped updates in this GA release to address them all:

Old, Beta limitation	New, GA release capability
Only supported validating the raw JWT	Support for the Bearer token format
Only supported one JWKS configuration	Create up to four different JWKS configs to support different environments per zone
Only supported validating JWTs sent in HTTP headers	Validate JWTs if they are sent in a cookie, not just an HTTP header
JWT validation ran on all requests to the entire zone	Exclude any number of managed endpoints in a JWT validation rule

What is the threat?

Broken authentication is the #1 threat on the OWASP Top 10 and the #2 threat on the OWASP API Top 10. We’ve written before about how flaws in API authentication and authorization at Optus led to a threat actor offering 10 million user records for sale, and government agencies have warned about these exact API attacks.

According to Gartner®¹, “attacks and data breaches involving poorly secured application programming interfaces (APIs) are occurring frequently.” Getting authentication correct for your API users can be challenging, but there are best practices you can employ to cover your bases. JSON Web Token Validation in API Gateway fulfills one of these best practices by enforcing a positive security model for your authenticated API users.

A primer on authentication and authorization

Authentication establishes identity. Imagine you’re collaborating with multiple colleagues and writing a document in Google Docs. When you’re all authors of the document, you have the same privileges, and you can overwrite each other’s text. You can all see each other’s name next to your respective cursor while you’re typing. You’re all authenticated to Google Docs, so Docs can show all the users on a document who everyone is.

Authorization establishes ownership or permissions to objects. Imagine you’re collaborating with your colleague in Docs again, but this time they’ve written a document ahead of time and simply wish for you to review it and add comments without changing the document. As the owner of the document, your colleague sets an authorization policy to only allow you ‘comment’ access. As such, you cannot change their writing at all, but you can still view the document and leave comments.

While the words themselves might sound similar, the differences between them are hugely important for security. It’s not enough to simply check that a user logging in has the correct login credentials (authentication). If you never check their permissions (authorization), they would be free to overwrite, add, or delete other users’ content. When this happens for APIs, OWASP calls it a Broken Object Level Authorization attack.

A primer on API access tokens

Users authenticate to services in many different ways on the web today. Let’s take a look at the history of authentication with username and password authentication, API key authentication, and JWT authentication before we mention how JWTs can help stop API attacks.

In the early days, the web used HTTP Basic Authentication, where browsers transmitted username and password pairs as an HTTP header, posing significant security risks and making credentials visible to any observer when the application failed to adopt SSL/TLS certificates. Basic authentication also complicated API access, requiring hard-coded credentials and potentially giving broad authorization policies to a single user.

The introduction of API access keys improved security by detaching authentication from user credentials and instead sending secret text strings along with requests. This approach allowed for more nuanced access control by key instead of by user ID, though API keys still faced risks from man-in-the-middle attacks and problematic storage of secrets in source code.

JSON Web Tokens (JWTs) address these issues by removing the need to send long-lived secrets on every request, introducing cryptographically verifiable, auto-expiring, short-lived sessions. Think of a JWT like a tamper-evident seal on a bottle of medication. Along with the seal, medication also has an expiration date printed on it. Users notice when the seal is tampered with or missing altogether, and when the medication expires.

These attributes enhance security any time a JWT is used instead of a long-lived shared secret. JWTs are not an end-all-be-all solution, but they do represent an evolution in authentication technology and are widely used for authentication and authorization on the Internet today.

What’s the structure of a JWT?

JWTs are composed of three fields separated by periods. The first field is a header, the second a payload, and the third a signature:

eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJNeURlbW9JRFAiLCJzdWIiOiJqb2huZG9lIiwiYXVkIjoiTXlBcHAiLCJpYXQiOjE3MDg5ODU2MDEsImV4cCI6MTcwODk4NjIwMSwiY2xhc3MiOiJhZG1pbiJ9.v0nywcQemlEU4A18QD9UTgJLyH4ZPXppuW-n0iOmtj4x-hWJuExlMKeNS-vMZt4K6n0pDCFIAKo7_VZqACx4gILXObXMU4MEleFoKKd0f58KscNrC3BQqs3Gnq-vb5Ut9CmcvevQ5h9cBCI4XhpP2_LkYcZiuoSd3zAm2W_0LNZuFXp1wo8swDoKETYmtrdTjuF-IlVjLDAxNsWm2e7T5A8HmCnAWRItEPedm_8XVJAOemx_KqIH5w6zHY1U-M6PJkHK6D2gDU5eiN35A4FCrC5bQ1-0HSTtJkLIed2-1mRO1oANWHpscvpNLQBWQLLiIZ_evbcq_tnwh1X1sA3uxQ

If we base64 decode the first two sections, we arrive at the following structure (comments added for clarity):

{
  "alg": "RS256",     // JWT signature algorithm
  "typ": "JWT"        // JWT type
}

{
  "iss": "MyDemoIDP", // Which identity provider issued this JWT
  "sub": "johndoe",   // Which user this JWT identifies
  "aud": "MyApp",     // Which app this JWT is destined for
  "iat": 1708985601,  // When this JWT was issued
  "exp": 1709986201,  // When this JWT expires
  "class": "admin"    // Extra, customer-defined metadata
}

We can then use the algorithm mentioned in the header (RS256) as well as the Identity Provider’s public key (example below) to check the last segment in the JWT, the signature (code not shown).

The signature is what makes a JWT special. The token issuer, taking into account the claims, generates a signature based on a private secret or a public/private key pair. The public key can be published online, allowing anyone to check if a JWT was legitimately issued by an organization.

Proper authentication and authorization stop API attacks

No developer wants to release an insecure application, and no security team wants their developers to skip secure coding practices, but we know both happen. In the Enterprise Strategy Group report “Securing the API Attack Surface”², a survey found that 39% of developers skip security processes due to the faster development cycles of continuous integration and continuous delivery (CI/CD). The same survey found more than half (57%) of responding organizations faced multiple security incidents related to insecure APIs in the last 12 months, and 35% of responding organizations faced at least one incident within the last year.

Along with its accompanying database, permissions, and user roles, your origin application is the ultimate security backstop of your API. However, Cloudflare can assist in keeping attacks away from your origin when you configure API Gateway with the correct context. Let’s examine three different API attacks and how to protect against them.

Missing or broken authentication

The ability for a user to send or receive data to an API and entirely bypass authentication falls into ‘broken authentication’. It’s easy to think of the expected use cases your users will take with your application. You may assume that just because a user logs in and your application is written so that users can only access their own data in their dashboard, that all users are logged in and would only access their own data. This assumption fails to account for a user making an HTTP request outside your application requesting or modifying another user’s data and there being nothing in the way to stop your API from replying. In the worst case, a lack of authorization policy checks can enable an API client to change data without an authentication token at all!

Ensuring that incoming requests have an authentication token attached to them and dropping the requests that don’t is a great way to stop the simplest API attacks.

Expired token reuse

Maybe your application already uses JWTs for user authentication. Your application decodes the JWT and looks for user claims for group membership, and you validate the claims before allowing customers access to your API. But are you checking the JWT expiration time?

Imagine a user pays for your service, but they secretly know they will soon downgrade to a free account. If the user’s tier is stored within the JWT and the application or gateway doesn’t validate the expiration time of the JWT, the user could save an old JWT and replay it to continue their access to their paid benefits. Validating JWT expiration time can prevent this type of replay attack.

Broken Function Level Authorization attacks: Tampering with claims

Let’s say you’re using JWTs for authentication, validating the claims inside them, and also validating expiration time. But do you verify the JWT signature? Practically every JWT is signed by its issuer such that API admins and security teams that know the issuer’s signing key can verify that the JWT hasn’t been tampered with. Without the API Gateway or application checking the JWT signature, a malicious user could change their JWT claims, elevating their privileges to assume an administrator role in an application by starting with a normal, non-privileged user account.

JWT Validation from API Gateway safeguards your API from broken authentication and authorization attacks by checking that JWT signatures are intact, expiry times haven’t yet passed, and that authentication tokens are present to begin with.

Don’t other Cloudflare products do this?

Other Cloudflare products also use JWTs. Cloudflare Access is part of our suite of Zero Trust products, and is meant to tie into your Identity Provider. As a best practice, customers should validate the JWT that Access creates and sends to the origin.

Conversely, JWT Validation for API Gateway is a security layer compatible with any API without changing the setup, management, or expectation of the existing user flow. API Gateway’s JWT Validation is meant to validate pre-existing JWTs that may be used by any number of services at your API origin. You really need both: Access for your internal users or employees and API Gateway for your external users.

In addition, some customers use a custom Cloudflare Worker to validate JWTs, which is a great use case for the Workers platform. However, for straightforward use cases customers may find the JWT Validation experience of API Gateway easier to interact with and manage over the lifecycle of their application. If you are validating JWTs with a Worker and today’s release of JWT Validation isn’t yet at feature parity for your custom Worker, let your account representative know. We’re interested in expanding our capabilities to meet your requirements.

What’s next?

In a future release, we will go beyond checking pre-existing JWTs, and customers will be able to generate and enforce authorization policies entirely within API Gateway. We’ll also upgrade our on-demand developer portal creation with the ability to issue keys and authentication tokens to your development team directly, streamlining API management with Cloudflare.

In addition, stay tuned for future API Gateway feature launches where we’ll use our knowledge of API traffic norms to automatically suggest security policies that highlight and stop Broken Object/Function Level Authorization attacks outside the JWT Validation use case.

Existing API Gateway customers can try the new feature now. Enterprise customers without API Gateway should sign up for the trial to try the latest from API Gateway.

¹Gartner, “API Security: What You Need to Do to Protect Your APIs”, Analyst(s) Mark O'Neill, Dionisio Zumerle, Jeremy D'Hoinne, January 13, 2023

²Enterprise Strategy Group, “Securing the API Attack Surface”, Analyst, Melinda Marks, May 2023

Defensive AI: Cloudflare’s framework for defending against next-gen threats

Daniele Molteni — Mon, 04 Mar 2024 14:00:24 GMT

Generative AI has captured the imagination of the world by being able to produce poetry, screenplays, or imagery. These tools can be used to improve human productivity for good causes, but they can also be employed by malicious actors to carry out sophisticated attacks.

We are witnessing phishing attacks and social engineering becoming more sophisticated as attackers tap into powerful new tools to generate credible content or interact with humans as if it was a real person. Attackers can use AI to build boutique tooling made for attacking specific sites with the intent of harvesting proprietary data and taking over user accounts.

To protect against these new challenges, we need new and more sophisticated security tools: this is how Defensive AI was born. Defensive AI is the framework Cloudflare uses when thinking about how intelligent systems can improve the effectiveness of our security solutions. The key to Defensive AI is data generated by Cloudflare’s vast network, whether generally across our entire network or specific to individual customer traffic.

At Cloudflare, we use AI to increase the level of protection across all security areas, ranging from application security to email security and our Zero Trust platform. This includes creating customized protection for every customer for API or email security, or using our huge amount of attack data to train models to detect application attacks that haven’t been discovered yet.

In the following sections, we will provide examples of how we designed the latest generation of security products that leverage AI to secure against AI-powered attacks.

Protecting APIs with anomaly detection

APIs power the modern Web, comprising 57% of dynamic traffic across the Cloudflare network, up from 52% in 2021. While APIs aren’t a new technology, securing them differs from securing a traditional web application. Because APIs offer easy programmatic access by design and are growing in popularity, fraudsters and threat actors have pivoted to targeting APIs. Security teams must now counter this rising threat. Importantly, each API is usually unique in its purpose and usage, and therefore securing APIs can take an inordinate amount of time.

Cloudflare is announcing the development of API Anomaly Detection for API Gateway to protect APIs from attacks designed to damage applications, take over accounts, or exfiltrate data. API Gateway provides a layer of protection between your hosted APIs and every device that interfaces with them, giving you the visibility, control, and security tools you need to manage your APIs.

API Anomaly Detection is an upcoming, ML-powered feature in our API Gateway product suite and a natural successor to Sequence Analytics. In order to protect APIs at scale, API Anomaly Detection learns an application’s business logic by analyzing client API request sequences. It then builds a model of what a sequence of expected requests looks like for that application. The resulting traffic model is used to identify attacks that deviate from the expected client behavior. As a result, API Gateway can use its Sequence Mitigation functionality to enforce the learned model of the application’s intended business logic, stopping attacks.

While we’re still developing API Anomaly Detection, API Gateway customers can sign up here to be included in the beta for API Anomaly Detection. Today, customers can get started with Sequence Analytics and Sequence Mitigation by reviewing the docs. Enterprise customers that haven’t purchased API Gateway can self-start a trial in the Cloudflare Dashboard, or contact their account manager for more information.

Identifying unknown application vulnerabilities

Another area where AI improves security is in our Web Application Firewall (WAF). Cloudflare processes 55 million HTTP requests per second on average and has an unparalleled visibility into attacks and exploits across the world targeting a wide range of applications.

One of the big challenges with the WAF is adding protections for new vulnerabilities and false positives. A WAF is a collection of rules designed to identify attacks directed at web applications. New vulnerabilities are discovered daily and at Cloudflare we have a team of security analysts that create new rules when vulnerabilities are discovered. However, manually creating rules takes time — usually hours — leaving applications potentially vulnerable until a protection is in place. The other problem is that attackers continuously evolve and mutate existing attack payloads that can potentially bypass existing rules.

This is why Cloudflare has, for years, leveraged machine learning models that constantly learn from the latest attacks, deploying mitigations without the need for manual rule creation. This can be seen, for example, in our WAF Attack Score solution. WAF Attack Score is based on an ML model trained on attack traffic identified on the Cloudflare network. The resulting classifier allows us to identify variations and bypasses of existing attacks as well as extending the protection to new and undiscovered attacks. Recently, we have made Attack Score available to all Enterprise and Business plans.

Attack Score uses AI to classify each HTTP request based on the likelihood that it’s malicious

While the contribution of security analysts is indispensable, in the era of AI and rapidly evolving attack payloads, a robust security posture demands solutions that do not rely on human operators to write rules for each novel threat. Combining Attack Score with traditional signature-based rules is an example of how intelligent systems can support tasks carried out by humans. Attack Score identifies new malicious payloads which can be used by analysts to optimize rules that, in turn, provide better training data for our AI models. This creates a reinforcing positive feedback loop improving the overall protection and response time of our WAF.

Long term, we will adapt the AI model to account for customer-specific traffic characteristics to better identify deviations from normal and benign traffic.

Using AI to fight phishing

Email is one of the most effective vectors leveraged by bad actors with the US Cybersecurity and Infrastructure Security Agency (CISA) reporting that 90% of cyber attacks start with phishing and Cloudflare Email Security marking 2.6% of 2023's emails as malicious. The rise of AI-enhanced attacks are making traditional email security providers obsolete, as threat actors can now craft phishing emails that are more credible than ever with little to no language errors.

Cloudflare Email Security is a cloud-native service that stops phishing attacks across all threat vectors. Cloudflare’s email security product continues to protect customers with its AI models, even as trends like Generative AI continue to evolve. Cloudflare’s models analyze all parts of a phishing attack to determine the risk posed to the end user. Some of our AI models are personalized for each customer while others are trained holistically. Privacy is paramount at Cloudflare, so only non-personally identifiable information is used by our tools for training. In 2023, Cloudflare processed approximately 13 billion, and blocked 3.4 billion, emails, providing the email security product a rich dataset that can be used to train AI models.

Two detections that are part of our portfolio are Honeycomb and Labyrinth.

Honeycomb is a patented email sender domain reputation model. This service builds a graph of who is sending messages and builds a model to determine risk. Models are trained on specific customer traffic patterns, so every customer has AI models trained on what their good traffic looks like.
Labyrinth uses ML to protect on a per-customer basis. Actors attempt to spoof emails from our clients’ valid partner companies. We can gather a list with statistics of known & good email senders for each of our clients. We can then detect the spoof attempts when the email is sent by someone from an unverified domain, but the domain mentioned in the email itself is a reference/verified domain.

AI remains at the core of our email security product, and we are constantly improving the ways we leverage it within our product. If you want to get more information about how we are using our AI models to stop AI enhanced phishing attacks check out our blog post here.

Zero-Trust security protected and powered by AI

Cloudflare Zero Trust provides administrators the tools to protect access to their IT infrastructure by enforcing strict identity verification for every person and device regardless of whether they are sitting within or outside the network perimeter.

One of the big challenges is to enforce strict access control while reducing the friction introduced by frequent verifications. Existing solutions also put pressure on IT teams that need to analyze log data to track how risk is evolving within their infrastructure. Sifting through a huge amount of data to find rare attacks requires large teams and substantial budgets.

Cloudflare simplifies this process by introducing behavior-based user risk scoring. Leveraging AI, we analyze real-time data to identify anomalies in the users’ behavior and signals that could lead to harms to the organization. This provides administrators with recommendations on how to tailor the security posture based on user behavior.

Zero Trust user risk scoring detects user activity and behaviors that could introduce risk to your organizations, systems, and data and assigns a score of Low, Medium, or High to the user involved. This approach is sometimes referred to as user and entity behavior analytics (UEBA) and enables teams to detect and remediate possible account compromise, company policy violations, and other risky activity.

The first contextual behavior we are launching is “impossible travel”, which helps identify if a user’s credentials are being used in two locations that the user could not have traveled to in that period of time. These risk scores can be further extended in the future to highlight personalized behavior risks based on contextual information such as time of day usage patterns and access patterns to flag any anomalous behavior. Since all traffic would be proxying through your SWG, this can also be extended to resources which are being accessed, like an internal company repo.

We have an exciting launch during security week. Check out this blog to learn more.

Conclusion

From application and email security to network security and Zero Trust, we are witnessing attackers leveraging new technologies to be more effective in achieving their goals. In the last few years, multiple Cloudflare product and engineering teams have adopted intelligent systems to better identify abuses and increase protection.

Besides the generative AI craze, AI is already a crucial part of how we defend digital assets against attacks and how we discourage bad actors.

Introducing Cloudflare’s 2024 API security and management report

John Cosgrove — Tue, 09 Jan 2024 14:00:03 GMT

You may know Cloudflare as the company powering nearly 20% of the web. But powering and protecting websites and static content is only a fraction of what we do. In fact, well over half of the dynamic traffic on our network consists not of web pages, but of Application Programming Interface (API) traffic — the plumbing that makes technology work. This blog introduces and is a supplement to the API Security Report for 2024 where we detail exactly how we’re protecting our customers, and what it means for the future of API security. Unlike other industry API reports, our report isn’t based on user surveys — but instead, based on real traffic data.

If there’s only one thing you take away from our report this year, it’s this: many organizations lack accurate API inventories, even when they believe they can correctly identify API traffic. Cloudflare helps organizations discover all of their public-facing APIs using two approaches. First, customers configure our API discovery tool to monitor for identifying tokens present in their known API traffic. We then use a machine learning model that scans not just these known API calls, but all HTTP requests, identifying API traffic that may be going unaccounted for. The difference between these approaches is striking: we found 30.7% more API endpoints through machine learning-based discovery than the self-reported approach, suggesting that nearly a third of APIs are “Shadow APIs” — and may not be properly inventoried and secured.

Read on for extras and highlights from our inaugural API security report. In the full report, you’ll find updated statistics about the threats we see and prevent, along with our predictions for 2024. We predict that a lack of API security focus at organizations will lead to increased complexity and loss of control, and increased access to generative AI will lead to more API risk. We also anticipate an increase in API business logic attacks in 2024. Lastly, all of the above risks will necessitate growing governance around API security.

Hidden attack surfaces

How are web pages and APIs different? APIs are a quick and easy way for applications to retrieve data in the background, or ask that work be done from other applications. For example, anyone can write a weather app without being a meteorologist: a developer can write the structure of the page or mobile application and ask a weather API for the forecast using the user’s location. Critically, most end users don’t know that the data was provided by the weather API and not the app’s owner.

While APIs are the critical plumbing of the Internet, they're also ripe for abuse. For example, flaws in API authentication and authorization at Optus led to a threat actor offering 10 million user records for sale, and government agencies have warned about these exact API attacks. Developers in an organization will often create Internet-facing APIs, used by their own applications to function more efficiently, but it's on the security team to protect these new public interfaces. If the process of documenting APIs and bringing them to the attention of the security team isn't clear, they become Shadow APIs — operating in production but without the organization's knowledge. This is where the security challenge begins to emerge.

To help customers solve this problem, we shipped API Discovery. When we introduced our latest release, we mentioned how few organizations have accurate API inventories. Security teams sometimes are forced to adopt an “email and ask” approach to build an inventory, and in doing so responses are immediately stale upon the next application release when APIs change. Better is to track API changes by code base changes, keeping up with new releases. However, this still has a drawback of only inventorying actively maintained code. Legacy applications may not see new releases, despite receiving production traffic.

Cloudflare's approach to API management involves creating a comprehensive, accurate API inventory using a blend of machine learning-based API discovery and network traffic inspection. This is integral to our API Gateway product, where customers can manage their Internet-facing endpoints and monitor API health. The API Gateway also allows customers to identify their API traffic using session identifiers (typically a header or cookie), which aids in specifically identifying API traffic for the discovery process.

As noted earlier, our analysis reveals that even knowledgeable customers often overlook significant portions of their API traffic. By comparing session-based API discovery (using API sessions to pinpoint traffic) with our machine learning-based API discovery (analyzing all incoming traffic), we found that the latter uncovers on average 30.7% more endpoints! Without broad traffic analysis, you may be missing almost a third of your API inventory.

If you aren’t a Cloudflare customer, you can still get started building an API inventory. APIs are typically cataloged in a standardized format called OpenAPI, and many development tools can build OpenAPI formatted schema files. If you have a file with that format, you can start to build an API inventory yourself by collecting these schemas. Here is an example of how you can pull the endpoints out of a schema file, assuming your have an OpenAPI v3 formatted file named my_schema.json:

import json
import csv
from io import StringIO

# Load the OpenAPI schema from a file
with open("my_schema.json", "r") as file:
    schema = json.load(file)

# Prepare CSV output
output = StringIO()
writer = csv.writer(output)

# Write CSV header
writer.writerow(["Server", "Path", "Method"])

# Extract and write data to CSV
servers = schema.get("servers", [])
for server in servers:
    url = server['url']
    for path, methods in schema['paths'].items():
        for method in methods.keys():
            writer.writerow([url, path, method])

# Get and print CSV string
csv_output = output.getvalue().strip()
print(csv_output)

Unless you have been generating OpenAPI schemas and tracking API inventory from the beginning of your application’s development process, you’re probably missing some endpoints across your production application API inventory.

Precise rate limits minimize attack potential

When it comes to stopping abuse, most practitioners’ thoughts first come to rate limiting. Implementing limits on your API is a valuable tool to keep abuse in check and prevent accidental overload of the origin. But how do you know if you’ve chosen the correct rate limiting approach? Approaches can vary, but they generally come down to the error code chosen, and the basis for the limit value itself.

For some APIs, practitioners configure rate limiting errors to respond with an HTTP 403 (forbidden), while others will respond with HTTP 429 (too many requests). Using HTTP 403 sounds innocent enough until you realize that other security tools are also responding with 403 codes. When you’re under attack, it can be hard to decipher which tools are responsible for which errors / blocking.

Alternatively, if you utilize HTTP 429 for your rate limits, attackers will instantly know that they’ve been rate limited and can “surf” right under the limit without being detected. This can be OK if you’re only limiting requests to ensure your back-end stays alive, but it can tip your cards to attackers. In addition, attackers can “scale out” to more API clients to effectively request above the rate limit.

There are pros and cons to both approaches, but we find that by far most APIs respond with HTTP 429 out of all the 4xx and 5xx error messages (almost 52%).

What about the logic of the rate limit rule itself, not just the response code? Implementing request limits on IP addresses can be tempting, but we recommend you base the limit on a session ID as a best practice and only fall back to IP address (or IP + JA3 fingerprint) when session IDs aren’t available. Setting rate limits on user sessions instead of IPs will reliably identify your real users and minimize false positives due to shared IP space. Cloudflare’s Advanced Rate Limiting and API Gateway’s volumetric abuse protection make it easy to enforce these limits by profiling session traffic on each API endpoint and giving one-click solutions to set up the per-endpoint rate limits.

To find values for your rate limits, Cloudflare API Gateway computes session request statistics for you. We suggest a limit by looking at the distribution of requests per session across all sessions to your API as identified by the customer-configured API session identifier. We then compute statistical p-levels — which describe the request rates for different cohorts of traffic — for p50, p90, and p99 on this distribution and use the variance of the distribution to come up with a recommended threshold for every single endpoint in your API inventory. The recommendation might not match the p-levels, which is an important distinction and a reason not to use p-levels alone. Along with the recommendation, API Gateway informs users of our confidence in the recommendation. Generally, the more API sessions we’re able to collect, the more confident we’ll be in the recommendation.

Activating a rate limit is as easy as clicking the ‘create rule’ link, and API Gateway will automatically bring your session identifier over to the advanced rate limit rule creation page, ensuring your rules have pinpoint accuracy to defend against attacks and minimize false positives compared to traditional, overly broad limits.

APIs are also victim to web application attacks

APIs aren’t immune from normal OWASP Top 10 style attacks like SQL injection. The body of API requests can also find its way as a database input just like a web page form input or URL argument. It’s important to ensure that you have a web application firewall (WAF) also protecting your API traffic to defend against these styles of attacks.

In fact, when we looked at Cloudflare’s WAF managed rules, injection attacks were the second most common threat vector Cloudflare saw carried out on APIs. The most common threat was HTTP Anomaly. Examples of HTTP anomalies include malformed method names, null byte characters in headers, non-standard ports or content length of zero with a POST request. Here are the stats on the other top threats we saw against APIs:

Absent from the chart is broken authentication and authorization. Broken authentication and authorization occur when an API fails to check whether the entity sending requests for information to an API actually has the permission to request that data or not. It can also happen when attacks try to forge credentials and insert less restricted permissions into their existing (valid) credentials that have more restricted permissions. OWASP categorizes these attacks in a few different ways, but the main categories are Broken Object Level Authorization (BOLA) and Broken Function Level Authorization (BFLA) attacks.

The root cause of a successful BOLA / BFLA attack lies in an origin API not checking proper ownership of database records against the identity requesting those records. Tracking these specific attacks can be difficult, as the permission structure may be simply absent, inadequate, or improperly implemented. Can you see the chicken-and-egg problem here? It would be easy to stop these attacks if we knew the proper permission structure, but if we or our customers knew the proper permission structure or could guarantee its enforcement, the attacks would be unsuccessful to begin with. Stay tuned for future API Gateway feature launches where we’ll use our knowledge of API traffic norms to automatically suggest security policies that highlight and stop BOLA / BFLA attacks.

Here are four ways to plug authentication loopholes that may exist for your APIs, even if you don’t have a fine-grained authorization policy available:

First, enforce authentication on each publicly accessible API unless there's a business approved exception. Look to technologies like mTLS and JSON Web Tokens.
Limit the speed of API requests to your servers to slow down potential attackers.
Block abnormal volumes of sensitive data outflow.
Block attackers from skipping legitimate sequences of API requests.

APIs are surprisingly human driven, not machine driven anymore

If you’ve been around technology since the pre-smartphone days when fewer people were habitually online, it can be tempting to think of APIs as only used for machine-to-machine communication in something like an overnight batch job process. However, the truth couldn’t be more different. As we’ve discussed, many web and mobile applications are powered by APIs, which facilitate everything from authentication to transactions to serving media files. As people use these applications, there is a corresponding increase in API traffic volume.

We can illustrate this by looking at the API traffic patterns observed during holidays, when people gather around friends and family and spend more time socializing in person and less time online. We’ve annotated the following Worldwide API traffic graph with common holidays and promotions. Notice how traffic peaks around Black Friday and Cyber Monday around the +10% level when people shop online, but then traffic drops off for the festivities of Christmas and New Years days.

This pattern closely resembles what we observe in regular HTTP traffic. It’s clear that APIs are no longer just the realm of automated processes but are intricately linked with human behaviors and social trends.

Recommendations

There is no silver bullet for holistic API security. For the best effect, Cloudflare recommends four strategies for increasing API security posture:

Combine API application development, visibility, performance, and security with a unified control plane that can keep an up-to-date API inventory.
Use security tools that utilize machine learning technologies to free up human resources and reduce costs.
Adopt a positive security model for your APIs (see below for an explanation on positive and negative security models).
Measure and improve your organization’s API maturity level over time (also see below for an explanation of an API maturity level).

What do we mean by a ‘positive’ or ‘negative’ security model? In a negative model, security tools look for known signs of attack and take action to stop those attacks. In a positive model, security tools look for known good requests and only let those through, blocking all else. APIs are often so structured that positive security models make sense for the highest levels of security. You can also combine security models, such as using a WAF in a negative model sense, and using API Schema Validation in a positive model sense.

Here’s a quick way to gauge your organization’s API security maturity level over time: Novice organizations will get started by assembling their first API inventory, no matter how incomplete. More mature organizations will strive for API inventory accuracy and automatic updates. The most mature organizations will actively enforce security checks in a positive security model on their APIs, enforcing API schema, valid authentication, and checking behavior for signs of abuse.

Predictions

In closing, our top four predictions for 2024 and beyond:

Increased loss of control and complexity: we surveyed practitioners in the API Security and Management field and 73% responded that security requirements interfere with their productivity and innovation. Coupled with increasingly sprawling applications and inaccurate inventories, API risks and complexity will rise.

Easier access to AI leading to more API risks: the rise in generative AI brings potential risks, including AI models’ APIs being vulnerable to attack, but also developers shipping buggy, AI-written code. Forrester predicts that, in 2024, without proper guardrails, “at least three data breaches will be publicly blamed on insecure AI-generated code – either due to security flaws in the generated code itself or vulnerabilities in AI-suggested dependencies.”

Increase in business logic-based fraud attacks: professional fraudsters run their operations just like a business, and they have costs like any other. We anticipate attackers will run fraud bots efficiently against APIs even more than in previous years.

Growing governance: The first version of PCI DSS that directly addresses API security will go into effect in March 2024. Check your industry’s specific requirements with your audit department to be ready for requirements as they come into effect.

If you’re interested in the full report, you can download the 2024 API Security Report here, which includes full detail on our recommendations.

Cloudflare API Gateway is our API security solution, and it is available for all Enterprise customers. If you aren’t subscribed to API Gateway, click here to view your initial API Discovery results and start a trial in the Cloudflare dashboard. To learn how to use API Gateway to secure your traffic, click here to view our development docs and here for our getting started guide.

Speeding up APIs with Ricochet for API Gateway

John Cosgrove — Fri, 23 Jun 2023 13:00:17 GMT

APIs form the backbone of communication between apps and services on the Internet. They are a quick way for an application to ask for data or ask that a task be performed by a service. For example, anyone can write a weather app without being a meteorologist: simply ask a weather API for the forecast and display it in your app.

Speed is inherent to the API use case. Rather than transferring bulky files like images and HTML, APIs only share the essential data needed to render a webpage or an app. However, despite their efficiency, Internet latency can still impede API data transfers. If the server processing a user’s API request is located far from that user, the network round trip time can degrade that user’s experience.

Cloudflare's global network is specifically designed to optimize and accelerate internet traffic, including APIs. Our users enjoy features like 11ms DNS responses, load balancing, and Argo Smart Routing, which significantly improve API traffic speed. For web content, Cloudflare customers have always been able to cache their web traffic, serving requests from the closest data center and thereby reducing network round trip time and server processing time to a bare minimum. Now, we are leveraging these benefits to enhance API traffic in exciting new ways.

Today we’re announcing Ricochet for API Gateway, the easiest way for Cloudflare customers to achieve faster API responses. Customers using Cloudflare’s API Gateway will be able to enable Ricochet for their API endpoints and automatically reduce average latency through intelligent caching of API requests that would otherwise go to origin. Ricochet will even work for things you previously thought un-cacheable, like GraphQL POST requests. Best of all, there are no changes to make at your origin. Just configure API Gateway with your API session identifiers and leave the rest to us.

Enabling Ricochet for your APIs will cause Cloudflare to cache many of the basic, repetitive API calls from your applications and deliver them to users faster than ever. At first, your product metrics might even look broken with lower latency and fewer requests at origin. But these metrics will be a new sign of success, reflecting your app’s new speedy user experience.

Why you should cache API responses

It isn’t news that page load times directly correlate to dollar spend of site visitors. Organizations have spent the last decade obsessing over static content optimization to deliver websites quicker every year. Faster apps result in higher business metrics for most web sites. For example, faster sites receive more sales orders and have higher customer loyalty. Faster sites are also critical for engagement during marketing campaigns. Caching API requests will make your sites and apps faster by lowering the amount of time required to populate your app with data for your users.

The tools for caching APIs have always been available. But why isn’t it more common? We hypothesize a few reasons. It could be that API developers assume that APIs are too dynamic to cache, and that it only helps after lots of analysis of the application’s performance. It could also be that the security concerns around caching APIs are non-trivial. If both of those are true, you can imagine how caching APIs would only be successful with a large cross-organizational approach and lots of effort.

Let’s say your organization decided to try API caching anyway. We know there are a few problems if you want to cache APIs: First, traditional caching methods aren’t necessarily ready out of the box for caching APIs; cache invalidation needs to happen quickly and automatically. Second, special standalone cache tooling exists, but it doesn’t help you if it’s placed next to the origin and your users are globally distributed. Lastly, it’s hard to get security right when caching user data. It’s strictly forbidden to serve one user’s data to another on accident. Cloudflare has superpowers in these areas: knowledge of origin response time and existing cache-hit ratios, customer-configured API session IDs to establish secure user association with API requests, and the scale of our global network. We’re bringing together API Management with our robust caching infrastructure to safely and automatically cache API requests.

Cloudflare’s unique approach

The HTTP methods POST, PUT, and DELETE aren't generally cacheable. These methods are meant to change information at the origin on a one-time basis, and are therefore “non-safe”. If you responded to a non-safe request from cache, the data on the API server would not be updated. Compare this with “safe” HTTP methods: GET, OPTIONS, and HEAD. Caching requests with safe methods is straightforward as data at the origin does not change per-request.

But what do we know about RESTful APIs that can make caching easier? The endpoints usually stay the same when operating on objects, but the methods change. We will enable caching for safe methods and then automatically invalidate the cache when we see a non-safe method request for a RESTful endpoint managed by API Gateway. It’s also possible you have updates on one endpoint that change another endpoint’s data. In that event, we urge you to consider whether API Gateway’s short default TTL timers fit your use case by allowing a small delay between updating data at the origin and serving that update from cache. Check out the below diagram for an example of how automatic cache invalidation would work for shared paths with different methods:

Even for safe requests, caching API data is risky when it comes to security. Done incorrectly, you could accidentally serve sensitive user data to the wrong person. That’s why Ricochet’s cache key includes the user’s API session identifier. This extra information in the cache key ensures that only an authorized user is able to receive their own cached data. For APIs without authentication, we hash the request parameters themselves and include that hash in the cache key, to ensure the correct data is returned for endpoints with static paths but variable inputs. Here’s an example for authenticated APIs:

And here’s an example for anonymous APIs, where we use a hash of the request body to preserve privacy and still enable unique, useful cache keys:

APIs can be ripe for caching

There are many API caching use cases out there, and we decided to start with two where our customers have asked us for help so far: mixed-authentication APIs where returning the correct data is critical, and APIs that have single endpoints that can return varied query results (think RESTful endpoints with variable inputs and GraphQL POST requests). These use cases include things like weather forecasts and current conditions, airline flight tracking and flight status, live sports scores, and collaborative tools with many users.

Even short cache control timers can be beneficial to reduce load at the origin and speed up responses. Consider an example of a popular public endpoint receiving 1,000 requests/second, or 60,000 requests/min at origin. Let’s assume the data at the origin changes unpredictably but not due to unique user interaction. For this use case, reporting stale data for a few seconds or even a minute could be acceptable, and most users wouldn’t know or mind the difference. You could set your cache control to a very low 1 second and serve 999 requests/second from cache. That would reduce the origin requests to only 60 requests/minute!

This is a simple example, and we urge you to think about your API and the potential performance improvements caching could bring.

Potential impact of caching APIs

We profiled five top global airline website APIs for flight status checks and compared their retrieval time against the airline logo’s retrieval time. Here are the results:

All five airlines saw on average a ~7x slow down for data that could easily be cached!

Successful caching will also lower load on your API origin. The hidden benefit with lower load at origin means faster responses from the requests that miss the cache and do end up hitting origin. So it’s two-for-the-price-of-one and latency decreases all-around. We aren’t going to reinvent your API overnight, but we are going to make a difference in your application’s response times by making it easy to add caching to your APIs.

Conclusion

We're launching Ricochet in 2024. It’s going to make a measurable difference speeding up your APIs, and the best part is that as an API Gateway customer it will be easy to get started without requiring tons of your team’s time. Let us know if you’d like to be on our waitlist for this feature. Our goal is to increase the amount of API caching on the Internet so that we can all benefit from faster response times and snappier apps.

Watch on Cloudflare TV

Protecting GraphQL APIs from malicious queries

John Cosgrove — Mon, 12 Jun 2023 13:00:08 GMT

Starting today, Cloudflare’s API Gateway can protect GraphQL APIs against malicious requests that may cause a denial of service to the origin. In particular, API Gateway will now protect against two of the most common GraphQL abuse vectors: deeply nested queries and queries that request more information than they should.

Typical RESTful HTTP APIs contain tens or hundreds of endpoints. GraphQL APIs differ by typically only providing a single endpoint for clients to communicate with and offering highly flexible queries that can return variable amounts of data. While GraphQL’s power and usefulness rests on the flexibility to query an API about only the specific data you need, that same flexibility adds an increased risk of abuse. Abusive requests to a single GraphQL API can place disproportional load on the origin, abuse the N+1 problem, or exploit a recursive relationship between data dimensions. In order to add GraphQL security features to API Gateway, we needed to obtain visibility inside the requests so that we could apply different security settings based on request parameters. To achieve that visibility, we built our own GraphQL query parser. Read on to learn about how we built the parser and the security features it enabled.

The power of GraphQL

Unlike a REST API, where the API’s users are limited to what data they can query and change on a per-endpoint basis, a GraphQL API offers users the ability to query and change any data they wish with an open-ended, yet structured request to a single endpoint. This open-endedness makes GraphQL APIs very powerful. Each user can query for a completely custom set of data and receive their custom response in a single HTTP request. Here are two example queries and their responses. These requests are typically sent via HTTP POST methods to an endpoint at /graphql.

# A query asking for multiple nested subfields of the "hero" object. This query has a depth level of 2.
{
  hero {
    name
    friends {
      name
    }
  }
}

# The corresponding response.
{
  "data": {
    "hero": {
      "name": "R2-D2",
      "friends": [
        {
          "name": "Luke Skywalker"
        },
        {
          "name": "Han Solo"
        },
        {
          "name": "Leia Organa"
        }
      ]
    }
  }
}

# A query asking for just one subfield on the same "hero" object. This query has a depth level of 1.
{
  hero {
    name
  }
}

# The corresponding response.
{
  "data": {
    "hero": {
      "name": "R2-D2"
    }
  }
}

These custom queries give GraphQL endpoints more flexibility than conventional REST endpoints. But this flexibility also means GraphQL APIs can be subject to very different load or security risks based on the requests that they are receiving. For example, an attacker can request the exact same, valid data as a benevolent user would, but exploit the data’s self-referencing structure and ask that an origin return hundreds of thousands of rows replicated over and over again. Let’s consider an example, in which we operate a petitioning platform where our data model contains petitions and signers objects. With GraphQL, an attacker can, in a single request, query for a single petition, then for all people who signed that petition, then for all petitions each of those people have signed, then for all people that signed any of those petitions, then for all petitions that… you see where this is going!

query {
 petition(ID: 123) {
   signers {
     nodes {
       petitions {
         nodes {
           signers {
             nodes {
               petitions {
                 nodes {
                    ...
                 }
               }
             }
           }
         }
       }
     }
   }
 }
}

A rate limit won’t protect against such an attack because the entire query fits into a single request.

So how can we secure GraphQL APIs? There is little agreement in the industry around what makes a GraphQL endpoint secure. For some, this means rejecting invalid queries. Normally, an invalid query refers to a query that would fail to compile by a GraphQL server and not cause any substantial load on the origin, but would still add noise and error logs and reduce operational visibility. For others, this means creating complexity-based rate limits or perhaps flagging broken object-level authorization. Still others want deeper visibility into query behavior and an ability to validate queries against a predefined schema.

When creating new features in API Gateway, we often start by providing deeper visibility for customers into their traffic behavior related to the feature in question. This way we create value from the large amount of data we see in the Cloudflare network, and can have conversations with customers where we ask: “Now that you have these data insights, what actions would you like to take with them?”. This process puts us in a good position to build a second, more actionable iteration of the feature.

We decided to follow the same process with GraphQL protection, with parsing GraphQL requests and gathering data as our first goal.

Parsing GraphQL quickly

As a starting point, we wanted to collect request query size and depth attributes. These attributes offer a surprising amount of insight into the query – if the query is requesting a single field at depth level 15, is it really innocuous or is it exploiting some recursive data relationship? if the query is asking for hundreds of fields at depth level 3, why wouldn’t it just ask for the entire object at level 2 instead?

To do this, we needed to parse queries without adding latency to incoming requests. We evaluated multiple open source GraphQL parsers and quickly realized that their performance would put us at the risk of adding hundreds of microseconds of latency to the request duration. Our goal was to have a p95 parsing time of under 50 microseconds. Additionally, the infrastructure we were planning to use to ship this functionality has a strict no-heap-allocation policy – this means that any memory allocated by a parser to process a request has to be amortized by being reused when parsing any subsequent requests. Parsing GraphQL in a no-allocation manner is not a fundamental technical requirement for us over the long-term, but it was a necessity if we wanted to build something quickly with confidence that the proof of concept will meet our performance expectations.

Meeting the latency and memory allocation constraints meant that we had to write a parser of our own. Building an entire abstract syntax tree of unpredictable structure requires allocating memory on the heap, and that’s what made conventional parsers unfit for our requirements. What if instead of building a tree, we processed the query in a streaming fashion, token by token? We realized that if we were to write our own GraphQL lexer that produces a list of GraphQL tokens (“comment”, “string”, “variable name”, “opening parenthesis”, etc.), we could use a number of heuristics to infer the query depth and size without actually building a tree or fully validating the query. Using this approach meant that we could deliver the new feature fast, both in engineering time and wall clock time – and, most importantly, visualize data insights for our customers.

To start, we needed to prepare GraphQL queries for parsing. Most of the time, GraphQL queries are delivered as HTTP POST requests with application/json or application/graphql Content-Type. Requests with application/graphql content type are easy to work with – they contain the raw query you can just parse. However, JSON-encoded queries present a challenge since JSON objects contain escaped characters – normally, any deserialization library will allocate new memory into which the raw string is copied with escape sequences removed, but we committed to allocating no memory, remember? So to parse GraphQL queries encoded in JSON fields, we used serde RawValue to locate the JSON field in which the escaped query was placed and then iterated over the constituent bytes one-by-one, feeding them into our tokenizer and removing escape sequences on the fly.

Once we had our query input ready, we built a simple Rust program that converts raw GraphQL input into a list of lexical tokens according to the GraphQL grammar. Tokenization is the first step in any parser – our insight was that this step was all we needed for what we wanted to achieve in the MVP.

mutation CreateMessage($input: MessageInput) {
    createMessage(input: $input) {
        id
    }
}

For example, the mutation operation above gets converted into the following list of tokens:

name
name
punctuator (
punctuator $
name
punctuator :
name
punctuator )
punctuator {
name
punctuator (
name
punctuator :
punctuator $
name
punctuator )
punctuator {
name
punctuator }
punctuator }

With this list of tokens available to us, we built our validation engine and added the ability to calculate query depth and size. Again, everything is done one-the-fly in a single pass. A limitation of this approach is that we can’t parse 100% of the requests – there are some syntactic features of GraphQL that we have to fail open on; however, a major advantage of this approach is its performance – in our initial trial run against a stream of 10s of thousands of requests per second, we achieved a p95 parsing time of 25 microseconds. This is a good starting point to collect some data and to prototype our first GraphQL security features.

Getting started

Today, any API Gateway customer can use the Cloudflare GraphQL API to retrieve information about depth and size of GraphQL queries we see for them on the edge.

As an example, we’ve run the analysis below visualizing over 400,000 data points for query sizes and depths for a production domain utilizing API Gateway.

First let’s look at query sizes in our sample:

It looks like queries almost never request more than 60 fields. Let’s also look at query depths:

It looks like queries are never more than seven levels deep.

These two insights can be converted into security rules: we added three new Wirefilter fields that API Gateway customers can use to protect their GraphQL endpoints:

1. cf.api_gateway.graphql.query_size
2. cf.api_gateway.graphql.query_depth
3. cf.api_gateway.graphql.parsed_successfully

For now, we recommend the use of cf.api_gateway.graphql.parsed_successfully in all rules. Rules created with the use of this field will be backwards compatible with future GraphQL protection releases.

If a customer feels that there is nothing out of the ordinary with the traffic sample and that it represents a meaningful amount of normal usage, they can manually create and deploy the following custom rule to log all queries that were parsed by Cloudflare and that look like outliers:

cf.api_gateway.graphql.parsed_successfully and
(cf.api_gateway.graphql.query_depth > 7 or 
cf.api_gateway.graphql.query_size > 60)

Learn more and run your own analysis with our documentation.

What’s next?

We are already receiving feedback from our first customers and are planning out the next iteration of this feature. These are the features we will build next:

Integrating GraphQL security with complexity-based rate limiting such that we automatically calculate query cost and let customers rate limit eyeballs based on the total query execution cost the eyeballs use during their entire session.
Allowing customers to configure specifically which endpoints GraphQL security features run on.
Creating data insights on the relationship between query complexity and the time it takes the customer origin to respond to the query.
Creating automatic GraphQL threshold recommendations based on historical trends.

If you’re an Enterprise customer that hasn't purchased API Gateway and you’re interested in protecting your GraphQL APIs today, you can get started by enabling the API Gateway trial inside the Cloudflare Dashboard or by contacting your account manager. Check out our documentation on the feature to get started once you have access.

Detecting API abuse automatically using sequence analysis

John Cosgrove — Wed, 15 Mar 2023 13:00:00 GMT

Today, we're announcing Cloudflare Sequence Analytics for APIs. Using Sequence Analytics, Customers subscribed to API Gateway can view the most important sequences of API requests to their endpoints. This new feature helps customers to apply protection to the most important endpoints first.

What is a sequence? It is simply a time-ordered list of HTTP API requests made by a specific visitor as they browse a website, use a mobile app, or interact with a B2B partner via API. For example, a portion of a sequence made during a bank funds transfer could look like:

Order	Method	Path	Description
1	GET	/api/v1/users/{user_id}/accounts	user_id is the active user
2	GET	/api/v1/accounts/{account_id}/balance	account_id is one of the user’s accounts
3	GET	/api/v1/accounts/{account_id}/balance	account_id is a different account belonging to the user
4	POST	/api/v1/transferFunds	Containing a request body detailing an account to transfer funds from, an account to transfer funds to, and an amount of money to transfer

Why is it important to pay attention to sequences for API security? If the above API received requests for POST /api/v1/transferFunds without any of the prior requests, it would seem suspicious. Think about it: how would the API client know what the relevant account IDs are without listing them for the user? How would the API client know how much money is available to transfer? While this example may be obvious, the sheer number of API requests to any given production API can make it hard for human analysts to spot suspicious usage.

In security, one approach to defending against an untold number of threats that are impossible to screen by a team of humans is to create a positive security model. Instead of trying to block everything that could potentially be a threat, you allow all known good or benign traffic and block everything else by default.

Customers could already create positive security models with API Gateway in two main areas: volumetric abuse protection and schema validation. Sequences will form the third pillar of a positive security model for API traffic. API Gateway will be able to enforce the precedence of endpoints in any given API sequence. By establishing precedence within an API sequence, API Gateway will log or block any traffic that doesn’t match expectations, reducing abusive traffic.

Detecting abuse by sequence

When attackers attempt to exfiltrate data in an abusive way, they rarely follow the patterns of expected API traffic. Attacks often use special software to ‘fuzz’ the API, sending several requests with different request parameters hoping to find unexpected responses from the API indicating opportunities to exfiltrate data. Attackers can also manually send requests to APIs that attempt to trick the API in performing unauthorized actions, like granting an attacker elevated privileges or access to data through a Broken Object Level Authentication attack. Protecting APIs with rate limits is a common best practice; however, in both of the above examples attackers may deliberately execute request sequences slowly, in an attempt to thwart volumetric abuse detection.

Think of the sequence of requests above again, but this time imagine an attacker copying the legitimate funds transfer request and modifying the request payload in an attempt to trick the system:

Order	Method	Path	Description
1	GET	/api/v1/users/{user_id}/accounts	user_id is the active user
2	GET	/api/v1/accounts/{account_id}/balance	account_id is one of the user’s accounts
3	GET	/api/v1/accounts/{account_id}/balance	account_id is a different account belonging to the user
4	POST	/api/v1/transferFunds	Containing a request body detailing an account to transfer funds from, an account to transfer funds to, and an amount of money to transfer
… attacker copies the request to a debugging tool like Postman …
5	POST	/api/v1/transferFunds	Attacker has modified the POST body to try and trick the API
6	POST	/api/v1/transferFunds	A further modified POST body to try and trick the API
7	POST	/api/v1/transferFunds	Another, further modified POST body to try and trick the API

If the customer knew beforehand that the funds transfer endpoint was critical to protect and only occurred once during a sequence, they could write a rule to ensure that it was never called twice in a row and a GET /balance always preceded a POST /transferFunds. But without prior knowledge of which endpoint sequences are critical to protect, how would the customer know which rules to define? A low rate limit is too risky, since an API user might legitimately have a few funds transfer requests to perform in a short amount of time. In the present reality there are few tools to prevent this type of abuse, and most customers are left with reactive efforts to clean up abuse with their application teams and fraud departments after it’s happened.

Ultimately, we believe that providing our customers with the ability to define positive security models on API request sequences requires a three-pronged approach:

Sequence Analytics: Determining which sequences of API requests occurred and when, as well as summarizing the data into readily understandable form.
Sequence Abuse Detection: Identifying which sequences of API requests are likely of benign or malicious origin.
Sequence Mitigation: Identifying relevant rules on sequences of API requests for deciding which traffic to allow or block.

Challenges of sequence creation

Sequence Analytics presents some difficult technical challenges, because sessions may be long-lived and may consist of many requests. As a result, it is not sufficient to define sequences by session identifier alone. Instead, it was necessary for us to develop a solution capable of automatically identifying multiple sequences which occur within a given session. Additionally, since important sequences are not necessarily characterized by volume alone and the set of possible sequences is large, it was necessary to develop a solution capable of identifying important sequences, as opposed to simply surfacing frequent sequences.

To help illustrate these challenges for the example of api.cloudflare.com, we can group API requests by session and plot the number of distinct sequences versus sequence length:

The plot is based on a one hour snapshot comprising approximately 88,000 sessions and 260 million API requests, with 301 distinct API endpoints. We process the data by applying a fixed-length sliding window to each session, then we count the total number of different fixed-length sequences (‘n-grams’) that we observe as a result of applying the sliding window. The plot displays results for a window size (‘n-gram length’) varying between 1 and 10 requests. The number of distinct sequences ranges from 301 (for a window size of 1 request) to approximately 780,000 (for a window size of 10 requests). Based on the plot, we observe a large number of possible sequences which grows with sequence length: As we increase the sliding window size, we see an increasingly large amount of different sequences in the sample. The smooth trend can be explained by the fact that we apply a sliding window (sessions may themselves contain many sequences) in combination with many long sessions relative to the sequence length.

Given the large number of possible sequences, trying to find abusive sequences is a ‘needles in a haystack’ situation.

Introducing Sequence Analytics

Here is a screenshot from the API Gateway dashboard highlighting Sequence Analytics:

Let’s break down the new functionality seen in the screenshot.

API Gateway intelligently determines sequences of requests made by your API consumers using the methods described earlier in this article. API Gateway scores sequences by a metric we call Correlation Score. Sequence Analytics displays the top 20 sequences by highest correlation score, and we refer to these as your most important sequences. High-importance sequences contain API requests which are likely to occur together in order.

You should inspect each of your sequences to understand their correlation scores. High correlation score sequences may consist of rarely used endpoints (potentially anomalous user behavior) as well as commonly used endpoints (likely benign user behavior). Since the endpoints found in these sequences commonly occur together, they represent true usage patterns of your API. You should apply all possible API Gateway protections to these endpoints (rate limiting suggestions, Schema Validation, JWT Validation, and mTLS) and check their specific endpoint order with your development team.

We know customers want to explicitly set allowable behavior on their APIs beyond the active protections offered by API Gateway today. Coming soon, we’re releasing sequence precedence rules and enabling the ability to block requests based on those rules. The new sequence precedence rules will allow customers to specify the exact order of allowable API requests, bringing yet another way of establishing a positive security model to protect your API against unknown threats.

How to get started

All API Gateway customers now have access to Sequence Analytics. Navigate to a zone in the Cloudflare dashboard, then click the Security tab > API Gateway tab > Sequences tab. You’ll see the most important sequences that your API consumers request.

Enterprise customers that haven’t purchased API Gateway can get started by enabling the API Gateway trial inside the Cloudflare Dashboard or contacting their account manager.

What’s next

Sequence-based detection is a powerful and unique capability that unlocks many new opportunities to identify and stop attacks. As we fine-tune the methods of identifying these sequences and shipping them to our global network, we will release custom sequence matching and real-time mitigation features at a future date. We will also ensure you have the actionable intelligence to take back to your team on who the API users were that attempted to request sequences that don’t match your policy.

Automatically discovering API endpoints and generating schemas using machine learning

John Cosgrove — Wed, 15 Mar 2023 13:00:00 GMT

Cloudflare now automatically discovers all API endpoints and learns API schemas for all of our API Gateway customers. Customers can use these new features to enforce a positive security model on their API endpoints even if they have little-to-no information about their existing APIs today.

The first step in securing your APIs is knowing your API hostnames and endpoints. We often hear that customers are forced to start their API cataloging and management efforts with something along the lines of “we email around a spreadsheet and ask developers to list all their endpoints”.

Can you imagine the problems with this approach? Maybe you have seen them first hand. The “email and ask” approach creates a point-in-time inventory that is likely to change with the next code release. It relies on tribal knowledge that may disappear with people leaving the organization. Last but not least, it is susceptible to human error.

Even if you had an accurate API inventory collected by group effort, validating that API was being used as intended by enforcing an API schema would require even more collective knowledge to build that schema. Now, API Gateway’s new API Discovery and Schema Learning features combine to automatically protect APIs across the Cloudflare global network and remove the need for manual API discovery and schema building.

API Gateway discovers and protects APIs

API Gateway discovers APIs through a feature called API Discovery. Previously, API Discovery used customer-specific session identifiers (HTTP headers or cookies) to identify API endpoints and display their analytics to our customers.

Doing discovery in this way worked, but it presented three drawbacks:

Customers had to know which header or cookie they used in order to delineate sessions. While session identifiers are common, finding the proper token to use can take time.
Needing a session identifier for API Discovery precluded us from monitoring and reporting on completely unauthenticated APIs. Customers today still want visibility into session-less traffic to ensure all API endpoints are documented and that abuse is at a minimum.
Once the session identifier was input into the dashboard, customers had to wait up to 24 hours for the Discovery process to complete. Nobody likes to wait.

While this approach had drawbacks, we knew we could quickly deliver value to customers by starting with a session-based product. As we gained customers and passed more traffic through the system, we knew our new labeled data would be extremely useful to further build out our product. If we could train a machine learning model with our existing API metadata and the new labeled data, we would no longer need a session identifier to pinpoint which endpoints were for APIs. So we decided to build this new approach.

We took what we learned from the session identifier-based data and built a machine learning model to uncover all API traffic to a domain, regardless of session identifier. With our new Machine Learning-based API Discovery, Cloudflare continually discovers all API traffic routed through our network without any prerequisite customer input. With this release, API Gateway customers will be able to get started with API Discovery faster than ever, and they’ll uncover unauthenticated APIs that they could not discover before.

Session identifiers are still important to API Gateway, as they form the basis of our volumetric abuse prevention rate limits as well as our Sequence Analytics. See more about how the new approach performs in the “How it works” section below.

API Protection starting from nothing

Now that you’ve found new APIs using API Discovery, how do you protect them? To defend against attacks, API developers must know exactly how they expect their APIs to be used. Luckily, developers can programmatically generate an API schema file which codifies acceptable input to an API and upload that into API Gateway’s Schema Validation.

However, we already talked about how many customers can't find their APIs as fast as their developers build them. When they do find APIs, it’s very difficult to accurately build a unique OpenAPI schema for each of potentially hundreds of API endpoints, given that security teams seldom see more than the HTTP request method and path in their logs.

When we looked at API Gateway’s usage patterns, we saw that customers would discover APIs but almost never enforce a schema. When we ask them ‘why not?’ the answer was simple: “Even when I know an API exists, it takes so much time to track down who owns each API so that they can provide a schema. I have trouble prioritizing those tasks higher than other must-do security items.” The lack of time and expertise was the biggest gap in our customers enabling protections.

So we decided to close that gap. We found that the same learning process we used to discover API endpoints could then be applied to endpoints once they were discovered in order to automatically learn a schema. Using this method we can now generate an OpenAPI formatted schema for every single endpoint we discover, in real time. We call this new feature Schema Learning. Customers can then upload that Cloudflare-generated schema into Schema Validation to enforce a positive security model.

How it works

Machine learning-based API discovery

With RESTful APIs, requests are made up of different HTTP methods and paths. Take for example the Cloudflare API. You’ll notice a common trend with the paths that might make requests to this API stand out amongst requests to this blog: API requests all start with /client/v4 and continue with the service name, a unique identifier, and sometimes service feature names and further identifiers.

How could we easily identify API requests? At first glance, these requests seem easy to programmatically discover with a heuristic like “path starts with /client”, but the core of our new Discovery contains a machine-learned model that powers a classifier that scores HTTP transactions. If API paths are so structured, why does one need machine-learning for this and can’t one just use some simple heuristic?

The answer boils down to the question: what actually constitutes an API request and how does it differ from a non-API request? Let’s look at two examples.

Like the Cloudflare API, many of our customers’ APIs follow patterns such as prefixing the path of their API request with an “api” identifier and a version, for example: /api/v2/user/7f577081-7003-451e-9abe-eb2e8a0f103d.

So just looking for “api” or a version in the path is already a pretty good heuristic that tells us this is very likely part of an API, but it is unfortunately not always as easy.

Let’s consider two further examples, /users/7f577081-7003-451e-9abe-eb2e8a0f103d.jpg and /users/7f577081-7003-451e-9abe-eb2e8a0f103d, both just differ in a .jpg extension. The first path could just be a static resource like the thumbnail of a user. The second path does not give us a lot of clues just from the path alone.

Manually crafting such heuristics quickly becomes difficult. While humans are great at finding patterns, building heuristics is challenging considering the scale of the data that Cloudflare sees each day. As such, we use machine learning to automatically derive these heuristics such that we know that they are reproducible and adhere to a certain accuracy.

Input to the training are features of HTTP request/response samples such as the content-type or file extension that we collected through the session identifiers-based Discovery mentioned earlier. Unfortunately, not everything that we have in this data is clearly an API. Additionally, we also need samples that represent non-API traffic. As such, we started out with the session-identifier Discovery data, manually cleaned it up and derived further samples of non-API traffic. We took great care in trying to not overfit the model to the data. That is, we want that the model generalizes beyond the training data.

To train the model, we’ve used the CatBoost library for which we already have a good chunk of expertise as it also powers our Bot Management ML-models. In a simplification, one can regard the resulting model as a flow chart that tells us which conditions we should check after another, for example: if the path contains “api” then also check if there is no file extension and so forth. At the end of this flowchart is a score that tells us the likelihood that a HTTP transaction belongs to an API.

Given the trained model, we can thus input features of HTTP request/responses that run through the Cloudflare network and calculate the likelihood that this HTTP transaction belongs to an API or not. Feature extraction and model scoring is done in Rust and takes only a couple of microseconds on our global network. Since Discovery sources data from our powerful data pipeline, it is not actually necessary to score each transaction. We can reduce the load on our servers by only scoring those transactions that we know will end up in our data pipeline to begin with thus saving CPU time and allowing the feature to be cost effective.

With the classification results in our data pipeline, we can use the same API Discovery mechanism that we’ve been using for the session identifier-based discovery. This existing system works great and allows us to reuse code efficiently. It also aided us when comparing our results with the session identifier-based Discovery, as the systems are directly comparable.

For API Discovery results to be useful, Discovery’s first task is to simplify the unique paths we see into variables. We’ve talked about this before. It is not trivial to deduce the various different identifier schemes that we see across the global network, especially when sites use custom identifiers beyond a straightforward GUID or integer format. API Discovery aptly normalizes paths containing variables with the help of a few different variable classifiers and supervised learning.

Only after normalizing paths are the Discovery results ready for our users to use in a straightforward fashion.

The results: hundreds of found endpoints per customer

So, how does ML Discovery compare to the session identifier-based Discovery which relies on headers or cookies to tag API traffic?

Our expectation is that it detects a very similar set of endpoints. However, in our data we knew there would be two gaps. First, we sometimes see that customers are not able to cleanly dissect only API traffic using session identifiers. When this happens, Discovery surfaces non-API traffic. Second, since we required session identifiers in the first version of API Discovery, endpoints that are not part of a session (e.g. login endpoints or unauthenticated endpoints) were conceptually not discoverable.

The following graph shows a histogram of the number of endpoints detected on customer domains for both discovery variants.

From a bird's eye perspective, the results look very similar, which is a good indicator that ML Discovery performs as it is supposed to. There are some differences already visible in this plot, which is also expected since we’ll also discover endpoints that are conceptually not discoverable with just a session identifier. In fact, if we take a closer look at a domain-by-domain comparison we see that there is no change for roughly ~46% of the domains. The next graph compares the difference (by percent of endpoints) between session-based and ML-based discovery:

For ~15% of the domains, we see an increase in endpoints between 1 and 50, and for ~9%, we see a similar reduction. For ~28% of the domains, we find more than 50 additional endpoints.

These results highlight that ML Discovery is able to surface additional endpoints that have previously been flying under the radar, and thus expands the set tools API Gateway offers to help bring order to your API landscape.

On-the-fly API protection through API schema learning

With API Discovery taken care of, how can a practitioner protect the newly discovered endpoints? We already looked at the API request metadata, so now let’s look at the API request body. The compilation of all expected formats for all API endpoints of an API is known as an API schema. API Gateway’s Schema Validation is a great way to protect against OWASP Top 10 API attacks, ensuring the body, path, and query string of a request contains the expected information for that API endpoint in an expected format. But what if you don’t know the expected format?

Even if the schema of a specific API is not known to a customer, the clients using this API will have been programmed to mostly send requests that conform to this unknown schema (or they would not be able to successfully query the endpoint). Schema Learning makes use of this fact and will look at successful requests to this API to reconstruct the input schema automatically for the customer. As an example, an API might expect the user-ID parameter in a request to have the form id12345-a. Even if this expectation is not explicitly stated, clients that want to have a successful interaction with the API will send user-IDs in this format.

Schema Learning first identifies all recent successful requests to an API-endpoint, and then parses the different input parameters for each request according to their position and type. After parsing all requests, Schema Learning looks at the different input values for each position and identifies which characteristics they have in common. After verifying that all observed requests share these commonalities, Schema Learning creates an input schema that restricts input to comply with these commonalities and that can directly be used for Schema Validation.

To allow for more accurate input schemas, Schema Learning identifies when a parameter can receive different types of input. Let's say you wanted to write an OpenAPIv3 schema file and manually observe in a small sample of requests that a query parameter is a unix timestamp. You write an API schema that forces that query parameter to be an integer greater than the start of last year's unix epoch. If your API also allowed that parameter in ISO 8601 format, your new rule would create false positives when the differently formatted (yet valid) parameter hit the API. Schema Learning automatically does all this heavy lifting for you and catches what manual inspection can't.

To prevent false positives, Schema Learning performs a statistical test on the distribution of these values and only writes the schema when the distribution is bounded with high confidence.

So how well does it work? Below are some statistics about the parameter types and values we see:

Parameter learning classifies slightly more than half of all parameters as strings, followed by integers which make up almost a third. The remaining 17% are made up of arrays, booleans, and number (float) parameters, while object parameters are seen more rarely in the path and query.

The number of parameters in the path is usually very low, with 94% of all endpoints seeing at most one parameter in their path.

For the query, we do see a lot more parameters, sometimes reaching 50 different parameters for one endpoint!

Parameter learning is able to estimate numeric constraints with 99.9% confidence for the majority of parameters observed. These constraints can either be a maximum/minimum on the value, length, or size of the parameter, or a limited set of unique values that a parameter has to take.

Protect your APIs in minutes

Starting today, all API Gateway customers can now discover and protect APIs in just a few clicks, even if you're starting with no previous information. In the Cloudflare dash, click into API Gateway and on to the Discovery tab to observe your discovered endpoints. These endpoints will be immediately available with no action required from you. Then, add relevant endpoints from Discovery into Endpoint Management. Schema Learning runs automatically for all endpoints added to Endpoint Management. After 24 hours, export your learned schema and upload it into Schema Validation.

Enterprise customers that haven’t purchased API Gateway can get started by enabling the API Gateway trial inside the Cloudflare Dashboard or contacting their account manager.

What’s next

We plan to enhance Schema Learning by supporting more learned parameters in more formats, like POST body parameters with both JSON and URL-encoded formats as well as header and cookie schemas. In the future, Schema Learning will also notify customers when it detects changes in the identified API schema and present a refreshed schema.

We’d like to hear your feedback on these new features. Please direct your feedback to your account team so that we can prioritize the right areas of improvement. We look forward to hearing from you!