The Cloudflare Blog

Protecting APIs from abuse using sequence learning and variable order Markov chains

Peter Foster — Thu, 12 Sep 2024 14:15:00 GMT

Consider the case of a malicious actor attempting to inject, scrape, harvest, or exfiltrate data via an API. Such malicious activities are often characterized by the particular order in which the actor initiates requests to API endpoints. Moreover, the malicious activity is often not readily detectable using volumetric techniques alone, because the actor may intentionally execute API requests slowly, in an attempt to thwart volumetric abuse protection. To reliably prevent such malicious activity, we therefore need to consider the sequential order of API requests. We use the term sequential abuse to refer to malicious API request behavior. Our fundamental goal thus involves distinguishing malicious from benign API request sequences.

In this blog post, you’ll learn about how we address the challenge of helping customers protect their APIs against sequential abuse. To this end, we’ll unmask the statistical machine learning (ML) techniques currently underpinning our Sequence Analytics product. We’ll build on the high-level introduction to Sequence Analytics provided in a previous blog post.

API sessions

Introduced in the previous blog post, let’s consider the idea of a time-ordered series of HTTP API requests initiated by a specific user. These occur as the user interacts with a service, such as while browsing a website or using a mobile app. We refer to the user’s time-ordered series of API requests as a session. Choosing a familiar example, the session for a customer interacting with a banking service might look like:

Time Order	Method	Path	Description
1	POST	/api/v1/auth	Authenticates a user
2	GET	/api/v1/accounts/{account_id}	Displays account balance, where account_id is an account belonging to the user
3	POST	/api/v1/transferFunds	Containing a request body detailing an account to transfer funds from, an account to transfer funds to, and an amount of money to transfer

One of our aims is to enable our customers to secure their APIs by automatically suggesting rules applicable to our Sequence Mitigation product for enforcing desired sequential behavior. If we enforce the expected behavior, we can prevent unwanted sequential behavior. In our example, desired sequential behavior might entail that /api/v1/auth must always precede /api/v1/accounts/{account_id}.

One important challenge we had to address is that the number of possible sessions grows rapidly as the session length increases. To see why, we can consider the alternative ways in which a user might interact with the example banking service: The user may, for example, execute multiple transfers, and/or check the balance of multiple accounts, in any order. Assuming that there are 3 possible endpoints, the following graph illustrates possible sessions when the user interacts with the banking service:

Because of this large number of possible sessions, suggesting mitigation rules requires that we address the challenge of summarizing sequential behavior from past session data as an intermediate step. We’ll refer to a series of consecutive endpoints in a session (for example /api/v1/accounts/{account_id} → /api/v1/transferFunds) in our example as a sequence. Specifically, a challenge we needed to address is that the sequential behavior relevant for creating rules isn’t necessarily apparent from volume alone: Consider for example that /api/transferFunds might nearly always be preceded by /api/v1/accounts/{account_id}, but also that the sequence /api/v1/accounts/{account_id} → /api/v1/transferFunds might occur relatively rarely, compared to the sequence /api/v1/auth → /api/v1/accounts/{account_id}. It is therefore conceivable that if we were to summarize based on volume alone, we might potentially deem the sequence /api/v1/accounts/{account_id} → /api/v1/transferFunds as unimportant, when in fact we ought to surface it as a potential rule.

Learning important sequences from API sessions

A widely-applied modeling approach applicable to sequential data is the Markov chain, in which the probability of each endpoint in our session data depends only on a fixed number of preceding endpoints. First, we’ll show how standard Markov chains can be applied to our session data, while pointing out some of their limitations. Second, we’ll show how we use a less well-known, but powerful, type of Markov chain to determine important sequences.

For illustrative purposes, let’s assume that there are 3 possible endpoints in our session data. We’ll represent these endpoints using the letters a, b and c:

a: /api/v1/auth
b: /api/v1/accounts/{account_id}
c: /api/v1/transferFunds

In its simplest form, a Markov chain is nothing more than a table which tells us the probability of the next letter, given knowledge of the immediately preceding letter. If we were to model past session data using the simplest kind of Markov chain, we might end up with a table like this one:

Known preceding endpoint in the session	Estimated probability of next endpoint in the session
a	b	c
a	0.10 (1555)	0.89 (13718)	0.01 (169)
b	0.03 (9618)	0.63 (205084)	0.35 (113382)
c	0.02 (3340)	0.67 (109896)	0.31 (51553)

Table 1

Table 1 lists the parameters of the Markov chain, namely the estimated probabilities of observing a, b or c as the next endpoint in a session, given knowledge of the immediately preceding endpoint in the session. For example, the 3rd row cell with value 0.67 means that given knowledge of immediately preceding endpoint c, the estimated probability of observing b as the next endpoint in a session is 67%, regardless of whether c was preceded by any endpoints. Thus, each entry in the table corresponds to a sequence of two endpoints. The values in brackets are the number of times we saw each two-endpoint sequence in past session data and are used to compute the probabilities in the table. For example, the value 0.01 is the result of evaluating the fraction 169 / (1555+13718+169). This method of estimating probabilities is known as maximum likelihood estimation.

To determine important sequences, we rely on credible intervals for estimating probabilities instead of maximum likelihood estimation. Instead of producing a single point estimate (as described above), credible intervals represent a plausible range of probabilities. This range reflects the amount of data available, i.e. the total number of sequence occurrences in each row. More data produces narrower credible intervals (reflecting a greater degree of certainty) and conversely less data produces wider credible intervals (reflecting a lesser degree of certainty). Based on the values in brackets in the table above, we thus might obtain the following credible intervals (entries in boldface will be explained further on):

Known preceding endpoint in the session	Estimated probability of next endpoint in the session
a	b	c
a	0.09-0.11 (1555)	0.88-0.89 (13718)	0.01-0.01 (169)
b	0.03-0.03 (9618)	0.62-0.63 (205084)	0.34-0.35 (113382)
c	0.02-0.02 (3340)	0.66-0.67 (109896)	0.31-0.32 (51553)

Table 2

For brevity, we won’t demonstrate here how to work out the credible intervals by hand (they involve evaluating the quantile function of a beta distribution). Notwithstanding, the revised table indicates how more data causes credible intervals to shrink: note the first row with a total of 15442 occurrences in comparison to the second row with a total of 328084 occurrences.

To determine important sequences, we use slightly more complex Markov chains than those described above. As an intermediate step, let’s first consider the case where each table entry corresponds to a sequence of 3 endpoints (instead of 2 as above), exemplified by the following table:

Known preceding endpoints in the session	Estimated probability of next endpoint in the session
a	b	c
aa	0.09-0.13 (173)	0.86-0.90 (1367)	0.00-0.02 (13)
ba	0.09-0.11 (940)	0.88-0.90 (8552)	0.01-0.01 (109)
ca	0.09-0.12 (357)	0.87-0.90 (2945)	0.01-0.02 (35)
ab	0.02-0.02 (272)	0.56-0.58 (7823)	0.40-0.42 (5604)
bb	0.03-0.03 (6067)	0.60-0.60 (122796)	0.37-0.37 (75801)
cb	0.03-0.03 (3279)	0.68-0.68 (74449)	0.29-0.29 (31960)
ac	0.01-0.09 (6)	0.77-0.91 (144)	0.06-0.19 (19)
bc	0.02-0.02 (2326)	0.77-0.77 (87215)	0.21-0.21 (23612)
cc	0.02-0.02 (1008)	0.43-0.44 (22527)	0.54-0.55 (27919)

Table 3

Table 3 again lists the estimated probabilities of observing a, b or c as the next endpoint in a session, but given knowledge of 2 immediately preceding endpoints in the session (instead of 1 immediately preceding endpoint as before). That is, the 3rd row cell with interval 0.09-0.13 means that given knowledge of immediately preceding endpoints ca, the probability of observing a as the next endpoint has a credible interval spanning 9% and 13%, regardless of whether ca was preceded by any endpoints. In parlance, we say that the above table represents a Markov chain of order 2. This is because the entries in the table represent probabilities of observing the next endpoint, given knowledge of 2 immediately preceding endpoints as context.

As a special case, the Markov chain of order 0 simply represents the distribution over endpoints in a session. We can tabulate the probabilities as follows, in relation to a single row corresponding to an ‘empty context’:

Known preceding endpoints in the session	Estimated probability of next endpoint in the session
a	b	c
	0.03-0.03 (15466)	0.64-0.65 (328732)	0.32-0.33 (165117)

Table 4

Note that the probabilities in Table 4 do not solely represent the case where there were no preceding endpoints in the session. Rather, the probabilities are for the occurrence of endpoints in the session, for the general case where we have no knowledge of the preceding endpoints and regardless of how many endpoints previously occurred.

Returning to our task of identifying important sequences, one possible approach might be to simply use a Markov chain of some fixed order N. For example, if we were to apply a threshold of 0.85 to the lower bounds of credible intervals in Table 3, we’d retain 3 sequences in total. On the other hand, this approach comes with two noteworthy limitations:

We need a way to select a suitable value for the model order N.
Since the model order remains fixed, identified sequences all have the same length N+1.

Variable order Markov chains

Variable order Markov chains (VOMCs) are a more powerful extension of the described fixed-order Markov chains which address the preceding limitations. VOMCs make use of the fact that for some chosen value of the Markov chain of fixed order N, the probability table might include statistically redundant information: Let’s compare Tables 3 and 2 above and consider in Table 3 the rows in boldface corresponding to contexts aa, ba, ca (these 3 contexts all share a as their suffix).

For all the 3 possible next endpoints a, b, c, these rows specify credible intervals which overlap with their respective estimates in Table 2 corresponding to context a (also indicated in boldface). We can interpret these overlapping intervals as representing no discernible difference between probability estimates, given knowledge of a as the preceding endpoint. With no discernible effect of what preceded a on the probability of the next endpoint, we can consider these 3 rows in Table 3 redundant: We may ‘collapse’ them by replacing them with the row in Table 2 corresponding to context a.

The result of revising Table 3 as described looks as follows (with the new row indicated in boldface):

Known preceding endpoints in the session	Estimated probability of next endpoint in the session
a	b	c
a	0.09-0.11 (1555)	0.88-0.89 (13718)	0.01-0.01 (169)
ab	0.02-0.02 (272)	0.56-0.58 (7823)	0.40-0.42 (5604)
ac	0.03-0.03 (6067)	0.60-0.60 (122796)	0.37-0.37 (75801)
bb	0.03-0.03 (3279)	0.68-0.68 (74449)	0.29-0.29 (31960)
bc	0.01-0.09 (6)	0.77-0.91 (144)	0.06-0.19 (19)
cb	0.02-0.02 (2326)	0.77-0.77 (87215)	0.21-0.21 (23612)
cc	0.02-0.02 (1008)	0.43-0.44 (22527)	0.54-0.55 (27919)

Table 5

Table 5 represents a VOMC, because the context length varies: In the example, we have context lengths 1 and 2. It follows that entries in the table represent sequences of length varying between 2 and 3 endpoints, depending on context length. Generalizing the described approach of collapsing contexts leads to the following algorithm sketch for learning a VOMC in an offline setting:

(1) Define the table T containing the estimated probability of the next endpoint in a session, given alternatively 0, 1, 2, …, N_max preceding endpoints in the session. That is, form a single table by concatenating the rows corresponding to Markov chains of fixed orders 0, 1, 2, …, N_max.

(2) is_modified := true

(3) DO WHILE is_modified

(4) D := all contexts in T which are not suffixes of at least 1 other context in T

(5) is_modified = false

(6) FOR ctx IN C

(7) IF length(ctx) > 0

(8) parent_ctx := the context obtained by deleting the leftmost endpoint in ctx

(9) IF is_collapsible(ctx, parent_ctx)

(10) Modify T by discarding ctx

(11) is_modified = true

In the pseudo-code, length(ctx) is the length of context ctx. On line 9, is_collapsible() involves comparing credible intervals for the contexts ctx and parent_ctx in the manner described for generating Table 5: is_collapsible() evaluates to true, if and only if we observe that all credible intervals overlap, when comparing contexts ctx and parent_ctx separately for each of the possible next endpoints. The maximum sequence length is N_max+1, where N_max is some constant. On line 4, we say that context q is a suffix of another context p if we can form p by prepending zero or more endpoints to q. (According to this definition, the ‘empty context’ mentioned above for the order 0 model is a suffix of all contexts in T.) The above algorithm sketch is a variant of the ideas first introduced by Rissanen [1], Ron et al. [2].

Finally, we take the entries in the resulting table T as our important sequences. Thus, the result of applying VOMCs is a set of sequences that we deem important. For Sequence Analytics however, we believe that it is additionally useful to rank sequences. We do this by computing a ‘precedence score’ between 0.0 and 1.0, which is the number of occurrences of the sequence divided by the number of occurrences of the last endpoint in the sequence. Thus, precedence scores close to 1.0 indicate that a given endpoint is nearly always preceded by the remaining endpoints in the sequence. In this way, manual inspection of the highest-scoring sequences is a semi-automated heuristic for creating precedence rules in our Sequence Mitigation product.

Learning sequences at scale

The preceding represents a very high-level overview of the statistical ML techniques that we use in Sequence Analytics. In practice, we have devised an efficient algorithm which does not require an upfront training step, but rather updates the model continuously as the data arrive and generates a frequently-updating summary of important sequences. This approach allows us to overcome additional challenges around memory cost not touched on in this blog post. Most significantly, a straightforward implementation of the algorithm sketch above would still result in the number of table rows (contexts) exploding with increasing maximum sequence length. A further challenge we had to address is how to ensure that our system is able to deal with high-volume APIs, without adversely impacting CPU load. We use a horizontally scalable adaptive sampling strategy upfront, such that more aggressive sampling is applied to high-volume APIs. Our algorithm then consumes the sampled streams of API requests. After a customer onboards, sequences are assembled and learned over time, so that the current summary of important sequences represents a sliding window with a look-back interval of approximately 24 hours. Sequence Analytics further stores sequences in Clickhouse and exposes them via a GraphQL API and via the Cloudflare dashboard. Customers who would like to enforce sequence rules can do so using Sequence Mitigation. Sequence Mitigation is what is responsible for ensuring that rules are shared and matched in distributed fashion on Cloudflare’s global network — another exciting topic for a future blog post!

What’s next

Now that you have a better understanding of how we surface important API request sequences, stay tuned for a future blog post in this series, where we’ll describe how we find the anomalous API request sequences that customers may want to stop. For now, API Gateway customers can get started in two ways: with Sequence Analytics to explore important API request sequences and with Sequence Mitigation to enforce sequences of API requests. Enterprise customers that haven’t purchased API Gateway can get started by enabling the API Gateway trial inside the Cloudflare Dashboard or contacting their account manager.

Securing Cloudflare with Cloudflare: a Zero Trust journey

Derek Pitts — Tue, 05 Mar 2024 14:00:51 GMT

Cloudflare is committed to providing our customers with industry-leading network security solutions. At the same time, we recognize that establishing robust security measures involves identifying potential threats by using processes that may involve scrutinizing sensitive or personal data, which in turn can pose a risk to privacy. As a result, we work hard to balance privacy and security by building privacy-first security solutions that we offer to our customers and use for our own network.

In this post, we'll walk through how we deployed Cloudflare products like Access and our Zero Trust Agent in a privacy-focused way for employees who use the Cloudflare network. Even though global legal regimes generally afford employees a lower level of privacy protection on corporate networks, we work hard to make sure our employees understand their privacy choices because Cloudflare has a strong culture and history of respecting and furthering user privacy on the Internet. We’ve found that many of our customers feel similarly about ensuring that they are protecting privacy while also securing their networks.

So how do we balance our commitment to privacy with ensuring the security of our internal corporate environment using Cloudflare products and services? We start with the basics: We only retain the minimum amount of data needed, we de-identify personal data where we can, we communicate transparently with employees about the security measures we have in place on corporate systems and their privacy choices, and we retain necessary information for the shortest time period needed.

How we secure Cloudflare using Cloudflare

We take a comprehensive approach to securing our globally distributed hybrid workforce with both organizational controls and technological solutions. Our organizational approach includes a number of measures, such as a company-wide Acceptable Use Policy, employee privacy notices tailored by jurisdiction, required annual and new-hire privacy and security trainings, role-based access controls (RBAC), and least privilege principles. These organizational controls allow us to communicate expectations for both the company and the employees that we can implement with technological controls and that we enforce through logging and other mechanisms.

Our technological controls are rooted in Zero Trust best practices and start with a focus on our Cloudflare One services to secure our workforce as described below.

Securing access to applications

Cloudflare secures access to self-hosted and SaaS applications for our workforce, whether remote or in-office, using our own Zero Trust Network Access (ZTNA) service, Cloudflare Access, to verify identity, enforce multi-factor authentication with security keys, and evaluate device posture using the Zero Trust client for every request. This approach evolved over several years and has enabled Cloudflare to more effectively protect our growing workforce.

Defending against cyber threats

Cloudflare leverages Cloudflare Magic WAN to secure our office networks and the Cloudflare Zero Trust agent to secure our workforce. We use both of these technologies as an onramp to our own Secure Web Gateway (also known as Gateway) to secure our workforce from a rise in online threats.

As we have evolved our hybrid work and office configurations, our security teams have benefited from additional controls and visibility for forward-proxied Internet traffic, including:

Granular HTTP controls: Our security teams inspect HTTPS traffic to block access to specific websites identified as malicious by our security team, conduct antivirus scanning, and apply identity-aware browsing policies.
Selectively isolating Internet browsing: With remote browser isolated (RBI) sessions, all web code is run on Cloudflare’s network far from local devices, insulating users from any untrusted and malicious content. Today, Cloudflare isolates social media, news outlets, personal email, and other potentially risky Internet categories, and we have set up feedback loops for our employees to help us fine-tune these categories.
Geography-based logging: Seeing where outbound requests originate helps our security teams understand the geographic distribution of our workforce, including our presence in high-risk areas.
Data Loss Prevention: To keep sensitive data inside our corporate network, this tool allows us to identify data we’ve flagged as sensitive in outbound HTTP/S traffic and prevent it from leaving the network.
Cloud Access Security Broker: This tool allows us to monitor our SaaS apps for misconfigurations and sensitive data that is potentially exposed or shared too broadly.

Protecting inboxes with cloud email security

Additionally, we have deployed our Cloud Email Security solution to protect our workforce from increased phishing and business email compromise attacks that we have not only seen directed against our employees, but that are plaguing organizations globally. One key feature we use is email link isolation, which uses RBI and email security functionality to open potentially suspicious links in an isolated browser. This allows us to be slightly more relaxed with blocking suspicious links without compromising security. This is a big win for productivity for our employees and the security team, as both sets of employees aren’t having to deal with large volumes of false positives.

More details on our implementation can be found in our Securing Cloudflare with Cloudflare One case study.

How we respect privacy

The very nature of these powerful security technologies Cloudflare has created and deployed underscores the responsibility we have to use privacy-first principles in handling this data, and to recognize that the data should be respected and protected at all times.

The journey to respecting privacy starts with the products themselves. We develop products that have privacy controls built in at their foundation. To achieve this, our product teams work closely with Cloudflare’s product and privacy counsels to practice privacy by design. A great example of this collaboration is the ability to manage personally identifiable information (PII) in the Secure Web Gateway logs. You can choose to exclude PII from Gateway logs entirely or redact PII from the logs and gain granular control over access to PII with the Zero Trust PII Role.

In addition to building privacy-first security products, we are also committed to communicating transparently with Cloudflare employees about how these security products work and what they can – and can’t – see about traffic on our internal systems. This empowers employees to see themselves as part of the security solution, rather than set up an “us vs. them” mentality around employee use of company systems.

For example, while our employee privacy policies and our Acceptable Use Policy provide broad notice to our employees about what happens to data when they use the company’s systems, we thought it was important to provide even more detail. As a result, our security team collaborated with our privacy team to create an internal wiki page that plainly explains the data our security tools collect and why. We also describe the privacy choices available to our employees. This is particularly important for the “bring your own device” (BYOD) employees who have opted for the convenience of using their personal mobile device for work. BYOD employees must install endpoint management (provided by a third party) and Cloudflare’s Zero trust client on their devices if they want to access Cloudflare systems. We described clearly to our employees what this means about what traffic on their devices can be seen by Cloudflare teams, and we explained how they can take steps to protect their privacy when they are using their devices for purely personal purposes.

For the teams that develop for and support our Zero Trust services, we ensure that data is available only on a strict, need-to-know basis and is restricted to Cloudflare team members that require access as an essential part of their job. The set of people with access are required to take training that reminds them of their responsibility to respect this data and provides them with best practices for handling sensitive data. Additionally, to ensure we have full auditability, we log all the queries run against this database and by whom they are run.

Cloudflare has also made it easy for our employees to express any concerns they may have about how their data is handled or what it is used for. We have mechanisms in place that allow employees to ask questions or express concerns about the use of Zero Trust Security on Cloudflare’s network.

In addition, we make it easy for employees to reach out directly to the leaders responsible for these tools. All of these efforts have helped our employees better understand what information we collect and why. This has helped to expand our strong foundation for security and privacy at Cloudflare.

Encouraging privacy-first security for all

We believe firmly that great security is critical for ensuring data privacy, and that privacy and security can co-exist harmoniously. We also know that it is possible to secure a corporate network in a way that respects the employees using those systems.

For anyone looking to secure a corporate network, we encourage focusing on network security products and solutions that build in personal data protections, like our Zero Trust suite of products. If you are curious to explore how to implement these Cloudflare services in your own organizations, request a consultation on Zero Trust here.

We also urge organizations to make sure they communicate clearly with their users. In addition to making sure company policies are transparent and accessible, it is important to help employees understand their privacy choices. Under the laws of almost every jurisdiction globally, individuals have a lower level of privacy on a company device or a company’s systems than they do on their own personal accounts or devices, so it’s important to communicate clearly to help employees understand the difference. If an organization has privacy champions, works councils, or other employee representation groups, it is critical to communicate early and often with these groups to help employees understand what controls they can exercise over their data.

Protecting APIs with JWT Validation

John Cosgrove — Tue, 05 Mar 2024 14:00:34 GMT

Today, we are happy to announce that Cloudflare customers can protect their APIs from broken authentication attacks by validating incoming JSON Web Tokens (JWTs) with API Gateway. Developers and their security teams need to control who can communicate with their APIs. Using API Gateway’s JWT Validation, Cloudflare customers can ensure that their Identity Provider previously validated the user sending the request, and that the user’s authentication tokens have not expired or been tampered with.

What’s new in this release?

After our beta release in early 2023, we continued to gather feedback from customers on what they needed from JWT validation in API Gateway. We uncovered four main feature requests and shipped updates in this GA release to address them all:

Old, Beta limitation	New, GA release capability
Only supported validating the raw JWT	Support for the Bearer token format
Only supported one JWKS configuration	Create up to four different JWKS configs to support different environments per zone
Only supported validating JWTs sent in HTTP headers	Validate JWTs if they are sent in a cookie, not just an HTTP header
JWT validation ran on all requests to the entire zone	Exclude any number of managed endpoints in a JWT validation rule

What is the threat?

Broken authentication is the #1 threat on the OWASP Top 10 and the #2 threat on the OWASP API Top 10. We’ve written before about how flaws in API authentication and authorization at Optus led to a threat actor offering 10 million user records for sale, and government agencies have warned about these exact API attacks.

According to Gartner®¹, “attacks and data breaches involving poorly secured application programming interfaces (APIs) are occurring frequently.” Getting authentication correct for your API users can be challenging, but there are best practices you can employ to cover your bases. JSON Web Token Validation in API Gateway fulfills one of these best practices by enforcing a positive security model for your authenticated API users.

A primer on authentication and authorization

Authentication establishes identity. Imagine you’re collaborating with multiple colleagues and writing a document in Google Docs. When you’re all authors of the document, you have the same privileges, and you can overwrite each other’s text. You can all see each other’s name next to your respective cursor while you’re typing. You’re all authenticated to Google Docs, so Docs can show all the users on a document who everyone is.

Authorization establishes ownership or permissions to objects. Imagine you’re collaborating with your colleague in Docs again, but this time they’ve written a document ahead of time and simply wish for you to review it and add comments without changing the document. As the owner of the document, your colleague sets an authorization policy to only allow you ‘comment’ access. As such, you cannot change their writing at all, but you can still view the document and leave comments.

While the words themselves might sound similar, the differences between them are hugely important for security. It’s not enough to simply check that a user logging in has the correct login credentials (authentication). If you never check their permissions (authorization), they would be free to overwrite, add, or delete other users’ content. When this happens for APIs, OWASP calls it a Broken Object Level Authorization attack.

A primer on API access tokens

Users authenticate to services in many different ways on the web today. Let’s take a look at the history of authentication with username and password authentication, API key authentication, and JWT authentication before we mention how JWTs can help stop API attacks.

In the early days, the web used HTTP Basic Authentication, where browsers transmitted username and password pairs as an HTTP header, posing significant security risks and making credentials visible to any observer when the application failed to adopt SSL/TLS certificates. Basic authentication also complicated API access, requiring hard-coded credentials and potentially giving broad authorization policies to a single user.

The introduction of API access keys improved security by detaching authentication from user credentials and instead sending secret text strings along with requests. This approach allowed for more nuanced access control by key instead of by user ID, though API keys still faced risks from man-in-the-middle attacks and problematic storage of secrets in source code.

JSON Web Tokens (JWTs) address these issues by removing the need to send long-lived secrets on every request, introducing cryptographically verifiable, auto-expiring, short-lived sessions. Think of a JWT like a tamper-evident seal on a bottle of medication. Along with the seal, medication also has an expiration date printed on it. Users notice when the seal is tampered with or missing altogether, and when the medication expires.

These attributes enhance security any time a JWT is used instead of a long-lived shared secret. JWTs are not an end-all-be-all solution, but they do represent an evolution in authentication technology and are widely used for authentication and authorization on the Internet today.

What’s the structure of a JWT?

JWTs are composed of three fields separated by periods. The first field is a header, the second a payload, and the third a signature:

eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJNeURlbW9JRFAiLCJzdWIiOiJqb2huZG9lIiwiYXVkIjoiTXlBcHAiLCJpYXQiOjE3MDg5ODU2MDEsImV4cCI6MTcwODk4NjIwMSwiY2xhc3MiOiJhZG1pbiJ9.v0nywcQemlEU4A18QD9UTgJLyH4ZPXppuW-n0iOmtj4x-hWJuExlMKeNS-vMZt4K6n0pDCFIAKo7_VZqACx4gILXObXMU4MEleFoKKd0f58KscNrC3BQqs3Gnq-vb5Ut9CmcvevQ5h9cBCI4XhpP2_LkYcZiuoSd3zAm2W_0LNZuFXp1wo8swDoKETYmtrdTjuF-IlVjLDAxNsWm2e7T5A8HmCnAWRItEPedm_8XVJAOemx_KqIH5w6zHY1U-M6PJkHK6D2gDU5eiN35A4FCrC5bQ1-0HSTtJkLIed2-1mRO1oANWHpscvpNLQBWQLLiIZ_evbcq_tnwh1X1sA3uxQ

If we base64 decode the first two sections, we arrive at the following structure (comments added for clarity):

{
  "alg": "RS256",     // JWT signature algorithm
  "typ": "JWT"        // JWT type
}

{
  "iss": "MyDemoIDP", // Which identity provider issued this JWT
  "sub": "johndoe",   // Which user this JWT identifies
  "aud": "MyApp",     // Which app this JWT is destined for
  "iat": 1708985601,  // When this JWT was issued
  "exp": 1709986201,  // When this JWT expires
  "class": "admin"    // Extra, customer-defined metadata
}

We can then use the algorithm mentioned in the header (RS256) as well as the Identity Provider’s public key (example below) to check the last segment in the JWT, the signature (code not shown).

The signature is what makes a JWT special. The token issuer, taking into account the claims, generates a signature based on a private secret or a public/private key pair. The public key can be published online, allowing anyone to check if a JWT was legitimately issued by an organization.

Proper authentication and authorization stop API attacks

No developer wants to release an insecure application, and no security team wants their developers to skip secure coding practices, but we know both happen. In the Enterprise Strategy Group report “Securing the API Attack Surface”², a survey found that 39% of developers skip security processes due to the faster development cycles of continuous integration and continuous delivery (CI/CD). The same survey found more than half (57%) of responding organizations faced multiple security incidents related to insecure APIs in the last 12 months, and 35% of responding organizations faced at least one incident within the last year.

Along with its accompanying database, permissions, and user roles, your origin application is the ultimate security backstop of your API. However, Cloudflare can assist in keeping attacks away from your origin when you configure API Gateway with the correct context. Let’s examine three different API attacks and how to protect against them.

Missing or broken authentication

The ability for a user to send or receive data to an API and entirely bypass authentication falls into ‘broken authentication’. It’s easy to think of the expected use cases your users will take with your application. You may assume that just because a user logs in and your application is written so that users can only access their own data in their dashboard, that all users are logged in and would only access their own data. This assumption fails to account for a user making an HTTP request outside your application requesting or modifying another user’s data and there being nothing in the way to stop your API from replying. In the worst case, a lack of authorization policy checks can enable an API client to change data without an authentication token at all!

Ensuring that incoming requests have an authentication token attached to them and dropping the requests that don’t is a great way to stop the simplest API attacks.

Expired token reuse

Maybe your application already uses JWTs for user authentication. Your application decodes the JWT and looks for user claims for group membership, and you validate the claims before allowing customers access to your API. But are you checking the JWT expiration time?

Imagine a user pays for your service, but they secretly know they will soon downgrade to a free account. If the user’s tier is stored within the JWT and the application or gateway doesn’t validate the expiration time of the JWT, the user could save an old JWT and replay it to continue their access to their paid benefits. Validating JWT expiration time can prevent this type of replay attack.

Broken Function Level Authorization attacks: Tampering with claims

Let’s say you’re using JWTs for authentication, validating the claims inside them, and also validating expiration time. But do you verify the JWT signature? Practically every JWT is signed by its issuer such that API admins and security teams that know the issuer’s signing key can verify that the JWT hasn’t been tampered with. Without the API Gateway or application checking the JWT signature, a malicious user could change their JWT claims, elevating their privileges to assume an administrator role in an application by starting with a normal, non-privileged user account.

JWT Validation from API Gateway safeguards your API from broken authentication and authorization attacks by checking that JWT signatures are intact, expiry times haven’t yet passed, and that authentication tokens are present to begin with.

Don’t other Cloudflare products do this?

Other Cloudflare products also use JWTs. Cloudflare Access is part of our suite of Zero Trust products, and is meant to tie into your Identity Provider. As a best practice, customers should validate the JWT that Access creates and sends to the origin.

Conversely, JWT Validation for API Gateway is a security layer compatible with any API without changing the setup, management, or expectation of the existing user flow. API Gateway’s JWT Validation is meant to validate pre-existing JWTs that may be used by any number of services at your API origin. You really need both: Access for your internal users or employees and API Gateway for your external users.

In addition, some customers use a custom Cloudflare Worker to validate JWTs, which is a great use case for the Workers platform. However, for straightforward use cases customers may find the JWT Validation experience of API Gateway easier to interact with and manage over the lifecycle of their application. If you are validating JWTs with a Worker and today’s release of JWT Validation isn’t yet at feature parity for your custom Worker, let your account representative know. We’re interested in expanding our capabilities to meet your requirements.

What’s next?

In a future release, we will go beyond checking pre-existing JWTs, and customers will be able to generate and enforce authorization policies entirely within API Gateway. We’ll also upgrade our on-demand developer portal creation with the ability to issue keys and authentication tokens to your development team directly, streamlining API management with Cloudflare.

In addition, stay tuned for future API Gateway feature launches where we’ll use our knowledge of API traffic norms to automatically suggest security policies that highlight and stop Broken Object/Function Level Authorization attacks outside the JWT Validation use case.

Existing API Gateway customers can try the new feature now. Enterprise customers without API Gateway should sign up for the trial to try the latest from API Gateway.

¹Gartner, “API Security: What You Need to Do to Protect Your APIs”, Analyst(s) Mark O'Neill, Dionisio Zumerle, Jeremy D'Hoinne, January 13, 2023

²Enterprise Strategy Group, “Securing the API Attack Surface”, Analyst, Melinda Marks, May 2023

Introducing Cloudflare’s 2024 API security and management report

John Cosgrove — Tue, 09 Jan 2024 14:00:03 GMT

You may know Cloudflare as the company powering nearly 20% of the web. But powering and protecting websites and static content is only a fraction of what we do. In fact, well over half of the dynamic traffic on our network consists not of web pages, but of Application Programming Interface (API) traffic — the plumbing that makes technology work. This blog introduces and is a supplement to the API Security Report for 2024 where we detail exactly how we’re protecting our customers, and what it means for the future of API security. Unlike other industry API reports, our report isn’t based on user surveys — but instead, based on real traffic data.

If there’s only one thing you take away from our report this year, it’s this: many organizations lack accurate API inventories, even when they believe they can correctly identify API traffic. Cloudflare helps organizations discover all of their public-facing APIs using two approaches. First, customers configure our API discovery tool to monitor for identifying tokens present in their known API traffic. We then use a machine learning model that scans not just these known API calls, but all HTTP requests, identifying API traffic that may be going unaccounted for. The difference between these approaches is striking: we found 30.7% more API endpoints through machine learning-based discovery than the self-reported approach, suggesting that nearly a third of APIs are “Shadow APIs” — and may not be properly inventoried and secured.

Read on for extras and highlights from our inaugural API security report. In the full report, you’ll find updated statistics about the threats we see and prevent, along with our predictions for 2024. We predict that a lack of API security focus at organizations will lead to increased complexity and loss of control, and increased access to generative AI will lead to more API risk. We also anticipate an increase in API business logic attacks in 2024. Lastly, all of the above risks will necessitate growing governance around API security.

Hidden attack surfaces

How are web pages and APIs different? APIs are a quick and easy way for applications to retrieve data in the background, or ask that work be done from other applications. For example, anyone can write a weather app without being a meteorologist: a developer can write the structure of the page or mobile application and ask a weather API for the forecast using the user’s location. Critically, most end users don’t know that the data was provided by the weather API and not the app’s owner.

While APIs are the critical plumbing of the Internet, they're also ripe for abuse. For example, flaws in API authentication and authorization at Optus led to a threat actor offering 10 million user records for sale, and government agencies have warned about these exact API attacks. Developers in an organization will often create Internet-facing APIs, used by their own applications to function more efficiently, but it's on the security team to protect these new public interfaces. If the process of documenting APIs and bringing them to the attention of the security team isn't clear, they become Shadow APIs — operating in production but without the organization's knowledge. This is where the security challenge begins to emerge.

To help customers solve this problem, we shipped API Discovery. When we introduced our latest release, we mentioned how few organizations have accurate API inventories. Security teams sometimes are forced to adopt an “email and ask” approach to build an inventory, and in doing so responses are immediately stale upon the next application release when APIs change. Better is to track API changes by code base changes, keeping up with new releases. However, this still has a drawback of only inventorying actively maintained code. Legacy applications may not see new releases, despite receiving production traffic.

Cloudflare's approach to API management involves creating a comprehensive, accurate API inventory using a blend of machine learning-based API discovery and network traffic inspection. This is integral to our API Gateway product, where customers can manage their Internet-facing endpoints and monitor API health. The API Gateway also allows customers to identify their API traffic using session identifiers (typically a header or cookie), which aids in specifically identifying API traffic for the discovery process.

As noted earlier, our analysis reveals that even knowledgeable customers often overlook significant portions of their API traffic. By comparing session-based API discovery (using API sessions to pinpoint traffic) with our machine learning-based API discovery (analyzing all incoming traffic), we found that the latter uncovers on average 30.7% more endpoints! Without broad traffic analysis, you may be missing almost a third of your API inventory.

If you aren’t a Cloudflare customer, you can still get started building an API inventory. APIs are typically cataloged in a standardized format called OpenAPI, and many development tools can build OpenAPI formatted schema files. If you have a file with that format, you can start to build an API inventory yourself by collecting these schemas. Here is an example of how you can pull the endpoints out of a schema file, assuming your have an OpenAPI v3 formatted file named my_schema.json:

import json
import csv
from io import StringIO

# Load the OpenAPI schema from a file
with open("my_schema.json", "r") as file:
    schema = json.load(file)

# Prepare CSV output
output = StringIO()
writer = csv.writer(output)

# Write CSV header
writer.writerow(["Server", "Path", "Method"])

# Extract and write data to CSV
servers = schema.get("servers", [])
for server in servers:
    url = server['url']
    for path, methods in schema['paths'].items():
        for method in methods.keys():
            writer.writerow([url, path, method])

# Get and print CSV string
csv_output = output.getvalue().strip()
print(csv_output)

Unless you have been generating OpenAPI schemas and tracking API inventory from the beginning of your application’s development process, you’re probably missing some endpoints across your production application API inventory.

Precise rate limits minimize attack potential

When it comes to stopping abuse, most practitioners’ thoughts first come to rate limiting. Implementing limits on your API is a valuable tool to keep abuse in check and prevent accidental overload of the origin. But how do you know if you’ve chosen the correct rate limiting approach? Approaches can vary, but they generally come down to the error code chosen, and the basis for the limit value itself.

For some APIs, practitioners configure rate limiting errors to respond with an HTTP 403 (forbidden), while others will respond with HTTP 429 (too many requests). Using HTTP 403 sounds innocent enough until you realize that other security tools are also responding with 403 codes. When you’re under attack, it can be hard to decipher which tools are responsible for which errors / blocking.

Alternatively, if you utilize HTTP 429 for your rate limits, attackers will instantly know that they’ve been rate limited and can “surf” right under the limit without being detected. This can be OK if you’re only limiting requests to ensure your back-end stays alive, but it can tip your cards to attackers. In addition, attackers can “scale out” to more API clients to effectively request above the rate limit.

There are pros and cons to both approaches, but we find that by far most APIs respond with HTTP 429 out of all the 4xx and 5xx error messages (almost 52%).

What about the logic of the rate limit rule itself, not just the response code? Implementing request limits on IP addresses can be tempting, but we recommend you base the limit on a session ID as a best practice and only fall back to IP address (or IP + JA3 fingerprint) when session IDs aren’t available. Setting rate limits on user sessions instead of IPs will reliably identify your real users and minimize false positives due to shared IP space. Cloudflare’s Advanced Rate Limiting and API Gateway’s volumetric abuse protection make it easy to enforce these limits by profiling session traffic on each API endpoint and giving one-click solutions to set up the per-endpoint rate limits.

To find values for your rate limits, Cloudflare API Gateway computes session request statistics for you. We suggest a limit by looking at the distribution of requests per session across all sessions to your API as identified by the customer-configured API session identifier. We then compute statistical p-levels — which describe the request rates for different cohorts of traffic — for p50, p90, and p99 on this distribution and use the variance of the distribution to come up with a recommended threshold for every single endpoint in your API inventory. The recommendation might not match the p-levels, which is an important distinction and a reason not to use p-levels alone. Along with the recommendation, API Gateway informs users of our confidence in the recommendation. Generally, the more API sessions we’re able to collect, the more confident we’ll be in the recommendation.

Activating a rate limit is as easy as clicking the ‘create rule’ link, and API Gateway will automatically bring your session identifier over to the advanced rate limit rule creation page, ensuring your rules have pinpoint accuracy to defend against attacks and minimize false positives compared to traditional, overly broad limits.

APIs are also victim to web application attacks

APIs aren’t immune from normal OWASP Top 10 style attacks like SQL injection. The body of API requests can also find its way as a database input just like a web page form input or URL argument. It’s important to ensure that you have a web application firewall (WAF) also protecting your API traffic to defend against these styles of attacks.

In fact, when we looked at Cloudflare’s WAF managed rules, injection attacks were the second most common threat vector Cloudflare saw carried out on APIs. The most common threat was HTTP Anomaly. Examples of HTTP anomalies include malformed method names, null byte characters in headers, non-standard ports or content length of zero with a POST request. Here are the stats on the other top threats we saw against APIs:

Absent from the chart is broken authentication and authorization. Broken authentication and authorization occur when an API fails to check whether the entity sending requests for information to an API actually has the permission to request that data or not. It can also happen when attacks try to forge credentials and insert less restricted permissions into their existing (valid) credentials that have more restricted permissions. OWASP categorizes these attacks in a few different ways, but the main categories are Broken Object Level Authorization (BOLA) and Broken Function Level Authorization (BFLA) attacks.

The root cause of a successful BOLA / BFLA attack lies in an origin API not checking proper ownership of database records against the identity requesting those records. Tracking these specific attacks can be difficult, as the permission structure may be simply absent, inadequate, or improperly implemented. Can you see the chicken-and-egg problem here? It would be easy to stop these attacks if we knew the proper permission structure, but if we or our customers knew the proper permission structure or could guarantee its enforcement, the attacks would be unsuccessful to begin with. Stay tuned for future API Gateway feature launches where we’ll use our knowledge of API traffic norms to automatically suggest security policies that highlight and stop BOLA / BFLA attacks.

Here are four ways to plug authentication loopholes that may exist for your APIs, even if you don’t have a fine-grained authorization policy available:

First, enforce authentication on each publicly accessible API unless there's a business approved exception. Look to technologies like mTLS and JSON Web Tokens.
Limit the speed of API requests to your servers to slow down potential attackers.
Block abnormal volumes of sensitive data outflow.
Block attackers from skipping legitimate sequences of API requests.

APIs are surprisingly human driven, not machine driven anymore

If you’ve been around technology since the pre-smartphone days when fewer people were habitually online, it can be tempting to think of APIs as only used for machine-to-machine communication in something like an overnight batch job process. However, the truth couldn’t be more different. As we’ve discussed, many web and mobile applications are powered by APIs, which facilitate everything from authentication to transactions to serving media files. As people use these applications, there is a corresponding increase in API traffic volume.

We can illustrate this by looking at the API traffic patterns observed during holidays, when people gather around friends and family and spend more time socializing in person and less time online. We’ve annotated the following Worldwide API traffic graph with common holidays and promotions. Notice how traffic peaks around Black Friday and Cyber Monday around the +10% level when people shop online, but then traffic drops off for the festivities of Christmas and New Years days.

This pattern closely resembles what we observe in regular HTTP traffic. It’s clear that APIs are no longer just the realm of automated processes but are intricately linked with human behaviors and social trends.

Recommendations

There is no silver bullet for holistic API security. For the best effect, Cloudflare recommends four strategies for increasing API security posture:

Combine API application development, visibility, performance, and security with a unified control plane that can keep an up-to-date API inventory.
Use security tools that utilize machine learning technologies to free up human resources and reduce costs.
Adopt a positive security model for your APIs (see below for an explanation on positive and negative security models).
Measure and improve your organization’s API maturity level over time (also see below for an explanation of an API maturity level).

What do we mean by a ‘positive’ or ‘negative’ security model? In a negative model, security tools look for known signs of attack and take action to stop those attacks. In a positive model, security tools look for known good requests and only let those through, blocking all else. APIs are often so structured that positive security models make sense for the highest levels of security. You can also combine security models, such as using a WAF in a negative model sense, and using API Schema Validation in a positive model sense.

Here’s a quick way to gauge your organization’s API security maturity level over time: Novice organizations will get started by assembling their first API inventory, no matter how incomplete. More mature organizations will strive for API inventory accuracy and automatic updates. The most mature organizations will actively enforce security checks in a positive security model on their APIs, enforcing API schema, valid authentication, and checking behavior for signs of abuse.

Predictions

In closing, our top four predictions for 2024 and beyond:

Increased loss of control and complexity: we surveyed practitioners in the API Security and Management field and 73% responded that security requirements interfere with their productivity and innovation. Coupled with increasingly sprawling applications and inaccurate inventories, API risks and complexity will rise.

Easier access to AI leading to more API risks: the rise in generative AI brings potential risks, including AI models’ APIs being vulnerable to attack, but also developers shipping buggy, AI-written code. Forrester predicts that, in 2024, without proper guardrails, “at least three data breaches will be publicly blamed on insecure AI-generated code – either due to security flaws in the generated code itself or vulnerabilities in AI-suggested dependencies.”

Increase in business logic-based fraud attacks: professional fraudsters run their operations just like a business, and they have costs like any other. We anticipate attackers will run fraud bots efficiently against APIs even more than in previous years.

Growing governance: The first version of PCI DSS that directly addresses API security will go into effect in March 2024. Check your industry’s specific requirements with your audit department to be ready for requirements as they come into effect.

If you’re interested in the full report, you can download the 2024 API Security Report here, which includes full detail on our recommendations.

Cloudflare API Gateway is our API security solution, and it is available for all Enterprise customers. If you aren’t subscribed to API Gateway, click here to view your initial API Discovery results and start a trial in the Cloudflare dashboard. To learn how to use API Gateway to secure your traffic, click here to view our development docs and here for our getting started guide.

Protecting GraphQL APIs from malicious queries

John Cosgrove — Mon, 12 Jun 2023 13:00:08 GMT

Starting today, Cloudflare’s API Gateway can protect GraphQL APIs against malicious requests that may cause a denial of service to the origin. In particular, API Gateway will now protect against two of the most common GraphQL abuse vectors: deeply nested queries and queries that request more information than they should.

Typical RESTful HTTP APIs contain tens or hundreds of endpoints. GraphQL APIs differ by typically only providing a single endpoint for clients to communicate with and offering highly flexible queries that can return variable amounts of data. While GraphQL’s power and usefulness rests on the flexibility to query an API about only the specific data you need, that same flexibility adds an increased risk of abuse. Abusive requests to a single GraphQL API can place disproportional load on the origin, abuse the N+1 problem, or exploit a recursive relationship between data dimensions. In order to add GraphQL security features to API Gateway, we needed to obtain visibility inside the requests so that we could apply different security settings based on request parameters. To achieve that visibility, we built our own GraphQL query parser. Read on to learn about how we built the parser and the security features it enabled.

The power of GraphQL

Unlike a REST API, where the API’s users are limited to what data they can query and change on a per-endpoint basis, a GraphQL API offers users the ability to query and change any data they wish with an open-ended, yet structured request to a single endpoint. This open-endedness makes GraphQL APIs very powerful. Each user can query for a completely custom set of data and receive their custom response in a single HTTP request. Here are two example queries and their responses. These requests are typically sent via HTTP POST methods to an endpoint at /graphql.

# A query asking for multiple nested subfields of the "hero" object. This query has a depth level of 2.
{
  hero {
    name
    friends {
      name
    }
  }
}

# The corresponding response.
{
  "data": {
    "hero": {
      "name": "R2-D2",
      "friends": [
        {
          "name": "Luke Skywalker"
        },
        {
          "name": "Han Solo"
        },
        {
          "name": "Leia Organa"
        }
      ]
    }
  }
}

# A query asking for just one subfield on the same "hero" object. This query has a depth level of 1.
{
  hero {
    name
  }
}

# The corresponding response.
{
  "data": {
    "hero": {
      "name": "R2-D2"
    }
  }
}

These custom queries give GraphQL endpoints more flexibility than conventional REST endpoints. But this flexibility also means GraphQL APIs can be subject to very different load or security risks based on the requests that they are receiving. For example, an attacker can request the exact same, valid data as a benevolent user would, but exploit the data’s self-referencing structure and ask that an origin return hundreds of thousands of rows replicated over and over again. Let’s consider an example, in which we operate a petitioning platform where our data model contains petitions and signers objects. With GraphQL, an attacker can, in a single request, query for a single petition, then for all people who signed that petition, then for all petitions each of those people have signed, then for all people that signed any of those petitions, then for all petitions that… you see where this is going!

query {
 petition(ID: 123) {
   signers {
     nodes {
       petitions {
         nodes {
           signers {
             nodes {
               petitions {
                 nodes {
                    ...
                 }
               }
             }
           }
         }
       }
     }
   }
 }
}

A rate limit won’t protect against such an attack because the entire query fits into a single request.

So how can we secure GraphQL APIs? There is little agreement in the industry around what makes a GraphQL endpoint secure. For some, this means rejecting invalid queries. Normally, an invalid query refers to a query that would fail to compile by a GraphQL server and not cause any substantial load on the origin, but would still add noise and error logs and reduce operational visibility. For others, this means creating complexity-based rate limits or perhaps flagging broken object-level authorization. Still others want deeper visibility into query behavior and an ability to validate queries against a predefined schema.

When creating new features in API Gateway, we often start by providing deeper visibility for customers into their traffic behavior related to the feature in question. This way we create value from the large amount of data we see in the Cloudflare network, and can have conversations with customers where we ask: “Now that you have these data insights, what actions would you like to take with them?”. This process puts us in a good position to build a second, more actionable iteration of the feature.

We decided to follow the same process with GraphQL protection, with parsing GraphQL requests and gathering data as our first goal.

Parsing GraphQL quickly

As a starting point, we wanted to collect request query size and depth attributes. These attributes offer a surprising amount of insight into the query – if the query is requesting a single field at depth level 15, is it really innocuous or is it exploiting some recursive data relationship? if the query is asking for hundreds of fields at depth level 3, why wouldn’t it just ask for the entire object at level 2 instead?

To do this, we needed to parse queries without adding latency to incoming requests. We evaluated multiple open source GraphQL parsers and quickly realized that their performance would put us at the risk of adding hundreds of microseconds of latency to the request duration. Our goal was to have a p95 parsing time of under 50 microseconds. Additionally, the infrastructure we were planning to use to ship this functionality has a strict no-heap-allocation policy – this means that any memory allocated by a parser to process a request has to be amortized by being reused when parsing any subsequent requests. Parsing GraphQL in a no-allocation manner is not a fundamental technical requirement for us over the long-term, but it was a necessity if we wanted to build something quickly with confidence that the proof of concept will meet our performance expectations.

Meeting the latency and memory allocation constraints meant that we had to write a parser of our own. Building an entire abstract syntax tree of unpredictable structure requires allocating memory on the heap, and that’s what made conventional parsers unfit for our requirements. What if instead of building a tree, we processed the query in a streaming fashion, token by token? We realized that if we were to write our own GraphQL lexer that produces a list of GraphQL tokens (“comment”, “string”, “variable name”, “opening parenthesis”, etc.), we could use a number of heuristics to infer the query depth and size without actually building a tree or fully validating the query. Using this approach meant that we could deliver the new feature fast, both in engineering time and wall clock time – and, most importantly, visualize data insights for our customers.

To start, we needed to prepare GraphQL queries for parsing. Most of the time, GraphQL queries are delivered as HTTP POST requests with application/json or application/graphql Content-Type. Requests with application/graphql content type are easy to work with – they contain the raw query you can just parse. However, JSON-encoded queries present a challenge since JSON objects contain escaped characters – normally, any deserialization library will allocate new memory into which the raw string is copied with escape sequences removed, but we committed to allocating no memory, remember? So to parse GraphQL queries encoded in JSON fields, we used serde RawValue to locate the JSON field in which the escaped query was placed and then iterated over the constituent bytes one-by-one, feeding them into our tokenizer and removing escape sequences on the fly.

Once we had our query input ready, we built a simple Rust program that converts raw GraphQL input into a list of lexical tokens according to the GraphQL grammar. Tokenization is the first step in any parser – our insight was that this step was all we needed for what we wanted to achieve in the MVP.

mutation CreateMessage($input: MessageInput) {
    createMessage(input: $input) {
        id
    }
}

For example, the mutation operation above gets converted into the following list of tokens:

name
name
punctuator (
punctuator $
name
punctuator :
name
punctuator )
punctuator {
name
punctuator (
name
punctuator :
punctuator $
name
punctuator )
punctuator {
name
punctuator }
punctuator }

With this list of tokens available to us, we built our validation engine and added the ability to calculate query depth and size. Again, everything is done one-the-fly in a single pass. A limitation of this approach is that we can’t parse 100% of the requests – there are some syntactic features of GraphQL that we have to fail open on; however, a major advantage of this approach is its performance – in our initial trial run against a stream of 10s of thousands of requests per second, we achieved a p95 parsing time of 25 microseconds. This is a good starting point to collect some data and to prototype our first GraphQL security features.

Getting started

Today, any API Gateway customer can use the Cloudflare GraphQL API to retrieve information about depth and size of GraphQL queries we see for them on the edge.

As an example, we’ve run the analysis below visualizing over 400,000 data points for query sizes and depths for a production domain utilizing API Gateway.

First let’s look at query sizes in our sample:

It looks like queries almost never request more than 60 fields. Let’s also look at query depths:

It looks like queries are never more than seven levels deep.

These two insights can be converted into security rules: we added three new Wirefilter fields that API Gateway customers can use to protect their GraphQL endpoints:

1. cf.api_gateway.graphql.query_size
2. cf.api_gateway.graphql.query_depth
3. cf.api_gateway.graphql.parsed_successfully

For now, we recommend the use of cf.api_gateway.graphql.parsed_successfully in all rules. Rules created with the use of this field will be backwards compatible with future GraphQL protection releases.

If a customer feels that there is nothing out of the ordinary with the traffic sample and that it represents a meaningful amount of normal usage, they can manually create and deploy the following custom rule to log all queries that were parsed by Cloudflare and that look like outliers:

cf.api_gateway.graphql.parsed_successfully and
(cf.api_gateway.graphql.query_depth > 7 or 
cf.api_gateway.graphql.query_size > 60)

Learn more and run your own analysis with our documentation.

What’s next?

We are already receiving feedback from our first customers and are planning out the next iteration of this feature. These are the features we will build next:

Integrating GraphQL security with complexity-based rate limiting such that we automatically calculate query cost and let customers rate limit eyeballs based on the total query execution cost the eyeballs use during their entire session.
Allowing customers to configure specifically which endpoints GraphQL security features run on.
Creating data insights on the relationship between query complexity and the time it takes the customer origin to respond to the query.
Creating automatic GraphQL threshold recommendations based on historical trends.

If you’re an Enterprise customer that hasn't purchased API Gateway and you’re interested in protecting your GraphQL APIs today, you can get started by enabling the API Gateway trial inside the Cloudflare Dashboard or by contacting your account manager. Check out our documentation on the feature to get started once you have access.

Everything you might have missed during Security Week 2023

Reid Tatoris — Mon, 20 Mar 2023 13:00:00 GMT

Security Week 2023 is officially in the books. In our welcome post last Saturday, I talked about Cloudflare’s years-long evolution from protecting websites, to protecting applications, to protecting people. Our goal this week was to help our customers solve a broader range of problems, reduce external points of vulnerability, and make their jobs easier.

We announced 34 new tools and integrations that will do just that. Combined, these announcement will help you do five key things faster and easier:

Making it easier to deploy and manage Zero Trust everywhere
Reducing the number of third parties customers must use
Leverage machine learning to let humans focus on critical thinking
Opening up more proprietary Cloudflare threat intelligence to our customers
Making it harder for humans to make mistakes

And to help you respond to the most current attacks in real time, we reported on how we’re seeing scammers use the Silicon Valley Bank news to phish new victims, and what you can do to protect yourself.

In case you missed any of the announcements, take a look at the summary and navigation guide below.

Monday

Blog	Summary
Top phished brands and new phishing and brand protections	Today we have released insights from our global network on the top 50 brands used in phishing attacks coupled with the tools customers need to stay safer. Our new phishing and brand protection capabilities, part of Security Center, let customers better preserve brand trust by detecting and even blocking “confusable” and lookalike domains involved in phishing campaigns.
How to stay safe from phishing	Phishing attacks come in all sorts of ways to fool people. Email is definitely the most common, but there are others. Following up on our Top 50 brands in phishing attacks post, here are some tips to help you catch these scams before you fall for them.
Locking down your JavaScript: positive blocking with Page Shield policies	Page Shield now ensures only vetted and secure JavaScript is being executed by browsers to stop unwanted or malicious JavaScript from loading to keep end user data safer.
Cloudflare Aegis: dedicated IPs for Zero Trust migration	With Aegis, customers can now get dedicated IPs from Cloudflare we use to send them traffic. This allows customers to lock down services and applications at an IP level and build a protected environment that is application, protocol, and even IP-aware.
Mutual TLS now available for Workers	mTLS support for Workers allows for communication with resources that enforce an mTLS connection. mTLS provides greater security for those building on Workers so they can identify and authenticate both the client and the server helps protect sensitive data.
Using Cloudflare Access with CNI	We have introduced an innovative new approach to secure hosted applications via Cloudflare Access without the need for any installed software or custom code on application servers.

Tuesday

Blog	Summary
No hassle migration from Zscaler to Cloudflare One with The Descaler Program	Cloudflare is excited to launch the Descaler Program, a frictionless path to migrate existing Zscaler customers to Cloudflare One. With this announcement, Cloudflare is making it even easier for enterprise customers to make the switch to a faster, simpler, and more agile foundation for security and network transformation.
The state of application security in 2023	For Security Week 2023, we are providing updated insights and trends related to mitigated traffic, bot and API traffic, and account takeover attacks.
Adding Zero Trust signals to Sumo Logic for better security insights	Today we’re excited to announce the expansion of support for automated normalization and correlation of Zero Trust logs for Logpush in Sumo Logic’s Cloud SIEM. Joint customers will reduce alert fatigue and accelerate the triage process by converging security and network data into high-fidelity insights.
Cloudflare One DLP integrates with Microsoft Information Protection labels	Cloudflare One now offers Data Loss Prevention (DLP) detections for Microsoft Purview Information Protection labels. This extends the power of Microsoft’s labels to any of your corporate traffic in just a few clicks.
Scan and secure Atlassian with Cloudflare CASB	We are unveiling two new integrations for Cloudflare CASB: one for Atlassian Confluence and the other for Atlassian Jira. Security teams can begin scanning for Atlassian- and Confluence-specific security issues that may be leaving sensitive corporate data at risk.
Zero Trust security with Ping Identity and Cloudflare Access	Cloudflare Access and Ping Identity offer a powerful solution for organizations looking to implement Zero Trust security controls to protect their applications and data. Cloudflare is now offering full integration support, so Ping Identity customers can easily integrate their identity management solutions with Cloudflare Access to provide a comprehensive security solution for their applications

Wednesday

Blog	Summary
Announcing Cloudflare Fraud Detection	We are excited to announce Cloudflare Fraud Detection that will provide precise, easy to use tools that can be deployed in seconds to detect and categorize fraud such as fake account creation or card testing and fraudulent transactions. Fraud Detection will be in early access later this year, those interested can sign up here.
Automatically discovering API endpoints and generating schemas using machine learning	Customers can use these new features to enforce a positive security model on their API endpoints even if they have little-to-no information about their existing APIs today.
Detecting API abuse automatically using sequence analysis	With our new Cloudflare Sequence Analytics for APIs, organizations can view the most important sequences of API requests to their endpoints to better understand potential abuse and where to apply protections first.
Using the power of Cloudflare’s global network to detect malicious domains using machine learning	Read our post on how we keep users and organizations safer with machine learning models that detect attackers attempting to evade detection with DNS tunneling and domain generation algorithms.
Announcing WAF Attack Score Lite and Security Analytics for business customers	We are making the machine learning empowered WAF and Security analytics view available to our Business plan customers, to help detect and stop attacks before they are known.
Analyze any URL safely using the Cloudflare Radar URL Scanner	We have made Cloudflare Radar’s newest free tool available, URL Scanner, providing an under-the-hood look at any webpage to make the Internet more transparent and secure for all.

Thursday

Blog	Summary
Post-quantum crypto should be free, so we’re including it for free, forever	One of our core beliefs is that privacy is a human right. To achieve that right, we are announcing that our implementations of post-quantum cryptography will be available to everyone, free of charge, forever.
No, AI did not break post-quantum cryptography	The recent news reports of AI cracking post-quantum cryptography are greatly exaggerated. In this blog, we take a deep dive into the world of side-channel attacks and how AI has been used for more than a decade already to aid it.
Super Bot Fight Mode is now configurable	We are making Super Bot Fight Mode even more configurable with new flexibility to allow legitimate, automated traffic to access their site.
How Cloudflare and IBM partner to help build a better Internet	IBM and Cloudflare continue to partner together to help customers meet the unique security, performance, resiliency and compliance needs of their customers through the addition of exciting new product and service offerings.
Protect your key server with Keyless SSL and Cloudflare Tunnel integration	Customers will now be able to use our Cloudflare Tunnels product to send traffic to the key server through a secure channel, without publicly exposing it to the rest of the Internet.

Friday

Blog	Summary
Stop Brand Impersonation with Cloudflare DMARC Management	Brand impersonation continues to be a big problem globally. Setting SPF, DKIM and DMARC policies is a great way to reduce that risk, and protect your domains from being used in spoofing emails. But maintaining a correct SPF configuration can be very costly and time consuming, and that’s why we’re launching Cloudflare DMARC Management.
How we built DMARC Management using Cloudflare Workers	At Cloudflare, we use the Workers platform and our product stack to build new services. Read how we made the new DMARC Management solution entirely on top of our APIs.
Cloudflare partners with KnowBe4 to equip organizations with real-time security coaching to avoid phishing attacks	Cloudflare’s cloud email security solution now integrates with KnowBe4, allowing mutual customers to offer real-time coaching to employees when a phishing campaign is detected by Cloudflare.
Introducing custom pages for Cloudflare Access	We are excited to announce new options to customize user experience in Access, including customizable pages including login, blocks and the application launcher.
Cloudflare Access is the fastest Zero Trust proxy	Cloudflare Access is 75% faster than Netskope and 50% faster than Zscaler, and our network is faster than other providers in 48% of last mile networks.

Saturday

Blog	Summary
One-click ISO 27001 certified deployment of Regional Services in the EU	Cloudflare announces one-click ISO certified region, a super easy way for customers to limit where traffic is serviced to ISO 27001 certified data centers inside the European Union.
Account level Security Analytics and Security Events: better visibility and control over all account zones at once	All WAF customers will benefit fromAccount Security Analytics and Events. This allows organizations to new eyes on your account in Cloudflare dashboard to give holistic visibility. No matter how many zones you manage, they are all there!
Wildcard and multi-hostname support in Cloudflare Access	We are thrilled to announce the full support of wildcard and multi-hostname application definitions in Cloudflare Access. Until now, Access had limitations that restricted it to a single hostname or a limited set of wildcards

Watch our Security Week sessions on Cloudflare TV

Watch all of the Cloudflare TV segments here.

What’s next?

While that’s it for Security Week 2023, you all know by now that Innovation weeks never end for Cloudflare. Stay tuned for a week full of new developer tools coming soon, and a week dedicated to making the Internet faster later in the year.

Detecting API abuse automatically using sequence analysis

John Cosgrove — Wed, 15 Mar 2023 13:00:00 GMT

Today, we're announcing Cloudflare Sequence Analytics for APIs. Using Sequence Analytics, Customers subscribed to API Gateway can view the most important sequences of API requests to their endpoints. This new feature helps customers to apply protection to the most important endpoints first.

What is a sequence? It is simply a time-ordered list of HTTP API requests made by a specific visitor as they browse a website, use a mobile app, or interact with a B2B partner via API. For example, a portion of a sequence made during a bank funds transfer could look like:

Order	Method	Path	Description
1	GET	/api/v1/users/{user_id}/accounts	user_id is the active user
2	GET	/api/v1/accounts/{account_id}/balance	account_id is one of the user’s accounts
3	GET	/api/v1/accounts/{account_id}/balance	account_id is a different account belonging to the user
4	POST	/api/v1/transferFunds	Containing a request body detailing an account to transfer funds from, an account to transfer funds to, and an amount of money to transfer

Why is it important to pay attention to sequences for API security? If the above API received requests for POST /api/v1/transferFunds without any of the prior requests, it would seem suspicious. Think about it: how would the API client know what the relevant account IDs are without listing them for the user? How would the API client know how much money is available to transfer? While this example may be obvious, the sheer number of API requests to any given production API can make it hard for human analysts to spot suspicious usage.

In security, one approach to defending against an untold number of threats that are impossible to screen by a team of humans is to create a positive security model. Instead of trying to block everything that could potentially be a threat, you allow all known good or benign traffic and block everything else by default.

Customers could already create positive security models with API Gateway in two main areas: volumetric abuse protection and schema validation. Sequences will form the third pillar of a positive security model for API traffic. API Gateway will be able to enforce the precedence of endpoints in any given API sequence. By establishing precedence within an API sequence, API Gateway will log or block any traffic that doesn’t match expectations, reducing abusive traffic.

Detecting abuse by sequence

When attackers attempt to exfiltrate data in an abusive way, they rarely follow the patterns of expected API traffic. Attacks often use special software to ‘fuzz’ the API, sending several requests with different request parameters hoping to find unexpected responses from the API indicating opportunities to exfiltrate data. Attackers can also manually send requests to APIs that attempt to trick the API in performing unauthorized actions, like granting an attacker elevated privileges or access to data through a Broken Object Level Authentication attack. Protecting APIs with rate limits is a common best practice; however, in both of the above examples attackers may deliberately execute request sequences slowly, in an attempt to thwart volumetric abuse detection.

Think of the sequence of requests above again, but this time imagine an attacker copying the legitimate funds transfer request and modifying the request payload in an attempt to trick the system:

Order	Method	Path	Description
1	GET	/api/v1/users/{user_id}/accounts	user_id is the active user
2	GET	/api/v1/accounts/{account_id}/balance	account_id is one of the user’s accounts
3	GET	/api/v1/accounts/{account_id}/balance	account_id is a different account belonging to the user
4	POST	/api/v1/transferFunds	Containing a request body detailing an account to transfer funds from, an account to transfer funds to, and an amount of money to transfer
… attacker copies the request to a debugging tool like Postman …
5	POST	/api/v1/transferFunds	Attacker has modified the POST body to try and trick the API
6	POST	/api/v1/transferFunds	A further modified POST body to try and trick the API
7	POST	/api/v1/transferFunds	Another, further modified POST body to try and trick the API

If the customer knew beforehand that the funds transfer endpoint was critical to protect and only occurred once during a sequence, they could write a rule to ensure that it was never called twice in a row and a GET /balance always preceded a POST /transferFunds. But without prior knowledge of which endpoint sequences are critical to protect, how would the customer know which rules to define? A low rate limit is too risky, since an API user might legitimately have a few funds transfer requests to perform in a short amount of time. In the present reality there are few tools to prevent this type of abuse, and most customers are left with reactive efforts to clean up abuse with their application teams and fraud departments after it’s happened.

Ultimately, we believe that providing our customers with the ability to define positive security models on API request sequences requires a three-pronged approach:

Sequence Analytics: Determining which sequences of API requests occurred and when, as well as summarizing the data into readily understandable form.
Sequence Abuse Detection: Identifying which sequences of API requests are likely of benign or malicious origin.
Sequence Mitigation: Identifying relevant rules on sequences of API requests for deciding which traffic to allow or block.

Challenges of sequence creation

Sequence Analytics presents some difficult technical challenges, because sessions may be long-lived and may consist of many requests. As a result, it is not sufficient to define sequences by session identifier alone. Instead, it was necessary for us to develop a solution capable of automatically identifying multiple sequences which occur within a given session. Additionally, since important sequences are not necessarily characterized by volume alone and the set of possible sequences is large, it was necessary to develop a solution capable of identifying important sequences, as opposed to simply surfacing frequent sequences.

To help illustrate these challenges for the example of api.cloudflare.com, we can group API requests by session and plot the number of distinct sequences versus sequence length:

The plot is based on a one hour snapshot comprising approximately 88,000 sessions and 260 million API requests, with 301 distinct API endpoints. We process the data by applying a fixed-length sliding window to each session, then we count the total number of different fixed-length sequences (‘n-grams’) that we observe as a result of applying the sliding window. The plot displays results for a window size (‘n-gram length’) varying between 1 and 10 requests. The number of distinct sequences ranges from 301 (for a window size of 1 request) to approximately 780,000 (for a window size of 10 requests). Based on the plot, we observe a large number of possible sequences which grows with sequence length: As we increase the sliding window size, we see an increasingly large amount of different sequences in the sample. The smooth trend can be explained by the fact that we apply a sliding window (sessions may themselves contain many sequences) in combination with many long sessions relative to the sequence length.

Given the large number of possible sequences, trying to find abusive sequences is a ‘needles in a haystack’ situation.

Introducing Sequence Analytics

Here is a screenshot from the API Gateway dashboard highlighting Sequence Analytics:

Let’s break down the new functionality seen in the screenshot.

API Gateway intelligently determines sequences of requests made by your API consumers using the methods described earlier in this article. API Gateway scores sequences by a metric we call Correlation Score. Sequence Analytics displays the top 20 sequences by highest correlation score, and we refer to these as your most important sequences. High-importance sequences contain API requests which are likely to occur together in order.

You should inspect each of your sequences to understand their correlation scores. High correlation score sequences may consist of rarely used endpoints (potentially anomalous user behavior) as well as commonly used endpoints (likely benign user behavior). Since the endpoints found in these sequences commonly occur together, they represent true usage patterns of your API. You should apply all possible API Gateway protections to these endpoints (rate limiting suggestions, Schema Validation, JWT Validation, and mTLS) and check their specific endpoint order with your development team.

We know customers want to explicitly set allowable behavior on their APIs beyond the active protections offered by API Gateway today. Coming soon, we’re releasing sequence precedence rules and enabling the ability to block requests based on those rules. The new sequence precedence rules will allow customers to specify the exact order of allowable API requests, bringing yet another way of establishing a positive security model to protect your API against unknown threats.

How to get started

All API Gateway customers now have access to Sequence Analytics. Navigate to a zone in the Cloudflare dashboard, then click the Security tab > API Gateway tab > Sequences tab. You’ll see the most important sequences that your API consumers request.

Enterprise customers that haven’t purchased API Gateway can get started by enabling the API Gateway trial inside the Cloudflare Dashboard or contacting their account manager.

What’s next

Sequence-based detection is a powerful and unique capability that unlocks many new opportunities to identify and stop attacks. As we fine-tune the methods of identifying these sequences and shipping them to our global network, we will release custom sequence matching and real-time mitigation features at a future date. We will also ensure you have the actionable intelligence to take back to your team on who the API users were that attempted to request sequences that don’t match your policy.

Automatically discovering API endpoints and generating schemas using machine learning

John Cosgrove — Wed, 15 Mar 2023 13:00:00 GMT

Cloudflare now automatically discovers all API endpoints and learns API schemas for all of our API Gateway customers. Customers can use these new features to enforce a positive security model on their API endpoints even if they have little-to-no information about their existing APIs today.

The first step in securing your APIs is knowing your API hostnames and endpoints. We often hear that customers are forced to start their API cataloging and management efforts with something along the lines of “we email around a spreadsheet and ask developers to list all their endpoints”.

Can you imagine the problems with this approach? Maybe you have seen them first hand. The “email and ask” approach creates a point-in-time inventory that is likely to change with the next code release. It relies on tribal knowledge that may disappear with people leaving the organization. Last but not least, it is susceptible to human error.

Even if you had an accurate API inventory collected by group effort, validating that API was being used as intended by enforcing an API schema would require even more collective knowledge to build that schema. Now, API Gateway’s new API Discovery and Schema Learning features combine to automatically protect APIs across the Cloudflare global network and remove the need for manual API discovery and schema building.

API Gateway discovers and protects APIs

API Gateway discovers APIs through a feature called API Discovery. Previously, API Discovery used customer-specific session identifiers (HTTP headers or cookies) to identify API endpoints and display their analytics to our customers.

Doing discovery in this way worked, but it presented three drawbacks:

Customers had to know which header or cookie they used in order to delineate sessions. While session identifiers are common, finding the proper token to use can take time.
Needing a session identifier for API Discovery precluded us from monitoring and reporting on completely unauthenticated APIs. Customers today still want visibility into session-less traffic to ensure all API endpoints are documented and that abuse is at a minimum.
Once the session identifier was input into the dashboard, customers had to wait up to 24 hours for the Discovery process to complete. Nobody likes to wait.

While this approach had drawbacks, we knew we could quickly deliver value to customers by starting with a session-based product. As we gained customers and passed more traffic through the system, we knew our new labeled data would be extremely useful to further build out our product. If we could train a machine learning model with our existing API metadata and the new labeled data, we would no longer need a session identifier to pinpoint which endpoints were for APIs. So we decided to build this new approach.

We took what we learned from the session identifier-based data and built a machine learning model to uncover all API traffic to a domain, regardless of session identifier. With our new Machine Learning-based API Discovery, Cloudflare continually discovers all API traffic routed through our network without any prerequisite customer input. With this release, API Gateway customers will be able to get started with API Discovery faster than ever, and they’ll uncover unauthenticated APIs that they could not discover before.

Session identifiers are still important to API Gateway, as they form the basis of our volumetric abuse prevention rate limits as well as our Sequence Analytics. See more about how the new approach performs in the “How it works” section below.

API Protection starting from nothing

Now that you’ve found new APIs using API Discovery, how do you protect them? To defend against attacks, API developers must know exactly how they expect their APIs to be used. Luckily, developers can programmatically generate an API schema file which codifies acceptable input to an API and upload that into API Gateway’s Schema Validation.

However, we already talked about how many customers can't find their APIs as fast as their developers build them. When they do find APIs, it’s very difficult to accurately build a unique OpenAPI schema for each of potentially hundreds of API endpoints, given that security teams seldom see more than the HTTP request method and path in their logs.

When we looked at API Gateway’s usage patterns, we saw that customers would discover APIs but almost never enforce a schema. When we ask them ‘why not?’ the answer was simple: “Even when I know an API exists, it takes so much time to track down who owns each API so that they can provide a schema. I have trouble prioritizing those tasks higher than other must-do security items.” The lack of time and expertise was the biggest gap in our customers enabling protections.

So we decided to close that gap. We found that the same learning process we used to discover API endpoints could then be applied to endpoints once they were discovered in order to automatically learn a schema. Using this method we can now generate an OpenAPI formatted schema for every single endpoint we discover, in real time. We call this new feature Schema Learning. Customers can then upload that Cloudflare-generated schema into Schema Validation to enforce a positive security model.

How it works

Machine learning-based API discovery

With RESTful APIs, requests are made up of different HTTP methods and paths. Take for example the Cloudflare API. You’ll notice a common trend with the paths that might make requests to this API stand out amongst requests to this blog: API requests all start with /client/v4 and continue with the service name, a unique identifier, and sometimes service feature names and further identifiers.

How could we easily identify API requests? At first glance, these requests seem easy to programmatically discover with a heuristic like “path starts with /client”, but the core of our new Discovery contains a machine-learned model that powers a classifier that scores HTTP transactions. If API paths are so structured, why does one need machine-learning for this and can’t one just use some simple heuristic?

The answer boils down to the question: what actually constitutes an API request and how does it differ from a non-API request? Let’s look at two examples.

Like the Cloudflare API, many of our customers’ APIs follow patterns such as prefixing the path of their API request with an “api” identifier and a version, for example: /api/v2/user/7f577081-7003-451e-9abe-eb2e8a0f103d.

So just looking for “api” or a version in the path is already a pretty good heuristic that tells us this is very likely part of an API, but it is unfortunately not always as easy.

Let’s consider two further examples, /users/7f577081-7003-451e-9abe-eb2e8a0f103d.jpg and /users/7f577081-7003-451e-9abe-eb2e8a0f103d, both just differ in a .jpg extension. The first path could just be a static resource like the thumbnail of a user. The second path does not give us a lot of clues just from the path alone.

Manually crafting such heuristics quickly becomes difficult. While humans are great at finding patterns, building heuristics is challenging considering the scale of the data that Cloudflare sees each day. As such, we use machine learning to automatically derive these heuristics such that we know that they are reproducible and adhere to a certain accuracy.

Input to the training are features of HTTP request/response samples such as the content-type or file extension that we collected through the session identifiers-based Discovery mentioned earlier. Unfortunately, not everything that we have in this data is clearly an API. Additionally, we also need samples that represent non-API traffic. As such, we started out with the session-identifier Discovery data, manually cleaned it up and derived further samples of non-API traffic. We took great care in trying to not overfit the model to the data. That is, we want that the model generalizes beyond the training data.

To train the model, we’ve used the CatBoost library for which we already have a good chunk of expertise as it also powers our Bot Management ML-models. In a simplification, one can regard the resulting model as a flow chart that tells us which conditions we should check after another, for example: if the path contains “api” then also check if there is no file extension and so forth. At the end of this flowchart is a score that tells us the likelihood that a HTTP transaction belongs to an API.

Given the trained model, we can thus input features of HTTP request/responses that run through the Cloudflare network and calculate the likelihood that this HTTP transaction belongs to an API or not. Feature extraction and model scoring is done in Rust and takes only a couple of microseconds on our global network. Since Discovery sources data from our powerful data pipeline, it is not actually necessary to score each transaction. We can reduce the load on our servers by only scoring those transactions that we know will end up in our data pipeline to begin with thus saving CPU time and allowing the feature to be cost effective.

With the classification results in our data pipeline, we can use the same API Discovery mechanism that we’ve been using for the session identifier-based discovery. This existing system works great and allows us to reuse code efficiently. It also aided us when comparing our results with the session identifier-based Discovery, as the systems are directly comparable.

For API Discovery results to be useful, Discovery’s first task is to simplify the unique paths we see into variables. We’ve talked about this before. It is not trivial to deduce the various different identifier schemes that we see across the global network, especially when sites use custom identifiers beyond a straightforward GUID or integer format. API Discovery aptly normalizes paths containing variables with the help of a few different variable classifiers and supervised learning.

Only after normalizing paths are the Discovery results ready for our users to use in a straightforward fashion.

The results: hundreds of found endpoints per customer

So, how does ML Discovery compare to the session identifier-based Discovery which relies on headers or cookies to tag API traffic?

Our expectation is that it detects a very similar set of endpoints. However, in our data we knew there would be two gaps. First, we sometimes see that customers are not able to cleanly dissect only API traffic using session identifiers. When this happens, Discovery surfaces non-API traffic. Second, since we required session identifiers in the first version of API Discovery, endpoints that are not part of a session (e.g. login endpoints or unauthenticated endpoints) were conceptually not discoverable.

The following graph shows a histogram of the number of endpoints detected on customer domains for both discovery variants.

From a bird's eye perspective, the results look very similar, which is a good indicator that ML Discovery performs as it is supposed to. There are some differences already visible in this plot, which is also expected since we’ll also discover endpoints that are conceptually not discoverable with just a session identifier. In fact, if we take a closer look at a domain-by-domain comparison we see that there is no change for roughly ~46% of the domains. The next graph compares the difference (by percent of endpoints) between session-based and ML-based discovery:

For ~15% of the domains, we see an increase in endpoints between 1 and 50, and for ~9%, we see a similar reduction. For ~28% of the domains, we find more than 50 additional endpoints.

These results highlight that ML Discovery is able to surface additional endpoints that have previously been flying under the radar, and thus expands the set tools API Gateway offers to help bring order to your API landscape.

On-the-fly API protection through API schema learning

With API Discovery taken care of, how can a practitioner protect the newly discovered endpoints? We already looked at the API request metadata, so now let’s look at the API request body. The compilation of all expected formats for all API endpoints of an API is known as an API schema. API Gateway’s Schema Validation is a great way to protect against OWASP Top 10 API attacks, ensuring the body, path, and query string of a request contains the expected information for that API endpoint in an expected format. But what if you don’t know the expected format?

Even if the schema of a specific API is not known to a customer, the clients using this API will have been programmed to mostly send requests that conform to this unknown schema (or they would not be able to successfully query the endpoint). Schema Learning makes use of this fact and will look at successful requests to this API to reconstruct the input schema automatically for the customer. As an example, an API might expect the user-ID parameter in a request to have the form id12345-a. Even if this expectation is not explicitly stated, clients that want to have a successful interaction with the API will send user-IDs in this format.

Schema Learning first identifies all recent successful requests to an API-endpoint, and then parses the different input parameters for each request according to their position and type. After parsing all requests, Schema Learning looks at the different input values for each position and identifies which characteristics they have in common. After verifying that all observed requests share these commonalities, Schema Learning creates an input schema that restricts input to comply with these commonalities and that can directly be used for Schema Validation.

To allow for more accurate input schemas, Schema Learning identifies when a parameter can receive different types of input. Let's say you wanted to write an OpenAPIv3 schema file and manually observe in a small sample of requests that a query parameter is a unix timestamp. You write an API schema that forces that query parameter to be an integer greater than the start of last year's unix epoch. If your API also allowed that parameter in ISO 8601 format, your new rule would create false positives when the differently formatted (yet valid) parameter hit the API. Schema Learning automatically does all this heavy lifting for you and catches what manual inspection can't.

To prevent false positives, Schema Learning performs a statistical test on the distribution of these values and only writes the schema when the distribution is bounded with high confidence.

So how well does it work? Below are some statistics about the parameter types and values we see:

Parameter learning classifies slightly more than half of all parameters as strings, followed by integers which make up almost a third. The remaining 17% are made up of arrays, booleans, and number (float) parameters, while object parameters are seen more rarely in the path and query.

The number of parameters in the path is usually very low, with 94% of all endpoints seeing at most one parameter in their path.

For the query, we do see a lot more parameters, sometimes reaching 50 different parameters for one endpoint!

Parameter learning is able to estimate numeric constraints with 99.9% confidence for the majority of parameters observed. These constraints can either be a maximum/minimum on the value, length, or size of the parameter, or a limited set of unique values that a parameter has to take.

Protect your APIs in minutes

Starting today, all API Gateway customers can now discover and protect APIs in just a few clicks, even if you're starting with no previous information. In the Cloudflare dash, click into API Gateway and on to the Discovery tab to observe your discovered endpoints. These endpoints will be immediately available with no action required from you. Then, add relevant endpoints from Discovery into Endpoint Management. Schema Learning runs automatically for all endpoints added to Endpoint Management. After 24 hours, export your learned schema and upload it into Schema Validation.

Enterprise customers that haven’t purchased API Gateway can get started by enabling the API Gateway trial inside the Cloudflare Dashboard or contacting their account manager.

What’s next

We plan to enhance Schema Learning by supporting more learned parameters in more formats, like POST body parameters with both JSON and URL-encoded formats as well as header and cookie schemas. In the future, Schema Learning will also notify customers when it detects changes in the identified API schema and present a refreshed schema.

We’d like to hear your feedback on these new features. Please direct your feedback to your account team so that we can prioritize the right areas of improvement. We look forward to hearing from you!

Welcome to Security Week 2023

Reid Tatoris — Sun, 12 Mar 2023 17:00:00 GMT

Last month I had the chance to attend a dinner with 56 CISOs and CSOs across a range of banking, gaming, ecommerce, and retail companies. We rotated between tables of eight people and talked about the biggest challenges those in the group were facing, and what they were most worried about around the corner. We talk to customers every day at Cloudflare, but this was a unique opportunity to listen to customers (and non-customers) talk to each other. It was a fascinating evening and a few things stood out.

The common thread that dominated the discussions was “how do I convince my business and product teams to do the things I want them to”. Surprisingly little time was spent on specific technical challenges. No one brought up a concern about recent advanced mage cart skimmers, or about protecting their new GraphQL APIs, or how to secure two different cloud vendors at once, or about the size of DDoS attacks consistently getting larger. Over and over again the conversation came back to struggles with getting humans to do the secure thing, or to not do the insecure thing.

This instantly brought to mind a major phishing attack that Cloudflare was able to thwart last August. The attack was extremely sophisticated, using targeted text messages and an extremely professional impersonation of our Okta login page. Cloudflare did have individual employees fall for the phishing messages, because we are made up of a team of humans who are human. But we were able to thwart the attack through our own use of Cloudflare One products, and physical security keys issued to every employee that are required to access all our applications. The attacker was able to obtain compromised username and password credentials, but they could not get past the hard key requirement to log in. In 2023 phishing attacks are only getting more frequent.

Today's security challenges are often a case of having the right tools deployed to prevent people from making mistakes. Last year when we kicked off Security Week, we talked about making a shift from protecting websites, to protecting applications. Today, the shift is from protecting applications, to protecting employees, and making sure they are protected everywhere. Just a few weeks ago, the White House released a new national cybersecurity strategy directing all agencies to “implement multi-factor authentication, gain visibility into their entire attack surface, manage authorization and access, and adopt cloud security tools”. Over the next six days you’ll read more than 30 announcements that will make it as easy as possible to do just that.

Welcome to Security Week 2023.

“The more tools you use the less secure you are”

This was a direct quote from the CISO of a large online gaming platform. Adding more vendors might seem like you are adding layers of security, but you do also open up avenues for risk. First, every third party you add by definition adds another potential vulnerability. The recent LastPass breach is a perfect example. Attackers gained access to a cloud storage service, which gave them information they used in a secondary attack to phish an employee. Second, more tools means more complexity. More systems to log into, more dashboards to check. If information is spread across multiple systems you are more likely to miss important changes. Third, the more tools you use, the less likely it is that anyone is able to master them all. If you need the person who knows the application security tool, and the person who knows the SIEM, and the person who knows the access tool to coordinate on every potential vulnerability, things will get lost in translation. Complexity is the enemy of security. Fourth, adding more tools can add a false sense of security. Simply adding a new tool can give the impression you’ve added defense in depth. But that tool only adds protection if it works, if it's configured properly, and if people actually use it.

This week, you will hear about all of the initiatives we’ve been working on to help you solve this problem. We will announce multiple integrations that make it easier for you to deploy and manage Zero Trust anywhere, across multiple platforms, but all within the Cloudflare dashboard. We’re also extending our proven detection capabilities into new areas that will help you solve problems you couldn’t solve before, and thus allow you to get rid of additional vendors. And we’ll announce a brand new migration tool that makes it dead simple to move from those other vendors to Cloudflare.

Leverage machine learning to let humans focus on critical thinking

We all hear machine learning thrown around as a buzzword too often, but it boils down to this: computers are really good at finding patterns. When we train them on what a good pattern looks like, they can spot them really well, and spot the outliers. Humans are great at finding patterns too. But it takes us a long time, and any time we spend finding patterns distracts us from the thing that even the best AI or ML model still can’t do: critical thinking. By using machine learning to find these good and bad patterns, you can optimize the time of your most valuable people. Rather than searching for exceptions, they can focus on only those exceptions, and use their wisdom to make the hard decisions about what to do next.

Cloudflare has used machine learning to catch DDoS attacks, malicious bots, and malicious web traffic. We were able to do this differently from others because we built a unique network where we run all of our code at every single data center, on every single machine. Since we have a massive global network that is close to end users, we can run machine learning close to those users, unlike competitors who have to use centralized data centers. The result is a machine learning pipeline that runs inference in a few microseconds. That unique speed is an advantage for our customers, one we now use to run inference more than 40 million times every second.

This week, we have an entire day focused on how we are using that machine learning pipeline to build new models that will allow you to find new patterns, like fraud and API endpoints.

Our intelligence is your intelligence

In June we announced Cloudforce One, the first step in our threat operations team dedicated to turning the intelligence we gather from handling nearly 20% of Internet traffic into actionable insights. Since that launch, we’ve heard customers ask us to do more with those insights and give them easy buttons and products to take the appropriate action on their behalf. This week you’ll read multiple announcements on new ways that you can view and take action on unique Cloudflare threat intelligence. We’ll also be announcing multiple new reporting views, like being able to view more data at an account level so you can have one single lens into security trends across your entire organization.

Make it harder for humans to make mistakes

Each product, development, or business team wants to use their own tools, and wants to move as quickly as possible. For good reason! Any security that comes after the fact, and creates additional work for those teams, will be difficult to get internal buy on for. Which can lead to situations like the recent T-mobile hack where an API that was not intended to be public was exposed, discovered, and exploited. You need to meet teams where they are by making the tools they already use more secure, and preventing them from making mistakes, rather than giving them additional tasks.

In addition to making it easier to deploy our Application Security and Zero Trust products to a wider scope, you’ll also read about how we are adding new features that prevent humans from making the mistakes they always do. You’ll hear about how you can make it impossible to click on a phishing link by automatically blocking the domains that host them, prevent data from leaving regions it should never leave, give your users security alerts directly in the tools they already use, and automatically detect shadow APIs without making your developers change their development process. All of this without having to convince internal teams to make any changes to their behavior.

If you’re reading this and any part of your job involves securing an organization, I think that by the end of the week we’ll have made your job easier. With the new tools and integrations we release, you’ll be able to protect more of your infrastructure from a wider range of threats, but reduce the number of third parties you rely on. More importantly, you’ll be able to reduce the number of mistakes that the incredible humans you work with can make. I hope that helps you rest a bit easier!

API Endpoint Management and Metrics are now GA

Jin-Hee Lee — Thu, 22 Sep 2022 13:00:00 GMT

The Internet is an endless flow of conversations between computers. These conversations, the constant exchange of information from one computer to another, are what allow us to interact with the Internet as we know it. Application Programming Interfaces (APIs) are the vital channels that carry these conversations, and their usage is quickly growing: in fact, more than half of the traffic handled by Cloudflare is for APIs, and this is increasing twice as fast as traditional web traffic.

In March, we announced that we’re expanding our API Shield into a full API Gateway to make it easy for our customers to protect and manage those conversations. We already offer several features that allow you to secure your endpoints, but there’s more to endpoints than their security. It can be difficult to keep track of many endpoints over time and understand how they’re performing. Customers deserve to see what’s going on with their API-driven domains and have the ability to manage their endpoints.

Today, we’re excited to announce that the ability to save, update, and monitor the performance of all your API endpoints is now generally available to API Shield customers. This includes key performance metrics like latency, error rate, and response size that give you insights into the overall health of your API endpoints.

A Refresher on APIs

The bar for what we expect an application to do for us has risen tremendously over the past few years. When we open a browser, app, or IoT device, we expect to be able to connect to data instantly, compare dozens of flights within seconds, choose a menu item from a food delivery app, or see the weather for ten locations at once.

How are applications able to provide this kind of dynamic engagement for their users? They rely on APIs, which provide access to data and services—either from the application developer or from another company. APIs are fundamental in how computers (or services) talk to each other and exchange information.

You can think of an API as a waiter: say a customer orders a delicious bowl of Mac n Cheese. The waiter accepts this order from the customer, communicates the request to the chef in a format the chef can understand, and then delivers the Mac n Cheese back to the customer (assuming the chef has the ingredients in stock). The waiter is the crucial channel of communication, which is exactly what the API does.

Managing API Endpoints

The first step in managing APIs is to get a complete list of all the endpoints exposed to the internet. API Discovery automatically does this for any traffic flowing through Cloudflare. Undiscovered APIs can’t be monitored by security teams (since they don't know about them) and they're thus less likely to have proper security policies and best practices applied. However, customers have told us they also want the ability to manually add and manage APIs that are not yet deployed, or they want to ignore certain endpoints (for example those in the process of deprecation). Now, API Shield customers can choose to save endpoints found by Discovery or manually add endpoints to API Shield.

But security vulnerabilities aren’t the only risk or area of concern with APIs – they can be painfully slow or connections can be unsuccessful. We heard questions from our customers such as: what are my most popular endpoints? Is this endpoint significantly slower than it was yesterday? Are any endpoints returning errors that may indicate a problem with the application?

That’s why we built Performance Metrics into API Shield, which allows our customers to quickly answer these questions themselves with real-time data.

Prioritizing Performance

Once you’ve discovered, saved, or removed endpoints, you want to know what’s going well and what’s not. To end-users, a huge part of what defines the experience as “going well” is good performance. Poor performance can lead to a frustrating experience: when you’re shopping online and press a button to check out, you don’t want to wait around for minutes for the page to load. And you certainly never want to see a dreaded error symbol telling you that you can’t get what you came for.

Exposing performance metrics of API endpoints puts concrete numerical data into your developers’ hands to tell you how things are going. When things are going poorly, these dashboard metrics will point out exactly which aspect of performance is causing concern: maybe you expected to see a spike in requests, but find out that request count is normal and latency is just higher than usual.

Empowering our customers to make data-driven decisions to better manage their APIs ends up being a win for our customers and our customers’ customers, who expect to seamlessly engage with the domain’s APIs and get exactly what they came for.

Management and Performance Metrics in the Dashboard

So, what’s available today? Log onto your Cloudflare dashboard, go to the domain-level Security tab, and open up the API Shield page. Here, you’ll see the Endpoint Management tab, which shows you all the API endpoints that you’ve saved, alongside placeholders for metrics that will soon be gathered.

Here you can easily delete endpoints you no longer want to track, or click manually add additional endpoints. You can also export schemas for each host to share internally or externally.

Once you’ve saved the endpoints that you want to keep tabs on, Cloudflare will start collecting data on its performance and make it available to you as soon as possible.

In Endpoint Management, you can see a few summary metrics in the collapsed view of each endpoint, including recommended rate limits, average latency, and error rate. It can be difficult to tell whether things are going well or not just from seeing a value alone, so we added sparklines that show relative performance, comparing an endpoint’s current metrics with its usual or previous data.

If you want to view further details about a given endpoint, you can expand it for additional metrics such as response size and errors separated by 4xx and 5xx. The expanded view also allows you to view all metrics at a single timestamp by hovering over the charts.

For each saved endpoint, customers can see the following metrics:

Request count: total number of requests to the endpoint over time.
Rate limiting recommendation per 10 minutes, which is guided by the request count.
Latency: average origin response time, in milliseconds (ms). How long does it take from the moment a visitor makes a request to the moment the visitor gets a response back from the origin?
Error rate vs. overall traffic: grouped by 4xx, 5xx, and their sum.
Response size: average size of the response (in bytes) returned to the request.

You can toggle between viewing these metrics on a 24-hour period or a 7-day period, depending on the scale on which you’d like to view your data. And in the expanded view, we provide a percentage difference between the averages of the current vs. the previous period. For example, say I’m viewing my metrics on a 24-hour timeline. My average latency yesterday was 10 ms, and my average latency today is 30 ms, so the dashboard shows a 200% increase. We also use anomaly detection to bring attention to endpoints that have concerning performance changes.

Additional improvements to Discovery and Schema Validation

As part of making endpoint management GA, we’re also adding two additional enhancements to API Shield.

First, API Discovery now accepts cookies — in addition to authorization headers — to discover endpoints and suggest rate limiting thresholds. Previously, you could only identify an API session with HTTP headers, which didn’t allow customers to protect endpoints that use cookies as session identifiers. Now these endpoints can be protected as well. Simply go to the API Shield tab in the dashboard, choose edit session identifiers, and either change the type, or click Add additional identifier.

Second, we added the ability to validate the body of requests via Schema Validation for all customers. Schema Validation allows you to provide an OpenAPI schema (a template for your API traffic) and have Cloudflare block non-conformant requests as they arrive at our edge. Previously, you provided specific headers, cookies, and other features to validate. Now that we can validate the body of requests, you can use Schema Validation to confirm every element of a request matches what is expected. If a request contains strange information in the payload, we’ll notice. Note: customers who have already uploaded schemas will need to re-upload to take advantage of body validation.

Take a look at our developer documentation for more details on both of these features.

Get started

Endpoint Management, performance metrics, schema exporting, discovery via cookies, and schema body validation are all available now for all API Shield customers. To use them, log into the Cloudflare dashboard, click on Security in the navigation bar, and choose API Shield. Once API Shield is enabled, you’ll be able to start discovering endpoints immediately. You can also use all features through our API.

If you aren’t yet protecting a website with Cloudflare, it only takes a few minutes to sign up.

Announcing the Cloudflare API Gateway

Ben Solomon — Wed, 16 Mar 2022 12:59:25 GMT

Over the past decade, the Internet has experienced a tectonic shift. It used to be composed of static websites: with text, images, and the occasional embedded movie. But the Internet has grown enormously. We now rely on API-driven applications to help with almost every aspect of life. Rather than just download files, we are able to engage with apps by exchanging rich data. We track workouts and send the results to the cloud. We use smart locks and all kinds of IoT devices. And we interact with our friends online.

This is all wonderful, but it comes with an explosion of complexity on the back end. Why? Developers need to manage APIs in order to support this functionality. They need to monitor and authenticate every single request. And because these tasks are so difficult, they’re usually outsourced to an API gateway provider.

Unfortunately, today’s gateways leave a lot to be desired. First: they’re not cheap. Then there’s the performance impact. And finally, there’s a data and privacy risk, since more than 50% of traffic reaches APIs (and is presumably sent through a third party gateway). What a mess.

Today we’re announcing the Cloudflare API Gateway. We’re going to completely replace your existing gateway at a fraction of the cost. And our solution uses the technology behind Workers, Bot Management, Access, and Transform Rules to provide the most advanced API toolset on the market.

What is API Gateway?

In short, it’s a package of features that will do everything for your APIs. We break it down into three categories:

SecurityThese are the products we have already blogged about. Tools like Discovery, Schema Validation, Abuse Detection, and more. We’ve spent a lot of time applying our security expertise to the world of APIs.

Management & MonitoringThese are the foundational tools that keep your APIs in order. Some examples: analytics, routing, and authentication. We are already able to do these things with existing products like Cloudflare Access, and more features are on the way.

Everything ElseThese are the small (but crucial) items that keep everything running. Cloudflare already offers SSL/TLS termination, load balancing, and proxy services that can run by default.

Today’s blog post describes each feature in detail. We’re excited to announce that all the security features are now generally available, so let’s start by discussing those.

Discovery

Our customers are eager to protect their APIs. Unfortunately, they don’t always have these endpoints documented—or worse, they think everything is documented, but have unknowingly lost or modified endpoints. These hidden endpoints are sometimes called shadow APIs. We need to begin our journey with an exhaustive (and accurate) picture of API surface area.

That’s where Discovery comes in. Head to the Cloudflare dashboard, select the Security tab, then choose “API Shield.” Activate the feature and tell us how you want to identify your API traffic. Most users provide a header (available today), but we can also use the request body or cookie (available soon).

We provide an exhaustive list of your API endpoints. Cloudflare lists each method, path, and additional metadata to help you understand your surface area. We even collapse endpoints that include variables (e.g., /account/217) to become generally applicable (e.g., /account/{var1}).

Discovery is a powerful countermeasure to entropy. Our customers often expect to find 30 endpoints, but are surprised to learn they have over 100 active endpoints.

Schema Validation

Perhaps you already have a schema for your API endpoints. A schema is like a template: it provides the paths, methods, and additional data you expect API requests to include. Many developers follow the OpenAPI standard to generate (and maintain) a schema.

To harden your security, we can validate incoming traffic against this schema. This is a great way to stop basic attacks. Cloudflare will turn away nonconforming requests, discarding nonsense traffic that ignored the dress code. Simply upload your schema to the dashboard, select the actions you want to take, and deploy:

Schema Validation has already vetted traffic for some of the world’s largest crypto sites, delivery services, and payment platforms. It’s available now, and we’ll add body validation soon.

Abuse Detection

A robust security approach will use Schema Validation and Discovery in tandem, ensuring traffic matches the expected format. But what about abusive traffic that makes it through?

As Cloudflare discovers new API endpoints, we actually suggest rate limits for each one. That’s the role of Abuse Detection, and it opens the door to a more sophisticated kind of security.

Consider an API endpoint that returns weather updates. Specifically, the endpoint will return “yes” if it is likely to snow in the next hour, and “no” otherwise. Our algorithm might detect that the average user requests this data once every 10 minutes. A small group of scrapers, however, makes 37 requests per 10 minutes. Cloudflare automatically recommends a threshold in between, weighted to provide normal users with some breathing room. This would prevent abusive scraping services from fetching the weather too often.

We provide the option to create a rule using our new Advanced Rate Limiting engine. You can use cookies, headers, and more to tune thresholds. We’ve been using Abuse Detection to protect api.cloudflare.com for months now.

Our favorite part of this feature: it relies on the machine learning approach we use for Bot Management. Just another way our products can feed into (and benefit from) each other.

Abuse Detection is available now. If you’re interested in Sequential Abuse Detection, which we use to flag anomalous request flows, check out our previous blog post. The sequential piece is in early access, and we’re continuing to tune it before an official launch.

mTLS

Mutual TLS takes security to a new level. You can use certificates to validate incoming traffic as it reaches your APIs—which is especially useful for mobile and IoT devices. Moreover, this is an excellent positive security model that can (and should) be adopted for most device ecosystems.

As an example, let’s return to our weather API. Perhaps this service includes a second endpoint that receives the current temperature from a thermometer. But there’s a problem: anyone can make fake requests, providing inaccurate readings to the endpoint. To prevent this, use mTLS to install a client certificate on the legitimate thermometer, then let Cloudflare validate that certificate. Any other requests will be turned away. Problem solved!

We already offer a set of free certificates to every Cloudflare customer. That will continue. But starting today, API Gateway customers get unlimited certificates by default.

Authentication

Many modern APIs require authentication. In fact, authentication unlocks all sorts of capabilities—it allows sessions (with login), personal data exchange, and infrastructure efficiency. And of course, Cloudflare protects authenticated traffic as it passes through our network.

But with API Gateway, Cloudflare plays a more active role in authenticating traffic, helping to issue and validate the following:

API keys
JSON web tokens (JWT)
OAuth 2.0 tokens

Using access control lists, we help you manage different user groups with varying permissions. And this matters—because your current provider is introducing tons of latency and unnecessary data exchange. If a request has to go somewhere outside the Cloudflare ecosystem, it’s traveling farther than it needs to:

Cloudflare can authenticate on our global network and handle requests in a fraction of the time. This kind of technology is difficult to implement, but we felt it was too important to ignore. How did we build it so quickly? Cloudflare Access. We took our experience working with identity providers and, once again, ported it over to the world of APIs. Our gateway includes unlimited authentication and token exchange. These features will be available soon.

Routing & Management

Let’s talk briefly about microservices. Modern applications are behemoths, so developers break them up into smaller chunks called “microservices.”

Consider an application that helps you book a hotel room. It might use a microservice to fetch available dates, another to fetch prices, and still another to fetch room types. Perhaps a different team manages each microservice, but they all need to be available from a single public entry point:

That single entry point—traditionally managed by an API gateway—is responsible for routing each request to the right microservice. Many of our customers have been paying standalone services to do this for years. That’s no longer necessary. We’ve built on our Transform Rules product to dynamically re-write and re-route at our edge. It’s easy to configure, fast to deploy, and natively built into API Gateway. Cloudflare can now be your API’s single point of entry.

That’s just the tip of the iceberg. API Gateway can actually replace your microservices through an integration with our Workers product. How? Consider writing a Worker that performs some action; perhaps return hotel prices, which are stored with Durable Objects on our network. With API Gateway, requests arrive at our network, are routed to the correct microservice with Transform Rules, and then are fully served with Workers (still on our network!). These Workers may contact your origin for additional information, where necessary.

Workers are faster, cheaper, and simpler than microservice alternatives. This integration will be available soon.

API Analytics

Customers tell us that seeing API traffic is sometimes more important than even acting on it. In fact, this trend isn’t specific to APIs. We published another blog today that explores how one customer uses our bot intelligence to passively log information about threats.

With API Analytics, we’ve drawn on our other products to show useful data in real time. You can view popular endpoints, filter by ML-driven insights, see histograms of abuse thresholds, and capture trends.

API Analytics will be available soon. When this happens, you’ll also be able to export custom reports and share insights within your organization.

Logging, Quota Management, and More

All of our established features, like caching, load balancing, and log integrations work natively with API Gateway. These shouldn’t be overlooked as primitive gateway features; they’re essential. And because Cloudflare performs all of these functions in the same place, you get the latency benefits without having to do a thing.

We are also expanding our Enterprise Logs functionality to perform real-time logging. If you choose to authenticate on Cloudflare’s network, you can view detailed logs of each user who has accessed an API. Similarly, we keep track of each request’s lifespan as it is received, validated, routed, and responded to. Everything is logged.

Finally, we are building Quota Management, a feature that counts API requests over a longer period of time (like a month) and allows you to manage thresholds for your users. We’ve also launched Advanced Rate Limiting to help with more sophisticated cases (including body inspection for GraphQL).

Conclusion

Our API security features—Discovery, Schema Validation, Abuse Detection, and mTLS—are available now! We call these features API Shield because they form the shield that protects the remaining gateway functions. Enterprise customers can ask their account teams for access today.

Many of the other portions of API Gateway are now in early access. According to Gartner®, “by 2025, less than 50% of enterprise APIs will be managed, as explosive growth in APIs surpasses the capabilities of API management tools.” Our goal is to offer an affordable gateway that will fight this trend. If you have a specific feature you want to test, let your account team know, so we can onboard you as soon as possible.

Source: Gartner, “Predicts 2022: APIs Demand Improved Security and Management”, Shameen Pillai, Jeremy D'Hoinne, John Santoro, Mark O'Neill, Sham Gill, 6 December 2021. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.

Cryptocurrency API Gateway using Typescript+Workers

Steven Pack — Fri, 29 Jun 2018 13:00:00 GMT

If you followed part one, I have an environment setup where I can write Typescript with tests and deploy to the Cloudflare Edge with npm run upload. For this post, I want to take one of the Worker Recipes further.

I'm going to build a mini HTTP request routing and handling framework, then use it to build a gateway to multiple cryptocurrency API providers. My point here is that in a single file, with no dependencies, you can quickly build pretty sophisticated logic and deploy fast and easily to the Edge. Furthermore, using modern Typescript with async/await and the rich type structure, you also write clean, async code.

OK, here we go...

My API will look like this:

Verb	Path	Description
GET	`/api/ping`	Check the Worker is up
GET	`/api/all/spot/:symbol`	Aggregate the responses from all our configured gateways
GET	`/api/race/spot/:symbol`	Return the response of the provider who responds fastest
GET	`/api/direct/:exchange/spot/:symbol`	Pass through the request to the gateway. E.g. gdax or bitfinex

The Framework

OK, this is Typescript, I get interfaces and I'm going to use them. Here's my ultra-mini-http-routing framework definition:

export interface IRouter {
  route(req: RequestContextBase): IRouteHandler;
}

/**
 * A route
 */
export interface IRoute {
  match(req: RequestContextBase): IRouteHandler | null;
}

/**
 * Handles a request.
 */
export interface IRouteHandler {
  handle(req: RequestContextBase): Promise;
}

/**
 * Request with additional convenience properties
 */
export class RequestContextBase {
  public static fromString(str: string) {
    return new RequestContextBase(new Request(str));
  }

  public url: URL;
  constructor(public request: Request) {
    this.url = new URL(request.url);
  }
}

So basically all requests will go to IRouter. If it finds an IRoute that returns an IRouterHandler, then it will call that and pass in RequestContextBase, which is just the request with a parsed URL for convenience.

I stopped short of dependency injection, so here's the router implementation with 4 routes we've implemented (Ping, Race, All and Direct). Each route corresponds to one of the four operations I defined in the API above and returns the corresponding IRouteHandler.

export class Router implements IRouter {
  public routes: IRoute[];

  constructor() {
    this.routes = [
      new PingRoute(),
      new RaceRoute(),
      new AllRoute(),
      new DirectRoute(),
    ];
  }

  public async handle(request: Request): Promise {
    try {
      const req = new RequestContextBase(request);
      const handler = this.route(req);
      return handler.handle(req);
    } catch (e) {
      return new Response(undefined, {
        status: 500,
        statusText: `Error. ${e.message}`,
      });
    }
  }

  public route(req: RequestContextBase): IRouteHandler {
    const handler: IRouteHandler | null = this.match(req);
    if (handler) {
      logger.debug(`Found handler for ${req.url.pathname}`);
      return handler;
    }
    return new NotFoundHandler();
  }

  public match(req: RequestContextBase): IRouteHandler | null {
    for (const route of this.routes) {
      const handler = route.match(req);
      if (handler != null) {
        return handler;
      }
    }
    return null;
  }
}

You can see above I return a NotFoundHandler if we can't find a matching route. Its implementation is below. It's easy to see how 401, 405, 500 and all the common handlers could be implemented.

/**
 * 404 Not Found
 */
export class NotFoundHandler implements IRouteHandler {
  public async handle(req: RequestContextBase): Promise {
    return new Response(undefined, {
      status: 404,
      statusText: 'Unknown route',
    });
  }
}

Now let's start with Ping. The framework separates matching a route and handling the request. Firstly the route:

export class PingRoute implements IRoute {
  public match(req: RequestContextBase): IRouteHandler | null {
    if (req.request.method !== 'GET') {
      return new MethodNotAllowedHandler();
    }
    if (req.url.pathname.startsWith('/api/ping')) {
      return new PingRouteHandler();
    }
    return null;
  }
}

Simple enough, if the URL starts with /api/ping, handle the request with a PingRouteHandler

export class PingRouteHandler implements IRouteHandler {
  public async handle(req: RequestContextBase): Promise {
    const pong = 'pong;';
    const res = new Response(pong);
    logger.info(`Responding with ${pong} and ${res.status}`);
    return new Response(pong);
  }
}

So at this point, if you followed along with Part 1, you can do:

$ npm run upload
$ curl https://cryptoserviceworker.com/api/ping
pong

OK, next the AllHandler, this aggregates the responses. Firstly the route matcher:

export class AllRoute implements IRoute {
  public match(req: RequestContextBase): IRouteHandler | null {
    if (req.url.pathname.startsWith('/api/all/')) {
      return new AllHandler();
    }
    return null;
  }
}

And if the route matches, we'll handle it by farming off the requests to our downstream handlers:

export class AllHandler implements IRouteHandler {
  constructor(private readonly handlers: IRouteHandler[] = []) {
    if (handlers.length === 0) {
      const factory = new HandlerFactory();
      logger.debug('No handlers, getting from factory');
      this.handlers = factory.getProviderHandlers();
    }
  }

  public async handle(req: RequestContextBase): Promise {
    const responses = await Promise.all(
      this.handlers.map(async h => h.handle(req))
    );
    const jsonArr = await Promise.all(responses.map(async r => r.json()));
    return new Response(JSON.stringify(jsonArr));
  }
}

I'm cheating a bit here because I haven't shown you the code for HandlerFactory or the implementation of handle for each one. You can look up the full source here.

Take a moment here to appreciate just what's happening. You're writing very expressive async code that in a few lines, is able to multiplex a request to multiple endpoints and aggregate the results. Furthermore, it's running in a sandboxed environment in a data center very close to your end user. Edge-side code is a game changer.

Let's see it in action.

$ curl https://cryptoserviceworker.com/api/all/spot/btc-usd
[  
   {  
      "symbol":"btc-usd",
      "price":"6609.06000000",
      "utcTime":"2018-06-20T05:26:19.512000Z",
      "provider":"gdax"
   },
   {  
      "symbol":"btc-usd",
      "price":"6600.7",
      "utcTime":"2018-06-20T05:26:22.284Z",
      "provider":"bitfinex"
   }
]

Cool, OK, who's fastest? First, the route handler:

export class RaceRoute implements IRoute {
  public match(req: RequestContextBase): IRouteHandler | null {
    if (req.url.pathname.startsWith('/api/race/')) {
      return new RaceHandler();
    }
    return null;
  }
}

And the handler. Basically just using Promise.race to pick the winner

export class RaceHandler implements IRouteHandler {
  constructor(private readonly handlers: IRouteHandler[] = []) {
    const factory = new HandlerFactory();
    this.handlers = factory.getProviderHandlers();
  }

  public handle(req: RequestContextBase): Promise {
    return this.race(req, this.handlers);
  }

  public async race(
    req: RequestContextBase,
    responders: IRouteHandler[]
  ): Promise {
    const arr = responders.map(r => r.handle(req));
    return Promise.race(arr);
  }
}

So who's fastest? Tonight it's gdax.

curl https://cryptoserviceworker.com/api/race/spot/btc-usd
{  
   "symbol":"btc-usd",
   "price":"6607.15000000",
   "utcTime":"2018-06-20T05:33:16.074000Z",
   "provider":"gdax"
}

Summary

Using Typescript+Workers, in < 500 lines of code, we were able to

Define an interface for a mini HTTP routing and handling framework
Implement a basic implementation of that framework
Build Routes and Handlers to provide Ping, All, Race and Direct handlers
Deploy it to 160+ data centers with npm run upload

Stay tuned for more, and PRs welcome, particularly for more providers.

If you have a worker you'd like to share, or want to check out workers from other Cloudflare users, visit the “Recipe Exchange” in the Workers section of the Cloudflare Community Forum.