Developers, bloggers, business owners, and large corporations all rely on Cloudflare to keep their applications secure, available, and performant.
To meet these goals, over the last twelve years we have built a smart network capable of protecting many millions of Internet properties. As of March 2022, W3Techs reports that:
“Cloudflare is used by 80.6% of all the websites whose reverse proxy service we know. This is 19.7% of all websites”
Netcraft, another provider that crawls the web and monitors adoption, puts this figure at more than 20M active sites in its latest Web Server Survey (February 2022):
“Cloudflare continues to make strong gains amongst the million busiest websites, where it saw the only notable increases, with an additional 3,200 sites helping to bring its market share up to 19.4%”
The breadth and diversity of the sites we protect, and the billions of browsers and devices that interact with them, give us unique insight into the ever-changing application security trends on the Internet. In this post, we share some of the insights we’ve gathered from the 32 million HTTP requests/second that pass through our network.
Definitions
Before we examine the data, it is useful to define the terminology we use. Throughout this post, we will refer to the following terms:
Mitigated Traffic: any eyeball HTTP* request that had a “terminating” action applied to it by the Cloudflare platform. These include actions such as BLOCK and CHALLENGE (for example, captchas or JavaScript-based challenges). This does not include requests that had the following actions applied: LOG, SKIP, ALLOW.
Bot Traffic/Automated Traffic: any HTTP request identified by Cloudflare’s Bot Management system as being generated by a bot. This includes requests scored between 1 and 29.
API Traffic: any HTTP request with a response content type of XML, JSON, gRPC, or similar. Where the response content type is not available, such as for mitigated requests, the equivalent Accept content type (specified by the user agent) is used instead. In this latter case API traffic won’t be fully accounted for, but for insight purposes it still provides a good representation.
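To make the three definitions above concrete, here is a minimal Python sketch of how a single request could be bucketed. It is an illustration only, not Cloudflare's implementation: the `Request` record, its field names, and the exact list of API content types are assumptions (the post only says "XML, JSON, gRPC, or similar"), and looking only at the first media type in the Accept header is a simplification chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Actions the post describes as "terminating"; LOG, SKIP and ALLOW are not.
TERMINATING_ACTIONS = {"block", "challenge"}

# Assumed list of API media types; the post only says "XML, JSON, gRPC, or similar".
API_CONTENT_TYPES = ("application/json", "application/xml", "text/xml", "application/grpc")


@dataclass
class Request:
    """Hypothetical record of one HTTP request, for illustration only."""
    action: Optional[str] = None                 # action applied by the platform, if any
    bot_score: Optional[int] = None              # bot score assigned to the request
    response_content_type: Optional[str] = None  # usually missing for mitigated requests
    accept: Optional[str] = None                 # Accept header sent by the user agent


def is_mitigated(req: Request) -> bool:
    """True if a terminating action was applied to the request."""
    return req.action is not None and req.action.lower() in TERMINATING_ACTIONS


def is_bot(req: Request) -> bool:
    """True if the request was scored between 1 and 29."""
    return req.bot_score is not None and 1 <= req.bot_score <= 29


def is_api(req: Request) -> bool:
    """Use the response content type, falling back to the Accept header."""
    raw = req.response_content_type or req.accept or ""
    # Keep only the first media type and drop parameters such as "; charset=utf-8".
    first = raw.split(",")[0].split(";")[0].strip().lower()
    return first.startswith(API_CONTENT_TYPES)
```

A blocked request has no response content type, so `is_api` falls back to its Accept header, which is why the post notes that API traffic is not fully accounted for in that case.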
Unless otherwise stated, the time frame evaluated in this post is the three-month period from December 1, 2021, to March 1, 2022.
Finally, please note that the data is calculated based only on traffic observed across the Cloudflare network and does not necessarily represent overall HTTP traffic patterns across the Internet.
*When referring to HTTP traffic we mean both HTTP and HTTPS.
Global Traffic Insights
The first thing we can look at is traffic mitigated across all HTTP requests proxied by the Cloudflare network. This will give us a good baseline view before drilling into specific traffic types, such as bot and API traffic.
8% of all Cloudflare HTTP traffic is mitigated
Cloudflare proxies ~32 million HTTP requests per second on average, with more than 44 million HTTP requests per second at peak. Overall, ~2.5 million requests per second are mitigated by our global network and never reach our caches or the origin servers, ensuring our customers’ bandwidth and compute power are used only for clean traffic.
Site owners using Cloudflare gain access to tools to mitigate unwanted or malicious traffic and allow access to their applications only when a request is deemed clean. This can be done both using fully managed features, such as our DDoS mitigation, WAF managed ruleset or schema validation, as well as custom rules that allow users to define their own filters for blocking traffic.
If we look at the top five Cloudflare features (sources) that mitigated traffic, we get a clear picture of how much each Cloudflare feature is contributing towards helping keep customer sites and applications online and secure:
Tabular format for reference:
| Source | Percentage |
| --- | --- |
| Layer 7 DDoS mitigation | 66.0% |
| Custom WAF Rules | 19.0% |
| Rate Limiting | 10.5% |
| IP Threat Reputation | 2.5% |
| Managed WAF Rules | 1.5% |
Looking at each mitigation source individually:
Layer 7 DDoS mitigation, perhaps unsurprisingly, is the largest contributor to mitigated HTTP requests by total count (66% overall). Cloudflare’s layer 7 DDoS rules are fully managed and don’t require user configuration: they automatically detect a vast array of HTTP DDoS attacks including those generated by the Meris botnet, Mirai botnet, known attack tools, and others. Volumetric DDoS attacks, by definition, create a lot of malicious traffic!
Custom WAF Rules contribute to more than 19% of mitigated HTTP traffic. These are user-configured rules defined using Cloudflare’s wirefilter syntax. We explore common rule patterns further down in this post.
Our Rate Limiting feature allows customers to define custom thresholds based on application preferences. It is often used as an additional layer of protection for applications against traffic patterns that are too low to be detected as a DDoS attack. Over the time frame analyzed, rate limiting contributed to 10.5% of mitigated HTTP requests.
IP Threat Reputation is exposed in the Cloudflare dashboard as Security Level. Based on behavior we observe across the network, Cloudflare automatically assigns a threat score to each IP address. When the threat score is above the specified threshold, we challenge the traffic. This accounts for 2.5% of all mitigated HTTP requests.
Our Managed WAF Rules are handcrafted by our internal security analyst team and aim to match only valid malicious payloads. They contribute to about 1.5% of all mitigated requests.
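As an illustration of the threshold idea behind rate limiting described above, here is a minimal fixed-window counter in Python. It is a sketch under stated assumptions, not how Cloudflare's Rate Limiting feature is implemented: the class name, the per-client key, and the fixed-window strategy are all choices made for this example.

```python
from collections import defaultdict


class FixedWindowRateLimiter:
    """Hypothetical fixed-window limiter: at most max_requests per client per window."""

    def __init__(self, max_requests: int, window_seconds: int) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # (client, window index) -> number of requests seen in that window.
        # Old windows are never evicted here; a real limiter would expire them.
        self._counts = defaultdict(int)

    def allow(self, client: str, now: float) -> bool:
        """Count the request and report whether it is within the threshold."""
        window = int(now // self.window_seconds)
        self._counts[(client, window)] += 1
        return self._counts[(client, window)] <= self.max_requests


limiter = FixedWindowRateLimiter(max_requests=2, window_seconds=10)
decisions = [limiter.allow("203.0.113.7", t) for t in (0, 1, 2, 10)]
```

With a threshold of two requests per ten seconds, the third request in the first window is rejected and the counter starts again in the next window. A threshold like this can catch traffic that is too low in volume to be detected as a DDoS attack.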
HTTP anomalies are the most common attack vector
If we drill into Managed WAF Rules, we get a clear picture of what type of attack vectors malicious users are attempting against the Internet properties we protect.
The vast majority (over 54%) of HTTP requests blocked by our Managed WAF Rules contain HTTP anomalies, such as malformed method names, null byte characters in headers, non-standard ports, or a content length of zero with a POST request.
Common attack types in this category are shown below. These have been grouped when relevant:
| Rule Type | Description |
| --- | --- |
| Missing User Agent | These rules block any request without a User-Agent header. All browsers and legitimate crawlers present this header when connecting to a site. Not having a user agent is a common signal of a malicious request. |
| Not GET, POST or HEAD Method | Most applications only allow standard GET or POST requests (normally used for viewing pages or submitting forms). HEAD requests are also often sent from browsers for security purposes. Customers using our Managed Rules can easily block any other method, which normally results in blocking a large number of vulnerability scanners. |
| Missing Referer | When users navigate applications, browsers use the Referer header to indicate where they are coming from. Some applications expect this header to always be present. |
| Non-standard port | Customers can configure Cloudflare Managed Rules to block HTTP requests trying to access non-standard ports (that is, ports other than 80 and 443). This activity is normally seen from vulnerability scanners. |
| Invalid UTF-8 encoding | It is common for attackers to attempt to break an application server by sending “special” characters that are not valid in UTF-8 encoding. |
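The rule types above can be sketched as a handful of simple checks. The following Python function is a hypothetical illustration of the logic only: the function name, its parameters, and the finding labels are assumptions, and Cloudflare's Managed Rules are not implemented this way.

```python
from typing import Dict, List

ALLOWED_METHODS = {"GET", "POST", "HEAD"}
STANDARD_PORTS = {80, 443}


def find_anomalies(method: str, port: int, headers: Dict[str, str], raw_path: bytes) -> List[str]:
    """Return the labels of the HTTP anomalies found in one request."""
    # Header names are case-insensitive, so normalise them before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    findings = []

    if not normalised.get("user-agent"):
        findings.append("missing_user_agent")
    # Method names are case-sensitive, so "get" is not treated as GET.
    if method not in ALLOWED_METHODS:
        findings.append("method_not_get_post_head")
    if not normalised.get("referer"):
        findings.append("missing_referer")
    if port not in STANDARD_PORTS:
        findings.append("non_standard_port")
    try:
        raw_path.decode("utf-8")
    except UnicodeDecodeError:
        findings.append("invalid_utf8")

    return findings


clean = find_anomalies(
    "GET", 443,
    {"User-Agent": "ExampleBrowser/1.0", "Referer": "https://example.com/"},
    b"/index.html",
)
suspicious = find_anomalies("TRACE", 8080, {}, b"/\xff")
```

Each check is cheap on its own. Whether a finding should block a request depends on the application: for example, only some applications expect the Referer header to always be present.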
More commonly known and referenced attack vectors such as XSS and SQLi together contribute only about 13% of requests mitigated by Managed WAF Rules. More interestingly, software-specific CVE-based attacks account for about 12% of mitigated requests (more than SQLi alone), and attacks aimed at information disclosure are third most popular (10%). This highlights both the importance of patching software quickly and the likelihood of CVE proof-of-concepts (PoCs) being used to compromise applications, such as with the recent Log4j vulnerability. The top 10 attack vectors by percentage of mitigated requests are shown below:
Tabular format for reference:
| Source | Percentage |
| --- | --- |
| HTTP Anomaly | 54.5% |
| Vendor Specific CVE | 11.8% |
| Information Disclosure | 10.4% |
| SQLi | 7.0% |
| XSS | 6.1% |
| File Inclusion | 3.3% |
| Fake Bots | 3.0% |
| Command Injection | 2.7% |
| Open Redirects | 0.1% |
| Other | 1.5% |
Businesses still rely on IP address-based access lists to protect their assets
In the prior section, we noted that 19% of mitigated requests come from Custom WAF Rules. These are rules that Cloudflare customers have implemented using the wirefilter syntax. At time of writing, Cloudflare customers had a total of ~6.5 million Custom WAF rules deployed.
It is interesting to look at what rule fields customers are using to identify malicious traffic, as this helps us focus our efforts on what other fully automated mitigations could be implemented to improve the Cloudflare platform.
The most common field, found in approximately 65% of all custom rules, remains the source IP address or fields easily derived from the IP address, such as the client country location. Note that IP addresses are becoming less useful signals for security policies, but they are often the quickest and simplest type of filter to implement during an attack. Customers are also starting to adopt better approaches such as those offered in our Zero Trust portfolio to further reduce reliance on IP address-based fields.
The top 10 fields are shown below:
Tabular format for reference:
| Field name | Used in % of rules |
| --- | --- |
| ip | 64.9% |
| ip_geoip_country | 27.3% |
| http_request_uri | 24.1% |
| http_user_agent | 21.8% |
| http_request_uri_path | 17.8% |
| http_referer | 8.6% |
| cf_client_bot | 8.3% |
| http_host | 7.8% |
| ip_geoip_asnum | 5.8% |
| cf_threat_score | 4.4% |
Beyond IP addresses, standard HTTP request fields (URI, User-Agent, Path, Referer) tend to be the most popular. Note, also, that across the entire rule corpus, the average rule combines at least three independent fields.
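A "used in % of rules" figure like the ones above can be computed by counting, for each field, the share of rules that reference it at least once. The Python sketch below shows that calculation on a tiny made-up rule set; the `field_usage` helper and the sample rules are hypothetical and are not drawn from Cloudflare's data.

```python
from collections import Counter
from typing import Dict, Iterable, List


def field_usage(rules: List[Iterable[str]]) -> Dict[str, float]:
    """Map each field name to the percentage of rules that use it."""
    counts = Counter()
    for fields in rules:
        # A field counts once per rule, however often the rule mentions it.
        counts.update(set(fields))
    total = len(rules)
    return {name: round(100.0 * count / total, 1) for name, count in counts.most_common()}


# Made-up sample: each rule is represented by the fields it references.
sample_rules = [
    {"ip", "http_user_agent"},
    {"ip"},
    {"ip", "ip_geoip_country"},
    {"http_request_uri"},
]
usage = field_usage(sample_rules)
```

Because a single rule can combine several fields, the percentages are not expected to add up to 100%.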
Bot Traffic Insights
Cloudflare has long offered a Bot Management solution to allow customers to gain insights into the automated traffic that might be accessing their application. Using Bot Management classification data, we can perform a deep dive into the world of bots.
38% of HTTP traffic is automated
Over the time period analyzed, bot traffic accounted for about 38% of all HTTP requests. This traffic includes bot traffic from hundreds of Verified Bots tracked by Cloudflare, as well as any request that received a bot score below 30, indicating a high likelihood that it is automated.
Overall, when bot traffic matches a security configuration, customers allow 41% of bot traffic to pass to their origins, blocking only 6.4% of automated requests. Remember that this includes traffic coming from Verified Bots like GoogleBot, which ultimately benefits site owners and end users. It’s a reminder that automation in and of itself is not necessarily detrimental to a site. This is why we segment Verified Bot traffic, and why we give customers a granular bot score, rather than a binary “bot or not bot” indicator. Website operators want the flexibility to be precise with their response to different types of bot traffic, and we can see that they do in fact use this flexibility. Note that our self-serve customers can also decide how to handle bot traffic using our Super Bot Fight Mode feature.
Tabular data for reference:
| Action on all bot traffic | Percentage |
| --- | --- |
| allow | 40.9% |
| log | 31.9% |
| bypass | 19.0% |
| block | 6.4% |
| jschallenge | 0.5% |
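To illustrate why a granular bot score is more flexible than a binary indicator, here is a small Python sketch of a policy that maps a score to one of the actions listed above. The function and both thresholds are hypothetical; each site operator would choose their own values, and this is not a recommended configuration.

```python
def choose_action(
    bot_score: int,
    verified_bot: bool,
    block_at_or_below: int = 2,       # hypothetical threshold
    challenge_at_or_below: int = 29,  # scores below 30 indicate likely automation
) -> str:
    """Pick an action for a request based on its bot score."""
    if verified_bot:
        # Verified Bots (such as search engine crawlers) are let through.
        return "allow"
    if bot_score <= block_at_or_below:
        return "block"
    if bot_score <= challenge_at_or_below:
        return "managed_challenge"
    return "allow"


actions = [
    choose_action(1, verified_bot=True),
    choose_action(1, verified_bot=False),
    choose_action(15, verified_bot=False),
    choose_action(80, verified_bot=False),
]
```

Changing a threshold changes the response to a whole band of scores without touching the rest of the policy, which a binary “bot or not bot” signal would not allow.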
More than a third of non-verified bot HTTP traffic is mitigated
31% of all bot traffic observed by Cloudflare is not verified, and comes from thousands of custom-built automated tools like scanners, crawlers, and bots built by hackers. As noted above, automation does not necessarily mean these bots are performing malicious actions. If we look at customer responses to identified bot traffic, we find that 38.5% of HTTP requests from non-verified bots are mitigated. This is obviously a much more defensive configuration compared to overall bot traffic actions shown above:
Tabular data for reference:
| Action on non-verified bot traffic | Percentage |
| --- | --- |
| block | 34.0% |
| log | 28.6% |
| allow | 14.5% |
| bypass | 13.2% |
| managed_challenge | 3.7% |
You’ll notice that almost 30% of non-verified bot traffic is logged rather than acted on immediately. We find that many enterprise customers choose to not immediately block bot traffic, so they don’t give a feedback signal to attackers. Rather, they prefer to tag and monitor this traffic, and either drop it at a later time or redirect it to alternate content. As targeted attack vectors have evolved, responses to those attacks have had to evolve and become more sophisticated as well. Additionally, nearly 3% of non-verified bot traffic is automatically mitigated by our DDoS protection (connection_close). These requests tend to be part of botnets used to attack customer applications.
API Traffic Insights
Many applications built on the Internet today are not meant to be consumed by humans. Rather, they are intended for computer-to-computer communication. The common way to expose an application for this purpose is to build an Application Programming Interface (API) that can be accessed using HTTP.
Due to the underlying format of the data in transit, API traffic tends to be a lot more structured than standard web application traffic, causing all sorts of problems from a security standpoint. First, the structured data often causes Web Application Firewalls (WAFs) to generate a large number of false positives. Second, due to the nature of APIs, they often go unnoticed, and many companies end up exposing old and unmaintained APIs without knowing, often referred to as “shadow APIs”.
Below, we look at some differences in API trends compared to the global traffic insights shown above.
10% of API traffic is mitigated
A good portion of bot traffic is accessing API endpoints, and as discussed previously, API traffic is the fastest growing traffic type on the Cloudflare network, currently accounting for 55% of total requests.
API endpoints globally receive more malicious requests compared to standard web applications (10% vs 8%) potentially indicating that attackers are focusing more on APIs for their attack surface as opposed to standard web apps.
Our DDoS mitigation is still the top source of mitigated events for API endpoints, accounting for just over 63% of the total mitigated requests. More interestingly, Custom WAF rules account for 35% compared to 19% when looking at global traffic. Customers have, to date, been heavily using WAF Custom Rules to lock down and validate traffic to API endpoints, although we expect our API Gateway schema validation feature to soon surpass Custom WAF Rules in terms of mitigated traffic.
SQLi is the most common attack vector on API endpoints
If we look at our WAF Managed Rules mitigations on API traffic only, we see notable differences compared to global trends. These differences include much more equal distribution across different types of attacks, but more noticeably, SQL injection attacks in the top spot.
Command Injection attacks are also much more prominent (14.3%), and vectors such as Deserialization make an appearance, contributing to more than 1% of the total mitigated requests.
Tabular data for reference:
| Source | Percentage |
| --- | --- |
| SQLi | 34.5% |
| HTTP Anomaly | 18.2% |
| Vendor Specific CVE | 14.5% |
| Command Injection | 14.3% |
| XSS | 7.3% |
| Fake Bots | 5.8% |
| File Inclusion | 2.3% |
| Deserialization | 1.2% |
| Information Disclosure | 0.6% |
| Other | 1.3% |
Looking ahead
In this post we shared some initial insights around Internet application security trends based on traffic to Cloudflare’s network. Of course, we have only just scratched the surface. Moving forward, we plan to publish quarterly reports with dynamic filters directly on Cloudflare Radar and provide much deeper insights and investigations.