How we made Firewall Rules

Recently we launched Firewall Rules, a new feature that allows you to construct expressions that perform complex matching against HTTP requests and then choose how that traffic is handled. As a Firewall feature you can, of course, block traffic. The expressions we support within Firewall Rules along with powerful control over the order in which they are applied allows complex new behaviour.

In this blog post I tell the story of Cloudflare’s Page Rules mechanism and how Firewall Rules came to be. Along the way I’ll look at the technical choices that led to us building the new matching engine in Rust.

The evolution of the Cloudflare Firewall

Cloudflare offers two types of firewall for web applications, a managed firewall in the form of a WAF where we write and maintain the rules for you, and a configurable firewall where you write and maintain rules. In this article, we will focus on the configurable firewall.

One of the earliest Cloudflare firewall features was the IP Access Rule. It dates backs to the earliest versions of the Cloudflare Firewall and simply allows you to block traffic from specific IP addresses:

if request IP equals 203.0.113.1 then block the request

As attackers and spammers frequently launched attacks from a given network we also introduced the ASN matching capability:

if request IP belongs to ASN 64496 then block the request

We also allowed blocking ranges of addresses defined by CIDR notation when an IP was too specific and an ASN too broad:

if request IP is within 203.0.113.0/24 then block the request

Blocking is not the only action you might need and so other actions are available:

Allowlist = apply no other firewall rules and allow the request to pass this part of the firewall
Challenge = issue a CAPTCHA and if this is passed then allow the request but otherwise deny. This would be used to determine if the request came from a human operator
JavaScript challenge = issue an automated JavaScript challenge and if this is passed then allow the request. This would be used to determine if the request came from a simple stateless bot (perhaps a wget or curl script)
Block = deny the request

Cloudflare also has Page Rules. Page Rules allow you to match full URIs and then perform actions such as redirects or to raise the security level which can be considered firewall functions:

if request URI matches /nullroute then redirect to http://127.0.0.1

Cloudflare also added GeoIP information within an HTTP header and the firewall was extended to include that:

if request IP originates from county GB then CAPTCHA the request

All of the above existed in Cloudflare pre-2014, and then during 2016 we set about to identify the most commonly requested firewall features (according to Customer Support tickets and feedback from paying customers) and provide a self-service solution. From that analysis, we added three new capabilities during late 2016: Rate Limiting, User Agent Rules, and Zone Lockdown.

Whilst Cloudflare automatically stops very large denial of service attacks, rate limiting allowed customers to stop smaller attacks that were a real concern to them but were low enough volume that Cloudflare’s DDoS defences were not being applied.

if request method is POST and request URI matches /wp-admin/index.php and response status code is 403 and more than 3 requests like this occur in a 15 minute time period then block the traffic for 2 hours

User Agent rules are as simple as:

if request user_agent is `Fake User Agent` then CAPTCHA the request

Zone Lockdown, however was the first default deny feature. The Cloudflare Firewall could be thought of as “allow all traffic, except where a rule exists to block it”. Zone Lockdown is the opposite “for a given URI, block all traffic, except where a rule exists to allow it”.

Zone Lockdown allowed customers could to block access to a public website for all but a few IP addresses or IP ranges. For example, many customers wanted access to a staging website to only be available to their office IP addresses.

if request URI matches https://staging.example.com/ and request IP not in 203.0.113.0/24 then block the request

Finally, an Enterprise customer could also contact Cloudflare and have a truly bespoke rule created for them within the WAF engine.

Seeing the problem

The firewall worked well for simple mitigation, but it didn’t fully meet the needs of our customers.

Each of the firewall features had targeted a single attribute, and the interfaces and implementations reflected that. Whilst the Cloudflare Firewall had evolved to solve a problem as each problem arose, these did not work together. In late 2017 you could sum up the firewall capabilities as:

You can block any attack traffic on any criteria, so long as you only pick one of:

IP
CIDR
ASN
Country
User Agent
URI

We saw the problem, but how to fix it?

We match our firewall rules in two ways:

Lookup matching
String pattern matching

Lookup matching covers the IP, CIDR, ASN, Country and User Agent rules. We would create a key in our globally distributed key/value data store Quicksilver, and store the action in the value:

Key   = zone:www.example.com_ip:203.0.113.1
Value = block

When a request for www.example.com is received, we look at the IP address of the client that made the request, construct the key and perform the lookup. If the key exists in the store, then the value would tell us what action to perform, in this case if the client IP were 203.0.113.1 then we would block the request.

Lookup matching is a joy to work with, it is O(1) complexity meaning that a single request would only perform a single lookup for an IP rule regardless of how many IP rules a customer had. Whilst most customers had a few rules, some customers had hundreds of thousands of rules (typically created automatically by combining fail2ban or similar with a Cloudflare API call).

Lookups work well when you are only looking up a single value. If you need to combine an IP and a User Agent we would need to produce keys that composed these values together. This massively increases the number of keys that you need to publish.

String pattern matching occurs where URI matching is required. For our Page Rules feature this meant combining all of the Page Rules into a single regular expression that we would apply to the request URI whilst handling a request.

If you had Page Rules that said (in order):

Match */wp-admin/index.php and then block
Then match */xmlrpc.php and then block

These are converted into:

^(?<block__1>(?:.*/wp-admin/index.php))|(?<block__2>(?:.*/xmlrpc.php))$

Yes, you read that correctly. Each Page Rule was appended to a single regular expression in the order of execution, and the naming group is used as an overload for the desired action.

This works surprisingly well as regular expression matching can be simple and fast especially when the regular expression matches against a single value like the URI, but as soon as you want to match the URI plus an IP range it becomes less obvious how to extend this.

This is what we had, a set of features that worked really well providing you want to match a single property of a request. The implementation also meant that none of these features could be trivially extended to embrace multiple properties at a time. We needed something else, a fast way to compute if a request matches a rule that could contain multiple properties as well as pattern matching.

A solution that works now and in the future

Over time Cloudflare engineers authored internal posts exploring how a new matching engine might work. The first thing that occurred to every engineer was that the matching must be an expression. These ideas followed a similar approach which we would construct an expression within JSON as a DSL (Domain Specific Language) of our expression language. This DSL could describe matching a request and a UI could render this, and a backend could process it.

Early proposals looked like this:

{
  "And": [
    {
      "Equals"{
        "host": "www.example.com"
      }
    },
    "Or": [
      {
        "Regex": {
          "path": "^(?: .*/wp-admin/index.php)$"
        }
      }{
        "Regex": {
          "path": "^(?: .*/xmlrpc.php)$"
        }
      }
    ]
  ]
}

The JSON describes an expression that computers can easily turn into a rule to apply, but people find this hard to read and work with.

As we did not wish to display JSON like this in our dashboard we thought about how we might summarise it for a UI:

if request host equals www.example.com and (request path matches ^(?:.*/wp-admin/index.php)$ or request path matches ^(?:.*/xmlrpc.php)$)

And there came an epiphany. As engineers working we’ve seen an expression language similar to this before, so may I introduce to you our old friend Wireshark®.

Wireshark is a network protocol analyzer. To use it you must run a packet capture to record network traffic from a capture device (usually a network card). This is then saved to disk as a .pcap file which you subsequently open in the Wireshark GUI. The Wireshark GUI has a display filter entry box, and when you fill in a display filter the GUI will dissect the saved packet capture such that it will determine which packets match the expression and then show those in the GUI.

But we don’t need to do that. In fact, for our scenario that approach does not work as we have a firewall and need to make decisions in real-time as part of the HTTP request handling rather than via the packet capture process.

For Cloudflare, we would want to use something like the expression language that is the Wireshark Display Filters but without the capture and dissection as we would want to do this potentially thousands of times per request without noticeable delay.

If we were able to use a Wireshark-style expression language then we can reduce the JSON encapsulated expression above to:

http.host eq "www.example.com" and (http.request.path ~ "wp-admin/index\.php" or http.request.path ~ "xmlrpc.php")

This is human readable, machine parseable, succinct.

It also benefits from being highly similar to Wireshark. For security engineers used to working with Wireshark when investigating attacks it offers a degree of portability from an investigation tool to a mitigation engine.

To make this work we would need to collect the properties of the request into a simple data structure to match the expressions against. Unlike the packet capture approach we run our firewall within the context of an HTTP server and the web server has already computed the request properties, so we can avoid dissection and populate the fields from the web server knowledge:

Field	Value
http.cookie	`session=8521F670545D7865F79C3D7BED C29CCE;-background=light`
http.host	`www.example.com`
http.referer
http.request.method	`GET`
http.request.uri	`/articles/index?section=539061&expand=comments`
http.request.uri.path	`/articles/index`
http.request.uri.query	`section=539061&expand=comments`
http.user_agent	`Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36`
http.x_forwarded_for
ip.src	`203.0.113.1`
ip.geoip.asnum	`64496`
ip.geoip.country	`GB`
ssl	`true`

With a table of HTTP request properties and an expression language that can provide a matching expression we were 90% of the way towards a solution! All we needed for the last 90% was the matching engine itself that would provide us with an answer to the question: Does this request match one of the expressions?

Enter wirefilter.

Wirefilter is the name of the Rust library that Cloudflare has created, and it provides:

The ability for Cloudflare to define a set of fields of types, i.e. ip.src is a field of type IPAddress
The ability to define a table of properties from all of the fields that are defined
The ability to parse an expression and to say whether it is syntactically valid, whether the fields in the expression are valid against the fields defined, and whether the operators used for a field are valid for the type of the field
The ability to apply an expression to a table and return a true|false response indicating whether the evaluated expression matches the request

It is named wirefilter as a hat tip towards Wireshark for inspiring our Wireshark-like expression language and also because in our context of the Cloudflare Firewall these expressions act as a filter over traffic.

The implementation of wirefilter allows us to embed this matching engine within our REST API which is written in Go:

// scheme stores the list of fields and their types that an expression can use
var scheme = filterexpr.Scheme{
	"http.cookie":                     filterexpr.TypeString,
	"http.host":                       filterexpr.TypeString,
	"http.referer":                    filterexpr.TypeString,
	"http.request.full_uri":           filterexpr.TypeString,
	"http.request.method":             filterexpr.TypeString,
	"http.request.uri":                filterexpr.TypeString,
	"http.request.uri.path":           filterexpr.TypeString,
	"http.request.uri.query":          filterexpr.TypeString,
	"http.user_agent":                 filterexpr.TypeString,
	"http.x_forwarded_for":            filterexpr.TypeString,
	"ip.src":                          filterexpr.TypeIP,
	"ip.geoip.asnum":                  filterexpr.TypeNumber,
	"ip.geoip.country":                filterexpr.TypeString,
	"ssl":                             filterexpr.TypeBool,
}

Later we validate expressions provided to the API:

// expression here is a string that may look like:
// `ip.src eq 203.0.113.1`
expressionHash, err := filterexpr.ValidateFilter(scheme, expression)
if fve, ok := err.(*filterexpr.ValidationError); ok {
	validationErrs = append(validationErrs, fve.Ascii)
} else if err != nil {
	return nil, stderrors.Errorf("failed to validate filter: %v", err)
}

This tells us whether the expression is syntactically correct and also whether the field operators and values match the field type. If the expression is valid then we can use the returned hash to determine uniqueness (the hash is generated inside wirefilter so that uniqueness can ignore blanks and minor differences).

The expressions are then published to our global network of PoPs and are consumed by Lua within our web proxy. The web proxy has the same list of fields that the API does, and is now responsible for building the table of properties from the context within the web proxy:

-- The `traits` table defines the mapping between the fields and
-- the corresponding values from the nginx evaluation context.
local traits = {
   ['http.host'] = field.str(function(ctx) return ctx.host end),
   ['http.cookie'] = field.str(function(ctx)
      local value = ctx.req_headers.cookie or ''
      if type(value) == 'table' then
         value = table.concat(value, ";")
      end
      return value
   end),
   ['http.referer'] = field.str(function(ctx) return ctx.req_headers.referer or '' end),
   ['http.request.method'] = field.str(function(ctx) return ctx.method end),
   ['http.request.uri'] = field.str(function(ctx)
      return ctx.rewrite_uri or ctx.request_uri
   end),
   ['http.request.uri.path'] = field.str(function(ctx)
      return ctx.uri or '/'
   end),
   ...

With this per-request table describing a request we can see test the filters. In our case what we’re doing here is:

Fetch a list of all the expressions we would like to match against the request
Check whether any expression, when applied via wirefilter to the table above, return true as having matched
For all matched expressions check the associated actions and their priority

The actions are not part of the matching itself. Once we have a list of matched expressions we determine which action takes precedence and that is the one that we will execute.

Wirefilter then, is a generic library that provides this matching capability that we’ve plugged into our Go APIs and our Lua web proxy, and we use that to power the Cloudflare Firewall.

We chose Rust for wirefilter as early in the project we recognised that if we attempted to make implementations of this in Go and Lua, that it would result in inconsistencies that attackers may be able to exploit. We needed our API and edge proxy to behave exactly the same. For this needed a library, both could call and we could choose one of our existing languages at the edge like C, C++, Go, Lua or even implement this not as a library but as a worker in JavaScript. With a mixed set of requirements of performance, memory safety, low memory use, and the capability to be part of other products that we’re working on like Spectrum, Rust stood out as the strongest option.

With a library in place and the ability to now match all HTTP traffic, how to get that to a public API and UI without diluting the capability? The problems that arose related to specificity and mutual exclusion.

In the past all of our firewall rules had a single dimension to them: i.e. act on IP addresses. And this meant that we had a single property of a single type and whilst there were occasionally edge cases for the most part there were strategies to answer the question “Which is the most specific rule?”. I.e. an IP address is more specific then a /24 which is more specific than a /8. Likewise with URI matching an overly simplistic strategy is that the longer a URI the more specific it is. And if we had 2 IP rules, then only 1 could ever have matched as a request does not come from 2 IPs at once so mutual exclusion is in effect.

The old system meant that given 2 rules, we could implicitly and trivially say “this rule is most specific so use the action associated with this rule”.

With wirefilter powering Firewall Rules, it isn’t obvious that an IP address is more or less specific when compared to a URI. It gets even more complex when a rule can have negation, as a rule that matches a /8 is less specific than a rule that does not match a single IP (the whole address space except this IP - one of the gotchas of Firewall Rules is also a source of it’s power; you can invert your firewall into a positive security model.

As we couldn’t answer specificity using the expression alone, we needed another aspect of the Firewall Rule to provide us this guidance and we realised that customers already had a mechanism to tell us which rules were important… the action.

Given a set of rules, we logically have ordered them according to their action (Log has highest priority, Block has lowest):

Log
Allow
Challenge (CAPTCHA)
JavaScript Challenge
Block

For the vast majority of scenarios this proves to be good enough.

What about when that isn’t good enough though? Do we have examples of complex configuration that break that approach? Yes!

Because the expression language within Firewall Rules is so powerful, and we can support many Firewall Rules, it means that we can now create different firewall configuration for different parts of a web site. i.e. /blog could have wholly different rules than /shop, or for different audiences, i.e. visitors from your office IPs might be allowed on a given URI but everyone else trying to access that URI may be blocked.

In this scenario you need the ability to say “run all of these rules first, and then run the other rules”.

In single machine firewalls like iptables, OS X Firewall, or your home router firewall, the firewall rules were explicitly ordered so that when you match the first rule it terminates execution and you won’t hit the next rule. When you add a new rule the entire set of rules is republished and this helps to guarantee this behaviour. But this approach does not work well for a Cloud Firewall as a large website with many web applications typically also has a large number of firewall rules. Republishing all of these rules in a single transaction can be slow and if you are adding lots of rules quickly this can lead to delays to the final state being live.

If we published individual rules and supported explicit ordering, we risked race conditions where two rules that both were configured in position 4 might exist at the same time and the behaviour if they matched the request would be non-determinable.

We solved this by introducing a priority value, where 1 is the highest priority and as an int32 you can create low priority rules all the way down to priority = 2147483647. Not providing a priority value is the equivalent of “lowest” and runs after all rules that have a priority.

Priority does not have to be a unique value within Firewall Rules. If two rules are of equal priority then we resort to the order of precedence of the actions as defined earlier.

This provides us a few benefits:

Because priority allows rules that share a priority to exist we can publish rules 1 at a time… when you add a new rule the speed at which we deploy that globally is not affected by the number of rules you already have.
If you do have existing rules in a system that does sequentially order the rules, you can import those into Firewall Rules and preserve their order, i.e. this rule should always run before that rule.
But you don’t have to use priority exclusively for ordering as you can also use priority for grouping. For example you may say that all spammers are priority=10000 and all trolls are priority = 5000.

Finally… let’s look at those fields again, http.request.path notice that http prefix? By following the naming convention Wireshark has (see their Display Filter Reference) we have not limited this firewall capability solely to a HTTP web proxy. It is a small leap to imagine that if a Spectrum application declares itself as running SMTP that we could also define fields that understand SMTP and allow filtering of traffic on other application protocols or even at layer 4.

What we have built in Firewall Rules gives us these features today:

Rich expression language capable of targeting traffic precisely and in real-time
Fast global deployment of individual rules
A lot of control over the management and organisation of Firewall Rules

And in the future, we have a product that can go beyond HTTP and be a true Cloud Firewall for all protocols…the Cloudflare Firewall with Firewall Rules.

The Cloudflare Blog

How we made Firewall Rules

The evolution of the Cloudflare Firewall

Seeing the problem

A solution that works now and in the future

Celebrate Micro-Small, and Medium-sized Enterprises Day with Cloudflare

Everything you need to know about NIST’s new guidance in “SP 1800-35: Implementing a Zero Trust Architecture”

Cloudflare Log Explorer is now GA, providing native observability and forensics

Celebrating 11 years of Project Galileo’s global impact