Introducing thresholds in Security Event Alerting: a z-score love story

Today we are excited to announce thresholds for our Security Event Alerts: a new and improved way of detecting anomalous spikes of security events on your Internet properties. Previously, our calculations were based on z-score methodology alone, which was able to determine most of the significant spikes. By introducing a threshold, we are able to make alerts more accurate and only notify you when it truly matters. One can think of it as a romance between the two strategies. This is the story of how they met.

Author’s note: as an intern at Cloudflare I got to work on this project from start to finish from investigation all the way to the final product.

Once upon a time

In the beginning, there were Security Event Alerts. Security Event Alerts are notifications that are sent whenever we detect a threat to your Internet property. As the name suggests, they track the number of security events, which are requests to your application that match security rules. For example, you can configure a security rule that blocks access from certain countries. Every time a user from that country tries to access your Internet property, it will log as a security event. While a security event may be harmless and fired as a result of the natural flow of traffic, it is important to alert on instances when a rule is fired more times than usual. Anomalous spikes of too many security events in a short period of time can indicate an attack. To find these anomalies and distinguish between the natural number of security events and that which poses a threat, we need a good strategy.

The lonely life of a z-score

Before a threshold entered the picture, our strategy worked only on the basis of a z-score. Z-score is a methodology that looks at the number of standard deviations a certain data point is from the mean. In our current configuration, if a spike crosses the z-score value of 3.5, we send you an alert. This value was decided on after careful analysis of our customers’ data, finding it the most effective in determining a legitimate alert. Any lower and notifications will get noisy for smaller spikes. Any higher and we may miss out on significant events. You can read more about our z-score methodology in this blog post.

The following graphs are an example of how the z-score method works. The first graph shows the number of security events over time, with a recent spike.

A graph of the number of security rules broken over time with a big recent spike.

To determine whether this spike is significant, we calculate the z-score and check if the value is above 3.5:

A graph of z-score values over time with a big recent spike

As the graph shows, the deviation is above 3.5 and so an alert is triggered.

However, relying on z-score becomes tricky for domains that experience no security events for a long period of time. With many security events at zero, the mean and standard deviation depress to zero as well. When a non-zero value finally appears, it will always be infinite standard deviations away from the mean. As a result, it will always trigger an alert even on spikes that do not pose any threat to your domain, such as the below:

A graph of the number of security rules broken over time with a recent spike of five security events

With five security events, you are likely going to ignore this spike, as it is too low to indicate a meaningful threat. However, the z-score in this instance will be infinite:

A graph of z-score values over time with a recent spike to infinity

Since a z-score of infinity is greater than 3.5, an alert will be triggered. This means that customers with few security events would often be overwhelmed by event alerts that are not worth worrying about.

Letting go of zeros

To avoid the mean and standard deviation becoming zero and thus alerting on every non-zero spike, zero values can be ignored in the calculation. In other words, to calculate the mean and standard deviation, only data points that are higher than zero will be considered.

With those conditions, the same spike to five security events will now generate a different z-score:

A graph of z-score values over time with all values equal to zero

Great! With the z-score at zero, it will no longer trigger an alert on the harmless spike!

But what about spikes that could be harmful? When calculations ignore zeros, we need enough non-zero data points to accurately determine the mean and standard deviation. If only one non-zero value is present, that data point determines the mean and standard deviation. As such, the mean will always be equal to the spike, z-score will always be zero and an alert will never be triggered:

A graph of the number of security rules broken over time with a spike from zero to 1000

For a spike of 1000 events, we can tell that there is something wrong and we should trigger an alert. However, because there is only one non-zero data point, the z-score will remain zero:

The z-score does not cross the value 3.5 and an alert will not be triggered.

So what’s better? Including zeros in our calculations can skew the results for domains with too many zero events and alert them every time a spike appears. Not including zeros is mathematically wrong and will never alert on these spikes.

Threshold, the prince charming

Clearly, a z-score is not enough on its own.

Instead, we paired up the z-score with a threshold. The threshold represents the raw number of security events an Internet property can have, below which an alert will not be sent. While z-score checks whether the spike is at least 3.5 standard deviations above the mean, the threshold makes sure it is above a certain static value. If both of these conditions are met, we will send you an alert:

A graph of security events over time with a recent spike of 225 security events

The above spike crosses the threshold of 200 security events. We now have to check that the z-score is above 3.5:

A graph of z-score values over time with a recent spike crossing the value 3.5

The z-score value crosses 3.5 and an alert will be sent.

A threshold for the number of security events comes as the perfect complement. By itself, the threshold cannot determine whether something is a spike, and would simply alert on any value crossing it. This blog post describes in more detail why thresholds alone do not work. However, when paired with z-score, they are able to share their strengths and cover for each other's weaknesses. If the z-score falsely detects an insignificant spike, the threshold will stop the alert from triggering. Conversely, if a value does cross the security events threshold, the z-score ensures there is a reasonable variance from the data average before allowing an alert to be sent.

The invaluable value

To foster a successful relationship between the z-score and security events threshold, we needed to determine the most effective threshold value. After careful analysis of our previous attacks on customers, we set the value to 200. This number is high enough to filter out the smaller, noisier spikes, but low enough to expose any threats.

Am I invited to the wedding?

Yes, you are! The z-score and threshold relationship is already enabled for all WAF customers, so all you need to do is sit back and relax. For enterprise customers, the threshold will be applied to each type of alert enabled on your domain.

Happily ever after

The story certainly does not end here. We are constantly iterating on our alerts, so keep an eye out for future updates on the road to make our algorithms even more personalized for your Internet properties!

The Cloudflare Blog

Introducing thresholds in Security Event Alerting: a z-score love story

Once upon a time

The lonely life of a z-score

Letting go of zeros

Threshold, the prince charming

The invaluable value

Am I invited to the wedding?

Happily ever after

Celebrate Micro-Small, and Medium-sized Enterprises Day with Cloudflare

Everything you need to know about NIST’s new guidance in “SP 1800-35: Implementing a Zero Trust Architecture”

Cloudflare Log Explorer is now GA, providing native observability and forensics

Celebrating 11 years of Project Galileo’s global impact