A Solution to Compression Oracles on the Web

CC 3.0 by Jean-Jacques MILAN

This is a guest post by Blake Loring, a PhD student at Royal Holloway, University of London. Blake worked at Cloudflare as an intern in the summer of 2017.

Compression is often considered an essential tool when reducing the bandwidth usage of internet services. The impact that the use of such compression schemes can have on security, however, has often been overlooked. The recently detailed CRIME, BREACH, TIME and HEIST attacks on TLS have shown that if an attacker can make requests on behalf of a user then secret information can be extracted from encrypted messages using only the length of the response. Deciding whether an element of a web-page should be secret often depends on the content of the page, however there are some common elements of web-pages which should always remain secret such as Cross-Site Request Forgery (CSRF) tokens. Such tokens are used to ensure that malicious webpages cannot forge requests from a user by enforcing that any request must contain a secret token included in a previous response.

I worked at Cloudflare last summer to investigate possible solutions to this problem. The result is a project called cf-nocompress. The aim of this project was to develop a tool which automatically mitigates instances of the attack, in particular CSRF extraction, on Cloudflare hosted services transparently without significantly impacting the effectiveness of compression. We have published a proof-of-concept implementation on GitHub, and provide a challenge site and tool which demonstrates the attack in action).

The Problem

Most web compression schemes reduce the size of data by replacing common sequences with references to a dictionary of terms created during the compression. When using such compression schemes the size of the encrypted response will be reduced if there are repeated strings within the plaintext. This can be exploited through the use of a canary, an element in a request which we know will be added to the response, to test whether a string exists within the original response using the compressed response length. From this we can extract the contents of portions of a webpage incrementally by guessing each subsequent character. This attack creates an opportunity for malicious JavaScript to extract CSRF tokens and other confidential information from a webpage through malicious code served to a browser using either a packet sniffer (a methodology created by Duong and Rizzo as part of the BEAST attack) or JavaScript APIs which reveal network statistics (described by Vanhoef and Van Goethem in HEIST).

There are two common mitigation schemes for this attack. The first is to send a unique CSRF every time a page is loaded. By removing the consistent element from the page the threat of attack is removed. This approach requires the server to keep state of valid CSRFs and whether they have been used, additionally it can only be used to protect page tokens and not user-readable data. Another approach is to XOR all secrets in a response with a per-request random number and then transmit the number with the response. Once received a piece of JavaScript can then be used to recover the original secret by XORing the data again. Alternatively, the server can be modified to expect the XOR variant and the random number rather than the original secret. This approach allows for all secrets to be protected, however it requires client side post-processing. Additionally, both approaches require extensive, per page, modification which make mitigation incredibly cumbersome in practice. At present the only way to fully mitigate such an attack is to disable compression entirely on vulnerable websites, an impractical solution for most websites and content delivery networks.

Our Solution

We decided to use selective compression, compressing only non-secret parts of a page, in order to stop the extraction of secret information from a page. We found that in most cases a secret within a webpage can be described in terms of a classical regular expression. These descriptions allow us to identify secrets online as a response is streamed. Once the secrets are identified they can be flagged so that a modified compression library can ensure that they are not added to the dictionary. The primary advantage of this approach is that protection can be offered transparently by the web-server and the application does not need to be modified as long as a regular expression can be used to clearly express which portions of a response are secret. In addition, we do not need to maintain state for each user or require client-side JavaScript to appropriately render the page.

The proof-of-concept is implemented as a plugin for NGINX and requires a small patch to the gzip module. The plugin uses sregex to identify secrets within a page. The modified gzip functions as normal, however when a secret is processed compression is disabled. This ensures secrets do not get added to the compression dictionary, removing any on response size.

Additional security considerations

The regular expression matching engine we use in this proof-of-concept is not guaranteed to run in constant time. As such, matching a string against some regular expressions could introduce a timing based side-channel attack. This issue is compounded by the complexity of modern regular expressions as matching time can often be non-intuitive. Whilst in many cases the risk such an attack would pose is minimal, a limited matcher with constant runtime and restrictions on unbounded loops should be developed if our mitigation is adopted.

The Challenge Site

We have set up the challenge website compression.website with protection, and a clone of the site compression.website/unsafe without it. The page is a simple form with a per-client CSRF designed to emulate common CSRF protection. Using the example attack presented with the library we have shown that we are able to extract the CSRF from the size of request responses in the unprotected variant but we have not been able to extract it on the protected site. We welcome attempts to extract the CSRF without access to the unencrypted response.

The Cloudflare Blog

A Solution to Compression Oracles on the Web

The Problem

Our Solution

Additional security considerations

The Challenge Site

Shared Dictionaries: compression that keeps up with the agentic web

Securing non-human identities: automated revocation, OAuth, and scoped permissions

Scaling MCP adoption: Our reference architecture for simpler, safer and cheaper enterprise deployments of MCP

Managed OAuth for Access: make internal apps agent-ready in one click