I originally wrote this article for the Web Performance Calendar website, which is a terrific resource of expert opinions on making your website as fast as possible. We thought CloudFlare users would be interested so we reproduced it here. Enjoy!
Efficiently compressing dynamically generated web content
With the widespread adoption of high bandwidth Internet connections in the home, offices and on mobile devices, limitations in available bandwidth to download web pages have largely been eliminated.
At the same time latency remains a major problem. According to a recent presentation by Google, broadband Internet latency is 18ms for fiber technologies, 26ms for cable-based services, 43ms for DSL and 150ms-400ms for mobile devices. Ultimately, bandwidth can be expanded greatly with new technologies but latency is limited by the speed of light. The latency of an Internet connection directly affects the speed with which a web page can be downloaded.
The latency problem occurs because the TCP protocol requires round trips to acknowledge received information (since packets can and do get lost while traversing the Internet) and to prevent Internet congestion TCP has mechanisms to limit the amount of data sent per round trip until it has learnt how much it can send without causing congestion.
The collision between the speed of light and the TCP protocol is made worse by the fact that web site owners are likely to choose the cheapest hosting available without thinking about its physical location. In fact, the move to ‘the cloud' encourages the idea that web sites are simply ‘out there' without taking into account the very real problem of latency introduced by the distance between the end user's web browser and the server. It is not uncommon, for example, to see web sites aimed at UK consumers being hosted in the US. A web user in London accessing a .co.uk site that is actually hosted in Chicago incurs an additional 60ms round trip time because of the distance traversed.
Dealing with speed-of-light induced latency requires moving web content closer to user who are browsing, or making the web content smaller so that fewer round trips are required (or both).
The caching challenge
But the most critical part of a web page, the actual HTML content is often dynamically generated and cannot be cached. Because none of the relatively fast to load content that's in cache cannot even be loaded before the HTML, any delay in the web browser receiving the HTML affects the entire web browsing experience.
Thus being able to deliver the page HTML as quickly as possible even in high latency environments is vital to ensuring a good browsing experience. Studies have shown that the slower the page load time the more likely the user is to give up and move elsewhere. A recent Google study said that a response time of less than 100ms is perceived by a human as ‘instant' (a human eye blink is somewhere in the 100ms to 400ms range); less than 300ms the computer seems sluggish; above 1s and the user's train of thought is lost to distraction or other thoughts. TCP's congestion avoidance algorithm means that many round trips are necessary when downloading a web page. For example, getting just the HTML for the CNN home page takes approximately 15 round trips; it's not hard to see how long latency can quickly multiply into a situation where the end-user is losing patience with the web site.
Unfortunately, it is not possible to cache the HTML of most web pages because it is dynamically generated. Dynamic pages are commonplace because the HTML is programmatically generated and not static. For example, a news web site will generate fresh HTML as news stories change or to show a different page depending on the geographical location of the end user. Many web pages are also dynamically generated because they are personalized for the end user — each person's Facebook page is unique. And web application frameworks, such as WordPress, encourage the use dynamically generate HTML by default and mark the content as uncachable.
Compression to the rescue
Given that web pages need to be dynamically generated the only viable option is to reduce the page size so that fewer TCP round trips are needed minimizing the effect of latency. The current best option for doing this is the use of the gzip encoding. On typical web page content gzip encoding will reduce the page size to about 20-25% of the original size. But this still results in multiple-kilobytes of page data being transmitted incurring the TCP congestion avoidance and latency penalty; in the CNN example above there were 15 round-trips even though the page was gzip compressed.
Gzip encoding is completely generic. It does not take into account any special features of the content it is compressing. It is also self-referential: a gzip encoded page is entirely self-contained. This is advantageous because it means that a system that uses gzipped content can be stateless, but it means that even larger compression ratios that would be possible with external dictionaries of common content are not possible.
External dictionaries increase compression ratios dramatically because the compressed data can refer to items from the dictionary. Those references can be very small (a few bytes each) but expand to very large content from the dictionary.
For example, imagine that it's necessary to transmit The King James Bible to a user. The plain text version from Project Gutenberg is 4,452,097 bytes and compressed with gzip it is 1,404,452 bytes (a reduction in size to 31%). But imagine the case where the compressor knows that the end user has a separate copy of the Old Testament and New Testament in a dictionary of useful content. Instead of transmitting a megabyte of gzip compressed content they can transmit an instruction of the form \<Insert Old Testament>\<Insert New Testament>. That instruction will just be a few bytes long.
Clearly, that's an extreme and unusual case but it highlights the usefulness of external shared dictionaries of common content that can be used to reconstruct an original, uncompressed document. External dictionaries can be applied to dynamically generated web content to achieve compression that exceeds that possible with gzip.
Caching page parts
It's not hard to imagine that for any web site there are large parts of the HTML that are the same from request to request and from user to user. Even on a very personalized site like Facebook the HTML is similar from user to user.
And as more and more applications use HTTP for APIs there's an opportunity to increase API performance through the use of shared dictionaries of JSON or XML. APIs often contain even more common, repeated parts than HTML as they are intended for machine consumption and change slowly over time (whereas the HTML of a page will change more quickly as designers update the look of a page).
Two different proposals have tried to address this in different ways: SDCH and ESI. Neither have achieved acceptance as Internet standards partly because of the added complexity of deploying them.
In 2008, a group working at Google proposed a protocol for negotiating shared dictionaries of content so that a web server can compress a page in the knowledge that a web browser has chunks of the page in its cache. The proposal is known as SDCH (Shared Dictionary Compression over HTTP). Current versions of Google Chrome use SDCH to compress Google Search results.
This can be seen in the Developer Tools in Google Chrome. Any search request will contain an HTTP header specifying that the browser accepts SDCH compressed pages:
And if SDCH is used then the server responds indicating the dictionary that was used. If necessary Chrome will retrieve the dictionary. Since the dictionary should change infrequently it will be in local web browser cache most of the time. For example, here's a sample HTTP header seen in a real response from a Google Search:
In a Google Search the dictionary is used to locally cache infrequently changing parts of a page (such as the HTML header, navigation elements and page footer).
SDCH has not achieved widespread acceptance because of the difficulty of generating the shared dictionaries. Three problems arise: when to update the dictionary, how to update the dictionary and prevention of leakage of private information.
For maximum effectiveness it's desirable to produce a shared dictionary that will be useful in reducing page sizes across a large number of page views. To do this it's necessary to either implement an automatic technique that samples real web traffic and identifies common blocks of HTML, or to determine which pages are most viewed and compute dictionaries for them (perhaps based on specialised knowledge of what parts of the page are common across requests).
When automated techniques are used it's important to ensure that when sampling traffic that contains personal information (such as for a logged in user) that personal information does not end up in the dictionary.
Although SDCH is powerful when used, these dictionary generation difficulties have prevented its widespread use. The Apache mod_sdch project is inactive and the Google SDCH group has been largely inactive since 2011.
In 2001 a consortium of companies proposed addressing both latency and common content with ESI (Edge Side Includes). Edge Side Includes work by having a web page creator identify unchanging parts of the page and then making these available as separate mini-pages using HTTP.
For example, if a page contains a common header and navigation, a web page author might place that in a separate nav.html file and then in a page they are authoring enter the following XML in place of the header and navigation HTML:
\<esi:include src="http://example.com/nav.html" "continue"/>
ESI is intended for use with HTML content that is delivered via a Content Delivery Network and major CDNs were the sponsor of the original proposal.
When a user retrieves a CDN managed page that contains ESI components the CDN reconstructs the complete page from the component parts (which the CDN will either have to retrieve, or, more likely, have in cache since they change infrequently).
The CDN delivers the complete, normal HTML to the end user, but because the CDN has access nodes all over the world the latency between the end user web browser and the CDN is minimized. ESI tries to minimize the amount of data sent between the origin web server and the CDN (where the latency may be high) while being transparent to the browser.
The biggest problem with adoption of ESI is that it forces web page authors to break pages up into blocks that can be safely cached by a CDN adding to the complexity of web page authoring. In addition, a CDN has to be used to deliver the pages as web browsers do not understand the ESI directives.
The time dimension
The SDCH and ESI approaches rely on identifying parts of pages that are known to be unchanging and can be cached either at the edge of a CDN or in a shared dictionary in a web browser.
Another approach is to consider how web pages evolve over time. It is common for web users to visit the same web pages frequently (such as news sites, online email, social media and major retailers). This may mean that a user's web browser has some previous version of the web page they are loading in its local cache. Even though that web page may be out of date it could still be used as a shared dictionary as components of it are likely to appear in the latest version of the page.
For example, a daily visit to a news web site could be speeded up if a web server were only able to send the differences between yesterday's news and today's. It's likely that most of the HTML of a page like the BBC News homepage will have remained unchanged; only the stories will be new and they will only make up a small portion of the page.
CloudFlare looked at how much dynamically generated pages change over time and found that, for example, reddit.com changes by about 2.15% over five minutes and 3.16% over an hour. The New York Times home page changes by about 0.6% over five minutes and 3% over an hour. BBC News changes by about 0.4% over five minutes and 2% over an hour. With delta compression it would be possible to turn those figures directly into a compression ratio by only sending the tiny percentage of the page that has changed. Compressing the BBC News web site to 0.4% is an enormous improvement compared to gzip's 20-25% compression ratio meaning that 116KB would result in just 464 bytes transmitted (which would likely all fit in a single TCP packet requiring a single round trip).
This delta method is the essence of RFC 3229 which was written in 2002.
This RFC proposes an extension to HTTP where a web browser can indicate to a server that it has a particular version of a page (using the value from the ETag HTTP header that was supplied when the page was previously downloaded). The receiving web server can then apply a delta compression technique (encoded using VCDIFF discussed above) to send only the parts that have changed since that particular version of the page.
The RFC also proposes that a web browser be able to send the identifiers of multiple versions of a single page so that the web server can choose among them. That way, if the web browser has multiple versions in cache there's an increased chance that the server will have one of the versions available to it for delta compression.
Although this technique is powerful because it greatly reduces the amount of data to be sent from a web server to browser it has not been widely deployed because of the enormous resources needed on web servers.
To be effective a web server would need to keep copies of versions of the pages it generates in order that when a request comes in it is able to perform delta compression. For a popular web site that would create a large storage burden; for a site with heavy personalization it would mean keeping a copy of the pages served to every single user. For example, Facebook has around 1 billion active users, just keeping a copy of the HTML of the last time they viewed their timeline would require 250TB of storage.
CloudFlare's Railgun is a transparent delta compression technology that takes advantage of CloudFlare's CDN network to greatly accelerate the transmission of dynamically generated web pages from origin web servers to the CDN node nearest end user web surfers. Unlike SDCH and ESI it does not require any work on the part of a web site creator and unlike RFC 3229 it does not require caching a version of each page for each end user.
Railgun consists of two components: the sender and the listener. The sender is installed at every CloudFlare data center around the world. The listener is a software component that customers install on their network.
The sender and listener establish a permanent TCP connection that's secured by TLS. This TCP connection is used for the Railgun protocol. It's an all binary multiplexing protocol that allows multiple HTTP requests to be run simultaneously and asynchronously across the link. To a web client the Railgun system looks like a proxy server, but instead of being a server it's a wide-area link with special properties. One of those properties is that it performs compression on non-cacheable content by synchronizing page versions.
Each end of the Railgun link keeps track of the last version of a web page that's been requested. When a new request comes in for a page that Railgun has already seen, only the changes are sent across the link. The listener component make an HTTP request to the real, origin web server for the uncacheable page, makes a comparison with the stored version and sends across the differences.
The sender then reconstructs the page from its cache and the difference sent by the other side. Because multiple users pass through the same Railgun link only a single cached version of the page is needed for delta compression as opposed to one per end user with techniques like RFC 3229.
For example, a test on a major news site sent 23,529 bytes of gzipped data which when decompressed become 92,516 bytes of page (so the page is compressed to 25.25% of its original size). Railgun compression between two version of the page at a five minute interval resulted in just 266 bytes of difference data being sent (a compression to 0.29% of the original page size). The one hour difference is 2,885 bytes (a compression to 3% of the original page size). Clearly, Railgun delta compression outperforms gzip enormously.
For pages that are frequently accessed the deltas are often so small that they fit inside a single TCP packet, and because the connection between the two parts of Railgun is kept active problems with TCP congestion avoidance are eliminated.
The use of external dictionaries of content is a powerful technique that can achieve much larger compression ratios that the self-contained gzip method. But only CloudFlare's Railgun implements delta compression in a manner that is completely transparent to end users and website owners.