Monsters in the Middleboxes: Introducing Two New Tools for Detecting HTTPS Interception

The practice of HTTPS interception continues to be commonplace on the Internet. HTTPS interception has encountered scrutiny, most notably in the 2017 study “The Security Impact of HTTPS Interception” and the United States Computer Emergency Readiness Team (US-CERT) warning that the technique weakens security. In this blog post, we provide a brief recap of HTTPS interception and introduce two new tools:

MITMEngine, an open-source library for HTTPS interception detection, and
MALCOLM, a dashboard displaying metrics about HTTPS interception we observe on Cloudflare’s network.

In a basic HTTPS connection, a browser (client) establishes a TLS connection directly to an origin server to send requests and download content. However, many connections on the Internet are not directly from a browser to the server serving the website, but instead traverse through some type of proxy or middlebox (a “monster-in-the-middle” or MITM). There are many reasons for this behavior, both malicious and benign.

Types of HTTPS Interception, as Demonstrated by Various Monsters in the Middle

One common HTTPS interceptor is TLS-terminating forward proxies. (These are a subset of all forward proxies; non-TLS-terminating forward proxies forward TLS connections without any ability to inspect encrypted traffic). A TLS-terminating forward proxy sits in front of a client in a TLS connection, transparently forwarding and possibly modifying traffic from the browser to the destination server. To do this, the proxy must terminate the TLS connection from the client, and then (hopefully) re-encrypt and forward the payload to the destination server over a new TLS connection. To allow the connection to be intercepted without a browser certificate warning appearing at the client, forward proxies often require users to install a root certificate on their machine so that the proxy can generate and present a trusted certificate for the destination to the browser. These root certificates are often installed for corporate managed devices, done by network administrators without user intervention.

Antivirus and Corporate Proxies

Some legitimate reasons for a client to connect through a forward proxy would be to allow antivirus software or a corporate proxy to inspect otherwise encrypted data entering and leaving a local network in order to detect inappropriate content, malware, and data breaches. The Blue Coat data loss prevention tools offered by Symantec are one example. In this case, HTTPS interception occurs to check if an employee is leaking sensitive information before sending the request to the intended destination.

Malware Proxies

Malicious forward proxies, however, might insert advertisements into web pages or exfiltrate private user information. Malware like Superfish insert targeted ads into encrypted traffic, which requires intercepting HTTPS traffic and modifying the content in the response given to a client.

Leaky Proxies

Any TLS-terminating forward proxy--whether it’s well-intentioned or not--also risks exposing private information and opens the door to spoofing. When a proxy root certificate is installed, Internet browsers lose the ability to validate the connection end-to-end, and must trust the proxy to maintain the security of the connection to ensure that sensitive data is protected. Some proxies re-encrypt and forward traffic to destinations using less secure TLS parameters.

Proxies can also require the installation of vendor root certificates that can be easily abused by other malicious parties. In November 2018, a type of Sennheiser wireless headphones required the user to install a root certificate which used insecure parameters. This root certificate could allow any adversary to impersonate websites and send spoofed responses to machines with this certificate, as well as observe otherwise encrypted data.

TLS-terminating forward proxies could even trust root certificates considered insecure, like Symantec’s CA. If poorly implemented, any TLS-terminating forward proxy can become a widespread attack vector, leaking private information or allowing for response spoofing.

Reverse Proxies

Reverse proxies also sit between users and origin servers. Reverse proxies (such as Cloudflare and Akamai) act on behalf of origin servers, caching static data to improve the speed of content delivery and offering security services such as DDoS mitigation. Critically, reverse proxies do not require special root certificates to be installed on user devices, since browsers establish connections directly to the reverse proxy to download content that is hosted at the origin server. Reverse proxies are often used by origin servers to improve the security of client HTTPS connections (for example, by enforcing strict security policies and using the newest security protocols like TLS 1.3). In this case, reverse proxies are intermediaries that provide better performance and security to TLS connections.

Why Continue Examining HTTPS Interception?

In a previous blog post, we argued that HTTPS interception is prevalent on the Internet and that it often degrades the security of Internet connections. A server that refuses to negotiate weak cryptographic parameters should be safe from many of the risks of degraded connection security, but there are plenty of reasons why a server operator may want to know if HTTPS traffic from its clients has been intercepted.

First, detecting HTTPS interception can help a server to identify suspicious or potentially vulnerable clients connecting to its network. A server can use this knowledge to notify legitimate users that their connection security might be degraded or compromised. HTTPS interception also increases the attack surface area of the system, and presents an attractive target for attackers to gain access to sensitive connection data.

Second, the presence of content inspection systems can not only weaken the security of TLS connections, but it can hinder the adoption of new innovations and improvements to TLS. Users connecting through older middleboxes may have their connections downgraded to older versions of TLS the middleboxes still support, and may not receive the security, privacy, and performance benefits of new TLS versions, even if newer versions are supported by both the browser and the server.

Introducing MITMEngine: Cloudflare’s HTTPS Interception Detector

Many TLS client implementations can be uniquely identified by features of the Client Hello message such as the supported version, cipher suites, extensions, elliptic curves, point formats, compression, and signature algorithms. The technique introduced by “The Security Impact of HTTPS Interception” is to construct TLS Client Hello signatures for common browser and middlebox implementations. Then, to identify HTTPS requests that have been intercepted, a server can look up the signature corresponding to the request’s HTTP User Agent, and check if the request’s Client Hello message matches the signature. A mismatch indicates either a spoofed User Agent or an intercepted HTTPS connection. The server can also compare the request’s Client Hello to those of known HTTPS interception tools to understand which interceptors are responsible for intercepting the traffic.

The Caddy Server MITM Detection tool is based on these heuristics and implements support for a limited set of browser versions. However, we wanted a tool that could be easily applied to the broad set of TLS implementations that Cloudflare supports, with the following goals:

Maintainability: It should be easy to add support for new browsers and to update existing browser signatures when browser updates are released.
Flexibility: Signatures should be able to capture a wide variety of TLS client behavior without being overly broad. For example, signatures should be able to account for the GREASE values sent in modern versions of Chrome.
Performance: Per-request MITM detection should be cheap so that the system can be deployed at scale.

To accomplish these goals, the Cryptography team at Cloudflare developed MITMEngine, an open-source HTTPS interception detector. MITMEngine is a Golang library that ingests User Agents and TLS Client Hello fingerprints, then returns the likelihood of HTTPS interception and the factors used to identify interception. To learn how to use MITMEngine, check out the project on GitHub.

MITMEngine works by comparing the values in an observed TLS Client Hello to a set of known browser Client Hellos. The fields compared include:

TLS version,
Cipher suites,
Extensions and their values,
Supported elliptic curve groups, and
Elliptic curve point formats.

When given a pair of User Agent and observed TLS Client Hello, MITMEngine detects differences between the given Client Hello and the one expected for the presented User Agent. For example, consider the following User Agent:

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/47.0.2526.111 Safari/537.36

This User Agent corresponds to Chrome 47 running on Windows 7. The paired TLS Client Hello includes the following cipher suites, displayed below as a hex dump:

0000  c0 2b c0 2f 00 9e c0 0a  c0 14 00 39 c0 09 c0 13   .+./.... ...9....
0010  00 33 00 9c 00 35 00 2f  00 0a                     .3...5./ ..

These cipher suites translate to the following list (and order) of 13 ciphers:

TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 (0xc02b)
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (0xc02f)
TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 (0x009e)
TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA (0xc00a)
TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (0xc014)
TLS_DHE_RSA_WITH_AES_256_CBC_SHA (0x0039)
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA (0xc009)
TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (0xc013)
TLS_DHE_RSA_WITH_AES_128_CBC_SHA (0x0033)
TLS_RSA_WITH_AES_128_GCM_SHA256 (0x009c)
TLS_RSA_WITH_AES_256_CBC_SHA (0x0035)
TLS_RSA_WITH_AES_128_CBC_SHA (0x002f)
TLS_RSA_WITH_3DES_EDE_CBC_SHA (0x000a)

The reference TLS Client Hello cipher suites for Chrome 47 are the following:

0000  c0 2b c0 2f 00 9e cc 14  cc 13 c0 0a c0 14 00 39   .+./.... .......9
0010  c0 09 c0 13 00 33 00 9c  00 35 00 2f 00 0a         .....3.. .5./..

Looking closely, we see that the cipher suite list for the observed traffic is shorter than we expect for Chrome 47; two cipher suites have been removed, though the remaining cipher suites remain in the same order. The two missing cipher suites are

TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 (0xcc14)
TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 (0xcc13)

Chrome prioritizes these two ChaCha ciphers above AES-CBC ciphers--a good choice, given that CBC (cipher block chaining) mode is vulnerable to padding oracle attacks. It looks like the traffic we received underwent HTTPS interception, and the interceptor potentially didn't support ChaCha ciphers.

Using contextual clues like the used cipher suites, as well as additional User Agent text, we can also detect which software was used to intercept the HTTPS connection. In this case, MITMEngine recognizes that the fingerprint observed actually matches a fingerprint collected from Sophos antivirus software, and indicates that this software is likely the cause of this interception.

We welcome contributions to MITMEngine. We are particularly interested in collecting more fingerprints of MITM software and browser TLS Client Hellos, because MITMEngine depends on these reference fingerprints to detect HTTPS interception. Contributing these fingerprints is as simple as opening Wireshark, capturing a pcap file with a TLS Client Hello, and submitting the pcap file in a PR. More instructions on how to contribute can be found in the MITMEngine documentation.

Observing HTTPS Interception on Cloudflare’s Network with MALCOLM

To complement MITMEngine, we also built a dashboard, MALCOLM, to apply MITMEngine to a sample of Cloudflare’s overall traffic and observe HTTPS interception in the requests hitting our network. Recent MALCOLM data incorporates a fresh set of reference TLS Client Hellos, so readers will notice that percentage of "unknown" instances of HTTPS interception has decreased from Feburary 2019 to March 2019.

In this section of this blog post, we compare HTTPS interception statistics from MALCOLM to the 2017 study “The Security Impact of HTTPS Interception”. With this data, we can see the changes in HTTPS interception patterns observed by Cloudflare over the past two years.

Using MALCOLM, let’s see how HTTPS connections have been intercepted as of late. This MALCOLM data was collected between March 12 and March 13, 2019.

The 2017 study found that 10.9% of Cloudflare-bound TLS Client Hellos had been intercepted. MALCOLM shows that the number of interceptions has increased by a substantial amount, to 18.6%:

This result, however, is likely inflated compared to the results of the 2017 study. The 2017 study considered all traffic that went through Cloudflare, regardless of whether it had a recognizable User Agent or not. MALCOLM only considers results with recognizable User Agents that could be identified by uasurfer, a Golang library for parsing User Agent strings. Indeed, when we don’t screen out TLS Client Hellos with unidentified User Agents, we see that 11.3% of requests are considered intercepted--an increase of 0.4%. Overall, the prevalence of HTTPS interception activity does not seem to have changed much over the past two years.

Next, we examine the prevalence of HTTPS interception by browser and operating system. The paper presented the following table. We’re interested in finding the most popular browsers and most frequently intercepted browsers.

MALCOLM yields the following statistics for all traffic by browsers. MALCOLM presents mobile and desktop browsers as a single item. This can be broken into separate views for desktop and mobile using the filters on the dashboard.

Chrome usage has expanded substantially since 2017, while usage of Safari, IE, and Firefox has fallen somewhat (here, IE includes Edge). Examining the most frequently intercepted browsers, we see the following results:

We see above that Chrome again accounts for a larger percentage of intercepted traffic, likely given growth in Chrome’s general popularity. As a result, HTTPS interception rates for other browsers, like Internet Explorer, have fallen as IE is less frequently used. MALCOLM also highlights the prevalence of other browsers that have their traffic intercepted--namely, UCBrowser, a browser common in China.

Now, we examine the most common operating systems observed in Cloudflare’s traffic:

Android use has clearly increased over the past two years as smartphones become peoples’ primary device for accessing the Internet. Windows also remains a common operating system.

As Android becomes more popular, the likelihood of HTTPS interception occurring on Android devices also has increased substantially:

Since 2017, Android devices have overtaken those of Windows as the most intercepted.

As more of the world’s Internet consumption occurs through mobile devices, it’s important to acknowledge that simply changing platforms and browsers has not impacted the prevalence of HTTPS interception.

Conclusion

Using MITMEngine and MALCOLM, we’ve been able to continuously track the state of HTTPS interception on over 10% of Internet traffic. It’s imperative that we track the status of HTTPS interception to give us foresight when deploying new security measures and detecting breaking changes in security protocols. Tracking HTTPS interception also helps us contribute to our broader mission of “helping to build a better Internet” by keeping tabs on software that possibly weakens good security practices.

Interested in exploring more HTTPS interception data? Here are a couple of next steps:

Check out MALCOLM, click on a couple of percentage bars to apply filters, and share any interesting HTTPS interception patterns you see!
Experiment with MITMEngine today, and see if TLS connections to your website have been impacted by HTTPS interception.
Contribute to MITMEngine!

The Cloudflare Blog