Certificate Transparency (CT) is an ambitious project to help improve security online by bringing accountability to the system that protects HTTPS. Cloudflare is announcing support for this project by introducing two new public-good services:
Nimbus: A free and open certificate transparency log
Merkle Town: A dashboard for exploring the certificate transparency ecosystem
In this blog post we’ll explain what Certificate Transparency is and how it will become a critical tool for ensuring user safety online. It’s important for website operators and certificate authorities to learn about CT as soon as possible, because participating in CT becomes mandatory in Chrome for all certificates issued after April 2018. We’ll also explain how Nimbus works and how CT uses a structure called a Merkle tree to scale to the point of supporting all trusted certificates on the Internet. For more about Merkle Town, read the follow up post by my colleague Patrick Donahue.
Trust and Accountability
Everything we do online requires a baseline level of trust. When you use a browser to visit your bank’s website or your favorite social media site, you expect that the server on the other side of the connection is operated by the organization indicated in the address bar. This expectation is based on trust, and this trust is supported by a system called the web Public Key Infrastructure (web PKI).
From a high level, a PKI is similar to a system of notaries who can grant servers the authority to serve websites by giving them a signed object called a digital certificate. This certificate contains the name of the website, the name of the organization that requested the certificate, how long it’s valid for, and a public key. For each public key, there is an associated private key. In order to serve an HTTPS site with a given certificate, the server needs to prove ownership of the associated private key.
Sites obtain digital certificates from trusted third parties called Certificate Authorities (CAs). CAs validate the operator’s ownership of a domain before issuing them a certificate. If the issuing CA for a certificate is trusted by the browser, then that certificate can be used to serve content over HTTPS to visitors of the site. All this happens under the hood of the browser and is only surfaced to user by the little green lock in your address bar, or a nasty error message if things go wrong.
The Web’s PKI is an elaborate and complicated system. There are dozens of organizations that operate the certificate authorities trusted by popular browsers. Different browsers manage their own “root programs” in which they choose which certificate authorities are trusted. Becoming a trusted CA often requires passing an audit (WebTrust for CAs) and promising to follow a set of rules called the Baseline Requirements. These rules are set by a standards body called the CA/Browser forum, which consists of browsers and CAs. Each root program has their own application process and program-specific guidelines.
This system works great, until it doesn’t. As a system of trust, there are many ways for the PKI to fail. One of the risks inherent in the web PKI is that any certificate authority can issue any certificate for any website. That means that the Hong Kong Post Office can issue a certificate valid for gmail.com or facebook.com, and every browser will trust it. This forces users to put a lot of faith in certificate authorities and hope they don’t misbehave by issuing a certificate to the wrong person. Even worse, if a CA gets hacked, then an attacker could issue any certificate without the CA knowing, putting users at an even greater risk.
If someone manages to get a certificate for a site that isn’t theirs, they can impersonate that site to its visitors and steal their information. To get a sense of how many CAs are trusted by popular browsers, here’s the list of the 160 trusted CAs in Firefox. Microsoft and Apple maintain their own lists, which are comparably long (Cloudflare keeps track of these lists in our cfssl_trust repository). By trusting all of these organizations implicitly when you browse the internet, users bear the risk of certificate authority misbehavior.
What’s missing from the PKI is accountability. If a mis-issued certificate is used to target an individual, there is no feedback mechanism to let anyone know that the CA misbehaved.
This is not a theoretical situation. In 2011, DigiNotar, a Dutch CA, was hacked. The hacker used their access to the CA to issue a certificate for *.google.com. They then attempted to use this certificate to impersonate Gmail and targeted users in Iran in an attempt to compromise their personal information.
The attack was detected by Google using a technique called public key pinning. Key pinning is a risky technique and only viable for very technically savvy organizations. The value provided by key pinning is usually overshadowed by the operational risk it introduces. It’s often considered the canonical example of a “foot gun” mechanism. Key pinning is being deprecated by browsers.
If key pinning is not the solution, what can be done to detect CA malfeasance? This is where Certificate Transparency comes in. The goal of CT is to make all certificates public so that mis-issued certificates can be detected and appropriate action taken. This helps bring accountability to balance the trust we collectively place in the fragile web PKI.
Shining a light
Creative Commons Zero (CC0) - angele-j
If all certificates are public, then so are mis-issued certificates. Certificate Transparency brings accountability to the web PKI using a technology called a blockchain an append-only public ledger, an ordered list. It puts trusted certificates into a list and makes that list available to anyone. That sounds easy, but given the decentralized nature of the internet, there are many challenges in making this a reliable bedrock for accountability.
The CT ecosystem is an extension of the PKI that introduces additional participants and roles. Along with browsers and CAs, new parties are introduced to play a role in the overall health of the system. These parties are:
Log operators
Auditors
Monitors
At a high level, a log operator is someone who manages the list of certificates than can only be added to. If someone submits a browser-trusted certificate to a log, it must be incorporated into the list within a pre-set grace period called a maximum merge delay (MMD), which is usually 24 hours. The log gives the submitter back a receipt, called a Signed Certificate Timestamp (SCT). An SCT is a promise to include the certificate in the log within the grace period. You can interact with a CT log via the CT API (defined in RFC 6962).
An auditor is a third party that keeps log operators honest. They query logs from various vantage points on the internet and gossip with each other about what order certificates are in. They’re like the paparazzi of the PKI. They also keep track of whether an SCT has been honored or not by measuring the time it took between the SCT’s timestamp and the corresponding certificate showing up in the log. Along with being a dashboard, the Merkle Town backend is also an auditor; it has even detected issues in other logs.
A monitor is a service that helps alert websites of mis-issuance. It crawls logs for new certificates and alerts website owners if a new certificate is found for their domain. Popular monitors include SSLMate’s CertSpotter and Facebook’s Certificate Transparency Monitoring. Cloudflare is planning on offering a free log monitoring service integrated into the Cloudflare dashboard for customers by the end of the year.
Letting browsers know that certificates are logged
One way to push the web to adopt CT is for browsers to start requiring website certificates to be logged. Validating that a certificate is logged by querying logs directly when a connection is made is a potential privacy issue (exposing browser history to a third party), and adds latency. Instead, a better way to ensure that a certificate is logged is to require the server to present SCTs, the inclusion receipts described in the last section.
There are multiple ways that an SCT can be presented to the client. The most popular way to is to embed the SCTs into the certificate when it’s issued. This involves the CA submitting a pre-certificate, a precursor to a certificate, to various CT logs to obtain SCTs before finalizing the certificate.
For certificates that don’t have SCTs embedded, a server has other mechanisms to transmit SCTs to the client. SCTs can be included in the connection as a TLS extension, or in a stapled OCSP response. These mechanisms are more difficult to do reliably, but they allow any certificate to be included in CT.
A certificate with embedded SCTs
Browser CT requirements
Not all logs are created equally. A malicious CA could collude with a set of logs to create a certificate and a set of SCTs and not ever incorporate the certificate into the logs. This certificate could then be used to attack users. To protect the ecosystem from collusion risks, the browsers that support CT have chosen to only accept SCTs from a list of vetted logs that are actively audited. There are also diversity requirements: logs should be managed by different entities on different infrastructures to avoid collusion and shared outages. Connections are said to be “CT Qualified” if they provide the client with enough SCTs from vetted logs to suit the risk profile of the certificate.
In order for a connection to be CT qualified in Chrome, it has to follow Chrome’s CT Policy. For most certificates, this means an SCT needs to be presented for at least one Google and one non-Google log trusted by Chrome. Long-lived certificates more than two SCTs.
In order to become trusted by Chrome, a CT log must submit itself for inclusion and pass a 90 day monitoring period in which it must pass some stringent requirements including the following:
Have no outage that exceeds an MMD (maximum merge delay) of more than 24 hours
Have 99% uptime, with no outage lasting longer than the MMD (as measured by Google)
Incorporate a certificate for which an SCT has been issued by the Log within the MMD
Maintain the append-only property of the of the Log by providing consistent views of the Merkle Tree at all times and to all parties
The full policy is here. The complete list of vetted logs is available here. Nimbus, Cloudflare’s family of CT logs, was included in Chrome 65, which is the current stable version of the Browser.
All or nothing
Creative Commons Attribution-Share Alike 3.0 - Gorskiya
CT only protects users if all certificates are logged. If a CA issues a certificate that is not logged in CT and will be trusted by browsers, then users can still be subject to targeted attacks.
For the last few years, Chrome has required all Extended Validation (EV) certificates to be CT qualified. Cloudflare has been making sure all that connections to Cloudflare are CT Qualified for Chrome and have been since May 2017. This is done using an automatic submission service related to our recent OCSP stapling project and the SCT TLS extension. We monitor whether or not the set of SCTs we provide is conformant with browser policies using the Expect-CT header, which reports client errors back to our servers. Expect CT is included in every HTTPS response Cloudflare serves.
This isn’t enough. As long as there are certificates that are trusted by browsers that are not required to be CT Qualified, then users are at risk. This is why the Chrome team announced that they will require Certificate Transparency for all newly issued, publicly trusted certificates starting in April 2018.
Changes for Certificate Authorities
The safest way to make sure a certificate is always CT qualified when shown to a browser is to embed enough SCTs from vetted logs to conform to the browser’s policy. This is a big operational change for some CAs, because
It adds the additional step of having to understand which SCTs are required for different browsers’ CT policies and keeping up to date with policy changes
It adds a step in the issuance process where pre-certificates are to different logs before the certificate is issued
Some CT logs may have outages or be slow to respond, so fallback strategies may be required to not delay issuance
If you’re a CA operator and are struggling to implement this or worried about the additional delay caused by submitting pre-certificates to logs, Cloudflare can help. We’re offering an experimental API for CAs that takes pre-certificates and returns a valid set of SCTs for a given certificate. This API handles sorting out the browser policies and leverages Argo Smart Routing to return SCTs with as little delay as possible. Please contact our CT team at [email protected] if you have any interest in this offering.
Building a verifiable globally consistent log
The PKI is huge. The industry-wide push for HTTPS has introduced millions of new certificates into the web PKI. There are over a quarter-billion(!) certificates logged in CT, and this number is growing by almost a million per day. This number is sure to grow as we approach Chrome’s April deadline. Managing a high availability database of this size that you can only add to (a property called append-only) is a substantial engineering challenge.
The naïve data structure to use for an append-only database is a hash chain. In a hash chain, elements are aligned in order and combined using a one-way hash function like SHA-256. The diagram below describes how a hash chain is created from a list of values d1 to d8. Start with the first element, d1, which is hashed to a value a, which then becomes the head of the chain. Every time an element is added to the chain, a hash function is computed over two inputs: the current chain head and the new element. This hash value become the new chain head. Because one-way hash functions can’t be reversed, it’s computationally infeasible to change a value without it changing the entire chain that has been computed from it. This makes a hash chain’s history very difficult to modify maliciously.
A hash chain is the optimal data structure for inserting new elements: only one hash needs to be computed for each element added. However, it is not an efficient data structure for validating whether an element is correctly included in a chain given a chain head. In the example below, there are six additional elements (b, d4-d8) needed to validate that d3 is correct on a chain of 8 values. You need approximately n/2 elements to verify an average element in a chain of length n. In computer science terms, this is called “linear scaling.”
When building a system, it’s best to try to reduce complexity for the participants involved. For CT, the main participants we care about are the log operator and the auditor. If we were to choose a hash chain as our data structure, the job of the log operator would be easy but that of the auditor would be very hard. We can do better.
When you ask a computer scientist how to optimize an algorithm, nine out of ten times, the solution they will suggest is to use a tree (the other 1/10 times, they’ll suggest a Bloom filter). That’s exactly what we can do here. Specifically, we can use a data structure called a Merkle Tree. It’s like a hash chain, but rather than hashing elements in a line, you hash them in pairs.
For each new element, instead of hashing it into a running total, you arrange the elements into a balanced binary tree and compute the hash of the element with its sibling. This gives you half as many hashes as elements. These hashes are then arranged in pairs and hashed together to create the next level of the tree. This continues until you have one element, the top of the tree; this is called the tree head. Adding a new value to a Merkle tree requires the modification of at most one hash per level in the tree, following the path from the element up to the tree head.
The depth of a binary tree is logarithmic with respect to the number of elements. Roughly speaking, if the tree size is 8 = 2^3, then the depth is 4 (= 3+1), if it’s 1024 = 2^10 then the tree depth is 11 (= 10+1), for 1048576 = 2^20 the tree size is 21 (= 20+1). The cost of insertion is at most log_2(size), which is worse than in a hash chain, but generally not too bad.
What makes the Merkle tree so useful is the efficiency of element validation. Instead of having to compute n/2 hashes like in a hash chain, you only need the elements in the tree that lead you to the root. This is called the co-path. In the diagram below, the co-path is computed for d3.
The copath consists of one value per level of the tree. The computation necessary to prove that an element is correct (an inclusion proof) is therefore logarithmic, not linear as in the case of a hash tree. Both insertion and validation are cheap relative to the size of the tree, making a Merkle tree the ideal data structure for CT.
A certificate transparency log is a Merkle tree where the leaf elements are certificates. Each log has a private key that it uses to sign the current tree head at regular intervals. Some CT logs are huge with over a hundred million entries, but because of the efficiency of Merkle trees, inclusion proofs only require around 30 hashes. This structure provides a good balance between the cost to the log operator of adding certificates and the cost to the auditor of validating its consistency.
Nimbus
Our own addition to the log ecosystem is Nimbus. Nimbus is a family of Certificate Transparency logs with an open acceptance policy. Nimbus accepts any certificate that is signed by a CA from our cfssl_trust root store. The logs are organized by year, e.g. Nimbus 2018, Nimbus 2019, etc. with each log only accepting certificates that expire in that year.
Nimbus is built with Trillian, a Go-based implementation of a scalable Merkle tree. The data backend is custom, re-using components from Cloudflare’s high capacity logging infrastructure, which runs entirely on Cloudflare’s bare metal infrastructure. Having a high-reliability and completely open log that is not dependent on shared cloud infrastructure adds insurance in case of outages. Nimbus is intended to bring diversity and stability to the CT ecosystem, and in the end, make the internet safer for everyone.