How to build your own public key infrastructure

A major part of securing a network as geographically diverse as CloudFlare’s is protecting data as it travels between datacenters. Customer data and logs are important to protect but so is all the control data that our applications use to communicate with each other. For example, our application servers need to securely communicate with our new datacenter in Osaka, Japan.

CC BY-SA 2.0 image by kris krüg

Great security architecture requires a defense system with multiple layers of protection. As CloudFlare’s services have grown, the need to secure application-to-application communication has grown with it. As a result, we needed a simple and maintainable way to ensure that all communication between CloudFlare’s internal services stay protected, so we built one based on known and reliable protocols.

Our system of trust is based on a Public Key Infrastructure (PKI) using internally-hosted Certificate Authorities (CAs). In this post we will describe how we built our PKI, how we use it internally, and how to run your own with our open source software. This is a long post with lots of information, grab a coffee!

Protection at the application layer

Most reasonably complex modern web services are not made up of one monolithic application. In order to handle complex operations, an application is often split up into multiple “services” that handle different portions of the business logic or data storage. Each of these services may live on different machines or even in different datacenters.

The software stacks of large service providers are made up of many components. For CloudFlare, this includes a web application to handle user actions, a main database to maintain DNS records and user rules, a data pipeline to distribute these rules to the edge network, services for caching, a log transport pipeline, data analysis services and much, much more.

Some service-to-service communication can happen within a machine, some within a datacenter and some across a broader network like the Internet. Managing which communications should use which type of network in our evolving services is not a simple task. A single accidental configuration change could result in messages that are supposed to never leave a machine going through an untrusted connection on the Internet. The system should be designed so that these messages are secure even if they go over the wrong network.

Enter TLS

One approach to mitigate the risks posed by attackers is to encrypt and authenticate data in transit. Our approach is to require that all new services use an encrypted protocol, Transport Layer Security (TLS), to keep inter-service communication protected. It was a natural choice: TLS is the “S” in HTTPS and is the foundation of the encrypted web. Furthermore, modern web services and APIs have embraced TLS as the de-facto standard for application layer encryption. It works seamlessly with RESTful services, is supported in Kyoto Tycoon, PostgreSQL, and the Go standard library.

As we have described in previous blog posts, unauthenticated encryption can be foiled by on-path attackers. Encryption without authentication does not protect data in transit. For connections to be safe, each party needs to prove their identity to the other. Public key cryptography provides many mechanisms for trust, including PGP’s “web of trust” and HTTPS’s public key infrastructure (PKI) model. We chose the PKI model because of ease of use and deployment. TLS with PKI provides trusted communication.

Be picky with your PKI

Trust is the bedrock of secure communication. For two parties to securely exchange information, they need to know that the other party is legitimate. PKI provides just that: a mechanism for trusting identities online.

The tools that enable this are digital certificates and public key cryptography. A certificate lets a website or service prove its identity. Practically speaking, a certificate is a file with some identity information about the owner, a public key, and a signature from a certificate authority (CA). Each certificate also contains a public key. Each public key has an associated private key, which is kept securely under the certificate owner’s control. The private key can be used to create digital signatures that can be verified by the associated public key.

A certificate typically contains

Information about the organization that the certificate is issued to
A public key
Information about the organization that issued the certificate
The rights granted by the issuer
The validity period for the certificate
Which hostnames the certificate is valid for
The allowed uses (client authentication, server authentication)
A digital signature by the issuer certificate’s private key

A certificate is a powerful tool for proving your identity online. The owner of a certificate can digitally sign data, and a verifier can use the public key from the certificate to verify it. The fact that the certificate is itself digitally signed by a third party CA means that if the verifier trusts the third party, they have assurances that the certificate is legitimate. The CA can give a certificate certain rights, such as a period of time in which the identity of the certificate should be trusted.

Sometimes certificates are signed by what’s called an intermediate CA, which is itself signed by a different CA. In this case, a certificate verifier can follow the chain until they find a certificate that they trust — the root.

This chain of trust model can be very useful for the CA. It allows the root certificate’s private key to be kept offline and only used for signing intermediate certificates. Intermediate CA certificates can be shorter lived and be used to sign endpoint certificates on demand. Shorter-lived online intermediates are easier to manage and revoke if compromised.

This is the same system used for HTTPS on the web. For example, cloudflare.com has a certificate signed by Comodo’s Intermediate certificate, which is in turn signed by the Comodo offline root. Browsers trust the Comodo root, and therefore also trust the intermediate and web site certificate.

This model works for the web because browsers only trust a small set of certificate authorities, who each have stringent requirements to only issue certificates after validating the ownership of a website. For internal services that are not accessed via browsers, there is *no need * to go through a third party certificate authority. Trusted certificates do not have to be from Globalsign, Comodo, Verisign or any other CA — they can be from a CA you operate yourself.

Building your own CA

The most painful part of getting a certificate for a website is going through the process of obtaining it. For websites, we eliminated this pain by launching Universal SSL. The most painful part of running a CA is the administration. When we decided to build our internal CA, we sought to make both obtaining certificates and operating the CA painless and even fun.

The software we are using is CFSSL, CloudFlare’s open source PKI toolkit. This tool was open sourced last year and has all the capabilities needed to run a certificate authority. Although CFSSL was built for an internal CA, it’s robust enough to be use a publicly trusted CA; in fact, the Let’s Encrypt project is using CFSSL as a core part of their CA infrastructure.

Key protection

To run a CA, you need the CA certificate and corresponding private key. This private key is extremely sensitive. Any person who knows the value of the key can act as the CA and issue certificates. Browser-trusted certificate authorities are required to keep their private keys inside of specialized hardware known as Hardware Security Modules (HSMs). The requirements for protecting private keys for corporate infrastructures are not necessarily as stringent, so we provided several mechanisms to protect keys.

CFSSL supports three different modes of protection for private keys:

Hardware Security Module (HSM)CFSSL allows the CA server to use an HSM to compute digital signatures. Most HSMs use an interface called PKCS#11 to interact with them, and CFSSL natively supports this interface. Using an HSM ensures that private keys do not live in memory and it provides tamper protection against physical adversaries.
Red OctoberPrivate Keys can be encrypted using Red October (another open source CloudFlare project). A key protected with Red October can only be decrypted with the permission of multiple key owners. In order to use CFSSL with a Red October key, the key owners need to authorize the use of the private key. This ensures that the CA key is never unencrypted on disk, in source control, or in configuration management. Note: Red October support in CFSSL is experimental and subject to change.
PlaintextCFSSL accepts plain unencrypted private keys. This works well when the private key is generated on the machine running CFSSL or by another program. If the machine that is running the CA is highly secure, this mode is a compromise between security, cost, and usability. This is also useful in development mode, allowing users to test changes to their infrastructure designs.

Next I’ll show you how to quickly configure an internal CA using plaintext private keys. Note: The following expects CFSSL to be installed.

Generating a CA key and certificate

To start, you need some information about what metadata you want to include in your certificate. Start by creating a file called csr_ca.json containing this basic information (feel free to fill in your own organization's details):

{
  "CN": "My Awesome CA",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
    "names": [
       {
         "C": "US",
         "L": "San Francisco",
         "O": "My Awesome Company",
         "OU": "CA Services",
         "ST": "California"
       }
    ]
}

With this you can create your CA with a single call"$ cfssl gencert -initca csr_ca.json | cfssljson -bare ca

This generates the two files you need to start your CA:

ca-key.pem: your private key
ca.pem: your certificate
ca.csr: your certificate signing request (needed to get your CA cross-signed by another CA)

The key and certificate are the bare minimum you need to start running a CA.

Policy

Once the CA certificate and key are created, the CA software needs to know what kind of certificates it will issue. This is determined in the CFSSL configuration file’s signing policy section.

Here’s an example of a simple policy for a CA that can issue certificates that are valid for a year and can be used for server authentication.

config_ca.json
{
  "signing": {
    "default": {
      "auth_key": "key1",
      "expiry": "8760h",
      "usages": [
         "signing",
         "key encipherment",
         "server auth"
       ]
     }
  },
  "auth_keys": {
    "key1": {
      "key": <16 byte hex API key here>,
      "type": "standard"
    }
  }
}

We also added an authentication key to this signing policy. This authentication key should be randomly generated and kept private. The API key is a basic authentication mechanism that prevents unauthorized parties from requesting certificates. There are several other features you can use for the CA (subject name allowlisting, etc.), CFSSL documentation for more information.

To run the service, call$ cfssl serve -ca-key ca-key.pem -ca ca.pem -config config_ca.json

This opens up a CA service listening on port 8888.

Issuing certificates

Certificate authorities do not just create certificates out of a private key and thin air, they need a public key and metadata to populate the certificate’s data fields. This information is typically communicated to a CA via a certificate signing request (CSR).

A CSR is very similar in structure to a certificate. The CSR contains:

Information about the organization that is requesting the certificate
A public key
A digital signature by the requestor’s private key

Given a CSR, a certificate authority can create a certificate. First, it verifies that the requestor has control over the associated private key. It does this by checking the CSR’s signature. Then the CA will check to see if the requesting party should be given a certificate and which domains/IPs it should be valid for. This can be done with a database lookup or through a registration authority. If everything checks out, the CA uses its private key to create and sign the certificate to send back to the requestor.

Requesting Certificates

Let’s say you have CFSSL set up as CA as described above and it’s running on a server called “ca1.mysite.com” with an authentication API key. How do you get this CA to issue a certificate? CFSSL provides two commands to help with that: gencert and sign. They are available as JSON API endpoints or command line options.

The gencert command will automatically handle the whole certificate generation process. It will create your private key, generate a CSR, send the CSR to the CA to be signed and return your signed certificate.

There are two configuration files needed for this. One to tell the local CFSSL client where the CA is and how to authenticate the request, and a CSR configuration used to populate the CSR.

Here’s an example for creating a certificate for a generic database server, db1.mysite.com.

csr_client.json

{
  "hosts": [
    	"db1.mysite.com"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "US",
      "L": "San Francisco",
      "O": “My Awesome Company",
      "OU": “Data Services",
      "ST": "California"
    }
  ]
}

config_client.json

{
  "signing": {
    "default": {
      "auth_key": "key1",
      "remote": "caserver"
    }
  },
  "auth_keys": {
    "key1": {
    "key": <16 byte hex API key here>,
    "type": "standard"
    }
  },
  "remotes": {
    "caserver": “ca1.mysite.com:8888"
  }
}

With these two configuration files set, you can create your certificate.$ cfssl gencert -config config_client.json csr_client.json | cfssljson -bare db

This results in three files:

db-key.pem: your private key
db.pem: your certificate
db.csr: your CSR

The CSR can be resubmitted to the CA to be signed again at any point with the sign command

$ cfssl sign -config config_client.json db.csr | cfssljson -bare db-new

resulting in:

db-new.pem: your re-signed certificate

These two commands let you easily and conveniently set up a private PKI. As a startup or a growing business moving to a service-oriented or even a microservice architecture, having a PKI can be very convenient. Next we’ll describe how CloudFlare set up its own internal PKI to help make its services encrypted by default.

Using a PKI for services

So now you have a complex set of services that can all speak TLS, and a central certificate authority server that can issue certificates. What’s next? Getting certificates and keys for the applications. There are several ways to do this including a centrally managed way and a distributed way.

Centralized distribution vs on demand

One way to create certificates and keys for your applications is to create them all on a central provisioning server and then send them out to each of the servers. In this model, a central server creates certificates and keys and sends them over a secure tunnel to the application servers.

This model creates a sort of chicken and egg problem. How do you transport private keys if you need those private keys to encrypt your transport?

A distributed key management model fits better with the way modern services are typically deployed and run. The trick is creating the private keys directly in the application server. At install or run time, a service creates its own private key and sends a request to a certificate authority to issue a certificate. This can be repeated on demand if the current certificate is close to expiring.

For example, many companies are starting to use Docker, or other lightweight container technologies, to encapsulate and run individual services. Under load, services can be scaled up by automatically running new containers. In a centralized distribution model, certificates and keys for each container need to be created before the containers are deployed.

In the centralized distribution model, the provisioning service needs to create and manage a key and certificate for each service. Keeping this sort of central database in a complex and dynamic topology seems like the wrong approach. It would be preferable if the CA itself was stateless and generated a set of logs.

The idea that keys need to be transported to their destination instead of generated locally is also troubling. Transporting private keys introduces an unnecessary risk to the system. When a new service comes into being, it should generate its key locally and request a certificate for use.

Trust across services

Internal services need to trust each other. Browsers validate website certificates by checking the signature on the certificate and checking the hostname against the list of Subject Alternative Names (SANs) in the certificate. This type of explicit check is useful, but it has a record of not working as expected. Another way for services to trust each other is an implicit check based on per-service CAs.

The idea is simple: use a different CA for each set of services. Issue all database certificates from a database CA and all API servers from an API server CA.

When setting these services up to talk to each other with mutual TLS authentication configure the trust stores as follows:

API server only trusts DB CA
DB only trusts API CA

This approach is less fine-grained than an explicit check against a SAN, but it is more robust and easier to manage on the CA policy side. With an implicit trust system in place, you can guarantee that services of type A can only communicate with services of type B.

The following diagram describes how two applications can trust each other with mutually authenticated TLS.

In this diagram, the API server trusts the DB CA (in red). It will therefore only accept certificates that are signed by the DB CA (i.e. with a red ribbon). Conversely, the database server only accepts certificates with a signature from the API CA (orange ribbon). To establish a trusted connection, each party sends a key share to the other, signed with their certificate’s private key. The key shares are combined to create a session key, with which both parties can encrypt their data. The chain of verification from key share to certificate to CA assure that the other party is authentic.

Establishing a mutually authenticated trusted tunnel between services prevents attackers from accessing or modifying data in transit and causing havoc on your services. With a strong PKI in place, every service can communicate securely and confidentially over any network, even the Internet.

Using a PKI for remote services

Internal PKIs are very flexible and can be used to issue certificates to third parties who are integrating with your network. For example, CloudFlare has a service called Railgun that can be used to optimize connections between CloudFlare and an origin server. Communication between Railgun and CloudFlare is done over an encrypted and authenticated channel using certificates from a CloudFlare certificate authority.

This ensures that data is secure in transit. When a new Railgun instance is set up on the origin server, it creates a private key and sends a CSR to CloudFlare, which then issues the appropriate certificate. The Railgun server keeps the private key in memory and deletes it when it shuts down, preventing other services from getting access to it.

This model works great for not only Railgun, but several other initiatives at CloudFlare such as the Origin CA and Keyless SSL.

Conclusion

Securing data at the application level is an important step for securing a distributed systems architecture, but is only truly effective with a strong PKI in place.

The Cloudflare Blog