Cloudflare’s security architecture a few years ago was a classic “castle and moat” VPN architecture. Our employees would use our corporate VPN to connect to all the internal applications and servers to do their jobs. We enforced two-factor authentication with time-based one-time passcodes (TOTP), using an authenticator app like Google Authenticator or Authy when logging into the VPN but only a few internal applications had a second layer of auth. That architecture has a strong looking exterior, but the security model is weak. We recently detailed the mechanics of a phishing attack we prevented, which walks through how attackers can phish applications that are “secured” with second factor authentication methods like TOTP. Happily, we had long done away with TOTP and replaced it with hardware security keys and Cloudflare Access. This blog details how we did that.
The solution to the phishing problem is through a multi-factor authentication (MFA) protocol called FIDO2/WebAuthn. Today, all Cloudflare employees log in with FIDO2 as their secure multi-factor and authenticate to our systems using our own Zero Trust products. Our newer architecture is phish proof and allows us to more easily enforce the least privilege access control.
A little about the terminology of security keys and what we use
In 2018, we knew we wanted to migrate to phishing-resistant MFA. We had seen evilginx2 and the maturity around phishing push-based mobile authenticators, and TOTP. The only phishing-resistant MFA that withstood social engineering and credential stealing attacks were security keys that implement FIDO standards. FIDO-based MFA introduces new terminology, such as FIDO2, WebAuthn, hard(ware) keys, security keys, and specifically, the YubiKey (the name of a well-known manufacturer of hardware keys), which we will reference throughout this post.
WebAuthn refers to the web authentication standard, and we wrote in depth about how that protocol works when we released support for security keys in the Cloudflare dashboard.
CTAP1(U2F) and CTAP2 refers to the client to authenticator protocol which details how software or hardware devices interact with the platform performing the WebAuthn protocol.
FIDO2 is the collection of these two protocols being used for authentication. The distinctions aren’t important, but the nomenclature can be confusing.
The most important thing to know is all of these protocols and standards were developed to create open authentication protocols that are phishing-resistant and can be implemented with a hardware device. In software, they are implemented with Face ID, Touch ID, Windows Assistant, or similar. In hardware a YubiKey or other separate physical device is used for authentication with USB, Lightning, or NFC.
FIDO2 is phishing-resistant because it implements a challenge/response that is cryptographically secure, and the challenge protocol incorporates the specific website or domain the user is authenticating to. When logging in, the security key will produce a different response on example.net than when the user is legitimately trying to log in on example.com.
At Cloudflare, we’ve issued multiple types of security keys to our employees over the years, but we currently issue two different FIPS-validated security keys to all employees. The first key is a YubiKey 5 Nano or YubiKey 5C Nano that is intended to stay in a USB slot on our employee laptops at all times. The second is the YubiKey 5 NFC or YubiKey 5C NFC that works on desktops and on mobile either through NFC or USB-C.
In late 2018 we distributed security keys at a whole company event. We asked all employees to enroll their keys, authenticate with them, and ask questions about the devices during a short workshop. The program was a huge success, but there were still rough edges and applications that didn’t work with WebAuthn. We weren’t ready for full enforcement of security keys and needed some middle-ground solution while we worked through the issues.
The beginning: selective security key enforcement with Cloudflare Zero Trust
We have thousands of applications and servers we are responsible for maintaining, which were protected by our VPN. We started migrating all of these applications to our Zero Trust access proxy at the same time that we issued our employees their set of security keys.
Cloudflare Access allowed our employees to securely access sites that were once protected by the VPN. Each internal service would validate a signed credential to authenticate a user and ensure the user had signed in with our identity provider. Cloudflare Access was necessary for our rollout of security keys because it gave us a tool to selectively enforce the first few internal applications that would require authenticating with a security key.
We used Terraform when onboarding our applications to our Zero Trust products and this is the Cloudflare Access policy where we first enforced security keys. We set up Cloudflare Access to use OAuth2 when integrating with our identity provider and the identity provider informs Access about which type of second factor was used as part of the OAuth flow.
In our case, swk is a proof of possession of a security key. If someone logged in and didn’t use their security key they would be shown a helpful error message instructing them to log in again and press on their security key when prompted.
Selective enforcement instantly changed the trajectory of our security key rollout. We began enforcement on a single service on July 29, 2020, and authentication with security keys massively increased over the following two months. This step was critical to give our employees an opportunity to familiarize themselves with the new technology. A window of selective enforcement should be at least a month to account for people on vacation, but in hindsight it doesn’t need to be much longer than that.
What other security benefits did we get from moving our applications to use our Zero Trust products and off of our VPN? With legacy applications, or applications that don’t implement SAML, this migration was necessary for enforcement of role based access control and the principle of the least privilege. A VPN will authenticate your network traffic but all of your applications will have no idea who the network traffic belongs to. Our applications struggled to enforce multiple levels of permissions and each had to re-invent their own auth scheme.
When we onboarded to Cloudflare Access we created groups to enforce RBAC and tell our applications what permission level each person should have.
Here’s a site where only members of the ACL-CFA-CFDATA-argo-config-admin-svc group have access. It enforces that the employee used their security key when logging in, and no complicated OAuth or SAML integration was needed for this. We have over 600 internal sites using this same pattern and all of them enforce security keys.
The end of optional: the day Cloudflare dropped TOTP completely
In February 2021, our employees started to report social engineering attempts to our security team. They were receiving phone calls from someone claiming to be in our IT department, and we were alarmed. We decided to begin requiring security keys to be used for all authentication to prevent any employees from being victims of the social engineering attack.
After disabling all other forms of MFA (SMS, TOTP etc.), except for WebAuthn, we were officially FIDO2 only. “Soft token” (TOTP) isn’t perfectly at zero on this graph though. This is caused because those who lose their security keys or become locked out of their accounts need to go through a secure offline recovery process where logging in is facilitated through an alternate method. Best practice is to distribute multiple security keys for employees to allow for a back-up, in case this situation arises.
Now that all employees are using their YubiKeys for phishing-resistant MFA are we finished? Well, what about SSH and non-HTTP protocols? We wanted a single unified approach to identity and access management so bringing security keys to arbitrary other protocols was our next consideration.
Using security keys with SSH
To support bringing security keys to SSH connections we deployed Cloudflare Tunnel to all of our production infrastructure. Cloudflare Tunnel seamlessly integrates with Cloudflare Access regardless of the protocol transiting the tunnel, and running a tunnel requires the tunnel client cloudflared. This means that we could deploy the cloudflared binary to all of our infrastructure and create a tunnel to each machine, create Cloudflare Access policies where security keys are required, and ssh connections would start requiring security keys through Cloudflare Access.
In practice these steps are less intimidating than they sound and the Zero Trust developer docs have a fantastic tutorial on how to do this. Each of our servers have a configuration file required to start the tunnel. Systemd invokes cloudflared which uses this (or similar) configuration file when starting the tunnel.
- hostname: <identifier>.ssh.cloudflare.com
- service: http_status:404
When an operator needs to SSH into our infrastructure they use the ProxyCommand SSH directive to invoke cloudflared, authenticate using Cloudflare Access, and then forward the SSH connection through Cloudflare. Our employees’ SSH configurations have an entry that looks kind of like this, and can be generated with a helper command in cloudflared:
ProxyCommand /usr/local/bin/cloudflared access ssh –hostname %h.ssh.cloudflare.com
It’s worth noting that OpenSSH has supported FIDO2 since version 8.2, but we’ve found there are benefits to having a unified approach to access control where all access control lists are maintained in a single place.
What we’ve learned and how our experience can help you
There’s no question after the past few months that the future of authentication is FIDO2 and WebAuthn. In total this took us a few years, and we hope these learnings can prove helpful to other organizations who are looking to modernize with FIDO-based authentication.
If you’re interested in rolling out security keys at your organization, or you’re interested in Cloudflare’s Zero Trust products, reach out to us at [email protected]. Although we’re happy that our preventative efforts helped us resist the latest round of phishing and social engineering attacks, our security team is still growing to help prevent whatever comes next.