The Internet is an amazing invention. We marvel at how it connects people, connects ideas, and makes the world smaller. But the Internet isn’t perfect. It was put together piecemeal through publicly funded research, private investment, and organic growth that has left us with an imperfect tapestry. It’s also evolving. People are constantly developing creative applications and finding new uses for existing Internet technology. Issues like privacy and security that were afterthoughts in the early days of the Internet are now supremely important. People are being tracked and monetized, websites and web services are being attacked in interesting new ways, and the fundamental system of trust the Internet is built on is showing signs of age. The Internet needs an upgrade, and one of the tools that can make things better is cryptography.
Every day this week, Cloudflare will be announcing support for a new technology that uses cryptography to make the Internet better (hint: subscribe to the blog to make sure you don't miss any of the news). Everything we are announcing this week is free to use and provides a meaningful step towards supporting a new capability or structural reinforcement. So why are we doing this? Because it’s good for the users and good for the Internet. Welcome to Crypto Week!
Day 1: Distributed Web Gateway
Day 2: DNSSEC
Day 3: RPKI
Day 4: Onion Routing
Day 5: Roughtime
A more trustworthy Internet
Everything we do online depends on a relationship between users, services, and networks that is supported by some sort of trust mechanism. These relationships can be physical (I plug my router into yours), contractual (I paid a registrar for this domain name), or reliant on a trusted third party (I sent a message to my friend on iMessage via Apple). The simple act of visiting a website involves hundreds of trust relationships, some explicit and some implicit. The sheer size of the Internet and number of parties involved make trust online incredibly complex. Cryptography is a tool that can be used to encode and enforce, and most importantly scale these trust relationships.
To illustrate this, let’s break down what happens when you visit a website. But before we can do this, we need to know the jargon.
Autonomous Systems (100 thousand or so active): An AS corresponds to a network provider connected to the Internet. Each has a unique Autonomous System Number (ASN).
IP ranges (1 million or so): Each AS is assigned a set of numbers called IP addresses. Each of these IP addresses can be used by the AS to identify a computer on its network when connecting to other networks on the Internet. These addresses are assigned by the Regional Internet Registries (RIR), of which there are 5. Data sent from one IP address to another hops from one AS to another based on a “route” that is determined by a protocol called BGP.
Domain names (>1 billion): Domain names are the human-readable names that correspond to Internet services (like “cloudflare.com” or “mail.google.com”). These Internet services are accessed via the Internet by connecting to their IP address, which can be obtained from their domain name via the Domain Name System (DNS).
Content (infinite): The main use case of the Internet is to enable the transfer of specific pieces of data from one point on the network to another. This data can be of any form or type.
When you type a website such as blog.cloudflare.com into your browser, a number of things happen. First, a (recursive) DNS service is contacted to get the IP address of the site. This DNS server configured by your ISP when you connect to the Internet, or it can be a public service such as 1.1.1.1 or 8.8.8.8. A query to the DNS service travels from network to network along a path determined by BGP announcements. If the recursive DNS server does not know the answer to the query, then it contacts the appropriate authoritative DNS services, starting with a root DNS server, down to a top level domain server (such as com or org), down to the DNS server that is authoritative for the domain. Once the DNS query has been answered, the browser sends an HTTP request to its IP address (traversing a sequence of networks), and in response, the server sends the content of the website.
So what’s the problem with this picture? For one, every DNS query and every network hop needs to be trusted in order to trust the content of the site. Any DNS query could be modified, a network could advertise an IP that belongs to another network, and any machine along the path could modify the content. When the Internet was small, there were mechanisms to combat this sort of subterfuge. Network operators had a personal relationship with each other and could punish bad behavior, but given the number of networks in existence almost 400,000 as of this week this is becoming difficult to scale.
Cryptography is a tool that can encode these trust relationships and make the whole system reliant on hard math rather than physical handshakes and hopes.
Building a taller tower of turtles
Attribution 2.0 Generic (CC BY 2.0)
The two main tools that cryptography provides to help solve this problem are cryptographic hashes and digital signatures.
A hash function is a way to take any piece of data and transform it into a fixed-length string of data, called a digest or hash. A hash function is considered cryptographically strong if it is computationally infeasible (read: very hard) to find two inputs that result in the same digest, and that changing even one bit of the input results in a completely different digest. The most popular hash function that is considered secure is SHA-256, which has 256-bit outputs. For example, the SHA-256 hash of the word “crypto” is
DA2F073E06F78938166F247273729DFE465BF7E46105C13CE7CC651047BF0CA4
And the SHA-256 hash of “crypt0” is
7BA359D3742595F38347A0409331FF3C8F3C91FF855CA277CB8F1A3A0C0829C4
The other main tool is digital signatures. A digital signature is a value that can only be computed by someone with a private key, but can be verified by anyone with the corresponding public key. Digital signatures are way for a private key holder to “sign,” or attest to the authenticity of a specific message in a way that anyone can validate it.
These two tools can be combined to solidify the trust relationships on the Internet. By giving private keys to the trusted parties who are responsible for defining the relationships between ASs, IPs, domain names and content, you can create chains of trust that can be publicly verified. Rather than hope and pray, these relationships can be validated in real time at scale.
Let’s take our webpage loading example and see where digital signatures can be applied.
Routing. Time-bound delegation of trust is defined through a system called the RPKI. RPKI defines an object called a Resource Certificate, an attestation that states that a given IP range belongs to a specific ASN for this period of time, digitally signed by the RIR responsible for assigning the IP range. Networks share routes via BGP, and if a route is advertised for an IP that does not conform the the Resource Certificate, the network can choose not to accept it.
DNS. Adding cryptographic assurance to routing is powerful, but if a network adversary can change the content of the data (such as the DNS responses), then the system is still at risk. DNSSEC is a system built to provide a trusted link between names and IP addresses. The root of trust in DNSSEC is the DNS root key, which is managed with an elaborate signing ceremony.
HTTPS. When you connect to a site, not only do you want it to be coming from the right host, you also want the content to be private. The Web PKI is a system that issues certificates to sites, allowing you to bind the domain name to a time-bounded private key. Because there are many CAs, additional accountability systems like certificate transparency need to be involved to help keep the system in check.
This cryptographic scaffolding turns the Internet into an encoded system of trust. With these systems in place, Internet users no longer need to trust every network and party involved in this diagram, they only need to trust the RIRs, DNSSEC and the CAs (and know the correct time).
This week we’ll be making some announcements that help strengthen this system of accountability.
Privacy and integrity with friends
The Internet is great because it connects us to each other, but the details of how it connects us are important. The technical choices made when Internet was designed come with some interesting human implications.
One implication is trackability. Your IP address is contained on every packet you send over the Internet. This acts as a unique identifier for anyone (corporations, governments, etc.) to track what you do online. Furthermore, if you connect to a server, that server’s identity is sent in plaintext on the request even over HTTPS, revealing your browsing patterns to any intermediary who cares to look.
Another implication is malleability. Resources on the Internet are defined by where they are, not what they are. If you want to go to CNN or BBC, then you connect to the server for cnn.com or bbc.co.uk and validate the certificate to make sure it’s the right site. But once you’ve made the connection, there’s no good way to know that the actual content is what you expect it to be. If the server is hacked, it could send you anything, including dangerous malicious code. HTTPS is a secure pipe, but there’s no inherent way to make sure what gets sent through the pipe is what you expect.
Trackability and malleability are not inherent features of interconnectedness. It is possible to design networks that don’t have these downsides. It is also possible to build new networks with better characteristic on top of the existing Internet. The key ingredient is cryptography.
Tracking-resilient networking
One of the networks built on top of the Internet that provides good privacy properties is Tor. The Tor network is run by a group of users who allow their computers to be used to route traffic for other members of the network. Using cryptography, it is possible to route traffic from one place to another without points along the path knowing both the source and the destination at the same time. This is called onion routing because it involves multiple layers of encryption, like an onion. Traffic coming out of the Tor network is “anonymous” because it could have come from anyone connected to the network. Everyone just blends in, making it hard to track individuals.
Similarly, web services can use onion routing to serve content inside the Tor network without revealing their location to visitors. Instead of using a hostname to identify their network location, so-called onion services use a cryptographic public key as their address. There are hundreds of onion services in use, including the one we use for 1.1.1.1 or the one in use by Facebook.
Troubles occur at the boundary between Tor network and the rest of the Internet. This is especially true for user is attempting to access services that rely on abuse prevention mechanisms based on reputation. Since Tor is used by both privacy-conscious users and malicious bots, connections from both get lumped together and as the expression goes, one bad apple ruins the bunch. This unfortunately exposes legitimate visitors to anti-abuse mechanisms like CAPTCHAs. Tools like Privacy Pass help reduce this burden but don’t eliminate it completely. This week we’ll be announcing a new way to improve this situation.
Bringing integrity to content delivery
Let’s revisit the issue of malleability: the fact that you can’t always trust the other side of a connection to send you the content you expect. There are technologies that allow users to insure the integrity of content without trusting the server. One such technology is a feature of HTML called Subresource Integrity (SRI). SRI allows a webpage with sub-resources (like a script or stylesheet) to embed a unique cryptographic hash into the page so that when the sub-resource is loaded, it is checked to see that is matches the expected value. This protects the site from loading unexpected scripts from third parties, a known attack vector.
Another idea is to flip this on its head: what if instead of fetching a piece of content from a specific location on the network, you asked the network to find piece of content that matches a given hash? By assigning resources based on their actual content rather than by location it’s possible to create a network in which you can fetch content from anywhere on the network and still know it’s authentic. This idea is called content addressing and there are networks built on top of the Internet that use it. These content addressed networks, based on protocols such IPFS and DAT, are blazing a trail new trend in Internet applications called the Distributed Web. With the Distributed Web applications, malleability is no longer an issue, opening up a new set of possibilities.
Combining strengths
Networks based on cryptographic principles, like Tor and IPFS, have one major downside compared to networks based on names: usability. Humans are exceptionally bad at remembering or distinguishing between cryptographically-relevant numbers. Take, for instance, the New York Times’ onion address:
https://www.nytimes3xbfgragh.onion/
This is would easily confused with similar-looking onion addresses, such as
https://www.nytimes3xfkdbgfg.onion/
which may be controlled by a malicious actor.
Content addressed networks are even worse from the perspective of regular people. For example, there is a snapshot of the Turkish version of Wikipedia on IPFS with the hash:
QmT5NvUtoM5nWFfrQdVrFtvGfKFmG7AHE8P34isapyhCxX
Try typing this hash into your browser without making a mistake.
These naming issues are things Cloudflare is perfectly positioned to help solve.First, by putting the hash address of an IPFS site in the DNS (and adding DNSSEC for trust) you can give your site a traditional hostname while maintaining a chain of trust.
Second, by enabling browsers to use a traditional DNS name to access the web through onion services, you can provide safer access to your site for Tor user with the added benefit of being better able to distinguish between bots and humans.With Cloudflare as the glue, is is possible to connect both standard internet and tor users to web sites and services on both the traditional web with the distributed web.
This is the promise of Crypto Week: using cryptographic guarantees to make a stronger, more trustworthy and more private internet without sacrificing usability.
Happy Crypto Week
In conclusion, we’re working on many cutting-edge technologies based on cryptography and applying them to make the Internet better. The first announcement today is the launch of Cloudflare's Distributed Web Gateway and browser extension. Keep tabs on the Cloudflare blog for exciting updates as the week progresses.
I’m very proud of the team’s work on Crypto Week, which was made possible by the work of a dedicated team, including several brilliant interns. If this type of work is interesting to you, Cloudflare is hiring for the crypto team and others!