Encryption is one of the most powerful technologies that everyone uses on a daily basis without realizing it. Transport-layer encryption, which protects data as it’s sent across the Internet to its intended destination, is now ubiquitous because it’s a fundamental tool for creating a trustworthy Internet. Disk encryption, which protects data while it’s sitting idly on your phone or laptop’s hard drive, is also becoming ubiquitous because it prevents anybody who steals your device from also being able to see what’s on your desktop or read your email.
The next improvement on this technology that’s starting to gain popularity is end-to-end encryption, which refers to a system where only the end-users are able to access their data, not any intermediate service providers. Some of the most popular examples of this type of encryption are chat apps like WhatsApp and Signal. End-to-end encryption significantly reduces the likelihood of a user’s data being maliciously stolen from, or otherwise mishandled by, a service provider. This is because even if the service provider loses control of the data, nobody else will have the keys to decrypt it!
Several months ago, I realized that I had a lot of sensitive files on my computer (my diary, if you must know) that I was afraid of losing, but I didn’t feel comfortable putting them in something like Google Drive or Dropbox. While Google and Dropbox are absolutely trustworthy companies, they don’t offer end-to-end encryption, and this was a case where I really wanted complete control of my data.
From looking around, it was hard for me to find something that met all of my requirements:
- It would both encrypt and authenticate the directory structure, meaning that file names are hidden and it’s not possible for others to move or rename files.
- Viewing or changing part of a large file wouldn’t require downloading and decrypting the entire file.
- It would be open-source and have a documented protocol.
So I set out to build such a system! The end result is called UtahFS, and the code for it is available here. Keep in mind that this system is not used in production at Cloudflare: it’s a proof-of-concept that I built while working on our Research Team. The rest of this blog post describes why I built it as I did, but there’s documentation in the repository on actually using it if you want to skip to that.
Storage Layer
The first and most important part of a storage system is… the storage. For this, I used Object Storage, because it’s one of the cheapest and most reliable ways to store data on somebody else’s hard drives. Object storage is nothing more than a key-value database hosted by a cloud provider, often tuned for storing values around a few kilobytes in size. There are a ton of different providers with different pricing schemes, like Amazon S3, Backblaze B2, and Wasabi. All of them are capable of storing terabytes of data, and many also offer geographic redundancy.
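To make this concrete, here’s a minimal Go sketch of the abstraction this layer provides. The interface and method names are illustrative, not UtahFS’s actual API:

```go
package storage

import "context"

// ObjectStorage is a sketch of the key-value abstraction that cloud
// object storage provides. The names here are illustrative, not
// UtahFS's actual interface.
type ObjectStorage interface {
	// Get returns the value stored under key, or an error if it
	// doesn't exist or the request fails.
	Get(ctx context.Context, key string) ([]byte, error)
	// Set creates or overwrites the value stored under key.
	Set(ctx context.Context, key string, data []byte) error
	// Delete removes the value stored under key, if it exists.
	Delete(ctx context.Context, key string) error
}
```

Every layer described below ultimately bottoms out in a small set of operations like these.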
Data Layer
One of the requirements that was important to me was that it shouldn’t be necessary to download and decrypt an entire file before being able to read a part of it. One place where this matters is audio and video files, because it enables playback to start quickly. Another case is compressed archives: a lot of file browsers have the ability to explore ZIP files without decompressing them. To enable this functionality, the browser needs to be able to read a specific part of the archive file, decompress just that part, and then move somewhere else.
Internally, UtahFS never stores objects that are larger than a configured size (32 kilobytes, by default). If a file has more than that amount of data, the file is broken into multiple objects which are connected by a skip list. A skip list is a slightly more complicated version of a linked list that allows a reader to move to a random position quickly by storing additional pointers in each block that point further than just one hop ahead.
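Here’s a rough Go sketch of the idea. The exact field layout and pointer spacing are illustrative rather than UtahFS’s real on-disk format:

```go
package data

// blockPtr identifies a block in object storage; 0 means "none".
type blockPtr uint64

const (
	blockSize = 32 * 1024 // Default maximum object size: 32 KiB.
	maxLevels = 24        // Supports files of up to 2^24 blocks.
)

// block is one fixed-size piece of a file. Besides the next block,
// it stores pointers 2, 4, 8, ... blocks ahead, so a reader can
// seek to a random offset in a logarithmic number of hops.
type block struct {
	Ptrs [maxLevels]blockPtr // Ptrs[i] points 2^i blocks ahead.
	Data []byte              // Up to blockSize bytes of file data.
}

// seek follows skip pointers from the first block of a file to the
// block containing byte offset off, taking the largest hop that
// doesn't overshoot at each step.
func seek(start blockPtr, off int64, load func(blockPtr) *block) blockPtr {
	cur, hops := start, off/blockSize // hops: blocks left to skip.
	for hops > 0 {
		b := load(cur)
		i := maxLevels - 1
		for int64(1)<<i > hops || b.Ptrs[i] == 0 {
			i--
		}
		cur, hops = b.Ptrs[i], hops-int64(1)<<i
	}
	return cur
}
```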
When blocks in a skip list are no longer needed, because a file was deleted or truncated, they’re added to a special “trash” linked list. Elements of the trash list can then be recycled when blocks are needed somewhere else, to create a new file or write more data to the end of an existing file, for example. This maximizes reuse and means new blocks only need to be created when the trash list is empty. Some readers might recognize this as the Linked Allocation strategy described in The Art of Computer Programming: Volume 1, section 2.2.3!
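Continuing the sketch above, the allocator only has to manage the head of the trash list (again, illustrative rather than UtahFS’s actual code):

```go
package data

// freeList implements the Linked Allocation strategy: freed blocks
// are pushed onto a "trash" list, and allocations pop from it before
// ever minting a brand-new block.
type freeList struct {
	trashHead blockPtr              // First block on the trash list; 0 if empty.
	nextFresh blockPtr              // Next never-used pointer to hand out.
	next      map[blockPtr]blockPtr // Sketch of each block's next pointer.
}

func newFreeList() *freeList {
	// Pointer 0 is reserved to mean "none", so fresh pointers start at 1.
	return &freeList{nextFresh: 1, next: make(map[blockPtr]blockPtr)}
}

// free pushes a block that's no longer needed onto the trash list.
func (f *freeList) free(p blockPtr) {
	f.next[p] = f.trashHead
	f.trashHead = p
}

// alloc recycles a trashed block if one is available, and only
// creates a new block when the trash list is empty.
func (f *freeList) alloc() blockPtr {
	if f.trashHead != 0 {
		p := f.trashHead
		f.trashHead = f.next[p]
		return p
	}
	p := f.nextFresh
	f.nextFresh++
	return p
}
```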
Fundamentally, the reason for using Linked Allocation is that it’s the most efficient strategy for the most common operations: reads and writes only need to touch the small number of blocks they actually affect. But it’s also the approach to allocating memory that’s most compatible with the cryptography we talk about in the next three sections, where every block access carries extra cost.
Encryption Layer
Now that we’ve talked about how files are broken into blocks and connected by a skip list, we can talk about how the data is actually protected. There are two aspects to this:
The first is confidentiality, which hides the contents of each block from the storage provider. This is achieved simply by encrypting each block with AES-GCM, with a key derived from the user’s password.
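As a sketch of what this looks like in Go, with Argon2 (discussed below) stretching the password into a key. The cost parameters and ciphertext format here are illustrative, not UtahFS’s exact scheme:

```go
package enc

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"

	"golang.org/x/crypto/argon2"
)

// deriveKey stretches the user's password into a 256-bit AES key
// with Argon2id. The cost parameters are illustrative; a real system
// needs to choose them (and store the salt) carefully.
func deriveKey(password, salt []byte) []byte {
	return argon2.IDKey(password, salt, 1, 64*1024, 4, 32)
}

// sealBlock encrypts one block of file data with AES-GCM under a
// fresh random nonce, returning nonce || ciphertext || tag.
func sealBlock(key, plaintext []byte) ([]byte, error) {
	blk, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(blk)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext and authentication tag to the nonce.
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}
```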
While simple, this scheme doesn’t provide forward secrecy or post-compromise security. Forward Secrecy means that if the user’s device was compromised, an attacker wouldn’t be able to read deleted files. Post-Compromise Security means that once the user’s device is no longer compromised, an attacker wouldn’t be able to read new files. Unfortunately, providing either of these guarantees means storing cryptographic keys on the user’s device that would need to be synchronized between devices and, if lost, would render the archive unreadable.
This scheme also doesn’t protect against offline password cracking, because an attacker can take any of the encrypted blocks and keep guessing passwords until they find one that works. This is somewhat mitigated by using Argon2, which makes guessing passwords more expensive, and by recommending that users choose strong passwords.
I'm definitely open to improving the encryption scheme in the future, but I considered the security properties listed above too difficult and fragile to support in the initial release.
Integrity Layer
The second aspect of data protection is integrity, which ensures the storage provider hasn’t changed or deleted anything. This is achieved by building a Merkle Tree over the user’s data. Merkle Trees are described in-depth in our blog post about Certificate Transparency. The root hash of the Merkle Tree is associated with a version number that’s incremented with each change, and both the root hash and the version number are authenticated with a key derived from the user’s password. This data is stored in two places: under a special key in the object storage database, and in a file on the user’s device.
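As a refresher on the core computation, here’s a minimal Go sketch of a Merkle root over a list of leaf hashes. A production tree also needs domain separation between leaves and interior nodes; in this sketch, an unpaired node is simply promoted to the next level:

```go
package integrity

import "crypto/sha256"

// merkleRoot repeatedly hashes pairs of nodes together until a
// single root hash remains. Changing, reordering, or deleting any
// leaf changes the root.
func merkleRoot(leaves [][32]byte) [32]byte {
	if len(leaves) == 0 {
		return sha256.Sum256(nil)
	}
	for len(leaves) > 1 {
		next := make([][32]byte, 0, (len(leaves)+1)/2)
		for i := 0; i < len(leaves); i += 2 {
			if i+1 == len(leaves) {
				next = append(next, leaves[i]) // Unpaired node moves up as-is.
				continue
			}
			h := sha256.New()
			h.Write(leaves[i][:])
			h.Write(leaves[i+1][:])
			var node [32]byte
			copy(node[:], h.Sum(nil))
			next = append(next, node)
		}
		leaves = next
	}
	return leaves[0]
}
```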
Whenever the user wants to read a block of data from the storage provider, they first request the root stored remotely and check that it’s either the same as what they have on disk, or has a greater version number than what’s on disk. Checking the version number prevents the storage provider from reverting the archive to a previous (valid) state undetected. Any data which is read can then be verified against the most recent root hash, which prevents any other types of modifications or deletions.
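Continuing the sketch, the client-side check might look something like this; the field layout and the choice of HMAC are illustrative:

```go
package integrity

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"errors"
)

// signedRoot is the authenticated (root hash, version) pair stored
// both remotely and on the user's device.
type signedRoot struct {
	Root    [32]byte
	Version uint64
	Tag     []byte // MAC over Root and Version, keyed from the user's password.
}

func computeTag(key []byte, r [32]byte, v uint64) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(r[:])
	mac.Write(binary.LittleEndian.AppendUint64(nil, v))
	return mac.Sum(nil)
}

// checkRoot validates the root fetched from the provider against the
// copy pinned on the user's device.
func checkRoot(key []byte, local, remote signedRoot) error {
	if !hmac.Equal(remote.Tag, computeTag(key, remote.Root, remote.Version)) {
		return errors.New("root is not authentic")
	}
	if remote.Version < local.Version {
		return errors.New("provider rolled back to an older version")
	}
	if remote.Version == local.Version && remote.Root != local.Root {
		return errors.New("provider changed data without bumping the version")
	}
	return nil
}
```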
Using a Merkle Tree here has the same benefit as it does for Certificate Transparency: it allows us to verify individual pieces of data without needing to download and verify everything at once. Another common tool used for data integrity is called a Message Authentication Code (or MAC), and while it’s a lot simpler and more efficient, it doesn’t have a way to do only partial verification.
The one thing our use of Merkle Trees doesn’t protect against is forking, where the storage provider shows different versions of the archive to different users. However, detecting forks would require some kind of gossip between users, which is beyond the scope of the initial implementation.
Hiding Access Patterns
Oblivious RAM, or ORAM, is a cryptographic technique for reading and writing to random-access memory in a way that hides, from the memory itself, which operation was performed (a read or a write) and which part of memory it touched! In our case, the ‘memory’ is our object storage provider, which means we’re hiding from them which pieces of data we’re accessing and why. This is valuable for defending against traffic-analysis attacks, where an adversary with detailed knowledge of a system like UtahFS can look at the requests it makes and infer the contents of encrypted data. For example, they might see that you upload data at regular intervals and almost never download, and infer that you’re storing automated backups.
The simplest implementation of ORAM would consist of always reading the entire memory space and then rewriting the entire memory space with all new values, any time you want to read or write an individual value. An adversary looking at the pattern of memory accesses wouldn’t be able to tell which value you actually wanted, because you always touch everything. This would be incredibly inefficient, however.
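In code, the naive scheme is tiny. Here’s a toy Go sketch with re-encryption abstracted behind a callback:

```go
package oram

// trivialAccess reads or writes a single index by fetching every
// block, changing at most one of them, re-encrypting all of them,
// and writing everything back. The provider sees the same access
// pattern no matter which index was touched, or whether the
// operation was a read or a write.
func trivialAccess(store [][]byte, index int, newValue []byte, reencrypt func([]byte) []byte) []byte {
	// Read the entire memory space.
	memory := make([][]byte, len(store))
	copy(memory, store)

	value := memory[index]
	if newValue != nil { // nil means this access is a read.
		memory[index] = newValue
	}

	// Rewrite every block, re-encrypted so old and new ciphertexts
	// can't be matched up even where the plaintext didn't change.
	for i := range memory {
		store[i] = reencrypt(memory[i])
	}
	return value
}
```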
The construction we actually use, which is called Path ORAM, refines this simple scheme to make it more efficient. First, it organizes the blocks of memory into a binary tree, and second, it keeps a client-side table that maps application-level pointers to random leaves of the binary tree. The trick is that a value is allowed to live in any block of memory on the path between its assigned leaf and the root of the binary tree.
Now, when we want to look up the value that a pointer goes to, we check our table for its assigned leaf and read all the nodes on the path between the root and that leaf. The value we’re looking for is guaranteed to be on this path, so we already have what we need! And in the absence of any other information, all the adversary saw is that we read a random path from the tree.
What looks like a random path is read from the tree, and that path ends up containing the data we’re looking for.
However, we still need to hide whether we’re reading or writing, and to re-randomize some memory to ensure this lookup can’t be linked with others we make in the future. So to re-randomize, we assign the pointer we just read to a new leaf and move the value from whichever block it was stored in before to a block that’s a parent of both the new and old leaves. (In the worst case, we can use the root block since the root is a parent of everything.) Once the value is moved to a suitable block and done being consumed/modified by the application, we re-encrypt all the blocks we fetched and write them back to memory. This puts the value in the path between the root and its new leaf, while only changing the blocks of memory we’ve already fetched.
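Here’s a drastically simplified Go sketch of that access procedure. It leaves out pieces real Path ORAM needs, like the client-side stash, fixed-size buckets padded with dummy data, and the re-encrypted write-back of every fetched bucket:

```go
package oram

import "math/rand"

type entry struct {
	ptr  uint64
	data []byte
}

// pathORAM stores buckets of entries in a binary tree laid out as a
// heap: node i's children are 2i and 2i+1, with leaves at depth
// `height`. posMap is the client-side table described above.
type pathORAM struct {
	height  int
	buckets [][]entry      // Indexed 1..2^(height+1)-1.
	posMap  map[uint64]int // Pointer -> assigned leaf (heap index).
}

// pathTo returns the heap indices on the path between leaf and the root.
func (o *pathORAM) pathTo(leaf int) []int {
	nodes := make([]int, 0, o.height+1)
	for i := leaf; i >= 1; i /= 2 {
		nodes = append(nodes, i)
	}
	return nodes
}

// access reads ptr's value while touching only one random-looking
// path, then remaps ptr to a fresh leaf for next time.
func (o *pathORAM) access(ptr uint64) []byte {
	oldLeaf := o.posMap[ptr]

	// 1. Read every bucket on the path to the old leaf, pulling our
	//    value out when we find it. (The ORAM invariant guarantees
	//    the value lives somewhere on this path.)
	var value []byte
	for _, i := range o.pathTo(oldLeaf) {
		kept := o.buckets[i][:0]
		for _, e := range o.buckets[i] {
			if e.ptr == ptr {
				value = e.data
			} else {
				kept = append(kept, e)
			}
		}
		o.buckets[i] = kept
	}

	// 2. Remap ptr to a uniformly random new leaf. (A real
	//    implementation would use crypto/rand.)
	firstLeaf := 1 << o.height
	newLeaf := firstLeaf + rand.Intn(firstLeaf)
	o.posMap[ptr] = newLeaf

	// 3. Reinsert the value at the deepest bucket that lies on both
	//    the old and new paths; in the worst case that's the root.
	anc := commonAncestor(oldLeaf, newLeaf)
	o.buckets[anc] = append(o.buckets[anc], entry{ptr, value})

	// (A real implementation would now re-encrypt and write back
	// every fetched bucket so the provider can't tell what moved.)
	return value
}

// commonAncestor returns the deepest heap node that is an ancestor
// of both leaves.
func commonAncestor(a, b int) int {
	for a != b {
		if a > b {
			a /= 2
		} else {
			b /= 2
		}
	}
	return a
}
```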
This construction is great because we’ve only had to touch the memory assigned to a single random path in a binary tree, which is a logarithmic amount of work relative to the total size of our memory. But even if we read the same value again and again, we’ll touch completely random paths from the tree each time! There’s still a performance penalty caused by the additional memory lookups though, which is why ORAM support is optional.
Wrapping Up
Working on this project has been really rewarding for me: while a lot of the individual layers of the system seem simple, they’re the result of a lot of refinement, and they quickly build up into something complex. It was difficult, though, in that I had to implement a lot of the functionality myself instead of reusing other people’s code. This is because building end-to-end encrypted systems requires carefully integrating security into every feature, and the only good way to do that is from the start. I hope UtahFS is useful for others interested in secure storage.