Cryptographic code is everywhere: it gets run when we connect to the bank, when we send messages to our friends, or when we watch cat videos. But it is not at all easy to take a cryptographic specification written in a natural language and produce running code from it, and it is even harder to validate both the theoretical assumptions and the correctness of the implementation itself. Mathematical proofs, as we talked about in our previous blog post, and code inspection are simply not enough. Testing and fuzzing can catch common or well-known bugs or mistakes, but might miss rare ones that can, nevertheless, be triggered by an attacker. Static analysis can detect mistakes in the code, but it cannot check whether the code behaves as described by the natural-language specification (that is, whether it is functionally correct). This gap between implementation and validation can have grave consequences for security in the real world, and we need to bridge this chasm.
In this blog post, we will be talking about ways to make this gap smaller by making the code we deploy better through analyzing its security properties and its implementation. This blog post continues our work on high-assurance cryptography, for example, on using Tamarin to analyze entire protocol specifications. In this one, we want to focus on verifying implementations. Our desire for high-assurance cryptography isn’t specific to post-quantum cryptography, but because quantum-safe algorithms and protocols are so new, we want extra reassurance that we’re doing the best we can. The post-quantum era also gives us a great opportunity to apply all the lessons we’ve learned while deploying classical cryptography, which will hopefully prevent us from making the same mistakes all over again.
This blog post will discuss formal verification. Formal verification is a technique we can use to prove that a piece of code correctly implements a specification. Formal verification, and formal methods in general, have been around for a long time, appearing as early as the 1950s. Today, they are being applied in a variety of ways: from automating the checking of security proofs to automating checks for functional correctness and the absence of side-channel attacks. Code verified using these techniques has been deployed in popular products like Mozilla Firefox and Google Chrome.
Formal verification, as opposed to formal analysis (the topic of other blog posts), deals with verifying code: checking that it correctly implements a specification. Formal analysis, on the other hand, deals with establishing that the specification itself has the desired properties, for example, a specific security guarantee.
Let’s explore what it means for an algorithm to have a proof that it achieves a certain security goal and what it means to have an implementation we can prove correctly implements that algorithm.
Goals of a formal analysis and verification process
Our goal, given a description of an algorithm in a natural language, is to produce two proofs: first, one that shows that the algorithm has the security properties we want and, second, one that shows that we have a correct implementation of it. We can go about this in four steps:
Turn the algorithm and its security goals into a formal specification. This is us defining the problem.
Use formal analysis to prove, in our case using computer-aided proof tooling, that the algorithm attains the specified properties.
Use formal verification to prove that the implementation correctly implements the algorithm.
Use formal verification to prove that our implementation has additional properties, like memory safety, running in constant time, efficiency, etc.
Interestingly, we can do step 2 in parallel with steps 3 and 4, because the two proofs are actually independent. As long as they both build from the same specification established in step 1, the properties we establish in the formal analysis should flow down to the implementation.
Suppose, more concretely, we’re looking at an implementation and specification of a Key Encapsulation Mechanism (a KEM, such as FrodoKEM). FrodoKEM is designed to achieve IND-CCA security, so we want to prove that it does, and that we have an efficient, side-channel-resistant, and correct implementation of it.
As you might imagine, achieving even one of these goals is no small feat. Achieving all of them, especially given the way they conflict (efficiency clashes with side-channel resistance, for example), is a Herculean task. Decades of research have gone into this space, and it is huge, so let’s carve out a small subsection to examine: two tools, EasyCrypt and Jasmin.
Before we jump into the tools, let’s take a brief aside to discuss why we’re not using Tamarin, which we’ve talked about in our other blog posts. Like EasyCrypt, Tamarin is also a tool used for formal analysis, but beyond that, the two tools are quite different. Formal analysis broadly splits into two camps: symbolic analysis and computational analysis. Tamarin, as we saw, uses symbolic analysis, which treats all functions effectively as black boxes, whereas EasyCrypt uses computational analysis. Computational analysis is much closer to how we program, and functions are given specific implementations. This gives computational analysis a much higher “resolution”: we can study properties in much greater detail and, perhaps, with greater ease. This detail, of course, comes at a cost. As functions grow into full protocols, with multiple modes, branching paths, and, in the case of Transport Layer Security (TLS), sometimes even resumption, computational models become unwieldy and difficult to work with, even with computer-assisted tooling. We therefore have to pick the correct tool for the job. When we need maximum assurance, sometimes both computational and symbolic proofs are constructed, with each playing to its strengths and compensating for the other’s drawbacks.
EasyCrypt
EasyCrypt is a proof assistant for cryptographic algorithms and imperative programs. A proof is basically a formal demonstration that some statement is true. EasyCrypt is called a proof assistant because it “assists” you with creating a proof; it does not create a proof for you, but rather, helps you come to it and gives you the power to have a machine check that each step logically follows from the last. It provides a language to write definitions, programs, and theorems along with an environment to develop machine-checked proofs.
A proof starts from a set of assumptions and, by taking a series of logical steps, demonstrates that some statement is true. Let’s imagine for a moment that we are the hero Perseus on a quest to kill a mythological being, the terrifying Medusa. How can we prove to everyone that we’ve succeeded? No one is going to want to enter Medusa's cave to check that she is dead because they’ll be turned to stone. And we cannot just state, “I killed the Medusa.” Who will believe us without proof? After all, is this not a leap of faith?
What we can do is bring the head of the Medusa as proof. Providing the head as a demonstration is our proof because no mortal Gorgon can live without a head. Legend has it that Perseus completed the proof by demonstrating that the head was indeed that of the Medusa: Perseus used the head’s powers to turn Polydectes to stone (the latter was about to force Perseus’ mother to marry him, so let’s just say it wasn’t totally unprovoked). One can say that this proof was done “by hand” in that it was done without any mechanical help. For computer security proofs, sometimes the statements we want to prove are so cumbersome and so large that we need a machine to help us.
How does EasyCrypt achieve this? How does it help you? As we are dealing with cryptography here, let’s start by defining how one can reason about cryptography, the security it provides, and the proofs one uses to corroborate such claims.
When we encrypt something, we do this to hide whatever we want to send. In a perfect world, it would be indistinguishable from noise. Unfortunately, only the one-time pad truly offers this property, so most of the time we make do with “close enough”: it should be infeasible to differentiate a true encrypted value from a random one.
When we want to show that a certain cryptographic protocol or algorithm has this property, we write it down as an “indistinguishability game.” The idea of the game is as follows:
Imagine a gnome sitting in a box. The gnome takes any message given to the box as input and produces a ciphertext, recording each message and the ciphertext generated for it. A troll outside the box chooses two messages (m1 and m2) of the same length and sends them to the box. The gnome flips a coin. If the coin lands on heads, the gnome sends back the ciphertext (c1) corresponding to m1. Otherwise, they send back c2, corresponding to m2. In order to win, the troll, knowing the two messages and the returned ciphertext, has to guess which message was encrypted.
In this example, we can see two things: first, choices are random as the ciphertext sent is chosen by flipping a coin; second, the goal of the adversary is to win a game.
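Written as a probabilistic program, the gnome-and-troll game above might look roughly like the following sketch in EasyCrypt’s module syntax (the Scheme and Adversary interfaces, the IND module, and all type names are ours, purely for illustration; they are not taken from any particular library):

require import AllCore DBool.

(* Illustrative placeholder types for keys, messages, and ciphertexts. *)
type key.
type msg.
type ctxt.

(* The gnome's box: an abstract encryption scheme. *)
module type Scheme = {
  proc keygen() : key
  proc enc(k : key, m : msg) : ctxt
}.

(* The troll: it picks two messages, then guesses which one was encrypted. *)
module type Adversary = {
  proc choose() : msg * msg
  proc guess(c : ctxt) : bool
}.

(* The indistinguishability game itself. *)
module IND (S : Scheme) (A : Adversary) = {
  proc main() : bool = {
    var k : key;
    var m1, m2 : msg;
    var c : ctxt;
    var b, b' : bool;

    k <@ S.keygen();
    (m1, m2) <@ A.choose();
    b <$ {0,1};                 (* the gnome flips a coin *)
    if (b) {
      c <@ S.enc(k, m1);        (* encrypt m1 ... *)
    } else {
      c <@ S.enc(k, m2);        (* ... or encrypt m2 *)
    }
    b' <@ A.guess(c);           (* the troll makes its guess *)
    return (b' = b);            (* the troll wins if the guess is right *)
  }
}.

A scheme would then be considered secure if no efficient adversary can win this game with probability much better than the 1/2 it gets by guessing at random.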
EasyCrypt takes this approach. Security goals are modeled as probabilistic programs (basically, as games) played by an adversary. Tools from program verification and programming language theory are used to justify the cryptographic reasoning. EasyCrypt relies on a “goal-directed proof” approach, which revolves around two important mechanisms: lemmas and tactics. Let’s see how this approach works (following this amazing paper):
The prover (in this case, you) enters a small statement to prove. For this, one uses the command lemma (meaning this is a minor statement that needs to be proven).
EasyCrypt will display the formula as a statement to be proved (i.e., the goal) and will also display all the hypotheses known at that point.
The prover enters a command (a tactic) to either decompose the statement into simpler ones, apply a hypothesis, or make progress in the proof in some other way.
EasyCrypt displays a new set of hypotheses and the parts that still need to be proved.
Back to step 3.
Let’s say you want to prove something small, like the statement “if p in conjunction with q, then q in conjunction with p.” In predicate logic terms, this is written as (p ∧ q) → (q ∧ p). If we translate this into English statements, as Alice might say in Alice in Wonderland, it could be:
p: I have a world of my own.
q: Everything is nonsense.
p ∧ q: I have a world of my own and everything is nonsense.
(p ∧ q) → (q ∧ p): If I have a world of my own and everything is nonsense, then everything is nonsense, and I have a world of my own.
We will walk through such a statement and its proof in EasyCrypt. For more of these examples, see these given by the marvelous Alley Stoughton.
Our lemma and proof in EasyCrypt.
lemma implies_and :
This line introduces the stated lemma and creatively calls it “implies_and”. It takes no parameters.
forall (p q : bool), p /\ q => q /\ p.
This is the statement we want to prove. We use the variables p and q of type bool (booleans), and we state that if p and q hold, then q and p hold.
Up until now we have just declared our statement to prove to EasyCrypt. Let’s see how we write the proof:
proof.
This line demarcates the start of the proof for EasyCrypt.
move => p q H.
We introduce the universally quantified variables and the hypothesis (we move them to the “context”). Here, p and q are our two booleans, and H names the hypothesis p /\ q.
elim H.
We eliminate H (the conjunctive hypothesis), which splits it into its components and leaves us with the goal “p => q => q /\ p”.
trivial.
The proof is now trivial.
qed.
Quod erat demonstrandum (QED) denotes the end of the proof. Whew! For such a simple statement, this took quite a bit of work, because EasyCrypt leaves no stone unturned. If you get to this point, you can be sure your proof is absolutely correct, unless there is a bug in EasyCrypt itself (or unless we proved something other than what we intended).
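Putting the pieces we just walked through back together, the complete script reads:

lemma implies_and :
  forall (p q : bool), p /\ q => q /\ p.
proof.
move => p q H.
elim H.
trivial.
qed.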
As you see, EasyCrypt helped us by guiding us in decomposing the statement into simpler terms and stating what still needed to be proven. And by strictly following logical principles, we arrived at a proof. If we are doing something wrong, and our proof is incorrect, EasyCrypt will let us know, saying something like:
Screenshot of EasyCrypt showing us that we did something wrong.
What we have achieved is a computer-checked proof of the statement, giving us far greater confidence in it than in a proof written with pen and paper. But what makes EasyCrypt particularly attractive, in addition to this, is its tight integration with the Jasmin programming language, as we will see later.
EasyCrypt will also interactively guide us to the proof, as it easily works with ProofGeneral in Emacs. In the image below we see, for example, that EasyCrypt is guiding us by showing the variables we have declared (p, q, and H) and what remains to be proven (after the dashed line).
EasyCrypt interactively showing us where we are in the proof: the cyan section shows how far we have progressed.
If one is more comfortable with the Coq proof assistant (there are very good tutorials for it), a similar proof can be given:
Our lemma and proof in Coq.
EasyCrypt allows us to prove statements faster and with more assurance than doing proofs by hand. Proving the truth of the statement we just showed would be easy with a truth table, for example. But such truth tables or hand proofs are only easy to find when the statement is small. For a complex cryptographic algorithm or protocol, the situation is much harder.
Jasmin
Jasmin is an assembly-like programming language that uses assembly-level instructions while offering some high-level syntactic conveniences such as loops, procedure calls, and functional arrays. The Jasmin compiler predictably transforms source code into assembly code for a chosen platform (currently only x64 is supported). This transformation is verified: the correctness of some compiler passes (like function inlining or loop unrolling) is proven and verified in the Coq proof assistant. Other passes are programmed in a conventional programming language and their results are validated in Coq. The compiler also comes with a built-in checker for memory safety and constant-time safety.
This assembly-like syntax, combined with the stated assurances of the compiler, means that we have deep control over the output, and we can optimize it however we like without compromising safety. Because low-level cryptographic code tends to be concise and non-branching, Jasmin doesn’t need full support for general-purpose language features, nor does it need to provide lots of libraries. It only needs to support a set of basic features to give us everything we need.
One reason Jasmin is so powerful is that it provides a way to formally verify low-level code. The other reason is that Jasmin code can be automatically converted by the compiler into equivalent EasyCrypt code, which lets us reason about its security. In general terms, whatever guarantees apply to the EasyCrypt code also flow into the Jasmin code, and subsequently into the assembly code.
Let’s use the example of a very simple Jasmin function that performs multiplication to see Jasmin in action:
A multiplication function written in Jasmin.
What the function (“fn”) “mul” does, in this case, is multiply whatever number is provided as an argument to the function (the variable a) by a constant. The syntax of this small function should feel very familiar to anyone who has worked with the C family of programming languages. The only big difference is the use of the words reg and u64. They state that the variable a, for example, is allocated in registers (hence the use of reg: this defines the storage of the variable) and that it is a 64-bit machine word (hence the use of u64). We can now compile this to “pure” x64 assembly:
A multiplication function written in Jasmin and transformed to Assembly.
The first lines of the assembly code are just “setting everything up”. They are followed by the “imulq” instruction, which multiplies the variable by the constant (which in this case is labeled as “param”). While this small function might not show the full power of being able to safely translate to assembly, that power becomes apparent when more complex functions are created. Functions that use while loops, arrays, and calls to other functions are accepted by Jasmin and will be safely translated to assembly.
Assembly language has a little bit of a bad reputation because it is thought to be hard to learn, hard to read, and hard to maintain. Having a tool that handles the translation is very useful to a programmer, and it also lets you manually or automatically inspect what the resulting assembly code looks like.
We can further check the code for its safety:
A multiplication function written and checked in Jasmin.
In this check, there are several things to understand. First, it checks that the inputs are allocated in a memory region of at least 0 bytes. Second, the “Rel” entry checks the safety pre-condition on allocated memory regions: for example, that n must point to an allocated memory region of sufficient length.
You can then extract this functionality to EasyCrypt (and even configure EasyCrypt to verify Jasmin programs). Here is the corresponding EasyCrypt code, automatically produced by the Jasmin compiler:
A multiplication function written in Jasmin and extracted to EasyCrypt.
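If the screenshot is hard to read: the extracted code is essentially an EasyCrypt module containing one procedure that operates on 64-bit machine words. As a rough sketch (not the exact compiler output, which depends on the Jasmin and EasyCrypt versions; we use 5 as a stand-in for the constant parameter), it has this shape:

require import AllCore.
from Jasmin require import JModel.

module M = {
  proc mul (a : W64.t) : W64.t = {
    var r : W64.t;
    r <- a * (W64.of_int 5);   (* the constant parameter, inlined by the compiler *)
    return r;
  }
}.

Because this is ordinary EasyCrypt code, we can now state and prove lemmas about it, just as we did for the small logical statement earlier.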
Here’s a slightly more involved example, that of a FrodoKEM utility function written in Jasmin.
A utility function for addition for FrodoKEM.
With a C-like syntax, this function adds two arrays (a and b) and returns the result (in out). The value NBAR is just a parameter you can specify in a C-like manner. You can then take this function and compile it to assembly. You can also use the Jasmin compiler to analyze the safety of the code (for example, that array accesses are in bounds, that memory accesses are valid, and that arithmetic operations are applied to valid arguments) and to verify that the code runs in constant time.
The addition function as used by FrodoKEM can also be extracted to EasyCrypt:
The addition function as extracted to EasyCrypt.
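Again as a rough sketch of our own (the real extraction uses fixed-size array and machine-word theories that the Jasmin compiler generates alongside the code; the names Array64 and W16 below follow that convention, assuming, for illustration, arrays of NBAR * NBAR = 64 16-bit entries, since FrodoKEM uses NBAR = 8), the extracted procedure has this general shape:

require import AllCore.
from Jasmin require import JModel.
require import Array64.

module M = {
  proc add (a : W16.t Array64.t, b : W16.t Array64.t) : W16.t Array64.t = {
    var out : W16.t Array64.t;
    var i : int;

    out <- witness;                  (* start from an arbitrary array *)
    i <- 0;
    while (i < 64) {
      out.[i] <- a.[i] + b.[i];      (* element-wise addition of the two inputs *)
      i <- i + 1;
    }
    return out;
  }
}.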
A theorem expressing correctness (meaning that the addition is computed correctly) is written in EasyCrypt like so:
The theorem of addition function as extracted to EasyCrypt.
Note that EasyCrypt uses a simple while language and Hoare logic to reason about such programs. The corresponding proof that the addition is correct:
The proof of the addition function as extracted to EasyCrypt.
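To give a feel for what such a Hoare-logic statement looks like without the screenshots, here is a tiny, self-contained toy example of our own (not the FrodoKEM theorem): a sketch of a procedure that adds two integers, together with a Hoare-logic lemma stating that it returns their sum.

require import AllCore.

module Toy = {
  proc add (a : int, b : int) : int = {
    var r : int;
    r <- a + b;
    return r;
  }
}.

(* Hoare triple: if the arguments equal x and y, the result equals x + y. *)
lemma add_correct (x y : int) :
  hoare [ Toy.add : a = x /\ b = y ==> res = x + y ].
proof.
proc.                    (* unfold the body of the procedure *)
wp.                      (* push the postcondition back through the assignment *)
skip.                    (* the program is now empty; move to a purely logical goal *)
move => &hr [h1 h2].     (* introduce the memory and the two hypotheses *)
rewrite h1 h2.           (* substitute a and b by x and y *)
trivial.
qed.

The real FrodoKEM statement in the screenshots follows the same pattern, except that the pre- and postconditions talk about arrays of machine words and the proof also has to reason about the while loop (typically via a loop invariant).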
Why formal verification for post-quantum cryptography?
As we have previously stated, cryptographic implementations are very hard to get right, and even if they are right, the security properties they claim to provide are sometimes wrong for their intended application. The reason why this matters so much is that post-quantum cryptography is the cryptography we will be using in the future due to the arrival of quantum computers. Deploying post-quantum cryptographic algorithms with bugs or flaws in their security properties would be a disaster because the connections and data they protect could be decrypted or attacked. We are trying to prevent that.
Cryptography is difficult to get right, and not only for people new to it: it is difficult for anyone, even the experts. The designs and code we write are error-prone because we, as humans, are prone to errors. Here are some examples of designs that got it wrong (luckily, these examples were not deployed, so they did not have the usual disastrous consequences):
Falcon (a post-quantum algorithm currently part of the NIST post-quantum process) produced valid signatures “but leaked information on the private key,” according to an official comment posted to the NIST post-quantum process on the algorithm. The comment also noted that “the fact that these bugs existed in the first place shows that the traditional development methodology (i.e. “being super careful”) has failed.”
“The De Feo–Jao–Plût identification scheme (the basis for SIDH signatures) contains an invalid assumption and provide[s] a counterexample for this assumption: thus showing the proof of soundness is invalid,” according to a finding that one proof of a post-quantum algorithm was not valid. This is an example of an incorrect proof, whose flaws were discovered and eliminated prior to any deployment.
Perhaps these two examples will convince the reader that formal analysis and formal verification of implementations are needed. While they help us avoid some human errors, they are not perfect. As for us, we are convinced of these methods. We are working towards a formally verified implementation of FrodoKEM (we have a first implementation of it in our cryptographic library, CIRCL), and we are collaborating to create a formally verified library we can use in real-world connections. If you are interested in learning more about EasyCrypt and Jasmin, visit the resources we have put together, try installing them following our guidelines, or follow some tutorials.
See you on other adventures in post-quantum (and some cat videos for you)!
References:
“SoK: Computer-Aided Cryptography” by Manuel Barbosa, Gilles Barthe, Karthik Bhargavan, Bruno Blanchet, Cas Cremers, Kevin Liao and Bryan Parno: https://eprint.iacr.org/2019/1393.pdf
“EasyPQC: Verifying Post-Quantum Cryptography” by Manuel Barbosa, Gilles Barthe, Xiong Fan, Benjamin Grégoire, Shih-Han Hung, Jonathan Katz, Pierre-Yves Strub, Xiaodi Wu and Li Zhou: https://eprint.iacr.org/2021/1253
“Jasmin: High-Assurance and High-Speed Cryptography” by José Bacelar Almeida, Manuel Barbosa, Gilles Barthe, Arthur Blot, Benjamin Grégoire, Vincent Laporte, Tiago Oliveira, Hugo Pacheco, Benedikt Schmidt and Pierre-Yves Strub: https://dl.acm.org/doi/pdf/10.1145/3133956.3134078
“The Last Mile: High-Assurance and High-Speed Cryptographic Implementations” by José Bacelar Almeida, Manuel Barbosa, Gilles Barthe, Benjamin Grégoire, Adrien Koutsos, Vincent Laporte, Tiago Oliveira and Pierre-Yves Strub: https://arxiv.org/pdf/1904.04606.pdf