SIDH in Go for quantumresistant TLS 1.3
This blog post presents a new cryptography primitive that we open sourced: p751sidh.
The Quantum Threat
Most of today's cryptography is designed to be secure against an adversary with enormous amounts of computational power. This means estimating how much work certain computations (such as factoring a number, or finding a discrete logarithm) require, and choosing cryptographic parameters based on our best estimate of how much work would be required to break the system.
If it were possible to build a largescale quantum computer, many of the problems whose difficulty we rely on for security would no longer be difficult to solve. While it remains unknown whether largescale quantum computers are possible (see this article for a good overview), it's a sufficient risk that there's wide interest in developing quantumresistant (or postquantum) cryptography: cryptography that works on ordinary computers we have today, but which is secure against a possible quantum computer.
At Cloudflare, our biggest use of cryptography is TLS, which we use both for serving our customers' websites (all Cloudflare sites get free HTTPS), as well as for internal interdatacenter communication on our backend.
In the TLS context, we want to create a secure connection between a client and a server. There are basically three cryptographic problems here:

Authenticity: the server needs to prove to the client that it is the real server (and optionally, the client can prove to the server that it's the real client);

Key agreement: the server and client need to agree, over an insecure connection, on a temporary shared secret known only to them;

Symmetric encryption: the server and client need to use their shared secret to encrypt the data they want to send over a secure connection.
Authenticity protects against active attackers, but because quantum computers aren't yet believed to exist, the main risk is a retroactive attack: for instance, a nationstate adversary (let's say, "NSA" for short) could record encrypted traffic, wait to build a quantum computer, and try to decrypt past traffic. Moreover, quantum algorithms seem to give only a small speedup against symmetric encryption, so the "key" problem to solve is #2, quantumresistant key agreement.
This is an active area of research, both in the design of new cryptosystems and in their implementation and deployment. For instance, last year, Google concluded an experiment using a latticebased key exchange in Chrome. Latticebased cryptosystems are an extremely promising family of quantumresistant algorithms. Their security relies on wellstudied computational problems, and they are computationally efficient. However, they have large key sizes and can require extra communication (which can necessitate additional roundtrips in protocols like TLS).
Another family of cryptosystems are supersingular isogeny systems, in particular supersingular isogeny DiffieHellman (SIDH). In contrast to latticebased systems, they rely on more exotic computational problems and are much more computationally expensive. However, they have much smaller key sizes and do not require extra communication: SIDH fits perfectly into TLS 1.3's key agreement mechanism.
TLS 1.3 is the latest version of the TLS protocol. This summer, I've been working at Cloudflare on an experiment for a quantumresistant version of TLS 1.3 using a hybrid key agreement combining X25519 and supersingular isogeny DiffieHellman (SIDH). To achieve this, I implemented a TLS 1.3 client in Go (as part of Cloudflare's tlstris), implemented SIDH in Go for the amd64
architecture, and combined the SIDH implementation with the TLS 1.3 key agreement mechanism to perform a quantumresistant TLS 1.3 handshake. This extends previous work by Microsoft Research on a SIDHbased key exchange for TLS 1.2, discussed below.
DiffieHellman key agreement in TLS 1.3
In the most recent version of TLS, TLS 1.3, the key agreement mechanism (part 2) is cleanly separated from the authentication mechanism (part 1). TLS 1.3 does key agreement using DiffieHellman, usually with an elliptic curve group. Before diving into the quantumresistant version, let's review how DiffieHellman (DH) works, and how it works in the context of TLS 1.3.
In DiffieHellman, we have two parties, Alice and Bob, wishing to establish a shared secret. They fix an abelian group G
of prime order p
, written additively, as well as a generator P
of G
(the basepoint). Alice then selects a uniformly random integer a
in the range [0,p]
. This determines a multiplicationbya
map, usually denoted [a] : G > G
. Alice computes the [a]P
, the image of the basepoint under her map, and sends it to Bob. Similarly, Bob chooses a random integer b
in the range [0,p]
, determining the map [b]
, computes [b]P
, and sends it to Alice. Alice and Bob then agree on a shared secret [ab]P
, which Alice computes as [a]([b]P)
and Bob computes as [b]([a]P)
:
(Here I'm describing the process in terms of maps, in order to show similarity with SIDH later).
In the TLS 1.3 context, this works as follows. A client initiates a connection by sending a TLS ClientHello message, which contains (among other data) a list of DH groups supported by the client, as well as "keyshares" (i.e., the [a]P
values) for some (or all) of these groups.
The server selects one of the DH groups supported by both the server and the client. In the happy case, the server selects a group the client provided a keyshare for, and sends back a ServerHello message containing the server's keyshare. From this point on, all handshake messages between the client and server, such as certificates, extensions, etc., are encrypted using a "handshake secret" derived from the keyshares. (In the unhappy case, where the client did not provide an acceptable keyshare, the server asks the client to retry, forcing an extra roundtrip).
Application data is later encrypted with a key derived from the handshake secret, as well as other data, so the security of the application data depends on the security of the key agreement. However, all existing DH groups in TLS are vulnerable to quantum algorithms.
Supersingularisogeny DiffieHellman
SIDH, proposed in 2011 by Luca De Feo and David Jao, is a relatively recent proposal for using elliptic curves to build a quantumresistant DiffieHellman scheme.
Roughly speaking, rather than working within a single elliptic curve group, SIDH works within a family of related, "isogenous" elliptic curves.
An isogeny is a map phi : E_1 > E_2
of elliptic curves which sends the identity element of the source curve E_1
to the identity of the target curve E_2
. It turns out that for every isogeny phi: E_1 > E_2
, there's a dual isogeny psi: E_2 > E_1
, so we can say that two curves are isogenous if they're linked by an isogeny.
Now we can consider an isogeny graph, whose edges are isogenies and whose vertices are elliptic curves. Instead of choosing secret multiplicationbyn
maps to move around inside one elliptic curve, Alice and Bob choose secret isogenies to move around inside a family of isogenous curves (i.e., they choose a random path through the isogeny graph), and the security of the system is related to the difficulty of computing isogenies between arbitrary curves.
The resulting diagram is slightly more complicated, but structurally similar to the one above:
What exactly is going on here? The starting curve E_0
, as well as the points P_A, Q_A, P_B, Q_B
, are system parameters.
An isogeny is uniquely determined by its kernel (the subgroup of points on the source curve which the isogeny maps to the identity point of the target curve). To choose an secret isogeny phi_A
, Alice chooses secret scalars m_A, n_A
, which determine a secret point [m_A]P_A + [n_A]Q_A
, which generates the kernel subgroup <[m_A]P_A + [n_A]Q_A>
and hence determines her secret isogeny phi_A
. Alice evaluates phi_A
at the points P_B
, Q_B
, and sends E_A
, phi_A(P_B)
, phi_A(Q_B)
to Bob, who does the same steps with A
and B
swapped.
Next, Alice uses E_B
, phi_B(P_A)
, phi_B(Q_A)
to construct an isogeny phi'_A
with kernel <[m_A]phi_B(P_A) + [n_A]phi_B(Q_A)>
, while Bob uses E_A
, phi_A(P_B)
, phi_A(Q_B)
to construct an isogeny phi'_B
with kernel <[m_B]phi_A(P_B) + [n_B]phi_A(Q_B)>
.
Now phi'_A
maps to the curve E_AB
, while phi'_B
maps to the curve E_BA
. The curves E_AB
and E_BA
are isomorphic. Since elliptic curves are classified by a number called the jinvariant, j(E_AB) = j(E_BA)
, and this is the shared secret between Alice and Bob.
A detailed technical explanation of this process can be found in the expansion to the SIDH paper by Luca De Feo, David Jao, and Jérôme Plût (the diagram above is Figure 1 of that paper), and an explanation in terms of rocket ships traveling through supersingular spacetime can be found in this article by Luca De Feo. Alternately, there's a recording here of Deirdre Connolly's talk at the February 2017 Cloudflare Crypto Meetup.
In 2016, Craig Costello, Patrick Longa, and Michael Naehrig, at Microsoft Research, published a paper on efficient algorithms for SIDH, applying optimization techniques from highspeed ECC to the original SIDH proposal.
They also published a constanttime, optimized implementation written in C and assembly, and a patch to OpenSSL to create SIDH ciphersuites for TLS 1.2. My Go implementation builds on their work (both algorithms and code), as discussed below.
SIDH key agreement in Go TLS
The SIDH implementation in the p751sidh
package has two parts: an outer p751sidh
package containing SIDH functionality, and an inner p751toolbox
package providing lowlevel functionality.
Because SIDH is implemented in terms of operations in a large finite field, the performance of the field arithmetic is critical to the performance of the protocol. Unfortunately, this requires writing assembly, because writing highperformance arithmetic is not possible in Go — it's simply not a design goal of the language. (There are a few reasons, most notably that there's no way to directly compute the (128bit) product of 64bit integers.)
The code is partially derived from the Microsoft Research implementation mentioned above. In particular, the field arithmetic is ported from the MSR assembly, and the implementation strategy follows their paper. (I experimented with a prototype implementation of field arithmetic using AVX2 and unsaturated limbs, but decided not to use it, since it got similar performance at the cost of less portability and more power use).
The assembly code for the lowest level field arithmetic is oriented around pointers to fixedsize buffers; this is wrapped in a Go API modeled after the big.Int
API. To test that the code behaves correctly, I used Go's testing/quick
package to write propertybased tests, which generate random field elements and compare the results of various operations against the same operations using big.Int
.
Curve and isogeny functions are implemented using the Go API, and the outerlevel SIDH functions achieve comparable performance as compared to the MSR implementation. In rough benchmarks, the Go implementation appears to be within 26% of the MSR implementation. The entire implementation is constanttime.
Concretely, on a T460s, Skylake i76600U @2.6GHz^{[1]}, key generation and shared secret computations take 1113ms. Note that unlike classic DiffieHellman, Alice and Bob's computations are slightly different, so they have different timings.
BenchmarkAliceKeyGen 11,709,778 ns/op
BenchmarkBobKeyGen 13,073,380 ns/op
BenchmarkSharedSecretAlice 11,256,985 ns/op
BenchmarkSharedSecretBob 12,984,817 ns/op
This is much more computationally expensive than a conventional ECDH key agreement, or a latticebased key agreement. However, from the point of view of latency, this might not be so bad. For example, 12 milliseconds is the roundtrip distance between Paris and Amsterdam, and so a key agreement requiring extra communication could easily take longer, even if the computations were less expensive.
Because SIDH is still new and unproven, the TLS integration performs a hybrid key exchange: it sends both an X25519 keyshare and an SIDH keyshare, performs both X25519 and SIDH shared secret computations, and feeds both shared secrets into the TLS key derivation mechanism. This ensures that even if SIDH turns out to be broken, the key agreement is at least as secure as X25519.
The TLS component is implemented as part of tlstris
, Cloudflare's fork of Go's crypto/tls
package, which has a partial implementation of TLS 1.3, Draft 18. Because tris
didn't support client functionality, I implemented a basic TLS 1.3 client before working on SIDH.
The hybrid key exchange is specified using the group identifier 0xFE24
. The 0xFE
places it in the privateuse reserved codeblock 0xFE00..0xFEFF
, since standardizing SIDH would be premature at this time; the number 24
was chosen due to its deep mathematical significance and connection to moonshine.
The entire SIDH integration takes less than 100 lines of code.
Misassembly hazards in Go assembly
The Microsoft Research SIDH implementation provides x64 assembly for field arithmetic, but Go's assembler uses a bespoke syntax derived from Plan 9, so reusing their assembly meant porting it to Go assembly.
When I first did this, the code produced incorrect results, even though all the instructions should have been exactly the same. I was eventually able to find the problem by dissassembling the generated Go binary, and comparing to the original assembly.
The original assembly was roughly of the form
...
sbb r10, rax
movq rax, 0
sbb rax, 0
...
The sbb dst, src
instruction is "subtract with borrow"; this instruction reads the carry flag CF
and sets dst = dst  (src + CF)
, CF = 1
if dst < src+CF
. So, this code is supposed to set the rax
register to 0
if the first subtraction did not underflow, and to 1111...11
if it did. (This value is used later in the computation as a mask). However, writing
...
SBBQ AX, R10
MOVQ $0, AX
SBBQ $0, AX
...
does not have the same result. The reason is that the Go assembler misassembles the MOVQ $0, AX
instruction to xor eax, eax
. This instruction has a shorter encoding. Unfortunately, it also has different behaviour: it clears the carry flag, breaking the program.
The reason this happens is that MOV
in Go assembly is declared to be a "pseudoinstruction", which does not necessarily correspond to a literal mov
instruction. Unfortunately, there's no specification of which instructions are pseudoinstructions, and what their behaviour is — MOV
in Go assembly is defined to clobber flags, but this isn't documented outside of compiler internals.
To work around this issue, we can drop literal bytes into the instruction stream. In this case, we write
#define ZERO_AX_WITHOUT_CLOBBERING_FLAGS BYTE $0xB8; BYTE $0; BYTE $0; BYTE $0; BYTE $0;
...
SBBQ AX, R10
ZERO_AX_WITHOUT_CLOBBERING_FLAGS
SBBQ $0, AX
to insert the bytes encoding the mov eax, 0
instruction, which leaves the carry flag intact.
Source Code
This implementation is still experimental, and should not be used in production without review. The computational cost of SIDH may keep it from being practical for shortlived client connections (at least in the near term). However, it may be suitable for longlived connections, such as interdatacenter connections, where the cost of the handshake is amortized over the length of the connection.
To find out more, the SIDH implementation can be found on GitHub as the p751sidh
package. The TLS integration can be found on my hdevalence/sidh
branch of tlstris
.
Thanks to Craig Costello, Diego Aranha, Deirdre Connolly, Nick Sullivan, Watson Ladd, Filippo Valsorda, and George Tankersley for their advice, comments, and discussions.
Unfortunately, Lenovo's BIOS does not allow disabling Turbo Boost. ↩︎