Inside the Entropy

Randomness, randomness everywhere;Nor any verifiable entropy.

Generating random outcomes is an essential part of everyday life; from lottery drawings and constructing competitions, to performing deep cryptographic computations. To use randomness, we must have some way to 'sample' it. This requires interpreting some natural phenomenon (such as a fair dice roll) as an event that generates some random output. From a computing perspective, we interpret random outputs as bytes that we can then use in algorithms (such as drawing a lottery) to achieve the functionality that we want.

The sampling of randomness securely and efficiently is a critical component of all modern computing systems. For example, nearly all public-key cryptography relies on the fact that algorithms can be seeded with bytes generated from genuinely random outcomes.

In scientific experiments, a random sampling of results is necessary to ensure that data collection measurements are not skewed. Until now, generating random outputs in a way that we can verify that they are indeed random has been very difficult; typically involving taking a variety of statistical measurements.

During Crypto week, Cloudflare is releasing a new public randomness beacon as part of the launch of the League of Entropy. The League of Entropy is a network of beacons that produces distributed, publicly verifiable random outputs for use in applications where the nature of the randomness must be publicly audited. The underlying cryptographic architecture is based on the drand project.

Verifiable randomness is essential for ensuring trust in various institutional decision-making processes such as elections and lotteries. There are also cryptographic applications that require verifiable randomness. In the land of decentralized consensus mechanisms, the DFINITY approach uses random seeds to decide the outcome of leadership elections. In this setting, it is essential that the randomness is publicly verifiable so that the outcome of the leadership election is trustworthy. Such a situation arises more generally in Sortitions: an election where leaders are selected as a random individual (or subset of individuals) from a larger set.

In this blog post, we will give a technical overview behind the cryptography used in the distributed randomness beacon, and how it can be used to generate publicly verifiable randomness. We believe that distributed randomness beacons have a huge amount of utility in realizing the Internet of the Future; where we will be able to rely on distributed, decentralized solutions to problems of a global-scale.

Randomness & entropy

A source of randomness is measured in terms of the amount of entropy it provides. Think about the entropy provided by a random output as a score to indicate how “random” the output actually is. The notion of information entropy was concretised by the famous scientist Claude Shannon in his paper A Mathematical Theory of Communication, and is sometimes known as Shannon Entropy.

A common way to think about random outputs is: a sequence of bits derived from some random outcome. For the sake of an argument, consider a fair 8-sided dice roll with sides marked 0-7. The outputs of the dice can be written as the bit-strings 000,001,010,...,111. Since the dice is fair, any of these outputs is equally likely. This is means that each of the bits is equally likely to be 0 or 1. Consequently, interpreting the output of the dice roll as a random output then derives randomness with 3 bits of entropy.

More generally, if a perfect source of randomness guarantees strings with n bits of entropy, then it generates bit-strings where each bit is equally likely to be 0 or 1. This allows us to predict the value of any bit with maximum probability 1/2. If the outputs are sampled from such a perfect source, we consider them uniformly distributed. If we sample the outputs from a source where one bit is predictable with higher probability, then the string has n-1 bits of entropy. To go back to the dice analogy, rolling a 6-sided dice provides less than 3 bits of entropy because the possible outputs are 000,001,010,011,100,101 and so the 2nd and 3rd bits are more likely to be to set to 0 than to 1.

It is possible to mix entropy sources using specifically designed mixing functions to retrieve something with even greater entropy. The maximum resulting entropy is the sum of the entropy taken from the number of entropic sources used as input.

Sampling randomness

To sample randomness, let’s first identify the appropriate sources. There are many natural phenomena that one can use:

atmospheric noise;
radioactive decay;
turbulent motion; like that generated in Cloudflare’s wall of lava lamps(!).

Unfortunately, these phenomena require very specific measuring tools, which are prohibitively expensive to install in mainstream consumer electronics. As such, most personal computing devices usually use external usage characteristics for seeding specific generator functions that output randomness as and when the system requires it. These characteristics include keyboard typing patterns and speed and mouse movement – since such usage patterns are based on the human user, it is assumed they provide sufficient entropy as a randomness source. An example of a random number generator that takes entropy from these characteristics is the Linux-based RDRAND function.

Naturally, it is difficult to tell whether a system is actually returning random outputs by only inspecting the outputs. There are statistical tests that detect whether a series of outputs is not uniformly distributed, but these tests cannot ensure that they are unpredictable. This means that it is hard to detect if a given system has had its randomness generation compromised.

Distributed randomness

It’s clear we need alternative methods for sampling randomness so that we can provide guarantees that trusted mechanisms, such as elections and lotteries, take place in secure tamper-resistant environments. The drand project was started by researchers at EPFL to address this problem. The drand charter is to provide an easily configurable randomness beacon running at geographically distributed locations around the world. The intention is for each of these beacons to generate portions of randomness that can be combined into a single random string that is publicly verifiable.

This functionality is achieved using threshold cryptography. Threshold cryptography seeks to derive solutions for standard cryptographic problems by combining information from multiple distributed entities. The notion of the threshold means that if there are n entities, then any t of the entities can combine to construct some cryptographic object (like a ciphertext, or a digital signature). These threshold systems are characterised by a setup phase, where each entity learns a share of data. They will later use this share of data to create a combined cryptographic object with a subset of the other entities.

Threshold randomness

In the case of a distributed randomness protocol, there are n randomness beacons that broadcast random values sampled from their initial data share, and the current state of the system. This data share is created during a trusted setup phase, and also takes in some internal random value that is generated by the beacon itself.

When a user needs randomness, they send requests to some number t of beacons, where t < n, and combine these values using a specific procedure. The result is a random value that can be verified and used for public auditing mechanisms.

Consider what happens if some proportion c/n of the randomness beacons are corrupted at any one time. The nature of a threshold cryptographic system is that, as long as c < t, then the end result still remains random.

If c exceeds t then the random values produced by the system become predictable and the notion of randomness is lost. In summary, the distributed randomness procedure provides verifiably random outputs with sufficient entropy only when c < t.

By distributing the beacons independent of each other and in geographically disparate locations, the probability that t locations can be corrupted at any one time is extremely low. The minimum choice of t is equal to n/2.

How does it actually work?

What we described above sounds a bit like magic^tm. Even if c = t-1, then we can ensure that the output is indeed random and unpredictable! To make it clearer how this works, let’s dive a bit deeper into the underlying cryptography.

Two core components of drand are: a distributed key generation (DKG) procedure, and a threshold signature scheme. These core components are used in setup and randomness generation procedures, respectively. In just a bit, we’ll outline how drand uses these components (without navigating too deeply into the onerous mathematics).

Distributed key generation

At a high-level, the DKG procedure creates a distributed secret key that is formed of n different key pairs (vk_i, sk_i), each one being held by the entity i in the system. These key pairs will eventually be used to instantiate a (t,n)-threshold signature scheme (we will discuss this more later). In essence, t of the entities will be able to combine to construct a valid signature on any message.

To think about how this might work, consider a distributed key generation scheme that creates n distributed keys that are going to be represented by pizzas. Each pizza is split into n slices and one slice from each is secretly passed to one of the participants. Each entity receives one slice from each of the different pizzas (n in total) and combines these slices to form their own pizza. Each combined pizza is unique and secret for each entity, representing their own key pair.

Mathematical intuition

Mathematically speaking, and rather than thinking about pizzas, we can describe the underlying phenomenon by reconstructing lines or curves on a graph. We can take two coordinates on a (x,y) plane and immediately (and uniquely) define a line with the equation y = ax+b. For example, the points (2,3) and (4,7) immediately define a line with gradient (7-3)/(4/2) = 2 so a=2. You can then derive the b coefficient as -1 by evaluating either of the coordinates in the equation y = 2x + b. By uniquely, we mean that only the line y = 2x -1 satisfies the two coordinates that are chosen; no other choice of a or b fits.

The curve ax+b has degree 1, where the degree of the equation refers to the highest order multiplication of unknown variables in the equation. That might seem like mathematical jargon, but the equation above contains only one term ax, which depends on the unknown variable x. In this specific term, the exponent (or power) of x is 1, and so the degree of the entire equation is also 1.

Likewise, by taking three sets of coordinates pairs in the same plane, we uniquely define a quadratic curve with an equation that approximately takes the form y = ax^2 + bx + c with the coefficients a,b,c uniquely defined by the chosen coordinates. The process is a bit more involved than the above case, but it essentially starts in the same way using three coordinate pairs (x_1, y_1), (x_2, y_2) and (x_3, y_3).

By a quadratic curve, we mean a curve of degree 2. We can see that this curve has degree 2 because it contains two terms ax^2 and bx that depend on x. The highest order term is the ax^2 term with an exponent of 2, so this curve has degree 2 (ignore the term bx which has a smaller power).

What we are ultimately trying to show is that this approach scales for curves of degree n (of the form y = a_n x^n + … a_1 x + a_0). So, if we take n+1 coordinates on the (x,y) plane, then we can uniquely reconstruct the curve of this form entirely. Such degree n equations are also known as polynomials of degree n.

In order to generalise the approach to general degrees we need some kind of formula. This formula should take n+1 pairs of coordinates and return a polynomial of degree n. Fortunately, such a formula exists without us having to derive it ourselves, it is known as the Lagrange interpolation polynomial. Using the formula in the link, we can reconstruct any n degree polynomial using n+1 unique pairs of coordinates.

Going back to pizzas temporarily, it will become clear in the next section how this Lagrange interpolation procedure essentially describes the dissemination of one slice (corresponding to (x,y) coordinates) taken from a single pizza (the entire n-1 degree polynomial) among n participants. Running this procedure n times in parallel allows each entity to construct their entire pizza (or the eventual key pair).

Back to key generation

Intuitively, in the DKG procedure we want to distribute n key pairs among n participants. This effectively means running n parallel instances of a t-out-of-n Shamir Secret Sharing scheme. This secret sharing scheme is built entirely upon the polynomial interpolation technique that we described above.

In a single instance, we take the secret key to be the first coefficient of a polynomial of degree t-1 and the public key is a published value that depends on this secret key, but does not reveal the actual coefficient. Think of RSA, where we have a number N = pq for secret large prime numbers p,q, where N is public but does not reveal the actual factorisation. Notice that if the polynomial is reconstructed using the interpolation technique above, then we immediately learn the secret key, because the first coefficient will be made explicit.

Each secret sharing scheme publishes shares, where each share is a different evaluation of the polynomial (dependent on the entity i receiving the key share). These evaluations are essentially coordinates on the (x,y) plane.

By running n parallel instances of the secret sharing scheme, each entity receives n shares and then combines all of these to form their overall key pair (vk_i, sk_i).

The DKG procedure uses n parallel secret sharing procedures along with Pedersen commitments to distribute the key pairs. We explain in the next section how this procedure is part of the procedure for provisioning randomness beacons.

In summary, it is important to remember that each party in the DKG protocol generates a random secret key from the n shares that they receive, and they compute the corresponding public key from this. We will now explain how each entity uses this key pair to perform the cryptographic procedure that is used by the drand protocol.

Threshold signature scheme

Remember: a standard signature scheme considers a key-pair (vk,sk), where vk is a public verification key and sk is a private signing key. So, messages m signed with sk can be verified with vk. The security of the scheme ensures that it is difficult for anybody who does not hold sk to compute a valid signature for any message m.

A threshold signature scheme allows a set of users holding distributed key-pairs (vk_i,sk_i) to compute intermediate signatures u_i on a given message m.

Given knowledge of some number t of intermediate signatures u_i, a valid signature u on the message m can be reconstructed under the combined secret key sk. The public key vk can also be inferred using knowledge of the public keys vk_i, and then this public key can be used to verify u.

Again, think back to reconstructing the degree t-1 curves on graphs with t known coordinates. In this case, the coordinates correspond to the intermediate signatures u_i, and the signature u corresponds to the entire curve. For the actual signature schemes, the mathematics are much more involved than in the DKG procedure, but the principal is the same.

drand protocol

The n beacons that will take part in the drand project are identified. In the trusted setup phase, the DKG protocol from above is run, and each beacon effectively creates a key pair (vk_i, sk_i) for a threshold signature scheme. In other words, this key pair will be able to generate intermediate signatures that can be combined to create an entire signature for the system.

For each round (occurring once a minute, for example), the beacons agree on a signature u evaluated over a message containing the previous round’s signature and the current round’s number. This signature u is the result of combining the intermediate signatures u_i over the same message. Each intermediate signature u_i is created by each of the beacons using their secret sk_i.

Once this aggregation completes, each beacon displays the signature for the current round, along with the previous signature and round number. This allows any client to publicly verify the signature over this data to verify that the beacons honestly aggregate. This provides a chain of verifiable signatures, extending back to the first round of output. In addition, there are threshold signature schemes that output signatures that are indistinguishable from random sequences of bytes. Therefore, these signatures can be used directly as verifiable randomness for the applications we discussed previously.

What does drand use?

To instantiate the required threshold signature scheme, drand uses the (t,n)-BLS signature scheme of Boneh, Lynn and Shacham. In particular, we can instantiate this scheme in the elliptic curve setting using Barreto-Naehrig curves. Moreover, the BLS signature scheme outputs sufficiently large signatures that are randomly distributed, giving them enough entropy to be sources of randomness. Specifically the signatures are randomly distributed over 64 bytes.

BLS signatures use a specific form of mathematical operation known as a cryptographic pairing. Pairings can be computed in certain elliptic curves, including the Barreto-Naehrig curve configurations. A detailed description of pairing operations is beyond the scope of this blog post; though it is important to remember that these operations are integral to how BLS signatures work.

Concretely speaking, all drand cryptographic operations are carried out using a library built on top of Cloudflare's implementation of the bn256 curve. The Pedersen DKG protocol follows the design of Gennaro et al..

How does it work?

The randomness beacons are synchronised in rounds. At each round, a beacon produces a new signature u_i using its private key sk_i on the previous signature generated and the round ID. These signatures are usually broadcast on the URL drand.<host>.com/api/public. These signatures can be verified using the keys vk_i and over the same data that is signed. By signing the previous signature and the current round identifier, this establishes a chain of trust for the randomness beacon that can be traced back to the original signature value.

The randomness can be retrieved by combining the signatures from each of the beacons using the threshold property of the scheme. This reconstruction of the signature u from each intermediate signature u_i is done internally by the League of Entropy nodes. Each beacon broadcasts the entire signature u, that can be accessed over the HTTP endpoint above.

The drand beacon

As we mentioned at the start of this blog post, Cloudflare has launched our distributed randomness beacon. This beacon is part of a network of beacons from different institutions around the globe that form the League of Entropy.

The Cloudflare beacon uses LavaRand as its internal source of randomness for the DKG. Other League of Entropy drand beacons have their own sources of randomness.

Give me randomness!

The below API endpoints are obsolete. Please see https://drand.love for the most up-to-date documentation.

The drand beacon allows you to retrieve the latest random value from the League of Entropy using a simple HTTP request:

curl https://drand.cloudflare.com/api/public

The response is a JSON blob of the form:

{
    "round": 7,
    "previous": <hex-encoded-previous-signature>,
    "randomness": {
        "gid": 21,
        "point": <hex-encoded-new-signature>
    }
}

where, randomness.point is the signature u aggregated among the entire set of beacons.

The signature is computed as an evaluation of the message, and is comprised of the signature of the previous round, previous, the current round number, round, and the aggregated secret key of the system. This signature can be verified using the entire public key vk of the Cloudflare beacon, learned using another HTTP request:

curl https://drand.cloudflare.com/api/public

There are eight collaborators in the League of Entropy. You can learn the current round of randomness (or the system’s public key) by querying these beacons on the HTTP endpoints listed above.

Randomness & the future

Cloudflare will continue to take an active role in the drand project, both as a contributor and by running a randomness beacon with the League of Entropy. The League of Entropy is a worldwide joint effort of individuals and academic institutions. We at Cloudflare believe it can help us realize the mission of helping Build a Better Internet. For more information on Cloudflare's participation in the League of Entropy, visit https://leagueofentropy.com or read Dina's blog post.

Cloudflare would like to thank all of their collaborators in the League of Entropy; from EPFL, UChile, Kudelski Security and Protocol Labs. This work would not have been possible without the work of those who contributed to the open-source drand project. We would also like to thank and appreciate the work of Gabbi Fisher, Brendan McMillion, and Mahrud Sayrafi in launching the Cloudflare randomness beacon.

The Cloudflare Blog