The Workers Distributed Data team has been hard at work since we gave you an update last November. Today, we’d like to share with you some of the stuff that has recently shipped in Workers KV: a new feature and an internal change that should significantly improve latency in some cases. Let’s dig in!
KV Metadata
Workers KV has a fairly straightforward interface: you can put keys and values into KV, and then fetch the value back out by key:
await contents.put("index.html", someHtmlContent);
await contents.put("index.css", someCssContent);
await contents.put("index.js", someJsContent);
// later
let index = await contents.get("index.html");
Pretty straightforward. But as you can see from this example, you may store different kinds of content in KV, even if the type is identical. All of the values are strings, but one is HTML, one is CSS, and one is JavaScript. If we were going to serve this content to users, we would have to construct a response. And when we do, we have to let the client know what the content type of that response is: text/html for HTML, text/css for CSS, and text/javascript for JavaScript. If we serve the incorrect content type to our clients, they won’t display the pages correctly.
One possible solution to this problem is using the mime package from npm. This lets us write code that looks like this:
// pathKey is a variable with a value like "index.html"
const mimeType = mime.getType(pathKey) || 'text/plain'
Nice and easy. But there are some drawbacks. First of all, because we have to detect the content type at runtime, we’re figuring this out on every request. It would be nicer to figure it out only once instead. Second, if we look at how the package implements getType, it does this by including an array of possible extensions and their types. This means that this array is included in our worker, taking up 9 KB of space. That’s also less than ideal.
But now, we have a better way. Workers KV will now allow you to add some extra JSON to each key/value pair, to use however you’d like. So we could start inserting the contents of those files like this, instead:
await contents.put("index.html", someHtmlContent, {metadata: {"Content-Type": "text/html"}});
await contents.put("index.css", someCssContent, {metadata: {"Content-Type": "text/css"}});
await contents.put("index.js", someJsContent, {metadata: {"Content-Type": "text/javascript"}});
You could determine these content types in various ways: by looking at the file extension like the mime package, or by using a library that inspects the file’s contents to figure out its type like libmagic. Regardless, the type would be stored in KV alongside the contents of the file. This way, there’s no need to recompute the type on every request. Additionally, the detection code would live in your uploading tool, not in your worker, creating a smaller bundle. Win-win!
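As a rough illustration of the upload side, here is a minimal sketch that works out the content type once, from the file extension, and stores it as metadata. The names TYPES_BY_EXTENSION, contentTypeFor, and uploadFile are made up for this example, and the three-entry table stands in for whatever detection you prefer. It is written against the put method of a KV binding for brevity, with the metadata passed under the metadata key of the options argument; an uploader running outside of Workers would go through the Cloudflare API or wrangler instead.

```javascript
// Hypothetical extension-to-type table. A real tool might use the mime
// package or libmagic here instead; this is only for illustration.
const TYPES_BY_EXTENSION = new Map([
  ["html", "text/html"],
  ["css", "text/css"],
  ["js", "text/javascript"],
]);

// Work out the content type once, at upload time, from the file extension.
// Keys with no known extension fall back to text/plain.
function contentTypeFor(pathKey) {
  const extension = pathKey.split(".").pop().toLowerCase();
  return TYPES_BY_EXTENSION.get(extension) || "text/plain";
}

// Store a file's contents together with its content type.
// `namespace` is assumed to be a Workers KV binding, or any object
// with a compatible put(key, value, options) method.
async function uploadFile(namespace, pathKey, body) {
  const metadata = { "Content-Type": contentTypeFor(pathKey) };
  await namespace.put(pathKey, body, { metadata });
  return metadata;
}
```

Calling uploadFile(contents, "index.css", someCssContent) would store the CSS along with {"Content-Type": "text/css"}, so the worker never needs the lookup table.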
The worker code would pass along this metadata by using a new method:
let {value, metadata} = await contents.getWithMetadata("index.js");
Here, value would have the contents, like before. But metadata contains the JSON of the metadata that was stored: metadata["Content-Type"] would return "text/javascript". You’ll also see this metadata come back when you make a list request.
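To make the read side concrete, here is a minimal sketch of how a worker might turn a KV entry and its metadata into a response, plus a small helper that reads the metadata back from a list request. The function names (lookupAsset, handleRequest, contentTypesByKey) and the text/plain fallback are assumptions made for this example, not part of the KV API; only getWithMetadata and list are.

```javascript
// Look up a key and build the pieces of a response from its value and metadata.
// `namespace` is assumed to be a Workers KV binding.
async function lookupAsset(namespace, pathKey) {
  const { value, metadata } = await namespace.getWithMetadata(pathKey);
  if (value === null) {
    // A missing key comes back with a null value.
    return { status: 404, body: "Not found", headers: { "Content-Type": "text/plain" } };
  }
  // Entries written without metadata come back with metadata === null,
  // so fall back to a default type in that case.
  const contentType = (metadata && metadata["Content-Type"]) || "text/plain";
  return { status: 200, body: value, headers: { "Content-Type": contentType } };
}

// In a Worker, those pieces would be wrapped in a Response.
async function handleRequest(request, namespace) {
  const pathKey = new URL(request.url).pathname.slice(1) || "index.html";
  const { status, body, headers } = await lookupAsset(namespace, pathKey);
  return new Response(body, { status, headers });
}

// Collect the stored content type of each key from a list request.
// This only looks at the first page of results.
async function contentTypesByKey(namespace) {
  const { keys } = await namespace.list();
  const types = {};
  for (const { name, metadata } of keys) {
    types[name] = metadata ? metadata["Content-Type"] : undefined;
  }
  return types;
}
```

Keeping lookupAsset separate from the Response construction is just a convenience for this sketch: it makes the metadata handling easy to exercise on its own.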
Given that you can store arbitrary JSON, it’s useful for more than just content types: we’ve had folks post to the forums asking about etags, for example. We’re excited to see what you do with this new capability!
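For instance, an etag could be stored next to each value and compared against the client’s If-None-Match header. The sketch below is one hypothetical way to do that: putWithEtag and conditionalLookup are names invented for this example, the etag is assumed to be computed by whatever writes the value, and the comparison is a simple exact string match rather than full HTTP conditional-request handling.

```javascript
// Store a value with an ETag chosen by the writer.
// How the ETag is computed (a content hash, a version number) is up to you.
// `namespace` is assumed to be a Workers KV binding.
async function putWithEtag(namespace, key, value, etag) {
  await namespace.put(key, value, { metadata: { etag } });
}

// Decide between 200 and 304 by comparing the stored ETag with the
// value of the request's If-None-Match header (exact match only).
async function conditionalLookup(namespace, key, ifNoneMatch) {
  const { value, metadata } = await namespace.getWithMetadata(key);
  if (value === null) {
    return { status: 404, body: "Not found", headers: {} };
  }
  const etag = metadata && metadata.etag;
  if (etag && ifNoneMatch === etag) {
    // The client already has this version, so send no body.
    return { status: 304, body: null, headers: { ETag: etag } };
  }
  return { status: 200, body: value, headers: etag ? { ETag: etag } : {} };
}
```

Note that this still reads the value from KV on every request; what it saves is sending the body back to a client that already has it.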
Significantly faster writes
Our documentation states:
Very infrequently read values are stored centrally, while more popular values are maintained in all of our data centers around the world.
This is why Workers KV is optimized for higher read volumes than writes. We distribute popular data across the globe, close to users wherever they are. However, for infrequently accessed data, we store the data in a central location until access is requested. Each write (and delete) must go back to the central data store, as do reads on less popular values. The central store was located in the United States, so the speed of writes depended on where the request came from: in the US, it would be much faster than, say, in Europe or Asia.
Recently, we have rolled out a major internal change. We have added a second source of truth on the European continent. These two sources of truth will still coordinate between themselves, ensuring that any data you write or update will be available in both places as soon as possible. But requests from Europe, as well as from places closer to Europe than to the United States, should see much lower latencies, as they no longer have to go all the way to the US.
How much faster? Well, it will depend on your workload. Several other Cloudflare products use Workers KV, and here’s a graph of response times from one of them:
As you can see, there’s a sharp drop in the graph when the switchover happened.
We can also measure this time across all customers:
The long tail has been significantly shortened. (We’ve redacted the exact numbers, but you can still see the magnitude of the changes.)
More to come
The distributed data team has been working on some additional things, but we’re not quite ready to share them with you yet! We hope that you’ll find these changes make Workers KV even better for you, and we’ll be sharing more updates on the blog as we ship.