
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Tue, 07 Apr 2026 19:35:28 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Automatically generating Cloudflare’s Terraform provider]]></title>
            <link>https://blog.cloudflare.com/automatically-generating-cloudflares-terraform-provider/</link>
            <pubDate>Tue, 24 Sep 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ The Cloudflare Terraform provider used to be manually maintained. With the help of our existing OpenAPI code generation pipeline, we’re now automatically generating the provider for better coverage and faster updates. ]]></description>
            <content:encoded><![CDATA[ <p>In November 2022, we announced the transition to <a href="https://blog.cloudflare.com/open-api-transition/"><u>OpenAPI Schemas for the Cloudflare API</u></a>. Back then, we had an audacious goal to make the OpenAPI schemas the source of truth for our SDK ecosystem and reference documentation. During 2024’s Developer Week, we backed this up by <a href="https://blog.cloudflare.com/workers-production-safety/"><u>announcing that our SDK libraries are now automatically generated</u></a> from these OpenAPI schemas. Today, we’re excited to announce the latest pieces of the ecosystem to be automatically generated — the Terraform provider and API reference documentation.</p><p>This means that the moment a new feature or attribute is added to our products and the team documents it, you’ll be able to see how it’s meant to be used across our SDK ecosystem <i>and</i> make use of it immediately. No more delays. No more gaps in API endpoint coverage.</p><p>You can find the new documentation site at <a href="https://developers.cloudflare.com/api-next/"><u>https://developers.cloudflare.com/api-next/</u></a>, and you can try the preview release candidate of the Terraform provider by <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/5.0.0-alpha1"><u>installing 5.0.0-alpha1</u></a>.</p>
    <div>
      <h2>Why Terraform? </h2>
      <a href="#why-terraform">
        
      </a>
    </div>
    <p>For anyone who is unfamiliar with <a href="https://www.terraform.io/"><u>Terraform</u></a>, it is a tool for managing your infrastructure as code, much like you would with your application code. Many of our customers (big and small) rely on Terraform to orchestrate their infrastructure in a technology-agnostic way. Under the hood, it is essentially an HTTP client with lifecycle management built in, which means it makes use of our publicly documented APIs in a way that understands how to create, read, update and delete for the life of the resource. </p>
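    <p>The “lifecycle management” mentioned above maps directly onto create, read, update, and delete operations against an HTTP API. As a rough, self-contained Go sketch (the interface and types here are ours for illustration; Terraform’s real plugin interfaces are richer):</p>

```go
package main

import "fmt"

// lifecycle captures the four operations Terraform drives for every
// managed resource. Illustrative only; not Terraform's actual interfaces.
type lifecycle interface {
	Create(desired string) (id string)
	Read(id string) (current string, ok bool)
	Update(id, desired string)
	Delete(id string)
}

// fakeAPI stands in for a remote HTTP service.
type fakeAPI struct{ records map[string]string }

func (f *fakeAPI) Create(desired string) string {
	f.records["record-1"] = desired // a real API would mint the ID
	return "record-1"
}

func (f *fakeAPI) Read(id string) (string, bool) { v, ok := f.records[id]; return v, ok }
func (f *fakeAPI) Update(id, desired string)     { f.records[id] = desired }
func (f *fakeAPI) Delete(id string)              { delete(f.records, id) }

func main() {
	var api lifecycle = &fakeAPI{records: map[string]string{}}
	id := api.Create("CNAME @ -> example.com")
	current, _ := api.Read(id)
	fmt.Println(current) // CNAME @ -> example.com
	api.Delete(id)
	_, ok := api.Read(id)
	fmt.Println(ok) // false
}
```

    <p>Terraform’s job is to call these operations in the right order, for the life of the resource, based on the difference between your configuration and the recorded state.</p>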
    <div>
      <h2>Keeping Terraform updated — the old way</h2>
      <a href="#keeping-terraform-updated-the-old-way">
        
      </a>
    </div>
    <p>Historically, Cloudflare has manually maintained a Terraform provider, but since the provider internals require their own unique way of doing things, responsibility for maintenance and support landed on the shoulders of a handful of individuals. Service teams struggled to keep up with the volume of changes because of the cognitive overhead required to ship a single change in the provider. Getting a change into the provider took a minimum of three pull requests (four if you were also adding support to <a href="https://github.com/cloudflare/cf-terraforming"><u>cf-terraforming</u></a>).</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6spvs4QAkY7BXLNfABDSQs/838f9b224838cd174376eb413cce7848/image6.png" />
          </figure><p>Even with all four pull requests completed, there was no guarantee of coverage for all available attributes, which meant small yet important details could be forgotten and not exposed to customers, causing frustration when trying to configure a resource.</p><p>To address this, our Terraform provider needed to rely on the same OpenAPI schemas that the rest of our SDK ecosystem was <a href="https://blog.cloudflare.com/lessons-from-building-an-automated-sdk-pipeline/"><u>already benefiting from</u></a>.</p>
    <div>
      <h2>Updating Terraform automatically</h2>
      <a href="#updating-terraform-automatically">
        
      </a>
    </div>
    <p>The thing that differentiates Terraform from our SDKs is that it manages the lifecycle of resources. With that comes a new range of problems related to known values and managing differences in the request and response payloads. Let’s compare the two different approaches of creating a new DNS record and fetching it back.</p><p>With our Go SDK:</p>
            <pre><code>// Create the new record
record, _ := client.DNS.Records.New(context.TODO(), dns.RecordNewParams{
	ZoneID: cloudflare.F("023e105f4ecef8ad9ca31a8372d0c353"),
	Record: dns.RecordParam{
		Name:    cloudflare.String("@"),
		Type:    cloudflare.String("CNAME"),
		Content: cloudflare.String("example.com"),
	},
})


// Wasteful fetch, but shows the point
client.DNS.Records.Get(
	context.Background(),
	record.ID,
	dns.RecordGetParams{
		ZoneID: cloudflare.String("023e105f4ecef8ad9ca31a8372d0c353"),
	},
)
</code></pre>
            <p>
And with Terraform:</p>
            <pre><code>resource "cloudflare_dns_record" "example" {
  zone_id = "023e105f4ecef8ad9ca31a8372d0c353"
  name    = "@"
  content = "example.com"
  type    = "CNAME"
}</code></pre>
            <p>On the surface, it looks like the Terraform approach is simpler, and you would be correct. The complexity of knowing how to create a new resource and maintain changes is handled for you. However, for Terraform to offer this abstraction and data guarantee, all values must be known at apply time. That means that even if you’re not using the <code>proxied</code> value, Terraform needs to know what the value is in order to save it in the state file and manage that attribute going forward. The error below is what Terraform operators commonly see from providers when a value isn’t known at apply time.</p>
            <pre><code>Error: Provider produced inconsistent result after apply

When applying changes to example_thing.foo, provider "provider[\"registry.terraform.io/example/example\"]"
produced an unexpected new value: .foo: was null, but now cty.StringVal("").</code></pre>
            <p>Whereas when using the SDKs, if you don’t need a field, you just omit it and never need to worry about maintaining known values.</p><p>Tackling this for our OpenAPI schemas was no small feat. Since introducing Terraform generation support, the quality of our schemas has improved by an order of magnitude. Now we are explicitly calling out all default values that are present, variable response properties based on the request payload, and any server-side computed attributes. All of this means a better experience for anyone that interacts with our APIs.</p>
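            <p>As a hypothetical illustration (this is not one of Cloudflare’s actual schemas, and the property names are ours), explicitly calling out defaults and server-computed attributes in an OpenAPI schema might look like:</p>

```yaml
# Hypothetical schema fragment: the property names are illustrative.
DNSRecord:
  type: object
  properties:
    proxied:
      type: boolean
      default: false     # explicit default, so the generator can plan a known value
    created_on:
      type: string
      format: date-time
      readOnly: true     # server-computed, appears only in responses
```

            <p>With defaults and computed attributes declared in the schema, the generated provider knows every value at apply time instead of discovering surprises in the API response.</p>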
    <div>
      <h3>Making the jump from terraform-plugin-sdk to terraform-plugin-framework</h3>
      <a href="#making-the-jump-from-terraform-plugin-sdk-to-terraform-plugin-framework">
        
      </a>
    </div>
    <p>To build a Terraform provider and expose resources or data sources to operators, you need two main things: a provider server and a provider.</p><p>The provider server takes care of exposing a <a href="https://github.com/hashicorp/terraform/blob/main/docs/plugin-protocol/README.md"><u>gRPC server</u></a> that Terraform core (via the CLI) uses to communicate when managing resources or reading data sources from the operator provided configuration.</p><p>The provider is responsible for wrapping the resources and data sources, communicating with the remote services, and managing the state file. To do this, you either rely on the <a href="https://github.com/hashicorp/terraform-plugin-sdk"><u>terraform-plugin-sdk</u></a> (commonly referred to as SDKv2) or <a href="https://github.com/hashicorp/terraform-plugin-framework"><u>terraform-plugin-framework</u></a>, which includes all the interfaces and methods provided by Terraform in order to manage the internals correctly. The decision as to which plugin you use depends on the age of your provider. SDKv2 has been around longer and is what most Terraform providers use, but due to the age and complexity, it has many core unresolved issues that must remain in order to facilitate backwards compatibility for those who rely on it. 
<code>terraform-plugin-framework</code> is the new version that, while lacking the breadth of features SDKv2 has, provides a more Go-like approach to building providers and addresses many of the underlying bugs in SDKv2.</p><p><i>(For a deeper comparison between SDKv2 and the framework, you can check out a </i><a href="https://www.youtube.com/watch?v=4P69E44mJGo"><i><u>conversation between myself and John Bristowe from Octopus Deploy</u></i></a><i>.)</i></p><p>The majority of the Cloudflare Terraform provider is built using SDKv2, but at the beginning of 2023, we <a href="https://github.com/cloudflare/terraform-provider-cloudflare/pull/2170"><u>took the plunge to multiplex</u></a> and offer both in our provider. To understand why this was needed, we have to understand a little about SDKv2. The way SDKv2 is structured isn't really conducive to representing null or "unset" values consistently and reliably. You can use the <a href="https://pkg.go.dev/github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema#ResourceData.GetRawConfig"><u>experimental ResourceData.GetRawConfig</u></a> to check whether the value is set, null, or unknown in the config, but writing it back as null isn't really supported.</p><p>This caveat first popped up for us when the Edge Rules Engine (Rulesets) started onboarding new services and those services needed to support API responses that contained booleans in an unset (or missing), <code>true</code>, or <code>false</code> state each with their own reasoning and purpose. While this isn’t a conventional API design at Cloudflare, it is a valid way to do things that we should be able to work with. However, as mentioned above, the SDKv2 provider couldn't. This is because when a value isn't present in the response or read into state, it gets a Go-compatible zero value for the default. 
This showed up as the inability to unset values after they had been written to state as false values (and vice versa).</p><p>The only solution we have here to reliably use the three states of those boolean values is to migrate to the <code>terraform-plugin-framework</code>, which has the <a href="https://github.com/hashicorp/terraform-plugin-framework/blob/main/types/bool_value.go"><u>correct implementation of writing back unset values</u></a>.</p><p>Once we started adding more functionality using <code>terraform-plugin-framework</code> in the old provider, it was clear that it was a better developer experience, so we <a href="https://github.com/cloudflare/terraform-provider-cloudflare/pull/2871"><u>added a ratchet</u></a> to prevent SDKv2 usage going forward to get ahead of anyone unknowingly setting themselves up to hit this issue.</p><p>When we decided that we would be automatically generating the Terraform provider, it was only fitting that we also brought all the resources over to be based on the <code>terraform-plugin-framework</code> and leave the issues from SDKv2 behind for good. This did complicate the migration as with the improved internals came changes to major components like the schema and <a href="https://en.wikipedia.org/wiki/Create,_read,_update_and_delete"><u>CRUD operations</u></a> that we needed to familiarize ourselves with. However, it has been a worthwhile investment because by doing so, we’ve future-proofed the foundations of the provider and are now making fewer compromises on a great Terraform experience due to buggy, legacy internals.</p>
    <div>
      <h3>Iteratively finding bugs </h3>
      <a href="#iteratively-finding-bugs">
        
      </a>
    </div>
    <p>One of the common struggles with code generation pipelines is that unless you have existing tools that implement your new thing, it’s hard to know if it works or is reasonable to use. Sure, you can also generate your tests to exercise the new thing, but if there is a bug in the pipeline, you are very likely to not see it as a bug as you will be generating test assertions that show the bug is expected behavior.</p><p>One of the essential feedback loops we have had is the existing acceptance test suite. All resources within the existing provider had a mix of regression and functionality tests. Best of all, as the test suite is creating and managing real resources, it was very easy to know whether the outcome was a working implementation or not by looking at the HTTP traffic to see whether the API calls were accepted by the remote endpoints. Getting the test suite ported over was only a matter of copying over all the existing tests and checking for any type assertion differences (such as list to single nested list) before kicking off a test run to determine whether the resource was working correctly.</p><p>While the centralized schema pipeline was a huge quality of life improvement for having schema fixes propagate to the whole ecosystem almost instantly, it couldn’t help us solve the largest hurdle, which was surfacing bugs that hide other bugs. 
This was time-consuming because when fixing a problem in Terraform, you have three places where you can hit an error:</p><ol><li><p>Before any API calls are made, Terraform performs logical schema validation, and when it encounters validation errors, it immediately halts.</p></li><li><p>If any API call fails, it stops at the CRUD operation and returns the diagnostics, immediately halting.</p></li><li><p>After the CRUD operation has run, Terraform then has checks in place to ensure all values are known.</p></li></ol><p>That means that if we hit a bug at step 1 and fixed it, there was no guarantee or way to tell that we didn’t have two more waiting for us. Not to mention that if we found a bug in step 2 and shipped a fix, it might then surface a new bug in step 1 on the next round of testing.</p><p>There is no silver bullet here, and our workaround was instead to notice patterns of problems in the schema behaviors and apply CI lint rules to the OpenAPI schemas before they entered the code generation pipeline. Taking this approach incrementally cut down the number of bugs in steps 1 and 2 until we were largely only dealing with the bugs in step 3.</p>
    <div>
      <h3>A more reusable approach to model and struct conversion </h3>
      <a href="#a-more-reusable-approach-to-model-and-struct-conversion">
        
      </a>
    </div>
    <p>Within Terraform provider CRUD operations, it is fairly common to see boilerplate like the following:</p>
            <pre><code>var plan ThingModel
diags := req.Plan.Get(ctx, &amp;plan)
resp.Diagnostics.Append(diags...)
if resp.Diagnostics.HasError() {
	return
}

out, err := r.client.UpdateThingModel(ctx, client.ThingModelRequest{
	AttrA: plan.AttrA.ValueString(),
	AttrB: plan.AttrB.ValueString(),
	AttrC: plan.AttrC.ValueString(),
})
if err != nil {
	resp.Diagnostics.AddError(
		"Error updating project Thing",
		"Could not update Thing, unexpected error: "+err.Error(),
	)
	return
}

result := convertResponseToThingModel(out)
tflog.Info(ctx, "created thing", map[string]interface{}{
	"attr_a": result.AttrA.ValueString(),
	"attr_b": result.AttrB.ValueString(),
	"attr_c": result.AttrC.ValueString(),
})

diags = resp.State.Set(ctx, result)
resp.Diagnostics.Append(diags...)
if resp.Diagnostics.HasError() {
	return
}</code></pre>
            <p>At a high level:</p><ul><li><p>We fetch the proposed updates (known as a plan) using <code>req.Plan.Get()</code></p></li><li><p>Perform the update API call with the new values</p></li><li><p>Manipulate the data from a Go type into a Terraform model (<code>convertResponseToThingModel</code>)</p></li><li><p>Set the state by calling <code>resp.State.Set()</code></p></li></ul><p>Initially, this doesn’t seem too problematic. However, the third step where we manipulate the Go type into the Terraform model quickly becomes cumbersome, error-prone, and complex because all of your resources need to do this in order to swap between the type and associated Terraform models.</p><p>To avoid generating more complex code than needed, one of the improvements featured in our provider is that all CRUD methods use unified <code>apijson.Marshal, apijson.Unmarshal</code>, and <code>apijson.UnmarshalComputed</code> methods that solve this problem by centralizing the conversion and handling logic based on the struct tags.</p>
            <pre><code>var data *ThingModel

resp.Diagnostics.Append(req.Plan.Get(ctx, &amp;data)...)
if resp.Diagnostics.HasError() {
	return
}

dataBytes, err := apijson.Marshal(data)
if err != nil {
	resp.Diagnostics.AddError("failed to serialize http request", err.Error())
	return
}
res := new(http.Response)
env := ThingResultEnvelope{*data}
_, err = r.client.Thing.Update(
	// ...
)
if err != nil {
	resp.Diagnostics.AddError("failed to make http request", err.Error())
	return
}

bytes, _ := io.ReadAll(res.Body)
err = apijson.UnmarshalComputed(bytes, &amp;env)
if err != nil {
	resp.Diagnostics.AddError("failed to deserialize http request", err.Error())
	return
}
data = &amp;env.Result

resp.Diagnostics.Append(resp.State.Set(ctx, &amp;data)...)</code></pre>
            <p>Instead of needing to generate hundreds of instances of type-to-model converter methods, we can instead decorate the Terraform model with the correct tags and handle marshaling and unmarshaling of the data consistently. It’s a minor change to the code that in the long run makes the generation more reusable and readable. As an added benefit, this approach is great for bug fixing as once you identify a bug with a particular type of field, fixing that in the unified interface fixes it for other occurrences you may not yet have found.</p>
    <div>
      <h2>But wait, there’s more (docs)!</h2>
      <a href="#but-wait-theres-more-docs">
        
      </a>
    </div>
    <p>To top off our OpenAPI schema usage, we’re tightening the SDK integration with our <a href="https://developers.cloudflare.com/api-next/"><u>new API documentation site</u></a>. It’s using the same pipeline we’ve invested in for the last two years while addressing some of the common usage issues.</p>
    <div>
      <h3>SDK aware </h3>
      <a href="#sdk-aware">
        
      </a>
    </div>
    <p>If you’ve used our API documentation site, you know we give you examples of interacting with the API using command line tools like curl. This is a great starting point, but if you’re using one of the SDK libraries, you need to do the mental gymnastics to convert it to the method or type definition you want to use. Now that we’re using the same pipeline to generate the SDKs <b>and</b> the documentation, we’re solving that by providing examples in all the libraries you <i>could</i> use — not just curl.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2SNCehksc30kXXQvVKYC47/a3a6071be64d006a2da9b2e615d143ae/image2.png" />
            
            </figure><p><sup><i>Example using cURL to fetch all zones.</i></sup></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/50PeyK8oOLb51mCLF4ikds/764db96a24232b611ec88d5ff8f8844f/image4.png" />
            
            </figure><p><sup><i>Example using the Typescript library to fetch all zones.</i></sup></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5rQn6OY3R1yi5iot1oxti4/09cf62ea46ede21d1541b5012497efdb/image5.png" />
            
            </figure><p><sup><i>Example using the Python library to fetch all zones.</i></sup></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Na9y9ta3fLBMEAvJK4uaH/41ecf061a5a088f4bdb313d70b173a9a/image7.png" />
            
            </figure><p><sup><i>Example using the Go library to fetch all zones.</i></sup></p><p>With this improvement, we also remember the language selection so if you’ve selected to view the documentation using our Typescript library and keep clicking around, we keep showing you examples using Typescript until it is swapped out.</p><p>Best of all, when we introduce new attributes to existing endpoints or add SDK languages, this documentation site is automatically kept in sync with the pipeline. It is no longer a huge effort to keep it all up to date.</p>
    <div>
      <h3>Faster and more efficient rendering</h3>
      <a href="#faster-and-more-efficient-rendering">
        
      </a>
    </div>
    <p>A problem we’ve always struggled with is the sheer number of API endpoints and how to represent them. As of this post, we have 1,330 endpoints, and for each of those endpoints, we have a request payload, a response payload, and multiple types associated with it. When it comes to rendering this much information, the solutions we’ve used in the past have had to make tradeoffs in order to make parts of the representation work.</p><p>This next iteration of the API documentation site addresses this in a couple of ways:</p><ul><li><p>It's implemented as a modern React application that pairs an interactive client-side experience with static pre-rendered content, resulting in a quick initial load and fast navigation. (Yes, it even works without JavaScript enabled!)</p></li><li><p>It fetches the underlying data incrementally as you navigate.</p></li></ul><p>By solving this foundational issue, we’ve unlocked other planned improvements to the documentation site and SDK ecosystem to improve the user experience without making tradeoffs like we’ve needed to in the past.</p>
    <div>
      <h3>Permissions</h3>
      <a href="#permissions">
        
      </a>
    </div>
    <p>One of the most requested features for the documentation site has been listing the minimum required permissions for each API endpoint. A previous iteration of the documentation site had this available. However, unknown to most who used it, the values were manually maintained and regularly incorrect, causing support tickets and frustration for users.</p><p>Inside Cloudflare's identity and access management system, answering the question “what do I need to access this endpoint?” isn’t simple. In the normal flow of a request to the control plane, two different systems each provide part of the answer, which must then be combined to give you the full picture. As we couldn’t initially automate this as part of the OpenAPI pipeline, we opted to leave it out rather than have it be incorrect with no way of verifying it.</p><p>Fast-forward to today, and we’re excited to say endpoint permissions are back! We built new tooling that abstracts answering this question in a way we can integrate into our code generation pipeline, so all endpoints automatically get this information. Much like the rest of the code generation platform, it is focused on having service teams own and maintain high-quality schemas that can be reused, with value adds introduced without any extra work on their part.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/641gSS5MLQpCvEANYXcVK6/447cf0b873ecb60fdbbc415df0424363/image3.png" />
            
            </figure>
    <div>
      <h2>Stop waiting for updates</h2>
      <a href="#stop-waiting-for-updates">
        
      </a>
    </div>
    <p>With these announcements, we’re putting an end to waiting for updates to land in the SDK ecosystem. These improvements let us ship new attributes and endpoints the moment teams document them. So what are you waiting for? Check out the <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/5.0.0-alpha1"><u>Terraform provider</u></a> and <a href="https://developers.cloudflare.com/api-next/"><u>API documentation site</u></a> today.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[SDK]]></category>
            <category><![CDATA[Terraform]]></category>
            <category><![CDATA[Open API]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">1M8zVthnUiMpJpGylQuptu</guid>
            <dc:creator>Jacob Bednarz</dc:creator>
        </item>
        <item>
            <title><![CDATA[Lessons from building an automated SDK pipeline]]></title>
            <link>https://blog.cloudflare.com/lessons-from-building-an-automated-sdk-pipeline/</link>
            <pubDate>Tue, 23 Apr 2024 13:00:29 GMT</pubDate>
            <description><![CDATA[ During Developer Week 2024, Cloudflare announced revamped SDKs, automatically generated from OpenAPI schemas. This post details the pipeline's workings and lessons learned. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>In case you missed the <a href="/workers-production-safety">announcement</a> from Developer Week 2024, Cloudflare is now offering software development kits (SDKs) for <a href="https://github.com/cloudflare/cloudflare-typescript">Typescript</a>, <a href="https://github.com/cloudflare/cloudflare-go">Go</a> and <a href="https://github.com/cloudflare/cloudflare-python">Python</a>. As a reminder, you can get started by installing the packages.</p>
            <pre><code>// Typescript
npm install cloudflare

// Go
go get -u github.com/cloudflare/cloudflare-go/v2

// Python
pip install cloudflare</code></pre>
            <p>Instead of using a tool like <code>curl</code> or Postman to create a new zone in your account, you can use one of the SDKs in a language that you’re already comfortable with or that integrates directly into your existing codebase.</p>
            <pre><code>import Cloudflare from 'cloudflare';

const cloudflare = new Cloudflare({
  apiToken: process.env['CLOUDFLARE_API_TOKEN']
});

const newZone = await cloudflare.zones.create({
  account: { id: '023e105f4ecef8ad9ca31a8372d0c353' },
  name: 'example.com',
  type: 'full',
});</code></pre>
            <p>Since their inception, our SDKs have been manually maintained by one or more dedicated individuals. For every product addition or improvement, we needed to orchestrate a series of manually created pull requests to get those changes into customer hands. This, unfortunately, created an imbalance in the frequency and quality of changes that made it into the SDKs. Even though the product teams would drive some of these changes, not all languages were covered and the SDKs fell to either community-driven contributions or to the maintainers of the libraries to cover the remaining languages. Internally, we too felt this pain when using our own services and, instead of covering all languages, decided to rally our efforts behind the primary SDK (Go) to ensure that at least one of these libraries was in a good state.</p><p>This plan worked for newer products and additions to the Go SDK, which in turn helped tools like our <a href="https://github.com/cloudflare/terraform-provider-cloudflare/">Terraform Provider</a> stay mostly up to date, but even this focused improvement was still very taxing and time-consuming for internal teams to maintain. On top of this, the process didn’t provide any guarantees on coverage, parity, or correctness because the changes were still manually maintained and susceptible to human error. Regardless of the size of contribution, a team member would still need to coordinate a minimum of 4 pull requests (shown in more depth below) before a change was considered shipped and needed deep knowledge of the relationship between the dependencies in order to get it just right.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1wDwD07NI1poZCNaYHnjAq/0b80dab8da71ccabef400a6a7610d569/image1-19.png" />
            
            </figure><p><i>The pull requests previously required to ship an SDK change.</i></p><p>Following the completion of <a href="/open-api-transition">our transition to OpenAPI from JSON Hyper-Schema</a>, we caught up internally and started discussing what else OpenAPI could help us unlock. It was at that point that we set the lofty goal of using OpenAPI for more than just our documentation. It was time to use OpenAPI to generate our SDKs.</p><p>Before we dove headfirst into generating SDKs, we established some guiding principles. These would be non-negotiable and determine where we spent our effort.</p>
    <div>
      <h3>You should not be able to tell what underlying language generated the SDK</h3>
      <a href="#you-should-not-be-able-to-tell-what-underlying-language-generated-the-sdk">
        
      </a>
    </div>
    <p>This was important for us because too often, companies that build SDKs using automation end up with SDKs flavored by the generator language, lacking the nuances and patterns that are second nature to users of the target language.</p><p>For example, a Rubyist may use the following <code>if</code> expression:</p>
            <pre><code>do_something if bar?</code></pre>
            <p>Whereas most generators do not have this context and would instead default to the standard case where <code>if/else</code> expressions are spread over multiple lines.</p>
            <pre><code>if bar?
  do_something
end</code></pre>
            <p>Despite being a simple and non-material example, it demonstrates a nuance that a machine cannot decipher on its own. This is terrible for developers because you’re then no longer only thinking about how to solve the original task at hand, but you also end up tailoring your code to match how the generator has built the SDK and potentially lose out on the language features you would normally use. The problem is made significantly worse if you’re using a strongly typed language to generate a language without types, since it will be structuring and building code in a way that types are expected but never used.</p>
    <div>
      <h3>Lowering the mean time to uniform support</h3>
      <a href="#lowering-the-mean-time-to-uniform-support">
        
      </a>
    </div>
    <p>When a new feature is added to a product, it’s great that we add API support initially. However, if that new feature or product never makes it to whatever language SDK you are using to drive your API calls, it’s as good as non-existent. Similarly, not every use case is for infrastructure-as-code tools like Terraform, so we needed a better way of meeting our customers with uniformity where they choose to integrate with our services.</p><p>By extension, we want uniformity in the way the namespaces and methods are constructed. Ignoring the language-specific parts, if you’re using one of our SDKs and you are looking for the ability to list all DNS records, you should be able to trust that the method will be in the <code>dns</code> namespace and that to find all records, you can call a <code>list</code> method regardless of which one you are using. Example:</p>
            <pre><code>// Go
client.DNS.Records.List(...)

// Typescript
client.dns.records.list(...)

// Python
client.dns.records.list(...)</code></pre>
            <p>This leads to less time digging through documentation to find what invocation you need and more time using the tools you’re already familiar with.</p>
    <div>
      <h3>Fast feedback loops, clear conventions</h3>
      <a href="#fast-feedback-loops-clear-conventions">
        
      </a>
    </div>
    <p>Cloudflare has <b>a lot</b> of APIs; everything is backed by an API <i>somewhere</i>. However, not all Cloudflare APIs are designed with the same conventions in mind. Those APIs that are on the critical path and regularly experience traffic surges or malformed input are naturally more hardened and more resilient than those that are infrequently used. This creates a divergence in endpoint quality, which shouldn’t be the case.</p><p>Where we have learned a lesson or improved a system through a best practice, we should make it easy for others to discover and opt into that pattern with little friction, at the earliest possible time – ideally as they are proposing the change in CI. That is why when we built the OpenAPI pipeline for API schemas, we built in mechanisms for applying linting rules, using the <a href="https://redocly.com/docs/cli/">redocly CLI</a>, that will either warn the engineer or block them entirely, depending on the severity of the violation.</p><p>For example, we want to encourage usage of fine-grained API tokens, so we should present those authentication schemes first and ensure they are supported for new endpoints. To enforce this, we can write a <a href="https://redocly.com/docs/cli/configuration/reference/plugins/">redocly plugin</a>:</p>
            <pre><code>module.exports = {
    id: 'local',
    assertions: {
        apiTokenAuthSupported: (value, options, location) =&gt; {
            for (const i in value) {
                if (value.at(i)?.hasOwnProperty("api_token")) {
                    return [];
                }
            }

            return [{message: 'API Token should be defined as an auth method', location}];
        },
        apiTokenAuthDefinedFirst: (value, options, location) =&gt; {
            if (!value.at(0)?.hasOwnProperty("api_token")) {
                return [{message: 'API Tokens should be the first listed Security Option', location}];
            }

            return [];
        },
    },
};</code></pre>
            <p>And the rule configuration:</p>
            <pre><code>rule/security-options-defined:
  severity: error
  subject:
    type: Operation
    property: security
  where:
  - subject:
    type: Operation
    property: security
    assertions:
      defined: true
  assertions:
    local/apiTokenAuthSupported: {}
    local/apiTokenAuthDefinedFirst: {}</code></pre>
            <p>In this example, should a team forget to put the API token authentication scheme first, or define it at all, the CI run will fail. Teams are provided a helpful failure message with a link to the conventions to discover more if they need to understand why the change is recommended.</p><p>These lints can be used for style conventions, too. For our documentation descriptions, we like descriptions to start with a capital letter and end in a period. Again, we can add a lint to enforce this requirement.</p>
            <pre><code>module.exports = {
    id: 'local',
    assertions: {
        descriptionIsFormatted: (value, options, location) =&gt; {
            // `value` is the description string itself, so test it directly.
            if (/^[A-Z].*\.$/.test(value)) {
                return [];
            }

            return [{message: 'Descriptions should start with a capital and end in a period.', location}];
        },
    },
};</code></pre>
            
            <pre><code>rule/description-is-formatted:
  severity: error
  subject:
    type: Schema
    property: description
  assertions:
    local/descriptionIsFormatted: {}</code></pre>
            <p>This makes shipping endpoints of the same quality much easier and prevents teams from having to sort through all the API design or resiliency patterns we may have introduced over the years – possibly even before they joined Cloudflare.</p>
    <div>
      <h2>Building the generation machine</h2>
      <a href="#building-the-generation-machine">
        
      </a>
    </div>
    <p>Once we had our guiding principles, we started doing some analysis of our situation and saw that if we decided to build the solution entirely in house, we would be at least 6–9 months away from a single high-quality SDK, with the potential for additional follow-up work each time we added a new language. This wasn’t acceptable and prevented us from meeting the requirement of a low-cost follow-up for additional languages, so we explored the OpenAPI generation landscape.</p><p>Due to the size and complexity of our schemas, we weren’t able to use most off-the-shelf products. We tried a handful of solutions and workarounds, but we weren’t comfortable with any of the options; that was, until we tried <a href="https://www.stainlessapi.com/?ref=cloudflare_blog">Stainless</a>. Founded by one of the engineers who built what many consider to be the best-in-class API experiences at Stripe, Stainless is dedicated to generating SDKs. If you've used the OpenAI <a href="https://github.com/openai/openai-python">Python</a> or <a href="https://github.com/openai/openai-node">Typescript</a> SDKs, you've used an SDK generated by Stainless.</p><p>The way the platform works is that you bring your OpenAPI schemas and map them to methods in a configuration file. Those inputs are then fed into the generation engine to build your SDKs.</p>
            <pre><code>resources:
  zones:
    methods:
      list: get /zones</code></pre>
            <p>The configuration above would allow you to generate various <code>client.zones.list()</code> operations across your SDKs.</p><p>This approach means we can do the majority of our changes using the existing API schemas, but if there is an SDK-specific issue, we can modify that behavior on a per-SDK basis using the configuration file.</p><p>An added benefit of using the Stainless generation engine is that it gives us a clear line of responsibility when discussing where a change should be made.</p><ul><li><p><b>Service team:</b> Knows their service best and manages the representation for end users.</p></li><li><p><b>API team:</b> Understands and implements best practices for APIs and SDK conventions, builds centralized tooling or components within the platform for all teams, and translates service mappings to align with Stainless.</p></li><li><p><b>Stainless:</b> Provides a simple interface to generate SDKs consistently.</p></li></ul><p>The decision to use Stainless has allowed us to move our focus from building the generation engine to instead building high-quality schemas to describe our services. In the span of a few months, we have gone from inconsistent, manually maintained SDKs to automatically shipping three language SDKs with hands-off updates freely flowing from the internal teams. Best of all, it is now a single pull request workflow for the majority of our changes – even if we were to add a new language or integration to the pipeline!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/pcaHSceyj5ah8LygAlB3Y/34610034b6c955b4b61ee1b6f7d50627/image2-17.png" />
            
            </figure><p><i>Just a single pull request is now required to ship an SDK change.</i></p>
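            <p>To make the resource mapping concrete, here is a rough sketch of how a generator could turn a mapping like the <code>zones</code> example above into nested client methods. This is purely illustrative – the real Stainless engine emits typed source code for each language rather than constructing a client at runtime:</p>
            <pre><code>// Sketch: build a nested client from a resource/method mapping.
// All names here are illustrative, not part of any real generator API.
const mapping = {
  zones: {
    list: "get /zones",
  },
};

function buildClient(resources, sendRequest) {
  const client = {};
  for (const [resource, methods] of Object.entries(resources)) {
    client[resource] = {};
    for (const [name, endpoint] of Object.entries(methods)) {
      const [verb, path] = endpoint.split(" ");
      client[resource][name] = function (params) {
        return sendRequest(verb.toUpperCase(), path, params);
      };
    }
  }
  return client;
}

// Record requests instead of sending them, to show the shape of the calls.
const calls = [];
const client = buildClient(mapping, function (method, path) {
  calls.push(method + " " + path);
});
client.zones.list();
console.log(calls[0]); // "GET /zones"</code></pre>
            <p>However the engine is implemented, the point stands: one mapping drives a uniform <code>client.zones.list()</code> shape across every language.</p>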
    <div>
      <h2>Lessons from our journey, for yours</h2>
      <a href="#lessons-from-our-journey-for-yours">
        
      </a>
    </div>
    
    <div>
      <h3>Mass updates, made easy</h3>
      <a href="#mass-updates-made-easy">
        
      </a>
    </div>
    <p>Depending on the age of your APIs, you will have a diverging history of how they are represented to customers. That may be as simple as inconsistent path parameters, or something more complex like different HTTP methods for updates. While you could handle these individually, at any sort of scale that just isn’t feasible. As of this post, Cloudflare offers roughly 1,300 publicly documented endpoints, so we needed a more automatable solution. For us, that was codemods. Codemods are a way of applying transformations to perform large-scale <a href="https://www.cloudflare.com/learning/cloud/how-to-refactor-applications/">refactoring</a> of your codebase. They allow you to programmatically rewrite expressions, syntax, or other parts of your code without having to manually go through every file. Think of it like find and replace, but on steroids and with more context of the underlying language constructs.</p><p>We started with a tool called <a href="https://comby.dev/">comby</a>. We wrapped it in a custom CLI tool that knew how to speak to our version control endpoints and wired it so that each transformation we needed to apply came with a comby configuration TOML file, a pull request description, and a commit message. Here is a sample comby configuration where we updated URI paths to be consistently suffixed with <code>_id</code> instead of other variations (<code>_identifier</code>, <code>Identifier</code>, etc.) where we had a plural resource followed by an individual identifier.</p>
            <pre><code>[account-id-1-path-consistency]
match = 'paths/~accounts~1{account_identifier1}'
rewrite = 'paths/~accounts~1{account_id}'

[account-id-camelcase-path-consistency]
match = 'paths/~accounts~1{accountId}'
rewrite = 'paths/~accounts~1{account_id}'

[placeholder-identifier-to-id]
match = ':[_~_identifier}]' # need the empty hole match here since we are using unbalanced }
rewrite = '_id}'

[route-consistency-for-resource-plurals]
match = ':[topic~/\w+/]{:[id~\w+]}'
rewrite = ':[topic]{:[id]}'
rule = 'where rewrite :[id] { :[x] -&gt; :[topic] }, rewrite :[id] { /:[x]s/ -&gt; :[x]_id }'

[property-identifier-to-id]
match = 'name: :[topic]_identifier'
rewrite = 'name: :[topic]_id'</code></pre>
            <p>For an interactive version of this configuration, check out the comby playground.</p><p>This approach worked for the majority of our internal changes. However, knowing how difficult migrations can be, we also wanted a tool that we could provide to customers for their own SDK migrations. In the past we’ve used <a href="https://registry.terraform.io/providers/cloudflare/cloudflare/latest/docs/guides/version-3-upgrade">comby for upgrades in the Terraform Provider</a> with great feedback. While comby is powerful, once you start using more complex expressions, the syntax can be difficult to understand unless you are already familiar with it.</p><p>After looking around, we eventually found <a href="https://www.grit.io/">Grit</a>. It is a tool that does everything we need (including the custom CLI), and its query language, known as <a href="https://docs.grit.io/tutorials/gritql">GritQL</a>, will feel very familiar to anyone who understands basic JavaScript. An added bonus is that we are able to contribute to the <a href="https://docs.grit.io/patterns">Grit Pattern Library</a>, so our migrations are <a href="https://app.grit.io/studio?preset=cloudflare_go_v2&amp;key=nt_BGTed1mbXzuvxZ9n2q">only ever a single CLI invocation away</a> for anyone with the CLI installed.</p>
            <pre><code># Migrate to the Golang v2 library
grit apply cloudflare_go_v2</code></pre>
            
    <div>
      <h3>Consistency, consistency, consistency</h3>
      <a href="#consistency-consistency-consistency">
        
      </a>
    </div>
    <p>Did I mention consistency is important? Before attempting to feed your OpenAPI schemas into any system (especially a homegrown one), make them consistent in their practices, structures, and how you intend to represent them. This makes it much easier to determine what is a bug in your generation pipeline versus a bug in your schema. If it’s broken everywhere, it’s the generation pipeline; otherwise, it’s an isolated bug to track down in your schema.</p><p>Consistency also leads to a better developer experience. From our examples above, if your routes always follow the plural resource name followed by an identifier, the end user doesn’t have to think about what the inputs need to be. The consistency and conventions lead them there – even if your documentation is lacking.</p>
    <div>
      <h3>Use shared $refs sparingly</h3>
      <a href="#use-shared-refs-sparingly">
        
      </a>
    </div>
    <p>It seems like a great idea for reusability at the time of writing them, but when overused, <code>$ref</code>s make finding correct values problematic and lead to <a href="https://en.wikipedia.org/wiki/Cargo_cult_programming">cargo cult</a> practices. In turn, this leads to lower quality and difficult-to-change schemas despite looking more usable from the outset. Consider the following schema example:</p>
            <pre><code>thing_base:
  type: object
  required:
    - id
  properties:
    updated_at:
      $ref: '#/components/schemas/thing_updated_at'
    created_at:
      $ref: '#/components/schemas/thing_updated_at'
    id:
      $ref: '#/components/schemas/thing_id'
      
thing_updated_at:
  type: string
  format: date-time
  description: When the resource was last updated.
  example: "2014-01-01T05:20:00Z"
  
thing_created_at:
  type: string
  format: date-time
  description: When the resource was created.
  example: "2014-01-01T05:20:00Z"

thing_id:
  type: string
  description: Unique identifier of the resource.
  example: "2014-01-01T05:20:00Z"</code></pre>
            <p>Did you spot the bug? Have another look at the <code>created_at</code> value. You likely didn’t catch it at first glance, but this is a common issue when needing to reference reusable values. Here, it is a minor annoyance as the documentation would be incorrect (<code>created_at</code> would have the description of <code>updated_at</code>), but in other cases, it could be a completely incorrect schema representation.</p><p>For us, the correct usage of <code>$ref</code> values is predominantly where you have potential for multiple component schemas that may be used as part of a <code>oneOf</code>, <code>allOf</code> or <code>anyOf</code> directive.</p>
            <pre><code>dns_record:
  oneOf:
    - $ref: '#/components/schemas/dns-records_ARecord'
    - $ref: '#/components/schemas/dns-records_AAAARecord'
    - $ref: '#/components/schemas/dns-records_CAARecord'
    - $ref: '#/components/schemas/dns-records_CERTRecord'
    - $ref: '#/components/schemas/dns-records_CNAMERecord'
    - $ref: '#/components/schemas/dns-records_DNSKEYRecord'
    - $ref: '#/components/schemas/dns-records_DSRecord'
    - $ref: '#/components/schemas/dns-records_HTTPSRecord'
    - $ref: '#/components/schemas/dns-records_LOCRecord'
    - $ref: '#/components/schemas/dns-records_MXRecord'
    - $ref: '#/components/schemas/dns-records_NAPTRRecord'
    - $ref: '#/components/schemas/dns-records_NSRecord'
    - $ref: '#/components/schemas/dns-records_PTRRecord'
    - $ref: '#/components/schemas/dns-records_SMIMEARecord'
    - $ref: '#/components/schemas/dns-records_SRVRecord'
    - $ref: '#/components/schemas/dns-records_SSHFPRecord'
    - $ref: '#/components/schemas/dns-records_SVCBRecord'
    - $ref: '#/components/schemas/dns-records_TLSARecord'
    - $ref: '#/components/schemas/dns-records_TXTRecord'
    - $ref: '#/components/schemas/dns-records_URIRecord'
  type: object
  required:
    - id
    - type
    - name
    - content
    - proxiable
    - created_on
    - modified_on</code></pre>
            <p>When in doubt, consider the <a href="https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it">YAGNI principle</a> instead. You can always refactor and extract this later once you have enough uses to determine the correct abstraction.</p>
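            <p>To see why the duplicated <code>$ref</code> above is so easy to miss, here is a tiny dereferencing sketch (illustrative only – not our actual pipeline) showing how <code>created_at</code> silently picks up the wrong description:</p>
            <pre><code>// In-memory stand-ins for the component schemas shown earlier.
const schemas = {
  thing_updated_at: { type: "string", description: "When the resource was last updated." },
  thing_created_at: { type: "string", description: "When the resource was created." },
};

const thing = {
  properties: {
    updated_at: { $ref: "#/components/schemas/thing_updated_at" },
    // The copy-paste bug: this should reference thing_created_at.
    created_at: { $ref: "#/components/schemas/thing_updated_at" },
  },
};

// Resolve a node by following its $ref, if present.
function resolve(node) {
  if (node.$ref) {
    return schemas[node.$ref.split("/").pop()];
  }
  return node;
}

console.log(resolve(thing.properties.created_at).description);
// "When the resource was last updated." – updated_at's description, not created_at's</code></pre>
            <p>Nothing fails loudly – the schema is still valid – which is exactly why a lint or a review convention has to catch this class of mistake.</p>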
    <div>
      <h3>Design your ideal usage and work backwards</h3>
      <a href="#design-your-ideal-usage-and-work-backwards">
        
      </a>
    </div>
    <p>Before we wrote a single line of code to solve the generation problem, we prepared language design documents for each of our target languages, following <a href="https://tom.preston-werner.com/2010/08/23/readme-driven-development.html">README-driven</a> design principles. This meant our initial focus was on the usability of the library, not on the technical challenges we would eventually encounter. It let us identify early, without investing in anything more than a document, how various language nuances would surface to the end user. Python keyword arguments, Go interfaces, how to enforce required parameters, client instantiation and overrides, types – all were considered up front, which helped minimize the number of unknowns as we built out support.</p>
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>When we embarked on the OpenAPI journey, we knew it was only the beginning and would eventually open more doors and quality of life improvements for teams and customers alike. Now that we have a few language SDKs available, we’re turning our attention to generating our <a href="https://github.com/cloudflare/terraform-provider-cloudflare">Terraform Provider</a> using the same guiding principles to further minimize the maintenance burden. But that’s still not all. Coming later in 2024 are more improvements and integrations with other parts of the Cloudflare Developer Platform, so stay tuned.</p><p>If you haven’t already, check out one of the SDKs in <a href="https://github.com/cloudflare/cloudflare-go">Go</a>, <a href="https://github.com/cloudflare/cloudflare-typescript">Typescript</a> and <a href="https://github.com/cloudflare/cloudflare-python">Python</a> today. If you’d like support for a different language, go <a href="https://forms.gle/TPQw3eoiRbyQBfEv5">here</a> to submit your details to help determine the next language. We’d love to hear what languages you would like offered as a Cloudflare SDK.</p> ]]></content:encoded>
            <category><![CDATA[SDK]]></category>
            <guid isPermaLink="false">5EqaaHlqidaSycyCbmPRvk</guid>
            <dc:creator>Jacob Bednarz</dc:creator>
        </item>
        <item>
            <title><![CDATA[New tools for production safety — Gradual deployments, Source maps, Rate Limiting, and new SDKs]]></title>
            <link>https://blog.cloudflare.com/workers-production-safety/</link>
            <pubDate>Thu, 04 Apr 2024 13:05:00 GMT</pubDate>
            <description><![CDATA[ Today we are announcing five updates that put more power in your hands – Gradual Deployments, Source mapped stack traces in Tail Workers, a new Rate Limiting API, brand-new API SDKs, and updates to Durable Objects – each built with mission-critical production services in mind ]]></description>
            <content:encoded><![CDATA[
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6HeyqozVOGygo2RCNnunIq/b429dc5b9f81c9fed6dfc0d200d296e5/image4-7.png" />
            
            </figure><p>2024’s Developer Week is all about production readiness. On Monday, April 1, we <a href="/making-full-stack-easier-d1-ga-hyperdrive-queues/">announced</a> that <a href="https://developers.cloudflare.com/d1/">D1</a>, <a href="https://developers.cloudflare.com/queues/">Queues</a>, <a href="https://developers.cloudflare.com/hyperdrive/">Hyperdrive</a>, and <a href="https://developers.cloudflare.com/analytics/analytics-engine/">Workers Analytics Engine</a> are ready for production scale and generally available. On Tuesday, April 2, we <a href="/workers-ai-ga-huggingface-loras-python-support">announced</a> the same about our inference platform, <a href="https://developers.cloudflare.com/workers-ai/">Workers AI</a>. And we’re not nearly done yet.</p><p>However, production readiness isn’t just about the scale and reliability of the services you build with. You also need tools to make changes safely and reliably. You depend not just on what Cloudflare provides, but on being able to precisely control and tailor how Cloudflare behaves to the needs of your application.</p><p>Today we are announcing five updates that put more power in your hands – Gradual Deployments, source mapped stack traces in Tail Workers, a new Rate Limiting API, brand-new API SDKs, and updates to Durable Objects – each built with mission-critical production services in mind. We build our own products using Workers, including <a href="https://developers.cloudflare.com/cloudflare-one/policies/access/">Access</a>, <a href="https://developers.cloudflare.com/r2/">R2</a>, <a href="https://developers.cloudflare.com/kv/">KV</a>, <a href="https://developers.cloudflare.com/waiting-room/">Waiting Room</a>, <a href="https://developers.cloudflare.com/vectorize/">Vectorize</a>, <a href="https://developers.cloudflare.com/queues/">Queues</a>, <a href="https://developers.cloudflare.com/stream/">Stream</a>, and more. We rely on each of these new features ourselves to ensure that we are production ready – and now we’re excited to bring them to everyone.</p>
    <div>
      <h3>Gradually deploy changes to Workers and Durable Objects</h3>
      <a href="#gradually-deploy-changes-to-workers-and-durable-objects">
        
      </a>
    </div>
    <p>Deploying a Worker is nearly instantaneous – a few seconds and your change is live <a href="https://www.cloudflare.com/network/">everywhere</a>.</p><p>When you reach production scale, each change you make carries greater risk, both in terms of volume and expectations. You need to meet your 99.99% availability SLA, or have an ambitious P90 latency SLO. A bad deployment that’s live for 100% of traffic for 45 seconds could mean millions of failed requests. A subtle code change could cause a thundering herd of retries to an overwhelmed backend, if rolled out all at once. These are the kinds of risks we consider and mitigate ourselves for our own services built on Workers.</p><p>The way to mitigate these risks is to deploy changes gradually – commonly called rolling deployments:</p><ol><li><p>The current version of your application runs in production.</p></li><li><p>You deploy the new version of your application to production, but only route a small percentage of traffic to this new version, and wait for it to “soak” in production, monitoring for regressions and bugs. If something bad happens, you’ve caught it early at a small percentage (e.g. 1%) of traffic and can revert quickly.</p></li><li><p>You gradually increment the percentage of traffic until the new version receives 100%, at which point it is fully rolled out.</p></li></ol><p>Today we’re opening up a first-class way to deploy code changes gradually to Workers and Durable Objects via the <a href="https://developers.cloudflare.com/api/operations/worker-deployments-list-deployments">Cloudflare API</a>, the <a href="https://developers.cloudflare.com/workers/configuration/versions-and-deployments/gradual-deployments/#via-wrangler">Wrangler CLI</a>, or the <a href="https://developers.cloudflare.com/workers/configuration/versions-and-deployments/gradual-deployments/#via-the-cloudflare-dashboard">Workers dashboard</a>. 
Gradual Deployments is entering open beta – you can use it with any Cloudflare account on the <a href="https://developers.cloudflare.com/workers/platform/pricing/#workers">Workers Free plan</a>, and very soon with accounts on the <a href="https://developers.cloudflare.com/workers/platform/pricing/#workers">Workers Paid</a> and Enterprise plans. You’ll see a banner on the Workers dashboard once your account has access.</p>
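            <p>The traffic-splitting step above can be pictured as a weighted coin flip per request. This is a simplification for illustration only – it says nothing about how Workers actually routes traffic internally:</p>
            <pre><code>// Choose which version serves a request, given the rollout percentage
// for the new version. `rand` is injectable so the sketch is testable.
function pickVersion(newVersionPercent, rand) {
  if (rand === undefined) {
    rand = Math.random();
  }
  return newVersionPercent > rand * 100 ? "new" : "current";
}

// At a 1% rollout, the vast majority of requests still hit the current version.
console.log(pickVersion(1, 0.5));   // "current"
console.log(pickVersion(1, 0.005)); // "new"</code></pre>
            <p>Catching a regression while the new version holds only 1% of traffic is what keeps a bad deployment from becoming millions of failed requests.</p>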
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5C2hc1EtfppDeWWDxJBh3K/3235bfa198e136bfac793f877415011d/pasted-image-0.png" />
            
            </figure><p>When you have two versions of your Worker or Durable Object running concurrently in production, you almost certainly want to be able to filter your metrics, exceptions, and logs by version. This can help you spot production issues early, when the new version is only rolled out to a small percentage of traffic, or compare performance metrics when splitting traffic 50/50. We’ve also added <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">observability</a> at a version level across our platform:</p><ul><li><p>You can filter analytics in the Workers dashboard and via the <a href="https://developers.cloudflare.com/analytics/graphql-api/">GraphQL Analytics API</a> by version.</p></li><li><p><a href="https://developers.cloudflare.com/workers/observability/logging/logpush/">Workers Trace Events</a> and <a href="https://developers.cloudflare.com/workers/observability/logging/tail-workers/">Tail Worker</a> events include the version ID of your Worker, along with optional version message and version tag fields.</p></li><li><p>When using <a href="https://developers.cloudflare.com/workers/wrangler/commands/#tail">wrangler tail</a> to view live logs, you can view logs for a specific version.</p></li><li><p>You can access version ID, message, and tag from within your Worker’s code, by configuring the <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/version-metadata/">Version Metadata binding</a>.</p></li></ul><p>You may also want to make sure that each client or user only sees a consistent version of your Worker. We’ve added <a href="https://developers.cloudflare.com/workers/configuration/versions-and-deployments/gradual-deployments/#version-keys-and-session-affinity">Version Affinity</a> so that requests associated with a particular identifier (such as user, session, or any unique ID) are always handled by a consistent version of your Worker. 
<a href="https://developers.cloudflare.com/workers/configuration/versions-and-deployments/gradual-deployments/#version-keys-and-session-affinity">Session Affinity</a>, when used with <a href="https://developers.cloudflare.com/workers/configuration/versions-and-deployments/gradual-deployments/#setting-cloudflare-workers-version-key-using-ruleset-engine">Ruleset Engine</a>, gives you full control over both the mechanism and identifier used to ensure “stickiness”.</p><p>Gradual Deployments is entering open beta. As we move towards GA, we’re working to support:</p><ul><li><p><b>Version Overrides.</b> Invoke a specific version of your Worker in order to test before it serves any production traffic. This will allow you to create Blue-Green Deployments.</p></li><li><p><b>Cloudflare Pages.</b> Let the <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/">CI/CD system</a> in Pages automatically progress the deployments on your behalf.</p></li><li><p><b>Automatic rollbacks.</b> Roll back deployments automatically when the error rate spikes for a new version of your Worker.</p></li></ul><p>We’re looking forward to hearing your feedback! Let us know what you think through <a href="https://www.cloudflare.com/lp/developer-week-deployments/">this</a> feedback form or reach out in our <a href="https://discord.gg/HJvPcPcN">Developer Discord</a> in the #workers-gradual-deployments-beta channel.</p>
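            <p>The Version Affinity behavior described above amounts to hashing a stable identifier into a bucket, so the same key always resolves to the same version. A minimal sketch (hypothetical – not Cloudflare’s actual mechanism):</p>
            <pre><code>// Map a stable identifier (user ID, session ID, ...) to a version choice.
// A tiny FNV-1a hash makes the choice deterministic per key.
function versionFor(key, newVersionPercent) {
  let h = 0x811c9dc5;
  for (const ch of key) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  // Bucket 0-99; keys in the first newVersionPercent buckets get the new version.
  return newVersionPercent > h % 100 ? "new" : "current";
}

// The same identifier always resolves to the same version.
console.log(versionFor("user-42", 10) === versionFor("user-42", 10)); // true</code></pre>
            <p>As the rollout percentage grows, each key’s bucket stays fixed, so a given user never bounces between versions mid-rollout.</p>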
    <div>
      <h3>Source mapped stack traces in Tail Workers</h3>
      <a href="#source-mapped-stack-traces-in-tail-workers">
        
      </a>
    </div>
    <p>Production readiness means tracking errors and exceptions, and trying to drive them down to zero. When an error occurs, the first thing you typically want to look at is the error’s <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Error/stack">stack trace</a> – the specific functions that were called, in what order, from which line and file, and with what arguments.</p><p>Most JavaScript code – not just on Workers, but across platforms – is first bundled, often transpiled, and then minified before being deployed to production. This is done behind the scenes to create smaller bundles to optimize performance and to convert from Typescript to JavaScript if needed.</p><p>If you’ve ever seen an exception return a stack trace like <code>/src/index.js:1:342</code>, it means the error occurred on the 342nd character of your function’s minified code. This is clearly not very helpful for debugging.</p><p><a href="https://web.dev/articles/source-maps">Source maps</a> solve this – they map compiled and minified code back to the original code that you wrote. Source maps are combined with the stack trace returned by the JavaScript runtime in order to present you with a human-readable stack trace. For example, the following stack trace shows that the Worker received an unexpected null value on line 30 of the <code>down.ts</code> file. This is a useful starting point for debugging, and you can move down the stack trace to understand the chain of function calls that resulted in the null value.</p>
            <pre><code>Unexpected input value: null
  at parseBytes (src/down.ts:30:8)
  at down_default (src/down.ts:10:19)
  at Object.fetch (src/index.ts:11:12)</code></pre>
            <p>Here’s how it works:</p><ol><li><p>When you set <code>upload_source_maps = true</code> in your <a href="https://developers.cloudflare.com/workers/wrangler/configuration/">wrangler.toml</a>, Wrangler will automatically generate and upload any source map files when you run <a href="https://developers.cloudflare.com/workers/wrangler/commands/#deploy">wrangler deploy</a> or <a href="https://developers.cloudflare.com/workers/wrangler/commands/#versions">wrangler versions upload</a>.</p></li><li><p>When your Worker throws an uncaught exception, we fetch the source map and use it to map the stack trace of the exception back to lines of your Worker’s original source code.</p></li><li><p>You can then view this deobfuscated stack trace in <a href="https://developers.cloudflare.com/workers/observability/logging/real-time-logs/">real-time logs</a> or in <a href="https://developers.cloudflare.com/workers/observability/logging/tail-workers/">Tail Workers</a>.</p></li></ol><p>Starting today, in open beta, you can upload source maps to Cloudflare when you deploy your Worker – <a href="https://developers.cloudflare.com/workers/observability/source-maps">get started by reading the docs</a>. And starting on April 15th, the Workers runtime will start using source maps to deobfuscate stack traces. We’ll post a notification in the Cloudflare dashboard and post on our <a href="https://twitter.com/CloudflareDev?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor">Cloudflare Developers X account</a> when source mapped stack traces are available.</p>
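            <p>Conceptually, the mapping step is a lookup from a position in the minified bundle back to a position in your original source. Real source maps encode these mappings compactly as VLQ-encoded segments; the sketch below uses a pre-decoded table purely for illustration, reusing the positions from the traces above:</p>
            <pre><code>// A pre-decoded source-map-like table: generated position -> original position.
const mappings = [
  { genLine: 1, genCol: 342, src: "src/down.ts", origLine: 30, origCol: 8, name: "parseBytes" },
];

// Rewrite one raw stack frame into a human-readable one.
function deobfuscate(frame) {
  const m = mappings
    .filter(function (entry) { return entry.genLine === frame.line; })
    .find(function (entry) { return entry.genCol === frame.col; });
  if (!m) {
    return "at [unknown frame]";
  }
  return "at " + m.name + " (" + m.src + ":" + m.origLine + ":" + m.origCol + ")";
}

console.log(deobfuscate({ line: 1, col: 342 }));
// "at parseBytes (src/down.ts:30:8)" – matching the readable trace above</code></pre>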
    <div>
      <h3>New Rate Limiting API in Workers</h3>
      <a href="#new-rate-limiting-api-in-workers">
        
      </a>
    </div>
    <p>An API is only production ready if it has a sensible <a href="https://www.cloudflare.com/learning/bots/what-is-rate-limiting/">rate limit</a>. And as you grow, so does the complexity and diversity of limits that you need to enforce in order to balance the needs of specific customers, protect the health of your service, or enforce and adjust limits in specific scenarios. Cloudflare’s own API has this challenge – each of our dozens of products, each with many API endpoints, may need to enforce different rate limits.</p><p>You’ve been able to configure <a href="https://developers.cloudflare.com/waf/rate-limiting-rules/">Rate Limiting rules</a> on Cloudflare since 2017. But until today, the only way to control this was in the Cloudflare dashboard or via the Cloudflare API. It hasn’t been possible to define behavior at <i>runtime</i>, or write code in a Worker that interacts directly with rate limits – you could only control whether a request is rate limited or not before it hits your Worker.</p><p>Today we’re introducing a new API, in open beta, that gives you direct access to rate limits from your Worker. It’s lightning fast, backed by memcached, and dead simple to add to your Worker. For example, the following configuration defines a rate limit of 100 requests within a 60-second period:</p>
            <pre><code>[[unsafe.bindings]]
name = "RATE_LIMITER"
type = "ratelimit"
namespace_id = "1001" # An identifier unique to your Cloudflare account

# Limit: the number of tokens allowed within a given period, in a single Cloudflare location
# Period: the duration of the period, in seconds. Must be either 60 or 10
simple = { limit = 100, period = 60 } </code></pre>
            <p>Then, in your Worker, you can call the limit method on the RATE_LIMITER binding, providing a key of your choosing. Given the configuration above, this code will return an <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429">HTTP 429</a> response status code once more than 100 requests to a specific path are made within a 60-second period:</p>
            <pre><code>export default {
  async fetch(request, env) {
    const { pathname } = new URL(request.url)

    const { success } = await env.RATE_LIMITER.limit({ key: pathname })
    if (!success) {
      return new Response(`429 Failure – rate limit exceeded for ${pathname}`, { status: 429 })
    }

    return new Response(`Success!`)
  }
}</code></pre>
            <p>Now that Workers can connect directly to a data store like memcached, what else could we provide? Counters? Locks? An <a href="https://github.com/cloudflare/workerd/pull/1666">in-memory cache</a>? Rate limiting is the first of many primitives that we’re exploring in Workers to address questions we’ve gotten for years about where temporary shared state that spans many Worker <a href="https://developers.cloudflare.com/workers/reference/how-workers-works/#isolates">isolates</a> should live. If you rely on putting state in the global scope of your Worker today, we’re working on better primitives that are purpose-built for specific use cases.</p><p>The Rate Limiting API in Workers is in open beta, and you can get started by <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/rate-limit">reading the docs</a>.</p>
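            <p>Because the key is an arbitrary string, you can scope limits however you like. Here’s a sketch that keys the limit on the client IP instead of the path, assuming the same RATE_LIMITER binding configured above (Cloudflare sets the CF-Connecting-IP header on incoming requests):</p>
            <pre><code>const worker = {
  async fetch(request, env) {
    // Fall back to a shared key if the header is absent (e.g. in local tests).
    const ip = request.headers.get("CF-Connecting-IP") ?? "unknown"

    const { success } = await env.RATE_LIMITER.limit({ key: ip })
    if (!success) {
      return new Response(`429 Failure – rate limit exceeded for ${ip}`, { status: 429 })
    }

    return new Response(`Success!`)
  }
}

export default worker</code></pre>
            <p>Keep in mind that, per the binding configuration above, limits are counted within a single Cloudflare location rather than globally, so treat them as a fast guardrail rather than an exact global counter.</p>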
    <div>
      <h3>New auto-generated SDKs for Cloudflare’s API</h3>
      <a href="#new-auto-generated-sdks-for-cloudflares-api">
        
      </a>
    </div>
    <p>Production readiness means going from making changes by clicking buttons in a dashboard to making changes programmatically, using an infrastructure-as-code approach like <a href="https://github.com/cloudflare/terraform-provider-cloudflare">Terraform</a> or <a href="https://github.com/pulumi/pulumi-cloudflare">Pulumi</a>, or by making API requests directly, either on your own or via an SDK.</p><p>The <a href="https://developers.cloudflare.com/api/">Cloudflare API</a> is massive and constantly gaining new capabilities – on average we <a href="https://github.com/cloudflare/api-schemas/activity">update our API schemas between 20 and 30 times per day</a>. But to date, our API SDKs have been built and maintained manually, so we had a burning need to automate this.</p><p>We’ve done that, and today we’re announcing new client SDKs for the Cloudflare API in three languages – <a href="https://github.com/cloudflare/cloudflare-typescript">TypeScript</a>, <a href="https://github.com/cloudflare/cloudflare-python">Python</a>, and <a href="https://github.com/cloudflare/cloudflare-go">Go</a> – with more languages on the way.</p><p>Each SDK is generated automatically using <a href="https://www.stainlessapi.com/">Stainless API</a>, based on the <a href="https://github.com/cloudflare/api-schemas">OpenAPI schemas</a> that define the structure and capabilities of each of our API endpoints. This means that when we add any new functionality to the Cloudflare API, across any Cloudflare product, these API SDKs are automatically regenerated, and new versions are published, ensuring that they are correct and up-to-date.</p><p>You can install the SDKs by running one of the following commands:</p>
            <pre><code># TypeScript
npm install cloudflare

# Python
pip install cloudflare

# Go
go get -u github.com/cloudflare/cloudflare-go/v2</code></pre>
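            <p>Here’s a sketch of what using the TypeScript SDK looks like. The generated resource methods mirror the API’s endpoint hierarchy; treat the exact names below as illustrative and check the SDK’s README for the precise surface:</p>
            <pre><code>import Cloudflare from "cloudflare";

// An API token scoped to the resources you need is recommended.
const client = new Cloudflare({
  apiToken: process.env["CLOUDFLARE_API_TOKEN"] ?? "YOUR_API_TOKEN", // placeholder
});

// Resource methods map to API endpoints, with pagination handled for you.
async function printZones() {
  for await (const zone of client.zones.list()) {
    console.log(zone.id, zone.name);
  }
}</code></pre>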
            <p>If you use Terraform or Pulumi, Cloudflare’s Terraform Provider currently uses the existing, manually maintained <a href="https://github.com/cloudflare/cloudflare-go">Go SDK</a> under the hood. When you run terraform apply, the Cloudflare Terraform Provider determines which API requests to make, and in what order, and executes them using the Go SDK.</p><p>The new, auto-generated Go SDK clears a path towards more comprehensive Terraform support for all Cloudflare products, providing a base set of tools that can be relied upon to be both correct and up-to-date with the latest API changes. We’re building towards a future where any time a product team at Cloudflare builds a new feature that is exposed via the Cloudflare API, it is automatically supported by the SDKs. Expect more updates on this throughout 2024.</p>
    <div>
      <h3>Durable Object namespace analytics and WebSocket Hibernation GA</h3>
      <a href="#durable-object-namespace-analytics-and-websocket-hibernation-ga">
        
      </a>
    </div>
    <p>Many of our own products, including <a href="https://developers.cloudflare.com/waiting-room/">Waiting Room</a>, <a href="https://developers.cloudflare.com/r2/">R2</a>, and <a href="https://developers.cloudflare.com/queues/">Queues</a>, as well as platforms like <a href="https://www.partykit.io/">PartyKit</a>, are built using <a href="https://developers.cloudflare.com/durable-objects/">Durable Objects</a>. Deployed globally, with newly added support for Oceania, Durable Objects can be thought of as singleton Workers that provide a single point of coordination and <a href="https://developers.cloudflare.com/durable-objects/api/transactional-storage-api/">persist state</a>. They’re perfect for applications that need real-time user coordination, like interactive chat or collaborative editing. Take Atlassian’s word for it:</p><blockquote><p><i>One of our new capabilities is</i> <a href="https://www.atlassian.com/software/confluence/whiteboards"><i>Confluence whiteboards</i></a><i>, which provides a freeform way to capture unstructured work like brainstorming and early planning before teams document it more formally. The team considered many options for real-time collaboration and ultimately decided to use Cloudflare’s Durable Objects. Durable Objects have proven to be a fantastic fit for this problem space, with a unique combination of functionalities that has allowed us to greatly simplify our infrastructure and easily scale to a large number of users. 
-</i> <a href="https://www.atlassian.com/software/confluence/whiteboards"><i>Atlassian</i></a></p></blockquote><p>We haven’t previously exposed associated analytical trends in the dashboard, making it hard to understand the usage patterns and error rates within a <a href="https://developers.cloudflare.com/durable-objects/configuration/access-durable-object-from-a-worker/#generate-ids-randomly">Durable Objects namespace</a> unless you used the <a href="https://developers.cloudflare.com/analytics/graphql-api/">GraphQL Analytics API</a> directly. The <a href="https://dash.cloudflare.com/?to=/:account/workers/durable-objects">Durable Objects dashboard</a> has now been revamped, letting you drill down into metrics, and go as deep as you need.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2wFlllqLKz9G1J7ZU4cAfp/67b84ff52331c449bfbbb291fec01ffa/pasted-image-0--1-.png" />
            
            </figure><p>From <a href="/introducing-workers-durable-objects">day one</a>, Durable Objects have supported <a href="https://developer.mozilla.org/en-US/docs/Web/API/WebSocket">WebSockets</a>, allowing many clients to directly connect to a Durable Object to send and receive messages.</p><p>However, sometimes client applications open a WebSocket connection and then eventually stop doing...anything. Think about that tab you’ve had sitting open in your browser for the last 5 hours, but haven’t touched. If it uses WebSockets to send and receive messages, it effectively has a long-lived TCP connection that isn’t being used for anything. If this connection is to a Durable Object, the Durable Object must stay running, waiting for something to happen, consuming memory, and costing you money.</p><p>We first <a href="/workers-pricing-scale-to-zero">introduced WebSocket Hibernation</a> to solve this problem, and today we’re announcing that this feature is out of beta and is Generally Available. With WebSocket Hibernation, you set an automatic response to be used while hibernating and serialize state such that it survives hibernation. This gives Cloudflare the inputs we need in order to maintain open WebSocket connections from clients while “hibernating” the Durable Object such that it is not actively running, and you are not billed for idle time. The result is that your state is always available in-memory when you actually need it, but isn’t unnecessarily kept around when it’s not. As long as your Durable Object is hibernating, even if there are active clients still connected over a WebSocket, you won’t be billed for duration.</p><p>In addition, we’ve heard developer feedback on the costs of incoming WebSocket messages to Durable Objects, which favor smaller, more frequent messages for real-time communication. 
Starting today, incoming WebSocket messages will be billed at the equivalent of 1/20th of a request (as opposed to each message counting as a full request, as it has until now). The following pricing example shows the difference:</p>
<table>
<thead>
  <tr>
    <th></th>
    <th><span>WebSocket Connection Requests</span></th>
    <th><span>Incoming WebSocket Messages</span></th>
    <th><span>Billed Requests</span></th>
    <th><span>Request Billing</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>Before</span></td>
    <td><span>10K</span></td>
    <td><span>432M</span></td>
    <td><span>432,010,000</span></td>
    <td><span>$64.65</span></td>
  </tr>
  <tr>
    <td><span>After</span></td>
    <td><span>10K</span></td>
    <td><span>432M</span></td>
    <td><span>21,610,000</span></td>
    <td><span>$3.09</span></td>
  </tr>
</tbody>
</table>
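            <p>The “After” row can be reproduced directly: messages now count as 1/20th of a request each, and (assuming the standard Durable Objects request pricing of $0.15 per million requests with the first million included, as in the linked pricing example) the monthly cost follows:</p>
            <pre><code>const connectionRequests = 10_000
const incomingMessages = 432_000_000

// Each incoming WebSocket message now counts as 1/20th of a request.
const billedRequests = connectionRequests + incomingMessages / 20

// $0.15 per million requests, first 1 million included.
const cost = ((billedRequests - 1_000_000) / 1_000_000) * 0.15

console.log(billedRequests, cost.toFixed(2)) // 21610000 3.09</code></pre>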
    <div>
      <h3>Production ready, without production complexity</h3>
      <a href="#production-ready-without-production-complexity">
        
      </a>
    </div>
    <p>Becoming production ready on the last generation of cloud platforms meant slowing down how fast you shipped. It meant stitching together many disconnected tools or standing up whole teams to work on internal platforms. You had to retrofit your own productivity layers onto platforms that put up roadblocks.</p><p>The Cloudflare Developer Platform is grown up and production ready, and committed to being an integrated platform where products intuitively work together, where there aren’t 10 ways to do the same thing, and where you don’t need a compatibility matrix to understand what works together. Each of these updates shows this in action, integrating new functionality across products and parts of Cloudflare’s platform.</p><p>To that end, we want to hear from you not only about what you want to see next, but also about where we could be even simpler, or where our products could work better together. Tell us where you think we could do more – the <a href="https://discord.cloudflare.com/">Cloudflare Developers Discord</a> is always open.</p>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[SDK]]></category>
            <category><![CDATA[Observability]]></category>
            <guid isPermaLink="false">2IHoIDHRhxpNxOfd1ihQ0Y</guid>
            <dc:creator>Tanushree Sharma</dc:creator>
            <dc:creator>Jacob Bednarz</dc:creator>
        </item>
    </channel>
</rss>