
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Thu, 09 Apr 2026 19:44:15 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Announcing Workers automatic tracing, now in open beta]]></title>
            <link>https://blog.cloudflare.com/workers-tracing-now-in-open-beta/</link>
            <pubDate>Tue, 28 Oct 2025 12:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare Workers' support for automatic tracing is now in open beta! Export traces to any OpenTelemetry-compatible provider for deeper application observability -- no code changes required ]]></description>
            <content:encoded><![CDATA[ <p></p><p>When your Worker slows down or starts throwing errors, finding the root cause shouldn't require hours of log analysis and trial-and-error debugging. You should have clear visibility into what's happening at every step of your application's request flow. This is feedback we’ve heard loud and clear from developers using Workers, and today we’re excited to announce an Open Beta for tracing on <a href="https://www.cloudflare.com/developer-platform/products/workers/"><u>Cloudflare Workers</u></a>! You can now:  </p><ul><li><p><b>Get automatic instrumentation for applications on the Workers platform: </b>No manual setup, complex instrumentation, or code changes. It works out of the box. </p></li><li><p><b>Explore and investigate traces in the Cloudflare dashboard:</b> Your traces are processed and available in the Workers Observability dashboard alongside your existing logs.</p></li><li><p><b>Export logs and traces to OpenTelemetry-compatible providers:</b> Send OpenTelemetry traces (and correlated logs) to your observability provider of choice. </p></li></ul><p>In 2024, <a href="https://blog.cloudflare.com/cloudflare-acquires-baselime-expands-observability-capabilities/"><u>we set out to build</u></a> the best first-party <a href="https://www.cloudflare.com/developer-platform/products/workers-observability/"><u>observability</u></a> of any cloud platform. We launched a new metrics dashboard to give better insights into how your Worker is performing, <a href="https://developers.cloudflare.com/workers/observability/logs/workers-logs/#enable-workers-logs"><u>Workers Logs</u></a> to automatically ingest and store logs for your Workers, a <a href="https://developers.cloudflare.com/workers/observability/query-builder/"><u>query builder</u></a> to explore your data across any dimension and <a href="https://developers.cloudflare.com/workers/observability/logs/real-time-logs/"><u>real-time logs</u></a> to stream your logs in real time with advanced filtering capabilities. Starting today, you can get an even deeper understanding of your Workers applications by enabling automatic <b>tracing</b>!</p>
    <div>
      <h3>What is Workers Tracing? </h3>
      <a href="#what-is-workers-tracing">
        
      </a>
    </div>
    <p>Workers traces capture and emit OpenTelemetry-compliant spans to show you detailed metadata and timing information on every operation your Worker performs.<b> </b>It helps you identify performance bottlenecks, resolve errors, and understand how your Worker interacts with other services on the Workers platform. You can now answer questions like:</p><ul><li><p>Which calls are slowing down my application?</p></li><li><p>Which queries to my database take the longest? </p></li><li><p>What happened within a request that resulted in an error?</p></li></ul><p>Tracing provides a visualization of each invocation's journey through various operations. Each operation is captured as a span, a timed segment that shows what happened and how long it took. Child spans nest within parent spans to show sub-operations and dependencies, creating a hierarchical view of your invocation’s execution flow. Each span can include contextual metadata or attributes that provide details for debugging and filtering events.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4F7l3WSJ2hY0eu6kX47Rdp/c7c3934e9abbbb1f01ec979941d35b54/unnamed.png" />
          </figure>
    <div>
      <h3>Full automatic instrumentation, no code changes </h3>
      <a href="#full-automatic-instrumentation-no-code-changes">
        
      </a>
    </div>
    <p>Previously, instrumenting your application typically required an understanding of the <a href="https://opentelemetry.io/docs/specs/"><u>OpenTelemetry spec</u></a>, multiple <a href="https://opentelemetry.io/docs/concepts/instrumentation/libraries/"><u>OTel libraries</u></a>, and how they related to each other. Implementation was tedious and bloated your codebase with instrumentation code that obfuscated your application logic.</p><p>Setting up tracing typically meant spending hours integrating third-party SDKs, wrapping every database call and API request with instrumentation code, and debugging complex config files before you saw a single trace. This implementation overhead often makes observability an afterthought, leaving you without full visibility in production when issues arise.</p><p>What makes Workers Tracing truly magical is it’s <b>completely automatic – no set up, no code changes, no wasted time. </b>We took the approach of automatically instrumenting every I/O operation in your Workers, through a deep integration in <a href="https://github.com/cloudflare/workerd"><u>workerd</u></a>, our runtime, enabling us to capture the full extent of data flows through every invocation of your Workers.</p><p>You focus on your application logic. We take care of the instrumentation.</p>
    <div>
      <h4>What you can trace today</h4>
      <a href="#what-you-can-trace-today">
        
      </a>
    </div>
    <p>The operations covered today are: </p><ul><li><p><b>Binding calls:</b> Interactions with various <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/"><u>Worker bindings</u></a>. KV reads and writes, <a href="https://www.cloudflare.com/developer-platform/products/r2/">R2 object storage</a> operations, Durable Object invocations, and many more binding calls are automatically traced. This gives you complete visibility into how your Worker uses other services.</p></li><li><p><b>Fetch calls:</b> All outbound HTTP requests are automatically instrumented, capturing timing, status codes, and request metadata. This enables you to quickly identify which external dependencies are affecting your application's performance.</p></li><li><p><b>Handler calls:</b> Methods on a Worker that can receive and process external inputs, such as <a href="https://developers.cloudflare.com/workers/runtime-apis/handlers/fetch/"><u>fetch handlers</u></a>, <a href="https://developers.cloudflare.com/workers/runtime-apis/handlers/scheduled/"><u>scheduled handlers</u></a>, and <a href="https://developers.cloudflare.com/queues/configuration/javascript-apis/#consumer"><u>queue handlers</u></a>. This gives you visibility into performance of how your Worker is being invoked.</p></li></ul>
    <div>
      <h4>Automatic attributes on every span </h4>
      <a href="#automatic-attributes-on-every-span">
        
      </a>
    </div>
    <p>Our automated instrumentation captures each operation as a span. For example, a span generated by an R2 binding call (like a <code>get</code> or <code>put</code> operation) will automatically contain any available attributes, such as the operation type, the error if applicable, the object key, and duration. These detailed attributes provide the context you need to answer precise questions about your application without needing to manually log every detail.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6n5j0AdsgouMuHacgrTW9Q/60e4f73dd66a591ee666d6acdbaafc03/unnamed2.png" />
          </figure><p>We will continue to add more detailed attributes to spans and add the ability to trace an invocation across multiple Workers or external services. Our <a href="http://developers.cloudflare.com/workers/observability/traces/spans-and-attributes/"><u>documentation</u></a> contains a complete list of all instrumented spans and their attributes.</p>
    <div>
      <h3>Investigate traces in the Workers dashboard</h3>
      <a href="#investigate-traces-in-the-workers-dashboard">
        
      </a>
    </div>
    <p>You can easily<a href="http://developers.cloudflare.com/workers/observability/traces/"><u> view traces directly within a specific Worker application</u></a> in the Cloudflare dashboard, giving you immediate visibility into your application's performance. You’ll find a list of all trace events within your desired time frame and a trace visualization of each invocation including duration of each call and any available attributes. You can also query across all Workers on your account, letting you pinpoint issues occurring on multiple applications. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7HU4vktx7zr3aflWK9cdsl/73dcd1f1f9702faa65430fec9c9da8c5/unnamed_3.png" />
          </figure><p>To get started viewing traces on your Workers application, you can set: </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5e7hIAgVnjxvZSshwBTpXf/a0571af6e43be414c85e25dba106848c/1.png" />
          </figure>
    <div>
      <h3>Export traces to OpenTelemetry compatible providers </h3>
      <a href="#export-traces-to-opentelemetry-compatible-providers">
        
      </a>
    </div>
    <p>However, we realize that some development teams need Workers data to live alongside other telemetry data in the <b>tools they are already using</b>. That’s why we’re also adding tracing exports, letting your team send, visualize and query data with your existing observability stack! Starting today, you can export traces directly to providers like <a href="https://www.honeycomb.io/"><u>Honeycomb</u></a>, <a href="https://grafana.com/"><u>Grafana</u></a>, <a href="https://sentry.io/welcome/"><u>Sentry</u></a> or any other <a href="http://developers.cloudflare.com/workers/observability/exporting-opentelemetry-data/#available-opentelemetry-destinations"><u>OpenTelemetry Protocol (OTLP) provider with an available endpoint</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3GdiHZEL1bIpGn5BRhbyyT/0fbb6610a488f5f2c44f78ba3ecaf576/Export_traces_to_OpenTelemetry_compatible_providers_.png" />
          </figure>
    <div>
      <h4>Correlated logs and traces </h4>
      <a href="#correlated-logs-and-traces">
        
      </a>
    </div>
    <p>We also support exporting OTLP-formatted logs that share the same trace ID, enabling third-party platforms to automatically correlate log entries with their corresponding traces. This lets you easily jump between spans and related log messages.</p>
    <div>
      <h4>Set up your destination, enable exports, and go! </h4>
      <a href="#set-up-your-destination-enable-exports-and-go">
        
      </a>
    </div>
    <p>To start sending events to your destination of choice, first, configure your OTLP endpoint destination in the Cloudflare dashboard. For every destination you can specify a custom name and set custom headers to include API keys or app configuration. </p><p>Once you have your destination set up (e.g. <code>honeycomb-tracing</code>), set the following in your <code>wrangler.jsonc </code>and deploy: </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2xUbPsz307PvkZ6i15iUKe/452efaad0afdbeb2722f42726d0a849b/3.png" />
          </figure>
    <div>
      <h3>Coming up for Workers observability</h3>
      <a href="#coming-up-for-workers-observability">
        
      </a>
    </div>
    <p>This is just the beginning of Workers providing the workflows and tools to get you the telemetry data you want, where you want it. We’re improving our support both for native tracing in the dashboard and for exporting other types of telemetry to 3rd parties. In the upcoming months we’ll be launching: </p><ul><li><p><b>Support for more spans and attributes: </b>We are adding more automatic traces for every part of the Workers platform. While our first goal is to give you visibility into the duration of every operation within your request, we also want to add detailed attributes. Your feedback on what’s missing will be extremely valuable here. </p></li><li><p><b>Trace context propagation: </b>When building<b> </b><b><i>distributed</i></b> applications, ensuring your traces connect across all of your services (even those outside of Cloudflare), automatically linking spans together to create complete, end-to-end visibility is critical. For example, a trace from Workers could be nested from a parent service or vice versa. When fully implemented, our automatic trace context propagation will follow <a href="https://www.w3.org/TR/trace-context/"><u>W3C standards</u></a> to ensure compatibility across your existing tools and services. </p></li><li><p><b>Support for custom spans and attributes</b>: While automatic instrumentation gives you visibility into what’s happening within the Workers platform, we know you need visibility into your own application logic too. So, we’ll give you the ability to manually add your own spans as well.</p></li><li><p><b>Ability to export metrics: </b>Today, metrics, logs and traces are available for you to monitor and view within the Workers dashboard. But the final missing piece is giving you the ability to export both infrastructure metrics (like request volume, error rates, and execution duration) and custom application metrics to your preferred observability provider.</p></li></ul>
    <div>
      <h3>What you can expect from tracing pricing  </h3>
      <a href="#what-you-can-expect-from-tracing-pricing">
        
      </a>
    </div>
    <p>Today, at the start of beta, viewing traces in the Cloudflare dashboard and exporting traces to a 3rd party provider are both free. On <b>January 15, 2026</b>, tracing and log events will be charged the following pricing:</p><p><b>Viewing Workers traces in the Cloudflare dashboard</b></p><p>To view traces in the Cloudflare dashboard, you can do so on a <a href="https://www.cloudflare.com/plans/developer-platform/"><u>Workers Free and Paid plan</u></a> at the pricing shown below:</p>
<div><table><thead>
  <tr>
    <th></th>
    <th><span>Workers Free</span></th>
    <th><span>Workers Paid</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Included Volume </span></td>
    <td><span>200K events per day</span></td>
    <td><span>20M events per month </span></td>
  </tr>
  <tr>
    <td><span>Additional Events </span></td>
    <td><span>N/A</span></td>
    <td><span>$0.60 per million logs </span></td>
  </tr>
  <tr>
    <td><span>Retention </span></td>
    <td><span>3 days </span></td>
    <td><span>7 days </span></td>
  </tr>
</tbody></table></div><p><b>Exporting traces and logs </b></p><p>To export traces to a 3rd-party OTLP-compatible destination, you will need a <b>Workers Paid </b>subscription. Pricing is based on total span or log events with the following inclusions:</p>
<div><table><thead>
  <tr>
    <th></th>
    <th><span>Workers Free</span></th>
    <th><span>Workers Paid</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Events  </span></td>
    <td><br /><span>Not available</span></td>
    <td><span>10 million events per month </span></td>
  </tr>
  <tr>
    <td><span>Additional events </span></td>
    <td><span>$0.05 per million batched events</span></td>
  </tr>
</tbody></table></div>
    <div>
      <h3>Enable tracing today</h3>
      <a href="#enable-tracing-today">
        
      </a>
    </div>
    <p>Ready to get started with tracing on your Workers application? </p><ul><li><p><b>Check out our </b><a href="http://developers.cloudflare.com/workers/observability/traces/"><b><u>documentation</u></b></a><b>: </b>Learn how to get set up, read about current limitations and discover more about what’s coming up. </p></li><li><p><b>Join the chatter in our </b><a href="https://github.com/cloudflare/workers-sdk/discussions/11062"><b>GitHub discussion</b></a><b>:</b> Your feedback will be extremely valuable in our beta period on our automatic instrumentation, tracing dashboard, and OpenTelemetry export flow. Head to our GitHub discussion to raise issues, put in feature requests and get in touch with us!</p></li></ul><p></p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Observability]]></category>
            <category><![CDATA[Tracing]]></category>
            <category><![CDATA[OpenTelemetry ]]></category>
            <guid isPermaLink="false">2Np8UAH0AuW7KjeIwym0NY</guid>
            <dc:creator>Nevi Shah</dc:creator>
            <dc:creator>Boris Tane</dc:creator>
            <dc:creator>Jeremy Morrell</dc:creator>
        </item>
        <item>
            <title><![CDATA[Moving Baselime from AWS to Cloudflare: simpler architecture, improved performance, over 80% lower cloud costs]]></title>
            <link>https://blog.cloudflare.com/80-percent-lower-cloud-cost-how-baselime-moved-from-aws-to-cloudflare/</link>
            <pubDate>Thu, 31 Oct 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ Post-acquisition, we migrated Baselime from AWS to the Cloudflare Developer Platform and in the process, we improved query times, simplified data ingestion, and now handle far more events, all while cutting costs. Here’s how we built a modern, high-performing observability platform on Cloudflare’s network.  ]]></description>
            <content:encoded><![CDATA[ 
    <div>
      <h2>Introduction</h2>
      <a href="#introduction">
        
      </a>
    </div>
    <p>When <a href="https://blog.cloudflare.com/cloudflare-acquires-baselime-expands-observability-capabilities/"><u>Baselime joined Cloudflare</u></a> in April 2024, our architecture had evolved to hundreds of AWS Lambda functions, dozens of databases, and just as many queues. We were drowning in complexity and our cloud costs were growing fast. We are now building <a href="https://baselime.io/"><u>Baselime</u></a> and <a href="https://developers.cloudflare.com/workers/observability/logs/workers-logs/"><u>Workers Observability</u></a> on Cloudflare and will save over 80% on our cloud compute bill. The estimated potential Cloudflare costs are for Baselime, which remains a stand-alone offering, and the estimate is based on the <a href="https://developers.cloudflare.com/workers/platform/pricing/"><u>Workers Paid plan</u></a>. Not only did we achieve huge cost savings, we also simplified our architecture and improved overall latency, scalability, and reliability.</p><table><tr><td><p><b>Cost (daily)</b></p></td><td><p><b>Before (AWS)</b></p></td><td><p><b>After (Cloudflare)</b></p></td></tr><tr><td><p>Compute</p></td><td><p>$650 - AWS Lambda</p></td><td><p>$25 - Cloudflare Workers</p></td></tr><tr><td><p>CDN</p></td><td><p>$140 - Cloudfront</p></td><td><p>$0 - Free</p></td></tr><tr><td><p>Data Stream + Analytics database</p></td><td><p>$1,150 - Kinesis Data Stream + EC2</p></td><td><p>$300 - Workers Analytics Engine</p></td></tr><tr><td><p>Total (daily)</p></td><td><p>$1,940</p></td><td><p>$325</p></td></tr><tr><td><p><b>Total (annual)</b></p></td><td><p><b>$708,100</b></p></td><td><p><b>$118,625</b> (83% cost reduction)</p></td></tr></table><p><sub><i>Table 1: AWS vs. Workers Costs Comparison ($USD)</i></sub></p><p>When we joined Cloudflare, we immediately saw a surge in usage, and within the first week following the announcement, we were processing over a billion events daily and our weekly active users tripled.</p><p>As the platform grew, so did the challenges of managing real-time observability with new scalability, reliability, and cost considerations. This drove us to rebuild Baselime on the Cloudflare Developer Platform, where we could innovate quickly while reducing operational overhead.</p>
    <div>
      <h2>Initial architecture — all on AWS</h2>
      <a href="#initial-architecture-all-on-aws">
        
      </a>
    </div>
    <p>Our initial architecture was all on Amazon Web Services (AWS). We’ll focus here on the data pipeline, which covers ingestion, processing, and storage of tens of billions of events daily.</p><p>This pipeline was built on top of AWS Lambda, Cloudfront, Kinesis, EC2, DynamoDB, ECS, and ElastiCache.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1ruQuVe7kcd31FOS91nstS/9d422e751e0d2d3a90d603e6adcd5c48/image1.png" />
          </figure><p><sup><i>Figure1: Initial data pipeline architecture</i></sup></p><p>The key elements are:</p><ul><li><p><b>Data receptors</b>: Responsible for receiving telemetry data from multiple sources, including OpenTelemetry, Cloudflare Logpush, CloudWatch, Vercel, etc. They cover validation, authentication, and transforming data from each source into a common internal format. The data receptors were deployed either on AWS Lambda (using function URLs and Cloudfront) or ECS Fargate depending on the data source.</p></li><li><p><b>Kinesis Data Stream</b>: Responsible for transporting the data from the receptors to the next step: data processing.</p></li><li><p><b>Processor</b>: A single AWS Lambda function responsible for enriching and transforming the data for storage. It also performed real-time error tracking and detecting patterns in logs.</p></li><li><p><b>ClickHouse cluster</b>: All the telemetry data was ultimately indexed and stored in a self-hosted ClickHouse cluster on EC2.</p></li></ul><p>In addition to these key elements, the existing stack also included orchestration with Firehose, S3 buckets, SQS, DynamoDB and RDS for error handling, retries, and storing metadata.</p><p>While this architecture served us well in the early days, it started to show major cracks as we scaled our solution to more and larger customers.</p><p>Handling retries at the interface between the data receptors and the Kinesis Data Stream was complex, requiring introducing and orchestrating Firehose, S3 buckets, SQS, and another Lambda function.</p><p>Self-hosting ClickHouse also introduced major challenges at scale, as we continuously had to plan our capacity and update our setup to keep pace with our growing user base whilst attempting to maintain control over costs.</p><p>Costs began scaling unpredictably with our growing workloads, especially in AWS Lambda, Kinesis, and EC2, but also in less obvious ways, such as in Cloudfront (required for a custom domain in front of Lambda function URLs) and DynamoDB. Specifically, the time spent on I/O operations in AWS Lambda was a particularly costly piece. At every step, from the data receptors to the ClickHouse cluster, moving data to the next stage required waiting for a network request to complete, accounting for over 70% of wall time in the Lambda function.</p><p>In a nutshell, we were continuously paged by our alerts, innovating at a slower pace, and our costs were out of control.</p><p>Additionally, the entire solution was deployed in a single AWS region: eu-west-1. As a result, all developers located outside continental Europe were experiencing high latency when emitting logs and traces to Baselime. </p>
    <div>
      <h2>Modern architecture — transitioning to Cloudflare</h2>
      <a href="#modern-architecture-transitioning-to-cloudflare">
        
      </a>
    </div>
    <p>The shift to the <a href="https://www.cloudflare.com/en-gb/developer-platform/products/"><u>Cloudflare Developer Platform</u></a> enabled us to rethink our architecture to be exceptionally fast, globally distributed, and highly scalable, without compromising on cost, complexity, or agility. This new architecture is built on top of Cloudflare primitives.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/16ndcTUS2vAg6TM4djTGUH/0e187d50ae466c275839c6aac91e5249/image5.png" />
          </figure><p><sup><i>Figure 2: Modern data pipeline architecture</i></sup></p>
    <div>
      <h3>Cloudflare Workers: the core of Baselime</h3>
      <a href="#cloudflare-workers-the-core-of-baselime">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/developer-platform/workers/"><u>Cloudflare Workers</u></a> are now at the core of everything we do. All the data receptors and the processor run in Workers. Workers minimize cold-start times and are deployed globally by default. As such, developers always experience lower latency when emitting events to Baselime.</p><p>Additionally, we heavily use <a href="https://blog.cloudflare.com/javascript-native-rpc/"><u>JavaScript-native RPC</u></a> for data transfer between steps of the pipeline. It’s low-latency, lightweight, and simplifies communication between components. This further simplifies our architecture, as separate components behave more as functions within the same process, rather than completely separate applications.</p>
            <pre><code>export default {
  async fetch(request: Request, env: Bindings, ctx: ExecutionContext): Promise&lt;Response&gt; {
      try {
        const { err, apiKey } = auth(request);
        if (err) return err;

        const data = {
          workspaceId: apiKey.workspaceId,
          environmentId: apiKey.environmentId,
          events: request.body
        };
        await env.PROCESSOR.ingest(data);

        return success({ message: "Request Accepted" }, 202);
      } catch (error) {
        return failure({ message: "Internal Error" });
      }
  },
};</code></pre>
            <p><sup><i>Code Block 1: Simplified data receptor using JavaScript-native RPC to execute the processor.</i></sup></p><p>Workers also expose a <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/rate-limit/"><u>Rate Limiting binding</u></a> that enables us to automatically add rate limiting to our services, which we previously had to build ourselves using a combination of DynamoDB and ElastiCache.</p><p>Moreover, we heavily use <code>ctx.waitUntil</code> within our Worker invocations, to offload data transformation outside the request / response path. This further reduces the latency of calls developers make to our data receptors.</p>
    <div>
      <h3>Durable Objects: stateful data processing</h3>
      <a href="#durable-objects-stateful-data-processing">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/en-gb/developer-platform/durable-objects/"><u>Durable Objects</u></a> is a unique service within the Cloudflare Developer Platform, as it enables building stateful applications in a serverless environment. We use Durable Objects in the data pipelines for both real-time error tracking and detecting log patterns.</p><p>For instance, to track errors in real-time, we create a durable object for each new type of error, and this durable object is responsible for keeping track of the frequency of the error, when to notify customers, and the notification channels for the error. <b>This implementation with a single building block removes the need for ElastiCache, Kinesis, and multiple Lambda functions to coordinate protecting the RDS database from being overwhelmed by a high frequency error.</b></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/14SC0ackxCGiRxAr8DY1Vs/c5729735296552013765c298af802b38/image4.png" />
          </figure><p><sup><i>Figure 3: Real-time error detection architecture comparison</i></sup></p><p>Durable Objects gives us precise control over consistency and concurrency of managing state in the data pipeline.</p><p>In addition to the data pipeline, we use Durable Objects for alerting. Our previous architecture required orchestrating EventBridge Scheduler, SQS, DynamoDB and multiple AWS Lambda functions, whereas with Durable Objects, everything is handled within the <code>alarm</code> handler. </p>
    <div>
      <h3>Workers Analytics Engine: high-cardinality analytics at scale</h3>
      <a href="#workers-analytics-engine-high-cardinality-analytics-at-scale">
        
      </a>
    </div>
    <p>Though managing our own ClickHouse cluster was technically interesting and challenging, it took us away from building the best observability developer experience. With this migration, more of our time is spent enhancing our product and none is spent managing server instances.</p><p><a href="https://developers.cloudflare.com/analytics/analytics-engine/"><u>Workers Analytics Engine</u></a> lets us synchronously write events to a scalable high-cardinality analytics database. We built on top of the same technology that powers Workers Analytics Engine. We also made internal changes to Workers Analytics Engine to natively enable high dimensionality in addition to high cardinality.</p><p>Moreover, Workers Analytics Engine and our solution leverages <a href="https://blog.cloudflare.com/explaining-cloudflares-abr-analytics/"><u>Cloudflare’s ABR analytics</u></a>. ABR stands for Adaptive Bit Rate, and enables us to store telemetry data in multiple tables with varying resolutions, from 100% to 0.0001% of the data. Querying the table with 0.0001% of the data will be several orders of magnitudes faster than the table with all the data, with a corresponding trade-off in accuracy. As such, when a query is sent to our systems, Workers Analytics Engine dynamically selects the most appropriate table to run the query, optimizing both query time and accuracy. Users always get the most accurate result with optimal query time, regardless of the size of their dataset or the timeframe of the query. Compared to our previous system, which was always running queries on the full dataset, the new system now delivers faster queries across our entire user base and use cases<i>.</i></p><p>In addition to these core services (Workers, Durable Objects, Workers Analytics Engine), the new architecture leverages other building blocks from the Cloudflare Developer Platform. <a href="https://www.cloudflare.com/en-gb/developer-platform/products/cloudflare-queues/"><u>Queues</u></a> for asynchronous messaging, decoupling services and enabling an event-driven architecture; <a href="https://www.cloudflare.com/en-gb/developer-platform/d1/"><u>D1</u></a> as our main database for transactional data (queries, alerts, dashboards, configurations, etc.); <a href="https://www.cloudflare.com/en-gb/developer-platform/workers-kv/"><u>Workers KV</u></a> for fast distributed storage; <a href="https://hono.dev/"><u>Hono</u></a> for all our APIs, etc.</p>
    <div>
      <h2>How did we migrate?</h2>
      <a href="#how-did-we-migrate">
        
      </a>
    </div>
    <p>Baselime is built on an event-driven architecture, where every user action triggers an event. It operates on the principle that every user action is recorded as an event and emitted to the rest of the system — whether it’s creating a user, editing a dashboard, or performing any other action. Migrating to Cloudflare involved transitioning our event-driven architecture without compromising uptime and data consistency. Previously, this was powered by AWS EventBridge and SQS, and we moved entirely to Cloudflare Queues.</p><p>We followed the <a href="https://martinfowler.com/bliki/StranglerFigApplication.html"><u>strangler fig pattern</u></a> to incrementally migrate the solution from AWS to Cloudflare. It consists of gradually replacing specific parts of the system with newer services, with minimal disruption to the system. Early in the process, we created a central Cloudflare Queue which acted as the backbone for all transactional event processing during the migration. Every event, whether a new user signup or a dashboard edit, was funneled into this Queue. From there, events were dynamically routed, each event to the relevant part of the application. User actions were synced into D1 and KV, ensuring that all user actions were mirrored across both AWS and Cloudflare during the transition.</p><p>This syncing mechanism enabled us to maintain consistency and ensure that no data was lost as users continued to interact with Baselime.</p><p>Here's an example of how events are processed:</p>
            <pre><code>export default {
  async queue(batch, env) {
    for (const message of batch.messages) {
      try {
        const event = message.body;
        switch (event.type) {
          case "WORKSPACE_CREATED":
            await workspaceHandler.create(env, event.data);
            break;
          case "QUERY_CREATED":
            await queryHandler.create(env, event.data);
            break;
          case "QUERY_DELETED":
            await queryHandler.remove(env, event.data);
            break;
          case "DASHBOARD_CREATED":
            await dashboardHandler.create(env, event.data);
            break;
          //
          // Many more events...
          //
          default:
            logger.info("Matched no events", { type: event.type });
        }
        message.ack();
      } catch (e) {
        if (message.attempts &lt; 3) {
          message.retry({ delaySeconds: Math.ceil(30 ** message.attempts / 10), });
        } else {
          logger.error("Failed handling event - No more retrys", { event: message.body, attempts: message.attempts }, e);
        }
      }
    }
  },
} satisfies ExportedHandler&lt;Env, InternalEvent&gt;;</code></pre>
            <p><sup><i>Code Block 2: Simplified internal events processing during migration.</i></sup></p><p>We migrated the data pipeline from AWS to Cloudflare with an outside-in method: we started with the data receptors and incrementally moved the data processor and the ClickHouse cluster to the new architecture. We began writing telemetry data (logs, metrics, traces, wide-events, etc.) to both ClickHouse (in AWS) and to Workers Analytics Engine simultaneously for the duration of the retention period (30 days).</p><p>The final step was rewriting all of our endpoints, previously hosted on AWS Lambda and ECS containers, into Cloudflare Workers. Once those Workers were ready, we simply switched the DNS records to point to the Workers instead of the existing Lambda functions.</p><p>Despite the complexity, the entire migration process, from the data pipeline to all re-writing API endpoints, took our then team of 3 engineers less than three months.</p>
    <div>
      <h2>We ended up saving over 80% on our cloud bill</h2>
      <a href="#we-ended-up-saving-over-80-on-our-cloud-bill">
        
      </a>
    </div>
    
    <div>
      <h3>Savings on the data receptors</h3>
      <a href="#savings-on-the-data-receptors">
        
      </a>
    </div>
    <p>After switching the data receptors from AWS to Cloudflare in early June 2024, our AWS Lambda cost was reduced by over 85%. These costs were primarily driven by I/O time the receptors spent sending data to a Kinesis Data Stream in the same region.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6P34XkcxEhqR6cjWAGnZaL/54025f08f4642649b7ae53fbaa3775b4/image3.png" />
          </figure><p><sup><i>Figure 4: Baselime daily AWS Lambda cost [note: the gap in data is the result of AWS Cost Explorer losing data when the parent organization of the cloud accounts was changed.]</i></sup></p><p>Moreover, we used Cloudfront to enable custom domains pointing to the data receptors. When we migrated the data receptors to Cloudflare, there was no need for Cloudfront anymore. As such, our Cloudfront cost was reduced to $0.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1vj3JUrE580W749VX8JMBj/c82477d00fe33fba1ffd8658ef0a1229/image2.png" />
          </figure><p><sup><i>Figure 5: Baselime daily Cloudfront cost [note: the gap in data is the result of AWS Cost Explorer losing data when the parent organization of the cloud accounts was changed.]</i></sup></p><p>If we were a regular Cloudflare customer, we estimate that our Cloudflare Workers bill would be around \$25/day after the switch, against \$790/day on AWS: over 95% cost reduction. These savings are primarily driven by the Workers pricing model, since Workers charge for CPU time, and the receptors are primarily just moving data, and as such, are mostly I/O bound.</p>
    <div>
      <h3>Savings on the ClickHouse cluster</h3>
      <a href="#savings-on-the-clickhouse-cluster">
        
      </a>
    </div>
    <p>To evaluate the cost impact of switching from self-hosting ClickHouse to using Workers Analytics Engine, we need to take into account not only the EC2 instances, but also the disk space, networking, and the Kinesis Data Stream cost.</p><p>We completed this switch in late August, achieving over 95% cost reduction in both the Kinesis Data Stream and all EC2 related costs.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3hpbfmwC5vEjDXeSIC2XIr/d103e813848926f224efe6e22e0717ce/image9.png" />
          </figure><p><sup><i>Figure 6: Baselime daily Kinesis Data Stream cost [note: the gap in data is the result of AWS Cost Explorer losing data when the parent organization of the cloud accounts was changed.]</i></sup></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7aBU0TVUoRTC5GAo3HkS0N/eebb0559ad8d32b56e20f1e5c6121c6f/image6.png" />
          </figure><p><sup><i>Figure 7: Baselime daily EC2 cost [note: the gap in data is the result of AWS Cost Explorer losing data when the parent organization of the cloud accounts was changed.]</i></sup></p><p>If we were a regular Cloudflare customer, we estimate that our Workers Analytics Engine cost would be around \$300/day after the switch, compared to \$1150/day on AWS, a cost reduction of over 70%.</p><p>Not only did we significantly reduce costs by migrating to Cloudflare, but we also improved performance across the board. Responses to users are now faster, with real-time event ingestion happening across Cloudflare’s network, closer to our users. Responses to users querying their data are also much faster, thanks to Cloudflare’s deep expertise in operating ClickHouse at scale.</p><p>Most importantly, we’re no longer bound by limitations in throughput or scale. We launched <a href="https://developers.cloudflare.com/workers/observability/logs/workers-logs"><u>Workers Logs</u></a> on September 26, 2024, and our system now handles a much higher volume of events than before, with no sacrifices in speed or reliability.</p><p>These cost savings are outstanding as is, and do not include the total cost of ownership of those systems. We significantly simplified our systems and our codebase, as the platform is taking care of more for us. We’re paged less, we spend less time monitoring infrastructure, and we can focus on delivering product improvements.</p>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Migrating Baselime to Cloudflare has transformed how we build and scale our platform. With Workers, Durable Objects, Workers Analytics Engine, and other services, we now run a fully serverless, globally distributed system that’s more cost-efficient and agile. This shift has significantly reduced our operational overhead and enabled us to iterate faster, delivering better observability tooling to our users.</p><p>You can start observing your Cloudflare Workers today with <a href="https://developers.cloudflare.com/workers/observability/logs/workers-logs/"><u>Workers Logs</u></a>. Looking ahead, we’re excited about the features we will deliver directly in the Cloudflare Dashboard, including real-time error tracking, alerting, and a query builder for high-cardinality and dimensionality events. All coming by early 2025.</p> ]]></content:encoded>
            <category><![CDATA[Observability]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Performance]]></category>
            <guid isPermaLink="false">6heSTMT0I5jl0vpeR9TucD</guid>
            <dc:creator>Boris Tane</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare acquires Baselime to expand serverless application observability capabilities]]></title>
            <link>https://blog.cloudflare.com/cloudflare-acquires-baselime-expands-observability-capabilities/</link>
            <pubDate>Fri, 05 Apr 2024 15:50:00 GMT</pubDate>
            <description><![CDATA[ Today, we’re thrilled to announce that Cloudflare has acquired Baselime, a serverless observability company ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Today, we’re thrilled to announce that <a href="https://www.cloudflare.com/press-releases/2024/cloudflare-enters-observability-market-with-acquisition-baselime/">Cloudflare has acquired Baselime</a>.</p><p>The cloud is changing. Just a few years ago, serverless functions were revolutionary. Today, entire applications are built on serverless architectures, from compute to databases, storage, queues, etc. — with Cloudflare leading the way in making it easier than ever for developers to build, without having to think about their architecture. And while the adoption of serverless has made it simple for developers to run fast, it has also made one of the most difficult problems in software even harder: how the heck do you unravel the behavior of distributed systems?</p><p>When I started Baselime 2 years ago, our goal was simple: enable every developer to build, ship, and learn from their serverless applications such that they can resolve issues before they become problems.</p><p>Since then, we built an observability platform that enables developers to understand the behaviour of their cloud applications. It’s designed for high cardinality and dimensionality data, from logs to distributed tracing with <a href="https://opentelemetry.io/">OpenTelemetry</a>. With this data, we automatically surface insights from your applications, and enable you to quickly detect, troubleshoot, and resolve issues in production.</p><p>In parallel, Cloudflare has been busy the past few years building the next frontier of cloud computing: the connectivity cloud. The team is building primitives that enable developers to build applications with a completely new set of paradigms, from Workers to D1, <a href="https://www.cloudflare.com/developer-platform/r2/">R2</a>, Queues, KV, Durable Objects, AI, and all the other services available on the Cloudflare Developers Platform.</p><p>This synergy makes Cloudflare the perfect home for Baselime. Our core mission has always been to simplify and innovate around observability for the future of the cloud, and Cloudflare's ecosystem offers the ideal ground to further this cause. With Cloudflare, we're positioned to deeply integrate into a platform that tens of thousands of developers trust and use daily, enabling them to quickly build, ship, and troubleshoot applications. We believe that every Worker, Queue, KV, Durable Object, AI call, etc. should have built-in observability by default.</p><p>That’s why we’re incredibly excited about the potential of what we can build together and the impact it will have on developers around the world.</p><p>To give you a preview into what’s ahead, I wanted to dive deeper into the 3 core concepts we followed while building Baselime.</p>
    <div>
      <h3>High Cardinality and Dimensionality</h3>
      <a href="#high-cardinality-and-dimensionality">
        
      </a>
    </div>
    <p>Cardinality and dimensionality are best described using examples. Imagine you’re playing a board game with a deck of cards. High cardinality is like playing a game where every card is a unique character, making it hard to remember or match them. And high dimensionality is like each card has tons of details like strength, speed, magic, aura, etc., making the game's strategy complex because there's so much to consider.</p><p>This also applies to the data your application emits. For example, when you log an HTTP request that makes database calls.</p><ul><li><p>High cardinality means that your logs can have a unique <code>userId</code> or <code>requestId</code> (which can take millions of distinct values). Those are high cardinality fields.</p></li><li><p>High dimensionality means that your logs can have thousands of possible fields. You can record each HTTP header of your request and the details of each database call. Any log can be a key-value object with thousands of individual keys.</p></li></ul><p>The ability to query on high cardinality and dimensionality fields is key to modern observability. You can surface all errors or requests for a specific user, compute the duration of each of those requests, and group by location. You can answer all of those questions with a single tool.</p>
    <div>
      <h3>OpenTelemetry</h3>
      <a href="#opentelemetry">
        
      </a>
    </div>
    <p><a href="https://opentelemetry.io/">OpenTelemetry</a> provides a common set of tools, APIs, SDKs, and standards for instrumenting applications. It is a game-changer for debugging and understanding cloud applications. You get to see the big picture: how fast your HTTP APIs are, which routes are experiencing the most errors, or which database queries are slowest. You can also get into the details by following the path of a single request or user across your entire application.</p><p>Baselime is OpenTelemetry native, and it is built from the ground up to leverage OpenTelemetry data. To support this, we built a set of OpenTelemetry SDKs compatible with several serverless runtimes.</p><p>Cloudflare is building the cloud of tomorrow and has developed <a href="https://github.com/cloudflare/workerd">workerd</a>, a modern JavaScript runtime for Workers. With Cloudflare, we are considering embedding OpenTelemetry directly in the Workers’ runtime. That’s one more reason we’re excited to grow further at Cloudflare, enabling more developers to understand their applications, even in the most unconventional scenarios.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5srzwFELwKonKcv5tbyl7D/56db2543ab62c98e198b39164565308c/pasted-image-0--8-.png" />
            
            </figure>
    <div>
      <h3>Developer Experience</h3>
      <a href="#developer-experience">
        
      </a>
    </div>
    <p>Observability without action is just storage. I have seen too many developers pay for tools to store logs and metrics they never use, and the key reason is how opaque these tools are.</p><p>The crux of the issue in modern observability isn’t the technology itself, but rather the developer experience. Many tools are complex, with a significant learning curve. This friction reduces the speed at which developers can identify and resolve issues, ultimately affecting the reliability of their applications. Improving developer experience is key to unlocking the full potential of observability.</p><p>We built Baselime to be an exploratory solution that surfaces insights to you rather than requiring you to dig for them. For example, we notify you in real time when errors are discovered in your application, based on your logs and traces. You can quickly search through all of your data with full-text search, or using our powerful query engine, which makes it easy to correlate logs and traces for increased visibility, or ask our AI debugging assistant for insights on the issue you’re investigating.</p><p>It is always possible to go from one insight to another, asking questions about the state of your app iteratively until you get to the root cause of the issue you are troubleshooting.</p><p>Cloudflare has always prioritised the developer experience of its developer platform, especially with <a href="https://developers.cloudflare.com/workers/wrangler/">Wrangler</a>, and we are convinced it’s the right place to solve the developer experience problem of observability.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/57hiGcPaNP3WBnqhDbU1TC/48fd408810c38c23969a932b72636da0/pasted-image-0-2.png" />
            
            </figure>
    <div>
      <h3>What’s next?</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Over the next few months, we’ll work to bring the core of Baselime into the Cloudflare ecosystem, starting with OpenTelemetry, real-time error tracking, and all the developer experience capabilities that make a great observability solution. We will keep building and improving observability for applications deployed outside Cloudflare because we understand that observability should work across providers.</p><p>But we don’t want to stop there. We want to push the boundaries of what modern observability looks like. For instance, directly connecting to your codebase and correlating insights from your logs and traces to functions and classes in your codebase. We also want to enable more AI capabilities beyond our debugging assistant. We want to deeply integrate with your repositories such that you can go from an error in your logs and traces to a Pull Request in your codebase within minutes.</p><p>We also want to enable everyone building on top of Large Language Models to do all your LLM observability directly within Cloudflare, such that you can optimise your prompts, improve latencies and reduce error rates directly within your cloud provider. These are just a handful of capabilities we can now build with the support of the Cloudflare platform.</p>
    <div>
      <h3>Thanks</h3>
      <a href="#thanks">
        
      </a>
    </div>
    <p>We are incredibly thankful to our community for its continued support, from day 0 to today. With your continuous feedback, you’ve helped us build something we’re incredibly proud of.</p><p>To all the developers currently using Baselime, you’ll be able to keep using the product and will receive ongoing support. Also, we are now making all the paid Baselime features completely free.</p><p>Baselime products remain available to sign up for while we work on integrating with the Cloudflare platform. We anticipate sunsetting the Baselime products towards the end of 2024 when you will be able to observe all of your applications within the Cloudflare dashboard. If you're interested in staying up-to-date on our work with Cloudflare, we will release a signup link in the coming weeks!</p><p>We are looking forward to continuing to innovate with you.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Observability]]></category>
            <category><![CDATA[Acquisitions]]></category>
            <category><![CDATA[Connectivity Cloud]]></category>
            <guid isPermaLink="false">7MaPl0JR4okx51bFTrn69J</guid>
            <dc:creator>Boris Tane</dc:creator>
            <dc:creator>Rita Kozlov</dc:creator>
        </item>
    </channel>
</rss>