
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sat, 04 Apr 2026 09:50:00 GMT</lastBuildDate>
        <item>
            <title><![CDATA[A year of improving Node.js compatibility in Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/nodejs-workers-2025/</link>
            <pubDate>Thu, 25 Sep 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ Over the year we have greatly expanded Node.js compatibility. There are hundreds of new Node.js APIs now available that make it easier to run existing Node.js code on our platform. ]]></description>
<content:encoded><![CDATA[ <p>We've been busy.</p><p>Compatibility with the broad JavaScript developer ecosystem has always been a key strategic investment for us. We believe in open standards and an open web. We want you to see <a href="https://workers.cloudflare.com/"><u>Workers</u></a> as a powerful extension of your development platform with the ability to just drop code in that Just Works. To deliver on this goal, the Cloudflare Workers team has spent the past year significantly expanding compatibility with the Node.js ecosystem, enabling hundreds (if not thousands) of popular <a href="https://npmjs.com"><u>npm</u></a> modules to now work seamlessly, including the ever-popular <a href="https://expressjs.com"><u>express</u></a> framework.</p><p>We have implemented a <a href="https://developers.cloudflare.com/workers/runtime-apis/nodejs/"><u>substantial subset of the Node.js standard library</u></a>, focusing on the most commonly used and most frequently requested APIs. These include:</p>
<div><table><colgroup>
<col></col>
<col></col>
</colgroup>
<thead>
  <tr>
    <th><span>Module</span></th>
    <th><span>API documentation</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>node:console</span></td>
    <td><a href="https://nodejs.org/docs/latest/api/console.html"><span>https://nodejs.org/docs/latest/api/console.html</span></a><span> </span></td>
  </tr>
  <tr>
    <td><span>node:crypto</span></td>
    <td><a href="https://nodejs.org/docs/latest/api/crypto.html"><span>https://nodejs.org/docs/latest/api/crypto.html</span></a><span> </span></td>
  </tr>
  <tr>
    <td><span>node:dns</span></td>
    <td><a href="https://nodejs.org/docs/latest/api/dns.html"><span>https://nodejs.org/docs/latest/api/dns.html</span></a><span> </span></td>
  </tr>
  <tr>
    <td><span>node:fs</span></td>
    <td><a href="https://nodejs.org/docs/latest/api/fs.html"><span>https://nodejs.org/docs/latest/api/fs.html</span></a><span> </span></td>
  </tr>
  <tr>
    <td><span>node:http</span></td>
    <td><a href="https://nodejs.org/docs/latest/api/http.html"><span>https://nodejs.org/docs/latest/api/http.html</span></a><span> </span></td>
  </tr>
  <tr>
    <td><span>node:https</span></td>
    <td><a href="https://nodejs.org/docs/latest/api/https.html"><span>https://nodejs.org/docs/latest/api/https.html</span></a><span> </span></td>
  </tr>
  <tr>
    <td><span>node:net</span></td>
    <td><a href="https://nodejs.org/docs/latest/api/net.html"><span>https://nodejs.org/docs/latest/api/net.html</span></a><span> </span></td>
  </tr>
  <tr>
    <td><span>node:process</span></td>
    <td><a href="https://nodejs.org/docs/latest/api/process.html"><span>https://nodejs.org/docs/latest/api/process.html</span></a><span> </span></td>
  </tr>
  <tr>
    <td><span>node:timers</span></td>
    <td><a href="https://nodejs.org/docs/latest/api/timers.html"><span>https://nodejs.org/docs/latest/api/timers.html</span></a><span> </span></td>
  </tr>
  <tr>
    <td><span>node:tls</span></td>
    <td><a href="https://nodejs.org/docs/latest/api/tls.html"><span>https://nodejs.org/docs/latest/api/tls.html</span></a><span> </span></td>
  </tr>
  <tr>
    <td><span>node:zlib</span></td>
    <td><a href="https://nodejs.org/docs/latest/api/zlib.html"><span>https://nodejs.org/docs/latest/api/zlib.html</span></a><span> </span></td>
  </tr>
</tbody></table></div><p>Each of these has been carefully implemented to approximate Node.js' behavior as closely as feasible. Where matching <a href="http://nodejs.org"><u>Node.js</u></a>' behavior is not possible, our implementations will throw a clear error when called, rather than silently failing or not being present at all. This ensures that packages that check for the presence of these APIs will not break, even if the functionality is not available.</p><p>In some cases, we had to implement entirely new capabilities within the runtime in order to provide the necessary functionality. For <code>node:fs</code>, we added a new virtual file system within the Workers environment. In other cases, such as with <code>node:net</code>, <code>node:tls</code>, and <code>node:http</code>, we wrapped the new Node.js APIs around existing Workers capabilities such as the <a href="https://developers.cloudflare.com/workers/runtime-apis/tcp-sockets/"><u>Sockets API</u></a> and <a href="https://developers.cloudflare.com/workers/runtime-apis/fetch/"><code><u>fetch</u></code></a>.</p><p>Most importantly, <b>all of these implementations are done natively in the Workers runtime</b>, using a combination of TypeScript and C++. Whereas our earlier Node.js compatibility efforts relied heavily on polyfills and shims injected at deployment time by developer tooling such as <a href="https://developers.cloudflare.com/workers/wrangler/"><u>Wrangler</u></a>, we are moving towards a model where future Workers will have these APIs available natively, without the need for any additional dependencies. This not only improves performance and reduces memory usage, but also ensures that the behavior is as close to Node.js as possible.</p>
    <div>
      <h2>The networking stack</h2>
      <a href="#the-networking-stack">
        
      </a>
    </div>
    <p>Node.js has a rich set of networking APIs that allow applications to create servers, make HTTP requests, work with raw TCP and UDP sockets, send DNS queries, and more. Workers do not have direct access to raw kernel-level sockets though, so how can we support these Node.js APIs so packages still work as intended? We decided to build on top of the existing <a href="https://developers.cloudflare.com/workers/runtime-apis/tcp-sockets/"><u>managed Sockets</u></a> and fetch APIs. These implementations allow many popular Node.js packages that rely on networking APIs to work seamlessly in the Workers environment.</p><p>Let's start with the HTTP APIs.</p>
    <div>
      <h3>HTTP client and server support</h3>
      <a href="#http-client-and-server-support">
        
      </a>
    </div>
    <p>From the moment we announced that we would be pursuing Node.js compatibility within Workers, users have been asking specifically for an implementation of the <code>node:http</code> module. There are countless modules in the ecosystem that depend directly on APIs like <code>http.get(...)</code> and <code>http.createServer(...)</code>.</p><p>The <code>node:http</code> and <code>node:https</code> modules provide APIs for creating HTTP clients and servers. <a href="https://blog.cloudflare.com/bringing-node-js-http-servers-to-cloudflare-workers/"><u>We have implemented both</u></a>, allowing you to create HTTP clients using <code>http.request()</code> and servers using <code>http.createServer()</code>. <a href="https://developers.cloudflare.com/workers/runtime-apis/nodejs/http/"><u>The HTTP client implementation</u></a> is built on top of the Fetch API, while the HTTP server implementation is built on top of the Workers runtime’s existing request handling capabilities.</p><p>The client side is fairly straightforward:</p>
            <pre><code>import http from 'node:http';

export default {
  async fetch(request) {
    return new Promise((resolve, reject) =&gt; {
      const req = http.request('http://example.com', (res) =&gt; {
        let data = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) =&gt; {
          data += chunk;
        });
        res.on('end', () =&gt; {
          resolve(new Response(data));
        });
      });
      req.on('error', (err) =&gt; {
        reject(err);
      });
      req.end();
    });
  }
}
</code></pre>
            <p>The server side is just as simple but likely even more exciting. We've often been asked about the possibility of supporting <a href="https://expressjs.com/"><u>Express</u></a>, or <a href="https://koajs.com/"><u>Koa</u></a>, or <a href="https://fastify.dev/"><u>Fastify</u></a> within Workers, but it was difficult to do because these were so dependent on the Node.js APIs. With the new additions it is now possible to use both Express and Koa within Workers, and we're hoping to be able to add Fastify support later. </p>
            <pre><code>import { createServer } from "node:http";
import { httpServerHandler } from "cloudflare:node";

const server = createServer((req, res) =&gt; {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("Hello from Node.js HTTP server!");
});

export default httpServerHandler(server);
</code></pre>
<p>The <code>httpServerHandler()</code> function from the <code>cloudflare:node</code> module integrates the HTTP <code>server</code> with the Workers fetch event, allowing it to handle incoming requests.</p>
    <div>
      <h3>The <code>node:dns</code> module</h3>
      <a href="#the-node-dns-module">
        
      </a>
    </div>
<p>The <code>node:dns</code> module provides an API for performing DNS queries. </p><p>At Cloudflare, we happen to have a <a href="https://developers.cloudflare.com/1.1.1.1/encryption/dns-over-https/"><u>DNS-over-HTTPS (DoH)</u></a> service and our own <a href="https://one.one.one.one/"><u>DNS service called 1.1.1.1</u></a>. We took advantage of this when exposing <code>node:dns</code> in Workers. When you use this module to perform a query, it simply makes a subrequest to 1.1.1.1 to resolve the query. This way the user doesn’t have to think about DNS servers, and queries just work.</p>
    <div>
      <h3>The <code>node:net</code> and <code>node:tls</code> modules</h3>
      <a href="#the-node-net-and-node-tls-modules">
        
      </a>
    </div>
<p>The <code>node:net</code> module provides an API for creating TCP sockets, while the <code>node:tls</code> module provides an API for creating secure TLS sockets. As we mentioned before, both are built on top of the existing <a href="https://developers.cloudflare.com/workers/runtime-apis/tcp-sockets/"><u>Workers Sockets API</u></a>. Note that not all features of the <code>node:net</code> and <code>node:tls</code> modules are available in Workers. For instance, it is not yet possible to create a TCP server using <code>net.createServer()</code> (but maybe soon!), but we have implemented enough of the APIs to allow many popular packages that rely on these modules to work in Workers.</p>
<pre><code>import net from 'node:net';

export default {
  async fetch(request) {
    const { promise, resolve, reject } = Promise.withResolvers();
    const socket = net.connect({ host: 'example.com', port: 80 }, () =&gt; {
      let buf = '';
      socket.setEncoding('utf8');
      socket.on('data', (chunk) =&gt; buf += chunk);
      socket.on('end', () =&gt; resolve(new Response(buf)));
      socket.on('error', reject);
      socket.end();
    });
    return promise;
  }
}
</code></pre>
            
    <div>
      <h2>A new virtual file system and the <code>node:fs</code> module</h2>
      <a href="#a-new-virtual-file-system-and-the-node-fs-module">
        
      </a>
    </div>
<p>What does supporting filesystem APIs mean in a serverless environment? When you deploy a Worker, it runs in Region:Earth, and we don’t want you to have to think about individual servers with individual file systems. There are, however, countless existing applications and modules in the ecosystem that leverage the file system to store configuration data, read and write temporary data, and more.</p><p>Workers do not have access to a traditional file system like a Node.js process does, and for good reason! A Worker does not run on a single machine; a single request to one Worker can run on any one of thousands of servers anywhere in Cloudflare's global <a href="https://www.cloudflare.com/network"><u>network</u></a>. Coordinating and synchronizing access to shared physical resources such as a traditional file system harbors major technical challenges, including the risk of deadlocks, that are inherent in any massively distributed system. Fortunately, Workers offers powerful tools like <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a> that provide a solution for coordinating access to shared, durable state at scale. To address the need for a file system in Workers, we built on what already makes Workers great.</p><p>We implemented a virtual file system that allows you to use the <code>node:fs</code> APIs to read and write temporary, in-memory files. This virtual file system is specific to each Worker. When using a stateless Worker, files created in one request are not accessible in any other request. When using a Durable Object, however, this temporary file space can be shared across multiple requests from multiple users. This file system is ephemeral (for now), meaning that files are not persisted across Worker restarts or deployments, so it does not replace the <a href="https://developers.cloudflare.com/durable-objects/api/storage-api/"><u>Durable Object Storage</u></a> mechanism, but it provides a powerful new tool that greatly expands the capabilities of your Durable Objects.</p><p>The <code>node:fs</code> module provides a rich set of APIs for working with files and directories:</p>
            <pre><code>import fs from 'node:fs';

export default {
  async fetch(request) {
    // Write a temporary file
    await fs.promises.writeFile('/tmp/hello.txt', 'Hello, world!');

    // Read the file
    const data = await fs.promises.readFile('/tmp/hello.txt', 'utf-8');

    return new Response(`File contents: ${data}`);
  }
}
</code></pre>
            <p>The virtual file system supports a wide range of file operations, including reading and writing files, creating and removing directories, and working with file descriptors. It also supports standard input/output/error streams via <code>process.stdin</code>, <code>process.stdout</code>, and <code>process.stderr</code>, symbolic links, streams, and more.</p><p>While the current implementation of the virtual file system is in-memory only, we are exploring options for adding persistent storage in the future that would link to existing Cloudflare storage solutions like <a href="https://www.cloudflare.com/developer-platform/products/r2/">R2</a> or Durable Objects. But you don't have to wait on us! When combined with powerful tools like Durable Objects and <a href="https://developers.cloudflare.com/workers/runtime-apis/rpc/"><u>JavaScript RPC</u></a>, it's certainly possible to create your own general purpose, durable file system abstraction backed by sqlite storage.</p>
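<p>Directory operations follow the same Node.js semantics. A minimal sketch (using <code>node:path</code> and <code>node:os</code> for portable temporary paths):</p>

```javascript
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';

// Create a scratch directory, write two files, then list them.
// In Workers, these calls operate on the in-memory virtual file system.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'demo-'));
fs.writeFileSync(path.join(dir, 'a.txt'), 'alpha');
fs.writeFileSync(path.join(dir, 'b.txt'), 'beta');

const entries = fs.readdirSync(dir).sort();
console.log(entries); // [ 'a.txt', 'b.txt' ]
```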
    <div>
      <h2>Cryptography with <code>node:crypto</code></h2>
      <a href="#cryptography-with-node-crypto">
        
      </a>
    </div>
<p>The <code>node:crypto</code> module provides a comprehensive set of cryptographic functionality, including hashing, encryption, decryption, and more. We have implemented a full version of the <code>node:crypto</code> module, allowing you to use familiar cryptographic APIs in your Workers applications. There will be some differences in behavior compared to Node.js because Workers uses <a href="https://github.com/google/boringssl/blob/main/README.md"><u>BoringSSL</u></a> under the hood, while Node.js uses <a href="https://github.com/openssl"><u>OpenSSL</u></a>. However, we have strived to make the APIs as compatible as possible, and many popular packages that rely on <code>node:crypto</code> now work seamlessly in Workers.</p><p>To accomplish this, we didn't just copy the implementation of these cryptographic operations from Node.js. Rather, we worked within the Node.js project to extract the core crypto functionality out into a separate dependency project called <a href="https://github.com/nodejs/ncrypto"><code><u>ncrypto</u></code></a> that is used not only by Workers but also by Bun to implement Node.js-compatible functionality by simply running the exact same code that Node.js is running.</p>
            <pre><code>import crypto from 'node:crypto';

export default {
  async fetch(request) {
    const hash = crypto.createHash('sha256');
    hash.update('Hello, world!');
    const digest = hash.digest('hex');

    return new Response(`SHA-256 hash: ${digest}`);
  }
}
</code></pre>
            <p>All major capabilities of the <code>node:crypto</code> module are supported, including:</p><ul><li><p>Hashing (e.g., SHA-256, SHA-512)</p></li><li><p>HMAC</p></li><li><p>Symmetric encryption/decryption</p></li><li><p>Asymmetric encryption/decryption</p></li><li><p>Digital signatures</p></li><li><p>Key generation and management</p></li><li><p>Random byte generation</p></li><li><p>Key derivation functions (e.g., PBKDF2, scrypt)</p></li><li><p>Cipher and Decipher streams</p></li><li><p>Sign and Verify streams</p></li><li><p>KeyObject class for managing keys</p></li><li><p>Certificate handling (e.g., X.509 certificates)</p></li><li><p>Support for various encoding formats (e.g., PEM, DER, base64)</p></li><li><p>and more…</p></li></ul>
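<p>For instance, the HMAC and key derivation APIs from the list above behave just as they do in Node.js (a small sketch with made-up key material):</p>

```javascript
import crypto from 'node:crypto';

// HMAC-SHA256 of a message under a shared secret
const mac = crypto.createHmac('sha256', 'secret-key')
  .update('Hello, world!')
  .digest('hex');

// Derive a 32-byte key from a password using PBKDF2
const derived = crypto.pbkdf2Sync('password', 'salt', 100_000, 32, 'sha256');

console.log(mac.length, derived.length); // 64 32
```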
    <div>
      <h2>Process &amp; Environment</h2>
      <a href="#process-environment">
        
      </a>
    </div>
<p>In Node.js, the <code>node:process</code> module provides a global object that gives information about, and control over, the current Node.js process. It includes properties and methods for accessing environment variables, command-line arguments, the current working directory, and more. It is one of the most fundamental modules in Node.js, and many packages rely on it for basic functionality and simply assume its presence. There are, however, some aspects of the <code>node:process</code> module that do not make sense in the Workers environment, such as process IDs and user/group IDs, which are tied to the operating system and process model of a traditional server and have no equivalent in Workers.</p><p>When <code>nodejs_compat</code> is enabled, the <code>process</code> global will be available in your Worker scripts, or you can import it directly via <code>import process from 'node:process'</code>. Note that the <code>process</code> global is only available when the <code>nodejs_compat</code> flag is enabled. If you try to access <code>process</code> without the flag, it will be <code>undefined</code> and the import will throw an error.</p><p>Let's take a look at the <code>process</code> APIs that do make sense in Workers, and that have been fully implemented, starting with <code>process.env</code>.</p>
    <div>
      <h3>Environment variables</h3>
      <a href="#environment-variables">
        
      </a>
    </div>
<p>Workers have had <a href="https://developers.cloudflare.com/workers/configuration/environment-variables/"><u>support for environment variables</u></a> for a while now, but previously they were only accessible via the <code>env</code> argument passed to the Worker function. Accessing the environment at the top-level of a Worker was not possible:</p>
            <pre><code>export default {
  async fetch(request, env) {
    const config = env.MY_ENVIRONMENT_VARIABLE;
    // ...
  }
}
</code></pre>
            <p> With the <a href="https://developers.cloudflare.com/workers/configuration/environment-variables/"><code><u>new process.env</u></code><u> implementation</u></a>, you can now access environment variables in a more familiar way, just like in Node.js, and at any scope, including the top-level of your Worker:</p>
            <pre><code>import process from 'node:process';
const config = process.env.MY_ENVIRONMENT_VARIABLE;

export default {
  async fetch(request, env) {
    // You can still access env here if you need to
    const configFromEnv = env.MY_ENVIRONMENT_VARIABLE;
    // ...
  }
}
</code></pre>
            <p><a href="https://developers.cloudflare.com/workers/configuration/environment-variables/"><u>Environment variables</u></a> are set in the same way as before, via the <code>wrangler.toml</code> or <code>wrangler.jsonc</code> configuration file, or via the Cloudflare dashboard or API. They may be set as simple key-value pairs or as JSON objects:</p>
            <pre><code>{
  "name": "my-worker-dev",
  "main": "src/index.js",
  "compatibility_date": "2025-09-15",
  "compatibility_flags": [
    "nodejs_compat"
  ],
  "vars": {
    "API_HOST": "example.com",
    "API_ACCOUNT_ID": "example_user",
    "SERVICE_X_DATA": {
      "URL": "service-x-api.dev.example",
      "MY_ID": 123
    }
  }
}
</code></pre>
            <p>When accessed via <code>process.env</code>, all environment variable values are strings, just like in Node.js.</p><p>Because <code>process.env</code> is accessible at the global scope, it is important to note that environment variables are accessible from anywhere in your Worker script, including third-party libraries that you may be using. This is consistent with Node.js behavior, but it is something to be aware of from a security and configuration management perspective. The <a href="https://developers.cloudflare.com/secrets-store/"><u>Cloudflare Secrets Store</u></a> can provide enhanced handling around secrets within Workers as an alternative to using environment variables.</p>
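<p>In practice this means an object-valued var, like <code>SERVICE_X_DATA</code> in the configuration above, arrives as a string (presumably its JSON serialization) and must be parsed before use. A sketch, where the assignment below only simulates what the runtime populates:</p>

```javascript
import process from 'node:process';

// Simulate the runtime populating an object-valued var as a string
process.env.SERVICE_X_DATA = '{"URL":"service-x-api.dev.example","MY_ID":123}';

const serviceX = JSON.parse(process.env.SERVICE_X_DATA);
console.log(serviceX.URL, serviceX.MY_ID); // service-x-api.dev.example 123
```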
    <div>
      <h4>Importable environment and waitUntil</h4>
      <a href="#importable-environment-and-waituntil">
        
      </a>
    </div>
<p>We decided to go a step further and make it possible to import both the environment and the <a href="https://developers.cloudflare.com/workers/configuration/environment-variables/"><u>waitUntil mechanism</u></a> as a module, even when not using the <code>nodejs_compat</code> flag, rather than forcing users to always access them via the <code>env</code> and <code>ctx</code> arguments passed to the Worker function. This can make it easier to access the environment in a more modular way, and can help avoid passing the <code>env</code> argument through multiple layers of function calls. This is not a Node.js-compatibility feature, but we believe it is a useful addition to the Workers environment:</p>
            <pre><code>import { env, waitUntil } from 'cloudflare:workers';

const config = env.MY_ENVIRONMENT_VARIABLE;

export default {
  async fetch(request) {
    // You can still access env here if you need to
    const configFromEnv = env.MY_ENVIRONMENT_VARIABLE;
    // ...
  }
}

function doSomething() {
  // Bindings and waitUntil can now be accessed without
  // passing the env and ctx through every function call.
  waitUntil(env.RPC.doSomethingRemote());
}
</code></pre>
<p>One important note about <code>process.env</code>: changes to environment variables via <code>process.env</code> will not be reflected in the <code>env</code> argument passed to the Worker function, and vice versa. <code>process.env</code> is populated at the start of Worker execution and is not updated dynamically. This is consistent with Node.js behavior, where changes to <code>process.env</code> do not affect the actual environment variables of the running process. We did this to minimize the risk that a third-party library, originally meant to run in Node.js, could inadvertently modify the environment assumed by the rest of the Worker code.</p>
    <div>
      <h3>Stdin, stdout, stderr</h3>
      <a href="#stdin-stdout-stderr">
        
      </a>
    </div>
<p>Workers do not have traditional standard input/output/error streams like a Node.js process does. However, we have implemented <code>process.stdin</code>, <code>process.stdout</code>, and <code>process.stderr</code> as stream-like objects that can be used similarly. These streams are not connected to any actual process stdin and stdout; instead, output written to them is captured by the Worker in the same way as <code>console.log</code> and friends, and, just like them, it will show up in <a href="https://developers.cloudflare.com/workers/observability/logs/workers-logs/"><u>Workers Logs</u></a>.</p><p>Both <code>process.stdout</code> and <code>process.stderr</code> are Node.js writable streams:</p>
            <pre><code>import process from 'node:process';

export default {
  async fetch(request) {
    process.stdout.write('This will appear in the Worker logs\n');
    process.stderr.write('This will also appear in the Worker logs\n');
    return new Response('Hello, world!');
  }
}
</code></pre>
            <p>Support for <code>stdin</code>, <code>stdout</code>, and <code>stderr</code> is also integrated with the virtual file system, allowing you to write to the standard file descriptors <code>0</code>, <code>1</code>, and <code>2</code> (representing <code>stdin</code>, <code>stdout</code>, and <code>stderr</code> respectively) using the <code>node:fs</code> APIs:</p>
            <pre><code>import fs from 'node:fs';
import process from 'node:process';

export default {
  async fetch(request) {
    // Write to stdout
    fs.writeSync(process.stdout.fd, 'Hello, stdout!\n');
    // Write to stderr
    fs.writeSync(process.stderr.fd, 'Hello, stderr!\n');

    return new Response('Check the logs for stdout and stderr output!');
  }
}
</code></pre>
            
    <div>
      <h3>Other process APIs</h3>
      <a href="#other-process-apis">
        
      </a>
    </div>
<p>We cannot cover every <code>node:process</code> API in detail here, but here are some of the other notable APIs that we have implemented:</p><ul><li><p><code>process.nextTick(fn)</code>: Schedules a callback to be invoked after the current execution context completes. Our implementation uses the same microtask queue as promises so that it behaves exactly the same as <code>queueMicrotask(fn)</code>.</p></li><li><p><code>process.cwd()</code> and <code>process.chdir()</code>: Get and change the current virtual working directory. The current working directory is initialized to <code>/bundle</code> when the Worker starts, and every request has its own isolated view of the current working directory. Changing the working directory in one request does not affect the working directory in other requests.</p></li><li><p><code>process.exit()</code>: Immediately terminates the current Worker request execution. This is unlike Node.js, where <code>process.exit()</code> terminates the entire process. In Workers, calling <code>process.exit()</code> will stop execution of the current request and return an error response to the client.</p></li></ul>
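<p>The <code>process.nextTick()</code> ordering described above can be sketched as follows; the callback runs only after the current synchronous execution completes:</p>

```javascript
import process from 'node:process';

const order = [];
process.nextTick(() => order.push('tick'));
order.push('sync');

// Give the scheduled callback a chance to run, then observe the order
await new Promise((resolve) => setTimeout(resolve, 0));
console.log(order); // [ 'sync', 'tick' ]
```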
    <div>
      <h2>Compression with <code>node:zlib</code></h2>
      <a href="#compression-with-node-zlib">
        
      </a>
    </div>
    <p>The <code>node:zlib</code> module provides APIs for compressing and decompressing data using various algorithms such as gzip, deflate, and brotli. We have implemented the <code>node:zlib</code> module, allowing you to use familiar compression APIs in your Workers applications. This enables a wide range of use cases, including data compression for network transmission, response optimization, and archive handling.</p>
            <pre><code>import zlib from 'node:zlib';

export default {
  async fetch(request) {
    const input = 'Hello, world! Hello, world! Hello, world!';
    const compressed = zlib.gzipSync(input);
    const decompressed = zlib.gunzipSync(compressed).toString('utf-8');

    return new Response(`Decompressed data: ${decompressed}`);
  }
}
</code></pre>
<p>While Workers has had built-in support for gzip and deflate compression via the <a href="https://compression.spec.whatwg.org/"><u>Web Platform Standard Compression API</u></a>, the <code>node:zlib</code> module adds support for the Brotli compression algorithm, as well as a more familiar API for Node.js developers.</p>
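<p>A Brotli round trip mirrors the gzip example above:</p>

```javascript
import zlib from 'node:zlib';

const input = 'Hello, world! Hello, world! Hello, world!';

// Compress with Brotli, then decompress back to the original string
const compressed = zlib.brotliCompressSync(input);
const decompressed = zlib.brotliDecompressSync(compressed).toString('utf-8');

console.log(decompressed === input); // true
```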
    <div>
      <h2>Timing &amp; scheduling</h2>
      <a href="#timing-scheduling">
        
      </a>
    </div>
    <p>Node.js provides a set of timing and scheduling APIs via the <code>node:timers</code> module. We have implemented these in the runtime as well.</p>
            <pre><code>import timers from 'node:timers';

export default {
  async fetch(request) {
    timers.setInterval(() =&gt; {
      console.log('This will log every half-second');
    }, 500);

    timers.setImmediate(() =&gt; {
      console.log('This will log immediately after the current event loop');
    });

    return new Promise((resolve) =&gt; {
      timers.setTimeout(() =&gt; {
        resolve(new Response('Hello after 1 second!'));
      }, 1000);
    });
  }
}
</code></pre>
<p>The Node.js implementations of the timers APIs are very similar to their standard Web Platform counterparts, with one key difference: the Node.js timers APIs return <code>Timeout</code> objects that can be used to manage the timers after they have been created. We have implemented the <code>Timeout</code> class in Workers to provide this functionality, allowing you to clear or refresh timers as needed.</p>
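<p>For example, holding on to the returned <code>Timeout</code> lets you cancel a timer before it fires (in Node.js the object also exposes methods like <code>refresh()</code> to restart the countdown):</p>

```javascript
import timers from 'node:timers';

let fired = false;
// Unlike the Web API, this returns a Timeout object, not a number
const timeout = timers.setTimeout(() => { fired = true; }, 50);

// Cancel the timer before it has a chance to fire
timers.clearTimeout(timeout);

await new Promise((resolve) => setTimeout(resolve, 100));
console.log(fired); // false
```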
    <div>
      <h2>Console</h2>
      <a href="#console">
        
      </a>
    </div>
    <p>The <code>node:console</code> module provides a set of console logging APIs that are similar to the standard <code>console</code> global, but with some additional features. We have implemented the <code>node:console</code> module as a thin wrapper around the existing <code>globalThis.console</code> that is already available in Workers.</p>
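<p>In practice, importing the module gives you the same logger you already have globally (in Node.js, the default export of <code>node:console</code> is the global <code>console</code> instance, and the Workers wrapper is designed to behave the same way):</p>

```javascript
import nodeConsole from 'node:console';

// Logs via the module go to the same place as the global console
nodeConsole.log('Logged via node:console');
```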
    <div>
      <h2>How to enable the Node.js compatibility features</h2>
      <a href="#how-to-enable-the-node-js-compatibility-features">
        
      </a>
    </div>
    <p>To enable the Node.js compatibility features as a whole within your Workers, you can set the <code>nodejs_compat</code> <a href="https://developers.cloudflare.com/workers/configuration/compatibility-flags/"><u>compatibility flag</u></a> in your <a href="https://developers.cloudflare.com/workers/wrangler/configuration/"><code><u>wrangler.jsonc or wrangler.toml</u></code></a> configuration file. If you are not using Wrangler, you can also set the flag via the <a href="https://dash.cloudflare.com"><u>Cloudflare dashboard</u></a> or API:</p>
            <pre><code>{
  "name": "my-worker",
  "main": "src/index.js",
  "compatibility_date": "2025-09-21",
  "compatibility_flags": [
    // Get everything Node.js compatibility related
    "nodejs_compat",
  ]
}
</code></pre>
            <p><b>The compatibility date here is key! Update that to the most current date, and you'll always be able to take advantage of the latest and greatest features.</b></p><p>The <code>nodejs_compat</code> flag is an umbrella flag that enables all the Node.js compatibility features at once. This is the recommended way to enable Node.js compatibility, as it ensures that all features are available and work together seamlessly. However, if you prefer, you can also enable or disable some features individually via their own compatibility flags:</p>
<div><table><thead>
  <tr>
    <th><span>Module</span></th>
    <th><span>Enable Flag (default)</span></th>
    <th><span>Disable Flag</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>node:console</span></td>
    <td><span>enable_nodejs_console_module</span></td>
    <td><span>disable_nodejs_console_module</span></td>
  </tr>
  <tr>
    <td><span>node:fs</span></td>
    <td><span>enable_nodejs_fs_module</span></td>
    <td><span>disable_nodejs_fs_module</span></td>
  </tr>
  <tr>
    <td><span>node:http (client)</span></td>
    <td><span>enable_nodejs_http_modules</span></td>
    <td><span>disable_nodejs_http_modules</span></td>
  </tr>
  <tr>
    <td><span>node:http (server)</span></td>
    <td><span>enable_nodejs_http_server_modules</span></td>
    <td><span>disable_nodejs_http_server_modules</span></td>
  </tr>
  <tr>
    <td><span>node:os</span></td>
    <td><span>enable_nodejs_os_module</span></td>
    <td><span>disable_nodejs_os_module</span></td>
  </tr>
  <tr>
    <td><span>node:process</span></td>
    <td><span>enable_nodejs_process_v2</span></td>
    <td></td>
  </tr>
  <tr>
    <td><span>node:zlib</span></td>
    <td><span>nodejs_zlib</span></td>
    <td><span>no_nodejs_zlib</span></td>
  </tr>
  <tr>
    <td><span>process.env</span></td>
    <td><span>nodejs_compat_populate_process_env</span></td>
    <td><span>nodejs_compat_do_not_populate_process_env</span></td>
  </tr>
</tbody></table></div><p>By separating these features, you get more granular control over which Node.js APIs are available in your Workers. We initially rolled these features out under the single <code>nodejs_compat</code> flag, but we quickly realized that some users perform feature detection based on the presence of certain modules and APIs, and that by enabling everything at once we risked breaking some existing Workers. Users who check for the existence of these APIs manually can ensure new changes don’t break their Workers by opting out of specific APIs:</p>
            <pre><code>{
  "name": "my-worker",
  "main": "src/index.js",
  "compatibility_date": "2025-09-15",
  "compatibility_flags": [
    // Get everything Node.js compatibility related
    "nodejs_compat",
    // But disable the `node:zlib` module if necessary
    "no_nodejs_zlib",
  ]
}
</code></pre>
            <p>But, to keep things simple, <b>we recommend starting with the </b><code><b>nodejs_compat</b></code><b> flag, which will enable everything. You can always disable individual features later if needed.</b> There is no performance penalty to having the additional features enabled.</p>
    <div>
      <h3>Handling end-of-life'd APIs</h3>
      <a href="#handling-end-of-lifed-apis">
        
      </a>
    </div>
    <p>One important difference between Node.js and Workers is that Node.js has a <a href="https://nodejs.org/en/eol"><u>defined long term support (LTS) schedule</u></a> that allows it to make breaking changes at certain points in time. More specifically, Node.js can remove APIs and features when they reach end-of-life (EOL). On Workers, however, we have a rule that once a Worker is deployed, <a href="https://blog.cloudflare.com/backwards-compatibility-in-cloudflare-workers/"><u>it will continue to run as-is indefinitely</u></a>, without any breaking changes as long as the compatibility date does not change. This means that we cannot simply remove APIs when they reach EOL in Node.js, since this would break existing Workers. To address this, we have introduced a new set of compatibility flags that allow users to specify that they do not want the <code>nodejs_compat</code> features to include end-of-life APIs. These flags are based on the Node.js major version in which the APIs were removed:</p><p>The <code>remove_nodejs_compat_eol</code> flag will remove all APIs that have reached EOL up to your current compatibility date:</p>
            <pre><code>{
  "name": "my-worker",
  "main": "src/index.js",
  "compatibility_date": "2025-09-15",
  "compatibility_flags": [
    // Get everything Node.js compatibility related
    "nodejs_compat",
    // Remove Node.js APIs that have reached EOL up to your
    // current compatibility date
    "remove_nodejs_compat_eol",
  ]
}
</code></pre>
            <ul><li><p>The <code>remove_nodejs_compat_eol_v22</code> flag will remove all APIs that reached EOL in Node.js v22. When using <code>remove_nodejs_compat_eol</code>, this flag will be automatically enabled if your compatibility date is set to a date after Node.js v22's EOL date (April 30, 2027).</p></li><li><p>The <code>remove_nodejs_compat_eol_v23</code> flag will remove all APIs that reached EOL in Node.js v23. When using <code>remove_nodejs_compat_eol</code>, this flag will be automatically enabled if your compatibility date is set to a date after Node.js v24's EOL date (April 30, 2028).</p></li><li><p>The <code>remove_nodejs_compat_eol_v24</code> flag will remove all APIs that reached EOL in Node.js v24. When using <code>remove_nodejs_compat_eol</code>, this flag will be automatically enabled if your compatibility date is set to a date after Node.js v24's EOL date (April 30, 2028).</p></li></ul><p>If you look at the date for <code>remove_nodejs_compat_eol_v23</code>, you'll notice that it is the same as the date for <code>remove_nodejs_compat_eol_v24</code>. That is not a typo! Node.js v23 is not an LTS release, and as such it has a very short support window: it was released in October 2024 and reached EOL in June 2025. Accordingly, we have decided to group the end-of-life handling of non-LTS releases into the next LTS release. This means that when you set your compatibility date to a date after the EOL date for Node.js v24, you will also be opting out of the APIs that reached EOL in Node.js v23. Importantly, these flags will not be automatically enabled until your compatibility date is set to a date after the relevant Node.js version's EOL date, ensuring that existing Workers will have plenty of time to migrate before any APIs are removed, or can simply keep using the older APIs indefinitely by using reverse compatibility flags like <code>add_nodejs_compat_eol_v24</code>.</p>
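<p>For example, a Worker could pair a far-future compatibility date with one of these reverse flags to keep an API that reached EOL in Node.js v24 (a sketch based on the flag names above; the date shown is illustrative):</p>

```
{
  "name": "my-worker",
  "main": "src/index.js",
  "compatibility_date": "2028-06-01",
  "compatibility_flags": [
    "nodejs_compat",
    // Keep APIs that reached EOL in Node.js v24 available
    "add_nodejs_compat_eol_v24",
  ]
}
```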
    <div>
      <h2>Giving back</h2>
      <a href="#giving-back">
        
      </a>
    </div>
    <p>One other important bit of work that we have been doing is expanding Cloudflare's investment back into the Node.js ecosystem as a whole. There are now five members of the Workers runtime team (plus one summer intern) that are actively contributing to the <a href="https://github.com/nodejs/node"><u>Node.js project</u></a> on GitHub, two of which are members of Node.js' Technical Steering Committee. While we have made a number of new feature contributions such as an implementation of the Web Platform Standard <a href="https://blog.cloudflare.com/improving-web-standards-urlpattern/"><u>URLPattern</u></a> API and improved implementation of <a href="https://github.com/nodejs/ncrypto"><u>crypto</u></a> operations, our primary focus has been on improving the ability for other runtimes to interoperate and be compatible with Node.js, fixing critical bugs, and improving performance. As we continue to grow our efforts around Node.js compatibility we will also grow our contributions back to the project and ecosystem as a whole.</p>
<div><table><thead>
  <tr>
    <th><span>Aaron Snell</span></th>
    <th><span>2025 Summer Intern, Cloudflare Containers</span><br /><span>Node.js Web Infrastructure Team</span></th>
    <th><img src="https://images.ctfassets.net/zkvhlag99gkb/2ud1DF6HOI3ha2ySAhPOve/803132cf224695a48698afb806bf147b/Aaron.png?h=250" /></th>
  </tr>
  <tr>
    <th><img src="https://images.ctfassets.net/zkvhlag99gkb/2nqff7ZSEryQfXbl2OdwfJ/6b4a56a3e71f439032d3bc0413d2d72f/GitHub.png?h=250" /></th>
    <th><a href="https://github.com/flakey5"><span>flakey5</span></a></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Dario Piotrowicz</span></td>
    <td><span>Senior System Engineer</span><br /><span>Node.js Collaborator</span></td>
    <td><img src="https://images.ctfassets.net/zkvhlag99gkb/4K17bsjek1z4u2KRTtZ8uS/d7058dea515cb057a1727bcd01a0f5d2/Dario.png?h=250" /></td>
  </tr>
  <tr>
    <td><img src="https://images.ctfassets.net/zkvhlag99gkb/2nqff7ZSEryQfXbl2OdwfJ/6b4a56a3e71f439032d3bc0413d2d72f/GitHub.png?h=250" /></td>
    <td><a href="https://github.com/dario-piotrowicz"><span>dario-piotrowicz</span></a></td>
  </tr>
  <tr>
    <td><span>Guy Bedford</span></td>
    <td><span>Principal Systems Engineer</span><br /><span>Node.js Collaborator</span></td>
    <td><img src="https://images.ctfassets.net/zkvhlag99gkb/iYM8oWWSK89MesmQwctfc/4d86847238b1f10e18717771e2ad5ee8/Guy.png?h=250" /></td>
  </tr>
  <tr>
    <td><img src="https://images.ctfassets.net/zkvhlag99gkb/2nqff7ZSEryQfXbl2OdwfJ/6b4a56a3e71f439032d3bc0413d2d72f/GitHub.png?h=250" /></td>
    <td><a href="https://github.com/guybedford"><span>guybedford</span></a></td>
  </tr>
  <tr>
    <td><span>James Snell</span></td>
    <td><span>Principal Systems Engineer</span><br /><span>Node.js TSC</span></td>
    <td><img src="https://images.ctfassets.net/zkvhlag99gkb/4vN2YAqsEBlSnWtXRM0pTT/5e9130753ed71933fc94bc2c634425f3/James.png?h=250" /></td>
  </tr>
  <tr>
    <td><img src="https://images.ctfassets.net/zkvhlag99gkb/2nqff7ZSEryQfXbl2OdwfJ/6b4a56a3e71f439032d3bc0413d2d72f/GitHub.png?h=250" /></td>
    <td><a href="https://github.com/jasnell"><span>jasnell</span></a></td>
  </tr>
  <tr>
    <td><span>Nicholas Paun</span></td>
    <td><span>Systems Engineer</span><br /><span>Node.js Contributor</span></td>
    <td><img src="https://images.ctfassets.net/zkvhlag99gkb/4ePtfLAzk4pKYi4hU4dRLX/e4dcdfe86a4e54c4d02e356e2078d214/Nicholas.png?h=250" /></td>
  </tr>
  <tr>
    <td><img src="https://images.ctfassets.net/zkvhlag99gkb/2nqff7ZSEryQfXbl2OdwfJ/6b4a56a3e71f439032d3bc0413d2d72f/GitHub.png?h=250" /></td>
    <td><a href="https://github.com/npaun"><span>npaun</span></a></td>
  </tr>
  <tr>
    <td><span>Yagiz Nizipli</span></td>
    <td><span>Principal Systems Engineer</span><br /><span>Node.js TSC</span></td>
    <td><img src="https://images.ctfassets.net/zkvhlag99gkb/2nvpEqU0VHi3Se9fxJ5vE8/0f5628bc1756c7e3e363760be9c493ae/Yagiz.png?h=250" /></td>
  </tr>
  <tr>
    <td><img src="https://images.ctfassets.net/zkvhlag99gkb/2nqff7ZSEryQfXbl2OdwfJ/6b4a56a3e71f439032d3bc0413d2d72f/GitHub.png?h=250" /></td>
    <td><a href="https://github.com/anonrig"><span>anonrig</span></a></td>
  </tr>
</tbody></table></div><p>Cloudflare is also proud to continue supporting critical infrastructure for the Node.js project through its <a href="https://openjsf.org/blog/openjs-cloudflare-partnership"><u>ongoing strategic partnership</u></a> with the OpenJS Foundation, providing free access to the project to services such as Workers, R2, DNS, and more.</p>
    <div>
      <h2>Give it a try!</h2>
      <a href="#give-it-a-try">
        
      </a>
    </div>
    <p>Our vision for Node.js compatibility in Workers is not just about implementing individual APIs, but about creating a comprehensive platform that allows developers to run existing Node.js code seamlessly in the Workers environment. This involves not only implementing the APIs themselves, but also ensuring that they work together harmoniously, and that they integrate well with the unique aspects of the Workers platform.</p><p>In some cases, such as with <code>node:fs</code> and <code>node:crypto</code>, we have had to implement entirely new capabilities that were not previously available in Workers and did so at the native runtime level. This allows us to tailor the implementations to the unique aspects of the Workers environment and ensure both performance and security.</p><p>And we're not done yet. We are continuing to work on implementing additional Node.js APIs, as well as improving the performance and compatibility of the existing implementations. We are also actively engaging with the community to understand their needs and priorities, and to gather feedback on our implementations. If there are specific Node.js APIs or npm packages that you would like to see supported in Workers, <a href="https://github.com/cloudflare/workerd/"><u>please let us know</u></a>! If there are any issues or bugs you encounter, please report them on our <a href="https://github.com/cloudflare/workerd/"><u>GitHub repository</u></a>. While we might not be able to implement every single Node.js API, nor match Node.js' behavior exactly in every case, we are committed to providing a robust and comprehensive Node.js compatibility layer that meets the needs of the community.</p><p>All the Node.js compatibility features described in this post are <a href="https://developers.cloudflare.com/workers/runtime-apis/nodejs/"><u>available now</u></a>. 
To get started, simply enable the <code>nodejs_compat</code> compatibility flag in your <code>wrangler.toml</code> or <code>wrangler.jsonc</code> file, or via the Cloudflare dashboard or API. You can then start using the Node.js APIs in your Workers applications right away.</p> ]]></content:encoded>
            <category><![CDATA[Node.js]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Servers]]></category>
            <guid isPermaLink="false">rMNgTNdCcEh6MjAlrKkL3</guid>
            <dc:creator>James M Snell</dc:creator>
        </item>
        <item>
            <title><![CDATA[Bringing Node.js HTTP servers to Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/bringing-node-js-http-servers-to-cloudflare-workers/</link>
            <pubDate>Mon, 08 Sep 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ We've implemented the node:http client and server APIs in Cloudflare Workers, allowing developers to migrate existing Node.js applications with minimal code changes. ]]></description>
            <content:encoded><![CDATA[ <p>We’re making it easier to run your Node.js applications on <a href="https://www.cloudflare.com/developer-platform/products/workers/"><u>Cloudflare Workers </u></a>by adding support for the <code>node:http</code> client and server APIs. This significant addition brings familiar Node.js HTTP interfaces to the edge, enabling you to deploy existing Express.js, Koa, and other Node.js applications globally with zero cold starts, automatic scaling, and significantly lower latency for your users — all without rewriting your codebase. Whether you're looking to migrate legacy applications to a modern serverless platform or build new ones using the APIs you already know, you can now leverage Workers' global network while maintaining your existing development patterns and frameworks.</p>
    <div>
      <h2>The Challenge: Node.js-style HTTP in a Serverless Environment</h2>
      <a href="#the-challenge-node-js-style-http-in-a-serverless-environment">
        
      </a>
    </div>
    <p>Cloudflare Workers operate in a unique <a href="https://www.cloudflare.com/learning/serverless/what-is-serverless/"><u>serverless</u></a> environment where direct TCP connections aren't available. Instead, all networking operations are fully managed by specialized services outside the Workers runtime itself — systems like our <a href="https://blog.cloudflare.com/introducing-oxy/"><u>Open Egress Router (OER)</u></a> and <a href="https://github.com/cloudflare/pingora"><u>Pingora</u></a> that handle connection pooling, keeping connections warm, managing egress IPs, and all the complex networking details. This means that, as a developer, you don't need to worry about TLS negotiation, connection management, or network optimization — it's all handled for you automatically.</p><p>This fully-managed approach is actually why we can't support certain Node.js APIs — these networking decisions are handled at the system level for performance and security. While this makes Workers different from traditional Node.js environments, it also makes them better for serverless computing — you get enterprise-grade networking without the complexity.</p><p>This fundamental difference required us to rethink how HTTP APIs work at the edge while maintaining compatibility with existing Node.js code patterns.</p><p>Our solution: we've implemented the core <code>node:http</code> APIs by building on top of the web-standard technologies that Workers already excel at. Here's how it works:</p>
    <div>
      <h3>HTTP Client APIs</h3>
      <a href="#http-client-apis">
        
      </a>
    </div>
    <p>The <code>node:http</code> client implementation includes the essential APIs you're familiar with:</p><ul><li><p><code>http.get()</code> - For simple GET requests</p></li><li><p><code>http.request()</code> - For full control over HTTP requests</p></li></ul><p>Our implementations of these APIs are built on top of the standard <a href="https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API"><code><u>fetch()</u></code></a> API that Workers use natively, providing excellent performance while maintaining Node.js compatibility.</p>
            <pre><code>import http from 'node:http';

export default {
  async fetch(request) {
    // Use familiar Node.js HTTP client APIs
    const { promise, resolve, reject } = Promise.withResolvers();

    const req = http.get('https://api.example.com/data', (res) =&gt; {
      let data = '';
      res.on('data', chunk =&gt; data += chunk);
      res.on('end', () =&gt; {
        resolve(new Response(data, {
          headers: { 'Content-Type': 'application/json' }
        }));
      });
    });

    req.on('error', reject);

    return promise;
  }
};</code></pre>
            
    <div>
      <h3>What's Supported</h3>
      <a href="#whats-supported">
        
      </a>
    </div>
    <ul><li><p>Standard HTTP methods (GET, POST, PUT, DELETE, etc.)</p></li><li><p>Request and response headers</p></li><li><p>Request and response bodies</p></li><li><p>Streaming responses</p></li><li><p>Basic authentication</p></li></ul>
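<p>Putting those supported pieces together, here is a self-contained sketch exercised against a local Node.js echo server purely for illustration; on Workers the same client calls are implemented on top of <code>fetch()</code>, and you would typically talk to an external origin instead:</p>

```javascript
import http from 'node:http';
import { once } from 'node:events';

// A tiny local echo server, used here only to exercise the client APIs
const server = http.createServer((req, res) => {
  let body = '';
  req.on('data', (chunk) => (body += chunk));
  req.on('end', () => {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ method: req.method, body }));
  });
});
server.listen(0); // pick a free port
await once(server, 'listening');
const { port } = server.address();

// http.request gives full control: method, headers, and a writable request body
const result = await new Promise((resolve, reject) => {
  const req = http.request(
    { hostname: '127.0.0.1', port, path: '/echo', method: 'POST',
      headers: { 'Content-Type': 'text/plain' } },
    (res) => {
      let data = '';
      res.on('data', (chunk) => (data += chunk));
      res.on('end', () => resolve(JSON.parse(data)));
    }
  );
  req.on('error', reject);
  req.write('hello'); // streamed request body
  req.end();
});

server.close();
console.log(result.method, result.body); // POST hello
```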
    <div>
      <h3>Current Limitations</h3>
      <a href="#current-limitations">
        
      </a>
    </div>
    <ul><li><p>The <a href="https://nodejs.org/api/http.html#class-httpagent"><code><u>Agent</u></code></a> API is provided but operates as a no-op.</p></li><li><p><a href="https://nodejs.org/docs/v22.19.0/api/http.html#responseaddtrailersheaders"><u>Trailers</u></a>, <a href="https://nodejs.org/docs/v22.19.0/api/http.html#responsewriteearlyhintshints-callback"><u>early hints</u></a>, and <a href="https://nodejs.org/docs/v22.19.0/api/http.html#event-continue"><u>1xx responses</u></a> are not supported.</p></li><li><p>TLS-specific options are not supported (Workers handle TLS automatically).</p></li></ul>
    <div>
      <h2>HTTP Server APIs</h2>
      <a href="#http-server-apis">
        
      </a>
    </div>
    <p>The server-side implementation is where things get particularly interesting. Since Workers can't create traditional TCP servers listening on specific ports, we've created a bridge system that connects Node.js-style servers to the Workers request-handling model.</p><p>When you create an HTTP server and call <code>listen(port)</code>, instead of opening a TCP socket, the server is registered in an internal table within your Worker. This internal table acts as a bridge between <code>http.createServer()</code> executions and incoming <code>fetch</code> requests, using the port number as the identifier. You then use one of two methods to bridge incoming Worker requests to your Node.js-style server.</p>
    <div>
      <h3>Manual Integration with <code>handleAsNodeRequest</code></h3>
      <a href="#manual-integration-with-handleasnoderequest">
        
      </a>
    </div>
    <p>This approach gives you the flexibility to integrate Node.js HTTP servers with other Worker features, and allows you to have multiple handlers in your default <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/service-bindings/rpc/"><u>entrypoint</u></a> such as <code>fetch</code>, <code>scheduled</code>, <code>queue</code>, etc.</p>
            <pre><code>import { handleAsNodeRequest } from 'cloudflare:node';
import { createServer } from 'node:http';

// Create a traditional Node.js HTTP server
const server = createServer((req, res) =&gt; {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello from Node.js HTTP server!');
});

// Register the server (doesn't actually bind to port 8080)
server.listen(8080);

// Bridge from Workers fetch handler to Node.js server
export default {
  async fetch(request) {
    // You can add custom logic here before forwarding
    if (request.url.includes('/admin')) {
      return new Response('Admin access', { status: 403 });
    }

    // Forward to the Node.js server
    return handleAsNodeRequest(8080, request);
  },
  async queue(batch, env, ctx) {
    for (const msg of batch.messages) {
      msg.retry();
    }
  },
  async scheduled(controller, env, ctx) {
    ctx.waitUntil(doSomeTaskOnSchedule(controller));
  },
};</code></pre>
            <p>This approach is perfect when you need to:</p><ul><li><p>Integrate with other Workers features like <a href="https://www.cloudflare.com/developer-platform/products/workers-kv/"><u>KV</u></a>, <a href="https://www.cloudflare.com/developer-platform/products/durable-objects/"><u>Durable Objects</u></a>, or <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2</u></a></p></li><li><p>Handle some routes differently while delegating others to the Node.js server</p></li><li><p>Apply custom middleware or request processing</p></li></ul>
    <div>
      <h3>Automatic Integration with <code>httpServerHandler</code></h3>
      <a href="#automatic-integration-with-httpserverhandler">
        
      </a>
    </div>
    <p>For use cases where you want to integrate a Node.js HTTP server without any additional features or complexity, you can use the <code>httpServerHandler</code> function. It handles the integration for you automatically, and is ideal for applications that don’t need Workers-specific features.</p>
            <pre><code>import { httpServerHandler } from 'cloudflare:node';
import { createServer } from 'node:http';

// Create your Node.js HTTP server
const server = createServer((req, res) =&gt; {
  if (req.url === '/') {
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end('&lt;h1&gt;Welcome to my Node.js app on Workers!&lt;/h1&gt;');
  } else if (req.url === '/api/status') {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ status: 'ok', timestamp: Date.now() }));
  } else {
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end('Not Found');
  }
});

server.listen(8080);

// Export the server as a Workers handler
export default httpServerHandler({ port: 8080 });
// Or you can simply pass the http.Server instance directly:
// export default httpServerHandler(server);</code></pre>
            
    <div>
      <h2><a href="https://expressjs.com/"><u>Express.js</u></a>, <a href="https://koajs.com/"><u>Koa.js</u></a> and Framework Compatibility</h2>
      <a href="#and-framework-compatibility">
        
      </a>
    </div>
    <p>These HTTP APIs open the door to running popular Node.js frameworks like Express.js on Workers. If any of the middlewares for these frameworks don’t work as expected, please <a href="https://github.com/cloudflare/workerd/issues"><u>open an issue</u></a> on the Cloudflare Workers repository.</p>
            <pre><code>import { httpServerHandler } from 'cloudflare:node';
import express from 'express';

const app = express();

app.get('/', (req, res) =&gt; {
  res.json({ message: 'Express.js running on Cloudflare Workers!' });
});

app.get('/api/users/:id', (req, res) =&gt; {
  res.json({
    id: req.params.id,
    name: 'User ' + req.params.id
  });
});

app.listen(3000);
export default httpServerHandler({ port: 3000 });
// Or you can simply pass the http.Server instance directly:
// export default httpServerHandler(app.listen(3000));</code></pre>
            <p>In addition to <a href="https://expressjs.com"><u>Express.js</u></a>, <a href="https://koajs.com/"><u>Koa.js</u></a> is also supported:</p>
            <pre><code>import Koa from 'koa';
import { httpServerHandler } from 'cloudflare:node';

const app = new Koa()

app.use(async ctx =&gt; {
  ctx.body = 'Hello World';
});

app.listen(8080);

export default httpServerHandler({ port: 8080 });</code></pre>
            
    <div>
      <h2>Getting started with serverless Node.js applications</h2>
      <a href="#getting-started-with-serverless-applications">
        
      </a>
    </div>
    <p>The <code>node:http</code> and <code>node:https</code> APIs are available in Workers with Node.js compatibility enabled using the <a href="https://developers.cloudflare.com/workers/configuration/compatibility-dates/#nodejs-compatibility-flag"><code><u>nodejs_compat</u></code></a> compatibility flag with a compatibility date of 2025-08-15 or later.</p><p>The addition of <code>node:http</code> support brings us closer to our goal of making Cloudflare Workers the best platform for running JavaScript at the edge, whether you're building new applications or migrating existing ones.</p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/templates/tree/main/nodejs-http-server-template"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p>Ready to try it out? <a href="https://developers.cloudflare.com/workers/runtime-apis/nodejs/"><u>Enable Node.js compatibility</u></a> in your Worker and start exploring the possibilities of familiar <a href="https://developers.cloudflare.com/workers/runtime-apis/nodejs/http/"><u>HTTP APIs at the edge</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Node.js]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[Serverless]]></category>
            <category><![CDATA[Servers]]></category>
            <guid isPermaLink="false">k5sD9WGL8BsJPuqsJj6Fn</guid>
            <dc:creator>Yagiz Nizipli</dc:creator>
            <dc:creator>James M Snell</dc:creator>
        </item>
        <item>
            <title><![CDATA[Is this thing on? Using OpenBMC and ACPI power states for reliable server boot]]></title>
            <link>https://blog.cloudflare.com/how-we-use-openbmc-and-acpi-power-states-to-monitor-the-state-of-our-servers/</link>
            <pubDate>Tue, 22 Oct 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare’s global fleet benefits from being managed by open source firmware for the Baseboard Management Controller (BMC), OpenBMC. This has come with various challenges, some of which we discuss here with an explanation of how the open source nature of the firmware for the BMC enabled us to fix the issues and maintain a more stable fleet. ]]></description>
            <content:encoded><![CDATA[ 
    <div>
      <h2>Introduction</h2>
      <a href="#introduction">
        
      </a>
    </div>
    <p>At Cloudflare, we provide a range of services through our global network of servers, located in <a href="https://www.cloudflare.com/network/"><u>330 cities</u></a> worldwide. When you interact with our long-standing <a href="https://www.cloudflare.com/application-services/products/"><u>application services</u></a>, or newer services like <a href="https://ai.cloudflare.com/"><u>Workers AI</u></a>, you’re in contact with one of the fleet of thousands of servers which support those services.</p><p>Each of these servers is managed by a Baseboard Management Controller (BMC). The BMC is a special-purpose processor — different from the Central Processing Unit (CPU) of a server — whose sole purpose is ensuring the smooth operation of the server.</p><p>Regardless of the server vendor, each server has this BMC. The BMC runs independently of the CPU and has its own embedded operating system, usually referred to as <a href="https://en.wikipedia.org/wiki/Firmware"><u>firmware</u></a>. At Cloudflare, we customize and deploy a server-specific version of the BMC firmware, based on the <a href="https://www.openbmc.org/"><u>Linux Foundation Project for BMCs, OpenBMC</u></a>. OpenBMC is an open-source firmware stack designed to work across a variety of systems, including enterprise, telco, and cloud-scale data centers. The open-source nature of OpenBMC gives us greater flexibility and ownership of this critical server subsystem than closed, proprietary firmware would. It gives us transparency (which is important to us as a security company) and lets us develop custom features and fixes more quickly for the BMC firmware that we run on our entire fleet.</p><p>In this blog post, we describe how we customized and extended the OpenBMC firmware to better monitor our servers’ boot-up processes, so that servers start more reliably and we can better diagnose issues that happen during boot-up.</p>
    <div>
      <h2>Server subsystems</h2>
      <a href="#server-subsystems">
        
      </a>
    </div>
    <p>Server systems consist of multiple complex subsystems that include the processors, memory, storage, networking, power supply, cooling, etc. When booting up the host of a server system, the power state of each subsystem of the server is changed in an asynchronous manner. This is done so that subsystems can initialize simultaneously, thereby improving the efficiency of the boot process. Though started asynchronously, these subsystems may interact with each other at different points of the boot sequence and rely on handshake/synchronization to exchange information. For example, during boot-up, the <a href="https://en.wikipedia.org/wiki/UEFI"><u>UEFI (Unified Extensible Firmware Interface)</u></a>, often referred to as the <a href="https://en.wikipedia.org/wiki/BIOS"><u>BIOS</u></a>, configures the motherboard in a phase known as the Platform Initialization (PI) phase, during which the UEFI collects information from subsystems such as the CPUs, memory, etc. to initialize the motherboard with the right settings.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6csPNEksLXsGgt3dq5xZ0S/3236656dbc01f3085bada5af853c3516/image1.png" />
          </figure><p><sup><i>Figure 1: Server Boot Process</i></sup></p><p>When the power state of the subsystems, handshakes, and synchronization are not properly managed, there may be race conditions that would result in failures during the boot process of the host. Cloudflare experienced some of these boot-related failures while rolling out open source firmware (<a href="https://en.wikipedia.org/wiki/OpenBMC"><u>OpenBMC</u></a>) to the Baseboard Management Controllers (BMCs) of our servers. </p>
    <div>
      <h2>Baseboard Management Controller (BMC) as a manager of the host</h2>
      <a href="#baseboard-management-controller-bmc-as-a-manager-of-the-host">
        
      </a>
    </div>
    <p>A BMC is a specialized microprocessor that is attached to the board of a host (server) to assist with remote management capabilities of the host. Servers usually sit in data centers and are often far away from the administrators, and this creates a challenge to maintain them at scale. This is where a BMC comes in, as the BMC serves as the interface that gives administrators the ability to securely and remotely access the servers and carry out management functions. The BMC does this by exposing various interfaces, including <a href="https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface"><u>Intelligent Platform Management Interface (IPMI)</u></a> and <a href="https://www.dmtf.org/standards/redfish"><u>Redfish</u></a>, for distributed management. In addition, the BMC receives data from various sensors/devices (e.g. temperature, power supply) connected to the server, and also the operating parameters of the server, such as the operating system state, and publishes the values on its IPMI and Redfish interfaces.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/33dNmfyjqrbAGvcbZLTa0h/db3e6b79b1010081916ee6498b10c297/image2.png" />
          </figure><p><sup><i>Figure 2: Block diagram of BMC in a server system.</i></sup></p><p>At Cloudflare, we use the <a href="https://github.com/openbmc/openbmc"><u>OpenBMC</u></a> project for our Baseboard Management Controller (BMC).</p><p>Below are examples of management functions carried out on a server through the BMC. The interactions in the examples are done over <a href="https://github.com/ipmitool/ipmitool/wiki"><u>ipmitool</u></a>, a command line utility for interacting with systems that support IPMI.</p>
            <pre><code># Check the sensor readings of a server remotely (i.e. over a network)
$  ipmitool &lt;some authentication&gt; &lt;bmc ip&gt; sdr
PSU0_CURRENT_IN  | 0.47 Amps         | ok
PSU0_CURRENT_OUT | 6 Amps            | ok
PSU0_FAN_0       | 6962 RPM          | ok
SYS_FAN          | 13034 RPM         | ok
SYS_FAN1         | 11172 RPM         | ok
SYS_FAN2         | 11760 RPM         | ok
CPU_CORE_VR_POUT | 9.03 Watts        | ok
CPU_POWER        | 76.95 Watts       | ok
CPU_SOC_VR_POUT  | 12.98 Watts       | ok
DIMM_1_VR_POUT   | 29.03 Watts       | ok
DIMM_2_VR_POUT   | 27.97 Watts       | ok
CPU_CORE_MOSFET  | 40 degrees C      | ok
CPU_TEMP         | 50 degrees C      | ok
DIMM_MOSFET_1    | 36 degrees C      | ok
DIMM_MOSFET_2    | 39 degrees C      | ok
DIMM_TEMP_A1     | 34 degrees C      | ok
DIMM_TEMP_B1     | 33 degrees C      | ok

…

# check the power status of a server remotely (i.e. over a network)
ipmitool &lt;some authentication&gt; &lt;bmc ip&gt; power status
Chassis Power is off

# power on the server
ipmitool &lt;some authentication&gt; &lt;bmc ip&gt; power on
Chassis Power Control: On</code></pre>
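<p>The same power and state information is also exposed over the BMC’s Redfish interface mentioned earlier. Below is a sketch (not production code) that pulls the standard <code>PowerState</code> property out of a Redfish ComputerSystem payload; the sample payload is illustrative, and a real BMC would serve it at a path like <code>/redfish/v1/Systems/&lt;id&gt;</code>:</p>

```python
import json

# Illustrative ComputerSystem payload, trimmed to the fields relevant here.
sample_payload = """
{
  "@odata.type": "#ComputerSystem.v1_13_0.ComputerSystem",
  "Id": "system",
  "PowerState": "Off",
  "Status": {"State": "Disabled", "Health": "OK"}
}
"""

def power_state(payload: str) -> str:
    """Return the Redfish PowerState ("On", "Off", ...) from a ComputerSystem document."""
    return json.loads(payload)["PowerState"]

print(power_state(sample_payload))  # → Off
```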
            <p>Switching to OpenBMC firmware for our BMCs gives us more control over the software that powers our infrastructure. It has given us more flexibility, more room for customization, and a more uniform experience for managing our servers. Since OpenBMC is open source, we also leverage community fixes while upstreaming some of our own. Some of the advantages we have experienced with OpenBMC include a faster turnaround time for fixing issues, <a href="https://blog.cloudflare.com/de-de/thermal-design-supporting-gen-12-hardware-cool-efficient-and-reliable/"><u>optimizations around thermal cooling</u></a>, <a href="https://blog.cloudflare.com/gen-12-servers/"><u>increased power efficiency</u></a> and <a href="https://blog.cloudflare.com/how-we-used-openbmc-to-support-ai-inference-on-gpus-around-the-world/"><u>supporting AI inference</u></a>.</p><p>While developing Cloudflare’s OpenBMC firmware, however, we ran into a number of boot problems.</p><p><b><i>Host not booting:</i></b> When we sent a request over IPMI for a host to power on (as in the power-on example above), ipmitool would report the host’s power status as ON, but we would see no power going into the CPU and no CPU activity. ipmitool was correct that chassis power was on, but it gave us no information about the power state of the rest of the server, and we initially (and wrongly) assumed that since the chassis power was on, the rest of the server components must be ON too. The <a href="https://documents.uow.edu.au/~blane/netapp/ontap/sysadmin/monitoring/concept/c_oc_mntr_bmc-sys-event-log.html"><u>System Event Log (SEL)</u></a>, which records platform-specific events, gave us no useful information beyond indicating that the server was in a soft-off state (powered off) or a working state (operating system loading and running), or that a “System Restart” of the host was initiated.</p>
            <pre><code># System Event Logs (SEL) showing the various power states of the server
$ ipmitool sel elist | tail -n3
  4d |  Pre-Init  |0000011021| System ACPI Power State ACPI_STATUS | S5_G2: soft-off | Asserted
  4e |  Pre-Init  |0000011022| System ACPI Power State ACPI_STATUS | S0_G0: working | Asserted
  4f |  Pre-Init  |0000011023| System Boot Initiated RESTART_CAUSE | System Restart | Asserted</code></pre>
            <p>In the System Event Logs shown above, ACPI stands for Advanced Configuration and Power Interface, a standard for power management on computing systems. In the ACPI soft-off state, the host is powered off (the motherboard is on standby power, but the CPU/host isn’t powered on); according to the <a href="https://uefi.org/sites/default/files/resources/ACPI_Spec_6_5_Aug29.pdf"><u>ACPI specifications</u></a>, this state is called S5_G2. (These states are discussed in more detail below.) In the ACPI working state, known in the ACPI specifications as S0_G0, the host has booted and is running (which in our case happened to be false). The third row indicates that the restart was caused by a System Restart. Most of the boot-related SEL events are sent from the UEFI to the BMC. The UEFI has been something of a black box to us, as we rely on our original equipment manufacturers (OEMs) to develop the UEFI firmware, and for the generation of servers with this issue, the UEFI firmware did not implement sending the host’s boot progress to the BMC.</p><p>One discrepancy we observed was between the reported power status and the power actually going into the CPU, which we read with a sensor we call CPU_POWER.</p>
            <pre><code># Check power status
$ ipmitool &lt;some authentication&gt; &lt;bmc ip&gt;  power status
Chassis Power is on
</code></pre>
            <p>However, checking the power into the CPU shows that the CPU was not receiving any power.</p>
            <pre><code># Check power going into the CPU
$ ipmitool &lt;some authentication&gt; &lt;bmc ip&gt;  sdr | grep CPU_POWER    
CPU_POWER        | 0 Watts           | ok</code></pre>
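<p>A mismatch like the one above (chassis power on, CPU drawing 0 watts) can be detected automatically. The following is a hedged sketch, not our monitoring code; it assumes the <code>ipmitool sdr</code> column layout shown earlier:</p>

```python
def parse_sdr_line(line: str):
    """Parse one `ipmitool sdr` row into (sensor name, numeric value)."""
    name, reading, _status = (field.strip() for field in line.split("|"))
    return name, float(reading.split()[0])

def host_really_on(chassis_on: bool, cpu_watts: float) -> bool:
    """Chassis power on with 0 W into the CPU means the host never actually booted."""
    return chassis_on and cpu_watts > 0.0

name, watts = parse_sdr_line("CPU_POWER        | 0 Watts           | ok")
print(name, watts, host_really_on(True, watts))  # → CPU_POWER 0.0 False
```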
            <p>The CPU_POWER reading of 0 watts contradicted all the previous indications that the host was powered up and working; the host was actually completely shut down.</p><p><b><i>Missing Memory Modules:</i></b> Our servers would randomly boot up with less memory than expected. Computers can boot with less memory than installed for a number of reasons, such as a loose connection, a hardware problem, or faulty memory. In our case, it was none of the usual suspects: both the BMC and the UEFI were trying to read from the memory modules simultaneously, leading to access contention. Memory modules usually contain a <a href="https://en.wikipedia.org/wiki/Serial_presence_detect"><u>Serial Presence Detect (SPD)</u></a>, which the UEFI uses to dynamically detect the memory module. The SPD is usually accessed over an <a href="https://learn.sparkfun.com/tutorials/i2c/all"><u>inter-integrated circuit (i2c)</u></a> bus, a low-speed, two-wire protocol for devices to talk to each other. The BMC also reads the temperature of the memory modules via i2c. When the server is powered on, the UEFI, amongst other hardware initializations, detects and initializes each memory module via its Serial Presence Detect (SPD). At the same time, the BMC could be trying to read the temperature of the same memory module over the same i2c bus. This simultaneous read attempt denies one of the parties access. When the UEFI is denied access to the SPD, it concludes that the memory module is not present and skips over it. Below is an example of the related i2c-bus contention logs we saw in the <a href="https://www.freedesktop.org/software/systemd/man/latest/journalctl.html"><u>journal</u></a> of the BMC while the host was booting.</p>
            <pre><code>kernel: aspeed-i2c-bus 1e78a300.i2c-bus: irq handled != irq. expected 0x00000021, but was 0x00000020</code></pre>
            <p>The log above indicates that the i2c controller at address 1e78a300 (which happens to be connected to the serial presence detect of the memory modules) could not properly handle a signal known as an interrupt request (irq). When the same contention hits the UEFI, the UEFI is unable to detect the memory module.</p>
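<p>Conceptually, the cure for this class of problem is making sure only one master touches the bus at a time. The toy model below (illustrative Python, not BMC code) stands in for the UEFI’s SPD reads and the BMC’s temperature polling; with a mutex around each transaction, neither reader ever observes the bus mid-transfer:</p>

```python
import threading

bus_lock = threading.Lock()  # models exclusive ownership of the shared i2c bus
bus_busy = False             # True while a bus transaction is in flight
collisions = 0               # transactions that observed another master mid-transfer

def i2c_read(n_reads):
    """Perform n_reads bus transactions, counting any overlap as a collision."""
    global bus_busy, collisions
    for _ in range(n_reads):
        with bus_lock:          # take the bus before touching it
            if bus_busy:
                collisions += 1
            bus_busy = True     # transaction in flight
            bus_busy = False    # transaction complete

spd_reader = threading.Thread(target=i2c_read, args=(1000,))   # stands in for UEFI SPD reads
temp_reader = threading.Thread(target=i2c_read, args=(1000,))  # stands in for BMC temperature polls
spd_reader.start(); temp_reader.start()
spd_reader.join(); temp_reader.join()
print(collisions)  # → 0: the lock serializes every transaction
```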
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Fe8wb6xqwXkanb8iPv8O2/eaecfe0474576a00cdc25bfeb6fba7a2/image4.png" />
          </figure><p><sup><i>Figure 3: I2C diagram showing I2C interconnection of the server’s memory modules (also known as DIMMs) with the BMC </i></sup></p><p><a href="https://www.techtarget.com/searchstorage/definition/DIMM"><u>DIMM</u></a> in Figure 3 refers to <a href="https://www.techtarget.com/searchstorage/definition/DIMM"><u>Dual Inline Memory Module</u></a>, the type of memory module used in servers.</p><p><b><i>Thermal telemetry:</i></b> During the boot-up process of some of our servers, some temperature devices, such as the temperature sensors of the memory modules, would show up as failed, causing some of the fans to enter a fail-safe <a href="https://en.wikipedia.org/wiki/Pulse-width_modulation"><u>Pulse Width Modulation (PWM)</u></a> mode. <a href="https://en.wikipedia.org/wiki/Pulse-width_modulation"><u>PWM</u></a> is a technique for controlling the power delivered to an electronic device by adjusting the duty cycle (the fraction of time the signal is on) of the signal driving it. Here it is used to control fan speed. When a fan enters fail-safe mode, its PWM duty cycle is set to a preset value, irrespective of what the optimized PWM setting should be, which can negatively affect the server’s cooling and power consumption.</p>
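<p>The fail-safe behaviour can be expressed as a small decision function. This is a sketch with made-up duty-cycle values, not our actual thermal configuration:</p>

```python
FAILSAFE_PWM = 100  # percent duty cycle; an assumed fail-safe value

def fan_pwm(sensor_readings, optimized_pwm):
    """Return the fan PWM duty cycle: fail-safe if any sensor failed to report."""
    if any(value is None for value in sensor_readings.values()):
        return FAILSAFE_PWM  # a missing reading forces maximum cooling
    return optimized_pwm

# All sensors healthy: the optimized setting is used.
print(fan_pwm({"DIMM_TEMP_A1": 34.0, "DIMM_TEMP_B1": 33.0}, 40))  # → 40
# One failed sensor drives the fans to the fail-safe duty cycle.
print(fan_pwm({"DIMM_TEMP_A1": None, "DIMM_TEMP_B1": 33.0}, 40))  # → 100
```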
    <div>
      <h2>Implementing host ACPI state on OpenBMC</h2>
      <a href="#implementing-host-acpi-state-on-openbmc">
        
      </a>
    </div>
    <p>While studying the issues we faced in the boot-up process of the host, we learned how the power state of the subsystems within the chassis changes. This led us to investigate the Advanced Configuration and Power Interface (ACPI) and how the ACPI state of the host changes during the boot process.</p><p>Advanced Configuration and Power Interface (ACPI) is an open industry specification for power management used in desktop, mobile, workstation, and server systems. The <a href="https://uefi.org/sites/default/files/resources/ACPI_Spec_6_5_Aug29.pdf"><u>ACPI Specification</u></a> replaces previous power management methodologies such as <a href="https://en.wikipedia.org/wiki/Advanced_Power_Management"><u>Advanced Power Management (APM)</u></a>. ACPI provides the advantages of:</p><ul><li><p>Allowing OS-directed power management (OSPM).</p></li><li><p>Providing a standardized and robust interface for power management.</p></li><li><p>Sending system-level events, such as when the server power/sleep buttons are pressed.</p></li><li><p>Supporting hardware and software features, such as a real-time clock (RTC) to schedule the server to wake from sleep, or to reduce the functionality of the CPU based on RTC ticks when there is a loss of power.</p></li></ul><p>From a power management perspective, ACPI enables OS-driven conservation of energy by transitioning components that are not in active use to a lower power state, thereby reducing power consumption and contributing to more efficient power management.</p><p>The ACPI Specification defines four global “Gx” states, six sleeping “Sx” states, and four “Dx” device power states. These states are defined as follows:</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span><strong>Gx</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Name</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Sx</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Description</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>G0</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Working</span></span></p>
                    </td>
                    <td>
                        <p><span><span>S0</span></span></p>
                    </td>
                    <td>
                        <p><span><span>The run state. In this state the machine is fully running.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>G1</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Sleeping</span></span></p>
                    </td>
                    <td>
                        <p><span><span>S1</span></span></p>
                    </td>
                    <td>
                        <p><span><span>A sleep state where the CPU will suspend activity but retain its contexts.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>S2</span></span></p>
                    </td>
                    <td>
                        <p><span><span>A sleep state where memory contexts are held, but CPU contexts are lost. CPU re-initialization is done by firmware.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>S3</span></span></p>
                    </td>
                    <td>
                        <p><span><span>A logically deeper sleep state than S2 where CPU re-initialization is done by device. Equates to Suspend to RAM.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>S4</span></span></p>
                    </td>
                    <td>
                        <p><span><span>A logically deeper sleep state than S3, in which DRAM context is not maintained and contexts are saved to disk. Can be implemented by either the OS or firmware.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>G2</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Soft off but PSU still supplies power</span></span></p>
                    </td>
                    <td>
                        <p><span><span>S5</span></span></p>
                    </td>
                    <td>
                        <p><span><span>The soft off state. All activity will stop, and all contexts are lost. The Complex Programmable Logic Device (CPLD) responsible for power-up and power-down sequences of various components e.g. CPU, BMC is on standby power, but the CPU/host is off.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>G3</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Mechanical off</span></span></p>
                    </td>
                    <td> </td>
                    <td>
                        <p><span><span>PSU does not supply power. The system is safe for disassembly.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Dx</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Name</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Description</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>D0</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Fully powered on</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Hardware device is fully functional and operational </span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>D1</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Hardware device is partially powered down</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Reduced functionality and can be quickly powered back to D0</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>D2</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Hardware device is in a deeper low-power state than D1</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Much more limited functionality and can only be slowly powered back to D0.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>D3</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Hardware device is significantly powered down or off</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Device is inactive with perhaps only the ability to be powered back on</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div><p>The states that matter to us are:</p><ul><li><p><b>S0_G0_D0:</b> often referred to as the working state. Here we know our host system is running just fine.</p></li><li><p><b>S2_D2: </b>Memory contexts are held, but CPU context is lost. We usually use this state to know when the host’s UEFI is performing platform firmware initialization.</p></li><li><p><b>S5_G2:</b> Often referred to as the soft off state. Here we still have power going into the chassis, however, processor and DRAM context are not maintained, and the operating system power management of the host has no context.</p></li></ul><p>Since the issues we were experiencing were related to the power state changes of the host — when we asked the host to reboot or power on — we needed a way to track the various power state changes of the host as it went from power off to a complete working state. This would give us better management capabilities over the devices that were on the same power domain of the host during the boot process. Fortunately, the OpenBMC community already implemented an <a href="https://github.com/openbmc/google-misc/tree/master/subprojects/acpi-power-state-daemon"><u>ACPI daemon</u></a>, which we extended to serve our needs. We added an ACPI S2_D2 power state, in which memory contexts are held, but CPU context is lost, to the ACPI daemon running on the BMC to enable us to know when the host’s UEFI is performing firmware initialization, and also set up various management tasks for the different ACPI power states.</p><p>An example of a power management task we carry out using the S0_G0_D0 state is to re-export our Voltage Regulator (VR) sensors on S0_G0_D0 state, as shown with the service file below:</p>
            <pre><code>cat /lib/systemd/system/Re-export-VR-device.service 
[Unit]
Description=RE Export VR Device Process
Wants=xyz.openbmc_project.EntityManager.service
After=xyz.openbmc_project.EntityManager.service
Conflicts=host-s2-state.target

[Service]
Type=simple
ExecStart=/bin/bash -c 'set -a &amp;&amp; source /usr/bin/Re-export-VR-device.sh on'
SyslogIdentifier=Re-export-VR-device.service

[Install]
WantedBy=host-s0-state.target
</code></pre>
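<p>The way these targets consume the host’s state can be sketched as a small state tracker. This is an illustrative model, not the actual daemon; the state names follow the ACPI convention used in this post:</p>

```python
# Valid forward transitions during a normal boot, per the flow described in this post.
BOOT_SEQUENCE = ["S5_G2", "S2_D2", "S0_G0_D0"]

class AcpiTracker:
    """Track the host's ACPI state and report whether a boot progressed normally."""

    def __init__(self):
        self.state = "S5_G2"  # soft-off: chassis powered, host off
        self.history = [self.state]

    def on_state(self, new_state):
        self.state = new_state
        self.history.append(new_state)

    def boot_completed(self):
        """A clean boot passes through soft-off, firmware init, then working."""
        return self.history[-3:] == BOOT_SEQUENCE

tracker = AcpiTracker()
tracker.on_state("S2_D2")     # UEFI performing platform initialization
tracker.on_state("S0_G0_D0")  # OS loaded and running
print(tracker.boot_completed())  # → True
```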
            <p>Having set this up, OpenBMC’s <a href="https://github.com/openbmc/phosphor-host-ipmid/tree/master"><u>phosphor-host-ipmid</u></a> provides a command handler (ipmiSetACPIState) that is responsible for setting the host’s ACPI state on the BMC. The host invokes it using the standard IPMI command with NetFn=0x06 and Cmd=0x06.</p><p>In the event of an immediate power cycle (i.e. the host reboots without an operating system shutdown), the host is unable to send its S5_G2 state to the BMC. For this case, we created a patch to OpenBMC’s <a href="https://github.com/openbmc/x86-power-control/tree/master"><u>x86-power-control</u></a> to let the BMC detect that the host has entered the ACPI S5_G2 state (i.e. soft-off). When the host comes out of the power-off state, the UEFI performs the Power On Self Test (POST) and sends S2_D2 to the BMC; after the UEFI has loaded the OS on the host, it notifies the BMC by sending the ACPI S0_G0_D0 state.</p>
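<p>For reference, the same report can be sent by hand with <code>ipmitool raw</code>. The byte encoding below reflects our reading of the IPMI Set ACPI Power State command (bit 7 of each data byte marks the state as being set, and the low bits encode the state), so treat it as a sketch to be verified against the IPMI specification:</p>

```python
# ACPI system/device power state encodings as we understand them from the
# IPMI Set ACPI Power State command (NetFn 0x06, Cmd 0x06).
SET_FLAG = 0x80  # bit 7: "set the state" rather than "no change"
SYSTEM_STATES = {"S0_G0": 0x00, "S5_G2": 0x05}
DEVICE_STATES = {"D0": 0x00, "D2": 0x02}

def set_acpi_state_cmd(system, device):
    """Build the ipmitool invocation that reports an ACPI state to the BMC."""
    sys_byte = SET_FLAG | SYSTEM_STATES[system]
    dev_byte = SET_FLAG | DEVICE_STATES[device]
    return f"ipmitool raw 0x06 0x06 {sys_byte:#04x} {dev_byte:#04x}"

print(set_acpi_state_cmd("S0_G0", "D0"))  # → ipmitool raw 0x06 0x06 0x80 0x80
```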
    <div>
      <h2>Fixing the issues</h2>
      <a href="#fixing-the-issues">
        
      </a>
    </div>
    <p>Going back to the boot-up issues we faced, we discovered that they were mostly caused by devices in the same power domain as the CPU interfering with the UEFI/platform firmware initialization phase. Below is a high-level description of the fixes we applied.</p><p><b><i>Servers not booting</i></b><b>:</b> After identifying the devices that were interfering with the POST stage of firmware initialization, we used the host ACPI state to control when we set the appropriate power mode for those devices, so as not to cause POST to fail.</p><p><b><i>Memory modules missing</i></b><b>:</b> During the boot-up process, memory modules (DIMMs) are powered and initialized in the S2_D2 ACPI state. During this initialization, UEFI firmware sends read commands to the Serial Presence Detect (SPD) on each DIMM to retrieve information for DIMM enumeration. At the same time, the BMC could be sending commands to read the DIMM temperature sensors. This can cause SMBus collisions, which could cause either the DIMM temperature readings or UEFI DIMM enumeration to fail. The latter case would cause the system to boot up with reduced DIMM capacity, which could be mistaken for a failing DIMM. After discovering the race condition, we stopped the BMC from reading the DIMM temperature sensors during the S2_D2 ACPI state and set a fixed speed for the corresponding fans. This solution allows the UEFI to retrieve all the necessary DIMM information for enumeration, and our servers now boot up with the correct amount of memory.</p><p><b><i>Thermal telemetry:</i></b> In the S0_G0 power state, when sensors are not reporting values back to the BMC, the BMC assumes that devices may be overheating and puts the fan controller into a fail-safe mode in which fan speeds are ramped up to maximum. However, in the S5_G2 state, some thermal sensors, such as the CPU temperature, NIC temperature, etc., are not powered and not available. Our solution is to mark these thermal sensors as non-functional in their exported configuration while in the S5_G2 state and during the transition from S5_G2 to S2_D2. Marking the affected devices as non-functional, instead of waiting for thermal sensor read commands to error out, prevents the controller from entering fail-safe mode.</p>
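<p>The per-state sensor handling can be sketched as a lookup from ACPI state to the sensors expected to be functional. The sensor names here are illustrative, not our exact configuration:</p>

```python
ALL_SENSORS = {"CPU_TEMP", "NIC_TEMP", "DIMM_TEMP_A1", "INLET_TEMP"}

# Sensors that are unpowered (and therefore expected to be absent) per ACPI state.
UNPOWERED = {
    "S5_G2": {"CPU_TEMP", "NIC_TEMP", "DIMM_TEMP_A1"},  # host off: only chassis sensors
    "S2_D2": {"CPU_TEMP", "NIC_TEMP"},                  # firmware init: DIMMs are powered
    "S0_G0_D0": set(),                                  # working: everything is powered
}

def functional_sensors(acpi_state):
    """Sensors the fan controller should consult in the given ACPI state."""
    return ALL_SENSORS - UNPOWERED[acpi_state]

print(sorted(functional_sensors("S5_G2")))  # → ['INLET_TEMP']
```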
    <div>
      <h2>Moving forward</h2>
      <a href="#moving-forward">
        
      </a>
    </div>
    <p>Aside from resolving issues, we have seen other benefits from implementing the ACPI power state in our BMC firmware. One example is our automated firmware regression testing. Various parts of our tests require rebooting/power cycling the servers over a hundred times, during which we monitor the ACPI power state changes of our servers rather than using a boolean (running or not running, pingable or not pingable) to assert their status.</p><p>It has also given us the opportunity to learn more about the complex subsystems in a server system and the various power modes of the different subsystems. This is an area we are still actively learning about as we look to further optimize the boot sequence of our servers.</p><p>Over time, implementing ACPI states is helping us achieve the following:</p><ul><li><p>All components are enabled by the end of the boot sequence,</p></li><li><p>The BIOS and BMC are able to retrieve component information,</p></li><li><p>And the BMC is aware when thermal sensors are in a non-functional state.
</p></li></ul><p>For better observability of the boot progress and “last state” of our systems, we have also started adding the BootProgress object of the <a href="https://redfish.dmtf.org/schemas/v1/ComputerSystem.v1_13_0.json"><u>Redfish ComputerSystem Schema</u></a> to our systems. This will give us pre-operating system (OS) boot observability and an easier starting point for debugging when the UEFI has issues during server platform initialization (such as when the server isn’t coming on).</p><p>Every day, Cloudflare’s OpenBMC team, made up of folks from different embedded backgrounds, learns about, experiments with, and deploys OpenBMC across our global fleet. This has been made possible by relying on the OpenBMC community’s contributions (as well as upstreaming some of our own), and by our interactions with our various vendors, giving us the opportunity to make our systems more reliable and to take ownership of, and responsibility for, the firmware that powers the BMCs that manage our servers. If you are thinking of embracing open-source firmware on your BMC, we hope this blog post, written by a team that started deploying OpenBMC less than 18 months ago, has inspired you to give it a try.</p><p>If you are considering making the jump to open-source firmware, check it out <a href="https://github.com/openbmc/openbmc"><u>here</u></a>!</p> ]]></content:encoded>
            <category><![CDATA[Infrastructure]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[OpenBMC]]></category>
            <category><![CDATA[Servers]]></category>
            <category><![CDATA[Firmware]]></category>
            <guid isPermaLink="false">2hySj1JFTXmlofjA6IRijm</guid>
            <dc:creator>Nnamdi Ajah</dc:creator>
            <dc:creator>Ryan Chow</dc:creator>
            <dc:creator>Giovanni Pereira Zantedeschi</dc:creator>
        </item>
        <item>
            <title><![CDATA[The effect of switching to TCMalloc on RocksDB memory use]]></title>
            <link>https://blog.cloudflare.com/the-effect-of-switching-to-tcmalloc-on-rocksdb-memory-use/</link>
            <pubDate>Wed, 03 Feb 2021 12:00:00 GMT</pubDate>
            <description><![CDATA[ The memory allocator is an important part of the system, so choosing the right allocator for a workload can bring huge benefits. Here is the story of how we decreased service memory usage by almost three times. ]]></description>
            <content:encoded><![CDATA[ <p>In previous posts we wrote about our configuration distribution system <a href="/introducing-quicksilver-configuration-distribution-at-internet-scale/">Quicksilver</a> and the story of <a href="/moving-quicksilver-into-production/">migrating its storage engine to RocksDB</a>. This solution proved to be fast, resilient, and stable. During the migration, we noticed that <a href="/tag/quicksilver/">Quicksilver</a> memory consumption was unexpectedly high. Our investigation found that the root cause was the default memory allocator we used. Switching the memory allocator improved the service’s memory consumption by almost a factor of three.</p>
    <div>
      <h3>Unexpected memory growth</h3>
      <a href="#unexpected-memory-growth">
        
      </a>
    </div>
    <p>After migrating to RocksDB, the memory used by the application increased significantly. The way memory grew over time also looked suspicious: it was around 15GB immediately after startup and then grew steadily for multiple days before stabilizing at around 30GB. Below, you can see the memory consumption increase after migrating one of our test instances to RocksDB.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/48oOnjn1zTcqYyKtQBeW0b/563b4460a530049d6879373a37ccfa11/image5-1.png" />
            
            </figure><p>We started our investigation with heap profiling, assuming we had a memory leak somewhere, and found that the heap size was almost three times smaller than the RSS value reported by the operating system. So, if our application does not actually use all this memory, it means that memory is ‘lost’ somewhere between the system and our application, which points to possible problems with the memory allocator.</p><p>We have multiple services running with the TCMalloc allocator, so in order to test our hypothesis, we ran a test with TCMalloc on a couple of instances. The test showed a significant improvement in memory usage. So why did this happen? We’ll dig into memory allocator internals to understand the issue.</p>
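<p>The gap we observed is simply the ratio between the profiled heap and the RSS the kernel reports. The sketch below parses <code>VmRSS</code> from a sample <code>/proc/&lt;pid&gt;/status</code> excerpt; the numbers are illustrative, not our production figures:</p>

```python
SAMPLE_STATUS = """\
Name:   quicksilver
VmPeak:   31457280 kB
VmRSS:    31457280 kB
VmData:   30408704 kB
"""

def vm_rss_kb(status_text):
    """Extract the resident set size (kB) from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("VmRSS not found")

rss_kb = vm_rss_kb(SAMPLE_STATUS)       # 30 GB resident, as seen by the kernel
heap_kb = 10 * 1024 * 1024              # ~10 GB reported by the heap profiler (illustrative)
print(round(rss_kb / heap_kb, 1))  # → 3.0
```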
    <div>
      <h3>glibc malloc</h3>
      <a href="#glibc-malloc">
        
      </a>
    </div>
    <p>Let’s begin with a high-level view of glibc’s malloc design. malloc uses a concept called an <code>arena</code>. An arena is a contiguous block of memory obtained from the system. An important part of glibc malloc’s design is that it expects developers to free memory in the reverse order of allocation; otherwise, a lot of memory will be ‘locked’ and never returned to the system. Let’s see what this means in practice:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3qbJXbtLNpoYVJltXOwZ5l/db3a7af30b7530c4d306aab178054dc3/image3-2.png" />
            
            </figure><p>In the picture, you can see an arena, from which we allocated three chunks of memory: 100kb, 40kb, 1kb. Next, the application frees the chunks with sizes of 40kb and 100kb:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3lc8IZCllTHUWs8tt5yYKY/6216c5ccd1210e6c9c4e0f6f565e9761/image7.png" />
            
            </figure><p>Before we go further, let me explain the terminology I use here and what each type of memory means:</p><ul><li><p>Free - this is virtual memory of a process, not backed by physical memory; it corresponds to the VIRT parameter of the top/ps command.</p></li><li><p>Used - memory used by the application, backed by physical memory; it contributes to the RES parameter of the top/ps command.</p></li><li><p>Available - memory held by the allocator, backed by physical memory. The allocator can either return this memory to the OS, making it ‘Free’, or later reuse it to satisfy application requests. From a system perspective, this memory is still held by the application. Available + Used = RES.</p></li></ul><p>So we see that the memory which was used by the application changed state to Available, and it is not returned to the operating system. This is because malloc can only return memory from the top of the heap, and in the case above we have a 1kb chunk of memory that blocks 140kb from being released back to the system. As soon as we release this 1kb object, all that memory can be returned to the system.</p><p>Let’s go further with our simple example: if our application allocates and frees memory without keeping malloc’s design in mind, after a while we will see roughly the following picture:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/29qGxcTcszscPid2itjZdv/088280e8b87b336723239a315a425be5/image6.png" />
            
            </figure><p>Here we see one of the main problems that all allocators try to solve: memory fragmentation. Some chunks are used by the application, but a lot of the memory is not used at the moment. And since it is not returned to the system, other services can’t use this memory either. malloc implements several mechanisms to decrease memory fragmentation, but fragmentation is a problem every allocator has, and how bad it gets depends on many factors: allocator design, workload, settings, etc.</p><p>OK, so the problem is clear: memory fragmentation. But why did it lead to such high memory usage? To understand that, let’s take a step back and consider how malloc works for highly concurrent multithreaded applications.</p><p>To allocate a chunk of memory from an arena, a thread must acquire an exclusive lock on that arena. When an application has multiple threads, this creates lock contention and poor performance for multithreaded services. To handle this situation, malloc creates several arenas, using the following logic:</p><ul><li><p>A thread tries to get a chunk of memory from the arena it used last time; to do that, it acquires an exclusive lock on the arena</p></li><li><p>If the lock is held by another thread, it tries the next arena</p></li><li><p>If all arenas are locked, it creates a new arena and takes memory from it</p></li><li><p>There is a limit on the number of arenas - eight arenas per core</p></li></ul><p>Normally, our service has around 25 threads, and we have seen 60-80 arenas allocated by malloc using the logic above.</p><p>And this is where the fragmentation problem magnifies and leads to huge memory waste. All arenas are independent of each other, and memory can never move from one arena to another. Why is that bad? Let’s take a look at the following example:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5jS0cCTFnudbPUHycTaXjB/18d99cf95cfdf8045a207f40292fc2dd/image4.png" />
            
            </figure><p>Here, we can see that Thread 1 requests 20kb of memory from Arena 1; as I wrote before, malloc tries to allocate memory from the same arena a thread used before. Since Arena 1 still has enough free memory, Thread 1 gets a block from it, which in the end increases the memory the process takes from the system. Ideally, in this scenario, we would prefer to get this block from Arena 2, since it has a chunk of that size available. However, due to the design, this won’t happen.</p><p>The main point here: having multiple independent arenas improves the performance of multithreaded applications by reducing lock contention, but the trade-off is increased memory fragmentation, since each memory request chooses the best-fit fragment from an individual arena, not the best-fit fragment overall.</p><p>Remember, I wrote that memory locked between used chunks can never be returned to the system? Actually, there is a way to do that: <code>malloc_trim</code> is a function provided by glibc malloc, and it does exactly that. It goes through all the unused chunks and returns them to the system. The problem is that you need to explicitly call this function from your application. You might say: “Oh, wait, I remember that this function is sometimes called when you call the free function, I saw it in the man page.” No, that never happens; it’s a bug in the man page that existed for more than 15 years, and is now finally <a href="https://lore.kernel.org/linux-man/CAB6khqWO_meFaNn+cTtaKBDg8Zus-o6HD49Bo3KChk-5GkdFng@mail.gmail.com/T/#u">fixed</a>!</p><p>Let’s now discuss what options we have to improve the memory consumption of glibc malloc. Here are a couple of useful strategies to try out:</p><ul><li><p>The first thing you will find on the Internet is to reduce <code>MALLOC_ARENA_MAX</code> to a lower value, usually 2. This setting limits the number of arenas malloc creates per core. The fewer arenas we have, the better the memory reuse and hence the lower the fragmentation, but at the same time lock contention increases.</p></li><li><p>Calling <code>malloc_trim</code> from time to time. This function goes through the arenas one at a time, locking each one and releasing its unused chunks back to the system. In the end this increases lock contention, executes a lot of syscalls to return the memory, and later leads to more page faults and, again, worse performance.</p></li><li><p><code>M_MMAP_THRESHOLD</code>. All allocations larger than this parameter use the mmap syscall and do not take memory from an arena directly. That means memory allocated this way is never locked between used chunks and can always be returned to the system. It solves the fragmentation problem for large chunks, so only small chunks can get locked. The trade-off is that each such allocation executes an expensive syscall, and there is a system limit that caps the maximum number of chunks allocated with mmap.</p></li></ul><p>Short summary: multiple arenas cause higher memory fragmentation that can lead to 2-3x higher memory consumption.</p>
    <div>
      <h3>TCMalloc</h3>
      <a href="#tcmalloc">
        
      </a>
    </div>
    <p>While glibc malloc was designed for single-threaded applications and later optimized for multithreaded services, TCMalloc was built for multithreading from the beginning. Let’s take a look at how it tries to solve the problems we just talked about. The TCMalloc design is more complex, so if you want to understand the details I recommend reading the official design <a href="https://google.github.io/tcmalloc/design.html">page</a>. Here is a high-level view of its design:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2rECXU30tUMO67GFrvNkAl/e14a5671d673d7814bfeee6aabe20d10/image1-1.png" />
            
            </figure><p>Here we can see the three main parts of the TCMalloc design:</p><ul><li><p>Back-end: allocates big chunks of memory from the system, returns these chunks to the operating system when they are no longer needed, and also serves big allocation requests.</p></li><li><p>Front-end: serves allocation requests; there is one cache per core.</p></li><li><p>Middle-end: the core part of the TCMalloc design, which helps significantly reduce fragmentation for multithreaded applications. It populates the caches and returns unused memory to the back-end, but most importantly it can move memory from one cache to another, dramatically improving memory reuse.</p></li></ul><p>Let’s look at how it works, using the example we showed for malloc:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/pKqecbbqmOGXr6ze3lcTj/89b5c985510fbbeee702645443034ce6/image2-3.png" />
            
            </figure><p>Here we see the following:</p><ol><li><p>Cache 2 has a chunk of memory that it doesn’t need, so it returns it to the middle-end</p></li><li><p>Thread 1 requests 20kb of memory from cache 1</p></li><li><p>Cache 1 doesn’t have a chunk of that size, so it requests the memory from the middle-end, where it can reuse the memory returned by cache 2</p></li></ol><p>This design dramatically improves memory reuse: if memory was freed by one thread, it can be moved to the middle-end and later reused by other threads.</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>The main goal of this post is to make people aware of the importance of the choice of memory allocator. After deploying TCMalloc, we decreased memory usage by 2.5 times.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/57QSNKEwjnx9XL1rSiQIhN/10445087ef19eb588637bbc8010268b4/image8.png" />
            
            </figure><p>Using an allocator that is not optimal for your workload can cause a huge waste of memory. If you have a long-running application with a lot of threads and care about memory usage, then glibc malloc is probably not the right choice. Allocators designed for multithreaded services, like TCMalloc, jemalloc, and others, can provide much better memory utilization. So be conscious of this factor, and go check how much memory your application wastes.</p> ]]></content:encoded>
            <category><![CDATA[Servers]]></category>
            <guid isPermaLink="false">5iICvhHG1pdCmXMclo2H5z</guid>
            <dc:creator>Dmitry Vorobev</dc:creator>
        </item>
    </channel>
</rss>