Subscribe to receive notifications of new posts:

We rebuilt Cloudflare's developer documentation - here's what we learned

05/27/2022

9 min read

We recently updated developers.cloudflare.com, the Cloudflare Developers documentation website, to a new version of our custom documentation engine. This change consisted of a significant migration from Gatsby to Hugo and converged a collection of Workers Sites into a single Cloudflare Pages instance. Together, these updates brought developer experience, performance, and quality of life improvements for our engineers, technical writers, and product managers.

In this blog post, we’ll cover the history of Cloudflare’s developer docs, why we made this recent transition, and why we continue to dogfood Cloudflare’s products as we develop applications internally.

What are Cloudflare’s Developer Docs?

Cloudflare’s Developer Docs, which are open source on GitHub, comprise documentation for all of Cloudflare’s products. The documentation is written by technical writers, product managers, and engineers at Cloudflare. Like many open source projects, contributions to the docs happen via Pull Requests (PRs). At time of writing, we have 1,600 documentation pages and have accepted almost 4,000 PRs, both from Cloudflare employees and external contributors in our community.

The underlying documentation engine we’ve used to build these docs has changed multiple times over the years. Documentation sites are often built with static site generators and, at Cloudflare, we’ve used tools like Hugo and Gatsby to convert thousands of Markdown pages into HTML, CSS, and JavaScript.

When we released the first version of our Docs Engine in mid-2020, we were excited about the facelift to our Developer Documentation site and the inclusion of dev-friendly features like dark mode and proper code syntax highlighting.

Cloudflare’s Developer Documentation Engine using 1.0 with its updated design.

Most importantly, we also used this engine to transition all of Cloudflare’s products with documentation onto a single engine. This allowed all Cloudflare product documentation to be developed, built, and deployed using the same core foundation. But over the next eighteen months and thousands of PRs, we realized that many of the architecture decisions we had made were not scaling.

While the user interface that we had made for navigating the documentation continued to receive great feedback from our users and product teams, decisions like using client-side rendering for docs had performance implications, especially on resource-constrained devices.

At the time, our decision to dogfood Workers Sites — which served as a precursor to Cloudflare Pages — meant that we could rapidly deploy our documentation across all of Cloudflare’s network in a matter of minutes. We implemented this by creating a separate Cloudflare Workers deployment for each product’s staging and production instances. Effectively, this meant that more than a hundred Workers were regularly updated, which caused significant headaches when trying to understand the causes and downstream effects of any failed deployments.

Finally, we struggled with our choice of underlying static site generator, Gatsby. We still think Gatsby is a great tool of choice for certain websites and applications, but we quickly found it to be the wrong match for our content-heavy documentation experience. Gatsby inherits many dependency chains to provide its featureset, but running the dependency-heavy toolchain locally on contributors’ machines proved to be an incredibly difficult and slow task for many of our documentation contributors.

When we did get to the point of deploying new docs changes, we began to be at the mercy of Gatsby’s long build times – in the worst case, almost an entire hour – just to compile Markdown and images into HTML. This negatively impacted our team’s ability to work quickly and efficiently as they improved our documentation. Ultimately, we were unable to find solutions to many of these issues, as they were core assumptions and characteristics of the underlying tools that we had chosen to build on — it was time for something new.

Built using Go, Hugo is incredibly fast at building large sites, has an active community, and is easily installable on a variety of operating systems. In our early discovery work, we found that Hugo would build our docs content in mere seconds. Since performance was a core motive for pursuing a rewrite, this was a significant factor in our decision.

How we migrated

When comparing frameworks, the most significant difference between Hugo and Gatsby – from a user’s standpoint – is the allowable contents of the Markdown files themselves. For example, Gatsby makes heavy use of MDX, allowing developers to author and import React components within their content pages. While this can be effective, MDX unfortunately is not CommonMark-compliant and, in turn, this means that its flavor of Markdown is required to be very flexible and permissive. This posed a problem when migrating to any other non-MDX-based solution, including Hugo, as these frameworks don’t grant the same flexibilities with Markdown syntax. Because of this, the largest technical challenge was converting the existing 1,600 markdown pages from MDX to a stricter, more standard Markdown variant that Hugo (or almost any framework) can interpret.

Not only did we have to convert 1,600 Markdown pages so that they’re rendered correctly by the new framework, we had to make these changes in a way that minimized the number of merge conflicts for when the migration itself was ready for deployment. There was a lot of work to be done as part of this migration – and work takes time! We could not stall or block the update cycles of the Developer Documentation repository, so we had to find a way to rename or modify every single file in the repository without gridlocking deployments for weeks.

The only way to solve this was through automation. We created a migration script that would apply all the necessary changes on the morning of the migration release day. Of course, this meant that we had to identify and apply the changes manually and then record that in JavaScript or Bash commands to make sweeping changes for the entire project.

For example, when migrating Markdown content, the migrator needs to take the file contents and parse them into an abstract syntax tree (AST) so that other functions can access, traverse, and modify a collection of standardized objects representing the content instead of resorting to a sequence string manipulations… which is scary and error-prone.

Since the project started with MDX, we needed a MDX-compatible parser which, in turn, produces its own AST with its own object standardizations. From there, one can “walk” – aka traverse – through the AST and add, remove, and/or edit objects and object properties. With the updated AST and a final traversal, a “stringifier” function can convert each object representation back to its string representation, producing updated file contents that differ from the original.

Below is an example snippet that utilizes mdast-util-from-markdown and mdast-util-to-markdown to create and stringify, respectively, the MDX AST and astray to traverse the AST with our custom modifications. For this example, we’re looking for heading and anchor nodes – both names are provided by the mdast-* utilities – so that we can read the primary header (<h1>) text and ensure that all internal Developer Documentation links are consistent:

import * as fs from 'fs';
import * as astray from 'astray';
import { toMarkdown } from 'mdast-util-to-markdown';
import { fromMarkdown } from 'mdast-util-from-markdown';

/**
 * @param {string} file The "*.md" file path.
 */
export async function modify(file) {
  let content = await fs.promises.read(file, 'utf8');
  let AST = fromMarkdown(content);
  let title = '';

  astray.walk(AST, {
    /**
     * Read the page's <h1> to determine page's title.
     */
    heading(node) {
      // ignore if not <h1> header
      if (node.depth !== 1) return;

      astray.walk(node, {
        text(t: MDAST.Text) {
          // Grab the text value of the H1
          title += t.value;
        },
      });

      return astray.SKIP;
    },
    
    /**
     * Update all anchor links (<a>) for consistent internal linking.
     */
    link(node) {
      let value = node.url;
      
      // Ignore section header links (same page)
      if (value.startsWith('#')) return;

      if (/^(https?:)?\/\//.test(value)) {
        let tmp = new URL(value);
        // Rewrite our own "https://developers.cloudflare.com" links
        // so that they are absolute, path-based links instead.
        if (tmp.origin === 'https://developers.cloudflare.com') {
          value = tmp.pathname + tmp.search + tmp.hash;
        }
      }
      
      // ... other normalization logic ...
      
      // Update the link's `href` value
      node.url = value;
    }
  });
  
  // Now the AST has been modified in place.
  // AKA, the same `AST` variable is (or may be) different than before.
  
  // Convert the AST back to a final string.
  let updated = toMarkdown(AST);
  
  // Write the updated markdown file
  await fs.promises.writeFile(file, updated);
}

https://gist.github.com/lukeed/d63a4561ce9859765d8f0e518b941642#file-cfblog-devdocs-0-js

The above is an abbreviated snippet of the modifications we needed to make during our migration. You may find all the AST traversals and manipulations we created as part of our migration on GitHub.

We also took this opportunity to analyze the thousands and thousands of code snippets we have throughout the codebase. These serve an important role as they are crucial aides in reference documentation or are presented alongside tutorials as recipes or examples. So we added a code formatter script that utilizes Prettier to apply a consistent code style across all code snippets. As a bonus side effect, Prettier would throw errors if any snippets had invalid syntax for their given language. Any of these were fixed manually and the `format` script has been added as part of our own CI process to ensure that all JavaScript, TypeScript, Rust, JSON, and/or C++ code we publish is syntactically valid!

Finally, we created a Makefile that coordinated the series of Node scripts and git commands we needed to make. This orchestrated the entire migration, boiling down all our work into a single make run command.

In effect, the majority of the migration Pull Request was the result of automated commits – over one million changes were applied across nearly 5,000 files in less than two minutes. With the help of product owners, we reviewed the newly generated documentation site and applied any fine-tuning adjustments where necessary.

Previously, with the Gatsby-based Workers Sites architecture, each Cloudflare product needed to be built and deployed as its own individual documentation site. These sites would then be managed and proxied by an umbrella Worker, listening on developers.cloudflare.com, which ensured that all requests were handled by the appropriate product-specific Worker Site. This worked well for our production needs, but made it complicated for contributors to replicate a similar setup during local development. With the move to Hugo, we were able to merge everything into a single project – in other words, 48 moving pieces became 1 single piece! This made it extremely easy to build and develop the entire Developer Docs locally, which is a big confidence booster when working.

A unified Hugo project also means that there’s only one build command and one deployable unit… This allowed us to move the Developer Docs to Cloudflare Pages! With Pages attached and configured for the GitHub repository, we immediately began making use of preview deployments as part of our PR review process and our production branch commits automatically queued new production deployments to the live site.

Why we’re excited

After all the changes were in place, we ended up with a near-identical replica of the Developer Documentation site. However, upon closer inspection, a number of major improvements had been made:

  1. Our application now has fewer moving pieces for development and deployment, which makes it significantly easier to understand and onboard other contributors and team members.
  2. Our development flow is a lot snappier and fully replicated the production behavior. This hugely increased our iteration speed and confidence.
  3. Our application was now built as an HTML-first static site. Even though it was always a content site, we are now shipping 90% less JavaScript bytes, which means that our visitors’ browsers are doing less work to view the same content.

The last point speaks to our web pages’ performance, which has real-world implications. These days, websites with faster page-load times are preferred over competitor sites with slower response times. This is true for human and bot users alike! In fact, this is so true that Google now takes page speed into consideration when ranking search results and offers tools like WebMasters and Lighthouse to help site owners track and improve their scores. Below, you can see the before-after comparison of our previous JS-first site to our HTML-first replacement:

The Lightouse result for the mobile Developer Documentation site while using Gatsby. The Performance score is a 55.
The Lighthouse result for the mobile Developer Documentation site after our updates. The Performance score is now a 99!

Here you can see that our Performance grade has significantly improved! It’s this figure, which is a weighted score of the Metrics like First Contentful Paint, that is tied to Page Speed. While this does have SEO impact, the SEO score in a Lighthouse report has to do with Google Crawler’s ability to parse and understand the page’s metadata. This remains unchanged because the content (and its metadata) were not changed as part of the migration.

Conclusion

Developer documentation is incredibly important to the success of any product. At Cloudflare, we believe that technical docs are a product – one that we can continue to iterate on, improve, and make more useful for our customers.

One of the most effective ways to improve documentation is to make it easier for our writers to contribute to them. With our new Documentation Engine, we’re giving our product content team the ability to validate content faster with instantaneous local builds. Preview links via Cloudflare Pages allows stakeholders like product managers and engineering teams the ability to quickly review what the docs will actually look like in production.

As we invest more into our build and deployment pipeline, we expect to further develop our ability to validate both the content and technical implementation of docs as part of review – tools like automatic spell checking, link validation, and visual diffs are all things we’d like to explore in the future.

Importantly, our documentation continues to be 100% open source. If you read Cloudflare’s developer documentation, and have feedback, feel free to check out the project on GitHub and submit suggestions!

We protect entire corporate networks, help customers build Internet-scale applications efficiently, accelerate any website or Internet application, ward off DDoS attacks, keep hackers at bay, and can help you on your journey to Zero Trust.

Visit 1.1.1.1 from any device to get started with our free app that makes your Internet faster and safer.

To learn more about our mission to help build a better Internet, start here. If you're looking for a new career direction, check out our open positions.
Technical WritingDeveloper Documentation

Follow on X

Kristian Freeman|@kristianf_
Luke Edwards|@lukeed05
Cloudflare|@cloudflare

Related posts