Blog What we do Support Community
Login Sign up

Upgrading Cloud Infrastructure Made Easier and Safer Using Cloudflare Workers and Workers KV

by Guest Author.

This is a guest post by Ben Chartrand, who is a Development Manager at Timely. You can check out some of Ben's other Workers projects on his GitHub and his blog.

At Timely we started a project to migrate our web applications from legacy Azure services to a modern PaaS offering. In theory it meant no code changes.

We decided to start with our webhooks. All our endpoints can be grouped into four categories:

  1. Integration with internal tools i.e. HelpScout, monitoring endpoint for PagerDuty
  2. Payment confirmations
  3. Calendar integrations i.e. Google Calendar
  4. SMS confirmations

Despite their limited number, these are vitally important. We did a lot of testing but it was clear we’d only really know if everything was working once we had production traffic. How could we migrate traffic?

Option 1

Change the CNAME to point to the new hosting infrastructure. This is high risk. DNS takes time to propagate so, if we needed to roll back, it would take time. We would also be shifting over everything at once.

Option 2

Use a traffic manager to shift a percentage of traffic using Cloudflare Load Balancing. We could start at, say, 5% traffic to the new infrastructure and, assuming everything appears to be ok, slowly increase the traffic.

In our case the vast majority of our traffic goes to our calendar integration endpoints. The other endpoints were unlikely to receive traffic, especially if started with just 5% of traffic. This wasn’t the best option.

Enter Option 3: Cloudflare Workers and Workers KV

I remember thinking: wouldn’t it be great if we could migrate traffic one endpoint at a time? We have about 20. We could start at the low risk endpoints and progressively move our way up.

We were able to write a Cloudflare Worker script that:

  • Detected the path i.e. /webhooks/paypal
  • If the path matched one our endpoints, we checked Workers KV (Key Value storage) to see if that endpoint was enabled. This was our feature flag / setting
  • If it was enabled and the path matched we redirected to the new infrastructure. This involved changing the domain but otherwise keeping the request as-is i.e. webhooks.currentdomain.com/webhooks/paypal to webhooks.newinfrastructure.com/webhooks/paypal

The first step was to add passThroughOnException mentioned in this post.

addEventListener('fetch', event => {
 event.passThroughOnException()
 event.respondWith(handleRequest(event))
})

Next, in the handleRequest method, I created a map of each endpoint (the path) and the corresponding Workers KV key, so I know where to look for the setting.

const endpoints = new Map()
   endpoints.set('/monitoring', 'monitoring')
   endpoints.set('/paypal', 'payPalIpnWebHook')
   // more endpoints
   endpoints.set('/helpscout', 'helpScoutWebHook')

Next I inspect the path for each request. If the path matches then we check the setting. If so, we set a redirect flag.

   for (var [key, value] of endpoints.entries()) {
     if (currentUrl.pathname.startsWith(key)) {
       const flag = await WEBHOOK_SETTINGS.get(value)
       if (flag == 1) {
         console.log(`redirected: ${key}`)
         redirect = true
         break
       }
     }
   }

If the redirect flag is true we change the hostname in the request but leave everything else as-is. This involves creating a new Request object. If we are not redirecting we fetch the request.

   // Handle the request
   let response = null
   if (redirect) {
     // Redirect to the new infra
     const newUrl = request.url.replace(currentHost, newHost)
     const init = {
         method: request.method,
         headers: request.headers,
         body: request.body
     }
     console.log(newUrl)
     const redirectedRequest = new Request(newUrl, init)
     console.log(redirectedRequest)

     response = await fetch(redirectedRequest)
   } else {
     // Handle with the existing infra
     response = await fetch(request)
   }

Complete Code

addEventListener('fetch', event => {
 event.passThroughOnException()
 event.respondWith(handleRequest(event))
})

function postLog(data) {
 return fetch("http://logs-01.loggly.com/inputs/<my id>/tag/http/", {
   method: "POST",
   body: data
 })
}

async function handleRequest(event) {
 try {
   const request = event.request
   const currentHost = 'webhooks.currentdomain.com'
   const newHost = 'webhooks.newinfrastructure.com'

   const currentUrl = new URL(request.url)
   let redirect = false

   // This is a map of the paths and the corresponding KV entry
   const endpoints = new Map()
   endpoints.set('/monitoring', 'monitoring')
   endpoints.set('/paypal', 'payPalIpnWebHook')
   // more endpoints
   endpoints.set('/helpscout', 'helpScoutWebHook')

   for (var [key, value] of endpoints.entries()) {
     if (currentUrl.pathname.startsWith(key)) {
       const flag = await WEBHOOK_SETTINGS.get(value)
       if (flag == 1) {
         console.log(`redirected: ${key}`)
         redirect = true
         break
       }
     }
   }

   // Handle the request
   let response = null
   if (redirect) {
     // Redirect to the new infra
     const newUrl = request.url.replace(currentHost, newHost)
     const init = {
         method: request.method,
         headers: request.headers,
         body: request.body
     }
     console.log(newUrl)
     const redirectedRequest = new Request(newUrl, init)
     console.log(redirectedRequest)

     response = await fetch(redirectedRequest)
   } else {
     // Handle with the existing infra
     response = await fetch(request)
   }

   return response
 } catch (error) {
   event.waitUntil(postLog(error))
   throw error
 }
}

Why use Workers KV?

We could have written everything as a hard coded script, which was updated each time to enable/disable redirection of traffic. This would require the team to make code changes and deploy the worker every time we wanted to make a change.

Using Workers KV, I enabled any member of the team to enable/disable endpoints using the Cloudflare API. To make things easier I created a Postman collection and shared it.

Go Live Problems - and Solutions!

We went live with our first endpoint. The Workers script and KV worked fine but I noticed a small number of exceptions were being reported in Workers > Worker Status.

Cloudflare provides Debugging Tips. I followed the section “Make subrequests to your debug server” and decided to incorporate Loggly. I could now catch the exceptions and send it to Loggly by running a POST using fetch to the URL provided by Loggly. With this I quickly determined what the problem was and corrected the issue.

Another problem that came up was a plethora of 403s. This was highly visible in the Workers > Status Code graph (the green).

Turns out our IIS box had rate limiting setup. Instead of returning a 429 (Too Many Requests), it returned 403 (Forbidden). Phew - it wasn’t an issue with my Worker or the new infrastructure!

We could have set up the rate limiting on the new infrastructure but we instead opted for Cloudflare Rate Limiting. It was cheap, easy to setup and meant the blocked requests didn’t even hit our infrastructure in the first place.

Where to From Here?

As I write this we’ve transitioned all traffic. All endpoints are enabled. Once we’re ready to decommission the old infrastructure we will:

  • Change the CNAME to point to the new infrastructure
  • Disable the worker
  • Celebrate!

We’ll then move onto our new web application, such as our API or main web app. We’re likely to use one of two options:

  1. Use the traffic manager to migrate a percentage of traffic
  2. Migrate traffic on a per-customer basis. It would be similar to above except we would store a setting per-customer (KV would store a setting per customer and we know the customer by the request header, which would have the customer ID). We could, for example, start with internal test accounts, then our beta users and, at the very end, migrate our VIPs.

Upgrading Cloud Infrastructure Made Easier and Safer Using Cloudflare Workers and Workers KV

comments powered by Disqus