Identifying and alerting on data loss using Cloudflare Workers

News of data breaches seems to arrive almost every day. New regulations, such as GDPR, require companies to disclose data breaches within 72 hours of becoming aware of them. Becoming aware of and identifying breaches as they happen, however, is not an easy task: it is often hard for companies to learn of their own breaches and data losses before the media does.

One symptom of a data breach is data that should never leave internal systems (such as passwords or PII) making its way into an HTTP response on the public Internet. Since Cloudflare Workers sit between your infrastructure and the public for any endpoints exposed to the Internet, Workers can alert you when canary data (known values planted in your systems that should never appear in a response) leaves your infrastructure.

In the following example, we inspect the content of each response, check whether our canary data has leaked out, and if so, return a static response and call the PagerDuty API to raise an incident for the potential breach.

Detecting Data Loss

In this example, we look for a particular string in the body of the response. This string can be canary data planted in your database (in our example, the secret is “SHHHTHISISASECRET”, and we match on that exact string).

To get the body of the response, we use:

  let body = await response.text()

This reads the body of the response into the body variable. Note that since this method consumes the body, we have to construct a new Response object before returning it. If you are expecting JSON, you can call response.json() instead. Because it makes no sense to read images and other binary formats as text, we check the Content-Type of the response before trying to read the body.

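  // headers.get() returns null if the origin sent no Content-Type header
  const contentType = response.headers.get('Content-Type') || ''
  if (contentType.includes('text')) {
    let body = await response.text()
    if (body.includes('SHHHTHISISASECRET')) {
      response = new Response('Blocked.', {status: 403, headers: new Headers({'Private-block': 'true'})})
      return response
    }
    // The original body has been consumed, so rebuild the response from the text we read
    return new Response(body, {status: response.status, headers: response.headers})
  }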
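The Content-Type check is where this recipe is most fragile: if the origin omits the header, get() returns null, and the check also skips JSON bodies even though response.json() is mentioned above as an option. The sketch below factors both decisions into small helpers. The names shouldInspect and containsCanary, and the choice to treat application/json as inspectable, are our assumptions rather than part of the original recipe. It runs in a Worker or in Node.js 18+, where Response and Headers are globals.

```javascript
// Hypothetical helper: decide whether a response body should be read and
// scanned. Assumes textual bodies are text/* types or JSON.
function shouldInspect(response) {
  // headers.get() returns null when the header is absent
  const contentType = (response.headers.get('Content-Type') || '').toLowerCase()
  return contentType.includes('text') || contentType.includes('application/json')
}

// Hypothetical helper: true if any of the given canary strings appears in the body.
function containsCanary(body, canaries) {
  return canaries.some(canary => body.includes(canary))
}
```

In the Worker, the condition would become if (shouldInspect(response)), and containsCanary(body, ['SHHHTHISISASECRET']) would replace the single includes() call, which makes it easy to plant more than one canary value.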

By default, Workers stream responses back to the client, which improves performance and time to first byte (TTFB). Note that reading the body of the response in the Worker means the response will not be streamed: we must wait for the whole body to be read before we can check it for the string.

Returning a response to the client

Static block

In the example, we create a static response:

response = new Response('<html><h1>Blocked.</h1></html>', {status: 403, headers: new Headers({'Content-Type': 'text/html'})})

To make sure the browser can parse and display it properly, we set the Content-Type header to “text/html”. Because the origin has responded with data it should never have sent, we build a fresh set of headers instead of copying the origin’s, so that no additional information is reflected back to the client.
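The difference between passing the origin’s headers through and starting from an empty header set can be seen in this small sketch. Both function names are hypothetical, and X-Backend-Host is an invented example of an internal header an origin might leak.

```javascript
// Sketch: re-wrapping the origin response carries over every origin header.
function rewrap(body, originResponse) {
  return new Response(body, {status: originResponse.status, headers: originResponse.headers})
}

// Sketch: the static block page starts from an empty header set, so only
// the headers set explicitly here reach the client.
function blockedResponse() {
  return new Response('<html><h1>Blocked.</h1></html>', {
    status: 403,
    headers: new Headers({'Content-Type': 'text/html'})
  })
}
```

A response built by rewrap() would still expose X-Backend-Host, while blockedResponse() exposes only Content-Type.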

Rate limiting the response

In the example above, we are looking for a string that clearly should never leak. In some cases, however, you may want to detect data that your application legitimately returns, but still limit how often it is accessed.

Cloudflare Rate Limiting allows you to create Rate Limiting rules based on response headers and response status codes.

By defining a Rate Limiting rule that matches responses whose X-Rate-Limiting header is true, with a threshold of one request per minute, and then setting that header from the Worker, we can make sure each IP can access this data only once per minute.

Adding the header in the Worker:

response = new Response(body, {status: response.status, headers: new Headers({'X-Rate-Limiting': 'true'})})
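The one-liner above replaces all of the origin’s headers with the single rate limiting header, so the client also loses Content-Type and any caching headers. A minimal sketch that keeps the origin headers and adds the marker is shown below. The function name tagForRateLimiting is hypothetical, and we assume the Rate Limiting rule matches the string value 'true' (header values are always strings).

```javascript
// Sketch: tag a response for Cloudflare Rate Limiting while keeping the
// origin's own headers (Content-Type, caching headers, etc.).
function tagForRateLimiting(body, originResponse) {
  // Copy the origin headers into a new, independent Headers object
  const headers = new Headers(originResponse.headers)
  // Assumption: the rate limiting rule matches this header with value 'true'
  headers.set('X-Rate-Limiting', 'true')
  return new Response(body, {status: originResponse.status, headers})
}
```

Because the headers are copied first, the origin response object itself is left unmodified.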

Triggering PagerDuty

As our method of alerting on these incidents, we’ll be making an API call to PagerDuty.

To start, you will need to set up a Service to receive alerts and incidents. In the Integration settings, make sure you select the API option, since that allows us to make HTTP requests to PagerDuty directly from the Worker.

You will additionally need to set up an API key (under Configuration → API Access) to allow the Worker to create Incident events and trigger alerts.

When making the API call, we use event.waitUntil(). This serves two purposes:

1. We don’t want to block the response returned to the client by making it wait until the request to PagerDuty completes (this matters for the performance of critical tasks).
2. Outstanding asynchronous tasks are canceled as soon as a Worker finishes sending its main response body back to the client. event.waitUntil() ensures that the call to PagerDuty completes even after the response has been sent to the client.

async function createPagerDutyIncident(event) {
  // Build the incident payload with JSON.stringify so it is always valid JSON
  let body = JSON.stringify({
    incident: {
      type: 'incident',
      title: 'Potential data breach',
      service: {
        id: PD_SERVICE_ID,
        type: 'service_reference'
      }
    }
  })

  let PDInit = {
    method: 'POST',
    headers: new Headers({
      'Content-Type': 'application/json',
      'Accept': 'application/vnd.pagerduty+json;version=2',
      // PagerDuty expects the email address of a valid user in the From header
      'From': PD_FROM,
      'Authorization': `Token token=${PD_API_KEY}`
    }),
    body: body
  }
  // waitUntil() keeps the Worker alive until the PagerDuty request completes
  event.waitUntil(fetch('https://api.pagerduty.com/incidents', PDInit))
}
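The two ideas in this function, building the payload with a serializer and handing the request to waitUntil() rather than awaiting it, can be isolated and exercised outside a Worker. In the sketch below, buildIncidentBody and alertInBackground are hypothetical names, and send is a stand-in for the real fetch() call so that the control flow can be checked with a stub event object.

```javascript
// Sketch: build the incident payload with JSON.stringify so the request body
// is always valid JSON.
function buildIncidentBody(serviceId) {
  return JSON.stringify({
    incident: {
      type: 'incident',
      title: 'Potential data breach',
      service: {id: serviceId, type: 'service_reference'}
    }
  })
}

// Sketch: hand the alert to event.waitUntil() instead of awaiting it.
// `send` is a hypothetical stand-in for the fetch() call to
// https://api.pagerduty.com/incidents.
function alertInBackground(event, send, serviceId) {
  // waitUntil() registers the promise and returns immediately, so the caller
  // can return its response without waiting for PagerDuty to answer
  event.waitUntil(send(buildIncidentBody(serviceId)))
}
```

In the Worker, send would wrap fetch('https://api.pagerduty.com/incidents', PDInit), with PDInit’s body set to the string it receives.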

The Complete Worker

const PD_API_KEY = 'key'
const PD_FROM = 'email@gmail.com'
const PD_SERVICE_ID = 'ID'

addEventListener('fetch', event => {
 let response = handleRequest(event)
 event.respondWith(response)
})

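/**
 * Find canary data in the response
 * @param {FetchEvent} event
 */
async function handleRequest(event) {
  try {
    let request = event.request
    let response = await fetch(request)
    // Only check when the content type contains "text" (the header may be missing)
    const contentType = response.headers.get('Content-Type') || ''
    if (contentType.includes('text')) {
      let body = await response.text()
      if (body.includes('SHHHTHISISASECRET')) {
        response = new Response('<html><h1>Blocked.</h1></html>', {status: 403, headers: new Headers({'Content-Type': 'text/html'})})
        // Raise the alert without delaying the response to the client
        createPagerDutyIncident(event)
        return response
      }
      // The body was consumed above, so rebuild the response from the text
      return new Response(body, {status: response.status, headers: response.headers})
    }
    else {
      return response
    }
  }
  catch (e) {
    console.log(e)
    // respondWith() needs a Response; fail closed instead of returning undefined
    return new Response('Internal error', {status: 500})
  }
}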

async function createPagerDutyIncident(event) {
  // Build the incident payload with JSON.stringify so it is always valid JSON
  let body = JSON.stringify({
    incident: {
      type: 'incident',
      title: 'Potential data breach',
      service: {
        id: PD_SERVICE_ID,
        type: 'service_reference'
      }
    }
  })

  let PDInit = {
    method: 'POST',
    headers: new Headers({
      'Content-Type': 'application/json',
      'Accept': 'application/vnd.pagerduty+json;version=2',
      // PagerDuty expects the email address of a valid user in the From header
      'From': PD_FROM,
      'Authorization': `Token token=${PD_API_KEY}`
    }),
    body: body
  }
  // waitUntil() keeps the Worker alive until the PagerDuty request completes
  event.waitUntil(fetch('https://api.pagerduty.com/incidents', PDInit))
}
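To exercise the detection logic without deploying, the core of handleRequest can be pulled out into a function that takes an origin response instead of calling fetch itself. This refactoring, and the name inspectResponse, are ours rather than part of the recipe; it assumes a runtime with the Fetch API globals, such as a Worker or Node.js 18+.

```javascript
// Hypothetical refactoring of handleRequest's detection logic so it can be
// tested without a live origin. Returns the response to send and whether
// the canary was found.
async function inspectResponse(originResponse, canary) {
  const contentType = originResponse.headers.get('Content-Type') || ''
  if (!contentType.includes('text')) {
    // Non-text bodies pass through untouched
    return {leaked: false, response: originResponse}
  }
  const body = await originResponse.text()
  if (body.includes(canary)) {
    const blocked = new Response('<html><h1>Blocked.</h1></html>', {
      status: 403,
      headers: new Headers({'Content-Type': 'text/html'})
    })
    return {leaked: true, response: blocked}
  }
  // The original body was consumed, so build a new Response from the text
  return {
    leaked: false,
    response: new Response(body, {status: originResponse.status, headers: originResponse.headers})
  }
}
```

Inside the Worker, handleRequest would then reduce to awaiting inspectResponse(await fetch(event.request), 'SHHHTHISISASECRET'), calling createPagerDutyIncident(event) when leaked is true, and returning the response.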

Conclusion

Cloudflare Workers give you full control over each request and response that flows through Cloudflare. Being able to inspect the body of the response means that you can identify, alert on, and modify content sent back by your origin, and thus use Workers for tasks such as data loss prevention.

You can check out more uses and recipes for Workers here.

As always, we would love to hear what you are doing with Workers.

The Cloudflare Blog
