With Cloudflare Workers, our JavaScript environment at the edge, it is possible to send traffic logs to arbitrary locations. In this post we discuss an example Worker implementation showing how to achieve this. So if you are building or maintaining your own traffic logging/analytics environment, read on.
To build the underlying script we are going to leverage subrequests. Subrequests, which can be spawned from the initial HTTP/S request, can be used to aggregate and compose a response from several back-end services or, as in the example discussed here, to post data to a specific endpoint. Subrequests can be made asynchronously, even after the initial request has been fully served, to avoid adding unnecessary latency to the main request.
The Worker Code
In this example we assume an Elastic stack has been set up at elk.example.com and has been configured to receive, via HTTP/S PUT requests, a number of fields for each log line. The full script that we are going to look at can be found below:
addEventListener('fetch', event => {
  event.respondWith(fetchAndLog(event));
})

async function fetchAndLog(event) {
  const response = await fetch(event.request);
  event.waitUntil(logToElk(event.request, response));
  return response;
}

async function logToElk(request, response) {
  var ray = request.headers.get('cf-ray') || '';
  var id = ray.slice(0, -4);
  var data = {
    'timestamp': Date.now(),
    'url': request.url,
    'referer': request.referrer,
    'method': request.method,
    'ray': ray,
    'ip': request.headers.get('cf-connecting-ip') || '',
    'host': request.headers.get('host') || '',
    'ua': request.headers.get('user-agent') || '',
    'cc': request.headers.get('Cf-Ipcountry') || '',
    'colo': request.cf.colo,
    'tlsVersion': request.cf.tlsVersion || '',
    'tlsCipher': request.cf.tlsCipher || '',
    'status': response.status,
  };
  var url = "https://elk.example.com/weblogs/logs/" + id + "?pipeline=weblogs&pretty";
  await fetch(url, {
    method: 'PUT',
    body: JSON.stringify(data),
    headers: new Headers({
      'Content-Type': 'application/json',
    })
  });
}
Let's look at the script in a little more detail:
addEventListener('fetch', event => {
  event.respondWith(fetchAndLog(event));
})

async function fetchAndLog(event) {
  const response = await fetch(event.request);
  event.waitUntil(logToElk(event.request, response));
  return response;
}
At the start of the script we listen for all request events and, on each request, call the fetchAndLog function. This function proxies the request as-is and asynchronously calls the logToElk function, which posts a log line to our ELK stack. Because logToElk is executed asynchronously via event.waitUntil, delays while logging to ELK will not affect the original request.
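Because all the logging happens inside event.waitUntil, the pattern is also easy to extend. The variant below is not part of the original script; it is a minimal sketch assuming you want to log only a sample of requests to limit subrequest volume, and to swallow logging errors explicitly:

async function fetchAndLog(event) {
  const response = await fetch(event.request);
  // Hypothetical 10% sampling rate to reduce logging subrequests.
  if (Math.random() < 0.1) {
    // Catch rejections so a failed log write is simply dropped.
    event.waitUntil(logToElk(event.request, response).catch(() => {}));
  }
  return response;
}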
Let's take a deeper look at the logToElk function.
var ray = request.headers.get('cf-ray') || '';
var id = ray.slice(0, -4);
var data = {
  'timestamp': Date.now(),
  'url': request.url,
  'referer': request.referrer,
  'method': request.method,
  'ray': ray,
  'ip': request.headers.get('cf-connecting-ip') || '',
  'host': request.headers.get('host') || '',
  'ua': request.headers.get('user-agent') || '',
  'cc': request.headers.get('Cf-Ipcountry') || '',
  'colo': request.cf.colo,
  'tlsVersion': request.cf.tlsVersion || '',
  'tlsCipher': request.cf.tlsCipher || '',
  'status': response.status,
};
The first part collects the data we wish to log. Some of these fields are standard for any HTTP request (e.g. the URL, the HTTP method, etc.); however, we also add fields specific to Cloudflare, such as the ray ID (a unique identifier of any HTTP request proxied via Cloudflare), the country code as provided by Cloudflare's IP-to-country logic, and the ID of the PoP/colo that the request is hitting.
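To make this concrete, a single logged document could look something like the following (every value below is invented purely for illustration):

// All values in this document are hypothetical examples.
{
  "timestamp": 1529422597442,
  "url": "https://www.example.com/some/path",
  "referer": "",
  "method": "GET",
  "ray": "42d655a5ffb437c3-LHR",
  "ip": "203.0.113.10",
  "host": "www.example.com",
  "ua": "Mozilla/5.0 (...)",
  "cc": "GB",
  "colo": "LHR",
  "tlsVersion": "TLSv1.2",
  "tlsCipher": "ECDHE-ECDSA-AES128-GCM-SHA256",
  "status": 200
}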
var url = "https://elk.example.com/weblogs/logs/" + id + " pipeline=weblogs&pretty"
await fetch(url, {
method: 'PUT',
body: JSON.stringify(data),
headers: new Headers({
'Content-Type': 'application/json',
})
})
Once we have all the fields we wish to log saved in our data variable, we need to perform a subrequest to PUT the log line to our backend ELK stack. By calling the fetch function we initiate a new HTTP request, specifying the request method, the body, and the additional headers expected by the ELK stack (in this case we are PUTting JSON content). The URL path encodes the Elasticsearch index and the document ID (the ray ID with the colo suffix stripped off), while the pipeline query parameter tells Elasticsearch which ingest pipeline to run the document through.
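That pipeline=weblogs parameter assumes an Elasticsearch ingest pipeline named weblogs already exists on the stack. Its exact contents are outside the scope of this post, but as a minimal sketch, such a pipeline could be created with a PUT to _ingest/pipeline/weblogs and a body along these lines (the date processor shown here is an assumption, not part of the original setup; it converts our epoch-milliseconds timestamp into a proper @timestamp field):

{
  "description": "Process web logs arriving from the Cloudflare edge",
  "processors": [
    {
      "date": {
        "field": "timestamp",
        "formats": ["UNIX_MS"],
        "target_field": "@timestamp"
      }
    }
  ]
}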
And that is it: with a small Worker script you can import Cloudflare traffic logs into your ELK stack in real time. If you found this useful, we have also talked about logging events and alerts to Sentry via Workers in the past, and community members have shared similar methods for other logging tools such as LogDNA.
Making it better with Argo Tunnel and Access
As an additional improvement, it is worth noting that an Elastic stack uses Kibana as a front-end visualisation interface, available over HTTP/S. The Kibana endpoint (let's assume kibana.example.com) can also be proxied via Cloudflare, but it is normally used only internally within an organization. The origin therefore needs to be protected and made accessible only to colleagues.
We can use two Cloudflare features to improve the Kibana deployment:
Argo Tunnel allows the Kibana origin to reach out directly to Cloudflare, avoiding the need for a publicly accessible IP address or hostname (see the example below);
Cloudflare Access allows you to integrate Cloudflare with your Identity and Access Management tool, and to define rules that specify which users or groups have access to the Kibana instance.
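For illustration, assuming Kibana is listening on its default port 5601 on the origin host, the cloudflared daemon could expose it through a tunnel with a single command along these lines (the hostname and port here are assumptions for this example):

cloudflared tunnel --hostname kibana.example.com --url http://localhost:5601

Access policies for kibana.example.com can then be defined from the Cloudflare dashboard, so that only authenticated colleagues can reach the interface.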
Logs for Everyone
Before we launched Workers, retrieving raw logs from Cloudflare was available only to Enterprise customers. With a little technical effort you can now start receiving logs from the edge by leveraging Workers, regardless of which plan you are on. Workers are available on a pay-per-usage model, and the ELK stack is open source.
Enterprise customers can still retrieve raw traffic logs with our Enterprise Log Share (ELS) feature. If ELS is turned on, all requests for the chosen application are logged and stored on Cloudflare infrastructure. The logs can be downloaded when required via a RESTful API, or pushed directly to Amazon S3 or Google Cloud (soon also Microsoft Azure) for further processing in your favorite log analysis tool. Logs with ELS are available within 10 minutes of the request being processed at the edge, regardless of location, and they contain a number of additional fields that are not yet available in the Worker environment. ELS also guarantees log delivery and retention periods, so you do not need to worry about load and storage.