The Cloudflare team was so excited to hear how Twilio Segment solved problems they encountered with tracking first-party data and personalization using Cloudflare Workers. We are happy to have guest bloggers Pooya Jaferian and Tasha Alfano from Twilio Segment share their story.
Introduction
Twilio Segment is a customer data platform that collects, transforms, and activates first-party customer data. Segment helps developers collect user interactions within an application, form a unified customer record, and sync it to hundreds of different marketing, product, analytics, and data warehouse integrations.
There are two “unsolved” problems with app instrumentation today:
Problem #1: Many important events that you want to track happen in the “wild west” of the client, but collecting those events on the client can lead to low data quality, as events are dropped due to user configurations, browser limitations, and network connectivity issues.
Problem #2: Applications need access to real-time (<50ms) user state to personalize the application experience based on advanced computations and segmentation logic that must be executed on the cloud.
The Segment Edge SDK – built on Cloudflare Workers – solves for both. With Segment Edge SDK, developers can collect high-quality first-party data. Developers can also use Segment Edge SDK to access real-time user profiles and state, to deliver personalized app experiences without managing a ton of infrastructure.
This post goes deep on how and why we built the Segment Edge SDK. We chose the Cloudflare Workers platform as the runtime for our SDK for a few reasons. First, we needed a scalable platform to collect billions of events per day, and because Workers run with no cold starts, they were the right choice. Second, our SDK needed a fast storage solution, and Workers KV fit our needs perfectly. Third, we wanted our SDK to be easy to use and deploy, and the ease and speed of deploying Workers was a great fit.
It is important to note that the Segment Edge SDK is in early development stages, and any features mentioned are subject to change.
Serving a JavaScript library 700M+ times per day
analytics.js is our core JavaScript UI SDK that allows web developers to send data to any tool without having to learn, test, or use a new API every time.
Figure 1 illustrates how Segment can be used to collect data on a web application. Developers add Segment’s web SDK, analytics.js, to their websites by including a JavaScript snippet in the <head> of their web pages. The snippet can immediately collect and buffer events while it also loads the full library asynchronously from the Segment CDN. Developers can then use analytics.js to identify visitors, e.g., analytics.identify('john'), and track user behavior, e.g., analytics.track('Order Completed'). Calling analytics.js methods such as identify or track sends data to Segment’s API (api.segment.io). Segment’s platform can then deliver the events to different tools, as well as create a profile for the user (e.g., build a profile for user “John”, associate “Order Completed” with it, and add all of John’s future activities to the profile).
Analytics.js also stores state in the browser as first-party cookies (e.g., storing an ajs_user_id cookie with the value john, scoped to the example.com domain) so that when the user visits the website again, the user identifier stored in the cookie can be used to recognize the user.
Figure 1: How analytics.js loads on a website and tracks events
While analytics.js only tracks first-party data (i.e., the data is collected and used by the website that the user is visiting), certain browser controls incorrectly identify analytics.js as a third-party tracker, because the SDK is loaded from a third-party domain (cdn.segment.com) and the data is sent to a third-party domain (api.segment.io). Furthermore, despite using first-party cookies to store user identity, some browsers such as Safari have limited the TTL of non-HTTPOnly cookies to 7 days, making it challenging to maintain state for long periods of time.
To overcome these limitations, we have built a Segment Edge SDK (currently in early development) that can automatically add Segment’s library to a web application, eliminate the use of third-party domains, and maintain user identity using HTTPOnly cookies. In the process of solving the first-party data problem, we realized that the Edge SDK is best positioned to act as a personalization library, given that it has access to the user identity on every request (in the form of cookies) and can resolve that identity to a full user profile stored in Segment. The user profile information can be used to deliver personalized content to users directly from the Cloudflare Workers platform.
The remaining portions of this post will cover how we solved the above problems. We first explain how the Edge SDK helps with first-party collection. Then we cover how the Segment profiles database becomes available on the Cloudflare Workers platform, and how to use such data to drive personalization.
Segment Edge SDK and first-party data collection
Developers can set up the Edge SDK by creating a Cloudflare Worker that sits in front of their web application (via Routes) and importing the Edge SDK via npm. The Edge SDK handles requests and automatically injects the analytics.js snippet into every webpage. It also configures first-party endpoints to download the SDK assets and send tracking data. Finally, the Edge SDK captures user identity by inspecting the Segment events and instructs the browser to store that identity in HTTPOnly cookies.
import { Segment } from "@segment/edge-sdk-cloudflare";
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const segment = new Segment(env.SEGMENT_WRITE_KEY);
    const resp = await segment.handleEvent(request, env);
    return resp;
  },
};
How the Edge SDK works under the hood to enable first-party data collection
The Edge SDK's internal router checks the inbound request URL against predefined patterns. If the URL matches a route, the router runs the route's chain of handlers to process the request, fetch the origin, or modify the response.
export interface HandlerFunction {
  (
    request: Request,
    response: Response | undefined,
    context: RouterContext
  ): Promise<[Request, Response | undefined, RouterContext]>;
}
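To make the idea of a handler chain concrete, here is a minimal sketch of how a router could thread the [request, response, context] tuple through a list of handlers in order, each one receiving what the previous one returned. The generic Handler type and the runChain function are hypothetical names for illustration, not the Edge SDK's internals, and the toy handlers use strings in place of Request and Response objects.

```typescript
// Hypothetical generic version of the HandlerFunction signature above.
type Handler<Req, Res, Ctx> = (
  request: Req,
  response: Res | undefined,
  context: Ctx
) => Promise<[Req, Res | undefined, Ctx]>;

// Runs handlers in order; each receives the tuple returned by the previous one.
async function runChain<Req, Res, Ctx>(
  handlers: Handler<Req, Res, Ctx>[],
  request: Req,
  context: Ctx
): Promise<[Req, Res | undefined, Ctx]> {
  let state: [Req, Res | undefined, Ctx] = [request, undefined, context];
  for (const handler of handlers) {
    state = await handler(state[0], state[1], state[2]);
  }
  return state;
}

// Toy example: one handler "fetches the origin", the next modifies the response.
type DemoContext = { steps: string[] };

const fetchOrigin: Handler<string, string, DemoContext> = async (req, _res, ctx) => [
  req,
  `<html><head></head>${req}</html>`,
  { steps: [...ctx.steps, "fetchOrigin"] },
];

const tagResponse: Handler<string, string, DemoContext> = async (req, res, ctx) => [
  req,
  res?.replace("<head></head>", "<head><script></script></head>"),
  { steps: [...ctx.steps, "tagResponse"] },
];
```

Because every handler returns a fresh tuple, a handler can replace the request, produce or rewrite the response, or add information to the context without knowing which handlers run before or after it.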
Figure 2 demonstrates the routing of incoming requests. The Worker calls the segment.handleEvent method with the request object (step 1), then the router matches request.url and request.method against a set of predefined routes:
- GET requests with the /seg/assets/* path are proxied to the Segment CDN (step 2a)
- POST requests with the /seg/events/* path are proxied to the Segment tracking API (step 2b)
- Other requests are proxied to the origin (step 2c), and the HTML responses are enriched with the analytics.js snippet (step 3)
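The matching rules in the route list can be sketched as a small pure function. This assumes simple prefix matching on the URL path; matchRoute and the Route labels are hypothetical names, and the SDK's actual route patterns may be more elaborate.

```typescript
// Hypothetical route labels corresponding to steps 2a, 2b, and 2c in Figure 2.
type Route = "assets" | "events" | "origin";

// Decides which branch of the router handles a request, based on method and path.
function matchRoute(method: string, pathname: string): Route {
  const m = method.toUpperCase();
  if (m === "GET" && pathname.startsWith("/seg/assets/")) {
    return "assets"; // proxy to the Segment CDN (step 2a)
  }
  if (m === "POST" && pathname.startsWith("/seg/events/")) {
    return "events"; // proxy to the Segment tracking API (step 2b)
  }
  return "origin"; // fetch from the origin, then inject analytics.js (steps 2c and 3)
}
```

Inside a Worker it would be called as matchRoute(request.method, new URL(request.url).pathname). Note that the method matters: a GET to /seg/events/track falls through to the origin.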
Regardless of the route, the router eventually returns a response to the browser (step 4) containing data from the origin, the response from the Segment tracking API, or analytics.js assets. When the Edge SDK detects the user identity in an incoming request (more on that later), it sets an HTTPOnly cookie in the response headers to persist the user identity in the browser.
Figure 2: Edge SDK router flow
In the subsequent three sections, we explain how we inject analytics.js, proxy Segment endpoints, and set server-side cookies.
Injecting Segment SDK on requests to origin
For all incoming requests routed to the origin, the Edge SDK fetches the HTML page and then adds the analytics.js snippet to the <head> tag, embeds the write key, and configures the snippet to download the subsequent JavaScript bundles from the first-party domain ([first-party host]/seg/assets/*) and to send data to the first-party domain as well ([first-party host]/seg/events/*). This is accomplished using the HTMLRewriter API.
import snippet from "@segment/snippet"; // Existing Segment package that generates snippet
class ElementHandler {
  // Parameter properties store host and writeKey on the instance for element()
  constructor(private host: string, private writeKey: string) {}
  element(element: Element) {
    // generate Segment snippet and configure it with first-party host info
    const snip = snippet.min({
      host: `${this.host}/seg`,
      apiKey: this.writeKey,
    });
    element.append(`<script>${snip}</script>`, { html: true });
  }
}
export const enrichWithAJS: HandlerFunction = async (
  request,
  response,
  context
) => {
  const {
    settings: { writeKey },
  } = context;
  const host = request.headers.get("host") || "";
  // Nothing to rewrite if no response has been fetched from the origin yet
  if (!response) {
    return [request, response, context];
  }
  return [
    request,
    new HTMLRewriter()
      .on("head", new ElementHandler(host, writeKey))
      .transform(response),
    context,
  ];
};
Proxy SDK bundles and Segment API
The Edge SDK proxies the Segment CDN and API under the first-party domain. For example, when the browser loads a page with the injected analytics.js snippet, the snippet loads the full analytics.js bundle from https://example.com/seg/assets/sdk.js, and the Edge SDK proxies that request to the Segment CDN:
https://cdn.segment.com/analytics.js/v1/<WRITEKEY>/analytics.min.js
export const proxyAnalyticsJS: HandlerFunction = async (request, response, ctx) => {
  const url = `https://cdn.segment.com/analytics.js/v1/${ctx.params.writeKey}/analytics.min.js`;
  const resp = await fetch(url);
  return [request, resp, ctx];
};
Similarly, analytics.js collects events and sends them via a POST request to https://example.com/seg/events/[method], and the Edge SDK proxies such requests to the Segment tracking API:
https://api.segment.io/v1/[method]
export const handleAPI: HandlerFunction = async (request, response, context) => {
  const url = new URL(request.url);
  const parts = url.pathname.split("/");
  const method = parts.pop(); // last path segment, e.g. "track" for /seg/events/track
  const body: { [key: string]: any } = await request.json();
  const init = {
    method: "POST",
    headers: request.headers,
    body: JSON.stringify(body),
  };
  // Forward the event to the matching Segment tracking API method
  const resp = await fetch(`https://api.segment.io/v1/${method}`, init);
  return [request, resp, context];
};
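The two proxy rules above amount to a mapping from first-party paths to Segment URLs. The sketch below expresses that mapping as a pure function so it can be checked in isolation. upstreamUrl is a hypothetical helper; it only covers the sdk.js asset because the post does not show how other paths under /seg/assets/* are mapped, and rejecting nested event paths is an assumption added so arbitrary paths are not forwarded.

```typescript
// Maps a first-party path to the Segment URL it is proxied to.
// Returns null for paths that should be handled by the origin instead.
function upstreamUrl(pathname: string, writeKey: string): string | null {
  if (pathname === "/seg/assets/sdk.js") {
    return `https://cdn.segment.com/analytics.js/v1/${writeKey}/analytics.min.js`;
  }
  const prefix = "/seg/events/";
  if (pathname.startsWith(prefix)) {
    const method = pathname.slice(prefix.length); // e.g. "track", "identify"
    // Only forward a single, non-empty method segment
    if (method.length > 0 && !method.includes("/")) {
      return `https://api.segment.io/v1/${method}`;
    }
  }
  return null;
}
```

For example, upstreamUrl("/seg/events/track", writeKey) yields https://api.segment.io/v1/track, matching the [method] placeholder in the URL pattern above.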
First-party server-side cookies
The Edge SDK also rewrites existing client-side analytics.js cookies as HTTPOnly cookies. When the Edge SDK intercepts an identify event, e.g., analytics.identify('john'), it extracts the user identity (“john”) and then sets a server-side cookie when sending a response back to the user. Therefore, any subsequent request to the Edge SDK can be associated with “john” using request cookies.
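export const enrichResponseWithIdCookies: HandlerFunction = async (
  request, response, context) => {
  const host = request.headers.get("host") || "";
  const body = await request.json();
  const userId = body.userId;
  […]
  const headers = new Headers(response.headers);
  // `cookie` is a cookie-serialization module (its import is not shown here);
  // naming the local variable `cookie` as well would shadow it and throw.
  const idCookie = cookie.stringify("ajs_user_id", userId, {
    httponly: true,
    path: "/",
    maxage: 31536000, // one year, in seconds
    domain: host,
  });
  headers.append("Set-Cookie", idCookie);
  // Spreading a Response does not copy status/statusText, so pass them explicitly
  const newResponse = new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
  return [request, newResponse, newContext];
};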
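The handler above relies on a cookie library to serialize the Set-Cookie header. As an illustrative alternative, not the SDK's code, the same attributes can be assembled by hand. userIdFromEvent and buildIdCookie are hypothetical helpers; stripping a port from the Host header is an assumption added here because the Domain attribute cannot carry a port.

```typescript
const ONE_YEAR_SECONDS = 60 * 60 * 24 * 365; // 31536000, same as maxage above

// Pulls a userId out of an identify payload such as { "userId": "john", ... }.
function userIdFromEvent(body: unknown): string | null {
  if (typeof body !== "object" || body === null) return null;
  const id = (body as { userId?: unknown }).userId;
  return typeof id === "string" && id.length > 0 ? id : null;
}

// Builds the Set-Cookie value with the same attributes as the handler above:
// HttpOnly, Path=/, a one-year Max-Age, and the request's host as Domain.
function buildIdCookie(userId: string, host: string): string {
  const domain = host.replace(/:\d+$/, ""); // "example.com:8787" -> "example.com"
  return [
    `ajs_user_id=${encodeURIComponent(userId)}`,
    `Max-Age=${ONE_YEAR_SECONDS}`,
    `Domain=${domain}`,
    "Path=/",
    "HttpOnly",
  ].join("; ");
}
```

In a handler, the result would be passed to headers.append("Set-Cookie", ...) only when userIdFromEvent returns a non-null id, so anonymous events do not overwrite an existing identity.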
Intercepting the ajs_user_id cookie in the Worker, and using the cookie identifier to associate each request with a user, is quite powerful, and it opens the door to delivering personalized content to users. The next section covers how the Edge SDK can drive personalization.
Personalization on the Supercloud
The Edge SDK offers a registerVariation method that can customize how a request to a given route is fetched from the origin. For example, let's assume we have three versions of a landing page in the origin: /red, /green, and / (default), and we want to deliver one of the three versions based on the visitor's traits. We can use the Edge SDK as follows:
const segment = new Segment(env.SEGMENT_WRITE_KEY);
segment.registerVariation("/", (profile) => {
  if (profile.red_group) {
    return "/red";
  } else if (profile.green_group) {
    return "/green";
  }
  // No matching trait: no variation path is returned
});
const resp = await segment.handleEvent(request, env);
return resp;
registerVariation accepts two inputs: the path that displays the personalized content, and a decision function that returns the origin address of the personalized content. The decision function receives the visitor's profile object from Segment. In the example, when users visit example.com/ (the root path), personalized content is delivered by checking whether the visitor has a red_group or green_group trait and then requesting the content from either the /red or the /green path at the origin.
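One way to picture what registerVariation keeps track of is a map from paths to decision functions, consulted when a request arrives. The VariationRegistry class below is a hypothetical sketch, not the Edge SDK's implementation; in particular, falling back to the requested path when there is no profile or the decision function returns nothing is an assumption.

```typescript
// Hypothetical profile shape: trait names mapped to values.
type Profile = Record<string, unknown>;
type DecisionFn = (profile: Profile) => string | undefined;

// Minimal registry: path -> decision function returning the origin path to fetch.
class VariationRegistry {
  private variations = new Map<string, DecisionFn>();

  register(path: string, decide: DecisionFn): void {
    this.variations.set(path, decide);
  }

  // Returns the origin path to fetch: the chosen variation, or the requested path itself.
  resolve(path: string, profile: Profile | undefined): string {
    const decide = this.variations.get(path);
    if (!decide || !profile) {
      return path;
    }
    return decide(profile) ?? path;
  }
}
```

Registering the decision function from the example, registry.resolve("/", { red_group: true }) returns "/red", while a visitor with neither trait is served the default "/" page.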
We already explained that the Edge SDK knows the identity of the user via the ajs_user_id cookie, but we haven’t covered how the Edge SDK has access to the full profile object. The next section explains how the full profile becomes available on the Cloudflare Workers platform.
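Knowing the user's identity means reading the ajs_user_id value back out of the Cookie request header. Below is a minimal sketch, assuming the value is stored as a plain, optionally percent-encoded string; getCookie is a hypothetical helper rather than an Edge SDK export, and in practice a cookie-parsing library would usually do this.

```typescript
// Reads one cookie value from a raw Cookie request header,
// e.g. "theme=dark; ajs_user_id=john". Returns null when the cookie is absent.
function getCookie(header: string | null, name: string): string | null {
  if (!header) return null;
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip malformed pairs
    const key = part.slice(0, eq).trim();
    if (key === name) {
      const raw = part.slice(eq + 1).trim();
      try {
        return decodeURIComponent(raw);
      } catch {
        return raw; // leave malformed percent-encoding as-is
      }
    }
  }
  return null;
}
```

In a Worker this would be called as getCookie(request.headers.get("cookie"), "ajs_user_id").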
How does personalization work under the hood?
The personalization feature of the Edge SDK requires storing profiles on the Cloudflare Workers platform. A Workers KV namespace should be created for the Worker running the Edge SDK and passed to the Edge SDK during initialization. The Edge SDK stores profiles in KV, where keys are ajs_user_id values and values are serialized profile objects. To move profile data from Segment to KV, the SDK uses two methods:
- Profiles data push from Segment to the Cloudflare Workers platform: The Segment product can sync the user profiles database with different tools, including pushing the data to a webhook. The Edge SDK automatically exposes a webhook endpoint under the first-party domain (e.g., example.com/seg/profiles-webhook) that Segment can call periodically to sync user profiles. The webhook handler receives incoming sync calls from Segment and writes profiles to KV.
- Pulling data from Segment by the Edge SDK: If the Edge SDK queries KV for a user ID and doesn't find the profile (i.e., the data hasn't synced yet), it requests the user profile from the Segment API and stores it in KV.
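The push path can be sketched as a handler that writes each incoming profile to KV under its user ID, serialized as JSON. The payload shape here is a hypothetical stand-in, since the post does not describe the format Segment sends to the webhook, and ProfileStore is a minimal interface covering only the get and put calls that a Workers KV namespace binding provides for string values.

```typescript
// Minimal subset of the Workers KV API used here (string values only).
interface ProfileStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Hypothetical webhook payload: the actual format Segment sends may differ.
interface ProfileSyncPayload {
  userId: string;
  traits: Record<string, unknown>;
}

// Writes each synced profile under its user id; returns how many were stored.
async function handleProfilesWebhook(
  payloads: ProfileSyncPayload[],
  kv: ProfileStore
): Promise<number> {
  let written = 0;
  for (const { userId, traits } of payloads) {
    if (!userId) continue; // cannot be keyed by ajs_user_id without an id
    await kv.put(userId, JSON.stringify(traits));
    written++;
  }
  return written;
}
```

Keying by user ID is what lets the personalization path look a profile up directly from the ajs_user_id cookie value with a single KV read.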
Figure 3 demonstrates how the personalization flow works. In step 1, the user requests content for the root path (/), and the Worker sends the request to the Edge SDK (step 2). The Edge SDK router determines that a variation is registered on the route, so it extracts the ajs_user_id from the request cookies and goes through the full profile extraction (step 3). The SDK first checks KV for a record whose key is the ajs_user_id value and, if none is found, queries the Segment API to fetch the profile and stores the profile in KV. Eventually, the profile is extracted and passed into the decision function to decide which path should be served to the user (step 4). The router then fetches the variation from the origin (step 5) and returns the response under the / path to the browser (step 6).
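The profile lookup in step 3 is a read-through cache: KV first, then Segment, caching whatever Segment returns. The sketch below assumes a FetchProfile function standing in for the Segment profile API call, whose details are not in the post; getProfile, FetchProfile, and the Profile shape are hypothetical names, and ProfileStore is a minimal interface for the KV get and put calls.

```typescript
// Minimal subset of the Workers KV API used here (string values only).
interface ProfileStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Hypothetical profile shape: trait names mapped to values.
type Profile = Record<string, unknown>;

// Hypothetical stand-in for the Segment profile API call.
type FetchProfile = (userId: string) => Promise<Profile | null>;

// Read-through lookup: try KV first; on a miss, ask Segment and cache the result.
async function getProfile(
  userId: string,
  kv: ProfileStore,
  fetchProfile: FetchProfile
): Promise<Profile | null> {
  const cached = await kv.get(userId);
  if (cached !== null) {
    return JSON.parse(cached) as Profile; // hit: pushed by the webhook or cached earlier
  }
  const profile = await fetchProfile(userId); // miss: data has not synced yet
  if (profile !== null) {
    await kv.put(userId, JSON.stringify(profile));
  }
  return profile;
}
```

After the first miss, later requests from the same user are served from KV without another call to Segment; users Segment does not know about are not cached, so they are looked up again on their next request.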
Figure 3: Personalization flow
Summary
In this post we covered how the Cloudflare Workers platform can help with tracking first-party data and personalization. We also explained how we built a Segment Edge SDK to enable Segment customers to get those benefits out of the box, without having to create their own DIY solution. The Segment Edge SDK is currently in early development, and we are planning to launch a private pilot and open-source it in the near future.