How Oxy uses hooks for maximum extensibility

2023-05-26

We recently introduced Oxy, our Rust framework for building proxies. Through a YAML file, Oxy allows applications to easily configure listeners (e.g. IP, MASQUE, HTTP/1), telemetry, and much more. However, when it comes to application logic, a programming language is often a better tool for the job. That’s why in this post we’re introducing Oxy’s rich dependency injection capabilities for programmatically modifying all aspects of a proxy.

The idea of extending proxies with scripting is well established: we've had great past success with Lua in our OpenResty/NGINX deployments and there are numerous web frameworks (e.g. Express) with middleware patterns. While Oxy is geared towards the development of forward proxies, they all share the model of a pre-existing request pipeline with a mechanism for integrating custom application logic. However, the use of Rust greatly helps developer productivity when compared to embedded scripting languages. Having confidence in the types and mutability of objects being passed to and returned from callbacks is wonderful.

Oxy exports a series of hook traits that “hook” into the lifecycle of a connection, not just a request. Oxy applications need to control almost every layer of the OSI model: how packets are received and sent, what tunneling protocols they could be using, what HTTP version they are using (if any), and even how DNS resolution is performed. With these hooks you can extend Oxy at each of these layers in a safe and performant way.

First, let's take a look from the perspective of an Oxy application developer, and then we can discuss the implementation of the framework and some of the interesting design decisions we made.

Adding functionality with hooks

Oxy’s dependency injection is a barebones version of what Java or C# developers might be accustomed to. Applications simply implement the start method and return a struct with their hook implementations:

async fn start(
    _settings: ServerSettings<(), ()>,
    _parent_state: Metadata,
) -> anyhow::Result<Hooks<Self>> {
    Ok(Hooks {
        ..Default::default()
    })
}

We can define a simple callback, EgressHook::handle_connection, that will forward all connections to the upstream requested by the client. Oxy calls this function before attempting to make an upstream connection.

#[async_trait]
impl<Ext> EgressHook<Ext> for MyEgressHook
where
    Ext: OxyExt,
{
    async fn handle_connection(
        &self,
        upstream_addr: SocketAddr,
        _egress_ctx: EgressConnectionContext<Ext>,
    ) -> ProxyResult<EgressDecision> {
        Ok(EgressDecision::ExternalDirect(upstream_addr))
    }
}

async fn start(
    _settings: ServerSettings<(), ()>,
    _parent_state: Metadata,
) -> anyhow::Result<Hooks<Self>> {
    Ok(Hooks {
        egress: Some(Arc::new(MyEgressHook)),
        ..Default::default()
    })
}

Oxy simply proxies the connection, but we might want to consider restricting which upstream IPs our clients are allowed to connect to. The implementation above allows everything, but maybe we have internal services that we wish to prevent proxy users from accessing.

#[async_trait]
impl<Ext> EgressHook<Ext> for MyEgressHook
where
    Ext: OxyExt,
{
    async fn handle_connection(
        &self,
        upstream_addr: SocketAddr,
        _egress_ctx: EgressConnectionContext<Ext>,
    ) -> ProxyResult<EgressDecision> {
        if self.private_cidrs.find(upstream_addr).is_some() {
            return Ok(EgressDecision::Block);
        }

        Ok(EgressDecision::ExternalDirect(upstream_addr))
    }
}
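
The `private_cidrs` field is not defined in the snippet above. As a minimal stand-in, assuming a hypothetical `is_private_upstream` helper built only on the standard library (it is not part of Oxy, and it ignores IPv4-mapped IPv6 addresses), the check could look roughly like this:

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical helper (not part of Oxy): returns true when the upstream
/// address falls in a range we do not want proxy users to reach.
fn is_private_upstream(addr: SocketAddr) -> bool {
    match addr.ip() {
        // RFC 1918 ranges, loopback and link-local for IPv4.
        IpAddr::V4(v4) => v4.is_private() || v4.is_loopback() || v4.is_link_local(),
        // Loopback and unique local addresses (fc00::/7) for IPv6.
        IpAddr::V6(v6) => v6.is_loopback() || (v6.segments()[0] & 0xfe00) == 0xfc00,
    }
}
```

A real deployment would more likely load its internal ranges from configuration rather than hard-code address classes.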

This blocking strategy is crude. Sometimes it’s useful to allow certain clients to connect to internal services – a Prometheus scraper is a good example. To authorize these connections, we’ll implement a simple Pre-Shared Key (PSK) authorization scheme – if the client sends the header Proxy-Authorization: Preshared oxy-is-a-proxy, then we’ll let them connect to private addresses via the proxy.

To do this, we need to attach some state to the connection as it passes through Oxy. Client headers only exist in the HTTP CONNECT phase, but we need access to the PSK during the egress phase. With Oxy, this can be done by leveraging its Opaque Extensions to attach arbitrary (yet fully typed) context data to a connection. Oxy initializes the data and passes it to each hook. We can mutate this data when we read headers from the client, and read it later during egress.

#[derive(Default)]
struct AuthorizationResult {
    can_access_private_cidrs: Arc<AtomicBool>,
}

#[async_trait]
impl<Ext> HttpRequestHook<Ext> for MyHttpHook
where
    Ext: OxyExt<IngressConnectionContext = AuthorizationResult>,
{
    async fn handle_proxy_connect_request(
        self: Arc<Self>,
        connect_req_head: &Parts,
        req_ctx: RequestContext<Ext>,
    ) -> ConnectDirective {
        const PSK_HEADER: &str = "Preshared oxy-is-a-proxy";

        // Grab the authorization header and update
        // the ingress_ctx if the preshared key matches.
        // Comparing through `to_str().ok()` avoids a panic when a client
        // sends a header value that is not visible ASCII.
        if let Some(authorization_header) =
            connect_req_head.headers.get("Proxy-Authorization")
        {
            if authorization_header.to_str().ok() == Some(PSK_HEADER) {
                req_ctx
                    .ingress_ctx()
                    .ext()
                    .can_access_private_cidrs
                    .store(true, Ordering::SeqCst);
            }
        }

        ConnectDirective::Allow
    }
}

From here, any hook in the pipeline can access this data. For our purposes, we can just update our existing handle_connection callback:

#[async_trait]
impl<Ext> EgressHook<Ext> for MyEgressHook
where
    Ext: OxyExt<IngressConnectionContext = AuthorizationResult>,
{
    async fn handle_connection(
        &self,
        upstream_addr: SocketAddr,
        egress_ctx: EgressConnectionContext<Ext>,
    ) -> ProxyResult<EgressDecision> {
        if self.private_cidrs.find(upstream_addr).is_some() {
            if !egress_ctx
                .ingress_ctx()
                .ext()
                .can_access_private_cidrs
                .load(Ordering::SeqCst)
            {
                return Ok(EgressDecision::Block);
            }
        }

        Ok(EgressDecision::ExternalDirect(upstream_addr))
    }
}

This is a somewhat contrived example, but in practice hooks and their extension types allow Oxy apps to fully customize all aspects of proxied traffic.

A real-world example would be implementing the RFC 9209 next-hop Proxy-Status header. This involves setting a header containing the IP address we connected to on behalf of the client. We can do this with two pre-existing callbacks and a little bit of state: first we save the upstream passed to EgressHook::handle_connection_established and then read the value in HttpRequestHook::handle_proxy_connect_response in order to set the header on the CONNECT response.

#[derive(Default)]
struct ConnectProxyConnectionContext {
    upstream_addr: OnceCell<SocketAddr>,
}

#[async_trait]
impl<Ext> EgressHook<Ext> for MyEgressHook
where
    Ext: OxyExt<IngressConnectionContext = ConnectProxyConnectionContext>,
{
    async fn handle_connection_established(
        &self,
        upstream_addr: SocketAddr,
        egress_ctx: EgressConnectionContext<Ext>,
    ) {
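        // `set` fails only if the cell was already initialized; in that
        // case we keep the first recorded address and ignore the result.
        let _ = egress_ctx
            .ingress_ctx()
            .ext()
            .upstream_addr
            .set(upstream_addr);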
    }
}

#[async_trait]
impl<Ext> HttpRequestHook<Ext> for MyHttpRequestHook
where
    Ext: OxyExt<IngressConnectionContext = ConnectProxyConnectionContext>,
{
    async fn handle_proxy_connect_response(
        self: Arc<Self>,
        mut res: Response<OxyBody>,
        req_ctx: RequestContext<Ext>,
    ) -> ProxyConnectResponseHandlingOutcome {
        let ingress = req_ctx.ingress_ctx();
        let ingress_ext = ingress.ext();

        if let Some(upstream_addr) = ingress_ext.upstream_addr.get() {
            res.headers_mut().insert(
                "Proxy-Status",
                HeaderValue::from_str(&format!("next-hop=\"{upstream_addr}\"")).unwrap(),
            );
        }

        res.into()
    }
}

These examples only consider a few of the hooks along the HTTP CONNECT pipeline, but many real Oxy applications don’t even have L7 ingress! We will talk about the abundance of hooks later, but for now let’s look at their implementation.

Hook implementation

Oxy exists to be used by multiple teams, all with different needs and requirements. It needs a pragmatic solution to extensibility that allows one team to be productive without incurring too much of a cost on others. Hooks and their Opaque Extensions provide effectively limitless customization to applications via a clean, strongly typed interface.

The implementation of hooks within Oxy is relatively simple – throughout the code there are invocations of hook callbacks:

if let Some(ref hook) = self.hook {
    hook.handle_connection_established(upstream_addr, &egress_ctx)
        .await;
}

If a user-provided hook exists, we call it. Some hooks are more like events (e.g. handle_connection_established), and others have return values (e.g. handle_connection) which are matched on by Oxy for control flow. If a callback isn’t implemented, the default trait implementation is used. If a hook isn’t implemented at all, Oxy’s business logic just executes its default functionality. These levels of default behavior enable the minimal example we started with earlier.
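
The two levels of defaults described above can be sketched with plain Rust traits. This is a simplified, synchronous illustration and not Oxy's actual code: the `Egress` struct, its `connect` method and the `BlockLoopback` hook are hypothetical names, and the trait is trimmed down to two callbacks without context arguments.

```rust
use std::net::SocketAddr;
use std::sync::Arc;

/// Simplified stand-in for Oxy's egress decision type.
#[derive(Debug, PartialEq)]
enum EgressDecision {
    ExternalDirect(SocketAddr),
    Block,
}

/// Simplified, synchronous stand-in for a hook trait. Every callback has a
/// default body, so implementors override only what they need.
trait EgressHook: Send + Sync {
    fn handle_connection(&self, upstream_addr: SocketAddr) -> EgressDecision {
        EgressDecision::ExternalDirect(upstream_addr)
    }

    /// Event-style callback: no return value, and the default is a no-op.
    fn handle_connection_established(&self, _upstream_addr: SocketAddr) {}
}

/// Hypothetical framework-side struct holding the optional hook.
#[derive(Default)]
struct Egress {
    hook: Option<Arc<dyn EgressHook>>,
}

impl Egress {
    /// Returns the address to connect to, or None if the hook blocked it.
    fn connect(&self, upstream_addr: SocketAddr) -> Option<SocketAddr> {
        // No hook registered at all: fall back to the framework default.
        let decision = match &self.hook {
            Some(hook) => hook.handle_connection(upstream_addr),
            None => EgressDecision::ExternalDirect(upstream_addr),
        };

        // The framework matches on the returned value for control flow.
        match decision {
            EgressDecision::Block => None,
            EgressDecision::ExternalDirect(addr) => {
                if let Some(hook) = &self.hook {
                    hook.handle_connection_established(addr);
                }
                Some(addr)
            }
        }
    }
}

/// An application hook that overrides only `handle_connection`.
struct BlockLoopback;

impl EgressHook for BlockLoopback {
    fn handle_connection(&self, upstream_addr: SocketAddr) -> EgressDecision {
        if upstream_addr.ip().is_loopback() {
            EgressDecision::Block
        } else {
            EgressDecision::ExternalDirect(upstream_addr)
        }
    }
}
```

With `Egress::default()` every connection is allowed; with `Egress { hook: Some(Arc::new(BlockLoopback)) }` loopback upstreams are rejected while the event callback silently keeps its no-op default.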

While hooks solve the problem of integrating app logic into the framework, there is invariably a need to pass custom state around as we demonstrated in our PSK example. Oxy manages this custom state, passing it to hook invocations. As it is generic over the type defined by the application, this is where things get more interesting.

Generics and opaque types

Every team that works with Oxy has unique business needs, so it is important that one team’s changes don’t cause a cascade of refactoring for the others. Given that these context fields are of a user-defined type, you might expect heavy usage of generics. With Oxy we took a different approach: a generic interface is presented to application developers, but within the framework the type is erased. Keeping generics out of the internal code means adding new extension types to the framework is painless.

Our implementation relies on the Any trait. The framework treats the data as an opaque blob, but when it traverses the public API, the wrapped Any object is downcast into the concrete type defined by the user. The public API layer enforces that the user type must implement Default, which allows Oxy to be wholly responsible for creating and managing instances of the type. Mutations are then done by users of the framework through interior mutability, usually with atomics and locks.
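
As a rough illustration of this pattern (not Oxy's internals), the sketch below stores the extension as a `Box<dyn Any + Send + Sync>` and downcasts it only at a typed API boundary. `ErasedContext` and `TypedContext` are hypothetical names chosen for this example; `AuthorizationResult` mirrors the earlier PSK example, minus the `Arc`.

```rust
use std::any::Any;
use std::marker::PhantomData;
use std::sync::atomic::{AtomicBool, Ordering};

/// Framework-internal view: the extension is an opaque blob, no generics.
struct ErasedContext {
    ext: Box<dyn Any + Send + Sync>,
}

/// Public, typed view handed to hooks. `T` is the application's type.
struct TypedContext<'a, T> {
    inner: &'a ErasedContext,
    _marker: PhantomData<T>,
}

impl ErasedContext {
    /// The framework creates the value itself, which is why `Default`
    /// is required of the application's type.
    fn new<T: Default + Send + Sync + 'static>() -> Self {
        ErasedContext { ext: Box::new(T::default()) }
    }

    /// Wrap the erased context in a typed view at the API boundary.
    fn typed<T: 'static>(&self) -> TypedContext<'_, T> {
        TypedContext { inner: self, _marker: PhantomData }
    }
}

impl<'a, T: 'static> TypedContext<'a, T> {
    /// Downcast back to the concrete type chosen by the application.
    fn ext(&self) -> &T {
        self.inner
            .ext
            .downcast_ref::<T>()
            .expect("extension type mismatch")
    }
}

/// Application-defined state, mutated through interior mutability.
#[derive(Default)]
struct AuthorizationResult {
    can_access_private_cidrs: AtomicBool,
}
```

Because hooks only ever receive a shared reference, the atomic is what lets one hook flip the flag and a later hook observe it.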

Crates like reqwest_middleware, tracing and http have a similar extension mechanism.

There’s a hook for that

As you might have gathered, Oxy cares a lot about the productivity of Oxy app developers. The plethora of injection points lets users quickly add features and functionality without worrying about “irrelevant” proxy logic. Sane defaults help balance customizability with complexity.

Only a subset of callbacks will be invoked for a given packet: applications operating purely at L3 will see different hook callbacks fired compared to those operating at L7. This again is customizable – if desired, Oxy’s design allows connections to be upgraded (or downgraded), which would cause a different set of callbacks to be invoked.

The ingress phase is where the hooks controlling the upgrading of L3 and decapsulation of specific L4 protocols reside. For our L3 IP Tunnel, Oxy has powerful callbacks like IpFlowHook::handle_flow which allow applications to drop, upgrade or redirect flows. IpFlowHook::handle_packet gives that same level of control at the packet level – even allowing us to modify the byte array as it passes through.

Let’s consider the H2 Proxy Protocol example in the above diagram. After Oxy has accepted the Proxy Protocol connection it fires ProxyProtocolConnectionHook::handle_connection with the parsed header, allowing applications to handle any TLVs of interest. Hooks like these are common – Oxy handles the heavy lifting and then passes the application some useful information.

From here, L4 connections are funneled through the IngressHook, which contains a counterpart to the egress callback we saw in our initial example: IngressHook::handle_connection. This works as you might expect, allowing applications to control whether to Allow or Block a connection as it ingresses. There is a companion callback, IngressHook::handle_connection_close, which when called gives applications insight into ingress connection statistics like loss, retransmissions, bytes transferred, etc.

Next up is the transformation phase, where we start to see some of our more powerful hooks. Oxy invokes TunnelHook::should_intercept_https, passing the SNI along with the usual connection context. This enables applications to easily configure HTTPS interception based on hostname and any custom context data (e.g. ACLs). By default, Oxy effectively splices the ingress and egress sockets, but if applications wish to have complete control over the tunneling, that is possible with TunnelHook::get_app_tunnel_pipeline, where applications are simply provided the two sockets and can implement whatever interception capabilities they wish.
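
To make the hostname-based decision concrete, here is a minimal sketch of the kind of logic an application might run inside such a callback. `InterceptPolicy` and the method signature are assumptions made for illustration; the real TunnelHook::should_intercept_https also receives connection context, which is omitted here.

```rust
/// Hypothetical policy (not an Oxy type): intercept TLS only for the listed
/// hostnames and their subdomains. Suffixes are assumed to be lowercase.
struct InterceptPolicy {
    intercepted_suffixes: Vec<String>,
}

impl InterceptPolicy {
    fn should_intercept_https(&self, sni: Option<&str>) -> bool {
        // Without an SNI there is no hostname to match on, so leave the
        // tunnel untouched.
        let Some(host) = sni else { return false };

        // Normalize: drop a trailing dot and compare case-insensitively.
        let host = host.trim_end_matches('.').to_ascii_lowercase();

        self.intercepted_suffixes.iter().any(|suffix| {
            host == *suffix || host.ends_with(&format!(".{suffix}"))
        })
    }
}
```

Matching on a leading dot keeps `notexample.com` from being treated as a subdomain of `example.com`.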

Of particular interest to those wishing to implement L7 firewalls, the HttpRequestHookPipeline has two very powerful callbacks: handle_request and handle_response. Both of these offer a similar high level interface for streaming rewrites or scanning of HTTP bodies.

The EgressHook has the most callbacks, including some of the most powerful ones. For situations where hostnames are provided, DNS resolution must occur. At its simplest, Oxy allows applications to specify the nameservers used in resolution. If more control is required, Oxy provides a callback – EgressHook::handle_upstream_ips – which gives applications an opportunity to mutate the resolved IP addresses before Oxy connects. If applications want absolute control, they can turn to EgressHook::dns_resolve_override which is invoked with a hostname and expects a Vec<IpAddr> to be returned.
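
The layering of these DNS callbacks can be sketched as follows. This is a simplified, synchronous model under stated assumptions: the `DnsHook` trait, the `resolve_upstream` function and the `Ipv4Only` hook are hypothetical, the override returns an `Option` here so that "no override" can be expressed, and whether Oxy still runs handle_upstream_ips after an override is a guess.

```rust
use std::net::IpAddr;

/// Simplified stand-in for the two DNS-related egress callbacks.
trait DnsHook {
    /// Return Some(ips) to bypass the framework resolver entirely.
    fn dns_resolve_override(&self, _hostname: &str) -> Option<Vec<IpAddr>> {
        None
    }

    /// Adjust the resolved addresses before a connection is attempted.
    fn handle_upstream_ips(&self, _ips: &mut Vec<IpAddr>) {}
}

/// Hypothetical framework-side resolution step.
fn resolve_upstream(
    hook: &dyn DnsHook,
    hostname: &str,
    default_resolver: impl Fn(&str) -> Vec<IpAddr>,
) -> Vec<IpAddr> {
    // Absolute control first, then fall back to the default resolver.
    let mut ips = hook
        .dns_resolve_override(hostname)
        .unwrap_or_else(|| default_resolver(hostname));

    // Lighter-weight control: let the application mutate the result.
    hook.handle_upstream_ips(&mut ips);
    ips
}

/// Example application hook: drop IPv6 answers.
struct Ipv4Only;

impl DnsHook for Ipv4Only {
    fn handle_upstream_ips(&self, ips: &mut Vec<IpAddr>) {
        ips.retain(|ip| ip.is_ipv4());
    }
}
```

Since `Ipv4Only` overrides only the mutation callback, resolution itself still goes through whatever default resolver the framework supplies.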

Much like the IngressHook, there is an EgressHook::handle_connection hook, but rather than just Allow or Block, applications can instruct Oxy to egress their connection externally, internally within Cloudflare, or even downgrade to IP packets. While it’s often best to defer to the framework for connection establishment, Oxy again offers complete control to those who want it with a few override callbacks, e.g. tcp_connect_override, udp_connect_override. This functionality is mainly leveraged by our egress service, but available to all Oxy applications if they need it.

Lastly, there is one of the newest additions: the AppLifecycleHook. Hopefully this sees orders of magnitude fewer invocations than the rest. The AppLifecycleHook::state_for_restart callback is invoked by Oxy during a graceful shutdown. Applications are then given the opportunity to serialize their state, which will be passed to the child process. Graceful restarts are a little more nuanced, but this hook cleanly solves the problem of passing application state between releases of the application.

Right now we have around 64 public-facing hooks and we keep adding more. The above diagram is (largely) accurate at time of writing, but if a team needs a hook and there can be a sensible default for it, then it might as well be added. One of the primary drivers of the hook architecture for Oxy is that different teams can work on and implement the hooks that they need. Business logic is kept outside Oxy, so teams can readily leverage each other's work.

We would be remiss not to mention the issue of discoverability. For most cases it isn’t an issue; however, application developers may find when developing certain features that a more holistic understanding is necessary. This inevitably means looking into the Oxy source to fully understand when and where certain hook callbacks will be invoked. Reasoning about the order in which callbacks will be invoked is even thornier. Many of the hooks alter control flow significantly, so there’s always some risk that a change in Oxy could mean a change in the semantics of the applications built on top of it. To solve this, we’re experimenting with different ways to record hook execution orders when running integration tests, maybe through a proc-macro or compiler tooling.
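
One low-tech way to record execution order, shown here as a manual sketch rather than the proc-macro or compiler tooling mentioned above, is to have test hooks push their names into a shared log and assert on it afterwards. `HookRecorder` and `run_fake_connect_pipeline` are hypothetical, and the callback order in the fake pipeline is an assumption based on the CONNECT examples earlier in this post.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical test helper: a shared, ordered log of hook invocations.
#[derive(Clone, Default)]
struct HookRecorder {
    calls: Arc<Mutex<Vec<&'static str>>>,
}

impl HookRecorder {
    /// Append a callback name to the log.
    fn record(&self, callback: &'static str) {
        self.calls.lock().unwrap().push(callback);
    }

    /// Snapshot of the callbacks seen so far, in invocation order.
    fn calls(&self) -> Vec<&'static str> {
        self.calls.lock().unwrap().clone()
    }
}

/// Stand-in for a slice of the pipeline. In a real integration test the
/// `record` calls would live inside the application's hook implementations.
fn run_fake_connect_pipeline(recorder: &HookRecorder) {
    recorder.record("HttpRequestHook::handle_proxy_connect_request");
    recorder.record("EgressHook::handle_connection");
    recorder.record("EgressHook::handle_connection_established");
    recorder.record("HttpRequestHook::handle_proxy_connect_response");
}
```

A test can then compare `recorder.calls()` against an expected sequence, so that a framework change which reorders callbacks shows up as a failing assertion.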

Conclusion

In this post we’ve just scratched the surface of what’s possible with hooks in Oxy. In our example we saw a glimpse of their power: just two simple hooks and a few lines of code, and we have a forward proxy with built-in metrics, tracing, graceful restarts and much, much more.

Oxy’s extensibility with hooks is “only” dependency injection, but we’ve found this to be an extremely powerful way to build proxies. It’s dependency injection at all layers of the networking stack, from IP packets and tunnels all the way up to proxied UDP streams over QUIC. The shared core with hooks approach has been a terrific way to build a proxy framework. Teams add generic code to the framework, such as new Opaque Extensions in specific code paths, and then use those injection points to implement the logic for everything from iCloud Private Relay to Cloudflare Zero Trust. The generic capabilities are there for all teams to use, and there’s very little to no cost if you decide not to use them. We can’t wait to see what the future holds and for Oxy’s further adoption within Cloudflare.
