Subscribe to receive notifications of new posts:

Know your SCM_RIGHTS

2018-11-29

3 min read

As TLS 1.3 was ratified earlier this year, I was recollecting how we got started with it here at Cloudflare. We made the decision to be early adopters of TLS 1.3 a little over two years ago. It was a very important decision, and we took it very seriously.

It is no secret that Cloudflare uses nginx to handle user traffic. A little less known fact, is that we have several instances of nginx running. I won’t go into detail, but there is one instance whose job is to accept connections on port 443, and proxy them to another instance of nginx that actually handles the requests. It has pretty limited functionality otherwise. We fondly call it nginx-ssl.

Back then we were using OpenSSL for TLS and Crypto in nginx, but OpenSSL (and BoringSSL) had yet to announce a timeline for TLS 1.3 support, therefore we had to implement our own TLS 1.3 stack. Obviously we wanted an implementation that would not affect any customer or client that would not enable TLS 1.3. We also needed something that we could iterate on quickly, because the spec was very fluid back then, and also something that we can release frequently without worrying about the rest of the Cloudflare stack.

The obvious solution was to implement it on top of OpenSSL. The OpenSSL version we were using was 1.0.2, but not only were we looking ahead to replace it with version 1.1.0 or with BoringSSL (which we eventually did), it was so ingrained in our stack and so fragile that we wouldn’t be able to achieve our stated goals, without risking serious bugs.

Instead, Filippo Valsorda and Brendan McMillion suggested that the easier path would be to implement TLS 1.3 on top of the Go TLS library and make a Go replica of nginx-ssl (go-ssl). Go is very easy to iterate and prototype, with a powerful standard library, and we had a great pool of Go talent to use, so it made a lot of sense. Thus tls-tris was born.

The question remained how would we have Go handle only TLS 1.3 while letting nginx handling all prior versions of TLS?

And herein lies the problem. Both TLS 1.3 and older versions of TLS communicate on port 443, and it is common knowledge that only one application can listen on a given TCP port, and that application is nginx, that would still handle the bulk of the TLS traffic. We could pipe all the TCP data into another connection in Go, effectively creating an additional proxy layer, but where is the fun in that? Also it seemed a little inefficient.

Meet SCM_RIGHTS

So how do you make two different processes, written in two different programming languages, share the same TCP socket?

Fortunately, Linux (or rather UNIX) provides us with just the tool that we need. You can use UNIX-domain sockets to pass file descriptors between applications, and like everything else in UNIX connections are files.Looking at man 7 unix we see the following:

   Ancillary messages
       Ancillary  data  is  sent and received using sendmsg(2) and recvmsg(2).
       For historical reasons the ancillary message  types  listed  below  are
       specified with a SOL_SOCKET type even though they are AF_UNIX specific.
       To send them  set  the  cmsg_level  field  of  the  struct  cmsghdr  to
       SOL_SOCKET  and  the cmsg_type field to the type.  For more information
       see cmsg(3).

       SCM_RIGHTS
              Send or receive a set of  open  file  descriptors  from  another
              process.  The data portion contains an integer array of the file
              descriptors.  The passed file descriptors behave as though  they
              have been created with dup(2).

Technically you do not send “file descriptors”. The “file descriptors” you handle in the code are simply indices into the processes' local file descriptor table, which in turn points into the OS' open file table, that finally points to the vnode representing the file. Thus the “file descriptor” observed by the other process will most likely have a different numeric value, despite pointing to the same file.

We can also check man 3 cmsg as suggested, to find a handy example on how to use SCM_RIGHTS:

   struct msghdr msg = { 0 };
   struct cmsghdr *cmsg;
   int myfds[NUM_FD];  /* Contains the file descriptors to pass */
   int *fdptr;
   union {         /* Ancillary data buffer, wrapped in a union
                      in order to ensure it is suitably aligned */
       char buf[CMSG_SPACE(sizeof(myfds))];
       struct cmsghdr align;
   } u;

   msg.msg_control = u.buf;
   msg.msg_controllen = sizeof(u.buf);
   cmsg = CMSG_FIRSTHDR(&msg);
   cmsg->cmsg_level = SOL_SOCKET;
   cmsg->cmsg_type = SCM_RIGHTS;
   cmsg->cmsg_len = CMSG_LEN(sizeof(int) * NUM_FD);
   fdptr = (int *) CMSG_DATA(cmsg);    /* Initialize the payload */
   memcpy(fdptr, myfds, NUM_FD * sizeof(int));

And that is what we decided to use. We let OpenSSL read the “Client Hello” message from an established TCP connection. If the “Client Hello” indicated TLS version 1.3, we would use SCM_RIGHTS to send it to the Go process. The Go process would in turn try to parse the rest of the “Client Hello”, if it were successful it would proceed with TLS 1.3 connection, and upon failure it would give the file descriptor back to OpenSSL, to handle regularly.

So how exactly do you implement something like that?

Since in our case we established that the C process will listen for TCP connections, our other process will have to listen on a UNIX socket, for connections C will want to forward.

For example in Go:

type scmListener struct {
	*net.UnixListener
}

type scmConn struct {
	*net.UnixConn
}

var path = "/tmp/scm_example.sock"

func listenSCM() (*scmListener, error) {
	syscall.Unlink(path)

	addr, err := net.ResolveUnixAddr("unix", path)
	if err != nil {
		return nil, err
	}

	ul, err := net.ListenUnix("unix", addr)
	if err != nil {
		return nil, err
	}

	err = os.Chmod(path, 0777)
	if err != nil {
		return nil, err
	}

	return &scmListener{ul}, nil
}

func (l *scmListener) Accept() (*scmConn, error) {
	uc, err := l.AcceptUnix()
	if err != nil {
		return nil, err
	}
	return &scmConn{uc}, nil
}

Then in the C process, for each connection we want to pass, we will connect to that socket first:

int connect_unix()
{
    struct sockaddr_un addr = {.sun_family = AF_UNIX,
                               .sun_path = "/tmp/scm_example.sock"};

    int unix_sock = socket(AF_UNIX, SOCK_STREAM, 0);
    if (unix_sock == -1)
        return -1;

    if (connect(unix_sock, (struct sockaddr *)&addr, sizeof(addr)) == -1)
    {
        close(unix_sock);
        return -1;
    }

    return unix_sock;
}

To actually pass a file descriptor we utilize the example from man 3 cmsg:

int send_fd(int unix_sock, int fd)
{
    struct iovec iov = {.iov_base = ":)", // Must send at least one byte
                        .iov_len = 2};

    union {
        char buf[CMSG_SPACE(sizeof(fd))];
        struct cmsghdr align;
    } u;

    struct msghdr msg = {.msg_iov = &iov,
                         .msg_iovlen = 1,
                         .msg_control = u.buf,
                         .msg_controllen = sizeof(u.buf)};

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    *cmsg = (struct cmsghdr){.cmsg_level = SOL_SOCKET,
                             .cmsg_type = SCM_RIGHTS,
                             .cmsg_len = CMSG_LEN(sizeof(fd))};

    memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

    return sendmsg(unix_sock, &msg, 0);
}

Then to receive the file descriptor in Go:

func (c *scmConn) ReadFD() (*os.File, error) {
	msg, oob := make([]byte, 2), make([]byte, 128)

	_, oobn, _, _, err := c.ReadMsgUnix(msg, oob)
	if err != nil {
		return nil, err
	}

	cmsgs, err := syscall.ParseSocketControlMessage(oob[0:oobn])
	if err != nil {
		return nil, err
	} else if len(cmsgs) != 1 {
		return nil, errors.New("invalid number of cmsgs received")
	}

	fds, err := syscall.ParseUnixRights(&cmsgs[0])
	if err != nil {
		return nil, err
	} else if len(fds) != 1 {
		return nil, errors.New("invalid number of fds received")
	}

	fd := os.NewFile(uintptr(fds[0]), "")
	if fd == nil {
		return nil, errors.New("could not open fd")
	}

	return fd, nil
}

Rust

We can also do this in Rust, although the standard library in Rust does not yet support UNIX sockets, but it does let you address the C library via the libc crate. Warning, unsafe code ahead!

First we want to implement some UNIX socket functionality in Rust:

use libc::*;
use std::io::prelude::*;
use std::net::TcpStream;
use std::os::unix::io::FromRawFd;
use std::os::unix::io::RawFd;

fn errno_str() -> String {
    let strerr = unsafe { strerror(*__error()) };
    let c_str = unsafe { std::ffi::CStr::from_ptr(strerr) };
    c_str.to_string_lossy().into_owned()
}

pub struct UNIXSocket {
    fd: RawFd,
}

pub struct UNIXConn {
    fd: RawFd,
}

impl Drop for UNIXSocket {
    fn drop(&mut self) {
        unsafe { close(self.fd) };
    }
}

impl Drop for UNIXConn {
    fn drop(&mut self) {
        unsafe { close(self.fd) };
    }
}

impl UNIXSocket {
    pub fn new() -> Result<UNIXSocket, String> {
        match unsafe { socket(AF_UNIX, SOCK_STREAM, 0) } {
            -1 => Err(errno_str()),
            fd @ _ => Ok(UNIXSocket { fd }),
        }
    }

    pub fn bind(self, address: &str) -> Result<UNIXSocket, String> {
        assert!(address.len() < 104);

        let mut addr = sockaddr_un {
            sun_len: std::mem::size_of::<sockaddr_un>() as u8,
            sun_family: AF_UNIX as u8,
            sun_path: [0; 104],
        };

        for (i, c) in address.chars().enumerate() {
            addr.sun_path[i] = c as i8;
        }

        match unsafe {
            unlink(&addr.sun_path as *const i8);
            bind(
                self.fd,
                &addr as *const sockaddr_un as *const sockaddr,
                std::mem::size_of::<sockaddr_un>() as u32,
            )
        } {
            -1 => Err(errno_str()),
            _ => Ok(self),
        }
    }

    pub fn listen(self) -> Result<UNIXSocket, String> {
        match unsafe { listen(self.fd, 50) } {
            -1 => Err(errno_str()),
            _ => Ok(self),
        }
    }

    pub fn accept(&self) -> Result<UNIXConn, String> {
        match unsafe { accept(self.fd, std::ptr::null_mut(), std::ptr::null_mut()) } {
            -1 => Err(errno_str()),
            fd @ _ => Ok(UNIXConn { fd }),
        }
    }
}

And the code to extract the file desciptor:

#[repr(C)]
pub struct ScmCmsgHeader {
    cmsg_len: c_uint,
    cmsg_level: c_int,
    cmsg_type: c_int,
    fd: c_int,
}

impl UNIXConn {
    pub fn recv_fd(&self) -> Result<RawFd, String> {
        let mut iov = iovec {
            iov_base: std::ptr::null_mut(),
            iov_len: 0,
        };

        let mut scm = ScmCmsgHeader {
            cmsg_len: 0,
            cmsg_level: 0,
            cmsg_type: 0,
            fd: 0,
        };

        let mut mhdr = msghdr {
            msg_name: std::ptr::null_mut(),
            msg_namelen: 0,
            msg_iov: &mut iov as *mut iovec,
            msg_iovlen: 1,
            msg_control: &mut scm as *mut ScmCmsgHeader as *mut c_void,
            msg_controllen: std::mem::size_of::<ScmCmsgHeader>() as u32,
            msg_flags: 0,
        };

        let n = unsafe { recvmsg(self.fd, &mut mhdr, 0) };

        if n == -1
            || scm.cmsg_len as usize != std::mem::size_of::<ScmCmsgHeader>()
            || scm.cmsg_level != SOL_SOCKET
            || scm.cmsg_type != SCM_RIGHTS
        {
            Err("Invalid SCM message".to_string())
        } else {
            Ok(scm.fd)
        }
    }
}

Conclusion

SCM_RIGHTS is a very powerful tool that can be used for many purposes. In our case we used to to introduce a new service in a non-obtrusive fashion. Other uses may be:

  • A/B testing

  • Phasing out of an old C based service in favor of new Go or Rust one

  • Passing connections from a privileged process to an unprivileged one

And more

You can find the full example here.

Cloudflare's connectivity cloud protects entire corporate networks, helps customers build Internet-scale applications efficiently, accelerates any website or Internet application, wards off DDoS attacks, keeps hackers at bay, and can help you on your journey to Zero Trust.

Visit 1.1.1.1 from any device to get started with our free app that makes your Internet faster and safer.

To learn more about our mission to help build a better Internet, start here. If you're looking for a new career direction, check out our open positions.
TLS 1.3TLSSecurityLinuxTCPOpenSSLGoNGINX

Follow on X

Cloudflare|@cloudflare

Related posts

October 09, 2024 1:00 PM

Improving platform resilience at Cloudflare through automation

We realized that we need a way to automatically heal our platform from an operations perspective, and designed and built a workflow orchestration platform to provide these self-healing capabilities across our global network. We explore how this has helped us to reduce the impact on our customers due to operational issues, and the rich variety of similar problems it has empowered us to solve....

October 08, 2024 1:00 PM

Cloudflare acquires Kivera to add simple, preventive cloud security to Cloudflare One

The acquisition and integration of Kivera broadens the scope of Cloudflare’s SASE platform beyond just apps, incorporating increased cloud security through proactive configuration management of cloud services. ...

October 06, 2024 11:00 PM

Enhance your website's security with Cloudflare’s free security.txt generator

Introducing Cloudflare’s free security.txt generator, empowering all users to easily create and manage their security.txt files. This feature enhances vulnerability disclosure processes, aligns with industry standards, and is integrated into the dashboard for seamless access. Strengthen your website's security today!...

October 02, 2024 1:00 PM

How Cloudflare auto-mitigated world record 3.8 Tbps DDoS attack

Over the past couple of weeks, Cloudflare's DDoS protection systems have automatically and successfully mitigated multiple hyper-volumetric L3/4 DDoS attacks exceeding 3 billion packets per second (Bpps). Our systems also automatically mitigated multiple attacks exceeding 3 terabits per second (Tbps), with the largest ones exceeding 3.65 Tbps. The scale of these attacks is unprecedented....