Partial kernel bypass merged into netmap master

In a previous post we described our work on a new netmap mode called single-rx-queue.

After we submitted the pull request, the netmap maintainers told us that the patch was interesting, but that they would prefer something more configurable than a tailored custom mode.

After an exchange of ideas and some more work, our patch just got merged into mainline netmap.

Meet the new netmap

Before our patch, netmap was an all-or-nothing deal: there was no way to put a network adapter partially in netmap mode. All of the queues had to be detached from the host network stack. Even a netmap mode called “single ring pair” didn't help.

Our final patch is extended and more generic, while still supporting the simple functionality of our original single-rx-queue mode.

First, we modified netmap so that queues not explicitly requested in netmap mode stay attached to the host stack. This way, if a user requests a pair of rings (for example using nm_open("netmap:eth0-4")), they get a reference to both RX ring 4 and TX ring 4, while the other rings remain attached to the kernel stack.

But since the NIC is still partially connected to the host stack, a new problem arises: what should we do with packets that are going to be transmitted by the operating system to a TX ring which is in netmap mode? The solution is simple: just move them to the RX host ring. In this way we can access these packets from netmap simply by opening the interface again in netmap mode and asking for its software ring pair.
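The rule above can be pictured with a toy model. This is only a sketch of the decision, not netmap's actual code: the bitmask, the enum and the function name are all hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model for illustration; not netmap's real data structures. */
enum tx_dest { TO_NIC_TX_RING, TO_HOST_RX_RING };

/*
 * Decide where a packet sent by the operating system ends up.
 * Bit i of netmap_tx_mask is set when hardware TX ring i is in netmap mode.
 * If the target TX ring is in netmap mode, the packet is diverted to the
 * host RX ring; otherwise it goes out through the NIC as usual.
 */
static enum tx_dest host_tx_destination(uint32_t netmap_tx_mask, unsigned ring)
{
    bool in_netmap = ring < 32 && (netmap_tx_mask & (UINT32_C(1) << ring)) != 0;

    return in_netmap ? TO_HOST_RX_RING : TO_NIC_TX_RING;
}
```

In this model, a port opened with only RX rings leaves the mask at zero, so traffic sent by the host is never diverted.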

Finally, for simpler use cases we needed a way to ask for only the RX rings, without their TX counterparts, since our specific use case does not need TX rings. To achieve this we introduced two flags, NR_TX_RINGS_ONLY and NR_RX_RINGS_ONLY (written as /T and /R when using nm_open()), to request only TX or only RX rings.

With these changes, the only line we needed to edit in our code was the netmap interface name passed to nm_open(). This:

snprintf(nm_if, sizeof(nm_if), "netmap:%s~%d", iface_name, ring_nr);

becomes this:

snprintf(nm_if, sizeof(nm_if), "netmap:%s-%d/R", iface_name, ring_nr);

and everything kept working as expected!
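For reference, the naming scheme described in this section could be wrapped in a small helper like the one below. Treat it as a sketch under a few assumptions: the enum and the function name are ours and are not part of netmap, and the "^" suffix for the host (software) rings comes from netmap's port naming as we understand it rather than from anything shown above. The only addition over a bare snprintf() is a truncation check.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical selector for this sketch; not a netmap type. */
enum nm_ring_sel { NM_SEL_PAIR, NM_SEL_RX_ONLY, NM_SEL_TX_ONLY, NM_SEL_HOST };

/*
 * Build a port name for nm_open() into buf.
 *   NM_SEL_PAIR     -> "netmap:eth0-4"    (RX and TX ring 4)
 *   NM_SEL_RX_ONLY  -> "netmap:eth0-4/R"  (RX ring 4 only)
 *   NM_SEL_TX_ONLY  -> "netmap:eth0-4/T"  (TX ring 4 only)
 *   NM_SEL_HOST     -> "netmap:eth0^"     (host rings; ring is ignored)
 * Returns 0 on success, -1 if formatting failed or the name was truncated.
 */
static int nm_port_name(char *buf, size_t len, const char *ifname,
                        int ring, enum nm_ring_sel sel)
{
    int n;

    switch (sel) {
    case NM_SEL_HOST:
        n = snprintf(buf, len, "netmap:%s^", ifname);
        break;
    case NM_SEL_RX_ONLY:
        n = snprintf(buf, len, "netmap:%s-%d/R", ifname, ring);
        break;
    case NM_SEL_TX_ONLY:
        n = snprintf(buf, len, "netmap:%s-%d/T", ifname, ring);
        break;
    default:
        n = snprintf(buf, len, "netmap:%s-%d", ifname, ring);
        break;
    }

    /* snprintf returns the length it wanted to write; >= len means truncation. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string is what would be passed as the first argument to nm_open(). Checking for truncation matters here because a cut-off name could silently select a different ring or drop the /R suffix.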

Try it out

You can follow these instructions to build a test program under Linux. In this example we are using the ixgbe driver.

The test program source code is available on GitHub. First clone the test application and the netmap repository:

$ git clone https://github.com/cloudflare/cloudflare-blog
$ cd cloudflare-blog/2015-12-nm-single-rx-queue
$ git clone https://github.com/luigirizzo/netmap deps/netmap

build it:

$ make

build and load netmap:

$ cd deps/netmap/LINUX
$ ./configure --kernel-sources=/path/to/kernel/sources --driver=ixgbe
$ make
$ sudo insmod netmap.ko
$ sudo insmod ixgbe/ixgbe.ko

then return to the example directory and launch the application:

$ sudo ./nm-single-rx-queue eth0 1

Thanks

We would like to thank Luigi and Giuseppe for their great help in shaping the final patch and for their work on netmap.