Fosdem 2017

After missing last year's edition, I attended Fosdem again. It was a lot of fun and it didn't disappoint me, not even a bit. In this post, I review most of the talks from the SDN/NFV track.

Posted by Diego Pino García on March 14, 2017

Fosdem is one of my favorite conferences. I guess this is true for many free software enthusiasts. It doesn’t matter how many editions it has been through, it still keeps the same spirit. Tons of people sharing, talking about their homebrew experiments, projects, ideas, and of course tons of people rushing around. That’s something unmatched in other technical events. Kudos to all the volunteers and people involved that make Fosdem possible every year.

As for this year’s edition, I tried a new approach. Instead of switching rooms, I decided to stick to one of the tracks. Since I’m mostly interested in networking lately, I attended the “SDN/NFV devroom”. Here is my summary of the talks:

“Opening network access in the Central Office” by Chris Price. Unfortunately I was late for this talk. According to the abstract it covered several SDN/NFV related technologies such as OpenDaylight, OPNFV and ONOS/CORD.

“The emergence of open-source 4G/5G ecosystems” by Raymond Knopp. Really interesting talk and topic.

Everyone is talking about 5G lately. However, it seems we’re still in the early stages. Unlike previous generations, 5G won’t be only an improvement in the radio spectrum, but an evolution in computing for wireless networks. For this reason, it’s no coincidence to see 5G connected to terms such as SDN & NFV.

Commoditization of 3GPP radio systems will also become a reality, making radio technology more accessible to hobbyists. This year at Fosdem, as happened last year for the first time, there was a Software Defined Radio devroom.

Raymond’s talk covers a wide range of topics related to 5G, SDN/NFV and open source. If you’re already familiar with SDN/NFV, I totally recommend his talk as an introduction to 5G.

“switchdev: the Linux switching framework” by Bert Vermeulen. History and evolution of Linux kernel’s switchdev.

A hardware switch maps MAC addresses to ports. To keep track of that mapping, a Forwarding Database (FDB) is used.
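To get a feel for what the FDB does, here is a toy MAC-learning switch sketched in Python. The class and method names are mine, purely for illustration; a real switch ASIC does this in hardware:

```python
# Toy model of a learning switch and its Forwarding Database (FDB).
# On every incoming frame the switch learns the source MAC -> ingress
# port mapping; frames to unknown destinations are flooded.
class LearningSwitch:
    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}  # FDB: MAC address -> port

    def receive(self, in_port, src_mac, dst_mac):
        """Return the set of ports the frame gets forwarded to."""
        self.fdb[src_mac] = in_port  # learn (or refresh) the mapping
        if dst_mac in self.fdb:
            out = self.fdb[dst_mac]
            # Never send a frame back out of the port it came in on.
            return set() if out == in_port else {out}
        # Unknown destination: flood to every port except the ingress one.
        return self.ports - {in_port}
```

Once both endpoints have sent at least one frame, the switch stops flooding and forwards each frame only to the port where the destination MAC was learned.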

Since its early beginnings the Linux kernel has been able to emulate a switch by using a bridge, with the disadvantage of doing all the processing on the CPU: no chance to offload specific operations to specialized hardware. Eventually the DSA (Distributed Switch Architecture) subsystem came along, although it was mostly bound to Marvell switches (other vendors got supported later). The final step on this journey was switchdev, “an in-kernel driver model for switch devices which offload the forwarding (data) plane from the kernel”.

Currently both DSA and switchdev subsystems live in the kernel.

“Accelerating TCP with TLDK” by Ray Kinsella. TLDK is a TCP/IP stack in user-space for DPDK.

If you have been following the news in the networking world lately, likely you’ve heard of DPDK. But in case you have never heard of it and still wonder what it means, keep reading.

DPDK stands for Data-Plane Development Kit. It is a project started by Intel, although other network-card vendors joined later. The advent of affordable high-speed NICs made developers realize the bottlenecks in the Linux kernel’s forwarding path. Basically, Linux’s networking stack has a hard time dealing with 10G Ethernet using only one core. This is particularly true for small packets.

This fact triggered two reactions:

  • One from the Linux kernel community to improve the kernel’s networking stack performance.
  • Another one from the network community to move away from the kernel and do packet processing in user-space.

The latter is called a kernel bypass, and there are several approaches to it. One of these approaches is to talk directly to the hardware from a user-space program. In other words, a user-space driver.

And that’s mostly what DPDK is about: speeding up packet processing by providing user-space drivers for several high-performance NICs (from different vendors). However, a user-space driver alone is not sufficient for squeezing all the performance out of a high-speed NIC. It’s also necessary to apply other techniques. For this reason, DPDK also implements bulk packet processing, non-blocking APIs, cache optimizations, memory locality, etc.

On the other hand, bypassing the kernel means, among other things, that there’s no TCP/IP stack. TLDK tries to solve that.

“Writing a functional DPDK application from scratch” by Ferruh Yigit.

It was an OK talk. It can serve as an introduction on how to write your first DPDK app. Unfortunately the slides are not available online, and during the talk it was hard to follow the example code since the font size was too small.

“eBPF and XDP walkthrough and recent updates” by Daniel Borkmann.

Very good speaker and one of the best talks of the day. The talk was a catch-up on the latest developments in eBPF and XDP. If you’ve never heard of eBPF or XDP, let me introduce these terms. If you’re already familiar with them, you can skip the next paragraphs completely.

eBPF stands for Extended BPF. But then, what does BPF mean? BPF (Berkeley Packet Filter) is a bytecode modeled after the 6502 instruction set. Packet-filtering expressions used by tcpdump, such as “ip dst port 80”, get compiled to a BPF bytecode program. The compiled programs are later executed by an interpreter. For instance:

$ tcpdump -d "ip dst 192.168.0.1"
(000) ldh      [12]
(001) jeq      #0x800           jt 2    jf 5
(002) ld       [30]
(003) jeq      #0xc0a80001      jt 4    jf 5
(004) ret      #262144
(005) ret      #0

The bytecode above is the result of compiling the expression “ip dst 192.168.0.1”.
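To illustrate how such an interpreter works, here is a toy Python evaluator for exactly the five opcodes in that listing. It is a gross simplification (real BPF has many more instructions, relative jump offsets and a scratch memory area), but it runs the program above against a raw packet:

```python
# Toy interpreter for the classic BPF listing shown above.
# Supports only the opcodes that appear in it: ldh, ld, jeq and ret.
import struct

PROGRAM = [
    ("ldh", 12),                # A <- half-word at offset 12 (EtherType)
    ("jeq", 0x800, 2, 5),       # A == ETHERTYPE_IP? jump to 2, else to 5
    ("ld", 30),                 # A <- word at offset 30 (IPv4 dst address)
    ("jeq", 0xC0A80001, 4, 5),  # A == 192.168.0.1? jump to 4, else to 5
    ("ret", 262144),            # accept the packet
    ("ret", 0),                 # reject the packet
]

def run_bpf(program, packet):
    pc, acc = 0, 0  # program counter and accumulator
    while True:
        insn = program[pc]
        op = insn[0]
        if op == "ldh":    # load 16-bit big-endian value into A
            acc = struct.unpack_from("!H", packet, insn[1])[0]
            pc += 1
        elif op == "ld":   # load 32-bit big-endian value into A
            acc = struct.unpack_from("!I", packet, insn[1])[0]
            pc += 1
        elif op == "jeq":  # conditional jump (absolute targets, as printed)
            _, k, jt, jf = insn
            pc = jt if acc == k else jf
        elif op == "ret":  # return the verdict
            return insn[1]
```

A non-zero return value means the packet matches the filter (the number is how many bytes to capture); zero means it is discarded.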

On a side note, at Igalia we developed a packet-filtering expression compiler called pflua. In pflua, instead of lowering expressions to BPF bytecode, they get lowered to a Lua function which is later run and optimized by LuaJIT.

The Linux kernel has its own BPF interpreters (yes, there are actually two). One is the BPF interpreter and the other one is the eBPF interpreter, which understands BPF as well.

In 2013 Alexei Starovoitov extended BPF and created eBPF. The Linux Kernel’s eBPF interpreter is more sophisticated than the BPF one. Its main features are:

  • Similar architecture to x86-64. eBPF uses 64-bit registers and increases the number of available registers from 2 (Accumulator and X register) to 10.
  • System calls. It’s possible to execute system calls from eBPF programs. In addition, there’s now a bpf() system call which allows user-space to load eBPF programs.
  • Decoupling from the networking subsystem. The eBPF interpreter now lives at its own path, kernel/bpf. Unlike BPF, eBPF is used for more than packet filtering. It’s possible to attach an eBPF program to a tracepoint or to a kprobe, which opens up the door to eBPF for instrumentation and performance analysis. It’s also possible to attach eBPF programs to sockets, so packets that do not match the filter are discarded earlier, improving performance.
  • Maps. Maps allow eBPF programs to remember values from previous calls; in other words, eBPF can be stateful. Map data can be queried from user-space, which provides a good mechanism for collecting statistics.
  • Tail calls. eBPF programs are limited to 4096 instructions per program, but the tail-call feature allows an eBPF program to control the next eBPF program to execute.
  • Helper functions. Such as packet rewrite, checksum calculation or packet cloning. Unlike user-space programming, these functions get executed inside the kernel.
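To give a feel for how maps make eBPF stateful, here is a sketch in Python of the idea behind a map-backed packet counter. In a real eBPF program this would be a BPF hash map manipulated with the kernel’s map helpers; every name below is mine, for illustration only:

```python
# Sketch of the idea behind an eBPF map used for statistics: the filter
# function runs once per packet, but its state survives across calls
# and can be queried from "user-space" at any time.
from collections import Counter

packet_counts = Counter()  # stands in for a BPF hash map: source IP -> count

def filter_func(src_ip):
    """Called once per packet; updates per-source statistics."""
    packet_counts[src_ip] += 1  # map update, persists across calls
    return True  # let the packet pass

def top_talkers(n):
    """What a user-space tool would do: read the map's contents."""
    return packet_counts.most_common(n)
```

The key point is the split: the per-packet code stays tiny, while the accumulated state lives in the map and is read out asynchronously from user-space.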

Note: actually, they’re not interpreters but JIT compilers; what they do is translate eBPF bytecode programs to native assembly code which is later run by the kernel. Before compiling the code, several checks are performed. For instance, eBPF programs cannot contain loops (an unintended infinite loop could hang the kernel).

Related to eBPF, there is XDP. XDP (eXpress Data Path) is a kernel subsystem which runs eBPF programs at the earliest place possible (as soon as the packet is read from the RX ring buffer). The execution of an eBPF program can return 4 possible values: DROP, PASS, TX or ABORT. If we manage to discard a packet before it hits the networking stack, that results in a performance gain. And although eBPF programs are meant to be simple, there’s a fairly big amount of things that can be expressed as eBPF programs (routing decisions, packet modifications, etc).

Recently there was a very interesting discussion in the Linux Kernel mailing list about the real value of XDP. The discussion is greatly summarized in this LWN article: Debating the value of XDP. After reading all the opinions, I mostly share Stephen Hemminger’s point of view. The networking world is complex. I think XDP has its space, but I honestly cannot imagine writing a network function as complex as the lwAFTR function as an eBPF program. User-space networking is a reality that is hard to deny; it solves real problems and it’s getting more and more common every day.

“Cilium - BPF & XDP for containers” by Thomas Graf.

Another great talk. Thomas is a seasoned Linux kernel networking hacker with more than 10 years of experience. In addition, he knows how to deliver a talk, which greatly helps to follow the topics under discussion.

Thomas’s talk focused on the Cilium project. Cilium is a system for easing Linux container networking. The project leans heavily on eBPF and XDP. It was helpful to have Daniel’s talk scheduled right before this one, so all those concepts were already introduced.

The Cilium project provides fast in-kernel networking and security policy enforcement for containers. It does so by orchestrating eBPF programs for containers. The programs are directly executed on XDP, instead of being attached to a connection proxy. Programs can be modified, recompiled and distributed again to the containers without dropping the connection. Containers only care about the traffic that matters to them, and since traffic is filtered at the XDP level that results in a performance gain. I forgot to mention that since XDP accesses the DMA buffer directly, it requires driver support from the NIC. At this moment only Mellanox cards support XDP, although Intel support is coming.

Cilium provides a catalogue of network functions. It features functions such as L3/L4 load balancing, NAT46, connection tracking, port mapping, statistics, etc. Another interesting thing it does is communication between containers via labels, which is implemented on top of IPv6.

“Stateful packet processing with eBPF” by Quentin Monnet.

The background of this talk was an R&D project called Beba. First, Monnet introduced the OpenState project. OpenState is a stateful data-plane API for SDN controllers. Two network functions implemented using OpenState, Port Knocking and Token Bucket, were discussed, and Monnet showed how they could be implemented using eBPF.

“Getting started with OpenDaylight” by Charles Eckel & Giles Heron.

OpenFlow is one of the basic building blocks of SDN. It standardizes a protocol by which an entity, called an SDN controller, can manage several remote switches. These switches can be either hardware appliances or software switches. OpenDaylight is an open-source implementation of an SDN controller whose goal is to grow the adoption of SDN. OpenDaylight is hosted by the Linux Foundation.

“Open-Source BGP networking with OpenDaylight” by Giles Heron. Follow-up on the previous talk with a practical focus.

“FastDataStacks” by Tomas Cechvala. FastDataStacks is a stack composed of OpenStack, OpenDaylight, FD.io and OPNFV.

“PNDA.io” by Jeremie Garnier.

PNDA.io is a platform for network data analytics. It brings together several open-source technologies for data analysis (Yarn, HDFS, Spark, Zookeeper, etc) and combines them to streamline the process of developing data processing applications. Users can focus on their data analysis and not on developing a pipeline.

“When configuration management meet SDN” by Michael Scherer. Ansible + ssh as an orchestration tool.

“What do you mean ’SDN’ on traditional routers?” by Peter Van Eynde.

Really fun and interesting talk by one of the people responsible for the network infrastructure at Fosdem. If you’re curious about how Fosdem’s network is deployed, you should watch this talk. Peter’s talk focused mostly on network monitoring. It covered topics such as SNMP, NetFlow and YANG/Netconf.

That was all for day one.

On Sunday I planned to attend several mixed talks, covering a wide range of topics. In the morning I attended the “Small languages panel”. After the workshop I said hello to Justin Cormack and thanked him for ljsyscall. After chatting a bit, I headed towards the “Open Game Development devroom”.

As it was crowded everywhere and switching rooms was tough, I decided in the end to stick to this track for the rest of the day. Some of the talks I enjoyed the most were:

And that was all for Fosdem this year.

Besides the talks, I enjoyed hanging out with other Igalians, meeting old friends, travelling with the little local community from Vigo and, of course, meeting new people. I found pleasure in walking the streets of Brussels and enjoying the beer, the fries, the parks, the buildings and all the things that make Brussels a charming place I always like to come back to.

ULBs Janson hall
Francisco Ferrer monument at ULB Campus

igalia networking fosdem