
In-band signalling: GSO_BY_FRAGS

Modern network cards have the ability to take a long data buffer (bigger than the MTU of the link) and segment it in hardware, sticking a set of common headers on the front of each segment. This comes in a number of variants under a number of names: Large Send Offload, TCP Segmentation Offload, and a number of tunnel offloads. (My favourite would have to be UFO: UDP Fragmentation Offload.)

In Linux, GSO (Generic Segmentation Offload) provides much of the software infrastructure for these various offloads. GSO gives you a few perks even if you don't have hardware support: it lets you do most of the work with one large buffer, and only split it right before you hand it to the hardware.
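To make the normal case concrete, here is a minimal user-space sketch of what "segment at a fixed size" means: a large payload is cut into pieces of at most gso_size bytes, and only the last piece may be shorter. The helper `split_by_gso_size` is hypothetical; it models the idea and is not the kernel's actual segmentation code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: cut payload_len bytes into segments of at most
 * gso_size bytes, as GSO does when gso_size is an ordinary segment size.
 * Each segment's payload length is written to seg_lens. Returns the number
 * of segments, or 0 if more than max_segs would be needed. */
size_t split_by_gso_size(size_t payload_len, size_t gso_size,
                         size_t *seg_lens, size_t max_segs)
{
    size_t n = 0;

    assert(gso_size > 0); /* a zero segment size would never terminate */
    while (payload_len > 0) {
        /* Every segment is gso_size bytes except possibly the last. */
        size_t len = payload_len < gso_size ? payload_len : gso_size;

        if (n == max_segs)
            return 0;
        seg_lens[n++] = len;
        payload_len -= len;
    }
    return n;
}
```

For example, 4000 bytes of payload with a gso_size of 1448 gives segments of 1448, 1448 and 1104 bytes.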

Now SCTP is a bit of a special case. Per 90017accff61 ("sctp: Add GSO support"):
SCTP has this pecualiarity [sic] that its packets cannot be just segmented to (P)MTU. Its chunks must be contained in IP segments, padding respected. So we can't just generate a big skb, set gso_size to the fragmentation point and deliver it to IP layer.
So, if SCTP wants the advantages of GSO, it needs some magic to allow a buffer (skb) to be split at the right spots. To do this, it builds the skb out of fragments and does GSO along the fragment boundaries, rather than by splitting one long linear buffer.
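Why can't a single gso_size work? SCTP pads each chunk to a multiple of 4 bytes and a chunk can't straddle two packets, so packing whole chunks into packets produces uneven packet lengths. The sketch below assumes greedy packing against a fixed per-packet budget for chunk data; `pack_chunks` and `pad4` are hypothetical names, and this is not how the kernel's SCTP output path is actually written.

```c
#include <assert.h>
#include <stddef.h>

/* SCTP chunks are padded to a multiple of 4 bytes on the wire. */
static size_t pad4(size_t len)
{
    return (len + 3) & ~(size_t)3;
}

/* Hypothetical helper: greedily pack whole (padded) chunks into packets
 * carrying at most budget bytes of chunk data each, never splitting a
 * chunk. frag_lens receives each packet's size, i.e. what one fragment
 * would hold. Returns the number of packets, or 0 if a chunk cannot fit
 * on its own or more than max_frags packets would be needed. */
size_t pack_chunks(const size_t *chunk_lens, size_t nchunks, size_t budget,
                   size_t *frag_lens, size_t max_frags)
{
    size_t n = 0, cur = 0;

    assert(budget > 0); /* a zero budget could never hold a chunk */
    for (size_t i = 0; i < nchunks; i++) {
        size_t len = pad4(chunk_lens[i]);

        if (len > budget)
            return 0;
        if (cur + len > budget) {
            /* This chunk won't fit: close off the current packet. */
            if (n == max_frags)
                return 0;
            frag_lens[n++] = cur;
            cur = 0;
        }
        cur += len;
    }
    if (cur > 0) {
        if (n == max_frags)
            return 0;
        frag_lens[n++] = cur;
    }
    return n;
}
```

For chunks of 1000, 601, 700 and 200 bytes with a 1452-byte budget, this yields packets of 1000, 1304 and 200 bytes: no fixed segment size would land on those boundaries.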

As the commit message mentioned, normally the skb has its gso_size property set to the point at which it should be split. Now, here's where things get interesting. Because SCTP needs to signal that it's splitting somewhere else, it overloads the meaning of the gso_size property: if gso_size == GSO_BY_FRAGS, the skb should be split on its fragments; otherwise it should be split normally, every gso_size bytes.

This is set up in 3953c46c3ac7 ("sk_buff: allow segmenting based on frag sizes"), and GSO_BY_FRAGS is set to 0xffff.
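Here is a sketch of the distinction every consumer of gso_size has to make, using a simplified stand-in for the skb. `struct gso_model` and `segment_len` are hypothetical; only the name GSO_BY_FRAGS and its value 0xffff come from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The sentinel value: gso_size == 0xffff means "split on fragments". */
#define GSO_BY_FRAGS 0xFFFF

/* Hypothetical, simplified model of a GSO buffer. */
struct gso_model {
    size_t payload_len;      /* total payload across all segments */
    uint16_t gso_size;       /* segment size, or GSO_BY_FRAGS */
    const size_t *frag_lens; /* per-fragment lengths (by-frags case only) */
    size_t nfrags;
};

/* Return the payload length of segment idx, or 0 if there is no such
 * segment. The sentinel must be checked before gso_size is used as data. */
size_t segment_len(const struct gso_model *b, size_t idx)
{
    size_t off, rem;

    if (b->gso_size == GSO_BY_FRAGS) {
        /* Control signal: segment boundaries are fragment boundaries. */
        return idx < b->nfrags ? b->frag_lens[idx] : 0;
    }

    /* Data: gso_size is the segment size; only the last may be shorter. */
    assert(b->gso_size > 0);
    off = idx * b->gso_size;
    if (off >= b->payload_len)
        return 0;
    rem = b->payload_len - off;
    return rem < b->gso_size ? rem : b->gso_size;
}
```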

This means every user of gso_size has to check for the GSO_BY_FRAGS case. Most do, usually via special-case handling. However, today I came across one that didn't: the token bucket filter (tbf) queuing discipline. Here, it could cause massive performance regressions when using SCTP with the tbf qdisc. (For what it's worth, I have no idea why you would want to do that, but it's still a bug.)
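The failure mode is easy to reproduce in miniature. Below, a hypothetical "does every segment fit in max_size?" check, the kind of question a shaper like tbf needs answered, is written twice: once treating gso_size as a plain number and once handling the sentinel. The names and the simplified header-length model are assumptions for illustration; this is neither the kernel's tbf code nor the actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GSO_BY_FRAGS 0xFFFF

/* Hypothetical, simplified model of a GSO buffer. */
struct gso_seg_model {
    size_t hdr_len;          /* headers copied onto every segment */
    uint16_t gso_size;       /* segment size, or GSO_BY_FRAGS */
    const size_t *frag_lens; /* per-fragment payload (by-frags case only) */
    size_t nfrags;
};

/* Buggy: assumes gso_size is always a segment size. A GSO_BY_FRAGS
 * buffer looks like it has 65535-byte segments, so it almost never fits. */
bool seglen_ok_naive(const struct gso_seg_model *b, size_t max_size)
{
    return b->hdr_len + b->gso_size <= max_size;
}

/* Fixed: when gso_size is the sentinel, check each fragment instead. */
bool seglen_ok(const struct gso_seg_model *b, size_t max_size)
{
    if (b->gso_size != GSO_BY_FRAGS)
        return b->hdr_len + b->gso_size <= max_size;

    assert(b->frag_lens != NULL || b->nfrags == 0);
    for (size_t i = 0; i < b->nfrags; i++)
        if (b->hdr_len + b->frag_lens[i] > max_size)
            return false;
    return true;
}
```

With the naive check, an SCTP buffer whose real segments all fit is still judged oversized, while a TCP buffer with an ordinary gso_size is unaffected, which is why the bug only shows up with SCTP.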

I proposed a fix, but it's likely there are other cases.

This is a good example of in-band signalling: there are both control signals (GSO_BY_FRAGS) and data signals in the one channel (the gso_size variable). It shows the downside: you need to check for both cases everywhere. It also shows the upside: no extra field was required in the data structure to support this case, and that is a big deal in the network stack.
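The trade-off can be sketched with two hypothetical structs: one that signals "split on fragments" in-band through a sentinel, and one that uses a separate flag. Exact sizes depend on the ABI, but the flag always costs extra space in every instance. One further cost of the in-band version is that 0xffff can no longer mean a genuine 65535-byte segment size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GSO_BY_FRAGS 0xFFFF

/* In-band: one field carries either a size (data) or the sentinel
 * (control). Every reader must test for the sentinel first. */
struct inband {
    uint16_t gso_size;
};

/* Out-of-band alternative: a separate flag keeps gso_size purely data,
 * at the cost of extra bytes in every instance. */
struct outofband {
    uint16_t gso_size;
    bool by_frags;
};

bool inband_by_frags(const struct inband *s)
{
    return s->gso_size == GSO_BY_FRAGS;
}

bool outofband_by_frags(const struct outofband *s)
{
    return s->by_frags;
}

/* C11: the in-band encoding is strictly smaller. */
static_assert(sizeof(struct inband) < sizeof(struct outofband),
              "the separate flag costs space");
```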
