Skb->hash and tunnel/ipv6 encapsulations and fq_codel and sqm

dtaht · June 6, 2020, 2:06am

Over here we had spectacular results by having wireguard pass the skb->hash to sqm...

However, that explicitly forces the hash on encapsulation if skb->hash has not already been calculated.

My question then became complicated -

For many years now, I have been making a possibly false assumption (in the case of embedded hardware) that the hash was already frequently calculated in the RSS (receive side scaling) part of the path, and thus did not (usually) need to be calculated again. (Most of the higher end ethernet cards I work with do do RSS already...)

Packets that enter the fq_codel qdisc or sqm directly get a hash if one is not already there. In the case of encapsulation, however, it helps a LOT (see the wireguard link above) if the inner hash is passed through to fq_codel rather than recalculated on the outer IP header.
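To make the inner-versus-outer point concrete, here is a toy userland model in C. It is not kernel code: the struct and function names are made up for illustration, FNV-1a stands in for whatever flow hash the kernel really uses, and picking a bucket with a plain modulo over 1024 (fq_codel's default flow count) is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 5-tuple; the field names are illustrative only. */
struct toy_flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* FNV-1a over the tuple fields, a stand-in for the kernel's flow hash. */
static uint32_t toy_flow_hash(const struct toy_flow *f)
{
    uint32_t words[4] = { f->saddr, f->daddr,
                          ((uint32_t)f->sport << 16) | f->dport, f->proto };
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < 4; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (words[i] >> (8 * b)) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

#define TOY_BUCKETS 1024u

/*
 * Count the distinct buckets hit by n inner flows. With use_inner set,
 * each packet keeps the hash of its own inner tuple (passthrough);
 * otherwise every packet is hashed on the shared outer tunnel tuple.
 */
static unsigned toy_buckets_used(const struct toy_flow *inner, size_t n,
                                 const struct toy_flow *outer, int use_inner)
{
    uint8_t seen[TOY_BUCKETS] = { 0 };
    unsigned used = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t h = toy_flow_hash(use_inner ? &inner[i] : outer);
        uint32_t bucket = h % TOY_BUCKETS;

        if (!seen[bucket]) {
            seen[bucket] = 1;
            used++;
        }
    }
    return used;
}
```

In this model, all flows sharing one tunnel land in a single bucket when the outer header is hashed, so the qdisc has nothing to separate them by. Passing the inner hash through spreads them out again.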

A) how many encapsulation types are commonly used in openwrt?

I imagine gre, ipsec, 6in4, fou, maps, 6xlat and other encapsulations are pretty common nowadays. Of these, it's the IPv6-related ones that concern/interest me most. Do any of these explicitly force the hash calc as wireguard now does?

B) Is the hash calculated on rx or not on many openwrt related ethernet chips? (the r7800 for example, ar71xx? others?) (e.g. how often should we try to force the hash if not found)

C) To what extent is RSS actually used in openwrt related devices? I mostly see ip running on a single cpu and not load balancing well.

dtaht · June 6, 2020, 2:15am

There is another path that I'm now thinking about... consider a local service that runs through a 6to4 (or equivalent) tunnel. Does that get a skb->hash from the local tcp stack?

tohojo · June 6, 2020, 10:35am

$ cd ~/build/linux/drivers/net
$ git grep -l 'skb_set_hash' | sort -u
ethernet/amazon/ena/ena_netdev.c
ethernet/amd/xgbe/xgbe-drv.c
ethernet/aquantia/atlantic/aq_ring.c
ethernet/broadcom/bnx2.c
ethernet/broadcom/bnx2x/bnx2x_cmn.c
ethernet/broadcom/bnxt/bnxt.c
ethernet/cavium/liquidio/lio_core.c
ethernet/cavium/thunder/nicvf_main.c
ethernet/chelsio/cxgb4/sge.c
ethernet/cisco/enic/enic_main.c
ethernet/emulex/benet/be_main.c
ethernet/freescale/dpaa/dpaa_eth.c
ethernet/google/gve/gve_rx.c
ethernet/hisilicon/hns3/hns3_enet.c
ethernet/intel/e1000e/netdev.c
ethernet/intel/fm10k/fm10k_main.c
ethernet/intel/i40e/i40e_txrx.c
ethernet/intel/iavf/iavf_txrx.c
ethernet/intel/ice/ice_txrx_lib.c
ethernet/intel/igb/igb_main.c
ethernet/intel/igc/igc_main.c
ethernet/intel/ixgbe/ixgbe_main.c
ethernet/intel/ixgbevf/ixgbevf_main.c
ethernet/marvell/octeontx2/nic/otx2_txrx.c
ethernet/marvell/sky2.c
ethernet/mellanox/mlx4/en_rx.c
ethernet/mellanox/mlx5/core/en_rx.c
ethernet/neterion/vxge/vxge-main.c
ethernet/netronome/nfp/nfp_net_common.c
ethernet/pensando/ionic/ionic_txrx.c
ethernet/qlogic/qede/qede_fp.c
ethernet/sfc/falcon/rx.c
ethernet/sfc/rx_common.c
ethernet/stmicro/stmmac/stmmac_main.c
ethernet/sun/niu.c
ethernet/synopsys/dwc-xlgmac-net.c
hyperv/netvsc_drv.c
vmxnet3/vmxnet3_drv.c
xen-netback/netback.c

So no, it doesn't seem like there are many embedded chipsets that set the hash from hardware. Barring any out-of-tree patches in OpenWrt, of course.

The local TCP stack does generally set the skb->hash from the socket 5-tuple (see skb_set_hash_from_sk() in sock.h). Other socket types seem to call this as well (through skb_set_owner_w()), but I'm not sure exactly which code paths this takes effect on...
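A userland paraphrase of that helper, as a sketch only: the `toy_` structs are stand-ins carrying just the fields needed here. It assumes the helper copies a hash stored on the socket (modelled as `sk_txhash`) into the skb and marks it as an L4 hash. Check your own kernel tree for the real definition.

```c
#include <stdint.h>

/* Toy stand-ins for struct sock and struct sk_buff. */
struct toy_sock {
    uint32_t sk_txhash;      /* 0 means the socket has no hash yet */
};

struct toy_skb {
    uint32_t hash;
    unsigned l4_hash:1;      /* hash covers the transport-layer flow */
    unsigned sw_hash:1;      /* hash was computed in software */
};

/* Copy the socket's hash onto the packet and flag it as an L4 hash. */
static void toy_skb_set_hash_from_sk(struct toy_skb *skb,
                                     const struct toy_sock *sk)
{
    if (sk->sk_txhash) {
        skb->l4_hash = 1;
        skb->hash = sk->sk_txhash;
    }
}
```

If this reading is right, a locally generated packet reaches a tunnel device already carrying an L4 hash, so encapsulation code that only computes a hash when none is present would leave it alone.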

dtaht · June 6, 2020, 12:32pm

Re: A) Well, the flow dissector also handles many kinds of encapsulation (and grew batman support recently, as one example), but where on the Venn diagram are we for what is not covered? (I really missed out on paying attention to xlat, fou, etc.)

Re: C) Going back to a specific problematic device: the mvneta doesn't do RSS, in general? That might explain some things on other bugs... and a multicore ar7xxx doesn't benefit either?

dtaht · June 6, 2020, 1:07pm

And then things get even more complicated. If we calculate the hash on the rx side, on an encapsulated packet, do we preserve it all the way to egress or recalc it on decap?

tohojo · June 6, 2020, 2:04pm

mvneta does support RSS, it just doesn't propagate the hash to the kernel. However, there are hardware configurations that map all interrupts to the same CPU, so RSS doesn't help with multi-core scaling. The Espressobin is one such platform, Jesper was saying...
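A rough sketch of why the interrupt mapping matters, again as a toy C model rather than driver code. The indirection-table size of 128, the eight queues and all names are assumptions chosen for illustration. The hash picks a table slot, the slot names an rx queue, and the queue's IRQ is pinned to a CPU.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_INDIR_SIZE 128

/* Hypothetical NIC: queue ids must be < 8 and CPU ids < 32 in this model. */
struct toy_nic {
    uint8_t indir[TOY_INDIR_SIZE]; /* table slot -> rx queue */
    uint8_t irq_cpu[8];            /* rx queue   -> CPU      */
};

/* Which CPU ends up servicing a packet with this rx hash. */
static unsigned toy_rx_cpu(const struct toy_nic *nic, uint32_t hash)
{
    uint8_t queue = nic->indir[hash % TOY_INDIR_SIZE];

    return nic->irq_cpu[queue];
}

/* Count how many distinct CPUs handle the given set of hashes. */
static unsigned toy_cpus_used(const struct toy_nic *nic,
                              const uint32_t *hashes, size_t n)
{
    unsigned mask = 0, used = 0;

    for (size_t i = 0; i < n; i++)
        mask |= 1u << toy_rx_cpu(nic, hashes[i]);
    for (; mask; mask >>= 1)
        used += mask & 1u;
    return used;
}
```

With two queues whose IRQs sit on different CPUs, the hashes spread over two cores. Point both IRQs at CPU 0 and the same traffic collapses onto one core, even though the NIC is still doing RSS.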

Generally, tunnel decap does skb_clear_hash_if_not_l4(skb), which does what it says on the tin. See __skb_tunnel_rx().
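For anyone who hasn't opened the tin, a minimal userland model of what that name implies. The `toy_` struct and functions are stand-ins, and the assumption is that clearing resets the hash value together with its software and L4 flags.

```c
#include <stdint.h>

/* Toy stand-in for the hash-related fields of struct sk_buff. */
struct toy_rx_skb {
    uint32_t hash;
    unsigned sw_hash:1;      /* hash was computed in software */
    unsigned l4_hash:1;      /* hash covers the transport-layer flow */
};

/* Forget the hash entirely so a later stage has to recompute it. */
static void toy_clear_hash(struct toy_rx_skb *skb)
{
    skb->hash = 0;
    skb->sw_hash = 0;
    skb->l4_hash = 0;
}

/* Keep an L4 hash across decap, drop anything weaker. */
static void toy_clear_hash_if_not_l4(struct toy_rx_skb *skb)
{
    if (!skb->l4_hash)
        toy_clear_hash(skb);
}
```

So under this model a hash that only covered the outer addresses is thrown away on decap and gets recalculated later on the inner headers, while a hash already flagged as L4 survives to egress.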

No idea what exactly the flow dissector covers these days. Lots? There's also support for loading a custom BPF flow dissector these days...