Cake vs fq_codel performance benchmark on x86 + cake for > 1gbit shaping

Hi,

I experimented with OpenWrt inside a container on an Arch Linux host (CPU: i3-3120M) to test cake shaping, ran some benchmarks, and found some surprising results:

The SQM config is as follows:

root@openwrtc1:~# cat /etc/config/sqm

config queue 'eth1'
        option enabled '1'
        option interface 'eth0'
        option download '0'
        option upload '800000'
        option qdisc 'fq_codel'
        option script 'simple.qos'
#        option qdisc 'cake'
#        option script 'piece_of_cake.qos'
        option qdisc_advanced '0'
        option ingress_ecn 'ECN'
        option egress_ecn 'ECN'
        option qdisc_really_really_advanced '0'
        option itarget 'auto'
        option etarget 'auto'
        option linklayer 'none'
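
For anyone reading along: SQM's download and upload values are in kbit/s, so the numbers quoted in this thread are easier to compare once converted. A minimal sketch of that conversion (the helper name kbit_to_gbit is just made up for illustration):

```shell
#!/bin/sh
# Convert an SQM rate given in kbit/s to Gbit/s (decimal units).
# kbit_to_gbit is a hypothetical helper, not part of sqm-scripts.
kbit_to_gbit() {
    awk -v k="$1" 'BEGIN { printf "%.2f\n", k / 1000000 }'
}

# Rates that appear in this thread.
for rate in 800000 1000000 5000000 10500000; do
    printf '%s kbit/s = %s Gbit/s\n' "$rate" "$(kbit_to_gbit "$rate")"
done
```

So '800000' in the config above is a 0.8 Gbit/s upload limit, and '1000000' is 1 Gbit/s.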

Observations:
a) When upload is set to anything larger than '1000000' (1 Gbps), cake shaping seems to be disabled completely:

root@openwrtc1:~# iperf3 -c 192.168.4.190
Connecting to host 192.168.4.190, port 5201
[  5] local 192.168.4.191 port 58062 connected to 192.168.4.190 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.09 GBytes  9.31 Gbits/sec    0    130 KBytes
[  5]   1.00-2.00   sec  1.11 GBytes  9.56 Gbits/sec    0    130 KBytes
[  5]   2.00-3.00   sec  1.12 GBytes  9.61 Gbits/sec    0    130 KBytes
[  5]   3.00-4.00   sec  1.11 GBytes  9.57 Gbits/sec    0    130 KBytes
[  5]   4.00-5.00   sec  1.11 GBytes  9.57 Gbits/sec    0    130 KBytes
[  5]   5.00-6.00   sec  1.11 GBytes  9.56 Gbits/sec    0    130 KBytes
[  5]   6.00-7.00   sec  1.11 GBytes  9.55 Gbits/sec    0    130 KBytes
[  5]   7.00-8.00   sec  1.11 GBytes  9.57 Gbits/sec    0    130 KBytes
[  5]   8.00-9.00   sec  1.11 GBytes  9.57 Gbits/sec    0    130 KBytes
[  5]   9.00-10.00  sec  1.11 GBytes  9.55 Gbits/sec    0    130 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.1 GBytes  9.54 Gbits/sec    0             sender
[  5]   0.00-10.04  sec  11.1 GBytes  9.50 Gbits/sec                  receiver

b) cake maxes out SIRQ on one core when shaping at 1 Gbps, while fq_codel shapes with less than ~20% SIRQ.
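
In case it helps others reproduce the SIRQ numbers without top or htop, here is a rough sketch that derives the softirq share of one core from two /proc/stat samples. It assumes the standard field order of the per-CPU lines (user, nice, system, idle, iowait, irq, softirq, steal); softirq_pct is a made-up helper name, and cpu0 is only an example core.

```shell
#!/bin/sh
# softirq_pct BEFORE AFTER
# Each argument is one "cpuN ..." line taken from /proc/stat.
# Prints the softirq share of CPU time between the two samples, in percent.
softirq_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total[NR] = 0
            for (i = 2; i <= 9; i++) total[NR] += $i   # user .. steal
            sirq[NR] = $8                               # softirq column
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (sirq[2] - sirq[1]) / dt
        }'
}

# Sample core 0 one second apart (run this while iperf3 is active).
if [ -r /proc/stat ]; then
    before=$(grep '^cpu0 ' /proc/stat)
    sleep 1
    after=$(grep '^cpu0 ' /proc/stat)
    echo "cpu0 softirq: $(softirq_pct "$before" "$after")%"
fi
```

Since the shaper work lands on whichever core handles the softirqs, it is worth checking each core rather than only the aggregate "cpu" line.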

Now some test results.
Baseline:

Accepted connection from 192.168.4.191, port 58088
[  5] local 192.168.4.190 port 5201 connected to 192.168.4.191 port 58090
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.10 GBytes  9.44 Gbits/sec
[  5]   1.00-2.00   sec  1.19 GBytes  10.2 Gbits/sec
[  5]   2.00-3.00   sec  1.18 GBytes  10.2 Gbits/sec
[  5]   3.00-4.00   sec  1.18 GBytes  10.1 Gbits/sec
[  5]   4.00-5.00   sec  1.19 GBytes  10.2 Gbits/sec
[  5]   5.00-6.00   sec  1.18 GBytes  10.2 Gbits/sec
[  5]   6.00-7.00   sec  1.18 GBytes  10.2 Gbits/sec
[  5]   7.00-8.00   sec  1.19 GBytes  10.2 Gbits/sec
[  5]   8.00-9.00   sec  1.18 GBytes  10.2 Gbits/sec
[  5]   9.00-10.00  sec  1.18 GBytes  10.2 Gbits/sec
[  5]  10.00-10.04  sec  50.1 MBytes  10.2 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.04  sec  11.8 GBytes  10.1 Gbits/sec                  receiver

fq_codel max throughput: SIRQ at 94%, with upload set to '10500000' (10 Gbps + 500 Mbps)

Accepted connection from 192.168.4.191, port 58128
[  5] local 192.168.4.190 port 5201 connected to 192.168.4.191 port 58130
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.10 GBytes  9.42 Gbits/sec
[  5]   1.00-2.00   sec  1.17 GBytes  10.0 Gbits/sec
[  5]   2.00-3.00   sec  1.17 GBytes  10.0 Gbits/sec
[  5]   3.00-4.00   sec  1.17 GBytes  10.0 Gbits/sec
[  5]   4.00-5.00   sec  1.17 GBytes  10.0 Gbits/sec
[  5]   5.00-6.00   sec  1.17 GBytes  10.0 Gbits/sec
[  5]   6.00-7.00   sec  1.17 GBytes  10.0 Gbits/sec
[  5]   7.00-8.00   sec  1.17 GBytes  10.0 Gbits/sec
[  5]   8.00-9.00   sec  1.17 GBytes  10.0 Gbits/sec
[  5]   9.00-10.00  sec  1.17 GBytes  10.0 Gbits/sec
[  5]  10.00-10.04  sec  49.8 MBytes  10.0 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.04  sec  11.7 GBytes  9.98 Gbits/sec                  receiver

For the entire duration of the iperf3 throughput tests, I pinged 192.168.4.190 and 192.168.4.191 continuously at a 200 ms interval, and the latency remained below 1 ms.

Addendum:
Some additional benchmarks showing that traffic shaping is in effect:

fq_codel with upload set to exactly 5 Gbps; SIRQ is ~64% on one core.

Accepted connection from 192.168.4.191, port 58136
[  5] local 192.168.4.190 port 5201 connected to 192.168.4.191 port 58138
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   539 MBytes  4.52 Gbits/sec
[  5]   1.00-2.00   sec   570 MBytes  4.78 Gbits/sec
[  5]   2.00-3.00   sec   570 MBytes  4.78 Gbits/sec
[  5]   3.00-4.00   sec   570 MBytes  4.78 Gbits/sec
[  5]   4.00-5.00   sec   570 MBytes  4.78 Gbits/sec
[  5]   5.00-6.00   sec   570 MBytes  4.78 Gbits/sec
[  5]   6.00-7.00   sec   570 MBytes  4.78 Gbits/sec
[  5]   7.00-8.00   sec   570 MBytes  4.78 Gbits/sec
[  5]   8.00-9.00   sec   570 MBytes  4.78 Gbits/sec
[  5]   9.00-10.00  sec   570 MBytes  4.78 Gbits/sec
[  5]  10.00-10.04  sec  23.7 MBytes  4.77 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.04  sec  5.56 GBytes  4.76 Gbits/sec                  receiver

cake with upload set to exactly 1 Gbps (anything higher disables shaping completely); SIRQ is at 100%:

Accepted connection from 192.168.4.191, port 58140
[  5] local 192.168.4.190 port 5201 connected to 192.168.4.191 port 58142
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   105 MBytes   878 Mbits/sec
[  5]   1.00-2.00   sec   114 MBytes   956 Mbits/sec
[  5]   2.00-3.00   sec   114 MBytes   956 Mbits/sec
[  5]   3.00-4.00   sec   114 MBytes   956 Mbits/sec
[  5]   4.00-5.00   sec   114 MBytes   956 Mbits/sec
[  5]   5.00-6.00   sec   114 MBytes   956 Mbits/sec
[  5]   6.00-7.00   sec   114 MBytes   956 Mbits/sec
[  5]   7.00-8.00   sec   114 MBytes   956 Mbits/sec
[  5]   8.00-9.00   sec   114 MBytes   956 Mbits/sec
[  5]   9.00-10.00  sec   114 MBytes   956 Mbits/sec
[  5]  10.00-10.04  sec  4.64 MBytes   954 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.04  sec  1.11 GBytes   949 Mbits/sec                  receiver

You've got better Ethernet with 10 Gbps than I do!

A suggestion is to not run iperf or the like on the Device Under Test (DUT). Just generating the packets can consume a lot of resources. If you can generate on one host, route through the OpenWRT instance, and measure on a third host, you may get very different numbers.

Sorry, I don't have an unused 10 Gbit+ NIC sitting around. The iperf test was done between containers hosted on the same laptop, and the max throughput just happens to sit at around 10 Gbps (is that a coincidence, or maybe an internal limitation of the Linux macvlan interface?).

I will try again later with the server and client pinned to a different cpu core.
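
A possible way to do the pinning, sketched with taskset. The core numbers below are assumptions and need to be checked against the actual CPU topology, since two logical CPUs can be hyperthread siblings of the same physical core. The script only prints the commands, so each can be pasted into the right container.

```shell
#!/bin/sh
# Core numbers are assumptions; pick two different physical cores.
SERVER_CORE=0
CLIENT_CORE=1
SERVER_IP=192.168.4.190   # iperf3 server address used in this thread

# Build the pinned command lines (hypothetical helpers for illustration).
server_cmd() { echo "taskset -c $SERVER_CORE iperf3 -s"; }
client_cmd() { echo "taskset -c $CLIENT_CORE iperf3 -c $SERVER_IP"; }

# Print the commands; run each one in its respective container.
server_cmd
client_cmd
```

iperf3 also has its own -A (affinity) option, which could be used instead of taskset where that is more convenient.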

Interesting, could you post the output of:
tc -d qdisc
tc -s qdisc

from before and after each test, please? Some of cake's defaults are somewhat costly, so to compare apples to apples you would need to pare down cake's options to only do what fq_codel/simple.qos does. To give a reasonable idea of what to change, though, I need to see how cake comes up... (Plus, it might well be that cake is more costly than the HTB+fq_codel combination in simple.qos; the jury is still out on whether that is just a "but cake does more" thing or whether it truly is less efficient.)
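
One way to collect that before/after output without copying it by hand is a small wrapper like the one below. This is only a sketch: snapshot and with_snapshots are made-up helper names, and the output file names are arbitrary.

```shell
#!/bin/sh
# snapshot LABEL: save `tc -d qdisc` and `tc -s qdisc` output to LABEL.txt
snapshot() {
    {
        echo "### tc -d qdisc"
        tc -d qdisc
        echo "### tc -s qdisc"
        tc -s qdisc
    } > "$1.txt" 2>&1
}

# with_snapshots LABEL COMMAND...: snapshot, run the test, snapshot again.
with_snapshots() {
    label=$1
    shift
    snapshot "${label}-before"
    "$@"
    snapshot "${label}-after"
}

# Example (run inside the OpenWrt container, once per qdisc setting):
#   with_snapshots cake-1g iperf3 -c 192.168.4.190
```

Running it once per configuration leaves a pair of files such as cake-1g-before.txt and cake-1g-after.txt that can be posted as-is.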

Depending on which version of cake you are using, from 1 Gbps upward cake will not segment super-packets (that should not necessarily cause your issues, but it might be related).

Best Regards