2.5Gbps Throughput & SQM, what do I need?

Thank you for your informational graph. Is this with cake ‘out of the box’ configuration or is there some tinkering involved?

This is the script I used to turn CAKE on and off:

#!/bin/bash
set -ex

ul_if=$1
cake_ul_rate=$2
cake_ul_options="diffserv4 wash"

if [ "$cake_ul_rate" -eq "0" ]; then
  ethtool -K "${ul_if}" tcp-segmentation-offload on
  ethtool -K "${ul_if}" generic-segmentation-offload on
  ethtool -K "${ul_if}" generic-receive-offload on
  tc qdisc delete dev "${ul_if}" root
else
  ethtool -K "${ul_if}" tcp-segmentation-offload off
  ethtool -K "${ul_if}" generic-segmentation-offload off
  ethtool -K "${ul_if}" generic-receive-offload off
  tc qdisc replace dev "${ul_if}" root handle 1: cake bandwidth ${cake_ul_rate}mbit ${cake_ul_options}
fi
1 Like

So cake can do gso-segmentation (split-gso keyword) so it should not be strictly necessary to disable these offloads.

1 Like

I figured turning off the hw offload features further shows that this box/processor has no problems at 2.5G.

2 Likes

I am under the impression that most 2.5ghz ethernet chips expose 1 tx/rx interface per core. Using fq_codel or cake as the default egress qdisc in these cases when actually running at 2.5gbit should first, invoke the mq qdisc, and then apply fq_codel or cake to the per core tx queues, an use up very little cpu compared to shaping. I would be interested in a performance comparison for both fq_codel and cake running as the native qdisc in the hw mq situation.

I have been collecting feedback and feature requests on a new version of cake over here.

3 Likes

But would that not require BQL and relay on that BQL back-pressure is "fairly" distributed across the multi-queues, which would only work for egress? NOt that this would not already be great!

The size of the BQL backlog is dependent on interrupt service time more than anything else. I do not have any 2.5gbit hw to test. At the time, bqlmon would typically show ~44k per mq at 1 Gbit with GSO/GRO disabled, 130-150k with GSO/GRO enabled. The side effects of GRO were ever more obvious if you tried to run at physical rates of 100Mbit and 10Mbit.

CAKE typically ended up with a larger BQL backlog than fq_codel.

1 Like

I am under the impression that most 2.5ghz ethernet chips expose 1 tx/rx interface per core.

The igc driver that my i226 NICs use has a hardcoded max of 4 queues. That's why you see the softirqs in my graph only going to 4 cores.

I would be interested in a performance comparison for both fq_codel and cake running as the native qdisc in the hw mq situation.

You'd like me to run the same test, but use fq_codel instead, correct?

attached to the 4 queues, not shaped.

It does appear bql is fully supported by this driver.

https://github.com/ffainelli/bqlmon can be used to monitor bql interactions.

Can you give me an example of how to configure what you are asking for? I don't quite understand how you want it configured.

sysctl -w net.core.default_qdisc=fq_codel # or cake
tc qdisc replace dev your_device root cake # or fq_codel
tc qdisc del dev your_device root # Will then reset that device to this new default
tc -s qdisc show dev your_device # should show mq with 4 fq_codels 
2 Likes

Not a big difference. First is fq_codel, second is CAKE:

1 Like

cake eats watts

1 Like

Roughly 20% more... could you post the output of tc -s qdisc please? I wonder which cake options are in play (probably the defaults, but it would be nice to confirm this).

Thanks for your detailed statistics. I was concerned newer cpu's with a mixed setup of performance cores (p-cores) and efficient cores, like the 1235u with two max 55 watt tdp p-cores, would draw way too much electricity. Reviews like https://www.notebookcheck.net/Intel-Core-i5-1235U-Processor-Benchmarks-and-Specs.589637.0.html had put me off too. Electricity is currently very expensive in Germany. 10 Watt 24/7 = ~ 35 € per year at 40 cent per kWh.

1 Like
root@router:~# ./start_shaping2.sh test fq_codel
+ ul_if=test
+ qdisc=fq_codel
+ sysctl -w net.core.default_qdisc=fq_codel
net.core.default_qdisc = fq_codel
+ tc qdisc replace dev test root fq_codel
+ tc qdisc del dev test root
+ tc -s qdisc show dev test
qdisc mq 0: root 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 32Mb ecn drop_batch 64 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
root@router:~# ./start_shaping2.sh test cake
+ ul_if=test
+ qdisc=cake
+ sysctl -w net.core.default_qdisc=cake
net.core.default_qdisc = cake
+ tc qdisc replace dev test root cake
+ tc qdisc del dev test root
+ tc -s qdisc show dev test
qdisc mq 0: root 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc cake 0: parent :4 bandwidth unlimited diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms raw overhead 0 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 0b of 15140Kb
 capacity estimate: 0bit
 min/max network layer size:        65535 /       0
 min/max overhead-adjusted size:    65535 /       0
 average network hdr offset:            0

                   Bulk  Best Effort        Voice
  thresh           0bit         0bit         0bit
  target            5ms          5ms          5ms
  interval        100ms        100ms        100ms
  pk_delay          0us          0us          0us
  av_delay          0us          0us          0us
  sp_delay          0us          0us          0us
  backlog            0b           0b           0b
  pkts                0            0            0
  bytes               0            0            0
  way_inds            0            0            0
  way_miss            0            0            0
  way_cols            0            0            0
  drops               0            0            0
  marks               0            0            0
  ack_drop            0            0            0
  sp_flows            0            0            0
  bk_flows            0            0            0
  un_flows            0            0            0
  max_len             0            0            0
  quantum          1514         1514         1514

qdisc cake 0: parent :3 bandwidth unlimited diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms raw overhead 0 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 0b of 15140Kb
 capacity estimate: 0bit
 min/max network layer size:        65535 /       0
 min/max overhead-adjusted size:    65535 /       0
 average network hdr offset:            0

                   Bulk  Best Effort        Voice
  thresh           0bit         0bit         0bit
  target            5ms          5ms          5ms
  interval        100ms        100ms        100ms
  pk_delay          0us          0us          0us
  av_delay          0us          0us          0us
  sp_delay          0us          0us          0us
  backlog            0b           0b           0b
  pkts                0            0            0
  bytes               0            0            0
  way_inds            0            0            0
  way_miss            0            0            0
  way_cols            0            0            0
  drops               0            0            0
  marks               0            0            0
  ack_drop            0            0            0
  sp_flows            0            0            0
  bk_flows            0            0            0
  un_flows            0            0            0
  max_len             0            0            0
  quantum          1514         1514         1514

qdisc cake 0: parent :2 bandwidth unlimited diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms raw overhead 0 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 0b of 15140Kb
 capacity estimate: 0bit
 min/max network layer size:        65535 /       0
 min/max overhead-adjusted size:    65535 /       0
 average network hdr offset:            0

                   Bulk  Best Effort        Voice
  thresh           0bit         0bit         0bit
  target            5ms          5ms          5ms
  interval        100ms        100ms        100ms
  pk_delay          0us          0us          0us
  av_delay          0us          0us          0us
  sp_delay          0us          0us          0us
  backlog            0b           0b           0b
  pkts                0            0            0
  bytes               0            0            0
  way_inds            0            0            0
  way_miss            0            0            0
  way_cols            0            0            0
  drops               0            0            0
  marks               0            0            0
  ack_drop            0            0            0
  sp_flows            0            0            0
  bk_flows            0            0            0
  un_flows            0            0            0
  max_len             0            0            0
  quantum          1514         1514         1514

qdisc cake 0: parent :1 bandwidth unlimited diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms raw overhead 0 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 0b of 15140Kb
 capacity estimate: 0bit
 min/max network layer size:        65535 /       0
 min/max overhead-adjusted size:    65535 /       0
 average network hdr offset:            0

                   Bulk  Best Effort        Voice
  thresh           0bit         0bit         0bit
  target            5ms          5ms          5ms
  interval        100ms        100ms        100ms
  pk_delay          0us          0us          0us
  av_delay          0us          0us          0us
  sp_delay          0us          0us          0us
  backlog            0b           0b           0b
  pkts                0            0            0
  bytes               0            0            0
  way_inds            0            0            0
  way_miss            0            0            0
  way_cols            0            0            0
  drops               0            0            0
  marks               0            0            0
  ack_drop            0            0            0
  sp_flows            0            0            0
  bk_flows            0            0            0
  un_flows            0            0            0
  max_len             0            0            0
  quantum          1514         1514         1514
1 Like

Indeed the cake defaults, thanks for documenting this! So here cake does considerably more than fq_codel (whether this additional work is actually useful here is a different question ;))

1 Like

I would have liked a tc -s qdisc show after the tests completed. Or during. same for a bqlmon result.

1 Like

I agree getting a tc -s qdisc pair tightly "sandwiching" the test would be great :wink: