Software flow offloading implications

No idea, but SQM does nothing actively to foil software offloading...


For flows in the outbound direction, I'm curious whether the SQM layer sits before or after the software flow offloading layer.

Software offloading sends packets directly to the xmit path, and that means into the qdiscs, so SQM works in both directions, AFAIK.
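
If you want to convince yourself on your own box (a rough sketch; 'wan' is just a typical SQM egress device here, substitute your own), run a saturating upload with software offloading enabled and check whether the shaper's counters keep climbing:

root@OpenWrt:~# tc -s qdisc show dev wan | grep Sent
root@OpenWrt:~# sleep 10
root@OpenWrt:~# tc -s qdisc show dev wan | grep Sent

If the Sent bytes grow at roughly the shaped rate, offloaded flows are still traversing the qdisc; if they sit still under load, something is bypassing it.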


And in the end, which one was the expected/good result? The one with SFO on, or with SFO off?

OK, I have done a couple of tests myself with SFO and SQM (first time setting up SQM for me).
Well, SQM is clearly working and the difference in latency is like night and day. Additionally, I see a FLOWOFFLOAD target in iptables, and it is being hit, since the counter increases. I'd say they are both working fine together, then.
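
In case anyone wants to repeat that check, this is roughly how to inspect the offload rule's counters (assuming the iptables/fw3-based firewall in use here; chain placement may differ on other versions):

root@OpenWrt:~# iptables -t filter -vnL FORWARD | grep FLOWOFFLOAD

The pkts/bytes columns on that rule tick up as forwarded traffic is handed to the software fast path.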


What is your max speed, @xorbug?

Thanks.

I just finished testing SQM + SFO and HWO on a MikroTik hAP ac2, and my result is amazing! :slight_smile:

Wait, you are saying you use HWO + SQM? But...

I'm confused now...

The results are best with SQM / SFO + HWO enabled, but maybe HWO has no effect on the MikroTik hAP ac2 for the moment. I have run several tests to show you :wink:

SQM + SFO + HWO http://www.dslreports.com/speedtest/67711976
SQM + SFO http://www.dslreports.com/speedtest/67712001
SQM http://www.dslreports.com/speedtest/67712028
SFO http://www.dslreports.com/speedtest
SFO + HWO http://www.dslreports.com/speedtest/67712079

Eheh, yeah, I guess this is the case...

Thank you for caring and taking the time to test and share. Indeed, it looks like there is no difference between SFO and SFO + HWO, meaning HWO is not being used; but maybe at those speeds there would be no impact even if it were working... while there is certainly a difference between SQM and no SQM.

If you're using hardware flow offloading, then SQM is not working, whatever your speed test results say, since it bypasses the qdisc mechanism.
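
One way to see whether flows are being offloaded at all (a sketch; the exact flag text may vary between kernel versions) is to look for the offload marker in the conntrack table:

root@OpenWrt:~# cat /proc/net/nf_conntrack | grep -c OFFLOAD

A non-zero count means connections are on the offload fast path; combined with qdisc counters that stall under load, that would be strong evidence SQM is being bypassed.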


With SQM, no SFO: http://www.dslreports.com/speedtest/68678992

root@OpenWrt:~# tc -s -d qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1518 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
 Sent 144618747286 bytes 174506922 pkt (dropped 0, overlimits 0 requeues 27)
 backlog 0b 0p requeues 27
  maxpacket 9108 drop_overlimit 0 new_flow_count 54047 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev lan1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan2 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan3 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan4 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 8009: dev wan root refcnt 2 bandwidth 16Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms noatm overhead 44
 Sent 5048846069 bytes 40710429 pkt (dropped 3345, overlimits 16534528 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 918272b of 4Mb
 capacity estimate: 16Mbit
 min/max network layer size:           28 /    1500
 min/max overhead-adjusted size:       72 /    1544
 average network hdr offset:           14

                  Tin 0
  thresh         16Mbit
  target            5ms
  interval        100ms
  pk_delay       15.3ms
  av_delay       4.48ms
  sp_delay          3us
  backlog            0b
  pkts         40713774
  bytes      5053775579
  way_inds       564415
  way_miss       155112
  way_cols            0
  drops            3345
  marks               0
  ack_drop            0
  sp_flows            1
  bk_flows            1
  un_flows            0
  max_len         17054
  quantum           488

qdisc ingress ffff: dev wan parent ffff:fff1 ----------------
 Sent 72353669760 bytes 53590324 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-lan root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 800a: dev ifb4wan root refcnt 2 bandwidth 56Mbit besteffort triple-isolate nonat wash no-ack-filter split-gso rtt 100ms noatm overhead 44
 Sent 71833817021 bytes 52268548 pkt (dropped 1321776, overlimits 87391757 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 1342536b of 4Mb
 capacity estimate: 56Mbit
 min/max network layer size:           46 /    1500
 min/max overhead-adjusted size:       90 /    1544
 average network hdr offset:           14

                  Tin 0
  thresh         56Mbit
  target            5ms
  interval        100ms
  pk_delay        667us
  av_delay        136us
  sp_delay          8us
  backlog            0b
  pkts         53590324
  bytes     73683624904
  way_inds       294724
  way_miss       109165
  way_cols            0
  drops         1321776
  marks               0
  ack_drop            0
  sp_flows            2
  bk_flows            1
  un_flows            0
  max_len         39364
  quantum          1514

qdisc noqueue 0: dev wlan0 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
root@OpenWrt:~#

With SQM and SFO:

root@OpenWrt:~# tc -s -d qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1518 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
 Sent 144832145622 bytes 174785566 pkt (dropped 0, overlimits 0 requeues 27)
 backlog 0b 0p requeues 27
  maxpacket 9108 drop_overlimit 0 new_flow_count 54088 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev lan1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan2 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan3 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev lan4 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 8009: dev wan root refcnt 2 bandwidth 16Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms noatm overhead 44
 Sent 5095097320 bytes 40826079 pkt (dropped 4137, overlimits 16665515 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 918272b of 4Mb
 capacity estimate: 16Mbit
 min/max network layer size:           28 /    1500
 min/max overhead-adjusted size:       72 /    1544
 average network hdr offset:           14

                  Tin 0
  thresh         16Mbit
  target            5ms
  interval        100ms
  pk_delay        550us
  av_delay        134us
  sp_delay          7us
  backlog            0b
  pkts         40830216
  bytes      5101225230
  way_inds       565065
  way_miss       155330
  way_cols            0
  drops            4137
  marks               0
  ack_drop            0
  sp_flows            0
  bk_flows            1
  un_flows            0
  max_len         17054
  quantum           488

qdisc ingress ffff: dev wan parent ffff:fff1 ----------------
 Sent 72435286656 bytes 53667803 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-lan root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 800a: dev ifb4wan root refcnt 2 bandwidth 56Mbit besteffort triple-isolate nonat wash no-ack-filter split-gso rtt 100ms noatm overhead 44
 Sent 71916206973 bytes 52345548 pkt (dropped 1322255, overlimits 87489041 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 1342536b of 4Mb
 capacity estimate: 56Mbit
 min/max network layer size:           46 /    1500
 min/max overhead-adjusted size:       90 /    1544
 average network hdr offset:           14

                  Tin 0
  thresh         56Mbit
  target            5ms
  interval        100ms
  pk_delay        290us
  av_delay         72us
  sp_delay          3us
  backlog            0b
  pkts         53667803
  bytes     73766740042
  way_inds       298245
  way_miss       109318
  way_cols            0
  drops         1322255
  marks               0
  ack_drop            0
  sp_flows            1
  bk_flows            1
  un_flows            0
  max_len         39364
  quantum          1514

qdisc noqueue 0: dev wlan0 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
root@OpenWrt:~#

Tests performed on a Belkin RT3200.


Highly unscientific test from my side:

I tried enabling software offload on my x86 router on a 1 Gbit line (no SQM or shaping) and, frankly, I did not notice any large improvement looking at htop and running a speed test.

The only change was that YAMon stopped working, so I switched it back off.


@moeller0, since you're the SQM expert, what's the latest take on enabling software or hardware offloading with it? Does it do nothing, or should it be avoided? It's a shame the LuCI page still doesn't provide any insight into this.

I have to admit that I have no reliable information to share. However, as far as I understand, software offloading should work with SQM, as it only tries to avoid parts of the network stack higher up than the mere qdiscs SQM uses. Hardware flow offloading, however, will hide all packets from SQM, so the hardware offloading engine needs to offer its own qdiscs, like the NSS stuff on e.g. the R7800, as far as I understand.
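
For anyone who wants to A/B test this, both knobs live in /etc/config/firewall (a sketch of the stock option names; hardware offloading additionally requires driver support, e.g. on mt7621/mt7622 targets):

config defaults
	# software flow offloading: established flows skip most of
	# netfilter but are still queued by the egress qdisc
	option flow_offloading '1'
	# hardware flow offloading: flows bypass the CPU and therefore
	# the qdiscs entirely
	option flow_offloading_hw '1'

followed by /etc/init.d/firewall restart to apply.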

The pages from luci-app-sqm? Those have been independent from sqm-scripts for some time now, so anybody with reliable information could create a PR to get well-tested/well-researched changes in.

As I said, I have never bothered to test any offload. But it seems that my current router offers software flow offloading, so I could go and test it... Maybe I will get around to doing that later this year...


It works if the SQM interface is set to wan (the device), but software flow offloading does not help SQM if the interface is set to pppoe-wan (the tunnel). My test on my friend's FTTH line with a 200/20 plan showed that the Xiaomi Mi Router 4A Gigabit Edition is able to shape up to 150 Mbps using CAKE without software flow offloading, and it is fine doing 200 Mbps with software flow offloading turned on. But I'm afraid diffserv is not working when shaping is done on wan instead of pppoe-wan: every packet goes into the Best Effort tin. Other than that, I wasn't able to test whether the nat option is working, so I'm not sure about that.
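
To make that concrete, the difference between the two setups is just the interface line in /etc/config/sqm (an illustrative sketch; the rates approximate a 200/20 plan, and layer_cake.qos is the diffserv-aware script from sqm-scripts):

config queue
	option enabled '1'
	# shaping the tunnel: cake sees plain IP, so DSCP tins work,
	# but, per the observation above, flows are not accelerated
	option interface 'pppoe-wan'
	# option interface 'wan'  # shaping the underlying device is
	#                         # SFO-friendly, but with PPPoE framing
	#                         # DSCP is apparently not read, so all
	#                         # traffic lands in Best Effort
	option download '185000'
	option upload '18000'
	option qdisc 'cake'
	option script 'layer_cake.qos'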

@moeller0

I am wondering if you were able to find time to do any tests comparing SQM vs. SQM + SFO. The information I have been able to find on forums so far on whether SFO does or does not impact SQM [including some great screenshots and info earlier in this same topic] seems to be inconclusive, or at least not root-caused, same as my own experience with this burning question. It would be great to hear feedback from an expert on the topic.

There is one simple solution to this question: getting hardware with sufficient margin to do SQM without any kind of offloading…

--
I'd never buy a new device with flow-offloading in mind, the technology is too new and quirky to do that with a clear conscience. If you suddenly find yourself in a situation where your old hardware won't cope without it after a 'sudden' speed upgrade, fine - give it a try, no harm done - but don't select hardware with it in mind.


I experienced slh's footnote with my ER-X gateway... I suddenly found my existing hardware was too slow to handle SQM/QoS after an ISP speed upgrade from ~200 to 500 Mbps. So of course the very next thing I tried was to save my hardware by experimenting with software options!

I no longer recall the exact improvement software flow offloading provided with SQM/QoS, but I do recall it was marginal at best (less than 5-10%) and difficult to discern within test-result variability. I did at least convince myself it didn't hurt, LOL...

Flow offloading and SQM/QoS are not compatible - the CPU needs to handle SQM/QoS. I attributed any observed benefit to software offloading freeing up a few CPU cycles on LAN traffic that could then be used for SQM/QoS, but that may be an entirely nonsensical attribution, considering my limited understanding of offloading "fringe benefits."

In the end, I did exactly what slh recommends in the post above - buy enough hardware to not need or care about software flow offloading. I replaced my ER-X gateway with a NanoPi R4S, the CPU of which is woefully underutilized with "only" 500 Mbps ISP service. It's embarrassing really. I think I need to upgrade to faster ISP service :wink:

No, I have to admit I did not; I even forgot to put this on my TODO list.

Hi everybody, is veth compatible with SFO? Thanks.

Without having tested it in any objective way, and just based on ordinary usage not including Teams or Zoom calls, I can't tell the difference, although I think I am correct in stating that cake-qos-simple is SFO-friendly.