R7800 SQM settings keep causing bufferbloat

I tried to input:
root@OpenWrt:~# echo 35 > /sys/devices/system/cpu/cpufreq/performance/up_threshold
-ash: can't create /sys/devices/system/cpu/cpufreq/performance/up_threshold: nonexistent directory
root@OpenWrt:~# echo 10 > /sys/devices/system/cpu/cpufreq/performance/sampling_down_factory
-ash: can't create /sys/devices/system/cpu/cpufreq/performance/sampling_down_factory: nonexistent directory

^ I think I misunderstood what you meant.

Tried this:
root@OpenWrt:~# echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
root@OpenWrt:~# echo performance > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor

Now, in the graphs from your hnyman build for the R7800, the CPU Freq graph shows 1.73GHz.

I also tried flow offloading, both software and hardware. Weird how hardware flow offloading requires software offloading to be enabled first. I barely noticed any difference. I tested bufferbloat again via DSLReports and the results still don't look good. I worry that if the router's CPU frequency never drops back to idle while I'm sleeping, my hardware might not last as long, or temperatures will keep rising over time and I'll get frequent reboots. I guess only time will tell.

https://www.reddit.com/r/HomeNetworking/comments/87m7r0/psa_software_nat_offloading_and_soon_hardware_nat/

So this indicates that sqm on eth0.2 should actually work, so this is quite odd.

The tc output shows that both expected cake instances seem up and running with nothing obvious wrong.

Could you, just for testing, change sqm to eth0.2 again, set the download rate to 50000 and the upload rate to 25000, then run a dslreports speedtest and post a link to the results here? (Keep the frequency at maximum for this test.)

Also, please post a link to a dslreports speedtest with SQM disabled.

It's kind of severe, don't you think, to cap 50% of the download and 80% of the upload just for bufferbloat? Mind you, game packets are only around 17 KBps for League of Legends. I know that's not how it works, but this is a very severe case to test.

R7800 eth0, 100mbps symmetrical down & up SQM, 25mbps loss of throughput

R7800 eth0.2 change, 100mbps symmetrical down & up SQM, 25mbps loss of throughput. (Only eth0.2 change)

R7800 eth0.2 change, 50mbps down & 25mbps up SQM, over 50% loss of throughput.

R7800 eth0.2 SQM DISABLED change, 50mbps down & 25mbps up SQM, over 50% loss of throughput.

As you can see from the results, going from 100mbps to 50mbps on download and from 100mbps to 25mbps on upload took the grade from C to A+. Here is what I don't understand, and I hope more users can chime in on this thread, PLEASE :smiley:

Before this I used a WNDR3700v4, as in the situation I described above. With that router I was getting Grade A/A+, even though it has a single-core processor with a lower frequency and less RAM, and I never touched its settings other than installing the UPnP and SQM packages. How does it make any sense that on the WNDR3700v4 I only had to step down by 6250kbps four times, from 125mbps to 100mbps, to get Grade A/A+, while switching to an R7800 gives me Grade C? Especially when the R7800 has two CPU cores at 1.7GHz and more RAM. And before anyone mentions that I installed hnyman's R7800 build: I've tried regular 18.06, hnyman's 18.06 R7800 build and hnyman's 19.06 R7800 build, and none of them did any good for me. It also doesn't matter whether I use layer_cake.qos or piece_of_cake.qos with the DSCP settings enabled. I need others to post the max throughput they get and how much they gave up to reach a given grade on DSLReports. Over 50% loss is too much, and switching routers shouldn't make this kind of difference.
Sorry, I have to clarify everything so everyone can understand perfectly.

WNDR3700v4 results from May 2019.

Yes, you did.
I wrote "/sys/devices/system/cpu/cpufreq/ondemand/up_threshold" and
you tried "/sys/devices/system/cpu/cpufreq/performance/up_threshold"

There is no up_threshold in the performance governor, as it is always maxed out.

There is that parameter for the ondemand governor (where CPU speed varies according to the load).

I provided two alternatives:

  • use the performance governor (which you have now figured out),
    echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
    or
  • tweak the parameters for the ondemand governor to modify its behaviour to be more eager to max out the CPU.
    echo 35 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
    ...

ondemand has several parameters that were also mentioned/discussed in those threads.

root@router1:~# ls /sys/devices/system/cpu/cpufreq/ondemand/
ignore_nice_load      powersave_bias        sampling_rate
io_is_busy            sampling_down_factor  up_threshold

more info at https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt
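
To make the two alternatives above concrete, here is a minimal sketch of a helper that applies one governor to every cpufreq policy and prints what is currently set. The function names and the CPUFREQ_ROOT override are made up for illustration (they are not part of OpenWrt); the sysfs paths are the ones quoted in this thread.

```shell
#!/bin/sh
# CPUFREQ_ROOT is a hypothetical override so the helpers can be tried on a
# scratch directory; by default it points at the real sysfs location.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"

# set_governor NAME: write NAME into scaling_governor of every policy.
set_governor() {
    gov="$1"
    for policy in "$CPUFREQ_ROOT"/policy*; do
        # Skip the unexpanded glob or policies without a governor file.
        [ -e "$policy/scaling_governor" ] || continue
        echo "$gov" > "$policy/scaling_governor"
    done
}

# show_governors: print "policyN: governor" for every policy found.
show_governors() {
    for policy in "$CPUFREQ_ROOT"/policy*; do
        [ -r "$policy/scaling_governor" ] || continue
        echo "${policy##*/}: $(cat "$policy/scaling_governor")"
    done
}
```

On the router, running set_governor performance and then show_governors should report performance for policy0 and policy1; set_governor ondemand switches back.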

Only reason was because you wrote:

"* Use "performance" governor instead of "ondemand"."

so I took that as a reason to change the command a little to fit what I wanted, since you wrote it as an example. Sorry.

Also, I'll just stick with this one. If I need to for any reason, I'll try the other alternative you listed. I pretty much want the CPU maxed out for performance to begin with.

EDIT (6/30/2019, 1:56AM): CPU Freq changed back to idling (ondemand). My modem had reset all its settings for some reason; when I woke up I couldn't get into the modem's gateway address to configure it because everything had been reset. My brother then turned off the surge protector, waited a couple of minutes and turned it back on, and now my R7800 is back to its idle frequency states. >_<

moeller0, can you send me details on what your download/upload speeds are and what your SQM settings are, so I can compare?

I've also found this article & thread.

I believe it has to do with DHCP being left on in the settings of my AT&T gateway. I've tried to disable it, but after I restart the modem I can't reconnect, so I have to reset the gateway. I lose about 5 minutes each time reconfiguring it over and over to see whether I did something wrong.

The article talks about how this can affect QoS, and SQM is a form of QoS.
The LuCI GUI says I'm getting the public IP address passed through from my modem to the router properly, as I set via "IP Passthrough", but when checking with tracert I see two private IPs and then my public IP.
I can't seem to fix that.

Thanks for the data. First, the idea behind shaping to 50/25 was only to see whether sqm/cake manages to work as intended at such an extreme bandwidth sacrifice, and indeed it does.

Comparing the dslreports detailed results pages indicates that the upload direction is the problem; download actually looks pretty good compared to without sqm. That means you can set the download to 100 and we can concentrate on the upload side.
I would now iteratively search for the highest upload rate that still results in acceptable bufferbloat. For that test I would set the governor to performance again, to deal with only one issue at a time.
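
As a rough sketch of that iterative search, the helper below just prints the upload rates to try, from high to low. The function name candidate_rates and the 100000/80000/5000 figures in the note are made-up examples, not recommended values.

```shell
#!/bin/sh
# candidate_rates START_KBIT MIN_KBIT STEP_KBIT
# Print shaper rates from START down to MIN in steps of STEP (all in kbit/s).
candidate_rates() {
    rate="$1"; min="$2"; step="$3"
    # Refuse a non-positive step, which would loop forever.
    [ "$step" -gt 0 ] || return 1
    while [ "$rate" -ge "$min" ]; do
        echo "$rate"
        rate=$((rate - step))
    done
}
```

For example, candidate_rates 100000 80000 5000 lists 100000, 95000, 90000, 85000 and 80000. For each value you would set it as the SQM upload rate, rerun the dslreports test, and stop at the first rate whose bufferbloat plot looks acceptable.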

I can post my settings later today, when I have access to that data again.
About double NAT, sure it has its challenges, but traffic shaping typically is not among them. I can see how port and IP based QoS schemes need special care or run into insurmountable issues, but I believe that this is not your biggest issue right now.

Sure here is my /etc/config/sqm:

config queue
        option debug_logging '0'
        option verbosity '5'
        option qdisc_advanced '1'
        option squash_dscp '0'
        option squash_ingress '0'
        option ingress_ecn 'ECN'
        option egress_ecn 'ECN'
        option qdisc_really_really_advanced '1'
        option linklayer 'ethernet'
        option overhead '34'
        option linklayer_advanced '1'
        option tcMTU '2047'
        option tcTSIZE '128'
        option linklayer_adaptation_mechanism 'default'
        option iqdisc_opts 'nat dual-dsthost ingress'
        option eqdisc_opts 'nat dual-srchost'
        option interface 'pppoe-wan'
        option qdisc 'cake'
        option script 'layer_cake.qos'
        option tcMPU '68'
        option enabled '1'
        option download '49000'
        option upload '9500'

you will notice I have a much slower link (50/10) and an ISP that insists upon using PPPoE.
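
If you prefer the CLI over LuCI, the same kind of config can be written with uci. The following is only a dry-run sketch: sqm_uci_cmds is a hypothetical helper that prints the commands instead of running them, the option names are taken from my listing above, and the interface and rates you pass in are your own assumptions to fill in.

```shell
#!/bin/sh
# sqm_uci_cmds IFACE DOWN_KBIT UP_KBIT
# Dry run: print uci commands for the first "config queue" section of
# /etc/config/sqm. Nothing is executed; review the output before using it.
sqm_uci_cmds() {
    iface="$1"; down_kbit="$2"; up_kbit="$3"
    cat <<EOF
uci set sqm.@queue[0].interface='$iface'
uci set sqm.@queue[0].download='$down_kbit'
uci set sqm.@queue[0].upload='$up_kbit'
uci set sqm.@queue[0].qdisc='cake'
uci set sqm.@queue[0].script='layer_cake.qos'
uci set sqm.@queue[0].qdisc_advanced='1'
uci set sqm.@queue[0].qdisc_really_really_advanced='1'
uci set sqm.@queue[0].iqdisc_opts='nat dual-dsthost ingress'
uci set sqm.@queue[0].eqdisc_opts='nat dual-srchost'
uci set sqm.@queue[0].enabled='1'
uci commit sqm
/etc/init.d/sqm restart
EOF
}
```

Something like sqm_uci_cmds eth0.2 100000 90000 prints the lines for a test on eth0.2; because the values end up in /etc/config/sqm, they should survive a reboot as long as the sqm service is enabled.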

About double NAT: I rethought this, and cake's per-internal-IP fairness will still work, because even after double NAT the internal IP addresses are unambiguous and are resolved by cake's deNATting. So while double NAT is far from fine, I would recommend dealing with that issue later (if at all).

Gotcha, but one thing that still bugs me is not understanding why one router requires such a huge loss of bandwidth compared to the other. I hope I find the cause, or that someone has an answer for this issue.

Also, I noticed you only took a 1mbps dive on download and 500kbps on upload. What grade are you getting?

I never knew CAKE can deNAT.

Let's first figure out how much bandwidth sacrifice is actually needed, and then we can try to figure out the root cause?

Well, I tried iteratively to "optimise" these settings. Also note the option iqdisc_opts 'nat dual-dsthost ingress' line: the ingress keyword makes cake shape the ingress so that the incoming rate is targeted, which results in more aggressive shaping, which in turn allows setting the shaper closer to the real limit. On egress, in theory shaping should work at 100% of the real egress rate, so shaving off 5% is already more than theoretically required. One of my issues is that I know my ISP employs a shaper at its BNG level, but I have no information about that shaper's settings, so I need to approximate those limits. (And I value keeping latency low over maxing out the bandwidth, so I could not care less about trading even 10-20% of bandwidth for a consistently low latency-under-load increase. But this is a policy issue, and I fully understand that others have different policies.)

Mostly A+ and occasionally As, but I tend to look more at the detailed bufferbloat plots anyway, the grades are too coarse for my taste.

Well, this is one of the cool features which allows cake to get to the real internal and external addresses basically virtually undoing the NAT masquerading, and that in turn allows the nifty per-internal or per-external IP address fairness modes (in my config nat dual-dsthost in ingress and nat dual-srchost on egress fairly share download and upload bandwidth between all concurrently active internal host addresses, so no computer can easily take over the network completely)
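
To show how the two fairness modes map onto the two directions, here is a small dry-run sketch. cake_opts and cake_cmd are made-up helper names; they only print a tc command, they do not run it, and the device names and rates are whatever you pass in.

```shell
#!/bin/sh
# cake_opts DIRECTION: print the cake keywords used in my config for that
# direction (per-internal-host fairness after undoing NAT).
cake_opts() {
    case "$1" in
        egress)  echo "nat dual-srchost" ;;          # upload: fair per source host
        ingress) echo "nat dual-dsthost ingress" ;;  # download: fair per destination host
        *) echo "usage: cake_opts egress|ingress" >&2; return 1 ;;
    esac
}

# cake_cmd DEV RATE DIRECTION: print (not run) a matching tc command.
# For ingress, DEV would be the IFB device that receives the redirected
# traffic, not the WAN interface itself.
cake_cmd() {
    opts=$(cake_opts "$3") || return 1
    echo "tc qdisc replace dev $1 root cake bandwidth $2 $opts"
}
```

For example, cake_cmd eth0.2 100mbit egress prints the upload shaper command, and cake_cmd ifb4eth0.2 100mbit ingress the download one.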

Ok, i'm fine with that. I will be patient. :slight_smile:

Well, I tried iteratively to "optimise" these settings. Also note the option iqdisc_opts 'nat dual-dsthost ingress' line: the ingress keyword makes cake shape the ingress so that the incoming rate is targeted, which results in more aggressive shaping, which in turn allows setting the shaper closer to the real limit. On egress, in theory shaping should work at 100% of the real egress rate, so shaving off 5% is already more than theoretically required. One of my issues is that I know my ISP employs a shaper at its BNG level, but I have no information about that shaper's settings, so I need to approximate those limits. (And I value keeping latency low over maxing out the bandwidth, so I could not care less about trading even 10-20% of bandwidth for a consistently low latency-under-load increase. But this is a policy issue, and I fully understand that others have different policies.)

^ I too prefer low latency over bandwidth. I also went with a 20% reduction in total throughput, as low latency is a must for me.

Mostly A+ and occasionally As, but I tend to look more at the detailed bufferbloat plots anyway, the grades are too coarse for my taste.

^That's great for the very little loss.

Well, this is one of the cool features which allows cake to get to the real internal and external addresses basically virtually undoing the NAT masquerading, and that in turn allows the nifty per-internal or per-external IP address fairness modes (in my config nat dual-dsthost in ingress and nat dual-srchost on egress fairly share download and upload bandwidth between all concurrently active internal host addresses, so no computer can easily take over the network completely)

Ooh, I think I might try nat dual-dsthost and nat dual-srchost. I forgot to mention that with SQM disabled I think I was getting less bufferbloat, with it jumping around, compared to with SQM enabled. I might have to retest, but I still got the same grade overall.

Hi hnyman, I've been researching my problem. The performance governor barely does much, imo: I don't notice any difference in dslreports tests, so just bringing the CPU frequency to max doesn't really mitigate bufferbloat.

  1. Also, if I reboot my router, the CPU goes from 1.73GHz back to idling afterwards.
    Is there any command I can enter in the CLI to make it stay at 1.73GHz?
    Or maybe I should just go with the other alternative you gave me.

  2. I've figured out a couple of things. I'm talking to mindwolf via private message to see whether my issue is similar to his, because he had an issue with SQM not working right, or possibly something to do with the R7800 build.

I asked him for help and he kindly helped me out.
So far I've figured out that something is wrong with SQM: if I enter the command for the upload in the CLI, it works fine and I get Grade A/A+, but with SQM set up through the GUI it does not work properly.

Now I need to know the command to get the download side working, as my downloads were at 126mbps and uploads were around 93-100mbps.

  3. I am curious, do you use SQM via CLI or GUI? I'm sure the majority use it via CLI.
    Could I get one command for layer_cake.qos and one for piece_of_cake.qos, both with download and upload at 100mbps bandwidth and with dual-srchost and dual-dsthost?

I want to do individual tests of layer_cake.qos and piece_of_cake.qos. I don't want to keep asking one at a time, so I'll ask for both. :smiley:

  4. I just had the second crash on this router in about 2 weeks of ownership. How should I go about testing or checking for what caused it?

P.S. Mindwolf told me he was testing my uploads because he thought that was the issue and we're going off one thing one at a time to figure stuff out.

Thank you!

This might help you

Okay, will do this later in day. Have to sleep, appreciate the help :smiley:

I'm having a hard time figuring out how to get this running.
I installed the ethtool package from the LuCI software page; now what should I do? I looked into the thread you sent me but I can't figure it out. Thanks

Run these commands from /etc/rc.local, or put them in LuCI under System / Startup at the very bottom of the page.

# Pin IRQs 28 and 29 to CPU0 (mask 1) and IRQs 31 and 32 to CPU1 (mask 2)
echo 1 > /proc/irq/28/smp_affinity
echo 1 > /proc/irq/29/smp_affinity
echo 2 > /proc/irq/31/smp_affinity
echo 2 > /proc/irq/32/smp_affinity

# Use the performance governor on both cores
echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
echo performance > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor
# Briefly lower the maximum frequency, then raise it back to the top
echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
sleep 1
echo 1750000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1750000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
# Interrupt coalescing: no delay for TX interrupts, 31 microseconds for RX
ethtool -C eth0 tx-usecs 0
ethtool -C eth1 tx-usecs 0
ethtool -C eth0 rx-usecs 31
ethtool -C eth1 rx-usecs 31
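
In case the 1 and 2 written to smp_affinity look arbitrary: the file takes a hexadecimal CPU bitmask, with bit 0 standing for CPU0 and bit 1 for CPU1. A tiny sketch of how the mask is formed (cpus_to_mask is a made-up helper name, not an existing tool):

```shell
#!/bin/sh
# cpus_to_mask CPU...: print the hex bitmask that selects the given CPU
# numbers, in the format /proc/irq/N/smp_affinity expects.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( $mask | (1 << $cpu) ))
    done
    printf '%x\n' "$mask"
}
```

So cpus_to_mask 0 gives 1, cpus_to_mask 1 gives 2, and cpus_to_mask 0 1 gives 3, which would allow the IRQ on both cores.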

I'm curious as to why I get

root@OpenWrt:~# /etc/rc.local
-ash: /etc/rc.local: Permission denied
root@OpenWrt:~#

I just want to know how to do this both ways, through ssh or at startup. For now I've done it via startup.
Thank you very much!

Because it does not have the execute bit set. Run it as sh /etc/rc.local
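
A minimal sketch of the difference, assuming a hypothetical helper called run_script: a file without the execute bit cannot be started by its path, but it can still be handed to the shell as an argument.

```shell
#!/bin/sh
# run_script FILE: run FILE directly if it is executable, otherwise pass it
# to sh, which only needs read permission on the file.
run_script() {
    script="$1"
    if [ -x "$script" ]; then
        "$script"
    else
        sh "$script"
    fi
}
```

With this, run_script /etc/rc.local would take the sh branch on a stock file; chmod +x /etc/rc.local would be the other way to make the direct call work.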

I have one more question, fantom-x.

Is this the same way to do it for SQM at startup? I keep losing my SQM setup when I reboot the router and have to ssh back in and enter these commands again (would it be sh /etc/init.d/sqm, or is it also sh /etc/rc.local?):

ifstatus wan 
tc -d -s qdisc show dev eth0.2 
uci show sqm 

/etc/init.d/sqm stop && tc -s qdisc del dev eth0.2 root && tc qdisc add dev eth0.2 root cake bandwidth 100mbit wash dual-srchost no-split-gso

/etc/init.d/sqm stop 

tc -s qdisc del dev eth0.2 root 

tc qdisc add dev eth0.2 root cake bandwidth 100mbit wash dual-srchost no-split-gso 

ip link add name ifb4eth0.2 type ifb
tc qdisc del dev eth0.2 ingress # I get an error with this, but the command seems to be needed for some reason.
tc qdisc add dev eth0.2 handle ffff: ingress
tc qdisc del dev ifb4eth0.2 root
tc qdisc add dev ifb4eth0.2 root cake bandwidth 100mbit wash dual-dsthost ingress no-split-gso
ifconfig ifb4eth0.2 up # if you don't bring the device up your connection will lock up on the next step.
tc filter add dev eth0.2 parent ffff: protocol all prio 10 u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb4eth0.2