Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500)

How do you do that?
That is... what changes do you make to kernel config and where?

Wouldn't mind learning a new thing or two.

To do what, exactly - for ftrace (the reporting mechanism used to indicate RCU stalls), I first screwed around with the kernel_menuconfig options, but was diving too deep there.

Make menuconfig has options which I'll try to locate next time I'm in front of my system.

Most productive was googling something like "enable ftrace in openwrt" ...

I've backed away from the v23 CPU 0 stall issues ... it reproduces pretty often with a reduced grace period (default is 21 sec ... really?!). I used 5 for this testing; the option is under Kernel hacking via kernel_menuconfig.
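For reference, that option boils down to a single kconfig symbol (name from the mainline kernel; 5 is just the value I used):

  # make kernel_menuconfig -> Kernel hacking -> RCU Debugging
  #   -> "RCU CPU stall timeout in seconds"
  CONFIG_RCU_CPU_STALL_TIMEOUT=5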

... backing away, as I don't have the kernel debugging chops and don't want to learn ftrace well enough to locate the CPU 0 stall cause.

I did notice my net was stuttering a bit ... there may be shorter stalls that aren't triggering an NMI to reset CPU 0. I'm back on 22.03 until my curiosity grabs me by the nose again.

Mmmv,
M.

@apccv -

make menuconfig ->
Global build settings ->
Kernel build options ->
Compile the kernel with tracing support (KERNEL_FTRACE [=y])

... a lot of trace opts available in the kernel ...
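If you'd rather set it in the buildroot .config directly, the symbols look something like this (names as I found them in OpenWrt's Config-kernel.in - double-check against your tree):

  # OpenWrt buildroot .config (make menuconfig writes these):
  CONFIG_KERNEL_FTRACE=y
  CONFIG_KERNEL_FUNCTION_TRACER=y
  CONFIG_KERNEL_ENABLE_DEFAULT_TRACERS=y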

hope this helps,
M.

Thanks for the heads up.

I did notice my net was stuttering a bit ... there may be shorter stalls that aren't triggering an NMI to reset CPU 0. I'm back on 22.03 until my curiosity grabs me by the nose again.

I want to expand on this a bit - in a perfect world, dropping in a newer kernel (like going from 5.10 to 5.15) wouldn't change the processing mix at all.

One thing that will surface many nasty timing, co-processing, locking and data-sharing issues is any change to the scheduler's rules or capabilities.

Both 22.03 and 23.05 have the kernel option General setup -> Preemption Model set to 'server'. If 5.15 changed what 'server' means for scheduling, then a different processing path could be exercised - one that results in CPU 0 stalling.
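For the record, that menu is a three-way choice upstream (labels from the mainline 5.15 Kconfig; 'server' is the no-forced-preemption option):

  # General setup -> Preemption Model (via kernel_menuconfig):
  CONFIG_PREEMPT_NONE=y        # "No Forced Preemption (Server)"
  # CONFIG_PREEMPT_VOLUNTARY=y # "Voluntary Kernel Preemption (Desktop)"
  # CONFIG_PREEMPT=y           # "Preemptible Kernel (Low-Latency Desktop)"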

As far as that stalling goes - now that I've reloaded 22.03, I realize just how smooth and snappy it is compared to 23.05. The latter seems quite jerky in comparison.

My pet theory is that some interrupt is going unserviced or being pre-empted quite often, and gets mopped up by normal router operation. Only when it hangs for the full 21 seconds do we see log entries and feel an impact under use, if we're watching at the time.

3 ideas -

  1. Drop that RCU CPU stall timeout: make kernel_menuconfig -> Kernel hacking -> RCU Debugging ... to maybe 2 or 3. That's seconds, which still seems a seriously long time for anything to hog/disable a CPU in the openwrt world (a runtime alternative is sketched after this list).

  2. Build 22.03 with the pre-emption model set to low-latency and see if anything interesting happens. Scheduler changes tend to expose code that implicitly requires a particular scheduling model.

  3. A whole 'nother avenue of work would be something that manages router hardware and code ... resetting and restarting transparently if something runs amok.
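For idea 1 there may also be a runtime knob, so the timeout can be tested without rebuilding (path from mainline's rcupdate module parameters - verify it exists on your build):

  # read the current stall timeout (seconds):
  cat /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
  # drop it to 3 seconds on the fly:
  echo 3 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout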

hnyman noticed/posted the following, which may be of interest to some.

Firmware pooped itself after 9 days, 20 hours, and 49 minutes. I am going to keep it and set it to reboot weekly.

Dog still not barking. 22.03 with preemption set to low-latency seems to work fine - still crisp. I need to verify the kernel was actually built that way ... if so, the good news is the R7800 NSS build doesn't have any hidden or implicit dependencies on a fixed/deterministic scheduler.
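A quick way to check, assuming the kernel was built with CONFIG_IKCONFIG_PROC (not sure the NSS builds enable it):

  zcat /proc/config.gz | grep '^CONFIG_PREEMPT'
  # expect CONFIG_PREEMPT=y for low-latency, CONFIG_PREEMPT_NONE=y for server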

I'll go back to 23.05 with ftrace tomorrow and see if I can figure out the last thing CPU 0 did before going idle forever.
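If anyone wants to follow along, the rough ftrace workflow I have in mind is the standard tracefs one (the mount point may already exist as /sys/kernel/debug/tracing on some builds):

  mount -t tracefs nodev /sys/kernel/tracing 2>/dev/null
  cd /sys/kernel/tracing
  echo function > current_tracer   # log every kernel function entry
  echo 1 > tracing_on
  # ... wait for the stall, then:
  echo 0 > tracing_on
  tail -n 100 per_cpu/cpu0/trace   # the last things CPU 0 logged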

@ACwifidude is it possible to get a 22.03 NSS chromium build?

Unfortunately chromium is not available for 22.03 - chromium was added after 22.03 branched off. I'd try the 23.05 version.

Anyone with good experience here: can you please put together a guide on how to use privaxy on OpenWrt via docker, for ad blocking?

Docker… on this platform? Haven’t tried. Not sure if it is an option.

There are two opkg packages for ad blocking using dnsmasq: SimpleAdBlock and something else. Both work well.

Docker on the x86 platform works well. Haven't used privoxy but pihole instead. Works really well, straightforward to set up.

So where again are you planning to run your container?

Anyone been able to rebase onto rc3? Having a hell of a time getting the pptp patch to apply.

The advantage of privaxy-style man-in-the-middle ad blocking is that it acts similarly to ad-blocking browser addons and can block hardcoded ads (like YouTube ads), which is impossible with domain-based ad blocking.
I am planning to run it on a TP-Link OnHub (I think it has enough juice - 1 GB RAM, 4 GB flash storage).

You can run privoxy directly on the router. You may want to use my builds, or the privoxy package from my repo, since it is more recent than the openwrt privoxy package. Last year I sent my wpad code upstream, and it is now integrated into privoxy.

Through the wpad feature you can instruct your clients to automatically configure privoxy as their proxy, so you don't have to enter proxy settings manually in each client's config. If a client has a proxy set (instead of being transparently proxied), it will also use privoxy for https, and thus also filters ads on https sites. I use a combination of dns and privoxy for the best experience.

If you also run a guest network, you have to configure a few things in privoxy to block guest clients from reaching local resources through the proxy. I probably should write a guide on how to use privoxy with wpad, guest networks etc.
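Until a proper guide exists, the DHCP side of wpad is easy to sketch (option 252 is the de-facto WPAD URL option; the wpad.dat name, the router IP and serving the file from the router's web server are my assumptions - quoting may need adjusting for dnsmasq):

  # /etc/config/dhcp - add under the existing lan dhcp section:
  list dhcp_option '252,"http://192.168.1.1/wpad.dat"'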

Hi everyone, I feel like I've hit a wall with SQM. No matter what I do, SQM seems to be causing huge bufferbloat rather than reducing it. I'm using an NBG6817 with the "NBG6817-20230717-Stable2305NSS-ath10k-sysupgrade.bin" build of @ACwifidude's. I've tried both the rc.local script included in the second post and @rickkz0r's nss-rk.qos. This is what my tc -s qdisc looks like:

root@OpenWrt:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc nsstbl 1: dev eth0 root refcnt 2 buffer/maxburst 4554b rate 38Mbit mtu 1518b accel_mode 0
 Sent 184746693 bytes 681587 pkt (dropped 796, overlimits 27943 requeues 0)
 backlog 0b 0p requeues 0
qdisc nssfq_codel 10: dev eth0 parent 1: target 5ms limit 312p interval 100ms flows 1024 quantum 304 set_default accel_mode 0
 Sent 184746693 bytes 681587 pkt (dropped 796, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 maxpacket 1506 drop_overlimit 315 new_flow_count 291766 ecn_mark 0
 new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
 Sent 17477848 bytes 17547 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 1506 drop_overlimit 0 new_flow_count 396 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth1.1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.35 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev pppoe-wan root refcnt 2 limit 10240p flows 1024 quantum 1518 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
 Sent 538840 bytes 3490 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 705 drop_overlimit 0 new_flow_count 98 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc nsstbl 1: dev nssifb root refcnt 2 buffer/maxburst 59202b rate 475Mbit mtu 1518b accel_mode 0
 Sent 1633912738 bytes 1115180 pkt (dropped 8832, overlimits 26883 requeues 0)
 backlog 0b 0p requeues 0
qdisc nssfq_codel 10: dev nssifb parent 1: target 5ms limit 3911p interval 100ms flows 1024 quantum 1518 set_default accel_mode 0
 Sent 1633912806 bytes 1115181 pkt (dropped 8832, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 maxpacket 1518 drop_overlimit 0 new_flow_count 30679 ecn_mark 0
 new_flows_len 0 old_flows_len 1

I have a 500/40 fibre connection.

Here is a result without SQM:

And here is one with it:

It caps the speed, but as you can see it introduces around 30 ms or more of latency.
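For context, the shaping those scripts set up boils down to something like this (reconstructed from the tc output above - check the actual rc.local for the exact parameter spelling):

  # egress shaping on eth0 (38 Mbit up):
  tc qdisc add dev eth0 root handle 1: nsstbl rate 38Mbit burst 4554b
  tc qdisc add dev eth0 parent 1: handle 10: nssfq_codel target 5ms limit 312 flows 1024 quantum 304 set_default
  # ingress via the nssifb device (475 Mbit down):
  tc qdisc add dev nssifb root handle 1: nsstbl rate 475Mbit burst 59202b
  tc qdisc add dev nssifb parent 1: handle 10: nssfq_codel target 5ms limit 3911 flows 1024 quantum 1518 set_default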

What am I doing wrong?

Maybe that’s enough capacity to run it, I can’t tell.

If it is already containerized then it is straightforward.

  • install dockerman
  • get a compose file for running the image
  • run it

I would start by locating a privaxy image for your architecture. If it exists and you trust it, life is infinitely simpler. If not, you will need to build it.

I would continue by installing docker and running the hello-world container, to see how bad that gets from a flash space perspective.

The trickiest part might be networking. It depends on how privaxy works. If it is a proxy or acts as dns, the best solution I've found is to use a macvlan. That goes in the compose file, as sketched below.
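A skeleton of what I mean (the image name and addresses are placeholders; 'parent' must be your LAN bridge):

  # docker-compose.yml - macvlan gives the container its own LAN IP
  services:
    privaxy:
      image: example/privaxy:latest    # hypothetical image name
      restart: unless-stopped
      networks:
        lan:
          ipv4_address: 192.168.1.50
  networks:
    lan:
      driver: macvlan
      driver_opts:
        parent: br-lan                 # the router's LAN bridge
      ipam:
        config:
          - subnet: 192.168.1.0/24
            gateway: 192.168.1.1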

I’m curious about it so I’ll look for a privaxy image. Maybe I can edit this post later with better information.

Good luck

@KONG I could not find the chromium build. https://www.desipro.de/openwrt/

Found the patch breaking things - if anyone more kernel-savvy than me wants to take a crack at fixing it: 999-203-qca-nss-clients-ppptp-support.patch

(Although correct me if I'm wrong, all that should need to happen is that the sk->sk_bound_dev_if in the new pptp_route_output struct is changed to 0.)
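Roughly this, inside the patch's route lookup (a sketch of the idea, not the actual hunk - the surrounding field name is my guess at where sk_bound_dev_if ends up):

  -	fl4.flowi4_oif = sk->sk_bound_dev_if;
  +	fl4.flowi4_oif = 0;	/* don't bind the route to the socket's device */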

edit lmao:

for your cherrypicking pleasure

double edit: hold off a sec i mighta messed up quilt

triple edit: jk its fine

Thanks! I can build NSS 23.05 images again. I was having the same problem as you described above. I was able to build a working image for the TP-Link OnHub without issue, flashed it to the router and away we go.