For $100 or so you can get an RPi4 with 4GB of RAM and an SD card with 32GB of storage. And yet OpenWrt is still designed to run on 8MB of flash and 64MB of RAM.
Now, the RPi4 is not really a router/network appliance, but the point is that a router which can route a gigabit, SQM a gigabit, WireGuard a gigabit, and still have a couple of CPU cores left over to do smart stuff is kind of the baseline we should be targeting. Let's put Julia on a router and do Bayesian inference on the DSCP values each flow should have, to maximize the expected satisfaction of the users based on a history of each flow's transmission rates and so on (perhaps by compiling eBPF from Julia via https://github.com/jpsamaroo/BPFnative.jl and installing the result directly in the kernel, updated every 20 minutes). Let's put a mesh database of named network resources on the internet, so you can go to your browser and ask "what new pictures or stories has my friend dtaht put up on his friends gallery?" and get 100% of the actual benefit of Facebook without the intermediary, with complete separation of UI from publication, and with strong encryption using session keys published to your friends.
Let's auto-detect suspicious behavior by studying a sample of 10% of the packets going through the router, forking them to user space from nftables and running them through a neural-network classifier feeding a Bayesian decision-theory optimizer, and have the router send Signal messages to your phone when it detects serious probes or that devices on your LAN have been infected with malware.
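The sampling half of that idea is already expressible in stock nftables; a rough sketch, assuming a classifier daemon listening on NFQUEUE 0 (the table/chain names and queue number here are made up for illustration, and the classifier itself is left out):

```shell
# Send roughly 1 in 10 forwarded packets to userspace queue 0, where a
# (hypothetical) classifier daemon would pick them up via libnetfilter_queue.
nft add table inet sampler
nft add chain inet sampler forward '{ type filter hook forward priority 0 ; }'
# numgen random mod 10 == 0 matches ~10% of packets;
# "bypass" lets traffic flow normally if no userspace listener is attached.
nft add rule inet sampler forward numgen random mod 10 == 0 queue num 0 bypass
```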
Have hot-spare fallback support built in via keepalived / VRRP. When you buy a router, it's two routers in one box with two separate power supplies.
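For the keepalived/VRRP piece, a minimal sketch for the primary unit; the interface name, VRID, and addresses are assumptions, and the backup unit runs the same config with `state BACKUP` and a lower priority:

```shell
# Minimal VRRP config for the primary router (hypothetical br-lan bridge and VIP).
cat > /etc/keepalived/keepalived.conf <<'EOF'
vrrp_instance LAN {
    state MASTER
    interface br-lan
    virtual_router_id 51      # must match on both routers
    priority 200              # backup unit uses e.g. 100
    advert_int 1              # advertise every second
    virtual_ipaddress {
        192.168.1.1/24        # the gateway address clients actually use
    }
}
EOF
/etc/init.d/keepalived restart
```

If the master stops advertising (power loss, crash), the backup takes over the virtual IP within a few advert intervals, so clients keep using the same gateway address.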
Is it worth splitting these items into a few different categories? Long-haul wireless, for example, requires a commercial hardware infrastructure solution. Thread support is going to require some level of home hardware. Network reporting tools are a UX/UI problem. etc.
Some of these are multi-domain problems, but it seems there's usually a primary problem that will require the majority of the effort for a first pass.
Would it help to break out improvements to some sort of tier structure based on audience or hardware requirements?
@dtaht: First and foremost, thank you so much for your work in eliminating bufferbloat. CAKE SQM is the best. I have a few questions, although they may be a bit off-topic:
Does CAKE currently use only a single CPU core in the current OpenWrt 21.02 and SNAPSHOT builds (kernel 5.4, kernel 5.10, and maybe kernel 5.15 in the future)? If so, can it be made to use multiple CPU cores and thereby process more bandwidth?
I had a Belkin RT3200 (dual-core 1.35 GHz ARM v8 MediaTek CPU) that I was using with a Comcast Xfinity 600/20 Mbps DOCSIS plan (20% overprovisioned, so 720/24 Mbps). With CAKE + layer_cake.qos at a 600,000/20,000 kbps setting on the RT3200, I still only got around 400 to 450 Mbps download (wired), which I think is because of CAKE's single-core limitation. With fq_codel + simple.qos, I get close to the full 600 Mbps download (wired).
I read that fq_codel is the default qdisc in Linux and maybe in *BSD, macOS, iOS, etc. Do you know if it is also implemented in Windows 10, Windows 11, or Android?
I would use cake with the ack-filter on your upload and fq_codel on the download.
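As a rough tc sketch of that split, assuming eth0 is the WAN interface and the 720/24 Mbps DOCSIS numbers above (on OpenWrt you would normally express this through SQM's config rather than raw tc, and the exact shaped rates are just examples):

```shell
# Upload: cake shaped slightly below the upstream rate, with ACK filtering.
tc qdisc replace dev eth0 root cake bandwidth 22Mbit ack-filter

# Download: redirect ingress to an IFB, shape with HTB, queue with fq_codel.
ip link add name ifb0 type ifb
ip link set ifb0 up
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: matchall \
    action mirred egress redirect dev ifb0
tc qdisc replace dev ifb0 root handle 1: htb default 10
tc class add dev ifb0 parent 1: classid 1:10 htb rate 600mbit
tc qdisc add dev ifb0 parent 1:10 fq_codel
```

This is essentially what sqm-scripts' simple.qos builds on the download side, while keeping cake's ACK filter where it matters most: the narrow upstream.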
fq_codel is the default qdisc on most Linux distros, in multiple Wi-Fi drivers, and on OS X and iOS. We have talked to the Windows people; they haven't said a word. I would like to measure typical Windows 11 latency-under-load performance one day soon.
FreeBSD and OpenBSD (pfSense, etc.) have fq_codel implementations usable via their pf tool, but only for shaping. The BSDs have thus far completely missed out on the BQL innovation in Linux that lets line-rate qdiscs like fq_codel do their job.
Multiple folks in the Android aftermarket have turned on the qdisc version of fq_codel, but the core problem is that the USB/Wi-Fi/3G/4G/5G device drivers themselves were horrifically overbuffered, so putting fq_codel on top of that did very little good.
I am hopeful that the next generation of cellphones will finally have deployed fixes.
Right now I am using a Raspberry Pi 4B as my main router + a Netgear R7800 as an AP, both running SNAPSHOT, on my Comcast connection in the USA. When I tested the Belkin RT3200, I had both "Packet Steering" (no idea what it does) and irqbalance (I have some idea what it does) enabled. I still could not exceed about 450 Mbps download speed.
I sent the Belkin RT3200 to another country (India), where it is currently being used for an EPON FTTH 60/60 Mbps PPPoE connection (ISP: BSNL), running regular CAKE + layer_cake.qos. I can connect to it to monitor the device and make changes if necessary (me being the family's IT dept).
I am still planning to buy a Belkin RT3200 or Linksys E8450 (whichever is cheaper) for myself during the Black Friday sales next week in USA.
I haven't tried Qosify yet, neither with the Belkin RT3200 when I had it nor with my current Raspberry Pi 4B setup.
I am very interested in before/after cake measurements especially of new technologies such as EPON.
yes, I was suggesting using fq_codel on the download.
I note also that there may well be other tuning factors we have not tried yet for making downloads go faster through cake, or even in general (unshaped). One thing I worry about is that the device drivers' rx rings may be too small and we are dropping packets there. Overall, Linux switched to doing far more processing on rx in the last few years (and then compensated by inserting XDP into that path as well).
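One way to check the rx-ring theory with ethtool; driver support varies, the interface name is assumed, and the 4096 value is just an example of a larger ring, not a recommendation:

```shell
# Show the driver's maximum and current ring sizes.
ethtool -g eth0
# Watch for rx drops accumulating at the driver level.
ethtool -S eth0 | grep -i drop
# If the rx ring is small and drops are climbing, try growing it.
ethtool -G eth0 rx 4096
```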
VNC's traffic may heisenbug your test (were your prior results via VNC?). It's hard to complain about either result, but you are experiencing wider latency swings with SQM off. My guess (without looking at a packet capture) is that the service is using a policer rather than a shaper, but don't take that for truth. I have seen more than a few "no bufferbloat" results recently, which implies a policer hard-dropping packets at a limit.

A little buffering is needed to use a connection effectively! 5 ms of extra buffering is not bufferbloat, and dropping packets willy-nilly messes with other protocols. But, philosophically at least, I think policing is a better approach than a giant tail-drop FIFO, especially at higher rates. In fact I've mostly come to the opinion that we don't need all these fancy algorithms above a gigabit: a little FQ maybe, but not more than 1 ms of queuing.
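To illustrate the policer-vs-shaper distinction in tc terms (hypothetical rates; this is the sort of thing an ISP-side box does, not a recommendation): a policer hard-drops everything over the rate with no queue at all, while a shaper briefly queues the excess and releases it at the configured rate.

```shell
# Policer: drop anything over 600 Mbit on ingress; zero queueing, hard drops.
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: matchall \
    action police rate 600mbit burst 64k conform-exceed drop

# Shaper: hold up to ~5 ms of packets and release them at 600 Mbit.
tc qdisc replace dev eth0 root tbf rate 600mbit burst 64k latency 5ms
```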
The quality portion of the dslreports test factors in packet loss (which it shouldn't). The best way to get an A+ score on that test is to have ECN enabled.
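Enabling ECN on a Linux client is a single sysctl; 1 requests ECN on outgoing connections as well, whereas many distros default to 2 (accept only, server-side):

```shell
# Check the current setting.
sysctl net.ipv4.tcp_ecn
# Enable ECN negotiation for both incoming and outgoing TCP connections.
sysctl -w net.ipv4.tcp_ecn=1
```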