So something went wrong between "kernel: bump 5.4 to 5.4.65" and "kernel: bump 5.4 to 5.4.66". As only two commits in that range (those related to ramips/mediatek) could have caused this, I will narrow it down further.
I still find it strange that no one else is having this issue with the latest snapshots...
I still want to try disabling flow control on ALL ports instead of just the CPU port, to see if that makes any difference to how often this issue crops up. Unfortunately, I don't know enough to write a patch myself. If there are any developers willing to collaborate, please have a look at the new topic I've started: Mt7621 / mt7530 programming: Disabling Flow Control on all ports
For those who are interested, the topic I linked above contains patches that disable flow control on ALL MACs instead of only one, disable it globally as well, AND disable pause frame advertisement on the PHYs. I have tested them on my home router and everything is running fine.
However, this router has always been stable, so I'm not sure whether they actually fix the "transmit queue has timed out" issue. I will deploy them to the router that has the issue this week. If anyone else wants to test them, feel free.
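For anyone who wants to look at the pause-frame settings from userspace before trying the patches, here is a rough sketch using ethtool. The port names (lan1 to lan4) are assumptions based on the usual DSA naming, and nothing in this thread confirms that the mt7530 DSA driver honours `ethtool -A`. It also does not touch the MAC-level or global settings that the patches change, so treat it as a quick experiment rather than an equivalent.

```shell
#!/bin/sh
# Print the ethtool commands that would turn pause frames off on each
# given port. Nothing is changed until the output is piped to sh.
pause_off_cmds() {
    for port in "$@"; do
        echo "ethtool -A $port autoneg off rx off tx off"
    done
}

# Review the output first, then apply it (port names are assumptions):
#   pause_off_cmds lan1 lan2 lan3 lan4 | sh
# Show the current pause settings of one port:
#   ethtool -a lan1
```

Printing the commands instead of running them directly makes it easy to check what would be applied on a remote router before committing to it.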
On the other hand, can someone tell me what the recommended settings for mt7621 are? In the last couple of weeks I noticed that with the default settings (packet steering and software flow offload ON) I cannot reach more than 350 Mbit/s, and only a single core is utilized (with PPPoE).
It only solves the PPPoE disconnect issue. No HW offload, I tried it. However, it would be very nice if someone could clarify what these commits actually achieve and what the recommended settings are, as it is quite clear that the default settings with only software offload enabled are far from enough.
Testing it now on my R6850 router (mt7621a/t). I was getting lots of modem hangups on PPPoE, roughly every hour or two. So far, 5 hours in, PPPoE is still up with that commit.
Packet steering with software offload became the default in one of the commits months ago. I can't recall which one, but it said that enabling it along with SW offload provides more performance. Maybe you can try it now with HW NAT, since with the latest MT76 patch HW NAT works on my device (based on the fact that with it enabled, SQM is ignored as intended; that's as far as I can test with my capabilities).
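For reference, the combination discussed here can be toggled from the command line roughly as follows. This is a sketch that assumes a snapshot where the `packet_steering` and `flow_offloading_hw` options are available; whether hardware offload actually works depends on the build and device, as noted above.

```shell
# Software flow offloading (firewall defaults section)
uci set firewall.@defaults[0].flow_offloading='1'
# Hardware flow offloading; requires software flow offloading to be enabled
uci set firewall.@defaults[0].flow_offloading_hw='1'
# Packet steering (network globals section)
uci set network.globals.packet_steering='1'

uci commit firewall
uci commit network
/etc/init.d/firewall restart
/etc/init.d/network restart
```

Setting an option to '0' turns the corresponding feature off again, which makes it easier to compare throughput one setting at a time.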
Packet steering and SW offload are enabled, yet without tweaking kernel parameters this combination only reaches 350 Mbit/s by default and is limited to a single core. This is clearly not the desired operation.
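It isn't stated which kernel parameters are meant. One common candidate is the per-queue RPS CPU mask in sysfs, so here is a minimal sketch of setting it by hand. The interface name, the core count of 4 (mt7621 exposes four logical CPUs) and the function names are assumptions for illustration, not something taken from the commits discussed here.

```shell
#!/bin/sh
# Build a hex CPU bitmask covering the first N logical CPUs.
rps_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# Write the mask to every RX queue of the given interfaces.
# The sysfs root is passed in so the function can be tried on a
# scratch directory before pointing it at /sys.
apply_rps() {
    root=$1
    mask=$2
    shift 2
    for dev in "$@"; do
        for q in "$root"/class/net/"$dev"/queues/rx-*/rps_cpus; do
            if [ -e "$q" ]; then
                echo "$mask" > "$q"
            fi
        done
    done
}

# Example (interface name is an assumption; adjust to your device):
#   apply_rps /sys "$(rps_mask 4)" eth0
```

Changes made this way do not survive a reboot, and netifd may overwrite them when packet steering is enabled, so this is only useful for quick comparisons.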
I found an interesting patch set among the preliminary 5.9 kernel support in Felix's repository:
The PPE (packet processing engine) is used to offload NAT/routed or even bridged flows. This patch brings up the PPE and uses it to get a packet hash. It also contains some functionality that will be used to bring up flow offloading later.
I was almost sure this bug had been fixed in the latest trunk with the DSA changes and patches, but it occurred again. The router continued to work after the exception.
I found that VLANs on the DSA-driven mt7530 switch can now be configured through UCI. Netifd added support in its latest commit:
These are my UCI settings:
uci set network.sw=interface
uci set network.sw.type='bridge'
uci add network bridge-vlan
uci set network.@bridge-vlan[0].device='br-sw'
uci set network.@bridge-vlan[0].vlan='1'
uci set network.@bridge-vlan[0].ports='lan1:t lan2 lan3'
uci add network bridge-vlan
uci set network.@bridge-vlan[1].device='br-sw'
uci set network.@bridge-vlan[1].vlan='3'
uci set network.@bridge-vlan[1].ports='lan1:t lan4'
uci add network bridge-vlan
uci set network.@bridge-vlan[2].device='br-sw'
uci set network.@bridge-vlan[2].vlan='4'
uci set network.@bridge-vlan[2].ports='lan1:t'
uci set network.lan.ifname='br-sw.1 bat0'
The VLAN is set correctly, and the IPTV multicast data is transmitted stably on the VLAN, but batman-adv causes a kernel panic: