mt76 wireless driver debugging

FWIW, I have been enabling it via the mt7915e module file:

root@AP-Office:~# cat /etc/modules.d/mt7915e
mt7915e wed_enable=Y
root@AP-Office:~# cat /sys/module/mt7915e/parameters/wed_enable
Y
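
For anyone who wants to script that check, here is a minimal sketch. The function name wed_status and its return codes are made up for illustration; only the sysfs path comes from the transcript above. It assumes the parameter file holds Y or N, as kernel boolean module parameters normally do.

```sh
#!/bin/sh
# Report whether the mt7915e wed_enable module parameter is set.
# wed_status is a hypothetical helper name; the default path is the
# one shown in the transcript above.
wed_status() {
    param=${1:-/sys/module/mt7915e/parameters/wed_enable}
    if [ ! -r "$param" ]; then
        # Module not loaded, or the parameter does not exist on this build.
        echo "unknown"
        return 2
    fi
    case "$(cat "$param")" in
        Y|y|1) echo "enabled" ;;
        *)     echo "disabled"; return 1 ;;
    esac
}
```

Calling wed_status with no argument reads the live parameter; passing a file path instead makes it easy to try out away from the router.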

A couple things around this... I have only been looking at bind and frankly didn't even explore enough to realize there was a separate entries file as well. But FWIW, when WED seemed to be working, I could watch the bind file and see the offloaded flows coming and going.

(For the curious, I use this: watch -n0.1 'cat /sys/kernel/debug/ppe0/bind')

Okay, after looking at entries I now see what you're saying. I can confirm I see only "packets=0 bytes=0" there. However, I believe the "packets=0 bytes=0" issue for bind specifically is now resolved as of these commits from @nbd:

When I do actually see an offloaded flow, I can confirm I see these counters increasing over the life of that flow.

For example, straight after a clean reboot:

But what gets offloaded, and how consistently, is pretty poor at this point from what I can tell on my APs, if bind is the source of truth as to whether flow offloading is actually occurring.

A few more questions for you, sir, if you don't mind...

I have updated my firewall config on one of my APs to include the SW & HW offloading enablement. However, as I mentioned I run my 3x RT3200s as "dumb APs"/WAPs. So, I have always had the firewall service itself disabled.

In the name of experimenting, I started the firewall service post update of the offloading settings and was greeted with this:

root@AP-Office:~# /etc/init.d/firewall start
Hardware flow offloading unavailable, falling back to software offloading

I assume that was expected, but could you let me know if you recognize something I might be missing here? Is it true that I should now have the firewall service enabled to have the offloading settings applied properly?

root@AP-Office:~# cat /etc/config/firewall

config defaults
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option flow_offloading '1'
	option flow_offloading_hw '1'

root@AP-Office:~# cat /etc/config/bridger
config defaults
	# example for blacklisting individual devices or bridges
	# list blacklist eth0

root@AP-Office:~# /etc/init.d/bridger status
running

And it should stay disabled (I mean the firewall, in DAP mode).
I have also disabled IP forwarding.
I blacklisted eth0 and the 2.4 GHz radio device in bridger.conf.
I only use DAP mode myself, on my AX6S.
I have these in rc.local to ensure persistence even after sysupgrades:

/etc/init.d/firewall disable
/etc/init.d/firewall stop
/etc/init.d/dnsmasq disable
/etc/init.d/dnsmasq stop
/etc/init.d/odhcpd disable
/etc/init.d/odhcpd stop
# disable forwarding for DAP
echo "0" > /proc/sys/net/ipv4/ip_forward

Hey, I am running four E8450s as access points but do not have IP forwarding disabled. Could you explain why it is required, or point me to docs where I could read more?

I have always configured my dumb APs in the same way. But I'm not clear on how the SW & HW offloading that @daniel mentioned would take effect without the firewall service running. Hoping someone can bring some more clarity to that for the good of this group.

I am curious about this, as I see @grzesiczek1 is as well. I would very much appreciate some more information about what the purpose is there.

It sounds like you have WED enabled if you're using bridger. Are you seeing the flow offloading (as indicated by watching the bind file) stop after some time?

Disabling IP forwarding: it's the last line of my rc.local, pasted two posts above this one.

I watched the bind output myself and nothing was being offloaded at all for me.

Apparently the wed_enable=Y setting had disappeared from my modules file.

After I fixed that and rebooted, I could see one offloaded flow, and after that nothing.

I haven't checked this for quite a while, but I'm sure it worked before; I could see a lot of data output when watching the ppe0 bind file.
I run this build, just upgraded: OpenWrt SNAPSHOT, r22610-28ce677fa7

I have it fixed now. WED offload is looking great so far: over 30k packets offloaded in a few minutes.

Big slap in the face: I disabled all the blacklist entries in bridger.conf and ran a bridger restart from the CLI.

Will keep monitoring this.

L.E. @_FailSafe you're right, offloading stops after some time, like a timed stop, after 5 minutes or so?

I checked dmesg and the bridger status while this happens, but there are no errors in dmesg and bridger is running.

Restarting bridger (/etc/init.d/bridger restart) resumes the offload until it stops again.

Indeed--similar story here from what I'm seeing. I haven't paid close enough attention to a clock to put a definite time frame against when it stops, but it doesn't run long.
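
To put an actual time frame on when offloading stops, something like the following could log the number of bound flows with a timestamp. This is only a sketch: bound_flows and sample_flows are hypothetical helper names, and it assumes each flow line in ppe0/bind contains an "orig=" field like the lines in ppe0/entries do.

```sh
#!/bin/sh
# Count bound (offloaded) flows in a PPE bind dump.
# Assumption: every flow line contains "orig=", as in ppe0/entries.
bound_flows() {
    file=${1:-/sys/kernel/debug/ppe0/bind}
    [ -r "$file" ] || { echo 0; return 0; }
    # grep -c prints 0 but exits non-zero when nothing matches.
    grep -c 'orig=' "$file" || true
}

# Print one sample as "<epoch seconds> <flow count>".
sample_flows() {
    printf '%s %s\n' "$(date +%s)" "$(bound_flows "$1")"
}

# Example loop (not started automatically): one sample every 5 seconds.
# while :; do sample_flows >> /tmp/ppe-bind.log; sleep 5; done
```

The first sample after a bridger restart where the count drops to 0 and stays there gives the point at which offloading stopped.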

It is interesting because it used to run a lot longer before stopping. Not sure if/where a regression occurred.

But, oddly enough, sometimes offloaded flows just pop up in ppe0/bind and work again for some time without restarting anything. It is just wildly inconsistent right now. :frowning:

Then there's this:

Do you see a high amount of retries when you have a flow go through ppe0?

@_FailSafe I'm working with a new MT7981B-based device (Cudy WR3000) and I can confirm bridger is failing the same way for me.
I've monitored /sys/kernel/debug/ppe0/bind, and when no new streams show up, running
/etc/init.d/bridger restart; watch -n 1 'cat /sys/kernel/debug/ppe0/bind'
enables WED again.
Also, I cannot confirm high retries compared to not using ppe0 on my device.

I don't see an open ticket in the issues for either openwrt/openwrt or nbd168/bridger. Do we know when bridger started failing? Maybe this would be a good time to open an issue.

I also cannot confirm retries.

In the ppe0 folder there is another file; watch that one. It shows a lot of packets that should be offloaded, but instead the NEW corresponding src/dest/MAC fields are all zeroes.

I don't have access at the moment, but I did capture the output in /tmp by redirecting stdout to a file.

Will paste some lines in the morning.

So I ran watch -n0.1 'cat /sys/kernel/debug/ppe0/entries > /tmp/wed.txt' for a couple of seconds, after watching bind until the output went "silent".

Some extracts:

00270 UNB IPv4 5T orig=2.20.xxx.xxx:443->10.0.0.20:51762 new=0.0.0.0:0->0.0.0.0:0 eth=00:00:00:00:00:00->00:00:00:00:00:00 etype=0000 vlan=0,0 ib1=10000013 ib2=00000000 packets=0 bytes=0
0056e UNB IPv4 5T orig=86.xxx.xx.44:443->10.0.0.20:50422 new=0.0.0.0:0->0.0.0.0:0 eth=00:00:00:00:00:00->00:00:00:00:00:00 etype=0000 vlan=0,0 ib1=50001e17 ib2=00000000 packets=0 bytes=0
00794 UNB IPv4 5T orig=10.0.0.201:53->10.0.0.20:62850 new=0.0.0.0:0->0.0.0.0:0 eth=00:00:00:00:00:00->00:00:00:00:00:00 etype=0000 vlan=0,0 ib1=50000113 ib2=00000000 packets=0 bytes=0
00a6c UNB IPv4 5T orig=146.xx.xx.5:443->10.0.0.20:52632 new=0.0.0.0:0->0.0.0.0:0 eth=00:00:00:00:00:00->00:00:00:00:00:00 etype=0000 vlan=0,0 ib1=50000110 ib2=00000000 packets=0 bytes=0
00a78 UNB IPv4 5T orig=54.xx.xx.8:443->10.0.0.20:49826 new=0.0.0.0:0->0.0.0.0:0 eth=00:00:00:00:00:00->00:00:00:00:00:00 etype=0000 vlan=0,0 ib1=10000317 ib2=00000000 packets=0 bytes=0
00aa0 UNB IPv4 5T orig=10.0.0.15:445->10.0.0.20:49771 new=10.0.0.15:445->10.0.0.20:49771 eth=b2:66:ae:xx:xx:xx->f4:ce:23:xx:xx:xx etype=0008 vlan=0,256 ib1=10000114 ib2=007e0460 packets=15 bytes=6483
00bf0 UNB IPv4 5T orig=10.0.0.201:53->10.0.0.20:58288 new=0.0.0.0:0->0.0.0.0:0 eth=00:00:00:00:00:00->00:00:00:00:00:00 etype=0000 vlan=0,0 ib1=50000115 ib2=00000000 packets=0 bytes=0
00d4c UNB IPv4 5T orig=173.240.xx.xx:2257->10.0.0.20:60076 new=0.0.0.0:0->0.0.0.0:0 eth=00:00:00:00:00:00->00:00:00:00:00:00 etype=0000 vlan=0,0 ib1=50000010 ib2=00000000 packets=0 bytes=0
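
A dump like this can be summarized rather than read line by line. The sketch below counts how many entries have the "new=" tuple still zeroed versus how many show a non-zero packet counter. The function name ppe_summary is made up, and the field layout is assumed purely from the sample lines above, not from any documented format.

```sh
#!/bin/sh
# Summarize a PPE entries dump such as /tmp/wed.txt.
# Field layout is an assumption based on the sample lines above.
ppe_summary() {
    awk '
        / orig=/ {
            total++
            pkts = 0
            # Find the packets=N field wherever it sits on the line.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^packets=/) { split($i, kv, "="); pkts = kv[2] + 0 }
            }
            if (pkts > 0) counted++
            # Entries whose rewritten tuple was never filled in.
            if ($0 ~ / new=0\.0\.0\.0:0->0\.0\.0\.0:0 /) empty++
        }
        END { printf "total=%d empty=%d counted=%d\n", total, empty, counted }
    ' "$@"
}

# Example: ppe_summary /tmp/wed.txt
```

With no argument it reads from standard input, so it can also be fed from cat /sys/kernel/debug/ppe0/entries through a pipe.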

Did you manage to create a patch for this? I'm still happy to test against 22.03.x when a patch is available

Not yet. It's going to take a lot of time to track this one down. We're outside of the wireless driver and in the network subsystem. I don't know whether I'm just wasting my time, because some of the network code is many versions behind the latest Linux kernel, and the changes are significant. I'd hate to pour 100 hours into something that is already fixed. I am at a fork in the road and I don't know which direction to take.

Bridger itself should suffice, i'm using it like that.

You are missing kmod-nft-offload maybe?

Yes, and for offloading to work on dumb (bridged) APs you will also need to install bridger.

I understand that, but my point is that @daniel posted this some days ago:

But @Brain2000 and I raised questions specifically around the statement of "In both cases you also need to switch on hardware flow offloading..." because documentation for WED seems to state SW/HW offloading is not required.

So, my question back to @daniel was meant to clarify, if you enable the HW & SW offloading options in /etc/config/firewall, then what, besides the firewall service, would care about those settings? Hence, does the firewall service itself need to be running? Does bridger look for those settings?

I was just trying to get to the bottom of the reasoning from @daniel's perspective, given his expertise at the code level of much of this. And he confirmed the answer here:

I do have this module installed:
kmod-nft-offload - 5.15.108-1

root@AP-Office:~# lsmod | grep -E nft.*offload
nf_conntrack           90112  7 nft_redir,nft_nat,nft_masq,nft_flow_offload,nft_ct,nf_nat,nf_flow_table
nf_flow_table          32768  4 nf_flow_table_ipv6,nf_flow_table_ipv4,nf_flow_table_inet,nft_flow_offload
nf_tables             163840 24 nft_fib_inet,nf_flow_table_ipv6,nf_flow_table_ipv4,nf_flow_table_inet,nft_reject_ipv6,nft_reject_ipv4,nft_reject_inet,nft_reject,nft_redir,nft_quota,nft_objref,nft_numgen,nft_nat,nft_masq,nft_log,nft_limit,nft_hash,nft_flow_offload,nft_fib_ipv6,nft_fib_ipv4,nft_fib,nft_ct,nft_counter,nft_chain_nat
nft_flow_offload       16384  0

But I still see this when I start the FW service:

root@AP-Office:~# cat /etc/config/firewall

config defaults
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option flow_offloading '1'
	option flow_offloading_hw '1'

root@AP-Office:~# /etc/init.d/firewall start
Hardware flow offloading unavailable, falling back to software offloading

This package? (My router is a Linksys E8450 (UBI).)
bridge 1.7.1-1 10.96 KiB Manage ethernet bridging: a way to connect networks together to…

Bridger is only available for development builds.
