I'm sure this is part of what you're trying to reason through, and I'm mostly thinking out loud, but in what case would the mt76 device unregister? When it hits an error? During a reboot?
Lots of questions. Anyway, if you find a way to replicate the condition(s), I will happily help test any potential fixes.
I have been hard at work trying to make sense of what I'm seeing with WED, but I haven't come up with anything yet. Part of me wonders whether there is some underlying error within bridger itself and I'm just chasing symptoms/ghosts.
From all indications, the mtk_wed_flow_add and mtk_wed_flow_remove functions are working as they should. The "offload disable" message (part of my debug output) normally shows up dozens (or hundreds) of times, but at some point the flows just stop after one of them and WED offload is not enabled again until a reboot.
Typical operation:
...
[25316.156505] mt7915e 0000:01:00.0: WED offload enable - wed_token_count==0
[25316.167778] mt7915e 0000:01:00.0: WED offload enable - phy is NULL
[25347.216498] mtk_wed_flow_remove : pre num_flows 1
[25347.216527] mt7915e 0000:01:00.0: WED offload disable
[25356.266089] mtk_wed_flow_add : pre num_flows 0
[25356.266112] mt7915e 0000:01:00.0: WED offload enable - wed_token_count==0
[25356.277378] mt7915e 0000:01:00.0: WED offload enable - phy is NULL
[25385.323470] mtk_wed_flow_remove : pre num_flows 1
[25385.323499] mt7915e 0000:01:00.0: WED offload disable
[25396.465507] mtk_wed_flow_add : pre num_flows 0
[25396.465530] mt7915e 0000:01:00.0: WED offload enable - wed_token_count==0
[25396.476777] mt7915e 0000:01:00.0: WED offload enable - phy is NULL
[25412.505774] mtk_wed_flow_add : pre num_flows 1
[25412.505792] mtk_wed_flow_add : post num_flows 2
[25425.530151] mtk_wed_flow_remove : pre num_flows 2
[25425.534714] mtk_wed_flow_remove : post num_flows 1
[25436.545274] mtk_wed_flow_add : pre num_flows 1
[25436.550079] mtk_wed_flow_add : post num_flows 2
[25441.560928] mtk_wed_flow_remove : pre num_flows 2
[25441.565500] mtk_wed_flow_remove : post num_flows 1
[25465.593090] mtk_wed_flow_remove : pre num_flows 1
[25465.597932] mt7915e 0000:01:00.0: WED offload disable
[25476.644671] mtk_wed_flow_add : pre num_flows 0
[25476.644694] mt7915e 0000:01:00.0: WED offload enable - wed_token_count==0
[25476.656005] mt7915e 0000:01:00.0: WED offload enable - phy is NULL
[25498.692831] mtk_wed_flow_add : pre num_flows 1
[25498.692849] mtk_wed_flow_add : post num_flows 2
[25502.703662] mtk_wed_flow_add : pre num_flows 2
[25502.708236] mtk_wed_flow_add : post num_flows 3
[25527.750186] mtk_wed_flow_remove : pre num_flows 3
[25527.754798] mtk_wed_flow_remove : post num_flows 2
[25531.757825] mtk_wed_flow_remove : pre num_flows 2
[25531.762629] mtk_wed_flow_remove : post num_flows 1
[25592.836428] mtk_wed_flow_remove : pre num_flows 1
[25592.841258] mt7915e 0000:01:00.0: WED offload disable
...
But when the flows stop, the "WED offload disable" output I added (in mmio.c, inside mt7915_mmio_wed_offload_disable) is the last WED message logged until after reboot.
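In case it helps with sifting through long logs, here is a rough userspace C sketch that replays the debug lines above and checks that each "pre num_flows" value matches what the previous add/remove implied, while remembering whether the most recent WED message was a disable. The struct and function names (flow_log_state, flow_log_feed, parse_pre) are made up for illustration, and it assumes the exact message wording of my debug patch and that every logged add succeeds.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tracker; none of this exists in the kernel tree. */
struct flow_log_state {
    int have_count;        /* set once any "pre num_flows" line was seen */
    int expected;          /* value the next "pre" line should report */
    int mismatches;        /* "pre" lines that disagreed with 'expected' */
    int last_was_disable;  /* 1 if the latest WED message was a disable */
};

/* Extracts N from "<func> : pre num_flows N"; returns 1 on success. */
static int parse_pre(const char *line, const char *func, int *out)
{
    char pattern[96];
    const char *p;

    snprintf(pattern, sizeof(pattern), "%s : pre num_flows ", func);
    p = strstr(line, pattern);
    if (!p)
        return 0;
    return sscanf(p + strlen(pattern), "%d", out) == 1;
}

/* Feeds one log line into the tracker. */
static void flow_log_feed(struct flow_log_state *s, const char *line)
{
    int n;
    int delta;

    if (parse_pre(line, "mtk_wed_flow_add", &n)) {
        delta = 1;   /* assumption: the add goes through */
    } else if (parse_pre(line, "mtk_wed_flow_remove", &n)) {
        delta = -1;
    } else {
        /* Not a counter line; only note enable/disable messages. */
        if (strstr(line, "WED offload disable"))
            s->last_was_disable = 1;
        else if (strstr(line, "WED offload enable"))
            s->last_was_disable = 0;
        return;
    }

    if (s->have_count && n != s->expected)
        s->mismatches++;
    s->expected = n + delta;
    s->have_count = 1;
    s->last_was_disable = 0;
}
```

Feed it one dmesg line at a time. If the log ends with last_was_disable set and no mismatches, the counter bookkeeping itself looks consistent and the problem is more likely elsewhere.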
Good question: what causes the physical device to unregister? I'm going to try disabling and re-enabling the switch port that the wifi is connected to and see if that triggers it.
I will look at the WED code again. Whether the WED enable ever runs again depends on num_flows, and maybe there's another variable that's also causing it to be skipped.
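To make the num_flows dependency concrete, here is a toy userspace model in C. It is not the driver code: the struct and function names are invented, and it simply assumes what the debug output suggests, namely that the enable hook only runs when the counter goes 0 -> 1 and the disable hook only runs when it goes 1 -> 0.

```c
/* Hypothetical stand-in for the driver state; not the kernel's struct. */
struct wed_model {
    int num_flows;      /* currently offloaded flows */
    int enable_calls;   /* how often the enable hook would have run */
    int disable_calls;  /* how often the disable hook would have run */
};

/* Assumed behaviour: enable only on the 0 -> 1 transition. */
static void model_flow_add(struct wed_model *m)
{
    if (m->num_flows == 0)
        m->enable_calls++;
    m->num_flows++;
}

/* Assumed behaviour: disable only on the 1 -> 0 transition. */
static void model_flow_remove(struct wed_model *m)
{
    if (m->num_flows <= 0)
        return;  /* nothing to remove; keep the counter from going negative */
    m->num_flows--;
    if (m->num_flows == 0)
        m->disable_calls++;
}

/* Replays a string of operations: '+' adds a flow, '-' removes one. */
static void model_replay(struct wed_model *m, const char *ops)
{
    for (; *ops; ops++) {
        if (*ops == '+')
            model_flow_add(m);
        else if (*ops == '-')
            model_flow_remove(m);
    }
}
```

Replaying "+-+" on a zeroed model ends with num_flows at 1, two enable calls and one disable call. The point of the model is that if the counter ever gets stuck above zero, or something else returns early before the counter is touched, the enable hook is never reached again.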
Working backwards, let's try the next thing that would stop WED.
In the same folder there's a file named mtk_ppe_offload.c that has the function mtk_flow_get_wdma_info() around line 90. If this function returns anything but zero, then WED will not be enabled.
If we add a netdev_info() call before each return other than 0, it will let us know whether this is the issue or not, and which portion went bad.
static int
mtk_flow_get_wdma_info(struct net_device *dev, const u8 *addr, struct mtk_wdma_info *info)
{
	struct net_device_path_stack stack;
	struct net_device_path *path;
	int err;

	if (!dev) {
		netdev_info(dev, "mtk_flow_get_wdma_info::!dev\n"); // add this line
		return -ENODEV;
	}

	if (!IS_ENABLED(CONFIG_NET_MEDIATEK_SOC_WED)) {
		netdev_info(dev, "mtk_flow_get_wdma_info::!IS_ENABLED(CONFIG_NET_MEDIATEK_SOC_WED)\n"); // add this line
		return -1;
	}

	err = dev_fill_forward_path(dev, addr, &stack);
	if (err) {
		netdev_info(dev, "mtk_flow_get_wdma_info::dev_fill_forward_path() returned %d\n", err); // add this line
		return err;
	}

	path = &stack.path[stack.num_paths - 1];
	if (path->type != DEV_PATH_MTK_WDMA) {
		netdev_info(dev, "mtk_flow_get_wdma_info::path->type != DEV_PATH_MTK_WDMA\n"); // add this line
		return -1;
	}

	info->wdma_idx = path->mtk_wdma.wdma_idx;
	info->queue = path->mtk_wdma.queue;
	info->bss = path->mtk_wdma.bss;
	info->wcid = path->mtk_wdma.wcid;

	return 0;
}
@Brain2000 Hit another overnight crash on one of my APs. Based on this post from @darksky a couple months back, he was only seeing it with WED enabled.
[34299.852820] mt7915e 0000:01:00.0: done sta_remove=0x000000008ef4d065
[34299.912828] wl1-ap0: HW problem - can not stop rx aggregation for 90:81:58:xx:xx:3c tid 0
[34299.952826] wl1-ap0: HW problem - can not stop rx aggregation for 90:81:58:xx:xx:3c tid 1
[34300.012823] wl1-ap0: HW problem - can not stop rx aggregation for 90:81:58:xx:xx:3c tid 5
[34300.082823] wl1-ap0: HW problem - can not stop rx aggregation for 90:81:58:xx:xx:3c tid 6
[34300.122858] ------------[ cut here ]------------
[34300.127487] WARNING: CPU: 1 PID: 1345 at ___ieee80211_stop_tx_ba_session+0x348/0x3ac [mac80211]
[34300.136225] Modules linked in: nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7915e mt7615e mt7615_common mt76_connac_lib mt76 mac80211 cfg80211 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c hwmon compat cls_flower act_vlan cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact cryptodev autofs4 seqiv authencesn authenc leds_gpio gpio_button_hotplug
[34300.206520] CPU: 1 PID: 1345 Comm: hostapd Tainted: G S W 5.15.98 #0
[34300.213996] Hardware name: Linksys E8450 (UBI) (DT)
[34300.218865] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[34300.225820] pc : ___ieee80211_stop_tx_ba_session+0x348/0x3ac [mac80211]
[34300.232452] lr : ___ieee80211_stop_tx_ba_session+0x210/0x3ac [mac80211]
[34300.239077] sp : ffffffc00904b7a0
[34300.242383] x29: ffffffc00904b7a0 x28: ffffff8000846300 x27: ffffffc00904bdb0
[34300.249515] x26: ffffff8000066880 x25: ffffff80027f0880 x24: ffffffc008bc5000
[34300.256645] x23: ffffffc0009b0344 x22: ffffff80027e80e8 x21: 0000000000000003
[34300.263776] x20: ffffff80037d3700 x19: ffffff80067e6000 x18: 0000000000000000
[34300.270907] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[34300.278037] x14: 0000000000000000 x13: 00173df968314f0e x12: ffffffc008828720
[34300.285168] x11: 00000000fa83b2da x10: 0000000000000840 x9 : 0000000000000000
[34300.292298] x8 : ffffff8008e0da4a x7 : 0000000000000000 x6 : 0000000000000000
[34300.299429] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffff8000846300
[34300.306559] x2 : 0000000000000000 x1 : ffffff8000846300 x0 : 00000000fffffff4
[34300.313691] Call trace:
[34300.316129] ___ieee80211_stop_tx_ba_session+0x348/0x3ac [mac80211]
[34300.322412] ieee80211_sta_tear_down_BA_sessions+0x74/0x130 [mac80211]
[34300.328952] ieee80211_find_sta_by_link_addrs+0x370/0x540 [mac80211]
[34300.335318] sta_info_destroy_addr_bss+0x38/0x70 [mac80211]
[34300.340902] ieee80211_color_change_finish+0x1278/0x1500 [mac80211]
[34300.347181] cfg80211_check_station_change+0x1384/0x4720 [cfg80211]
[34300.353457] genl_family_rcv_msg_doit+0xb4/0x110
[34300.358070] genl_rcv_msg+0xd0/0x1c0
[34300.361636] netlink_rcv_skb+0x58/0x120
[34300.365468] genl_rcv+0x34/0x50
[34300.368603] netlink_unicast+0x1f0/0x2ec
[34300.372519] netlink_sendmsg+0x19c/0x3d0
[34300.376436] ____sys_sendmsg+0x258/0x2a0
[34300.380355] ___sys_sendmsg+0x78/0xc0
[34300.384010] __sys_sendmsg+0x54/0xb0
[34300.387578] __arm64_sys_sendmsg+0x20/0x30
[34300.391666] invoke_syscall+0x44/0x110
[34300.395409] el0_svc_common.constprop.0+0x48/0xf0
[34300.400105] do_el0_svc+0x18/0x20
[34300.403412] el0_svc+0x14/0x50
[34300.406461] el0t_64_sync_handler+0xe0/0x110
[34300.410723] el0t_64_sync+0x158/0x15c
[34300.414378] ---[ end trace 8c1a4584c49333ea ]---
[34300.442861] ------------[ cut here ]------------
[34300.447489] WARNING: CPU: 1 PID: 1345 at ___ieee80211_stop_tx_ba_session+0x348/0x3ac [mac80211]
[34300.456226] Modules linked in: nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7915e mt7615e mt7615_common mt76_connac_lib mt76 mac80211 cfg80211 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c hwmon compat cls_flower act_vlan cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact cryptodev autofs4 seqiv authencesn authenc leds_gpio gpio_button_hotplug
[34300.526518] CPU: 1 PID: 1345 Comm: hostapd Tainted: G S W 5.15.98 #0
[34300.533995] Hardware name: Linksys E8450 (UBI) (DT)
[34300.538866] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[34300.545820] pc : ___ieee80211_stop_tx_ba_session+0x348/0x3ac [mac80211]
[34300.552458] lr : ___ieee80211_stop_tx_ba_session+0x210/0x3ac [mac80211]
[34300.559084] sp : ffffffc00904b7a0
[34300.562389] x29: ffffffc00904b7a0 x28: ffffff8000846300 x27: ffffffc00904bdb0
[34300.569521] x26: ffffff8000066880 x25: ffffff80027f0880 x24: ffffffc008bc5000
[34300.576652] x23: ffffffc0009b0344 x22: ffffff80027e8508 x21: 0000000000000003
[34300.583783] x20: ffffff80051bbf00 x19: ffffff80067e6000 x18: 0000000000000000
[34300.590914] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[34300.598045] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[34300.605176] x11: 00000000000003b1 x10: 0000000000000840 x9 : 0000000000000000
[34300.612308] x8 : ffffff8008e0ce4a x7 : 0000000000000000 x6 : 0000000000000000
[34300.619438] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffff8000846300
[34300.626568] x2 : 0000000000000000 x1 : ffffff8000846300 x0 : 00000000fffffff4
[34300.633700] Call trace:
[34300.636137] ___ieee80211_stop_tx_ba_session+0x348/0x3ac [mac80211]
[34300.642421] ieee80211_sta_tear_down_BA_sessions+0x74/0x130 [mac80211]
[34300.648961] ieee80211_find_sta_by_link_addrs+0x370/0x540 [mac80211]
[34300.655326] sta_info_destroy_addr_bss+0x38/0x70 [mac80211]
[34300.660912] ieee80211_color_change_finish+0x1278/0x1500 [mac80211]
[34300.667191] cfg80211_check_station_change+0x1384/0x4720 [cfg80211]
[34300.673470] genl_family_rcv_msg_doit+0xb4/0x110
[34300.678082] genl_rcv_msg+0xd0/0x1c0
[34300.681649] netlink_rcv_skb+0x58/0x120
[34300.685482] genl_rcv+0x34/0x50
[34300.688618] netlink_unicast+0x1f0/0x2ec
[34300.692535] netlink_sendmsg+0x19c/0x3d0
[34300.696452] ____sys_sendmsg+0x258/0x2a0
[34300.700372] ___sys_sendmsg+0x78/0xc0
[34300.704027] __sys_sendmsg+0x54/0xb0
[34300.707593] __arm64_sys_sendmsg+0x20/0x30
[34300.711682] invoke_syscall+0x44/0x110
[34300.715426] el0_svc_common.constprop.0+0x48/0xf0
[34300.720122] do_el0_svc+0x18/0x20
[34300.723430] el0_svc+0x14/0x50
[34300.726478] el0t_64_sync_handler+0xe0/0x110
[34300.730741] el0t_64_sync+0x158/0x15c
[34300.734397] ---[ end trace 8c1a4584c49333eb ]---
[34300.762820] mt7915e 0000:01:00.0: done sta_remove=0x000000007186fb2e
[34300.802839] wl1-ap1: HW problem - can not stop rx aggregation for 20:69:80:xx:xx:b4 tid 0
[34300.842824] wl1-ap1: HW problem - can not stop rx aggregation for 20:69:80:xx:xx:b4 tid 1
[34300.892815] wl1-ap1: HW problem - can not stop rx aggregation for 20:69:80:xx:xx:b4 tid 6
[34300.901177] ------------[ cut here ]------------
[34300.905793] WARNING: CPU: 0 PID: 1345 at ___ieee80211_stop_tx_ba_session+0x348/0x3ac [mac80211]
[34300.914530] Modules linked in: nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7915e mt7615e mt7615_common mt76_connac_lib mt76 mac80211 cfg80211 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c hwmon compat cls_flower act_vlan cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact cryptodev autofs4 seqiv authencesn authenc leds_gpio gpio_button_hotplug
[34300.984826] CPU: 0 PID: 1345 Comm: hostapd Tainted: G S W 5.15.98 #0
[34300.992303] Hardware name: Linksys E8450 (UBI) (DT)
[34300.997171] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[34301.004126] pc : ___ieee80211_stop_tx_ba_session+0x348/0x3ac [mac80211]
[34301.010755] lr : ___ieee80211_stop_tx_ba_session+0x210/0x3ac [mac80211]
[34301.017381] sp : ffffffc00904b7a0
[34301.020686] x29: ffffffc00904b7a0 x28: ffffff8000846300 x27: ffffffc00904bdb0
[34301.027818] x26: ffffff8000066880 x25: ffffff80027f0880 x24: ffffffc008bc5000
[34301.034949] x23: ffffffc0009b0344 x22: ffffff80027ec0e8 x21: 0000000000000003
[34301.042080] x20: ffffff80056d5000 x19: ffffff80027ee000 x18: ffffffc008aea320
[34301.049211] x17: 353a30383a39363a x16: 303220726f66206e x15: 00000000000004fb
[34301.056342] x14: 0000000000000000 x13: 0000000000000000 x12: ffffffc008828720
[34301.063472] x11: 000000000000037e x10: 0000000000000840 x9 : 0000000000000000
[34301.070602] x8 : ffffff80057f5e4a x7 : 0000000000000000 x6 : 0000000000000000
[34301.077732] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffff8000846300
[34301.084862] x2 : 0000000000000000 x1 : ffffff8000846300 x0 : 00000000fffffff4
[34301.091994] Call trace:
[34301.094432] ___ieee80211_stop_tx_ba_session+0x348/0x3ac [mac80211]
[34301.100712] ieee80211_sta_tear_down_BA_sessions+0x74/0x130 [mac80211]
[34301.107252] ieee80211_find_sta_by_link_addrs+0x370/0x540 [mac80211]
[34301.113618] sta_info_destroy_addr_bss+0x38/0x70 [mac80211]
[34301.119202] ieee80211_color_change_finish+0x1278/0x1500 [mac80211]
[34301.125481] cfg80211_check_station_change+0x1384/0x4720 [cfg80211]
[34301.131756] genl_family_rcv_msg_doit+0xb4/0x110
[34301.136368] genl_rcv_msg+0xd0/0x1c0
[34301.139935] netlink_rcv_skb+0x58/0x120
[34301.143767] genl_rcv+0x34/0x50
[34301.146903] netlink_unicast+0x1f0/0x2ec
[34301.150820] netlink_sendmsg+0x19c/0x3d0
[34301.154737] ____sys_sendmsg+0x258/0x2a0
[34301.158656] ___sys_sendmsg+0x78/0xc0
[34301.162311] __sys_sendmsg+0x54/0xb0
[34301.165878] __arm64_sys_sendmsg+0x20/0x30
[34301.169966] invoke_syscall+0x44/0x110
[34301.173710] el0_svc_common.constprop.0+0x48/0xf0
[34301.178405] do_el0_svc+0x18/0x20
[34301.181711] el0_svc+0x14/0x50
[34301.184760] el0t_64_sync_handler+0xe0/0x110
[34301.189023] el0t_64_sync+0x158/0x15c
[34301.192678] ---[ end trace 8c1a4584c49333ec ]---
[34301.222812] mt7915e 0000:01:00.0: done sta_remove=0x0000000055089642
Update
I didn't scroll up enough in my kernel log. Here's where the ugliness started:
[29002.880588] mtk_wed_flow_remove : pre num_flows 5
[29002.885474] mtk_wed_flow_remove : post num_flows 4
[29003.889903] mtk_wed_flow_remove : pre num_flows 4
[29003.894707] mtk_wed_flow_remove : post num_flows 3
[29008.840245] mtk_wed_flow_add : pre num_flows 3
[29008.845049] mtk_wed_flow_add : post num_flows 4
[29010.853413] mtk_wed_flow_add : pre num_flows 4
[29010.857981] mtk_wed_flow_add : post num_flows 5
[29011.864398] mtk_wed_flow_add : pre num_flows 5
[29011.868963] mtk_wed_flow_add : post num_flows 6
[29011.873686] mtk_wed_flow_add : pre num_flows 6
[29011.878242] mtk_wed_flow_add : post num_flows 7
[29030.038667] mt7915e 0000:01:00.0: WED device stop
[29060.657166] ------------[ cut here ]------------
[29060.661806] Timeout waiting for MCU reset state 8
[29060.666538] WARNING: CPU: 0 PID: 8911 at mt7915_mcu_rf_regval+0x428/0x714 [mt7915e]
[29060.674211] Modules linked in: nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7915e mt7615e mt7615_common mt76_connac_lib mt76 mac80211 cfg80211 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c hwmon compat cls_flower act_vlan cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact cryptodev autofs4 seqiv authencesn authenc leds_gpio gpio_button_hotplug
[29060.744505] CPU: 0 PID: 8911 Comm: kworker/u4:3 Tainted: G S 5.15.98 #0
[29060.752416] Hardware name: Linksys E8450 (UBI) (DT)
[29060.757285] Workqueue: mt76 mt7915_mac_reset_work [mt7915e]
[29060.762860] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[29060.769815] pc : mt7915_mcu_rf_regval+0x428/0x714 [mt7915e]
[29060.775385] lr : mt7915_mcu_rf_regval+0x428/0x714 [mt7915e]
[29060.780955] sp : ffffffc00b153c50
[29060.784261] x29: ffffffc00b153c50 x28: ffffff80027f4968 x27: ffffff80027f25d0
[29060.791392] x26: 0000000000000000 x25: 0000000000000000 x24: ffffff80027f2020
[29060.798523] x23: 0000000000000000 x22: ffffff80027f6020 x21: 0000000000000008
[29060.805654] x20: ffffff80027f9d88 x19: 0000000000000000 x18: ffffffc008aea320
[29060.812785] x17: 0000000000002880 x16: ffffffc008d97000 x15: 0000000000002a3c
[29060.819916] x14: 0000000000003e14 x13: ffffffc00b153978 x12: ffffffc008b42320
[29060.827047] x11: 00000000000783e8 x10: 00000000000783b8 x9 : 00000000000151e0
[29060.834178] x8 : ffffffc008aea2d0 x7 : ffffffc008aea320 x6 : 0000000000000001
[29060.841309] x5 : ffffff801fea26f8 x4 : 0000000000000000 x3 : 0000000000000027
[29060.848440] x2 : 0000000000000027 x1 : 0000000000000023 x0 : 0000000000000025
[29060.855571] Call trace:
[29060.858009] mt7915_mcu_rf_regval+0x428/0x714 [mt7915e]
[29060.863232] mt7915_mac_reset_work+0x1d4/0xd1c [mt7915e]
[29060.868542] process_one_work+0x210/0x3b0
[29060.872549] worker_thread+0x170/0x4d0
[29060.876291] kthread+0x11c/0x130
[29060.879513] ret_from_fork+0x10/0x20
[29060.883083] ---[ end trace 8c1a4584c49333dd ]---
[29091.376767] ------------[ cut here ]------------
[29091.381396] Timeout waiting for MCU reset state 20
[29091.386230] WARNING: CPU: 0 PID: 8911 at mt7915_mcu_rf_regval+0x428/0x714 [mt7915e]
[29091.393904] Modules linked in: nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7915e mt7615e mt7615_common mt76_connac_lib mt76 mac80211 cfg80211 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c hwmon compat cls_flower act_vlan cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact cryptodev autofs4 seqiv authencesn authenc leds_gpio gpio_button_hotplug
[29091.464198] CPU: 0 PID: 8911 Comm: kworker/u4:3 Tainted: G S W 5.15.98 #0
[29091.472110] Hardware name: Linksys E8450 (UBI) (DT)
[29091.476979] Workqueue: mt76 mt7915_mac_reset_work [mt7915e]
[29091.482553] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[29091.489508] pc : mt7915_mcu_rf_regval+0x428/0x714 [mt7915e]
[29091.495077] lr : mt7915_mcu_rf_regval+0x428/0x714 [mt7915e]
[29091.500647] sp : ffffffc00b153c50
[29091.503953] x29: ffffffc00b153c50 x28: ffffff80027f4968 x27: ffffff80027f25d0
[29091.511084] x26: 0000000000000000 x25: 0000000000000000 x24: ffffff80027f2020
[29091.518215] x23: 0000000000000000 x22: ffffff80027f6020 x21: 0000000000000020
[29091.525346] x20: ffffff80027f9d88 x19: 0000000000000000 x18: ffffffc008aea320
[29091.532477] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000002a93
[29091.539607] x14: 0000000000003e31 x13: ffffffc00b153978 x12: ffffffc008b42320
[29091.546737] x11: 00000000000789b0 x10: 0000000000078980 x9 : 0000000000015498
[29091.553868] x8 : ffffffc008aea2d0 x7 : ffffffc008aea320 x6 : 0000000000000001
[29091.560999] x5 : ffffff801fea26f8 x4 : 0000000000000000 x3 : 0000000000000027
[29091.568129] x2 : 0000000000000027 x1 : 0000000000000023 x0 : 0000000000000026
[29091.575260] Call trace:
[29091.577697] mt7915_mcu_rf_regval+0x428/0x714 [mt7915e]
[29091.582921] mt7915_mac_reset_work+0x2fc/0xd1c [mt7915e]
[29091.588230] process_one_work+0x210/0x3b0
[29091.592238] worker_thread+0x170/0x4d0
[29091.595979] kthread+0x11c/0x130
[29091.599201] ret_from_fork+0x10/0x20
[29091.602770] ---[ end trace 8c1a4584c49333de ]---
[29111.856505] mt7915e 0000:01:00.0: Message 000026ed (seq 10) timeout
[29132.336224] mt7915e 0000:01:00.0: Message 00005aed (seq 11) timeout
[29152.815952] mt7915e 0000:01:00.0: Message 000026ed (seq 12) timeout
[29173.295683] mt7915e 0000:01:00.0: Message 0000aded (seq 13) timeout
[29173.302128] mtk_wed_flow_remove : pre num_flows 7
[29173.302144] mtk_wed_flow_remove : post num_flows 6
[29193.775440] mt7915e 0000:01:00.0: Message 00005aed (seq 14) timeout
[29193.811680] mtk_wed_flow_remove : pre num_flows 6
[29193.811698] mtk_wed_flow_remove : post num_flows 5
[29214.255139] mt7915e 0000:01:00.0: Message 00005aed (seq 15) timeout
[29234.734924] mt7915e 0000:01:00.0: Message 00005aed (seq 1) timeout
[29255.214601] mt7915e 0000:01:00.0: Message 00005aed (seq 2) timeout
[29275.694337] mt7915e 0000:01:00.0: Message 00005aed (seq 3) timeout
[29296.174060] mt7915e 0000:01:00.0: Message 000026ed (seq 4) timeout
[29296.191028] mtk_wed_flow_remove : pre num_flows 5
[29296.191045] mtk_wed_flow_remove : post num_flows 4
[29296.196059] mtk_wed_flow_remove : pre num_flows 4
[29296.200886] mtk_wed_flow_remove : post num_flows 3
[29316.653789] mt7915e 0000:01:00.0: Message 00005aed (seq 5) timeout
[29316.666309] mtk_wed_flow_remove : pre num_flows 3
[29316.666328] mtk_wed_flow_remove : post num_flows 2
[29316.671632] mtk_wed_flow_remove : pre num_flows 2
[29316.676486] mtk_wed_flow_remove : post num_flows 1
[29337.133519] mt7915e 0000:01:00.0: Message 00005aed (seq 6) timeout
[29357.613264] mt7915e 0000:01:00.0: Message 00005aed (seq 7) timeout
[29378.092993] mt7915e 0000:01:00.0: Message 0000aded (seq 8) timeout
[29398.572731] mt7915e 0000:01:00.0: Message 00005aed (seq 9) timeout
[29419.052472] mt7915e 0000:01:00.0: Message 00005aed (seq 10) timeout
[29439.532203] mt7915e 0000:01:00.0: Message 00005aed (seq 11) timeout
[29460.011944] mt7915e 0000:01:00.0: Message 00005aed (seq 12) timeout
[29480.491678] mt7915e 0000:01:00.0: Message 00005aed (seq 13) timeout
[29500.971419] mt7915e 0000:01:00.0: Message 000026ed (seq 14) timeout
[29521.451159] mt7915e 0000:01:00.0: Message 00005aed (seq 15) timeout
...
[31077.911194] mt7915e 0000:01:00.0: Message 00005aed (seq 1) timeout
[31098.390928] mt7915e 0000:01:00.0: Message 0000aded (seq 2) timeout
[31118.870669] mt7915e 0000:01:00.0: Message 00005aed (seq 3) timeout
[31139.350408] mt7915e 0000:01:00.0: Message 00005aed (seq 4) timeout
[31159.830141] mt7915e 0000:01:00.0: Message 00005aed (seq 5) timeout
[31180.309887] mt7915e 0000:01:00.0: Message 0000aded (seq 6) timeout
[31200.789616] mt7915e 0000:01:00.0: Message 00005aed (seq 7) timeout
[31221.269366] mt7915e 0000:01:00.0: Message 0000aded (seq 8) timeout
[31241.749117] mt7915e 0000:01:00.0: Message 00005aed (seq 9) timeout
[31262.228831] mt7915e 0000:01:00.0: Message 00005aed (seq 10) timeout
[31282.708601] mt7915e 0000:01:00.0: Message 00005aed (seq 11) timeout
[31303.188304] mt7915e 0000:01:00.0: Message 0000aded (seq 12) timeout
[31323.668043] mt7915e 0000:01:00.0: Message 00005aed (seq 13) timeout
[31344.147782] mt7915e 0000:01:00.0: Message 0000aded (seq 14) timeout
[31364.627515] mt7915e 0000:01:00.0: Message 00005aed (seq 15) timeout
[31385.107258] mt7915e 0000:01:00.0: Message 00005aed (seq 1) timeout
[31405.586991] mt7915e 0000:01:00.0: Message 00005aed (seq 2) timeout
[31426.066741] mt7915e 0000:01:00.0: Message 0000aded (seq 3) timeout
[31446.546465] mt7915e 0000:01:00.0: Message 00005aed (seq 4) timeout
[31467.026223] mt7915e 0000:01:00.0: Message 0000aded (seq 5) timeout
[31487.505988] mt7915e 0000:01:00.0: Message 00005aed (seq 6) timeout
[31507.985740] mt7915e 0000:01:00.0: Message 00005aed (seq 7) timeout
[31528.465491] mt7915e 0000:01:00.0: Message 00005aed (seq 8) timeout
[31548.945255] mt7915e 0000:01:00.0: Message 0000aded (seq 9) timeout
[31569.425011] mt7915e 0000:01:00.0: Message 00005aed (seq 10) timeout
[31589.904765] mt7915e 0000:01:00.0: Message 0000aded (seq 11) timeout
[31610.384517] mt7915e 0000:01:00.0: Message 00005aed (seq 12) timeout
[31630.864276] mt7915e 0000:01:00.0: Message 00005aed (seq 13) timeout
[31651.344030] mt7915e 0000:01:00.0: Message 00005aed (seq 14) timeout
[31671.823789] mt7915e 0000:01:00.0: Message 0000aded (seq 15) timeout
[31692.303542] mt7915e 0000:01:00.0: Message 00005aed (seq 1) timeout
[31712.783301] mt7915e 0000:01:00.0: Message 0000aded (seq 2) timeout
[31733.263053] mt7915e 0000:01:00.0: Message 00005aed (seq 3) timeout
[31753.742874] mt7915e 0000:01:00.0: Message 00005aed (seq 4) timeout
[31774.222566] mt7915e 0000:01:00.0: Message 00005aed (seq 5) timeout
[31794.702379] mt7915e 0000:01:00.0: Message 0000aded (seq 6) timeout
[31815.182080] mt7915e 0000:01:00.0: Message 00005aed (seq 7) timeout
[31835.661842] mt7915e 0000:01:00.0: Message 0000aded (seq 8) timeout
[31856.141595] mt7915e 0000:01:00.0: Message 00005aed (seq 9) timeout
[31876.621349] mt7915e 0000:01:00.0: Message 00005aed (seq 10) timeout
[31897.101101] mt7915e 0000:01:00.0: Message 00005aed (seq 11) timeout
[31917.580856] mt7915e 0000:01:00.0: Message 0000aded (seq 12) timeout
[31938.060611] mt7915e 0000:01:00.0: Message 00005aed (seq 13) timeout
[31958.540373] mt7915e 0000:01:00.0: Message 0000aded (seq 14) timeout
[31979.020124] mt7915e 0000:01:00.0: Message 00005aed (seq 15) timeout
[31999.499889] mt7915e 0000:01:00.0: Message 00005aed (seq 1) timeout
[32019.979639] mt7915e 0000:01:00.0: Message 00005aed (seq 2) timeout
[32040.459391] mt7915e 0000:01:00.0: Message 0000aded (seq 3) timeout
[32060.939156] mt7915e 0000:01:00.0: Message 00005aed (seq 4) timeout
[32081.418904] mt7915e 0000:01:00.0: Message 0000aded (seq 5) timeout
[32101.898654] mt7915e 0000:01:00.0: Message 00005aed (seq 6) timeout
[32122.378453] mt7915e 0000:01:00.0: Message 00005aed (seq 7) timeout
[32142.858166] mt7915e 0000:01:00.0: Message 00005aed (seq 8) timeout
[32163.337922] mt7915e 0000:01:00.0: Message 0000aded (seq 9) timeout
[32183.817683] mt7915e 0000:01:00.0: Message 00005aed (seq 10) timeout
[32204.297432] mt7915e 0000:01:00.0: Message 0000aded (seq 11) timeout
[32224.777202] mt7915e 0000:01:00.0: Message 00005aed (seq 12) timeout
[32245.256944] mt7915e 0000:01:00.0: Message 00005aed (seq 13) timeout
[32265.736704] mt7915e 0000:01:00.0: Message 00005aed (seq 14) timeout
[32286.216451] mt7915e 0000:01:00.0: Message 0000aded (seq 15) timeout
[32306.696293] mt7915e 0000:01:00.0: Message 00005aed (seq 1) timeout
[32327.175959] mt7915e 0000:01:00.0: Message 0000aded (seq 2) timeout
[32347.655711] mt7915e 0000:01:00.0: Message 00005aed (seq 3) timeout
[32368.135472] mt7915e 0000:01:00.0: Message 00005aed (seq 4) timeout
[32388.615226] mt7915e 0000:01:00.0: Message 00005aed (seq 5) timeout
[32409.094975] mt7915e 0000:01:00.0: Message 0000aded (seq 6) timeout
[32429.574734] mt7915e 0000:01:00.0: Message 00005aed (seq 7) timeout
[32450.054483] mt7915e 0000:01:00.0: Message 0000aded (seq 8) timeout
[32470.534235] mt7915e 0000:01:00.0: Message 00005aed (seq 9) timeout
[32491.013998] mt7915e 0000:01:00.0: Message 00005aed (seq 10) timeout
[32511.493744] mt7915e 0000:01:00.0: Message 00005aed (seq 11) timeout
[32531.973499] mt7915e 0000:01:00.0: Message 0000aded (seq 12) timeout
[32552.453257] mt7915e 0000:01:00.0: Message 00005aed (seq 13) timeout
[32572.933020] mt7915e 0000:01:00.0: Message 0000aded (seq 14) timeout
[32593.412763] mt7915e 0000:01:00.0: Message 00005aed (seq 15) timeout
[32613.892522] mt7915e 0000:01:00.0: Message 00005aed (seq 1) timeout
[32634.372264] mt7915e 0000:01:00.0: Message 00005aed (seq 2) timeout
[32654.852020] mt7915e 0000:01:00.0: Message 0000aded (seq 3) timeout
[32675.331776] mt7915e 0000:01:00.0: Message 00005aed (seq 4) timeout
[32695.811548] mt7915e 0000:01:00.0: Message 0000aded (seq 5) timeout
[32716.291282] mt7915e 0000:01:00.0: Message 00005aed (seq 6) timeout
[32736.771101] mt7915e 0000:01:00.0: Message 00005aed (seq 7) timeout
[32757.250788] mt7915e 0000:01:00.0: Message 00005aed (seq 8) timeout
[32777.730598] mt7915e 0000:01:00.0: Message 0000aded (seq 9) timeout
[32798.210294] mt7915e 0000:01:00.0: Message 00005aed (seq 10) timeout
[32818.690057] mt7915e 0000:01:00.0: Message 0000aded (seq 11) timeout
[32839.169804] mt7915e 0000:01:00.0: Message 00005aed (seq 12) timeout
[32859.649557] mt7915e 0000:01:00.0: Message 00005aed (seq 13) timeout
[32880.129305] mt7915e 0000:01:00.0: Message 00005aed (seq 14) timeout
[32900.609058] mt7915e 0000:01:00.0: Message 0000aded (seq 15) timeout
[32921.088818] mt7915e 0000:01:00.0: Message 00005aed (seq 1) timeout
[32941.568567] mt7915e 0000:01:00.0: Message 0000aded (seq 2) timeout
[32962.048320] mt7915e 0000:01:00.0: Message 00005aed (seq 3) timeout
[32982.528093] mt7915e 0000:01:00.0: Message 00005aed (seq 4) timeout
[33003.007822] mt7915e 0000:01:00.0: Message 00005aed (seq 5) timeout
[33023.487579] mt7915e 0000:01:00.0: Message 0000aded (seq 6) timeout
[33043.967331] mt7915e 0000:01:00.0: Message 00005aed (seq 7) timeout
[33064.447083] mt7915e 0000:01:00.0: Message 0000aded (seq 8) timeout
[33084.926834] mt7915e 0000:01:00.0: Message 00005aed (seq 9) timeout
[33105.406602] mt7915e 0000:01:00.0: Message 00005aed (seq 10) timeout
[33125.886341] mt7915e 0000:01:00.0: Message 00005aed (seq 11) timeout
[33146.366094] mt7915e 0000:01:00.0: Message 0000aded (seq 12) timeout
[33166.845844] mt7915e 0000:01:00.0: Message 00005aed (seq 13) timeout
[33187.325598] mt7915e 0000:01:00.0: Message 0000aded (seq 14) timeout
[33207.805348] mt7915e 0000:01:00.0: Message 00005aed (seq 15) timeout
[33228.285104] mt7915e 0000:01:00.0: Message 00005aed (seq 1) timeout
[33248.764852] mt7915e 0000:01:00.0: Message 00005aed (seq 2) timeout
[33269.244605] mt7915e 0000:01:00.0: Message 0000aded (seq 3) timeout
[33289.724372] mt7915e 0000:01:00.0: Message 00005aed (seq 4) timeout
[33310.204113] mt7915e 0000:01:00.0: Message 0000aded (seq 5) timeout
[33330.683866] mt7915e 0000:01:00.0: Message 00005aed (seq 6) timeout
[33351.163629] mt7915e 0000:01:00.0: Message 00005aed (seq 7) timeout
[33371.643368] mt7915e 0000:01:00.0: Message 00005aed (seq 8) timeout
[33392.123122] mt7915e 0000:01:00.0: Message 0000aded (seq 9) timeout
[33412.602878] mt7915e 0000:01:00.0: Message 00005aed (seq 10) timeout
[33433.082626] mt7915e 0000:01:00.0: Message 0000aded (seq 11) timeout
[33453.562376] mt7915e 0000:01:00.0: Message 00005aed (seq 12) timeout
[33474.042132] mt7915e 0000:01:00.0: Message 00005aed (seq 13) timeout
[33494.521894] mt7915e 0000:01:00.0: Message 00005aed (seq 14) timeout
[33515.001654] mt7915e 0000:01:00.0: Message 0000aded (seq 15) timeout
[33535.481450] mt7915e 0000:01:00.0: Message 00005aed (seq 1) timeout
[33555.961188] mt7915e 0000:01:00.0: Message 0000aded (seq 2) timeout
[33576.440959] mt7915e 0000:01:00.0: Message 00005aed (seq 3) timeout
[33596.920740] mt7915e 0000:01:00.0: Message 00005aed (seq 4) timeout
[33617.400525] mt7915e 0000:01:00.0: Message 00005aed (seq 5) timeout
[33637.880292] mt7915e 0000:01:00.0: Message 0000aded (seq 6) timeout
[33658.360046] mt7915e 0000:01:00.0: Message 00005aed (seq 7) timeout
[33678.839816] mt7915e 0000:01:00.0: Message 0000aded (seq 8) timeout
[33699.319583] mt7915e 0000:01:00.0: Message 00005aed (seq 9) timeout
[33719.799365] mt7915e 0000:01:00.0: Message 00005aed (seq 10) timeout
[33740.279125] mt7915e 0000:01:00.0: Message 00005aed (seq 11) timeout
[33760.758910] mt7915e 0000:01:00.0: Message 0000aded (seq 12) timeout
[33781.238670] mt7915e 0000:01:00.0: Message 00005aed (seq 13) timeout
[33801.718492] mt7915e 0000:01:00.0: Message 0000aded (seq 14) timeout
[33822.198205] mt7915e 0000:01:00.0: Message 00005aed (seq 15) timeout
[33842.678032] mt7915e 0000:01:00.0: Message 00005aed (seq 1) timeout
[33863.157751] mt7915e 0000:01:00.0: Message 00005aed (seq 2) timeout
[33883.637524] mt7915e 0000:01:00.0: Message 0000aded (seq 3) timeout
[33904.117286] mt7915e 0000:01:00.0: Message 00005aed (seq 4) timeout
[33924.597053] mt7915e 0000:01:00.0: Message 0000aded (seq 5) timeout
[33945.076824] mt7915e 0000:01:00.0: Message 00005aed (seq 6) timeout
[33965.556593] mt7915e 0000:01:00.0: Message 00005aed (seq 7) timeout
[33986.036363] mt7915e 0000:01:00.0: Message 00005aed (seq 8) timeout
[34006.516150] mt7915e 0000:01:00.0: Message 0000aded (seq 9) timeout
[34026.995910] mt7915e 0000:01:00.0: Message 00005aed (seq 10) timeout
[34047.475674] mt7915e 0000:01:00.0: Message 0000aded (seq 11) timeout
[34067.955440] mt7915e 0000:01:00.0: Message 00005aed (seq 12) timeout
[34088.435209] mt7915e 0000:01:00.0: Message 00005aed (seq 13) timeout
[34108.914977] mt7915e 0000:01:00.0: Message 00005aed (seq 14) timeout
[34129.394747] mt7915e 0000:01:00.0: Message 0000aded (seq 15) timeout
[34149.874527] mt7915e 0000:01:00.0: Message 00005aed (seq 1) timeout
[34170.354282] mt7915e 0000:01:00.0: Message 0000aded (seq 2) timeout
[34190.834058] mt7915e 0000:01:00.0: Message 00005aed (seq 3) timeout
[34211.313822] mt7915e 0000:01:00.0: Message 00005aed (seq 4) timeout
[34231.793588] mt7915e 0000:01:00.0: Message 00005aed (seq 5) timeout
[34252.273361] mt7915e 0000:01:00.0: Message 0000aded (seq 6) timeout
[34272.753124] mt7915e 0000:01:00.0: Message 00005aed (seq 7) timeout
[34293.232898] mt7915e 0000:01:00.0: Message 0000aded (seq 8) timeout
[34293.312900] wl1-ap0: failed to remove key (0, 7c:04:d0:xx:xx:8c) from hardware (-12)
[34293.432895] wl1-ap0: failed to remove key (0, 90:81:58:xx:xx:87) from hardware (-12)
[34293.513093] mt7622-wmac 18000000.wmac: idx==0, skipping standard sta_remove procedure
[34293.560579] mt7622-wmac 18000000.wmac: idx==0, skipping standard sta_remove procedure
[34293.642833] mt7622-wmac 18000000.wmac: idx==0, skipping standard sta_remove procedure
[34293.703259] mt7622-wmac 18000000.wmac: idx==0, skipping standard sta_remove procedure
[34293.747378] mt7622-wmac 18000000.wmac: idx==0, skipping standard sta_remove procedure
[34294.312890] wl1-ap0: HW problem - can not stop rx aggregation for 7c:04:d0:xx:xx:8c tid 0
[34294.362899] wl1-ap0: HW problem - can not stop rx aggregation for 7c:04:d0:xx:xx:8c tid 1
[34294.402925] ------------[ cut here ]------------
[34294.407554] WARNING: CPU: 0 PID: 1345 at ___ieee80211_stop_tx_ba_session+0x348/0x3ac [mac80211]
I need to look more into those MCU reset states.
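In the log above, the MCU timeouts arrive roughly every 20.48 seconds and involve only two message IDs (00005aed and 0000aded). Here is a minimal sketch for pulling that summary out of a saved dmesg capture. The function name and the returned dictionary layout are just for illustration, and the regex assumes the exact line format shown above.

```python
import re
from collections import Counter

# Matches lines such as:
# [32859.649557] mt7915e 0000:01:00.0: Message 00005aed (seq 13) timeout
TIMEOUT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\].*?Message ([0-9a-f]{8}) \(seq (\d+)\) timeout"
)

def summarize_timeouts(log_text):
    """Count MCU timeout lines per message ID and compute the mean gap between them."""
    stamps = []
    by_message = Counter()
    for line in log_text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m is None:
            continue  # not a timeout line
        stamps.append(float(m.group(1)))
        by_message[m.group(2)] += 1
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return {
        "count": len(stamps),
        "by_message": dict(by_message),
        "mean_gap_s": mean_gap,
    }

sample = """\
[32859.649557] mt7915e 0000:01:00.0: Message 00005aed (seq 13) timeout
[32880.129305] mt7915e 0000:01:00.0: Message 00005aed (seq 14) timeout
[32900.609058] mt7915e 0000:01:00.0: Message 0000aded (seq 15) timeout
"""
summary = summarize_timeouts(sample)
print(summary)
```

Running it over a full `dmesg` dump should make it easier to spot when the timeouts start relative to the last WED flow activity.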
_FailSafe:
I still don't know why, but for some reason I continue to notice that with WED enabled, offloading seems to stop after some time. Wireless connectivity continues to work, but I can definitely see the lack of offloaded flows resulting in significantly higher CPU usage than when the flow offloading is functional.
Quick update on this. After a lot of code review (along with @Brain2000), we found nothing that stood out as a cause of the WED flow stoppage I was experiencing.
I ended up "grasping at straws" and disabled Usteer. I had been running it for quite a while and generally had good success with it for band steering.
But for some reason, something about Usteer's interaction with hostapd (??) can break WED offloaded flows. Since disabling Usteer, I am over 27 hours without losing offloaded flows on any of my three APs. I have never made it this long without losing flows on one or more of the APs with Usteer enabled.
Am I saying Usteer is broken? Not necessarily. It's still a fine tool as far as I am concerned. But I am weighing its benefit vs the benefit of WED, and the significant CPU reduction that WED brings on the TX path is hard to give up.
If anyone feels adventurous and wants to help figure out why Usteer leads to this effect, I'll be interested to see where the troubleshooting leads.
I am using static-neighbor-reports now to provide neighbor reports for 802.11k capable clients since Usteer isn't providing this role for me now.
Lynx
March 23, 2023, 7:45am
87
Would you mind sharing what you did to make this work and any tests to verify that it is working as intended? I have read the documentation but it doesn't seem entirely clear to me how this tool should be used.
Not on this thread. Such a topic is outside the scope of this fine thread.
Well, as luck would have it, offloaded flows stopped on one of three APs. Definitely made it longer with offloading functional than when Usteer was running, but I'm back to being pretty perplexed as to the root cause here.
So the issue is most likely when wifi devices are jumping between access points.
I haven't had a chance to jump back on this yet, but we will get it. Later this evening I should be able to dig in again.
I've been working with Felix on splitting up the patches and submitting them upstream. Also, found a better way to fix it in the mac80211 framework, so it will span all wireless devices, not just mt76.
Lynx
March 23, 2023, 3:39pm
91
Well you suggested a workaround for the issue you've experienced and this relates to an elaboration of that workaround. So seems not so off topic. But, in any case, for those interested - whether arising from this thread or otherwise - perhaps you might instead post on this thread:
Also, found a better way to fix it in the mac80211 framework, so it will span all wireless devices, not just mt76.
That is great news! Are you hoping to get the patch into OpenWrt as an interim patch too (via https://github.com/openwrt/openwrt/tree/master/package/kernel/mac80211/patches/subsys)? Kernel patches can take a while to reach kernel releases, so this could be a good way to get the fix in quickly, and hopefully into the 22.03.x release stream as well.
Looks like Felix just pushed a series of mtk ppe / WED related patches into master. I'm going to get those running on my APs and see if the issue I'm seeing persists.
Update: Hmmm... not off to a good start:
2023-03-23 15:42 kernel [ 5320.784522] Unable to handle kernel paging request at virtual address 00ffffff80057ad9
Gotta go power cycle that AP.
I'm not 100% sure yet how the patch process works. It uses git email to send patch files to repo managers. Felix is listed on the one in the Linux upstream as well as openwrt mt76, so I think he'll be merging that commit over once it's in the upstream?
Get the latest from the Brain2000/mt76 repo. I reworked the changes so they are closer to the patches we kept. The sta_remove and sta_pre_rcu_remove need to be separated, so I want to make sure that's not causing your crash. (Let me know if it fails to compile; I didn't test that it builds.)
Got it! Compiled fine, so I'm building a new image now. I'll put it through the wringer as usual and see how it goes.
Such awesome progress via this thread! Thanks to @Brain2000 for his trailblazing to help get some things addressed
For anyone running WED, have you noticed a significant regression with this commit as I described here?
committed 04:54PM - 23 Mar 23 UTC
Properly track L2 flows, and ensure that stale data gets cleared
Signed-off-by:… Felix Fietkau <nbd@nbd.name>
I just got a crash dump from this issue:
<1>[ 84.515285] Unable to handle kernel paging request at virtual address 00010102464c457f
<1>[ 84.523230] Mem abort info:
<1>[ 84.526015] ESR = 0x0000000096000004
<1>[ 84.529766] EC = 0x25: DABT (current EL), IL = 32 bits
<1>[ 84.535072] SET = 0, FnV = 0
<1>[ 84.538136] EA = 0, S1PTW = 0
<1>[ 84.541274] FSC = 0x04: level 0 translation fault
<1>[ 84.546143] Data abort info:
<1>[ 84.549033] ISV = 0, ISS = 0x00000004
<1>[ 84.552863] CM = 0, WnR = 0
<1>[ 84.555822] [00010102464c457f] address between user and kernel address ranges
<0>[ 84.562967] Internal error: Oops: 96000004 [#1] SMP
<7>[ 84.567843] Modules linked in: nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7915e mt7615e mt7615_common mt76_connac_lib mt76 mac80211 cfg80211 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c hwmon compat cls_flower act_vlan cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact cryptodev autofs4 seqiv authencesn authenc leds_gpio gpio_button_hotplug
<7>[ 84.637105] CPU: 1 PID: 343 Comm: napi/mtk_eth-4 Tainted: G S 5.15.102 #0
<7>[ 84.645192] Hardware name: Linksys E8450 (UBI) (DT)
<7>[ 84.650060] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
<7>[ 84.657015] pc : __mtk_ppe_check_skb+0x10c/0x55c
<7>[ 84.661634] lr : __mtk_ppe_check_skb+0x154/0x55c
<7>[ 84.666244] sp : ffffffc00942bb30
<7>[ 84.669549] x29: ffffffc00942bb30 x28: 0000000000000000 x27: ffffff80065ffd00
<7>[ 84.676682] x26: 000000000000ffff x25: 0000000000000028 x24: ffffff80056b4780
<7>[ 84.683814] x23: ffffffc0085cf7e0 x22: 00010102464c457f x21: ffffffc008c05000
<7>[ 84.690944] x20: 00000000000008f4 x19: ffffff8000d00080 x18: 0000000000000001
<7>[ 84.698075] x17: 0000000000003300 x16: ffffffc0091f1000 x15: ffffff80027f0886
<7>[ 84.705205] x14: 00000000428684c4 x13: 01bbed2234865a94 x12: 1c3d0b440994d43c
<7>[ 84.712335] x11: 2600170000000186 x10: 0000000020632063 x9 : 1c3d0b440994d43c
<7>[ 84.719466] x8 : 2600170000000186 x7 : 0000000020632063 x6 : 2a0086c01a001e53
<7>[ 84.726597] x5 : c086002a63206320 x4 : 0000000000068808 x3 : 000000000000ffff
<7>[ 84.733727] x2 : 0000000000000001 x1 : 0000000000000003 x0 : 00000000ffffffff
<7>[ 84.740859] Call trace:
<7>[ 84.743296] __mtk_ppe_check_skb+0x10c/0x55c
<7>[ 84.747560] mtk_poll_rx+0x93c/0xa5c
<7>[ 84.751128] mtk_napi_rx+0x90/0x180
<7>[ 84.754610] __napi_poll+0x54/0x1b0
<7>[ 84.758096] napi_threaded_poll+0x84/0xe4
<7>[ 84.762098] kthread+0x11c/0x130
<7>[ 84.765320] ret_from_fork+0x10/0x20
<0>[ 84.768893] Code: f8767b16 529ffffa b40002d6 d503201f (f94002d8)
<4>[ 84.774978] ---[ end trace 8222ee3554dbe808 ]---
<0>[ 84.786025] Kernel panic - not syncing: Oops: Fatal exception in interrupt
<2>[ 84.792896] SMP: stopping secondary CPUs
<0>[ 84.796814] Kernel Offset: disabled
<0>[ 84.800291] CPU features: 0x0,00006000,00000802
<0>[ 84.804814] Memory Limit: none
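One detail in the dump above may be worth a look: the faulting address (also the value in x22) does not look like a pointer at all. Read as little-endian bytes, it spells out the start of an ELF file header. A minimal sketch of that decoding follows; it assumes the kernel runs little-endian, which is the usual arm64 configuration, and the variable names are only for illustration.

```python
# Faulting virtual address from the oops above (same value as register x22).
fault_addr = 0x00010102464C457F

# Assumption: little-endian byte order, as on typical arm64 Linux builds.
raw = fault_addr.to_bytes(8, "little")
print(raw)  # b'\x7fELF\x02\x01\x01\x00'

# 0x7f 'E' 'L' 'F' is the ELF magic; the following bytes are the class,
# data-encoding and version fields of e_ident.
looks_like_elf_header = raw[:4] == b"\x7fELF"
print(looks_like_elf_header)  # True
```

This doesn't identify the bug, but it would be consistent with the code dereferencing a value loaded from stale or overwritten memory rather than from a valid flow entry.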
This is very similar to the crash I had, but yours is in the WED poll.
Felix sent me an addition about an hour ago. It's on my latest repo commit if you want to try it out.
EDIT: I might be wrong about that. Where is this mtk_poll function located?
EDIT2: found it, looking at it
EDIT3: this might be the time we try to use aarch64-openwrt-linux-gdb in the toolchain bin folder
(gdb) l *__mtk_ppe_check_skb+0x10c
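For anyone who wants to repeat that lookup on their own crash, here is a rough sketch that pulls the `pc :` symbol and offset out of an oops and builds a non-interactive gdb command line. The helper name and the default `vmlinux` path are assumptions; you would point it at the unstripped vmlinux (with debug info) from the same build, and gdb can only show source lines if that debug info is present.

```python
import re
import shlex

# Matches the program counter line of an arm64 oops, e.g.
# "pc : __mtk_ppe_check_skb+0x10c/0x55c"
PC_RE = re.compile(r"\bpc : ([A-Za-z_][\w.]*)\+(0x[0-9a-f]+)/0x[0-9a-f]+")

def gdb_list_command(oops_text, gdb="aarch64-openwrt-linux-gdb", vmlinux="vmlinux"):
    """Build a shell command that asks gdb to list the source at the faulting pc.

    The gdb binary name comes from the post above; the vmlinux path is a placeholder.
    """
    m = PC_RE.search(oops_text)
    if m is None:
        raise ValueError("no 'pc :' line found in oops text")
    symbol, offset = m.groups()
    # Same form as typing "l *symbol+offset" at the (gdb) prompt.
    argv = [gdb, "-batch", "-ex", f"list *{symbol}+{offset}", vmlinux]
    return shlex.join(argv)

oops = "<7>[ 84.657015] pc : __mtk_ppe_check_skb+0x10c/0x55c"
command = gdb_list_command(oops)
print(command)
# aarch64-openwrt-linux-gdb -batch -ex 'list *__mtk_ppe_check_skb+0x10c' vmlinux
```

The printed command can be pasted into a shell from the directory holding the toolchain's gdb and the kernel image.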
Thanks for the heads up, we have one more crash that needs to get in there before I think we're in the clear...
Did it work for you? I'm having to get back up to speed on gdb, so I haven't uncovered any useful details yet. Did find the gdb binary, though!