After almost 6 days the system restarted itself.
Network activity was low or very low at the time.
The problem also seems linked to the NSS core. I don't think this is the general automatic-restart problem; it looks tied to the NSS offload.

<5>[180534.588538] sd 2:0:0:0: [sdb] Attached SCSI removable disk
<1>[458908.011865] NSS core 0 signal COREDUMP COMPLETE 4000
<1>[458908.011937] bf249500: Starting NSS-FW logbuffer dump for core 0
<1>[458908.015948] bf249500: Warn: trap[813]: Trap on CHIP ID 00050000
<1>[458908.022083] bf249500: Warn: trap[620]: Trapped: TRAP_TD(00000004) DCAPT(3C000080)
<1>[458908.028000] bf249500: Warn: trap[645]: Trapped: Thread: 2, reason: 00000020, PC: 4002FBF4, previous PC: 4002FBF0
<1>[458908.035369] bf249500: Warn: trap[594]: A0_3: 4301DBD0 402321C0 3F0218C8 4301DBD2
<1>[458908.045695] bf249500: Warn: trap[594]: A4_7: 4301DBD2 40053D04 3F0218C8 3F00AEF0
<1>[458908.053172] bf249500: Warn: trap[599]: D0_3: 00000026 00000001 00000009 4301DBC0
<1>[458908.060625] bf249500: Warn: trap[599]: D4_7: 00060000 00000026 C0A8000A 000005DC
<1>[458908.068090] bf249500: Warn: trap[599]: D8_11: FFFFFFFF 1F0D5331 47F0A2FD 00000000
<1>[458908.075556] bf249500: Warn: trap[599]: D12_15: 00000000 00000000 00D84001 00005805
<1>[458908.083021] bf249500: Warn: trap[649]: Thread_2 has non-recoverable trap
<6>[458908.092661] ipq8064-mdio 37000000.mdio eth1: nss_gmac_xmit_frames: dropping skb
<1>[458908.100892] NSS core 1 signal COREDUMP COMPLETE 4000
<1>[458908.104993] bf24dc00: Starting NSS-FW logbuffer dump for core 1
<0>[458908.110019] Kernel panic - not syncing: NSS FW coredump: bringing system down
<2>[458908.116008] CPU1: stopping
<4>[458908.123028] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.74 #0
<4>[458908.125720] Hardware name: Generic DT based system
<4>[458908.131975] [<c030f6e0>] (unwind_backtrace) from [<c030b07c>] (show_stack+0x14/0x20)
<4>[458908.136664] [<c030b07c>] (show_stack) from [<c064f524>] (dump_stack_lvl+0x40/0x4c)
<4>[458908.144648] [<c064f524>] (dump_stack_lvl) from [<c030dea4>] (do_handle_IPI+0x12c/0x184)
<4>[458908.152114] [<c030dea4>] (do_handle_IPI) from [<c030df14>] (ipi_handler+0x18/0x2c)
<4>[458908.160446] [<c030df14>] (ipi_handler) from [<c037b644>] (handle_percpu_devid_irq+0x80/0x168)
<4>[458908.167828] [<c037b644>] (handle_percpu_devid_irq) from [<c0374f20>] (handle_domain_irq+0x64/0x98)
<4>[458908.176508] [<c0374f20>] (handle_domain_irq) from [<c066991c>] (gic_handle_irq+0x80/0xb4)
<4>[458908.185449] [<c066991c>] (gic_handle_irq) from [<c0300b7c>] (__irq_svc+0x5c/0x78)
<4>[458908.193778] Exception stack(0xc146df20 to 0xc146df68)
<4>[458908.201334] df20: 00000000 0001a15f 1cd56000 dd99c480 00000000 def6a8c0 c1d68840 00000000
<4>[458908.206459] df40: dd99b6f0 0001a15f 00000000 0001a15f 56c4eaa0 c146df70 c07fa574 c07fa594
<4>[458908.214700] df60: 60000013 ffffffff
<4>[458908.222942] [<c0300b7c>] (__irq_svc) from [<c07fa594>] (cpuidle_enter_state+0x180/0x37c)
<4>[458908.226681] [<c07fa594>] (cpuidle_enter_state) from [<c07fa7e0>] (cpuidle_enter+0x3c/0x5c)
<4>[458908.234752] [<c07fa7e0>] (cpuidle_enter) from [<c0353008>] (do_idle+0x1e8/0x298)
<4>[458908.242912] [<c0353008>] (do_idle) from [<c03533b4>] (cpu_startup_entry+0x1c/0x20)
<4>[458908.250549] [<c03533b4>] (cpu_startup_entry) from [<42301530>] (0x42301530)

Only dmesg-ramoops-0 was generated. There is no dmesg-ramoops-1.
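
For anyone comparing dumps across reboots, a small script can pull the key facts out of a ramoops file. The sketch below is only an illustration: the regular expressions are modeled on the exact lines in the log above and may not match other NSS firmware versions, and `summarize_ramoops` is a made-up helper name, not part of any OpenWrt tool.

```python
import re

# Patterns modeled on the messages in the dump above (assumption: other
# firmware or kernel versions may word these lines differently).
TRAP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\].*Trapped: Thread: (?P<thread>\d+), "
    r"reason: (?P<reason>[0-9A-Fa-f]+), PC: (?P<pc>[0-9A-Fa-f]+)"
)
PANIC_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] Kernel panic - not syncing: (?P<msg>.*)"
)


def summarize_ramoops(text):
    """Summarize NSS firmware traps and the final panic in a kernel log."""
    summary = {
        "nss_coredump": False,  # True if the panic message blames the NSS FW
        "traps": [],            # one entry per "Trapped: Thread: ..." line
        "panic": None,          # panic message text, if any
        "uptime_days": None,    # kernel timestamp of the panic, in days
    }
    for line in text.splitlines():
        trap = TRAP_RE.search(line)
        if trap:
            summary["traps"].append(
                {
                    "thread": int(trap["thread"]),
                    "reason": trap["reason"],
                    "pc": trap["pc"],
                }
            )
        panic = PANIC_RE.search(line)
        if panic:
            summary["panic"] = panic["msg"].strip()
            summary["uptime_days"] = float(panic["ts"]) / 86400
            summary["nss_coredump"] = "NSS FW coredump" in panic["msg"]
    return summary
```

The input would typically be the text of a file such as dmesg-ramoops-0, copied to a machine that has Python. Collecting the `thread`, `reason` and `pc` values from several crashes makes it easier to see whether the firmware always traps in the same place.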


Yes, this is a restart caused by an NSS firmware core dump. I have been seeing this recently on my R7800 as well; it restarted itself yesterday with a similar log after being up for two weeks.

Interestingly my Askey RT-4230W is still running (both R7800 and the Askey running 21.02 - 5.4 builds) after 50 days.

Unfortunately, we can't really do anything about this type of reboot.


@ACwifidude, all others ...

blue-sky wondering here ... If the router can take itself down gracefully enough to log the crime, can it re-spawn the task(s) which interact with the NSS cores?

a more targeted reboot.

I also had a reboot today, under heavy network load with 2000+ connections, which is sort of justifiable.

Probably worth a try. When the nss-drv driver loads the firmware for a core, it resets that NSS core, but this happens before any of the modules that depend on nss-drv have loaded and started running.

After an NSS firmware crash, once the NSS core has been reset, all the dependent NSS drivers have to be reset as well. This would take a lot of trial and error and may make the system unstable, though.

@ACwifidude, @quarky ... and anyone else interested in kernel patches to enhance TCP throughput - I read (most of) Aggregating Without Bloating: Hard Times for TCP on Wi-Fi -- and for all I know, some of you may have helped author it (!) BUT it has some recent concrete changes to linux tcp which may invite a look here.

They tested the Linux ath9k and ath10k drivers with the Linux IPv4 stack patched to make TSQ more tunable, and report in their Conclusions section:

"We developed and tested a patch that enables to tune the default TSQ size to address the issue, which is now included in the Linux kernel mainline."

Given the recent date (Oct 5, 2022), this looks like new work on buffer bloat and on changing TSQ for increased throughput ('goodput'?), as well as on the effects on the set of Linux traffic control algorithms available.

If nothing else, it looks like an excellent overview of the issues, mechanisms to deal with them as well as concrete research to back everything up. The middle is pretty thick and sciencey. But there are graphs. in color.

But the patch is relatively short and may well warrant inclusion in @ACwifidude's tree as it provides a couple of knobs to turn to get some real improvements.

The whole package (1+ GB) is linked in reference 42. The PDF and patches are available separately: Doc, TSQ_patch and BBRp_patch.

I should note that one change in TSQ.patch sets Italy as the country for regdb purposes. It starts at line 36. I'm going to skip that one.

I have looked at an openwrt master kernel for ramips_mt7621 but they aren't there yet; the authors applied them against kernel 5.4 for their testing.


As an update, I'm trying to integrate the patch into my (@ACwifidude's) 22.03 kernel 5.10 tree, but it's not a clean matchup -- the way the new data is described to the existing TCP code has changed.

I've emailed one of the article's authors as to where in the mainline kernel this lives.

I obviously haven't dealt with mainline kernels at the patch level so if anyone has suggestions for git trees to examine I'd appreciate them.

M.

Hi people, just a heads up about the reboots. OpenWrt master is making my R7800 reboot even without any of the NSS patches. I was just using stock master with kernel 5.15 and adblock, LuCI plus SQM.

same, i got the same packages installed too, that’s all


@tohojo is listed as one of the authors, so I believe this should make its way into OpenWrt sooner or later.

At first glance, this is way over my head tbh :stuck_out_tongue: I'm still trying to figure out the ath10k flow ... haha.

PSA: @ACwifidude Not sure if you're aware that the QCA repositories on the Code Aurora Git server will no longer be accessible from 1 Apr 2023. They have moved to the Code Linaro Git server. From that date, new clones of the NSS branch that still point at the Code Aurora Git repos will not be able to fetch the source code.

The URLs in the Makefiles will have to point to the Code Linaro Git repo.
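
As a rough sketch of how that could be scripted: the helper below rewrites `PKG_SOURCE_URL` lines in a package Makefile from one URL prefix to another. The two prefixes are my assumption about the old and new layouts and need to be checked against the real repositories before use; the function names are made up for this example.

```python
import re
from pathlib import Path

# ASSUMPTION: these prefixes are illustrative guesses at the old and new
# hosting layouts. Verify them against the actual repositories first.
OLD_PREFIX = "https://source.codeaurora.org/quic/"
NEW_PREFIX = "https://git.codelinaro.org/clo/"

# Matches "PKG_SOURCE_URL:=<url>" or "PKG_SOURCE_URL=<url>".
URL_LINE = re.compile(r"^(\s*PKG_SOURCE_URL\s*:?=\s*)(\S+)")


def rewrite_source_url(line, old=OLD_PREFIX, new=NEW_PREFIX):
    """Return `line` with its PKG_SOURCE_URL moved from `old` to `new`."""
    m = URL_LINE.match(line)
    if not m or not m.group(2).startswith(old):
        return line  # not a source URL line, or already pointing elsewhere
    url = new + m.group(2)[len(old):]
    return line[: m.start(2)] + url + line[m.end(2):]


def rewrite_makefile(path, old=OLD_PREFIX, new=NEW_PREFIX, dry_run=True):
    """Rewrite one Makefile; return the list of (before, after) line pairs."""
    path = Path(path)
    original = path.read_text().splitlines(keepends=True)
    updated = [rewrite_source_url(line, old, new) for line in original]
    changed = [
        (a.rstrip("\n"), b.rstrip("\n"))
        for a, b in zip(original, updated)
        if a != b
    ]
    if changed and not dry_run:
        path.write_text("".join(updated))
    return changed
```

With `dry_run=True` (the default) nothing is written, so the returned pairs can be reviewed before committing to the change.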


@ACwifidude These should already be updated in my nss repo.


Took a stab at modifying the patch for 5.15. I was able to build and boot with it, but haven't done any extensive testing yet. Not sure why they hardcoded Italy in regd.c instead of setting it via regulatory.db.

master branch does not include this patch.
@Ansuel
Is it necessary for stability?


Lots of interesting stuff lately.
@tapper and @Pmtroutok do you use the ondemand governor? Have you tried the performance governor, just to check whether the reboots still happen?
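
For reference, the governor is exposed per CPU through the standard cpufreq sysfs file `scaling_governor`. A minimal sketch for reading and switching it follows; `read_governors` and `set_governor` are names invented for this example, and on the router itself the same files can simply be read and written from the shell.

```python
from pathlib import Path

# Standard cpufreq sysfs location on Linux.
CPU_SYSFS = "/sys/devices/system/cpu"


def read_governors(sysfs_root=CPU_SYSFS):
    """Map each CPU directory name (e.g. 'cpu0') to its current governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def set_governor(name, sysfs_root=CPU_SYSFS):
    """Write `name` to every CPU's scaling_governor (needs root on real sysfs)."""
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in Path(sysfs_root).glob(pattern):
        gov_file.write_text(name + "\n")
```

Calling `read_governors()` before and after `set_governor("performance")` confirms that the change took effect on both cores. The setting does not survive a reboot unless it is applied again at startup.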

It got backported to 5.15 and is present upstream.


My mistake. I had overlooked that it is included in the master branch.
I think there might be something in clk-krait.c, so I investigated the related patches.

I've done the same - but changing the 2 key tcp_limit params in /proc hasn't changed throughput one bit.
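
To make sure a change really landed before each test run, the values can be snapshotted and compared. This is only a sketch: it lists whatever `tcp_limit*` entries exist under `/proc/sys/net/ipv4` instead of assuming their names (`tcp_limit_output_bytes` is the one I know exists in mainline; anything else depends on the patch applied). The helper names are made up.

```python
from pathlib import Path

# Standard location of the IPv4 sysctls.
IPV4_SYSCTL = "/proc/sys/net/ipv4"


def read_tcp_limits(proc_root=IPV4_SYSCTL):
    """Return {name: value} for every tcp_limit* entry under proc_root."""
    values = {}
    for entry in sorted(Path(proc_root).glob("tcp_limit*")):
        values[entry.name] = entry.read_text().strip()
    return values


def changed_limits(before, after):
    """Return {name: (old, new)} for entries that differ between snapshots."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }
```

Take one snapshot with `read_tcp_limits()` before tuning and another just before the throughput test. An empty result from `changed_limits` means the tuning did not stick, which would explain identical results.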

I'm going to drill down into what's in the way, but that will take some time.

I'm also thinking about moving this work over to a dual-core MT7621 router with pathologically poor 5 GHz performance (Netgear AC2400, R6700v2) - there's less going on for that target, and the poor performance has been explained away as a weak CPU.

It's also a spare router I can set up as a test platform ...

ADDING an afterthought - I'm wondering if calls to this area of tcp have been patched around by openwrt in favor of other pacing/aggregation algorithms.

results this un-flaky and regular have to be coded!


Hello, what is the latest status for the ZyXEL NBG6817? Can I use it?

Hi, errors on qca-nss-drv compiling:

make[3]: Entering directory '/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/linux-5.15.76'
  CC [M]  /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_cmn.o
  CC [M]  /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.o
  CC [M]  /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_coredump.o
  CC [M]  /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_drv_stats.o
In file included from /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_coredump.c:24:
/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.h: In function 'nss_core_dma_cache_maint':
In file included from /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.c:22:
/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.h: In function 'nss_core_dma_cache_maint':
/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.h:116:17: error: implicit declaration of function 'dmac_inv_range'; did you mean 'outer_inv_range'? [-Werror=implicit-function-declaration]
  116 |                 dmac_inv_range(start, start + size);
      |                 ^~~~~~~~~~~~~~
      |                 outer_inv_range
/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.h:116:17: error: implicit declaration of function 'dmac_inv_range'; did you mean 'outer_inv_range'? [-Werror=implicit-function-declaration]
  116 |                 dmac_inv_range(start, start + size);
      |                 ^~~~~~~~~~~~~~
      |                 outer_inv_range
In file included from /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_hal/include/nss_hal.h:26,
                 from /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_tx_rx_common.h:25,
                 from /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_cmn.c:26:
/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.h: In function 'nss_core_dma_cache_maint':
/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.h:116:17: error: implicit declaration of function 'dmac_inv_range'; did you mean 'outer_inv_range'? [-Werror=implicit-function-declaration]
  116 |                 dmac_inv_range(start, start + size);
      |                 ^~~~~~~~~~~~~~
      |                 outer_inv_range
In file included from /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_drv_stats.c:17:
/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.h: In function 'nss_core_dma_cache_maint':
/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.h:119:17: error: implicit declaration of function 'dmac_clean_range'; did you mean 'dmac_flush_range'? [-Werror=implicit-function-declaration]
  119 |                 dmac_clean_range(start, start + size);
      |                 ^~~~~~~~~~~~~~~~
      |                 dmac_flush_range
/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.h:116:17: error: implicit declaration of function 'dmac_inv_range'; did you mean 'outer_inv_range'? [-Werror=implicit-function-declaration]
  116 |                 dmac_inv_range(start, start + size);
      |                 ^~~~~~~~~~~~~~
      |                 outer_inv_range
/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.h:119:17: error: implicit declaration of function 'dmac_clean_range'; did you mean 'dmac_flush_range'? [-Werror=implicit-function-declaration]
  119 |                 dmac_clean_range(start, start + size);
      |                 ^~~~~~~~~~~~~~~~
      |                 dmac_flush_range
/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.h:119:17: error: implicit declaration of function 'dmac_clean_range'; did you mean 'dmac_flush_range'? [-Werror=implicit-function-declaration]
  119 |                 dmac_clean_range(start, start + size);
      |                 ^~~~~~~~~~~~~~~~
      |                 dmac_flush_range
/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.h:119:17: error: implicit declaration of function 'dmac_clean_range'; did you mean 'dmac_flush_range'? [-Werror=implicit-function-declaration]
  119 |                 dmac_clean_range(start, start + size);
      |                 ^~~~~~~~~~~~~~~~
      |                 dmac_flush_range
cc1: all warnings being treated as errors
make[4]: *** [scripts/Makefile.build:289: /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_coredump.o] Error 1
make[4]: *** Waiting for unfinished jobs....
cc1: all warnings being treated as errors
make[4]: *** [scripts/Makefile.build:289: /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_cmn.o] Error 1
cc1: all warnings being treated as errors
make[4]: *** [scripts/Makefile.build:289: /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_drv_stats.o] Error 1
cc1: all warnings being treated as errors
make[4]: *** [scripts/Makefile.build:289: /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/nss_core.o] Error 1
make[3]: *** [Makefile:1901: /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43] Error 2
make[3]: Leaving directory '/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/linux-5.15.76'
make[2]: *** [Makefile:288: /home/ubuntu/Desktop/ipq806x-kernel515-ac2203/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/qca-nss-drv-2020-03-20-3cfb9f43/.built] Error 2
make[2]: Leaving directory '/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/feeds/nss/qca-nss-drv'
time: package/feeds/nss/qca-nss-drv/compile#5.15#0.55#2.72
    ERROR: package/feeds/nss/qca-nss-drv failed to build.
make[1]: *** [package/Makefile:116: package/feeds/nss/qca-nss-drv/compile] Error 1
make[1]: Leaving directory '/home/ubuntu/Desktop/ipq806x-kernel515-ac2203'
make: *** [/home/ubuntu/Desktop/ipq806x-kernel515-ac2203/include/toplevel.mk:231: package/feeds/nss/qca-nss-drv/compile] Error 2
ubuntu@ubuntu:~/Desktop/ipq806x-kernel515-ac2203$ 

kernel 5.15 branch, default configs. How to solve this? Thanks.
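
A side note on reading the log: because the build runs in parallel, the same header error is reported once per source file that includes it. The eight `error:` lines above collapse to two distinct ones, both in nss_core.h (`dmac_inv_range` at line 116 and `dmac_clean_range` at line 119). A small sketch for de-duplicating GCC errors in a saved log, with `unique_errors` being a made-up helper name:

```python
import re

# GCC diagnostic format: <file>:<line>:<col>: error: <message>
ERROR_RE = re.compile(
    r"^(?P<file>\S+?):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.*)$"
)


def unique_errors(log_text):
    """Count distinct (file name, line, message) errors in a build log."""
    counts = {}
    for raw in log_text.splitlines():
        m = ERROR_RE.match(raw.strip())
        if not m:
            continue
        # Keep only the file name so long build_dir paths do not hide repeats.
        key = (m["file"].rsplit("/", 1)[-1], int(m["line"]), m["msg"])
        counts[key] = counts.get(key, 0) + 1
    return counts
```

This does not fix anything by itself, but it shows quickly that there is one root cause (two cache maintenance functions the 5.15 kernel headers no longer declare for this module) and not many separate failures.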