IPQ806x NSS Drivers

It did not work. Thanks anyway!

Can you give me the log?

Sure, just PM'd you. Thank you for checking on this.

From what I can see, the wifi crashes. Did you test whether ethernet works?

It seemed like it did. After booting, I was able to access the router/LuCI via cable and was even able to upload a file to attempt a sysupgrade. It did upgrade, but not correctly, so I had to TFTP it to recover.

Hi @Ansuel,

Got a couple of NSS FW core dumps during system bootup today.

I've set up pstore/ramoops to capture kernel oopses/panics by first reducing the available memory to 460M and then placing the ramoops region after it.
Below is the relevant part of cat /proc/iomem:

root@R7800:/# cat /proc/iomem
...
42000000-5e1fffff : System RAM
  42208000-42bfffff : Kernel code
  42d00000-42d9bd9f : Kernel data
5e2fffff-5e304197 : ramoops:dmesg(0/9)
5e304198-5e308330 : ramoops:dmesg(1/9)
5e308331-5e30c4c9 : ramoops:dmesg(2/9)
5e30c4ca-5e310662 : ramoops:dmesg(3/9)
5e310663-5e3147fb : ramoops:dmesg(4/9)
5e3147fc-5e318994 : ramoops:dmesg(5/9)
5e318995-5e31cb2d : ramoops:dmesg(6/9)
5e31cb2e-5e320cc6 : ramoops:dmesg(7/9)
5e320cc7-5e324e5f : ramoops:dmesg(8/9)
5e324e60-5e328ff8 : ramoops:dmesg(9/9)
5e328ff9-5e329ff8 : ramoops:console
5e329ff9-5e32a7f8 : ramoops:ftrace(0/1)
5e32a7f9-5e32aff8 : ramoops:ftrace(1/1)
5e32aff9-5e32bff8 : ramoops:pmsg

I'm not sure whether these NSS FW core dumps are related to the pstore/ramoops setup, but below are the 2 separate core dumps I got today.

Core dump 1
<6>[   17.530271] Initializing XFRM netlink socket
<6>[   17.533793] **********************************************************
<6>[   17.536718] * Driver    :NSS GMAC Driver - RTL v(3.72a)
<6>[   17.543152] * Version   :1.0
<6>[   17.548171] * Copyright :Copyright (c) 2013-2018 The Linux Foundation. All rights reserved.
<6>[   17.551302] **********************************************************
<1>[   17.673240] nss_driver - fw of size 536324  bytes copied to load addr: 40000000, nss_id : 0
<1>[   17.673889] nss_driver - Turbo Support 1
<1>[   17.680417] Supported Frequencies - 
<1>[   17.680419] 800Mhz 
<1>[   17.684645] 800Mhz 
<1>[   17.688124] 800Mhz 
<1>[   17.689949] 
<1>[   17.694367] a748e25a: meminfo init succeed
<1>[   17.719322] nss_driver - fw of size 218224  bytes copied to load addr: 40800000, nss_id : 1
<1>[   17.719689] 005584a1: meminfo init succeed
<1>[   17.726631] node size 1 # items 2
<1>[   17.726635] memory: 0 0 (avl 456900608) items 2 active_cores 2
<1>[   17.726644] a748e25a: nss core 0 booted successfully
<3>[   17.739829] debugfs: File 'n2h' in directory 'stats' already present!
<3>[   17.744958] debugfs: File 'qrfs' in directory 'stats' already present!
<3>[   17.751229] debugfs: File 'c2c_tx' in directory 'stats' already present!
<3>[   17.757731] debugfs: File 'c2c_rx' in directory 'stats' already present!
<3>[   17.764573] debugfs: File 'unaligned' in directory 'stats' already present!
<1>[   17.775682] node size 1 # items 2
<1>[   17.777899] memory: 0 0 (avl 456900608) items 2 active_cores 2
<1>[   17.781358] 005584a1: nss core 1 booted successfully
<6>[   17.811006] Mirror/redirect action on
<6>[   17.816784] u32 classifier
<6>[   17.816802]     input device check on
<6>[   17.818383]     Actions configured
<6>[   17.826910] Loading modules backported from Linux version v5.8-0-gbcf876870b95
<6>[   17.826933] Backport generated by backports.git v5.8-1-0-g79400d9e
<6>[   17.841927] xt_time: kernel timezone is -0000
<6>[   17.945216] PPP generic driver version 2.4.2
<6>[   17.946223] NET: Registered protocol family 24
<7>[   17.969410] ath10k_pci 0000:01:00.0: assign IRQ: got 39
<6>[   17.969900] ath10k_pci 0000:01:00.0: enabling device (0140 -> 0142)
<7>[   17.969982] ath10k_pci 0000:01:00.0: enabling bus mastering
<6>[   17.970494] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
<6>[   20.662626] EXT4-fs (sda1): recovery complete
<6>[   20.721831] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: 
<4>[   20.721903] ext4 filesystem being mounted at /mnt/rrd supports timestamps until 2038 (0x7fffffff)
<6>[   21.800022] ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
<6>[   21.800089] ath10k_pci 0000:01:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 1 testmode 0
<6>[   21.814943] ath10k_pci 0000:01:00.0: firmware ver 10.4-3.9.0.2-00070 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps,peer-fixed-rate crc32 873782fb
<6>[   24.081849] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 crc32 85498734
<6>[   27.840665] ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 512 raw 0 hwcrypto 1
<7>[   27.934312] ath: EEPROM regdomain sanitized
<7>[   27.934330] ath: EEPROM regdomain: 0x64
<7>[   27.934341] ath: EEPROM indicates we should expect a direct regpair map
<7>[   27.934365] ath: Country alpha2 being used: 00
<7>[   27.934437] ath: Regpair used: 0x64
<7>[   27.942255] ath10k_pci 0001:01:00.0: assign IRQ: got 41
<6>[   27.943724] ath10k_pci 0001:01:00.0: enabling device (0140 -> 0142)
<7>[   27.943845] ath10k_pci 0001:01:00.0: enabling bus mastering
<6>[   27.944603] ath10k_pci 0001:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
<6>[   28.249208] ath10k_pci 0001:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
<6>[   28.249248] ath10k_pci 0001:01:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 1 testmode 0
<6>[   28.261380] ath10k_pci 0001:01:00.0: firmware ver 10.4-3.9.0.2-00070 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast,no-ps,peer-fixed-rate crc32 873782fb
<6>[   30.543014] ath10k_pci 0001:01:00.0: board_file api 2 bmi_id 0:2 crc32 85498734
<6>[   34.332684] ath10k_pci 0001:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 512 raw 0 hwcrypto 1
<7>[   34.423979] ath: EEPROM regdomain sanitized
<7>[   34.423997] ath: EEPROM regdomain: 0x64
<7>[   34.424007] ath: EEPROM indicates we should expect a direct regpair map
<7>[   34.424031] ath: Country alpha2 being used: 00
<7>[   34.424041] ath: Regpair used: 0x64
<14>[   34.437965] kmodloader: done loading kernel modules from /etc/modules.d/*
<6>[   37.092767] ECM init
<6>[   37.092820] ECM database jhash random seed: 0xf86fd4c4
<6>[   37.095424] ECM init complete
<4>[   38.811887] print_req_error: 14 callbacks suppressed
<3>[   38.811894] blk_update_request: I/O error, dev mtdblock0, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
<3>[   38.816864] blk_update_request: I/O error, dev mtdblock0, sector 8 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
<3>[   38.827413] blk_update_request: I/O error, dev mtdblock0, sector 16 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
<3>[   38.838191] blk_update_request: I/O error, dev mtdblock0, sector 24 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
<3>[   38.848754] blk_update_request: I/O error, dev mtdblock0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
<3>[   38.858628] Buffer I/O error on dev mtdblock0, logical block 0, async page read
<3>[   38.878530] blk_update_request: I/O error, dev mtdblock0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
<3>[   38.878563] Buffer I/O error on dev mtdblock0, logical block 0, async page read
<3>[   38.891032] blk_update_request: I/O error, dev mtdblock1, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
<3>[   38.896112] blk_update_request: I/O error, dev mtdblock1, sector 8 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
<3>[   38.906827] blk_update_request: I/O error, dev mtdblock1, sector 16 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
<3>[   38.917571] blk_update_request: I/O error, dev mtdblock1, sector 24 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
<3>[   38.928238] Buffer I/O error on dev mtdblock1, logical block 0, async page read
<3>[   38.941544] Buffer I/O error on dev mtdblock1, logical block 0, async page read
<6>[   39.095308] ipq8064-mdio 37000000.mdio eth1: 1000 Mbps Full Duplex
<6>[   40.174228] br-lan: port 1(eth1.1) entered blocking state
<6>[   40.174254] br-lan: port 1(eth1.1) entered disabled state
<6>[   40.178920] device eth1.1 entered promiscuous mode
<6>[   40.184094] device eth1 entered promiscuous mode
<6>[   40.189621] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
<6>[   40.196833] br-lan: port 1(eth1.1) entered blocking state
<6>[   40.199457] br-lan: port 1(eth1.1) entered forwarding state
<6>[   40.217871] ipq8064-mdio 37000000.mdio eth0: 1000 Mbps Full Duplex
<6>[   41.296267] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
<6>[   41.296538] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
<7>[   42.052544] ath: EEPROM regdomain: 0x82be
<7>[   42.052563] ath: EEPROM indicates we should expect a country code
<7>[   42.055558] ath: doing EEPROM country->regdmn map search
<7>[   42.061654] ath: country maps to regdmn code: 0x3b
<7>[   42.067088] ath: Country alpha2 being used: SG
<7>[   42.071587] ath: Regpair used: 0x3b
<7>[   42.076023] ath: regdomain 0x82be dynamically updated by user
<7>[   42.079423] ath: EEPROM regdomain: 0x82be
<7>[   42.085364] ath: EEPROM indicates we should expect a country code
<7>[   42.089295] ath: doing EEPROM country->regdmn map search
<7>[   42.095462] ath: country maps to regdmn code: 0x3b
<7>[   42.100754] ath: Country alpha2 being used: SG
<7>[   42.105422] ath: Regpair used: 0x3b
<7>[   42.109777] ath: regdomain 0x82be dynamically updated by user
<4>[   48.541324] ath10k_pci 0001:01:00.0: Unknown eventid: 36933
<6>[   48.547316] wlan1: Created a NSS virtual interface
<6>[   48.563374] br-lan: port 2(wlan1) entered blocking state
<6>[   48.563403] br-lan: port 2(wlan1) entered disabled state
<6>[   48.568040] device wlan1 entered promiscuous mode
<1>[   48.581600] NSS core 0 signal COREDUMP COMPLETE 4000
<1>[   48.581632] 
<1>[   48.581632] a748e25a: Starting NSS-FW logbuffer dump for core 0
<1>[   48.585698] a748e25a: Warn: trap[813]: Trap on CHIP ID 00050000
<1>[   48.593065] a748e25a: Warn: trap[620]: Trapped: TRAP_TD(00000400) DCAPT(3C000080)
<1>[   48.598647] a748e25a: Warn: trap[645]: Trapped: Thread: 10, reason: 00001000, PC: 3F000168, previous PC: 3F000074
<1>[   48.606391] a748e25a: Warn: trap[594]: A0_3: 3F024588 3F00C000 000C0000 5D3E7880
<1>[   48.616604] a748e25a: Warn: trap[594]: A4_7: 3F000078 3F000934 3F014C98 3F006F08
<1>[   48.624062] a748e25a: Warn: trap[599]: D0_3: 00000004 0000003F 00000060 0000005A
<1>[   48.631371] a748e25a: Warn: trap[599]: D4_7: 00000001 20000000 FFFF8000 00008100
<1>[   48.638799] a748e25a: Warn: trap[599]: D8_11: 39003000 3FFF0000 00000001 00000000
<1>[   48.646210] a748e25a: Warn: trap[599]: D12_15: 00000001 0000005A 001C0021 00000000
<1>[   48.653577] a748e25a: Warn: trap[649]: Thread_10 has non-recoverable trap
<1>[   48.664165] NSS core 1 signal COREDUMP COMPLETE 4000
<1>[   48.667833] 
<1>[   48.667833] 005584a1: Starting NSS-FW logbuffer dump for core 1
<0>[   48.672955] Kernel panic - not syncing: NSS FW coredump: bringing system down
<2>[   48.680262] CPU1: stopping
<4>[   48.687289] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.72 #0
<4>[   48.689884] Hardware name: Generic DT based system
<4>[   48.695736] [<c0311068>] (unwind_backtrace) from [<c030c4e4>] (show_stack+0x14/0x20)
<4>[   48.700592] [<c030c4e4>] (show_stack) from [<c09272c0>] (dump_stack+0x94/0xa8)
<4>[   48.708485] [<c09272c0>] (dump_stack) from [<c0310088>] (handle_IPI+0x364/0x39c)
<4>[   48.715516] [<c0310088>] (handle_IPI) from [<c0602f50>] (gic_handle_irq+0xb4/0xb8)
<4>[   48.723064] [<c0602f50>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90)
<4>[   48.730429] Exception stack(0xdb8cdf08 to 0xdb8cdf50)
<4>[   48.738000] df00:                   00000000 0000000b 1b186000 dbdddb40 db073c00 00000000
<4>[   48.743043] df20: 0000000b 0000000b c0d5e5f0 5591a900 dbddcef0 00000000 00000015 db8cdf58
<4>[   48.751190] df40: c075f4a0 c075f4c0 60000013 ffffffff
<4>[   48.759355] [<c0301a8c>] (__irq_svc) from [<c075f4c0>] (cpuidle_enter_state+0x178/0x650)
<4>[   48.764388] [<c075f4c0>] (cpuidle_enter_state) from [<c075f9dc>] (cpuidle_enter+0x30/0x4c)
<4>[   48.772548] [<c075f9dc>] (cpuidle_enter) from [<c034f6dc>] (do_idle+0x1d8/0x240)
<4>[   48.780615] [<c034f6dc>] (do_idle) from [<c034f9ec>] (cpu_startup_entry+0x1c/0x20)
<4>[   48.788165] [<c034f9ec>] (cpu_startup_entry) from [<423025cc>] (0
root@R7800:/sys/fs/pstore#

Core dump 2
<6>[   15.172646] Initializing XFRM netlink socket
<6>[   15.175880] **********************************************************
<6>[   15.179149] * Driver    :NSS GMAC Driver - RTL v(3.72a)
<6>[   15.185540] * Version   :1.0
<6>[   15.190632] * Copyright :Copyright (c) 2013-2018 The Linux Foundation. All rights reserved.
<6>[   15.193830] **********************************************************
<1>[   15.334045] nss_driver - fw of size 536324  bytes copied to load addr: 40000000, nss_id : 0
<1>[   15.334721] nss_driver - Turbo Support 1
<1>[   15.341225] Supported Frequencies - 
<1>[   15.341227] 800Mhz 
<1>[   15.345462] 800Mhz 
<1>[   15.348928] 800Mhz 
<1>[   15.350758] 
<1>[   15.355151] ef620990: meminfo init succeed
<1>[   15.381108] nss_driver - fw of size 218224  bytes copied to load addr: 40800000, nss_id : 1
<1>[   15.381387] 3292c01e: meminfo init succeed
<3>[   15.388463] debugfs: File 'n2h' in directory 'stats' already present!
<3>[   15.392491] debugfs: File 'qrfs' in directory 'stats' already present!
<3>[   15.398954] debugfs: File 'c2c_tx' in directory 'stats' already present!
<3>[   15.405454] debugfs: File 'c2c_rx' in directory 'stats' already present!
<3>[   15.412295] debugfs: File 'unaligned' in directory 'stats' already present!
<1>[   15.419244] node size 1 # items 2
<1>[   15.425679] memory: 0 0 (avl 456900608) items 2 active_cores 2
<1>[   15.429082] ef620990: nss core 0 booted successfully
<1>[   15.434995] NSS core 0 signal COREDUMP COMPLETE 4000
<1>[   15.439943] 
<1>[   15.439943] ef620990: Starting NSS-FW logbuffer dump for core 0
<1>[   15.444959] ef620990: Warn: trap[813]: Trap on CHIP ID 00050000
<1>[   15.452327] ef620990: Warn: trap[620]: Trapped: TRAP_TD(00000084) DCAPT(3C000080)
<6>[   15.455220] Mirror/redirect action on
<1>[   15.457904] ef620990: Warn: trap[645]: Trapped: Thread: 2, reason: 00000800, PC: 40047480, previous PC: 4004747C
<1>[   15.469193] ef620990: Warn: trap[594]: A0_3: 3F00C0D0 3F02F8F4 00000001 5D094800
<6>[   15.470173] u32 classifier
<1>[   15.479492] ef620990: Warn: trap[594]: A4_7: 3F02FA88 00000000 5C759080 3F00AF40
<1>[   15.479497] ef620990: Warn: trap[599]: D0_3: 00000001 00000001 00000041 00000001
<6>[   15.486879]     input device check on
<1>[   15.489303] ef620990: Warn: trap[599]: D4_7: 00000001 00000000 00000000 00000000
<6>[   15.496906]     Actions configured
<1>[   15.504283] ef620990: Warn: trap[599]: D8_11: 00000000 00000000 00000000 00000000
<6>[   15.512646] Loading modules backported from Linux version v5.8-0-gbcf876870b95
<1>[   15.515334] ef620990: Warn: trap[599]: D12_15: 00000000 00000000 00000000 00000000
<6>[   15.518473] Backport generated by backports.git v5.8-1-0-g79400d9e
<1>[   15.526085] ef620990: Warn: trap[649]: Thread_2 has non-recoverable trap
<1>[   15.526170] node size 1 # items 2
<6>[   15.544494] xt_time: kernel timezone is -0000
<1>[   15.546942] memory: 0 0 (avl 456900608) items 2 active_cores 2
<1>[   15.561279] 3292c01e: nss core 1 booted successfully
<1>[   15.567215] NSS core 1 signal COREDUMP COMPLETE 4000
<1>[   15.572174] 
<1>[   15.572174] 3292c01e: Starting NSS-FW logbuffer dump for core 1
<0>[   15.577095] Kernel panic - not syncing: NSS FW coredump: bringing system down
<2>[   15.584454] CPU1: stopping
<4>[   15.591470] CPU: 1 PID: 224 Comm: kmodloader Not tainted 5.4.72 #0
<4>[   15.594073] Hardware name: Generic DT based system
<4>[   15.600254] [<c0311068>] (unwind_backtrace) from [<c030c4e4>] (show_stack+0x14/0x20)
<4>[   15.605026] [<c030c4e4>] (show_stack) from [<c09272c0>] (dump_stack+0x94/0xa8)
<4>[   15.612918] [<c09272c0>] (dump_stack) from [<c0310088>] (handle_IPI+0x364/0x39c)
<4>[   15.619953] [<c0310088>] (handle_IPI) from [<c0602f50>] (gic_handle_irq+0xb4/0xb8)
<4>[   15.627502] [<c0602f50>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90)
<4>[   15.634874] Exception stack(0xda5cfd78 to 0xda5cfdc0)
<4>[   15.642428] fd60:                                                       0000001c 0000001c
<4>[   15.647472] fd80: 000e131c 000d6d1c 00000000 00000008 dd84d754 00007b38 00007b40 00000000
<4>[   15.655632] fda0: c030d984 000148f0 dd855294 da5cfdc8 c05af3c0 c030d9a0 60000013 ffffffff
<4>[   15.663786] [<c0301a8c>] (__irq_svc) from [<c030d9a0>] (cmp_rel+0x1c/0x3c)
<4>[   15.671947] [<c030d9a0>] (cmp_rel) from [<c05af3c0>] (sort_r+0xe4/0x1e0)
<4>[   15.678714] [<c05af3c0>] (sort_r) from [<c05af4d8>] (sort+0x1c/0x24)
<4>[   15.685571] [<c05af4d8>] (sort) from [<c030de10>] (module_frob_arch_sections+0x180/0x2cc)
<4>[   15.691912] [<c030de10>] (module_frob_arch_sections) from [<c03ac460>] (load_module+0x438/0x248c)
<4>[   15.699985] [<c03ac460>] (load_module) from [<c03ae61c>] (sys_init_module+0x168/0x19c)
<4>[   15.708834] [<c03ae61c>] (sys_init_module) from [<c0301000>] (ret_fast_syscall+0x0/0x54)
<4>[   15.716647] Exception stack(0xda5cffa8 to 0xda5cfff0)
<4>[   15.724896] ffa0:                   00000000 00000000 b6e60010 000d99fc 000129d5 00000014
<4>[   15.729850] ffc0: 00000000 00000000 00000003 00000080 000d99fc 00000000 01257a00 00000000
<4>[   15.738007] ffe0: bea16d14 bea16cf8 00011e14 b6f9ef24

Edit:
These are from a build compiled today from a fresh git clone of master merged with your NSS repo.
OpenWrt SNAPSHOT r14740+63-0b31713c85 / LuCI Master git-20.297.75914-63aea8f
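
For comparing dumps like the two above, the thread number, trap reason and program counters can be pulled out of the "Trapped: Thread" line with a small script. This is only a sketch: the regex is written against the two lines shown in this post, not against the NSS firmware's log format in general, and `parse_trap` is a hypothetical helper name.

```python
import re

# Pattern written only against the trap lines quoted in this thread.
TRAP_RE = re.compile(
    r"Trapped: Thread: (?P<thread>\d+), reason: (?P<reason>[0-9A-Fa-f]+), "
    r"PC: (?P<pc>[0-9A-Fa-f]+), previous PC: (?P<prev_pc>[0-9A-Fa-f]+)"
)

def parse_trap(line):
    """Return the trap fields of one log line as a dict, or None if absent."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    return {
        "thread": int(m.group("thread")),
        "reason": int(m.group("reason"), 16),
        "pc": int(m.group("pc"), 16),
        "prev_pc": int(m.group("prev_pc"), 16),
    }

# Lines copied from core dump 1 and core dump 2.
dump1 = ("<1>[   48.598647] a748e25a: Warn: trap[645]: Trapped: Thread: 10, "
         "reason: 00001000, PC: 3F000168, previous PC: 3F000074")
dump2 = ("<1>[   15.457904] ef620990: Warn: trap[645]: Trapped: Thread: 2, "
         "reason: 00000800, PC: 40047480, previous PC: 4004747C")

for line in (dump1, dump2):
    print(parse_trap(line))
```

In these two dumps the thread, reason and PC all differ, which fits the crashes looking random.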

Edit 2:
I would consider the above core dumps false positives triggered by the earlier pstore/ramoops configuration.

  • Without the pstore/ramoops setup I don't have issues during reboot
  • With the new, adjusted pstore/ramoops setup I don't have issues during reboot
Current relevant part of /proc/iomem:
42000000-5affffff : System RAM
  42208000-42bfffff : Kernel code
  42d00000-42d9bd9f : Kernel data
5bffffff-5c004197 : ramoops:dmesg(0/9)
5c004198-5c008330 : ramoops:dmesg(1/9)
5c008331-5c00c4c9 : ramoops:dmesg(2/9)
5c00c4ca-5c010662 : ramoops:dmesg(3/9)
5c010663-5c0147fb : ramoops:dmesg(4/9)
5c0147fc-5c018994 : ramoops:dmesg(5/9)
5c018995-5c01cb2d : ramoops:dmesg(6/9)
5c01cb2e-5c020cc6 : ramoops:dmesg(7/9)
5c020cc7-5c024e5f : ramoops:dmesg(8/9)
5c024e60-5c028ff8 : ramoops:dmesg(9/9)
5c028ff9-5c029ff8 : ramoops:console
5c029ff9-5c02a7f8 : ramoops:ftrace(0/1)
5c02a7f9-5c02aff8 : ramoops:ftrace(1/1)
5c02aff9-5c02bff8 : ramoops:pmsg

For whatever reason (most likely the point where I cut the available RAM), the NSS FW was not happy and crashed randomly, either during reboot or after some uptime (1h at most). Other drivers/applications did not trigger crashes, though.

Could you share your kernel cmdline arguments? I'm trying to set this up on my R7800 too.

Hi qosmio,

Below are the related dmesg messages, including the kernel command line arguments.

[    0.000000] Kernel command line: mem=400M ramoops.mem_address=0x5BFFFFFF ramoops.mem_size=0x2C000 ramoops.record_size=0x4000
[    0.020532] ramoops: using module parameters
[    0.021017] pstore: Registered ramoops as persistent store backend
[    0.021034] ramoops: using 0x2c000@0x5bffffff, ecc: 0

With the above kernel cmdline parameters my R7800 now seems to run "normally", and the system reports 409600K (exactly 400 MiB) of total memory:
[ 0.000000] Memory: 394424K/409600K available
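
As a quick sanity check of these numbers, the 409600K in the dmesg line and the System RAM range from /proc/iomem describe the same amount of memory. A minimal sketch, assuming the kernel's "K" means KiB and taking the 42000000-5affffff range reported for mem=400M in this thread:

```python
# Values copied from the dmesg and /proc/iomem output in this thread.
MIB = 1024 * 1024

total_kib = 409600        # "Memory: 394424K/409600K available"
ram_start = 0x42000000    # start of "System RAM"
ram_end = 0x5AFFFFFF      # end of "System RAM" with mem=400M

total_mib = total_kib / 1024
range_mib = (ram_end + 1 - ram_start) / MIB

print(f"dmesg total: {total_kib} KiB = {total_mib:.0f} MiB")
print(f"iomem range: {range_mib:.0f} MiB")
```

So mem=400M is not being rounded up; 409600 is simply 400 MiB expressed in KiB.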

Notes:

  • The NSS FW, at least, seems to be sensitive to where the available memory is cut. This may be related to platform memory management, but I'm not knowledgeable enough to comment further.
  • I changed the record_size from the default 4k to 16k (ramoops.record_size=0x4000). The record size should be a power of 2.
  • Be prepared to TFTP a working factory image + backup (without the pstore/ramoops changes) :wink:
  • In the pstore settings I only selected the options below to keep it as simple as possible (I removed the deflate compression that is selected by default).
<*>   Persistent store support
<*>     Log panic/oops to a RAM buffer
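
The power-of-2 note on record_size is easy to check before rebuilding an image. A minimal sketch; `is_power_of_two` is just an illustrative helper name, not a kernel or OpenWrt API:

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0

record_size = 0x4000  # ramoops.record_size from the cmdline above

print(f"record_size = {record_size} bytes ({record_size // 1024} KiB), "
      f"power of two: {is_power_of_two(record_size)}")
```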

Thanks for this! I was banging my head over how to get it all working. The offsets were confusing, and I tried various "memmap=1M!0", "memmap=1M@0", "memmap=1M$0" since I didn't know which regions to allocate initially. But this works. I slightly modified it to mem=419M, which seems to be the maximum I could do before the NSS cores started panicking.

I've also compiled in support for pstore_ftrace but, for obvious performance reasons, am only going to enable it during off-peak hours. Hopefully I can capture a full trace dump with it enabled.

Good that it works. I did several hours of fiddling with the mem parameter over the weekend but returned to 400M in the end.

Can you share the System RAM part of your cat /proc/iomem, since the system seems to round the value defined with the mem parameter?

Mine with mem=400M ends up as below; is that the 409M (409600K) seen by the system?

42000000-5affffff : System RAM

A slight change: I had to reduce it to 415M. Even though it booted up fine and showed up in /sys/fs/pstore, it wouldn't show up in /proc/iomem. So far everything's been stable with that size.

Here's the /proc/iomem:

42208000-42afffff : Kernel code
42c00000-42c9535f : Kernel data
5bffffff-5c004197 : ramoops:dmesg(0/9)
5c004198-5c008330 : ramoops:dmesg(1/9)
5c008331-5c00c4c9 : ramoops:dmesg(2/9)
5c00c4ca-5c010662 : ramoops:dmesg(3/9)
5c010663-5c0147fb : ramoops:dmesg(4/9)
5c0147fc-5c018994 : ramoops:dmesg(5/9)
5c018995-5c01cb2d : ramoops:dmesg(6/9)
5c01cb2e-5c020cc6 : ramoops:dmesg(7/9)
5c020cc7-5c024e5f : ramoops:dmesg(8/9)
5c024e60-5c028ff8 : ramoops:dmesg(9/9)
5c028ff9-5c029ff8 : ramoops:console
5c029ff9-5c02a7f8 : ramoops:ftrace(0/1)
5c02a7f9-5c02aff8 : ramoops:ftrace(1/1)
5c02aff9-5c02bff8 : ramoops:pmsg

/proc/cmdline

mem=415M ramoops.mem_address=0x5BFFFFFF ramoops.mem_size=0x2C000 ramoops.record_size=0x4000

Total memory ends up being 402M.
When I had it set to mem=400M, the total reported was 380M.
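
One possible reading of why 415M works with this ramoops address while 419M did not: a rough sketch, assuming mem=<N>M gives N MiB of System RAM counted from 0x42000000 (which matches the 42000000-5affffff range reported for mem=400M). The helper names are made up for illustration.

```python
MIB = 1024 * 1024
RAM_START = 0x42000000      # R7800 System RAM base (from /proc/iomem)
RAMOOPS_ADDR = 0x5BFFFFFF   # ramoops.mem_address used in this thread

def system_ram_end(mem_mib):
    """Last byte of System RAM, assuming mem=<N>M counts from RAM_START."""
    return RAM_START + mem_mib * MIB - 1

def overlaps_ramoops(mem_mib):
    """True if System RAM would reach into the ramoops region."""
    return system_ram_end(mem_mib) >= RAMOOPS_ADDR

for mem in (400, 415, 416, 419):
    print(f"mem={mem}M: System RAM ends at {system_ram_end(mem):#010x}, "
          f"overlaps ramoops: {overlaps_ramoops(mem)}")
```

Under that assumption, 415M is the largest whole value that still ends below 0x5BFFFFFF, so the ramoops region may simply have collided with System RAM at 419M.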

Hi qosmio,

One thing to note is the ramoops start address: ramoops.mem_address=0x5BFFFFFF needs to be beyond the end address of the available System RAM, or it will be cleared during a restart.
I selected 0x5BFFFFFF without any particular logic, just to be beyond the System RAM end address (0x5AFFFFFF) that results from mem=400M.

42000000-5affffff : System RAM

The selected ramoops.mem_address=0x5BFFFFFF actually starts at about 425M (425984K, roughly 416 MiB) from the start of System RAM.
If you want to try higher mem values, I'd suggest moving mem_address towards the end of the available RAM, something like ramoops.mem_address=0x5F000000 (starting at about 475M, i.e. 475136K or 464 MiB).

The R7800 System RAM is located at 0x42000000 - 0x5FFFFFFF when the kernel cmdline mem parameter is not cutting it smaller.

ramoops.mem_address + ramoops.mem_size cannot be bigger than 0x5FFFFFFF.
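
The upper-bound rule and the offsets of the two candidate addresses can be checked with a few lines. A minimal sketch using only the addresses quoted in this post; the helper names are made up for illustration:

```python
MIB = 1024 * 1024
RAM_START = 0x42000000   # first byte of R7800 RAM (from this post)
RAM_LAST = 0x5FFFFFFF    # last byte of R7800 RAM (from this post)

def fits_in_ram(mem_address, mem_size):
    """Rule as stated above: mem_address + mem_size must not exceed RAM_LAST."""
    return mem_address + mem_size <= RAM_LAST

def offset_from_ram_start(mem_address):
    """Offset of mem_address from RAM_START as (KiB, MiB)."""
    off = mem_address - RAM_START
    return off / 1024, off / MIB

for addr in (0x5BFFFFFF, 0x5F000000):
    kib, mib = offset_from_ram_start(addr)
    print(f"mem_address={addr:#x}: offset {kib:.0f} KiB = {mib:.0f} MiB, "
          f"fits with mem_size=0x2C000: {fits_in_ram(addr, 0x2C000)}")
```

The offsets come out as 425984 KiB (416 MiB) and 475136 KiB (464 MiB), so the "425M" and "475M" figures correspond to the KiB values rather than MiB.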

I'll try mem=415M/419M over the weekend, as I cannot make any changes on working days.

So my build (with @Ansuel's last patches, r14530+271-9085343) was stable for about 5 days.
Performance was consistently good on both wifi and wired.
@Gram and @qosmio, is this ramoops thing supposed to catch the crashes?
If yes, how do you set it up and how do you look at the captured info?

Ramoops has been a Linux kernel feature since 2011 and is designed for exactly that purpose. It just seems it is not widely discussed in the forum.

My hunt for crash logs with ramoops started after the router restarted last Thursday during an MS Teams conference call with a customer (luckily I was not presenting at the time) and my remote syslog server did not capture any related logs.

Basically, you need to activate pstore/ramoops in the kernel using kernel_menuconfig or by modifying the dts file. As I have no knowledge of dts modifications, I modified the kernel parameters (make kernel_menuconfig).

  1. Activate pstore with ramoops as the backend (File systems --> Miscellaneous filesystems --> Persistent store support (x) --> Log panic/oops to a RAM buffer (x))
<*>   Persistent store support
<*>     Log panic/oops to a RAM buffer
  2. Define the kernel cmdline parameters for ramoops (Boot options --> Default kernel command string)
mem=400M ramoops.mem_address=0x5BFFFFFF ramoops.mem_size=0x2C000 ramoops.record_size=0x4000
  3. After a crash you need to mount pstore to check whether any logs were captured. I've added the line below to /etc/fstab for easy mounting afterwards.
root@R7800:~# cat /etc/fstab 
# <file system> <mount point> <type> <options> <dump> <pass>
pstore /sys/fs/pstore pstore default 0 0
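
Once pstore is mounted, each captured record shows up as a file in /sys/fs/pstore (for the ramoops backend typically named like dmesg-ramoops-0). If you copy that directory off the router, or have Python 3 on the device, something like the following collects everything in one go. This is only a sketch; `read_pstore` is a hypothetical helper, not part of OpenWrt.

```python
from pathlib import Path

def read_pstore(pstore_dir="/sys/fs/pstore"):
    """Return {filename: text} for every record file in the pstore directory.

    An empty dict means nothing was captured (or the directory is missing).
    """
    records = {}
    root = Path(pstore_dir)
    if not root.is_dir():
        return records
    for entry in sorted(root.iterdir()):
        if entry.is_file():
            # Crash logs may be cut off mid-character, so never fail on decoding.
            records[entry.name] = entry.read_text(errors="replace")
    return records
```

For example, `for name, text in read_pstore().items(): print(name, len(text))` gives a quick overview of what was captured after a crash.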

Obviously, messing with kernel_menuconfig parameters may at the very least soft-brick your device, so a working factory image + backup is highly recommended.

I now have 3 days of uptime with the above parameters, so it is at least not causing any severe issues by itself.

Take note of the other observations/considerations in the above discussion with qosmio.

Thank you for sharing. I'll have to try it out.
In the meantime it seems that a simple daily reboot will have to do :slight_smile:

What about this bug? Could it be related to the random reboots with the R7800?

The workaround is to set the CPU governor to performance.

I've been testing the latest available code from Ansuel's GitHub, both the original and rebased on top of the latest master.
I use the R7800 as a repeater, where it is a client of my main workhorse (Linksys WRT1900AC-V2).
It's connected via a dedicated SSID on 5 GHz. The same radio has a second SSID that is identical to the one on the Linksys, so I have the same SSID all around the house.
What I experience is that with the NSS offloading drivers enabled, wifi starts to stutter and speeds eventually drop to zero after about half a day. This happens whether the client is on the Linksys or on the R7800. The first few minutes after connecting are OK, but after a few minutes the connection just slows down and eventually dies.
An older iPhone 7 we have seems to suffer more from it than the most recent iPhone, and the laptop I have with an Intel AC chipset seems to be less affected, but eventually all of them show it.

Might anyone have a clue how to fix or tweak this? It might even be a bug in the Linksys that is triggered by some behavior of the NSS stack. No idea.

I'm not quite sure whether NSS acceleration is actually involved in your use case (AP repeater).
But to make sure, you can disable the option "Enable NSS support for IPQ platform" (under kmod-mac80211) and give it a try.
Another possibility would be the debug mode of hostapd (https://openwrt.org/docs/guide-developer/debugging#wireless, loglevel)
Maybe there is something useful in the log.

Ansuel's changes accelerate wifi as well. I don't really need the acceleration, but I just wanted to test the code to be able to give some more feedback :slight_smile:

Does anyone know if the port forwarding issue is still there? Do any of you have port forwards from WAN to internal IPs active and working?