My R4S has now twice lasted almost exactly a week of uptime, at which point the CPU pegs at 100% and the router locks up. I've tried writing the syslog to a file to capture an error message, but there's nothing relevant in there.
The fact that it lasted exactly the same number of days (Monday to Monday) both times is really strange.
Packages being used: upnp, collectd, ddns, SQM QoS, pbr, three WG clients, one WG server
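When the syslog shows nothing before a lockup, a periodic snapshot of the load written to storage that survives a reboot can at least show what was busy shortly beforehand. A minimal sketch, assuming a hypothetical helper named `snapshot` and a log path picked purely for illustration (neither comes from the post above):

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped load snapshot to a log file.
# $1 = log file (assumed path, change to storage that survives a reboot)
# $2 = load average source, normally /proc/loadavg
snapshot() {
    out="${1:-/root/load-snapshots.log}"
    loadavg="${2:-/proc/loadavg}"
    {
        # Timestamp plus the kernel's load averages
        printf '%s load: %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$(cat "$loadavg")"
        # First lines of a single batch-mode top run (ignored if top is missing)
        top -b -n 1 2>/dev/null | head -n 12
    } >> "$out"
}
```

Calling `snapshot` from cron every minute or so would leave a trail up to the last minute before the hang. Frequent writes wear flash, so a USB stick or SSD is probably a better target than the internal overlay.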
No wifi on my APU4D4; just Ethernet and a couple of Raspberry Pi Zero devices connected directly via USB using cdc_ether and an internal mSATA SSD. Mysterious CPU frequency issue there.
I run OpenWRT on a separate Ubiquiti U6 LR v1 for wifi.
Went hunting for x86 kernel changes and found this commit: #16317
The issues described before the commit are exactly what I started experiencing on my end after it. I compiled a build with the changes in #16317 reverted and voilà, my CPU runs as before.
The changes to the Linux 6.6 kernel config may help modern CPUs like Cezanne, but on my older CPU with Jaguar cores they introduced the very issues the commit was trying to fix for modern CPUs.
It's beyond my competence to understand which change introduced the regression. It'd be great if someone more knowledgeable reviewed them again to ensure there isn't a regression for older x86 CPUs like the one in the PC Engines APU2E4.
The issue with webservers on the router not fetching data still persists. The Yamon reports page is still broken. This one is tricky, as I'm not even sure where to look.
BTHH5: crash and reboot. First seen after about an hour, the next after 30 hours. I don't know what to post that would be helpful; there's nothing obvious in the error logs before the crash.
My gateway is an APU2E4, too, but I'm not seeing anything strange on mine. Been running 24.10.0 for a couple weeks, just looked at it:
$ cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
schedutil
$ stats # My little awk script that looks at /sys/devices/cpu...
2025-02-26T08:19:06 PST
CPU 0: 598.875 MHz
CPU 1: 598.871 MHz
CPU 2: 598.883 MHz
CPU 3: 598.882 MHz
acpitz : 50.0°C
Maybe because I'm running coreboot 4.17?
$ dmidecode --type 0
# dmidecode 3.5
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
Vendor: coreboot
Version: v4.17.0.1
Release Date: 06/22/2022
ROM Size: 8 MB
Characteristics:
PCI is supported
PC Card (PCMCIA) is supported
BIOS is upgradeable
Selectable boot is supported
ACPI is supported
Targeted content distribution is supported
BIOS Revision: 4.13
Firmware Revision: 0.0
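The `stats` script used above isn't shown in the post. For anyone who wants to compare numbers, here is a minimal sketch of something similar. It assumes the standard cpufreq sysfs layout under `/sys/devices/system/cpu` and the `scaling_cur_freq` file (reported in kHz); the function name `cpu_freqs` is made up and this is not the poster's actual script:

```shell
#!/bin/sh
# Print the current frequency of every CPU that exposes cpufreq in sysfs.
# $1 = sysfs CPU directory; defaults to the usual Linux location.
cpu_freqs() {
    base="${1:-/sys/devices/system/cpu}"
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue      # skip CPUs without cpufreq support
        cpu="${f#"$base"/cpu}"       # drop the leading path up to "cpu"
        cpu="${cpu%%/*}"             # keep only the CPU number
        # scaling_cur_freq is in kHz; convert to MHz
        awk -v n="$cpu" '{ printf "CPU %s: %.3f MHz\n", n, $1 / 1000 }' "$f"
    done
}
```

Running `cpu_freqs` a few times on an idle box should show whether the cores settle at their lowest frequency or stay clocked up.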
Thanks for chiming in. Acid8000 also confirmed no issues on his end running coreboot v4.19.0.1, same as mine. I can't really explain why it happens on mine but not on others, but after reverting the commit it ran as before. The governor is the same on both versions, and from htop it's clear it's not a process ramping up the CPU. Maybe some hardware revision on my end, no idea.
Do you also not run a WiFi card? I may try to yank it out and test to see if it makes any difference.
Here's my BIOS version:
dmidecode --type 0
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
Vendor: coreboot
Version: v4.19.0.1
Release Date: 01/31/2023
ROM Size: 8192 kB
Characteristics:
PCI is supported
PC Card (PCMCIA) is supported
BIOS is upgradeable
Selectable boot is supported
ACPI is supported
Targeted content distribution is supported
BIOS Revision: 0.0
Firmware Revision: 0.0
And hardware version
dmidecode --type 1
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: PC Engines
Product Name: apu2
Version: 1.0
Serial Number: 1378467
UUID: Not Settable
Wake-up Type: Reserved
SKU Number: 4 GB
Family: Not Specified
Here’s my own hardware and Coreboot info if it’s any help:
BusyBox v1.36.1 (2025-02-04 19:21:11 UTC) built-in shell (ash)
  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
-----------------------------------------------------
OpenWrt 24.10.0, r28427-6df0e3d02a
-----------------------------------------------------
root@router:~# dmidecode --type 0
# dmidecode 3.5
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
Vendor: coreboot
Version: v4.19.0.1
Release Date: 01/31/2023
ROM Size: 8 MB
Characteristics:
PCI is supported
PC Card (PCMCIA) is supported
BIOS is upgradeable
Selectable boot is supported
ACPI is supported
Targeted content distribution is supported
BIOS Revision: 0.0
Firmware Revision: 0.0
root@router:~# dmidecode --type 1
# dmidecode 3.5
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: PC Engines
Product Name: apu4
Version: 1.0
Serial Number: 1432635
UUID: Not Settable
Wake-up Type: Reserved
SKU Number: 4 GB
Family: Not Specified
root@router:~#
Thanks for sharing. Everything looks the same as on my machine, as expected. When I have some time I'll try 24.10 without the WiFi radio plugged in. I can't think of anything else that would explain the erratic CPU frequency. On my end, the kernel changes seem to have introduced a regression somewhere.
Correct, it's a router only with its two LAN cables out to switches, and wifi is served up by dedicated APs. The only non-network related connection is a USB cable to a UPS, and the router is running apcupsd.
Thanks @efahl and @acid8000 for your helpful input. Indeed, after physically removing the wifi card (Compex wle600vx), the CPU drops to the lower frequency as expected. It seems that the kernel commit that moves to a tickless timer doesn't play well with an older CPU paired with an mPCIe wifi card based on the Qualcomm Atheros QCA9882 chipset. It may be an edge case that only occurs with this combination of the two chips. That's unclear until another card is tested.
I still have the issue in v24.10.0 with webservers on the router not fetching data, although there are no problems serving other devices or routing traffic for clients connected to the router. Internet connectivity is fine from the router and from connected devices, but weirdly not when the router itself is serving a webpage.
EDIT: deleting images to avoid burdening the server.
There seem to be a number of issues with dnsmasq and unbound. My issue is likely that some hostnames fail to resolve. With unbound, www.gstatic.com isn't resolved at all.
I am seeing similar issues with Rockchip CPUs running hot and at high clocks much of the time since kernel 6.6 (specifically on a NanoPi R4S, since upgraded to an R6S, which runs quite hot even when idle). I had already tried CONFIG_NO_HZ with no change. I will check whether these other configuration options could apply to other ARM CPUs as well.
Regarding the NanoPi R4S reboot troubles (LAN not coming up):
It would be terrific if people with affected devices could test this patch (https://github.com/openwrt/openwrt/pull/18078) and provide feedback on whether it helps. In the forum there are several confirmations that it fixes the problem, but feedback on GitHub would hopefully speed up merging the patch into OpenWrt. On the Linux kernel mailing list itself this unfortunately has not gained much traction.
I installed the stock x86 version (x86-64-generic-squashfs-combined-efi). Everything seems to be functioning well. I am seeing a microcode warning at bootup and some GPT errors for the disk, neither of which was there before I flashed:
Anything I can do to get microcode to load earlier? This sounds like something which might require fixing in the next service release?
Sun Mar 2 20:55:27 2025 kern.err kernel: [ 4.488203] microcode: Attempting late microcode loading - it is dangerous and taints the kernel.
Sun Mar 2 20:55:27 2025 kern.err kernel: [ 4.497438] microcode: You should switch to early loading, if possible.
Not sure what to do about this one. I want to continue using the squashfs images so I can upgrade:
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 3.954091] ata1.00: ATA-9: SanDisk X400 2.5 7MM 128GB, X4152012, max UDMA/133
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 3.961929] ata1.00: 250069680 sectors, multi 1: LBA48 NCQ (depth 32), AA
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 3.970009] ata1.00: Features: Dev-Sleep
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 3.977167] ata1.00: configured for UDMA/133
Sun Mar 2 20:55:27 2025 kern.notice kernel: [ 3.981789] scsi 0:0:0:0: Direct-Access ATA SanDisk X400 2.5 2012 PQ: 0 ANSI: 5
Sun Mar 2 20:55:27 2025 kern.notice kernel: [ 3.991029] sd 0:0:0:0: [sda] 250069680 512-byte logical blocks: (128 GB/119 GiB)
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 3.999007] usb 1-4: new full-speed USB device number 2 using xhci_hcd
Sun Mar 2 20:55:27 2025 kern.notice kernel: [ 4.005607] sd 0:0:0:0: [sda] Write Protect is off
Sun Mar 2 20:55:27 2025 kern.debug kernel: [ 4.010804] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sun Mar 2 20:55:27 2025 kern.notice kernel: [ 4.010827] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 4.020530] sd 0:0:0:0: [sda] Preferred minimum I/O size 512 bytes
Sun Mar 2 20:55:27 2025 kern.warn kernel: [ 4.029108] GPT:Primary header thinks Alt. header is not at the end of the disk.
Sun Mar 2 20:55:27 2025 kern.warn kernel: [ 4.036996] GPT:246304 != 250069679
Sun Mar 2 20:55:27 2025 kern.warn kernel: [ 4.040688] GPT:Alternate GPT header not at the end of the disk.
Sun Mar 2 20:55:27 2025 kern.warn kernel: [ 4.046903] GPT:246304 != 250069679
Sun Mar 2 20:55:27 2025 kern.warn kernel: [ 4.050596] GPT: Use GNU Parted to correct GPT errors.
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 4.055941] sda: sda1 sda2 sda128
Sun Mar 2 20:55:27 2025 kern.notice kernel: [ 4.059635] sd 0:0:0:0: [sda] Attached SCSI disk
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 4.066741] VFS: Mounted root (squashfs filesystem) readonly on device 8:2.
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 4.074352] Freeing unused kernel image (initmem) memory: 2436K
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 4.080540] Write protecting the kernel read-only data: 18432k
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 4.087019] Freeing unused kernel image (rodata/data gap) memory: 1816K
...
Sun Mar 2 20:55:27 2025 user.info kernel: [ 8.687901] block: attempting to load /tmp/overlay/upper/etc/config/fstab
Sun Mar 2 20:55:27 2025 user.err kernel: [ 8.695533] block: unable to load configuration (fstab: Entry not found)
Sun Mar 2 20:55:27 2025 user.info kernel: [ 8.702436] block: attempting to load /tmp/overlay/etc/config/fstab
Sun Mar 2 20:55:27 2025 user.err kernel: [ 8.708911] block: unable to load configuration (fstab: Entry not found)
Sun Mar 2 20:55:27 2025 user.info kernel: [ 8.715811] block: attempting to load /etc/config/fstab
Sun Mar 2 20:55:27 2025 user.err kernel: [ 8.721811] block: unable to load configuration (fstab: Entry not found)
Sun Mar 2 20:55:27 2025 user.err kernel: [ 8.728720] block: no usable configuration
Sun Mar 2 20:55:27 2025 user.info kernel: [ 8.733009] block: attempting to load /etc/config/fstab
Sun Mar 2 20:55:27 2025 user.err kernel: [ 8.738446] block: unable to load configuration (fstab: Entry not found)
Sun Mar 2 20:55:27 2025 user.err kernel: [ 8.745362] block: no usable configuration
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 8.749793] loop0: detected capacity change from 0 to 212992
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 8.793687] loop0: detected capacity change from 212992 to 155392
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 8.808780] EXT4-fs (loop0): recovery complete
Sun Mar 2 20:55:27 2025 kern.info kernel: [ 8.814639] EXT4-fs (loop0): mounted filesystem 0495c517-afa8-4b0b-a8eb-ffbacf9cdee8 r/w with ordered data mode. Quota mode: disabled.
Sun Mar 2 20:55:27 2025 user.info kernel: [ 8.836146] block: attempting to load /tmp/overlay/upper/etc/config/fstab
Sun Mar 2 20:55:27 2025 user.err kernel: [ 8.895228] block: check_filesystem: /usr/sbin/e2fsck returned 8
Sun Mar 2 20:55:27 2025 kern.warn kernel: [ 8.903132] overlayfs: null uuid detected in lower fs '/', falling back to xino=off,index=off,nfs_export=off.
Sun Mar 2 20:55:27 2025 user.info kernel: [ 8.914513] mount_root: switched to extroot
Sun Mar 2 20:55:27 2025 user.warn kernel: [ 8.974115] urandom-seed: Seeding with /etc/urandom.seed
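On the GPT warnings in the log above: the two numbers in `GPT:246304 != 250069679` are sector addresses (where the primary header says the backup header lives, and the last sector of the disk). A quick bit of arithmetic, assuming 512-byte sectors as reported for `sda`, suggests the partition table was written for a device of roughly 120 MiB, which would be expected when a small image is written unchanged to a much larger disk. The helper name `sectors_to_mib` is made up for this sketch:

```shell
#!/bin/sh
# Convert a count of 512-byte sectors to whole MiB (integer division).
sectors_to_mib() {
    echo $(( $1 * 512 / 1048576 ))
}

# Values copied from the kernel log: LBA of the backup GPT header as
# recorded in the primary header, and the last LBA of the actual disk.
alt_lba=246304
last_lba=250069679

# LBAs count from 0, so the size in sectors is the last LBA plus one.
echo "GPT was written for a device of $(sectors_to_mib $((alt_lba + 1))) MiB"
echo "actual disk size is $(sectors_to_mib $((last_lba + 1))) MiB"
```

This prints 120 MiB and 122104 MiB respectively. The kernel message itself points to GNU Parted as the tool for relocating the backup header; whether doing that interferes with later upgrades of the squashfs image is not something this sketch can answer.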
For those using AdGuardHome with 24.10.0, has anyone encountered this error? (i/o timeout when doing queries to local dnsmasq for local private reverse DNS resolution).
Mon Mar 3 08:24:42 2025 daemon.err AdGuardHome[5689]: 2025/03/03 11:24:42.018910 [error] dnsproxy: exchange failed upstream=127.0.0.1:54 question=";b._dns-sd._udp.0.1.168.192.in-addr.arpa.\tIN\t PTR" duration=2.002772846s err="exchanging with 127.0.0.1:54 over udp: read udp 127.0.0.1:36991->127.0.0.1:54: i/o timeout"
It started appearing after I upgraded to 24.10.0. I've already upgraded AdGuardHome package to 0.107.56, which solved some other issues reported in this topic previously, but the error above remains.
I suspect it might be related to the newer dnsmasq version included in 24.10.0. More specifically I will experiment by disabling the option "Filter Private" in DHCP/DNS filter LuCI page (dhcp.@dnsmasq[0].boguspriv='0').
In the meantime I will observe if this error appears again and I will do some additional investigation.
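For reference, a rough sketch of how the experiment could be run and checked by hand. The `uci` key is the one named above; the helper `reverse_name` is made up here, and the `dig` call assumes `dig` is installed and that dnsmasq listens on 127.0.0.1 port 54 as in the error message:

```shell
#!/bin/sh
# Build the in-addr.arpa name for an IPv4 address by reversing its octets.
reverse_name() {
    echo "$1" | awk -F. '{ printf "%s.%s.%s.%s.in-addr.arpa\n", $4, $3, $2, $1 }'
}

# Steps to try on the router (left commented out so nothing changes by accident):
#   uci set dhcp.@dnsmasq[0].boguspriv='0'
#   uci commit dhcp
#   /etc/init.d/dnsmasq restart
#
# Then repeat the query AdGuardHome was timing out on, directly against dnsmasq:
#   dig @127.0.0.1 -p 54 PTR "b._dns-sd._udp.$(reverse_name 192.168.1.0)"
```

If the direct query also times out with AdGuardHome out of the picture, that would point at dnsmasq rather than at AdGuardHome.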