Rpi4 < $(community_build)

Quick update, it's happening again. But i can't catch oom-killer (in dmesg log/logread and i havent removed nft-qos-helper.sh). Before that, i catch them with external log server on my laptop by chance. Only option i changed after first boot to dnsmasq is ticking ignore resolv file, because if i dont tick that option, the router will use dns server from isp, not from https-dns-proxy (Update DNSMASQ Config on Start/Stop set to do not update configs).


i hit the limit replying, lol. no need new image i think, what should i do after removing that file? reboot, tick again ignore resolv file option and wait for error?


root@Fibel /50# nslookup
google.com
;; Got recursion not available from 2001:4489:502:102::2, trying next server
;; Got recursion not available from 180.250.13.54, trying next server
;; Got recursion not available from 180.250.13.50, trying next server
^C
root@Fibel /49# nslookup
facebook.com
netmgr/netmgr.c:1737: REQUIRE((((handle) != ((void*)0) && ((const isc__magic_t *)(handle))->magic == ((('N') << 24 | ('M') << 16 | ('H') << 8 | ('D')))) && __extension__ ({ __auto_type __atomic_load_ptr = (&(handle)->references); __typeof__ ((void)0, *__atomic_load_ptr) __atomic_load_tmp; __atomic_load (__atomic_load_ptr, &__atomic_load_tmp, (5)); __atomic_load_tmp; }) > 0)) failed.
Aborted

Run nslookup fresh after boot... after that i got soon-to-be oom problem. at 700mb memory usage, i quickly restart https-dns-proxy and dsmasq. i've attached other info in logs folder, maybe you could take a look that file.

1 Like

i'm really lost on why it's starting (only for you but I will look at it) you can remove it from the sdcard on a pc also remove these;
(they are the ones that call the helper)

rm /etc/hotplug.d/dhcp/*nft-qos-monitor

I will also make an image without those now (they are in all builds going back 7-10 versions)


yeah... just reboot / boot up as normal and wait to see if there are any problems

the resolv (or noresolv) should not cause problems I think (in a general sense)... so you can set it to what you want...

https-dns-proxy is another ballgame... but hopefully that is working... i've seen dnsmasq config state problems with it from time to time...

so you may need to restart dnsmasq... then restart https-dns-proxy... and how the other script is interacting with this i'm not sure...

(may ask the kopi-developer for a debug mode or send a pr for that :slight_smile: )


i never needed this / knew this needed to be done... but may also be an interaction with the script...

it's a shame there is no 19.07 version for rpi4... because I think for your needs it would really help or at least make things a bit simpler...

note: i did see a reboot in the kopi script... so could just be that rebooting the pi also... (and not oom this time)

		# Jika status SSID tidak sesuai padahal sebenarnya sudah sesuai di jaringan WMS, maka:
		elif [[ "$loginout" = "Login Gagal (SSID Tidak Sesuai)" ]] && [[ "$wifivariant" = "wms" ]]; then
			# Coba logout terlebih dahulu
			logout_wifi
			# Lalu minta penggantian alamat IP
			releaseobtainnewip
			# Lalu coba login kembali
			login_wifi
			# Jika masih dan router tidak memiliki MAC tetap, maka:
			if [[ "$loginout" = "Login Gagal (SSID Tidak Sesuai)" ]] && [[ "$detectrandommac" != "" ]]; then
				# Reboot router supaya menggunakan MAC lain


######################################### original
				# reboot
######################################### dont reboot
				logger -t "$(basename $0) "rebootNOPE"
                                sleep 10

			fi
			# Istirahat sebelum mencoba kembali
			sleep "$sleeptime"
		# Jika login sebelumnya tidak diblokir, maka:

yes got second logs

to confirm... this script/setup was working for you in the past? this looks like some sort of mwan/script ifup/ifconfig up race condition or something...

1 Like

note: i did see a reboot in the kopi script... so could just be that rebooting the pi also... (and not oom this time)

iirc, there is switch for debug mode, -x i think. should i run with debug mode next time?

to confirm... this script/setup was working for you in the past? this looks like some sort of mwan/script ifup/ifconfig up race condition or something...

yes, but with older version i think. let me check if i still have file from older version.


i've uploaded old version of that script on my folder.

curl -sSL http://rpi4.wulfy23.info/misc/https-dns-proxy.hotplug.iface > /etc/hotplug.d/iface/90-https-dns-proxy

done

otherwise I think I have to find the old build for you or spend some time looking at recent comits / thinking about what's going on...

I'm thinking to remove that script from rc.local and place it inside cron tab instead, with trigger every 00:00 executing the script (combined with ifdown/ip link down wwan and ifup/ip link up wwan to get new session or ip from my provider).

1 Like

yeah that will probably help... we need to try to separate the sequence of events... between mwan3 vs https-dns vs script (the script does whacky ifup stuff which will trigger dnsmasq restarts, which might mess with https-dns)...

because looks like a race condition... i will think about what has changed but from the top of my head for those services (did not check mwan much) there was no major start trigger change... (edit: nope no recent changes in anywhere... there was a busybox bump and some ipv6 netifd stuff which will be hard to debug if related but unlikely)


you can get the original https hotplug here also to see if that is part of the problem (i recently added debug stuff to it)


curl -sSL http://rpi4.wulfy23.info/misc/https-dns-proxy.hotplug.iface > /etc/hotplug.d/iface/90-https-dns-proxy

otherwise I think I have to find the old build for you or spend some time looking at recent commits / thinking about what's going on...

i found interesting lines from the logs while i'm going outside,

[   57.093596] br-lan: hw csum failure
[   57.097132] skb len=32 headroom=184 headlen=32 tailroom=1960
               mac=(130,14) net=(144,40) trans=184
               shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
               csum(0xffffeca1 ip_summed=2 complete_sw=0 valid=0 level=0)
               hash(0x0 sw=0 l4=0) proto=0x86dd pkttype=0 iif=19
[   57.125847] dev name=br-lan feat=0x00002007bfdd78a9
....snip
[   58.201288] CPU: 0 PID: 30189 Comm: nft-qos Tainted: G         C        5.10.92 #0
[   58.208994] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[   58.214828] Call trace:
[   58.217288]  dump_backtrace+0x0/0x15c
[   58.220955]  show_stack+0x18/0x30
[   58.224278]  dump_stack+0xd4/0x110
[   58.227684]  netdev_rx_csum_fault.part.0+0x48/0x58
[   58.232483]  netdev_rx_csum_fault+0x3c/0x40
[   58.236670]  __skb_checksum_complete+0xdc/0xe0
[   58.241121]  icmpv6_rcv+0xec/0x580
... snip
[   59.790997] CPU: 0 PID: 31270 Comm: nft-qos-helper. Tainted: G         C        5.10.92 #0
[   59.799263] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[   59.805096] Call trace:
[   59.807558]  dump_backtrace+0x0/0x15c
[   59.811226]  show_stack+0x18/0x30
[   59.814553]  dump_stack+0xd4/0x110
[   59.817958]  netdev_rx_csum_fault.part.0+0x48/0x58
[   59.822755]  netdev_rx_csum_fault+0x3c/0x40
[   59.826942]  __skb_checksum_complete+0xdc/0xe0
[   59.831393]  icmpv6_rcv+0xec/0x580
[   59.834798]  ip6_protocol_deliver_rcu+0xec/0x4fc
[   59.839418]  ip6_input+0x98/0xc0
[   59.842646]  ipv6_rcv+0xf0/0x130

it's strange that i didn't enable qos service at all. Also, i have uploaded more logs in 5.0.19-9 folder

can confirm. after update to 5.0.19-9_r18609 (coming from 5.0.11-59_r18531) cpu is stuck to 600mhz.
i checked after luci felt somewhat slower than usual.

root@vpn_pi /41# vcgencmd measure_clock arm 
frequency(48)=600117184

even under heavy load it is not scaling up.

it results in degraded vpn-performance, means decreased wireguard speeds (...and cooler temps :slight_smile: )

no other practical impacts noticed so far.

2 Likes

Can also confirm the CPU speed bug. No good for us on gigabit connections. I’ve rolled back.

2 Likes
2022-01-19 07:17:43	User.Notice	192.168.1.4	Jan 19 07:17:44 Fibel dhcp/01-nft-qos-dynamic_>: nohelper
2022-01-19 07:17:43	User.Notice	192.168.1.4	Jan 19 07:17:44 Fibel dhcp/01-nft-qos-dynamic_>: nohelper
2022-01-19 07:17:44	User.Notice	192.168.1.4	Jan 19 07:17:45 Fibel dhcp/01-nft-qos-dynamic_>: nohelper
2022-01-19 07:17:46	User.Notice	192.168.1.4	Jan 19 07:17:47 Fibel dhcp/01-nft-qos-dynamic_>: nohelper
2022-01-19 07:17:51	Kernel.Warning	192.168.1.4	Jan 19 07:17:52 Fibel kernel: [33074.831207] dnsmasq invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
2022-01-19 07:17:51	Kernel.Warning	192.168.1.4	Jan 19 07:17:52 Fibel kernel: [33074.839726] CPU: 3 PID: 9153 Comm: dnsmasq Tainted: G         C        5.10.92 #0
2022-01-19 07:17:51	Kernel.Warning	192.168.1.4	Jan 19 07:17:52 Fibel kernel: [33074.847220] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
2022-01-19 07:17:51	Kernel.Warning	192.168.1.4	Jan 19 07:17:52 Fibel kernel: [33074.853054] Call trace:
2022-01-19 07:17:51	Kernel.Warning	192.168.1.4	Jan 19 07:17:52 Fibel kernel: [33074.855516]  dump_backtrace+0x0/0x15c
2022-01-19 07:17:51	Kernel.Warning	192.168.1.4	Jan 19 07:17:52 Fibel kernel: [33074.859185]  show_stack+0x18/0x30
2022-01-19 07:17:51	Kernel.Warning	192.168.1.4	Jan 19 07:17:52 Fibel kernel: [33074.862508]  dump_stack+0xd4/0x110
2022-01-19 07:17:51	Kernel.Warning	192.168.1.4	Jan 19 07:17:52 Fibel kernel: [33074.865914]  dump_header+0x48/0x184
2022-01-19 07:17:51	Kernel.Warning	192.168.1.4	Jan 19 07:17:52 Fibel kernel: [33074.869410]  oom_kill_process+0x1e0/0x1ec
2022-01-19 07:17:51	Kernel.Warning	192.168.1.4	Jan 19 07:17:52 Fibel kernel: [33074.873423]  out_of_memory+0x1cc/0x320
---snip
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.827378] [  26683]     0 26683      312        9    28672        0             0 flock
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.835753] [  26699]     0 26699      377       96    28672        0             0 sh
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.843699] [  26708]     0 26708      312        9    28672        0             0 flock
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.851916] [  26740]     0 26740      312        9    28672        0             0 flock
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.860305] [  26741]     0 26741      377       96    32768        0             0 sh
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.868260] [  26755]     0 26755      377       96    32768        0             0 sh
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.876188] [  26775]     0 26775      312        9    28672        0             0 flock
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.884400] [  26778]     0 26778      377       95    32768        0             0 sh
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.892345] [  26785]     0 26785      312        9    28672        0             0 flock
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.900556] [  26806]     0 26806      312        9    28672        0             0 flock
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.908766] [  26808]     0 26808      377       96    32768        0             0 sh
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.916711] [  26818]     0 26818      377       96    32768        0             0 sh
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.924821] [  26825]     0 26825      377       96    28672        0             0 sh
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.932957] [  26841]     0 26841      312       10    28672        0             0 flock
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.941176] [  26844]     0 26844      377       94    28672        0             0 sh
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.949127] [  26865]     0 26865      377       95    32768        0             0 sh
2022-01-19 07:17:52	Kernel.Info	192.168.1.4	Jan 19 07:17:53 Fibel kernel: [33075.957076] [  26880]     0 26880      377       96    32768        0             0 sh
---snip

New logs. And this time oom-killer invoked by dnsmasq i think, no nft-qos-helper spawned, only sh (bash?) and flock (locking problem?) dozen spawned. Anyway, i will disable that script (kopijahe) temporarily (and invoking maually if i need it) to see if the problems still occurs or not.

yeah... I thought about this last night... and that script (2.2.2?) is the only thing with a recent change (committed 4 days ago which by co-incedence is the same reboot section I quoted above) + your symptoms... you could try downgrading it to 2.2.1 or lower...

so 95% chance that is in runaway ( re-initting-ip/interface )... (other possibility is some change a long time ago)

edit: made a report just because runaways are nasty things

i'm a little bit unsure about whether to remove ujail but in your case, if you upgrade / downgrade the os, best to remove procd-ujail procd-seccomp and reboot if installed ( 5.0.x ), as it complicates things (we already tested with no ujail and it was a bit better but the main issues seemed to still be there)

you may also try disabling;

option rebind_protection 1

as there is something always showing in the logs about this...


also apologies for the 600Mhz thing... other users (thankyou!) have reported this an my commands also show this so you were right on this one... still looking at it...

1 Like

edit: made a report just because runaways are nasty things

will chime in if the developer needs more log from me.
also is that normal my syslog spammed by dnsmasq every second (uptime : 1h 30m, already disabled the script)?

Wed Jan 19 09:37:31 2022 daemon.info dnsmasq[18565]: read /etc/hosts - 4 addresses
Wed Jan 19 09:37:31 2022 daemon.info dnsmasq[18565]: read /tmp/hosts/dhcp.cfg01411c - 6 addresses
Wed Jan 19 09:37:31 2022 daemon.info dnsmasq[18565]: read /tmp/hosts/odhcpd - 6 addresses
Wed Jan 19 09:37:31 2022 daemon.info dnsmasq-dhcp[18565]: read /etc/ethers - 0 addresses
Wed Jan 19 09:37:31 2022 daemon.info dnsmasq[18565]: read /etc/hosts - 4 addresses
Wed Jan 19 09:37:31 2022 daemon.info dnsmasq[18565]: read /tmp/hosts/dhcp.cfg01411c - 6 addresses
Wed Jan 19 09:37:31 2022 daemon.info dnsmasq[18565]: read /tmp/hosts/odhcpd - 6 addresses
Wed Jan 19 09:37:31 2022 daemon.info dnsmasq-dhcp[18565]: read /etc/ethers - 0 addresses
Wed Jan 19 09:37:31 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:31 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:31 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:31 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:31 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:31 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:31 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:31 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:31 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:31 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:31 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:31 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:32 2022 daemon.info dnsmasq[18565]: read /etc/hosts - 4 addresses
Wed Jan 19 09:37:32 2022 daemon.info dnsmasq[18565]: read /tmp/hosts/dhcp.cfg01411c - 6 addresses
Wed Jan 19 09:37:32 2022 daemon.info dnsmasq[18565]: read /tmp/hosts/odhcpd - 6 addresses
Wed Jan 19 09:37:32 2022 daemon.info dnsmasq-dhcp[18565]: read /etc/ethers - 0 addresses
Wed Jan 19 09:37:32 2022 daemon.info dnsmasq[18565]: read /etc/hosts - 4 addresses
Wed Jan 19 09:37:32 2022 daemon.info dnsmasq[18565]: read /tmp/hosts/dhcp.cfg01411c - 6 addresses
Wed Jan 19 09:37:32 2022 daemon.info dnsmasq[18565]: read /tmp/hosts/odhcpd - 6 addresses
Wed Jan 19 09:37:32 2022 daemon.info dnsmasq-dhcp[18565]: read /etc/ethers - 0 addresses
Wed Jan 19 09:37:32 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:32 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:32 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:32 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:32 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:32 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:32 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:32 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:32 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:32 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:32 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper
Wed Jan 19 09:37:32 2022 user.notice dhcp/01-nft-qos-dynamic_>: nohelper

regarding cpu clock speed, after i searching more info on internet, i stumbled upon this commands

This command should also tell you what your CPU clock speed is configured to be limited to. It displays in megahertz.

vcgencmd get_config arm_freq

i run on my rpi and got this value

root@Fibel /51# vcgencmd get_config arm_freq
arm_freq=1500

also, i noticed the script you've created shows blank at SCALING: output line

### Time       Temp     CPU     Core         Health           Vcore    PMIC
### 09:46:59  51.6'C   600MHz  200MHz  00000000000000000000  0.8688V  51.4'C
SCALING:

and gpu firmware seems to have just been released compared to fwV(videocore cpu?) fimware... idk the problem lies in firmware or kernel side tho.

do i need to upgrade/downgrade first? with setting intact or fresh from start? i've disabled that, no spam log so far (~5min). i spoke too soon, it spammed again :frowning:


arm_freq=1500
core_freq=500
core_freq_min=200
gpu_freq=500
gpu_freq_min=250

iirc, it was never work consistently after i accidentally upgraded to stable branch (spammed logs, oom, or random reboot). not sure about release branch, because i rarely inspect log msg until someone in my house complained intermittent problem about internet connection (and only this time they often complained to me). i can go back to release branch, but i need to prepare the config (since i will manually set up the configs one by one).

will try do that. i doubt that simple changes will break my internet. if that happens, i will bring down mwan3 too since i've already have many rules in place.


no luck.

1 Like

no this is all due to the runaway condition...

rm /etc/hotplug.d/dhcp/*nft*

will stop my errors but they are just informational/a sign that dnsmasq is continously reloading/loading/calling hotplug.. (< which is likely caused by an interface being brought up / taken down / flapping )

this is probably the most interesting section from your logs I think...

interesting

cat /usbstick/_kopijahe_dudeERRs/5.0.19-9/Fresh-around-19-2100/plogread-202201182203-boot-5.0.19-9.log

Tue Jan 18 22:03:57 2022 daemon.warn dnsmasq[22730]: possible DNS-rebind attack detected: dns.msftncsi.com
Tue Jan 18 22:03:57 2022 user.notice SQM: Stopping SQM on pppoe-wan
Tue Jan 18 22:03:57 2022 user.warn kernel: [ 62.790708] rc.local> ######################################################### end rc.local
Tue Jan 18 22:03:57 2022 daemon.warn dnsmasq[22730]: possible DNS-rebind attack detected: dns.msftncsi.com
Tue Jan 18 22:03:58 2022 daemon.warn dnsmasq[22730]: failed to create listening socket for fe80::a6c6:4fff:fea0:3798%pppoe-wan: Address not available
Tue Jan 18 22:03:58 2022 daemon.warn dnsmasq[22730]: failed to create listening socket for fe80::a6c6:4fff:fea0:3798%pppoe-wan: Address not available
Tue Jan 18 22:03:58 2022 user.notice SQM: Starting SQM script: ctinfo_4layercake_rpi4.qos on pppoe-wan, in: 20000 Kbps, out: 6784 Kbps
Tue Jan 18 22:03:59 2022 user.notice kopijahe 2.2.2: Percobaan login. Interface: wwan. Hasil: Gagal Login (User Anda Sedang Aktif)
Tue Jan 18 22:07:45 2022 daemon.err collectd[29462]: configfile: stat (/etc/collectd/conf.d/*.conf) failed: No such file or directory
Tue Jan 18 22:07:45 2022 daemon.err collectd[29462]: rrdtool plugin: RRASingle = true: creating only AVERAGE RRAs
Tue Jan 18 22:07:45 2022 daemon.notice procd: /etc/rc.d/S99transmission: Transmission not enabled. Please enable in /etc/config/transmission
Tue Jan 18 22:07:45 2022 daemon.info transmission: Transmission not enabled. Please enable in /etc/config/transmission

Tue Jan 18 22:07:45 2022 daemon.info procd: - init complete -
Tue Jan 18 22:07:50 2022 daemon.warn dnsmasq[22730]: failed to create listening socket for fe80::a6c6:4fff:fea0:3798%pppoe-wan: Address not available
Tue Jan 18 22:07:50 2022 daemon.warn dnsmasq[22730]: failed to create listening socket for fe80::a6c6:4fff:fea0:3798%pppoe-wan: Address not available
Tue Jan 18 22:07:51 2022 user.notice SQM: ctinfo_4layercake_rpi4.qos was started on pppoe-wan successfully
Tue Jan 18 22:07:51 2022 user.notice mwan3-hotplug[26138]: Execute ifup event on interface wan (pppoe-wan)
Tue Jan 18 22:07:51 2022 user.warn mwan3-hotplug[26138]: failed to add 10.88.96.1 dev pppoe-wan proto kernel scope link src 10.88.118.107 to table 1
Tue Jan 18 22:07:51 2022 user.warn mwan3-hotplug[26138]: failed to add default via 10.88.96.1 dev pppoe-wan proto static metric 10 to table 1
Tue Jan 18 22:07:52 2022 user.info mwan3track[4485]: Detect ifup event on interface wan ()
Tue Jan 18 22:07:52 2022 user.notice mwan3track[4485]: Interface wan (pppoe-wan) is online
Tue Jan 18 22:07:53 2022 user.notice firewall: Reloading firewall due to ifup of wan (pppoe-wan)
Tue Jan 18 22:07:53 2022 user.notice nlbwmon: Reloading nlbwmon due to ifup of wan (pppoe-wan)
Tue Jan 18 22:07:53 2022 user.notice https-dns-proxy: Restarting https-dns-proxy due to ifup of wan
Tue Jan 18 22:07:53 2022 user.notice https-dns-proxy: Stopping service ✓
Tue Jan 18 22:07:54 2022 user.notice https-dns-proxy: Starting service ✓✓
Tue Jan 18 22:07:54 2022 user.notice SQM: Stopping SQM on pppoe-wan

so there is a https restart + a rebind message just before sqm stops (interface is taken down by the script?)

there is also an mwan message with an empty interface() and some odd ipv6 errors which may or may not be related... only way to check would be to see old logs when it was working...

maybe you can try to disable ipv6 on those interfaces...

hmmmm... double check you removed ujail.... as that respawn behavior may be linked to it...

dnsmasq-dhcp[1]
[root@dca632 /usbstick 55°] fgrep -r dnsmasq-dhcp /boot/plog/plogread-* | grep \\[1\\] | wc -l
732
[root@dca632 /usbstick 54°] fgrep -r dnsmasq-dhcp /usbstick/_zPREMOVED/plogread-* | grep \\[1\\] | wc -l
0

thankyou, that is useful... i've tried several builds and they all show similar output (even 5.4) which is odd... so not sure if we are looking at a utility/false-positive or what just yet...

i do know that in the past... i used to sometimes see changes when running;

/bin/rpi-throttlewatch.sh -P -C 4

but on all the builds I tried today... it never changed... which again is very odd...

this is another useful command;

vcgencmd get_config int | grep freq | grep -v hdmi

yes... this seems to be the one 'new' reliable 'symptom'... which points towards some sort of scaling driver problem(change?) on newer kernel or similar...

jackpot;

[root@dca632 /usbstick 53°]# fgrep freq /boot/plog/p* | grep cpufreq
/boot/plog/pdmesg-202201181255-5.0.19-6.log:[    0.368384] raspberrypi-cpufreq: probe of raspberrypi-cpufreq failed with error -2
/boot/plog/pdmesg-202201181343-5.0.19-6.log:[    0.368348] raspberrypi-cpufreq: probe of raspberrypi-cpufreq failed with error -2
/boot/plog/pdmesg-202201181413-5.0.19-7.log:[    0.368386] raspberrypi-cpufreq: probe of raspberrypi-cpufreq failed with error -2
1 Like

Also had problems with DNS. I'm using only IPv4 and DNS over HTTPS, Adblock and SQM. Rolled back to rpi4.64-snapshot-27132-3.5.317-5-r18370-ext4-fac.img and everything works perfectly again.

2 Likes

on which build version?

grep "^ENABLEDSERVICES" /root/wrt.ini

you shouldn't have to use a 'fac' to roll back... (well I can see if internet is down it may cause issues but you can flash 'sys' via scp and sysupgrade -R <img> or luci with AUTORESTOREAPPLY=1 enabled if you have one saved locally)


thanks to testing from neil1 , we know that https-dns-proxy is hit and miss on upgrades... and anyone using it should first try to restart dnsmasq then https-dns-proxy after firstboot (is complete ~ 5mins) if dns is not working...

(edit: added a new bugfix for this @ 5.0.11-71+ )

also... you will have problems if it is not in your ENABLEDSERVICES

sir, I am recommended your firmware to my community but some of them have problems with multiple USB modems 4G is always assigned randomly. are these solutions to fix that problem.

gday, it's a big goal to improve support / detection / predictability / drivers for most common hardware usage scenarios (including popular modems)...

sadly, doing this without the hardware is going to be a looooong and painful process unless i get remote / wired access for a few days or someone at the other end knows 90% of the solution / underlying problem already...

i.e. the build has a script for multiple usb ethernet adaptors to pin them to a specific 'device'... part of what you are asking probably has a very similar requirement to this...


what i'd suggest is to ask your community to try;

http://www.ofmodemsandmen.com/upload/RaspberryPi-CM4-GO2022-01-18.zip

it's optimized for modems and alot of what needs to be done is already taken care of in that build... they can ask questions about it here


the other thing is if you can get;

############# unplug the modem(s)
cp /sbin/hotplug-call-debug /sbin/hotplug-call
############# plug the modem(s) in
(dmesg | tail -n30; logread | tail -n30) > /tmp/HOTPLUG.logs 
tar -cvzf /tmp/hotplug.tar.gz /tmp/HOTP* /etc/config/*
############# and send me (put somewhere for download) /tmp/hotplug.tar.gz

that is probably 70% if what I need to know...

nice find. can confirm.

root@vpn_pi /38# fgrep freq /boot/plog/p* | grep cpufreq
/boot/plog/pdmesg-202201181851-5.0.19-9.log:[    0.348238] raspberrypi-cpufreq: probe of raspberrypi-cpufreq failed with error -2
/boot/plog/pdmesg-202201181900-5.0.19-9.log:[    0.358210] raspberrypi-cpufreq: probe of raspberrypi-cpufreq failed with error -2
/boot/plog/plogread-202201181851-boot-5.0.19-9.log:Tue Jan 18 18:51:23 2022 kern.warn kernel: [ 0.348238] raspberrypi-cpufreq: probe of raspberrypi-cpufreq failed with error -2
/boot/plog/plogread-202201181900-boot-5.0.19-9.log:Tue Jan 18 19:00:29 2022 kern.warn kernel: [ 0.358210] raspberrypi-cpufreq: probe of raspberrypi-cpufreq failed with error -2

is there already something we can do to get behaviour back to normal?
i mean, am i expected to do something yet? i myself am still uncapable (unworthy) with such things.

1 Like

thanks for your help on this...

nah... we just forget that r18609 (5.0.19) ever happened... :slight_smile:

and start to panic in a month or two if not resolved

just reverting back to 5.0.11 should sort out majority I think...

i'll keep poking at that throttlewatch to see why it doesn't change much anymore (it used to step down at times when overheating I think... but the perftweaks were tuned to stop it doing that so maybe it's normal)... but since 21.02.1/5.4 also did not change... and there were no reports of issues with these builds to date... then its probably false-positive or something minor...
1 Like

thanks, scaling works fine again

root@vpn_pi /40# vcgencmd measure_clock arm
frequency(48)=1100249984

i doubt that. it may be minor, yes, but noticeable. i.e. vpn-performance dropped 30-40%.
also refreshing blocklists in ban-ip or mergeing lists in simple-adblock after update took
almost 2-3 times longer than usual.

thanks for the help (again)!

1 Like

I just upgraded downgraded to 5.0.11-73, cpu problem is fixed for me.
so far there are no problem about spamming log dnsmasq, nftqos, or oom thingy (note: i've disabled ipv6 on wan side, lan still serving ipv6+ipv4, will try to enable back some time later to see what could be wrong with that).
Thanks for helping me (altho its mostly problem on my end).


oot: yesterday, i was switching my rpi with my backup router (gl-mt300n-v2) running owrt 21.02.1 momentarily. i notice that the logs also spammed by dnsmasq thingy. setup mostly same (sqm+mwan3+usbwwan+script) minus https-dns-proxy. but, i already disabled all of them to near stock. the similarity is i enabled ipv6 on wan side both of them. wondering if that problem is related or not...


whoops. i spoke too soon. got oom again after 20h. but i cannot catch the log on my laptop because that was occured in the middle of my disassembly. i didnt enable my login script at all this time, ipv6 disabled on wan side.

1 Like

no problem... i learn't some cool stuff along the way

like about your setup;

  1. cool background script
  2. cool wwan / usb link

(i'm truly amazed when I see some fancy implementations of the build like this and always learn heaps... because while I try not to limit what you can do... a lot of stuff is only tested / confirmed working on typical scenarios)

like about failure modes;

  1. even tho oom/nft-qos-helper stuff was just a symptom... it helps to optimize / fail safer / earlier in the future
  2. troubleshooting was very interesting on this one... so many services and combing through the symptoms and logs to pin down where key issues are was a challenge / good learning experience, i'm still not 50% happy about how the whole logging / flow of events went down in this case... but in IT when in doubt... just start removing bits one by one until the symptoms change enough to pin something down...

about what a good bug report looks like;

  1. made great use of persistent logs
  2. made great use of rpi-support.sh / built in inventory script
  3. pulled some creative tweaks to get some verbose stuff into the log where it was needed

sets a fantastic example, and without those I truly doubt I could have offered you much input at all...


upsides...

  1. debugging scripts were improved due to your report
  2. https-dns-proxy(and other dnsmasq interacting services) long time issue gained some minor improvements, possibly even fixes (tba)
  3. I believe you might have triggered a ujail edge case... which I can look at some more... and as a result that was reverted for everyone... which could save us more drama in the near term...
  4. thanks to you and others, eventually pinned down what was wrong with 5.0.19 ... so we can continue happily testing and migrating to future master builds... ( 5.0.19-25+ now at 'current' should now be working fine if anyone wanted to test and confirm )
  5. perftweaks got 1800000 support (on supported boards, sorry cm4 users hardcoded bug, but saved you some power :wink: )
1 Like