Mwan3 (2.10.1): CPU-hog

I switched back to mwan3 2.9.0 because the newest version on trunk causes very high CPU load; in total, only 3%-5% of the CPU remains free.

1401 1 root S 1388 1% 8% /bin/sh /usr/sbin/mwan3track wifiwan
1400 1 root S 1396 1% 6% /bin/sh /usr/sbin/mwan3track wwan
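To put a single number on the tracker load, the %CPU values of all mwan3track processes can be added up. This is only a sketch: it assumes the 8-column BusyBox top layout shown above (PID, PPID, USER, STAT, VSZ, %VSZ, %CPU, COMMAND), and the function name is made up for illustration.

```shell
# mwan3track_cpu_total: read top output on stdin and print the summed %CPU
# of all mwan3track lines. Assumes %CPU is the 7th column (hypothetical helper).
mwan3track_cpu_total() {
    awk '/mwan3track/ { sub(/%/, "", $7); sum += $7 }
         END { printf "%d%%\n", sum }'
}

# Possible usage on the router (one batch iteration of top):
#   top -b -n 1 | mwan3track_cpu_total
```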

I have 3 WANs: LAN, WIFI, 3g.
option family 'ipv4' #set for all WANs

I am running a custom image, no IPv6.

I just emailed @feckert about it, as I'm seeing this too. My otherwise idle APU2C4 is reaching a loadavg of 1 to 1.3.

Bug was introduced here:

and is fixed here:

Great! I'll test again when it's merged (this is a production system, so I won't risk doing my own builds for it; running master snapshots is already risky enough).

I do custom builds. How can I see in advance, before doing a new build, whether this update of the mwan3 package is available in the feeds?

There will be a version bump to 2.10.2
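One way to check this before building is to refresh the feed and read the version from the package Makefile. The sketch below assumes a standard buildroot checkout in which the packages feed ends up under feeds/packages and mwan3 lives in net/mwan3; the helper name is made up for illustration.

```shell
# pkg_version MAKEFILE: print the value assigned to PKG_VERSION in an
# OpenWrt package Makefile (hypothetical helper).
pkg_version() {
    sed -n 's/^PKG_VERSION:=[[:space:]]*//p' "$1"
}

# Possible usage from the top of the buildroot (paths are assumptions):
#   ./scripts/feeds update packages
#   pkg_version feeds/packages/net/mwan3/Makefile
```

If this prints 2.10.2 or later, the bumped package should be what the next build picks up.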

The CPU load issue is fixed, but now mwan3 status says there's an error on both interfaces. Everything seems to work, though.

root@apu:~# mwan3 status
Interface status:
 interface meo is error and tracking is active
 interface nos is error and tracking is active

I can do better: A lot more error messages ....
But no more CPU-hog.

mwan3 status
Interface status:
interface wan is offline and tracking is paused
Command failed: Not found
Failed to parse message data
WARNING: Variable 'interfaces' does not exist or is not an array/object
WARNING: Variable 'wwan' does not exist or is not an array/object
interface wwan is online 00h:00m:00s, uptime 00h:00m:00s and tracking is active
Command failed: Not found
Failed to parse message data
WARNING: Variable 'interfaces' does not exist or is not an array/object
WARNING: Variable 'wifiwan' does not exist or is not an array/object
interface wifiwan is online 00h:00m:00s, uptime 00h:00m:00s and tracking is active

Current ipv4 policies:
standard:
wifiwan (100%)

And some wrong info:
Interface status:
interface wan is offline and tracking is paused
Command failed: Not found
Failed to parse message data
WARNING: Variable 'interfaces' does not exist or is not an array/object
WARNING: Variable 'wwan' does not exist or is not an array/object
interface wwan is online 00h:00m:00s, uptime 00h:00m:00s and tracking is active
interface wifiwan is offline and tracking is paused

Current ipv4 policies:
standard:
wwan (100%)

@rsalvaterra - I'm not sure what is going on there. I can't reproduce the issue, but @feckert can, so hopefully we can figure this out. https://github.com/openwrt/packages/pull/13881#issuecomment-724054096.

That message happens when one of the conditions here is not met:

Could you figure out which one is failing (missing ip rule, missing default route, or missing iptables mwan3_iface_in_ chain) and post it here?
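The three conditions can be checked by hand roughly as follows. This is a sketch, not the actual test from the mwan3 script: the helper names are made up, the table number and interface name in the usage comments are examples, and it assumes the mwan3_iface_in_ chains live in the mangle table.

```shell
# has_fwmark_rule TABLE: succeed if the `ip rule` output on stdin contains
# a fwmark rule that looks up TABLE (hypothetical helper).
has_fwmark_rule() {
    grep -Eq "fwmark .* lookup $1( |\$)"
}

# has_default_route: succeed if the `ip route` output on stdin contains a
# default route (hypothetical helper).
has_default_route() {
    grep -q '^default '
}

# Possible usage for the interface behind table 1 (names are examples):
#   ip -4 rule list | has_fwmark_rule 1 || echo "missing ip rule"
#   ip -4 route list table 1 | has_default_route || echo "missing default route"
#   iptables -t mangle -S mwan3_iface_in_meo >/dev/null 2>&1 || echo "missing chain"
```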

@reinerotto - How did you install the package? It looks like /usr/libexec/rpcd/mwan3 does not have the +x bit set. Can you run

chmod u+x /usr/libexec/rpcd/mwan3
/etc/init.d/rpcd restart

I did not install any package. As I wrote, I do custom builds, and all packages are included in the image directly. There is no LuCI, which might otherwise pull in some modules required by mwan3. The permissions are already correct on my system:

ls -l /usr/libexec/rpcd/mwan3
-rwxr-xr-x    1 root     root          5704 Nov  9 13:06 /usr/libexec/rpcd/mwan3

BUT

/etc/init.d/rpcd restart
-ash: /etc/init.d/rpcd: not found
root@:~# which rpcd
root@~# cd /
root@# find -name rpcd
./overlay/upper/usr/libexec/rpcd
./rom/usr/libexec/rpcd
./usr/libexec/rpcd

So it looks like some dependencies are missing when building mwan3. I have seen that LuCI uses quite a lot of rpcd functionality, which mwan3 could silently require. In "make menuconfig" there are quite a few options for rpcd that you might add to the dependencies of mwan3.
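A quick way to confirm a suspicion like this on a running image is to test whether the helper programs are on the PATH at all. The function below is a hypothetical sketch; which tools mwan3 actually needs is an assumption based on the rpcd and ubus mentions in this thread.

```shell
# missing_tools NAME...: print each named command that cannot be found
# on the PATH, one per line (hypothetical helper).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Possible usage on the router:
#   missing_tools rpcd ubus
```

No output would mean that every listed tool is present.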

Both default routes and mwan3_iface_in_<DEVICE> chains are present. I noticed something strange in the policy routing database, though. At boot, this is the output from ip rule:

0:	from all lookup local
1001:	from all iif eth1 lookup 1
1002:	from all iif eth2 lookup 2
2001:	from all fwmark 0x100/0x3f00 lookup 1
2002:	from all fwmark 0x200/0x3f00 lookup 2
3001:	from all fwmark 0x100/0x3f00 unreachable
3002:	from all fwmark 0x200/0x3f00 unreachable
32766:	from all lookup main
32767:	from all lookup default

… however, after a mwan3 stop-start cycle, I have two additional rules:

2061:	from all fwmark 0x3d00/0x3f00 blackhole
2062:	from all fwmark 0x3e00/0x3f00 unreachable

For some reason, these blackhole and unreachable rules aren't being loaded at boot. Aside from what I had already told @feckert yesterday, this is the only additional oddness I found.
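A comparison like this can be repeated by saving the ip rule output before and after the restart and printing only the lines that are new. This is a minimal sketch; the helper name and the file paths in the usage comments are made up.

```shell
# new_rules BEFORE AFTER: print the lines of file AFTER that do not appear
# verbatim in file BEFORE (hypothetical helper).
new_rules() {
    grep -vxF -f "$1" "$2"
}

# Possible usage (file names are examples):
#   ip rule > /tmp/rules.boot
#   mwan3 stop; mwan3 start
#   ip rule > /tmp/rules.after
#   new_rules /tmp/rules.boot /tmp/rules.after
```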

I have to add that I cannot access the web from my PC, which is connected to my router running mwan3 2.10.2. The router itself can access the web, via the wifi connection or 3g.
Pinging the router itself works, but that's the end.
I do not have these problems running mwan3 2.9.0.

@reinerotto rpcd is just needed to report uptime on the status lines, and seems to have been added in b0acbf057e. Could you file a bug report for this? We can move the uptime checks to a common function and remove the ubus calls.

The version on the master branch has some bugs right now, including the missing unreachable and blackhole routes. Can you try this P/R and see if your connections work?

@reinerotto
Could you also try the aforementioned P/R? I added some error codes to mwan3 interfaces to help diagnose problems with the interfaces.

Actually, I did a new custom build from trunk, but including mwan3 2.9.0 instead of 2.10.2.
And I explicitly included package rpcd (without options). And, voila,
no more error messages:

mwan3 status
Interface status:
interface wan is offline and tracking is down
interface wwan is online 03h:42m:10s, uptime 03h:42m:12s and tracking is active
interface wifiwan is online 03h:42m:43s, uptime 03h:42m:45s and tracking is active

Current ipv4 policies:
standard:
wifiwan (100%)

Which looks fine to me.

So, my suspicion was correct, that there is a missing dependency to rpcd.

Having made the experience that (my?) bug reports are taken care of months (at least) after submission, I will refrain from doing so.
It might be much more effective if you link Florian to this post.

I've been actively working on maintaining mwan3, and having issues in the tracker helps me consolidate and prioritize them. It sounds like you have a workaround though, so maybe this is not important.

2.9.0 is the version on the 19.07 branch, so will be more stable while we are working out the bugs in 2.10.0 on the master branch prior to its inclusion in the 20.x release.

However, there is an issue with 2.9.0 on the 5.x kernel: if any of your rules have a fallback to "unreachable", then interfaces may not come back online after they go down.
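To see whether a configuration is affected, the policies that fall back to "unreachable" can be listed from the UCI config. This sketch assumes that the fallback mentioned here corresponds to the last_resort option of mwan3 policy sections; the helper name is made up.

```shell
# unreachable_policies: read `uci show mwan3` output on stdin and print the
# names of sections whose last_resort is 'unreachable' (hypothetical helper;
# the mapping of "fallback" to last_resort is an assumption).
unreachable_policies() {
    sed -n "s/^mwan3\.\([^.]*\)\.last_resort='unreachable'\$/\1/p"
}

# Possible usage on the router:
#   uci show mwan3 | unreachable_policies
```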

I submitted a bug report for my issue. It is also present in trunk.

Thanks. I’ll take a look once we get the regression bugs in master ironed out.

I don't think version 2.9.0 ever made it into 19.07; that branch is currently at 2.8.14, which predates the optimisations and startup improvements that came in 2.9.0 and above.

Thanks for pointing that out. While the latest 2.10.3 has most of the bugs sorted out, 2.8.14 should be considered the "stable" version until OpenWrt 20.x is released.

However, if you have rules with a fallback to "unreachable", then 2.8.14 will have issues on the 5.x kernel.

@rsalvaterra - the P/R to fix the CPU issue has been merged in to snapshot, so let me know if that fixes the issue.

Hi, @aaronjg. Yeah, the CPU hogging issue is fixed. I still have issues when invoking mwan3 status, though:

Interface status:
Command failed: Not found
Failed to parse message data
WARNING: Variable 'interfaces' does not exist or is not an array/object
WARNING: Variable 'meo' does not exist or is not an array/object
 interface meo is online 00h:00m:00s, uptime 00h:00m:00s and tracking is active
Command failed: Not found
Failed to parse message data
WARNING: Variable 'interfaces' does not exist or is not an array/object
WARNING: Variable 'nos' does not exist or is not an array/object
 interface nos is online 00h:00m:00s, uptime 00h:00m:00s and tracking is active

And also ip errors in the log:

Thu Nov 12 10:21:59 2020 user.warn mwan3rtmon[2253]: failed: 'ip -4 route replace table 1 62.28.38.138 dev eth1 proto static scope link metric 10 linkdown '
Thu Nov 12 10:21:59 2020 user.warn mwan3rtmon[2253]: failed: 'ip -4 route replace table 1 100.64.194.112/30 dev eth1 proto kernel scope link src 100.64.194.114 linkdown '
Thu Nov 12 10:21:59 2020 user.warn mwan3rtmon[2253]: failed: 'ip -4 route replace table 1 100.64.194.112/30 dev eth1 proto static scope link metric 10 linkdown '
Thu Nov 12 10:22:02 2020 user.warn mwan3-hotplug[3927]: failed to add 192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.1 linkdown to table 1
Thu Nov 12 10:22:04 2020 user.warn mwan3rtmon[2253]: failed: 'ip -4 route replace table 1 default via 100.64.194.113 dev eth1 proto static metric 10 linkdown '
Thu Nov 12 10:22:04 2020 user.warn mwan3rtmon[2253]: failed: 'ip -4 route replace table 2 89.152.251.72/29 dev eth2 proto kernel scope link src 89.152.251.76 linkdown '
Thu Nov 12 10:22:05 2020 user.warn mwan3rtmon[2253]: failed: 'ip -4 route replace table 2 89.152.251.72/29 dev eth2 proto static scope link metric 20 linkdown '
Thu Nov 12 10:22:05 2020 user.warn mwan3rtmon[2253]: failed: 'ip -4 route replace table 2 default via 89.152.251.78 dev eth2 proto static metric 20 linkdown '

Otherwise, it seems to be working.
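To see at a glance which routing tables are affected, the mwan3rtmon failures in a log like the one above can be counted per table. This is a sketch that only assumes the message format shown in this post; the helper name is made up.

```shell
# count_failed_by_table: read log lines on stdin and count the failed
# mwan3rtmon route commands per routing table (hypothetical helper).
count_failed_by_table() {
    awk '/mwan3rtmon.*failed:/ {
             # The table number is the word that follows "table".
             for (i = 1; i < NF; i++)
                 if ($i == "table") { count[$(i + 1)]++; break }
         }
         END { for (t in count) print "table " t ": " count[t] }' | sort
}

# Possible usage on the router:
#   logread | count_failed_by_table
```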