[solved] Logd runs out of memory because of unbound when wan goes down

aboaboit · November 19, 2020, 2:38pm

Context:

OpenWrt 19.07.3 on a FRITZ!Box 4040 with an external VDSL modem
unbound set up in parallel mode
tun0 created by openvpn when it starts (that is, nothing in /etc/config/network)
he.net tunnel with a /48 split across VLANs and the VPN (each gets a /64)

Now the fun part: when the modem loses the connection for more than a few moments, unbound complains loudly that it cannot reach upstream servers over the only IPv6 interface still up, which is of course the VPN endpoint leading nowhere. After a short while the system runs out of memory and does not recover by itself: it must be power-cycled once the modem is back in sync.

I thought of shutting down the VPN interface and openvpn itself when wan goes down and restarting them later:

is that a good idea? Do you have a better one?
can I use something along the lines of this script?
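
A rough sketch of that idea (not the script referred to above), assuming the logical interface is named `wan` and OpenVPN is controlled through `/etc/init.d/openvpn`. The file name `/etc/hotplug.d/iface/99-openvpn-wan` and the `OPENVPN_INIT` and `WAN_IFACE` variables are made up for illustration:

```shell
#!/bin/sh
# Sketch of /etc/hotplug.d/iface/99-openvpn-wan (hypothetical file name).
# Interface hotplug scripts are run with ACTION and INTERFACE in the environment.

OPENVPN_INIT="${OPENVPN_INIT:-/etc/init.d/openvpn}"
WAN_IFACE="${WAN_IFACE:-wan}"   # assumption: the uplink's logical name is "wan"

handle_iface_event() {
    # $1 = ACTION (ifup/ifdown), $2 = INTERFACE (logical name)
    [ "$2" = "$WAN_IFACE" ] || return 0     # ignore events for other interfaces
    case "$1" in
        ifdown) "$OPENVPN_INIT" stop ;;     # wan gone: stop openvpn, tun0 goes with it
        ifup)   "$OPENVPN_INIT" start ;;    # wan back: bring the VPN up again
    esac
}

handle_iface_event "$ACTION" "$INTERFACE"
```

Since tun0 is created by openvpn and not defined in /etc/config/network, stopping the service is the only handle this sketch uses to take the VPN interface down.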

aboaboit · November 20, 2020, 5:45pm

Nope, the situation is way worse than I thought: I ran "logread -f" in one terminal and "ifdown wan" in the other. I could see the notification from the tunnel interface, and that was the end of it. A moment later ssh stopped responding and I had to restart the router.

Perhaps I should look for a way to prevent unbound from using the VPN interface? Or just force it to use IPv4 for lookups?
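
For the IPv4-only variant, a minimal sketch, assuming the OpenWrt unbound package is configured through UCI, that its `protocol` option accepts `ip4_only`, and that the section is the first anonymous `unbound` section. None of this was tested in this thread, and the function name is made up:

```shell
#!/bin/sh
# Sketch: ask the UCI-managed unbound to use IPv4 only.
# Assumption: /etc/config/unbound has a "config unbound" section at index 0.

unbound_ipv4_only() {
    uci set "unbound.@unbound[0].protocol=ip4_only"
    uci commit unbound
}

# Usage on the router:
#   unbound_ipv4_only && /etc/init.d/unbound restart
```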

vgaetera · November 20, 2020, 7:32pm

This behavior is certainly abnormal.
I guess you should try to contact the package maintainer:

aboaboit · November 20, 2020, 7:45pm

If I stop the vpn before the wan, the increase in overall memory consumption is much slower but still measurable.

Actually, I'm not so sure which maintainer: the process consuming an ungodly amount of RAM is "logd", not "unbound". My bad, I assumed instead of checking. Fixing the title....

Restarting logd brings available memory back to normal levels.
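
For anyone wanting to watch this, a small sketch that reads logd's resident memory from /proc and restarts the log service above a limit. The 20000 kB default and the `LOGD_LIMIT_KB` variable are arbitrary placeholders, and this only works around the symptom:

```shell
#!/bin/sh
# Sketch: check logd's resident set size and restart the log service if it grows.

rss_kb() {
    # $1 = path to a /proc/<pid>/status file; prints the VmRSS value in kB
    awk '/^VmRSS:/ { print $2 }' "$1"
}

check_logd() {
    pid=$(pidof logd) || return 0           # logd not running: nothing to do
    rss=$(rss_kb "/proc/$pid/status")
    # Placeholder limit, override with LOGD_LIMIT_KB
    [ "${rss:-0}" -gt "${LOGD_LIMIT_KB:-20000}" ] && /etc/init.d/log restart
    return 0
}

# Usage on the router, e.g. from cron:
#   check_logd
```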

vgaetera · November 20, 2020, 7:50pm

ubus call system board; uci show system

aboaboit · November 20, 2020, 7:57pm

{
        "kernel": "4.14.180",
        "hostname": "router-casa-andrea",
        "system": "ARMv7 Processor rev 5 (v7l)",
        "model": "AVM FRITZ!Box 4040",
        "board_name": "avm,fritzbox-4040",
        "release": {
                "distribution": "OpenWrt",
                "version": "19.07.3",
                "revision": "r11063-85e04e9f46",
                "target": "ipq40xx/generic",
                "description": "OpenWrt 19.07.3 r11063-85e04e9f46"
        }
}
system.@system[0]=system
system.@system[0].hostname='router-casa-andrea'
system.@system[0].ttylogin='0'
system.@system[0].log_size='64'
system.@system[0].urandom_seed='0'
system.@system[0].zonename='UTC'
system.@system[0].timezone='GMT0'
system.@system[0].log_proto='udp'
system.@system[0].conloglevel='8'
system.@system[0].cronloglevel='8'
system.@system[0].log_file='/mnt/sda1/log/messages'
system.ntp=timeserver
system.ntp.server='193.204.114.232' '193.204.114.233'
system.ntp.enable_server='1'
system.led_wlan=led
system.led_wlan.name='WLAN'
system.led_wlan.sysfs='fritz4040:green:wlan'
system.led_wlan.trigger='phy0tpt'
system.led_wan=led
system.led_wan.name='WAN'
system.led_wan.sysfs='fritz4040:green:wan'
system.led_wan.trigger='netdev'
system.led_wan.mode='link tx rx'
system.led_wan.dev='eth1'
system.led_lan=led
system.led_lan.name='LAN'
system.led_lan.sysfs='fritz4040:green:lan'
system.led_lan.trigger='switch0'
system.led_lan.port_mask='0x1e'
system.@led[3]=led
system.@led[3].dev='6in4-he_1_nyc'
system.@led[3].sysfs='fritz4040:amber:info'
system.@led[3].default='0'
system.@led[3].trigger='netdev'
system.@led[3].mode='tx' 'rx'
system.@led[3].name='IPV6-tunnel'

vgaetera · November 20, 2020, 7:59pm

Consider upgrading to OpenWrt 19.07.4 and try disabling this for testing:

aboaboit · November 20, 2020, 8:05pm

19.07.4 is cursed for the 4040, no go.

What should I disable exactly? The UDP protocol specification for an external server shouldn't be an issue because there is no server defined.

I see I've left the loglevel set to "debug", this might not be a terribly smart move... what do you say?

vgaetera · November 20, 2020, 8:13pm

Check if the issue persists with default logging settings.

aboaboit · November 21, 2020, 11:26am

I see I have "conloglevel" which may be ineffective:

        option conloglevel '8'                  
        option cronloglevel '8'

Comparing the defaults with mine, the most notable difference is the physical log file on a pendrive. Removing that and rebooting prevents the problem when wan is brought down: the memory usage of logd remains stable.

My guess is that the pendrive cannot keep up and the write buffer in logd blows up.
I guess I really have to set up remote logging on either my Raspberry Pi or my NAS.
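
A sketch of what that switch could look like with the standard `log_file`, `log_ip`, `log_port` and `log_proto` options of the system config. The address 192.168.1.10, the port 514 and the function name are placeholders, not values from my setup:

```shell
#!/bin/sh
# Sketch: drop the local log file and send logs to a remote syslog receiver.

use_remote_log() {
    # $1 = IP of the syslog receiver, $2 = port (514 assumed if omitted)
    uci -q delete "system.@system[0].log_file"      # stop writing to the pendrive
    uci set "system.@system[0].log_ip=$1"
    uci set "system.@system[0].log_port=${2:-514}"
    uci set "system.@system[0].log_proto=udp"
    uci commit system
}

# Usage on the router (placeholder address):
#   use_remote_log 192.168.1.10 && /etc/init.d/log restart
```

The receiver on the other end has to accept syslog over UDP on that port; setting that up is outside this sketch.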
