Bizzare issues with SQM + Openvpn

Having a somewhat strange issue with openwrt + sqm + openvpn.

Running an open vpn tunnel on a client behind the router (not on the router), and running SQM on the router. The router is running 19.07.3. SQM is configured with piece_of_cake.qos as recommended with all the defaults except setting the downstream and upstream badnwidth. SQM itself is working great.

The router is a DIR-825 and has 64Mb of memory. When not under any particular load, it has around 34Mb free, so it's not like it's exactly memory constrained.

The VPN under low usage also works fine. If I do something that puts the whole thing under high load all hell breaks loose. On the VPN client, I see lots of:

openvpn[96884]: Authenticate/Decrypt packet error: bad packet ID (may be a replay): [ #83487 ] -- see the man page entry for --no-replay and --replay-window for more info or silence this warning with --mute-replay-warnings

On the router, memory usage starts going up and up, chews through all of the available memory and eventually the OOM killer comes in kills everything and the router reboots.

Various notes of wierdness:

  1. Disabling SQM removes the problem (both the duplicate messages and the memory usage), though of course leaves me with a laggy network. But it's stable under load and the router doesn't crash.

  2. Switching openvpn from UDP transport to TCP transport also fixes the issue (the workaround I have gone with for now).

So:
openvpn-udp + SQM = boom
openvpn-tcp + SQM = no problem
openvpn-udp + No SQM = no problem
openvpn-tcp + no SQM = not tested, but presumably no problem

No idea what to do with this, anyone have any ideas or want any more info? I can reproduce it reliably.

How is cpu utilization during all three scenarios?

65% idle, doesn't seem to be under particularly high CPU load, just memory.
That's under the failure scenario, I didn't look at the CPU load on the working scenarios, since, well it works.

The CPU of the 825 isn't state of the art, but since it is just forwarding packets without encrypting them, it should be fine. Do you experience the same if you try to download/upload something big without OpenVPN from the same host? UDP and TCP.
What is the internet connection speed and what have you configured in sqm? uci export sqm

Yeah I don't think it's CPU related, it seems even under heavy traffic to be mostly idle.
The connection is only 16mbit down and 2mbit up, so really is shouldn't even be straining that CPU at all (and it's not). UCI output:

package sqm

config queue 'eth1'
option interface 'eth1'
option enabled '1'
option download '16000'
option upload '2000'
option debug_logging '0'
option verbosity '5'
option qdisc 'cake'
option script 'piece_of_cake.qos'
option linklayer 'ethernet'
option overhead '22'
option qdisc_advanced '0'

As for testing without openvpn, I tried torrenting a linux ISO (no idea whether that was using TCP or UDP possibly a mix), and yes, with SQM on it blew up. With SQM off it was fine. It was also fine doing a direct HTTPS download at the same bit rate.

So I guess that removes openvpn as a component and we're down to:

SQM + bittorrent = boom
SQM + http download = no boom
No SQM = no boom

The SQM doesn't seem to be adjusted properly. You are not supposed to configure the raw internet speed in the upload/download options. Also did you take in consideration the overhead value or is it also random?
Read here for more details how to configure properly and some basic troubleshooting.

While those settings are far from optimal, it should only negatively affect his latency or speed. It shouldn't cause SQM to break down. So we should probably fix those breakdown issues first before we optimize overhead and speed settings :slight_smile:

SQM was configured using Luci (which is what the page you linked me recommends you do). That same page also discusses setting the upload and download speed, which is what I did. If you mean what I set them to the actual connection is like a 20/2.5 s, so the limits of 16/2 seem appropriate. In any case if I set them lower to say 14/1.5 (which I tried at one point) it still blows up.

As for the overhead, it's a cable modem connection so I set the type to Ethernet, and the overhead to 22. Which is also what the linked page says to do.

For the basic troubleshooting on that page, most of it is just generic information commands (like ifstatus wan isn't going to change during this test). Catting the sqm config file is just going to show you the uci output I already pasted.

Logread shows nothing printed to syslog. The debug version of starting SQM shows it set it up (which it seems to do correctly), but this isn't a problem with it trying to configure the interface, just what happens once it's up. tc -d qdisc shows it appear to be setup correctly (the qdiscs are all where I'd expect to seem them based on where the SQM commands put them.

tc -s qdisc shows the ingress qdisc using a total of 204k of memory with a backlog of eighteen packets with a total of 17k of space. So that doesn't seem to be eating 30+Mb of memory.

I agree stopping it from exploding is far more important than getting ideal latency, though I'm curious what is far from optimal from those settings. When it's not exploding a huge ball of fire, it actually seems to perform quite well. I can have an HTTP download pegging my connection and still have decent latency in games.

Without SQM if there's something pegging the connection like that, my ISP's buffers helpfully add about 10 seconds of latency....

So SQM itself seems very nice ... if it could avoid the whole running the router out of memory and exploding thing.

Have you tried fq_codel with simple_qos or simplest_qos? Do these cause SQM to blow up as well? I guess it might just simply be too little memory for what your asking from the device, but I am not entirely sure on that.

Would be interesting to see the output of 'tc -s qdisc' from before a torrent test, and from an ongoing test (obviously before it goes bad). I wonder whether there is a crashlog cat /sys/kernel/debug/crashlog visible after the crash?

I tried it with simplest.qos just now and it still blew up.

I'll note from looking at the output of "tc disc" that simplest.qos seems to have the exact same configuration for ingress that piece_of_cake.qos did. The only thing changing the script it was change the egress qdisc, the ingress is the same.

As for not being enough memory? I dunno maybe. Seems strange though that cake is configured for "using a maximum of 4Mb" and tc -s qdisc shows it's doing just that, and yet SOMETHING is eating 35Mb of memory.

I can post the full "tc -s" output if someone wants to tell me how to use spoiler tags or something similar, since if I just post them they are, well long. But the relevant portions as far as memory usage go:

Idle:

backlog 0b 0p requeues 0
memory used: 271808b of 4Mb

Will crash in 5 seconds if I don't kill download:

backlog 18081b 18p requeues 0
memory used: 214272b of 4Mb

As for the crash log, I don't have one handy right now, I'd have to actually let it die (for these tests I've been saving it before it crashes (still with ~25Mb of memory chewed up), but I looked at one before. It just shows the OOM killer kicking in, not finding any particularly large process, killing one anyway, then rinse/repeat, until everything is dead and it reboots.

If you think the output might be useful I can let it explode and get one.

Yes, please do! Both of them. To hide these in the outout simply click the cog icon on the top right of the editor window and select "Hide Details" then add your text. Please frame your pasted tc output with a line only containing three backticks "```" to get fixed font formatting.

Summary
This text will be hidden, and also formatted with a fixed width font, suitable for most console output...

Also It would be good to repeat the test while you are logged into your router and run top -d 1 and look at the % idle and note the minima you see (yes, that is not a terrible precise measurement, but it might tell us something).

With top running the lowest idle I saw was 41% during the "eat all memory" phase. Once the OOM killer started its spree top did get off one last gasp showing 19% idle. Though by then there were only 5 processes left on the poor router at all so I suspect some of that CPU was eaten by the OOM killer itself.

tc -s qdisc while idle
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
 Sent 19167728250 bytes 20449609 pkt (dropped 0, overlimits 0 requeues 5)
 backlog 0b 0p requeues 5
  maxpacket 1514 drop_overlimit 0 new_flow_count 5 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc cake 8021: dev eth1 root refcnt 2 bandwidth 2Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms noatm overhead 22
 Sent 9411450 bytes 31028 pkt (dropped 54, overlimits 34679 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 94848b of 4Mb
 capacity estimate: 2Mbit
 min/max network layer size:           28 /    1500
 min/max overhead-adjusted size:       50 /    1522
 average network hdr offset:           14

                  Tin 0
  thresh          2Mbit
  target          9.1ms
  interval      104.1ms
  pk_delay       25.8ms
  av_delay        2.4ms
  sp_delay        520us
  backlog            0b
  pkts            31082
  bytes         9482252
  way_inds          221
  way_miss         1592
  way_cols            0
  drops              54
  marks               0
  ack_drop            0
  sp_flows            1
  bk_flows            1
  un_flows            0
  max_len          1514
  quantum           300

qdisc ingress ffff: dev eth1 parent ffff:fff1 ----------------
 Sent 66825905 bytes 118354 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-lan root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.2 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.3 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.4 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 8022: dev ifb4eth1 root refcnt 2 bandwidth 16Mbit besteffort triple-isolate nonat wash no-ack-filter split-gso rtt 100.0ms noatm overhead 22
 Sent 67817501 bytes 117881 pkt (dropped 473, overlimits 93810 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 367040b of 4Mb
 capacity estimate: 16Mbit
 min/max network layer size:           46 /    1500
 min/max overhead-adjusted size:       68 /    1522
 average network hdr offset:           14

                  Tin 0
  thresh         16Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay        4.0ms
  av_delay        290us
  backlog            0b
  pkts           118354
  bytes        68482861
  way_inds          269
  way_miss         1650
  way_cols            0
  drops             473
  marks               0
  ack_drop            0
  sp_flows            1
  bk_flows            1
  un_flows            0
  max_len          1514
  quantum           488

qdisc fq_codel 0: dev vnet root refcnt 2 limit 10240p flows 1024 quantum 1500 ta
rget 5.0ms interval 100.0ms memory_limit 4Mb ecn
 Sent 9208 bytes 110 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
tc -s qdisc shortly before OOM
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
 Sent 19221124388 bytes 20502322 pkt (dropped 0, overlimits 0 requeues 5)
 backlog 0b 0p requeues 5
  maxpacket 1514 drop_overlimit 0 new_flow_count 5 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc cake 8021: dev eth1 root refcnt 2 bandwidth 2Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms noatm overhead 22
 Sent 17427952 bytes 94248 pkt (dropped 122, overlimits 157626 requeues 0)
 backlog 67b 1p requeues 0
 memory used: 236096b of 4Mb
 capacity estimate: 2Mbit
 min/max network layer size:           28 /    1500
 min/max overhead-adjusted size:       50 /    1522
 average network hdr offset:           14

                  Tin 0
  thresh          2Mbit
  target          9.1ms
  interval      104.1ms
  pk_delay       23.4ms
  av_delay        5.5ms
  sp_delay         25us
  backlog           67b
  pkts            94371
  bytes        17537650
  way_inds         4164
  way_miss         2922
  way_cols            0
  drops             122
  marks               0
  ack_drop            0
  sp_flows          128
  bk_flows            1
  un_flows            0
  max_len          1514
  quantum           300

qdisc ingress ffff: dev eth1 parent ffff:fff1 ----------------
 Sent 152556872 bytes 208683 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-lan root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.2 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.3 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.4 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 8022: dev ifb4eth1 root refcnt 2 bandwidth 16Mbit besteffort triple-isolate nonat wash no-ack-filter split-gso rtt 100.0ms noatm overhead 22
 Sent 148229609 bytes 202253 pkt (dropped 6278, overlimits 243078 requeues 0)
 backlog 196815b 153p requeues 0
 memory used: 531712b of 4Mb
 capacity estimate: 16Mbit
 min/max network layer size:           46 /    1500
 min/max overhead-adjusted size:       68 /    1522
 average network hdr offset:           14

                  Tin 0
  thresh         16Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay      532.3ms
  av_delay       35.0ms
  sp_delay        993us
  backlog       196815b
  pkts           208684
  bytes       155479901
  way_inds         6134
  way_miss         2631
  way_cols            0
  drops            6278
  marks               0
  ack_drop            0
  sp_flows           98
  bk_flows           41
  un_flows            0
  max_len          1514
  quantum           488

As for the crash log, now I can't get it to reboot. It'll use up so much memory the OOM killer trashes half the processes on the router and things like logread start failing, but it sort of limps along half dead without rebooting now, not that that's much better. But as a substitute, here's some OOM killer gobblygook (which is mostly what the crashlog was the time I saw one).

And by here I mean in next post because there seems to be a character limit.

What I mean is:

"```"

your pasted tc output

"```"

Just without the double-quotes, I only use these so that the backticks are visible :wink:

Alternitively, paste your text, select it with the mouse and click the </> icon in the tool bar.

OOM killer trashing the place
Thu Sep 17 11:57:28 2020 kern.warn kernel: [  150.017683] procd invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.034444] CPU: 0 PID: 1 Comm: procd Not tainted 4.14.180 #0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.042085] Stack : 00000000 800b2c04 80500000 804b05e0 00000000 00000000 00000000 00000000
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.050450]         00000000 00000000 00000000 00000000 00000000 00000001 83825b30 96eaea08
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.062032]         83825bc8 00000000 00000000 00004d80 00000038 80448a38 00000008 00000000
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.070392]         00000104 d5e15692 00000103 00000000 83825b10 80000000 83825d20 80489120
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.083144]         00000004 00200000 ffffffff 00000010 00000000 802802a4 00000000 80630000
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.091510]         ...
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.097567] Call Trace:
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.097590] [<800b2c04>] 0x800b2c04
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.107576] [<80500000>] 0x80500000
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.111120] [<80448a38>] 0x80448a38
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.117913] [<802802a4>] 0x802802a4
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.121422] [<8006a56c>] 0x8006a56c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.127920] [<8006a574>] 0x8006a574
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.133364] [<800f66bc>] 0x800f66bc
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.136879] [<800f5994>] 0x800f5994
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.140378] [<800f655c>] 0x800f655c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.146881] [<800faa60>] 0x800faa60
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.150430] [<800f12bc>] 0x800f12bc
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.156892] [<800f326c>] 0x800f326c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.160396] [<8011d87c>] 0x8011d87c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.166928] [<80119dc4>] 0x80119dc4
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.170456] [<8011e084>] 0x8011e084
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.176657] [<800bf360>] 0x800bf360
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.180231] [<80071800>] 0x80071800
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.186188] [<800a8388>] 0x800a8388
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.189697] [<800a8a8c>] 0x800a8a8c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.195423] [<800bce24>] 0x800bce24
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.198959] [<8044a9ac>] 0x8044a9ac
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.204198] [<80185c28>] 0x80185c28
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.207734] [<8007717c>] 0x8007717c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.211228]
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.215203] Mem-Info:
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.217499] active_anon:495 inactive_anon:14 isolated_anon:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.217499]  active_file:86 inactive_file:131 isolated_file:8
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.217499]  unevictable:0 dirty:0 writeback:0 unstable:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.217499]  slab_reclaimable:263 slab_unreclaimable:1718
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.217499]  mapped:7 shmem:30 pagetables:80 bounce:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.217499]  free:1961 free_pcp:3 free_cma:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.252765] Node 0 active_anon:1980kB inactive_anon:56kB active_file:344kB inactive_file:524kB unevictable:0kB isolated(anon):0kB isolated(file):32kB mapped:28kB dirty:0kB writeback:0kB shmem:120kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.279769] Normal free:8268kB min:8192kB low:10240kB high:12288kB active_anon:1980kB inactive_anon:56kB active_file:344kB inactive_file:524kB unevictable:0kB writepending:0kB present:65536kB managed:59552kB mlocked:0kB kernel_stack:392kB pagetables:320kB bounce:0kB free_pcp:12kB local_pcp:12kB free_cma:0kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.311204] lowmem_reserve[]: 0 0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.318366] Normal: 223*4kB (U) 148*8kB (UM) 121*16kB (UM) 30*32kB (UM) 18*64kB (UM) 7*128kB (UM) 3*256kB (U) 1*512kB (U) 0*1024kB 0*2048kB 0*4096kB = 8300kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.337424] 255 total pagecache pages
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.341086] 0 pages in swap cache
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.348263] Swap cache stats: add 0, delete 0, find 0/0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.357650] Free swap  = 0kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.360533] Total swap = 0kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.367082] 16384 pages RAM
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.369876] 0 pages HighMem/MovableOnly
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  150.378324] 1496 pages reserved
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.381462] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.393716] [  664]     0   664      305       19       3       0        0             0 ubusd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.405481] [  680]     0   680      231        9       4       0        0             0 askfirst
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.416532] [  697]     0   697      257       12       4       0        0             0 urngd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.427553] [ 1076]     0  1076      312       27       3       0        0             0 logd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.440295] [ 1107]     0  1107      509       41       4       0        0             0 rpcd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.450723] [ 1199]     0  1199      270       11       4       0        0             0 dropbear
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.461819] [ 1256]     0  1256      437       44       5       0        0             0 netifd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.470507] [ 1735]     0  1735      329       19       4       0        0             0 uhttpd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.482890] [ 2077]     0  2077      303       10       4       0        0             0 udhcpc
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.491583] [ 2174]     0  2174      303       10       3       0        0             0 sh
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.503284] [ 2180]     0  2180      526       68       4       0        0             0 log-journal-upl
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.517517] [ 2182]     0  2182      342       22       4       0        0             0 logread
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.529796] [ 2191]     0  2191      445       35       5       0        0             0 hostapd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.541009] [ 2307]     0  2307      304       12       4       0        0             0 ntpd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.551783] [ 2866]     0  2866      358       66       5       0        0             0 sh
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.560126] [ 3054]   453  3054      339       28       4       0        0             0 dnsmasq
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.572943] [ 3160]     0  3160      304       11       4       0        0             0 sleep
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.581548] [ 3161]     0  3161      281       20       4       0        0             0 dropbear
Thu Sep 17 11:57:29 2020 kern.info kernel: [  150.593833] [ 3162]     0  3162      305       12       4       0        0             0 ash
Thu Sep 17 11:57:29 2020 kern.err kernel: [  150.605713] Out of memory: Kill process 2180 (log-journal-upl) score 4 or sacrifice child
Thu Sep 17 11:57:29 2020 kern.err kernel: [  150.618373] Killed process 2182 (logread) total-vm:1368kB, anon-rss:80kB, file-rss:4kB, shmem-rss:4kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.355825] log-journal-upl invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.370594] CPU: 0 PID: 2180 Comm: log-journal-upl Not tainted 4.14.180 #0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.382635] Stack : 00000000 800b2c04 80500000 804b05e0 00000000 00000000 00000000 00000000
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.391000]         00000000 00000000 00000000 00000000 00000000 00000001 8211db10 bbeb5ec4
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.402033]         8211dba8 00000000 00000000 000062a8 00000038 80448a38 00000008 00000000
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.410395]         00000149 e0314ba2 00000148 00000000 8211daf0 80000000 8211dd00 80489120
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.422522]         00000004 00200000 ffffffff 00000010 00000001 802802a4 00000000 80630000
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.430890]         ...
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.436836] Call Trace:
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.436858] [<800b2c04>] 0x800b2c04
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.446335] [<80500000>] 0x80500000
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.449826] [<80448a38>] 0x80448a38
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.458325] [<802802a4>] 0x802802a4
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.463064] [<8006a56c>] 0x8006a56c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.467779] [<8006a574>] 0x8006a574
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.472547] [<800f66bc>] 0x800f66bc
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.476035] [<800f5994>] 0x800f5994
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.481829] [<800f655c>] 0x800f655c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.485318] [<800faa60>] 0x800faa60
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.488841] [<800f12bc>] 0x800f12bc
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.498021] [<800f326c>] 0x800f326c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.501543] [<80155898>] 0x80155898
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.506334] [<8322817c>] 0x8322817c [ext4@83200000+0x5cfe0]
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.514482] [<80119dc4>] 0x80119dc4
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.519188] [<8011e084>] 0x8011e084
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.524139] [<80071800>] 0x80071800
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.527635] [<80155a30>] 0x80155a30
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.533245] [<8006b42c>] 0x8006b42c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.538918] [<801539c8>] 0x801539c8
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.544765] [<8007717c>] 0x8007717c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.548262]
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.549742] Mem-Info:
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.554663] active_anon:475 inactive_anon:14 isolated_anon:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.554663]  active_file:50 inactive_file:128 isolated_file:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.554663]  unevictable:0 dirty:0 writeback:0 unstable:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.554663]  slab_reclaimable:261 slab_unreclaimable:1735
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.554663]  mapped:2 shmem:30 pagetables:76 bounce:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.554663]  free:2048 free_pcp:15 free_cma:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.589226] Node 0 active_anon:1900kB inactive_anon:56kB active_file:200kB inactive_file:512kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:8kB dirty:0kB writeback:0kB shmem:120kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.614958] Normal free:8112kB min:8192kB low:10240kB high:12288kB active_anon:1900kB inactive_anon:56kB active_file:200kB inactive_file:512kB unevictable:0kB writepending:0kB present:65536kB managed:59552kB mlocked:0kB kernel_stack:392kB pagetables:304kB bounce:0kB free_pcp:60kB local_pcp:60kB free_cma:0kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.645526] lowmem_reserve[]: 0 0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.648854] Normal: 226*4kB (UM) 153*8kB (UM) 128*16kB (UM) 32*32kB (UM) 13*64kB (UM) 7*128kB (UM) 3*256kB (U) 1*512kB (U) 0*1024kB 0*2048kB 0*4096kB = 8208kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.667193] 208 total pagecache pages
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.670855] 0 pages in swap cache
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.677867] Swap cache stats: add 0, delete 0, find 0/0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.687069] Free swap  = 0kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.689955] Total swap = 0kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.696862] 16384 pages RAM
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.699653] 0 pages HighMem/MovableOnly
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  152.707193] 1496 pages reserved
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.710330] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.722611] [  664]     0   664      305       19       3       0        0             0 ubusd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.731214] [  680]     0   680      231        9       4       0        0             0 askfirst
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.744225] [  697]     0   697      257       12       4       0        0             0 urngd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.756530] [ 1076]     0  1076      312       27       3       0        0             0 logd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.769562] [ 1107]     0  1107      509       41       4       0        0             0 rpcd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.781549] [ 1199]     0  1199      270       11       4       0        0             0 dropbear
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.794200] [ 1256]     0  1256      437       39       5       0        0             0 netifd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.806962] [ 1735]     0  1735      329       19       4       0        0             0 uhttpd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.818879] [ 2077]     0  2077      303       10       4       0        0             0 udhcpc
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.830651] [ 2174]     0  2174      303       10       3       0        0             0 sh
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.842579] [ 2180]     0  2180      526       68       4       0        0             0 log-journal-upl
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.855899] [ 2191]     0  2191      445       35       5       0        0             0 hostapd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.868178] [ 2307]     0  2307      304       12       4       0        0             0 ntpd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.879686] [ 2866]     0  2866      358       66       5       0        0             0 sh
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.891233] [ 3054]   453  3054      339       28       4       0        0             0 dnsmasq
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.902420] [ 3160]     0  3160      304       11       4       0        0             0 sleep
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.911022] [ 3161]     0  3161      281       20       4       0        0             0 dropbear
Thu Sep 17 11:57:29 2020 kern.info kernel: [  152.922797] [ 3162]     0  3162      305       12       4       0        0             0 ash
Thu Sep 17 11:57:29 2020 kern.err kernel: [  152.931219] Out of memory: Kill process 2180 (log-journal-upl) score 4 or sacrifice child
Thu Sep 17 11:57:29 2020 kern.err kernel: [  152.942878] Killed process 2180 (log-journal-upl) total-vm:2104kB, anon-rss:264kB, file-rss:4kB, shmem-rss:4kB
Thu Sep 17 11:57:29 2020 kern.info kernel: [  154.005364] oom_reaper: reaped process 2180 (log-journal-upl), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.750704] ubusd invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.767510] CPU: 0 PID: 664 Comm: ubusd Not tainted 4.14.180 #0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.776409] Stack : 00000000 800b2c04 80500000 804b05e0 00000000 00000000 00000000 00000000
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.790221]         00000000 00000000 00000000 00000000 00000000 00000001 831abb30 7181214b
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.801897]         831abbc8 00000000 00000000 00007798 00000038 80448a38 00000008 00000000
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.810257]         0000018c 1545a0c6 0000018b 00000000 831abb10 80000000 831abd20 80489120
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.823349]         00000004 00200000 ffffffff 00000010 00000002 802802a4 00000000 80630000
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.836789]         ...
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.839247] Call Trace:
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.839259] [<800b2c04>] 0x800b2c04
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.848513] [<80500000>] 0x80500000
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.855449] [<80448a38>] 0x80448a38
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.858959] [<802802a4>] 0x802802a4
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.865515] [<8006a56c>] 0x8006a56c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.870479] [<8006a574>] 0x8006a574
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.877083] [<800f66bc>] 0x800f66bc
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.882793] [<800f5994>] 0x800f5994
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.888422] [<800f655c>] 0x800f655c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.893953] [<800faa60>] 0x800faa60
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.897508] [<800f12bc>] 0x800f12bc
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.901015] [<800f326c>] 0x800f326c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.907043] [<8011d87c>] 0x8011d87c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.910612] [<80119dc4>] 0x80119dc4
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.916752] [<8011e084>] 0x8011e084
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.920312] [<80071800>] 0x80071800
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.926977] [<8015ba98>] 0x8015ba98
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.930462] [<8015bd4c>] 0x8015bd4c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.936889] [<800a578c>] 0x800a578c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.940395] [<80185c28>] 0x80185c28
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.947186] [<8009df38>] 0x8009df38
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.950682] [<8007717c>] 0x8007717c
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.956827]
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.960465] Mem-Info:
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.964620] active_anon:409 inactive_anon:14 isolated_anon:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.964620]  active_file:98 inactive_file:123 isolated_file:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.964620]  unevictable:0 dirty:0 writeback:0 unstable:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.964620]  slab_reclaimable:257 slab_unreclaimable:1745
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.964620]  mapped:3 shmem:30 pagetables:72 bounce:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  154.964620]  free:2129 free_pcp:15 free_cma:0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  155.004059] Node 0 active_anon:1636kB inactive_anon:56kB active_file:392kB inactive_file:492kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:12kB dirty:0kB writeback:0kB shmem:120kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  155.030259] Normal free:8756kB min:8192kB low:10240kB high:12288kB active_anon:1636kB inactive_anon:56kB active_file:392kB inactive_file:492kB unevictable:0kB writepending:0kB present:65536kB managed:59552kB mlocked:0kB kernel_stack:384kB pagetables:288kB bounce:0kB free_pcp:60kB local_pcp:60kB free_cma:0kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  155.061331] lowmem_reserve[]: 0 0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  155.068920] Normal: 259*4kB (UME) 141*8kB (UME) 124*16kB (UME) 46*32kB (UME) 16*64kB (UME) 6*128kB (ME) 3*256kB (E) 1*512kB (E) 0*1024kB 0*2048kB 0*4096kB = 8692kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  155.087221] 251 total pagecache pages
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  155.090884] 0 pages in swap cache
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  155.098161] Swap cache stats: add 0, delete 0, find 0/0
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  155.107329] Free swap  = 0kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  155.110209] Total swap = 0kB
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  155.116947] 16384 pages RAM
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  155.119742] 0 pages HighMem/MovableOnly
Thu Sep 17 11:57:29 2020 kern.warn kernel: [  155.127403] 1496 pages reserved
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.130541] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.143055] [  664]     0   664      305       20       3       0        0             0 ubusd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.155362] [  680]     0   680      231        9       4       0        0             0 askfirst
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.167600] [  697]     0   697      257       12       4       0        0             0 urngd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.180299] [ 1076]     0  1076      312       28       3       0        0             0 logd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.192159] [ 1107]     0  1107      509       41       4       0        0             0 rpcd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.200675] [ 1199]     0  1199      270       11       4       0        0             0 dropbear
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.213013] [ 1256]     0  1256      437       39       5       0        0             0 netifd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.225977] [ 1735]     0  1735      329       19       4       0        0             0 uhttpd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.238221] [ 2077]     0  2077      303       10       4       0        0             0 udhcpc
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.250043] [ 2174]     0  2174      303       10       3       0        0             0 sh
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.261483] [ 2191]     0  2191      445       35       5       0        0             0 hostapd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.272910] [ 2307]     0  2307      304       12       4       0        0             0 ntpd
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.281427] [ 2866]     0  2866      358       66       5       0        0             0 sh
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.293146] [ 3054]   453  3054      339       29       4       0        0             0 dnsmasq
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.305547] [ 3160]     0  3160      304       11       4       0        0             0 sleep
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.317806] [ 3161]     0  3161      281       20       4       0        0             0 dropbear
Thu Sep 17 11:57:29 2020 kern.info kernel: [  155.329922] [ 3162]     0  3162      305       12       4       0        0             0 ash
Thu Sep 17 11:57:29 2020 kern.err kernel: [  155.341589] Out of memory: Kill process 2866 (sh) score 4 or sacrifice child
Thu Sep 17 11:57:29 2020 kern.err kernel: [  155.348652] Killed process 3160 (sleep) total-vm:1216kB, anon-rss:40kB, file-rss:4kB, shmem-rss:0kB

I agree that this really does not look like a CPU issue then. But this:

qdisc cake 8022: dev ifb4eth1 root refcnt 2 bandwidth 16Mbit besteffort triple-isolate nonat wash no-ack-filter split-gso rtt 100.0ms noatm overhead 22
 Sent 148229609 bytes 202253 pkt (dropped 6278, overlimits 243078 requeues 0)
 backlog 196815b 153p requeues 0
 memory used: 531712b of 4Mb
 capacity estimate: 16Mbit
 min/max network layer size:           46 /    1500
 min/max overhead-adjusted size:       68 /    1522
 average network hdr offset:           14

                  Tin 0
  thresh         16Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay      532.3ms
  av_delay       35.0ms
  sp_delay        993us
  backlog       196815b
  pkts           208684
  bytes       155479901
  way_inds         6134
  way_miss         2631
  way_cols            0
  drops            6278
  marks               0
  ack_drop            0
  sp_flows           98
  bk_flows           41
  un_flows            0
  max_len          1514
  quantum           488

looks unhealthy on multiple fronts:

That should be well below 20ms.... but that could be the consequence of the OOM situation and not its trigger.

This should be much closer to 5 ms...