Upstream kernel memleak 5.10+? octeon-ethernet.ko

iperf with a static IP and no other services running should be a good test.
But then again, it can't be something in userspace, since if it were, we would see the same leak on every other target.
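Something along these lines would do it (just a sketch; the interval, duration and addresses are placeholders, and iperf3 has to be installed on the router first):

# on the router: static IP, non-essential services stopped, iperf3 server up,
# and memory sampled every 5 minutes
opkg update && opkg install iperf3
iperf3 -s -D
while true; do date; free; sleep 300; done >> /tmp/mem.log

# on a LAN client: push traffic at the router for an hour
iperf3 -c 192.168.200.10 -t 3600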

The fact that the driver is still in staging in the kernel makes me think the maintainers' assertion that the driver doesn't leak is BS...

1 Like

Message to the Maintainer:

Under the 5.4 kernel, I was seeing no issues. When testing under the 5.10 kernel, I am seeing a series of memory leaks. I've enabled KMEMLEAK and begun trying to trace this down. I am not a programmer, but I can follow direction. In this case, it was decided I should talk to you, as the listed Cavium-Octeon maintainer.

Any suggestion or insight you could provide would be much appreciated.  This seems to be affecting Octeon+/Octeon2/Octeon3 targets currently supported by OpenWrt.

I've only included the first four reports, though there were thousands on boot:

unreferenced object 0x8000000005cc6800 (size 2048):
  comm "swapper/0", pid 1, jiffies 4294937844 (age 3174.260s)
  hex dump (first 32 bytes):
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  backtrace:
    [<0000000092547866>] __kmalloc+0x1c4/0x768
    [<000000002902bd0d>] cvm_oct_mem_fill_fpa+0x60/0x1a8
    [<00000000b84a9f23>] cvm_oct_probe+0xb4/0xab8
    [<00000000bdc4ede7>] platform_drv_probe+0x28/0x88
    [<00000000a567e8b8>] really_probe+0xfc/0x4e0
    [<0000000096837a2a>] device_driver_attach+0x120/0x130
    [<00000000ec1cb103>] __driver_attach+0x7c/0x148
    [<00000000fb6265da>] bus_for_each_dev+0x68/0xa8
    [<000000004feb0e7d>] bus_add_driver+0x1d0/0x218
    [<0000000069658853>] driver_register+0x98/0x160
    [<00000000cec7f896>] do_one_initcall+0x54/0x168
    [<0000000035c2e6f9>] kernel_init_freeable+0x280/0x31c
    [<0000000046a35530>] kernel_init+0x14/0x104
    [<00000000991d0df4>] ret_from_kernel_thread+0x14/0x1c
unreferenced object 0x8000000005323680 (size 216):
  comm "softirq", pid 0, jiffies 4295252469 (age 28.020s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000078af28d6>] kmem_cache_alloc+0x1ac/0x708
    [<000000001d074ea2>] __build_skb+0x34/0xd8
    [<00000000339b0f83>] __netdev_alloc_skb+0x118/0x1f0
    [<00000000db3556b0>] cvm_oct_mem_fill_fpa+0x154/0x1a8
    [<00000000a49e80de>] cvm_oct_napi_poll+0x4c0/0x988
    [<00000000d0c3cba0>] __napi_poll+0x3c/0x158
    [<00000000bb0c10eb>] net_rx_action+0xe8/0x210
    [<000000003322eb9f>] __do_softirq+0x168/0x360
    [<00000000d4037fcb>] irq_exit+0x9c/0xe8
    [<00000000224da306>] plat_irq_dispatch+0x48/0xd0
    [<00000000327ba56b>] handle_int+0x14c/0x158
    [<000000003eae4681>] __r4k_wait+0x20/0x40
unreferenced object 0x80000000052fd880 (size 216):
  comm "softirq", pid 0, jiffies 4295253111 (age 21.600s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000078af28d6>] kmem_cache_alloc+0x1ac/0x708
    [<000000001d074ea2>] __build_skb+0x34/0xd8
    [<00000000339b0f83>] __netdev_alloc_skb+0x118/0x1f0
    [<00000000db3556b0>] cvm_oct_mem_fill_fpa+0x154/0x1a8
    [<00000000a49e80de>] cvm_oct_napi_poll+0x4c0/0x988
    [<00000000d0c3cba0>] __napi_poll+0x3c/0x158
    [<00000000bb0c10eb>] net_rx_action+0xe8/0x210
    [<000000003322eb9f>] __do_softirq+0x168/0x360
    [<00000000d4037fcb>] irq_exit+0x9c/0xe8
    [<00000000224da306>] plat_irq_dispatch+0x48/0xd0
    [<00000000327ba56b>] handle_int+0x14c/0x158
    [<000000003eae4681>] __r4k_wait+0x20/0x40
unreferenced object 0x80000000052fc680 (size 216):
  comm "softirq", pid 0, jiffies 4295253112 (age 21.590s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000078af28d6>] kmem_cache_alloc+0x1ac/0x708
    [<000000001d074ea2>] __build_skb+0x34/0xd8
    [<00000000339b0f83>] __netdev_alloc_skb+0x118/0x1f0
    [<00000000db3556b0>] cvm_oct_mem_fill_fpa+0x154/0x1a8
    [<00000000a49e80de>] cvm_oct_napi_poll+0x4c0/0x988
    [<00000000d0c3cba0>] __napi_poll+0x3c/0x158
    [<00000000bb0c10eb>] net_rx_action+0xe8/0x210
    [<000000003322eb9f>] __do_softirq+0x168/0x360
    [<00000000d4037fcb>] irq_exit+0x9c/0xe8
    [<00000000224da306>] plat_irq_dispatch+0x48/0xd0
    [<00000000327ba56b>] handle_int+0x14c/0x158
    [<000000003eae4681>] __r4k_wait+0x20/0x40

The response:

Those are not real memory leaks. If you unload the driver and run kmemleak
again, you'll see they are gone.

The reason kmemleak thinks those are unreferenced is because those
memory buffers are given to FPA, and they are not visible to kmemleak
until FPA gives them back.

This is FACTUALLY true, from what I can tell (and what others have told me). I even went so far as to build the driver as the loadable kmod-octeon-ethernet module (octeon-ethernet.ko) so I could unload it. However, if the driver is the issue, then removing the driver would stop the leak anyway, so we have no way to tell the difference.
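The check the maintainer is describing boils down to something like this (a sketch; it assumes CONFIG_DEBUG_KMEMLEAK is enabled and the driver is built as a loadable module):

mount -t debugfs none /sys/kernel/debug 2>/dev/null
echo scan > /sys/kernel/debug/kmemleak     # trigger a scan
cat /sys/kernel/debug/kmemleak             # the cvm_oct_* reports show up here

rmmod octeon-ethernet                      # unload the driver
echo clear > /sys/kernel/debug/kmemleak    # forget the old reports
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak             # per the maintainer, the reports should now be gone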

Then why not use the damn kmemleak_not_leak() and handle them correctly?
Also, what if the driver just allocates memory and only frees all of it on unload? I.e. a logic error where a buffer is never freed during normal operation, only allocated again every time (and only released when the driver is unloaded)?

Is this something I should be doing, or was that directed towards the upstream? I'm very far outside my knowledge zone on this, but I don't want to see the Octeon tree die because, frankly, I'm not up to replacing the routers that work great (when they aren't leaking like a sieve).

I'm open to testing anything that needs to be done, and to spending whatever time is required, but I had to find out what kmemleak even was and how to use it before sending this to the maintainer.

1 Like

Both... If the buffers are used like that, then why don't they do it the correct way and flag the memory as not being a leak?

The problem is always the same: figuring out whether it's actually the driver or something else. For sure it's something in the kernel that the driver handles badly.
So not strictly the driver itself, but a defect in how the driver handles things.

It could totally be that something changed in the net code and nobody bothered to fix it in the staging driver.
That would also explain why, with the same driver code, it leaks on 5.10 but doesn't leak on 5.4.

Right, I looked at that, but when I asked about bisecting the OpenWrt kernel, I got looks of pity and condolences :smiley: and I'm not entirely sure how to go about it. One of those "If you have to ask..." kind of things, I guess.

I know there were some krb_free changes (I think?) or something similar, but I couldn't take it any further than that.

Well, it's really something where you need to know what you are doing. The fact that no devs want to "waste" some time on this is sad :frowning:
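For the record, the bare mechanics of a bisect against the upstream tree are roughly this (a sketch; the painful part is building and booting every candidate kernel on an Octeon box, and OpenWrt's own patch stack on top makes it messier still):

git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
cd linux
git bisect start
git bisect bad v5.10     # known-bad kernel
git bisect good v5.4     # known-good kernel
# build and boot the revision git checks out, test for the leak, then:
git bisect good          # ...or `git bisect bad`, and repeat until it names a commit
git bisect reset         # when done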

1 Like

Well, that's kind of normal.
Octeon I/II/III is EOL as far as the vendor is concerned; they aren't gonna be paying somebody to work on staging drivers.

I think it's more an issue of the arch not being widely used (I mean, I put in a PR for the 5.15 kernel and it involved, what, 6 patches that haven't really changed since 4.19), and the "serious" devs either don't have the hardware or can't test. The SNIC was a good shot at getting interest because it was cheap and plentiful, but the people who can test and the people who have the knowledge just don't intersect.

As you can see below, the memleak isn't there when I turn network and dnsmasq off, but then when I start turning additional services off (dropbear, uhttpd, sysntpd) the memleak reappears???? What the actual hell?

Fri Apr  1 17:51:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41364      907844          28       16800      891656
Swap:             0           0           0
Fri Apr  1 17:56:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41140      908052          28       16816      891872
Swap:             0           0           0
Fri Apr  1 18:01:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       40840      908344          28       16824      892168
Swap:             0           0           0
Fri Apr  1 18:06:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41060      908120          28       16828      891948
Swap:             0           0           0
Fri Apr  1 18:11:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       40856      908324          28       16828      892152
Swap:             0           0           0
Fri Apr  1 18:16:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41336      907844          28       16828      891672
Swap:             0           0           0
Fri Apr  1 18:21:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41288      907892          28       16828      891720
Swap:             0           0           0
Fri Apr  1 18:26:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41276      907904          28       16828      891732
Swap:             0           0           0
Fri Apr  1 18:31:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41268      907912          28       16828      891740
Swap:             0           0           0
Fri Apr  1 18:36:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41300      907880          28       16828      891708
Swap:             0           0           0
Fri Apr  1 18:41:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41048      908132          28       16828      891960
Swap:             0           0           0
Fri Apr  1 18:46:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41484      907696          28       16828      891524
Swap:             0           0           0
Fri Apr  1 18:51:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41280      907900          28       16828      891728
Swap:             0           0           0
Fri Apr  1 18:56:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41464      907712          28       16832      891540
Swap:             0           0           0
Fri Apr  1 19:01:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41212      907964          28       16832      891792
Swap:             0           0           0
Fri Apr  1 19:06:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41208      907964          28       16836      891796
Swap:             0           0           0
Fri Apr  1 19:11:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41484      907688          28       16836      891520
Swap:             0           0           0
Fri Apr  1 19:16:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41232      907940          28       16836      891772
Swap:             0           0           0
Fri Apr  1 19:21:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41460      907708          28       16840      891540
Swap:             0           0           0
Fri Apr  1 19:26:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41412      907756          28       16840      891588
Swap:             0           0           0
Fri Apr  1 19:31:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41392      907772          28       16844      891604
Swap:             0           0           0
Fri Apr  1 19:36:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41668      907496          28       16844      891328
Swap:             0           0           0
Fri Apr  1 19:41:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41416      907748          28       16844      891580
Swap:             0           0           0
Fri Apr  1 19:46:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41620      907544          28       16844      891376
Swap:             0           0           0
Fri Apr  1 19:51:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41412      907752          28       16844      891584
Swap:             0           0           0
Fri Apr  1 19:56:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41604      907560          28       16844      891392
Swap:             0           0           0
Fri Apr  1 20:01:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41388      907776          28       16844      891608
Swap:             0           0           0
Fri Apr  1 20:06:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       42088      907076          28       16844      890908
Swap:             0           0           0
Fri Apr  1 20:11:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41868      907296          28       16844      891128
Swap:             0           0           0
Fri Apr  1 20:16:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41808      907356          28       16844      891188
Swap:             0           0           0
Fri Apr  1 20:21:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41852      907312          28       16844      891144
Swap:             0           0           0
Fri Apr  1 20:26:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       42036      907124          28       16848      890960
Swap:             0           0           0
Fri Apr  1 20:31:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       42032      907128          28       16848      890964
Swap:             0           0           0
Fri Apr  1 20:36:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41812      907348          28       16848      891184
Swap:             0           0           0
Fri Apr  1 20:41:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41764      907396          28       16848      891232
Swap:             0           0           0
Fri Apr  1 20:46:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       42016      907144          28       16848      890980
Swap:             0           0           0
Fri Apr  1 20:51:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       41776      907380          28       16852      891220
Swap:             0           0           0
Usage: service <service> [command]
/etc/init.d/boot                   enabled         stopped
/etc/init.d/cron                   enabled         stopped
/etc/init.d/dnsmasq               disabled         stopped
/etc/init.d/done                   enabled         stopped
/etc/init.d/dropbear              disabled         stopped
/etc/init.d/firewall               enabled         stopped
/etc/init.d/gpio_switch            enabled         stopped
/etc/init.d/led                    enabled         stopped
/etc/init.d/log                    enabled         running
/etc/init.d/network               disabled         stopped
/etc/init.d/odhcpd                 enabled         running
/etc/init.d/rpcd                   enabled         running
/etc/init.d/sysctl                 enabled         stopped
/etc/init.d/sysfixtime             enabled         stopped
/etc/init.d/sysntpd               disabled         stopped
/etc/init.d/system                 enabled         stopped
/etc/init.d/ucitrack               enabled         stopped
/etc/init.d/uhttpd                disabled         stopped
/etc/init.d/umount                 enabled         stopped
/etc/init.d/urandom_seed           enabled         stopped
/etc/init.d/urngd                  enabled         running
Fri Apr  1 20:56:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       54876      894272          24       16860      878116
Swap:             0           0           0
Fri Apr  1 21:01:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       54672      894476          24       16860      878320
Swap:             0           0           0
Fri Apr  1 21:06:11 UTC 2022
              total        used        free      shared  buff/cache   available
Mem:         966008       65176      883968          24       16864      867816
Swap:             0           0           0
root@OpenWrt:/#

Well, it's getting closer, and there have been some folks bravely trying to assist.

On 5.15, if I turn off ALL services but networking, it still leaks. However, if I then remove the lan interface and wan6, you get the below.

The current theory from @neg2led is that it's the way UDP packets sent to the 0.0.0.0 and :: wildcard bindings are handled (or not handled), which seems to correlate. Hopefully it'll let the smart folks start tracing where it might be. But! BUT! Remember, I had the network running the entire time. Granted, I had to set it to static, so I'm not sure we can blame the octeon-ethernet.ko driver exclusively; at the very least, we can narrow it down even further.
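A quick way to see what is actually sitting on those wildcard UDP sockets (a sketch; busybox netstat on OpenWrt only shows the owning program if it was built with that feature):

netstat -lnu                            # 0.0.0.0:* and :::* entries are the wildcard binds
/etc/init.d/odhcpd stop; netstat -lnu   # stop suspects one at a time and re-check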

root@OpenWrt:/# ifconfig
eth0      Link encap:Ethernet  HWaddr 2C:26:5F:00:00:00
          inet addr:192.168.200.10  Bcast:192.168.200.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:134966 errors:0 dropped:33247 overruns:0 frame:0
          TX packets:70 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:16132923 (15.3 MiB)  TX bytes:5706 (5.5 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

root@OpenWrt:/#
2 Likes

Stijn Tintel bisected it down to https://github.com/torvalds/linux/commit/f9cb654cb550b7b87e8608b14fc3eca432429ffe.

As it turns out, it's fixed upstream:

https://lore.kernel.org/lkml/20220314145108.GB13438@alpha.franken.de/

It is an issue specific to >32-bit MIPS.
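For anyone who wants to try it before it lands in an official build, dropping the raw patch into the target's kernel patch directory should be enough (a sketch; the file name is arbitrary, and whether it goes in patches-5.10 or patches-5.15 depends on which kernel your tree is building):

cd openwrt
wget -O target/linux/octeon/patches-5.10/0099-mips-page-alloc-leak-fix.patch \
    https://lore.kernel.org/lkml/20220310113116.2068859-1-yaliang.wang@windriver.com/raw
make target/linux/clean
make -j4    # rebuild the kernel and images with the patch applied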

3 Likes

After applying the patch https://lore.kernel.org/lkml/20220310113116.2068859-1-yaliang.wang@windriver.com/raw

root@OpenWrt:/# free -m; for i in 0 1 2 3 4 5 6 7 8 9; do service dnsmasq restart; sleep 1; done; free -m
              total        used        free      shared  buff/cache   available
Mem:         965972       33296      913392          68       19284      898428
Swap:             0           0           0
udhcpc: started, v1.35.0
udhcpc: broadcasting discover
udhcpc: no lease, failing
udhcpc: started, v1.35.0
udhcpc: broadcasting discover
udhcpc: no lease, failing
udhcpc: started, v1.35.0
udhcpc: broadcasting discover
udhcpc: no lease, failing
udhcpc: started, v1.35.0
udhcpc: broadcasting discover
udhcpc: no lease, failing
udhcpc: started, v1.35.0
udhcpc: broadcasting discover
udhcpc: no lease, failing
udhcpc: started, v1.35.0
udhcpc: broadcasting discover
udhcpc: no lease, failing
udhcpc: started, v1.35.0
udhcpc: broadcasting discover
udhcpc: no lease, failing
udhcpc: started, v1.35.0
udhcpc: broadcasting discover
udhcpc: no lease, failing
udhcpc: started, v1.35.0
udhcpc: broadcasting discover
udhcpc: no lease, failing
udhcpc: started, v1.35.0
udhcpc: broadcasting discover
udhcpc: no lease, failing
              total        used        free      shared  buff/cache   available
Mem:         965972       33484      913128          68       19360      898200
Swap:             0           0           0
root@OpenWrt:/#

And thanks to @stintel for already introducing the fix into the tree!

https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=9283359bd53a889a270da4a7d5bbe3eaaa771e70

4 Likes

Ohh, lovely! The little black box has received hope for life once more :smiley:

1 Like

Awesome catch, nice to see Octeon living on

1 Like

Thanks a lot! Great work! :slight_smile:

What a joke: for 9 kernel versions the entire page allocation logic was flawed... (Fun stuff; it also shows how little the upstream kernel gets used on this kind of arch.)

This honestly isn't that unusual, especially with obscure archs whose only users are pretty much running old SDK kernels.

The upstream kernel ain't magic; weird stuff happens on ARM, so why wouldn't it happen on MIPS64?

1 Like

My only issue with this is that, at least in 5.15, they are actively updating the code. The e300 was included upstream, so someone is submitting it and someone is accepting it. If they want it to die and be left to those old SDKs, they should do that and properly tell us to bugger off. shrug

I think you misunderstood my reply: upstream obviously doesn't want an arch to die, and it's great that somebody is sending stuff upstream.

My point is that people always seem pissed when a regression or bug is found after X releases and go "Why wasn't this found in testing? We need more testing," etc.
They think there has got to be some huge testing and automation setup for everything, while the truth is that unless somebody actually notices the bug, it won't get caught; the more obscure the arch, the lower the chance.

1 Like