I am running OpenWrt 24.10 on an Archer MR600 v2 with 128 MB of RAM. Every now and then the device runs out of memory and either reboots or slowly recovers. I am trying to understand what (the kernel? drivers? conntrack?) is using the memory.
In normal idle operation the device uses 60-70 MB of the 118 MB total reported by free. During periods of heavy network traffic, I see used memory shooting up to 100 MB or more. If memory use goes above that, I first see sys CPU usage spiking (I suspect the kernel is decompressing the same squashfs pages again and again), then the OOM killer kicking in, or the device outright rebooting.
I looked at the process list, /proc/meminfo, conntrack -L (or rather the count of connections it prints) and the size of the files in /tmp, both while idle and under high memory load, but I cannot find out what allocates memory during the spikes. The sum of memory listed by ps is almost the same in both cases; in fact it is 100 KB higher in the idle case. /proc/meminfo lists less memory as free and available, but doesn't show an equivalent increase in any of the other fields. If I sum up all rows in /proc/meminfo in the busy case and in the idle case, about 30 MB of memory seem to be missing in the busy case.
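Summing every /proc/meminfo row double-counts: Shmem is already included in Cached, for example, and the Active/Inactive rows largely describe the same pages as AnonPages, Cached and Buffers. A per-field diff plus a rough "unaccounted" estimate may be more telling. The sketch below is meant to run on a PC against two saved copies of /proc/meminfo (the script name meminfo_diff.py is hypothetical). The set of fields treated as disjoint in ACCOUNTED_FIELDS is an assumption; pages the kernel hands out directly, for example network buffers a driver allocates page by page, would typically appear only in the unaccounted remainder.

```python
import sys
from typing import Dict, List, Tuple

# Assumption: these /proc/meminfo fields describe (roughly) disjoint pools of
# memory. Whatever is left of MemTotal after subtracting them has no
# dedicated counter of its own.
ACCOUNTED_FIELDS = (
    "MemFree", "Buffers", "Cached", "SwapCached", "AnonPages",
    "Slab", "KernelStack", "PageTables", "VmallocUsed",
)


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value}. Most values are in kB;
    the HugePages_* counters are plain page counts."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def unaccounted_kb(info: Dict[str, int]) -> int:
    """Rough estimate of memory not covered by any of ACCOUNTED_FIELDS."""
    return info["MemTotal"] - sum(info.get(f, 0) for f in ACCOUNTED_FIELDS)


def diff_meminfo(idle: Dict[str, int], busy: Dict[str, int]) -> List[Tuple[str, int]]:
    """Per-field change (busy - idle), largest absolute change first."""
    keys = sorted(set(idle) | set(busy))
    deltas = [(k, busy.get(k, 0) - idle.get(k, 0)) for k in keys]
    return sorted(deltas, key=lambda kv: abs(kv[1]), reverse=True)


def main(idle_path: str, busy_path: str) -> None:
    with open(idle_path) as f:
        idle = parse_meminfo(f.read())
    with open(busy_path) as f:
        busy = parse_meminfo(f.read())
    # Only print fields that actually changed between the two snapshots.
    for name, delta in diff_meminfo(idle, busy):
        if delta:
            print(f"{name:20s} {delta:+9d}")
    print(f"{'unaccounted (idle)':20s} {unaccounted_kb(idle):9d} kB")
    print(f"{'unaccounted (busy)':20s} {unaccounted_kb(busy):9d} kB")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: meminfo_diff.py IDLE_SNAPSHOT BUSY_SNAPSHOT", file=sys.stderr)
```

Usage: `python3 meminfo_diff.py idle.txt busy.txt`. If the unaccounted figure grows by roughly the missing 30 MB between the two snapshots, that points at kernel allocations outside the usual counters rather than at user space or tmpfs.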
Conntrack shows approximately 200 connections in either case. It can fall lower, but I haven't seen it go much above 200.
The memory is not leaked, though. If the router doesn't crash, the memory is eventually freed again. Something uses it, but it doesn't show up in /proc/meminfo, ps, or df (for tmpfs mounts).
What am I missing? What else can use considerable amounts of memory (on the order of 30-40 MB) without showing up in the numbers I looked at?
One thing that drives up the static RAM use is that I am using OpenSSL rather than mbedtls as the TLS library. Switching to mbedtls frees up about 10 MB in total, but OpenSSL is much faster, particularly for LuCI over HTTPS, where the difference is night and day. And switching doesn't solve the problem, it only mitigates it somewhat: I still get memory use spikes of dozens of MB without any clue where they are going.
The /proc/meminfo, ps, free and df -h output I posted in my second post was captured while (high memory use) and after (low memory use) one of my computers was running a download in Battle.net. The low-use state came after the high-use one, which to me shows there is no classic leak where something loses track of allocated memory. This time the OOM killer was, barely, not triggered, but the kernel re-read data from squashfs over and over for 10 or so seconds, suggesting it had to throw away data from the page cache that was actually needed. When the OOM killer does get triggered, it picks a seemingly random victim, most of the time uhttpd, dnsmasq or https-dns-proxy, and kills it; procd then tries to restart it. I don't think any of those processes is actually at fault for the high memory consumption, though.
My Internet connection is an LTE connection with a bandwidth of only 30 Mbit/s, so I am not moving massive amounts of data around during those downloads. The OpenWrt device manages this connection.
I put the process list, /proc/meminfo and free output into a LibreOffice Calc sheet for easy comparison. Sadly, the forum doesn't allow me to attach it.
My OpenWrt image is self-compiled, with everything in squashfs. I reduced the squashfs block size from the default 256 KB to 128 KB in the hope that it makes discarding data from the page cache and re-reading it a bit more efficient, and because I still have a few MB of flash space to spare.
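A back-of-the-envelope view of why the block size matters once the page cache is being thrashed: squashfs compresses file data in whole blocks, so a page-cache miss generally means decompressing the entire block that contains the missing page. The sketch assumes 4 KiB pages and the worst case in which every miss lands in a different block; the miss count of 1000 is a made-up illustration, not a measurement from this router.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (the usual setting on ramips)


def pages_per_block(block_kib: int, page_size: int = PAGE_SIZE) -> int:
    """How many page-cache pages one squashfs block covers."""
    return block_kib * 1024 // page_size


def worst_case_decompressed_bytes(block_kib: int, misses: int) -> int:
    """Bytes decompressed if each page-cache miss hits a different block."""
    return block_kib * 1024 * misses


if __name__ == "__main__":
    misses = 1000  # hypothetical number of page-cache misses during a spike
    for block_kib in (256, 128):
        mib = worst_case_decompressed_bytes(block_kib, misses) / 2**20
        print(f"{block_kib} KiB blocks: {pages_per_block(block_kib)} pages/block, "
              f"{mib:.0f} MiB decompressed for {misses} misses")
```

Halving the block size halves the worst-case decompression work per miss, at the cost of a somewhat worse compression ratio, which is why it needs the spare flash space mentioned above.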
I do not recall any similar issue described on this forum. Obviously I'm not reading every thread so I might be missing something. But to me this sounds like a unique issue. Considering that (based on the information you provided so far) nothing in your usage pattern or config is unique, but you are running a self-built image with some pretty uncommon changes (particularly the block size), my best guess is that something in those modifications is the source of the problem.
Thanks for looking at my problem description anyhow.
The block size change was something I tried in response to the memory issues - I saw the same with the default squashfs settings.
A while ago I came across a claim that the MediaTek Ethernet driver is pretty greedy and allocates a few MB for DMA, or something along those lines. I don't remember where I read that, though, and at the moment I can't find it again. I think it was a pull request that suggested dropping support for ramips devices with 64 MB of RAM. But I doubt that it is causing my issues: while I do have two wired devices attached, one of them (an Android TV) is usually off, and the other (a Raspberry Pi) is idle.
This is to send some traffic through a VPN running on my Raspberry Pi. I couldn't get this to work with the stripped-down dnsmasq, even when I replaced the string "classless-static-route" with its numerical value.
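Assuming the option in question is DHCP option 121 (classless static routes, RFC 3442), this sketch shows what its payload looks like on the wire: each route is the prefix length, followed by only the significant octets of the destination, followed by the 4-byte router address. The function name and the example addresses are made up for illustration.

```python
import ipaddress
from typing import Iterable, Tuple


def encode_classless_routes(routes: Iterable[Tuple[str, str]]) -> bytes:
    """Encode (destination/prefix, router) pairs as an RFC 3442 option 121 payload."""
    out = bytearray()
    for dest, router in routes:
        net = ipaddress.IPv4Network(dest)  # raises ValueError if host bits are set
        significant = (net.prefixlen + 7) // 8  # destination octets actually sent
        out.append(net.prefixlen)
        out += net.network_address.packed[:significant]
        out += ipaddress.IPv4Address(router).packed
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical example: a VPN subnet via the Pi, plus the default route.
    payload = encode_classless_routes([("10.8.0.0/24", "192.168.1.2"),
                                       ("0.0.0.0/0", "192.168.1.1")])
    print(payload.hex(":"))
```

The default route is included because RFC 3442 tells clients to ignore the plain router option (option 3) whenever option 121 is present.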
The (compressed) nlbwmon data is a few KB on the file system, and it is written to disk:
root@tplink:~# ls -lh /nlbwmon/
-rw-r----- 1 root root 2.4K Jun 1 00:00 20250501.db.gz
-rw-r----- 1 root root 1.1K Jul 1 00:00 20250601.db.gz
-rw-r----- 1 root root 1006 Aug 2 21:04 20250701.db.gz
-rw-r----- 1 root root 2.1K Aug 10 11:17 20250801.db.gz
collectd (which tracks CPU and RAM use, something nlbwmon doesn't do) stores 560 KB in /tmp/rrd. Total tmpfs use ranges from 928 KB after boot to 1100 KB after a day of uptime; I haven't seen it grow beyond 1100 KB. I'm not persisting collectd data to disk.
I read the comments in that PR and I failed to find similarities to your case. There is one user saying that MT7621 build consumes significantly more memory compared to MT7620, however: 1. this is only one report which may or may not be accurate and representative, 2. the report (to my understanding) is about static increase in memory use, rather than dynamic spikes as in your case.
Actually, most of pull request 17628 was merged as commit 15887235 and cherry-picked into 24.10 as 642b5b61, but the part that adjusted the DMA size in mtk_eth_soc.c seems to be missing. I have manually added it to my build, and it reduced the memory used immediately after boot from 60 MB to 54 MB. I'll see whether it has an influence on the spikes under load.
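A rough model of why the ring size shows up directly in static memory use: a network driver typically keeps one receive buffer allocated for every descriptor in an RX ring, so preallocated memory grows linearly with dma_size. The buffer size and the "before" ring size below are hypothetical placeholders, not values taken from mtk_eth_soc.c; the real ones would have to come from the driver source.

```python
def preallocated_rx_bytes(descriptors: int, buf_bytes: int, rings: int = 1) -> int:
    """Memory held by RX buffers: one buffer per descriptor per ring."""
    return descriptors * buf_bytes * rings


def ring_shrink_savings(old_desc: int, new_desc: int, buf_bytes: int, rings: int = 1) -> int:
    """Bytes freed by shrinking each RX ring from old_desc to new_desc descriptors."""
    return (preallocated_rx_bytes(old_desc, buf_bytes, rings)
            - preallocated_rx_bytes(new_desc, buf_bytes, rings))


if __name__ == "__main__":
    BUF_BYTES = 2048  # hypothetical per-buffer allocation (~1.5 KB frame + overhead)
    OLD_DESC = 2048   # hypothetical ring size before the change
    saved = ring_shrink_savings(OLD_DESC, 512, BUF_BYTES)
    print(f"~{saved / 2**20:.1f} MiB freed per RX ring")
```

The descriptors themselves are only a few bytes each; under this model it is the buffers behind them that account for the megabytes.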
Yeah, the issues described in that PR aren't the same, but it is the best breadcrumb I have for now. I'll see what that dma_size change brings. The two answers I am hoping for:
Does it magically reduce dynamic memory use too? (Probably not)
Is the 6 MB of static memory it frees enough for the router to survive the dynamic spikes without crashing? (More likely, but not a given.)
At least WiFi <-> Ethernet speed seems unaffected by the change. As for the actual impact, I'll have to gather data for a few days.
Since that one comment talks about the number of CPU cores: I have packet steering set to "Enabled (all CPUs)". It did improve speed between WiFi and Ethernet a little. Disabling it is something I'll test eventually.
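As far as I know, OpenWrt's packet steering works through the kernel's RPS mechanism, where each receive queue has an rps_cpus file (/sys/class/net/<device>/queues/rx-<n>/rps_cpus) holding a hexadecimal CPU bitmask. A minimal sketch of how such a mask is built, assuming the four logical CPUs of an MT7621 (two cores with two hardware threads each); the helper name is hypothetical.

```python
from typing import Iterable


def rps_mask(cpus: Iterable[int]) -> str:
    """Hex CPU bitmask in the format the kernel's rps_cpus files expect."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # set the bit for this CPU
    return format(mask, "x")


if __name__ == "__main__":
    print(rps_mask(range(4)))  # all four logical CPUs
    print(rps_mask([1]))       # steer to CPU 1 only
    print(rps_mask([]))        # 0 = RPS disabled for that queue
```

Reading the current rps_cpus values before and after toggling the LuCI setting would show whether the option changed anything at the kernel level.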
Flow offloading is disabled, as it breaks QoS (tested) and probably nlbwmon too (assumed, not tested). Software flow offloading works in the sense that it reduces the CPU load, but it doesn't have a noticeable effect on memory use (I tested that a while ago). There's no HW flow offloading support on my hardware.
If neither the DMA change nor flow offloading brings an answer, I'll flash the prebuilt image and see if I can reproduce the problem with it.
So here are a few hours of data. The mt2701_data.{rx, tx}.dma_size = MTK_DMA_SIZE(512) change seems to have made a surprisingly big difference. This is the output of free:
              total        used        free      shared  buff/cache   available
Mem:         118820       59780       30404         732       28636       19780
Swap:         59388         512       58876
Only 512 KB of swap is in use, and 28 MB is still used for the disk cache.
Sadly, I don't have a snapshot from before the change, but "used" there used to hover around 60-70 MB at idle, with only about 10 MB used for the page cache and 5-8 MB in swap. The highest spike I saw in the graph earlier was 85 MB; if it went higher, the system rebooted and the collectd data was lost. (Semi off-topic rant: every tool seems to have a different idea of what used, cached and free memory mean.)
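On the "every tool means something different" point: in the free output above, used appears to be derived as total - free - buff/cache, which the numbers bear out (59780 + 30404 + 28636 = 118820), while available comes from the kernel's own MemAvailable estimate and is not a sum of the other columns. A small parser to check this relation on any saved free output (values in KiB), with the sample taken from the post:

```python
from typing import Dict


def parse_free(text: str) -> Dict[str, Dict[str, int]]:
    """Parse `free` output into {"Mem": {...}, "Swap": {...}}."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = lines[0]  # total used free shared buff/cache available
    rows: Dict[str, Dict[str, int]] = {}
    for parts in lines[1:]:
        # The Swap row has fewer columns; zip() simply stops early.
        rows[parts[0].rstrip(":")] = dict(zip(header, map(int, parts[1:])))
    return rows


SAMPLE = """
              total        used        free      shared  buff/cache   available
Mem:         118820       59780       30404         732       28636       19780
Swap:         59388         512       58876
"""

if __name__ == "__main__":
    mem = parse_free(SAMPLE)["Mem"]
    derived = mem["total"] - mem["free"] - mem["buff/cache"]
    print(f"used reported: {mem['used']}, total - free - buff/cache: {derived}")
```

For the sample both numbers come out as 59780, so "used" here excludes the page cache entirely; tools that count cache as used will report a much higher figure for the same state.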
Yes, I did test downloading things. I updated some games with Wargaming.net Game Center, which uses libtorrent under the hood. It has not caused any spikes yet, but I'll have to keep this running for a while to be sure. Also, my family is out of the house, so fewer WiFi clients than usual are connected.
I can't entirely explain this yet. The reduction in static use makes sense, but why would a change to the Ethernet driver eliminate the spikes?