[tl-wr841n-v9] Remote upgrade over SSH failed (32 MB RAM device)

I wanted to upgrade my CC to the latest available LEDE.
I followed the CLI procedure, and just before the upgrade was supposed to start, it disconnected me.
The device is pingable but no longer reachable over SSH or HTTP/S.

root@OpenWrt:/tmp# sysupgrade -v /tmp/lede-17.01.1-ar71xx-generic-tl-wr841-v9-squashfs-sysupgrade.bin 
Saving config files...
etc/adblock.sh
etc/black.list
etc/config/dhcp
etc/config/dropbear
etc/config/firewall
etc/config/firewall.orig
etc/config/luci
etc/config/network
etc/config/rpcd
etc/config/system
etc/config/ubootenv
etc/config/ucitrack
etc/config/uhttpd
etc/config/wireless
etc/crontabs/root
etc/dnsmasq.conf
etc/dropbear/dropbear_dss_host_key
etc/dropbear/dropbear_rsa_host_key
etc/firewall.user
etc/group
etc/hosts
etc/inittab
etc/opkg.conf
etc/opkg/customfeeds.conf
etc/opkg/keys/53bad1233d4c98c5
etc/opkg/keys/de98a2dd1d0f8a07
etc/passwd
etc/ppp/chap-secrets
etc/ppp/filter
etc/ppp/options
etc/profile
etc/protocols
etc/rc.local
etc/services
etc/shadow
etc/shells
etc/sysctl.conf
etc/sysupgrade.conf
etc/white.list
tmp/block.hosts
killall: watchdog: no process killed
Sending TERM to remaining processes ... udhcpc ntpd Connection to 84.74.152.129 closed by remote host.

Did you try the failsafe mode ?

17.01.1 or snapshot?

I downloaded this one, so as far as I understand it is the stable 17.01.1:
https://downloads.lede-project.org/releases/17.01.1/targets/ar71xx/generic/lede-17.01.1-ar71xx-generic-tl-wr841-v9-squashfs-sysupgrade.bin

Here is what I did next:

  1. I went on site, rebooted the device and found that it had default settings/configs.
  2. I tried to restore my configuration from backup: sysupgrade -r /tmp/OpenWRT.20161124.tar.gz, and later also tried over the web interface.
    In both cases the configuration was not kept after reboot.
  3. I downgraded the OS to OpenWrt Chaos Calmer 15.05.1 and repeated the upgrade to lede-17.01.1 (CLI and later over the web interface).
    In both cases the new OS could not keep any new settings after reboot.
  4. At the moment I have downgraded the OS to OpenWrt Chaos Calmer 15.05.1.

Any clue what might be wrong?

Check if you still have enough free space on your flash (df -h), 4 MB is very tight.
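A minimal sketch of that check, assuming the overlay shows up in df with /overlay in the last column (as in the outputs posted in this thread). The function name overlay_use_pct is made up for illustration; it reads df output on stdin so it can also be tried on saved output.

```shell
# Print the Use% value (without the % sign) of the filesystem mounted on /overlay.
# Input: `df` output on stdin. Prints nothing if no /overlay line is found.
overlay_use_pct() {
    awk '$NF == "/overlay" { sub(/%/, "", $(NF-1)); print $(NF-1) }'
}

# On the router (not run here):
#   df -h | overlay_use_pct
```

A value at or near 100 would mean there is no room left on flash for configuration changes.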

Looks like it is a space problem on /overlay.
But how come the size of /dev/mtdblock3 is 576K with OpenWrt Chaos Calmer 15.05.1 and only 512K with LEDE-17.01.1?
Additionally, isn't this image dedicated specifically to this device? There should be no surprises like this, huh?

OpenWrt Chaos Calmer 15.05.1

root@OpenWrt:~# uname -a
Linux OpenWrt 3.18.23 #1 Sun Jan 31 18:39:35 CET 2016 mips GNU/Linux
root@OpenWrt:~# df -h
Filesystem                Size      Used Available Use% Mounted on
rootfs                  576.0K    256.0K    320.0K  44% /
/dev/root                 2.3M      2.3M         0 100% /rom
tmpfs                    14.0M      2.3M     11.6M  17% /tmp
/dev/mtdblock3          576.0K    256.0K    320.0K  44% /overlay
overlayfs:/overlay      576.0K    256.0K    320.0K  44% /
tmpfs                   512.0K         0    512.0K   0% /dev

LEDE-17.01.1

root@LEDE:~# uname -a
Linux LEDE 4.4.61 #0 Sat Apr 15 16:13:45 2017 mips GNU/Linux
root@LEDE:~# df -h
Filesystem                Size      Used Available Use% Mounted on
/dev/root                 2.3M      2.3M         0 100% /rom
tmpfs                    13.8M    328.0K     13.4M   2% /tmp
/dev/mtdblock3          512.0K    512.0K         0 100% /overlay
tmpfs                    13.8M     56.0K     13.7M   0% /tmp/root
overlayfs:/tmp/root      13.8M     56.0K     13.7M   0% /
tmpfs                   512.0K         0    512.0K   0% /dev

With 512 KB potentially free space on your overlay, you can still consider yourself lucky - that is more than you can achieve on most other devices with just 4MB flash. While you won't be able to install (m)any additional packages, it should be sufficient to keep most configuration persistent, but given the way sysupgrading works at the moment, it might get tight(er) if you keep your configs (as that pushes more files onto the overlay than strictly necessary).
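To get a rough idea of how much the kept files add up to, their sizes can be summed. This is only a sketch: sum_file_bytes is a hypothetical helper, it assumes one path per line on stdin (for example from sysupgrade -l, which lists the files that would be backed up), and the result is the uncompressed size, which may differ from what the files occupy on the overlay if the filesystem compresses data.

```shell
# Sum the sizes (in bytes) of the files named on stdin, one path per line.
# Paths that are not regular files are skipped.
sum_file_bytes() {
    total=0
    while IFS= read -r path; do
        [ -f "$path" ] || continue
        size=$(wc -c < "$path")
        total=$(( total + size ))
    done
    echo "$total"
}

# On the router (not run here):
#   sysupgrade -l | sum_file_bytes
```

Large entries in that list, such as generated block lists, are worth a second look before keeping settings across an upgrade.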

See 432_warning for a detailed analysis of the background. In short, software rarely gets smaller over time or with newer versions, and growth of 64 KB == 1 erase block over two years isn't a whole lot, but it is certainly enough to push devices that have always been marginal over the edge. This just shows once more that devices with 4 MB flash/32 MB RAM are past their prime.

Hey, thanks for the reply.
So, if I understand correctly, LEDE-17.01.1 takes one more "erase block" than CC 15.05.1 did.
And that is why the SIZE of /dev/mtdblock3 is smaller by 64K? I thought this was a FIXED parameter...

Anyway, after clean installation of lede-17.01.1-ar71xx-generic-tl-wr841-v9-squashfs-sysupgrade.bin (default settings without any additional packages and config files), the disk/storage usage is as shown in the above post - 100%.

As a test I tried a very simple, minor config change: I changed the port in /etc/config/dropbear. After reboot, this change was gone. I believe this is due to the space problem on /dev/mtdblock3, right?

But then, this specific image, lede-17.01.1-ar71xx-generic-tl-wr841-v9-squashfs-sysupgrade.bin, is dedicated to this specific device: TP-LINK TL-WR841N/ND v9.
So it should fit this device and allow saving at least a minimal configuration.

OR... what am I doing wrong?

If the overlay is full, a RAM based emergency overlay gets used - obviously that can't offer persistent storage. However if you have 512 KB available to play with, that should be sufficient for most configuration, so you should investigate why it's 100% in use.
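One way to tell the two cases apart, going by the df outputs posted in this thread: with a working flash overlay the root line reads overlayfs:/overlay, while the full one showed overlayfs:/tmp/root. A small sketch that checks for this (is_ram_overlay is a made-up name, and the exact label is an assumption based only on those outputs):

```shell
# Succeed (exit 0) if the root filesystem in `df` output on stdin is the
# RAM-backed overlay, i.e. the line mounted on / is "overlayfs:/tmp/root".
is_ram_overlay() {
    awk '$NF == "/" && $1 ~ /^overlayfs:/ { src = $1 }
         END { exit(src == "overlayfs:/tmp/root" ? 0 : 1) }'
}

# On the router (not run here):
#   df | is_ram_overlay && echo "changes will NOT survive a reboot"
```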

It would probably make sense to reset your device (firstboot), not to restore any backups nor to install any packages and to check then after the subsequent reboots if your configuration changes persist and how much space remains for the overlay.

Hi,
by firstboot, I understand I should do the following:

  1. Enter failsafe mode
  2. mount_root
  3. Soft Factory Reset: umount /overlay && firstboot && reboot

It is nicely described under this link:
https://lede-project.org/docs/user-guide/failsafe_and_factory_reset

However, the description of the procedure to enter failsafe is quite... mysterious:

Wait for a flashing LED and press a button

My questions here:

  1. which LED? There are so many on the device
  2. I guess by "button" they mean "Reset button" as the other one (on this device) is only "Power button" :wink:

Any help here please?

I am not sure if you need to go into failsafe mode at all.
It may be enough to just issue the "firstboot" command in the normal console and then reboot. (Not quite sure, as you have a read-only overlay.)

[quote="czezz, post:8, topic:3471"]
So, if I understand correctly, LEDE-17.01.1 takes one more "erase block" than CC 15.05.1 was taking. And that is why SIZE of /dev/mtdblock3 is shorter by 64K? I thought this is FIXED parameter...
[/quote]
The size of the "overlay" partition is not FIXED. It is whatever space is left free after the firmware.
Overlay size = total size - firmware size.
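Applied to the numbers posted above, as a small arithmetic sketch. The 64 KiB erase block size is the figure mentioned earlier in the thread; overlay_blocks_lost is a hypothetical helper, and the sizes are the ones df reports, which need not equal the raw partition sizes.

```shell
# Number of erase blocks by which the overlay shrank between two releases.
# Arguments: old overlay size, new overlay size, erase block size (all in KiB).
overlay_blocks_lost() {
    old_kib=$1; new_kib=$2; erase_kib=$3
    echo $(( (old_kib - new_kib) / erase_kib ))
}

# CC 15.05.1 reported 576K, LEDE 17.01.1 reported 512K:
overlay_blocks_lost 576 512 64   # prints 1
```

So the firmware grew by one erase block, and the overlay shrank by the same amount.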

Typically the power LED, but depending on the router it can be some other LED.

For you the button means "reset" as there is no other such button.

There is a 2-second window during boot when the system waits for a button press to indicate that the user wants to enter the LEDE/OpenWrt failsafe mode. Those two seconds are normally (for most routers) indicated by a rapidly blinking power LED, which then slows down to normal "boot blinking" once the 2 seconds have passed without a button press.

Your router-specific (OpenWrt) wiki page may give more exact info about the timing for your router, but I haven't checked. (Some router pages describe rather clearly how the different LEDs blink during boot.)

Generic advice is to press a button rapidly during the whole boot if you do not know the exact moment, as you need to get a button press registered during those two seconds.

EDIT:
I added more advice to the wiki page:
https://lede-project.org/docs/user-guide/failsafe_and_factory_reset#entering_failsafe_mode

Hi, thank you all very much for help :slight_smile:
The reset in failsafe mode was a good idea and it helped.

root@LEDE:~# df -h
Filesystem                Size      Used Available Use% Mounted on
/dev/root                 2.3M      2.3M         0 100% /rom
tmpfs                    13.8M      2.6M     11.2M  19% /tmp
/dev/mtdblock3          512.0K    248.0K    264.0K  48% /overlay
overlayfs:/overlay      512.0K    248.0K    264.0K  48% /
tmpfs                   512.0K         0    512.0K   0% /dev

Now that LEDE 17.01.1 is up and running, I have a number of other issues:

  1. device reboots randomly from time to time

  2. I lost root access to the device and had to go into failsafe to reset the root password

  3. executing "sort -u" on a file (2 MB) ends up like this:

    root@LEDE:/tmp# cat /tmp/block.build.list|sort -u > /tmp/block.build.before
    Killed

Typical symptoms of RAM exhaustion.

I just checked the wiki and your device has only 32 MB of RAM, right?
That is going to be a larger problem than the 4 MB flash :frowning:

You were already pointed to the "4/32 warning"
https://lede-project.org/meta/infobox/432_warning
It is no joke. Your device's resources are likely not up to the resource requirements of the current kernel and core apps.

You will likely be happier sticking with the old CC 15.05.1, as its resource requirements are marginally lower, but still enough lower that your device can live with it.

You can check your current RAM situation with "free":

root@LEDE:~# free
             total       used       free     shared    buffers     cached
Mem:        479240      94880     384360       1260       5104      16832
-/+ buffers/cache:      72944     406296
Swap:            0          0          0

Hi,
yes, I have come to a similar conclusion.
This morning I noticed that the router had hung completely. I decided to roll back to CC15.
Though I'm not sure if this is just a memory problem.
Here is the current free output taken from the restored CC15:

root@OpenWrt:~# free
             total         used         free       shared      buffers
Mem:         28580        27520         1060         2376         2148
-/+ buffers:              25372         3208
Swap:            0            0            0

I don't have free output taken from LEDE, but I would say it looked quite similar.

Too bad - I was so happy to move forward from CC15, as for a long time there had been no update or maintenance release.
In this case I will either wait for the next LEDE release (maybe it will bring some improvement) or try to customize CC15 myself.

ps. great support here on this forum :slight_smile:

It won't improve things.
It is not realistic to assume that the RAM consumption of the kernel and related core apps would go down. It is more likely that it grows marginally.

Note that even with CC15.05 branch you only have 1 MB free RAM (or 3 MB when counting also buffers). That is not much. Just sorting a 2 MB list may be too much. Or two apps randomly peaking at their memory consumption at the same time...
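A rough sketch of how one could check the headroom before starting such a job. Both helper names are hypothetical, the column positions assume the BusyBox free layout shown in this thread, and the factor of 2 is only a guess (sort holds the input in memory, and the output written to /tmp is also RAM-backed because /tmp is a tmpfs).

```shell
# Sum the "free" and "buffers" columns of the Mem: line (values in KiB).
# Assumes the column order: total used free shared buffers ...
free_plus_buffers_kib() {
    awk '$1 == "Mem:" { print $4 + $6 }'
}

# Succeed only if need_factor times the input size fits into the headroom.
# Arguments: input size (KiB), available RAM (KiB), optional factor (default 2).
enough_ram_to_sort() {
    input_kib=$1; avail_kib=$2; need_factor=${3:-2}
    [ $(( input_kib * need_factor )) -le "$avail_kib" ]
}

# On the router (not run here):
#   avail=$(free | free_plus_buffers_kib)
#   size=$(( $(wc -c < /tmp/block.build.list) / 1024 ))
#   enough_ram_to_sort "$size" "$avail" || echo "not enough RAM, skipping sort"
```

With the CC15 numbers above (1060 free + 2148 buffers = 3208 KiB), a 2 MB input would fail this check.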

You are probably right, but it might be that in the future something like a "LEDE light edition" will appear, or something dedicated to 32 MB RAM devices.

Additionally, under this link https://lede-project.org/supported_devices I can find this information: Sufficient RAM for stable operation: 32MB min, 64MB better

... to the right of that text on the same page: a big warning box about 4 or 32:

Devices with ≤4MB flash and/or ≤32MB ram suffer from limitations in extensibility and stability of operation.

It all depends on your use case. It is possible to use a device with 32 MB RAM if you do not use memory-hungry apps. E.g. sorting a 2 MB list when there is 1 MB free RAM sounds like trouble.

But yes, you can hope. But do not put your expectations too high. So far I have seen zero interest toward "LEDE light".