Adding OpenWrt support for Xiaomi AX3600 (Part 1)

This has been discussed many times. Do the sysupgrade twice (one after another), and it will work.


I still don't get what the bootloader is doing after the flag has changed, as for whatever reason it appears to work fine on the AX9000 now.

Let's just say that I am not a fan of whatever Xiaomi is doing.


I think this might be the same thing you mentioned about the clocks: AX6/AX3600 might work because they use a much older bootloader which does something differently. This might be the case for the boot flags. For me, it is the same on 5.10 and on 5.15: it needs two sysupgrades, one after another, and then it works.

I am going to get an AX-capable phone, hopefully on Monday, so I can do more AX tests with a bit more ease.


Hmm, this would mean that the flags don't actually change the rootfs it boots, otherwise, you wouldn't need to flash the second time.

It hasn't been as clean in my experience: not always two sysupgrades needed, but as low as one or as high as five or six. Seemingly random.

It seems that sysupgrade fails to get to the point of overwriting flash, rather than there being a problem with the A/B partitions and boot flags. See @kirdes' post above.

I've had a 100% first-time success with sysupgrade since adding a prefix of "wifi down; killall -9 wpad".


I updated today to 5.15.19.

In comparison with 5.10, I noticed the following:

  • Memory usage is lower (about 65% instead of 73%)
  • Download speed is significantly slower (now 450 Mbps, previously about double)
  • Upload speed is identical (about 50 Mbps)
  • No wifi connection issues

@dchard: You have been using 5.15 for a long time; do you have any tweaks to speed up the download?

I'm using @Edrikks' startup script and will add @bitthief's CPU performance boost to check whether it helps to get a better download speed.

If I missed something else, please let me know.


It would be really, like really, weird if this only hits the AX3600, because sysupgrade will first send SIGTERM to hostapd and then, if that fails, it should kill it and only then move on to flashing.

Will boot the AX3600 up and try to see what's going on.


I've been wondering for a while whether the whole set of Xiaomi dual-boot devices needs to be added to luci-app-advanced-reboot.

As for needing to flash twice for it to stick... I don't know. You may want to emit some debug info in the platform upgrade scripts to see what in the world is happening.
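
A rough sketch of what such debug output could look like, assuming the bootloader variables are readable with fw_printenv on this device. The helper name and the log prefix are made up for illustration; the variable names are passed in as arguments.

```shell
# Hypothetical debug helper (not part of OpenWrt): print the current value
# of each named bootloader variable, or "<unset>" if it cannot be read.
# Assumes fw_printenv is available and configured for this device.
dump_boot_flags() {
	local var val
	for var in "$@"; do
		val="$(fw_printenv -n "$var" 2>/dev/null)" || val="<unset>"
		echo "sysupgrade-debug: $var=$val"
	done
}
```

Calling something like `dump_boot_flags flag_boot_rootfs` before and after the flashing step in the platform upgrade script would show whether the flag really changes on the first attempt.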

OK, turns out only 'wifi down' is necessary. It's actually hostapd that shows as a process on this device rather than wpad, so that killall would have found nothing. I went for the nuclear option without checking details. It does work though!

Uneducated speculation, but perhaps it's in the timing: maybe wifi/apd takes longer to stop than sysupgrade effectively allows for. At one point I was adding 'sleep 3' into the process.

@robimarko, @namidairo
And why did you decide to only imitate the Xiaomi Dual Boot...
It didn't need to be done at all.
Dual Boot is not only about changing partitions when flashing.
Dual Boot must also be implemented in the kernel, so that if an error occurs during the startup process, the kernel itself switches the active partition.

But why not?
It's not like space is lacking, and no, it does not need to be implemented in the kernel; it's implemented in the bootloader.


You are probably not reading my posts. I have said many times that on 5.15, with SW offload and packet steering enabled, I can get gigabit speeds on PPPoE, which means a pure IP WAN should get similar if not better results. I also shared the IRQ optimizations I use: Adding OpenWrt support for Xiaomi AX3600 - #5790 by dchard
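
For reference, the two settings mentioned here can be switched on from the shell. This is only a sketch of the standard UCI options for software flow offloading and packet steering, not necessarily the exact configuration from the linked post; the function name is made up.

```shell
# Sketch only: enable software flow offloading and packet steering via UCI.
# The function name is hypothetical; the two option names are the stock
# OpenWrt firewall and network options.
enable_offload_and_steering() {
	uci set 'firewall.@defaults[0].flow_offloading=1'
	uci set 'network.globals.packet_steering=1'
	uci commit firewall
	uci commit network
}
```

After calling it, the firewall and network services need a restart (or the router a reboot) for the change to apply.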


On my end, it always takes two tries. It has never worked with one and never needed more than two. And I have been upgrading this device for a long time, more or less on a weekly basis.


The following is defined in /lib/upgrade/stage2:

kill_remaining() { # [ <signal> [ <loop> ] ]
	local loop_limit=10
	local sig="${1:-TERM}"

.......
......
			v "Sending signal $sig to $name ($pid)"
			kill -$sig $pid 2>/dev/null

			[ $loop -eq 1 ] && run=true
		done

		let loop_limit--
		[ $loop_limit -eq 0 ] && {
			v "Failed to kill all processes."
			exit 1
		}
That means that if hostapd/wpad doesn't respond to the TERM signal within 10 attempts, sysupgrade just exits. There is no forced kill. And that is what I have seen on my QNAP as well.
Sysupgrade just does a reboot, but no upgrade.

Maybe it's related to the hostapd+ath11k combination.
After a fresh restart sysupgrade is working most of the time.
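
If that reading is right, one way to work around it by hand is to escalate to KILL before calling sysupgrade. A minimal sketch with a hypothetical helper name; this is not the stage2 code, just the usual TERM-then-KILL pattern.

```shell
# Hypothetical helper: ask a process to stop with TERM, and escalate to KILL
# if it is still present after the given number of one-second checks.
term_then_kill() {
	local pid="$1"
	local tries="${2:-10}"
	kill -TERM "$pid" 2>/dev/null || return 0    # nothing to signal
	while [ "$tries" -gt 0 ]; do
		kill -0 "$pid" 2>/dev/null || return 0   # gone after TERM
		sleep 1
		tries=$((tries - 1))
	done
	kill -KILL "$pid" 2>/dev/null                # still there: force it
	return 0
}
```

For example, `for p in $(pidof hostapd); do term_then_kill "$p" 5; done` before starting the upgrade.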


Thank you, I will add this to the local startup and give it a try.

Unfortunately, nothing changed. The download speed remains at 450Mbps.
I would appreciate any further comments or help.

@dchard: I am sorry that I do not read all the posts carefully. But thanks for your help!

Switching the active partition after an error occurs during boot is implemented in the kernel!
The Xiaomi bootloader simply looks at the contents of the flag_try_sysX_failed parameters and decides which partition to boot from.
Look at the disassembly from the kernel:

int sub_C047B834()
{
  char *_flag_boot_rootfs; // r0
  int flag_try_sysX_failed; // r4
  int ret; // r4
  const char *msg; // r0

  if ( (unsigned int)sub_C047AE48() <= 2 )
  {
    _flag_boot_rootfs = j_nvram_get(0, "flag_boot_rootfs");
    if ( !_flag_boot_rootfs || cmpstr(_flag_boot_rootfs, "1") )
      flag_try_sysX_failed = j_nvram_set(0, "flag_try_sys1_failed", "1");
    else
      flag_try_sysX_failed = j_nvram_set(0, "flag_try_sys2_failed", "1");
    ret = j_nvram_set(0, "flag_ota_reboot", "0") + flag_try_sysX_failed;
    if ( ret + j_nvram_commit(0) )
      LOWORD(msg) = 0xF78;
    else
      LOWORD(msg) = 0xF98;
    HIWORD(msg) = 0xC076;
    printk(msg);
  }
  return 0;
}
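
In plain terms, the branch above boils down to the following. This is a sketch that assumes cmpstr behaves like strcmp (non-zero when the strings differ); the check on sub_C047AE48() is not modeled, since what it returns is not visible here. The function name is made up.

```shell
# Hypothetical restatement of the branch in the decompiled function:
# given the value of flag_boot_rootfs (possibly empty/unset), print the
# name of the flag that gets set to "1".
# Assumption: cmpstr() returns non-zero when the strings differ.
failed_flag_for() {
	if [ "$1" = "1" ]; then
		echo flag_try_sys2_failed
	else
		echo flag_try_sys1_failed
	fi
}
```

So an unset flag_boot_rootfs, or any value other than "1", marks sys1 as failed, and "1" marks sys2 as failed.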

@robimarko do you think this would work?

If ath11k frees all its memory, thus leaving no leak behind, it must be doing that after being requested to unload. So the idea is basically to put some printks into the deinit procedure printing current memory usage and use that to find out what that big chunk of memory ath11k hogged was actually used for.

https://elixir.bootlin.com/linux/v5.15.22/source/mm/page_alloc.c#L5775

Ahh, that's gotta be the dumbest implementation I have seen.
I thought that they implemented it in U-Boot like everybody else; heck, U-Boot already has a generic implementation.

In that case it really makes no sense to have dual rootfs at all.


So I myself was surprised when, for the new Xiaomi devices, they began to change the active partition during flashing.
For all devices based on MT7621, this was not implemented in the OpenWrt firmware, precisely because it would have required editing the kernel.

Xiaomi came up with this idea for the very first of its routers, and it applies the same technique unchanged in its newest ones.


OK, so I was converting the AX9000 to fixed partitions to merge all of the rootfs+overlay parts, but the OF parts parser isn't working at all.
Like WTF?

&qpic_nand {
	status = "okay";

	nand@0 {
		reg = <0>;
		nand-ecc-strength = <4>;
		nand-ecc-step-size = <512>;
		nand-bus-width = <8>;

		partitions {
			compatible = "fixed-partitions";
			#address-cells = <1>;
			#size-cells = <1>;

			partition@0 {
				label = "0:sbl1";
				reg = <0x0 0x100000>;
				read-only;
			};

			partition@100000 {
				label = "0:mibib";
				reg = <0x100000 0x100000>;
				read-only;
			};

			partition@200000 {
				label = "0:bootconfig";
				reg = <0x200000 0x80000>;
				read-only;
			};

			partition@280000 {
				label = "0:bootconfig1";
				reg = <0x280000 0x80000>;
				read-only;
			};

			partition@300000 {
				label = "0:qsee";
				reg = <0x300000 0x300000>;
				read-only;
			};

			partition@600000 {
				label = "0:qsee_1";
				reg = <0x600000 0x300000>;
				read-only;
			};

			partition@900000 {
				label = "0:devcfg";
				reg = <0x900000 0x80000>;
				read-only;
			};

			partition@980000 {
				label = "0:devcfg_1";
				reg = <0x980000 0x80000>;
				read-only;
			};

			partition@a00000 {
				label = "0:apdp";
				reg = <0xa00000 0x80000>;
				read-only;
			};

			partition@a80000 {
				label = "0:apdp_1";
				reg = <0xa80000 0x80000>;
				read-only;
			};

			partition@b00000 {
				label = "0:rpm";
				reg = <0xb00000 0x80000>;
				read-only;
			};

			partition@b80000 {
				label = "0:rpm_1";
				reg = <0xb80000 0x80000>;
				read-only;
			};

			partition@c00000 {
				label = "0:cdt";
				reg = <0xc00000 0x80000>;
				read-only;
			};

			partition@c80000 {
				label = "0:cdt_1";
				reg = <0xc80000 0x80000>;
				read-only;
			};

			partition@d00000 {
				label = "0:appsblenv";
				reg = <0xd00000 0x80000>;
			};

			partition@d80000 {
				label = "0:appsbl_1";
				reg = <0xd80000 0x100000>;
				read-only;
			};

			partition@e80000 {
				label = "0:appsbl";
				reg = <0xe80000 0x100000>;
				read-only;
			};

			partition@f80000 {
				label = "0:art";
				reg = <0xf80000 0x80000>;
				read-only;
			};

			partition@1000000 {
				label = "bdata";
				reg = <0x1000000 0x80000>;
				read-only;
			};

			partition@1080000 {
				label = "crash";
				reg = <0x1080000 0x80000>;
			};

			partition@1100000 {
				label = "crash_syslog";
				reg = <0x1100000 0x80000>;
			};

			partition@1180000 {
				label = "ubi";
				reg = <0x1180000 0xee80000>;
			};
		};
	};
};

It will still fall back to SMEM, and if SMEM is not compiled in, no partitions are populated.
For whatever reason it's seeing the number of partitions as 0.
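
One thing that can at least be ruled out offline is a typo in the table itself. A small sketch that checks every partition starts where the previous one ends; the helper name is made up, and the offset/size pairs are copied from the node above.

```shell
# Hypothetical helper: takes "offset size" pairs as arguments and reports
# the first gap or overlap, or the end address if everything is contiguous.
check_contiguous() {
	local expected=0
	while [ "$#" -ge 2 ]; do
		if [ "$(($1))" -ne "$expected" ]; then
			printf 'mismatch at 0x%x, expected 0x%x\n' "$(($1))" "$expected"
			return 1
		fi
		expected=$(($1 + $2))
		shift 2
	done
	printf 'contiguous up to 0x%x\n' "$expected"
}

# Offsets and sizes from the fixed-partitions node in this post.
check_contiguous \
	0x0 0x100000 0x100000 0x100000 0x200000 0x80000 0x280000 0x80000 \
	0x300000 0x300000 0x600000 0x300000 0x900000 0x80000 0x980000 0x80000 \
	0xa00000 0x80000 0xa80000 0x80000 0xb00000 0x80000 0xb80000 0x80000 \
	0xc00000 0x80000 0xc80000 0x80000 0xd00000 0x80000 0xd80000 0x100000 \
	0xe80000 0x100000 0xf80000 0x80000 0x1000000 0x80000 0x1080000 0x80000 \
	0x1100000 0x80000 0x1180000 0xee80000
```

For this table it reports `contiguous up to 0x10000000`, i.e. the layout covers exactly 256 MiB with no gaps, so the zero partition count has to come from somewhere other than the offsets.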
