Acer Predator W6 with OpenWrt

We won't be able to fix the strapping pins (if that's actually the cause), but we could certainly fix the null pointer dereference. I am happy to test; after all, I can reproduce it quite reliably.

Wow, you've found the bad apple.

I am still trying to understand the relevant mt76 code. We're getting a bit off topic, but do you agree that the (null pointer) bug must be in tx.c between lines 239 and 296, or in one of the functions called from those lines (those without an exported symbol, since an exported one would show up in the logs)?

__mt76_tx_complete_skb (tx.c line 239) is called by
mt76_txq_schedule_pending_wcid (tx.c lines 600/633) called by
mt76_txq_schedule_pending (tx.c lines 644/646/666) called by
mt76_txq_schedule_all (tx.c lines 683/688) called by
mt7915_mac_restart (mt7915/mac.c lines 1308/1341/1343) called by
mt7915_mac_full_reset (mt7915/mac.c lines 1433/1456) called by
mt7915_mac_reset_work (mt7915/mac.c lines 1486/1508).

I doubt source code line numbers can be printed in the debug logs but we see the binary offset in the program counter.

It would be difficult to hunt down the bug without a disassembler and without a runtime debugging environment, but if you are happy to test, maybe you want to add debug output to the code of __mt76_tx_complete_skb?
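On the line-number point: an oops on its own only gives symbol+offset, but an added debug statement can print `__func__` and `__LINE__` itself. Below is a minimal sketch of such a trace helper. The names `DBG_PRINT`, `DBG_STACK`, `dbg_trace_ptr` and `DBG_TRACE_PTR` are hypothetical and not part of mt76; the userspace fallback is only there so the sketch can be compiled outside the kernel tree.

```c
#ifdef __KERNEL__
#include <linux/printk.h>	/* pr_info(), dump_stack() */
#define DBG_PRINT(fmt, ...)	pr_info(fmt, ##__VA_ARGS__)
#define DBG_STACK()		dump_stack()
#else
#include <stdio.h>		/* userspace fallback so the sketch builds anywhere */
#define DBG_PRINT(fmt, ...)	fprintf(stderr, fmt, ##__VA_ARGS__)
#define DBG_STACK()		((void)0)
#endif

/*
 * Hypothetical helper: log the current location and the pointer under
 * suspicion. Returns 1 if the pointer is NULL (after dumping the stack
 * in a kernel build), otherwise 0.
 */
static inline int dbg_trace_ptr(const char *func, int line,
				const char *name, const void *ptr)
{
	DBG_PRINT("mt76-dbg: %s:%d %s=%p\n", func, line, name, ptr);
	if (!ptr) {
		DBG_STACK();
		return 1;
	}
	return 0;
}

/* Fills in function name, line number and the expression text. */
#define DBG_TRACE_PTR(ptr) dbg_trace_ptr(__func__, __LINE__, #ptr, (ptr))
```

A call such as `DBG_TRACE_PTR(skb);` near the top of a suspect function would then show up in dmesg with function name and line number. Note that kernel builds hash `%p` values by default, so the printed pointer is mainly useful for spotting NULL.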

I am not so sure about __mt76_tx_complete_skb. Initially I thought so too, but the offset doesn't make any sense. I can disassemble the kernel module just fine in gdb. The interleaving with source code (disas/m and disas/s) seems a bit broken, at least where it actually counts, but that might simply be because the code in question is in an inline function from list.h.

The interesting part is that the address of __mt76_tx_complete_skb+0x56c is actually outside of that function, which ends at +632 (NB: gdb prints it as decimal, that's 0x278).

p/x __mt76_tx_complete_skb+0x56c
0x8c6c
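To spell out the arithmetic: the numbers above imply that __mt76_tx_complete_skb starts at 0x8c6c - 0x56c = 0x8700 and, at 632 bytes, ends before 0x8978, so 0x8c6c lies well past it. A small sketch of that check follows; `func_range`, `addr_in_func` and `func_offset` are made-up names for illustration, and only the constants come from the gdb session.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address range [start, start + size) of one function in the module image. */
struct func_range {
	uint64_t start;
	uint64_t size;
};

/* True if addr falls inside the function's own code. */
static inline bool addr_in_func(struct func_range f, uint64_t addr)
{
	return addr >= f.start && addr - f.start < f.size;
}

/* Offset of addr relative to the start of the function. */
static inline uint64_t func_offset(struct func_range f, uint64_t addr)
{
	return addr - f.start;
}

/* Values taken from the gdb session in this post. */
static const uint64_t fault_addr = 0x8c6c;	/* p/x __mt76_tx_complete_skb+0x56c */
static const struct func_range tx_complete_skb = {
	.start = 0x8c6c - 0x56c,		/* = 0x8700 */
	.size  = 632,				/* 0x278, function length per gdb */
};
```

With these values `addr_in_func(tx_complete_skb, fault_addr)` is false, which matches the observation that the symbol in the log is merely the nearest preceding one the kernel could name.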

disas/s 0x8c6c then gives us...

568	static inline void list_splice_init(struct list_head *list,
569					    struct list_head *head)
570	{
571		if (!list_empty(list)) {
   0x0000000000008c5c <+124>:	ldr	x3, [sp, #104]
   0x0000000000008c60 <+128>:	cmp	x3, x0
   0x0000000000008c64 <+132>:	b.eq	0x8c88 <mt76_txq_schedule_pending+168>  // b.none

572			__list_splice(list, head, head->next);
   0x0000000000008c68 <+136>:	ldp	x2, x1, [x22, #48]

530		first->prev = prev;
   0x0000000000008c6c <+140>:	str	x28, [x2, #8]

When I compare this to the output of disas/m 0x8c6c....

disas/m 0x8c6c
Dump of assembler code for function mt76_txq_schedule_pending:
646	
   0x0000000000008be0 <+0>:	stp	x29, x30, [sp, #-128]!
   0x0000000000008be4 <+4>:	mov	x29, sp
   0x0000000000008be8 <+8>:	stp	x21, x22, [sp, #32]
   0x0000000000008bec <+12>:	mov	x22, x0
   0x0000000000008bf4 <+20>:	stp	x27, x28, [sp, #80]

647			spin_unlock(&phy->tx_lock);
   0x0000000000008bfc <+28>:	add	x28, sp, #0x70
   0x0000000000008c00 <+32>:	stp	x0, x28, [sp, #104]
   0x0000000000008c08 <+40>:	str	x28, [sp, #120]

648			ret = mt76_txq_schedule_pending_wcid(phy, wcid);
649			spin_lock(&phy->tx_lock);
   0x0000000000008bf0 <+16>:	add	x0, x0, #0x30
   0x0000000000008bf8 <+24>:	mov	x1, x0

650	
651			if (ret) {
652				if (list_empty(&wcid->tx_list))
653					list_add_tail(&wcid->tx_list, &phy->tx_list);
654				break;
655			}
656		}
657		spin_unlock(&phy->tx_lock);
   0x0000000000008c94 <+180>:	cmp	x0, x28
   0x0000000000008c98 <+184>:	b.eq	0x8d40 <mt76_txq_schedule_pending+352>  // b.none

one would guess that list_add_tail() (or list_empty()) somehow calls list_splice_init(). However, I don't see how that could be. list_splice_init() is actually called directly in mac.c, though. All of this is more puzzling than helpful.
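For reference, the disas/s listing attributes the instruction at 0x8c6c to `first->prev = prev` (list.h line 530), the first store of the splice, with `first` presumably being the value just loaded into x2 by the `ldp`. The following is a userspace re-creation of that list.h logic, a sketch for illustration only and not the kernel header itself. `do_list_splice` mirrors the kernel's `__list_splice()`, and `splice_would_deref_null` is a hypothetical helper. The scenario it tests for, a list head whose `next` is NULL (for example zeroed and never initialised), is an assumption consistent with that instruction, not a diagnosis.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static inline bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Same statement order as __list_splice() in list.h. */
static inline void do_list_splice(const struct list_head *list,
				  struct list_head *prev,
				  struct list_head *next)
{
	struct list_head *first = list->next;
	struct list_head *last = list->prev;

	first->prev = prev;	/* first store: faults if first is NULL */
	prev->next = first;
	last->next = next;
	next->prev = last;
}

static inline void list_splice_init(struct list_head *list,
				    struct list_head *head)
{
	if (!list_empty(list)) {
		do_list_splice(list, head, head->next);
		INIT_LIST_HEAD(list);
	}
}

/*
 * Hypothetical helper (not in the kernel): true if list_splice_init()
 * would pass the list_empty() check and then store through a NULL
 * "first" pointer.
 */
static inline bool splice_would_deref_null(const struct list_head *list)
{
	return !list_empty(list) && list->next == NULL;
}
```

A zeroed head is "not empty" by the `list_empty()` test because `next` differs from the head's own address, so the splice proceeds and the very first store goes through NULL plus the offset of `prev`.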

The above was produced via...

./build_dir/target-aarch64_cortex-a53_musl/openwrt-sdk-24.10-SNAPSHOT-mediatek-filogic_gcc-13.3.0_musl.Linux-x86_64/staging_dir/toolchain-aarch64_cortex-a53_gcc-13.3.0_musl/bin/aarch64-openwrt-linux-gdb ./staging_dir/target-aarch64_cortex-a53_musl/root-mediatek/lib/modules/6.6.73/mt76.ko
[…]
(gdb) dir ./build_dir/target-aarch64_cortex-a53_musl/linux-mediatek_filogic/linux-6.6.73/drivers/net/wireless/mediatek/mt76
(gdb) dir ./build_dir/target-aarch64_cortex-a53_musl/openwrt-sdk-24.10-SNAPSHOT-mediatek-filogic_gcc-13.3.0_musl.Linux-x86_64/build_dir/target-aarch64_cortex-a53_musl/linux-mediatek_filogic/linux-6.6.73
(gdb) set substitute-path ../mt76-2025.02.14~e5fef138 .

on top of current openwrt-24.10 (b7b6ae7424) and a minor ccache patch of mine.

I would like to add dump_stack() and printk() calls, but I don't know how to do that correctly. Is there a practical way to do this in a somewhat structured manner (i.e. with git support)? I'd rather not do a lot of editing inside build_dir because I haven't grasped the OpenWrt build system yet.

It's also simply not working, AFAICT. Here is what I did:

  1. edit build_dir/target-aarch64_cortex-a53_musl/linux-mediatek_filogic/linux-6.6.73/drivers/net/wireless/mediatek/mt76/tx.c
  2. edit build_dir/target-aarch64_cortex-a53_musl/linux-mediatek_filogic/linux-6.6.73/drivers/net/wireless/mediatek/mt76/mt7615/mac.c
  3. edit build_dir/target-aarch64_cortex-a53_musl/openwrt-sdk-24.10-SNAPSHOT-mediatek-filogic_gcc-13.3.0_musl.Linux-x86_64/build_dir/target-aarch64_cortex-a53_musl/linux-mediatek_filogic/linux-6.6.73/include/linux/list.h
  4. run make target/linux/compile
  5. run make target/linux/install
  6. run make

And although I intentionally added invalid code to list.h, everything built just fine, and of course there was no change in behavior at runtime. So obviously this is not the way. What is?

@JxnLexn
Please test this PR. It lets you change the state of the LEDs on the 2.5G ports from LuCI. Once you have tested it, please leave a comment on GitHub.

I managed to break my Predator W6d's serial port before flashing OpenWrt on it. I checked with a scope: no output at all. Nothing on the holes, no signal on the pads, no signal on the small 1k resistor. The line doesn't even go to 3.3 V after power-on. I also checked that there is no short to ground (leftover solder or similar). My guess is that I made a wrong connection to the TX pin at some point and fried it.
Without a serial port, OpenWrt is off the table now, right? Or is there another way to flash firmware on the W6d?
The RX pin still seems to work: I can type "reboot" on my serial console and the original firmware reboots. But trying to interrupt the boot process doesn't seem to work for some reason: when I just press 00000, the regular firmware starts. When I hold reset and WPS during startup, the first two LEDs on the board turn blue, and after about 20 seconds they turn red. I tried typing "reset" (in the hope that U-Boot would restart), but nothing happens.
Does anyone have an idea about what to try?