Wallys DR9574 (IPQ9574 + 3x QCN9274 hw2.0) mainline OpenWrt port — boot works, WiFi blocked by firmware RDDM after WMI INIT

Hi all,

I'm working on a mainline OpenWrt port for the Wallys DR9574 — an IPQ9574
tri-radio AP with three QCN9274 hw2.0 M.2 cards. Posting to share what I've
found so far and ask for help on the remaining WiFi blocker.


Hardware

Component   Detail
Board       Wallys DR9574 (Qualcomm AP-AL02-C4 reference design)
SoC         IPQ9574
WiFi        3x QCN9274 hw2.0 M.2 (DR9274-2GK / DR9274-5GK / DR9274-6GK), single-band each
LAN         4x 1 GbE (QCA8075 via MDIO) + 2x 2.5G/10G USXGMII (not yet enabled)
RAM         1 GB DDR

Status: what works on mainline OpenWrt (kernel 6.12.89)

  • Boots: mainline OpenWrt via UART tftpboot (RAM-only, NAND untouched)
  • Ethernet: all 4x 1GbE LAN ports working (QCA8075 PHYs via EDMA PPE)
  • WiFi init: all 3 QCN9274 cards detected, firmware loads, HTC/WMI
    service negotiation completes cleanly... up to a point (see below)

What it took to get here (two patches required)

Patch 1: single-mac cards must not request WMI MAC1

Without this: HTC Service WMI MAC1 connect response: status: 0x1 (NOT_FOUND)

ath12k_hw_params.max_radios = 2 for QCN9274 hw2.0 tells ath12k_htc_init()
to open two WMI endpoints. These cards have single-mac firmware (no dualmac OTP
bit), so the firmware doesn't advertise MAC1. Fix: cap wmi_ep_count at 1 when
the OTP dualmac bit is clear.
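The cap can be pictured with a small standalone sketch. The struct and function names below are hypothetical stand-ins, not the real ath12k symbols or the patch itself; only max_radios = 2 and the OTP dualmac condition come from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-chip parameters; not the real ath12k struct. */
struct hw_info {
	unsigned int max_radios;  /* 2 for QCN9274 hw2.0, per the text above */
	bool otp_dualmac;         /* true only when the OTP dualmac bit is set */
};

/* Number of WMI endpoints the host should ask HTC to connect. */
static unsigned int wmi_ep_count(const struct hw_info *hw)
{
	assert(hw);

	/* Single-mac firmware only advertises WMI MAC0, so never request MAC1. */
	if (!hw->otp_dualmac)
		return 1;

	return hw->max_radios;
}
```

With max_radios = 2, a card whose dualmac bit is clear yields one endpoint, so the MAC1 connect request that came back NOT_FOUND is never sent.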

Patch 2: DMA ring capability TLV parser must tolerate unknown module IDs

Without this: "Invalid module id 3" -> "failed to parse tlv -22" -> WMI ready timeout

WLAN.WBE.1.6 sends a DMA ring capability entry for module_id 3 (and possibly
others) that the host-side enum doesn't know. Current ath12k returns -EINVAL,
which breaks the entire TLV parse. Fix: warn-and-skip unknown module_ids.
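A minimal sketch of the warn-and-skip behaviour, assuming a simplified capability record. The enum, struct, and function names are hypothetical and only illustrate the control flow; they are not the ath12k parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical host-side enum: the host knows module IDs below MODULE_MAX. */
enum dma_ring_module { MODULE_A = 0, MODULE_B = 1, MODULE_MAX };

/* Simplified capability entry; the real TLV carries more fields. */
struct dma_ring_cap {
	unsigned int module_id;
	unsigned int min_elem;
};

/*
 * Copy the entries the host understands into out[] and return how many
 * were kept. Unknown module IDs are logged and skipped instead of
 * aborting the whole parse with an error.
 */
static size_t parse_dma_ring_caps(const struct dma_ring_cap *in, size_t n,
				  struct dma_ring_cap *out)
{
	size_t kept = 0;

	assert(in && out);
	for (size_t i = 0; i < n; i++) {
		if (in[i].module_id >= MODULE_MAX) {
			fprintf(stderr, "skipping unknown module id %u\n",
				in[i].module_id);
			continue;
		}
		out[kept++] = in[i];
	}
	return kept;
}
```

The design point is that one unrecognised entry costs only that entry, so the rest of the service-ready parse still completes.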

Both patches confirmed working on this hardware. HTC services (Control, HTT,
WMI) all connect with status 0 and SVC_READY_EXT parses cleanly.


Current blocker: firmware RDDM ~300 ms after WMI_INIT_CMD

With both patches and CONFIG_ATH12K_DEBUG=y + debug_mask=0xffffff:

[19.485] WMI connect response: status: 0, assigned ep: 2 <- clean
[19.486] num hw modes 1 preferred_hw_mode 0
[19.486] slot 0002: supported_bands 2 freq 5 GHz [4890-7125] <- 5/6 GHz card
[19.486] slot 0001: supported_bands 1 freq 2 GHz [2401-2495] <- 2.4 GHz card
[20.408] htc ep 2 consumed 1 credits (total 1) <- WMI_INIT_CMD sent
[20.724] mhi notify status reason MHI_CB_EE_RDDM <- 316 ms later: firmware crash
[20.883] mhi notify status reason MHI_CB_EE_RDDM
[24.517] failed to receive wmi unified ready event: -110

MHI_CB_EE_RDDM = firmware entered RAM Dump Debug Mode (fatal crash). The
crash fires on all three cards within 300-500 ms of the host sending
WMI_INIT_CMD. Tried multiple firmware versions and host-side patches targeting
the post-INIT path — RDDM fires before any of that code runs, so it's something
in how the firmware processes our INIT payload.

Firmware version: WLAN.WBE.1.6-01243-QCAHKSWPL_SILICONZ-1 (from
linux-firmware 20260410).


What I've ruled out

  • HTC transport problems: all services connect status=0
  • Board data / calibration: using the board-2.bin extracted from Wallys's
    own mainline-driver reference image — exact match for this hardware
  • Driver-side NULL ptr and DMA ring timing issues: tested, RDDM fires
    upstream of those code paths

Questions

  1. Has anyone successfully run QCN9274 hw2.0 cards on IPQ9574 with current
    mainline ath12k? What firmware version are you using?

  2. Does the RDDM-after-WMI-INIT pattern ring a bell with anyone who has
    debugged QCN9274 on other platforms?

  3. Is there a way to extract the RDDM crash dump from a QCN9274 card via
    debugfs or PCIe after this failure?

  4. Has anyone tried older WBE firmware (1.4.x or 1.3.x) with current
    ath-next on these cards?


Upstream plans

The DTS and these two patches are in good shape for upstream submission once
WiFi is working. Happy to coordinate with anyone else working on IPQ9574 or
QCN9274 mainlining.

Full verbose dmesg (850 lines) available on request.

Thanks,
Stefano

Update — the WiFi blocker is resolved; sharing where this landed.

Big update since the first post. The "firmware RDDM after WMI_INIT" blocker is gone, and the board now runs a 3-link Wi-Fi 7 AP-MLD on fully mainline OpenWrt + ath12k.

WiFi: working

No ath12k driver backport was needed. OpenWrt's mac80211 (backports 6.18.26) already carries the WSI multi-chip / single-wiphy MLO support and the AP-MLD path. The whole MLO software stack is in place on current snapshot.

The three single-band QCN9274 hw2.0 cards come up as one grouped wiphy (WSI multi-chip), and hostapd brings up a single AP-MLD with 3 links (6 + 5 + 2.4 GHz), WPA3-SAE, all in AP-ENABLED / beaconing state.

The config trick that matters: one wifi-iface with list device 'radio1' / 'radio2' / 'radio3' + option mlo '1' — NOT three separate ifaces.

Caveat for honesty: I've confirmed a Wi-Fi 6 client associating (single link). I have not yet tested a true multi-link Wi-Fi 7 STA, so the multi-link data path is unverified — but the AP-MLD itself is up and beaconing all three links.

About the RDDM "blocker"

It turned out not to be reproducible in my current tree — same WBE.1.6 firmware. I can no longer trigger the MHI_CB_EE_RDDM crash, and I haven't yet isolated which local ath12k change made it go away (it is NOT CMA size and NOT the WSI grouping — both A/B-tested and ruled out). The kernel.org bug (#221550) is on hold pending a clean isolation rather than a guess.

The two host-side patches still apply

  • cap wmi_ep_count at 1 for single-mac QCN9274 hw2.0 (OTP dualmac bit clear)
  • warn-and-skip unknown WMI DMA-ring module_ids instead of failing the whole TLV parse

Plus a generic debugfs fix (the grouped single-wiphy model has every radio try to create the same ath12k debugfs symlink → "already present"). Preparing these for ath12k upstream.

Current frontier: the 10 GbE port

Now bringing up the 2× 10G Aquantia (AQR113C) USXGMII ports. The IPQ9574 XPCS and the AQR are each locally "up" (XPCS in XPCS mode + calibrated + USXG/AN enabled; AQR copper linked, USXGMII in-band AN enabled) but the XPCS receives zero in-band from the AQR (XPCS_MII_AN_INTR_STS stays 0) — looks like a SerDes/UNIPHY RX-interop / bootloader-init gap on a non-RDP board. Taking the precise register diagnosis to netdev / the qcom_ppe + PCS maintainers.

Happy to share the MLD UCI config or verbose logs if anyone else is on IPQ9574 + QCN9274.

Update — the 10G USXGMII port is working, and the earlier "XPCS gets zero in-band AN" diagnosis was a red herring. The real root cause was a clock-rate bug.

Quick recap of where I left it: on a cold boot the AQR113C copper links at 1 Gbps, lan5: Link is Up - 1Gbps/Full, carrier=1 — but the RX counter stayed 0 forever. TX worked. If I let U-Boot do its own ethernet bring-up first (a "warm" hand-off), Linux RX then worked, which made this look like a SerDes/XPCS in-band-AN interop problem.

It isn't. Here's what it actually was.

Ruling out the XPCS. I dumped the full XPCS/PCS register set at link_up in both the working (warm, RX flowing) and broken (cold, RX=0) states. All of them — mode, calibration, KR_STS block-lock (0x100d in both), DIG_CTRL, MII_CTRL, MII_AN_INTR_STS — were byte-for-byte identical. The XPCS is configured the same whether RX works or not, so USXG_AN_LINK_STS / the C37 AN-complete bit were never the gate (RX even flows with USXG_AN_LINK_STS = 0).

The actual difference was one register, in the NSS clock controller, on the per-port RX/TX clock source (nss_cc_port5_rx_clk_src / ..._tx_clk_src CFG):

Same parent (P_UNIPHY1_NSS_RX_CLK), different divider: cold hid_div=1 → 312.5 MHz, warm hid_div=4 → 125 MHz.
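The two rates follow from the half-integer divider encoding. The sketch below assumes the usual Qualcomm RCG convention, where the field stores 2 × divider − 1 and 0 means no division; that encoding is an assumption here, though it is consistent with the two values quoted above for a 312.5 MHz parent. The function and macro names are illustrative only.

```c
#include <assert.h>

/* Parent rate implied by hid_div=1 -> 312.5 MHz in the text above, in Hz. */
#define UNIPHY_RX_PARENT_HZ 312500000UL

/*
 * Output rate for a given hid_div field, assuming the field encodes
 * (2 * divider - 1) and that 0 means "no division".
 */
static unsigned long rcg_rate_hz(unsigned long parent_hz, unsigned int hid_div)
{
	assert(hid_div <= 31);  /* assumed 5-bit field */

	if (hid_div == 0)
		return parent_hz;

	return parent_hz * 2 / (hid_div + 1);
}
```

Under that assumption hid_div=1 is a divide-by-1 (312.5 MHz) and hid_div=4 is a divide-by-2.5 (125 MHz).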

Root cause. USXGMII rate-adapts a fixed 10G SerDes to the negotiated speed, so the per-port GMII-side RX/TX clock that feeds the EDMA must follow the negotiated link speed, not the 10G interface base rate. On a 1 Gbps link it needs 125 MHz. pcs-qcom-ipq9574 pins it at 312.5 MHz for every USXGMII speed — ipq_pcs_clk_rate_get() keys off the interface mode, never the speed. With the port RX clock at the 10G rate on a 1G link, the EDMA never clocks in received frames → RX=0. U-Boot programs 125 MHz for the 1G link, and since Linux never reprograms that per-port clock, a warm hand-off silently inherited the correct value — the whole reason "warm worked".

Fix: in ipq_pcs_link_up(), where the negotiated speed is known, set the per-port RX/TX clock rate from the negotiated speed instead of the interface mode.

The branch (nss_cc_uniphy_portN_rx_clk) and divider (nss_cc_portN_rx_div_clk_src) both carry CLK_SET_RATE_PARENT, so this propagates to the RCG; the 125 MHz freq entry is C(P_UNIPHY1_NSS_RX_CLK, 2.5) — exactly the value U-Boot leaves.
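A minimal sketch of the idea, not the actual patch: choose the per-port clock rate from the negotiated speed rather than from the interface mode. Only the 1 Gbps and 10 Gbps values come from this post; the function name, the macros, and the "return 0 for unknown" fallback are hypothetical. In the driver the result would then be applied to the port's RX/TX clocks, for example with clk_set_rate().

```c
#include <assert.h>

#define LINK_SPEED_1000   1000   /* Mbps */
#define LINK_SPEED_10000  10000  /* Mbps */

/* Per-port GMII-side clock rate in Hz for a USXGMII link at the given speed. */
static unsigned long usxgmii_port_clk_rate(int speed_mbps)
{
	assert(speed_mbps > 0);

	switch (speed_mbps) {
	case LINK_SPEED_1000:
		return 125000000UL;  /* 1G link: 125 MHz, the value U-Boot leaves */
	case LINK_SPEED_10000:
		return 312500000UL;  /* 10G link: the interface base rate */
	default:
		/* Other speeds are not covered in this post; 0 means "unknown". */
		return 0;
	}
}
```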

Result: cold NAND boot now brings up lan5 with real RX traffic and a DHCP lease, alongside the 3-link Wi-Fi 7 AP-MLD — no U-Boot dependency, no env hacks. Reproduced across multiple cold boots.

This looks like a generic mainline bug rather than anything board-specific: any IPQ95xx (and probably IPQ53xx) USXGMII port to an AQR-class PHY that negotiates below 2.5G should hit it. I'll send the one-liner to the pcs-qcom-ipq9574 / qcom_ppe maintainers. Happy to share the full register diff or the cold-vs-warm capture method if useful to anyone else on IPQ95xx + Aquantia USXGMII.

Hi! It just so happens that I'm working on a Compex AP.AL.02.3 that seems to be based on the same reference design, and I'm hitting similar issues, specifically RX=0 on the lan1 port (and it does not work even on a warm boot). Traffic is not getting through, but I'm having trouble pinpointing the reason (mostly due to lack of experience with the low-level bits of network hardware). I tried both the 6.12 and 6.18 kernels and each has a different issue: 6.12 reports the link up correctly, but no traffic gets through; 6.18 seems to have switched to in-band link negotiation and does not report the link as up -- possibly quite correctly. Would you mind sharing the work-in-progress port for me to look at and possibly use some bits for the Compex board?

Hi @Krakonos — welcome, and yes, the AP.AL.02.3 is the same AP-AL02 reference design, so most of this should carry over. The WIP port is public here:

https://codeberg.org/insalata-fresca/openwrt-dr9574 — device tree (dts/), the ethernet patches (patches/ethernet/), the ath12k/ath11k WiFi patches, and notes in docs/. Grab whatever's useful.

On your lan1 RX=0: on these AP-AL02 boards there are actually two different RX=0 failure modes, depending on what the port is physically wired to, with completely different fixes. Worth pinning down which one you've got, because they look identical from ifconfig (link up, TX ok, RX=0).

1) If lan1 is an Aquantia / USXGMII port (2.5G/10G PHY) negotiating below 2.5G (e.g. a 1G link):
This is the generic mainline clock-rate bug from my post above. pcs-qcom-ipq9574 pins the per-port GMII-side RX/TX clock (NSSCC nss_cc_portN_rx/tx_clk_src) at 312.5 MHz for every USXGMII speed — ipq_pcs_clk_rate_get() keys off the interface mode, not the negotiated speed. A 1G link needs 125 MHz; with the port clock at the 10G rate the EDMA never clocks in received frames → RX=0. Fix is a few lines in ipq_pcs_link_up() (where the speed is known):
https://codeberg.org/insalata-fresca/openwrt-dr9574/src/branch/main/patches/ethernet/0004-pcs-ipq9574-usxgmii-per-port-clock-rate-by-speed.patch
Quick confirm: dump the nss_cc_portN_rx_clk_src CFG and look at hid_div: hid_div=1 (312.5 MHz) on a 1G link is the smoking gun; it should be hid_div=4 (125 MHz).

2) If lan1 is a QCA8075/QCA8084 1-GbE PHY (the 4×1G block, qsgmii/psgmii):
This is the harder one and I'm still on it — heads-up so you don't lose days. On mainline the QCA8075 PHY driver brings the copper link up (so 6.12 reports "link up"), but the QCA8084/switch internal uplink datapath (GPHY-MAC → uplink SerDes → SoC) is never programmed, so nothing is forwarded → RX=0, and warm boot doesn't help (unlike the Aquantia case). I've confirmed it's mode-independent: both phy-mode = "qsgmii" and "psgmii" give RX=0. The factory QSDK drives this exact silicon in PSGMII (switch_mac_mode = 0x00) and it works — but via the full closed qca-ssdk switch init + the NSS-dp datapath, none of which is in mainline. The mainline qcom,qca8084-package path (phy-mode 10g-qxgmii, UQXGMII) does try to program the chip, but on a cold boot it dies on a clk reparent -EBUSY (the NSSCC per-port MAC RX clock can't reparent onto the QCA8084 uniphy SerDes clock because that recovered clock isn't toggling yet). So neither mainline path passes traffic today. I'm currently diffing the register state of a known-good vendor-QSDK boot against the broken mainline state to find the missing bring-up step — will post when I crack it.

On your 6.18 note ("switched to in-band link negotiation, link not reported up"): on this SerDes the USXGMII/PCS in-band AN never converges. What worked for the 10G ports was dropping managed = "in-band-status" and running phy-managed (link comes up off the PHY's own status, not PCS in-band AN). If your 6.18 path forces in-band, that's likely why it never reports up — try phy-managed.

General technique that's been gold here: flash the vendor (QSDK) image once and dump the working register state (NSSCC clock CFGs, uniphy/XPCS, switch regs), then diff against mainline at the same point. Every ethernet bug on this board so far has been a single register / clock-divider difference, not a logic bug — the cold-vs-warm and mainline-vs-QSDK register diffs find them fast.
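The diff step itself is simple enough to sketch. The snippet below is an illustration under assumptions, not the tooling used on this board: it compares two register snapshots captured at the same addresses in the same order and reports each mismatch. The struct and function names are hypothetical, and any addresses or values fed to it in an example would be placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One captured register: address and the value read at capture time. */
struct reg_sample {
	uint32_t addr;
	uint32_t val;
};

/*
 * Compare two snapshots taken at the same addresses in the same order,
 * print every mismatch, and return the number of differing registers.
 */
static size_t reg_diff(const struct reg_sample *good,
		       const struct reg_sample *bad, size_t n)
{
	size_t diffs = 0;

	assert(good && bad);
	for (size_t i = 0; i < n; i++) {
		/* Both dumps must describe the same register at each index. */
		assert(good[i].addr == bad[i].addr);
		if (good[i].val != bad[i].val) {
			printf("0x%08x: good=0x%08x bad=0x%08x\n",
			       (unsigned int)good[i].addr,
			       (unsigned int)good[i].val,
			       (unsigned int)bad[i].val);
			diffs++;
		}
	}
	return diffs;
}
```

Capturing both states at the same point in bring-up (for example at link_up) keeps the diff short enough to read by eye.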

Status for the AP-AL02 family on current mainline: 2× Aquantia 10G USXGMII = solved (patches in the repo, prepping upstream); 3-link Wi-Fi 7 AP-MLD = working; 4× QCA8075/8084 1G = link-up but no traffic, still WIP (the switch-datapath gap above). Happy to compare DTS / notes — we're clearly on the same silicon.