GL.iNet Flint 3 exploration (GL-BE9300, IPQ5332)

Some time ago I bought a Flint 3 and I am now trying to bring it up on vanilla OpenWrt. So far I have the following working (mainline OpenWrt qualcommax/ipq53xx, kernel 6.12.87): boot, flash, UART, USB, GPIO, LEDs, both radios and eth0 (LAN switch path).

However, eth1 (WAN) is unexpectedly a harder nut to crack. WAN MAC TX egress looks fine (no TX errors with phy-mode fixed at usxgmii), but with DTS phy-mode = "usxgmii", pcs-handle = <&pcs1_ch0>, phy-handle = <&wan_phy> and the RTL8221B-VB-CG bound by the upstream realtek driver, the port does not come up. On the Flint 3 side: carrier=0, operstate=down, and no Link is Up event.

Root-cause candidates (from most likely):

  1. Upstream RTL8221B has no USXGMII inband caps. rtl822x_inband_caps() in drivers/net/phy/realtek/realtek_main.c returns LINK_INBAND_DISABLE only for 2500BASEX, DISABLE|ENABLE for SGMII, and 0 for everything else (including USXGMII). Pairing phy-mode = "usxgmii" with managed = "in-band-status" makes phylink configure the Qcom PCS for inband, but the PHY won't drive USXGMII inband signaling, so the link never resolves.
  2. NSSCC RCG mux can't be reparented from XO to PCS. Upstream nsscc-ipq5332 declares port[12]_rx/tx_clk_src parents via parent_data.index into the PCS DT phandle, but the upstream PCS driver doesn't expose those as clock provider outputs. Workaround: a nss-uniphy-fake-clk fixed at 312.5 MHz (gotcha: the name nss-uniphy-stub-clk triggers a vendor U-Boot fdt fixup that zeroes its rate). The clk_branch2_enable HALT-poll then fails with -EBUSY ("Failed to enable MII 0 RX clock - continuing"), but eth0 still links, so this is not the blocker for carrier, only for the actual data path. The RCG itself stays at SRC_SEL=0 (XO 24 MHz) and never gets moved to SRC_SEL=3 (PCS-derived): every write returns "rcg didn't update". Vendor SSDK likely asserts per-clock resets before writing the mux config.
  3. Mainline qcom-ppe + pcs-qcom-ipq9574 are missing two IPQ5332-specific tweaks. The IPQ5332 UNIPHY needs a VCO-calib-restart sequence that is not in upstream (patched locally), and the MAC rate-change reset cycle (assert + 150 ms + deassert + 150 ms on port_mac/rx/tx around clk_set_rate) isn't in upstream either.

Hardware is ok: stock firmware brings WAN up to 2.5G full duplex with the same endpoint (TR3000). So the SerDes/PHY/board path works. The issue is with upstream driver coverage.

  • Anyone running RTL8221B-VB-CG on USXGMII in mainline 6.12+ with phylink in-band-status?
  • Anyone got the IPQ5332 NSSCC port RCG to reparent from XO to PCS without vendor's qca-ssdk?

Shouldn't the target be qualcommbe?

You should probably aim for 6.18.

Update: moved to qualcommbe target, concrete NSSCC register diff vs vendor.

Patches and subtarget for qualcommbe/ipq53xx are in tree (commits 5fffc20 + 6bfdb2b); kernel 6.12.87 + mac80211-backports v6.18.26. With our 21 IPQ5332-specific patches, qcom-pcie comes up Gen.3 x2 and the QCN9274 endpoint enumerates cleanly; eth0 (LAN) and eth1 (WAN, RTL8221B USXGMII) both reach Link is Up - 2.5Gbps/Full. But there is no packet flow: TX errors=0 dropped=1 RX=0 on eth0; ping doesn't even ARP from the device.

SSH'd into vendor (QSDK 5.4.213, working hardware) to compare quiescent register state. NSSCC base 0x39b00000:

Offset  Register                     mainline qualcommbe  vendor (working)  Note
0x4b4   port1_rx_clk_src CMD         0x1                  0x1
0x4b8   port1_rx_clk_src CFG         0x1                  0x1               SRC_SEL=0=XO, same
0x4bc   port1_rx_div_clk_src         0x80000000           0x1               mainline bit31 set, vendor div=1
0x4c0   port1_tx_clk_src CMD         0x1                  0x1
0x57c   uniphy_port1_rx_clk branch   0x80000010           0x0               mainline branch status bits
0x580   uniphy_port1_tx_clk branch   0x1                  0x501             vendor bits 8+10 set

Two surprises that may help others:

  1. The RCG sitting at SRC_SEL=0=XO is the vendor's normal working state, not a bug. Every "nss_cc_port1_*_clk_src: rcg didn't update its configuration" warning we (and others) see in clk_set_rate is a red herring: vendor sits in the same XO-parented state and has working 2.5G traffic.
  2. The actual difference is in the uniphy_port1_tx_clk branch enable register 0x580: vendor 0x501 (bit 0 enabled + bits 8 + 10), mainline 0x1 (bit 0 enabled only). Bits 8 and 10 aren't documented as control bits in upstream nsscc-ipq5332.c; they read as HW status flags indicating the clock domain is actually alive (serdes feeding it). In mainline both bits read 0, which says the domain isn't alive and is consistent with RX=0. PCS init looks similar otherwise: PCS_MODE_CTRL=0x8020 matches between vendor and mainline.
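
For anyone who wants to decode the CFG values in the table by hand, here is a minimal user-space sketch. It assumes the usual Qualcomm RCG2 layout (SRC_DIV in bits 4:0, stored as 2*div-1, and SRC_SEL in bits 10:8); the struct and function names are made up for illustration and are not taken from nsscc-ipq5332.c.

```c
#include <stdint.h>

/* Assumed RCG2 CFG_RCGR layout: SRC_DIV in bits [4:0], SRC_SEL in bits [10:8]. */
#define CFG_SRC_DIV_MASK  0x1fu
#define CFG_SRC_SEL_SHIFT 8
#define CFG_SRC_SEL_MASK  0x7u

/* Hypothetical helper type, not a kernel structure. */
struct rcg_cfg {
    unsigned int src_sel; /* mux input index; 0 is XO according to this thread */
    unsigned int div_x2;  /* divider times two, so 2 means divide-by-1 */
};

struct rcg_cfg rcg_cfg_decode(uint32_t cfg)
{
    struct rcg_cfg out;
    uint32_t hid = cfg & CFG_SRC_DIV_MASK;

    out.src_sel = (cfg >> CFG_SRC_SEL_SHIFT) & CFG_SRC_SEL_MASK;
    /* Field value n encodes a divider of (n + 1) / 2; 0 means bypass. */
    out.div_x2 = hid ? hid + 1 : 2;
    return out;
}
```

With that layout, the 0x1 seen at 0x4b8 decodes to SRC_SEL=0 with divide-by-1, and a value such as 0x401 would decode to SRC_SEL=4 with divide-by-1.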

So the real question for this thread is: what gates bits 8/10 in nss_cc_uniphy_port1_tx_clk (NSSCC offset 0x580) going live? Vendor qca-ssdk does some PPE/UNIPHY init step that turns on the serdes clock domain feeding that branch, and we haven't found the upstream equivalent. Anyone who's brought up qcom-ppe + RTL8372N or any IPQ5332 / qualcommbe board past Link is Up?

ath12k QCN9274 over PCIe wedges CPU on IPQ5332 host during MHI bringup

Same board. PCIe Gen.3 x2 link comes up cleanly (with v6.16 phy-qcom-uniphy-pcie-28lp backport), QCN9274 endpoint 17cb:1109 enumerated. ath12k binds the device, then probe spins one CPU forever; the rest of the system stays alive but there is no usable shell. Removing QCN9274 from ath12k_pci_id_table is the workaround. Anyone else got QCN9274 ath12k probe to complete on an IPQ5332 host?

IPQ5332 NSSCC port-clock chain: uniphy{0,1}_gcc_{tx,rx}_clk are missing from mainline gcc-ipq5332.c; real gap is the branch CBCR

Continued debugging from post #3. Added kernel-side pr_warn register snapshots inside my qcom_nsscc_ipq5332_force_port_pcs helper (NSSCC mux helper called from ppe_port_mac_link_up via patches 0551/0552) so I could compare HW state against vendor running firmware. Boot trace on real hardware:

[ 5.930] nsscc-ipq5332: port1 RX PRE  CMD@0x4b4=0x1 CFG@0x4b8=0x1 BR@0x57c=0x80000010
[ 5.936] nsscc-ipq5332: port1 RX POST CMD@0x4b4=0x1 CFG@0x4b8=0x1 BR@0x57c=0x80000010 UPDATE-STUCK
[ 5.943] nsscc-ipq5332: port1 TX PRE  CMD@0x4c0=0x1 CFG@0x4c4=0x0 BR@0x580=0x1
[ 5.953] nsscc-ipq5332: port1 TX POST CMD@0x4c0=0x1 CFG@0x4c4=0x0 BR@0x580=0x1 UPDATE-STUCK
[ 5.959] qcom_ppe 3a000000.ethernet eth0: Link is Up - 2.5Gbps/Full

Two findings

1. CFG_RCGR writes from the kernel after clk_set_rate() are silently rejected because the UPDATE bit in CMD_RCGR is sticky once a prior update_config poll has timed out. So force_port_pcs writing regmap_write(0x4c4, 0x401) (SRC_SEL=4=UNIPHY_TX, DIV=1) does nothing: PRE and POST CFG are identical. 0551 force_port_pcs is a no-op in this state.

2. Vendor's running-state CFG_RCGR is also at the boot default (0x1 / 0x0), so vendor isn't relying on the RCG mux either. The actual difference between vendor-working and mainline-not-working is in the branch CBCR:

port1 TX branch CBCR @ 0x39b00580:
  mainline: 0x1   (only ENABLE_REQ; CLK_OFF=0; no upstream-domain bits)
  vendor:   0x501 (ENABLE_REQ + bits 8 + 10 = upstream-domain-alive)

Bits 8 and 10 are HW-set when the upstream SerDes-derived clock is actually feeding the branch. They never light up in mainline, which is why we get nss_cc_uniphy_port1_rx_clk status stuck at 'off' from clk_branch_toggle and Failed to enable MII 0 RX clock (-16) from ipq_pcs_enable on every link-up.
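
A small decoder makes the comparison above easier to repeat on other dumps. This is only a sketch: bit 0 as the enable request and bit 31 as CLK_OFF follow the description in this post, while the meaning of bits 8 and 10 is my reading of the dumps, not something from a datasheet. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CBCR_ENABLE  (1u << 0)   /* software enable request */
#define CBCR_BIT8    (1u << 8)   /* meaning inferred in this thread only */
#define CBCR_BIT10   (1u << 10)  /* meaning inferred in this thread only */
#define CBCR_CLK_OFF (1u << 31)  /* hardware status: branch clock not running */

/* Hypothetical decoded view of one branch CBCR value. */
struct cbcr_state {
    bool enable_req;
    bool bit8;
    bool bit10;
    bool clk_off;
};

struct cbcr_state cbcr_decode(uint32_t v)
{
    struct cbcr_state s;

    s.enable_req = (v & CBCR_ENABLE) != 0;
    s.bit8 = (v & CBCR_BIT8) != 0;
    s.bit10 = (v & CBCR_BIT10) != 0;
    s.clk_off = (v & CBCR_CLK_OFF) != 0;
    return s;
}

/* True when the value shows the pattern seen on the working vendor firmware. */
bool cbcr_matches_vendor(uint32_t v)
{
    struct cbcr_state s = cbcr_decode(v);

    return s.enable_req && s.bit8 && s.bit10 && !s.clk_off;
}
```

Feeding it 0x501 reports the vendor pattern, 0x1 does not, and 0x80000010 decodes as CLK_OFF set with no enable request.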

Where vendor's chain comes from

Vendor's clk_summary on running firmware (eth0 LAN @ 2.5G via RTL8372N, eth1 WAN @ 1G via RTL8221B):

uniphy0_gcc_tx_clk                       1 1 0 312500000
   nss_cc_port1_tx_clk_src               1 1 0 312500000
      nss_cc_port1_tx_div_clk_src        2 2 0 312500000
         nss_cc_uniphy_port1_tx_clk      1 1 0 312500000
         nss_cc_port1_tx_clk             1 1 0 312500000
uniphy0_gcc_rx_clk                       1 1 0 312500000
   nss_cc_port1_rx_clk_src               1 1 0 312500000
   ...
uniphy1_gcc_tx_clk                       1 1 0 125000000
   nss_cc_port2_tx_clk_src               1 1 0 125000000
   ...

The parent roots uniphy{0,1}_gcc_{tx,rx}_clk do not exist in mainline drivers/clk/qcom/gcc-ipq5332.c. They're defined in qca-ssdk src/init/ssdk_clk.c as software-only clk_hw entries (only recalc_rate / determine_rate / a set_rate that just stores the value). Pure framework bookkeeping, no GCC register writes.

Then ssdk_clk.c has a CLK_LOOKUP table that does the real HW writes per speed/mode, e.g.:

/* SGMII+ 2.5G (port1 TX) */
CLK_LOOKUP(0x4C4, 0x401, 0x4C8, 0, 0x580, UNIPHY0_PORT1_TX_CLK, EN_BIT,
           SGMII_PLUS_SPEED_2500M_CLK, SGMII_PLUS_SPEED_2500M_CLK, A_FALSE);
            ^CFG  ^val   ^DIV     0   ^CBCR  ^enable

So vendor does write CFG=0x401, but conditionally, and the alternate SSDK_RAW_CLOCK path skips the CFG write and uses raw HW pokes elsewhere. Which path runs depends on SSDK_RAW_CLOCK build-time selection and other gates. On the firmware I dumped, CFG ends up at the boot default (the path that skipped the CFG write was taken), yet bits 8 + 10 in the branch CBCR light up anyway, so the raw-clock path's pokes are what arms them. Still tracking down exactly which register.

Open questions for IPQ5332 upstream

  1. Should mainline gcc-ipq5332.c gain uniphy{0,1}_gcc_{tx,rx}_clk software clocks as framework-parent roots for the NSSCC port_{rx,tx}_clk_src chain? (Vendor uses them; mainline NSSCC's parent_map for these RCGs only has XO + CMN_PLL options + the upstream uniphy clocks coming from PCS.)
  2. Where in qca-ssdk does the "arm bits 8+10 in 0x39b00580" logic live? Best guess is adpt_hppe_uniphy_mode_set (SerDes config) or the SSDK_RAW_CLOCK raw-write path. Pointers welcome.
  3. Mainline pcs-qcom-ipq9574.c's PCS_TX_CLK/PCS_RX_CLK clk_hw ops are minimal (only recalc_rate + determine_rate). On IPQ9574 a separate management entity presumably handles the upstream HW. On IPQ5332 nothing does, so the gap surfaces as a missing route from PCS to NSSCC at HW level.

Anyone else hitting this?

If you're porting other IPQ5332 boards (Xiaomi BE6500 Pro, Aruba AP-series, etc.) and seeing the same nss_cc_uniphy_port*_clk status stuck at 'off' warnings during link-up, you're hitting the same wall. Comparing notes welcome.

Work tree (not pushed yet): qualcommbe/ipq53xx subtarget with the reg-dump patch series, commit 47c3e12 + diff for 0551 reg dumps. Vendor reg-dump captures at from-ap2/vendor-regs-running-state-20260519.txt and from-ap2/vendor-nsscc/clk_summary.txt.

Please have a look around the corner.

Status update

Significant progress on the Flint 3 / IPQ5332 port today. Summary:

  • A clean-room rtl8372n DSA driver is now in the tree and probes on hardware: confirmed model=0x8372 via the MDIO indirect protocol.
  • The post-preinit boot wedge that previously blocked all userspace debug has a workaround: rdinit=/bin/sh on the kernel cmdline drops straight to an interactive shell on the console, bypassing procd-init.
  • The long-standing "port1 RCG UPDATE-STUCK" mystery is finally diagnosed end-to-end. It is not a missing reset, not missing CBCR bits, not missing GCC clocks: the line into the SoC's PCS isn't carrying a 2500BASEX signal, so the PCS RX CDR can't recover a clock for NSSCC.
  • The line is supposed to come from the RTL8372N's SerDes 0. Vendor U-Boot leaves SDS0 in SERDES_10GR (mode 0x1A) rather than SERDES_2500BASEX (0x16), and that's the root cause.

Where the SerDes work is stuck right now: writing SDS_MODE_SEL alone doesn't relight the SerDes. Vendor SDK does a much bigger init sequence (SDS_MODE_SET_SW in dal_rtl8373_switch.c), and even with that full sequence ported, the SoC PCS still doesn't see a recovered clock. Suspect there's a top-level chip-init patch list (the patch_list[][2] table starting {0xC202, 0xC1}, {0xC203, 0x01}, … at the top of dal_rtl8373_switch.c) that's expected to run first.

What's working in the rtl8372n driver

target/linux/qualcommbe/files-6.12/drivers/net/dsa/rtl8372n.c

  • MDIO indirect protocol against the switch at PHY addr 0x1D (host regs 21-24, busy poll on reg 21 bit 2). Round-trip TPID self-test passes (write-selftest: OK (round-tripped TPID)).
  • Chip-ID detection at internal reg 0x0004 (model=0x8372).
  • Baseline register dump at probe: CPU_TAG_TPID = 0x8899, CPU_TAG_CTRL = 0x500, per-port link=0x098, etc. Useful for catching state-drift on retest.
  • CPU-tag pre-staging: EXT_CPU_CTRL = 3 (port 3 is the external CPU port), CPU_TAG_AWARE = BIT(3). The EN bits in CPU_TAG_CTRL are deliberately left clear, because turning them on without a matching DSA tag protocol handler in the kernel would break forwarding.
  • SDS_INDACS SerDes-internal indirect access protocol (SDS_INDACS_CMD/RD/WD at 0x3F8/0x3FC/0x400; CMD bit layout TRIGGER|RWOP|REGAD|PAGE|INDEX). Full vendor 2500BASEX init sequence ported on top: an_3p125g_chipb patch list (17 SerDes-internal writes), dig_patch_mac (8 writes), fiber_fc_en, sds_nway_set, and the long PLL/reset poke loop on page 0x20.

All the above ran without error on hardware. The driver is structured so the next iteration can add patch-list arrays without re-deriving the framework.

The actual remaining blocker

Even after the full SerDes init runs, nss_cc_uniphy_port1_rx_clk (CBCR at NSSCC offset 0x57c) stays 0x80000010: CLK_OFF bit 31 set, branch never enables. The PCS driver's clk_prepare_enable(qpcs_mii->rx_clk) returns -EBUSY. Note this is asymmetric: the TX-side branch (0x580) enables fine at 312.5 MHz, and PCS calibration completes cleanly (MODE_CTRL=0x820 = SEL=8 (2500BASEX), CALIB=0xac1 = DONE bit set).

So the SoC's PCS PLL is locked and is generating its own TX clock just fine. The asymmetry is consistent with the RX clock being recovered from the line: PCS CDR has nothing to lock onto, so RX side stays gated.

The kernel call chain that produces the update_config: rcg didn't update its configuration WARN is just a downstream artifact of this. When mac_link_up calls clk_set_rate on the port1 RCG, the framework picks UNIPHY0_NSS_RX_CLK as the best parent for 312.5M. The RCG mux update needs the new parent's clock to be ticking; it isn't, so the UPDATE bit doesn't clear within 20 ms and the poll times out. Fixing the RCG side is fundamentally impossible until the SerDes line is actually carrying 2500BASEX.
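
To illustrate why the timeout is a symptom rather than a cause, here is a toy user-space model of the update handshake. It is a sketch under one stated assumption, which is the behaviour observed in this thread: the UPDATE bit only clears while the newly selected parent is ticking. The fake_rcg type and both functions are invented for this example and are not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_UPDATE (1u << 0)

/* Hypothetical stand-in for the hardware RCG. */
struct fake_rcg {
    uint32_t cmd;
    bool parent_ticking; /* is the selected parent clock running? */
};

/* Model: hardware clears UPDATE only when the new parent is ticking. */
uint32_t fake_rcg_read_cmd(struct fake_rcg *r)
{
    if (r->parent_ticking)
        r->cmd &= ~CMD_UPDATE;
    return r->cmd;
}

/* Returns 0 on success, -16 (the value of -EBUSY) if UPDATE never clears. */
int fake_update_config(struct fake_rcg *r, unsigned int max_polls)
{
    r->cmd |= CMD_UPDATE;
    while (max_polls--) {
        if (!(fake_rcg_read_cmd(r) & CMD_UPDATE))
            return 0;
    }
    return -16;
}
```

With parent_ticking false the function always ends in -16 and leaves UPDATE set, no matter how many polls are allowed, which mirrors the UPDATE-STUCK lines in the boot trace.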

What I've verified is NOT the cause

  • NSSCC patches / reset / CBCR wakeup bits: all red herrings.
  • gcc_uniphy0_sys_clk / gcc_uniphy0_ahb_clk: both enabled (Y in clk_summary).
  • PCS PLL itself: CALIBRATION_DONE set, PLL_RESET cycle ran.
  • TX/RX symmetry on the NSSCC side: both branches have identical clk-branch definitions; the difference is real, in hardware.

Useful tooling that came out of this

  • rdinit=/bin/sh in bootargs-append: drops to an interactive busybox ash on the console as PID 1, bypassing procd entirely. Avoids the (still-unfixed) post-preinit wedge while keeping the system stable enough to mount sysfs/debugfs, bring eth0 up, run clk_summary, and inspect regmaps live. Key for any iteration that needs to read actual hardware state.
  • U-Boot bootipq trampoline: a 113-byte mkimage -T script containing bootipq at the TFTP filename. It lets you reliably revert to vendor firmware on any power cycle without UART-input gymnastics.
  • /sys/kernel/debug/regmap/39b00000.clock-controller/registers: full NSSCC register dump from userspace. Confirmed port1_rx_clk_src CMD@0x4b4=0x1 CFG@0x4b8=0x1 BR@0x57c=0x80000010 matches what the kernel WARN reports.
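
If you want to script the comparison against saved dumps, a tiny parser is enough. The sketch below assumes the regmap debugfs file prints one "offset: value" pair per line in hex (for example "57c: 80000010"); check that against your own dump first. Both function names are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse one "offset: value" hex line. Returns 1 on success, 0 otherwise. */
int parse_regmap_line(const char *line, uint32_t *off, uint32_t *val)
{
    unsigned int o, v;

    if (sscanf(line, "%x: %x", &o, &v) != 2)
        return 0;
    *off = o;
    *val = v;
    return 1;
}

/* Search a multi-line dump for one offset. Returns 1 and fills *val if found. */
int find_reg(const char *dump, uint32_t want, uint32_t *val)
{
    const char *p = dump;

    while (p && *p) {
        uint32_t off, v;

        if (parse_regmap_line(p, &off, &v) && off == want) {
            *val = v;
            return 1;
        }
        p = strchr(p, '\n'); /* advance to the start of the next line */
        if (p)
            p++;
    }
    return 0;
}
```

Reading the debugfs file into a buffer and calling find_reg(buf, 0x57c, &val) then gives the branch register value without grepping by hand.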

Asks

  1. If anyone has working RTL8372N SerDes init code (ideally the top-level chip-init patch list, not just SDS_MODE_SET_SW), that would unblock the data path immediately. The patch_list[][2] starting {0xC202, 0xC1}, {0xC203, 0x01}, {0xC204, 0xFF}, … in dal_rtl8373_switch.c looks like it should run at chip bring-up, but I don't yet see where vendor calls it from, or whether the addresses are top-level switch regs or another indirection.
  2. ElektromAn: if your Wave7/Predator IPQ5332 board uses the same RTL8372N/8373 switch silicon, I'm very interested to compare your SerDes init path.
  3. Anyone with an early/draft RTL8372N DSA driver: happy to coordinate or rebase onto your branch.

Thank you for sharing. I’ve been porting the TIP wlan-ap project to the GL-BE6500 (the only difference from the GL-BE9300 is the Wi-Fi 6 GHz band), and it’s working well so far, including the RTL8372n. Here’s my repository: https://github.com/JiaY-shi/wlan-ap/tree/gl-be6500. I also want to create a native OpenWrt port for the GL-BE6500. I tried before, but kept getting stuck on the WAN port. My issue is a bit stranger: IPv6 works, but IPv4 doesn’t. Thank you for sharing your work. Would you mind sharing your repository address as well?

@perceival I have linked several discussion threads on the RTL8372N switch to your post here. You can check back in your original post at the top to find the links to them.

Not sure if it's helping but maybe useful to pool existing info & resources together.

PS: Link back to RTL8372 support discussion thread

Would this help development on my Ruijie Reyee RG-EW7200BE PRO? It also has this switch. How do you define ports? Would you care to share?

Could be, but I don't have the device.
Check out the questions above where OP @perceival is seeking feedback.

PS: See this post for update from @shi05275

@perceival

The Acer Wave7 has no switch chip, only the one in the SoC, I assume.

Here is my untested branch for qualcommbe v6.18

Parts of them are from @mrnuke upstream efforts

Build recipes for Acer Wave7 and GL 9300 are inside.

But is rtl8372 switch DSA or swconfig?

Progress Update - GL-BE9300 (IPQ5332) Port: Ethernet Working

Quick update on where things stand. LAN traffic is now flowing through the DSA switch (RTL8372N): ping from the device to a laptop on LAN1 works at ~30% success rate, sustained without degradation. Still work to do, but the data path is alive end-to-end for the first time.

What's working

  • Boot: Kernel 6.12.87 on qualcommbe/ipq53xx. Boots via TFTP from U-Boot, rdinit=/bin/sh for testing. Procd boot needs minor preinit fix (separate issue).
  • PCIe: Gen.3 x2 link up. QCN9274 (ath12k) radio detected on the bus (WiFi bring-up is a separate effort).
  • eMMC: HS200, all 13 GPT partitions visible.
  • DSA switch (RTL8372N): New out-of-tree driver. Probes via MDIO indirect, registers with DSA, tag protocol tag_rtl8_4 confirmed working (captured real CPU-tagged frames off the wire). 4 LAN ports + 1 CPU port. VLAN, port isolation, STP states all programmed.
  • EDMA: RX path fully functional, frames from the laptop arrive in the kernel. TX path works but with ~70% packet loss (see below).
  • PCS/UNIPHY (10GBASE-R): SoC↔switch uplink at 10 Gbps. Key finding: U-Boot fully configures the PCS PLL and SerDes for 10GBASE-R. The Linux PCS driver must detect this and skip its destructive PLL-reset cycle, or the TX clock chain breaks permanently.

Key technical findings

1. PCS skip-reconfig (the TX clock fix)

The IPQ5332 vendor SDK (qca-ssdk MP branch) has no Linux-side 10GBASE-R PCS init for port 1. U-Boot does everything: PLL cal, VCO tuning, XPCS sequencing. The upstream pcs-qcom-ipq9574 driver's config_mode function asserts XPCS reset → resets PLL → attempts clk_set_rate on NSSCC port1 clocks → RCG UPDATE gets stuck because the upstream UNIPHY clock disappeared during the reset.

Fix: at the top of ipq_pcs_config_mode, read PCS_MODE_CTRL. If the mode already matches the target interface AND PCS_CALIBRATION_DONE is set, skip the entire PLL-reset/calibration/VCO-restart sequence and jump to clock-rate propagation. Set a config_preserved flag so the PPE driver can also skip its MAC reset cycle (which similarly destroys the inherited clock state).
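
The decision itself is small. A minimal sketch, assuming the mode field and the calibration flag have already been read back and decoded (I'm leaving the register masks out on purpose; the struct and function names here are hypothetical and not the ones in my patch):

```c
#include <stdbool.h>

/* Hypothetical snapshot of what the driver reads back from the PCS. */
struct pcs_snapshot {
    unsigned int mode_sel;  /* mode field decoded from PCS_MODE_CTRL */
    bool calibration_done;  /* PCS_CALIBRATION_DONE status */
};

/*
 * True when the bootloader already left the PCS in the requested mode with
 * calibration finished, so the PLL-reset/calibration sequence can be skipped.
 */
bool pcs_can_skip_reconfig(const struct pcs_snapshot *hw,
                           unsigned int target_mode_sel)
{
    return hw->calibration_done && hw->mode_sel == target_mode_sel;
}
```

Both conditions have to hold: a matching mode without a finished calibration still goes through the full sequence.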

2. NSSCC TX_CBCR FORCE_PERIPH_ON

Even with the PCS preserved, TX was dead. Vendor register comparison (via devmem on the GL.iNet stock firmware) revealed the smoking gun: the NSSCC port1 TX branch CBCR register needs bit 10 (FORCE_PERIPH_ON) and bit 8 (SLEEP_CTL) set. Without FORCE_PERIPH_ON, the branch clock gates between frames and the switch never sees valid data on the SerDes.

Vendor: TX_CBCR = 0x501. Our default: 0x001. Setting 0x501 immediately brought TX back.

3. NSSCC RCG SRC_SEL stays at XO, and that's normal

The NSSCC port1 TX/RX CFG_RCGR always reads SRC_SEL = 0 (XO crystal), even on vendor firmware. Writes to change it to UNIPHY source are silently ignored (the UPDATE bit never clears). The actual 10GBASE-R clock path bypasses the RCG mux entirely; the SRC_SEL reading is a red herring.

4. RTL8372N: new DSA driver

Mainline has no support for the RTL8372N (10-port L2 managed switch with 2× 10G SerDes + 8× 2.5G copper PHYs). Wrote a minimal DSA driver: MDIO indirect register access (table read/write), chip ID detection, CPU tag configuration (EXT_CPUTAG_EN on port 3), VLAN-1 + PVID setup, port isolation, and STP state management. Uses the existing tag_rtl8_4.c tagger, since the wire format is identical to RTL8365MB.

5. EDMA TX completion interleaving

The PPE engine generates TXCMPL (TX completion) events on the same rings used by software-submitted packets. These PPE-internal completions carry buffer pointers that are NOT sk_buff pointers: the stock EDMA_TXCMPL_OPAQUE_GET reconstructs a valid-looking kernel address from word0+word1 but it's a PPE-internal buffer, not our skb. Calling dev_kfree_skb on it corrupts memory (silent at first, eventually triggers AXI error storms).

Current workaround: FIFO-based skb tracking with drain-all on mismatch. Still leaks DMA mappings and has ~70% packet loss. The proper fix likely involves either configuring the PPE to route internal completions to dedicated rings, or finding a reliable way to discriminate SW vs PPE completions.

What's left

  • TX packet loss (70%): The ~30% success rate is from TXCMPL handling issues (PPE-internal event interleaving). Need to either find why our packets' completions look identical to PPE-internal ones, or switch to a TXDESC-consumer-index-based cleanup approach.
  • WAN port (RTL8221B/eth1): Not tested yet. Needs its own PCS/PHY bring-up.
  • WiFi (ath12k + QCN9274): PCIe link up, device detected, but ath12k probe needs MLO/MLD masking (the firmware advertises MLO which ath12k can't handle yet). Patch exists, not tested on-air.
  • Procd boot: Hangs in preinit, needs debug. rdinit=/bin/sh works fine.
  • Flash (sysupgrade): Not started. eMMC partition layout known from vendor GPT.

Patches

19 patches in target/linux/qualcommbe/patches-6.12/ (0543–0590 range), plus several build_dir-only edits not yet formalized. DTS at target/linux/qualcommbe/dts/ipq5332-gl-be9300.dts. Happy to share the tree if anyone wants to look. It's messy WIP but boots and passes traffic.


If anyone else is working on IPQ5332 or has vendor documentation for the PPE/EDMA TXCMPL ring mapping, I'd love to compare notes. The TXCMPL interleaving issue is the main remaining blocker for reliable Ethernet.

Are you sure all those patches will be needed in 6.18 ?

I'm guessing a 6.12 based PR won't be accepted anyway.

Good point. The 0300–0400 range patches are upstream qualcommbe enablement that's likely already in 6.18. The device-specific work would need porting either way:

  • RTL8372N DSA driver (new, no mainline equivalent)
  • PCS skip-reconfig for IPQ5332 10GBASE-R (vendor U-Boot dependency, not kernel-version-specific)
  • NSSCC CBCR FORCE_PERIPH_ON (hardware quirk)
  • EDMA TX completion workaround (IPQ5332-specific HW behavior)

Happy to rebase onto 6.18 once the data path is solid. The current 6.12 work is mostly about understanding the hardware; the fixes themselves are small and version-independent.

my Acer Wave7 boots with v6.18.33

root@OpenWrt:~# uname -a
Linux OpenWrt 6.18.33 #0 SMP Wed May 27 14:59:05 2026 aarch64 GNU/Linux
root@OpenWrt:~# dmesg | head -n 10
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x51af8014]
[    0.000000] Linux version 6.18.33 (elektroman@X270) (aarch64-openwrt-linux-musl-gcc (OpenWrt GCC 14.3.0 r34665-ec173a9e85) 14.3.0, GNU ld (GNU Binutils) 2.44) #0 SMP Wed May 27 14:59:05 2026
[    0.000000] Machine model: Acer Wave7
[    0.000000] [Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
[    0.000000] OF: reserved mem: 0x000000004a100000..0x000000004a4fffff (4096 KiB) nomap non-reusable bootloader@4a100000
[    0.000000] OF: reserved mem: 0x000000004a600000..0x000000004a7fffff (2048 KiB) nomap non-reusable tz@4a600000
[    0.000000] OF: reserved mem: 0x000000004a800000..0x000000004a8fffff (1024 KiB) nomap non-reusable smem@4a800000
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000040000000-0x000000005fffffff]
root@OpenWrt:~# cat /proc/cmdline 
console=ttyMSM0,115200n8

normal shell without any quirks
currently only initramfs without network interfaces

LAN data path is now end-to-end: two upstream-applicable bugs found

Picking up where I left off in the thread: the mainline OpenWrt port on the Flint 3 (IPQ5332 + RTL8372N + 2× QCN6274) now has a fully functional Ethernet data path, including procd-mode boot and SSH over LAN. Throughput is wire-rate for a 1 Gbps client (iperf3 sustained at 942 Mbit/s, 0 TX errors, 0 retransmits in MIB).

This session closed two bugs that together had been masking each other for weeks. Both are general: they would affect any IPQ53xx board using the in-tree qcom_ppe driver with a DSA-tagged switch, not just the Flint 3.

Bug 1: edma_rx_reap leaks one skb per NAPI poll

In drivers/net/ethernet/qualcomm/ppe/edma_rx.c, the linear-packet path pre-fetches the next iteration's skb at the tail of the loop body:

while (likely(work_to_do--)) {
    /* ...process skb_N... */
    next_skb = rxdesc_ring->rxfill->skbs[idx];
    rxdesc_ring->rxfill->skbs[idx] = NULL;
    if (unlikely(!next_skb)) { /* ... break ... */ }
}

work_to_do-- is a post-decrement, so on the last iteration of any batch the loop body still runs, the prefetch still NULLs skbs[idx], but the while header then sees work_to_do == 0 and exits. The prefetched skb is silently dropped and that slot stays NULL until the next refill cycle. Empirically this drops exactly one frame per NAPI poll. With work_to_do == 1 (one IRQ per descriptor on this SoC), the result is 50% packet loss.

Fix: wrap the in-loop prefetches (linear + scatter) with if (work_to_do) so the prefetch only runs when there will actually be a next iteration. After this fix, ping -c 1000 lands at 1000/1000 received, 0% loss.
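
For anyone who wants to convince themselves of the off-by-one without hardware, here is a simplified user-space model of the loop shape. It is not the driver code: the ring, the rx_model struct and both functions are invented for illustration, and buffers are just nonzero integers. The guard_prefetch flag switches between the original behaviour and the if (work_to_do) guard.

```c
#include <stdbool.h>

#define RING_SIZE 8

/* Hypothetical model of the rxfill skbs[] array. */
struct rx_model {
    int slots[RING_SIZE];   /* nonzero = buffer posted, 0 = NULL slot */
    unsigned int idx;       /* next slot to take */
    unsigned int taken;     /* buffers removed from the ring */
    unsigned int processed; /* buffers actually handed up the stack */
};

/* Take the buffer at idx and NULL the slot, like the in-loop prefetch. */
int take_slot(struct rx_model *m)
{
    int buf = m->slots[m->idx];

    m->slots[m->idx] = 0;
    if (buf) {
        m->taken++;
        m->idx = (m->idx + 1) % RING_SIZE;
    }
    return buf;
}

/* Returns how many buffers were taken from the ring but never processed. */
unsigned int reap(struct rx_model *m, unsigned int work_to_do,
                  bool guard_prefetch)
{
    int cur;

    if (!work_to_do)
        return 0;
    cur = take_slot(m);
    while (cur && work_to_do--) {
        m->processed++;
        cur = 0;
        /* After the post-decrement, work_to_do is the number of iterations left. */
        if (!guard_prefetch || work_to_do)
            cur = take_slot(m);
    }
    return m->taken - m->processed;
}
```

In this model a budget of 1 without the guard processes one buffer and loses one, which is the 50% figure; with the guard nothing is lost.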

There's a second, related issue worth bundling in the same patch: on IPQ5332 the chip overwrites word2 of the rxdesc before the kernel reads it back, so the index-based opaque pattern (looking up skbs[next_rxdesc_pri->word2 & MASK]) finds the wrong slot. The chip does preserve FIFO order between rxfill prod_idx and rxdesc cons_idx, so the kernel can use cons_idx directly as the skbs[] index. With both fixes together, RX is rock-solid.

Bug 2: XGMAC powers up in cut-through with a tiny TX threshold

After Bug 1, RX worked but TX still failed. Specifically, frames larger than ~200 B from the SoC out to the wire would just vanish: ip -s link show eth0 showed tx_errors and tx_fifo_errors incrementing in lock-step with each lost frame. The threshold was sharp and reproducible:

ping -s 64    -c 10  → 10/10 received
ping -s 200   -c 10  → 3/10
ping -s 500   -c 10  → 0/10
ping -s 1000  -c 10  → 0/10
ping -s 1400  -c 10  → 0/10

That's a classic DesignWare XGMAC TX FIFO underflow signature. The qcom_ppe driver writes XGMAC_TX_CONFIG_ADDR (speed + TXEN) but never touches MTL_TXQ_OPERATION_MODE at XGMAC_BASE + 0x1100, so the MAC powers up with TSF=0 (cut-through) and the default tiny TTC. Anything larger than a few hundred bytes starts on the wire before the rest of the frame is in the FIFO, and the MAC underflows and aborts.

Fix: set TSF=1 (Store-and-Forward) and TXQEN=2 in MTL_TXQ_OPERATION_MODE at the end of ppe_port_xgmac_link_up:

mtl_mask = GENMASK(3, 1);
mtl_val = BIT(1) | (2u << 2); /* TSF=1, TXQEN=2 */
regmap_update_bits(ppe_dev->regmap, reg + 0x1100, mtl_mask, mtl_val);

After this, every frame size from 64 B to 1500 B delivers at 100% and iperf3 sustains 942 Mbit/s with zero tx_underflow_err. The per-frame latency cost is the frame's serialisation time (~1.2 µs for a full MTU at 10G), which is negligible.
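
The bit arithmetic and the latency figure can be checked in user space. This sketch assumes the DesignWare XGMAC layout used in the snippet above (TSF at bit 1, TXQEN in bits 3:2); the macro and function names are mine, not the driver's.

```c
#include <stdint.h>

#define MTL_TSF         (1u << 1)                /* store-and-forward */
#define MTL_TXQEN_SHIFT 2
#define MTL_TXQEN_MASK  (0x3u << MTL_TXQEN_SHIFT) /* queue enable field */

/* Read-modify-write with the same mask and value as the regmap_update_bits() call. */
uint32_t mtl_txq_set_store_forward(uint32_t old)
{
    uint32_t mask = MTL_TSF | MTL_TXQEN_MASK;          /* equals GENMASK(3, 1) */
    uint32_t val = MTL_TSF | (2u << MTL_TXQEN_SHIFT);  /* TSF=1, TXQEN=2 */

    return (old & ~mask) | val;
}

/* Serialisation time in nanoseconds: bits divided by Gbit/s gives ns. */
uint64_t serialisation_ns(uint32_t frame_bytes, uint32_t gbit_per_s)
{
    return (uint64_t)frame_bytes * 8u / gbit_per_s;
}
```

Starting from a zero register the result is 0xA, other bits are left untouched, and serialisation_ns(1500, 10) gives 1200 ns, which is where the ~1.2 µs figure comes from.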

I think upstream qcom_ppe should probably default to TSF=1 unconditionally for any XGMAC port, since cut-through requires careful feeder tuning and there's no obvious harm in store-and-forward for current OpenWrt use cases.

What this unblocks on Flint 3

  • LAN1–4 reach 192.168.1.1, ping is sub-ms RTT at any size.
  • Procd boots cleanly with CONFIG_CMDLINE stripped of rdinit=/bin/sh. The earlier "PCS clk_set_rate hang in preinit" was fixed by the PCS skip-reconfig patch that detects U-Boot's existing 10GBASE-R config and leaves it alone.
  • Dropbear starts. nc 192.168.1.1 22 returns SSH-2.0-dropbear with the full kex algorithm list immediately.
  • The rtl8372n DSA driver registers with DSA_TAG_PROTO_RTL8_4. CPU port (port 3) IVL/SVL is set to 1 and FID/MSTI is 1 in the VLAN-1 entry. Without those bits the chip routes lookups to a different FID and drops ~85% of unicast frames even with a static LUT pin (this was a previous session's find).

What's still pending

  • WAN port: RTL8221B PHY at USXGMII. Upstream rtl822x returns 0 in-band caps so managed = "in-band-status" fails; the in-tree workaround is to drop it and rely on MDIO polling.
  • ath12k wireless: currently the QCN9274 firmware advertises MLO and mainline ath12k can't register MLO devices the way the firmware wants, so I have a patch that masks the MLO advertisement and lets ath12k come up as two independent non-MLO wiphys. WiFi 7 per-radio works; cross-radio MLD does not. Closing that gap is upstream ath12k work, not a board patch.
  • The watchdog is currently status = "disabled"; needs proper procd integration before re-enabling.

Happy to share the two patches inline or as a branch link if anyone wants them. The fixes are mechanical and small (the EDMA one is one new if (work_to_do); the XGMAC one is one new regmap_update_bits) so anyone porting another IPQ53xx board can drop them in without much fuss.

@perceival do u have the code uploaded somewhere (eg github) ?

Awesome work. Hopefully GL.iNet will support you in the effort. Thanks for sharing the progress.

What you are doing is great. And appreciated. Many of us have Flint 3 routers and want to run vanilla OpenWRT. You are making that possible! Thank you!