Sitecom WLR-7100 development/progress

Hi, this is my first time on OpenWRT. I have this router (the Wi-Fi is bad and often disconnects devices).
I would like to know whether it is possible to install a stable release directly from Toolbox/Firmware.

Or would it be better to buy a new router?

Thanks in advance

Hi again,

I found the issue. The current stable release 21.02 works perfectly on the WLR-7100 v1.001, but on the v1.002 the download speed is very poor, even when wired. I suspect the chipset is a different revision.

I will look inside to check for any differences. Would that be of any help to the devs?

If you have a v1.001, you may safely upload the dlf version of OpenWRT 21.02 to this device. It will most likely run perfectly, as mine does. The v1.002 version of this router behaves differently in my case: the download speed, both wired and wireless, never exceeds 10 Mb/s, while upload is on par with the v1.001. I still have to figure out whether it's just my router or all WLR-7100 v1.002 units are affected.

Does this help 4 months later?

I have both versions, and have been using the v1.001 with the v1.002 on the shelf for the past two years or so. To upgrade my router, I configured the v1.002 with newer firmware and swapped it in for the whole router, but ran into poor performance. I first suspected firmware changes (not realizing I had two different versions), but after downgrading the v1.002 to the same firmware as the v1.001, the problem persisted. I have not yet tried upgrading the v1.001 to the latest firmware (it is currently running my organization's network, so I'm reluctant to risk breaking my only working setup), but it does seem that the v1.002 is slightly different and that OpenWRT therefore performs badly on it, just as @Dock also described.

@Dock Do you happen to have pictures of the v1.001 for comparison? I put pictures of the V1.002 on the wiki, but cannot easily open up my v1.001 now.

As for further debugging:

  • I have found that the root cause seems to be dropped incoming packets. I've seen 1% to 15% loss, depending on circumstances.
  • I've seen this happen on both WAN and LAN.
  • Setting up port mirroring on the internal switch suggests that the packets pass through the switch properly, but get lost before or in the CPU ethernet interface.
  • I've seen this happen with HTTP and scp, but also with simple pings. The easiest way to reproduce this is to just ping the router with ping -s 1400 -i .01 (plus the router's address). This uses a short interval (a high packet rate), so you can see the problem without having to wait for a long time, and bigger packets, since those also seem to increase the packet loss percentage (though not linearly). However, the problem is also visible with the default 56-byte packets at a 1 second interval, so it does not seem to be a packet size or packet rate issue.
  • On the recent 23.04 release, I get about 1.3% with 56-byte pings, and 13% with 1400-byte pings. However, on some old snapshot version (r15249-85caf21ade, kernel 5.4.83) I get about 0.2-0.4% with 56-byte pings, and around 2% with 1400-byte packets, so it seems the problem did get worse in recent versions.
  • I suspected that only incoming packets were affected, because e.g. outgoing file transfers were fast while incoming ones were slow (outgoing file transfers would then still have dropped ACK packets, but those are smaller, so less problematic). If this were true, however, then an outgoing ping would see the same amount of packet loss (the replies would get lost instead of the pings), but that does not seem to be the case: ping -s 1400 gives me 0 lost packets out of 400 tries (busybox ping doesn't support intervals below 1s). Using -A (send the next ping as soon as the previous reply is received), I see 10 out of 4000 packets dropped, so maybe the problem still exists but is somehow less likely to occur on reply packets? Running longer with a 1s interval, I see 2 out of 500 packets dropped, and later 3 out of 4000 packets.
  • The stock firmware is the same for both hardware versions (both versions have their own download page and different filenames, but are byte-for-byte identical). Maybe the firmware source has a hint about what is different between hw versions, but I have not investigated yet.
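To compare the outgoing runs above with the incoming loss (about 1.3% and 13% on 23.04), it helps to turn the raw counts into percentages. A minimal POSIX shell sketch; loss_pct is just a hypothetical helper name, and it computes (sent - received) * 100 / sent with awk, since shell arithmetic is integer-only:

```shell
#!/bin/sh
# Packet loss in percent from sent/received counts.
# awk is used because POSIX shell arithmetic only does integers.
loss_pct() {
	awk -v tx="$1" -v rx="$2" 'BEGIN { printf "%.3f\n", (tx - rx) * 100 / tx }'
}

# Counts from the outgoing-ping runs described above
loss_pct 4000 3990   # -A run: 10 of 4000 lost
loss_pct 500 498     # 1 s interval: 2 of 500 lost
loss_pct 4000 3997   # longer run: 3 of 4000 lost
```

This gives 0.250, 0.400 and 0.075 percent, i.e. well below the incoming figures.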

One other interesting detail: on v1.001, /proc/cpuinfo says system type: Atheros AR9342 rev 3, and on v1.002 it says system type: Atheros AR9342 rev 1. This confirms a previous suspicion that these versions use a different revision of at least the SoC (it's interesting that the newer router version has a lower SoC revision, but maybe there's some revision register weirdness going on there), though it is unclear whether the problem is related to the change in SoC revision, or whether there is also a PCB change.

I've tried to look up the revision history for this SoC, but I couldn't find a datasheet for the AR9342 at all (I could find the AR9341 and AR9344 ones here). Also, the actual chip is marked AR1022, but I suspect this is the old Atheros name for the Qualcomm-Atheros AR9342 (this wiki page suggests so as well). It's a bit weird that I can find hardly any mention of the AR1022 chip at all, though, so certainly no datasheet either...

These are very nice findings; the difference in SoC revision could explain the problem. Could you test whether the packet loss also occurs at 100 Mb/s or 10 Mb/s? If not, then the 1 Gb/s value in pll-data is probably wrong. But if the losses are also there, maybe we need to add delays in the dts gmac-config node. The values can be from 0 to 3; for a detailed description, check paragraph 9.7 and tables 11-3 and 11-4 of the datasheet AR9344_May_2012.pdf (copies float around on GitHub).

Do you still have the vendor firmware on the v1.001? Could you check whether ethreg or devmem (or at least /dev/mem) is present?

Hi, nice to see this re-opened...

I will take pictures of both v1.001 and v1.002 this week.

I have also seen another strange behaviour of the v1.002, which made me decide to stop using the device: it seems to run out of memory when trying to upgrade with sysupgrade. It just hangs while uploading the file, reaching 100% only once in many attempts. The v1.001 always uploads new firmware at a steady rate.

Any of you also using the WLR-8100 (both versions)?

Both have the AR1022-AL1A,
but the V1.001 mentions additional PCK912.00B 1139
and the V1.002 mentions PCS681.00B 1146

The v1.002 is poorly readable, only with light in a specific angle.

Does this help?

Tested AC speeds on both with 23.05.0-rc3:

V1.001 reaches the maximum (around my max WAN fiber speed of 100 Mbps), while v1.002 maxes out at 38 Mbps downstream. Upstream, both are around 100 Mbps.

The issue is not as bad as years ago, but still significant. Uploading firmware still struggles, but is doable.

@tmn505 Thanks for your debug suggestions, that was exactly what I needed to continue this investigation. I'll check - I know where to find the DTS file and can figure out how to test changes, just not what values to check.

As for the stock firmware - I do not have it running, but I can flash it again (did that this week to recover already). I'll check for /dev/mem and ethreg and come back to you with the result.

I've also seen this when upgrading through the webui - I can imagine that the packet loss just breaks the upgrade?

Furthermore: I have one new observation: it turns out I have two v1 002 boxes, and they have a different SoC in them (AR1022 and AR9342). I previously noticed a different revision number in /proc/cpuinfo and assumed that the other one (without problems) would be a v1 001, but today I removed it from our network and saw that it was also v1 002. Looking more closely, it seems that the PCB is fully identical (the only difference I could find is a bit of silkscreen on the right upside down that looks like it might be a timestamp and "P2-1" vs "P4-2").

This is the working v1 002 one, with AR9342-DL3A:

This is the broken v1 002 one, with AR1022-AL1A / PCS681.00B (just like @Dock's V1 002):

The 1 and 3 in AL1A and DL3A matches the revision shown in /proc/cpuinfo, so that might be related.

That's interesting: you have different versions, both with the AR1022, while I have two of the same version but with different chips, heh... I'm not sure if/how this helps yet, but it does suggest that we should not focus too much on the difference between AR1022 and AR9342, since there is an AR1022 chip that also works (or maybe something is operating on the edge of its specifications and it just happens to work on some units but not on others...).

It also seems that your v1 001 version has a slightly different PCB color, my PCBs look more like your v1 002 one.

Otherwise your PCB looks completely identical (including silkscreen and which components are left out) for the part I can see on your picture. Could you maybe post one more picture of both entire PCBs, to compare the silkscreen version numbers (which fall off the bottom of your v1 001 picture)?

Nope, though a friend of mine mentioned they have a WLR-5100 (which is apparently identical to the WLR-4100, but with 5 GHz Wi-Fi added to an identical PCB).

Anyway, thanks for the input. I'm out of time for today, but will try to do some more tests soon (but I should also be doing other things, so might take me a bit more time).

Yeah, I already found that. I couldn't find the AR9342 datasheet, though. Do you know whether it just wasn't leaked, or whether that datasheet covers both the AR9342 and the AR9344? If not, I assume they are similar enough that most of the contents still apply (though I noticed the documentation for RST_REVISION_ID did not match the kernel sources: the datasheet documents 0x011c1 for the AR9344, but the kernel sources use a different value; maybe the value was changed later or something...).

These are the same cores, they differ in CPU speed (MHz), supported wifi band. They could also differ in integrated peripherals, like integrated switch (not relevant in our case).

I want to check whether the register values differ somewhere between the revisions. If neither is present in the vendor firmware, I'll attach a busybox binary with ethreg compiled in.
Can you also check whether the switch chip differs between all your devices?

For an example, check target/linux/ath79/dts/ar9342_iodata_etg3-r.dts (around line 127). Our dts is in the same directory.

Not me (also that's different SoC, newer generation).

I've taken my "working" AR9342 unit, which I had only tested so far with a years-old snapshot, upgraded it to the latest snapshot and confirmed it still works. Then I did the same with my "broken" AR1022 unit, and confirmed it is still broken. This is another confirmation that the problem is indeed caused by hardware differences.

The loss also happens on 100Mbit (my home switch does not do gbit).

The switch chip is the same - QCA8337N-AL3C on both my boards, and @Dock's v1 001 (I do not know about their v1 002 board).

I installed the stock firmware again, it has both ethreg and /dev/mem:

# ethreg -h
ethreg: option requires an argument -- h
usage: ethreg [-i ifname] [-p portnum] offset[=value]
usage: ethreg [-f]  -p portnum=10/100/0 [-d duplex]
usage: ethreg [-i ifname][-s value]
usage: ethreg [-i ifname][-j 0|1]
usage: ethreg [-i ifname][-h 0|1]
usage: ethreg [-i ifname][-p portnum] [-t mode]
# ls -l /dev/mem
crw-r--r--    1 0        15         1,   1 Dec  9  2015 /dev/mem

I also tried installing ethreg inside OpenWRT, but found that it does not seem to be available anywhere, I just found one github repo with a busybox version that still had ethreg.c. I guess it was never part of upstream busybox and is no longer used nowadays (I also couldn't find any reference to the IOCTL used in kernel sources, so I suspect that might have been WRT-specific patches in the past?). In any case - compiling it with prefix=mips-linux-gnu- worked, but then failed to run on OpenWRT with invalid instruction - probably missing some compiler option...

In any case - ethreg is available, but I couldn't quite figure out what the base register address for the tool is. I thought (looking at the AR9344 datasheet) maybe 0x18070000 (base address of GMAC registers), or maybe no offset (just full register addresses), but in both cases trying to read 0x18070004 (LUTs_AGER_INT) returns an incorrect value (bits 31:4 are reserved and should read 0):

Read Reg: 0x18070004 = 0x4b640200
# ethreg 0x4         
Read Reg: 0x00000004 = 0x87a00000

I also considered using /dev/mem, but could not find any tool to make a usable dump. There is tail but not head, so that might work for making a binary dump, but then I have no easy way to get the binary file off the board through serial (the stock firmware does not have telnet, ssh or netcat):

# dd
/bin/sh: dd: not found
# devmem
/bin/sh: devmem: not found
# hd
/bin/sh: hd: not found
# hexdump
/bin/sh: hexdump: not found
# ls /bin/
*        cp       egrep    ls       netstat  ps       sleep    umount
ash      date     grep     mkdir    nice     pwd      stty     uname
busybox  df       ip       mknod    pidof    rm       sync     vi
cat      dmesg    kill     mount    ping     sed      touch
chmod    echo     ln       mv       ping6    sh       true
# ls /sbin/
KC_SMB                 hotplug2               radvd
apcfg_init             httpd                  raether
app_agentd             ifconfig               rdisc6
app_agentsd            ifdown                 reboot
apps_init              ifup                   rmgmt
apps_init_ver.txt      igmpproxy              rmmod
appscore_init_ver.txt  init                   route
arp                    insmod                 rpcbind
autoFWupgrade          ip                     setconfig
brctl                  ip6tables              ssdk_sh
burn                   ippoold      
config_init            iptables               sysconf_cli
config_term            iwconfig               sysconfd
dhcp6c                 iwlist                 sysctl
dhcp6s                 iwpriv                 syslogd                       tc
dnsget                 llmnrd                 udhcpc
dnsmasq                logserver              udhcpd                  lsmod                  umount
dumpleases             md                     updatedd
eraseall               minidlna               usbmgr
ethreg                 miniupnpd              usbmgr_cli
ez-ipupdate            mkfs.jffs2             utmconfig
factory_apps_init      mm                     utmproxy
flashw                 modprobe               uuidgen          mount                  vconfig          ntfs-3g                wandetector          ntfslabel              wanmanager          ntpclient              wanmanager_host_get
halt                   openl2tpd              watchdog
header                     wget
hostapd                poweroff               wlanconfig
hostapd1               pppoe                  wolmanager
hostapd2               pptp
hostapd_cli            processmanager
# ls /usr/bin/
*            burnA        config_init  ipcs         tail         wc
[            burna        config_term  killall      test         which
[[           burnb        ether-wake   logger       tftp
awk          burnf        expr         md5sum       time
basename     burnk        find         printf       tty
burn         cmp          id           sort         uptime
# ls /usr/sbin
chat      poff      pon       pppd      pppd0     pppd1     pppstats

@tmn505 If you have suggestions on what offsets to pass to ethreg to inspect relevant registers, let me know.

I've also just soldered on a UART header on the AR9342 working board, so we can compare (see if the stock firmware uses different settings on both boards, or maybe just uses the same settings that work on both).

I can confirm the switch chip on my V1.002 is the same.

w00t! I went ahead and compiled a custom image with this patch (just to see if it would change anything - I copied these settings blindly from some other board), and it fixes the packet loss (tested just with ping so far) on both 100Mbps and 1Gbps:

--- a/target/linux/ath79/dts/ar1022_sitecom_wlr-7100.dts
+++ b/target/linux/ath79/dts/ar1022_sitecom_wlr-7100.dts
@@ -65,6 +65,10 @@ &eth0 {
        gmac-config {
                device = <&gmac>;
                rgmii-gmac0 = <1>;
+                rxdv-delay = <3>;
+                rxd-delay = <3>;
+                txen-delay = <3>;
+                txd-delay = <3>;

Of course, there is no indication that these are really the best settings (in the git history I see other boards where these delays were reverted in favor of changing the phy-mode or pll-data), but at least it confirms that we're looking in the right place. So I guess it's still worth looking at the settings used by the stock firmware.

I also enabled /dev/mem in this image, so I can see the four delay fields set to 3 in memory (the 0x3FC in the value, i.e. bits 21:14 all set):

# devmem 0x18070000

I can also change them back to the default 0, which causes packet loss again:

# devmem 0x18070000 32 0x00000001

Fiddling around with these settings, I can see that only the ETH_RXD_DELAY seems to influence the packet loss seen with ping, and setting it to 1 is sufficient to fix it completely (at both 100Mbps and 1Gbps).
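Changing a single delay field by hand means a read-modify-write of ETH_CFG, so that the other bits (such as rgmii-gmac0 in bit 0) are preserved. A minimal sketch in shell arithmetic; set_delay is a hypothetical helper, and the individual field positions within bits 21:14 (RXD 15:14, RXDV 17:16, TXD 19:18, TXEN 21:20) are my assumption based on the AR9344 datasheet and the ag71xx driver, so please double-check them:

```shell
#!/bin/sh
# ASSUMED ETH_CFG (0x18070000) field positions, 2 bits each:
#   ETH_RXD_DELAY 15:14, ETH_RXDV_DELAY 17:16,
#   ETH_TXD_DELAY 19:18, ETH_TXEN_DELAY 21:20
RXD_SHIFT=14
RXDV_SHIFT=16
TXD_SHIFT=18
TXEN_SHIFT=20

# set_delay <current value> <field shift> <delay 0..3>
# Prints the new register value with only that 2-bit field replaced.
set_delay() {
	cur=$(($1))
	off=$2
	val=$(($3 & 0x3))
	printf '0x%08X\n' $(( (cur & ~(0x3 << off)) | (val << off) ))
}

# Default value (only rgmii-gmac0 set) with ETH_RXD_DELAY set to 1
set_delay 0x00000001 $RXD_SHIFT 1
```

On the router this could be combined with devmem (untested), e.g. devmem 0x18070000 32 "$(set_delay "$(devmem 0x18070000)" $RXD_SHIFT 1)".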

I haven't tested the other board yet.

I also noticed an RX_DELAY setting in the ETH_XMII_CONTROL register (which is what pll-data sets, AFAICS). In the default ar934x.dtsi it is set to 1, but it is changed to 0 for the wlr-7100. However, if I change it back to 1, connectivity seems to break entirely, but maybe that's because these settings shouldn't be changed while the NIC is running :slight_smile:

I also noticed that the phy-mode can be used to introduce a delay, which then configures the PHY (in this case the AR8337, which is sort of a virtual PHY since it is connected to the CPU using RGMII) to introduce the delay (looking at upstream Linux qca8k-8xxx.c, it sets a delay of "2", which I presume is 2ns). This would be triggered by setting phy-mode = "rgmii-rxid". I haven't tried this yet, since that would involve updating the DTS and therefore regenerating the image (or at least the kernel), which takes a while.

What is interesting is that, AFAIU, both RX and TX need a delay somewhere (by default CLK and DATA are output synchronously, resulting in instabilities). So it seems that TX actually does have a delay somewhere, or is just lucky that it always works? I see that PCB traces can sometimes be laid out to introduce this delay (there are some squiggles in the traces between CPU and switch, but maybe that's just length matching), but then I would be surprised if they did that for TX but not for RX.

One question I had: How is this really relevant? The link between the switch and the CPU is always 1Gbps AFAICS, only the link between the switch and the outside world is different. And AFAICS both the pll-data as well as the delays only affect this link, right? Or does the pll-data also somehow drive the switch chip?

I also had a look in the sources for the stock firmware (Found here, the v1 001 version has a broken link, I've already asked sitecom about that) for the ethreg utility and the corresponding kernel code. From that it seems that this tool actually reads registers inside the PHY (the switch in this case), not registers related to the eth gmac in the SoC itself. So I guess we can use that to see if the stock firmware configures the PHY for delay, not if it configures the GMAC for delay (but I'm going to see if I can get the OpenWRT devmem binary working on the stock firmware, if I can find some way to copy files onto it...).

Edit: I've partly figured out how to use ethreg in the stock firmware

You can read registers for the phy attached to each switch port, passing the port number with -p (it seems the ports are numbered in reverse from how they are numbered on the casing, though). This corresponds to the "PHY control registers" section (5.9) in the QCA8337 datasheet. E.g. to read the "Status register" and "PHY identifier" for port 0 respectively:

# ethreg -p 0 1
Read Reg: 0x00000001 = 0x0000796d
# ethreg -p 0 2
Read Reg: 0x00000002 = 0x0000004d

According to the kernel sources, port 0xf is special and triggers another register read function. I would suspect this allows reading global registers in the 8337, but somehow this fails:

# ethreg -p 15 0x0 
regread ioctl error
ethreg: eth0: Invalid argument

I'm not really sure how this is possible, though; reading the kernel sources, I can't see how this ioctl could return EINVAL at all. Oh well.

Wow matthijs, I'm impressed!

May I conclude that a delay of 1 (instead of 0) is sufficient to fix the packet loss (in ping at least), and that the next thing to test is whether that setting breaks the v1.001 board?

That's one test; the other is to see what the stock firmware does (see below).

Turns out the router has a USB port, so getting files on and off is easy :slight_smile: I copied both /sbin/devmem and /lib/ from OpenWRT into the stock firmware (into the proper places; I tried keeping them elsewhere and using LD_LIBRARY_PATH, which didn't work, probably because that is used by the dynamic linker to find libraries, not by the kernel to find the linker in the first place).

So the stock firmware actually sets the GMAC delay registers:

# devmem 0x18070000

This means ETH_RXDV_DELAY=3 and ETH_RXD_DELAY=3, and both of the TX delays at 0.
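To turn a devmem reading of 0x18070000 into the individual delay fields, a small decoder can help. This is only a sketch: decode_eth_cfg is a hypothetical helper, and the field positions (RXD 15:14, RXDV 17:16, TXD 19:18, TXEN 21:20, rgmii-gmac0 in bit 0) are my assumption from the AR9344 datasheet and the ag71xx driver. The example value 0x0003C001 is what RXDV=3, RXD=3 and both TX delays at 0 would encode to under that layout, not a captured reading:

```shell
#!/bin/sh
# Print the RGMII delay fields of the AR934x ETH_CFG register (0x18070000).
# ASSUMED layout: bit 0 rgmii-gmac0, RXD 15:14, RXDV 17:16,
# TXD 19:18, TXEN 21:20.
decode_eth_cfg() {
	v=$(($1))
	printf 'rgmii-gmac0=%d rxd=%d rxdv=%d txd=%d txen=%d\n' \
		$(( v & 1 )) $(( (v >> 14) & 3 )) $(( (v >> 16) & 3 )) \
		$(( (v >> 18) & 3 )) $(( (v >> 20) & 3 ))
}

# Hypothetical value matching the stock-firmware settings described above
decode_eth_cfg 0x0003C001
```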

I will also check the working board to see if that sets different values (but I'm going to guess it doesn't, and that board is just lucky that it works because of subtle changes in silicon between different chip revisions or even different chips).

I also looked in the stock firmware kernel sources to see if they ever set these values, and I couldn't find anything (but maybe this happens in a non-obvious place using hardcoded values instead of proper constants, or maybe it happens in some script rather than the kernel.., who knows...).

Checking the pll register value in the stock firmware shows it also matches the OpenWRT value:

# devmem 0x1805002C

(for reference, the 0x2C register offset comes from the pll reg attribute plus the second element of the eth0 pll-reg attribute, the pll-data defined in the OpenWRT DTS is for Gbps, 100Mbps, 10Mbps respectively, and this is an internal link that is always 1Gbps, so only the first value, 0x06000000, is relevant)

Looking more closely at the kernel code, it seems the ethreg global register reading code (triggered by -p 15, but failing) actually almost matches the QCA8337 datasheet (except for some magic numbers), but the port-specific PHY register reading code (which does seem to work) does not match the datasheet: it uses register 0x98 for accessing the per-port PHY registers, but that is the HEADER_CTRL register according to the datasheet.

I guess I might be looking at the wrong bit of code, or the sources I'm looking at are not the right sources for this board. So I'm going to be slightly distrustful of the ethreg output. Also, this means I cannot really verify whether the stock firmware sets up any (TX) delays in the PHY.

I've also tried bitbanging the right MII registers based on the 9344 and 8337 datasheets and current kernel sources using devmem, which seems to work. Here's the script for that, in case it's useful for anyone else:

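#!/bin/sh
# This script reads QCA8337 (and compatible) switch global registers
# through AR9344 (and compatible) MII control registers by poking directly
# in memory. This is probably slightly dangerous and will probably crash
# things if the kernel also happens to access these registers at the same
# time.
#
# Usage: pass the register address as the only argument, e.g. 0x0

REG=$(($1))

# Wait until the MII bus is idle (assumed: bit 0 of MII_IND is the busy
# flag, as in the ag71xx driver)
wait_mii() {
	while [ $(( $(devmem 0x19000034) & 1 )) -ne 0 ]; do :; done
}

# Select the register page: write bits 18:9 of REG to PHY 0x18, register 0
devmem 0x19000028 32 0x1800 # MII_ADDR
devmem 0x1900002C 32 $(( (REG >> 9) & 0x3ff )) # MII_CTRL
wait_mii

# Low half-word: PHY 0x10 | bits 8:6 of REG, register (REG >> 1) & 0x1e
PHYNUM=$(( ((REG >> 6) & 0x7) | 0x10 ))
PHYREG=$(( (REG >> 1) & 0x1e ))
devmem 0x19000024 32 0 # MII_CMD
devmem 0x19000028 32 $((PHYNUM << 8 | PHYREG)) # MII_ADDR
devmem 0x19000024 32 1 # MII_CMD (start read)
wait_mii
LO=$(devmem 0x19000032 16) # MII_STATUS + 2

# High half-word: same PHY, next register
PHYREG=$((PHYREG | 0x1))
devmem 0x19000024 32 0 # MII_CMD
devmem 0x19000028 32 $((PHYNUM << 8 | PHYREG)) # MII_ADDR
devmem 0x19000024 32 1 # MII_CMD (start read)
wait_mii
HI=$(devmem 0x19000032 16) # MII_STATUS + 2

printf "0x%08x = 0x%04x%04x\n" "$REG" "$HI" "$LO"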

I verified it works by reading register 0x0 (MASK_CTRL); bits 15:8 are the device ID, which should be 0x13:

# ./ 0x0
0x00000000 = 0x00001302
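For reference, as far as I can tell the arithmetic in the script splits a 32-bit switch register address the same way as qca8k_split_addr() in the Linux qca8k driver: a page number (bits 18:9) written to PHY 0x18, a pseudo-PHY address 0x10 | bits 8:6, and an even register number for the low half-word (plus one for the high half-word). A sketch that only prints the split, without touching hardware; split_addr is a hypothetical name:

```shell
#!/bin/sh
# Show how a QCA8337 register address maps onto MDIO accesses:
#   page   = (addr >> 9) & 0x3ff, written to PHY 0x18 register 0
#   phy    = 0x10 | ((addr >> 6) & 0x7)
#   lo_reg = (addr >> 1) & 0x1e  (low 16 bits)
#   hi_reg = lo_reg | 1          (high 16 bits)
split_addr() {
	a=$(($1))
	printf 'page=0x%03x phy=0x%02x lo_reg=0x%02x hi_reg=0x%02x\n' \
		$(( (a >> 9) & 0x3ff )) $(( ((a >> 6) & 0x7) | 0x10 )) \
		$(( (a >> 1) & 0x1e )) $(( ((a >> 1) & 0x1e) | 1 ))
}

split_addr 0x0    # MASK_CTRL
split_addr 0x7c   # PORT0_STATUS
```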

Here's a full dump of the QCA8337 registers in the stock firmware on the "broken" board:

# for r in 0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100 104 108 112 116 120 124 128 132 136 140 144 148 152 156 160 164 168 172 176 180 184 188 192 196 200 204 208 212 216 220 224; do ./ $r; done
0x00000000 = 0x00001302
0x00000004 = 0x87a00000
0x00000008 = 0x01000000
0x0000000c = 0x01000000
0x00000010 = 0x80000000
0x00000014 = 0xf0107650
0x00000018 = 0x0000a980
0x0000001c = 0x00003f1f
0x00000020 = 0x3f170a00
0x00000024 = 0x000100ff
0x00000028 = 0x00000000
0x0000002c = 0x00000000
0x00000030 = 0x80000304
0x00000034 = 0x00000000
0x00000038 = 0x0f000000
0x0000003c = 0x00000000
0x00000040 = 0x20700aaa
0x00000044 = 0x00000000
0x00000048 = 0x000088a8
0x0000004c = 0x00000000
0x00000050 = 0xcf35cf35
0x00000054 = 0xcf35cf35
0x00000058 = 0xcf35cf35
0x0000005c = 0x03ffff00
0x00000060 = 0x00000001
0x00000064 = 0x00000000
0x00000068 = 0x00000000
0x0000006c = 0x00000000
0x00000070 = 0xb00ee060
0x00000074 = 0x03707f07
0x00000078 = 0x00002400
0x0000007c = 0x0000007e
0x00000080 = 0x80001ffd
0x00000084 = 0x00001280
0x00000088 = 0x00001280
0x0000008c = 0x00001280
0x00000090 = 0x00001280
0x00000094 = 0x0000007e
0x00000098 = 0x0001aaaa
0x0000009c = 0x00000002
0x000000a0 = 0x00000000
0x000000a4 = 0x00000000
0x000000a8 = 0x00000000
0x000000ac = 0x00000000
0x000000b0 = 0x00000000
0x000000b4 = 0x00000000
0x000000b8 = 0x00000000
0x000000bc = 0x00000000
0x000000c0 = 0x00000000
0x000000c4 = 0x00000000
0x000000c8 = 0x80901040
0x000000cc = 0x00000000
0x000000d0 = 0xfffbff7e
0x000000d4 = 0x00000001
0x000000d8 = 0x00000100
0x000000dc = 0x000303ff
0x000000e0 = 0xc70164c0

Some interesting observations:

  • The port connected to the CPU is probably PORT0, since its CTL register 0x4 has some values set, while PORT5_PAD_CTL (0x8) and PORT6_PAD_CTL (0xC) have just one bit set. I couldn't actually find where the OpenWRT DTS file defines which switch port is which (even more, I couldn't even find the compatible = "qca,ar8327" that I think would trigger the right driver for the switch, so maybe there are some hardcoded defaults, or I didn't look in the right places).
  • Bit 31 of PORT0_PAD_CTL is set, which means MAC06_EXCHANGE_EN, which exchanges MAC0 and MAC6 (so maybe the hardware is wired to port 6 after all?).
  • The OpenWRT dts file sets some of these registers as well, to identical values as the stock firmware (except for bit 8 of the STATUS registers, but that is a read-only link status value so should not make a difference)
  • The PORT0_PAD_CTRL register (0x4) has value 0x87a00000, which means MAC0_RGMII_EN | MAC0_RGMII_TXCLK_DELAY_EN | MAC0_RGMII_RXCLK_DELAY_EN | MAC0_RGMII_TXCLK_DELAY_SEL(2) | MAC0_RGMII_RXCLK_DELAY_SEL(2) (note that the datasheet does not show MAC0_RGMII_RXCLK_DELAY_EN, but according to the kernel sources that is bit 24). This means that the PHY is configured to apply delays on TX and RX, but apparently that is not actually enough? Also, OpenWRT applies the same delays (at least they are listed in the dts file), so nothing to be gained there.
  • It is actually weird that both the PHY and the GMAC apply a delay for the RX signal. I'm not sure about the units for this delay, but I understand that the goal is 2ns delay, so maybe 2 in the PHY means 2ns? If 3 in the GMAC also means 3ns, then that would add up to 5ns delay, which is more than half the RGMII clock period (8ns), given this is DDR, that would be too much. Weird...
  • I suspect that some if not all of the settings in the qca,ar8327-initvals attribute can be set more semantically using other DT properties (such as using phy-mode = "rgmii-rxid" for enabling an RX/TX delay value of 2, or using qca,ignore-power-on-sel to set that one bit in PWS_REG_VALUE), but I'm not sure this is actually worth the trouble...
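Decoding 0x87a00000 bit by bit makes the PORT0_PAD_CTRL observations in the list above easier to check. A sketch; decode_pad_ctrl is a hypothetical helper, and the bit positions (31 MAC06_EXCHANGE_EN, 26 RGMII_EN, 25/24 TX/RX delay enable, 23:22 and 21:20 TX/RX delay select) are taken from my reading of the Linux qca8k driver header, so treat them as an assumption:

```shell
#!/bin/sh
# Decode a QCA8337 PORT0_PAD_CTRL (0x4) value.
# ASSUMED bits (Linux qca8k header): 31 MAC06_EXCHANGE_EN, 26 RGMII_EN,
# 25 TX delay enable, 24 RX delay enable, 23:22 TX delay sel, 21:20 RX delay sel.
decode_pad_ctrl() {
	v=$(($1))
	printf 'mac06_exchange=%d rgmii_en=%d txclk_delay_en=%d rxclk_delay_en=%d txclk_delay_sel=%d rxclk_delay_sel=%d\n' \
		$(( (v >> 31) & 1 )) $(( (v >> 26) & 1 )) $(( (v >> 25) & 1 )) \
		$(( (v >> 24) & 1 )) $(( (v >> 22) & 3 )) $(( (v >> 20) & 3 ))
}

decode_pad_ctrl 0x87a00000   # value read with the stock firmware
```

This relies on the shell doing 64-bit arithmetic (as busybox ash, dash and bash do), so that bit 31 is not treated as a sign bit.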

For good measure, here's also a dump of the GMAC0 config registers from stock firmware which might or might not be relevant (I left out the second half of the registers, which seem to be mostly status registers and counters, and the registers for GMAC1 which is unused):

# for r in 419430400 419430404 419430408 419430412 419430416 419430420 419430424 419430428 419430432 419430436 419430440 419430444 419430448 419430452 419430456 419430460 419430464 419430468 419430472 419430476 419430480 419430484 419430488 419430492; do printf "%08x = " $r; devmem $r; done

19000000 = 0x0000002F
19000004 = 0x00007215
19000008 = 0x40605060
1900000c = 0x00A1F037
19000010 = 0x00000600
19000014 = 0x00000000
19000018 = 0x00000000
1900001c = 0x00000000
19000020 = 0x0000000B
19000024 = 0x00000000
19000028 = 0x00000411
1900002c = 0x00004007
19000030 = 0x00000010
19000034 = 0x00000000
19000038 = 0x00000000
1900003c = 0x00000008
19000040 = 0x00026FB2
19000044 = 0x10010000
19000048 = 0x001F1F00
1900004c = 0x0010FFFF
19000050 = 0x03FF0155
19000054 = 0x01F00140
19000058 = 0x0003FFFF
1900005c = 0x000E6BE2
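Since the stock busybox has no seq (see the /bin and /usr/bin listings), the address lists in both dump loops were typed out by hand. A small loop can generate them instead; reg_addrs is a hypothetical helper, a sketch in plain POSIX shell:

```shell
#!/bin/sh
# Print register addresses from START to END (inclusive) in 4-byte steps,
# in decimal, one per line. Needs no seq, only shell arithmetic.
reg_addrs() {
	a=$(($1))
	end=$(($2))
	while [ "$a" -le "$end" ]; do
		echo "$a"
		a=$((a + 4))
	done
}

# Same addresses as the GMAC0 loop above (0x19000000..0x1900005c).
# On the router, add "devmem $r" after the printf to dump the values.
for r in $(reg_addrs 0x19000000 0x1900005c); do
	printf "%08x\n" "$r"
done
```

The 8337 loop corresponds to reg_addrs 0 0xe0.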

Ok, so a lot of things learned. Next steps are:

  • Dump registers with stock firmware on my other board, to confirm that uses the same values
  • Create an image with the right gmac delays (copied from stock) and test that on both my boards. I'll also publish the image somewhere, would be great if @Dock can then also test on their two boards.
  • If this works everywhere, submit the patch to OpenWRT.

I also checked the stock firmware on my other board:

  • ETH_CFG register - identical
  • PLL register - identical
  • 8337 registers - identical except bit 31 in register 0x80 (PORT1_STATUS), but that's documented as reserved, so probably not a configuration difference.
  • GMAC0 registers are mostly identical, except the MII_* registers, which are used to read MII registers (so reflect the address and value of the last read).

So: It seems likely that the stock firmware does not distinguish different versions, but just sets the same settings unconditionally and those apparently work with all versions.

I realized I forgot to dump the actual PHY registers (i.e. of the 5 PHYs connected to the external LAN and WAN ports on the 8337 switch). They're probably not relevant, but just in case, here's their values with the stock firmware (again snipped some counter and debug registers):

for p in 0 1 2 3 4; do echo Port $p; for r in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do ethreg -p $p $r; done; done
Port 0                           
Read Reg: 0x00000000 = 0x00001000
Read Reg: 0x00000001 = 0x0000796d
Read Reg: 0x00000002 = 0x0000004d
Read Reg: 0x00000003 = 0x0000d036
Read Reg: 0x00000004 = 0x00000de1
Read Reg: 0x00000005 = 0x000045e1
Read Reg: 0x00000006 = 0x00000005
Read Reg: 0x00000007 = 0x00002001
Read Reg: 0x00000008 = 0x00000000
Read Reg: 0x00000009 = 0x00000200
Read Reg: 0x0000000a = 0x00000000
Read Reg: 0x0000000b = 0x00000000
Read Reg: 0x0000000c = 0x00000000
Read Reg: 0x0000000d = 0x00004007
Read Reg: 0x0000000e = 0x00000000
Port 1                           
Read Reg: 0x00000000 = 0x00001000
Read Reg: 0x00000001 = 0x00007949
Read Reg: 0x00000002 = 0x0000004d
Read Reg: 0x00000003 = 0x0000d036
Read Reg: 0x00000004 = 0x00000de1
Read Reg: 0x00000005 = 0x00000000
Read Reg: 0x00000006 = 0x00000004
Read Reg: 0x00000007 = 0x00002801
Read Reg: 0x00000008 = 0x00000000
Read Reg: 0x00000009 = 0x00000200
Read Reg: 0x0000000a = 0x00000000
Read Reg: 0x0000000b = 0x00000000
Read Reg: 0x0000000c = 0x00000000
Read Reg: 0x0000000d = 0x00000000
Read Reg: 0x0000000e = 0x00000000
Port 2                        
Read Reg: 0x00000000 = 0x00001000
Read Reg: 0x00000001 = 0x00007949
Read Reg: 0x00000002 = 0x0000004d
Read Reg: 0x00000003 = 0x0000d036
Read Reg: 0x00000004 = 0x00000de1
Read Reg: 0x00000005 = 0x00000000
Read Reg: 0x00000006 = 0x00000004
Read Reg: 0x00000007 = 0x00002801
Read Reg: 0x00000008 = 0x00000000
Read Reg: 0x00000009 = 0x00000200
Read Reg: 0x0000000a = 0x00000000
Read Reg: 0x0000000b = 0x00000000
Read Reg: 0x0000000c = 0x00000000
Read Reg: 0x0000000d = 0x00000000
Read Reg: 0x0000000e = 0x00000000
Port 3
Read Reg: 0x00000000 = 0x00001000
Read Reg: 0x00000001 = 0x00007949
Read Reg: 0x00000002 = 0x0000004d
Read Reg: 0x00000003 = 0x0000d036
Read Reg: 0x00000004 = 0x00000de1
Read Reg: 0x00000005 = 0x00000000
Read Reg: 0x00000006 = 0x00000004
Read Reg: 0x00000007 = 0x00002801
Read Reg: 0x00000008 = 0x00000000
Read Reg: 0x00000009 = 0x00000200
Read Reg: 0x0000000a = 0x00000000
Read Reg: 0x0000000b = 0x00000000
Read Reg: 0x0000000c = 0x00000000
Read Reg: 0x0000000d = 0x00000000
Read Reg: 0x0000000e = 0x00000000
Port 4
Read Reg: 0x00000000 = 0x00001000
Read Reg: 0x00000001 = 0x00007949
Read Reg: 0x00000002 = 0x0000004d
Read Reg: 0x00000003 = 0x0000d036
Read Reg: 0x00000004 = 0x00000de1
Read Reg: 0x00000005 = 0x00000000
Read Reg: 0x00000006 = 0x00000004
Read Reg: 0x00000007 = 0x00002801
Read Reg: 0x00000008 = 0x00000000
Read Reg: 0x00000009 = 0x00000200
Read Reg: 0x0000000a = 0x00000000
Read Reg: 0x0000000b = 0x00000000
Read Reg: 0x0000000c = 0x00000000
Read Reg: 0x0000000d = 0x00000000
Read Reg: 0x0000000e = 0x00000000
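To make the dump above easier to read, here is a minimal Python sketch that decodes a few standard IEEE 802.3 clause-22 registers (BMSR, PHY ID, ability registers). It assumes the usual clause-22 bit layout; the function names are hypothetical and not part of ethreg. Decoded this way, port 0 is the only port with link up and autonegotiation complete (0x796d vs 0x7949 on ports 1-4), and registers 2/3 read 0x004dd036 on every port.

```python
import re

# Matches ethreg output lines such as "Read Reg: 0x00000001 = 0x0000796d".
LINE_RE = re.compile(r"Read Reg: 0x([0-9a-fA-F]+) = 0x([0-9a-fA-F]+)")


def parse_dump(text):
    """Map register number -> value for every 'Read Reg' line in text."""
    return {int(reg, 16): int(val, 16) for reg, val in LINE_RE.findall(text)}


def phy_id(regs):
    """Registers 2 and 3 (PHYSID1/PHYSID2) form the 32-bit PHY identifier."""
    return (regs[2] << 16) | regs[3]


def decode_bmsr(value):
    """Register 1 (BMSR): a few of the standard status bits."""
    return {
        "link_up": bool(value & 0x0004),
        "aneg_capable": bool(value & 0x0008),
        "aneg_complete": bool(value & 0x0020),
    }


def decode_ability(value):
    """Register 4 (advertisement) or 5 (link partner): 10/100 modes and pause."""
    bits = {0x0020: "10HD", 0x0040: "10FD", 0x0080: "100HD",
            0x0100: "100FD", 0x0400: "pause", 0x0800: "asym_pause"}
    return sorted(name for bit, name in bits.items() if value & bit)


if __name__ == "__main__":
    port0 = """Read Reg: 0x00000001 = 0x0000796d
Read Reg: 0x00000002 = 0x0000004d
Read Reg: 0x00000003 = 0x0000d036
Read Reg: 0x00000005 = 0x000045e1"""
    regs = parse_dump(port0)
    print(hex(phy_id(regs)))      # 0x4dd036
    print(decode_bmsr(regs[1]))   # port 0: link up, autoneg complete
    print(decode_bmsr(0x7949))    # ports 1-4: no link
    print(decode_ability(regs[5]))
```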

Comparing stock firmware registers with OpenWRT registers (with the delay patching applied):

  • ETH_CFG register - identical
  • PLL register - identical
  • 8337 registers are largely identical, with some changes (see below). Nothing that stands out as incorrect.
  • GMAC0 registers are largely identical, with some changes (see below). Nothing that stands out as incorrect.
  • 8337 PHY (per-port) registers not checked, since I did not have a working ethreg available on OpenWRT (and it did not seem worth the trouble of putting another layer on top just to compare them).
--- 8337.stock       2023-08-31 20:28:57.339128904 +0200
+++ 8337.owrt        2023-08-31 20:28:59.954953884 +0200
@@ -6,11 +6,11 @@
# Interrupt registers, probably not relevant
-0x00000020 = 0x3f170a00
+0x00000020 = 0x3f500a02
-0x00000024 = 0x000100ff 
+0x00000024 = 0x00010087
# MODULE_EN, stock enables L3_EN (0x4) and some reserved bits
-0x00000030 = 0x80000304
+0x00000030 = 0x80000000
# MAX_FRAME_SIZE, stock configures 9216 bytes, OpenWRT 9028
-0x00000078 = 0x00002400
+0x00000078 = 0x00002344
# PORT_x_STATUS registers, probably not relevant
-0x0000007c = 0x0000007e
+0x0000007c = 0x000000fe
-0x00000080 = 0x80001ffd
+0x00000080 = 0x00001f7d
-0x00000084 = 0x00001280
+0x00000084 = 0x00001200
-0x00000088 = 0x00001280
+0x00000088 = 0x00001200
-0x0000008c = 0x00001280
+0x0000008c = 0x00001200
-0x00000090 = 0x00001280
+0x00000090 = 0x00001200
-0x00000094 = 0x0000007e
+0x00000094 = 0x000000fe
# HEADER_CTRL, stock uses a 4-byte header with header type value 0xaaaa, OpenWRT a 2-byte header with header type value 0
-0x00000098 = 0x0001aaaa
+0x00000098 = 0x00000000
# PORT0_HEADER_CTRL, stock configures all frames with headers, OpenWRT configures no headers
-0x0000009c = 0x00000002
+0x0000009c = 0x00000000
--- gmac0.stock       2023-08-31 20:32:33.467169638 +0200
+++ gmac0.owrt        2023-08-31 20:32:34.507122738 +0200
@@ -1,22 +1,24 @@
# MAC Configuration 1, stock sets RX_FLOW_CONTROL here
-19000000 = 0x0000002F
+19000000 = 0x0000000F
# Maximum Frame Length, stock sets 1536, OpenWRT sets 1524
-19000010 = 0x00000600
+19000010 = 0x000005F4
# MII Configuration, stock sets MII clkdiv to /58, OpenWRT to /50
-19000020 = 0x0000000B
+19000020 = 0x0000000A
# MII access registers - not relevant
-19000028 = 0x00000411
+19000028 = 0x0000111F
-1900002c = 0x00004007
+1900002c = 0x00000000
-19000030 = 0x00000010
+19000030 = 0x00000000
# STA Address 1/2 - stock and OpenWRT apparently configure a different MAC address, but this is just eth0, not the VLAN devices that are actually used.
-19000040 = 0x00026FB2
+19000040 = 0xD021F3FE
-19000044 = 0x10010000
+19000044 = 0x38EE0000
# ETH Configuration 2, XON/XOFF flow control thresholds
-19000050 = 0x03FF0155
+19000050 = 0x015500AA
# ETH Configuration 4, stock sets "frame truncated" and "unicast address match" condition bits (but these conditions are ignored by OpenWRT anyway, so their values do not matter there).
-19000058 = 0x0003FFFF
+19000058 = 0x0000FFFF
# ETH Configuration 5, OpenWRT ignores extra frame drop conditions (drop event, false carrier, code error, dribble nibble, long event, frame truncated).
-1900005c = 0x000E6BE2
+1900005c = 0x000FEFEF
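The annotations in the two diffs above mostly come down to spotting which bits changed. A minimal Python sketch of that check follows; `bit_diff` and `mii_divider` are hypothetical helpers, and the divider table is an assumption (see the comment). Applied to the values above, MODULE_EN differs in bit 2 (L3_EN) plus bits 8-9, and ETH Configuration 5 differs in exactly six bits, all set only by OpenWRT, which matches the six ignored drop conditions.

```python
def bit_diff(stock, owrt, width=32):
    """Return (bits set only in stock, bits set only in OpenWRT), LSB = bit 0."""
    only_stock = [b for b in range(width) if (stock >> b) & 1 and not (owrt >> b) & 1]
    only_owrt = [b for b in range(width) if (owrt >> b) & 1 and not (stock >> b) & 1]
    return only_stock, only_owrt


# ASSUMPTION: MDIO clock divider for each value of the MII Configuration
# register's low nibble, as in the AR933x/AR934x table of the Linux ag71xx
# driver (quoted from memory); verify against the AR9344 datasheet.
MII_CLKDIV = [4, 4, 6, 8, 10, 14, 20, 28, 34, 42, 50, 58, 66, 74, 82, 98]


def mii_divider(mii_cfg):
    """Look up the MDIO clock divider selected by an MII Configuration value."""
    return MII_CLKDIV[mii_cfg & 0xF]


if __name__ == "__main__":
    print(bit_diff(0x80000304, 0x80000000))      # 8337 MODULE_EN: ([2, 8, 9], [])
    print(bit_diff(0x000E6BE2, 0x000FEFEF))      # ETH Configuration 5: ([], [0, 2, 3, 10, 15, 16])
    print(mii_divider(0x0B), mii_divider(0x0A))  # stock vs OpenWRT: 58 50
```

With this table, 0x0B and 0x0A map to /58 and /50, the same dividers noted in the MII Configuration annotation.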

Next up: testing a newly built image on both of my boards.

Sorry, today I had a very flaky internet connection, so even receiving mail was a problem, which delayed my response.

With ethreg we are actually querying (we can also write) the registers of the PHY/switch connected to the MDIO interface. What I wanted to know is whether both revisions have the same register values on the vendor firmware. We write the ones that differ from the default values as qca,ar8327-initvals in the DTS. To know which registers to check, look at Table 5-3 in QCA8337N_Data_Sheet_MKG-17793_v1.0.pdf; for example, the invocation ethreg 0x10 should return 0x80000000 in our case. If the registers are the same on all boards, we can ignore the initial switch setup and focus on the delays. All other registers are configured as requested by the usual network tools, so we don't need to bother with them.
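Comparing the vendor-firmware dumps of both revisions, and turning non-default values into DTS cells, could look roughly like this minimal Python sketch. It assumes the `Read Reg: 0x... = 0x...` (or bare `0x... = 0x...`) format shown earlier in this thread; the helper names are hypothetical, and the real reset defaults have to be taken from Table 5-3 of the datasheet (registers without a known default are kept).

```python
import re

# Accepts both "Read Reg: 0x00000010 = 0x80000000" and "0x00000010 = 0x80000000".
LINE_RE = re.compile(r"(?:Read Reg:\s*)?0x([0-9a-fA-F]+)\s*=\s*0x([0-9a-fA-F]+)")


def parse_dump(text):
    """Map register address -> value for every register line in text."""
    return {int(reg, 16): int(val, 16) for reg, val in LINE_RE.findall(text)}


def compare_dumps(dump_a, dump_b):
    """Registers that differ between two dumps; None marks a register missing from one."""
    a, b = parse_dump(dump_a), parse_dump(dump_b)
    return {r: (a.get(r), b.get(r))
            for r in sorted(a.keys() | b.keys()) if a.get(r) != b.get(r)}


def initvals(dump, defaults):
    """Format every register whose value differs from `defaults` as qca,ar8327-initvals cells."""
    cells = [f"\t0x{r:02x} 0x{v:08x}"
             for r, v in sorted(parse_dump(dump).items()) if defaults.get(r) != v]
    return "qca,ar8327-initvals = <\n" + "\n".join(cells) + "\n>;"


if __name__ == "__main__":
    stock = "0x0000007c = 0x0000007e\n0x00000094 = 0x0000007e"
    # Fill in the reset defaults from Table 5-3; with none given, every register is kept.
    print(initvals(stock, {}))
```

An empty result from `compare_dumps` on the v1.001 and v1.002 dumps would mean the vendor firmware programs the switch identically on both, leaving only the delays to look at.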
BTW, ethreg is a tool from the Atheros LSDK; it was never part of BusyBox or any other open source project.

As far as I understand, pll-data seems to be the xMII reference clock, which is set when the interface is brought up, so you can't fiddle with it while it's already running (see the explanation in section 9.5.10 of the AR9344 datasheet). As I understand it, to correct the timings on the fly we have the TX and/or RX delays.
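A minimal Python sketch of how a per-speed pll-data value gets selected when the link comes up. The three-cell order <1000 100 10> is an assumption based on how ath79 DTS files are commonly written and on my recollection of the ag71xx driver; the function name and the example cells are placeholders, not the WLR-7100 values.

```python
# ASSUMPTION: pll-data holds three cells ordered <1000 Mbit/s, 100 Mbit/s, 10 Mbit/s>.
PLL_INDEX = {1000: 0, 100: 1, 10: 2}


def pll_for_speed(pll_data, speed_mbps):
    """Pick the xMII PLL value a driver would program for the negotiated link speed."""
    if speed_mbps not in PLL_INDEX:
        raise ValueError(f"unsupported speed: {speed_mbps}")
    return pll_data[PLL_INDEX[speed_mbps]]


if __name__ == "__main__":
    example = (0x06000000, 0x00000101, 0x00001616)  # placeholder cells
    print(hex(pll_for_speed(example, 1000)))
```

Because the value is only picked at link setup, it cannot fine-tune sampling on a running link; that is what the RX/TX delay settings are for.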

For the ath79 target we are still using the old swconfig drivers; check the target/linux/generic/files/drivers/net/phy directory for the source. There is a PR converting all switch users to the DSA driver, but it is stalled because the devices have such a wide range of configurations.

So all in all, it seems that setting the RX delays and testing on all variants is everything we need. Great job.

All bits the vendor firmware sets for the switch are in GPL_RELEASE/ISD2/configs/product_config.make, starting from the CONFIG_ATH_S17_PROPRIETARY_INIT symbol, but I asked you to check all variants, since I didn't know whether the same source was used for them.