Sitecom WLR-7100 development/progress

Hi, I tested on I-O DATA WN-AG300DGR (WAN to LAN).

flow_offloading   speed (avg)
0 (OFF)           190 Mbps
1 (ON)            650 Mbps

switch configuration:

config switch_vlan               <-- lan
        option device 'switch0'
        option vlan '1'
        option ports '1 2 3 4 0t'

config switch_vlan               <-- wan
        option device 'switch0'
        option vlan '2'
        option ports '1t 2t 3t 4t 5 0t' 

firewall rule for the testing:

config rule
        option src              wan
        option dest             lan
        option dest_port        5201
        option proto            tcp
        option target           ACCEPT

Environment:

  • OpenWrt: r16585-182eaa4916

  • PC: ThinkPad X250

  • NIC: Intel I218-LM

    • tagged (VID: 2): 192.168.1.2 (wan side)
    • untagged: 192.168.12.148 (lan side)
  • iperf3: 3.1.3-win64

    • command (server): .\iperf3.exe -s -B 192.168.12.148
    • command (client): .\iperf3.exe -c 192.168.12.148 -P 4 -B 192.168.1.2
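For repeatable numbers, the averaging over several runs could be scripted. This is only a minimal sketch, assuming iperf3 is started with its -J (JSON output) flag and that the report carries the usual end.sum_received.bits_per_second field for TCP tests. The helper names and the sample reports are made up for illustration and are not measurements from this thread.

```python
import json
from statistics import mean

def received_mbps(report_json):
    """Return receiver-side throughput in Mbps from one iperf3 -J report."""
    report = json.loads(report_json)
    # Assumption: TCP test, so the summary lives under end.sum_received.
    return report["end"]["sum_received"]["bits_per_second"] / 1e6

def average_mbps(reports):
    """Average the receiver-side throughput over several runs."""
    return mean(received_mbps(r) for r in reports)

# Hypothetical reports, trimmed down to the single field used here.
runs = [
    json.dumps({"end": {"sum_received": {"bits_per_second": 600e6}}}),
    json.dumps({"end": {"sum_received": {"bits_per_second": 700e6}}}),
]
print(f"{average_mbps(runs):.0f} Mbps")
```

Using the receiver-side figure avoids counting data that was sent but never arrived.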

I will check again with the current SNAPSHOT.

Could LuCI slow things down? There is barely enough memory to have it installed.

Much appreciated. I'm positively surprised that soft flow_offloading gives so much benefit.

@Dock
Yes for wireless, as each client increases the memory usage (ath10k uses a lot per client). It shouldn't for wired, but in a tight-memory environment it could.

cc: @tmn505 and @matthijs and @musashino

My results with the SNAPSHOT of 24 April 2021:

Fresh install, default settings (no LUCI): WAN->LAN:

DL 11.30 Mbps and UL 3.90 Mbps. With older OpenWRT firmware I used to get my max fiber speed (around 100 Mbps up and down).

Installed LUCI and enabled software offloading. Results WAN->LAN

DL 16.61 Mbps and UL 2.35 Mbps

WiFi at default settings, just enabled.

WiFi 2.4 DL 11.8 Mbps and UL 40.8 Mbps (iPhone 12 mini)
WiFi 5.0 DL 19.9 Mbps and UL 94.4 Mbps (iPhone 12 mini)

Software offloading does not have a significant effect on the speeds.

I still have another WLR-7100 (v1.001) with SNAPSHOT r14256-a69949a13f:

Kernel 5.4.60
Ath10K firmware-qca988x-ct 2020-07-02-1
kmod-ath 5.4.60+5.8-1-1
kmod ath9k 5.4.60+5.8-1-1
kmod ath9k-common 5.4.60+5.8-1-1
kmod ath10k-ct-smallbuffers 5.4.60+2020-06-30-edfbf916-1

This version of OpenWRT is fast on WAN->LAN (offloading disabled) and has fiber speeds (around 90 Mbps) on the 5G WiFi. Luci installed. This router IS capable.

I suspect around December 2020 some changes in the drivers or switching did not work out very well for this router. It is very capable when configured properly.

I'm not so familiar with building my own firmware, but if I can be of help with fixing this and even getting this device into the stable release, I would gladly make some time for that!

Hi,

I installed the stable release today and the same issues are present as in the latest SNAPSHOT.

Very slow wired WAN -> LAN and DL on 5ac WiFi very slow.

Happy to see the WLR-7100 made it to stable release. Next is to get the transfer speeds up to specs. Who is the dev that knows this device best? :slight_smile:

Hi, first time on OpenWrt. I have this kind of router (the wifi is bad and devices are often disconnected).
I want to know if it is possible to install a stable release directly from Toolbox/Firmware.

Or would it be better to buy a new router?

Thanks in advance

Hi again,

I found the issue. The current stable release 21.02 works perfectly on the WLR-7100 v1.001. But on the v1.002 it has a very poor download speed, even wired. I suspect the chipset is a different revision.

I will look inside to check for any differences. Would that be of any help for the devs?

If you have v1.001 you may safely upload the dlf version of OpenWRT 21.02 to this device. It will most likely run perfectly, as mine does. The v1.002 version of this router behaves differently in my case: the download speed, both wired and wireless, never exceeds 10Mb/s, while upload is on par with v1.001. I still have to figure out if it's just my router or if all WLR-7100 v1.002 units are affected.

Does this help 4 months later?

I have both versions, and have been using the v1.001 with the v1.002 on the shelf for the past two years or so. To upgrade my router, I configured the v1.002 with a newer firmware and replaced the entire router, but ran into poor performance problems. I first suspected firmware changes (not realizing I had two different versions), but I downgraded the v1.002 to the same firmware as v1.001, and the problem persisted. I have not yet tried upgrading the v1.001 to the latest firmware (since that is currently running my organization's network, so I'm reluctant to risk breaking my only working setup), but it seems indeed that v1.002 is slightly different and thus OpenWRT has bad performance on it, just like @Dock also described.

@Dock Do you happen to have pictures of the v1.001 for comparison? I put pictures of the V1.002 on the wiki, but cannot easily open up my v1.001 now.

As for further debugging:

  • I have found that the root cause seems to be dropped incoming packets. I've seen 1% to 15% loss depending on circumstances.
  • I've seen this happen on both WAN and LAN.
  • Setting up port mirroring on the internal switch suggests that the packets pass through the switch properly, but get lost before or in the CPU ethernet interface.
  • I've seen this happen with HTTP, scp, but also simple pings. The easiest way to reproduce this is to just ping the router: ping 192.168.2.1 -s 1400 -i .01. This uses a short interval (high packet rate), so you can see the problem without having to wait for a long time, and bigger packets, since that also seems to increase the packet loss percentage (but not linearly). However, the problem is also visible with default 56-byte packets and 1 second interval, so it does not seem to be a packet size or packet rate issue.
  • On the recent 23.04 release, I get about 1.3% with 56-byte pings, and 13% with 1400-byte pings. However, on some old snapshot version (r15249-85caf21ade, kernel 5.4.83) I get about 0.2-0.4% with 56-byte pings, and around 2% with 1400-byte packets, so it seems the problem did get worse in recent versions.
  • I suspected that only incoming packets were affected, because e.g. outgoing file transfers were fast while incoming were slow (outgoing file transfers would then still have dropped ACK packets, but those are smaller so less problematic). If this were true, however, then an outgoing ping would see the same amount of packet loss (replies would get lost instead of the pings), but that does not seem to be the case - ping 192.168.2.191 -s 1400 gives me 0 lost packets (out of 400 tries, busybox ping doesn't support < 1s interval). Using -A (next ping as soon as the previous reply is received), I see 10 out of 4000 packets dropped, so maybe the problem still exists but is somehow even less likely to occur on reply packets? Running longer with 1s interval I see 2 out of 500 packets dropped and later 3 out of 4000 packets.
  • The stock firmware is the same for both hardware versions (both versions have their own download page and different filenames, but are byte-for-byte identical). Maybe the firmware source has a hint about what is different between hw versions, but I have not investigated yet.
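To put the drop counts from the list above on the same scale as the percentages, the loss can be computed directly. A minimal Python sketch follows. The summary-line parser assumes the usual "N packets transmitted, M received" wording printed by iputils ping and the "M packets received" variant printed by busybox ping; the function names are made up for illustration.

```python
import re

# Matches both "400 received" (iputils) and "400 packets received" (busybox).
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")

def loss_from_counts(dropped, sent):
    """Packet loss in percent from a drop count and a total."""
    return 100.0 * dropped / sent

def loss_percent(summary_line):
    """Packet loss in percent, parsed from a ping summary line."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError(f"not a ping summary line: {summary_line!r}")
    sent, received = int(match.group(1)), int(match.group(2))
    return loss_from_counts(sent - received, sent)

# Drop counts quoted above for the outgoing pings.
for dropped, sent in [(10, 4000), (2, 500), (3, 4000)]:
    print(f"{dropped}/{sent} = {loss_from_counts(dropped, sent):.3f}%")
```

All three outgoing figures come out well under 1%, which is lower than the incoming loss percentages quoted for the recent release.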

One other interesting detail: On v1.001, /proc/cpuinfo says system type: Atheros AR9342 rev 3
and on v1.002 it says: system type: Atheros AR9342 rev 1. So this confirms a previous suspicion that these versions use a different revision of at least the SoC (interesting that the newer router version has a lower SoC revision, but maybe there's some revision register weirdness going on there), though it is unclear if the problem is related to a change in SoC revision, or maybe there is also a PCB change.
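For anyone comparing several units, the revision can be pulled out of a saved copy of /proc/cpuinfo with a few lines of Python. This is only a sketch; it assumes the "system type" line has the form quoted above, and the function name is made up for illustration.

```python
import re

def soc_revision(cpuinfo_text):
    """Return (soc_name, revision) from the 'system type' line, or None."""
    match = re.search(r"^system type\s*:\s*(.+?)\s+rev\s+(\d+)\s*$",
                      cpuinfo_text, re.MULTILINE)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

# The two lines quoted above.
for line in ("system type: Atheros AR9342 rev 3",
             "system type: Atheros AR9342 rev 1"):
    print(soc_revision(line))
```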

I've tried to look up the revision history for this SoC, but I couldn't find a datasheet for the AR9342 at all (I could find 41 and 44 here). Also, the actual chip is marked AR1022, but I suspect this is the old Atheros name for the Qualcomm-Atheros AR9342 (suggested by this wiki page as well). It's a bit weird that I can find hardly any mention of the AR1022 chip at all, though, so certainly no datasheet either...

These are very nice findings; the difference in SoC revision could explain the problem. Could You test whether packet loss also occurs when 100Mb/s or 10Mb/s is used? If not, then the 1Gb value from pll-data is probably wrong. But if the losses are also there, maybe we need to add delays in the dts gmac-config node (https://git.openwrt.org/f3ffac90bc). The values can be from 0 to 3. For a detailed description, check the datasheet AR9344_May_2012.pdf (copies float around on GitHub), paragraph 9.7 and tables 11-3 and 11-4.

Do You still have vendor firmware on v1.001? Could You check if ethreg or devmem (or possibly /dev/mem) is present?

Hi, nice to see this re-opened...

I will take pictures of both v1.001 and v1.002 this week.

I also see other strange behaviour on the v1.002, which made me decide to stop using the device: it seems to run out of memory when trying to upgrade with Sysupgrade. It just hangs while uploading the file, reaching 100% only once in many attempts. The v1.001 always uploads new firmware at a steady speed.

Any of you also using the WLR-8100 (both versions)?


Both have the AR1022-AL1A,
but the V1.001 mentions additional PCK912.00B 1139
and the V1.002 mentions PCS681.00B 1146

The marking on the v1.002 is hard to read; it is only visible with light at a specific angle.

Does this help?

Tested AC speeds on both with 23.05.0-rc3:

V1.001 reaches max (around my max WAN fiber speed of 100 Mbps) and v1.002 maxes out at 38 Mbps downstream. Upstream is around 100 Mbps on both.

Issue not as bad as years ago but significant. Uploading of firmware still struggling but doable.

@tmn505 Thanks for your debug suggestions, that was exactly what I needed to continue this investigation. I'll check - I know where to find the DTS file and can figure out how to test changes, just not what values to check.

As for the stock firmware - I do not have it running, but I can flash it again (did that this week to recover already). I'll check for /dev/mem and ethreg and come back to you with the result.

I've also seen this when upgrading through the webui - I can imagine that the packet loss just breaks the upgrade?

Furthermore: I have one new observation: it turns out I have two v1 002 boxes, and they have a different SoC in them (AR1022 and AR9342). I previously noticed a different revision number in /proc/cpuinfo and assumed that the other one (without problems) would be a v1 001, but today I removed it from our network and saw that it was also v1 002. Looking more closely, it seems that the PCB is fully identical (the only difference I could find is a bit of silkscreen on the right upside down that looks like it might be a timestamp and "P2-1" vs "P4-2").

This is the working v1 002 one, with AR9342-DL3A:

This is the broken v1 002 one, with AR1022-AL1A / PCS681.00B (just like @Dock's V1 002):

The 1 and 3 in AL1A and DL3A match the revision shown in /proc/cpuinfo, so that might be related.

That's interesting - you have different versions that both have the AR1022, I have two of the same version but with different chips, heh... I'm not sure if/how this helps yet, but it does suggest that we should not focus on the difference between AR1022/AR9342 too much, since there is an AR1022 chip that also works (or maybe something is operating on the edge of specifications and it just happens to work on some units but not on others...).

It also seems that your v1 001 version has a slightly different PCB color, my PCBs look more like your v1 002 one.

Otherwise your PCB looks completely identical (including silkscreen and which components are left out) for the part I can see on your picture. Could you maybe post one more picture of both entire PCBs, to compare the silkscreen version numbers (which fall off the bottom of your v1 001 picture)?

Nope, though a friend of mine mentioned they have a WLR-5100 (which is apparently identical to the WLR-4100 but with 5 GHz wifi added to an identical PCB).

Anyway, thanks for the input. I'm out of time for today, but will try to do some more tests soon (but I should also be doing other things, so might take me a bit more time).

Yeah, I already found that. I couldn't find the AR9342 datasheet, though. Do you know if that just wasn't leaked, or are the AR9342 and AR9344 both covered by that datasheet? If not, I assume they are similar enough that most of the contents still apply, of course (though I noticed the docs for RST_REVISION_ID did not match the kernel sources - datasheet documents 0x011c1 for the AR9344, but the kernel sources use a different value - maybe the value was changed later or something...).

These are the same cores; they differ in CPU speed (MHz) and supported wifi bands. They could also differ in integrated peripherals, like an integrated switch (not relevant in our case).

I want to check if register values differ somewhere between the revisions. If neither is present in the vendor firmware, I'll attach a busybox binary with ethreg compiled in.
Can You also check if the switch chip differs between all Your devices?

For an example check: https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/ath79/dts/ar9342_iodata_etg3-r.dts;hb=HEAD#l127. Our dts is in the same dir.

Not me (also that's different SoC, newer generation).

I've taken my "working" AR9342 version, which I've only tested so far with a years-old snapshot version, upgraded it to the latest snapshot version and confirmed it is still working. Then I did the same with my "broken" AR1022 version, and confirmed it is still broken. This is another confirmation that the problem is indeed caused by hardware differences.

The loss also happens on 100Mbit (my home switch does not do gbit).

The switch chip is the same - QCA8337N-AL3C on both my boards, and @Dock's v1 001 (I do not know about their v1 002 board).

I installed the stock firmware again, it has both ethreg and /dev/mem:

# ethreg -h
ethreg: option requires an argument -- h
usage: ethreg [-i ifname] [-p portnum] offset[=value]
usage: ethreg [-f]  -p portnum=10/100/0 [-d duplex]
usage: ethreg [-i ifname][-s value]
usage: ethreg [-i ifname][-j 0|1]
usage: ethreg [-i ifname][-h 0|1]
usage: ethreg [-i ifname][-p portnum] [-t mode]
# ls -l /dev/mem
crw-r--r--    1 0        15         1,   1 Dec  9  2015 /dev/mem

I also tried installing ethreg inside OpenWRT, but found that it does not seem to be available anywhere, I just found one github repo with a busybox version that still had ethreg.c. I guess it was never part of upstream busybox and is no longer used nowadays (I also couldn't find any reference to the IOCTL used in kernel sources, so I suspect that might have been WRT-specific patches in the past?). In any case - compiling it with prefix=mips-linux-gnu- worked, but then failed to run on OpenWRT with invalid instruction - probably missing some compiler option...

In any case - ethreg is available, but I couldn't quite figure out what the base register address for the tool is. I thought (looking at the AR9344 datasheet) maybe 0x18070000 (base address of GMAC registers), or maybe no offset (just full register addresses), but in both cases trying to read 0x18070004 (LUTs_AGER_INT) returns an incorrect value (bits 31:4 are reserved and should read 0):

# ethreg 0x18070004
Read Reg: 0x18070004 = 0x4b640200
# ethreg 0x4
Read Reg: 0x00000004 = 0x87a00000
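The "reserved bits should read 0" argument can be checked mechanically. A small Python sketch, assuming (as stated above) that bits 31:4 of the register are reserved; the function name is made up for illustration.

```python
RESERVED_MASK = 0xFFFFFFF0  # bits 31:4, expected to read back as 0

def reserved_bits_clear(value):
    """True if none of the reserved bits 31:4 are set in a 32-bit value."""
    return (value & RESERVED_MASK) == 0

# The two readings shown above.
for label, value in [("0x18070004", 0x4B640200), ("0x00000004", 0x87A00000)]:
    verdict = "plausible" if reserved_bits_clear(value) else "reserved bits set"
    print(f"{label}: 0x{value:08x} -> {verdict}")
```

Both readings have reserved bits set, which supports the suspicion that neither offset addresses the intended register.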

I also considered using /dev/mem, but could not find any tool on the board to make a usable dump. There is tail but not head, so tail might be usable to make a binary dump, but then I do not have an easy way to get the binary file off the board through serial (the stock firmware does not have telnet, ssh or netcat):

# dd
/bin/sh: dd: not found
# devmem
/bin/sh: devmem: not found
# hd
/bin/sh: hd: not found
# hexdump
/bin/sh: hexdump: not found
# ls /bin/
*        cp       egrep    ls       netstat  ps       sleep    umount
ash      date     grep     mkdir    nice     pwd      stty     uname
busybox  df       ip       mknod    pidof    rm       sync     vi
cat      dmesg    kill     mount    ping     sed      touch
chmod    echo     ln       mv       ping6    sh       true
# ls /sbin/
KC_SMB                 hotplug2               radvd
apcfg_init             httpd                  raether
app_agentd             ifconfig               rdisc6
app_agentsd            ifdown                 reboot
apps_init              ifup                   rmgmt
apps_init_ver.txt      igmpproxy              rmmod
appscore_init_ver.txt  init                   route
arp                    insmod                 rpcbind
autoFWupgrade          ip                     setconfig
brctl                  ip6tables              ssdk_sh
burn                   ippoold                starteth.sh
config_init            iptables               sysconf_cli
config_term            iwconfig               sysconfd
dhcp6c                 iwlist                 sysctl
dhcp6s                 iwpriv                 syslogd
dl.sh                  links.sh               tc
dnsget                 llmnrd                 udhcpc
dnsmasq                logserver              udhcpd
dr.sh                  lsmod                  umount
dumpleases             md                     updatedd
eraseall               minidlna               usbmgr
ethreg                 miniupnpd              usbmgr_cli
ez-ipupdate            mkfs.jffs2             utmconfig
factory_apps_init      mm                     utmproxy
flashw                 modprobe               uuidgen
gendoclist.sh          mount                  vconfig
genmuslist.sh          ntfs-3g                wandetector
genpiclist.sh          ntfslabel              wanmanager
genvidlist.sh          ntpclient              wanmanager_host_get
halt                   openl2tpd              watchdog
header                 opmode.sh              wget
hostapd                poweroff               wlanconfig
hostapd1               pppoe                  wolmanager
hostapd2               pptp
hostapd_cli            processmanager
# ls /usr/bin/
*            burnA        config_init  ipcs         tail         wc
[            burna        config_term  killall      test         which
[[           burnb        ether-wake   logger       tftp
awk          burnf        expr         md5sum       time
basename     burnk        find         printf       tty
burn         cmp          id           sort         uptime
# ls /usr/sbin
chat      poff      pon       pppd      pppd0     pppd1     pppstats

@tmn505 If you have suggestions on what offsets to pass to ethreg to inspect relevant registers, let me know.

I've also just soldered on a UART header on the AR9342 working board, so we can compare (see if the stock firmware uses different settings on both boards, or maybe just uses the same settings that work on both).

I can confirm the switch chip on my V1.002 is the same.

w00t! I went ahead and compiled a custom image with this patch (just to see if it would change anything - I copied these settings blindly from some other board), and it fixes the packet loss (tested just with ping so far) on both 100Mbps and 1Gbps:

--- a/target/linux/ath79/dts/ar1022_sitecom_wlr-7100.dts
+++ b/target/linux/ath79/dts/ar1022_sitecom_wlr-7100.dts
@@ -65,6 +65,10 @@ &eth0 {
        gmac-config {
                device = <&gmac>;
                rgmii-gmac0 = <1>;
+                rxdv-delay = <3>;
+                rxd-delay = <3>;
+                txen-delay = <3>;
+                txd-delay = <3>;
        };
 };

Of course, there is no indication that these are really the best settings (In git history I see other boards where they reverted these delays in favor of changing the phy-mode or pll-data), but at least it confirms that we're looking in the right place. So I guess still worth looking at the settings used by the stock firmware.

I also enabled /dev/mem in this image, so I can see the four "3" settings in the register (the 0x3FC part, bits 21:14):

# devmem 0x18070000
0x003FC001
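The value can be split into its 2-bit delay fields to confirm the reading. This is a sketch only: the position of each named field within bits 21:14 is an assumption here and should be checked against the datasheet (with all four fields set to 3, the result does not depend on the order).

```python
# Assumed bit positions of the 2-bit delay fields; verify against the datasheet.
FIELD_SHIFTS = {
    "rxd-delay": 14,
    "rxdv-delay": 16,
    "txd-delay": 18,
    "txen-delay": 20,
}

def decode_delays(reg):
    """Extract the four 2-bit delay fields from a 32-bit register value."""
    return {name: (reg >> shift) & 0x3 for name, shift in FIELD_SHIFTS.items()}

print(decode_delays(0x003FC001))  # the value read with devmem above
```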

I can also change them back to the default 0, which causes packet loss again:

# devmem 0x18070000 32 0x00000001

Fiddling around with these settings, I can see that only the ETH_RXD_DELAY seems to influence the packet loss seen with ping, and setting it to 1 is sufficient to fix it completely (at both 100Mbps and 1Gbps).
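To repeat this kind of sweep on another board, the devmem write commands can be generated rather than typed by hand. A hedged sketch: it assumes ETH_RXD_DELAY occupies bits 15:14 of the register at 0x18070000 (check the datasheet), and the helper names are made up for illustration. The command syntax follows the devmem write shown above.

```python
ETH_CFG_ADDR = 0x18070000        # register address used with devmem above
RXD_DELAY_SHIFT = 14             # assumption: ETH_RXD_DELAY is bits 15:14
RXD_DELAY_MASK = 0x3 << RXD_DELAY_SHIFT

def with_rxd_delay(reg, delay):
    """Return reg with only the assumed RXD delay field replaced."""
    if not 0 <= delay <= 3:
        raise ValueError("delay must be in 0..3")
    return (reg & ~RXD_DELAY_MASK & 0xFFFFFFFF) | (delay << RXD_DELAY_SHIFT)

def devmem_command(reg):
    """Format a 32-bit devmem write of reg to the config register."""
    return f"devmem 0x{ETH_CFG_ADDR:08X} 32 0x{reg:08X}"

base = 0x00000001  # all delays at 0, as in the write shown above
for delay in range(4):
    print(devmem_command(with_rxd_delay(base, delay)))
```

Running ping after each write then shows which delay value is the smallest one that removes the loss.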

I haven't tested the other board yet.

I also noticed an RX_DELAY setting in the ETH_XMII_CONTROL register (which is what pll-data sets AFAICS). In the default ar934x.dtsi, it is set to 1, but changed to 0 for the wlr-7100. However, if I change it back to 1, connectivity seems to break entirely, but maybe that's because it shouldn't be changed while the NIC is running :slight_smile:

I also noticed that the phy-mode can be used to introduce a delay, which then configures the PHY (or in this case, the AR8337, which is sort of a virtual PHY since it is connected to the CPU using RGMII) to introduce a delay (which, looking at upstream linux qca8k-8xxx.c, sets a delay of "2", which I presume is 2ns). This would be triggered by setting phy-mode = "rgmii-rxid". I haven't tried this yet, since that would involve updating the DTS, so regenerating the image (or at least the kernel), which takes a while.

What is interesting is that AFAIU both RX and TX need a delay somewhere (by default CLK and DATA are output synchronously, resulting in instabilities). So it seems that TX actually does have a delay somewhere, or is just lucky that it always works? I see that sometimes PCB traces can be laid out to introduce this delay (there are some squiggles in the traces between CPU and switch, but maybe that's just length matching), but then I would be surprised if they did that for TX but not for RX.
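As a back-of-the-envelope check on the "2ns" figure: RGMII runs its clock at 125 MHz for 1 Gbps (25 MHz and 2.5 MHz for 100 and 10 Mbps), and at 1 Gbps data is transferred on both clock edges. The sketch below just does that arithmetic; the function names are made up for illustration and nothing here is taken from the board's configuration.

```python
# Standard RGMII clock frequencies in MHz, keyed by link speed in Mbps.
RGMII_CLOCK_MHZ = {10: 2.5, 100: 25.0, 1000: 125.0}

def clock_period_ns(link_mbps):
    """Clock period in nanoseconds for a given RGMII link speed."""
    return 1000.0 / RGMII_CLOCK_MHZ[link_mbps]

def quarter_period_ns(link_mbps):
    """Delay that moves a clock edge by a quarter of the clock period."""
    return clock_period_ns(link_mbps) / 4.0

for speed in (1000, 100, 10):
    print(f"{speed} Mbps: period {clock_period_ns(speed):.0f} ns, "
          f"quarter period {quarter_period_ns(speed):.0f} ns")
```

At 1 Gbps the period is 8 ns, so each data value is valid for roughly 4 ns, and a delay of about 2 ns places the clock edge near the middle of that window. That is consistent with reading the "2" as 2ns.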

One question I had: How is this really relevant? The link between the switch and the CPU is always 1Gbps AFAICS, only the link between the switch and the outside world is different. And AFAICS both the pll-data as well as the delays only affect this link, right? Or does the pll-data also somehow drive the switch chip?