Netgear R7800 exploration (IPQ8065, QCA9984)


#1251

I think you misread my post. Instead of running iperf3 directly on the R7800 in client mode I ran it on my laptop connected to the R7800 via ethernet. I'm seeing low speeds while connecting to the same server as in the previous tests. Especially strange is the nosedive the speeds take towards the end of the last run.

I also ran the same test over WiFi with much better speeds and performance doesn't degrade as the test progresses:

[telia ~]$ iperf3 -c ping.online.net -p 5207 -R
Connecting to host ping.online.net, port 5207
Reverse mode, remote host ping.online.net is sending
[  5] local 192.168.1.204 port 33714 connected to 62.210.18.40 port 5207
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  6.10 MBytes  51.2 Mbits/sec                  
[  5]   1.00-2.00   sec  24.1 MBytes   202 Mbits/sec                  
[  5]   2.00-3.00   sec  29.5 MBytes   248 Mbits/sec                  
[  5]   3.00-4.00   sec  34.7 MBytes   291 Mbits/sec                  
[  5]   4.00-5.00   sec  38.4 MBytes   322 Mbits/sec                  
[  5]   5.00-6.00   sec  35.7 MBytes   300 Mbits/sec                  
[  5]   6.00-7.00   sec  38.7 MBytes   325 Mbits/sec                  
[  5]   7.00-8.00   sec  36.7 MBytes   308 Mbits/sec                  
[  5]   8.00-9.00   sec  39.0 MBytes   327 Mbits/sec                  
[  5]   9.00-10.00  sec  36.6 MBytes   307 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   326 MBytes   274 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   320 MBytes   268 Mbits/sec                  receiver

iperf Done.
[telia ~]$ iperf3 -c ping.online.net -p 5206 -R
Connecting to host ping.online.net, port 5206
Reverse mode, remote host ping.online.net is sending
[  5] local 192.168.1.204 port 45700 connected to 62.210.18.40 port 5206
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  5.85 MBytes  49.1 Mbits/sec                  
[  5]   1.00-2.00   sec  22.0 MBytes   185 Mbits/sec                  
[  5]   2.00-3.00   sec  29.0 MBytes   244 Mbits/sec                  
[  5]   3.00-4.00   sec  35.9 MBytes   301 Mbits/sec                  
[  5]   4.00-5.00   sec  38.4 MBytes   322 Mbits/sec                  
[  5]   5.00-6.00   sec  37.6 MBytes   316 Mbits/sec                  
[  5]   6.00-7.00   sec  38.9 MBytes   326 Mbits/sec                  
[  5]   7.00-8.00   sec  36.6 MBytes   307 Mbits/sec                  
[  5]   8.00-9.00   sec  39.0 MBytes   327 Mbits/sec                  
[  5]   9.00-10.00  sec  36.9 MBytes   310 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   328 MBytes   275 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   320 MBytes   269 Mbits/sec                  receiver

iperf Done.
[telia ~]$ iperf3 -c ping.online.net -p 5206 -R
Connecting to host ping.online.net, port 5206
Reverse mode, remote host ping.online.net is sending
[  5] local 192.168.1.204 port 45706 connected to 62.210.18.40 port 5206
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  7.41 MBytes  62.1 Mbits/sec                  
[  5]   1.00-2.00   sec  23.5 MBytes   197 Mbits/sec                  
[  5]   2.00-3.00   sec  31.5 MBytes   264 Mbits/sec                  
[  5]   3.00-4.00   sec  37.9 MBytes   318 Mbits/sec                  
[  5]   4.00-5.00   sec  37.2 MBytes   312 Mbits/sec                  
[  5]   5.00-6.00   sec  38.6 MBytes   324 Mbits/sec                  
[  5]   6.00-7.00   sec  38.6 MBytes   324 Mbits/sec                  
[  5]   7.00-8.00   sec  37.6 MBytes   315 Mbits/sec                  
[  5]   8.00-9.00   sec  38.9 MBytes   326 Mbits/sec                  
[  5]   9.00-10.00  sec  35.2 MBytes   296 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   334 MBytes   280 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   326 MBytes   274 Mbits/sec                  receiver

iperf Done.
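
To check "doesn't degrade as the test progresses" by number rather than by eye, here is a minimal Python sketch that pulls the per-second bitrates out of iperf3 text output like the runs above and compares the second half of a run with its peak. The function names, the sample text, and the 0.8 threshold are assumptions made for illustration; iperf3 provides none of them.

```python
import re

# Matches per-interval lines such as:
# [  5]   3.00-4.00   sec  34.7 MBytes   291 Mbits/sec
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-(\d+\.\d+)\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+Mbits/sec"
)

def interval_bitrates(text):
    """Return the per-interval bitrates (Mbits/sec), skipping summary lines."""
    rates = []
    for line in text.splitlines():
        m = INTERVAL_RE.search(line)
        if not m:
            continue
        start, end = float(m.group(1)), float(m.group(2))
        if end - start > 1.5:  # 0.00-10.00 sender/receiver summary lines
            continue
        rates.append(float(m.group(3)))
    return rates

def degrades(rates, threshold=0.8):
    """True if any interval in the second half drops below threshold * peak."""
    if len(rates) < 4:
        return False
    tail = rates[len(rates) // 2:]
    return min(tail) < threshold * max(rates)

# Shortened sample in the same layout as the output above.
SAMPLE = """
[  5]   0.00-1.00   sec  6.10 MBytes  51.2 Mbits/sec
[  5]   1.00-2.00   sec  24.1 MBytes   202 Mbits/sec
[  5]   2.00-3.00   sec  29.5 MBytes   248 Mbits/sec
[  5]   3.00-4.00   sec  34.7 MBytes   291 Mbits/sec
[  5]   0.00-4.00   sec   326 MBytes   274 Mbits/sec    0             sender
"""

if __name__ == "__main__":
    rates = interval_bitrates(SAMPLE)
    print(rates, "degrades:", degrades(rates))
```

The slow ramp-up in the first seconds does not trip the check, because only the second half of the run is compared with the peak.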

#1252

Have you put eth1 and wlan0 on different cores by setting their IRQ affinity? Maybe IPC is becoming a bottleneck.


#1253

Latest master here with the 4.14.51 patch applied. There must be some congestion on the link between me and ping.online.net, but the French iperf3 server I sometimes use is rock steady, so it looks like 4.14.51 definitely fixes it.

root@router01:~# iperf3 -c bouygues.iperf.fr -R -p 5207
Connecting to host bouygues.iperf.fr, port 5207
Reverse mode, remote host bouygues.iperf.fr is sending
[  5] local xxx.xxx.xxx.xxx port 52560 connected to 89.84.1.222 port 5207
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  39.0 MBytes   327 Mbits/sec
[  5]   1.00-2.00   sec  45.8 MBytes   384 Mbits/sec
[  5]   2.00-3.00   sec  45.8 MBytes   384 Mbits/sec
[  5]   3.00-4.00   sec  45.7 MBytes   384 Mbits/sec
[  5]   4.00-5.00   sec  45.7 MBytes   384 Mbits/sec
[  5]   5.00-6.00   sec  45.8 MBytes   384 Mbits/sec
[  5]   6.00-7.00   sec  45.7 MBytes   384 Mbits/sec
[  5]   7.00-8.00   sec  45.8 MBytes   384 Mbits/sec
[  5]   8.00-9.00   sec  45.8 MBytes   384 Mbits/sec
[  5]   9.00-10.00  sec  45.8 MBytes   384 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   454 MBytes   381 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   451 MBytes   378 Mbits/sec                  receiver

iperf Done.
root@router01:~# iperf3 -c ping.online.net -R -p 5207
Connecting to host ping.online.net, port 5207
Reverse mode, remote host ping.online.net is sending
[  5] local xxx.xxx.xxx.xxx port 58858 connected to 62.210.18.40 port 5207
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  6.31 MBytes  52.9 Mbits/sec
[  5]   1.00-2.00   sec  16.8 MBytes   141 Mbits/sec
[  5]   2.00-3.00   sec  23.4 MBytes   196 Mbits/sec
[  5]   3.00-4.00   sec  31.1 MBytes   261 Mbits/sec
[  5]   4.00-5.00   sec  45.8 MBytes   384 Mbits/sec
[  5]   5.00-6.00   sec  45.5 MBytes   382 Mbits/sec
[  5]   6.00-7.00   sec  45.8 MBytes   384 Mbits/sec
[  5]   7.00-8.00   sec  43.2 MBytes   362 Mbits/sec
[  5]   8.00-9.00   sec  45.6 MBytes   383 Mbits/sec
[  5]   9.00-10.00  sec  45.8 MBytes   384 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   354 MBytes   297 Mbits/sec  189             sender
[  5]   0.00-10.00  sec   349 MBytes   293 Mbits/sec                  receiver

Running single-threaded over LAN is a bit slower, and it would be good to get to the bottom of that, but a parallel test still maxes out at 384 Mbps from the LAN side.
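
A note on reading the two columns: iperf3 reports Transfer in binary megabytes (2^20 bytes) and Bitrate in decimal megabits (10^6 bits), which is why 45.8 MBytes in one second shows up as 384 Mbits/sec rather than 366. A small sketch of the conversion follows; the function name is made up for illustration.

```python
def mbytes_to_mbits_per_sec(mbytes, seconds=1.0):
    """Convert an iperf3 Transfer value (MBytes, 2**20 bytes) to Mbits/sec (10**6 bits)."""
    bits = mbytes * 2**20 * 8
    return bits / seconds / 1e6

if __name__ == "__main__":
    # Per-second interval from the runs above.
    print(round(mbytes_to_mbits_per_sec(45.8)))
    # 10-second sender summary: 454 MBytes over 10 s.
    print(round(mbytes_to_mbits_per_sec(454, 10)))
```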


#1254

Good call! By default eth0 and eth1 have an smp_affinity of 3, but by changing both of them to 2 I'm able to get much better speeds:

[michael@telia ~]$ iperf3 -c ping.online.net -p 5208 -R
Connecting to host ping.online.net, port 5208
Reverse mode, remote host ping.online.net is sending
[  5] local 192.168.1.2 port 50574 connected to 62.210.18.40 port 5208
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  10.2 MBytes  85.2 Mbits/sec                  
[  5]   1.00-2.00   sec  27.9 MBytes   234 Mbits/sec                  
[  5]   2.00-3.00   sec  25.8 MBytes   217 Mbits/sec                  
[  5]   3.00-4.00   sec  28.7 MBytes   241 Mbits/sec                  
[  5]   4.00-5.00   sec  29.0 MBytes   243 Mbits/sec                  
[  5]   5.00-6.00   sec  29.3 MBytes   245 Mbits/sec                  
[  5]   6.00-7.00   sec  30.2 MBytes   253 Mbits/sec                  
[  5]   7.00-8.00   sec  28.6 MBytes   240 Mbits/sec                  
[  5]   8.00-9.00   sec  30.3 MBytes   254 Mbits/sec                  
[  5]   9.00-10.00  sec  28.6 MBytes   240 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   273 MBytes   229 Mbits/sec  230             sender
[  5]   0.00-10.00  sec   268 MBytes   225 Mbits/sec                  receiver

iperf Done.

Speeds over WiFi are a bit lower than what I previously saw, but it's a trade-off I'm more than happy to live with:

[michael@telia ~]$ iperf3 -c ping.online.net -p 5207 -R
Connecting to host ping.online.net, port 5207
Reverse mode, remote host ping.online.net is sending
[  5] local 192.168.1.204 port 38480 connected to 62.210.18.40 port 5207
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  7.35 MBytes  61.7 Mbits/sec                  
[  5]   1.00-2.00   sec  27.8 MBytes   233 Mbits/sec                  
[  5]   2.00-3.00   sec  27.7 MBytes   233 Mbits/sec                  
[  5]   3.00-4.00   sec  25.8 MBytes   217 Mbits/sec                  
[  5]   4.00-5.00   sec  26.1 MBytes   219 Mbits/sec                  
[  5]   5.00-6.00   sec  27.7 MBytes   233 Mbits/sec                  
[  5]   6.00-7.00   sec  28.7 MBytes   241 Mbits/sec                  
[  5]   7.00-8.00   sec  27.7 MBytes   232 Mbits/sec                  
[  5]   8.00-9.00   sec  29.3 MBytes   246 Mbits/sec                  
[  5]   9.00-10.00  sec  23.8 MBytes   200 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   257 MBytes   215 Mbits/sec  171             sender
[  5]   0.00-10.00  sec   252 MBytes   212 Mbits/sec                  receiver

iperf Done.

I kept both wireless interfaces on cpu0 (i.e. affinity 1).
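
For anyone unfamiliar with the values: smp_affinity is a hexadecimal CPU bitmask, so 3 means cpu0 and cpu1, 2 means cpu1 only, and 1 means cpu0 only. A minimal Python sketch of the decoding, assuming a single hex word as on a dual-core SoC like this one (the helper names are hypothetical):

```python
def cpus_in_mask(mask_hex):
    """Return the CPU numbers selected by a hex smp_affinity mask."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]

def mask_for_cpus(cpus):
    """Return the hex mask string that selects the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

if __name__ == "__main__":
    for value in ("3", "2", "1"):
        print(value, "->", cpus_in_mask(value))
```

In the setup described above, eth0 and eth1 end up on cpu1 and both wireless interfaces on cpu0.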


#1255

With flow offloading enabled over PPPoE, some traffic cannot be sent out to the network.
When I disable flow offloading, everything works.


#1256

Upgraded to latest @hnyman master firmware...

root@OpenWrt:~# cat /etc/banner

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt SNAPSHOT, r7314-c4aadbdaf6
 -----------------------------------------------------

root@OpenWrt:~# cat /proc/version
Linux version 4.14.50 (perus@ub1804) (gcc version 7.3.0 (OpenWrt GCC 7.3.0 r6781-2c192b6916)) #0 SMP Mon Jun 25 14:53:31 2018

... and noticed something strange:

root@OpenWrt:~# dmesg|grep '\-0x000'
[ 1.062810] 0x000000000000-0x000000c80000 : "qcadata"
[ 1.090125] 0x000000c80000-0x000001180000 : "APPSBL"
[ 1.099408] 0x000001180000-0x000001200000 : "APPSBLENV"
[ 1.101037] 0x000001200000-0x000001340000 : "art"
[ 1.106345] 0x000001340000-0x000001480000 : "artbak"
[ 1.111165] 0x000001480000-0x000001880000 : "kernel"
[ 1.120900] 0x000001880000-0x000007900000 : "ubi"
[ 1.286140] 0x000007900000-0x000008000000 : "reserve"
[ 1.298788] 0x000001480000-0x000007900000 : "firmware"
[ 2.435827] 0x000001480000-0x000001880000 : "kernel" <-- Seems to be redundant.
[ 2.449350] 0x000001880000-0x000007900000 : "ubi" <------ Seems to be redundant.

root@OpenWrt:~# cat /proc/mtd
dev: size erasesize name
mtd0: 00c80000 00020000 "qcadata"
mtd1: 00500000 00020000 "APPSBL"
mtd2: 00080000 00020000 "APPSBLENV"
mtd3: 00140000 00020000 "art"
mtd4: 00140000 00020000 "artbak"
mtd5: 00400000 00020000 "kernel"
mtd6: 06080000 00020000 "ubi"
mtd7: 00700000 00020000 "reserve"
mtd8: 06480000 00020000 "firmware"
mtd9: 00400000 00020000 "kernel" <-- Seems to be redundant.
mtd10: 06080000 00020000 "ubi" <-----Seems to be redundant.

Not likely to cause any problems.
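
A quick way to spot this kind of duplication without reading the table by hand is to group the /proc/mtd entries by name. The sketch below is only illustrative: the function name is hypothetical and the sample text is copied from the output above.

```python
import re
from collections import defaultdict

# One /proc/mtd entry: device, size, erasesize, quoted name.
MTD_LINE = re.compile(r'^(mtd\d+):\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+"([^"]*)"')

def duplicate_mtd_names(proc_mtd_text):
    """Map each partition name that occurs more than once to its mtd devices."""
    by_name = defaultdict(list)
    for line in proc_mtd_text.splitlines():
        m = MTD_LINE.match(line.strip())
        if m:
            by_name[m.group(4)].append(m.group(1))
    return {name: devs for name, devs in by_name.items() if len(devs) > 1}

PROC_MTD = """dev: size erasesize name
mtd0: 00c80000 00020000 "qcadata"
mtd5: 00400000 00020000 "kernel"
mtd6: 06080000 00020000 "ubi"
mtd7: 00700000 00020000 "reserve"
mtd8: 06480000 00020000 "firmware"
mtd9: 00400000 00020000 "kernel"
mtd10: 06080000 00020000 "ubi"
"""

if __name__ == "__main__":
    print(duplicate_mtd_names(PROC_MTD))
```

On a live router the same function could be fed the contents of /proc/mtd instead of the sample string.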


#1257

What doesn't work with flow offloading? We really need more details if we want to find this bug.


#1258

Those were not there in May (when I looked at the netgear partition splitting). Must be a recent regression/change in the mtd partition recognition.


#1259

I am using qca8k. Is it not compatible?


#1260

As long as you are running kernel 4.14, software flow offloading should be compatible. Again, what isn't working exactly for you?


#1261

Almost all uploads fail, including the speedtest.net upload test.


#1262

More complete log of the "new" MTD partitions:

[    1.048426] 9 fixed-partitions partitions found on MTD device qcom_nand.0
[    1.055861] Creating 9 MTD partitions on "qcom_nand.0":
[    1.062716] 0x000000000000-0x000000c80000 : "qcadata"
[    1.089741] 0x000000c80000-0x000001180000 : "APPSBL"
[    1.098876] 0x000001180000-0x000001200000 : "APPSBLENV"
[    1.100482] 0x000001200000-0x000001340000 : "art"
[    1.105764] 0x000001340000-0x000001480000 : "artbak"
[    1.110595] 0x000001480000-0x000001880000 : "kernel"
[    1.120242] 0x000001880000-0x000007900000 : "ubi"
[    1.282232] 0x000007900000-0x000008000000 : "reserve"
[    1.294766] 0x000001480000-0x000007900000 : "firmware"
[    1.465539] no rootfs found after FIT image in "firmware"
[    2.421008] 2 uimage-fw partitions found on MTD device firmware
[    2.421078] 0x000001480000-0x000001880000 : "kernel"
[    2.434540] 0x000001880000-0x000007900000 : "ubi"

My hunch is that it is some automatic new logic that tries to split the "firmware" again, although the components (kernel, ubi) are already visible in DTS as themselves. Likely harmless.
Possibly caused by mtd changes in mid-May or something in upstream kernel.
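
The address ranges in the log support that reading: "kernel" and "ubi" sit exactly inside "firmware", so splitting "firmware" again yields the same two partitions a second time. A minimal Python sketch of that containment check, with hypothetical helper names and a sample shortened from the log above:

```python
import re

# Matches e.g.: 0x000001480000-0x000001880000 : "kernel"
RANGE_RE = re.compile(r'0x([0-9a-f]+)-0x([0-9a-f]+) : "([^"]+)"')

def parse_ranges(dmesg_text):
    """Return (name, start, end) tuples for every partition line in the log."""
    return [
        (m.group(3), int(m.group(1), 16), int(m.group(2), 16))
        for m in RANGE_RE.finditer(dmesg_text)
    ]

def contained_in(parts, outer_name):
    """List the distinct partitions that lie fully inside the named partition."""
    outer = [(s, e) for n, s, e in parts if n == outer_name]
    if not outer:
        return []
    start0, end0 = outer[0]
    found = []
    for name, start, end in parts:
        inside = start0 <= start and end <= end0
        if name != outer_name and inside and (name, start, end) not in found:
            found.append((name, start, end))
    return found

SAMPLE = """
[    1.110595] 0x000001480000-0x000001880000 : "kernel"
[    1.120242] 0x000001880000-0x000007900000 : "ubi"
[    1.282232] 0x000007900000-0x000008000000 : "reserve"
[    1.294766] 0x000001480000-0x000007900000 : "firmware"
[    2.421078] 0x000001480000-0x000001880000 : "kernel"
[    2.434540] 0x000001880000-0x000007900000 : "ubi"
"""

if __name__ == "__main__":
    for name, start, end in contained_in(parse_ranges(SAMPLE), "firmware"):
        print(f"{name}: {start:#x}-{end:#x} (size {end - start:#x})")
```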


#1263

Looks like the double partitions may actually cause trouble... :frowning:

Log from serial console when trying to sysupgrade from r7314:

Sending KILL to remaining processes ...
Switching to ramdisk...
[  153.290095] UBIFS (ubi0:1): background thread "ubifs_bgt0_1" stops
[  153.398774] UBIFS (ubi0:1): un-mount UBI device 0
Performing system upgrade...
Unlocking kernel ...

Writing from <stdin> to kernel ...
ubiattach: error!: strtoul: unable to parse the number '6 mtd10'
ubiattach: error!: bad MTD device number: "6 mtd10"
ubiformat: error!: more then one MTD device specified (use -h for help)
ubiattach: error!: strtoul: unable to parse the number '6 mtd10'
ubiattach: error!: bad MTD device number: "6 mtd10"
libubi: error!: "/dev/" is not a character device
ubimkvol: error!: error while probing "/dev/"
          error 22 (Invalid argument)
cannot create rootfs volume
libubi: error!: "/dev/" is not a character device
ubiupdatevol: error!: error while probing "/dev/"
              error 22 (Invalid argument)
tar: write error: Broken pipe
mount: mounting /dev/ on /tmp/new_root failed: Invalid argument
mounting ubifs  failed
sysupgrade successful
umount: can't unmount /dev: Resource busy
umount: can't unmount /tmp: Resource busy
[  154.913143] reboot: Restarting system

Looks like ubiattach gets "6 mtd10" as its argument, while it should get "6".
Likely the value is initially "mtd6 mtd10": the leading "mtd" gets stripped away, but the second word is unexpected. That value is generated by some query of the mtd structure.
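
A minimal Python model of that hypothesis, not the actual sysupgrade code: it assumes the lookup matches /proc/mtd lines by quoted name and strips one leading "mtd" from the combined result. Both function names are hypothetical, and the sample text is taken from the /proc/mtd output earlier in the thread.

```python
PROC_MTD = """dev: size erasesize name
mtd5: 00400000 00020000 "kernel"
mtd6: 06080000 00020000 "ubi"
mtd8: 06480000 00020000 "firmware"
mtd9: 00400000 00020000 "kernel"
mtd10: 06080000 00020000 "ubi"
"""

def naive_index(proc_mtd, name):
    """Model of the suspected lookup: join all matches, strip one 'mtd' prefix."""
    devs = [line.split(":")[0] for line in proc_mtd.splitlines() if f'"{name}"' in line]
    joined = " ".join(devs)
    return joined[len("mtd"):] if joined.startswith("mtd") else joined

def first_index(proc_mtd, name):
    """Defensive variant: return only the first matching index, or None."""
    for line in proc_mtd.splitlines():
        if f'"{name}"' in line:
            return line.split(":")[0][len("mtd"):]
    return None

if __name__ == "__main__":
    print(repr(naive_index(PROC_MTD, "ubi")))   # reproduces the bad argument
    print(repr(first_index(PROC_MTD, "ubi")))
```

With duplicate names the naive version yields the same "6 mtd10" string that ubiattach complains about, while a unique name still resolves to a single index.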

Question:
Are the double MTD partitions seen in builds other than mine? There should be nothing special related to mtd in my master build, so I assume that the same phenomenon should also appear in the buildbot snapshot and other recent master builds.


#1264

The double MTD kernel & ubi partitions are a recent base system bug.

I tested with today's buildbot snapshot, my own master from two days ago, and my master from last week:

  • double partition in buildbot snapshot r7344 and my master r7314
  • normal situation in my build r7275 from last week.

Looks like a major regression in master a few days ago.
The issue likely also hits some other routers.

https://bugs.openwrt.org/index.php?do=details&task_id=1617

EDIT:

The culprit may be:

+CONFIG_MTD_SPLIT_UIMAGE_FW=y

in the commit in the regression range:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=4645a6d3183ad27ab1ca74ed82495ac8f6019331
"ipq806x: add support for NEC Aterm WG2600HP"


#1265

This could explain the problem I had with the LEDs not showing up after a sysupgrade from a build that predates the LED implementation.


#1266

I built a test build of r7344 with "CONFIG_MTD_SPLIT_UIMAGE_FW=y" reverted in target/linux/ipq806x/config-4.14, and that fixes sysupgrade again.
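
For anyone wanting to verify their own tree before building, a small Python sketch that reports the state of a symbol in kernel-config text. It relies only on the standard "CONFIG_X=y" and "# CONFIG_X is not set" line formats; the function name is hypothetical, and the file to feed it would be target/linux/ipq806x/config-4.14 as mentioned above.

```python
def kconfig_state(config_text, symbol):
    """Return the symbol's value ('y', 'm', ...), 'n' if explicitly unset, else None."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
        if line == f"# {symbol} is not set":
            return "n"
    return None

if __name__ == "__main__":
    affected = "CONFIG_MTD_SPLIT_FIRMWARE=y\nCONFIG_MTD_SPLIT_UIMAGE_FW=y\n"
    reverted = "CONFIG_MTD_SPLIT_FIRMWARE=y\n"
    print(kconfig_state(affected, "CONFIG_MTD_SPLIT_UIMAGE_FW"))
    print(kconfig_state(reverted, "CONFIG_MTD_SPLIT_UIMAGE_FW"))
```

The two sample strings are made up for the demonstration and do not reproduce the real config file.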


#1267

TFTP flash is needed, right?


#1268

For those who care, @dissent1's unmerged patches for kernel 4.14: https://github.com/neheb/source/tree/ipq8065


#1269

My "large flash" patch was backported yesterday to 18.06, so both master and 18.06 now also use the old "netgear" partition, which will make it easier to jump between master and 18.06 builds in the future. Coming from 17.01 will require TFTP in any case.

Note that 18.06.0-rc1 has been compiled with the old traditional small flash space, so upgrading from that will require TFTP. But as sysupgrading from 18.06.0-rc1 is semi-broken in any case (due to the double ubi partition detection with ipq806x Netgear routers), TFTP will likely be needed in any case...

My patch for fixing ipq806x partition detection (remove "firmware" and thus the double ubi & kernel from mtd) was also merged yesterday to master and 18.06, so hopefully both are back on the normal sysupgrade path today.

As said earlier, sysupgrading from master and 18.06 builds of 19-27 June (including rc1) on the R7800 may be impossible. With 18.06 I got sysupgrade to succeed on the second try for some reason, but with master I always needed TFTP. Be prepared for that. (The bug described in FS#1617 affected at least the R7800, R7500, R7500v2 and D7800, all of which should be fixed now.)


#1270

r7344 from June 27th is OK :slight_smile: